
The objective of tpSVG is to detect spatially variable genes (SVGs) when analyzing spatially-resolved transcriptomics data. This covers both the unsupervised setting, where no additional information is supplied besides the (normalized) gene counts and spatial coordinates, and the setting where spatial variation is assessed after accounting for covariates, such as tissue anatomy or possibly cell type composition.

Compared to previous SVG detection tools, tpSVG provides a scalable solution that models gene expression as counts instead of log-transformed counts. Log transformation is convenient for modeling spatial gene expression because it maps count data to the continuous domain and so enables well-understood Gaussian models. However, it distorts the counts of lowly expressed genes and creates a bias in favor of highly expressed genes. This matters because the ranking of genes by effect size is commonly used in dimension reduction, or to select its input, so estimating the gene ranking correctly is very important. Gaussian models, exemplified by nnSVG, often decouple the mean-variance relationship that is commonly assumed for count data, and hence often prioritize highly expressed genes over lowly expressed genes. In the figure below, we see that nnSVG is susceptible to such a mean-rank relationship, meaning that highly expressed genes are often ranked highly. In contrast, the proposed tpSVG with a Poisson distribution is not susceptible to this mean-rank relationship when examining the DLPFC dataset.
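To make the point about log transformation concrete, here is a minimal sketch that is not part of tpSVG or nnSVG. It assumes a gene's counts follow a Poisson distribution with mean `mu` and that the log transformation is `log(1 + count)`; both the pseudocount of 1 and the helper names `poisson_pmf` and `log1p_moments` are illustrative assumptions. On the count scale the variance equals the mean, whereas the script computes the exact variance after the transformation so the two can be compared across expression levels.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def log1p_moments(mu):
    """Mean and variance of log(1 + X) for X ~ Poisson(mu).

    The sum is truncated far into the upper tail, where the remaining
    probability mass is negligible for the values of mu used here.
    """
    upper = int(mu + 12 * math.sqrt(mu) + 30)
    mean = 0.0
    second_moment = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, mu)
        y = math.log1p(k)
        mean += p * y
        second_moment += p * y * y
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    print("mu      var(count)  var(log1p(count))")
    for mu in (0.1, 1.0, 10.0, 100.0):
        _, var_log = log1p_moments(mu)
        # For a Poisson variable the variance on the count scale equals mu.
        print(f"{mu:<7} {mu:<11} {var_log:.4f}")
```

Under these assumptions, the variance on the transformed scale no longer increases with the mean: it is largest for moderately low means and shrinks as expression grows. A Gaussian model fitted to the transformed values therefore sees a different mean-variance relationship from the one assumed for the counts, which is the kind of distortion described above.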