1 Notice to users using OSX with R-devel

SpatialCPie depends on the ggiraph package, for which there is no R-devel version available at the moment. This means that SpatialCPie will not work on OSX with R-devel.

2 Introduction

SpatialCPie is an R package designed to facilitate cluster evaluation for Spatial Transcriptomics (ST) data. It provides intuitive visualizations of the relationships between clusters in order to guide the user during cluster identification, selection, and further downstream applications.

3 Usage example


3.1 Input data

In this example, we will use a downsampled dataset from an experiment on the human heart. We begin by loading the count data (generated by the ST pipeline):

counts <- read.table(
    system.file("extdata", "counts.tsv", package = "SpatialCPie"),
    sep = "\t",
    check.names = FALSE
)
counts[1:5, 1:5]
# >        3x34  3x30  3x31  3x32  3x33
# > ACTB  2.511 2.116 2.910 3.792 2.432
# > CD74  1.744 2.323 1.666 3.061 0.000
# > CFL1  0.000 2.116 1.666 1.909 2.432
# > CST3  3.009 2.664 0.000 3.061 2.925
# > ERBB2 1.744 3.461 0.000 3.473 3.584

To overlay the data on the tissue, we also need to load the tissue image and its corresponding spot data (generated by the ST spot detector), which specifies the pixel coordinates of each spot:

tissue <- jpeg::readJPEG(
    system.file("extdata", "he_image.jpg", package = "SpatialCPie")
)
spots <- parseSpotFile(
    system.file("extdata", "spot_data.tsv", package = "SpatialCPie")
)
head(spots)
# >             x        y
# > 11x9 149.3096 119.4788
# > 12x9 166.1704 120.5682
# > 13x9 180.5411 120.4645
# > 14x9 195.2749 120.3607
# > 15x9 209.6456 120.3607
# > 16x9 225.3132 120.9314

3.2 Preprocessing

It is usually a good idea to filter the data before the analysis. This could include removing spots that are outside of the tissue, or removing spots or genes that have a low number of reads.

Since the spot file only contains the spots that are under the tissue, we can use it to subset the counts:

counts <- counts[, which(colnames(counts) %in% rownames(spots))]

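Let’s also remove all spots and genes that have fewer than 20 reads in total. The filter is repeated until the dimensions stop changing, because removing a gene lowers the totals of the remaining spots and vice versa: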

repeat {
    d <- dim(counts)
    counts <- counts[rowSums(counts) >= 20, colSums(counts) >= 20]
    if (all(dim(counts) == d)) {
        break
    }
}
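
To see why a single filtering pass may not be enough, here is a minimal sketch on a made-up 3 x 3 matrix. The helper name filterCounts and all values are hypothetical and not part of SpatialCPie:

```r
# Toy count matrix: rows are genes, columns are spots (values are made up).
toy <- matrix(
    c(15,  0,  0,
       6, 12, 12,
       0, 12, 12),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)

# Hypothetical helper: repeat the row/column filter until nothing changes.
filterCounts <- function(x, minReads = 20) {
    repeat {
        d <- dim(x)
        x <- x[rowSums(x) >= minReads, colSums(x) >= minReads, drop = FALSE]
        if (all(dim(x) == d)) {
            break
        }
    }
    x
}

# A single pass drops gene g1, which leaves spot s1 with only 6 reads.
onePass <- toy[rowSums(toy) >= 20, colSums(toy) >= 20, drop = FALSE]
filtered <- filterCounts(toy)

dim(onePass)   # 2 3
dim(filtered)  # 2 2
```

The drop = FALSE argument is added here only so that the toy matrix keeps its dimensions if a single row or column remains.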

3.3 Computing cluster assignments

The cluster assignments will be calculated during the loading of the gadget. The cluster assignments are labels assigned to each spot over different cluster resolutions, where we use the terminology “(cluster) resolution k” to refer to a partitioning of the spots into \(k\) clusters.
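
As a rough illustration of what "assignments over different resolutions" means, the sketch below clusters simulated data with kmeans at resolutions 2, 3, and 4 and stacks the labels into one long table. The data, the column names, and the spots-as-rows orientation are assumptions made for this example. This is not the code the gadget runs internally.

```r
set.seed(1)

# Simulated data: 30 "spots" (rows) by 5 "genes" (columns); values are made up.
x <- matrix(
    rnorm(30 * 5),
    nrow = 30,
    dimnames = list(paste0("spot", 1:30), paste0("gene", 1:5))
)

resolutions <- 2:4

# One block of rows per resolution k: every spot gets a label in 1..k.
assignments <- do.call(rbind, lapply(resolutions, function(k) {
    data.frame(
        spot = rownames(x),
        resolution = k,
        cluster = unname(kmeans(x, centers = k)$cluster),
        row.names = NULL
    )
}))

head(assignments)
```

Each spot appears once per resolution, so the table has 30 x 3 rows.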

By default, clustering is performed with the stats::kmeans function at resolutions 2:4. The user can specify which cluster resolutions to compute with the resolution argument. The clustering algorithm can also be changed with the assignmentFunction argument. Use ?runCPie for a complete list of function options.
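
A custom clustering function could look like the sketch below, which swaps k-means for hierarchical clustering. It assumes that assignmentFunction takes the number of clusters k and a data matrix x with one observation per row, and returns one label per row. That signature is an assumption here and should be checked against ?runCPie. The name hclustAssignment is hypothetical.

```r
# Hypothetical assignment function: hierarchical clustering cut into k groups.
# Assumed signature: function(k, x), returning one integer label per row of x.
hclustAssignment <- function(k, x) {
    cutree(hclust(dist(x)), k = k)
}

# Try it on simulated data (20 observations, 4 features; values are made up).
set.seed(1)
x <- matrix(rnorm(20 * 4), nrow = 20)
labels <- hclustAssignment(3, x)
table(labels)
```

If the signature matches, the function could then be supplied through the assignmentFunction argument of runCPie.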

3.4 Visualization

We are now ready to use the gadget to visualize the data:

result <- runCPie(
    counts,
    image = tissue,
    spotCoordinates = spots
)

This opens the gadget window, which has two main elements: the cluster tree and the spatial array plots. The cluster tree is interactive: selecting rows of nodes displays the corresponding spatial array plots, which can be viewed by scrolling down. The information contained in these plots will be discussed in the next section. A number of parameters that affect the visual representation can be changed within the gadget.

To exit the gadget, press the “Done” button. The plots in the gadget and the cluster assignments are stored in the output object, which has the following structure:

str(result, max.level = 1)
# > List of 3
# >  $ clusters:'data.frame':   2400 obs. of  3 variables:
# >  $ treePlot:List of 10
# >   ..- attr(*, "class")= chr [1:2] "gg" "ggplot"
# >  $ piePlots:List of 3

3.4.1 Cluster tree


The cluster tree is a visual representation of how the spatial features transition from clusters of lower resolution to clusters of higher resolution. The edge opacities show the proportion of spots that transition to or from a given cluster while node radii display the size of each cluster.
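
The quantities behind the edges and nodes can be sketched with a cross-tabulation of the labels at two adjacent resolutions. The example below uses simulated data and base R only. It illustrates the idea and is not necessarily how SpatialCPie computes the opacities.

```r
set.seed(1)

# Simulated data: 40 spots (rows) by 5 features; values are made up.
x <- matrix(rnorm(40 * 5), nrow = 40)
low <- kmeans(x, centers = 2)$cluster   # resolution 2
high <- kmeans(x, centers = 3)$cluster  # resolution 3

# Number of spots moving from each low-resolution cluster (rows)
# to each high-resolution cluster (columns).
transitions <- table(low = low, high = high)

# Proportion of each low-resolution cluster that goes to each child.
outgoing <- prop.table(transitions, margin = 1)

# Proportion of each high-resolution cluster that comes from each parent.
incoming <- prop.table(transitions, margin = 2)

# Cluster sizes, which would correspond to the node radii.
clusterSizes <- table(high)
```

Each row of outgoing and each column of incoming sums to 1, so an edge value close to 1 means that almost all spots of that cluster follow that edge.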

When launching the gadget, spots are relabeled so as to minimize the number of crossovers (label switches) between resolutions in the data. Moreover, cluster color labels are selected so that dissimilar clusters are farther away in color space.
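
The relabeling idea can be illustrated with a simple greedy heuristic: each higher-resolution cluster inherits the label of the lower-resolution cluster it overlaps with most, and any cluster left over gets an unused label. This is only a sketch of the general idea, not the algorithm used by SpatialCPie. The function name alignLabels is hypothetical, and the sketch assumes the parent labels are integers in 1..k.

```r
# Hypothetical helper: relabel `child` so that its labels follow `parent`.
alignLabels <- function(parent, child) {
    overlap <- table(parent, child)
    childIds <- colnames(overlap)
    mapping <- setNames(rep(NA_character_, length(childIds)), childIds)

    # Visit (parent, child) pairs from largest to smallest overlap.
    pairs <- as.data.frame(overlap, stringsAsFactors = FALSE)
    pairs <- pairs[pairs$Freq > 0, ]
    pairs <- pairs[order(-pairs$Freq), ]
    for (i in seq_len(nrow(pairs))) {
        p <- pairs$parent[i]
        ch <- pairs$child[i]
        # Match only if this child is unmatched and the parent label is free.
        if (is.na(mapping[[ch]]) && !(p %in% mapping)) {
            mapping[[ch]] <- p
        }
    }

    # Children left unmatched receive labels that are not yet in use.
    unmatched <- is.na(mapping)
    unused <- setdiff(as.character(seq_along(childIds)), mapping)
    mapping[unmatched] <- unused[seq_len(sum(unmatched))]

    as.integer(mapping[as.character(child)])
}

parent <- c(1, 1, 1, 2, 2, 2)   # labels at resolution 2
child  <- c(2, 2, 2, 3, 3, 1)   # labels at resolution 3, arbitrarily numbered
aligned <- alignLabels(parent, child)
aligned  # 1 1 1 2 2 3
```

After relabeling, the spots that stay together keep the same label across resolutions, and only the newly split-off cluster receives a new one.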

3.4.2 Spatial array plots