Contents

1 Installation

# Install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Statial")

2 Load packages

# Loading required packages
library(Statial)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(dplyr)
library(SingleCellExperiment)
library(ggplot2)
library(ggsurvfit)
library(survival)
library(tibble)
library(treekoR)

theme_set(theme_classic())
nCores <- 1

3 Overview

There are over 37 trillion cells in the human body, each taking up different forms and functions. The behaviour of these cells can be described by canonical characteristics, but their functions can also dynamically change based on their environmental context. Understanding the interplay between cells is key to understanding their mechanisms of action and how they contribute to human disease. Statial is a suite of functions to quantify the spatial relationships between cell types. This guide will provide a step-by-step overview of some key functions within Statial.

4 Loading example data

To illustrate the functionality of Statial we will use a multiplexed ion beam imaging by time-of-flight (MIBI-TOF) dataset profiling tissue from triple-negative breast cancer patients\(^1\) by Keren et al., 2018. This dataset simultaneously quantifies in situ expression of 36 proteins in 34 immune rich patients. Note: The data contains some “uninformative” probes and the original cohort included 41 patients.

These images are stored in a SingleCellExperiment object called kerenSCE. This object contains 57811 cells across 10 images and includes information on cell type and patient survival.

Note: The original dataset was reduced down from 41 images to 10 images for the purposes of this vignette, due to size restrictions.

# Load breast cancer
data("kerenSCE")

kerenSCE
#> class: SingleCellExperiment 
#> dim: 48 57811 
#> metadata(0):
#> assays(1): intensities
#> rownames(48): Na Si ... Ta Au
#> rowData names(0):
#> colnames(57811): 1 2 ... 171281 171282
#> colData names(8): x y ... Survival_days_capped Censored
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

5 Kontextual: Context aware spatial relationships

Kontextual is a method for performing inference on cell localisation which explicitly defines the contexts in which spatial relationships between cells can be identified and interpreted. These contexts may represent landmarks, spatial domains, or groups of functionally similar cells which are consistent across regions. By modelling spatial relationships between cells relative to these contexts, Kontextual produces robust spatial quantifications that are not confounded by biases such as the choice of region to image and the tissue structure present in the images.

In this example we demonstrate how cell type hierarchies can be used as a means to derive appropriate “contexts” for the evaluation of cell localisation. We then demonstrate the types of conclusions which Kontextual enables.

5.1 Using cell type hierarchies to define a “context”

A cell type hierarchy may be used to define the “context” in which cell type relationships are evaluated within. A cell type hierarchy defines how cell types are functionally related to one another. The bottom of the hierarchy represents homogeneous populations of a cell type (child), the cell populations at the nodes of the hierarchy represent broader parent populations with shared generalised function. For example CD4 T cells may be considered a child population to the Immune parent population.

There are two ways to define the cell type hierarchy. First, they can be defined based on our biological understanding of the cell types. We can represent this by creating a named list containing the names of each parent and the associated vector of child cell types.

Note: The all vector must be created to include cell types which do not have a parent e.g. the undefined cell type in this data set.

# Examine all cell types in image
unique(kerenSCE$cellType)
#>  [1] "Keratin_Tumour" "CD3_Cell"       "B"              "CD4_Cell"      
#>  [5] "Dc/Mono"        "Unidentified"   "Macrophages"    "CD8_Cell"      
#>  [9] "other immune"   "Endothelial"    "Mono/Neu"       "Mesenchymal"   
#> [13] "Neutrophils"    "NK"             "Tumour"         "DC"            
#> [17] "Tregs"

# Named list of parents and their child cell types
biologicalHierarchy = list(
  "tumour" = c("Keratin_Tumour", "Tumour"),
  "tcells" = c("CD3_Cell", "CD4_Cell", "CD8_Cell", "Tregs"),
  "myeloid" = c("Dc/Mono", "DC", "Mono/Neu", "Macrophages", "Neutrophils"),
  "tissue" = c("Endothelial", "Mesenchymal")
)

# Adding more broader immune parent populationse
biologicalHierarchy$immune = c(biologicalHierarchy$bcells,
                               biologicalHierarchy$tcells,
                               biologicalHierarchy$myeloid,
                               "NK", "other immune", "B")


# Creating a vector for all cellTypes
all <- unique(kerenSCE$cellType)

Alternatively, you can use the treeKor bioconductor package treekoR to define these hierarchies in a data driven way.

Note: These parent populations may not be accurate as we are using a small subset of the data.

# Calculate hierarchy using treekoR
kerenTree <- treekoR::getClusterTree(t(assay(kerenSCE, "intensities")),
                            kerenSCE$cellType,
                            hierarchy_method="hopach",
                            hopach_K = 1)

# Convert treekoR result to a name list of parents and children.
treekorParents = getParentPhylo(kerenTree)

treekorParents
#> $parent_1
#> [1] "Dc/Mono"     "Macrophages" "NK"         
#> 
#> $parent_2
#> [1] "CD3_Cell" "B"        "CD4_Cell" "CD8_Cell" "Tregs"   
#> 
#> $parent_3
#> [1] "Endothelial" "Mesenchymal" "DC"         
#> 
#> $parent_4
#> [1] "Unidentified" "other immune" "Mono/Neu"     "Neutrophils"  "Tumour"

5.2 Application on triple negative breast cancer image.

Here we examine an image highlighted in the Keren et al. 2018 manuscript where accounting for context information enables new conclusions. In image 6 of the Keren et al. dataset, we can see that p53+ tumour cells and immune cells are dispersed i.e. these two cell types are not mixing. However we can also see that p53+ tumour cells appear much more localised to immune cells relative to the tumour context (tumour cells and p53+ tumour cells).

# Lets define a new cell type vector
kerenSCE$cellTypeNew <- kerenSCE$cellType

# Select for all cells that express higher than baseline level of p53
p53Pos <- assay(kerenSCE)["p53", ] > -0.300460

# Find p53+ tumour cells
kerenSCE$cellTypeNew[kerenSCE$cellType %in% biologicalHierarchy$tumour] <- "Tumour"
kerenSCE$cellTypeNew[p53Pos & kerenSCE$cellType %in% biologicalHierarchy$tumour] <- "p53_Tumour"

# Group all immune cells under the name "Immune"
kerenSCE$cellTypeNew[kerenSCE$cellType %in% biologicalHierarchy$immune] <- "Immune"

# Plot image 6
kerenSCE |>
  colData() |>
  as.data.frame() |>
  filter(imageID == "6") |>
  filter(cellTypeNew %in% c("Immune", "Tumour", "p53_Tumour")) |>
  arrange(cellTypeNew) |>
  ggplot(aes(x = x, y = y, color = cellTypeNew)) +
  geom_point(size = 1) +
  scale_colour_manual(values = c("Immune" = "#505050", "p53_Tumour" = "#64BC46", "Tumour" = "#D6D6D6")) +
  guides(colour = guide_legend(title = "Cell types", override.aes = list(size = 3)))

Kontextual accepts a SingleCellExperiment object, a single image, or list of images from a SingleCellExperiment object, which gets passed into the cells argument. The two cell types which will be evaluated are specified in the to and from arguments. A parent population must also be specified in the parent argument, note the parent cell population must include the to cell type. The argument r will specify the radius which the cell relationship will be evaluated on. Kontextual supports parallel processing, the number of cores can be specified using the cores argument. Kontextual can take a single value or multiple values for each argument and will test all combinations of the arguments specified.

We can calculate these relationships across all images for a single radius (r = 100). Small radii will examine local spatial relationships, whereas larger radii will examine global spatial relationships.

p53_Kontextual <- Kontextual(
  cells = kerenSCE,
  r = 100,
  from = "Immune",
  to = "p53_Tumour",
  parent = c("p53_Tumour", "Tumour"),
  cellType = "cellTypeNew"
)

p53_Kontextual
#>    imageID               test   original  kontextual   r inhomL
#> 1        1 Immune__p53_Tumour -16.212016  -1.6815952 100  FALSE
#> 2       14 Immune__p53_Tumour -14.671281  -4.2879138 100  FALSE
#> 3       18 Immune__p53_Tumour  -1.953366   0.5795853 100  FALSE
#> 4       21 Immune__p53_Tumour -14.300802  -7.1425133 100  FALSE
#> 5       29 Immune__p53_Tumour -20.728463  -7.0172785 100  FALSE
#> 6        3 Immune__p53_Tumour   1.719549  44.5060581 100  FALSE
#> 7       32 Immune__p53_Tumour -18.174569 -10.8972277 100  FALSE
#> 8       35 Immune__p53_Tumour -75.980619 -66.2395276 100  FALSE
#> 9        5 Immune__p53_Tumour         NA          NA 100  FALSE
#> 10       6 Immune__p53_Tumour -24.897348  -1.2724241 100  FALSE

The kontextCurve function plots the L-function value and Kontextual values over a range of radii. If the points lie above the red line (expected pattern) then localisation is indicated for that radius, if the points lie below the red line then dispersion is indicated.

As seen in the following plot the L-function produces negative values over a range of radii, indicating that p53+ tumour cells and immune cells are dispersed from one another. However by taking into account the tumour context, Kontextual shows positive values over some radii, indicating localisation between p53+ tumour cells and immune cells.

curves <- kontextCurve(
  cells = kerenSCE,
  from = "Immune",
  to = "p53_Tumour",
  parent = c("p53_Tumour", "Tumour"),
  rs = seq(50, 510, 50),
  image = "6",
  cellType = "cellTypeNew",
  cores = nCores
)

kontextPlot(curves)

Alternatively all pairwise cell relationships and their corresponding parent in the dataset can be tested. A data frame with all pairwise combinations can be creating using the parentCombinations function. This function takes in a vector of all the cells, as well as the named list of parents and children created earlier in the parentList argument. As shown below the output is a data frame specifying the to, from, and parent arguments for Kontextual.

Note: the output of getPhyloParent may also be using the in the parentList argument, for example if you wanted to use the treekoR defined hierarchy instead.

# Get all relationships between cell types and their parents
parentDf <- parentCombinations(
  all = all,
  parentList = biologicalHierarchy
)

5.3 Calculating all pairwise relationships

Rather than specifying to, from, and parent in Kontextual, the output from parentCombinations can be inputed into Kontextual using the parentDf argument, to examine all pairwise relationships in the dataset. This chunk will take a significant amount of time to run, for demonstration the results have been saved and are loaded in.

# Running Kontextual on all relationships across all images.
kerenKontextual <- Kontextual(
  cells = kerenSCE,
  parentDf = parentDf,
  r = 100,
  cores = nCores
)

For every pairwise relationship (named accordingly: from__to__parent) Kontextual outputs the L-function values (original) and the Kontextual value. The relationships where the L-function and Kontextual disagree (e.g. one metric is positive and the other is negative) represent relationships where adding context information results in different conclusions on the spatial relationship between the two cell types.

data("kerenKontextual")

head(kerenKontextual, 10)
#>    imageID                      test   original  kontextual   r inhomL
#> 1        1 Keratin_Tumour__B__immune -32.547645 -20.8129718 100  FALSE
#> 2       14 Keratin_Tumour__B__immune         NA          NA 100  FALSE
#> 3       18 Keratin_Tumour__B__immune  -2.879684  -0.4266132 100  FALSE
#> 4       21 Keratin_Tumour__B__immune         NA          NA 100  FALSE
#> 5       29 Keratin_Tumour__B__immune         NA          NA 100  FALSE
#> 6        3 Keratin_Tumour__B__immune -36.175444 -15.4940620 100  FALSE
#> 7       32 Keratin_Tumour__B__immune -43.187880 -40.6868426 100  FALSE
#> 8       35 Keratin_Tumour__B__immune -66.782273 -46.2862443 100  FALSE
#> 9        5 Keratin_Tumour__B__immune -68.676955 -46.3064625 100  FALSE
#> 10       6 Keratin_Tumour__B__immune -31.393046  -0.4636465 100  FALSE

5.4 Associating the relationships with survival outcomes.

To examine whether the features obtained from Statial are associated with patient outcomes or groupings, we can use the spicy function from the spicyR package. spicy requires the SingleCellExperiment object being used to contain a column called survival.

# add survival column to kerenSCE
kerenSCE$event = 1 - kerenSCE$Censored
kerenSCE$survival = Surv(kerenSCE$Survival_days_capped, kerenSCE$event)

In addition to this, the Kontextual results must be converted from a data.frame to a wide matrix, this can be done using prepMatrix. Note, to extract the original L-function values, specify column = "original" in prepMatrix.

# Converting Kontextual result into data matrix
kontextMat <- prepMatrix(kerenKontextual)

# Ensuring rownames of kontextMat match up with the image IDs of the SCE object
kontextMat <- kontextMat[kerenSCE$imageID |> unique(), ]

# Replace NAs with 0
kontextMat[is.na(kontextMat)] <- 0

Finally, both the SingleCellExperiment object and the Kontextual matrix are passed into the spicy function, with condition = "survival". The resulting coefficients and p values can be obtained by accessing the survivalResults name.

Note: You can specify additional covariates and include a subject id for mixed effects survival modelling, see \code{ for more information.

# Running survival analysis
survivalResults = spicy(cells = kerenSCE,
                        alternateResult = kontextMat,
                        condition = "survival",
                        weights = TRUE)

head(survivalResults$survivalResults, 10)
#> # A tibble: 10 × 4
#>    test                           coef se.coef   p.value
#>    <chr>                         <dbl>   <dbl>     <dbl>
#>  1 DC__NK__immune               -0.209    283. 2.95e-178
#>  2 NK__DC__immune               -0.209    285. 3.81e-176
#>  3 NK__DC__myeloid              -0.209    285. 3.81e-176
#>  4 NK__Tumour__tumour           -0.204    233. 1.20e-151
#>  5 Tumour__NK__immune           -0.232    303. 1.68e-127
#>  6 CD4_Cell__Tumour__tumour     -0.189    188. 8.17e-125
#>  7 other immune__Tumour__tumour -1.84     387. 4.79e-100
#>  8 Tumour__CD4_Cell__immune     -0.271    306. 9.86e- 87
#>  9 Tumour__CD3_Cell__immune     -0.709    264. 2.35e- 73
#> 10 other immune__DC__immune     -1.39     396. 1.77e- 67

The survival results can also be visualised using the signifPlot function.

signifPlot(survivalResults)

As we can see from the results DC__NK__immune is the one of the most significant pairwise relationship which contributes to patient survival. That is the relationship between dendritic cells and natural killer cells, relative to the parent population of immune cells. We can see that there is a negative coefficient associated with this relationship, which tells us an increase in localisation of these cell types relative to immune cells leads to better survival outcomes for patients.

The association between DC__NK__immune and survival can also be visualised on a Kaplan-Meier curve. First, we extract survival data from the SingleCellExperiment object and create a survival vector.

# Extracting survival data
survData <- kerenSCE |>
  colData() |>
  data.frame() |>
  select(imageID, survival) |>
  unique()

# Creating survival vector
kerenSurv <- survData$survival
names(kerenSurv) <- survData$imageID

kerenSurv
#>     1     3     5     6    14    18    21    29    32    35 
#>  2612 3130+ 1683+ 2275+ 4145+ 5063+   635  1319 1568+ 2759+

Next, we extract the Kontextual values of this relationship across all images. We then determine if dendritic and natural killer cells are relatively attracted or avoiding in each image by comparing the Kontextual value in each image to the median Kontextual value.

Finally, we plot a Kaplan-Meier curve using the ggsurvfit package. As shown below, when dendritic and natural killer cells are more localised to one another relative to the immune cell population, patients tend to have better survival outcomes.

# Selecting most significant relationship
survRelationship <- kontextMat[["DC__NK__immune"]]
survRelationship <- ifelse(survRelationship > median(survRelationship), "Localised", "Dispersed")

# Plotting Kaplan-Meier curve
survfit2(kerenSurv ~ survRelationship) |>
  ggsurvfit() +
  ggtitle("DC__NK__immune")

6 SpatioMark: Identifying continuous changes in cell state

Changes in cell states can be analytically framed as the change in abundance of a gene or protein within a particular cell type. We can use marker expression to identify and quantify evidence of cell interactions that catalyse cell state changes. This approach measures how protein markers in a cell change with spatial proximity and abundance to other cell types. The methods utilised here will thereby provide a framework to explore how the dynamic behaviour of cells are altered by the agents they are surrounded by.

6.1 Continuous cell state changes within a single image

The first step in analysing these changes is to calculate the spatial proximity (getDistances) and abundance (getAbundances) of each cell to every cell type. These values will then be stored in the reducedDims slot of the SingleCellExperiment object under the names distances and abundances respectively.

kerenSCE <- getDistances(kerenSCE,
  maxDist = 200,
  nCores = 1
)

kerenSCE <- getAbundances(kerenSCE,
  r = 200,
  nCores = 1
)

First, let’s examine the same effect observed earlier with Kontextual - the localisation between p53-positive keratin/tumour cells and macrophages in the context of total keratin/tumour cells for image 6 of the Keren et al. dataset.

Statial provides two main functions to assess this relationship - calcStateChanges and plotStateChanges. We can use calcStateChanges to examine the relationship between 2 cell types for 1 marker in a specific image. In this case, we’re examining the relationship between keratin/tumour cells (from = Keratin_Tumour) and macrophages (to = "Macrophages") for the marker p53 (marker = "p53") in image = "6". We can appreciate that the fdr statistic for this relationship is significant, with a negative tvalue, indicating that the expression of p53 in keratin/tumour cells decreases as distance from macrophages increases.

stateChanges <- calcStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "p53",
  nCores = 1
)

stateChanges
#>   imageID primaryCellType otherCellType marker         coef      tval
#> 1       6  Keratin_Tumour   Macrophages    p53 -0.001402178 -7.010113
#>           pval          fdr
#> 1 2.868257e-12 2.868257e-12

Statial also provides a convenient function for visualising this interaction - plotStateChanges. Here, again we can specify image = 6 and our main cell types of interest, keratin/tumour cells and macrophages, and our marker p53, in the same format as calcStateChanges.

Through this analysis, we can observe that keratin/tumour cells closer to a group of macrophages tend to have higher expression of p53, as observed in the first graph. This relationship is quantified with the second graph, showing an overall decrease of p53 expression in keratin/tumour cells as distance to macrophages increase.

These results allow us to essentially arrive at the same result as Kontextual, which calculated a localisation between p53+ keratin/tumour cells and macrophages in the wider context of keratin/tumour cells.

p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "p53",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm"
)

p
#> $image

#> 
#> $scatter

6.2 Continuous cell state changes across all images

Beyond looking at single cell-to-cell interactions for a single image, we can also look at all interactions across all images. The calcStateChanges function provided by Statial can be expanded for this exact purpose - by not specifying cell types, a marker, or an image, calcStateChanges will examine the most significant correlations between distance and marker expression across the entire dataset. Here, we’ve filtered out the most significant interactions to only include those found within image 6 of the Keren et al. dataset.

stateChanges <- calcStateChanges(
  cells = kerenSCE,
  type = "distances",
  nCores = 1,
  minCells = 100
)

stateChanges |>
  filter(imageID == 6) |>
  head(n = 10)
#>    imageID primaryCellType otherCellType       marker         coef      tval
#> 1        6  Keratin_Tumour  Unidentified           Na  0.004218419  25.03039
#> 2        6  Keratin_Tumour   Macrophages  HLA_Class_1 -0.003823497 -24.69629
#> 3        6  Keratin_Tumour      CD4_Cell  HLA_Class_1 -0.003582774 -23.87797
#> 4        6  Keratin_Tumour  Unidentified Beta.catenin  0.005893120  23.41953
#> 5        6  Keratin_Tumour      CD8_Cell  HLA_Class_1 -0.003154544 -23.13804
#> 6        6  Keratin_Tumour       Dc/Mono  HLA_Class_1 -0.003353834 -22.98944
#> 7        6  Keratin_Tumour      CD3_Cell  HLA_Class_1 -0.003123446 -22.63197
#> 8        6  Keratin_Tumour        Tumour  HLA_Class_1  0.003684079  21.94265
#> 9        6  Keratin_Tumour      CD4_Cell           Fe -0.003457338 -21.43550
#> 10       6  Keratin_Tumour      CD4_Cell   phospho.S6 -0.002892457 -20.50767
#>             pval           fdr
#> 1  6.971648e-127 1.176442e-123
#> 2  7.814253e-124 1.236215e-120
#> 3  1.745242e-116 2.208779e-113
#> 4  1.917245e-112 2.257178e-109
#> 5  5.444541e-110 5.991836e-107
#> 6  1.053130e-108 1.110701e-105
#> 7  1.237988e-105 1.205229e-102
#> 8  8.188258e-100  7.025803e-97
#> 9   1.287478e-95  9.727951e-93
#> 10  3.928912e-88  2.583081e-85

In image 6, the majority of the top 10 most significant interactions occur between keratin/tumour cells and an immune population, and many of these interactions appear to involve the HLA class I ligand.

We can examine some of these interactions further with the plotStateChanges function. Taking a closer examination of the relationship between macrophages and keratin/tumour HLA class I expression, the plot below shows us a clear visual correlation - as macrophage density increases, keratin/tumour cells increase their expression HLA class I.

Biologically, HLA Class I is a ligand which exists on all nucleated cells, tasked with presenting internal cell antigens for recognition by the immune system, marking aberrant cells for destruction by either CD8+ T cells or NK cells.

p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "HLA_Class_1",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm"
)

p
#> $image

#> 
#> $scatter

Next, let’s take a look at the top 10 most significant results across all images.

stateChanges |> head(n = 10)
#>      imageID primaryCellType otherCellType     marker         coef      tval
#> 8674      35        CD4_Cell             B       CD20 -0.029185750 -40.57355
#> 8770      35        CD4_Cell       Dc/Mono       CD20  0.019125946  40.53436
#> 1819      35               B       Dc/Mono phospho.S6  0.005282065  40.41385
#> 8779      35        CD4_Cell       Dc/Mono phospho.S6  0.004033218  34.72882
#> 1813      35               B       Dc/Mono     HLA.DR  0.011120703  34.15344
#> 1971      35               B  other immune          P  0.011182182  34.14375
#> 8626      35        CD4_Cell      CD3_Cell       CD20  0.016349492  33.91901
#> 1816      35               B       Dc/Mono     H3K9ac  0.005096632  33.99856
#> 2011      35               B  other immune phospho.S6  0.005986586  33.66466
#> 1818      35               B       Dc/Mono   H3K27me3  0.006980810  33.22740
#>               pval           fdr
#> 8674 7.019343e-282 3.553472e-277
#> 8770 1.891267e-281 4.787176e-277
#> 1819 5.306590e-278 8.954694e-274
#> 8779 4.519947e-219 5.720445e-215
#> 1813 8.401034e-212 8.505879e-208
#> 1971 1.056403e-211 8.913225e-208
#> 8626 1.219488e-210 8.819335e-207
#> 1816 3.266533e-210 2.067062e-206
#> 2011 8.545691e-207 4.806856e-203
#> 1818 2.438769e-202 1.234603e-198

Immediately, we can appreciate that a couple of these interactions are not biologically plausible. One of the most significant interactions occurs between B cells and CD4 T cells in image 35, where CD4 T cells are found to increase in CD20 expression when in close proximity to B cells. Biologically, CD20 is a highly specific ligand for B cells, and under healthy circumstances are usually not expressed in T cells.

Could this potentially be an artefact of calcStateChanges? We can examine the image through the plotStateChanges function, where we indeed observe a strong increase in CD20 expression in T cells nearby B cell populations.

p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "35",
  from = "CD4_Cell",
  to = "B",
  marker = "CD20",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm"
)

p
#> $image