1 Introduction

This vignette introduces the SpaceTrooper package for spatial data analysis, using data from the CosMx platform with the Protein assay as an example.

2 Installation

To install SpaceTrooper, use the following commands:

# Install BiocManager if not already installed, then install SpaceTrooper
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drighelli/SpaceTrooper")

3 Data Loading

In this section, we load the data using the package’s reader functions. These functions aim to provide a uniform SpatialExperiment object across all technologies, allowing for consistent QC analysis.

The functions in SpaceTrooper compute missing metrics as needed and allow for the inclusion of polygons with the keep_polygons argument. This stores polygons in the colData of the SpatialExperiment.

# Load the SpaceTrooper library
library(SpaceTrooper)

# Load CosMx protein data into a SpatialExperiment object (SPE)
protfolder <- system.file("extdata", "S0_prot", package="SpaceTrooper")
(spe <- readCosmxProteinSPE(protfolder))
## class: SpatialExperiment 
## dim: 69 2298 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(2298): f60_c1 f60_c10 ... f60_c998 f60_c999
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
colData(spe)
## DataFrame with 2298 rows and 58 columns
##                 fov    cellID      Area AspectRatio     Width    Height
##           <integer> <integer> <integer>   <numeric> <integer> <integer>
## f60_c1           60         1      2086        0.77        57        44
## f60_c10          60        10      2186        0.97        59        57
## f60_c100         60       100      3550        0.89        68        76
## f60_c1000        60      1000     11235        0.93       129       139
## f60_c1001        60      1001     11801        0.52       205       107
## ...             ...       ...       ...         ...       ...       ...
## f60_c995         60       995      4579        0.87        78        90
## f60_c996         60       996      5301        0.75        79       105
## f60_c997         60       997      2236        0.49        79        39
## f60_c998         60       998      4529        0.65        65       100
## f60_c999         60       999     10317        0.73       112       153
##           Mean.PanCK Max.PanCK Mean.CD68  Max.CD68 Mean.Membrane Max.Membrane
##            <integer> <integer> <integer> <integer>     <integer>    <integer>
## f60_c1           551      1420       555      3816          2896         4456
## f60_c10          536       740       303       628          2947         5336
## f60_c100         433       688       289       552          1749         4524
## f60_c1000        767      1788       524      3796          2173         6788
## f60_c1001        402      1448       361      4364          1860         5472
## ...              ...       ...       ...       ...           ...          ...
## f60_c995         575      1004       276       448          1941         5164
## f60_c996         607       960       331       768          3240         9912
## f60_c997         629      2088       308       580          2837         5168
## f60_c998         485       724       256       664          2088         4060
## f60_c999         549      1592       520      3788          2038         6840
##           Mean.CD45  Max.CD45 Mean.DAPI  Max.DAPI SplitRatioToLocal   NucArea
##           <integer> <integer> <integer> <integer>         <numeric> <integer>
## f60_c1         8585     16216      1688      6772              0.41       556
## f60_c10        8061     18856      3017      5512              0.43      1519
## f60_c100       2914     11692      1632      4380              0.00      1444
## f60_c1000      1877      8548      3767      6844              0.00      8068
## f60_c1001      2804      9556      1534      5540              0.00      4308
## ...             ...       ...       ...       ...               ...       ...
## f60_c995       4332     14240      2695      5356                 0      3760
## f60_c996       5167     12164      3396      5744                 0      3724
## f60_c997       7152     15356      1907      4056                 0       816
## f60_c998       5270     15824      2595      5196                 0      2860
## f60_c999       3414     18572      2814      6268                 0      6140
##           NucAspectRatio Circularity Eccentricity Perimeter  Solidity
##                <numeric>   <numeric>    <numeric> <integer> <numeric>
## f60_c1              0.69     2912.61         0.00         3    695.33
## f60_c10             0.90        1.49         0.09       136     16.07
## f60_c100            0.96        0.91         0.77       221     16.06
## f60_c1000           0.85        0.86         0.90       405     27.74
## f60_c1001           0.38        0.62         0.55       488     24.18
## ...                  ...         ...          ...       ...       ...
## f60_c995            0.93        0.86         0.82       258     17.75
## f60_c996            0.85        0.83         0.74       283     18.73
## f60_c997            0.89        0.77         0.52       191     11.71
## f60_c998            0.70        0.89         0.69       253     17.90
## f60_c999            0.94        0.79         0.74       406     25.41
##               cell_id         X     version   dualfiles    Run_name
##           <character> <integer> <character> <character> <character>
## f60_c1         f60_c1         1          v6           ?        Run0
## f60_c10       f60_c10         1          v6           ?        Run0
## f60_c100     f60_c100         1          v6           ?        Run0
## f60_c1000   f60_c1000         1          v6           ?        Run0
## f60_c1001   f60_c1001         1          v6           ?        Run0
## ...               ...       ...         ...         ...         ...
## f60_c995     f60_c995         1          v6           ?        Run0
## f60_c996     f60_c996         1          v6           ?        Run0
## f60_c997     f60_c997         1          v6           ?        Run0
## f60_c998     f60_c998         1          v6           ?        Run0
## f60_c999     f60_c999         1          v6           ?        Run0
##           Run_Tissue_name ISH.concentration        Dash      tissue       Panel
##               <character>       <character> <character> <character> <character>
## f60_c1                 S0               1nM       PILOT      tissue         WTx
## f60_c10                S0               1nM       PILOT      tissue         WTx
## f60_c100               S0               1nM       PILOT      tissue         WTx
## f60_c1000              S0               1nM       PILOT      tissue         WTx
## f60_c1001              S0               1nM       PILOT      tissue         WTx
## ...                   ...               ...         ...         ...         ...
## f60_c995               S0               1nM       PILOT      tissue         WTx
## f60_c996               S0               1nM       PILOT      tissue         WTx
## f60_c997               S0               1nM       PILOT      tissue         WTx
## f60_c998               S0               1nM       PILOT      tissue         WTx
## f60_c999               S0               1nM       PILOT      tissue         WTx
##            assay_type  slide_ID median_RNA RNA_quantile_0.75 RNA_quantile_0.8
##           <character> <integer>  <numeric>         <numeric>        <numeric>
## f60_c1        protein         1    93752.1            421872           536217
## f60_c10       protein         1    93752.1            421872           536217
## f60_c100      protein         1    93752.1            421872           536217
## f60_c1000     protein         1    93752.1            421872           536217
## f60_c1001     protein         1    93752.1            421872           536217
## ...               ...       ...        ...               ...              ...
## f60_c995      protein         1    93752.1            421872           536217
## f60_c996      protein         1    93752.1            421872           536217
## f60_c997      protein         1    93752.1            421872           536217
## f60_c998      protein         1    93752.1            421872           536217
## f60_c999      protein         1    93752.1            421872           536217
##           RNA_quantile_0.85 RNA_quantile_0.9 RNA_quantile_0.95
##                   <numeric>        <numeric>         <numeric>
## f60_c1               816363          1623303           4461905
## f60_c10              816363          1623303           4461905
## f60_c100             816363          1623303           4461905
## f60_c1000            816363          1623303           4461905
## f60_c1001            816363          1623303           4461905
## ...                     ...              ...               ...
## f60_c995             816363          1623303           4461905
## f60_c996             816363          1623303           4461905
## f60_c997             816363          1623303           4461905
## f60_c998             816363          1623303           4461905
## f60_c999             816363          1623303           4461905
##           RNA_quantile_0.99 nCount_RNA nFeature_RNA median_negprobes
##                   <numeric>  <numeric>    <integer>        <numeric>
## f60_c1              7300861    39507.4           67          24469.2
## f60_c10             7300861    37093.6           67          24469.2
## f60_c100            7300861    24483.5           67          24469.2
## f60_c1000           7300861    15666.4           67          24469.2
## f60_c1001           7300861    20078.0           67          24469.2
## ...                     ...        ...          ...              ...
## f60_c995            7300861    19585.0           67          24469.2
## f60_c996            7300861    24066.8           67          24469.2
## f60_c997            7300861    32250.8           67          24469.2
## f60_c998            7300861    23750.0           67          24469.2
## f60_c999            7300861    16817.9           67          24469.2
##           negprobes_quantile_0.75 negprobes_quantile_0.8
##                         <numeric>              <numeric>
## f60_c1                    30785.4                32048.6
## f60_c10                   30785.4                32048.6
## f60_c100                  30785.4                32048.6
## f60_c1000                 30785.4                32048.6
## f60_c1001                 30785.4                32048.6
## ...                           ...                    ...
## f60_c995                  30785.4                32048.6
## f60_c996                  30785.4                32048.6
## f60_c997                  30785.4                32048.6
## f60_c998                  30785.4                32048.6
## f60_c999                  30785.4                32048.6
##           negprobes_quantile_0.85 negprobes_quantile_0.9
##                         <numeric>              <numeric>
## f60_c1                    33311.9                34575.1
## f60_c10                   33311.9                34575.1
## f60_c100                  33311.9                34575.1
## f60_c1000                 33311.9                34575.1
## f60_c1001                 33311.9                34575.1
## ...                           ...                    ...
## f60_c995                  33311.9                34575.1
## f60_c996                  33311.9                34575.1
## f60_c997                  33311.9                34575.1
## f60_c998                  33311.9                34575.1
## f60_c999                  33311.9                34575.1
##           negprobes_quantile_0.95 negprobes_quantile_0.99 nCount_negprobes
##                         <numeric>               <numeric>        <numeric>
## f60_c1                    35838.4                   36849            16.26
## f60_c10                   35838.4                   36849            17.41
## f60_c100                  35838.4                   36849            15.31
## f60_c1000                 35838.4                   36849            18.14
## f60_c1001                 35838.4                   36849            18.18
## ...                           ...                     ...              ...
## f60_c995                  35838.4                   36849            39.87
## f60_c996                  35838.4                   36849            21.70
## f60_c997                  35838.4                   36849            16.14
## f60_c998                  35838.4                   36849            21.24
## f60_c999                  35838.4                   36849            17.12
##           nFeature_negprobes  Area.um2 CenterX_local_px CenterY_local_px
##                    <integer> <numeric>        <integer>        <integer>
## f60_c1                     2   30.0384               28               22
## f60_c10                    2   31.4784              781               28
## f60_c100                   2   51.1200             3965               86
## f60_c1000                  2  161.7840             3482             1303
## f60_c1001                  2  169.9344             4120             1271
## ...                      ...       ...              ...              ...
## f60_c995                   2   65.9376             2207             1271
## f60_c996                   2   76.3344             3585             1278
## f60_c997                   2   32.1984              895             1249
## f60_c998                   2   65.2176             1858             1281
## f60_c999                   2  148.5648             2289             1308
##                  cell   sample_id
##           <character> <character>
## f60_c1       c_1_60_1    sample01
## f60_c10     c_1_60_10    sample01
## f60_c100   c_1_60_100    sample01
## f60_c1000 c_1_60_1000    sample01
## f60_c1001 c_1_60_1001    sample01
## ...               ...         ...
## f60_c995   c_1_60_995    sample01
## f60_c996   c_1_60_996    sample01
## f60_c997   c_1_60_997    sample01
## f60_c998   c_1_60_998    sample01
## f60_c999   c_1_60_999    sample01

4 Data Analysis for CosMx protein

The package offers several functions for spatial data analysis, including quality control and visualization.

This tutorial focuses on CosMx protein data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx.

Although it has not been tested, the same approach can be extended to Akoya CODEX data, as long as a SpatialExperiment object is created. Polygons can be loaded later if needed.

5 Fields of View (FoVs) Visualization

The plotCellsFovs function shows a map of the FoVs within an experiment. This plot is specific to CosMx data and uses cell centroids.

Please keep in mind that, in this specific experiment, the fov_positions and the cell centroid positions were not aligned. An alignment approach can be found at the end of the scripts/datacreation.R file.

# Plot the cells within their respective Field of Views (FOVs)
plotCellsFovs(spe)

Because the dataset is a subset of just one Field of View of the original experiment, we are able to see the identifier of the FoV in black and the centroids of the cells in purple.

When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their identifiers.

6 Quality control

The spatialPerCellQC function, inspired by scater::addPerCellQC, computes additional metrics for each cell in the SpatialExperiment. It also allows for the detection of negative control probes, which is crucial for QC.

By default, it removes cells with zero counts; this behaviour can be controlled with the rmZeros argument.

Here, for transparency, we specify the negProbList for CosMx protein assays, although the function already includes a set of the most commonly used negative probes for multiple technologies. Note that, although the same approach can be applied to CODEX data, no list of negative probes is provided for this technology, so the user needs to specify them.

# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG"))
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"
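The negative-probe columns listed above (control_sum, target_sum, ctrl_total_ratio, log2Ctrl_total_ratio) can be illustrated with a small base-R sketch. This is not the package's implementation: the toy count matrix is made up, and the exact formulas (in particular how the log2 ratio handles zero control counts) are assumptions.

```r
# Toy count matrix: 4 features (rows) x 3 cells (columns); values are made up.
counts <- matrix(
  c(120, 300,  5,  3,   # cell1
     80, 150, 10, 10,   # cell2
    200,  90,  0,  1),  # cell3
  nrow = 4,
  dimnames = list(c("CD45", "PanCK", "Ms IgG1", "Rb IgG"),
                  c("cell1", "cell2", "cell3"))
)

# Negative control probes, as passed to negProbList in the chunk above.
neg_probes <- c("Ms IgG1", "Rb IgG")

total       <- colSums(counts)
control_sum <- colSums(counts[neg_probes, , drop = FALSE])

qc <- data.frame(
  total            = total,
  control_sum      = control_sum,
  target_sum       = total - control_sum,
  ctrl_total_ratio = control_sum / total
)

# ASSUMPTION: a plain log2 of the ratio; the package may add an offset
# to avoid -Inf when a cell has no control counts.
qc$log2Ctrl_total_ratio <- log2(qc$ctrl_total_ratio)

print(qc)
```

A cell whose counts come largely from negative probes gets a high ctrl_total_ratio, which is why this metric is useful as a QC filter.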

7 Metrics Histograms

You can investigate individual metrics by viewing their histograms. To highlight outliers, use the useFences argument to display the fences computed by computeSpatialOutlier (see the next section).

# Plot a histogram of counts (sum)
plotMetricHist(spe, metric="sum")

# Plot a histogram of cell areas (Area_um)
plotMetricHist(spe, metric="Area_um")

# Plot a histogram of the counts relative to the cell area in microns
plotMetricHist(spe, metric="log2CountArea")

# Plot a histogram of the ratio of negative probe counts to the total
# counts in each cell
plotMetricHist(spe, metric="log2Ctrl_total_ratio")

These plots show, respectively, the distributions of the total counts (sum), the cell area in microns (Area_um), the counts relative to the area of each cell (log2CountArea), and the ratio of negative probe counts to the total counts of each cell (log2Ctrl_total_ratio).
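As a rough illustration of how the area-based metrics relate to each other, the sketch below recomputes them for the first five cells of the colData table shown earlier. The pixel size of 0.12 µm is inferred from that table (2086 px × 0.12² = 30.0384 µm²); using the nCount_RNA values as totals and defining CountArea as counts divided by area are assumptions, not the package's documented formulas.

```r
# Pixel size in microns, inferred from the Area and Area.um2 columns above.
px_to_um <- 0.12

cells <- data.frame(
  cell = c("f60_c1", "f60_c10", "f60_c100", "f60_c1000", "f60_c1001"),
  Area = c(2086, 2186, 3550, 11235, 11801),              # area in pixels
  sum  = c(39507.4, 37093.6, 24483.5, 15666.4, 20078.0)  # illustrative totals
)

# Area in square microns.
cells$Area_um <- cells$Area * px_to_um^2

# ASSUMPTION: counts per unit area and its log2 transform.
cells$CountArea     <- cells$sum / cells$Area_um
cells$log2CountArea <- log2(cells$CountArea)

print(cells)
```

Working on the log2 scale compresses the long right tail of the count-per-area distribution, which makes the histogram easier to read.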

8 Spatial Outlier Detection

Spatial outlier detection is another critical step in QC. While the quality score addresses some metrics, other outlier detection methods may be needed.

The computeSpatialOutlier function computes outlier fences for a specified metric (computeBy argument). With the medcouple method, the fences are adjusted by the medcouple statistic, a robust measure of skewness, so this method is intended for asymmetric (skewed) distributions; the function emits a warning when this requirement is not satisfied. For approximately symmetric distributions, it can use scuttle::isOutlier instead. The method argument supports mc, scuttle, or both.

This outlier detection approach can be used to decide whether, and which, cells should be discarded based on a single metric.

# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both")

# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both")
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"
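To give an intuition for medcouple-based fences, here is a minimal base-R sketch of the adjusted boxplot rule of Hubert and Vandervieren. It is not the code used by computeSpatialOutlier: the function names are hypothetical, the medcouple is computed with a naive quadratic algorithm that assumes no observation equals the median, and the area values are made up.

```r
# Naive O(n^2) medcouple. ASSUMPTION: no observation equals the median
# (ties with the median need a special kernel that is omitted here).
naive_medcouple <- function(x) {
  m <- median(x)
  lower <- x[x < m]
  upper <- x[x > m]
  stopifnot(length(lower) + length(upper) == length(x))
  h <- outer(upper, lower,
             function(xj, xi) ((xj - m) - (m - xi)) / (xj - xi))
  median(h)
}

# Skewness-adjusted fences: the whisker on the side of the long tail is
# stretched, the other one is shortened.
adjusted_fences <- function(x, coef = 1.5) {
  mc  <- naive_medcouple(x)
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  if (mc >= 0) {
    c(lower = q[1] - coef * exp(-4 * mc) * iqr,
      upper = q[2] + coef * exp(3 * mc) * iqr)
  } else {
    c(lower = q[1] - coef * exp(-3 * mc) * iqr,
      upper = q[2] + coef * exp(4 * mc) * iqr)
  }
}

# Made-up, right-skewed cell areas (even n, so no value equals the median).
area <- c(30, 31, 33, 35, 38, 42, 51, 66, 76, 162)

mc_value   <- naive_medcouple(area)
fences     <- adjusted_fences(area)
is_outlier <- area < fences[["lower"]] | area > fences[["upper"]]

print(mc_value)
print(fences)
```

For a perfectly symmetric sample the medcouple is 0 and the fences reduce to the usual Tukey fences; for production use, an optimized implementation such as robustbase::mc is preferable to this quadratic version.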

If we computed outliers with the computeSpatialOutlier function, we can also visualize which fences have been used to create the filter on the cells.

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc")

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc")

We visualize the fences computed with the medcouple and scuttle outlier detection approaches to directly compare them and the number of outliers each method detects.

If desired, these fences can already be used to remove the detected outliers.
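The filtering step can be sketched in base R with a median absolute deviation (MAD) rule in the spirit of scuttle::isOutlier, which by default flags values more than 3 MADs away from the median. The helper name and the toy data frame are hypothetical, and the sketch works on a plain data.frame rather than on the SPE object.

```r
# Flag values further than `nmads` MADs from the median (two-sided).
# mad() already applies the 1.4826 consistency constant by default.
mad_outlier <- function(x, nmads = 3) {
  med <- median(x)
  dev <- mad(x)
  x < med - nmads * dev | x > med + nmads * dev
}

# Made-up cell areas with one extreme value.
cells <- data.frame(
  cell    = paste0("c", 1:10),
  Area_um = c(30, 31, 33, 35, 38, 42, 51, 66, 76, 400)
)

cells$Area_um_outlier <- mad_outlier(cells$Area_um)

# Keep only the cells that are not flagged.
kept <- cells[!cells$Area_um_outlier, ]

print(table(cells$Area_um_outlier))
print(nrow(kept))
```

With a SpatialExperiment, the analogous step would be a column subset such as spe[, keep], where keep is a logical vector derived from the outlier column; whether that column can be used directly as a logical index is an assumption to verify.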

9 The Quality Score

Next, we use computeQScore to calculate a quality score based on the previously computed metrics. The quality score combines the counts relative to the cell area, the aspect ratio of each cell, and its distance from the FoV border (the latter is used only for CosMx data).

See the Details section of help(computeQScore) for more information.
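One ingredient of the score, the distance of a cell from the FoV border, is easy to picture with a small sketch. The local centroid coordinates below are taken from the colData table shown earlier; the FoV size of 4256 × 4256 pixels and the way the distances are combined are assumptions made for illustration, not values read from this dataset or from the package code.

```r
# ASSUMPTION: FoV dimensions in pixels (the real values are stored in the
# fov_dim entry of the SPE metadata).
fov_dim <- c(x = 4256, y = 4256)

cells <- data.frame(
  cell             = c("f60_c1", "f60_c10", "f60_c100", "f60_c1000"),
  CenterX_local_px = c(28, 781, 3965, 3482),
  CenterY_local_px = c(22, 28, 86, 1303)
)

# Distance to the nearest vertical and horizontal FoV edge.
cells$dist_border_x <- pmin(cells$CenterX_local_px,
                            fov_dim[["x"]] - cells$CenterX_local_px)
cells$dist_border_y <- pmin(cells$CenterY_local_px,
                            fov_dim[["y"]] - cells$CenterY_local_px)

# Distance to the nearest edge overall.
cells$dist_border <- pmin(cells$dist_border_x, cells$dist_border_y)

print(cells)
```

Cells with a small dist_border are more likely to be cut by the FoV boundary, which is why this distance is informative for CosMx data.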

# Calculate quality scores for each cell
spe <- computeQScore(spe)
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"     "quality_score"           
## [89] "training_status"

Logical filters can then be computed using computeQScoreFlags, which requires thresholds for various metrics. Currently, the function considers:

  • Quality Score (qsThreshold): Cells with scores below this threshold (default 0.5) are flagged for exclusion. When the use_qs_quantiles argument is set to TRUE, this value is instead interpreted as the quantile to use for filtering.

  • Quality Score Quantiles (use_qs_quantiles): Option to filter based on quantiles (default FALSE).

# Compute flags to identify cells for filtering
spe <- computeQScoreFlags(spe, qsThreshold=0.5)
names(colData(spe))
##  [1] "fov"                      "cellID"                  
##  [3] "Area"                     "AspectRatio"             
##  [5] "Width"                    "Height"                  
##  [7] "Mean.PanCK"               "Max.PanCK"               
##  [9] "Mean.CD68"                "Max.CD68"                
## [11] "Mean.Membrane"            "Max.Membrane"            
## [13] "Mean.CD45"                "Max.CD45"                
## [15] "Mean.DAPI"                "Max.DAPI"                
## [17] "SplitRatioToLocal"        "NucArea"                 
## [19] "NucAspectRatio"           "Circularity"             
## [21] "Eccentricity"             "Perimeter"               
## [23] "Solidity"                 "cell_id"                 
## [25] "X"                        "version"                 
## [27] "dualfiles"                "Run_name"                
## [29] "Run_Tissue_name"          "ISH.concentration"       
## [31] "Dash"                     "tissue"                  
## [33] "Panel"                    "assay_type"              
## [35] "slide_ID"                 "median_RNA"              
## [37] "RNA_quantile_0.75"        "RNA_quantile_0.8"        
## [39] "RNA_quantile_0.85"        "RNA_quantile_0.9"        
## [41] "RNA_quantile_0.95"        "RNA_quantile_0.99"       
## [43] "nCount_RNA"               "nFeature_RNA"            
## [45] "median_negprobes"         "negprobes_quantile_0.75" 
## [47] "negprobes_quantile_0.8"   "negprobes_quantile_0.85" 
## [49] "negprobes_quantile_0.9"   "negprobes_quantile_0.95" 
## [51] "negprobes_quantile_0.99"  "nCount_negprobes"        
## [53] "nFeature_negprobes"       "Area_um"                 
## [55] "CenterX_local_px"         "CenterY_local_px"        
## [57] "cell"                     "sample_id"               
## [59] "sum"                      "detected"                
## [61] "subsets_Ms IgG1_sum"      "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent"  "subsets_Rb IgG_sum"      
## [65] "subsets_Rb IgG_detected"  "subsets_Rb IgG_percent"  
## [67] "total"                    "control_sum"             
## [69] "control_detected"         "target_sum"              
## [71] "target_detected"          "CenterX_global_px"       
## [73] "CenterY_global_px"        "ctrl_total_ratio"        
## [75] "log2Ctrl_total_ratio"     "CenterX_global_um"       
## [77] "CenterY_global_um"        "dist_border_x"           
## [79] "dist_border_y"            "dist_border"             
## [81] "log2AspectRatio"          "CountArea"               
## [83] "log2CountArea"            "Area_um_outlier_mc"      
## [85] "Area_um_outlier_sc"       "Mean.DAPI_outlier_mc"    
## [87] "Mean.DAPI_outlier_sc"     "quality_score"           
## [89] "training_status"          "low_qscore"
table(spe$low_qscore)
## 
## FALSE  TRUE 
##  2062   236

As the table shows, 236 cells are flagged for removal.
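The two thresholding modes described above can be sketched as follows. The helper flag_low_quality and its arguments are hypothetical stand-ins, not the signature of computeQScoreFlags, and the scores are made up.

```r
# Flag cells whose score is below a cutoff. The cutoff is either the
# threshold itself or, in quantile mode, the quantile of the scores at
# probability `threshold`.
flag_low_quality <- function(quality_score, threshold = 0.5,
                             use_quantile = FALSE) {
  cutoff <- if (use_quantile) {
    quantile(quality_score, probs = threshold, names = FALSE)
  } else {
    threshold
  }
  quality_score < cutoff
}

# Made-up quality scores for ten cells.
qs <- c(0.05, 0.20, 0.45, 0.55, 0.70, 0.85, 0.90, 0.95, 0.97, 0.99)

fixed_flags    <- flag_low_quality(qs)                   # score < 0.5
quantile_flags <- flag_low_quality(qs, threshold = 0.1,  # lowest 10%
                                   use_quantile = TRUE)

print(table(fixed_flags))
print(table(quantile_flags))
```

A fixed threshold makes flags comparable across samples, whereas a quantile removes a roughly constant fraction of cells regardless of the overall quality of the sample.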

10 Additional metrics to filter out cells

For other metrics, such as the total counts and the negative probe ratio, the computeThresholdFlags function considers:

  • Total Counts (totalThreshold): Minimum count threshold (default 0).

  • Negative Probe Ratio (ctrlTotRatioThreshold): Threshold on the ratio of negative probe counts to total counts (default 0.1).

spe <- computeThresholdFlags(spe, totalThreshold=0, 
                                ctrlTotRatioThreshold=0.1)
table(spe$threshold_flags)
## 
## FALSE 
##  2298

In this example, no cells are flagged for removal.

11 Adding Polygon and Visualization

To better understand the quality score values, we start by loading the polygons, which give a better overview of the cells’ characteristics.

We can load and add polygons to the SPE object using the following functions. Each technology has its own readPolygons function to standardize the loaded sf object and handle different file types.

# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)

# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)

Once the polygons are stored in an sf object within colData, they can be visualized using functions based on the ggplot2 library.
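To give a feel for the sf and ggplot2 combination mentioned here, the following sketch builds two toy polygons and plots them filled by area. It is independent of SpaceTrooper and does not show how plotPolygons is implemented; the square helper, the cell names and the coordinates are all hypothetical.

```r
library(sf)
library(ggplot2)

# Hypothetical helper building a square polygon; coordinates are invented
square <- function(x0, y0, side) {
    # Closed ring: the first and last vertex coincide
    coords <- rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                    c(x0, y0 + side), c(x0, y0))
    st_polygon(list(coords))
}

# sf object with one row per toy cell
pols <- st_sf(
    cell = c("c1", "c2"),
    geometry = st_sfc(square(0, 0, 1), square(2, 0, 2))
)

# Polygon area computed from the geometry
pols$area <- as.numeric(st_area(pols))

# Draw the polygons, filled by area
p <- ggplot(pols) +
    geom_sf(aes(fill = area)) +
    theme_minimal()
```

Printing p draws the two squares; any numeric or logical column of the sf object could be mapped to fill in the same way.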

# Plot the polygons of the selected cells
plotPolygons(spe, bgColor="white")

The cells are shown on a white background for better visualization.

# Plot polygons colored by log2 aspect ratio
plotPolygons(spe, colourBy="log2AspectRatio")

# Plot polygons colored by cell area (Area_um)
plotPolygons(spe, colourBy="Area_um")

We can see, in yellow and dark violet, that a few cells have extreme values of log2AspectRatio and Area_um (area in microns).

plotPolygons(spe, colourBy="quality_score")

plotPolygons(spe, colourBy="low_qscore")

We can see that the quality score captures both of these aspects and highlights cells that are mostly isolated on the FoV border or show an unusual conformation.

We always recommend being aware of the cell populations in the context under study before removing the detected cells.
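One simple way to check this is to cross-tabulate the flag against a cell-type annotation and look at the fraction of flagged cells per population. The sketch below uses a toy data.frame; the cell_type column is hypothetical and is not produced by any step in this vignette.

```r
# Toy annotation table; cell_type is a hypothetical column
cells <- data.frame(
    cell_type = c("T cell", "T cell", "Macrophage", "Macrophage",
                  "Macrophage", "Epithelial"),
    low_qscore = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Flag counts per population
flag_by_type <- table(cells$cell_type, cells$low_qscore)

# Fraction of flagged cells within each population
flagged_fraction <- tapply(cells$low_qscore, cells$cell_type, mean)
```

A population with a high flagged fraction may be worth inspecting before filtering, since its shape or size could reflect biology and not poor quality.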

12 Fov Zoom and Map

The plotZoomFovsMap function allows you to visualize a map of the FoVs with a zoom-in of selected FoVs, colored by the colourBy argument.

plotZoomFovsMap(spe, fovs=60, colourBy="quality_score")

plotZoomFovsMap(spe, fovs=60, colourBy="low_qscore")

On the left we see the map of all the FoVs (only FoV 60 in this case), and on the right the polygons, coloured by the quality score. This gives a better view of a specific tissue area within the whole experiment.

13 Conclusion

In this vignette, we explored the main functionalities of the SpaceTrooper package for spatial data analysis. Main steps shown are:

  • data and polygons loading for CosMx Protein, CosMx

  • quality control:

      ◦ outlier detection: medcouple and scuttle MAD

      ◦ flag score: a score combining transcript counts, cell area, aspect ratio and distance from the FoV border

  • visualization:

      ◦ centroids: with ggplot2

      ◦ polygons: sf + ggplot2

14 Session Information

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpaceTrooper_0.99.3         SpatialExperiment_1.19.1   
##  [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.1
##  [5] Biobase_2.69.0              GenomicRanges_1.61.1       
##  [7] Seqinfo_0.99.2              IRanges_2.43.0             
##  [9] S4Vectors_0.47.0            BiocGenerics_0.55.1        
## [11] generics_0.1.4              MatrixGenerics_1.21.0      
## [13] matrixStats_1.5.0           BiocStyle_2.37.1           
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.5.1             tibble_3.3.0             
##   [3] R.oo_1.27.1               leaflegend_1.2.1         
##   [5] XML_3.99-0.18             lifecycle_1.0.4          
##   [7] sf_1.0-21                 rstatix_0.7.2            
##   [9] edgeR_4.7.3               lattice_0.22-7           
##  [11] crosstalk_1.2.1           backports_1.5.0          
##  [13] magrittr_2.0.3            limma_3.65.3             
##  [15] sass_0.4.10               rmarkdown_2.29           
##  [17] jquerylib_0.1.4           yaml_2.3.10              
##  [19] sp_2.2-0                  cowplot_1.2.0            
##  [21] DBI_1.2.3                 RColorBrewer_1.1-3       
##  [23] abind_1.4-8               purrr_1.1.0              
##  [25] R.utils_2.13.0            ggrepel_0.9.6            
##  [27] irlba_2.3.5.1             terra_1.8-60             
##  [29] units_0.8-7               dqrng_0.4.1              
##  [31] DelayedMatrixStats_1.31.0 codetools_0.2-20         
##  [33] DropletUtils_1.29.4       DelayedArray_0.35.2      
##  [35] scuttle_1.19.0            tidyselect_1.2.1         
##  [37] shape_1.4.6.1             raster_3.6-32            
##  [39] farver_2.1.2              ScaledMatrix_1.17.0      
##  [41] viridis_0.6.5             base64enc_0.1-3          
##  [43] jsonlite_2.0.0            cols4all_0.8             
##  [45] BiocNeighbors_2.3.1       e1071_1.7-16             
##  [47] Formula_1.2-5             survival_3.8-3           
##  [49] scater_1.37.0             iterators_1.0.14         
##  [51] foreach_1.5.2             tools_4.5.1              
##  [53] Rcpp_1.1.0                glue_1.8.0               
##  [55] gridExtra_2.3             SparseArray_1.9.1        
##  [57] xfun_0.52                 leaflet.providers_2.0.0  
##  [59] dplyr_1.1.4               HDF5Array_1.37.0         
##  [61] withr_3.0.2               BiocManager_1.30.26      
##  [63] fastmap_1.2.0             rhdf5filters_1.21.0      
##  [65] digest_0.6.37             rsvd_1.0.5               
##  [67] R6_2.6.1                  microbenchmark_1.5.0     
##  [69] colorspace_2.1-1          wk_0.9.4                 
##  [71] spacesXYZ_1.6-0           dichromat_2.0-0.1        
##  [73] R.methodsS3_1.8.2         h5mread_1.1.1            
##  [75] tidyr_1.3.1               data.table_1.17.8        
##  [77] robustbase_0.99-4-1       class_7.3-23             
##  [79] htmlwidgets_1.6.4         S4Arrays_1.9.1           
##  [81] tmaptools_3.3             pkgconfig_2.0.3          
##  [83] gtable_0.3.6              XVector_0.49.0           
##  [85] htmltools_0.5.8.1         carData_3.0-5            
##  [87] bookdown_0.43             scales_1.4.0             
##  [89] png_0.1-8                 knitr_1.50               
##  [91] rjson_0.2.23              proxy_0.4-27             
##  [93] cachem_1.1.0              rhdf5_2.53.3             
##  [95] KernSmooth_2.23-26        parallel_4.5.1           
##  [97] vipor_0.4.7               arrow_21.0.0             
##  [99] s2_1.1.9                  leafsync_0.1.0           
## [101] pillar_1.11.0             grid_4.5.1               
## [103] logger_0.4.0              vctrs_0.6.5              
## [105] ggpubr_0.6.1              car_3.1-3                
## [107] BiocSingular_1.25.0       beachmat_2.25.4          
## [109] sfheaders_0.4.4           beeswarm_0.4.0           
## [111] evaluate_1.0.4            SpatialExperimentIO_1.1.0
## [113] tinytex_0.57              magick_2.8.7             
## [115] cli_3.6.5                 locfit_1.5-9.12          
## [117] compiler_4.5.1            rlang_1.1.6              
## [119] crayon_1.5.3              tmap_4.1                 
## [121] maptiles_0.10.0           ggsignif_0.6.4           
## [123] labeling_0.4.3            classInt_0.4-11          
## [125] ggbeeswarm_0.7.2          viridisLite_0.4.2        
## [127] BiocParallel_1.43.4       stars_0.6-8              
## [129] assertthat_0.2.1          leaflet_2.2.2            
## [131] glmnet_4.1-10             Matrix_1.7-3             
## [133] sparseMatrixStats_1.21.0  bit64_4.6.0-1            
## [135] leafem_0.2.4              ggplot2_3.5.2            
## [137] Rhdf5lib_1.31.0           statmod_1.5.0            
## [139] broom_1.0.9               bslib_0.9.0              
## [141] lwgeom_0.2-14             DEoptimR_1.1-4           
## [143] bit_4.6.0