This vignette introduces the SpaceTrooper package for the analysis of spatial data from platforms such as CosMx, here applied to the Protein assay.
To install SpaceTrooper, use the following commands:
# Install BiocManager if not already installed, then install SpaceTrooper
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("drighelli/SpaceTrooper")
In this section, we load the data using the package's functions. Whatever the platform, the goal is to provide a uniform SpatialExperiment object across all technologies, allowing for consistent QC analysis.
The functions in SpaceTrooper compute missing metrics as needed and can include the cell polygons through the keep_polygons argument, which stores them in the colData of the SpatialExperiment.
# Load the SpaceTrooper library
library(SpaceTrooper)
# Load CosMx Protein data into a SpatialExperiment object (SPE)
protfolder <- system.file("extdata", "S0_prot", package="SpaceTrooper")
(spe <- readCosmxProteinSPE(protfolder))
## class: SpatialExperiment
## dim: 69 2298
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(2298): f60_c1 f60_c10 ... f60_c998 f60_c999
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
colData(spe)
## DataFrame with 2298 rows and 58 columns
## fov cellID Area AspectRatio Width Height
## <integer> <integer> <integer> <numeric> <integer> <integer>
## f60_c1 60 1 2086 0.77 57 44
## f60_c10 60 10 2186 0.97 59 57
## f60_c100 60 100 3550 0.89 68 76
## f60_c1000 60 1000 11235 0.93 129 139
## f60_c1001 60 1001 11801 0.52 205 107
## ... ... ... ... ... ... ...
## f60_c995 60 995 4579 0.87 78 90
## f60_c996 60 996 5301 0.75 79 105
## f60_c997 60 997 2236 0.49 79 39
## f60_c998 60 998 4529 0.65 65 100
## f60_c999 60 999 10317 0.73 112 153
## Mean.PanCK Max.PanCK Mean.CD68 Max.CD68 Mean.Membrane Max.Membrane
## <integer> <integer> <integer> <integer> <integer> <integer>
## f60_c1 551 1420 555 3816 2896 4456
## f60_c10 536 740 303 628 2947 5336
## f60_c100 433 688 289 552 1749 4524
## f60_c1000 767 1788 524 3796 2173 6788
## f60_c1001 402 1448 361 4364 1860 5472
## ... ... ... ... ... ... ...
## f60_c995 575 1004 276 448 1941 5164
## f60_c996 607 960 331 768 3240 9912
## f60_c997 629 2088 308 580 2837 5168
## f60_c998 485 724 256 664 2088 4060
## f60_c999 549 1592 520 3788 2038 6840
## Mean.CD45 Max.CD45 Mean.DAPI Max.DAPI SplitRatioToLocal NucArea
## <integer> <integer> <integer> <integer> <numeric> <integer>
## f60_c1 8585 16216 1688 6772 0.41 556
## f60_c10 8061 18856 3017 5512 0.43 1519
## f60_c100 2914 11692 1632 4380 0.00 1444
## f60_c1000 1877 8548 3767 6844 0.00 8068
## f60_c1001 2804 9556 1534 5540 0.00 4308
## ... ... ... ... ... ... ...
## f60_c995 4332 14240 2695 5356 0 3760
## f60_c996 5167 12164 3396 5744 0 3724
## f60_c997 7152 15356 1907 4056 0 816
## f60_c998 5270 15824 2595 5196 0 2860
## f60_c999 3414 18572 2814 6268 0 6140
## NucAspectRatio Circularity Eccentricity Perimeter Solidity
## <numeric> <numeric> <numeric> <integer> <numeric>
## f60_c1 0.69 2912.61 0.00 3 695.33
## f60_c10 0.90 1.49 0.09 136 16.07
## f60_c100 0.96 0.91 0.77 221 16.06
## f60_c1000 0.85 0.86 0.90 405 27.74
## f60_c1001 0.38 0.62 0.55 488 24.18
## ... ... ... ... ... ...
## f60_c995 0.93 0.86 0.82 258 17.75
## f60_c996 0.85 0.83 0.74 283 18.73
## f60_c997 0.89 0.77 0.52 191 11.71
## f60_c998 0.70 0.89 0.69 253 17.90
## f60_c999 0.94 0.79 0.74 406 25.41
## cell_id X version dualfiles Run_name
## <character> <integer> <character> <character> <character>
## f60_c1 f60_c1 1 v6 ? Run0
## f60_c10 f60_c10 1 v6 ? Run0
## f60_c100 f60_c100 1 v6 ? Run0
## f60_c1000 f60_c1000 1 v6 ? Run0
## f60_c1001 f60_c1001 1 v6 ? Run0
## ... ... ... ... ... ...
## f60_c995 f60_c995 1 v6 ? Run0
## f60_c996 f60_c996 1 v6 ? Run0
## f60_c997 f60_c997 1 v6 ? Run0
## f60_c998 f60_c998 1 v6 ? Run0
## f60_c999 f60_c999 1 v6 ? Run0
## Run_Tissue_name ISH.concentration Dash tissue Panel
## <character> <character> <character> <character> <character>
## f60_c1 S0 1nM PILOT tissue WTx
## f60_c10 S0 1nM PILOT tissue WTx
## f60_c100 S0 1nM PILOT tissue WTx
## f60_c1000 S0 1nM PILOT tissue WTx
## f60_c1001 S0 1nM PILOT tissue WTx
## ... ... ... ... ... ...
## f60_c995 S0 1nM PILOT tissue WTx
## f60_c996 S0 1nM PILOT tissue WTx
## f60_c997 S0 1nM PILOT tissue WTx
## f60_c998 S0 1nM PILOT tissue WTx
## f60_c999 S0 1nM PILOT tissue WTx
## assay_type slide_ID median_RNA RNA_quantile_0.75 RNA_quantile_0.8
## <character> <integer> <numeric> <numeric> <numeric>
## f60_c1 protein 1 93752.1 421872 536217
## f60_c10 protein 1 93752.1 421872 536217
## f60_c100 protein 1 93752.1 421872 536217
## f60_c1000 protein 1 93752.1 421872 536217
## f60_c1001 protein 1 93752.1 421872 536217
## ... ... ... ... ... ...
## f60_c995 protein 1 93752.1 421872 536217
## f60_c996 protein 1 93752.1 421872 536217
## f60_c997 protein 1 93752.1 421872 536217
## f60_c998 protein 1 93752.1 421872 536217
## f60_c999 protein 1 93752.1 421872 536217
## RNA_quantile_0.85 RNA_quantile_0.9 RNA_quantile_0.95
## <numeric> <numeric> <numeric>
## f60_c1 816363 1623303 4461905
## f60_c10 816363 1623303 4461905
## f60_c100 816363 1623303 4461905
## f60_c1000 816363 1623303 4461905
## f60_c1001 816363 1623303 4461905
## ... ... ... ...
## f60_c995 816363 1623303 4461905
## f60_c996 816363 1623303 4461905
## f60_c997 816363 1623303 4461905
## f60_c998 816363 1623303 4461905
## f60_c999 816363 1623303 4461905
## RNA_quantile_0.99 nCount_RNA nFeature_RNA median_negprobes
## <numeric> <numeric> <integer> <numeric>
## f60_c1 7300861 39507.4 67 24469.2
## f60_c10 7300861 37093.6 67 24469.2
## f60_c100 7300861 24483.5 67 24469.2
## f60_c1000 7300861 15666.4 67 24469.2
## f60_c1001 7300861 20078.0 67 24469.2
## ... ... ... ... ...
## f60_c995 7300861 19585.0 67 24469.2
## f60_c996 7300861 24066.8 67 24469.2
## f60_c997 7300861 32250.8 67 24469.2
## f60_c998 7300861 23750.0 67 24469.2
## f60_c999 7300861 16817.9 67 24469.2
## negprobes_quantile_0.75 negprobes_quantile_0.8
## <numeric> <numeric>
## f60_c1 30785.4 32048.6
## f60_c10 30785.4 32048.6
## f60_c100 30785.4 32048.6
## f60_c1000 30785.4 32048.6
## f60_c1001 30785.4 32048.6
## ... ... ...
## f60_c995 30785.4 32048.6
## f60_c996 30785.4 32048.6
## f60_c997 30785.4 32048.6
## f60_c998 30785.4 32048.6
## f60_c999 30785.4 32048.6
## negprobes_quantile_0.85 negprobes_quantile_0.9
## <numeric> <numeric>
## f60_c1 33311.9 34575.1
## f60_c10 33311.9 34575.1
## f60_c100 33311.9 34575.1
## f60_c1000 33311.9 34575.1
## f60_c1001 33311.9 34575.1
## ... ... ...
## f60_c995 33311.9 34575.1
## f60_c996 33311.9 34575.1
## f60_c997 33311.9 34575.1
## f60_c998 33311.9 34575.1
## f60_c999 33311.9 34575.1
## negprobes_quantile_0.95 negprobes_quantile_0.99 nCount_negprobes
## <numeric> <numeric> <numeric>
## f60_c1 35838.4 36849 16.26
## f60_c10 35838.4 36849 17.41
## f60_c100 35838.4 36849 15.31
## f60_c1000 35838.4 36849 18.14
## f60_c1001 35838.4 36849 18.18
## ... ... ... ...
## f60_c995 35838.4 36849 39.87
## f60_c996 35838.4 36849 21.70
## f60_c997 35838.4 36849 16.14
## f60_c998 35838.4 36849 21.24
## f60_c999 35838.4 36849 17.12
## nFeature_negprobes Area.um2 CenterX_local_px CenterY_local_px
## <integer> <numeric> <integer> <integer>
## f60_c1 2 30.0384 28 22
## f60_c10 2 31.4784 781 28
## f60_c100 2 51.1200 3965 86
## f60_c1000 2 161.7840 3482 1303
## f60_c1001 2 169.9344 4120 1271
## ... ... ... ... ...
## f60_c995 2 65.9376 2207 1271
## f60_c996 2 76.3344 3585 1278
## f60_c997 2 32.1984 895 1249
## f60_c998 2 65.2176 1858 1281
## f60_c999 2 148.5648 2289 1308
## cell sample_id
## <character> <character>
## f60_c1 c_1_60_1 sample01
## f60_c10 c_1_60_10 sample01
## f60_c100 c_1_60_100 sample01
## f60_c1000 c_1_60_1000 sample01
## f60_c1001 c_1_60_1001 sample01
## ... ... ...
## f60_c995 c_1_60_995 sample01
## f60_c996 c_1_60_996 sample01
## f60_c997 c_1_60_997 sample01
## f60_c998 c_1_60_998 sample01
## f60_c999 c_1_60_999 sample01
The package offers several functions for spatial data analysis, including quality control and visualization.
This tutorial focuses on CosMx protein data, which are organized in Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx.
Although untested, the same approach can be extended to Akoya CODEX data, as long as a SpatialExperiment object is created.
Polygons can be loaded later if needed.
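As noted above, any technology can in principle enter the workflow once its data sit in a SpatialExperiment object. The following is a minimal sketch of that step on simulated data, using the SpatialExperiment constructor. The marker names, cell names and the coordinate columns (x_px, y_px) are made up for illustration, and whether the downstream SpaceTrooper functions need further columns or metadata is not verified here.

```r
library(SpatialExperiment)

set.seed(1)
# Hypothetical intensity matrix: 4 markers (rows) x 5 cells (columns)
counts <- matrix(rpois(20, lambda = 50), nrow = 4,
    dimnames = list(paste0("marker", 1:4), paste0("cell", 1:5)))

# Hypothetical per-cell annotation, including centroid coordinates
cd <- S4Vectors::DataFrame(
    x_px = runif(5, 0, 100),
    y_px = runif(5, 0, 100),
    Area_um = runif(5, 20, 80),
    row.names = colnames(counts))

# The columns named in spatialCoordsNames are used to fill spatialCoords
toy_spe <- SpatialExperiment(
    assays = list(counts = counts),
    colData = cd,
    spatialCoordsNames = c("x_px", "y_px"))
toy_spe
```

The resulting object has the same class as the one returned by the reader function above, which is the only requirement stated in the text.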
The plotCellsFovs function shows a map of the FoVs within an experiment.
This plot is specific to CosMx data and uses cell centroids.
Keep in mind that, in this specific experiment, the fov_positions and the cell centroid positions were not aligned.
An alignment approach can be found at the end of the scripts/datacreation.R file.
# Plot the cells within their respective Field of Views (FOVs)
plotCellsFovs(spe)
Because the dataset is a subset containing a single Field of View of the original experiment, the plot shows one FoV identifier in black and the cell centroids in purple.
When an experiment has multiple FoVs, the map shows their topological organization together with their identifiers.
The spatialPerCellQC function, inspired by scater::addPerCellQC, computes additional metrics for each cell in the SpatialExperiment. It also detects negative control probes, which are crucial for QC.
By default, it removes cells with zero counts; this behaviour is controlled by the rmZeros argument.
Here, for transparency, we specify the negProbList for the CosMx protein assay, although the function already includes a set of the most commonly used negative probes for multiple technologies.
Note that, although the same approach can be applied to CODEX data, no list of negative probes is provided for this technology, so the user needs to specify them.
# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG"))
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea"
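To make the control-related columns above more concrete, here is a small base R sketch on a toy count matrix. The target names and all values are hypothetical, and the definition of the ratio (control counts divided by total counts) is an assumption based on the column names, not a description of the package internals.

```r
# Toy count matrix: 3 hypothetical targets and the 2 IgG controls (rows)
# by 4 cells (columns); matrix() fills the values column by column
counts <- matrix(
    c(120,  80, 40,  6,  4,
      300, 150, 50,  0,  0,
       10,   5,  5, 10, 10,
        0,   0,  0,  0,  0),
    nrow = 5,
    dimnames = list(
        c("target1", "target2", "target3", "Ms IgG1", "Rb IgG"),
        paste0("cell", 1:4)))
neg_probes <- c("Ms IgG1", "Rb IgG")
is_ctrl <- rownames(counts) %in% neg_probes

total <- colSums(counts)
control_sum <- colSums(counts[is_ctrl, , drop = FALSE])
target_sum <- colSums(counts[!is_ctrl, , drop = FALSE])
# Assumed definition: fraction of a cell's counts that comes from controls
ctrl_total_ratio <- control_sum / total   # NaN for the zero-count cell

data.frame(total, control_sum, target_sum, ctrl_total_ratio)
```

The last cell has no counts at all, so its ratio is undefined; this is one reason why dropping zero-count cells before computing such metrics is convenient.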
You can investigate individual metrics by viewing their histograms. To highlight outliers, use the useFences argument to display the fences computed by computeSpatialOutlier (shown later in this vignette).
# Plot a histogram of counts (sum)
plotMetricHist(spe, metric="sum")
# Plot a histogram of cell areas (Area_um)
plotMetricHist(spe, metric="Area_um")
# Plot a histogram of the counts relative to the cell area in microns
plotMetricHist(spe, metric="log2CountArea")
# Plot a histogram of the ratio of negative probe counts to the total
# counts in each cell
plotMetricHist(spe, metric="log2Ctrl_total_ratio")
These plots show, respectively, the distributions of the total counts (sum), of the cell area in microns (Area_um), of the counts relative to the area of each cell (log2CountArea), and of the ratio of negative probe counts to the total counts of each cell (log2Ctrl_total_ratio).
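The area-based metrics can be illustrated with a short base R sketch. The pixel areas are those of the first three cells in the colData printed earlier; the pixel size of 0.12 micron is inferred from that same output (30.0384 / 2086 = 0.12^2) and is therefore an assumption, as are the total counts and the definition of the count density.

```r
# Areas in pixels of the first three cells in the colData shown earlier
area_px <- c(f60_c1 = 2086, f60_c10 = 2186, f60_c100 = 3550)

# Pixel size inferred from the printed output, not taken from the package
um_per_px <- 0.12
area_um <- area_px * um_per_px^2

# Hypothetical total counts per cell, for illustration only
total_counts <- c(f60_c1 = 39000, f60_c10 = 37000, f60_c100 = 24000)

# Assumed definitions of the count density and of its log2 transform
count_area <- total_counts / area_um
log2_count_area <- log2(count_area)

round(data.frame(area_um, count_area, log2_count_area), 2)
```

The converted areas match the micron areas printed in the colData for these cells, which supports the inferred pixel size.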
Spatial outlier detection is another critical step in QC. While the quality score addresses some metrics, other outlier detection methods may be needed.
The computeSpatialOutlier function computes the medcouple statistic on a specified metric (computeBy argument).
The medcouple is a robust measure of skewness, so fences based on it are suited to asymmetric distributions; the function issues a warning when this requirement is not satisfied.
For symmetric distributions, it can use scuttle::isOutlier instead.
The method argument supports mc, scuttle, or both.
This outlier detection approach can be used to decide whether, and which, cells should be discarded based on a single metric.
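To give an idea of how medcouple-based fences work, the sketch below implements a naive medcouple and the skewness-adjusted boxplot fences in base R. It is an illustration under simplifying assumptions (no observation equal to the median, quadratic cost) and may differ from what computeSpatialOutlier does internally; the function names medcouple_naive and adjusted_fences are hypothetical.

```r
# Naive O(n^2) medcouple; assumes no observation is equal to the median
medcouple_naive <- function(x) {
    m <- median(x)
    above <- x[x > m]
    below <- x[x < m]
    # Kernel h(xi, xj) evaluated on every pair with xj < m < xi
    h <- outer(above, below,
        function(xi, xj) ((xi - m) - (m - xj)) / (xi - xj))
    median(h)
}

# Skewness-adjusted boxplot fences: the usual 1.5 * IQR whiskers are
# stretched on the side of the longer tail and shortened on the other
adjusted_fences <- function(x, coef = 1.5) {
    mc <- medcouple_naive(x)
    q <- quantile(x, c(0.25, 0.75), names = FALSE)
    iqr <- q[2] - q[1]
    if (mc >= 0) {
        c(lower = q[1] - coef * exp(-4 * mc) * iqr,
          upper = q[2] + coef * exp(3 * mc) * iqr)
    } else {
        c(lower = q[1] - coef * exp(-3 * mc) * iqr,
          upper = q[2] + coef * exp(4 * mc) * iqr)
    }
}

symmetric <- c(1, 2, 3, 4, 5, 6)
skewed <- c(1, 2, 3, 4, 10, 50)
medcouple_naive(symmetric)   # 0: no skew, fences reduce to 1.5 * IQR
medcouple_naive(skewed)      # positive: right-skewed sample
adjusted_fences(skewed)
```

When the medcouple is zero the adjustment vanishes, which is why a plain symmetric rule is sufficient for symmetric metrics.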
# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both")
# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both")
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc"
If we computed outliers with the computeSpatialOutlier function, we can also visualize which fences have been used to create the filter on the cells.
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc")
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc")
We visualize the fences computed with the medcouple and scuttle approaches to directly compare them and the number of outliers each method detects.
If desired, these fences can already be used to remove the detected outliers.
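As a rough sketch of what filtering on fences means, the following base R code builds symmetric fences from the median and the median absolute deviation (MAD) and drops the values outside them. This mirrors the general idea behind MAD-based outlier calls with three MADs; the simulated metric and the variable names are assumptions for illustration, not the package's own code.

```r
set.seed(42)
# Simulated, roughly symmetric metric with two extreme cells appended
metric <- c(rnorm(200, mean = 50, sd = 5), 120, 3)

nmads <- 3
centre <- median(metric)
spread <- mad(metric)   # scaled by the constant 1.4826 by default
fences <- c(lower = centre - nmads * spread,
            upper = centre + nmads * spread)

# Cells outside the fences are called outliers and can be discarded
is_outlier <- metric < fences[["lower"]] | metric > fences[["upper"]]
kept <- metric[!is_outlier]
table(is_outlier)
```

The same logical vector could be used to subset the columns of an object holding the cells, so that only cells inside the fences are retained.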
Next, we use computeQScore to calculate a quality score based on the previously computed metrics.
The quality score combines the counts relative to the cell area, the aspect ratio of each cell, and its distance from the FoV border (the latter is used only for CosMx).
See the Details section of help(computeQScore) for more information.
# Calculate quality scores for each cell
spe <- computeQScore(spe)
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "quality_score"
## [89] "training_status"
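One of the ingredients of the score is the distance of each cell from the FoV border. The sketch below shows one plausible way to obtain it from the local centroid coordinates, using three cells from the colData printed earlier. The FoV size of 4256 x 4256 pixels is a hypothetical value, and taking the minimum over the two axes is an assumed definition rather than a confirmed one.

```r
# Hypothetical FoV size in pixels; the real one would come from the data
fov_width <- 4256
fov_height <- 4256

# Local centroid coordinates of three cells from the colData shown earlier
x_local <- c(f60_c1 = 28, f60_c10 = 781, f60_c1000 = 3482)
y_local <- c(f60_c1 = 22, f60_c10 = 28, f60_c1000 = 1303)

# Distance to the nearest vertical and horizontal FoV edge
dist_x <- pmin(x_local, fov_width - x_local)
dist_y <- pmin(y_local, fov_height - y_local)
# Assumed definition: distance to the closest edge overall
dist_border <- pmin(dist_x, dist_y)
dist_border
```

Cells with a small distance, such as the first two here, sit next to the FoV edge and are the ones most likely to be cut by it.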
Logical filters can then be computed using computeQScoreFlags, which requires thresholds for various metrics. Currently, the function considers:
* Quality Score (qsThreshold): cells with scores below this threshold (default 0.5) are flagged for exclusion. When the use_qs_quantiles argument is set to TRUE, this value is instead interpreted as the quantile used for filtering.
* Quality Score Quantiles (use_qs_quantiles): option to filter based on quantiles (default FALSE).
# Compute flags to identify cells for filtering
spe <- computeQScoreFlags(spe, qsThreshold=0.5)
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "quality_score"
## [89] "training_status" "low_qscore"
table(spe$low_qscore)
##
## FALSE TRUE
## 2062 236
In this example, 236 cells are flagged for removal.
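The difference between a fixed threshold and a quantile-based one can be sketched in base R on simulated scores. The scores and variable names are hypothetical, and reading the threshold as a probability in the quantile mode is an assumption drawn from the description of the arguments above.

```r
set.seed(7)
# Simulated quality scores in [0, 1], for illustration only
quality_score <- rbeta(1000, shape1 = 5, shape2 = 2)
threshold <- 0.5

# Fixed cut-off: flag every cell scoring below the threshold
low_fixed <- quality_score < threshold

# Quantile cut-off (assumed behaviour): the same value is read as a
# probability, so the cut-off becomes that quantile of the scores
cutoff_q <- quantile(quality_score, probs = threshold, names = FALSE)
low_quantile <- quality_score < cutoff_q

c(fixed = sum(low_fixed), quantile = sum(low_quantile))
```

With a value of 0.5 the quantile mode flags half of the cells regardless of their scores, so a much smaller value would usually be chosen when filtering by quantile.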
For other metrics, such as the total counts and the negative probe ratio, the function computeThresholdFlags considers:
* Total Counts (totalThreshold): minimum count threshold (default 0).
* Negative Probe Ratio (ctrlTotRatioThreshold): threshold on the ratio of negative probe counts to total counts (default 0.1).
spe <- computeThresholdFlags(spe, totalThreshold=0,
ctrlTotRatioThreshold=0.1)
table(spe$threshold_flags)
##
## FALSE
## 2298
In this example, we don’t find any cell to be removed.
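A threshold-based flag of this kind can be sketched in a few lines of base R. The metric values are hypothetical, and the direction of each comparison (counts not above the total threshold, control ratio above the ratio threshold) is an assumption made for illustration.

```r
# Toy per-cell metrics (hypothetical values)
total <- c(cell1 = 250, cell2 = 500, cell3 = 40, cell4 = 0)
ctrl_total_ratio <- c(cell1 = 0.04, cell2 = 0, cell3 = 0.5, cell4 = NaN)

total_threshold <- 0
ratio_threshold <- 0.1

# Assumed rule: flag cells whose counts do not exceed the total threshold
low_total <- total <= total_threshold
# Assumed rule: flag cells whose control fraction exceeds the ratio
# threshold; an undefined ratio is treated as flagged
high_ctrl <- is.na(ctrl_total_ratio) | ctrl_total_ratio > ratio_threshold

threshold_flag <- low_total | high_ctrl
threshold_flag
```

In this toy case the third cell is flagged for its high control fraction and the fourth because it has no counts.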
To better understand the quality score values, we now load the polygons, which give a better overview of the cell characteristics.
We can load and add polygons to the SPE object using the following functions.
Each technology has its own readPolygons function to standardize the loaded sf object and handle different file types.
# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)
# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)
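For readers unfamiliar with sf, the sketch below builds a tiny polygon table by hand and derives two shape metrics from it. The two shapes, the cell identifiers and the bounding-box aspect ratio are illustrative assumptions; the objects returned by the package's readers will carry more columns and real cell boundaries.

```r
library(sf)

# Closed rings: the first vertex is repeated at the end
square <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))
strip  <- rbind(c(3, 0), c(7, 0), c(7, 1), c(3, 1), c(3, 0))

# One row per cell, with the geometry stored in a list-column
toy_pols <- st_sf(
    cell_id = c("cellA", "cellB"),
    geometry = st_sfc(st_polygon(list(square)), st_polygon(list(strip))))

# Area and bounding-box aspect ratio (width / height) per polygon
toy_pols$area <- as.numeric(st_area(toy_pols))
toy_pols$aspect <- vapply(st_geometry(toy_pols), function(g) {
    bb <- as.numeric(st_bbox(g))   # order: xmin, ymin, xmax, ymax
    (bb[3] - bb[1]) / (bb[4] - bb[2])
}, numeric(1))
toy_pols
```

Both shapes have the same area but very different aspect ratios, which is the kind of difference that shape-based QC metrics are meant to capture.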
Once the polygons are stored in an sf object within colData, they can be visualized using functions based on the ggplot2 library.
# Plot the polygons of the selected cells
plotPolygons(spe, bgColor="white")
Here the cells are shown on a white background for better visualization.
# Plot polygons colored by log2 aspect ratio and by cell area
plotPolygons(spe, colourBy="log2AspectRatio")
plotPolygons(spe, colourBy="Area_um")
In yellow and darkviolet, we can see a few cells with extreme values of log2AspectRatio and of Area_um (the area in microns).
plotPolygons(spe, colourBy="quality_score")
plotPolygons(spe, colourBy="low_qscore")
We can see that the quality score captures both of these aspects and highlights cells that are mostly isolated on the FoV border or that show an unusual conformation.
We always recommend taking into account the cell populations expected in the context under study before removing the detected cells.
The plotZoomFovsMap function shows a map of the FoVs with a zoom-in of selected FoVs, colored by the colourBy argument.
plotZoomFovsMap(spe, fovs=60, colourBy="quality_score")
plotZoomFovsMap(spe, fovs=60, colourBy="low_qscore")
On the left we see the map of all the FoVs (only FoV 60 in this case), and on the right the polygons coloured by the quality score, giving a closer view of a specific tissue area within the whole experiment.
In this vignette, we explored the main functionalities of the SpaceTrooper package for spatial data analysis.
Main steps shown are:
* data and polygon loading for CosMx Protein
* quality control:
+ outlier detection: medcouple and scuttle MAD
+ quality score: a score combining counts, cell area,
aspect ratio and distance from the FoV border
* visualization:
+ centroids: with ggplot2
+ polygons: sf + ggplot2
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpaceTrooper_0.99.3 SpatialExperiment_1.19.1
## [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.1
## [5] Biobase_2.69.0 GenomicRanges_1.61.1
## [7] Seqinfo_0.99.2 IRanges_2.43.0
## [9] S4Vectors_0.47.0 BiocGenerics_0.55.1
## [11] generics_0.1.4 MatrixGenerics_1.21.0
## [13] matrixStats_1.5.0 BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] splines_4.5.1 tibble_3.3.0
## [3] R.oo_1.27.1 leaflegend_1.2.1
## [5] XML_3.99-0.18 lifecycle_1.0.4
## [7] sf_1.0-21 rstatix_0.7.2
## [9] edgeR_4.7.3 lattice_0.22-7
## [11] crosstalk_1.2.1 backports_1.5.0
## [13] magrittr_2.0.3 limma_3.65.3
## [15] sass_0.4.10 rmarkdown_2.29
## [17] jquerylib_0.1.4 yaml_2.3.10
## [19] sp_2.2-0 cowplot_1.2.0
## [21] DBI_1.2.3 RColorBrewer_1.1-3
## [23] abind_1.4-8 purrr_1.1.0
## [25] R.utils_2.13.0 ggrepel_0.9.6
## [27] irlba_2.3.5.1 terra_1.8-60
## [29] units_0.8-7 dqrng_0.4.1
## [31] DelayedMatrixStats_1.31.0 codetools_0.2-20
## [33] DropletUtils_1.29.4 DelayedArray_0.35.2
## [35] scuttle_1.19.0 tidyselect_1.2.1
## [37] shape_1.4.6.1 raster_3.6-32
## [39] farver_2.1.2 ScaledMatrix_1.17.0
## [41] viridis_0.6.5 base64enc_0.1-3
## [43] jsonlite_2.0.0 cols4all_0.8
## [45] BiocNeighbors_2.3.1 e1071_1.7-16
## [47] Formula_1.2-5 survival_3.8-3
## [49] scater_1.37.0 iterators_1.0.14
## [51] foreach_1.5.2 tools_4.5.1
## [53] Rcpp_1.1.0 glue_1.8.0
## [55] gridExtra_2.3 SparseArray_1.9.1
## [57] xfun_0.52 leaflet.providers_2.0.0
## [59] dplyr_1.1.4 HDF5Array_1.37.0
## [61] withr_3.0.2 BiocManager_1.30.26
## [63] fastmap_1.2.0 rhdf5filters_1.21.0
## [65] digest_0.6.37 rsvd_1.0.5
## [67] R6_2.6.1 microbenchmark_1.5.0
## [69] colorspace_2.1-1 wk_0.9.4
## [71] spacesXYZ_1.6-0 dichromat_2.0-0.1
## [73] R.methodsS3_1.8.2 h5mread_1.1.1
## [75] tidyr_1.3.1 data.table_1.17.8
## [77] robustbase_0.99-4-1 class_7.3-23
## [79] htmlwidgets_1.6.4 S4Arrays_1.9.1
## [81] tmaptools_3.3 pkgconfig_2.0.3
## [83] gtable_0.3.6 XVector_0.49.0
## [85] htmltools_0.5.8.1 carData_3.0-5
## [87] bookdown_0.43 scales_1.4.0
## [89] png_0.1-8 knitr_1.50
## [91] rjson_0.2.23 proxy_0.4-27
## [93] cachem_1.1.0 rhdf5_2.53.3
## [95] KernSmooth_2.23-26 parallel_4.5.1
## [97] vipor_0.4.7 arrow_21.0.0
## [99] s2_1.1.9 leafsync_0.1.0
## [101] pillar_1.11.0 grid_4.5.1
## [103] logger_0.4.0 vctrs_0.6.5
## [105] ggpubr_0.6.1 car_3.1-3
## [107] BiocSingular_1.25.0 beachmat_2.25.4
## [109] sfheaders_0.4.4 beeswarm_0.4.0
## [111] evaluate_1.0.4 SpatialExperimentIO_1.1.0
## [113] tinytex_0.57 magick_2.8.7
## [115] cli_3.6.5 locfit_1.5-9.12
## [117] compiler_4.5.1 rlang_1.1.6
## [119] crayon_1.5.3 tmap_4.1
## [121] maptiles_0.10.0 ggsignif_0.6.4
## [123] labeling_0.4.3 classInt_0.4-11
## [125] ggbeeswarm_0.7.2 viridisLite_0.4.2
## [127] BiocParallel_1.43.4 stars_0.6-8
## [129] assertthat_0.2.1 leaflet_2.2.2
## [131] glmnet_4.1-10 Matrix_1.7-3
## [133] sparseMatrixStats_1.21.0 bit64_4.6.0-1
## [135] leafem_0.2.4 ggplot2_3.5.2
## [137] Rhdf5lib_1.31.0 statmod_1.5.0
## [139] broom_1.0.9 bslib_0.9.0
## [141] lwgeom_0.2-14 DEoptimR_1.1-4
## [143] bit_4.6.0