shinyDSP 1.1.2
The purpose of this vignette is to look under the hood and explain what a few underlying functions do to make this app work.
The human kidney data from Nanostring is loaded from ExperimentHub()
. Two files are required: a count matrix and matching annotation file. Let’s read those files.
library(standR)
library(SummarizedExperiment)
library(ExperimentHub)
library(readr)
library(dplyr)
library(stats)
eh <- ExperimentHub()
AnnotationHub::query(eh, "standR")
## ExperimentHub with 3 records
## # snapshotDate(): 2025-06-05
## # $dataprovider: Nanostring
## # $species: NA
## # $rdataclass: data.frame
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH7364"]]'
##
## title
## EH7364 | GeomxDKDdata_count
## EH7365 | GeomxDKDdata_sampleAnno
## EH7366 | GeomxDKDdata_featureAnno
countFilePath <- eh[["EH7364"]]
sampleAnnoFilePath <- eh[["EH7365"]]
countFile <- readr::read_delim(unname(countFilePath), na = character())
sampleAnnoFile <- readr::read_delim(unname(sampleAnnoFilePath), na = character())
Time for this code chunk to run: 15.76 seconds
A key step in analysis is to select a biological group(s) of interest. Let’s inspect sampleAnnoFile
.
colnames(sampleAnnoFile)
## [1] "SlideName" "ScanName" "ROILabel"
## [4] "SegmentLabel" "SegmentDisplayName" "Sample_ID"
## [7] "AOISurfaceArea" "AOINucleiCount" "ROICoordinateX"
## [10] "ROICoordinateY" "RawReads" "TrimmedReads"
## [13] "StitchedReads" "AlignedReads" "DeduplicatedReads"
## [16] "SequencingSaturation" "UMIQ30" "RTSQ30"
## [19] "disease_status" "pathology" "region"
## [22] "LOQ" "NormalizationFactor" "RoiReportX"
## [25] "RoiReportY"
sampleAnnoFile$disease_status %>% unique()
## [1] "DKD" "normal"
sampleAnnoFile$region %>% unique()
## [1] "glomerulus" "tubule"
“disease_status” and “region” look like interesting variables. For example, we might be interested in comparing “normal_glomerulus” to “DKD_glomerulus”. The app can create a new sampleAnnoFile
where two or more variables of interest can be combined into one grouped variable (under “Variable(s) of interest” in the main side bar
).
new_sampleAnnoFile <- sampleAnnoFile %>%
tidyr::unite(
"disease_region", # newly created grouped variable
c("disease_status","region"), # variables to combine
sep = "_"
)
new_sampleAnnoFile$disease_region %>% unique()
## [1] "DKD_glomerulus" "DKD_tubule" "normal_tubule"
## [4] "normal_glomerulus"
Time for this code chunk to run: 0.04 seconds
Finally a spatial experiment object can be created.
spe <- standR::readGeoMx(as.data.frame(countFile),
as.data.frame(new_sampleAnnoFile))
Time for this code chunk to run: 1.69 seconds
We merged “disease_status” and “region” to create a new group called, “disease_region”. The app allows users to pick any subset of groups of interest. In this example, the four possible groups are “DKD_glomerulus”, “DKD_tubule”,“normal_tubule”, and “normal_glomerulus”. The following script can subset spe
to keep certain groups.
selectedTypes <- c("DKD_glomerulus", "DKD_tubule", "normal_tubule", "normal_glomerulus")
toKeep <- colData(spe) %>%
tibble::as_tibble() %>%
pull(disease_region)
spe <- spe[, grepl(paste(selectedTypes, collapse = "|"), toKeep)]
Time for this code chunk to run: 0.14 seconds
Regions of interest(ROIs) can be filtered out based on any quantitative variable in colData(spe)
(same as colnames(new_sampleAnnoFile)
). These options can be found under the “QC” nav panel
’s side bar
. Let’s say I want to keep ROIs whose “SequencingSaturation” is at least 85.
filter <- lapply("SequencingSaturation", function(column) {
cutoff_value <- 85
if(!is.na(cutoff_value)) {
return(colData(spe)[[column]] > cutoff_value)
} else {
return(NULL)
}
})
filtered_spe <- spe[,unlist(filter)]
colData(spe) %>% dim()
## [1] 231 24
colData(filtered_spe) %>% dim()
## [1] 217 24
Time for this code chunk to run: 0.06 seconds
We can see that 14 ROIs have been filtered out.
Two PCA plots (colour-coded by biological groups or batch) for three normalization schemes are automatically created in the “PCA” nav panel
. Let’s take Q3 and RUV4 normalization as an example.
speQ3 <- standR::geomxNorm(filtered_spe, method = "upperquartile")
speQ3 <- scater::runPCA(filtered_spe)
speQ3_compute <- SingleCellExperiment::reducedDim(speQ3, "PCA")
Time for this code chunk to run: 1.73 seconds
speRuv_NCGs <- standR::findNCGs(filtered_spe,
batch_name = "SlideName",
top_n = 200)
speRuvBatchCorrection <- standR::geomxBatchCorrection(speRuv_NCGs,
factors = "disease_region",
NCGs = S4Vectors::metadata(speRuv_NCGs)$NCGs, k = 2
)
speRuv <- scater::runPCA(speRuvBatchCorrection)
speRuv_compute <- SingleCellExperiment::reducedDim(speRuv, "PCA")
Time for this code chunk to run: 11.66 seconds
Then, .PCAFunction()
is called to generate the plot. We can see that biological replicates group better with RUV4 (top) compared to Q3 (bottom).
.PCAFunction(speRuv, speRuv_compute, "disease_region", selectedTypes, c(21,22,23,24), c("blue","dodgerblue1","black","grey75"))
.PCAFunction(speQ3, speQ3_compute, "disease_region", selectedTypes, c(21,22,23,24), c("blue","dodgerblue1","black","grey75"))
Time for this code chunk to run: 1.16 seconds
Let’s proceed with RUV4 normalization. A design matrix is created next.
design <- model.matrix(~0+disease_region+ruv_W1+ruv_W2, data = colData(speRuv))
# Clean up column name
colnames(design) <- gsub("disease_region", "", colnames(design))
Time for this code chunk to run: 0.06 seconds
If confounding variables are chosen in the main side bar
, those would be added to model.matrix
as ~0 + disease_region + confounder1 + confounder2)
.
edgeR
is used to convert SpatialExperiment
into DGEList
, filter and estimate dispersion.
library(edgeR)
dge <- SE2DGEList(speRuv)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- estimateDisp(dge, design = design, robust = TRUE)
Time for this code chunk to run: 48.03 seconds
Recall our “selectedTypes” from above: “DKD_glomerulus”, “DKD_tubule”, “normal_tubule”, “normal_glomerulus”.
The following code creates all pairwise comparisons between them.
# In case there are spaces
selectedTypes_underscore <- gsub(" ", "_", selectedTypes)
comparisons <- list()
comparisons <- lapply(
seq_len(choose(length(selectedTypes_underscore), 2)),
function(i) {
noquote(
paste0(
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][1],
"-",
utils::combn(selectedTypes_underscore, 2,
simplify = FALSE
)[[i]][2]
)
)
}
)
con <- makeContrasts(
# Must use as.character()
contrasts = as.character(unlist(comparisons)),
levels = colnames(design)
)
colnames(con) <- sub("-", "_vs_", colnames(con))
con
## Contrasts
## Levels DKD_glomerulus_vs_DKD_tubule
## DKD_glomerulus 1
## DKD_tubule -1
## normal_glomerulus 0
## normal_tubule 0
## ruv_W1 0
## ruv_W2 0
## Contrasts
## Levels DKD_glomerulus_vs_normal_tubule
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus 0
## normal_tubule -1
## ruv_W1 0
## ruv_W2 0
## Contrasts
## Levels DKD_glomerulus_vs_normal_glomerulus
## DKD_glomerulus 1
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 0
## ruv_W1 0
## ruv_W2 0
## Contrasts
## Levels DKD_tubule_vs_normal_tubule DKD_tubule_vs_normal_glomerulus
## DKD_glomerulus 0 0
## DKD_tubule 1 1
## normal_glomerulus 0 -1
## normal_tubule -1 0
## ruv_W1 0 0
## ruv_W2 0 0
## Contrasts
## Levels normal_tubule_vs_normal_glomerulus
## DKD_glomerulus 0
## DKD_tubule 0
## normal_glomerulus -1
## normal_tubule 1
## ruv_W1 0
## ruv_W2 0
Time for this code chunk to run: 0.02 seconds
limma
The app uses duplicateCorrelation()
“[s]ince we need to make comparisons both within and between subjects, it is necessary to treat Patient as a random effect” limma user’s guide (Ritchie et al. 2015). limma-voom
method is used as standR
package recommends (Liu et al. 2024).
library(limma)
block_by <- colData(speRuv)[["SlideName"]]
v <- voom(dge, design)
corfit <- duplicateCorrelation(v, design,
block = block_by
)
v2 <- voom(dge, design,
block = block_by,
correlation = corfit$consensus
)
corfit2 <- duplicateCorrelation(v, design,
block = block_by
)
fit <- lmFit(v, design,
block = block_by,
correlation = corfit2$consensus
)
fit_contrast <- contrasts.fit(fit,
contrasts = con
)
efit <- eBayes(fit_contrast, robust = TRUE)
Time for this code chunk to run: 167.59 seconds
For each contrast (a column in con
), the app creates a table of top differentially expressed genes sorted by their adjusted P value.
# Keep track of how many comparisons there are
numeric_vector <- seq_len(ncol(con))
new_list <- as.list(numeric_vector)
# This adds n+1th index to new_list where n = ncol(con)
# This index contains seq_len(ncol(con))
# ex. new_list[[7]] = 1 2 3 4 5 6
# This coefficient allows ANOVA-like comparison in toptable()
if (length(selectedTypes) > 2) {
new_list[[length(new_list) + 1]] <- numeric_vector
}
topTabDF <- lapply(new_list, function(i) {
limma::topTable(efit,
coef = i, number = Inf, p.value = 0.05,
adjust.method = "BH", lfc = 1
) %>%
tibble::rownames_to_column(var = "Gene")
})
# Adds names to topTabDF
if (length(selectedTypes) > 2) {
names(topTabDF) <- c(
colnames(con),
colnames(con) %>% stringr::str_split(., "_vs_") %>%
unlist() %>% unique() %>% paste(., collapse = "_vs_")
)
} else {
names(topTabDF) <- colnames(con)
}
Time for this code chunk to run: 0.08 seconds
topTabDF
is now a list of tables where the list index corresponds to that of columns in con
.
colnames(con)
## [1] "DKD_glomerulus_vs_DKD_tubule" "DKD_glomerulus_vs_normal_tubule"
## [3] "DKD_glomerulus_vs_normal_glomerulus" "DKD_tubule_vs_normal_tubule"
## [5] "DKD_tubule_vs_normal_glomerulus" "normal_tubule_vs_normal_glomerulus"
head(topTabDF[[1]])
## Gene Type mean_zscore mean_expr logFC AveExpr t P.Value
## 1 SRGAP2B gene 5.392825 8.094697 2.591325 8.064988 30.74096 1.974029e-80
## 2 PODXL gene 11.146967 8.563414 4.502203 8.518927 30.81248 4.080221e-80
## 3 FGF1 gene 8.865378 7.692389 3.539804 7.632976 30.30632 7.097517e-79
## 4 CLIC5 gene 9.624410 7.925377 3.780739 7.869413 29.14067 5.750702e-76
## 5 PLAT gene 8.262970 8.297025 3.965194 8.256626 29.08766 7.827953e-76
## 6 SPOCK1 gene 5.899207 7.119840 2.745986 7.061100 28.75805 6.656020e-76
## adj.P.Val B
## 1 3.628068e-76 172.6863
## 2 3.749520e-76 171.7997
## 3 4.348175e-75 168.8320
## 4 2.397832e-72 162.3045
## 5 2.397832e-72 162.1593
## 6 2.397832e-72 162.1138
These tables are then shown to the user in the “Tables” nav panel
.
No further serious data processing is performed at this point. The numbers in topTabDF
is now parsed to create volcano plots and heatmaps in their respective nav panel
s.
volcanoDF <- lapply(seq_len(ncol(con)), function(i) {
limma::topTable(efit, coef = i, number = Inf) %>%
tibble::rownames_to_column(var = "Target.name") %>%
dplyr::select("Target.name", "logFC", "adj.P.Val") %>%
dplyr::mutate(de = ifelse(logFC >= 1 &
adj.P.Val < 0.05, "UP",
ifelse(logFC <= -(1) &
adj.P.Val < 0.05, "DN",
"NO"
)
)) %>%
dplyr::mutate(
logFC_threshold = stats::quantile(abs(logFC), 0.99,
na.rm = TRUE
),
pval_threshold = stats::quantile(adj.P.Val, 0.01,
na.rm = TRUE
),
deLab = ifelse(abs(logFC) > logFC_threshold &
adj.P.Val < pval_threshold &
abs(logFC) >= 0.05 &
adj.P.Val < 0.05, Target.name, NA)
)
})
plots <- lapply(seq_along(volcanoDF), function(i) {
.volcanoFunction(
volcanoDF[[i]], 12, 5,
colnames(con)[i],
1, 0.05,
"Blue", "Grey", "Red"
) +
ggplot2::xlim(
-2,
2
) +
ggplot2::ylim(
0, 20
)
})
plots[1]
## [[1]]
Time for this code chunk to run: 2.63 seconds
In contrast to Volcano plots, only the top N genes are shown in a heatmap. Setup is a bit more involved.
lcpmSubScaleTopGenes <- lapply(names(topTabDF), function(name) {
columns <- stringr::str_split_1(name, "_vs_") %>%
lapply(function(.) {
which(SummarizedExperiment::colData(speRuv) %>%
tibble::as_tibble() %>%
dplyr::pull(disease_region) == .)
}) %>%
unlist()
table <- SummarizedExperiment::assay(speRuv, 2)[
topTabDF[[name]] %>%
dplyr::slice_head(n = 50) %>%
dplyr::select(Gene) %>%
unlist() %>%
unname(),
columns
] %>%
data.frame() %>%
t() %>%
scale() %>%
t()
return(table)
})
names(lcpmSubScaleTopGenes) <- names(topTabDF)
columnSplit <- lapply(names(topTabDF), function(name) {
columnSplit <- stringr::str_split_1(name, "_vs_") %>%
lapply(function(.){
which(
SummarizedExperiment::colData(speRuv) %>%
tibble::as_tibble() %>% dplyr::select(disease_region) == .
)
} ) %>%
as.vector() %>%
summary() %>%
.[, "Length"]
})
names(columnSplit) <- names(lcpmSubScaleTopGenes)
Time for this code chunk to run: 1.01 seconds
colFunc <- circlize::colorRamp2(
c(
-3, 0,
3
),
hcl_palette = "Inferno"
)
heatmap <- lapply(names(lcpmSubScaleTopGenes), function(name) {
ComplexHeatmap::Heatmap(lcpmSubScaleTopGenes[[name]],
cluster_columns = FALSE, col = colFunc,
heatmap_legend_param = list(
border = "black",
title = "Z score",
title_gp = grid::gpar(
fontsize = 12,
fontface = "plain",
fontfamily = "sans"
),
labels_gp = grid::gpar(
fontsize = 12,
fontface = "plain",
fontfamily = "sans"
),
legend_height = grid::unit(
3 * as.numeric(30),
units = "mm"
)
),
top_annotation = ComplexHeatmap::HeatmapAnnotation(
foo = ComplexHeatmap::anno_block(
gp = grid::gpar(lty = 0, fill = "transparent"),
labels = columnSplit[[name]] %>% names(),
labels_gp = grid::gpar(
col = "black", fontsize = 14,
fontfamily = "sans",
fontface = "bold"
),
labels_rot = 0, labels_just = "center",
labels_offset = grid::unit(4.5, "mm")
)
),
border_gp = grid::gpar(col = "black", lwd = 0.2),
row_names_gp = grid::gpar(
fontfamily = "sans",
fontface = "italic",
fontsize = 10
),
show_column_names = FALSE,
column_title = NULL,
column_split = rep(
LETTERS[seq_len(columnSplit[[name]] %>% length())],
columnSplit[[name]] %>% unname() %>% as.numeric()
)
)
})
names(heatmap) <- names(lcpmSubScaleTopGenes)
heatmap[[1]]
Time for this code chunk to run: 2.68 seconds
Liu, Ning, Dharmesh D Bhuva, Ahmed Mohamed, Micah Bokelund, Arutha Kulasinghe, Chin Wee Tan, and Melissa J Davis. 2024. “standR: Spatial Transcriptomic Analysis for GeoMx DSP Data.” Nucleic Acids Research 52 (1): e2–e2. https://doi.org/10.1093/nar/gkad1026.
Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47–e47. https://doi.org/10.1093/nar/gkv007.