This vignette endeavors to put Bioconductor and scvi-tools together to help understand how different data structures and methods relevant to CITE-seq analysis contribute to interpretation of CITE-seq exeperiments.
The scvi-tools tutorial (for version 0.20.0) analyzes a pair of 10x PBMC CITE-seq experiments (5k and 10k cells). Chapter 12 of the OSCA book analyzes only the 10k dataset.
The following subsections are essentially “code-only”. We exhibit steps necessary to assemble a SingleCellExperiment instance with the subset of the totalVI quantifications produced for the cells from the “10k” dataset.
## class: SingleCellExperiment
## dim: 33538 7472
## metadata(2): Samples se.averaged
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): ID Symbol Type
## colnames(7472): AAACCCAAGATTGTGA-1 AAACCCACATCGGTTA-1 ...
## TTTGTTGTCGAGTGAG-1 TTTGTTGTCGTTCAGA-1
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: Gene Expression
## altExpNames(1): Antibody Capture
## AnnData object with n_obs × n_vars = 10849 × 4000
## obs: 'n_genes', 'percent_mito', 'n_counts', 'batch', '_scvi_labels', '_scvi_batch', 'leiden_totalVI'
## var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
## uns: '_scvi_manager_uuid', '_scvi_uuid', 'hvg', 'leiden', 'log1p', 'neighbors', 'umap'
## obsm: 'X_totalVI', 'X_umap', 'denoised_protein', 'protein_expression', 'protein_foreground_prob'
## layers: 'counts', 'denoised_rna'
## obsp: 'connectivities', 'distances'
In this section we reduce the cell collections to cells common to the chapter 12 and totalVI datasets.
# select and order
totvi_latent = totvi_latent[comm,]
totvi_umap = totvi_umap[comm,]
totvi_denoised_rna = totvi_denoised_rna[comm,]
totvi_denoised_protein = totvi_denoised_protein[comm,]
totvi_leiden = totvi_leiden[comm]
# organize the totalVI into SCE with altExp
totsce = SingleCellExperiment(SimpleList(logcounts=t(totvi_denoised_rna))) # FALSE name
rowData(totsce) = S4Vectors::DataFrame(fullvi$var)
rownames(totsce) = rownames(fullvi$var)
rowData(totsce)$Symbol = rownames(totsce)
nn = SingleCellExperiment(SimpleList(logcounts=t(totvi_denoised_protein))) # FALSE name
reducedDims(nn) = list(UMAP=totvi_umap)
altExp(totsce) = nn
altExpNames(totsce) = "denoised_protein"
totsce$leiden = totvi_leiden
altExp(totsce)$leiden = totvi_leiden
altExp(totsce)$ch12.clusters = altExp(ch12sce[,comm])$label
# add average ADT abundance to metadata, for adt_profiles
tot.se.averaged <- sumCountsAcrossCells(altExp(totsce), altExp(totsce)$leiden,
exprs_values="logcounts", average=TRUE)
rownames(tot.se.averaged) = gsub("_TotalSeqB", "", rownames(tot.se.averaged))
metadata(totsce)$se.averaged = tot.se.averaged
The TSNE projection of the normalized ADT quantifications and the walktrap cluster assignments are produced for the cells common to the two approaches.
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text_repel()`).