Contents

1 Overview

This vignette endeavors to put Bioconductor and scvi-tools together to help understand how different data structures and methods relevant to CITE-seq analysis contribute to interpretation of CITE-seq exeperiments.

The scvi-tools tutorial (for version 0.20.0) analyzes a pair of 10x PBMC CITE-seq experiments (5k and 10k cells). Chapter 12 of the OSCA book analyzes only the 10k dataset.

2 Technical steps to facilitate comparison

The following subsections are essentially “code-only”. We exhibit steps necessary to assemble a SingleCellExperiment instance with the subset of the totalVI quantifications produced for the cells from the “10k” dataset.

2.1 Acquire software

library(SingleCellExperiment)
library(scater)
library(scviR)

2.2 Obtain key data representations

ch12sce = getCh12Sce(clear_cache=FALSE)
ch12sce
## class: SingleCellExperiment 
## dim: 33538 7472 
## metadata(2): Samples se.averaged
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): ID Symbol Type
## colnames(7472): AAACCCAAGATTGTGA-1 AAACCCACATCGGTTA-1 ...
##   TTTGTTGTCGAGTGAG-1 TTTGTTGTCGTTCAGA-1
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: Gene Expression
## altExpNames(1): Antibody Capture
options(timeout=3600)
fullvi = getTotalVI5k10kAdata()
fullvi
## AnnData object with n_obs × n_vars = 10849 × 4000
##     obs: 'n_genes', 'percent_mito', 'n_counts', 'batch', '_scvi_labels', '_scvi_batch', 'leiden_totalVI'
##     var: 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
##     uns: '_scvi_manager_uuid', '_scvi_uuid', 'hvg', 'leiden', 'log1p', 'neighbors', 'umap'
##     obsm: 'X_totalVI', 'X_umap', 'denoised_protein', 'protein_expression', 'protein_foreground_prob'
##     layers: 'counts', 'denoised_rna'
##     obsp: 'connectivities', 'distances'

2.3 Assemble a SingleCellExperiment with totalVI outputs

2.3.1 Acquire cell identities and batch labels

totvi_cellids = rownames(fullvi$obs)
totvi_batch = fullvi$obs$batch

2.3.2 Acquire quantifications and latent space positions

totvi_latent = fullvi$obsm$get("X_totalVI")
totvi_umap = fullvi$obsm$get("X_umap")
totvi_denoised_rna = fullvi$layers$get("denoised_rna")
totvi_denoised_protein = fullvi$obsm$get("denoised_protein")
totvi_leiden = fullvi$obs$leiden_totalVI

2.3.3 Drop 5k data from all

is5k = which(totvi_batch == "PBMC5k")
totvi_cellids = totvi_cellids[-is5k]
totvi_latent = totvi_latent[-is5k,]
totvi_umap = totvi_umap[-is5k,]
totvi_denoised_rna = totvi_denoised_rna[-is5k,]
totvi_denoised_protein = totvi_denoised_protein[-is5k,]
totvi_leiden = totvi_leiden[-is5k]

2.3.4 Label the rows of components

rownames(totvi_latent) = totvi_cellids
rownames(totvi_umap) = totvi_cellids
rownames(totvi_denoised_rna) = totvi_cellids
rownames(totvi_denoised_protein) = totvi_cellids
names(totvi_leiden) = totvi_cellids

2.3.5 Find common cell ids

In this section we reduce the cell collections to cells common to the chapter 12 and totalVI datasets.

comm = intersect(totvi_cellids, ch12sce$Barcode)

2.3.6 Build the totalVI SingleCellExperiment

# select and order
totvi_latent = totvi_latent[comm,]
totvi_umap = totvi_umap[comm,]
totvi_denoised_rna = totvi_denoised_rna[comm,]
totvi_denoised_protein = totvi_denoised_protein[comm,]
totvi_leiden = totvi_leiden[comm]
 
# organize the totalVI into SCE with altExp

totsce = SingleCellExperiment(SimpleList(logcounts=t(totvi_denoised_rna))) # FALSE name
rowData(totsce) = S4Vectors::DataFrame(fullvi$var)
rownames(totsce) = rownames(fullvi$var)
rowData(totsce)$Symbol = rownames(totsce)
nn = SingleCellExperiment(SimpleList(logcounts=t(totvi_denoised_protein))) # FALSE name
reducedDims(nn) = list(UMAP=totvi_umap)
altExp(totsce) = nn
altExpNames(totsce) = "denoised_protein"
totsce$leiden = totvi_leiden
altExp(totsce)$leiden = totvi_leiden
altExp(totsce)$ch12.clusters = altExp(ch12sce[,comm])$label

# add average ADT abundance to metadata, for adt_profiles

tot.se.averaged <- sumCountsAcrossCells(altExp(totsce), altExp(totsce)$leiden,
    exprs_values="logcounts", average=TRUE)
rownames(tot.se.averaged) = gsub("_TotalSeqB", "", rownames(tot.se.averaged))
metadata(totsce)$se.averaged = tot.se.averaged

2.3.7 Reduce the chapter 12 dataset to the cells held in common

colnames(ch12sce) = ch12sce$Barcode
ch12sce_matched = ch12sce[, comm]

3 Key outputs of the chapter 12 analysis

3.1 Clustering and projection based on the ADT quantifications

The TSNE projection of the normalized ADT quantifications and the walktrap cluster assignments are produced for the cells common to the two approaches.

plotTSNE(altExp(ch12sce_matched), color_by="label", text_by="label")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text_repel()`).