1 Loading Processed Single-Cell Data

For the demonstration of escape, we will use the example “pbmc_small” data from Seurat and also generate a SingleCellExperiment object from it.

pbmc_small <- get("pbmc_small")
pbmc_small <- DietSeurat(suppressMessages(UpdateSeuratObject(pbmc_small)))
sce <- as.SingleCellExperiment(pbmc_small, assay = "RNA")

2 Getting Gene Sets

3 Getting Gene Sets

3.1 Option 1: Molecular Signture Database

The first step in the process of performing gene set enrichment analysis is identifying the gene sets we would like to use. The function getGeneSets() allows users to isolate a whole or multiple libraries from a list of GSEABase GeneSetCollection objects. We can do this for gene set collections from the built-in Molecular Signature Database by setting the parameter library equal to library/libraries of interest. For multiple libraries, just set library = c(“Library1”, “Library2”, etc).

In Addition:
- Individual pathways/gene sets can be isolated from the libraries selected, by setting gene.sets = the name(s) of the gene sets of interest.
- Subcategories of the invidual libaries can be selected using the subcategory parameter.

If the sequencing of the single-cell data is performed on a species other than “Homo sapiens”, make sure to use the species parameter in getGeneSets() in order to get the correct gene nomenclature.

GS.hallmark <- getGeneSets(library = "H")

3.2 Option 2: Built-In gene sets

data("escape.gene.sets", package="escape")
gene.sets <- escape.gene.sets

3.3 Option 3: Define personal gene sets

gene.sets <- list(Tcell_signature = c("CD2","CD3E","CD3D"),
            Myeloid_signature = c("SPI1","FCER1G","CSF1R"))

4 Enrichment

The next step is performing the enrichment on the RNA count data. The function enrichIt() can handle either a matrix of raw count data or will pull that data directly from a SingleCellExperiment or Seurat object. The gene.sets parameter in the function is the GeneSets, either generated from getGeneSets() or from the user. The enrichment scores will be calculated across all individual cells and groups is the n size to break the enrichment by while the cores is the number of cores to perform in parallel during the enrichment calculation too.

enrichIt() can utilize two distinct methods for quantification using the method parameter - either the “ssGSEA” method described by Barbie et al 2009 or “UCell” described by Andreatta and Carmona 2021.

To prevent issues with enrichment calculations for gene sets with a low number of genes represented in the count data of a single-cell object - min.size was instituted to remove gene sets with less than the indicated genes.

Important Note: This is computationally intensive and is highly dependent on the number of cells and the number of gene sets included.

ES.seurat <- enrichIt(obj = pbmc_small, 
                      gene.sets = GS.hallmark, 
                      groups = 1000, cores = 2, 
                      min.size = 5)
## [1] "Using sets of 1000 cells. Running 1 times."
## Setting parallel calculations through a SnowParam back-end
## with workers=2 and tasks=100.
## Estimating ssGSEA scores for 17 gene sets.
## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."
ES.sce <- enrichIt(obj = sce, 
                   gene.sets = GS.hallmark, 
                   method = "UCell",
                   groups = 1000, cores = 2, 
                   min.size = 5)
## [1] "Using sets of 1000 cells. Running 1 times."

We can then easily add these results back to our Seurat or SCE object.

## if working with a Seurat object
pbmc_small <- Seurat::AddMetaData(pbmc_small, ES.seurat)

## if working with an SCE object
met.data <- merge(colData(sce), ES.sce, by = "row.names", all=TRUE)
row.names(met.data) <- met.data$Row.names
met.data$Row.names <- NULL
colData(sce) <- met.data

5 Visualizations

The easiest way to generate almost any visualization for single cell data is via dittoSeq, which is an extremely flexible visualization package for both bulk and single-cell RNA-seq data that works very well for both expression data and metadata. Better yet, it can handle both SingleCellExperiment and Seurat objects.

To keep things consistent, we’ll define a pleasing color scheme.

colors <- colorRampPalette(c("#0D0887FF","#7E03A8FF","#CC4678FF","#F89441FF","#F0F921FF"))

5.1 Heatmaps

A simple way to approach visualizations for enrichment results is the heatmap, especially if you are using a number of gene sets or libraries.

dittoHeatmap(pbmc_small, genes = NULL, metas = names(ES.seurat), 
             annot.by = "groups", 
             fontsize = 7, 
             cluster_cols = TRUE,
             heatmap.colors = colors(50))

A user can also produce a heatmap with select gene sets by providing specific names to the metas parameter. For example, we can isolated gene sets involved immune response.

dittoHeatmap(sce, genes = NULL, 
             metas = c("HALLMARK_IL2_STAT5_SIGNALING", 
             annot.by = "groups", 
             fontsize = 7,
             heatmap.colors = colors(50))

5.2 Violin Plots

Another way to visualize a subset of gene set enrichment would be to graph the distribution of enrichment using violin, jitter, boxplot, or ridgeplots. We can also compare between categorical variables using the group.by parameter.

multi_dittoPlot(sce, vars = c("HALLMARK_IL2_STAT5_SIGNALING", 
                group.by = "groups", plots = c("jitter", "vlnplot", "boxplot"), 
                ylab = "Enrichment Scores", 
                theme = theme_classic() + theme(plot.title = element_text(size = 10)))