Introduction

tidySingleCellExperiment provides a bridge between Bioconductor single-cell packages (Amezquita et al. 2019) and the tidyverse (Wickham et al. 2019). It enables viewing the Bioconductor SingleCellExperiment object as a tidyverse tibble, and provides SingleCellExperiment-compatible dplyr, tidyr, ggplot2 and plotly functions (see Table 1). This allows users to get the best of both Bioconductor and tidyverse worlds.


Table 1: Available tidySingleCellExperiment functions and utilities.

Function                 Description
SingleCellExperiment     All functions compatible with SingleCellExperiment
                         objects; after all, a tidySingleCellExperiment is a
                         SingleCellExperiment, just better!
tidyverse
  dplyr                  All tibble-compatible functions (e.g., select())
  tidyr                  All tibble-compatible functions (e.g., pivot_longer())
  ggplot2                Plotting with ggplot()
  plotly                 Plotting with plot_ly()
Utilities
  as_tibble()            Convert cell-wise information to a tbl_df
  join_features()        Add feature-wise information; returns a tbl_df
  aggregate_cells()      Aggregate feature abundances as pseudobulks;
                         returns a SummarizedExperiment
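A minimal sketch of the aggregate_cells() utility from Table 1. It assumes the package and its bundled pbmc_small dataset are installed as described under Installation. Grouping by the groups and ident columns, and the object name pbmc_small_pseudo, are illustrative choices. Cells that share a combination of those columns are summed into one pseudobulk sample.

```r
library(tidySingleCellExperiment)
library(SummarizedExperiment)

data(pbmc_small, package="tidySingleCellExperiment")

# Sum raw counts over all cells sharing the same groups/ident combination;
# each combination becomes one pseudobulk sample (column)
pbmc_small_pseudo <- aggregate_cells(pbmc_small, c(groups, ident), assays="counts")

# The result is a SummarizedExperiment (features x pseudobulk samples)
pbmc_small_pseudo
assayNames(pbmc_small_pseudo)
```

The pseudobulk object no longer has one column per cell. It can therefore be passed to bulk RNA-seq tools that expect a SummarizedExperiment.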

Installation

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("tidySingleCellExperiment")

Load libraries used in this vignette.

# Bioconductor single-cell packages
library(scran)
library(scater)
library(igraph)
library(celldex)
library(SingleR)
library(SingleCellSignalR)

# Tidyverse-compatible packages
library(purrr)
library(GGally)
library(tidyHeatmap)

# Both
library(tidySingleCellExperiment)

# Other
library(Matrix)
library(dittoSeq)

1 Data representation of tidySingleCellExperiment

The object created below, pbmc_small_tidy, is a SingleCellExperiment object that is displayed and handled as a tibble. It is therefore compatible with both SingleCellExperiment methods and tidyverse functions.

data(pbmc_small, package="tidySingleCellExperiment")
pbmc_small_tidy <- pbmc_small

It looks like a tibble…

pbmc_small_tidy
## # A SingleCellExperiment-tibble abstraction: 80 × 17
## # Features=230 | Cells=80 | Assays=counts, logcounts
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # ℹ 70 more rows
## # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
## #   PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
## #   tSNE_2 <dbl>

…but it is a SingleCellExperiment after all!

counts(pbmc_small_tidy)[1:5, 1:4]
## 5 x 4 sparse Matrix of class "dgCMatrix"
##         ATGCCAGAACGACT CATGGCCTGTGCAT GAACCTGATGAACC TGACTGGATTCTCA
## MS4A1                .              .              .              .
## CD79B                1              .              .              .
## CD79A                .              .              .              .
## HLA-DRA              .              1              .              .
## TCL1A                .              .              .              .
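The two views can also be checked programmatically. This is a minimal sketch that assumes only the packages shown (tibble supplies the as_tibble() generic). is() confirms that the object still inherits from SingleCellExperiment. as_tibble() (Table 1) turns the cell-wise information into an ordinary tbl_df with one row per cell.

```r
library(tidySingleCellExperiment)
library(tibble)

data(pbmc_small, package="tidySingleCellExperiment")

# Still a SingleCellExperiment, so Bioconductor methods dispatch on it
is(pbmc_small, "SingleCellExperiment")

# Cell-wise information as a plain tibble: one row per cell (SCE column)
cell_info <- as_tibble(pbmc_small)
class(cell_info)
nrow(cell_info) == ncol(pbmc_small)
```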

The tibble visualisation of the SingleCellExperiment object can be turned off, and back on, at any time through the restore_SingleCellExperiment_show option.

# Turn off the tibble visualisation
options("restore_SingleCellExperiment_show" = TRUE)
pbmc_small_tidy
## class: SingleCellExperiment 
## dim: 230 80 
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
##   vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
##   CTTGATTGATCTTC
## colData names(9): orig.ident nCount_RNA ... file ident
# Turn on the tibble visualisation
options("restore_SingleCellExperiment_show" = FALSE)

2 Annotation polishing

We may have a column that contains the directory each run was taken from, such as the “file” column in pbmc_small_tidy.

pbmc_small_tidy$file[1:5]
## [1] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [2] "../data/sample1/outs/filtered_feature_bc_matrix/"
## [3] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [4] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [5] "../data/sample2/outs/filtered_feature_bc_matrix/"

We may want to extract the run/sample name into a separate column. The tidyr function extract() converts a character column into one or more columns using regular-expression capture groups. Here the single group captures the sample directory name.

# Create sample column
pbmc_small_polished <-
    pbmc_small_tidy %>%
    extract(file, "sample", "../data/([a-z0-9]+)/outs.+", remove=FALSE)

# Reorder to have sample column up front
pbmc_small_polished %>%
    select(sample, everything())
## # A SingleCellExperiment-tibble abstraction: 80 × 18
## # Features=230 | Cells=80 | Assays=counts, logcounts
##    .cell sample orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents
##    <chr> <chr>  <fct>           <dbl>        <int> <fct>           <fct>        
##  1 ATGC… sampl… SeuratPro…         70           47 0               A            
##  2 CATG… sampl… SeuratPro…         85           52 0               A            
##  3 GAAC… sampl… SeuratPro…         87           50 1               B            
##  4 TGAC… sampl… SeuratPro…        127           56 0               A            
##  5 AGTC… sampl… SeuratPro…        173           53 0               A            
##  6 TCTG… sampl… SeuratPro…         70           48 0               A            
##  7 TGGT… sampl… SeuratPro…         64           36 0               A            
##  8 GCAG… sampl… SeuratPro…         72           45 0               A            
##  9 GATA… sampl… SeuratPro…         52           36 0               A            
## 10 AATG… sampl… SeuratPro…        100           41 0               A            
## # ℹ 70 more rows
## # ℹ 11 more variables: groups <chr>, RNA_snn_res.1 <fct>, file <chr>,
## #   ident <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>,
## #   tSNE_1 <dbl>, tSNE_2 <dbl>
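You can check what the regular expression captures before running extract() by trying it on the printed paths. This minimal sketch uses base R's regexec() and regmatches() rather than tidyr, with the same pattern.

```r
# First two file paths, as printed above for pbmc_small_tidy$file
paths <- c(
    "../data/sample2/outs/filtered_feature_bc_matrix/",
    "../data/sample1/outs/filtered_feature_bc_matrix/")

# Same pattern as passed to extract(); its single capture group is the
# sample name. "." matches any character, so "\\.\\./data/" would be a
# stricter way to match the literal "../data/" prefix.
pattern <- "../data/([a-z0-9]+)/outs.+"

# regexec() locates the full match and each capture group; regmatches()
# returns them as strings, with the first capture group at position 2
matches <- regmatches(paths, regexec(pattern, paths))
sample <- vapply(matches, function(m) m[2], character(1))
sample
## [1] "sample2" "sample1"
```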

3 Preliminary plots

Set colours and theme for plots.

# Use colourblind-friendly colours
friendly_cols <- dittoSeq::dittoColors()

# Set theme
custom_theme <- list(
    scale_fill_manual(values=friendly_cols),
    scale_color_manual(values=friendly_cols),
    theme_bw() + theme(
        aspect.ratio=1,
        legend.position="bottom",
        axis.line=element_line(),
        text=element_text(size=12),
        panel.border=element_blank(),
        strip.background=element_blank(),
        panel.grid.major=element_line(linewidth=0.2),
        panel.grid.minor=element_line(linewidth=0.1),
        axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
        axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))))

We can treat pbmc_small_polished as a tibble for plotting.

Here we plot the number of features detected per cell (nFeature_RNA).

pbmc_small_polished %>%
    ggplot(aes(nFeature_RNA, fill=groups)) +
    geom_histogram() +
    custom_theme

Here we plot the total counts per cell (nCount_RNA).

pbmc_small_polished %>%
    ggplot(aes(groups, nCount_RNA, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(width=0.1) +
    custom_theme
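nFeature_RNA and nCount_RNA are easy to confuse. This sketch assumes the usual Seurat definitions of these columns. nCount_RNA is the sum of all counts in a cell. nFeature_RNA is the number of features with a non-zero count in that cell. The matrix below is hypothetical toy data, not taken from pbmc_small.

```r
library(Matrix)

# Hypothetical 3-feature x 2-cell count matrix (toy values)
m <- sparseMatrix(
    i=c(1, 2, 2), j=c(1, 1, 2), x=c(5, 2, 9),
    dims=c(3, 2),
    dimnames=list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))

# Total counts per cell, the quantity stored in nCount_RNA
n_count <- colSums(m)        # cell1 = 7, cell2 = 9

# Features with at least one count per cell, the quantity in nFeature_RNA
n_feature <- colSums(m > 0)  # cell1 = 2, cell2 = 1
```

A cell can have many counts concentrated in few features (cell2 here), so the two columns carry different quality-control information.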

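Here we plot the abundance of two features for each group. join_features() appends the abundance of each requested feature (here .abundance_counts, the raw counts) to the cell-wise information. One is added to the counts so that zeros can be shown on the log10 scale.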

pbmc_small_polished %>%
    join_features(features=c("HLA-DRA", "LYZ")) %>%
    ggplot(aes(groups, .abundance_counts + 1, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(aes(size=nCount_RNA), alpha=0.5, width=0.2) +
    # One panel per feature, so the two features are not pooled
    facet_wrap(~ .feature) +
    scale_y_log10() +
    custom_theme
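join_features() returns a long table in which each cell is repeated once per requested feature. This minimal sketch inspects that shape. It assumes the default long format and the .feature column name used there. The object names long and long_counts are illustrative.

```r
library(tidySingleCellExperiment)
library(dplyr)

data(pbmc_small, package="tidySingleCellExperiment")

# Long format: cells are repeated once per requested feature
long <- join_features(pbmc_small, features=c("HLA-DRA", "LYZ"))

# Keep only the identifiers and the raw-count abundance
long_counts <- select(long, .cell, .feature, .abundance_counts)

# Rows per feature
count(long_counts, .feature)
```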