tidySingleCellExperiment 1.14.0
tidySingleCellExperiment provides a bridge between Bioconductor single-cell packages (Amezquita et al. 2019) and the tidyverse (Wickham et al. 2019). It enables viewing the Bioconductor SingleCellExperiment object as a tidyverse tibble, and provides SingleCellExperiment-compatible dplyr, tidyr, ggplot2 and plotly functions (see Table 1). This allows users to get the best of both Bioconductor and tidyverse worlds.
All functions compatible with SingleCellExperiments | After all, a tidySingleCellExperiment is a SingleCellExperiment, just better!
tidyverse |
dplyr | All tibble-compatible functions (e.g., select())
tidyr | All tibble-compatible functions (e.g., pivot_longer())
ggplot2 | Plotting with ggplot()
plotly | Plotting with plot_ly()
Utilities |
as_tibble() | Convert cell-wise information to a tbl_df
join_features() | Add feature-wise information; returns a tbl_df
aggregate_cells() | Aggregate feature abundances as pseudobulks; returns a SummarizedExperiment
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("tidySingleCellExperiment")
Load libraries used in this vignette.
# Bioconductor single-cell packages
library(scran)
library(scater)
library(igraph)
library(celldex)
library(SingleR)
library(SingleCellSignalR)
# Tidyverse-compatible packages
library(purrr)
library(GGally)
library(tidyHeatmap)
# Both
library(tidySingleCellExperiment)
# Other
library(Matrix)
library(dittoSeq)
tidySingleCellExperiment
This is a SingleCellExperiment object, but it is evaluated as a tibble, so it is compatible with both SingleCellExperiment and the tidyverse.
data(pbmc_small, package="tidySingleCellExperiment")
pbmc_small_tidy <- pbmc_small
It looks like a tibble…
pbmc_small_tidy
## # A SingleCellExperiment-tibble abstraction: 80 × 17
## # Features=230 | Cells=80 | Assays=counts, logcounts
## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
## 1 ATGC… SeuratPro… 70 47 0 A g2
## 2 CATG… SeuratPro… 85 52 0 A g1
## 3 GAAC… SeuratPro… 87 50 1 B g2
## 4 TGAC… SeuratPro… 127 56 0 A g2
## 5 AGTC… SeuratPro… 173 53 0 A g2
## 6 TCTG… SeuratPro… 70 48 0 A g1
## 7 TGGT… SeuratPro… 64 36 0 A g1
## 8 GCAG… SeuratPro… 72 45 0 A g1
## 9 GATA… SeuratPro… 52 36 0 A g1
## 10 AATG… SeuratPro… 100 41 0 A g1
## # ℹ 70 more rows
## # ℹ 10 more variables: RNA_snn_res.1 <fct>, file <chr>, ident <fct>,
## # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
## # tSNE_2 <dbl>
…but it is a SingleCellExperiment after all!
counts(pbmc_small_tidy)[1:5, 1:4]
## 5 x 4 sparse Matrix of class "dgCMatrix"
## ATGCCAGAACGACT CATGGCCTGTGCAT GAACCTGATGAACC TGACTGGATTCTCA
## MS4A1 . . . .
## CD79B 1 . . .
## CD79A . . . .
## HLA-DRA . 1 . .
## TCL1A . . . .
The SingleCellExperiment object’s tibble visualisation can be turned off, or back on, at any time.
# Turn off the tibble visualisation
options("restore_SingleCellExperiment_show" = TRUE)
pbmc_small_tidy
## class: SingleCellExperiment
## dim: 230 80
## metadata(0):
## assays(2): counts logcounts
## rownames(230): MS4A1 CD79B ... SPON2 S100B
## rowData names(5): vst.mean vst.variance vst.variance.expected
## vst.variance.standardized vst.variable
## colnames(80): ATGCCAGAACGACT CATGGCCTGTGCAT ... GGAACACTTCAGAC
## CTTGATTGATCTTC
## colData names(9): orig.ident nCount_RNA ... file ident
# Turn on the tibble visualisation
options("restore_SingleCellExperiment_show" = FALSE)
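When the native display is only needed for a single call, the option can be set temporarily and restored afterwards. The sketch below uses only base R. `with_native_show()` is a hypothetical helper name, not a function exported by tidySingleCellExperiment, and it assumes the option behaves as described above.

```r
# Hypothetical helper (not part of tidySingleCellExperiment):
# evaluate `expr` with the native SingleCellExperiment display enabled,
# then restore whatever the option was set to before the call.
with_native_show <- function(expr) {
    old <- options(restore_SingleCellExperiment_show = TRUE)
    on.exit(options(old), add = TRUE)
    # `expr` is a lazily evaluated argument, so force() runs it here,
    # while the option is still switched on
    force(expr)
}
```

With the package loaded, a call such as `with_native_show(print(pbmc_small_tidy))` would be one way to use it. The explicit `print()` matters: auto-printing at the console happens only after the helper has returned and the option has been restored.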
We may have a column that contains the directory each run was taken from, such as the “file” column in pbmc_small_tidy.
pbmc_small_tidy$file[1:5]
## [1] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [2] "../data/sample1/outs/filtered_feature_bc_matrix/"
## [3] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [4] "../data/sample2/outs/filtered_feature_bc_matrix/"
## [5] "../data/sample2/outs/filtered_feature_bc_matrix/"
We may want to extract the run/sample name out of it into a separate column.
The tidyverse function extract() can be used to convert a character column into multiple columns using regular expression groups.
# Create sample column
pbmc_small_polished <-
    pbmc_small_tidy %>%
    extract(file, "sample", "../data/([a-z0-9]+)/outs.+", remove=FALSE)
# Reorder to have sample column up front
pbmc_small_polished %>%
    select(sample, everything())
## # A SingleCellExperiment-tibble abstraction: 80 × 18
## # Features=230 | Cells=80 | Assays=counts, logcounts
## .cell sample orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents
## <chr> <chr> <fct> <dbl> <int> <fct> <fct>
## 1 ATGC… sampl… SeuratPro… 70 47 0 A
## 2 CATG… sampl… SeuratPro… 85 52 0 A
## 3 GAAC… sampl… SeuratPro… 87 50 1 B
## 4 TGAC… sampl… SeuratPro… 127 56 0 A
## 5 AGTC… sampl… SeuratPro… 173 53 0 A
## 6 TCTG… sampl… SeuratPro… 70 48 0 A
## 7 TGGT… sampl… SeuratPro… 64 36 0 A
## 8 GCAG… sampl… SeuratPro… 72 45 0 A
## 9 GATA… sampl… SeuratPro… 52 36 0 A
## 10 AATG… sampl… SeuratPro… 100 41 0 A
## # ℹ 70 more rows
## # ℹ 11 more variables: groups <chr>, RNA_snn_res.1 <fct>, file <chr>,
## # ident <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>,
## # tSNE_1 <dbl>, tSNE_2 <dbl>
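The capture group used in the extract() call above can be checked in isolation before applying it to the whole object. This is a minimal base R sketch using `sub()` on two of the file paths printed earlier. It is only an illustration of the regular expression, not how extract() is implemented.

```r
# File paths copied from the pbmc_small_tidy$file output shown earlier
files <- c(
    "../data/sample2/outs/filtered_feature_bc_matrix/",
    "../data/sample1/outs/filtered_feature_bc_matrix/"
)

# Same regular expression as in the extract() call; the parentheses mark
# the capture group that becomes the new "sample" column
pattern <- "../data/([a-z0-9]+)/outs.+"

# sub() replaces the whole match with the first capture group ("\\1")
samples <- sub(pattern, "\\1", files)
samples
# [1] "sample2" "sample1"
```

One difference to keep in mind: `sub()` returns the input unchanged when the pattern does not match, whereas tidyr's extract() is documented to return NA in that case.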
Set colours and theme for plots.
# Use colourblind-friendly colours
friendly_cols <- dittoSeq::dittoColors()
# Set theme
custom_theme <- list(
    scale_fill_manual(values=friendly_cols),
    scale_color_manual(values=friendly_cols),
    theme_bw() + theme(
        aspect.ratio=1,
        legend.position="bottom",
        axis.line=element_line(),
        text=element_text(size=12),
        panel.border=element_blank(),
        strip.background=element_blank(),
        panel.grid.major=element_line(linewidth=0.2),
        panel.grid.minor=element_line(linewidth=0.1),
        axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
        axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))))
We can treat pbmc_small_polished as a tibble for plotting.
Here we plot the number of features per cell.
pbmc_small_polished %>%
    ggplot(aes(nFeature_RNA, fill=groups)) +
    geom_histogram() +
    custom_theme
Here we plot the total counts per cell (nCount_RNA) for each group.
pbmc_small_polished %>%
    ggplot(aes(groups, nCount_RNA, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(width=0.1) +
    custom_theme
Here we plot abundance of two features for each group.
pbmc_small_polished %>%
    join_features(features=c("HLA-DRA", "LYZ")) %>%
    ggplot(aes(groups, .abundance_counts + 1, fill=groups)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(aes(size=nCount_RNA), alpha=0.5, width=0.2) +
    scale_y_log10() +
    custom_theme
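The `+ 1` added to `.abundance_counts` in the plot above acts as a pseudocount. Without it, cells with zero counts would map to `-Inf` on a log10 axis and could not be drawn. A small base R illustration, using made-up count values rather than data from pbmc_small:

```r
# Illustrative counts only (assumed values, not taken from pbmc_small)
counts <- c(0, 1, 9, 99)

# Zero has no finite logarithm
raw_log <- log10(counts)

# Adding a pseudocount of 1 keeps every value finite and maps 0 to 0
shifted_log <- log10(counts + 1)

is.finite(raw_log)
is.finite(shifted_log)
```

The shift slightly compresses differences between small counts, so the y-axis is best read as log10(count + 1) rather than as raw abundance.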