Chapter 24 Interoperability
24.1 Motivation
The Bioconductor single-cell ecosystem is but one of many popular frameworks for scRNA-seq data analysis. Seurat is very widely used for analysis of droplet-based datasets while scanpy provides an option for users who prefer working in Python. In many scenarios, these frameworks provide useful functionality that we might want to use from a Bioconductor-centric analysis (or vice versa). For example, Python has well-established machine learning libraries while R has a large catalogue of statistical tools, and it would be much better to use this functionality directly than to attempt to transplant it into a new framework. However, effective re-use requires some consideration towards interoperability during development of the relevant software tools.
In an ideal world, everyone would agree on a common data structure that could be seamlessly and faithfully exchanged between frameworks.
In the real world, though, each framework uses a different structure for various pragmatic or historical reasons.
(This obligatory xkcd sums up the situation.)
Most single cell-related Bioconductor packages use the SingleCellExperiment class, as previously discussed; Seurat defines its own SeuratObject class; and scanpy has its AnnData class.
This inevitably introduces some friction if we are forced to convert from one structure to another in order to use another framework’s methods.
In the absence of coordination of data structures, the next best solution is for each framework to provide methods that can operate on its most basic data object.
Depending on the method, this might be the count matrix, the normalized expression matrix, a matrix of PCs or a graph object.
If such methods are available, we can simply extract the relevant component from our SingleCellExperiment and call an external method directly without having to assemble that framework’s structure.
Indeed, it is for this purpose that almost all scran functions and many scater functions are capable of accepting matrix objects or equivalents (e.g., sparse matrices) in addition to SingleCellExperiments.
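As a rough illustration of this "extract the basic object" pattern, the sketch below pulls a matrix out of a SingleCellExperiment and hands it to functions that know nothing about the class. Everything specific here is an assumption made for the example: the counts are simulated, the normalization is a hand-rolled library size scaling, and base R's prcomp() and kmeans() merely stand in for whatever external method one might want to call.

```r
library(SingleCellExperiment)

# Simulated counts purely for illustration: 200 genes x 50 cells.
set.seed(100)
counts <- matrix(rpois(200 * 50, lambda = 5), nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# Hand-rolled library size normalization to keep the example self-contained.
lib.sizes <- colSums(counts(sce))
size.factors <- lib.sizes / mean(lib.sizes)
logcounts(sce) <- log2(t(t(counts(sce)) / size.factors) + 1)

# Extract the basic object (an ordinary genes x cells matrix)...
mat <- logcounts(sce)

# ...and pass it to functions that are unaware of SingleCellExperiment.
# prcomp() and kmeans() expect observations in rows, hence the transpose.
pcs <- prcomp(t(mat), rank. = 10)$x
clusters <- kmeans(pcs, centers = 3)$cluster

# Store the results back in the original object for downstream use.
reducedDim(sce, "PCA") <- pcs
sce$cluster <- factor(unname(clusters))
```

The only framework-specific steps are the accessor calls at the start and end; the computation in between operates on a plain matrix, so no conversion to another framework's container is needed.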
In this chapter, we will provide some examples of using functionality from frameworks outside of the SingleCellExperiment ecosystem in a single-cell analysis.
We will focus on Seurat and scanpy as these are two of the most popular analysis frameworks in the field.
However, the principles of interoperability are generally applicable and are worth keeping in mind when developing or evaluating any type of analysis software.
24.2 Interchanging with Seurat
24.3 Interchanging with scanpy
Session Info
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.12-books/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.12-books/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocStyle_2.18.1 rebook_1.0.0
loaded via a namespace (and not attached):
[1] graph_1.68.0 knitr_1.31 magrittr_2.0.1
[4] BiocGenerics_0.36.0 R6_2.5.0 rlang_0.4.10
[7] highr_0.8 stringr_1.4.0 tools_4.0.4
[10] parallel_4.0.4 xfun_0.22 jquerylib_0.1.3
[13] htmltools_0.5.1.1 CodeDepends_0.6.5 yaml_2.2.1
[16] digest_0.6.27 bookdown_0.21 processx_3.4.5
[19] callr_3.5.1 BiocManager_1.30.10 ps_1.6.0
[22] codetools_0.2-18 sass_0.3.1 evaluate_0.14
[25] rmarkdown_2.7 stringi_1.5.3 compiler_4.0.4
[28] bslib_0.2.4 XML_3.99-0.6 stats4_4.0.4
[31] jsonlite_1.7.2