Chapter 24 Interoperability

24.1 Motivation

The Bioconductor single-cell ecosystem is but one of many popular frameworks for scRNA-seq data analysis. Seurat is very widely used for the analysis of droplet-based datasets, while scanpy provides an option for users who prefer working in Python. In many scenarios, these frameworks provide useful functionality that we might want to use from a Bioconductor-centric analysis (or vice versa). For example, Python has well-established machine learning libraries while R has a large catalogue of statistical tools, and it is much better to use this functionality directly than to attempt to transplant it into a new framework. However, effective re-use requires some consideration of interoperability during the development of the relevant software tools.

In an ideal world, everyone would agree on a common data structure that could be seamlessly and faithfully exchanged between frameworks. In the real world, though, each framework uses a different structure for various pragmatic or historical reasons. (The obligatory xkcd comic on competing standards sums up the situation.) Most single-cell Bioconductor packages use the SingleCellExperiment class, as previously discussed; Seurat defines its own Seurat object class; and scanpy has its AnnData class. This inevitably introduces some friction whenever we are forced to convert from one structure to another in order to use another framework's methods.

In the absence of coordinated data structures, the next best solution is for each framework to provide methods that can operate on the most basic data objects. Depending on the method, this might be the count matrix, the normalized expression matrix, a matrix of PCs or a graph object. If such methods are available, we can simply extract the relevant component from our SingleCellExperiment and call an external method directly, without having to assemble that framework's own structure. Indeed, it is for this purpose that almost all scran functions and many scater functions can accept matrix objects or equivalents (e.g., sparse matrices) in addition to SingleCellExperiment objects.
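The design described above, where a function accepts either a bare matrix or a full container, can be sketched in a few lines. The sketch is written in Python with NumPy and SciPy purely for illustration; `MiniExperiment` and `gene_variances` are hypothetical names standing in for a SingleCellExperiment-like container and a scran-like function, and are not part of any real package. The container branch only extracts an assay, so the same core computation runs on dense arrays, sparse matrices and full objects alike.

```python
import numpy as np
import scipy.sparse as sp


class MiniExperiment:
    """Hypothetical, minimal stand-in for a SingleCellExperiment-like container.

    Holds named assays, each a genes x cells matrix.
    """

    def __init__(self, assays):
        self.assays = dict(assays)


def gene_variances(x, assay="logcounts"):
    """Sample variance of each gene (row) across cells (columns).

    Hypothetical helper: accepts a dense array, a SciPy sparse matrix or a
    MiniExperiment. For a container, the named assay is pulled out first, so
    the actual computation only ever operates on a plain matrix.
    """
    if isinstance(x, MiniExperiment):
        x = x.assays[assay]  # extract the basic data object
    if sp.issparse(x):
        x = sp.csr_matrix(x, dtype=float)
        n = x.shape[1]
        mean = np.asarray(x.mean(axis=1)).ravel()
        mean_sq = np.asarray(x.multiply(x).mean(axis=1)).ravel()
        # Convert the population variance to the sample (n - 1) variance
        # without densifying the matrix.
        return (mean_sq - mean**2) * n / (n - 1)
    return np.var(np.asarray(x, dtype=float), axis=1, ddof=1)


# Usage: simulated log-expression values for 5 genes in 20 cells.
rng = np.random.default_rng(0)
logcounts = np.log1p(rng.poisson(2.0, size=(5, 20)))
sce = MiniExperiment({"logcounts": logcounts})

var_dense = gene_variances(logcounts)                  # plain matrix
var_sparse = gene_variances(sp.csr_matrix(logcounts))  # sparse equivalent
var_object = gene_variances(sce)                       # full container
print(np.allclose(var_dense, var_sparse), np.allclose(var_dense, var_object))
```

Because the core routine depends only on the matrix, code from another framework could call `gene_variances()` on its own matrix without ever constructing a `MiniExperiment`, which is the kind of re-use argued for above.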

In this chapter, we will provide some examples of using functionality from frameworks outside of the SingleCellExperiment ecosystem in a single-cell analysis. We will focus on Seurat and scanpy, as these are two of the most popular analysis frameworks in the field. However, the principles of interoperability are generally applicable and are worth keeping in mind when developing or evaluating any type of analysis software.

24.2 Interchanging with Seurat

Need to add this at some point.

Figure 24.1: Need to add this at some point.

24.3 Interchanging with scanpy

Need to add this at some point.

Figure 24.2: Need to add this at some point.

Session Info

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/
LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocStyle_2.18.1 rebook_1.0.0    

loaded via a namespace (and not attached):
 [1] bookdown_0.21       codetools_0.2-18    XML_3.99-0.5       
 [4] ps_1.5.0            digest_0.6.27       stats4_4.0.3       
 [7] magrittr_2.0.1      evaluate_0.14       highr_0.8          
[10] graph_1.68.0        rlang_0.4.9         stringi_1.5.3      
[13] callr_3.5.1         rmarkdown_2.5       tools_4.0.3        
[16] stringr_1.4.0       processx_3.4.5      parallel_4.0.3     
[19] xfun_0.19           yaml_2.2.1          compiler_4.0.3     
[22] BiocGenerics_0.36.0 BiocManager_1.30.10 htmltools_0.5.0    
[25] CodeDepends_0.6.5   knitr_1.30