Chapter 28 Filtered human PBMCs (10X Genomics)

28.1 Introduction

This workflow analyzes the public PBMC ID dataset generated by 10X Genomics (Zheng et al. 2017), starting from the filtered count matrix.

28.3 Quality control

Cell calling implicitly serves as a QC step that removes libraries with low total counts and few detected genes. Thus, we will only filter on the mitochondrial proportion.

Figure 28.1: Percentage of mitochondrial reads in each cell in each of the 10X PBMC datasets, compared to the total count. Each point represents a cell and is colored according to whether that cell was discarded.

## $pbmc3k
##    Mode   FALSE    TRUE 
## logical    2609      91 
## 
## $pbmc4k
##    Mode   FALSE    TRUE 
## logical    4182     158 
## 
## $pbmc8k
##    Mode   FALSE    TRUE 
## logical    8157     224
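
The text does not state how the mitochondrial threshold was chosen. One common choice, assumed here, is to discard cells whose mitochondrial percentage lies more than three median absolute deviations (MADs) above the median. The sketch below illustrates that rule in base R on made-up values; `flag_high_outliers` and `mito_percent` are hypothetical names, and this is not the code that produced the counts above.

```r
# Made-up mitochondrial percentages for ten hypothetical cells.
mito_percent <- c(2.1, 3.4, 2.8, 1.9, 3.0, 2.5, 15.2, 2.7, 3.1, 22.8)

# Flag values more than `nmads` MADs above the median (assumed rule).
# mad() applies the usual 1.4826 scaling, so the MAD is comparable to a
# standard deviation under normality.
flag_high_outliers <- function(x, nmads = 3) {
    threshold <- median(x) + nmads * mad(x)
    x > threshold
}

discard <- flag_high_outliers(mito_percent)
summary(discard)                      # same layout as the per-dataset summaries above
filtered <- mito_percent[!discard]    # values for the cells that are kept
```

Only the upper tail is tested because an unusually low mitochondrial percentage is not a sign of a damaged cell.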

28.4 Normalization

We perform library size normalization, simply for convenience when dealing with file-backed matrices.

## $pbmc3k
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.234   0.748   0.926   1.000   1.157   6.604 
## 
## $pbmc4k
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.315   0.711   0.890   1.000   1.127  11.027 
## 
## $pbmc8k
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.296   0.704   0.877   1.000   1.118   6.794
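
As a minimal sketch of what library size normalization involves, the base-R example below computes size factors from the column sums of a made-up count matrix and scales them to a mean of 1, consistent with the unit means in the summaries above. The log-transformation with a pseudo-count of 1 is an assumption about the downstream step, and all object names are hypothetical.

```r
# Made-up count matrix: 4 genes (rows) by 3 cells (columns).
counts <- matrix(c(10, 0, 5, 5,
                   20, 2, 8, 10,
                   1, 0, 2, 1),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Library size of each cell: 20, 40 and 4 for this toy matrix.
lib_sizes <- colSums(counts)

# Size factors are the library sizes scaled so that their mean is 1.
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell's counts by its size factor, then log-transform
# (pseudo-count of 1 assumed).
normalized <- sweep(counts, 2, size_factors, "/")
logcounts <- log2(normalized + 1)

summary(size_factors)
```

Only a single pass over the matrix to obtain column sums is required, which is the kind of access pattern that remains cheap when the counts live on disk.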

28.6 Dimensionality reduction

We will first analyze each PBMC dataset separately rather than merging them together. For the PCA, we use randomized SVD, which is more efficient for file-backed matrices.
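
To make the idea behind randomized SVD concrete, here is a base-R sketch of the general technique (random range finding followed by a few power iterations). It is not the implementation used for the results in this chapter: the function name `randomized_svd`, its arguments, and the default values are all hypothetical, and the centering that PCA would apply is omitted for brevity.

```r
# Randomized SVD sketch: approximate the top `k` singular triplets of `x`.
randomized_svd <- function(x, k, oversample = 10, n_power_iter = 2) {
    n_probe <- min(k + oversample, ncol(x))

    # Multiply by random directions to capture the dominant column space of x.
    omega <- matrix(rnorm(ncol(x) * n_probe), ncol = n_probe)
    q <- qr.Q(qr(x %*% omega))

    # Power iterations sharpen the separation between leading and trailing
    # singular values; each iteration costs two more products with x.
    for (i in seq_len(n_power_iter)) {
        z <- qr.Q(qr(crossprod(x, q)))
        q <- qr.Q(qr(x %*% z))
    }

    # Exact SVD of the small projected matrix, mapped back to the original space.
    small <- svd(crossprod(q, x))
    keep <- seq_len(k)
    list(d = small$d[keep],
         u = (q %*% small$u)[, keep, drop = FALSE],
         v = small$v[, keep, drop = FALSE])
}

# Made-up data: 200 "cells" by 50 "genes" with three strong factors plus weak noise.
set.seed(42)
x <- matrix(rnorm(200 * 3), ncol = 3) %*% matrix(rnorm(3 * 50), nrow = 3) +
    matrix(rnorm(200 * 50, sd = 0.01), ncol = 50)

rsvd_out <- randomized_svd(x, k = 3)
exact <- svd(x)$d[1:3]
rel_error <- max(abs(rsvd_out$d - exact) / exact)
```

The data matrix is touched only through a fixed, small number of matrix products, which is why this family of methods suits matrices that are expensive to read, such as file-backed ones.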

28.7 Clustering

## $pbmc3k
## 
##   1   2   3   4   5   6   7   8   9  10 
## 487 154 603 514  31 150 179 333 147  11 
## 
## $pbmc4k
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13 
##  497  185  569  786  373  232   44 1023   77  218   88   54   36 
## 
## $pbmc8k
## 
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1004  759 1073 1543  367  150  201 2067   59  154  244   67   76  285   20   15 
##   17   18 
##   64    9

Figure 28.3: Obligatory \(t\)-SNE plots of each PBMC dataset, where each point represents a cell in the corresponding dataset and is colored according to the assigned cluster.

28.8 Data integration

With the per-dataset analyses out of the way, we now repeat the analysis after merging the three batches.

We use the proportion of variance lost from each batch (column) at each merge step (row) as a diagnostic measure.

##         pbmc3k    pbmc4k   pbmc8k
## [1,] 7.003e-03 3.126e-03 0.000000
## [2,] 7.137e-05 5.125e-05 0.003003
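
The text does not define how this diagnostic is computed. Assuming it measures the share of a batch's variance that lies along the direction removed during correction, the base-R sketch below shows the arithmetic on made-up data. The function `proportion_lost`, the toy batch, and the chosen directions are all hypothetical.

```r
# Share of the total variance of `x` (cells in rows) that lies along direction `v`.
# Removing that direction from the data would lose exactly this share.
proportion_lost <- function(x, v) {
    v <- v / sqrt(sum(v^2))                              # normalize to unit length
    centered <- scale(x, center = TRUE, scale = FALSE)   # variance is about the mean
    sum((centered %*% v)^2) / sum(centered^2)
}

# Made-up batch: 100 cells in 2 dimensions, with most variance on the first axis.
set.seed(1)
toy_batch <- cbind(rnorm(100, sd = 3), rnorm(100, sd = 1))

lost_minor <- proportion_lost(toy_batch, c(0, 1))   # removing the low-variance axis
lost_major <- proportion_lost(toy_batch, c(1, 0))   # removing the high-variance axis
```

Small values, as in the matrix above, indicate that the removed direction carries little of the variation within each batch; large values would suggest that correction discards genuine biological heterogeneity.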

We proceed to clustering:

##     
##      pbmc3k pbmc4k pbmc8k
##   1     113    387    825
##   2     507    395    806
##   3     175    344    581
##   4     295    539   1018
##   5     346    638   1210
##   6      11      3      9
##   7      17     27    111
##   8      33    113    185
##   9     423    754   1546
##   10      4     36     67
##   11    197    124    221
##   12    150    180    293
##   13    327    588   1125
##   14     11     54    160

And visualization:

Figure 28.4: Obligatory \(t\)-SNE plots for the merged PBMC datasets, where each point represents a cell and is colored by cluster (top) or batch (bottom).

Session Info

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] batchelor_1.6.0             BiocSingular_1.6.0         
 [3] scran_1.18.0                scater_1.18.0              
 [5] ggplot2_3.3.2               TENxPBMCData_1.8.0         
 [7] HDF5Array_1.18.0            rhdf5_2.34.0               
 [9] DelayedArray_0.16.0         Matrix_1.2-18              
[11] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[13] Biobase_2.50.0              GenomicRanges_1.42.0       
[15] GenomeInfoDb_1.26.0         IRanges_2.24.0             
[17] S4Vectors_0.28.0            BiocGenerics_0.36.0        
[19] MatrixGenerics_1.2.0        matrixStats_0.57.0         
[21] BiocStyle_2.18.0            rebook_1.0.0               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                    ggbeeswarm_0.6.0             
  [3] colorspace_1.4-1              ellipsis_0.3.1               
  [5] scuttle_1.0.0                 bluster_1.0.0                
  [7] XVector_0.30.0                BiocNeighbors_1.8.0          
  [9] farver_2.0.3                  bit64_4.0.5                  
 [11] RSpectra_0.16-0               interactiveDisplayBase_1.28.0
 [13] AnnotationDbi_1.52.0          codetools_0.2-16             
 [15] sparseMatrixStats_1.2.0       knitr_1.30                   
 [17] ResidualMatrix_1.0.0          dbplyr_1.4.4                 
 [19] uwot_0.1.8                    graph_1.68.0                 
 [21] shiny_1.5.0                   BiocManager_1.30.10          
 [23] compiler_4.0.3                httr_1.4.2                   
 [25] dqrng_0.2.1                   assertthat_0.2.1             
 [27] fastmap_1.0.1                 limma_3.46.0                 
 [29] later_1.1.0.1                 htmltools_0.5.0              
 [31] tools_4.0.3                   igraph_1.2.6                 
 [33] rsvd_1.0.3                    gtable_0.3.0                 
 [35] glue_1.4.2                    GenomeInfoDbData_1.2.4       
 [37] dplyr_1.0.2                   rappdirs_0.3.1               
 [39] Rcpp_1.0.5                    vctrs_0.3.4                  
 [41] rhdf5filters_1.2.0            ExperimentHub_1.16.0         
 [43] DelayedMatrixStats_1.12.0     xfun_0.19                    
 [45] stringr_1.4.0                 ps_1.4.0                     
 [47] beachmat_2.6.0                mime_0.9                     
 [49] lifecycle_0.2.0               irlba_2.3.3                  
 [51] statmod_1.4.35                XML_3.99-0.5                 
 [53] edgeR_3.32.0                  AnnotationHub_2.22.0         
 [55] zlibbioc_1.36.0               scales_1.1.1                 
 [57] promises_1.1.1                yaml_2.2.1                   
 [59] curl_4.3                      memoise_1.1.0                
 [61] gridExtra_2.3                 stringi_1.5.3                
 [63] RSQLite_2.2.1                 BiocVersion_3.12.0           
 [65] highr_0.8                     BiocParallel_1.24.0          
 [67] rlang_0.4.8                   pkgconfig_2.0.3              
 [69] bitops_1.0-6                  evaluate_0.14                
 [71] lattice_0.20-41               purrr_0.3.4                  
 [73] Rhdf5lib_1.12.0               CodeDepends_0.6.5            
 [75] labeling_0.4.2                cowplot_1.1.0                
 [77] bit_4.0.4                     processx_3.4.4               
 [79] tidyselect_1.1.0              RcppAnnoy_0.0.16             
 [81] magrittr_1.5                  bookdown_0.21                
 [83] R6_2.5.0                      generics_0.1.0               
 [85] DBI_1.1.0                     pillar_1.4.6                 
 [87] withr_2.3.0                   RCurl_1.98-1.2               
 [89] tibble_3.0.4                  crayon_1.3.4                 
 [91] BiocFileCache_1.14.0          rmarkdown_2.5                
 [93] viridis_0.5.1                 locfit_1.5-9.4               
 [95] grid_4.0.3                    FNN_1.1.3                    
 [97] blob_1.2.1                    callr_3.5.1                  
 [99] digest_0.6.27                 xtable_1.8-4                 
[101] httpuv_1.5.4                  munsell_0.5.0                
[103] beeswarm_0.2.3                viridisLite_0.3.0            
[105] vipor_0.4.5                  

Bibliography

Zheng, G. X., J. M. Terry, P. Belgrader, P. Ryvkin, Z. W. Bent, R. Wilson, S. B. Ziraldo, et al. 2017. “Massively parallel digital transcriptional profiling of single cells.” Nat Commun 8 (January): 14049.