Chapter 31 Muraro human pancreas (CEL-seq)

31.1 Introduction

This chapter analyzes the Muraro et al. (2016) CEL-seq dataset, which consists of human pancreas cells from several donors.

31.3 Quality control

This dataset lacks mitochondrial genes, so we cannot use the mitochondrial proportion as a QC metric and rely on the remaining metrics instead. One batch appears to contain a high proportion of low-quality cells; for that batch, we compute the filter thresholds from a median and MAD shared across the other batches (Figure 31.1).
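
The idea of borrowing a median and MAD from the other batches can be illustrated with a minimal base R sketch. Everything here is an assumption for illustration: the log-library sizes are simulated, the donor names `A`, `B` and `C` are hypothetical, and the cutoff of 3 MADs is a conventional choice rather than a value taken from this chapter. The chapter's own filtering code is not shown and may differ.

```r
# Simulated log10 library sizes for three hypothetical donors.
# Donor "C" stands in for a batch dominated by low-quality cells.
set.seed(100)
donor <- rep(c("A", "B", "C"), each = 200)
log.lib <- c(rnorm(200, 4.0, 0.2), rnorm(200, 4.2, 0.2), rnorm(200, 3.0, 0.5))

# Lower threshold: a fixed number of MADs below the median.
lower_threshold <- function(x, nmads = 3) median(x) - nmads * mad(x)

# Batches assumed to be of good quality get their own thresholds.
good <- donor != "C"
thresholds <- vapply(split(log.lib[good], donor[good]), lower_threshold, numeric(1))

# The problematic batch gets a threshold from the pooled good batches.
thresholds["C"] <- lower_threshold(log.lib[good])

# For comparison: the threshold "C" would get from its own median and MAD.
own.threshold <- lower_threshold(log.lib[!good])

discard <- unname(log.lib < thresholds[donor])
table(donor, discard)
```

When most cells in a batch are of low quality, the batch's own median and MAD describe the low-quality population, so `own.threshold` ends up far too permissive. Using the shared statistics avoids this.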

Figure 31.1: Distribution of each QC metric across cells from each donor in the Muraro pancreas dataset. Each point represents a cell and is colored according to whether that cell was discarded.

We then examine the number of cells removed for each reason:

##              low_lib_size            low_n_features high_altexps_ERCC_percent 
##                       663                       700                       738 
##                   discard 
##                       773
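
Note that the individual reasons sum to 663 + 700 + 738 = 2101, which is far more than the 773 discarded cells, so many cells must fail more than one criterion. The toy example below shows how such a summary behaves. It is a sketch only: the five cells and the column name `high_spike_percent` are made up, and the chapter's own code is not shown.

```r
# One row per cell, one logical column per QC criterion (toy values).
qc <- data.frame(
    low_lib_size       = c(TRUE,  TRUE,  FALSE, FALSE, FALSE),
    low_n_features     = c(TRUE,  FALSE, FALSE, TRUE,  FALSE),
    high_spike_percent = c(TRUE,  FALSE, FALSE, FALSE, FALSE)
)

# A cell is discarded if it fails at least one criterion.
reasons <- colSums(qc)
qc$discard <- rowSums(qc) > 0

# Count the cells flagged by each criterion and overall.
colSums(qc)
```

Here the three criteria flag 2, 2 and 1 cells respectively, but only 3 cells are discarded because the first cell fails all three.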

31.4 Normalization

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.088   0.541   0.821   1.000   1.211  13.987

Figure 31.2: Relationship between the library size factors and the deconvolution size factors in the Muraro pancreas dataset.
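
The summary above has a mean of exactly 1.000, which is what one would expect if the size factors are scaled to unit mean. As a point of comparison for Figure 31.2, library size factors can be sketched in base R on simulated counts. The matrix dimensions, the `depth` values and the Poisson model are all assumptions for illustration; the deconvolution factors themselves would come from a dedicated package such as scran (listed in the session info) and are not reproduced here.

```r
# Simulated count matrix: genes in rows, cells in columns.
set.seed(42)
ngenes <- 500
ncells <- 50
depth <- runif(ncells, 0.2, 5)  # hypothetical per-cell capture efficiency
counts <- matrix(
    rpois(ngenes * ncells, lambda = rep(depth, each = ngenes)),
    nrow = ngenes
)

# Library size factors: total count per cell, scaled to a mean of 1.
lib.size <- colSums(counts)
lib.sf <- lib.size / mean(lib.size)
summary(lib.sf)

# Divide each cell's counts by its size factor, then log-transform.
norm.counts <- t(t(counts) / lib.sf)
log.counts <- log2(norm.counts + 1)
```

Scaling to unit mean keeps the normalized values on roughly the same scale as the original counts, which makes the log-transformed values easier to interpret.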

31.6 Data integration

We use the proportion of variance lost as a diagnostic measure:

##           D28      D29      D30     D31
## [1,] 0.060847 0.024121 0.000000 0.00000
## [2,] 0.002646 0.003018 0.062421 0.00000
## [3,] 0.003449 0.002641 0.002598 0.08162
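
One way to read this matrix, assuming it follows the layout reported by batchelor's `fastMNN()` (the generating code is not shown, so this is an assumption), is that each row is a merge step and each column is a donor; a zero means the donor was not yet involved at that step. The sketch below re-enters the printed values and checks the largest loss per donor against a 10% cutoff. That cutoff is an illustrative rule of thumb, not a value stated in this chapter.

```r
# Values copied from the printed output above.
lost.var <- matrix(
    c(0.060847, 0.024121, 0.000000, 0.00000,
      0.002646, 0.003018, 0.062421, 0.00000,
      0.003449, 0.002641, 0.002598, 0.08162),
    nrow = 3, byrow = TRUE,
    dimnames = list(NULL, c("D28", "D29", "D30", "D31"))
)

# Largest proportion of variance lost by each donor at any merge step.
max.lost <- apply(lost.var, 2, max)
max.lost

# Flag donors that exceed the assumed 10% rule of thumb.
cutoff <- 0.1
max.lost > cutoff
```

No donor exceeds the cutoff; the largest loss is about 8% for D31 at the third merge step.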

31.8 Clustering

Figure 31.4: Heatmap of the frequency of cells from each cell type label in each cluster.

##        Donor
## Cluster D28 D29 D30 D31
##      1  104   6  57 112
##      2   59  21  77  97
##      3   12  75  64  43
##      4   28 149 126 120
##      5   87 261 277 214
##      6   21   7  54  26
##      7    1   6   6  37
##      8    6   6   5   2
##      9   11  68   5  30
##      10   4   2   5   8

Figure 31.5: Obligatory \(t\)-SNE plots of the Muraro pancreas dataset. Each point represents a cell that is colored by cluster (left) or batch (right).
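
The cluster-by-donor table above can also be summarized numerically. The sketch below re-enters the printed counts and computes the share of each cluster contributed by each donor. This is only one possible summary and is not part of the chapter's own code; a cluster made up almost entirely of one donor might hint at incomplete batch correction or at a donor-specific population.

```r
# Counts copied from the printed cluster-by-donor table.
tab <- matrix(
    c(104,   6,  57, 112,
       59,  21,  77,  97,
       12,  75,  64,  43,
       28, 149, 126, 120,
       87, 261, 277, 214,
       21,   7,  54,  26,
        1,   6,   6,  37,
        6,   6,   5,   2,
       11,  68,   5,  30,
        4,   2,   5,   8),
    ncol = 4, byrow = TRUE,
    dimnames = list(Cluster = as.character(1:10),
                    Donor = c("D28", "D29", "D30", "D31"))
)

# Proportion of each cluster (row) contributed by each donor.
prop <- prop.table(tab, margin = 1)
round(prop, 2)

# Largest single-donor share per cluster.
max.share <- apply(prop, 1, max)
round(max.share, 2)
```

Every cluster contains cells from all four donors. Cluster 7 is the most skewed, with 37 of its 50 cells (74%) coming from D31.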

Session Info

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pheatmap_1.0.12             batchelor_1.6.0            
 [3] scran_1.18.0                scater_1.18.0              
 [5] ggplot2_3.3.2               ensembldb_2.14.0           
 [7] AnnotationFilter_1.14.0     GenomicFeatures_1.42.0     
 [9] AnnotationDbi_1.52.0        AnnotationHub_2.22.0       
[11] BiocFileCache_1.14.0        dbplyr_1.4.4               
[13] scRNAseq_2.4.0              SingleCellExperiment_1.12.0
[15] SummarizedExperiment_1.20.0 Biobase_2.50.0             
[17] GenomicRanges_1.42.0        GenomeInfoDb_1.26.0        
[19] IRanges_2.24.0              S4Vectors_0.28.0           
[21] BiocGenerics_0.36.0         MatrixGenerics_1.2.0       
[23] matrixStats_0.57.0          BiocStyle_2.18.0           
[25] rebook_1.0.0               

loaded via a namespace (and not attached):
  [1] igraph_1.2.6                  lazyeval_0.2.2               
  [3] BiocParallel_1.24.0           digest_0.6.27                
  [5] htmltools_0.5.0               viridis_0.5.1                
  [7] magrittr_1.5                  memoise_1.1.0                
  [9] limma_3.46.0                  Biostrings_2.58.0            
 [11] askpass_1.1                   prettyunits_1.1.1            
 [13] colorspace_1.4-1              blob_1.2.1                   
 [15] rappdirs_0.3.1                xfun_0.19                    
 [17] dplyr_1.0.2                   callr_3.5.1                  
 [19] crayon_1.3.4                  RCurl_1.98-1.2               
 [21] graph_1.68.0                  glue_1.4.2                   
 [23] gtable_0.3.0                  zlibbioc_1.36.0              
 [25] XVector_0.30.0                DelayedArray_0.16.0          
 [27] BiocSingular_1.6.0            scales_1.1.1                 
 [29] DBI_1.1.0                     edgeR_3.32.0                 
 [31] Rcpp_1.0.5                    viridisLite_0.3.0            
 [33] xtable_1.8-4                  progress_1.2.2               
 [35] dqrng_0.2.1                   bit_4.0.4                    
 [37] rsvd_1.0.3                    ResidualMatrix_1.0.0         
 [39] httr_1.4.2                    RColorBrewer_1.1-2           
 [41] ellipsis_0.3.1                pkgconfig_2.0.3              
 [43] XML_3.99-0.5                  farver_2.0.3                 
 [45] scuttle_1.0.0                 CodeDepends_0.6.5            
 [47] locfit_1.5-9.4                tidyselect_1.1.0             
 [49] labeling_0.4.2                rlang_0.4.8                  
 [51] later_1.1.0.1                 munsell_0.5.0                
 [53] BiocVersion_3.12.0            tools_4.0.3                  
 [55] generics_0.1.0                RSQLite_2.2.1                
 [57] ExperimentHub_1.16.0          evaluate_0.14                
 [59] stringr_1.4.0                 fastmap_1.0.1                
 [61] yaml_2.2.1                    processx_3.4.4               
 [63] knitr_1.30                    bit64_4.0.5                  
 [65] purrr_0.3.4                   sparseMatrixStats_1.2.0      
 [67] mime_0.9                      xml2_1.3.2                   
 [69] biomaRt_2.46.0                compiler_4.0.3               
 [71] beeswarm_0.2.3                curl_4.3                     
 [73] interactiveDisplayBase_1.28.0 tibble_3.0.4                 
 [75] statmod_1.4.35                stringi_1.5.3                
 [77] highr_0.8                     ps_1.4.0                     
 [79] lattice_0.20-41               bluster_1.0.0                
 [81] ProtGenerics_1.22.0           Matrix_1.2-18                
 [83] vctrs_0.3.4                   pillar_1.4.6                 
 [85] lifecycle_0.2.0               BiocManager_1.30.10          
 [87] BiocNeighbors_1.8.0           cowplot_1.1.0                
 [89] bitops_1.0-6                  irlba_2.3.3                  
 [91] httpuv_1.5.4                  rtracklayer_1.50.0           
 [93] R6_2.5.0                      bookdown_0.21                
 [95] promises_1.1.1                gridExtra_2.3                
 [97] vipor_0.4.5                   codetools_0.2-16             
 [99] assertthat_0.2.1              openssl_1.4.3                
[101] withr_2.3.0                   GenomicAlignments_1.26.0     
[103] Rsamtools_2.6.0               GenomeInfoDbData_1.2.4       
[105] hms_0.5.3                     grid_4.0.3                   
[107] beachmat_2.6.0                rmarkdown_2.5                
[109] DelayedMatrixStats_1.12.0     Rtsne_0.15                   
[111] shiny_1.5.0                   ggbeeswarm_0.6.0             

Bibliography

Muraro, M. J., G. Dharmadhikari, D. Grun, N. Groen, T. Dielen, E. Jansen, L. van Gurp, et al. 2016. “A Single-Cell Transcriptome Atlas of the Human Pancreas.” Cell Syst 3 (4): 385–94.