Chapter 13 Messmer human ESC (Smart-seq2)

13.1 Introduction

This workflow analyzes the human embryonic stem cell (hESC) dataset generated with Smart-seq2 (Messmer et al. 2019), which contains several plates of naive and primed hESCs. The chapter’s code is based on the steps in the paper’s GitHub repository, with some additional steps for cell cycle effect removal contributed by Philippe Boileau.

13.7 Batch correction

We remove the obvious batch effect with linear regression, which is possible because of the replicated experimental design. We set keep=1:2 to retain the effect of the first two coefficients in design, which correspond to our phenotype of interest.
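To make the role of keep= more concrete, the following is a minimal base-R sketch of regression-based correction on a single simulated gene. It is not the chapter's own code: the phenotype and batch labels, the layout of the toy design matrix, and the simulated effect sizes are all assumptions chosen for illustration. The idea is to fit the full model and then subtract only the fitted contribution of the coefficients that are not kept.

```r
set.seed(100)
n <- 60

# Hypothetical labels: two phenotypes, each replicated across three batches.
phenotype <- factor(rep(c("naive", "primed"), each = n / 2))
batch <- factor(rep(c("b1", "b2", "b3"), times = n / 3))

# Toy design without an intercept: columns 1-2 are the phenotype means,
# the remaining columns are batch offsets relative to the first batch.
design <- model.matrix(~ 0 + phenotype + batch)

# Simulated log-expression for one gene: phenotype effect + batch effect + noise.
y <- unname(c(naive = 5, primed = 8)[as.character(phenotype)]) +
    unname(c(b1 = 0, b2 = 2, b3 = -1)[as.character(batch)]) +
    rnorm(n, sd = 0.1)

# Fit the full model, then remove only the batch terms (everything not in 'keep').
fit <- lm.fit(design, y)
keep <- 1:2
batch.effect <- drop(design[, -keep, drop = FALSE] %*% fit$coefficients[-keep])
corrected <- y - batch.effect

# Batch means before and after correction, and the retained phenotype means.
before <- tapply(y, batch, mean)
after <- tapply(corrected, batch, mean)
by.phenotype <- tapply(corrected, phenotype, mean)
```

In this toy setup the batch means coincide after correction while the difference between phenotypes is preserved, which is the behavior that retaining the phenotype coefficients is meant to achieve.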

13.8 Dimensionality reduction

We could have set d= and subset.row= in correctExperiments() to automatically perform a PCA on the residual matrix with the subset of HVGs, but we’ll just explicitly call runPCA() here to keep things simple.
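As a rough illustration of what "a PCA on the subset of HVGs" amounts to, here is a base-R sketch on a simulated matrix. The matrix, the gene names, and the choice of 20 HVGs and 10 PCs are all assumptions made for this example; it uses prcomp() rather than runPCA() so that it runs without any Bioconductor packages.

```r
set.seed(42)

# Toy corrected log-expression matrix: 200 genes (rows) x 50 cells (columns).
corrected <- matrix(rnorm(200 * 50), nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))

# Inflate the variance of the first 20 genes so that they act as the "HVGs".
corrected[1:20, ] <- corrected[1:20, ] * 5

# Rank genes by variance and take the top 20 as the feature subset.
gene.var <- apply(corrected, 1, var)
hvgs <- names(sort(gene.var, decreasing = TRUE))[1:20]

# PCA on the HVG subset; prcomp() expects cells in rows, hence the transpose.
pcs <- prcomp(t(corrected[hvgs, ]), center = TRUE, scale. = FALSE, rank. = 10)
dim(pcs$x)
```

The resulting pcs$x holds one row per cell and one column per retained PC, which is the same orientation as a reduced dimension matrix stored alongside the cells.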

From a standard PCA, the cell cycle appears to be a major source of biological variation within each phenotype.

Figure 13.5: Obligatory \(t\)-SNE plots of the Messmer hESC dataset, where each point is a cell and is colored by various attributes.

We perform contrastive PCA (cPCA) and sparse cPCA (scPCA) on the corrected log-expression data, retaining the same number of PCs as in the PCA above. Because the naive hESCs are actually reprogrammed primed hESCs, we use the single batch of primed-only hESCs as the “background” dataset to remove the cell cycle effect.

We see greater intermingling between phases within both the naive and primed cells after cPCA and scPCA.

Figure 13.6: More \(t\)-SNE plots of the Messmer hESC dataset after cPCA and scPCA, where each point is a cell and is colored by its assigned cell cycle phase.
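The idea behind cPCA can be sketched in a few lines of base R: instead of taking the leading eigenvectors of the target covariance matrix, we take those of the target covariance minus a multiple of the background covariance, so that variation shared with the background is down-weighted. The example below is a toy under stated assumptions, not the scPCA package's implementation. The helper name cpca_sketch(), the simulated data, and the hand-picked contrastive parameter alpha = 2 are all hypothetical.

```r
set.seed(1)
n <- 1000
p <- 10

# Target data: strong nuisance variation in feature 1 (standing in for the
# cell cycle) and a weaker signal of interest in feature 2 (two groups).
group <- rep(c(-1, 1), each = n / 2)
target <- matrix(rnorm(n * p), n, p)
target[, 1] <- target[, 1] + rnorm(n, sd = 4)
target[, 2] <- target[, 2] + 2 * group

# Background data: contains the same nuisance variation but no group signal.
background <- matrix(rnorm(n * p), n, p)
background[, 1] <- background[, 1] + rnorm(n, sd = 4)

# Hypothetical helper: project the centered target onto the top 'k'
# eigenvectors of the contrastive covariance matrix.
cpca_sketch <- function(target, background, alpha, k) {
    contrast <- cov(target) - alpha * cov(background)
    eig <- eigen(contrast, symmetric = TRUE)
    V <- eig$vectors[, seq_len(k), drop = FALSE]
    scale(target, center = TRUE, scale = FALSE) %*% V
}

# First component from a standard PCA versus the contrastive version.
std.pc1 <- prcomp(target, rank. = 1)$x[, 1]
con.pc1 <- cpca_sketch(target, background, alpha = 2, k = 1)[, 1]

# Absolute correlation of each component with the group labels.
std.cor <- abs(cor(std.pc1, group))
con.cor <- abs(cor(con.pc1, group))
```

In this simulation the first standard PC is dominated by the nuisance feature, whereas the first contrastive component tracks the group labels. In practice the contrastive parameter needs to be tuned rather than fixed by hand, and the sparse variant additionally penalizes the loadings.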

We can quantify the change in the separation between phases within each phenotype using the silhouette coefficient. Smaller values indicate weaker separation between phases, i.e., greater intermingling.

##   naive  primed 
## 0.02032 0.03025
##    naive   primed 
## 0.007696 0.011941
##    naive   primed 
## 0.006614 0.014601
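For readers unfamiliar with the silhouette coefficient, the sketch below computes exact silhouette widths in base R and compares two simulated embeddings. It is illustrative only and is not the code that produced the values above. The function name silhouette_width(), the phase labels, and both toy embeddings are assumptions. For each cell, the width is (b - a) / max(a, b), where a is the mean distance to other cells with the same label and b is the smallest mean distance to the cells of any other label.

```r
# Hypothetical helper: exact silhouette width for each row of 'x'.
silhouette_width <- function(x, labels) {
    d <- as.matrix(dist(x))
    labels <- as.character(labels)
    groups <- unique(labels)
    vapply(seq_len(nrow(d)), function(i) {
        own <- labels == labels[i]
        own[i] <- FALSE
        if (!any(own)) return(0)  # convention for a cell alone in its group
        a <- mean(d[i, own])
        others <- setdiff(groups, labels[i])
        b <- min(vapply(others, function(g) mean(d[i, labels == g]), numeric(1)))
        (b - a) / max(a, b)
    }, numeric(1))
}

set.seed(7)
phase <- rep(c("G1", "S", "G2M"), each = 50)

# Embedding where the phases form well-separated clusters.
separated <- matrix(rnorm(150 * 2), ncol = 2) +
    5 * match(phase, c("G1", "S", "G2M"))

# Embedding where the phases are fully intermingled.
mixed <- matrix(rnorm(150 * 2), ncol = 2)

sil.sep <- mean(silhouette_width(separated, phase))
sil.mix <- mean(silhouette_width(mixed, phase))
```

The average width is high when the phases are separated and close to zero when they are intermingled, so a drop in the average width within a phenotype is consistent with a reduced cell cycle effect.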

Session Info

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] bluster_1.4.0               scPCA_1.8.0                
 [3] batchelor_1.10.0            scran_1.22.0               
 [5] scater_1.22.0               ggplot2_3.3.5              
 [7] scuttle_1.4.0               AnnotationHub_3.2.0        
 [9] BiocFileCache_2.2.0         dbplyr_2.1.1               
[11] ensembldb_2.18.0            AnnotationFilter_1.18.0    
[13] GenomicFeatures_1.46.0      AnnotationDbi_1.56.0       
[15] scRNAseq_2.7.2              SingleCellExperiment_1.16.0
[17] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[19] GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
[21] IRanges_2.28.0              S4Vectors_0.32.0           
[23] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[25] matrixStats_0.61.0          BiocStyle_2.22.0           
[27] rebook_1.4.0               

loaded via a namespace (and not attached):
  [1] igraph_1.2.7                  lazyeval_0.2.2               
  [3] listenv_0.8.0                 BiocParallel_1.28.0          
  [5] digest_0.6.28                 htmltools_0.5.2              
  [7] viridis_0.6.2                 fansi_0.5.0                  
  [9] magrittr_2.0.1                memoise_2.0.0                
 [11] ScaledMatrix_1.2.0            cluster_2.1.2                
 [13] limma_3.50.0                  globals_0.14.0               
 [15] Biostrings_2.62.0             prettyunits_1.1.1            
 [17] colorspace_2.0-2              blob_1.2.2                   
 [19] rappdirs_0.3.3                ggrepel_0.9.1                
 [21] rbibutils_2.2.4               xfun_0.27                    
 [23] dplyr_1.0.7                   crayon_1.4.1                 
 [25] RCurl_1.98-1.5                jsonlite_1.7.2               
 [27] graph_1.72.0                  glue_1.4.2                   
 [29] gtable_0.3.0                  zlibbioc_1.40.0              
 [31] XVector_0.34.0                DelayedArray_0.20.0          
 [33] coop_0.6-3                    kernlab_0.9-29               
 [35] BiocSingular_1.10.0           future.apply_1.8.1           
 [37] abind_1.4-5                   scales_1.1.1                 
 [39] edgeR_3.36.0                  DBI_1.1.1                    
 [41] Rcpp_1.0.7                    viridisLite_0.4.0            
 [43] xtable_1.8-4                  progress_1.2.2               
 [45] dqrng_0.3.0                   bit_4.0.4                    
 [47] rsvd_1.0.5                    ResidualMatrix_1.4.0         
 [49] metapod_1.2.0                 httr_1.4.2                   
 [51] dir.expiry_1.2.0              ellipsis_0.3.2               
 [53] pkgconfig_2.0.3               XML_3.99-0.8                 
 [55] farver_2.1.0                  CodeDepends_0.6.5            
 [57] sass_0.4.0                    locfit_1.5-9.4               
 [59] utf8_1.2.2                    tidyselect_1.1.1             
 [61] labeling_0.4.2                rlang_0.4.12                 
 [63] later_1.3.0                   munsell_0.5.0                
 [65] BiocVersion_3.14.0            tools_4.1.1                  
 [67] cachem_1.0.6                  generics_0.1.1               
 [69] RSQLite_2.2.8                 ExperimentHub_2.2.0          
 [71] evaluate_0.14                 stringr_1.4.0                
 [73] fastmap_1.1.0                 yaml_2.2.1                   
 [75] knitr_1.36                    bit64_4.0.5                  
 [77] purrr_0.3.4                   KEGGREST_1.34.0              
 [79] future_1.22.1                 sparseMatrixStats_1.6.0      
 [81] mime_0.12                     origami_1.0.5                
 [83] xml2_1.3.2                    biomaRt_2.50.0               
 [85] compiler_4.1.1                beeswarm_0.4.0               
 [87] filelock_1.0.2                curl_4.3.2                   
 [89] png_0.1-7                     interactiveDisplayBase_1.32.0
 [91] statmod_1.4.36                tibble_3.1.5                 
 [93] bslib_0.3.1                   stringi_1.7.5                
 [95] highr_0.9                     RSpectra_0.16-0              
 [97] lattice_0.20-45               ProtGenerics_1.26.0          
 [99] Matrix_1.3-4                  vctrs_0.3.8                  
[101] pillar_1.6.4                  lifecycle_1.0.1              
[103] BiocManager_1.30.16           Rdpack_2.1.2                 
[105] jquerylib_0.1.4               BiocNeighbors_1.12.0         
[107] data.table_1.14.2             cowplot_1.1.1                
[109] bitops_1.0-7                  irlba_2.3.3                  
[111] httpuv_1.6.3                  rtracklayer_1.54.0           
[113] R6_2.5.1                      BiocIO_1.4.0                 
[115] bookdown_0.24                 promises_1.2.0.1             
[117] KernSmooth_2.23-20            gridExtra_2.3                
[119] parallelly_1.28.1             vipor_0.4.5                  
[121] codetools_0.2-18              assertthat_0.2.1             
[123] rjson_0.2.20                  sparsepca_0.1.2              
[125] withr_2.4.2                   GenomicAlignments_1.30.0     
[127] Rsamtools_2.10.0              GenomeInfoDbData_1.2.7       
[129] parallel_4.1.1                hms_1.1.1                    
[131] grid_4.1.1                    beachmat_2.10.0              
[133] rmarkdown_2.11                DelayedMatrixStats_1.16.0    
[135] Rtsne_0.15                    shiny_1.7.1                  
[137] ggbeeswarm_0.6.0              restfulr_0.0.13              

References

Messmer, T., F. von Meyenn, A. Savino, F. Santos, H. Mohammed, A. T. L. Lun, J. C. Marioni, and W. Reik. 2019. “Transcriptional heterogeneity in naive and primed human pluripotent stem cells at single-cell resolution.” Cell Rep 26 (4): 815–24.