Chapter 39 Pijuan-Sala chimeric mouse embryo (10X Genomics)

39.1 Introduction

This chapter analyses the Pijuan-Sala et al. (2019) dataset on mouse gastrulation. Here, we examine chimeric embryos at the E8.5 stage of development, in which td-Tomato-positive embryonic stem cells (ESCs) were injected into a wild-type blastocyst.

39.2 Data loading

## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## altExpNames(0):
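
An object like the one printed above can be obtained from the MouseGastrulationData package listed in the session information. The sketch below assumes that the `WTChimeraData()` function is called with samples 5 to 10, which are the sample identifiers that appear in the clustering table later in this chapter; the exact call is not shown in the text, so treat the arguments as an assumption.

```r
library(MouseGastrulationData)

# Assumption: samples 5-10 are the chimeric samples used in this chapter,
# inferred from the sample labels in the clustering table.
sce.chimera <- WTChimeraData(samples=5:10)
sce.chimera
```

The data are fetched through ExperimentHub, so the first call needs network access and may take a while before the download is cached.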

39.3 Quality control

Quality control on the cells has already been performed by the authors, so we will not repeat it here. We additionally remove cells that are labelled as stripped nuclei or doublets.
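
The removal step could be sketched as follows. This assumes that the authors' cell type annotation is stored in a `celltype.mapped` column of the column metadata and that the two unwanted categories are labelled `"stripped"` and `"Doublet"`; both names are assumptions about the dataset rather than something stated in the text.

```r
# Assumption: 'celltype.mapped' holds the per-cell annotation, with
# "stripped" (stripped nuclei) and "Doublet" as the labels to discard.
drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet")

# Keep only the cells that are not flagged.
sce.chimera <- sce.chimera[, !drop]
```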

39.4 Normalization

We use the pre-computed size factors in sce.chimera.
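
A minimal sketch of this step, assuming that the `sizeFactor` column shown in the column metadata is picked up as the size factors of the object. In that case `logNormCounts()` from the scuttle package reuses them instead of computing library size factors.

```r
library(SingleCellExperiment)

# The pre-computed size factors should already be available here.
summary(sizeFactors(sce.chimera))

# Compute log-normalized expression values using the existing size factors.
sce.chimera <- scuttle::logNormCounts(sce.chimera)
```

The result is stored in a new `logcounts` assay alongside the original counts.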

39.5 Variance modelling

We retain all genes with any positive biological component, to preserve as much signal as possible across a very heterogeneous dataset.
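
One way to express this selection is shown below. The sketch assumes that the variance is modelled with `modelGeneVar()` from scran, blocking on the sample of origin so that differences between samples do not inflate the per-gene variances; the blocking choice and the variable names `dec.chimera` and `chosen.hvgs` are assumptions for illustration.

```r
library(scran)

# Fit a mean-variance trend within each sample and combine the statistics.
dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample)

# Keep every gene whose estimated biological component is positive.
chosen.hvgs <- dec.chimera$bio > 0
summary(chosen.hvgs)
```

Using a threshold of zero is deliberately permissive: it keeps more noise than a top-N selection would, in exchange for not discarding weaker signal from rare populations.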

Figure 39.1: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

Figure 39.2: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

Figure 39.3: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

39.6 Merging

We use a hierarchical merge to first merge together replicates with the same genotype, and then merge samples across different genotypes.
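
A hierarchical merge of this kind can be sketched with the batchelor package, using the `merge.order` argument of `FastMnnParam()` to describe the merge tree. The grouping below assumes that samples 5, 7 and 9 share one genotype and samples 6, 8 and 10 share the other, which is consistent with the pattern of the variance-lost diagnostic in this section but is not stated explicitly. The names `chosen.hvgs` and `merged` are also assumptions for illustration.

```r
library(batchelor)

set.seed(01001001)  # the PCA inside fastMNN uses randomized algorithms
merged <- correctExperiments(sce.chimera,
    batch=sce.chimera$sample,
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1, 3, 5), # assumed: samples 5, 7, 9 (one genotype)
            list(2, 4, 6)  # assumed: samples 6, 8, 10 (other genotype)
        )
    )
)
```

The integers in `merge.order` index the batches in the order of their levels, so replicates within each inner list are merged first and the two resulting groups are merged last.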

We use the percentage of variance lost as a diagnostic:

##              5         6         7         8        9       10
## [1,] 0.000e+00 0.0204433 0.000e+00 0.0169567 0.000000 0.000000
## [2,] 0.000e+00 0.0007389 0.000e+00 0.0004409 0.000000 0.015474
## [3,] 3.090e-02 0.0000000 2.012e-02 0.0000000 0.000000 0.000000
## [4,] 9.024e-05 0.0000000 8.272e-05 0.0000000 0.018047 0.000000
## [5,] 4.321e-03 0.0072518 4.124e-03 0.0078280 0.003831 0.007786
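
A matrix of this form is stored in the metadata of the fastMNN output. The sketch below assumes that the corrected object is called `merged`, a name chosen here for illustration.

```r
library(S4Vectors)

# Each row is a merge step and each column a sample; entries are the
# proportion of variance removed from that sample at that step.
lost.var <- metadata(merged)$merge.info$lost.var
lost.var

# Largest loss for any sample at any step.
max(lost.var)
```

Large values would suggest that the correction is removing genuine biological differences between samples; the values shown above are all small.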

39.7 Clustering

We examine the distribution of cells across clusters and samples.

##        Sample
## Cluster   5   6   7   8   9  10
##      1  152  72  85  88 164 386
##      2   19   7  13  17  20  36
##      3  130  96 109  63 159 311
##      4   43  35  81  81  87 353
##      5   68  31 120 107  83 197
##      6  122  65  64  52  63 141
##      7  187 113 322 587 458 541
##      8   47  22  84  50  90 131
##      9  182  47 231 192 216 391
##      10  95  19  36  18  50  34
##      11   9   7  18  13  30  27
##      12 110  69  73  96 127 252
##      13   0   2   0  51   0   5
##      14  38  39  50  47 126 123
##      15  98  16 164 125 368 273
##      16 146  37 132 110 231 216
##      17 114  43  44  37  40 154
##      18  78  45 189 119 340 493
##      19  86  20  64  54 153  77
##      20 159  77 137 101 147 401
##      21   2   1   7   3  65 133
##      22  11  16  20   9  47  57
##      23   1   5   0  84   0  66
##      24 170  47 282 173 426 542
##      25 109  23 117  55 271 285
##      26 122  72 298 572 296 776
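
A table like this can be produced by clustering on the batch-corrected coordinates. The text does not name the clustering method, so the following is only a sketch that assumes graph-based clustering: a shared nearest-neighbor graph built with scran, followed by Louvain community detection from igraph. It also assumes that the corrected object is called `merged`.

```r
library(scran)

# Build a shared nearest-neighbor graph on the corrected low-dimensional space.
g <- buildSNNGraph(merged, use.dimred="corrected")

# Assumption: Louvain community detection; other algorithms would also work.
clusters <- igraph::cluster_louvain(g)
colLabels(merged) <- factor(clusters$membership)

# Cross-tabulate cluster assignments against sample of origin.
table(Cluster=colLabels(merged), Sample=merged$sample)
```

Clusters that are dominated by a single sample, or absent from one genotype, are worth a closer look as they may reflect incomplete correction or genuine differences.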

39.8 Dimensionality reduction

We use an external algorithm to compute nearest neighbors for greater speed.
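
In scater, this corresponds to setting `external_neighbors=TRUE`, which performs the neighbor search with BiocNeighbors before handing the results to the t-SNE implementation. The sketch below assumes that the corrected object is called `merged`, that the clusters are stored as column labels, and that the sample of origin is in the `batch` column added during merging.

```r
library(scater)

set.seed(100)  # t-SNE is stochastic; the seed value itself is arbitrary
merged <- runTSNE(merged, dimred="corrected", external_neighbors=TRUE)

# Colour by cluster (top) and by sample of origin (bottom).
gridExtra::grid.arrange(
    plotTSNE(merged, colour_by="label", text_by="label", text_colour="red"),
    plotTSNE(merged, colour_by="batch")
)
```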

Figure 39.4: Obligatory \(t\)-SNE plots of the Pijuan-Sala chimeric mouse embryo dataset, where each point represents a cell and is colored according to the assigned cluster (top) or sample of origin (bottom).

Session Info

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.12-books/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.12-books/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] batchelor_1.6.2             scran_1.18.5               
 [3] scater_1.18.6               ggplot2_3.3.3              
 [5] MouseGastrulationData_1.4.0 SingleCellExperiment_1.12.0
 [7] SummarizedExperiment_1.20.0 Biobase_2.50.0             
 [9] GenomicRanges_1.42.0        GenomeInfoDb_1.26.4        
[11] IRanges_2.24.1              S4Vectors_0.28.1           
[13] BiocGenerics_0.36.0         MatrixGenerics_1.2.1       
[15] matrixStats_0.58.0          BiocStyle_2.18.1           
[17] rebook_1.0.0               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                    ggbeeswarm_0.6.0             
  [3] colorspace_2.0-0              ellipsis_0.3.1               
  [5] scuttle_1.0.4                 bluster_1.0.0                
  [7] XVector_0.30.0                BiocNeighbors_1.8.2          
  [9] farver_2.1.0                  bit64_4.0.5                  
 [11] interactiveDisplayBase_1.28.0 AnnotationDbi_1.52.0         
 [13] fansi_0.4.2                   codetools_0.2-18             
 [15] sparseMatrixStats_1.2.1       cachem_1.0.4                 
 [17] knitr_1.31                    jsonlite_1.7.2               
 [19] ResidualMatrix_1.0.0          dbplyr_2.1.0                 
 [21] uwot_0.1.10                   graph_1.68.0                 
 [23] shiny_1.6.0                   BiocManager_1.30.10          
 [25] compiler_4.0.4                httr_1.4.2                   
 [27] dqrng_0.2.1                   assertthat_0.2.1             
 [29] Matrix_1.3-2                  fastmap_1.1.0                
 [31] limma_3.46.0                  later_1.1.0.1                
 [33] BiocSingular_1.6.0            htmltools_0.5.1.1            
 [35] tools_4.0.4                   rsvd_1.0.3                   
 [37] igraph_1.2.6                  gtable_0.3.0                 
 [39] glue_1.4.2                    GenomeInfoDbData_1.2.4       
 [41] dplyr_1.0.5                   rappdirs_0.3.3               
 [43] Rcpp_1.0.6                    jquerylib_0.1.3              
 [45] vctrs_0.3.6                   ExperimentHub_1.16.0         
 [47] DelayedMatrixStats_1.12.3     xfun_0.22                    
 [49] stringr_1.4.0                 ps_1.6.0                     
 [51] beachmat_2.6.4                mime_0.10                    
 [53] lifecycle_1.0.0               irlba_2.3.3                  
 [55] statmod_1.4.35                XML_3.99-0.6                 
 [57] edgeR_3.32.1                  AnnotationHub_2.22.0         
 [59] zlibbioc_1.36.0               scales_1.1.1                 
 [61] promises_1.2.0.1              yaml_2.2.1                   
 [63] curl_4.3                      memoise_2.0.0                
 [65] gridExtra_2.3                 sass_0.3.1                   
 [67] stringi_1.5.3                 RSQLite_2.2.4                
 [69] highr_0.8                     BiocVersion_3.12.0           
 [71] BiocParallel_1.24.1           rlang_0.4.10                 
 [73] pkgconfig_2.0.3               bitops_1.0-6                 
 [75] evaluate_0.14                 lattice_0.20-41              
 [77] purrr_0.3.4                   labeling_0.4.2               
 [79] CodeDepends_0.6.5             cowplot_1.1.1                
 [81] bit_4.0.4                     processx_3.4.5               
 [83] tidyselect_1.1.0              magrittr_2.0.1               
 [85] bookdown_0.21                 R6_2.5.0                     
 [87] generics_0.1.0                DelayedArray_0.16.2          
 [89] DBI_1.1.1                     pillar_1.5.1                 
 [91] withr_2.4.1                   RCurl_1.98-1.3               
 [93] tibble_3.1.0                  crayon_1.4.1                 
 [95] utf8_1.2.1                    BiocFileCache_1.14.0         
 [97] rmarkdown_2.7                 viridis_0.5.1                
 [99] locfit_1.5-9.4                grid_4.0.4                   
[101] blob_1.2.1                    callr_3.5.1                  
[103] digest_0.6.27                 xtable_1.8-4                 
[105] httpuv_1.5.5                  munsell_0.5.0                
[107] beeswarm_0.3.1                viridisLite_0.3.0            
[109] vipor_0.4.5                   bslib_0.2.4                  

Bibliography

Pijuan-Sala, B., J. A. Griffiths, C. Guibentif, T. W. Hiscock, W. Jawaid, F. J. Calero-Nieto, C. Mulas, et al. 2019. “A Single-Cell Molecular Map of Mouse Gastrulation and Early Organogenesis.” Nature 566 (7745): 490–95.