Chapter 10 Chimeric mouse embryo (10X Genomics)

10.1 Introduction

This chapter analyzes the Pijuan-Sala et al. (2019) dataset on mouse gastrulation. We examine chimeric embryos at the E8.5 stage of development, in which td-Tomato-positive embryonic stem cells (ESCs) were injected into a wild-type blastocyst.

10.2 Data loading

library(MouseGastrulationData)
sce.chimera <- WTChimeraData(samples=5:10)
counts(sce.chimera) <- as(counts(sce.chimera), "CsparseMatrix")
sce.chimera
## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## mainExpName: NULL
## altExpNames(0):
library(scater)
rownames(sce.chimera) <- uniquifyFeatureNames(
    rowData(sce.chimera)$ENSEMBL, rowData(sce.chimera)$SYMBOL)
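
To make the renaming step concrete, the following base R sketch mimics the documented behaviour of uniquifyFeatureNames(): the symbol is used where available, the Ensembl ID is the fallback for missing symbols, and duplicated symbols are disambiguated by appending the ID. The helper name uniquify_sketch and the toy identifiers are hypothetical and only serve as an illustration; this is not the package's own code.

```r
# Hypothetical helper mimicking the documented behaviour of
# uniquifyFeatureNames(); written for illustration only.
uniquify_sketch <- function(ids, symbols) {
    # Fall back to the stable ID when no symbol is available.
    missing <- is.na(symbols)
    symbols[missing] <- ids[missing]
    # Append the ID to every copy of a duplicated symbol.
    dup <- symbols %in% symbols[duplicated(symbols)]
    symbols[dup] <- paste0(symbols[dup], "_", ids[dup])
    symbols
}

# Made-up identifiers: one missing symbol and one duplicated symbol.
ids <- c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03", "ENSMUSG04")
symbols <- c("Xkr4", NA, "Dup1", "Dup1")
new.names <- uniquify_sketch(ids, symbols)
new.names
```

The resulting names are unique, so they can safely be used as row names.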

10.3 Quality control

Quality control on the cells has already been performed by the authors, so we will not repeat it here. We additionally remove cells that are labelled as stripped nuclei or doublets.

drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet")
sce.chimera <- sce.chimera[,!drop]

10.4 Normalization

We use the pre-computed size factors in sce.chimera.

sce.chimera <- logNormCounts(sce.chimera)
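
As a rough illustration of what this step computes, the sketch below applies the same arithmetic to a toy matrix in base R: each cell's counts are divided by its size factor (centred at unity), and the result is log2-transformed with a pseudo-count of 1. The counts and size factors are made up, and we assume the default settings of logNormCounts().

```r
# Toy count matrix: 3 genes (rows) by 4 cells (columns). Values are made up.
counts <- matrix(c(10, 0, 5,
                   20, 2, 8,
                    5, 1, 0,
                   40, 4, 16),
                 nrow=3,
                 dimnames=list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical pre-computed size factors, one per cell.
sf <- c(0.5, 1, 0.25, 2)
sf <- sf / mean(sf)  # centre at unity so values stay on the count scale

# Divide each cell (column) by its size factor, then log2-transform
# with a pseudo-count of 1.
normalized <- sweep(counts, 2, sf, "/")
logcounts <- log2(normalized + 1)
round(logcounts, 2)
```

In this toy example gene1 scales exactly with the size factors, so its normalized expression is identical in all four cells.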

10.5 Variance modelling

We retain all genes with any positive biological component, to preserve as much signal as possible across a very heterogeneous dataset.

library(scran)
dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample)
chosen.hvgs <- dec.chimera$bio > 0
par(mfrow=c(1,2))
blocked.stats <- dec.chimera$per.block
for (i in colnames(blocked.stats)) {
    current <- blocked.stats[[i]]
    plot(current$mean, current$total, main=i, pch=16, cex=0.5,
        xlab="Mean of log-expression", ylab="Variance of log-expression")
    curfit <- metadata(current)
    curve(curfit$trend(x), col='dodgerblue', add=TRUE, lwd=2)
}

Figure 10.1: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

Figure 10.2: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.

Figure 10.3: Per-gene variance as a function of the mean for the log-expression values in the Pijuan-Sala chimeric mouse embryo dataset. Each point represents a gene (black) with the mean-variance trend (blue) fitted to the variances.
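
The selection rule used above (keep every gene with a positive biological component) can be sketched on simulated data in base R. This is a simplified stand-in, not the modelGeneVar() implementation: we assume a single block, use loess() for the trend, and simulate counts in which only the first 25 genes carry extra between-cell variation. All names and parameter values here are illustrative.

```r
set.seed(100)
ngenes <- 500
ncells <- 200

# Most genes show only Poisson noise around a gene-specific mean.
mu <- 2^runif(ngenes, 0, 6)
counts <- matrix(rpois(ngenes * ncells, lambda=mu), nrow=ngenes)

# The first 25 genes receive extra cell-to-cell variation.
hv <- seq_len(25)
fold <- matrix(2^rnorm(length(hv) * ncells, sd=1.5), nrow=length(hv))
counts[hv, ] <- matrix(rpois(length(hv) * ncells, lambda=mu[hv] * fold),
                       nrow=length(hv))

# Per-gene mean and variance of the log-expression values.
logexp <- log2(counts + 1)
total.mean <- rowMeans(logexp)
total.var <- apply(logexp, 1, var)

# The trend fitted to all genes stands in for the technical component;
# the residual from the trend is the biological component.
fit <- loess(total.var ~ total.mean, span=0.5)
tech <- predict(fit)
bio <- total.var - tech

chosen <- bio > 0
sum(chosen)
```

Because the threshold is zero rather than a top-N cutoff, a large fraction of genes is retained, which matches the intent of preserving signal in a heterogeneous dataset at the cost of keeping more noise.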

10.6 Merging

We use a hierarchical merge to first merge together replicates with the same genotype, and then merge samples across different genotypes.

library(batchelor)
set.seed(01001001)
merged <- correctExperiments(sce.chimera, 
    batch=sce.chimera$sample, 
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1,3,5), # WT (3 replicates)
            list(2,4,6)  # td-Tomato (3 replicates)
        )
    )
)
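
To see how the nested merge.order list translates into a sequence of pairwise merges, the base R sketch below walks the same list and records each merge. The helper describe_merge is hypothetical and purely illustrative: it assumes that batches within each inner list are merged from left to right and that the two merged groups are then combined. The relative order in which the two independent branches are processed is an implementation detail of batchelor and may differ from what is printed here.

```r
# Hypothetical helper that walks a nested merge order and records each
# pairwise merge. Illustration only, not batchelor's internal logic.
describe_merge <- function(tree) {
    steps <- character(0)
    walk <- function(node) {
        if (!is.list(node)) {
            return(as.character(node))  # a leaf is a single batch index
        }
        current <- walk(node[[1]])
        for (child in node[-1]) {
            nxt <- walk(child)
            merged <- paste0("(", current, "+", nxt, ")")
            steps <<- c(steps, merged)
            current <- merged
        }
        current
    }
    walk(tree)
    steps
}

# Same structure as the merge.order argument above; the integers index
# the batches in order of appearance.
merge.tree <- list(list(1, 3, 5), list(2, 4, 6))
steps <- describe_merge(merge.tree)
steps
```

Six batches arranged in this way give five pairwise merges: two within each group of replicates and a final merge between the two groups.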

We use the proportion of variance lost from each sample at each merge step as a diagnostic:

metadata(merged)$merge.info$lost.var
##              5         6         7         8        9       10
## [1,] 0.000e+00 0.0204433 0.000e+00 0.0169567 0.000000 0.000000
## [2,] 0.000e+00 0.0007389 0.000e+00 0.0004409 0.000000 0.015474
## [3,] 3.090e-02 0.0000000 2.012e-02 0.0000000 0.000000 0.000000
## [4,] 9.024e-05 0.0000000 8.272e-05 0.0000000 0.018047 0.000000
## [5,] 4.321e-03 0.0072518 4.124e-03 0.0078280 0.003831 0.007786
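
One way to summarize this matrix is sketched below in base R, using the values printed above. Rows are merge steps and columns are samples, so nonzero entries indicate which samples took part in a given step. The 10% cutoff is an assumed rule of thumb that is not stated in this chapter; it is included only to show how such a check could be written.

```r
# Values copied from the output above; rows are merge steps,
# columns are samples.
lost.var <- rbind(
    c(0.000e+00, 0.0204433, 0.000e+00, 0.0169567, 0.000000, 0.000000),
    c(0.000e+00, 0.0007389, 0.000e+00, 0.0004409, 0.000000, 0.015474),
    c(3.090e-02, 0.0000000, 2.012e-02, 0.0000000, 0.000000, 0.000000),
    c(9.024e-05, 0.0000000, 8.272e-05, 0.0000000, 0.018047, 0.000000),
    c(4.321e-03, 0.0072518, 4.124e-03, 0.0078280, 0.003831, 0.007786)
)
colnames(lost.var) <- c("5", "6", "7", "8", "9", "10")

# Samples with a nonzero loss at each merge step.
involved <- apply(lost.var > 0, 1,
    function(x) paste(colnames(lost.var)[x], collapse="+"))
involved

# Largest loss per sample at any single step, as a percentage.
max.lost <- apply(lost.var, 2, max) * 100
round(max.lost, 2)

# Assumed rule of thumb: flag any step that loses more than 10%.
threshold <- 0.1
any(lost.var > threshold)
```

The largest loss at any step is about 3% (sample 5), well below the assumed cutoff.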

10.7 Clustering

g <- buildSNNGraph(merged, use.dimred="corrected")
clusters <- igraph::cluster_louvain(g)
colLabels(merged) <- factor(clusters$membership)

We examine the distribution of cells across clusters and samples.

table(Cluster=colLabels(merged), Sample=merged$sample)
##        Sample
## Cluster   5   6   7   8   9  10
##      1   87  20  62  53 151  74
##      2  146  37 132 110 231 215
##      3   98  16 163 125 367 273
##      4  134 101 188 442 378 465
##      5  108  42 314 398 185 242
##      6  206  51 338 205 525 607
##      7  149  71  85  86 163 380
##      8  131  95 108  65 161 311
##      9   82  20  75  33 165 203
##      10  97  19  36  18  50  35
##      11 109  40  46  37  41 147
##      12 125  67  66  52  63 142
##      13 157  78 131 104 162 438
##      14 110  69  72  96 127 253
##      15  84  48 159 357 198 623
##      16  43  36  82  81  86 357
##      17 176  47 223 180 210 381
##      18  77  43 189 117 324 485
##      19  47  22  84  50  90 130
##      20  39  41  50  49 130 126
##      21   1   5   0  84   0  66
##      22  18   7  13  17  20  38
##      23  52  25  79  69  76 182
##      24   9   7  18  13  30  27
##      25  11  16  20   9  47  58
##      26   2   1   7   3  77 138
##      27   0   2   0  51   0   5
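
Raw counts are hard to compare across clusters of different sizes, so it can help to convert each row to proportions. The base R sketch below does this for four rows copied from the table above (clusters 1, 21, 26 and 27); the choice of rows and the grouping of samples 6, 8 and 10 are ours, made only to highlight clusters with an uneven sample composition.

```r
# Four rows copied from the cluster-by-sample table above.
tab <- rbind(
    "1"  = c(87, 20, 62, 53, 151, 74),
    "21" = c(1, 5, 0, 84, 0, 66),
    "26" = c(2, 1, 7, 3, 77, 138),
    "27" = c(0, 2, 0, 51, 0, 5)
)
colnames(tab) <- c("5", "6", "7", "8", "9", "10")

# Proportion of each cluster's cells contributed by each sample.
props <- prop.table(tab, margin=1)
round(props, 2)

# Fraction of each cluster drawn from samples 6, 8 and 10 combined.
even.frac <- rowSums(props[, c("6", "8", "10")])
round(even.frac, 3)
```

Clusters 21 and 27 consist almost entirely of cells from samples 6, 8 and 10, whereas cluster 1 draws cells from all six samples.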

10.8 Dimensionality reduction

We use an external algorithm to compute nearest neighbors for greater speed.

merged <- runTSNE(merged, dimred="corrected", external_neighbors=TRUE)
merged <- runUMAP(merged, dimred="corrected", external_neighbors=TRUE)
gridExtra::grid.arrange(
    plotTSNE(merged, colour_by="label", text_by="label", text_colour="red"),
    plotTSNE(merged, colour_by="batch")
)

Figure 10.4: Obligatory \(t\)-SNE plots of the Pijuan-Sala chimeric mouse embryo dataset, where each point represents a cell and is colored according to the assigned cluster (top) or sample of origin (bottom).

Session Info

R version 4.3.2 Patched (2023-11-13 r85521)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] batchelor_1.18.1             scran_1.30.2                
 [3] scater_1.30.1                ggplot2_3.4.4               
 [5] scuttle_1.12.0               MouseGastrulationData_1.16.0
 [7] SpatialExperiment_1.12.0     SingleCellExperiment_1.24.0 
 [9] SummarizedExperiment_1.32.0  Biobase_2.62.0              
[11] GenomicRanges_1.54.1         GenomeInfoDb_1.38.6         
[13] IRanges_2.36.0               S4Vectors_0.40.2            
[15] BiocGenerics_0.48.1          MatrixGenerics_1.14.0       
[17] matrixStats_1.2.0            BiocStyle_2.30.0            
[19] rebook_1.12.0               

loaded via a namespace (and not attached):
  [1] jsonlite_1.8.8                CodeDepends_0.6.5            
  [3] magrittr_2.0.3                ggbeeswarm_0.7.2             
  [5] magick_2.8.2                  farver_2.1.1                 
  [7] rmarkdown_2.25                zlibbioc_1.48.0              
  [9] vctrs_0.6.5                   memoise_2.0.1                
 [11] DelayedMatrixStats_1.24.0     RCurl_1.98-1.14              
 [13] htmltools_0.5.7               S4Arrays_1.2.0               
 [15] AnnotationHub_3.10.0          curl_5.2.0                   
 [17] BiocNeighbors_1.20.2          SparseArray_1.2.4            
 [19] sass_0.4.8                    bslib_0.6.1                  
 [21] cachem_1.0.8                  ResidualMatrix_1.12.0        
 [23] igraph_2.0.1.1                mime_0.12                    
 [25] lifecycle_1.0.4               pkgconfig_2.0.3              
 [27] rsvd_1.0.5                    Matrix_1.6-5                 
 [29] R6_2.5.1                      fastmap_1.1.1                
 [31] GenomeInfoDbData_1.2.11       shiny_1.8.0                  
 [33] digest_0.6.34                 colorspace_2.1-0             
 [35] AnnotationDbi_1.64.1          dqrng_0.3.2                  
 [37] irlba_2.3.5.1                 ExperimentHub_2.10.0         
 [39] RSQLite_2.3.5                 beachmat_2.18.1              
 [41] labeling_0.4.3                filelock_1.0.3               
 [43] fansi_1.0.6                   httr_1.4.7                   
 [45] abind_1.4-5                   compiler_4.3.2               
 [47] bit64_4.0.5                   withr_3.0.0                  
 [49] BiocParallel_1.36.0           viridis_0.6.5                
 [51] DBI_1.2.1                     highr_0.10                   
 [53] rappdirs_0.3.3                DelayedArray_0.28.0          
 [55] rjson_0.2.21                  bluster_1.12.0               
 [57] tools_4.3.2                   vipor_0.4.7                  
 [59] beeswarm_0.4.0                interactiveDisplayBase_1.40.0
 [61] httpuv_1.6.14                 glue_1.7.0                   
 [63] promises_1.2.1                grid_4.3.2                   
 [65] Rtsne_0.17                    cluster_2.1.6                
 [67] generics_0.1.3                gtable_0.3.4                 
 [69] metapod_1.10.1                BiocSingular_1.18.0          
 [71] ScaledMatrix_1.10.0           utf8_1.2.4                   
 [73] XVector_0.42.0                ggrepel_0.9.5                
 [75] BiocVersion_3.18.1            pillar_1.9.0                 
 [77] limma_3.58.1                  BumpyMatrix_1.10.0           
 [79] later_1.3.2                   dplyr_1.1.4                  
 [81] BiocFileCache_2.10.1          lattice_0.22-5               
 [83] bit_4.0.5                     tidyselect_1.2.0             
 [85] locfit_1.5-9.8                Biostrings_2.70.2            
 [87] knitr_1.45                    gridExtra_2.3                
 [89] bookdown_0.37                 edgeR_4.0.15                 
 [91] xfun_0.42                     statmod_1.5.0                
 [93] yaml_2.3.8                    evaluate_0.23                
 [95] codetools_0.2-19              tibble_3.2.1                 
 [97] BiocManager_1.30.22           graph_1.80.0                 
 [99] cli_3.6.2                     uwot_0.1.16                  
[101] xtable_1.8-4                  munsell_0.5.0                
[103] jquerylib_0.1.4               Rcpp_1.0.12                  
[105] dir.expiry_1.10.0             dbplyr_2.4.0                 
[107] png_0.1-8                     XML_3.99-0.16.1              
[109] parallel_4.3.2                ellipsis_0.3.2               
[111] blob_1.2.4                    sparseMatrixStats_1.14.0     
[113] bitops_1.0-7                  viridisLite_0.4.2            
[115] scales_1.3.0                  purrr_1.0.2                  
[117] crayon_1.5.2                  rlang_1.1.3                  
[119] cowplot_1.1.3                 KEGGREST_1.42.0              

References

Pijuan-Sala, B., J. A. Griffiths, C. Guibentif, T. W. Hiscock, W. Jawaid, F. J. Calero-Nieto, C. Mulas, et al. 2019. “A Single-Cell Molecular Map of Mouse Gastrulation and Early Organogenesis.” Nature 566 (7745): 490–95.