peco uses SingleCellExperiment class objects.
To install and load the package, run:
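The library calls that follow assume peco is already installed. A minimal installation sketch, assuming peco is available in the Bioconductor release that matches your R version (this is the generic BiocManager workflow, not a command taken from this vignette):

```r
# Assumption: peco is distributed through Bioconductor for your R version.
# Install BiocManager from CRAN first if it is not yet available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install peco (and its dependencies) only if it is missing.
if (!requireNamespace("peco", quietly = TRUE)) {
    BiocManager::install("peco")
}
```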
library(peco)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
library(doParallel)
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
library(foreach)
peco is a supervised approach for PrEdicting cell cycle phase in a COntinuum using single-cell RNA sequencing data. The R package provides functions to build a training dataset and functions to use existing training data to predict cell cycle phase on a continuum.
Our work demonstrated that peco can predict continuous cell cycle phase using a small set of cyclic genes: CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (identified as cell cycle marker genes in studies of yeast (Spellman et al., 1998) and HeLa cells (Whitfield et al., 2002)).
Below we provide two use cases. Vignette 1 shows how to use the built-in training dataset to predict continuous cell cycle phase. Vignette 2 shows how to make a training dataset and build a predictor from the training data.
Users can also view the vignettes via browseVignettes("peco").
training_human stores built-in training data of 101 significant cyclic genes. Below are the slots contained in training_human:
predict.yy: a gene-by-sample matrix (101 by 888) that stores predicted cyclic expression values.
cellcycle_peco_reordered: cell cycle phase on a unit circle (angle), ordered from 0 to \(2\pi\).
cellcycle_function: a list of 101 functions corresponding to the top 101 cyclic genes identified in our dataset.
sigma: standard error associated with the cyclic trends of gene expression.
pve: proportion of variance explained by the cyclic trend.
peco is integrated with the SingleCellExperiment class in Bioconductor. Below is an example of passing a SingleCellExperiment object as input to perform cell cycle phase prediction.
sce_top101genes includes 101 genes and 888 single-cell samples, with a single assay slot, counts.
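For readers starting from their own count matrix, the sketch below shows one way to build an object of the same shape: a SingleCellExperiment with a single counts assay. The matrix toy_counts and the object toy_sce are hypothetical toy data, not part of peco.

```r
library(SingleCellExperiment)

set.seed(1)
# Hypothetical toy data: 5 genes (rows) by 8 cells (columns) of Poisson counts.
toy_counts <- matrix(rpois(40, lambda = 10), nrow = 5,
                     dimnames = list(paste0("gene", 1:5),
                                     paste0("cell", 1:8)))

# Wrap the matrix in a SingleCellExperiment with one assay named "counts".
toy_sce <- SingleCellExperiment(assays = list(counts = toy_counts))

assayNames(toy_sce)   # names of the assay slots
dim(toy_sce)          # genes by cells
```

Rows are genes and columns are cells, matching the layout of sce_top101genes.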
Transform the expression values to quantile-normalized counts-per-million (CPM) values. peco uses the cpm_quantNormed slot as input data for predictions.
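To make the idea of quantile-normalized CPM concrete, here is a base-R sketch on toy data. It is an illustration under stated assumptions, not peco's implementation: CPM is computed by scaling each cell by its library size, and each gene is then mapped to standard normal quantiles through its ranks. The names toy_counts, toy_cpm, quant_norm_gene, and toy_qn are hypothetical.

```r
set.seed(1)
# Hypothetical toy counts: 6 genes (rows) by 10 cells (columns).
toy_counts <- matrix(rpois(60, lambda = 20), nrow = 6)

# Counts per million: divide each cell (column) by its library size.
toy_cpm <- t(t(toy_counts) / colSums(toy_counts)) * 1e6

# Rank-based mapping of one gene's values to standard normal quantiles.
# Assumption: average ranks for ties; peco may treat ties and zeros differently.
quant_norm_gene <- function(y) {
    qnorm((rank(y, ties.method = "average") - 0.5) / length(y))
}

# Apply gene by gene; apply() returns results as columns, so transpose back.
toy_qn <- t(apply(toy_cpm, 1, quant_norm_gene))
```

In the vignette itself this step is handled by data_transform_quantile, as shown next.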
sce_top101genes <- data_transform_quantile(sce_top101genes)
#> computing on 2 cores
assays(sce_top101genes)
#> List of length 3
#> names(3): counts cpm cpm_quantNormed
Apply the prediction model using the function cycle_npreg_outsample, which returns the prediction results in a list object, here named pred_top101genes.
pred_top101genes <- cycle_npreg_outsample(
    Y_test=sce_top101genes,
    sigma_est=training_human$sigma[rownames(sce_top101genes),],
    funs_est=training_human$cellcycle_function[rownames(sce_top101genes)],
    method.trend="trendfilter",
    ncores=1,
    get_trend_estimates=FALSE)
pred_top101genes$Y contains a SingleCellExperiment object with the predicted cell cycle phase in the colData slot.
head(colData(pred_top101genes$Y)$cellcycle_peco)
#> 20170905-A01 20170905-A02 20170905-A03 20170905-A06 20170905-A07 20170905-A08
#> 0.848230 4.680973 2.481858 4.303982 4.052655 1.413717
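The general idea of assigning a cell to an angle from per-gene cyclic functions and standard errors can be sketched in base R. This is a conceptual illustration only, assuming a Gaussian likelihood evaluated on a grid of 100 angles; it is not taken from peco's source, and funs, sigma, true_theta, Y, and theta_hat are hypothetical toy objects.

```r
set.seed(1)
# Hypothetical cyclic mean functions for 3 toy genes and their standard errors.
funs  <- list(function(t) cos(t),
              function(t) sin(t),
              function(t) cos(t - pi / 3))
sigma <- rep(0.1, 3)

# Simulate expression for 4 cells with known angles (genes by cells).
true_theta <- c(0.5, 2.0, 4.0, 5.5)
Y <- t(sapply(seq_along(funs), function(g) {
    funs[[g]](true_theta) + rnorm(length(true_theta), sd = sigma[g])
}))

# Candidate angles: midpoints of 100 equal bins on [0, 2*pi).
grid <- seq(0, 2 * pi, length.out = 101)[-101] + pi / 100

# Log-likelihood of every cell (rows) at every grid angle (columns).
loglik <- sapply(grid, function(t) {
    mu <- sapply(funs, function(f) f(t))   # expected expression per gene
    colSums(dnorm(Y, mean = mu, sd = sigma, log = TRUE))
})

# Predicted angle: the grid point with the highest log-likelihood per cell.
theta_hat <- grid[max.col(loglik, ties.method = "first")]
```

With several informative genes and small noise, theta_hat should land near true_theta up to the grid resolution.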
Visualize the prediction results for one gene. Below we choose CDK1 ("ENSG00000170312"). Because CDK1 is a known cell cycle gene, this visualization serves as a sanity check for the fitted results. The fitted function training_human$cellcycle_function[[1]] was obtained from our training data.
plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
     x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1",
     ylab = "quantile normalized expression")
points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)),
       x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
Visualize the prediction results for several genes (the first six in the input data). Use fit_cyclical_many to estimate the cyclic functions from the input data.
# predicted cell time in the input data
theta_predict = colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) = rownames(colData(pred_top101genes$Y))
# expression values of the first 6 genes in the input data
yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]
# apply trendfilter to estimate cyclic gene expression trend
fit_cyclic <- fit_cyclical_many(Y=yy_input,
                                theta=theta_predict)
#> computing on 2 cores
gene_symbols = rowData(pred_top101genes$Y)$hgnc[rownames(yy_input)]
par(mfrow=c(2,3))
for (i in 1:6) {
    plot(y=yy_input[i,],
         x=fit_cyclic$cellcycle_peco_ordered,
         main = gene_symbols[i],
         ylab = "quantile normalized expression")
    points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)),
           x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
}
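What it means to estimate a cyclic function and its proportion of variance explained (the pve slot) can be sketched in base R. The example swaps in a simple first-order harmonic regression for the trend filtering used above, so it is a simplified stand-in rather than what fit_cyclical_many does. The names theta, y, fit, trend_fun, and pve_toy are hypothetical.

```r
set.seed(1)
# Hypothetical data: 200 cells with angles on [0, 2*pi) and one cyclic gene.
theta <- sort(runif(200, 0, 2 * pi))
y <- 2 * cos(theta - 1) + rnorm(200, sd = 0.5)

# Harmonic regression: a cosine/sine pair gives a trend that is periodic in theta.
fit <- lm(y ~ cos(theta) + sin(theta))

# Fitted cyclic function, usable at any angle (like one entry of a function list).
trend_fun <- function(t) {
    unname(predict(fit, newdata = data.frame(theta = t)))
}

# Proportion of variance explained by the cyclic trend.
pve_toy <- 1 - var(residuals(fit)) / var(y)

trend_fun(seq(0, 2 * pi, length.out = 5))
pve_toy
```

A harmonic fit forces a sinusoidal shape, whereas trend filtering can follow more irregular cyclic patterns.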
sessionInfo()
#> R version 4.4.0 RC (2024-04-16 r86468)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] parallel stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] doParallel_1.0.17 iterators_1.0.14
#> [3] foreach_1.5.2 SingleCellExperiment_1.27.0
#> [5] SummarizedExperiment_1.35.0 Biobase_2.65.0
#> [7] GenomicRanges_1.57.0 GenomeInfoDb_1.41.0
#> [9] IRanges_2.39.0 S4Vectors_0.43.0
#> [11] BiocGenerics_0.51.0 MatrixGenerics_1.17.0
#> [13] matrixStats_1.3.0 peco_1.17.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2
#> [3] vipor_0.4.7 dplyr_1.1.4
#> [5] viridis_0.6.5 fastmap_1.1.1
#> [7] pracma_2.4.4 digest_0.6.35
#> [9] rsvd_1.0.5 lifecycle_1.0.4
#> [11] magrittr_2.0.3 compiler_4.4.0
#> [13] rlang_1.1.3 sass_0.4.9
#> [15] tools_4.4.0 igraph_2.0.3
#> [17] utf8_1.2.4 yaml_2.3.8
#> [19] knitr_1.46 S4Arrays_1.5.0
#> [21] DelayedArray_0.31.0 abind_1.4-5
#> [23] BiocParallel_1.39.0 grid_4.4.0
#> [25] fansi_1.0.6 beachmat_2.21.0
#> [27] colorspace_2.1-0 ggplot2_3.5.1
#> [29] scales_1.3.0 cli_3.6.2
#> [31] mvtnorm_1.2-4 rmarkdown_2.26
#> [33] crayon_1.5.2 generics_0.1.3
#> [35] httr_1.4.7 DelayedMatrixStats_1.27.0
#> [37] genlasso_1.6.1 scuttle_1.15.0
#> [39] ggbeeswarm_0.7.2 cachem_1.0.8
#> [41] geigen_2.3 zlibbioc_1.51.0
#> [43] assertthat_0.2.1 XVector_0.45.0
#> [45] vctrs_0.6.5 boot_1.3-30
#> [47] Matrix_1.7-0 jsonlite_1.8.8
#> [49] BiocSingular_1.21.0 BiocNeighbors_1.23.0
#> [51] ggrepel_0.9.5 irlba_2.3.5.1
#> [53] beeswarm_0.4.0 scater_1.33.0
#> [55] jquerylib_0.1.4 glue_1.7.0
#> [57] codetools_0.2-20 gtable_0.3.5
#> [59] circular_0.5-0 UCSC.utils_1.1.0
#> [61] ScaledMatrix_1.13.0 munsell_0.5.1
#> [63] tibble_3.2.1 pillar_1.9.0
#> [65] htmltools_0.5.8.1 conicfit_1.0.4
#> [67] GenomeInfoDbData_1.2.12 R6_2.5.1
#> [69] sparseMatrixStats_1.17.0 evaluate_0.23
#> [71] lattice_0.22-6 highr_0.10
#> [73] bslib_0.7.0 Rcpp_1.0.12
#> [75] gridExtra_2.3 SparseArray_1.5.0
#> [77] xfun_0.43 pkgconfig_2.0.3