TMExplorer 1.15.0
library(TMExplorer)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
TMExplorer (Tumour Microenvironment Explorer) is a curated collection of scRNAseq datasets sequenced from tumours. It aims to provide a single point of entry for users looking to study the tumour microenvironment at the single-cell level.
Users can quickly search available datasets using the metadata table, and then download the datasets they are interested in for analysis. Optionally, users can save the datasets for use in applications other than R.
This package will improve the ease of studying the tumour microenvironment with single-cell sequencing. Developers may use this package to obtain data for validation of new algorithms and researchers interested in the tumour microenvironment may use it to study specific cancers more closely.
Start by exploring the available datasets through metadata.
res = queryTME(metadata_only = TRUE)
Reference | accession | author | journal | year |
---|---|---|---|---|
Patel_Science_2014 | GSE57872 | Patel | Science | 2014 |
Tirosh_Science_2016a | GSE72056 | Tirosh | Science | 2016 |
Tirosh_Nature_ 2016b | GSE70630 | Tirosh | Nature | 2016 |
Venteicher_Science_2017 | GSE89567 | Venteicher | Science | 2017 |
Li_Nature_Genetics_2017 | GSE81861 | Li | Nature Genetics | 2017 |
Chung_Nature_Commun_2017 | GSE75688 | Chung | Nature Comm | 2017 |
This will return a list containing a single dataframe of metadata for all available datasets. View the metadata with `View(res[[1]])` and then check `?queryTME` for a description of searchable fields.
Note: in order to keep the function’s interface consistent, `queryTME` always returns a list of objects, even if there is only one object. You may prefer running `res = queryTME(metadata_only = TRUE)[[1]]` in order to save the dataframe directly.
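As a minimal sketch of the unwrapping step, the snippet below uses a hand-built list in place of the real return value (three rows copied from the table above, so nothing is downloaded). The column types of the real metadata table are an assumption here and may differ.

```r
# Stand-in for the value of queryTME(metadata_only = TRUE):
# a list holding exactly one data frame. Rows are copied from the table above.
res <- list(
  data.frame(
    Reference = c("Patel_Science_2014", "Tirosh_Science_2016a",
                  "Li_Nature_Genetics_2017"),
    accession = c("GSE57872", "GSE72056", "GSE81861"),
    author    = c("Patel", "Tirosh", "Li"),
    journal   = c("Science", "Science", "Nature Genetics"),
    year      = c(2014, 2016, 2017),   # assumed numeric for this sketch
    stringsAsFactors = FALSE
  )
)

# The outer object is a plain list, so unwrap its single element first.
meta <- res[[1]]

# From here on, ordinary data frame tools apply.
science <- subset(meta, journal == "Science")
science$accession
```

Unwrapping once and working with `meta` avoids repeating `[[1]]` in every later call.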
The `metadata_only` argument can be applied alongside any other argument in order to examine only datasets that have certain qualities.
You can, for instance, view only breast cancer datasets by using:
res = queryTME(tumour_type = 'Breast cancer', metadata_only = TRUE)[[1]]
Reference | accession | author | journal | year |
---|---|---|---|---|
Chung_Nature_Commun_2017 | GSE75688 | Chung | Nature Comm | 2017 |
Jordan_Nature_2016 | GSE75367 | Jordan | Nature | 2016 |
Azizi_Cell_2018 | GSE114727 | Azizi | Cell | 2018 |
Yeo_Elife_2020 | GSE123366 | Yeo | Elife | 2020 |
Search Parameter | Description | Examples |
---|---|---|
geo_accession | Search by GEO accession number | GSE72056, GSE57872 |
score_type | Search by type of score shown in $expression | TPM, RPKM, FPKM |
has_signatures | Filter by presence of cell-type gene signatures | TRUE, FALSE |
has_truth | Filter by presence of cell-type labels | TRUE, FALSE |
tumour_type | Search by tumour type | Breast cancer, Melanoma |
author | Search by first author | Patel, Tirosh, Chung |
journal | Search by publication journal | Science, Nature, Cell |
year | Search by year of publication | <2015, >2015, 2013-2015 |
pmid | Search by publication ID | 24925914, 27124452 |
sequence_tech | Search by sequencing technology | SMART-seq, Fluidigm C1 |
organism | Search by source organism | Human, Mice |
sparse | Return expression in sparse matrices | TRUE, FALSE |
In order to search by single years and a range of years, the package looks for specific patterns. ‘2013-2015’ will search for datasets published between 2013 and 2015, inclusive. ‘<2015’ or ‘2015>’ will search for datasets published before or in 2015. ‘>2015’ or ‘2015<’ will search for datasets published in or after 2015.
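The year patterns described above can be read as inclusive ranges. The sketch below is one way to interpret them in base R; `parse_year_query` and `in_range` are hypothetical helpers written for illustration and are not the package's internal implementation.

```r
# Hypothetical helper: turn a year query string into an inclusive
# c(from, to) range, following the patterns described in the text.
parse_year_query <- function(query) {
  query <- gsub("\\s", "", query)
  if (grepl("^[0-9]{4}-[0-9]{4}$", query)) {
    # "2013-2015": both ends inclusive
    as.numeric(strsplit(query, "-", fixed = TRUE)[[1]])
  } else if (grepl("^<[0-9]{4}$|^[0-9]{4}>$", query)) {
    # "<2015" or "2015>": published before or in that year
    c(-Inf, as.numeric(gsub("[<>]", "", query)))
  } else if (grepl("^>[0-9]{4}$|^[0-9]{4}<$", query)) {
    # ">2015" or "2015<": published in or after that year
    c(as.numeric(gsub("[<>]", "", query)), Inf)
  } else if (grepl("^[0-9]{4}$", query)) {
    # a single year
    rep(as.numeric(query), 2)
  } else {
    stop("Unrecognised year query: ", query)
  }
}

# Hypothetical helper: which publication years satisfy the query?
in_range <- function(years, query) {
  r <- parse_year_query(query)
  years >= r[1] & years <= r[2]
}

years <- c(2014, 2016, 2016, 2017)
years[in_range(years, ">2016")]
```

Note that, as described, both `<` and `>` include the boundary year itself.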
Once you’ve found a field to search on, you can get your data. For this example, we’re pulling a specific dataset by its GEO ID.
res = queryTME(geo_accession = "GSE81861")
This will return a list containing dataset GSE81861.
The dataset is stored as a `SingleCellExperiment` object, which has the following metadata list:
Attribute | Description |
---|---|
signatures | A data.frame containing the cell types and a list of genes that represent that cell type |
cells | A list of cells included in the study |
genes | A list of genes included in the study |
pmid | The PubMed ID of the study |
technology | The sequencing technology used |
score_type | The type of score shown in tme_data$expression |
organism | The type of organism from which cells were sequenced |
author | The first author of the paper presenting the data |
tumour_type | The type of tumour sequenced |
patients | The number of patients included in the study |
tumours | The number of tumours sampled by the study |
geo_accession | The GEO accession ID for the dataset |
To access the expression data for a result, use
View(counts(res[[1]]))
Gene | RHC3546__Tcell__.C6E879 | RHC3552__Epithelial__.2749FE |
---|---|---|
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10 | 3 | 0 |
chrX:99839798-99854882_TNMD_ENSG00000000005.5 | 0 | 0 |
chr20:49505584-49575092_DPM1_ENSG00000000419.8 | 0 | 0 |
chr1:169631244-169863408_SCYL3_ENSG00000000457.9 | 0 | 0 |
chr1:169631244-169863408_C1orf112_ENSG00000000460.12 | 0 | 0 |
chr1:27938574-27961788_FGR_ENSG00000000938.8 | 0 | 0 |
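Viewing a full expression matrix can be slow for large datasets, so a numeric summary is often more practical. The sketch below builds a toy matrix from the first three rows shown above and computes two per-cell summaries. It assumes the value of `counts(res[[1]])` behaves like an ordinary numeric matrix; a sparse result may need the `Matrix` package's methods instead.

```r
# Toy stand-in for counts(res[[1]]): genes in rows, cells in columns.
# Values are copied from the first three rows of the table above.
expr <- matrix(
  c(3, 0,
    0, 0,
    0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
      "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
      "chr20:49505584-49575092_DPM1_ENSG00000000419.8"),
    c("RHC3546__Tcell__.C6E879", "RHC3552__Epithelial__.2749FE")
  )
)

dim(expr)                            # genes x cells
library_size   <- colSums(expr)      # total signal per cell
genes_detected <- colSums(expr > 0)  # number of non-zero genes per cell
library_size
genes_detected
```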
Cell type labels are stored under `colData(res[[1]])` for datasets for which cell type labels are available.
Metadata is stored in a named list accessible by `metadata(res[[1]])`. Specific entries can be accessed by attribute name.
metadata(res[[1]])$pmid
#> # A tibble: 1 × 1
#> PMID
#> <dbl>
#> 1 28319088
Say you want to measure the performance of cell-type classification methods. To do this, you need datasets that have the true cell-types available.
res = queryTME(has_truth = TRUE)
This will return a list of all datasets that have true cell-types available. You can see the cell types for the first dataset using the following command:
View(colData(res[[1]]))
Cell | label |
---|---|
RHC3546__Tcell__.C6E879 | Tcell |
RHC3552__Epithelial__.2749FE | Epithelial |
RHC3553__Epithelial__.2749FE | Epithelial |
RHC3555__Bcell__.7DEA7B | Bcell |
RHC3556__Epithelial__.2749FE | Epithelial |
RHC3557__Bcell__.7DEA7B | Bcell |
The row names of this dataframe contain the cell barcodes, and the `label` column contains the cell type.
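To illustrate how such labels might be used for benchmarking, the sketch below rebuilds the six rows shown above as a plain data frame and scores a made-up prediction vector against them. The `predicted` values are hypothetical, and with real data you would likely start from `as.data.frame(colData(res[[1]]))`.

```r
# Toy stand-in for the cell-type labels shown above.
cell_labels <- data.frame(
  label = c("Tcell", "Epithelial", "Epithelial",
            "Bcell", "Epithelial", "Bcell"),
  row.names = c("RHC3546__Tcell__.C6E879", "RHC3552__Epithelial__.2749FE",
                "RHC3553__Epithelial__.2749FE", "RHC3555__Bcell__.7DEA7B",
                "RHC3556__Epithelial__.2749FE", "RHC3557__Bcell__.7DEA7B"),
  stringsAsFactors = FALSE
)

# How many cells of each type are there?
label_counts <- table(cell_labels$label)
label_counts

# Hypothetical output of a classifier, in the same cell order.
predicted <- c("Tcell", "Epithelial", "Bcell",
               "Bcell", "Epithelial", "Bcell")

# Overall accuracy and a confusion table of truth against prediction.
accuracy  <- mean(predicted == cell_labels$label)
confusion <- table(truth = cell_labels$label, predicted = predicted)
accuracy
confusion
```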
Some cell-type classification methods require a list of gene signatures. To return only datasets that have cell-type gene signatures available, use:
res = queryTME(has_truth = TRUE, has_signatures = TRUE)
View(metadata(res[[1]])$signatures)
MYELOID | FIBROBLAST | TCELL |
---|---|---|
ITGAX_ENSG00000140678.12 | SPARC_ENSG00000113140.6 | TRBC2_ENSG00000211772.4 |
CD68_ENSG00000129226.9 | COL14A1_ENSG00000187955.7 | TRBC2_ENSG00000211772.4 |
CD14_ENSG00000170458.9 | COL13A1_ENSG00000197467.9 | CD3E_ENSG00000198851.5 |
CCL3_ENSG00000006075.11 | DCN_ENSG00000011465.12 | CD3G_ENSG00000160654.5 |
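Many classification tools expect signatures as a named list of gene symbols rather than a table. The sketch below rebuilds the four rows shown above and converts them. It assumes every entry follows the `SYMBOL_ENSG...` pattern seen in this example, which may not hold for every dataset.

```r
# Toy stand-in for metadata(res[[1]])$signatures, copied from the table above.
signatures <- data.frame(
  MYELOID    = c("ITGAX_ENSG00000140678.12", "CD68_ENSG00000129226.9",
                 "CD14_ENSG00000170458.9", "CCL3_ENSG00000006075.11"),
  FIBROBLAST = c("SPARC_ENSG00000113140.6", "COL14A1_ENSG00000187955.7",
                 "COL13A1_ENSG00000197467.9", "DCN_ENSG00000011465.12"),
  TCELL      = c("TRBC2_ENSG00000211772.4", "TRBC2_ENSG00000211772.4",
                 "CD3E_ENSG00000198851.5", "CD3G_ENSG00000160654.5"),
  stringsAsFactors = FALSE
)

# One list element per cell type: strip the Ensembl suffix and drop duplicates.
signature_list <- lapply(signatures, function(genes) {
  unique(sub("_ENSG.*$", "", genes))
})
signature_list$TCELL
```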
To facilitate the use of any or all datasets outside of R, you can use `saveTME()`. `saveTME` takes two parameters: a `tme_data` object to be saved, and the directory you would like data to be saved in.
Note that the output directory should not already exist.
To save the data from the earlier example to disk, use the following commands.
res = queryTME(geo_accession = "GSE72056")[[1]]
saveTME(res, '~/Downloads/GSE72056')
The result is three CSV files that can be used in other programs. In the future we will support saving in other formats.
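Because the output directory should not already exist, it can help to check the path before saving. The sketch below uses a hypothetical helper, `fresh_output_dir`, which is not part of TMExplorer. The `saveTME` call is left commented out so that the snippet runs without downloading anything.

```r
# Hypothetical helper: return the expanded path, or stop if it already exists.
fresh_output_dir <- function(path) {
  path <- path.expand(path)
  if (dir.exists(path)) {
    stop("Output directory already exists: ", path)
  }
  path
}

# A unique, not-yet-existing location inside the session's temporary directory.
out_dir <- fresh_output_dir(tempfile(pattern = "GSE72056_"))
out_dir

# In a session where TMExplorer is loaded and `res` holds a dataset:
# saveTME(res, out_dir)
# list.files(out_dir)   # inspect the files that were written
```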
sessionInfo()
#> R version 4.4.0 RC (2024-04-16 r86468)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] TMExplorer_1.15.0 BiocFileCache_2.13.0
#> [3] dbplyr_2.5.0 SingleCellExperiment_1.27.0
#> [5] SummarizedExperiment_1.35.0 Biobase_2.65.0
#> [7] GenomicRanges_1.57.0 GenomeInfoDb_1.41.0
#> [9] IRanges_2.39.0 S4Vectors_0.43.0
#> [11] BiocGenerics_0.51.0 MatrixGenerics_1.17.0
#> [13] matrixStats_1.3.0 BiocStyle_2.33.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3
#> [4] SparseArray_1.5.0 RSQLite_2.3.6 lattice_0.22-6
#> [7] digest_0.6.35 magrittr_2.0.3 evaluate_0.23
#> [10] grid_4.4.0 bookdown_0.39 blob_1.2.4
#> [13] fastmap_1.1.1 jsonlite_1.8.8 Matrix_1.7-0
#> [16] DBI_1.2.2 BiocManager_1.30.22 httr_1.4.7
#> [19] purrr_1.0.2 fansi_1.0.6 UCSC.utils_1.1.0
#> [22] jquerylib_0.1.4 abind_1.4-5 cli_3.6.2
#> [25] rlang_1.1.3 crayon_1.5.2 XVector_0.45.0
#> [28] bit64_4.0.5 cachem_1.0.8 DelayedArray_0.31.0
#> [31] yaml_2.3.8 S4Arrays_1.5.0 tools_4.4.0
#> [34] memoise_2.0.1 dplyr_1.1.4 filelock_1.0.3
#> [37] GenomeInfoDbData_1.2.12 curl_5.2.1 vctrs_0.6.5
#> [40] R6_2.5.1 lifecycle_1.0.4 zlibbioc_1.51.0
#> [43] bit_4.0.5 pkgconfig_2.0.3 bslib_0.7.0
#> [46] pillar_1.9.0 glue_1.7.0 tidyselect_1.2.1
#> [49] tibble_3.2.1 xfun_0.43 knitr_1.46
#> [52] htmltools_0.5.8.1 rmarkdown_2.26 compiler_4.4.0