
Note: TREG is pronounced as a single word and fully capitalized, unlike regulatory T cells, which are known as “Tregs” (pronounced “T-regs”). The work described here is unrelated to regulatory T cells.

1 Basics

1.1 Install TREG

R is an open-source statistical environment whose functionality can be extended through packages. TREG is an R package available via Bioconductor. R can be installed on any operating system from CRAN, after which you can install TREG by running the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("TREG")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

1.2 Required knowledge

TREG (Huuki-Myers and Collado-Torres, 2024) builds on many other packages, in particular those that implement the infrastructure needed for working with single cell RNA sequencing data, visualization functions, and interactive data exploration. One example is SummarizedExperiment, which allows you to store the data.

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

1.3 Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. However, R and Bioconductor have a steep learning curve, so it is critical to learn where to ask for help. The blog post mentioned above lists some options, but we would like to highlight the Bioconductor support site as the main resource for getting help with Bioconductor. Other alternatives are available, such as creating GitHub issues and tweeting. Please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
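As a minimal sketch of the session information part of such a request, the base R function `sessionInfo()` (from the utils package that ships with R) records your R version, operating system, and package versions. The object name `session_details` is only an illustrative choice.

```r
## Load the package you are asking about first so its version is recorded,
## for example: library("TREG")

## Capture the R version, platform, and attached/loaded package versions
session_details <- sessionInfo()

## Paste this printed output at the end of your help request
print(session_details)
```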

1.4 Citing TREG

We hope that TREG will be useful for your research. Please use the following information to cite the package and the research article describing the data provided by TREG. Thank you!

## Citation info
citation("TREG")
#> To cite package 'TREG' in publications use:
#> 
#>   Huuki-Myers LA, Collado-Torres L (2024). _TREG: a R/Bioconductor
#>   package to identify Total RNA Expression Genes_.
#>   doi:10.18129/B9.bioc.TREG <https://doi.org/10.18129/B9.bioc.TREG>,
#>   https://github.com/LieberInstitute/TREG/TREG - R package version
#>   1.9.0, <http://www.bioconductor.org/packages/TREG>.
#> 
#>   Huuki-Myers LA, Montgomery KD, Kwon SH, Page SC, Hicks SC, Maynard
#>   KR, Collado-Torres L (2022). "Data Driven Identification of Total RNA
#>   Expression Genes "TREGs" for estimation of RNA abundance in
#>   heterogeneous cell types." _bioRxiv_. doi:10.1101/2022.04.28.489923
#>   <https://doi.org/10.1101/2022.04.28.489923>,
#>   <https://doi.org/10.1101/2022.04.28.489923>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

2 Overview

The TREG (Huuki-Myers and Collado-Torres, 2024) package was developed by researchers at the Lieber Institute for Brain Development (LIBD) to identify candidate Total RNA Expression Genes (TREGs), which can be used to estimate RNA abundance in individual cells in an smFISH experiment (Huuki-Myers, Montgomery, Kwon, Page, Hicks, Maynard, and Collado-Torres, 2022).

In this vignette we’ll showcase how you can use the R functions provided by TREG (Huuki-Myers and Collado-Torres, 2024) with the snRNA-seq dataset that was recently published by our LIBD collaborators (Tran, Maynard, Spangler, Huuki, Montgomery, Sadashivaiah, Tippani, Barry, Hancock, Hicks, Kleinman, Hyde, Collado-Torres, Jaffe, and Martinowich, 2021).

To get started, please load the TREG package.

library("TREG")

The goal of TREG is to help find candidate Total RNA Expression Genes (TREGs) in single nucleus (or single cell) RNA-seq data.

2.1 Why are TREGs useful?

The expression of a TREG is proportional to the overall RNA expression in a cell. This relationship can be used to estimate total RNA content in cells in assays where only a few genes can be measured, such as single-molecule fluorescent in situ hybridization (smFISH).

In an smFISH experiment, the number of TREG puncta can be used to infer the total RNA expression of the cell.
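As a rough illustration of this idea, the following base R sketch simulates cells whose TREG puncta counts are proportional to their total RNA content, then estimates total RNA from the puncta alone. Everything here is an assumption made for illustration: the number of cells, the 0.002 proportionality constant, the Poisson noise model, and all object names. None of it comes from the TREG package or its data.

```r
set.seed(20220428)

## Hypothetical total RNA content (in molecules) for 200 simulated cells
n_cells <- 200
total_rna <- round(runif(n_cells, min = 5000, max = 50000))

## Assumed proportionality constant: the TREG accounts for 0.2% of all RNA
treg_fraction <- 0.002

## Simulated TREG puncta: Poisson noise around the proportional expectation
treg_puncta <- rpois(n_cells, lambda = treg_fraction * total_rna)

## Fit a line through the origin: puncta = slope * total RNA
fit <- lm(treg_puncta ~ 0 + total_rna)
slope_hat <- unname(coef(fit)[1])

## Invert the relationship to estimate total RNA from puncta alone
total_rna_hat <- treg_puncta / slope_hat

## Agreement between the estimated and the simulated total RNA
cor(total_rna, total_rna_hat)
```

In a real experiment the slope is not known in advance, so puncta counts would typically be interpreted as relative rather than absolute RNA content between cells.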

The motivation for this work is to collect smFISH data to help build better deconvolution algorithms, but there may be many other applications for TREGs in experimental design!

Figure: The expression of a TREG can inform the total RNA content of a cell.

2.2 What makes a gene a good TREG?

  1. The gene must have non-zero expression in most cells across different tissues and cell types.

  2. The gene should be expressed at a constant level with respect to other genes across different cell types, that is, it should have high rank invariance.

  3. The gene must be measurable as a continuous metric in the experimental assay, for example by showing a dynamic range of puncta when observed with RNAscope. This needs to be considered for the candidate TREGs and may need to be validated experimentally.

Figure: Distribution of ranks of a gene with high and low rank invariance.
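The first two criteria can be sketched with base R on simulated counts. This is a simplified stand-in and not the computation used by the TREG package: the three gene names (`stable`, `variable`, `sparse`), the cell type labels, the 0.25 zero-proportion cutoff, and the use of the standard deviation of within-cell ranks as an invariance summary are all assumptions chosen for illustration.

```r
set.seed(1)

## 300 simulated cells from three hypothetical cell types that differ in
## total RNA content (size factors 1, 2, and 4)
n_cells <- 300
cell_type <- rep(c("typeA", "typeB", "typeC"), each = n_cells / 3)
size_factor <- unname(c(typeA = 1, typeB = 2, typeC = 4)[cell_type])

## Helper: Poisson counts for one gene given a per-cell mean
sim_gene <- function(mu) rpois(n_cells, lambda = mu)

## 47 background genes whose expression scales with the size factor
bg_means <- seq(1, 60, length.out = 47)
background <- t(sapply(bg_means, function(m) sim_gene(m * size_factor)))
rownames(background) <- paste0("bg", seq_along(bg_means))

## Three genes of interest
type_mean <- unname(c(typeA = 3, typeB = 50, typeC = 10)[cell_type])
stable <- sim_gene(20 * size_factor)          # scales with total RNA
variable <- sim_gene(type_mean * size_factor) # cell type specific
sparse <- sim_gene(10) * rbinom(n_cells, size = 1, prob = 0.1) # mostly zero

counts <- rbind(background, stable = stable, variable = variable, sparse = sparse)

## Criterion 1: largest proportion of zeros across cell types, per gene
cells_by_type <- split(seq_len(n_cells), cell_type)
prop_zero <- sapply(cells_by_type, function(i) rowMeans(counts[, i] == 0))
max_prop_zero <- apply(prop_zero, 1, max)
counts_kept <- counts[max_prop_zero < 0.25, ]

## Criterion 2: rank genes within each cell, then summarise how much each
## gene's rank varies across cells (smaller spread = more rank invariant)
cell_ranks <- apply(counts_kept, 2, rank)
rank_sd <- apply(cell_ranks, 1, sd)
round(rank_sd[c("stable", "variable")], 1)
```

In this simulation the `sparse` gene is removed by the zero-proportion filter, and the `stable` gene shows a smaller rank spread than the `variable` gene, which is the pattern expected of a good candidate TREG.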

2.3 How to find candidate TREGs with TREG