TEKRABber 1.1.0
TEKRABber is used to estimate the correlations between genes and transposable elements (TEs) from RNA-seq data comparing between: (1) Two Species (2) Control vs. Experiment. In the following sections, we will use built-in data to demonstrate how to implement TEKRABber on you own analysis.
To use TEKRABber from your R environment, you need to install it using BiocManager:
install.packages("BiocManager")
BiocManager::install("TEKRABber")
library(TEKRABber)
library(SummarizedExperiment) # load it if you are running this tutorial
Gene and TE expression data are generated from randomly picked brain regions FASTQ files from 10 humans and 10 chimpanzees (Khrameeva E et al., Genome Research, 2020). The values for the first column of gene and TE count table must be Ensembl gene ID and TE name:
# load built-in data
data(speciesCounts)
hmGene <- speciesCounts$hmGene
hmTE <- speciesCounts$hmTE
chimpGene <- speciesCounts$chimpGene
chimpTE <- speciesCounts$chimpTE
# the first column must be Ensembl gene ID for gene, and TE name for TE
head(hmGene)
## Geneid SRR8750453 SRR8750454 SRR8750455 SRR8750456 SRR8750457
## 1 ENSG00000000003 250 267 227 286 128
## 2 ENSG00000000005 13 2 15 9 5
## 3 ENSG00000000419 260 311 159 259 272
## 4 ENSG00000000457 86 131 100 94 80
## 5 ENSG00000000460 21 17 42 33 55
## 6 ENSG00000000938 162 75 95 252 195
## SRR8750458 SRR8750459 SRR8750460 SRR8750461 SRR8750462
## 1 394 268 102 370 244
## 2 0 1 8 0 2
## 3 408 371 126 211 374
## 4 158 119 46 77 81
## 5 29 50 11 18 20
## 6 137 93 108 197 69
In the first step, we use orthologScale()
to get orthology information and
calculate the scaling factor between two species. The species name needs to be
the abbreviation of scientific species name used in Ensembl. (Note: (1)This
step queries information using biomaRt and it might
need some time or try different mirrors due to the connections to Ensembl
(2)It might take some time to calculate scaling factor based on
your data size).
# You can use the code below to search for species name
ensembl <- biomaRt::useEnsembl(biomart = "genes")
biomaRt::listDatasets(ensembl)
# In order to save time, we have save the data for this tutorial.
data(fetchDataHmChimp)
fetchData <- fetchDataHmChimp
# Query the data and calculate scaling factor using orthologScale():
# fetchData <- orthologScale(
# geneCountRef = hmGene,
# geneCountCompare = chimpGene,
# speciesRef = "hsapiens",
# speciesCompare = "ptroglodytes"
# )
correlation estimation
We use DECorrInputs()
to return input files for downstream analysis.
inputBundle <- DECorrInputs(
orthologTable = fetchData$orthologTable,
scaleFactor = fetchData$scaleFactor,
geneCountRef = hmGene,
geneCountCompare = chimpGene,
teCountRef = hmTE,
teCountCompare = chimpTE
)
In this step, we need to generate a metadata contain species name
(i.e., human and chimpanzee). The row names need to be same as the DE input
table and the column name must be species (see the example below). Then we
use DEgeneTE()
to perform DE analysis. When you are comparing samples between
two species, the parameter expDesign should be TRUE (as default).
meta <- data.frame(
species = c(rep("human", ncol(hmGene) - 1),
rep("chimpanzee", ncol(chimpGene) - 1))
)
meta$species <- factor(meta$species, levels = c("human", "chimpanzee"))
rownames(meta) <- colnames(inputBundle$geneInputDESeq2)
hmchimpDE <- DEgeneTE(
geneTable = inputBundle$geneInputDESeq2,
teTable = inputBundle$teInputDESeq2,
metadata = meta,
expDesign = TRUE
)
Here we use corrOrthologTE()
to perform correlation estimation comparing
each ortholog and TE. This is the most time-consuming step if you have large
data. For a quick demonstration, we use a relatively small data. You can
specify the correlation method and adjusted p-value method. The default
methods are Pearson’s correlation and FDR. Note: For more efficient and
specific analysis, you can subset your data in this step to focus on only the
orthologs and TEs that you are interested in.
# load built-in data
data(speciesCorr)
hmGeneCorrInput <- assay_tekcorrset(speciesCorr, "gene", "human")
hmTECorrInput <- assay_tekcorrset(speciesCorr, "te", "human")
chimpGeneCorrInput <- assay_tekcorrset(speciesCorr, "gene", "chimpanzee")
chimpTECorrInput <- assay_tekcorrset(speciesCorr, "te", "chimpanzee")
hmCorrResult <- corrOrthologTE(
geneInput = hmGeneCorrInput,
teInput = hmTECorrInput,
corrMethod = "pearson",
padjMethod = "fdr"
)
chimpCorrResult <- corrOrthologTE(
geneInput = chimpGeneCorrInput,
teInput = chimpTECorrInput,
corrMethod = "pearson",
padjMethod = "fdr"
)
head(hmCorrResult)
## geneName teName pvalue coef padj
## 1 ENSG00000143125 L1MD 0.271964872 0.38497828 0.9990235
## 2 ENSG00000143125 MSTA 0.335873091 0.34036703 0.9990235
## 3 ENSG00000143125 MLT1N2-int 0.966658172 0.01524552 0.9990235
## 4 ENSG00000143125 LTR57 0.067870603 0.59794954 0.9990235
## 5 ENSG00000143125 HERVK11-int 0.001028058 0.87118988 0.3210294
## 6 ENSG00000143125 LTR5 0.855235258 -0.06647109 0.9990235
appTEKRABber()
:TEKRABber provides an app function for you to
quickly view your result. First, you will need to assign the differentially
expressed orthologs/TEs results, correlation results and metadata as global
variables: appDE
, appRef
, appCompare
and appMeta
. See the following
example.
#create global variables for app-use
appDE <- hmchimpDE
appRef <- hmCorrResult
appCompare <- chimpCorrResult
appMeta <- meta # this is the same one in DE analysis
appTEKRABber()
In the Expression tab page, (1) you can specify your input gene and
TE. The result will show in box plots with data points in normalized log2
expression level (2) DE analysis result will show in table including
statistical information (3) Correlation result will indicate if these
selected pairs are significantly correlated and the value of correlation
coefficients.