Introduction

Recent technological advances in chemical modification detection on DNA and RNA using high throughput sequencing have generated a plethora of epigenomic and epitranscriptomic datasets. However, functional characterization of chemical modifications on DNA and RNA still remain largely unexplored. Generally, in silico methods rely on identifying location and quantitative differential analysis of the modifications.

The number of epigenomic/epitranscriptomic modification events associated with the gene can provide the degree of modification/accessibility associated with the gene of interest, which in turn plays a crucial role in determining the levels of differential gene expression. Data integration methods can combine both these layers and provide information about the degree of events occurring and its influence on the differential gene expression, ribosome occupancy or protein translation. We made an assumption that if an epi-mark is functional for up- or down-regulation of expression in a particular context, a gene with the epi-mark is more likely to be impacted for its expression. If a gene has more marked sites thus higher degree of modification would be impacted more, which can be quantified as differential RNA expression, ribosome occupancy, or protein abundance obtained from RNA-seq, Ribo-seq, or proteomics datasets

Here we present epidecodeR, an R package capable of integrating chemical modification data generated from a host of epigenomic or epitranscriptomic techniques such as ChIP-seq, ATAC-seq, m6A-seq, etc. and dysregulated gene lists in the form of differential gene expression, ribosome occupancy or differential protein translation and identify impact of dysregulation of genes caused due to varying degrees of chemical modifications associated with the genes. epidecodeR generates cumulative distribution function (CDF) plots showing shifts in trend of overall log2FC between genes divided into groups based on the degree of modification associated with the genes. The tool also tests for significance of difference in log2FC between groups of genes.

Implementation steps

  1. Calculate sum of all events associated with each gene

  2. Group genes into user defined groups based on degree (count) of events per gene

  3. Calculate theoretical and empirical cumulative probabilities (quantiles) of log2FC per group

  4. Perform ANOVA one-way test of significance in difference of mean between groups

  5. Plot CDF plots and boxplot with significance testing results

For the impatient

Installation

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("epidecodeR")
library(epidecodeR)

Run example

events<-system.file("extdata", "NOMO-1_ref_peaks.bed", package="epidecodeR")
deg<-system.file("extdata", "FTOi.txt", package="epidecodeR")
epiobj <- epidecodeR(events = events, deg = deg, pval=0.05, param = 3, ints=c(2,4))
makeplot(epiobj, lim = c(-10,10), title = "m6A mediated dysregulation after FTO inhibitor treatment", xlab = "log2FC")
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).

plot_test(epiobj, title = "m6A mediated dysregulation after FTO inhibitor treatment", ylab = "log2FC")