1 ALPS-Introduction

ALPS (AnaLysis routines for ePigenomicS data) is an R package (Venu 2019) that provides tools for analyzing genome-wide epigenomics data, e.g. ChIP-seq and ATAC-seq, and for producing publication-ready visualizations.

1.1 Bigwig files

Bigwig files have evolved into a multi-purpose compressed binary format for storing genome-wide data at base-pair resolution. They are mostly used to store genome-wide quantitative signal from assays such as ChIP-seq, ATAC-seq, WGBS and GRO-seq. The following figure illustrates the important use cases of bigwig files.

1.2 Generate bigwig files

There are multiple ways to generate bigwig files from BAM files, for example with the UCSC kent utils (ucscGenomeBrowser 2019) or with the deeptools bamCoverage function (Ramírez et al. 2014), which is the easiest way. Once the normalized bigwig files are generated and peaks are identified from the BAM files, one would seldom use the BAM files again in the workflow. The requirements of all downstream processes can be satisfied with normalized bigwig files, e.g. quantifying normalized read counts at peaks or promoters, or visualizing enrichments in a genome browser such as IGV.
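As a rough illustration of the bamCoverage route, the sketch below assembles a bamCoverage call from R and runs it only if the tool and the input file are actually present. The file names `sample1.bam` and `sample1.bw`, the helper `bam_coverage_args()`, and the choice of CPM normalization with a 10 bp bin size are assumptions made for this example, not settings prescribed by ALPS.

```r
# Build the argument vector for deeptools' bamCoverage command-line tool.
# bam_coverage_args() is a hypothetical helper written for this example.
bam_coverage_args <- function(bam, bigwig, normalize = "CPM",
                              bin_size = 10, threads = 1) {
  c("--bam", bam,
    "--outFileName", bigwig,
    "--outFileFormat", "bigwig",
    "--normalizeUsing", normalize,
    "--binSize", as.character(bin_size),
    "--numberOfProcessors", as.character(threads))
}

# Placeholder file names; replace with real paths.
cmd_args <- bam_coverage_args("sample1.bam", "sample1.bw")

# Run only when bamCoverage is on the PATH and the BAM file exists;
# otherwise print the command that would have been executed.
if (nzchar(Sys.which("bamCoverage")) && file.exists("sample1.bam")) {
  system2("bamCoverage", args = shQuote(cmd_args))
} else {
  cat("bamCoverage", cmd_args, "\n")
}
```

Using the same normalization method for every sample keeps the resulting bigwig files comparable in downstream steps.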

After the peaks are identified, the immediate next step is to quantify normalized read counts at those peaks. The resulting counts matrix supports exploratory data analysis (EDA), such as PCA and unsupervised clustering, to identify patterns among the samples under consideration and to generate novel biological insights.
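To make the EDA step concrete, here is a minimal base R sketch of PCA and hierarchical clustering on a peaks-by-samples matrix. The matrix is simulated purely for illustration (200 peaks, two made-up groups of three samples); in practice it would hold the normalized read counts quantified at the identified peaks. It does not use any ALPS function.

```r
set.seed(1)

# Simulated signal: 200 peaks (rows) x 6 samples (columns). Values are made up.
n_peaks <- 200
base_signal <- rgamma(n_peaks, shape = 2, rate = 0.1)
counts <- sapply(1:6, function(i) {
  # Samples 4-6 (group B) get doubled signal at the first 50 peaks.
  shift <- if (i > 3) rep(c(2, 1), c(50, 150)) else 1
  base_signal * shift * exp(rnorm(n_peaks, sd = 0.1))
})
colnames(counts) <- paste0(rep(c("A", "B"), each = 3), 1:3)

# Log-transform to stabilize the variance before PCA and clustering.
log_counts <- log2(counts + 1)

# PCA with samples as observations, hence the transpose.
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Hierarchical clustering of samples on 1 - Pearson correlation.
hc <- hclust(as.dist(1 - cor(log_counts)), method = "average")
clusters <- cutree(hc, k = 2)

print(round(var_explained, 3))
print(clusters)
```

Plotting `pca$x[, 1:2]` colored by group, or calling `plot(hc)`, gives a quick view of whether replicates group together.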

1.3 ALPS - workflow

The ALPS package is designed to start from a minimal set of inputs and arrive at a rich source of insights from the data. Most functions in ALPS require, at most, a data table with paths to bigwig files and the associated sample meta information. Various functions use this data table to generate downstream outputs. The package produces publication-quality visualizations, most of which can be customized within R using the ggplot2 ecosystem.
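A data table of this kind might look like the sketch below. The column names (`sample_id`, `group`, `bw_path`), the sample names, and the file paths are assumptions chosen for illustration; the exact columns each ALPS function expects should be checked against the package documentation.

```r
# Hypothetical sample table: one row per bigwig file plus sample meta information.
sample_table <- data.frame(
  sample_id = c("ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2"),
  group     = c("control", "control", "treated", "treated"),
  bw_path   = file.path("bigwigs",
                        c("ctrl_rep1.bw", "ctrl_rep2.bw",
                          "treat_rep1.bw", "treat_rep2.bw")),
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the table to downstream functions.
stopifnot(anyDuplicated(sample_table$sample_id) == 0)

# Report paths that do not exist (all of them here, since the paths are placeholders).
missing_files <- sample_table$bw_path[!file.exists(sample_table$bw_path)]
if (length(missing_files) > 0) {
  message("Missing bigwig files: ", paste(missing_files, collapse = ", "))
}

print(sample_table)
```

Keeping the paths and the meta information in a single table means every downstream step reads the same sample annotation.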

The following is an overview of the ALPS workflow and the available functions.