Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. – Peter J. Park
Quality Assessment of sequencing: FastQC
Peak calling: CCAT, SICER, MACS, ZINBA, BayesPeak, chipseq, ChIPseqR, CSAR, csaw, GenoGAM, iSeq, PICS
Quality assessment: ChIPQC
Peak annotation and pathway analysis: ChIPpeakAnno, Homer, PAVIS, GREAT, ChIPseeker, clusterProfiler, GOstats
Gene Network Building: GeneNetworkBuilder, ChIPXpress
Visualization: trackViewer, IGV, UCSC genome browser, rtracklayer
Motif enrichment analysis: The MEME Suite, homer
The workflow is based primarily on software packages from the open-source Bioconductor project. It contains all steps that are necessary for detecting peaks, starting from the raw read sequences.
Reads are first aligned to the genome using the Rsubread package.
Peaks are called by MACS2.
The peaks are annotated and virualized by ChIPpeakAnno , trackViewer, and GeneNetworkBuilder packages.
We will demonstrate the workflow on a publicly available ChIP-seq dataset to profile YAP/TAZ/TEAD binding sites (Zanconato et al., 2015).
Main functions:
Compare and combine biological replicates (IDRfilter
, findOverlapsOfPeaks
, peakPermTest
)
Annotate to nearest gene (annotatePeakInBatch
)
Pathway Analysis (getEnrichedGO
, getEnrichedPath
)
Separate binding sites into active enhancer regions and promoter regions using histone modification data
Visualize the signal pattern for genomic loci bound by multiple transcription factors
Build regulatory network (buildNetwork
)
Filter regulatory network (filterNetwork
)
Polish regulatory network (polishNetwork
)
Import data (importScore
)
Build gene model (geneModelFromTxdb
)
View the tracks (viewTracks
, browseTracks
)
NCBI GEO ID | SRA project | Description | package |
---|---|---|---|
GSE66081 | SRP055170 | the data of ChIP-seq raw reads for the study of association between YAP/TAZ/TEAD and AP-1 at enhancers drivers oncogenic trowth | SRAdb |
GSE49651 | the H3K27Ac, H3K4me1, H3K4me3 ChIP-seq data in breast cancer cell line | GEOquery | |
GSE59232 | transcriptomic data of Affymetrix GeneChips Human Genome U133 Plus 1.0 | GEOquery |
Here we use SRAdb package to download the sequence data.
## load library SRAdb to retrieve SRA files for GSE66081 dataset
library(SRAdb)
## SRAdb need to download a light sqlite file
## to extract sra file information
## and then download by getSRAfile function.
sqlfile <- "SRAmetadb.sqlite"
if(!file.exists(sqlfile))
sqlfile <- getSRAdbFile()
sra_con <- dbConnect(SQLite(), sqlfile)
conversion <- sraConvert(c("SRP055170"),
sra_con=sra_con)
out <- getSRAfile(conversion$sample,
sra_con=sra_con,
fileType="sra")
## extract file annotations
rs <- listSRAfile(conversion$sample,
sra_con=sra_con,
fileType="sra")
experiment <-
dbGetQuery(sra_con,
paste0("select * from experiment where",
" experiment_accession in ('",
paste(conversion$experiment,
collapse="','"), "')"))
info <- merge(experiment[, c("experiment_accession",
"title", "library_layout")],
rs[, c("experiment", "run")], by=1)
info[1:5, 2:3]
## title library_layout
## 1 GSM1614028: YAP_rep1_ChIPSeq; Homo sapiens; ChIP-Seq SINGLE -
## 2 GSM1614028: YAP_rep1_ChIPSeq; Homo sapiens; ChIP-Seq SINGLE -
## 3 GSM1614028: YAP_rep1_ChIPSeq; Homo sapiens; ChIP-Seq SINGLE -
## 4 GSM1614029: YAP_rep2_ChIPSeq; Homo sapiens; ChIP-Seq SINGLE -
## 5 GSM1614029: YAP_rep2_ChIPSeq; Homo sapiens; ChIP-Seq SINGLE -
These files are downloaded in the SRA format, and need to be unpacked to the FASTQ format prior to alignment. This can be done using the fastq-dump utility from SRA Toolkit.
runs <- info[, "run"]
## extract fastq files from sra by sratoolkit
sapply(paste0(runs, ".sra"),
function(.ele) system(paste("fastq-dump",
.ele)))
Technical replicates are merged together prior to further processing. This reflects the fact that they originate from a single library of DNA fragments.
grp <- do.call(rbind,
strsplit(info$title, "(;|:) "))
info <- cbind(info, grp)
runs <- split(runs, grp[, 2])
group <- gsub("_ChIPSeq", "", names(runs))
runs[1:2]
## $IgG_rep1_ChIPSeq
## [1] "SRR1810913" "SRR1810912" "SRR1810914"
##
## $IgG_rep2_ChIPSeq
## [1] "SRR1810915" "SRR1810917" "SRR1810916"
mapply(function(.ele, n)
system(paste("cat", paste0(.ele, ".fastq", collapse=" "),
">>", paste0(n, ".fastq"), collapse=" ")),
runs, group)
## remove unused files to save storage.
unlink(paste0(info$run, ".fastq"))
unlink(paste0(info$run, ".sra"))
Reads in each library are aligned to the hg19 build of human genome, using the Rsubread package.
library(Rsubread)
fastq.files <- paste0(group, ".fastq")
bam.files <- paste0(group, ".bam")
##getPhredOffset, if the offset is not correct, it will report error.
x <- qualityScores(filename=fastq.files[1],
input_format="FASTQ", offset=33,
nreads=1000)
x[1:3, 1:10]
## 1 2 3 4 5 6 7 8 9 10
## [1,] 34 34 34 37 37 37 35 35 39 39
## [2,] 34 34 34 37 37 37 37 37 39 39
## [3,] 34 34 34 37 37 37 37 37 35 37
library(Rsubread)
reads <- system.file("extdata","reads.txt.gz",package="Rsubread")
head(qualityScores(filename = reads,
input_format = "gzFASTQ",
offset = 33,
nreads = 100)) ## please check the errors
head(qualityScores(filename = reads,
input_format = "gzFASTQ",
offset = 64,
nreads = 100))
An index is constructed with the prefix index/hg19 and the index can be re-used. Here, the type
of sequencing data is set to 1 for genomic DNA-seq data. The uniquely mapped reads should be reported only by setting the uniqe
to TRUE.
## build index and this can be re-used.
hg19.fasta <-
"Genomes/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
dir.create("index")
buildindex(basename="index/hg19",
reference=hg19.fasta)
## alignment
align(index="./index/hg19", readfile1=fastq.files,
type=1,
output_file=bam.files, maxMismatches=2,
nthreads=2,
phredOffset=33, unique=TRUE)
Subsample the data to save time for this tutorial.
library(Rsamtools)
filter <- FilterRules(list(seqn = function(x) x$rname == "chr1"))
## sort and index
null <- sapply(group, function(gp) {
sortBam(paste0(gp, ".bam"), paste0(gp, ".srt"))
file.copy(paste0(gp, ".srt.bam"),
paste0(gp, ".bam"), overwrite = TRUE)
indexBam(paste0(gp, ".bam"))
})
## subset (optional)
null <- sapply(group, function(gp) {
dest <- paste0(gp, ".chr1.bam")
filterBam(paste0(gp, ".bam"), paste0(gp, ".chr1.bam"),
filter=filter)
file.rename(paste0(gp, ".chr1.bam"), paste0(gp, ".bam"))
indexBam(paste0(gp, ".bam"))
})
Potential PCR duplicates are removed by removeDupReads function in Rsubread package. This step can also be done by MarkDuplicates tool from the Picard software suite.
By asBam function from Rsamtools package, the alignment is sorted by their mapping location, and an index created on the destination (with extension ‘.bai’) when indexDestination=TRUE
.
## remove reads which are mapped to identical locations,
## using mapping location of the first base of each read.
## and resort by position
library(Rsamtools)
library(Rsubread)
null <- sapply(group[c(1, 11:13)], function(gp){
out <- asSam(paste0(gp, ".bam"), gp)
removeDupReads(out, threshold=2,
outputFile=paste0(gp, ".rmdup.sam"))
asBam(paste0(gp, ".rmdup.sam"), paste0(gp, ".rmdup"),
indexDestination=TRUE)
unlink(out)
unlink(paste0(gp, ".rmdup.sam"))
})
The control datasets are pooled before calling peaks according the description of the paper.
## pool IgG depend on experiment
IgG.bg <- list(rep1.2=c("IgG_rep1.rmdup.bam",
"IgG_rep2.rmdup.bam",
"IgG_rep1.2.rmdup.bam"),
rep3.4=c("IgG_rep3.rmdup.bam",
"IgG_rep4.rmdup.bam",
"IgG_rep3.4.rmdup.bam"),
rep3.5=c("IgG_rep3.rmdup.bam",
"IgG_rep5.rmdup.bam",
"IgG_rep3.5.rmdup.bam"))
null <- sapply(IgG.bg[3], function(.ele){
out <- mergeBam(files=.ele[-3],
destination = .ele[3],
indexDestination=TRUE)
})
Before we call peaks, we may want to check the difference between ChIP signal and background signal.
By ChIPpeakAnno::cumulativePercentage
function, the difference between the cumulative percentage tag allocation in Input and IP could be clearly checked (Diaz et al., 2012, Ramírez et al., 2016).
library(ChIPpeakAnno)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
pair <- data.frame(treat=c("YAP", "TAZ", "TEAD4", "JUN"),
control=c("1.2", "1.2", "3.4", "3.5"),
stringsAsFactors = FALSE)
## use chromosome 1 to save the time.
chr1 <- as(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)["chr1"], "GRanges")
## check the difference between TEAD4 ChIP and background signal
i <- 3
cumulativePercentage(
bamfiles = c(paste0(pair[i, "treat"],
"_rep1.rmdup.bam"),
paste0("IgG_rep", pair[i, "control"],
".rmdup.bam")),
gr=chr1)
See documentation from deeptools
We set loose filter condition for MACS2 to get more peaks for irreproducibility discovery rate (IDR) analysis.
for(i in seq.int(nrow(pair))){
system(paste("macs2 callpeak --to-large -t",
paste0(pair[i, "treat"], "_rep1.rmdup.bam"),
"-c ", paste0("IgG_rep", pair[i, "control"], ".rmdup.bam"),
" -f BAM -g hs -q 0.1 -n",
paste0(pair[i, "treat"], "_rep1.q1")))
system(paste("macs2 callpeak --to-large -t",
paste0(pair[i, "treat"], "_rep2.rmdup.bam"),
"-c ", paste0("IgG_rep", pair[i, "control"], ".rmdup.bam"),
" -f BAM -g hs -q 0.1 -n",
paste0(pair[i, "treat"], "_rep2.q1")))
## we will use idr to filter the results later
}
After peak calling, the output of MACS2 will be imported into R by ChIPpeakAnno::toGRanges
function. The top 10000 peaks sorted by pValue (FDR) will be used for IDR analysis.
library(ChIPpeakAnno)
macs2.files <- dir(".", pattern="*.q1_peaks.narrowPeak$")
peaks <- sapply(macs2.files, function(.ele)
toGRanges(.ele, format="narrowPeak"))
peaks.bk <- peaks
peaks <- lapply(peaks, function(.ele)
.ele[order(.ele$pValue, decreasing=TRUE)])
toGRanges
library(ChIPpeakAnno)
packagePath <- system.file("extdata", package = "ChIPpeakAnno")
dir(packagePath, "gff|bed|narrowPeak|broadPeak|gz")
toGRanges(file.path(packagePath, "GFF_peaks.gff"), format = "GFF")
toGRanges(file.path(packagePath, "WS220.bed"), format = "BED")
toGRanges(file.path(packagePath, "peaks.narrowPeak"), format = "narrowPeak")
toGRanges(file.path(packagePath, "TAF.broadPeak"), format = "broadPeak")
toGRanges(file.path(packagePath, "wgEncodeUmassDekker5CGm12878PkV2.bed.gz"), format = "BED")
packagePath <- system.file("extdata", "macs", package = "ChIPseqStepByStep")
macs2.files <- dir(packagePath, pattern="*.q1_peaks.narrowPeak$")
peaks <- sapply(macs2.files, function(.ele)
toGRanges(file.path(packagePath, .ele), format="narrowPeak"))
peaks.bk <- peaks
peaks <- lapply(peaks, function(.ele) .ele[order(.ele$pValue, decreasing=TRUE)])
There are lots of noise peaks when we set loose filter condition for MACS2. We will use IDR framework to filter the low reproducible peaks (Landt et al., 2012). Only overlapping peaks will be used for IDR calculation.
peaks <- lapply(peaks, function(.ele) .ele[1:min(1e5, length(.ele))])
lengths(peaks)
## JUN_rep1.q1_peaks.narrowPeak JUN_rep2.q1_peaks.narrowPeak
## 8877 23544
## TAZ_rep1.q1_peaks.narrowPeak TAZ_rep2.q1_peaks.narrowPeak
## 3647 4200
## TEAD4_rep1.q1_peaks.narrowPeak TEAD4_rep2.q1_peaks.narrowPeak
## 6850 1520
## YAP_rep1.q1_peaks.narrowPeak YAP_rep2.q1_peaks.narrowPeak
## 4187 2593
new.group <- gsub("^(.*?)_.*$", "\\1", names(peaks))
peaks.grp <- split(peaks, new.group)
names(peaks.grp)
## find overlapping peaks
GSE66081 <- sapply(peaks.grp, findOverlapsOfPeaks, simplify = FALSE)
GSE66081.bk <- GSE66081
GSE66081 <- sapply(GSE66081, function(.ele)
.ele$peaklist[[names(.ele$peaklist)[grepl("\\/\\/\\/",
names(.ele$peaklist))]]],
simplify=FALSE)
lengths(GSE66081)
## JUN TAZ TEAD4 YAP
## 3226 1136 693 923
The IDR are calculated by the average coverages of each overlapping peak. The reads counts for peaks are done by summarizeOverlaps function in GenomicAlignments package.
library(ChIPpeakAnno)
new.group.names <- lapply(GSE66081.bk, function(.ele)
sub(".q1_peaks.narrowPeak", ".bam",
names(.ele$peaklist)[!grepl("\\/\\/\\/", names(.ele$peaklist))]))
GSE66081 <- mapply(function(.peaks, .bamfiles)
IDRfilter(peaksA=.peaks[[1]], peaksB=.peaks[[2]],
bamfileA=.bamfiles[1], bamfileB=.bamfiles[2],
IDRcutoff=0.01),
peaks.grp, new.group.names)
## The original peaks are filtered to decrease the memory usage.
peaks.keep <- sapply(GSE66081, function(.ele)
as.character(unlist(.ele$peakNames)))
peaks.keep <- lapply(peaks.keep, function(.ele) gsub("^.*__", "", .ele))
peaks.keep <- unlist(peaks.keep)
peaks.s <- lapply(peaks, function(.ele) .ele[names(.ele) %in% peaks.keep])
peaks.unlist <- unlist(GRangesList(peaks.s), use.names = FALSE)
rtracklayer
Export the filtered peak by export function in rtracklayer package.
library(rtracklayer)
null <- mapply(function(.dat, .name)
export(.dat,
sub("^(.*?).q1.*$", "\\1.fil.bed", .name),
format="BED"),
peaks.s, names(peaks.s))
## filter the peaks by qValue < 10^-10 to get similar number peaks
## as filtered by idr to confirm the effect of IDR filter.
peaks.bk <- lapply(peaks.bk, function(.ele)
.ele[.ele$qValue>10])
null <- mapply(function(.dat, .name)
export(.dat,
sub("^(.*?).q1.*$", "\\1.fil.q01.bed", .name),
format="BED"),
peaks.bk, names(peaks.bk))
The ChIP-seq experiment could be checked by ChIPQC package.
Usally, before mapping, FastQC could be used for sequence quality accessment. And after mapping, SAMStat could be used for mapping quality analysis, and strand cross-correlation could be used for checking the experiment quality via csaw package.
## strand cross-correlation
library(csaw)
scc <- lapply(sub("^(.*?).q1.*$", "\\1.rmdup.bam", names(peaks)),
correlateReads, cross=TRUE,
param=readParam(minq=30,
restrict="chr1",
dedup=TRUE))
names(scc) <- gsub(".q1.*$", "", names(peaks))
scc.param <- lapply(scc, function(.ele){
readsLength <- 50
cutline <- 2 * readsLength
read_length <- which.max(.ele[1:cutline])
fragment_length <- which.max(.ele[cutline:length(.ele)]) + cutline - 1
baseline <- which.min(.ele[cutline:length(.ele)]) + cutline - 1
#normalized.strand.coefficient
NSC <- .ele[fragment_length] / .ele[baseline]
# relative strand correlation
RSC <- (.ele[fragment_length] - .ele[baseline])/(.ele[read_length] - .ele[baseline])
c(read_length=read_length-1,
fragment_length=fragment_length-1,
baseline=baseline-1,
NSC=NSC,
RSC=RSC)
})
The normalized strand coefficient (NSC) and relative strand correlation (RSC) are strong metrics for assessing signal-to-noise ratios in a ChIP-seq experiment. High-quality ChIP-seq data sets should have NSC values >= 1.05 and RSC values >= 0.8 (Landt et al., 2012).
do.call(rbind, scc.param)[, c("NSC", "RSC")]
## NSC RSC
## JUN_rep1 3.410316 1.4603898
## JUN_rep2 2.280298 1.1741713
## TAZ_rep1 2.185869 0.8979216
## TAZ_rep2 2.038841 0.8035722
## TEAD4_rep1 2.409461 0.9944640
## TEAD4_rep2 1.889183 0.6928578
## YAP_rep1 2.295337 0.8845790
## YAP_rep2 2.049041 0.8366900
The absolute and relative height of ‘phantom’ peak and ChIP peak are useful determinants of the success of a ChIP-seq experiment.
op <- par(mfrow=c(4, 2))
for(i in 1:length(scc)){
plot(1:length(scc[[i]])-1, scc[[i]],
xlab="Delay (bp)", ylab="cross-correlation",
main=names(scc)[i], type="l",
ylim=c(scc[[i]][scc.param[[i]]["baseline"]]*.8, max(scc[[i]])*1.1))
abline(v=scc.param[[i]]["fragment_length"], col="red", lty=3)
abline(h=scc[[i]][scc.param[[i]][c("fragment_length", "read_length", "baseline")]+1],
col=c("blue", "green", "gray30"), lty=2)
}
Now we have estimated fragment length and can generate track files for UCSC genome browser.
library(GenomicAlignments)
library(rtracklayer)
bwPath <- "bw"
dir.create(bwPath)
cvgs <- mapply(function(.ele, .name){
gal <- readGAlignments(paste0(.name, ".rmdup.bam"))
cvg <- coverage(gal, width = as.numeric(.ele["fragment_length"]))
seqinfo(cvg) <- seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)
export.bw(cvg,
con=file.path(bwPath,
paste0(.name, ".rmdup.bigWig")))
cvg
}, scc.param, names(scc.param))
ChIPQC is used for quality control.
library(ChIPQC)
samples <-
data.frame(SampleID=gsub("^(.*?).q1.*$", "\\1", names(peaks)),
Factor=gsub("^(.*?)_.*$", "\\1", names(peaks)),
Replicate=gsub("^.*?_rep(.*?).q1.*$", "\\1",
names(peaks)),
bamReads=gsub("^(.*?).q1.*$", "\\1.rmdup.bam",
names(peaks)),
Peaks=names(peaks),
PeakCaller="narrow")
samples[1:3, ]
## SampleID Factor Replicate bamReads
## 1 JUN_rep1 JUN 1 JUN_rep1.rmdup.bam
## 2 JUN_rep2 JUN 2 JUN_rep2.rmdup.bam
## 3 TAZ_rep1 TAZ 1 TAZ_rep1.rmdup.bam
## Peaks PeakCaller
## 1 JUN_rep1.q1_peaks.narrowPeak narrow
## 2 JUN_rep2.q1_peaks.narrowPeak narrow
## 3 TAZ_rep1.q1_peaks.narrowPeak narrow
We run ChIPQC for peaks before filter, filtered by IDR and qValue.
## before filter
exampleExp <- ChIPQC(experiment=samples, annotaiton="hg19")
ChIPQCreport(exampleExp, reportFolder="ChIPQC", facetBy="Factor")
## after IDR filter
samples.fil <- samples
samples.fil$Peaks <- gsub("^(.*?).q1.*$", "\\1.fil.bed", names(peaks))
samples.fil$PeakCaller="bed"
exampleExp.fil <- ChIPQC(experiment=samples.fil, annotaiton="hg19")
ChIPQCreport(exampleExp.fil,
reportFolder="ChIPQC.fil",
facetBy="Factor")
## filtered by qValue from the MACS2 result
samples.fil.q01 <- samples.fil
samples.fil.q01$Peaks <-
gsub("^(.*?).q1.*$", "\\1.fil.q01.bed", names(peaks))
exampleExp.fil.q01 <- ChIPQC(experiment=samples.fil.q01,
annotaiton="hg19")
ChIPQCreport(exampleExp.fil.q01,
reportFolder="ChIPQC.fil.q01",
facetBy="Factor")
Test the overlaps of all the peaks. From the testing result, we can confirm the widespread association between YAP, TAZ, TEAD and JUN. The vennDiagram shows the numbers of overlapping peak for these TFs.
ol <- findOverlapsOfPeaks(GSE66081, connectedPeaks="keepAll")
library(ChIPpeakAnno)
packagePath <- system.file("extdata", "filtered", package = "ChIPseqStepByStep")
filtedFiles <- dir(packagePath, ".bed")
GSE66081 <- sapply(filtedFiles, function(.ele){
toGRanges(file.path(packagePath, .ele), format="BED")
})
names(GSE66081) <- sub(".bed", "", names(GSE66081))
lengths(GSE66081)
ol <- findOverlapsOfPeaks(GSE66081)
names(ol)
names(ol$peaklist)
makeVennDiagram(ol)
makeVennDiagram(ol)
## $p.value
## JUN TAZ TEAD4 YAP pval
## [1,] 0 0 1 1 0
## [2,] 0 1 0 1 0
## [3,] 0 1 1 0 0
## [4,] 1 0 0 1 0
## [5,] 1 0 1 0 0
## [6,] 1 1 0 0 0
##
## $vennCounts
## JUN TAZ TEAD4 YAP Counts count.JUN count.TAZ count.TEAD4 count.YAP
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 1 26 0 0 0 26
## [3,] 0 0 1 0 48 0 0 48 0
## [4,] 0 0 1 1 17 0 0 17 17
## [5,] 0 1 0 0 80 0 80 0 0
## [6,] 0 1 0 1 29 0 29 0 29
## [7,] 0 1 1 0 28 0 28 28 0
## [8,] 0 1 1 1 63 0 63 63 63
## [9,] 1 0 0 0 2172 2172 0 0 0
## [10,] 1 0 0 1 73 73 0 0 75
## [11,] 1 0 1 0 44 44 0 44 0
## [12,] 1 0 1 1 23 23 0 23 24
## [13,] 1 1 0 0 187 187 188 0 0
## [14,] 1 1 0 1 255 256 260 0 258
## [15,] 1 1 1 0 51 53 56 53 0
## [16,] 1 1 1 1 412 418 432 417 431
## attr(,"class")
## [1] "VennCounts"
We want to confirm that not only the peaks are overlapped but also their summits are close to each other.
peaks.summit <- lapply(peaks, function(.ele) {
.ele <- shift(.ele, .ele$peak)
width(.ele) <- 1
.ele
})
TEAD.summit <- unlist(GRangesList(peaks.summit[new.group=="TEAD4"]))
TAZ.summit <- unlist(GRangesList(peaks.summit[new.group=="TAZ"]))
bof <- binOverFeature(TEAD.summit,
annotationData=TAZ.summit,
radius=300, nbins=300, FUN=length)
plot(as.numeric(rownames(bof)), bof,
ylab="Peak density",
xlab="Distance to the summit of TAZ peaks",
main="Position of TEAD4 peak summit",
type="l", xlim=c(-300, 300))
ChIPpeakAnno
The functions, toRanges
, annotatePeakInBatch
, and addGeneIDs
in the ChIPpeakAnno, the annotation of ChIP-Seq peaks becomes streamlined into four major steps:
Read peak data with toGRanges
Generate annotation data with toGRanges
Annotate peaks with annotatePeakInBatch
Add additional informations with addGeneIDs
The method toGRanges
can be used to construct GRanges object from various annotation format such as BED, GFF, user defined readable text files, EnsDb or TxDb object.
Please type ?toGRanges
for more information.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
annoData <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene, feature="gene")
annoData[1]
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 1 chr19 [58858172, 58874214] -
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
library(EnsDb.Hsapiens.v75)
toGRanges(EnsDb.Hsapiens.v75)
toGRanges(EnsDb.Hsapiens.v75, feature = "transcript")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
annoData <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene)
toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene, feature = "geneModel")
toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene, feature = "threeUTR")
annotatePeakInBatch
Then we use annotatePeakInBatch or annoPeaks function in ChIPpeakAnno package to annotate the peaks. Depend on the experiment, we can annotate the peaks in multiple ways.
overlaps <- ol$peaklist[["TAZ///TEAD4///YAP"]]
## annotate the peaks by nearest annotations.
YAP.TAZ.TEAD4.nearest <-
annotatePeakInBatch(overlaps,
AnnotationData=annoData)
## annotate the peaks by both side in a given range.
YAP.TAZ.TEAD4.2side <-
annotatePeakInBatch(overlaps,
AnnotationData=annoData,
output="nearestBiDirectionalPromoters",
bindingRegion=c(-100000, 100000))
overlaps <- ol$peaklist[["TAZ///TEAD4///YAP"]]
YAP.TAZ.TEAD4.nearest <-
annotatePeakInBatch(overlaps,
AnnotationData=annoData)
YAP.TAZ.TEAD4.promoter <-
annotatePeakInBatch(overlaps,
AnnotationData=annoData,
output="nearestBiDirectionalPromoters",
bindingRegion=c(-10000, 2000))
pie1(table(YAP.TAZ.TEAD4.2side$insideFeature))
addGeneIDs
library(org.Hs.eg.db)
YAP.TAZ.TEAD4.nearest.anno <-
addGeneIDs(annotatedPeak = YAP.TAZ.TEAD4.nearest,
orgAnn = org.Hs.eg.db,
feature_id_type = "entrez_id")
YAP.TAZ.TEAD4.nearest.anno[1:3]
## GRanges object with 3 ranges and 11 metadata columns:
## seqnames ranges strand |
## <Rle> <IRanges> <Rle> |
## X01.23254 chr1 [14812601, 14813726] * |
## X02.7709 chr1 [16249621, 16250159] * |
## X03.23400 chr1 [17339107, 17339960] * |
## peakNames peak
## <CharacterList> <character>
## X01.23254 TAZ__olp0130,TEAD4__olp0074,YAP__olp0097 01
## X02.7709 TEAD4__olp0089,YAP__olp0120,TAZ__olp0155 02
## X03.23400 YAP__olp0137,TAZ__olp0184,TEAD4__olp0105 03
## feature start_position end_position feature_strand
## <character> <integer> <integer> <character>
## X01.23254 23254 14925213 15444544 +
## X02.7709 7709 16268364 16302627 -
## X03.23400 23400 17312453 17338423 -
## insideFeature distancetoFeature shortestDistance
## <factor> <numeric> <integer>
## X01.23254 upstream -112612 111487
## X02.7709 downstream 53006 18205
## X03.23400 upstream -684 684
## fromOverlappingOrNearest symbol
## <character> <character>
## X01.23254 NearestLocation KAZN
## X02.7709 NearestLocation ZBTB17
## X03.23400 NearestLocation ATP13A2
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
library(org.Hs.eg.db)
YAP.TAZ.TEAD4.promoter.anno <-
addGeneIDs(annotatedPeak = YAP.TAZ.TEAD4.promoter,
orgAnn = org.Hs.eg.db,
feature_id_type = "entrez_id")
The annotation results can be saved in XLS file format using WriteXLS package to avoid the gene name errors that can be inadvertently introduced when opened by Excel (Zeeberg et al., 2004).
## save annotations
YAP.TAZ.TEAD4.2side.m <- as.data.frame(unname(YAP.TAZ.TEAD4.2side))
YAP.TAZ.TEAD4.2side.m$peakNames <- NULL
library(WriteXLS)
WriteXLS("YAP.TAZ.TEAD4.2side.m",
"YAP.TAZ.TEAD4.overlapping.peaks.anno.xls")
With the annotated peak data, you can call the function getEnrichedGO
to obtain a list of enriched GO terms. Please note that the default setting of feature_id_type
is “ensembl_gene_id”. If you are using TxDb as annotation data, you need to change it to “entrez_id”.
## enrichment analysis
library(org.Hs.eg.db)
over <- getEnrichedGO(YAP.TAZ.TEAD4.2side,
orgAnn="org.Hs.eg.db",
feature_id_type="entrez_id",
maxP=.01, condense=TRUE)
over[["bp"]][1:3, -c(3, 10)]
## go.id go.term Ontology count.InDataset
## 1 GO:0002252 immune effector process BP 8
## 2 GO:0006879 cellular iron ion homeostasis BP 2
## 3 GO:0006882 cellular zinc ion homeostasis BP 2
## count.InGenome pvalue totaltermInDataset totaltermInGenome
## 1 1169 0.008433204 3535 1462775
## 2 48 0.006117350 3535 1462775
## 3 25 0.001687957 3535 1462775
dim(over[["bp"]])
## [1] 15 10
For pathway analysis, you can call function getEnrichedPATH
using reactome or KEGG database.
library(reactome.db)
path <- getEnrichedPATH(YAP.TAZ.TEAD4.2side,
orgAnn="org.Hs.eg.db",
pathAnn="reactome.db",
feature_id_type="entrez_id",
maxP=.05)
path[1:2, ]
## path.id EntrezID count.InDataset count.InGenome pvalue
## 1 R-HSA-165054 1104 1 33 0.04744971
## 2 R-HSA-166663 1401 1 23 0.03331229
## totaltermInDataset totaltermInGenome PATH
## 1 159 108031 NA
## 2 159 108031 NA
Bidirectional promoters are the DNA regions located between the 5’ ends of two adjacent genes coded on opposite strands. The two adjacent genes are transcribed to the opposite directions, and often co-regulated by this shared promoter region(Robertson et al., 2007). Here is an example to find peaks with bi-directional promoters and output the percentage of the peaks near bi-directional promoters.
bdp <- peaksNearBDP(overlaps, annoData, MaxDistance=100000)
bdp
## $peaksWithBDP
## GRangesList object of length 9:
## $2
## GRanges object with 2 ranges and 9 metadata columns:
## seqnames ranges strand |
## <Rle> <IRanges> <Rle> |
## X2 chr1 [16249621, 16250159] * |
## X2 chr1 [16249621, 16250159] * |
## peakNames bdp_idx peak
## <CharacterList> <integer> <character>
## X2 TEAD4__olp0089,YAP__olp0120,TAZ__olp0155 2 X2
## X2 TEAD4__olp0089,YAP__olp0120,TAZ__olp0155 2 X2
## feature feature.ranges feature.strand distance
## <character> <IRanges> <Rle> <integer>
## X2 149563 [16330731, 16333184] + 80571
## X2 729614 [16160710, 16174642] - 74978
## insideFeature distanceToSite
## <factor> <integer>
## X2 upstream 80571
## X2 upstream 74978
##
## $4
## GRanges object with 2 ranges and 9 metadata columns:
## seqnames ranges strand |
## X4 chr1 [17516131, 17517419] * |
## X4 chr1 [17516131, 17517419] * |
## peakNames bdp_idx peak feature
## X4 TEAD4__olp0109,TAZ__olp0187,YAP__olp0140 4 X4 11240
## X4 TEAD4__olp0109,TAZ__olp0187,YAP__olp0140 4 X4 29943
## feature.ranges feature.strand distance insideFeature
## X4 [17393256, 17445948] - 70182 upstream
## X4 [17531621, 17572501] + 14201 upstream
## distanceToSite
## X4 70182
## X4 14201
##
## $22
## GRanges object with 2 ranges and 9 metadata columns:
## seqnames ranges strand |
## X22 chr1 [85666766, 85667899] * |
## X22 chr1 [85666766, 85667899] * |
## peakNames bdp_idx peak feature
## X22 YAP__olp0713,TEAD4__olp0529,TAZ__olp0876 22 X22 646626
## X22 YAP__olp0713,TEAD4__olp0529,TAZ__olp0876 22 X22 84144
## feature.ranges feature.strand distance insideFeature
## X22 [85742041, 85865646] + 74141 upstream
## X22 [85623356, 85666728] - 37 upstream
## distanceToSite
## X22 74141
## X22 37
##
## ...
## <6 more elements>
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## $percentPeaksWithBDP
## [1] 0.1428571
##
## $n.peaks
## [1] 63
##
## $n.peaksWithBDP
## [1] 9
The distribution of peaks over exon, intron, enhancer, proximal promoter, 5’UTR and 3’UTR could give you some clues of how to annotate the peaks. The distribution can be summarized in the peak centric or nucleotide centric view using the function assignChromosomeRegion in ChIPpeakAnno package.
seqlevelsStyle(overlaps) ## == UCSC
aCR<-assignChromosomeRegion(overlaps, nucleotideLevel=FALSE,
TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene)
par(mar=c(10.1, 4.1, 4.1, 2.1))
barplot(aCR$percentage, las=3)
Owing to most of the peaks are in the intergenic regions, we questioned whether most YAP/TAZ/TEAD common peaks are located in enhancers. Enhancers can be distinguished from promoters by their epigenetic features, that is, relative enrichment of histone H3 monomethylation (H3K4me1) on Lys4 at enhancers, and trimethylation (H3K4me3) at promoters. We first determine the distribution of the distance between the peaks and known genes. And then check the correlation of peaks and enhancers.
dist2TSS <-
lapply(list(YAP=GSE66081[["YAP"]],
TAZ=GSE66081[["TAZ"]],
TEAD4=GSE66081[["TEAD4"]],
YAP.TAZ.TEAD4=ol$peaklist[["TAZ///TEAD4///YAP"]]),
function(.ele) {
.ele <- annotatePeakInBatch(.ele,
AnnotationData=annoData,
output="nearestLocation")
.ele$shortestDistance})
dist2TSS.cut <- lapply(dist2TSS, cut, breaks=c(0, 1e3, 1e4, 1e5, 1e10))
dist2TSS.table <- sapply(dist2TSS.cut, table)
dist2TSS.percentage <- apply(dist2TSS.table, 2,
function(.ele) .ele/sum(.ele))
library(ggplot2)
library(reshape2)
dist2TSS.percentage <- melt(dist2TSS.percentage)
dist2TSS.percentage$value <- dist2TSS.percentage$value * 100
ggplot(dist2TSS.percentage, aes(x=Var2, y=value, fill=Var1)) +
geom_bar(stat="identity") +
xlab("") + ylab("%") + labs(title="Distance to TSS") + theme_bw() +
scale_fill_manual(name="",
values=c("#FF0000FF", "#FF0000BB",
"#FF000077", "#FF000033"),
labels=c("<1kb", "1-10kb", "10-100kb", ">100kb"))
Because over 70% of the peaks are far from known TSS (more than 10Kb), we want to confirm the peaks could be linked to active enhancer region. The ChIP-seq data of epigenetic marks of same cell line are availble in GSE49651.
We use GEOquery package to download data from GEO. We can also download the data by SRAdb package and call peaks from raw reads. Here, we use the peaks saved in the GSE49651.
library(GEOquery)
getGEOSuppFiles("GSE49651")
epipeaks.files <- dir("GSE49651", full.names=TRUE)
library(ChIPpeakAnno)
epipeaks <- lapply(epipeaks.files[3:5],
function(.ele) toGRanges(gzfile(.ele), format="BED"))
names(epipeaks) <-
gsub("GSE49651_MDAMB231.(.*?).hg19.*._peaks.txt.gz", "\\1",
basename(epipeaks.files)[3:5])
The presence of H3K4me1 and H3K4me3 peaks, their genomic locations and their overlap can be used as the criteria to define promoters and enhancers. H3K4me3 peaks not overlapping with H3K4me1 peaks and close to a TSS (5kb) are defined as promoters, as NA otherwise; H3K4me1 peaks not overlapping with H3K4me3 peaks are defined as enhancers. Promoters or enhancers are defined as active if overlapping with H3K27ac peaks.
promoter.UCSC <- promoters(TxDb.Hsapiens.UCSC.hg19.knownGene, 5e3, 5e3)
promoter.UCSC <- unique(promoter.UCSC)
attach(epipeaks)
ol.promoter <- findOverlapsOfPeaks(H3K4me1, H3K4me3, H3K27Ac,
promoter.UCSC,
ignore.strand = FALSE)
promoter <- c(ol.promoter$peaklist[["H3K4me3///H3K27Ac///promoter.UCSC"]],
ol.promoter$peaklist[["H3K4me1///H3K4me3///H3K27Ac///promoter.UCSC"]])
enhancer.active <- ol.promoter$peaklist[["H3K4me1///H3K27Ac"]]
enhancer.inactive <- ol.promoter$peaklist[["H3K4me1"]]
mcols(enhancer.active) <- DataFrame(type="enhancer.active")
mcols(enhancer.inactive) <- DataFrame(type="enhancer.inactive")
mcols(promoter) <- DataFrame(type="promoter")
types <- c(promoter, enhancer.active, enhancer.inactive)
YAP.TAZ.TEAD <- ol$peaklist[["TAZ///TEAD4///YAP"]]
seqlevelsStyle(YAP.TAZ.TEAD) <- "UCSC"
YAP.TAZ.TEAD <- YAP.TAZ.TEAD[seqnames(YAP.TAZ.TEAD) %in%
seqlevels(types)]
YAP.TAZ.TEAD4.type <- findOverlaps(YAP.TAZ.TEAD, types)
tbl <- table(types[subjectHits(YAP.TAZ.TEAD4.type)]$type)
tbl["not.classified"] <- length(YAP.TAZ.TEAD) -
length(unique(queryHits(YAP.TAZ.TEAD4.type)))
pie1(tbl)
The binding pattern could be visualized and compared by heatmap and distribution curve from the binding ranges of overlapping peaks of target TFs. For big bedgraph files, bedtools are used to decrease the file size before importing into R.
##heatmap
YAP.TAZ.TEAD.assigned <- YAP.TAZ.TEAD[queryHits(YAP.TAZ.TEAD4.type)]
YAP.TAZ.TEAD.assigned$type <- types[subjectHits(YAP.TAZ.TEAD4.type)]$type
YAP.TAZ.TEAD.assigned <- c(YAP.TAZ.TEAD.assigned[YAP.TAZ.TEAD.assigned$type=="enhancer.active"], YAP.TAZ.TEAD.assigned[YAP.TAZ.TEAD.assigned$type=="promoter"])
YAP.TAZ.TEAD.assigned <- unique(YAP.TAZ.TEAD.assigned)
library(rtracklayer)
YAP.TAZ.TEAD.assigned.center <-
reCenterPeaks(YAP.TAZ.TEAD.assigned, width=1)
YAP.TAZ.TEAD.assigned.reCenter <-
reCenterPeaks(YAP.TAZ.TEAD.assigned, width=2000)
untar("GSE49651/GSE49651_RAW.tar")
sapply(c("GSM1204470_MDAMB231.H3K4me1_1.hg19.tags.bedGraph.gz",
"GSM1204472_MDAMB231.H3K4me3_1.hg19.tags.bedGraph.gz",
"GSM1204474_MDAMB231.H3K27Ac_1.hg19.tags.bedGraph.gz"), gunzip)
bams <- c("YAP", "TAZ", "TEAD4", "JUN")
library(BSgenome.Hsapiens.UCSC.hg19)
len <- seqlengths(Hsapiens)
write.table(len, file="hg19.genome.txt",
quote=FALSE, col.names=FALSE,
sep="\t")
sapply(bams, function(.ele){
system(paste0("genomeCoverageBed -bga -split -ibam ",
.ele, "_rep1.rmdup.bam -g hg19.genome.txt > ",
.ele, "_rep1.bedGraph"))
})
files <- c(YAP="YAP_rep1.bedGraph",
TAZ="TAZ_rep1.bedGraph",
TEAD4="TEAD4_rep1.bedGraph",
JUN="JUN_rep1.bedGraph",
H3K4me1="GSM1204470_MDAMB231.H3K4me1_1.hg19.tags.bedGraph",
H3K4me3="GSM1204472_MDAMB231.H3K4me3_1.hg19.tags.bedGraph",
H3K27Ac="GSM1204474_MDAMB231.H3K27Ac_1.hg19.tags.bedGraph")
export(YAP.TAZ.TEAD.assigned.reCenter, "tmp.bed", format="BED")
sapply(files, function(.ele)
system(paste("intersectBed -a", .ele, "-b tmp.bed >",
gsub("bedGraph", "sub.bedGraph", .ele))))
unlink("tmp.bed")
library(trackViewer)
cvglists <- sapply(gsub("bedGraph", "sub.bedGraph", files),
importData, format="bedGraph")
names(cvglists) <- names(files)
sig <- featureAlignedSignal(cvglists, YAP.TAZ.TEAD.assigned.center,
upstream=1000, downstream=1000)
names(sig)
## [1] "YAP" "TAZ" "TEAD4" "JUN" "H3K4me1" "H3K4me3" "H3K27Ac"
heatmap <- featureAlignedHeatmap(sig, YAP.TAZ.TEAD.assigned.center,
upstream=1000, downstream=1000,
annoMcols=c("type"),
margin=c(.1, .01, .15, .25))
We first get the sequences of the peaks and then summarize the enriched oligos.
library(BSgenome.Hsapiens.UCSC.hg19)
YAP.TAZ.TEAD4.uniq <- unique(YAP.TAZ.TEAD4.nearest)
YAP.TAZ.TEAD4.uniq <- YAP.TAZ.TEAD4.uniq[!is.na(YAP.TAZ.TEAD4.uniq$feature)]
strand(YAP.TAZ.TEAD4.uniq) <- as.character(YAP.TAZ.TEAD4.uniq$feature_strand)
seq <- getAllPeakSequence(YAP.TAZ.TEAD4.uniq, upstream=0, downstream=0,
genome=Hsapiens)
freqs <- oligoFrequency(Hsapiens$chr1, MarkovOrder=3)
os <- oligoSummary(seq, oligoLength=6, MarkovOrder=3,
quickMotif=TRUE, freqs=freqs, revcomp=TRUE)
zscore <- sort(os$zscore)
h <- hist(zscore, breaks=100)
text(zscore[length(zscore)], max(h$counts)/10,
labels=names(zscore[length(zscore)]), adj=1)
library(motifStack)
pfms <- mapply(function(.ele, id)
new("pfm", mat=.ele, name=paste("SAMPLE motif", id)),
os$motifs, 1:length(os$motifs))
motifStack(pfms)
library(BSgenome.Hsapiens.UCSC.hg19)
YAP.TAZ.TEAD4.uniq <- unique(YAP.TAZ.TEAD4.nearest)
YAP.TAZ.TEAD4.uniq <- YAP.TAZ.TEAD4.uniq[!is.na(YAP.TAZ.TEAD4.uniq$feature)]
strand(YAP.TAZ.TEAD4.uniq) <- as.character(YAP.TAZ.TEAD4.uniq$feature_strand)
seq <- getAllPeakSequence(YAP.TAZ.TEAD4.uniq, upstream=0, downstream=0, genome=Hsapiens)
freqs <- oligoFrequency(Hsapiens$chr1, MarkovOrder=3)
os <- oligoSummary(seq, oligoLength=6, MarkovOrder=3,
quickMotif=TRUE, freqs=freqs, revcomp=TRUE)
zscore <- sort(os$zscore)
h <- hist(zscore, breaks=100)
text(zscore[length(zscore)], max(h$counts)/10,
labels=names(zscore[length(zscore)]), adj=1)
library(motifStack)
pfms <- mapply(function(.ele, id)
new("pfm", mat=.ele, name=paste("SAMPLE motif", id)),
os$motifs, 1:length(os$motifs))
motifStack(pfms)
The oligoSummary is a quick tool to calculate the enriched k-mers in the peaks. To get accurate motifs, multiple softwares could be used, such as the MEME Suite(Bailey et al., 1994), Consensus(Hertz et al., 1990), rGADEM, Homer, and et. al.
ap1 <- GSE66081[["JUN"]]
seq.ap1 <- getAllPeakSequence(ap1, upstream=0, downstream=0, genome=Hsapiens)
os.ap1 <- oligoSummary(seq.ap1, oligoLength=7, MarkovOrder=3,
quickMotif=FALSE, freqs=freqs, revcomp=TRUE)
zscore.ap1 <- sort(os.ap1$zscore)
h.ap1 <- hist(zscore.ap1, breaks=100)
text(zscore.ap1[length(zscore.ap1)], max(h.ap1$counts)/10,
labels=names(zscore.ap1[length(zscore.ap1)]), adj=1)
Intersecting genes with promoter regions bound by YAP/TAZ from the ChIP experiment and genes differential expressed in knockdown of YAP/TAZ allow us to identify genes directly/indirectly regulated by YAP/TAZ by GeneNetworkBuilder
package.
Gene expression data are downloaded from GSE59232 by GEOquery
package.
library(GEOquery)
library(limma)
gset <- getGEO("GSE59232", GSEMatrix =TRUE, AnnotGPL=TRUE)[[1]]
pD <- pData(gset)[, c("title", "geo_accession")]
gset <- gset[, grepl("MDA-MB-231 cells", pD$title, fixed = TRUE)]
pD <- pData(gset)[, c("title", "geo_accession")]
ex <- exprs(gset) ## log2 transformed expression profile
Affymetrix microarray will be analyzed by limma
package.
fl <- do.call(rbind,
strsplit(sub("MDA-MB-231 cells ",
"", pD$title), ", "))
sml <- sub("#.$", "", fl[, 1])
design.table <- data.frame(group=sml,
batch=fl[, 1],
replicate=fl[, 2])
design <- model.matrix(~ -1 + group + batch + replicate,
data = design.table)
colnames(design) <- sub("group", "",
make.names(colnames(design)))
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(contrasts = "siYAP.TAZ-siControl",
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
tT <- topTable(fit2, adjust="fdr", number = nrow(fit2))
Using interaction data from BioGRID saved in simpIntLists
package, combined with the direct targets of TAZ, buildNetwork
function will build a regulation network. And the network will be filtered by filterNetwork
function by the expression profile.
library(simpIntLists)
interactionmap <- findInteractionList("human", "Official")
interactionmap <- lapply(interactionmap, function(.ele)
cbind(from=as.character(.ele$name),
to=as.character(.ele$interactors)))
interactionmap <- do.call(rbind, interactionmap)
library(GeneNetworkBuilder)
rootgene <- "TAZ"
TFbindingTable <-
cbind(from=rootgene,
to=unique(YAP.TAZ.TEAD4.nearest.anno$symbol))
sifNetwork <-
buildNetwork(TFbindingTable=TFbindingTable,
interactionmap=interactionmap, level=2)
tT$symbols <- tT$Gene.symbol
tT <- tT[, c("ID", "logFC", "P.Value", "symbols")]
expressionData <- tT[!duplicated(tT[, "symbols"]), ]
cifNetwork <-
filterNetwork(rootgene=rootgene, sifNetwork=sifNetwork,
exprsData=expressionData, mergeBy="symbols",
miRNAlist=character(0),
tolerance=1, cutoffPVal=0.001,
cutoffLFC=1.5)
## polish network
gR<-polishNetwork(cifNetwork)
## plot by Rgraphviz
library(Rgraphviz)
plotNetwork<-function(gR, layouttype="dot", ...){
if(!is(gR,"graphNEL")) stop("gR must be a graphNEL object")
if(!(GeneNetworkBuilder:::inList(layouttype, c("dot", "neato", "twopi", "circo", "fdp")))){
stop("layouttype must be dot, neato, twopi, circo or fdp")
}
g1<-Rgraphviz::layoutGraph(gR, layoutType=layouttype, ...)
nodeRenderInfo(g1)$col <- nodeRenderInfo(gR)$col
nodeRenderInfo(g1)$fill <- nodeRenderInfo(gR)$fill
renderGraph(g1)
}
plotNetwork(gR)
browseNetwork(gR)
trackViewer
The peaks could be visualized by trackViewer package.
library(trackViewer)
## prepare ranges to view
gene <- "SYDE2"
eID <- mget(gene, org.Hs.egSYMBOL2EG)
gr <- as(annoData[eID[[1]]], "GRanges")
gr.promoter <- promoters(gr, upstream=5000, downstream=2000)
seqlevels(gr.promoter) <- "chr1"
seqinfo(gr.promoter) <- seqinfo(gr.promoter)["chr1"]
## import data
bams.rep1 <- sub(".bam", ".rmdup.bam",
bam.files[grepl("rep1", bam.files)])
syde <- lapply(bams.rep1, importBam, ranges=gr.promoter)
names(syde) <- gsub("_rep1.rmdup.bam", "", bams.rep1)
## get gene model
trs <- geneModelFromTxdb(TxDb.Hsapiens.UCSC.hg19.knownGene,
orgDb = org.Hs.eg.db, gr = gr.promoter)
trs.names <- sapply(trs, function(.ele) .ele@name)
trs <- trs[match(gene, trs.names)]
names(trs) <- gene
## modify the plot styles
optSty <- optimizeStyle(trackList(syde, trs, heightDist=c(.9, .1)),
theme="col")
trackList <- optSty$tracks
viewerStyle <- optSty$style
for(i in 1:length(trackList))
setTrackStyleParam(trackList[[i]], "ylabgp", list(cex=.8))
## plot
vp <- viewTracks(trackList, gr=gr.promoter, viewerStyle=viewerStyle)
addGuideLine(guideLine = c(85666950, 85667700), vp = vp)
Too complicated? Try browseTracks
library(trackViewer)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ChIPseqStepByStep)
(ex.gr <- parse2GRanges("chr1:85664729-85671728:-"))
## get gene model
trs <- geneModelFromTxdb(TxDb.Hsapiens.UCSC.hg19.knownGene,
orgDb = org.Hs.eg.db, gr = ex.gr)
## import signals
packagePath <- system.file("extdata", "bedGraph", package = "ChIPseqStepByStep")
(ex.bg <- dir(packagePath, "bedGraph"))
ex.scores <- sapply(ex.bg, function(.ele){
importScore(file.path(packagePath, .ele),
format = "bedGraph",
ranges = ex.gr)
})
## browse the tracks
browseTracks(trackList(ex.scores, trs, heightDist = c(3/4, 1/4)), gr=ex.gr)
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.21.0
## [2] plotly_4.7.1
## [3] simpIntLists_1.13.0
## [4] limma_3.33.7
## [5] png_0.1-7
## [6] trackViewer_1.13.8
## [7] knitr_1.16
## [8] motifStack_1.21.1
## [9] ade4_1.7-6
## [10] MotIV_1.33.0
## [11] grImport_0.9-0
## [12] XML_3.98-1.9
## [13] BSgenome.Hsapiens.UCSC.hg19_1.4.0
## [14] BSgenome_1.45.1
## [15] reactome.db_1.59.1
## [16] org.Hs.eg.db_3.4.1
## [17] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [18] GenomicFeatures_1.29.8
## [19] AnnotationDbi_1.39.2
## [20] csaw_1.11.1
## [21] BiocParallel_1.11.4
## [22] ChIPQC_1.13.1
## [23] DiffBind_2.5.8
## [24] ggplot2_2.2.1
## [25] idr_1.2
## [26] GenomicAlignments_1.13.4
## [27] SummarizedExperiment_1.7.5
## [28] DelayedArray_0.3.17
## [29] matrixStats_0.52.2
## [30] Biobase_2.37.2
## [31] rtracklayer_1.37.3
## [32] GeneNetworkBuilder_1.19.1
## [33] Rcpp_0.12.12
## [34] ChIPpeakAnno_3.11.4
## [35] VennDiagram_1.6.17
## [36] futile.logger_1.4.3
## [37] Rsamtools_1.29.0
## [38] Biostrings_2.45.3
## [39] XVector_0.17.0
## [40] GenomicRanges_1.29.12
## [41] GenomeInfoDb_1.13.4
## [42] IRanges_2.11.12
## [43] S4Vectors_0.15.5
## [44] Rsubread_1.27.4
## [45] reshape2_1.4.2
## [46] SRAdb_1.35.0
## [47] RCurl_1.95-4.8
## [48] bitops_1.0-6
## [49] graph_1.55.0
## [50] BiocGenerics_0.23.0
## [51] RSQLite_2.0
##
## loaded via a namespace (and not attached):
## [1] htmlwidgets_0.9
## [2] BatchJobs_1.6
## [3] munsell_0.4.3
## [4] systemPipeR_1.11.0
## [5] colorspace_1.3-2
## [6] BiocInstaller_1.27.2
## [7] Category_2.43.1
## [8] labeling_0.3
## [9] GenomeInfoDbData_0.99.1
## [10] hwriter_1.3.2
## [11] bit64_0.9-7
## [12] pheatmap_1.0.8
## [13] fail_1.3
## [14] rprojroot_1.2
## [15] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
## [16] lambda.r_1.1.9
## [17] biovizBase_1.25.1
## [18] regioneR_1.9.1
## [19] R6_2.2.2
## [20] locfit_1.5-9.1
## [21] AnnotationFilter_1.1.3
## [22] assertthat_0.2.0
## [23] scales_0.4.1
## [24] nnet_7.3-12
## [25] gtable_0.2.0
## [26] ensembldb_2.1.10
## [27] seqLogo_1.43.0
## [28] rlang_0.1.1
## [29] genefilter_1.59.0
## [30] BBmisc_1.11
## [31] splines_3.4.1
## [32] lazyeval_0.2.0
## [33] acepack_1.4.1
## [34] GEOquery_2.43.0
## [35] dichromat_2.0-0
## [36] brew_1.0-6
## [37] checkmate_1.8.3
## [38] yaml_2.1.14
## [39] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
## [40] backports_1.1.0
## [41] httpuv_1.3.5
## [42] Hmisc_4.0-3
## [43] RBGL_1.53.0
## [44] tools_3.4.1
## [45] gplots_3.0.1
## [46] RColorBrewer_1.1-2
## [47] plyr_1.8.4
## [48] base64enc_0.1-3
## [49] progress_1.1.2
## [50] zlibbioc_1.23.0
## [51] purrr_0.2.2.2
## [52] prettyunits_1.0.2
## [53] rpart_4.1-11
## [54] pbapply_1.3-3
## [55] chipseq_1.27.0
## [56] ggrepel_0.6.5
## [57] cluster_2.0.6
## [58] magrittr_1.5
## [59] data.table_1.10.4
## [60] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
## [61] futile.options_1.0.0
## [62] amap_0.8-14
## [63] ProtGenerics_1.9.0
## [64] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
## [65] mime_0.5
## [66] evaluate_0.10.1
## [67] xtable_1.8-2
## [68] gridExtra_2.2.1
## [69] compiler_3.4.1
## [70] biomaRt_2.33.3
## [71] tibble_1.3.3
## [72] KernSmooth_2.23-15
## [73] htmltools_0.3.6
## [74] GOstats_2.43.0
## [75] Formula_1.2-2
## [76] tidyr_0.6.3
## [77] sendmailR_1.2-1
## [78] DBI_0.7
## [79] MASS_7.3-47
## [80] rGADEM_2.25.0
## [81] ShortRead_1.35.1
## [82] Matrix_1.2-10
## [83] gdata_2.18.0
## [84] Gviz_1.21.1
## [85] bindr_0.1
## [86] pkgconfig_2.0.1
## [87] revealjs_0.9
## [88] foreign_0.8-69
## [89] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2
## [90] annotate_1.55.0
## [91] multtest_2.33.0
## [92] AnnotationForge_1.19.4
## [93] stringr_1.2.0
## [94] VariantAnnotation_1.23.6
## [95] digest_0.6.12
## [96] rmarkdown_1.6
## [97] htmlTable_1.9
## [98] edgeR_3.19.3
## [99] GSEABase_1.39.0
## [100] curl_2.8.1
## [101] shiny_1.0.3
## [102] gtools_3.5.0
## [103] rjson_0.2.15
## [104] jsonlite_1.5
## [105] bindrcpp_0.2
## [106] seqinr_3.3-6
## [107] viridisLite_0.2.0
## [108] lattice_0.20-35
## [109] Nozzle.R1_1.1-1
## [110] Rhtslib_1.9.1
## [111] httr_1.2.1
## [112] survival_2.41-3
## [113] GO.db_3.4.1
## [114] interactiveDisplayBase_1.15.0
## [115] glue_1.1.1
## [116] bit_1.1-12
## [117] stringi_1.1.5
## [118] blob_1.1.0
## [119] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
## [120] AnnotationHub_2.9.5
## [121] latticeExtra_0.6-28
## [122] caTools_1.17.1
## [123] memoise_1.1.0
## [124] dplyr_0.7.2
We would like to thank the support from the Department of Molecular, Cell and Cancer Biology (MCCB) at UMass Medical School (UMMS). We thank the early adopters who provided great ideas and feedbacks to enhance the features of the software.