TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

Tiago Chedraoui Silva1*, Antonio Colaprico2,3**, Catharina Olsen2,3***, Fulvio D’Angelo4,5****, Gianluca Bontempi3*****, Michele Ceccarelli6,7****** and Houtan Noushmehr1,8*******

1Department of Genetics, Ribeirao Preto Medical School, University of Sao Paulo, Ribeirao Preto, Brazil
2Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
3Machine Learning Group (MLG), Université Libre de Bruxelles, Brussels, Belgium
4Institute for Cancer Genetics, Columbia University Medical Center, New York, New York 10032, USA.
5BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Campo Reale, 83031 Ariano Irpino, Italy.
6Department of Science and Technology, University of Sannio, Benevento, Italy.
7BIOGEM Istituto di Ricerche Genetiche "G. Salvatore", Ariano Irpino, Italy.
8Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, USA

*tiagochst@gmail.com
**antonio.colaprico@gmail.com
***colsen@ulb.ac.be
****fulvio.dangelo@biogem.it
*****gbonte@ulb.ac.be
******ceccarelli@unisannio.it
*******houtana@gmail.com

2024-04-30

Environment

R version: R version 4.4.0 RC (2024-04-16 r86468)

Bioconductor version: 3.19

Package: 1.29.0

About

This workflow is based on the article: TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages (Silva et al. 2016). Due to time and space limitations, we downloaded only a subset of the data; for a real analysis, please use all the data available. The data used in the examples are available in the package TCGAWorkflowData.

Installation

To be able to execute all the steps of this workflow please install it with the following code:

if (!"BiocManager" %in% rownames(installed.packages()))
  install.packages("BiocManager")
BiocManager::install("TCGAWorkflow")
BiocManager::install("TCGAWorkflowData")

Loading packages

At the beginning of each section, the packages required to execute the code will be loaded. However, the following packages are required for all sections.

TCGAWorkflowData: this package contains the data necessary to execute each of the analysis steps. It is a subset of the downloaded data, used to make the examples faster. For a real analysis, please use all the data available.
DT: we will use it to visualize the results

library(TCGAWorkflowData)
library(DT)

Abstract

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer.

To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM).

All the package landing pages used in this workflow can be found through the biocViews interface.

Keywords: Epigenomics, Genomics, Cancer, non-coding, TCGA, ENCODE, Roadmap, Bioinformatics.

Introduction

Cancer is a complex genetic disease spanning multiple molecular events such as point mutations, structural variations, translocations and activation of epigenetic and transcriptional signatures and networks. The effects of these events take place at different spatial and temporal scales, with interlayer communications and feedback mechanisms creating a highly complex dynamic system. To gain insight into the biology of tumors, most of the research in cancer genomics is aimed at integrating the observations at multiple molecular scales and analyzing their interplay. Even if many tumors share similar recurrent genomic events, their relationships with the observed phenotype are often not understood. For example, although we know that the majority of the most aggressive form of brain tumors such as glioma harbor the mutation of a single gene (IDH), the mechanistic explanation of the activation of its characteristic epigenetic and transcriptional signatures is still far from being well characterized. Moreover, network-based strategies have recently emerged as an effective framework for the discovery of functional disease drivers that act as main regulators of cancer phenotypes. Here we describe a comprehensive workflow that integrates many Bioconductor packages in order to analyze and integrate the multiplicity of molecular observation layers in large-scale cancer datasets.

Indeed, recent technological developments allowed the deposition of large amounts of genomic and epigenomic data, such as gene expression, DNA methylation, and genomic localization of transcription factors, into freely available public international consortia like The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap) (Hawkins, Hon, and Ren 2010). An overview of the three consortia is described below:

The Cancer Genome Atlas (TCGA): The TCGA consortium, which is a National Institutes of Health (NIH) initiative, makes publicly available molecular and clinical information for more than 30 types of human cancers, including exome (variant analysis), single nucleotide polymorphism (SNP), DNA methylation, transcriptome (mRNA), microRNA (miRNA) and proteome data. Sample types available at TCGA are: primary solid tumors, recurrent solid tumors, blood derived normal and tumor, metastatic, and solid tissue normal (Weinstein et al. 2013).
The Encyclopedia of DNA Elements (ENCODE): Founded in 2003 by the National Human Genome Research Institute (NHGRI), the project aims to build a comprehensive list of functional elements that have an active role in the genome, including regulatory elements that govern gene expression. Biosamples include immortalized cell lines, tissues, primary cells and stem cells (Consortium and others 2011).
The NIH Roadmap Epigenomics Mapping Consortium: This was launched with the goal of producing a public resource of human epigenomic data to support basic biology and disease-oriented research. Roadmap maps DNA methylation, histone modifications, chromatin accessibility, and small RNA transcripts in stem cells and primary ex vivo tissues (Fingerman et al. 2011; Bernstein et al. 2010).

Briefly, these three consortia provide large-scale epigenomic data generated on a variety of microarray and next-generation sequencing (NGS) platforms. Each consortium encompasses specific types of biological information on a specific type of tissue or cell; analyzed together, they provide an invaluable opportunity for research laboratories to better understand the developmental progression of normal cells to a cancer state at the molecular level and, importantly, to correlate these phenotypes with the tissue of origin.

Although there exists a wealth of possibilities (Kannan et al. 2015) for accessing cancer-associated data, Bioconductor represents the most comprehensive set of open source, updated and integrated professional tools for the statistical analysis of large-scale genomic data. Thus, we propose our workflow within Bioconductor to describe how to download, process, analyze and integrate cancer data to address specific cancer-related questions. However, there is no tool that comprehensively integrates sequence and mutation information, epigenomic state and gene expression within the context of gene regulatory networks to identify oncogenic drivers and characterize altered pathways during cancer progression. Therefore, our workflow presents several Bioconductor packages to work with genomic and epigenomic data.

Methods

Access to the data

TCGA data is accessible via the NCI Genomic Data Commons (GDC) data portal and the Broad Institute’s GDAC Firehose. The GDC Data Portal provides access to the subset of TCGA data that has been harmonized against GRCh38 (hg38) using the GDC Bioinformatics Pipelines, which provide methods for the standardization of biospecimen and clinical data, the re-alignment of DNA and RNA sequence data against a common reference genome build (GRCh38), and the generation of derived data.

The data previously stored in CGHub, the TCGA Data Portal and the Broad Institute’s GDAC Firehose were provided as different levels or tiers, defined by a specific combination of processing level (raw, normalized, integrated) and access level (controlled or open access). Level 1 indicated raw, controlled data; level 2 indicated processed, controlled data; level 3 indicated segmented or interpreted, open access data; and level 4 indicated region-of-interest, open access data. While the TCGA Data Portal provided level 1 to 3 data, Firehose provides only level 3 and 4 data. An explanation of the different levels can be found at the TCGA Wiki. However, the GDC data portal no longer uses this level-based classification model. Instead, a new data model was created, which is described in the GDC documentation. In this new model, data can be open or controlled access. GDC open access data does not require authentication or authorization and generally includes high-level genomic data that is not individually identifiable, as well as most clinical and all biospecimen data elements. GDC controlled access data requires dbGaP authorization and eRA Commons authentication and generally includes individually identifiable data such as low-level genomic sequencing data, germline variants, SNP6 genotype data, and certain clinical data elements. The process to obtain access to controlled data is described on the GDC website.

Finally, the data provided by GDC data portal can be accessed using Bioconductor package TCGAbiolinks, while the data provided by Firehose can be accessed by Bioconductor package RTCGAToolbox.

The next steps describe how one could use TCGAbiolinks & RTCGAToolbox to download clinical, genomics, transcriptomics, epigenomics data, as well as subtype information and GISTIC results (i.e., identified genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth). All the data used in this workflow has as reference the Genome Reference Consortium human genome (build 37 - hg19).

Downloading data from TCGA data portal

The Bioconductor package TCGAbiolinks (Colaprico et al. 2016) has three main functions GDCquery, GDCdownload and GDCprepare that should sequentially be used to respectively search, download and load the data as an R object.

GDCquery uses the GDC API to search the data for a given project and data category, and filters the results by samples, sample type, file type and other features if requested by the user. This function returns an object with a summary table of the results found (samples, files and other useful information) and the arguments used in the query. The most important GDCquery arguments are project, which receives a GDC project (TCGA-UCS, TCGA-LGG, TARGET-AML, etc); data.category, which receives a data category (Transcriptome Profiling, Copy Number Variation, DNA methylation, Gene expression, etc); data.type, which receives a data type (Gene expression quantification, Isoform Expression Quantification, miRNA Expression Quantification, Copy Number Segment, Masked Copy Number Segment, etc); workflow.type, which receives a GDC workflow type (STAR - Counts); and platform, which receives a platform for the searches in the legacy database (HumanMethylation27, Genome_Wide_SNP_6, IlluminaHiSeq_RNASeqV2, etc). A complete list of possible entries for these arguments can be found in the TCGAbiolinks vignette. Listing 1 shows an example of this function.

After the search step, the user will be able to download the data using the GDCdownload function which can use either the GDC API to download the samples, or the gdc client tools. The downloaded data will be saved in a directory with the project name and a sub-folder with the data.category, for example “TCGA-GBM/DNA_methylation”.

Finally, GDCprepare transforms the downloaded data into a SummarizedExperiment object (Huber et al. 2015) or a data frame. If the summarizedExperiment argument is set to TRUE, TCGAbiolinks will add to the object the clinical information and the subtype information defined in The Cancer Genome Atlas (TCGA) Research Network reports (the full list of papers can be seen in the TCGAquery_subtype section of the TCGAbiolinks vignette). Listing 1 shows how to use these functions to download DNA methylation and gene expression data from the GDC legacy database, and Listing 2 shows how to download copy number variation data from the harmonized data portal. Other examples that access the harmonized data can be found in the TCGAbiolinks vignette.

library(TCGAbiolinks)
query_met_gbm <- GDCquery(
  project = "TCGA-GBM", 
  data.category = "DNA Methylation",
  data.type = "Methylation Beta Value",
  platform = "Illumina Human Methylation 450", 
  barcode = c("TCGA-76-4926-01B-01D-1481-05", "TCGA-28-5211-01C-11D-1844-05")
)
GDCdownload(query_met_gbm)

met_gbm_450k <- GDCprepare(
  query = query_met_gbm,
  summarizedExperiment = TRUE
)

query_met_lgg <- GDCquery(
  project = "TCGA-LGG", 
  data.category = "DNA Methylation",
  data.type = "Methylation Beta Value",
  platform = "Illumina Human Methylation 450",
  barcode = c("TCGA-HT-7879-01A-11D-2399-05", "TCGA-HT-8113-01A-11D-2399-05")
)
GDCdownload(query_met_lgg)
met_lgg_450k <- GDCprepare(
  query = query_met_lgg,
  summarizedExperiment = TRUE
)

met_lgg_450k$days_to_death <- NA
met_lgg_450k$year_of_death <- NA
met_gbm_lgg <- SummarizedExperiment::cbind(
  met_lgg_450k, 
  met_gbm_450k
)


# A total of 2.27 GB
query_exp_lgg <- GDCquery(
  project = "TCGA-LGG",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification", 
  workflow.type = "STAR - Counts"
)

GDCdownload(query_exp_lgg)
exp_lgg <- GDCprepare(
  query = query_exp_lgg
)

query_exp_gbm <- GDCquery(
  project = "TCGA-GBM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification", 
  workflow.type = "STAR - Counts"
)
GDCdownload(query_exp_gbm)
exp_gbm <- GDCprepare(
  query = query_exp_gbm
)

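# Drop the clinical columns that exist in LGG but are not available in GBM,
# so that both objects have the same sample annotation before combining them.
# colData is called with its namespace because SummarizedExperiment
# has not been attached yet at this point of the workflow.
missing_cols <- setdiff(
  colnames(SummarizedExperiment::colData(exp_lgg)),
  colnames(SummarizedExperiment::colData(exp_gbm))
)
for (i in missing_cols) {
  exp_lgg[[i]] <- NULL
}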

exp_gbm_lgg <- SummarizedExperiment::cbind(
  exp_lgg, 
  exp_gbm
)

#-----------------------------------------------------------------------------
#                   Data.category: Copy number variation aligned to hg38
#-----------------------------------------------------------------------------
query <- GDCquery(
  project = "TCGA-ACC",
  data.category = "Copy Number Variation",
  data.type = "Copy Number Segment",
  barcode = c( "TCGA-OR-A5KU-01A-11D-A29H-01", "TCGA-OR-A5JK-01A-11D-A29H-01")
)
GDCdownload(query)
data <- GDCprepare(query)

query <- GDCquery(
  project = "TCGA-ACC",
  data.category = "Copy Number Variation",
  data.type = "Masked Copy Number Segment",
  sample.type = c("Primary Tumor")
) # see the barcodes with getResults(query)$cases
GDCdownload(query)
data <- GDCprepare(query)
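The barcode values passed to GDCquery in the listings above follow the TCGA barcode layout, in which dash-separated fields identify the project, tissue source site, participant, sample and vial, portion and analyte, plate, and center. The base R sketch below splits a barcode into those fields and derives the 12-character patient identifier and the two-digit sample type code. The helper name parse_tcga_barcode and the small lookup table are illustrative and not part of TCGAbiolinks; only a few common sample type codes are listed.

```r
# Hypothetical helper (not a TCGAbiolinks function): split TCGA barcodes
# into their dash-separated fields.
parse_tcga_barcode <- function(barcode) {
  fields <- c("project", "tss", "participant", "sample_vial",
              "portion_analyte", "plate", "center")
  parts <- strsplit(barcode, "-", fixed = TRUE)
  # Pad shorter barcodes (e.g. patient-level ones) with NA
  padded <- lapply(parts, function(p) {
    length(p) <- length(fields)
    p
  })
  parsed <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  colnames(parsed) <- fields
  # The first 12 characters identify the patient
  parsed$patient <- substr(barcode, 1, 12)
  # The first two characters of the sample field give the sample type code
  parsed$sample_type_code <- substr(parsed$sample_vial, 1, 2)
  parsed
}

# A few common sample type codes (not an exhaustive list)
sample_type_labels <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal"
)

barcodes <- c("TCGA-76-4926-01B-01D-1481-05", "TCGA-OR-A5KU-01A-11D-A29H-01")
parsed <- parse_tcga_barcode(barcodes)
parsed$sample_type <- unname(sample_type_labels[parsed$sample_type_code])
parsed[, c("patient", "sample_vial", "sample_type_code", "sample_type")]
```

Patient-level barcodes such as "TCGA-08-0516" are also accepted; the fields they lack are returned as NA.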

If a SummarizedExperiment object was chosen, the data can be accessed with three different accessors: assay for the data matrix, rowRanges for the genomic ranges and annotation of each row (feature), and colData for the sample information (patient, batch, sample type, etc) (Huber et al. 2015; Morgan et al., n.d.). An example is shown in the listing below.

library(SummarizedExperiment)

# Load object from TCGAWorkflowData package
# This object will be created in subsequent sections for enhanced clarity and understanding.
data(TCGA_GBM_Transcriptome_20_samples) 

# get gene expression matrix
data <- assay(exp_gbm)
datatable(
  data = data[1:10,], 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = TRUE
)

# get genes information
genes.info <- rowRanges(exp_gbm)
genes.info

## GRanges object with 60660 ranges and 10 metadata columns:
##                      seqnames              ranges strand |   source     type
##                         <Rle>           <IRanges>  <Rle> | <factor> <factor>
##   ENSG00000000003.15     chrX 100627108-100639991      - |   HAVANA     gene
##    ENSG00000000005.6     chrX 100584936-100599885      + |   HAVANA     gene
##   ENSG00000000419.13    chr20   50934867-50958555      - |   HAVANA     gene
##   ENSG00000000457.14     chr1 169849631-169894267      - |   HAVANA     gene
##   ENSG00000000460.17     chr1 169662007-169854080      + |   HAVANA     gene
##                  ...      ...                 ...    ... .      ...      ...
##    ENSG00000288669.1    chr19     7728958-7745662      - |   HAVANA     gene
##    ENSG00000288670.1     chr1 161368022-161371964      + |   HAVANA     gene
##    ENSG00000288671.1    chr19   42215133-42232149      - |   HAVANA     gene
##    ENSG00000288674.1     chr1 226870184-226987545      + |   HAVANA     gene
##    ENSG00000288675.1    chr11       797511-799190      + |   HAVANA     gene
##                          score     phase            gene_id      gene_type
##                      <numeric> <integer>        <character>    <character>
##   ENSG00000000003.15        NA      <NA> ENSG00000000003.15 protein_coding
##    ENSG00000000005.6        NA      <NA>  ENSG00000000005.6 protein_coding
##   ENSG00000000419.13        NA      <NA> ENSG00000000419.13 protein_coding
##   ENSG00000000457.14        NA      <NA> ENSG00000000457.14 protein_coding
##   ENSG00000000460.17        NA      <NA> ENSG00000000460.17 protein_coding
##                  ...       ...       ...                ...            ...
##    ENSG00000288669.1        NA      <NA>  ENSG00000288669.1 protein_coding
##    ENSG00000288670.1        NA      <NA>  ENSG00000288670.1         lncRNA
##    ENSG00000288671.1        NA      <NA>  ENSG00000288671.1 protein_coding
##    ENSG00000288674.1        NA      <NA>  ENSG00000288674.1 protein_coding
##    ENSG00000288675.1        NA      <NA>  ENSG00000288675.1 protein_coding
##                        gene_name       level     hgnc_id          havana_gene
##                      <character> <character> <character>          <character>
##   ENSG00000000003.15      TSPAN6           2  HGNC:11858 OTTHUMG00000022002.2
##    ENSG00000000005.6        TNMD           2  HGNC:17757 OTTHUMG00000022001.2
##   ENSG00000000419.13        DPM1           2   HGNC:3005 OTTHUMG00000032742.2
##   ENSG00000000457.14       SCYL3           2  HGNC:19285 OTTHUMG00000035941.6
##   ENSG00000000460.17    C1orf112           2  HGNC:25565 OTTHUMG00000035821.9
##                  ...         ...         ...         ...                  ...
##    ENSG00000288669.1  AC008763.4           2        <NA>                 <NA>
##    ENSG00000288670.1  AL592295.6           2        <NA>                 <NA>
##    ENSG00000288671.1  AC006486.3           2        <NA>                 <NA>
##    ENSG00000288674.1  AL391628.1           2        <NA>                 <NA>
##    ENSG00000288675.1  AP006621.6           2        <NA> OTTHUMG00000189301.4
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

# get sample information
sample.info <- colData(exp_gbm)
datatable(
  data = as.data.frame(sample.info), 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = FALSE
)
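To make the relationship between the three accessors easier to see, the following minimal sketch builds a toy SummarizedExperiment by hand and reads it back with assay, rowRanges and colData. All gene names, coordinates, counts and sample labels are invented for illustration; objects returned by GDCprepare carry many more rows and annotation columns.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Toy count matrix: 3 genes x 2 samples (made-up values)
counts <- matrix(
  c(10L, 0L, 5L, 20L, 3L, 7L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# One genomic range per row of the matrix (hypothetical positions)
gene_ranges <- GRanges(
  seqnames = c("chr1", "chr1", "chrX"),
  ranges = IRanges(start = c(100, 500, 900), end = c(200, 650, 1000)),
  strand = c("+", "-", "+")
)
names(gene_ranges) <- rownames(counts)

# One annotation row per column of the matrix
sample_table <- DataFrame(
  sample_type = c("Primary Tumor", "Solid Tissue Normal"),
  row.names = colnames(counts)
)

toy_se <- SummarizedExperiment(
  assays = list(counts = counts),
  rowRanges = gene_ranges,
  colData = sample_table
)

assay(toy_se)       # the 3 x 2 count matrix
rowRanges(toy_se)   # GRanges with one range per gene
colData(toy_se)     # DataFrame with one row per sample
toy_se$sample_type  # shortcut for a colData column
```

Rows of the assay line up with rowRanges and columns line up with colData, which is why subsetting the object (for example toy_se[, toy_se$sample_type == "Primary Tumor"]) keeps the three parts in sync.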

The clinical data can be obtained using TCGAbiolinks through two methods. The first one downloads only the indexed GDC clinical data, which includes diagnoses (vital status, days to death, age at diagnosis, days to last follow up, days to recurrence), treatments (days to treatment, treatment id, therapeutic agents, treatment intent type), demographic (gender, race, ethnicity) and exposures (cigarettes per day, weight, height, alcohol history) information. This indexed clinical data can be obtained using the function GDCquery_clinic, which can be used as described in the listing below. This function has two arguments: project (“TCGA-GBM”, “TARGET-AML”, etc) and type (“Clinical” or “Biospecimen”). The second method downloads the XML files with all clinical data for the patient and retrieves the desired information from them. This gives access to all clinical data available, which includes patient (tumor tissue site, histological type, gender, vital status, days to birth, days to last follow up, etc), drug (days to drug therapy start, days to drug therapy end, therapy types, drug name), radiation (days to radiation therapy start, days to radiation therapy end, radiation type, radiation dosage), new tumor event (days to new tumor event after initial treatment, new neoplasm event type, additional pharmaceutical therapy), follow up (primary therapy outcome success, follow up treatment success, vital status, days to last follow up, date of form completion), stage event (pathologic stage), and admin (batch number, project code, disease code, Biospecimen Core Resource).

# get indexed clinical patient data for GBM samples
gbm_clin <- GDCquery_clinic(
  project = "TCGA-GBM", 
  type = "Clinical"
)

# get indexed clinical patient data for LGG samples
lgg_clin <- GDCquery_clinic(
  project = "TCGA-LGG", 
  type = "Clinical"
)

# Bind the results. As the columns might not be the same,
# we will use plyr::rbind.fill to keep all columns from both tables
clinical <- plyr::rbind.fill(
  gbm_clin,
  lgg_clin
)

datatable(
  clinical[1:10,], 
  options = list(scrollX = TRUE, keys = TRUE), 
  rownames = FALSE
)
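When two clinical tables do not share all their columns, plyr::rbind.fill keeps every column and fills the missing entries with NA. The base R sketch below reproduces that behavior for two data frames so the effect on the combined table is easy to inspect. The tables, their values and the helper name bind_fill are made up for illustration; this is not how plyr implements it internally.

```r
# Toy stand-ins for the GBM and LGG clinical tables (made-up values)
gbm_toy <- data.frame(
  submitter_id = c("P1", "P2"),
  vital_status = c("Dead", "Alive"),
  days_to_death = c(300, NA),
  stringsAsFactors = FALSE
)
lgg_toy <- data.frame(
  submitter_id = "P3",
  vital_status = "Alive",
  tumor_grade = "G2",
  stringsAsFactors = FALSE
)

# Hypothetical helper: add each table's missing columns as NA,
# then row-bind both tables using a common column order
bind_fill <- function(x, y) {
  all_cols <- union(colnames(x), colnames(y))
  for (col in setdiff(all_cols, colnames(x))) x[[col]] <- NA
  for (col in setdiff(all_cols, colnames(y))) y[[col]] <- NA
  rbind(x[, all_cols, drop = FALSE], y[, all_cols, drop = FALSE])
}

clinical_toy <- bind_fill(gbm_toy, lgg_toy)
clinical_toy
```

Columns present in only one project, such as tumor_grade here, end up as NA for the other project's patients, which is worth keeping in mind before using them in a combined analysis.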

# Fetch clinical data directly from the clinical XML files.
# if barcode is not set, it will consider all samples.
# We only set it to make the example faster
query <- GDCquery(
  project = "TCGA-GBM",
  data.format = "bcr xml",
  data.category = "Clinical",
  barcode = c("TCGA-08-0516","TCGA-02-0317")
) 
GDCdownload(query)
clinical <- GDCprepare_clinic(
  query = query, 
  clinical.info = "patient"
)

datatable(
  data = clinical, 
  options = list(scrollX = TRUE, keys = TRUE), 
  rownames = FALSE
)

clinical_drug <- GDCprepare_clinic(
  query = query, 
  clinical.info = "drug"
)

clinical_drug |>
  datatable(
    options = list(scrollX = TRUE, keys = TRUE), 
    rownames = FALSE
  )

clinical_radiation <- GDCprepare_clinic(
  query = query, 
  clinical.info = "radiation"
)

clinical_radiation |> 
  datatable(
    options = list(scrollX = TRUE, keys = TRUE), 
    rownames = FALSE
  )

clinical_admin <- GDCprepare_clinic(
  query = query, 
  clinical.info = "admin"
)

clinical_admin |>
  datatable(
    options = list(scrollX = TRUE, keys = TRUE), 
    rownames = FALSE
  )

Mutation information is stored in two types of Mutation Annotation Format (MAF) files: Protected and Somatic (or Public) MAF files, which are derived from the GDC annotated VCF files. Annotated VCF files often have variants reported on multiple transcripts, whereas the protected MAF (*protected.maf) only reports the most critically affected one, and the Somatic MAFs (*somatic.maf) are further processed to remove low quality and potential germline variants. The code below shows how to download Somatic MAF data using TCGAbiolinks.

query <- GDCquery(
  project = c("TCGA-LGG","TCGA-GBM"), 
  data.category = "Simple Nucleotide Variation", 
  access = "open",
  data.type = "Masked Somatic Mutation", 
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)

data(maf_lgg_gbm)
maf[1:10,] |>
  datatable(
    options = list(scrollX = TRUE, keys = TRUE), 
    rownames = FALSE
  )
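A MAF table has one row per variant call, so a common first summary is the number of samples carrying at least one mutation in each gene. The sketch below shows this with a small made-up table that uses the standard MAF column names Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode; the same steps should apply to the maf object loaded above, assuming it keeps those column names.

```r
# Toy MAF-like table (made-up calls; column names follow the MAF format)
maf_toy <- data.frame(
  Hugo_Symbol = c("IDH1", "TP53", "IDH1", "EGFR", "TP53", "IDH1"),
  Variant_Classification = c(
    "Missense_Mutation", "Missense_Mutation", "Missense_Mutation",
    "Missense_Mutation", "Nonsense_Mutation", "Silent"
  ),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S2", "S2"),
  stringsAsFactors = FALSE
)

# Keep one row per gene/sample pair so that several calls in the same
# gene and sample are counted once
gene_sample <- unique(maf_toy[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])

# Number of mutated samples per gene, most frequently mutated first
samples_per_gene <- sort(table(gene_sample$Hugo_Symbol), decreasing = TRUE)
samples_per_gene
```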

Finally, the Cancer Genome Atlas (TCGA) Research Network has reported integrated genome-wide studies of various diseases (ACC (Zheng et al. 2016), BRCA (C. G. A. Network and others 2012 b), COAD (C. G. A. Network and others 2012 a), GBM (Ceccarelli et al. 2016), HNSC (C. G. A. Network and others 2015 a), KICH (Davis et al. 2014), KIRC (Network and others 2013), KIRP (Network and others 2016), LGG (Ceccarelli et al. 2016), LUAD (Network and others 2014 c), LUSC (Network and others 2012), PRAD (Network and others 2015), READ (C. G. A. Network and others 2012 a), SKCM (C. G. A. Network and others 2015 b), STAD (Network and others 2014 a), THCA (Network and others 2014 d) and UCEC (Network and others 2014 b)), which classified them into different subtypes. This classification can be retrieved using the TCGAquery_subtype function or by accessing the sample information in the SummarizedExperiment object created by the GDCprepare function, which automatically incorporates it into the object.

gbm_subtypes <- TCGAquery_subtype(
  tumor = "gbm"
)

datatable(
  gbm_subtypes[1:10,], 
  options = list(scrollX = TRUE, keys = TRUE), 
  rownames = FALSE
)

Downloading data from Broad TCGA GDAC

The Bioconductor package RTCGAToolbox (Samur 2014) provides access to Firehose Level 3 and 4 data through the function getFirehoseData. The following arguments allow users to select the version and tumor type of interest:

dataset - Tumor to download. A complete list of possibilities can be viewed with the getFirehoseDatasets function.
runDate - Stddata run dates. Dates can be viewed with the getFirehoseRunningDates function.
gistic2Date - Analysis run dates. Dates can be viewed with the getFirehoseAnalyzeDates function.

The following data types can be selected for download: RNAseq_Gene, Clinic, miRNASeq_Gene, RNAseq2_Gene_Norm, CNA_SNP, CNV_SNP, CNA_Seq, CNA_CGH, Methylation, Mutation, mRNA_Array, miRNA_Array, and RPPA.

By default, RTCGAToolbox allows users to download up to 500 MB worth of data. To increase this limit, use the fileSizeLimit argument. An example is found in the listing below. The getData function allows users to access the downloaded data.

library(RTCGAToolbox)

# Get the last run dates
lastRunDate <- getFirehoseRunningDates()[1]

# get mutation and clinical data for GBM
# (DNA methylation and RNAseq2 are switched off here to keep the download small)
gbm_data <- getFirehoseData(
  dataset = "GBM",
  runDate = lastRunDate, 
  gistic2Date = getFirehoseAnalyzeDates(1),
  Methylation = FALSE,  
  clinical = TRUE,
  RNASeq2GeneNorm  = FALSE, 
  Mutation = TRUE,
  fileSizeLimit = 10000
)

gbm_mut <- getData(gbm_data,"Mutation")
gbm_clin <- getData(gbm_data,"clinical")

Finally, RTCGAToolbox can access level 4 data, which can be handy when the user requires GISTIC results. GISTIC is used to identify genes targeted by somatic copy-number alterations (SCNAs) (Mermel et al. 2011).

# Download GISTIC results
lastanalyzedate <- getFirehoseAnalyzeDates(1)
gistic <- getFirehoseData(
  dataset = "GBM",
  GISTIC = TRUE, 
  gistic2Date = lastanalyzedate
)

# get GISTIC results
gistic_allbygene <- getData(
  object = gistic, 
  type = "GISTIC", 
  platform = "AllByGene"
)
gistic_thresholedbygene <- getData(
  object = gistic, 
  type = "GISTIC", 
  platform = "ThresholdedByGene"
)
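The thresholded-by-gene table holds one discrete copy-number call per gene and sample. Assuming the usual GISTIC2 coding (-2 deep deletion, -1 shallow deletion, 0 neutral, 1 gain, 2 high-level amplification), the fraction of samples with an amplification or deep deletion of each gene can be computed as in the sketch below. The table is a made-up example that only mimics the layout of the GISTIC output (Gene.Symbol, Locus.ID and Cytoband followed by one column per sample); the sample names and calls are invented.

```r
# Toy thresholded-by-gene table (made-up calls, hypothetical sample names)
gistic_toy <- data.frame(
  Gene.Symbol = c("EGFR", "CDKN2A", "PTEN"),
  Locus.ID = c(1956, 1029, 5728),
  Cytoband = c("7p11.2", "9p21.3", "10q23.31"),
  sample_1 = c(2, -2, -1),
  sample_2 = c(2, -2, 0),
  sample_3 = c(1, 0, -1),
  sample_4 = c(0, -2, -1),
  stringsAsFactors = FALSE
)

# Drop the three annotation columns and keep a gene x sample matrix of calls
calls <- as.matrix(gistic_toy[, -(1:3)])
rownames(calls) <- gistic_toy$Gene.Symbol

# Fraction of samples with a high-level amplification or a deep deletion
amp_fraction <- rowMeans(calls == 2)
del_fraction <- rowMeans(calls == -2)

data.frame(amp_fraction, del_fraction)
```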

data(gbm_gistic)
gistic_allbygene %>% head() %>% gt::gt()

[Table output: first rows of gistic_allbygene. Columns: Gene.Symbol, Locus.ID, Cytoband, followed by one column per GBM sample barcode (TCGA.02.0001.01C.01D.0182.01, TCGA.02.0003.01A.01D.0182.01, TCGA.02.0006.01B.01D.0182.01, ...).]
TCGA.14.0783.01B.01D.0517.01	TCGA.14.0786.01B.01D.0517.01	TCGA.14.0787.01A.01D.0384.01	TCGA.14.0789.01A.01D.0384.01	TCGA.14.0790.01B.01D.0784.01	TCGA.14.0812.01B.01D.0591.01	TCGA.14.0813.01A.01D.0384.01	TCGA.14.0817.01A.01D.0384.01	TCGA.14.0862.01B.01D.1842.01	TCGA.14.0865.01B.01D.0591.01	TCGA.14.0866.01B.01D.0591.01	TCGA.14.0867.01A.01D.0384.01	TCGA.14.0871.01A.01D.0384.01	TCGA.14.1034.01A.01D.0517.01	TCGA.14.1037.01A.01D.0591.01	TCGA.14.1043.01B.11D.1842.01	TCGA.14.1395.01B.11D.1842.01	TCGA.14.1396.01A.01D.0517.01	TCGA.14.1401.01A.01D.0517.01	TCGA.14.1402.01A.01D.0517.01	TCGA.14.1450.01B.01D.1842.01	TCGA.14.1451.01A.01D.0517.01	TCGA.14.1452.01A.01D.0517.01	TCGA.14.1453.01A.01D.0517.01	TCGA.14.1454.01A.01D.0517.01	TCGA.14.1455.01A.01D.0591.01	TCGA.14.1456.01B.01D.0784.01	TCGA.14.1458.01A.01D.0591.01	TCGA.14.1459.01A.01D.0517.01	TCGA.14.1794.01A.01D.0591.01	TCGA.14.1795.01A.01D.0591.01	TCGA.14.1821.01A.01D.0591.01	TCGA.14.1823.01A.01D.0591.01	TCGA.14.1825.01A.01D.0591.01	TCGA.14.1827.01A.01D.0591.01	TCGA.14.1829.01A.01D.0591.01	TCGA.14.2554.01A.01D.0784.01	TCGA.14.2555.01B.01D.0911.01	TCGA.14.3477.01A.01D.0911.01	TCGA.14.4157.01A.01D.1224.01	TCGA.15.0742.01A.01D.0333.01	TCGA.15.1444.01A.02D.1694.01	TCGA.15.1446.01A.01D.0517.01	TCGA.15.1447.01A.01D.0517.01	TCGA.15.1449.01A.01D.0517.01	TCGA.16.0846.01A.01D.0384.01	TCGA.16.0848.01A.01D.0384.01	TCGA.16.0849.01A.01D.0384.01	TCGA.16.0850.01A.01D.0384.01	TCGA.16.0861.01A.01D.0384.01	TCGA.16.1045.01B.01D.0517.01	TCGA.16.1047.01B.01D.0517.01	TCGA.16.1048.01B.01D.1224.01	TCGA.16.1055.01B.01D.0517.01	TCGA.16.1056.01B.01D.0517.01	TCGA.16.1060.01A.01D.0517.01	TCGA.16.1062.01A.01D.0517.01	TCGA.16.1063.01B.01D.0517.01	TCGA.16.1460.01A.01D.0591.01	TCGA.19.0955.01A.02D.0517.01	TCGA.19.0957.01C.01D.0591.01	TCGA.19.0960.01A.02D.0517.01	TCGA.19.0962.01B.01D.0517.01	TCGA.19.0963.01B.01D.0517.01	TCGA.19.0964.01A.01D.0517.01	TCGA.19.1385.01A.02D.0591.01	TCGA.19.1386.01A.01D.0591.01	TCGA.19.1387.01A.01D.0591.01	
TCGA.19.1388.01A.01D.0591.01	TCGA.19.1389.01A.01D.0591.01	TCGA.19.1390.01A.01D.0911.01	TCGA.19.1392.01A.01D.0517.01	TCGA.19.1786.01A.01D.0591.01	TCGA.19.1787.01B.01D.0911.01	TCGA.19.1789.01A.01D.0591.01	TCGA.19.1791.01A.01D.0591.01	TCGA.19.2619.01A.01D.0911.01	TCGA.19.2620.01A.01D.0911.01	TCGA.19.2621.01B.01D.0911.01	TCGA.19.2623.01A.01D.0911.01	TCGA.19.2624.01A.01D.0911.01	TCGA.19.2625.01A.01D.0911.01	TCGA.19.2629.01A.01D.0911.01	TCGA.19.2631.01A.01D.1224.01	TCGA.19.4065.01A.01D.2002.01	TCGA.19.5947.01A.11D.1694.01	TCGA.19.5950.01A.11D.1694.01	TCGA.19.5951.01A.11D.1694.01	TCGA.19.5952.01A.11D.1694.01	TCGA.19.5953.01B.12D.1842.01	TCGA.19.5954.01A.11D.1694.01	TCGA.19.5955.01A.11D.1694.01	TCGA.19.5956.01A.11D.1694.01	TCGA.19.5958.01A.11D.1694.01	TCGA.19.5959.01A.11D.1694.01	TCGA.19.5960.01A.11D.1694.01	TCGA.19.A60I.01A.12D.A33S.01	TCGA.19.A6J4.01A.11D.A33S.01	TCGA.19.A6J5.01A.21D.A33S.01	TCGA.26.1438.01A.01D.0517.01	TCGA.26.1439.01A.01D.1224.01	TCGA.26.1440.01A.01D.0517.01	TCGA.26.1442.01A.01D.1694.01	TCGA.26.1443.01A.01D.0517.01	TCGA.26.1799.01A.02D.0591.01	TCGA.26.5132.01A.01D.1479.01	TCGA.26.5133.01A.01D.1479.01	TCGA.26.5134.01A.01D.1479.01	TCGA.26.5135.01A.01D.1479.01	TCGA.26.5136.01B.01D.1479.01	TCGA.26.5139.01A.01D.1479.01	TCGA.26.6173.01A.11D.1842.01	TCGA.26.6174.01A.21D.1842.01	TCGA.26.A7UX.01B.11D.A390.01	TCGA.27.1830.01A.01D.0591.01	TCGA.27.1831.01A.01D.0784.01	TCGA.27.1832.01A.01D.0591.01	TCGA.27.1833.01A.01D.0591.01	TCGA.27.1834.01A.01D.0591.01	TCGA.27.1835.01A.01D.0784.01	TCGA.27.1836.01A.01D.0784.01	TCGA.27.1837.01A.01D.0784.01	TCGA.27.1838.01A.01D.0784.01	TCGA.27.2518.01A.01D.0784.01	TCGA.27.2519.01A.01D.0784.01	TCGA.27.2521.01A.01D.0784.01	TCGA.27.2523.01A.01D.0784.01	TCGA.27.2524.01A.01D.0784.01	TCGA.27.2526.01A.01D.0784.01	TCGA.27.2527.01A.01D.0784.01	TCGA.27.2528.01A.01D.0784.01	TCGA.28.1746.01A.01D.0591.01	TCGA.28.1747.01C.01D.0784.01	TCGA.28.1749.01A.01D.0591.01	TCGA.28.1750.01A.01D.0591.01	TCGA.28.1751.01A.02D.0591.01	
TCGA.28.1752.01A.01D.0591.01	TCGA.28.1753.01A.01D.0784.01	TCGA.28.1755.01A.01D.0591.01	TCGA.28.1756.01C.01D.0784.01	TCGA.28.1757.01A.02D.0591.01	TCGA.28.2501.01A.01D.1694.01	TCGA.28.2502.01B.01D.0784.01	TCGA.28.2506.01A.02D.0784.01	TCGA.28.2509.01A.01D.0784.01	TCGA.28.2510.01A.01D.1694.01	TCGA.28.2513.01A.01D.0784.01	TCGA.28.2514.01A.02D.0784.01	TCGA.28.5204.01A.01D.1479.01	TCGA.28.5207.01A.01D.1479.01	TCGA.28.5208.01A.01D.1479.01	TCGA.28.5209.01A.01D.1479.01	TCGA.28.5211.01C.11D.1842.01	TCGA.28.5213.01A.01D.1479.01	TCGA.28.5214.01A.01D.1479.01	TCGA.28.5215.01A.01D.1479.01	TCGA.28.5216.01A.01D.1479.01	TCGA.28.5218.01A.01D.1479.01	TCGA.28.5219.01A.01D.1479.01	TCGA.28.5220.01A.01D.1479.01	TCGA.28.6450.01A.11D.1694.01	TCGA.32.1970.01A.01D.0784.01	TCGA.32.1973.01A.01D.1224.01	TCGA.32.1976.01A.01D.0784.01	TCGA.32.1977.01A.01D.1224.01	TCGA.32.1978.01A.01D.1224.01	TCGA.32.1979.01A.01D.1694.01	TCGA.32.1980.01A.01D.1694.01	TCGA.32.1982.01A.01D.0784.01	TCGA.32.1986.01A.01D.0784.01	TCGA.32.1987.01A.01D.1224.01	TCGA.32.1991.01A.01D.1224.01	TCGA.32.2491.01A.01D.1224.01	TCGA.32.2494.01A.01D.1224.01	TCGA.32.2495.01A.01D.1224.01	TCGA.32.2615.01A.01D.0911.01	TCGA.32.2616.01A.01D.0911.01	TCGA.32.2632.01A.01D.0911.01	TCGA.32.2634.01A.01D.0911.01	TCGA.32.2638.01A.01D.0911.01	TCGA.32.4208.01A.01D.1224.01	TCGA.32.4210.01A.01D.1224.01	TCGA.32.4211.01A.01D.1224.01	TCGA.32.4213.01A.01D.1224.01	TCGA.32.4719.01A.01D.1224.01	TCGA.32.5222.01A.01D.1479.01	TCGA.41.2571.01A.01D.0911.01	TCGA.41.2572.01A.01D.1224.01	TCGA.41.2573.01A.01D.0911.01	TCGA.41.2575.01A.01D.0911.01	TCGA.41.3392.01A.01D.0911.01	TCGA.41.3393.01A.01D.1224.01	TCGA.41.3915.01A.01D.1224.01	TCGA.41.4097.01A.01D.1224.01	TCGA.41.5651.01A.01D.1694.01	TCGA.41.6646.01A.11D.1842.01	TCGA.4W.AA9R.01A.11D.A390.01	TCGA.4W.AA9S.01A.11D.A390.01	TCGA.4W.AA9T.01A.11D.A390.01	TCGA.74.6573.01A.12D.1842.01	TCGA.74.6575.01A.11D.1842.01	TCGA.74.6577.01A.11D.1842.01	TCGA.74.6578.01A.11D.1842.01	TCGA.74.6581.01A.11D.1842.01	
TCGA.74.6584.01A.11D.1842.01	TCGA.76.4925.01A.01D.1479.01	TCGA.76.4926.01B.01D.1479.01	TCGA.76.4928.01B.01D.1479.01	TCGA.76.4929.01A.01D.1479.01	TCGA.76.4931.01A.01D.1479.01	TCGA.76.4934.01A.01D.1479.01	TCGA.76.4935.01A.01D.1479.01	TCGA.76.6191.01A.12D.1694.01	TCGA.76.6192.01A.11D.1694.01	TCGA.76.6193.01A.11D.1694.01	TCGA.76.6280.01A.21D.1842.01	TCGA.76.6282.01A.11D.1694.01	TCGA.76.6283.01A.11D.1842.01	TCGA.76.6285.01A.11D.1694.01	TCGA.76.6286.01A.11D.1842.01	TCGA.76.6656.01A.11D.1842.01	TCGA.76.6657.01A.11D.1842.01	TCGA.76.6660.01A.11D.1842.01	TCGA.76.6661.01B.11D.1842.01	TCGA.76.6662.01A.11D.1842.01	TCGA.76.6663.01A.11D.1842.01	TCGA.76.6664.01A.11D.1842.01	TCGA.81.5910.01A.11D.1694.01	TCGA.81.5911.01A.12D.1842.01	TCGA.87.5896.01A.01D.1694.01	TCGA.OX.A56R.01A.11D.A33S.01	TCGA.RR.A6KA.01A.21D.A33S.01	TCGA.RR.A6KB.01A.12D.A33S.01	TCGA.RR.A6KC.01A.31D.A33S.01
ACAP3	116983	1p36.33	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	
0.098	0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
ACTRT2	140625	1p36.32	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	
0.098	0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
AGRN	375790	1p36.33	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	0.098	
0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
ANKRD65	441869	1p36.33	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	
0.098	0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
ATAD3A	55210	1p36.33	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	
0.098	0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
ATAD3B	83858	1p36.33	0.242	0.007	0.005	-0.072	0.048	-0.094	0.013	-0.029	-0.222	-0.063	-0.017	-0.739	-0.462	-0.135	-0.639	-0.002	-0.887	0.069	0.024	-0.026	-0.012	0.007	-0.053	0.004	-0.020	0.095	-0.176	0.143	0.028	0.146	-0.014	-0.026	0.056	0.019	-0.688	-0.072	-0.547	0.018	0.009	0.029	-0.001	0.015	-0.039	-0.001	0.010	0.007	0.033	0.020	0.018	0.007	0.060	0.216	0.023	-0.012	0.662	-0.921	-0.600	-0.018	-0.062	-0.030	0.594	0.525	0.038	-0.234	-0.002	-0.012	-0.085	0.001	0.089	-0.017	-0.032	0.055	0.055	0.281	0.112	0.021	0.010	-0.010	-0.001	-0.062	-0.002	0.173	0.666	0.176	0.090	0.045	-0.289	-0.059	0.092	0.038	0.006	-0.039	0.015	-0.062	-0.872	-0.101	0.084	0.015	0.123	0.013	0.626	-0.012	0.092	0.014	0.048	0.004	0.010	0.032	0.017	0.055	0.018	-0.062	0.016	0.002	-0.024	0.284	-0.049	-0.015	0.002	0.008	0.086	0.012	-0.656	-0.531	0.002	-0.490	1.006	-0.012	-0.710	-0.028	0.035	-0.652	0.093	0.336	-0.000	0.004	-0.015	-0.586	-0.013	0.967	0.359	-0.009	0.037	0.107	0.054	-0.497	-0.001	0.352	-0.165	0.411	-0.001	0.002	0.239	-0.018	0.296	0.013	0.034	0.296	0.051	0.007	0.071	0.004	0.020	-1.013	-0.045	0.281	0.093	0.059	0.326	0.108	0.131	1.228	0.073	0.240	0.001	0.045	-0.160	-0.037	-0.781	-0.875	0.028	-0.015	0.048	-0.034	-0.032	-0.683	-0.791	-0.014	0.003	0.020	-0.502	-0.085	-0.296	0.014	-0.260	-0.023	0.064	0.024	-0.094	0.485	0.067	0.006	0.089	0.067	-0.041	0.042	-0.086	-0.061	0.036	0.026	0.010	0.035	0.005	-0.136	0.006	0.006	-0.027	-0.059	0.002	0.099	0.201	0.016	-0.904	-0.064	0.004	0.030	0.001	-0.789	0.065	-0.006	-0.002	0.344	-0.015	0.002	0.001	-0.035	0.002	0.002	0.057	0.036	0.019	-0.315	0.014	-0.067	-0.950	0.022	0.016	0.012	-0.063	0.465	0.278	0.002	-0.008	0.054	-0.043	-0.015	-0.021	-0.004	-0.021	-0.052	-0.068	-0.025	0.406	-0.706	-0.404	-0.100	0.033	-0.135	0.016	-0.788	-0.039	-0.906	0.037	-0.739	-0.003	-0.022	-0.028	0.034	-0.911	-0.043	0.041	0.027	0.723	0.025	0.010	-0.057	-0.003	0.578	-0.033	-0.860	-0.050	-0.005	0.042	-0.054	0.008	-0.010	0.018	0.015	0.040	-0.033	0.013	-0.030	-0.842	0.059	-0.058	0.267	
0.098	0.018	-0.031	0.097	-0.006	0.210	0.081	0.003	-0.023	-0.140	-0.688	0.105	0.095	0.139	0.259	1.013	-0.015	0.023	-0.070	-0.066	-0.094	-0.042	-0.159	-0.024	0.059	-0.052	0.446	0.645	-0.000	0.022	-0.021	0.008	0.013	0.538	-0.991	-0.954	-0.017	-0.089	-0.028	0.176	0.516	-0.324	-0.896	-0.032	0.408	0.590	-0.042	-0.002	0.001	0.037	-0.456	0.009	-0.008	0.012	0.008	0.591	-0.984	0.502	0.030	0.004	-0.202	0.037	0.196	0.032	-0.117	0.450	0.061	0.486	0.068	-0.046	-0.010	0.594	-0.041	-0.075	-0.004	0.004	0.076	0.160	-0.034	0.010	-0.016	-0.031	-0.942	0.007	0.074	-0.724	-0.027	-0.748	-0.094	-0.016	-0.020	0.029	-0.013	0.049	-0.006	-0.731	-0.661	-0.000	0.011	-0.075	-0.213	0.179	0.009	0.171	0.049	-0.034	0.975	-0.008	0.104	-0.429	-0.002	-0.011	0.089	-0.029	0.018	-0.201	-0.009	0.292	-0.565	0.002	-0.009	0.003	0.005	0.002	-0.008	0.063	0.085	0.041	0.066	0.087	0.014	-0.053	-0.002	-0.009	0.684	0.014	-0.001	-0.027	0.105	-0.035	0.030	0.072	0.120	-0.004	0.458	0.010	-0.036	0.033	0.015	-0.073	-0.109	0.030	-0.042	0.009	0.012	0.035	-0.017	0.072	-0.680	0.011	-0.858	-0.031	0.032	-0.036	0.032	0.032	-0.027	-0.493	-0.045	-0.159	0.004	0.029	-0.362	0.020	-0.345	-0.439	-0.019	0.393	-0.330	-0.034	-0.045	0.020	0.132	0.014	-0.058	-0.444	0.078	0.012	0.044	0.411	-0.033	0.064	-0.033	0.926	0.192	0.036	0.009	-0.581	0.061	-0.023	0.028	0.008	0.037	0.383	0.003	0.030	0.061	-0.020	0.086	0.102	-0.018	0.166	0.038	-0.331	0.287	0.009	-0.004	-0.039	0.543	-0.003	-0.062	0.016	-1.083	0.012	-0.031	0.001	-0.103	-0.067	0.007	-0.004	0.036	-0.032	0.037	-0.215	0.001	0.029	0.055	0.198	0.136	0.006	0.023	-0.002	-0.160	0.177	-0.011	0.074	0.007	0.005	-0.017	-0.003	0.713	-0.873	0.137	-0.033	0.122	-0.300	0.036	-0.003	-0.003	-0.001	0.052	-0.020	-0.916	0.152	-0.047	0.027	-0.016
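
The preview above spans several hundred sample columns, which makes the rendered table hard to read. The following is a minimal sketch of one way to restrict such a preview to the annotation columns plus a few samples. It assumes the object is a data frame whose first three columns are Gene.Symbol, Locus.ID and Cytoband, as in the output shown. `preview_wide` is a hypothetical helper, not a function of TCGAbiolinks or TCGAWorkflow, and the toy data frame (gene names, sample names and values) is made up for illustration.

```r
# Hypothetical helper (not part of TCGAWorkflow or TCGAbiolinks):
# keep the leading annotation columns plus the first few sample columns,
# then return only the first rows.
preview_wide <- function(x, n_annot = 3, n_samples = 4, n_rows = 6) {
  keep <- seq_len(min(ncol(x), n_annot + n_samples))
  head(x[, keep, drop = FALSE], n_rows)
}

# Toy stand-in for a GISTIC by-gene table; names and values are invented.
toy <- data.frame(
  Gene.Symbol = c("GENE1", "GENE2"),
  Locus.ID    = c(1L, 2L),
  Cytoband    = c("1p36.33", "1p36.32"),
  matrix(0, nrow = 2, ncol = 10,
         dimnames = list(NULL, paste0("SAMPLE.", 1:10)))
)

# Three annotation columns plus the first four sample columns.
small <- preview_wide(toy)
dim(small)
```

The same call could be applied to the GISTIC data frames used in this section before piping the result to gt::gt(), so that only a handful of columns is rendered.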

gistic_thresholedbygene %>% head() %>% gt::gt()

Gene.Symbol	Locus.ID	Cytoband	TCGA.02.0001.01C.01D.0182.01	TCGA.02.0015.01A.01G.0293.01	TCGA.02.0023.01B.01G.0293.01	TCGA.02.0024.01B.01D.0182.01	TCGA.02.0025.01A.01G.0293.01	TCGA.02.0026.01B.01G.0293.01	TCGA.02.0028.01A.01D.0182.01	TCGA.02.0051.01A.01G.0293.01	TCGA.02.0052.01A.01D.0182.01	TCGA.02.0055.01A.01D.0182.01	TCGA.02.0064.01A.01D.0193.01	TCGA.02.0069.01A.01D.0193.01	TCGA.02.0106.01A.01D.0275.01	TCGA.02.0113.01A.01D.0193.01	TCGA.02.0114.01A.01D.0193.01	TCGA.02.0115.01A.01D.0193.01	TCGA.02.0266.01A.01D.0275.01	TCGA.02.0269.01B.01D.0275.01	TCGA.02.0281.01A.01D.0275.01	TCGA.02.0332.01A.01D.0275.01	TCGA.02.0333.01A.02D.0275.01	TCGA.02.0446.01A.01D.0275.01	TCGA.02.0451.01A.01D.0275.01	TCGA.02.0456.01A.01D.0275.01	TCGA.02.2483.01A.01D.0784.01	TCGA.06.0126.01A.01D.0214.01	TCGA.06.0127.01A.01D.0310.01	TCGA.06.0130.01A.01D.0214.01	TCGA.06.0133.01A.02D.0214.01	TCGA.06.0152.01A.02D.0310.01	TCGA.06.0162.01A.01D.0275.01	TCGA.06.0164.01A.01D.0275.01	TCGA.06.0166.01A.01D.0236.01	TCGA.06.0168.01A.02D.0236.01	TCGA.06.0171.01A.02D.0236.01	TCGA.06.0175.01A.01D.0275.01	TCGA.06.0177.01A.01D.0275.01	TCGA.06.0184.01A.01D.0236.01	TCGA.06.0187.01A.01D.0236.01	TCGA.06.0188.01A.01D.0236.01	TCGA.06.0192.01B.01D.0333.01	TCGA.06.0195.01B.01D.0236.01	TCGA.06.0206.01A.01D.0236.01	TCGA.06.0208.01B.01D.0236.01	TCGA.06.0209.01A.01D.0236.01	TCGA.06.0213.01A.01D.0236.01	TCGA.06.0216.01B.01D.0333.01	TCGA.06.0237.01A.02D.0236.01	TCGA.06.0402.01A.01D.0275.01	TCGA.06.0410.01A.01D.0275.01	TCGA.06.0414.01A.01D.0275.01	TCGA.06.0644.01A.02D.0310.01	TCGA.06.0645.01A.01D.0310.01	TCGA.06.0646.01A.01D.0310.01	TCGA.06.0649.01B.01D.0333.01	TCGA.06.0743.01A.01D.0333.01	TCGA.06.0745.01A.01D.0333.01	TCGA.06.0747.01A.01D.0333.01	TCGA.06.0878.01A.01D.0384.01	TCGA.06.0879.01A.01D.0384.01	TCGA.06.1084.01A.01D.0517.01	TCGA.06.1087.01A.02D.0517.01	TCGA.06.1801.01A.02D.0591.01	TCGA.06.2557.01A.01D.0784.01	TCGA.06.5411.01A.01D.1694.01	TCGA.06.5856.01A.01D.1694.01	TCGA.06.5859.01A.01D.1694.01	
TCGA.06.6693.01A.11D.1842.01	TCGA.06.6698.01A.11D.1842.01	TCGA.08.0244.01A.01G.0293.01	TCGA.08.0344.01A.01G.0293.01	TCGA.08.0349.01A.01D.0310.01	TCGA.08.0350.01A.01G.0293.01	TCGA.08.0386.01A.01D.0310.01	TCGA.08.0389.01A.01G.0293.01	TCGA.08.0390.01A.01G.0293.01	TCGA.08.0510.01A.01D.0275.01	TCGA.08.0514.01A.01D.0275.01	TCGA.08.0517.01A.01D.0275.01	TCGA.08.0520.01A.01D.0275.01	TCGA.08.0531.01A.01D.0275.01	TCGA.12.0619.01A.01D.0310.01	TCGA.12.0662.01A.01D.0333.01	TCGA.12.0688.01A.02D.0333.01	TCGA.12.0820.01A.01D.0384.01	TCGA.12.0826.01A.01D.0384.01	TCGA.12.1090.01A.01D.0517.01	TCGA.12.1094.01A.01D.0517.01	TCGA.12.1095.01A.01D.0517.01	TCGA.12.1096.01A.01D.0517.01	TCGA.12.1098.01C.01D.0517.01	TCGA.12.1099.01A.01D.0517.01	TCGA.12.1598.01A.01D.0591.01	TCGA.12.3649.01A.01D.0911.01	TCGA.12.3653.01A.01D.0911.01	TCGA.12.5295.01A.01D.1479.01	TCGA.14.0783.01B.01D.0517.01	TCGA.14.0786.01B.01D.0517.01	TCGA.14.0787.01A.01D.0384.01	TCGA.14.0813.01A.01D.0384.01	TCGA.14.0817.01A.01D.0384.01	TCGA.14.0862.01B.01D.1842.01	TCGA.14.0865.01B.01D.0591.01	TCGA.14.0867.01A.01D.0384.01	TCGA.14.0871.01A.01D.0384.01	TCGA.14.1396.01A.01D.0517.01	TCGA.14.1452.01A.01D.0517.01	TCGA.14.1453.01A.01D.0517.01	TCGA.14.1454.01A.01D.0517.01	TCGA.14.1458.01A.01D.0591.01	TCGA.14.1794.01A.01D.0591.01	TCGA.14.1821.01A.01D.0591.01	TCGA.14.1823.01A.01D.0591.01	TCGA.14.1827.01A.01D.0591.01	TCGA.14.3477.01A.01D.0911.01	TCGA.15.1449.01A.01D.0517.01	TCGA.16.0861.01A.01D.0384.01	TCGA.16.1048.01B.01D.1224.01	TCGA.16.1056.01B.01D.0517.01	TCGA.19.0962.01B.01D.0517.01	TCGA.19.0963.01B.01D.0517.01	TCGA.19.1387.01A.01D.0591.01	TCGA.19.1388.01A.01D.0591.01	TCGA.19.1390.01A.01D.0911.01	TCGA.19.1787.01B.01D.0911.01	TCGA.19.1791.01A.01D.0591.01	TCGA.19.2619.01A.01D.0911.01	TCGA.19.2629.01A.01D.0911.01	TCGA.19.4065.01A.01D.2002.01	TCGA.19.5947.01A.11D.1694.01	TCGA.26.1440.01A.01D.0517.01	TCGA.26.5132.01A.01D.1479.01	TCGA.26.5136.01B.01D.1479.01	TCGA.26.6173.01A.11D.1842.01	TCGA.27.1833.01A.01D.0591.01	
TCGA.27.2521.01A.01D.0784.01	TCGA.27.2524.01A.01D.0784.01	TCGA.28.1750.01A.01D.0591.01	TCGA.28.1752.01A.01D.0591.01	TCGA.28.1756.01C.01D.0784.01	TCGA.28.2501.01A.01D.1694.01	TCGA.28.2502.01B.01D.0784.01	TCGA.28.2509.01A.01D.0784.01	TCGA.28.2510.01A.01D.1694.01	TCGA.28.5207.01A.01D.1479.01	TCGA.28.5211.01C.11D.1842.01	TCGA.28.5216.01A.01D.1479.01	TCGA.28.6450.01A.11D.1694.01	TCGA.32.1970.01A.01D.0784.01	TCGA.32.1977.01A.01D.1224.01	TCGA.32.1987.01A.01D.1224.01	TCGA.32.2616.01A.01D.0911.01	TCGA.32.2634.01A.01D.0911.01	TCGA.32.4208.01A.01D.1224.01	TCGA.32.4210.01A.01D.1224.01	TCGA.32.5222.01A.01D.1479.01	TCGA.41.2575.01A.01D.0911.01	TCGA.41.4097.01A.01D.1224.01	TCGA.74.6575.01A.11D.1842.01	TCGA.74.6584.01A.11D.1842.01	TCGA.76.4925.01A.01D.1479.01	TCGA.76.4931.01A.01D.1479.01	TCGA.76.4934.01A.01D.1479.01	TCGA.76.6283.01A.11D.1842.01	TCGA.76.6285.01A.11D.1694.01	TCGA.76.6286.01A.11D.1842.01	TCGA.76.6657.01A.11D.1842.01	TCGA.76.6660.01A.11D.1842.01	TCGA.87.5896.01A.01D.1694.01	TCGA.OX.A56R.01A.11D.A33S.01
ACAP3	116983	1p36.33	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1
ACTRT2	140625	1p36.32	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1
AGRN	375790	1p36.33	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1
ANKRD65	441869	1p36.33	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1
ATAD3A	55210	1p36.33	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1
ATAD3B	83858	1p36.33	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	-1	1	1	1	1	1	-1	-1	-1	1	1	1	-1	-1	-1	2	-1	-1	1	-1	1	1	1	-1	1	-1	1	1	1	1	-1	1	1	1	1	2	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	-1	1	-1	-2	1	1	1	-1	-1	-1	-1	-1	-1	-1	1	1	-1	-1	1	1	-1	-1	1	1	1	1	-1	1	1	1	-1	-2	1	1	-1	-1	1	1	-1	1	-1	1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	1	1	1	1	-1	-1	1	-1	1	1	1	1	-1	-1	-1	-1	-1	-1	-1	-1	1	-1	1	-1	1	2	1	-1	1	1	1	-1	1	1	-1	-1	-1	1	1	-1	1	1	-1	1	1	-1	-1	1

Genomic analysis

Copy number variations (CNVs) have a critical role in cancer development and progression. A chromosomal segment can be deleted or amplified as a result of genomic rearrangements, such as deletions, duplications, insertions, and translocations. CNVs are genomic regions greater than 1 kb with an alteration of copy number between two conditions (e.g., Tumor versus Normal).

TCGA collects copy number data and allows the CNV profiling of cancer. Tumor and paired-normal DNA samples were analyzed for CNV detection using microarray- and sequencing-based technologies. Level 3 processed data are the aberrant regions of the genome resulting from CNV segmentation, and they are available for all copy number technologies.

In this section, we will show how to analyze CNV level 3 data from TCGA to identify recurrent alterations in the cancer genome. We analyzed GBM segmented CNV from SNP array (Affymetrix Genome-Wide Human SNP Array 6.0).

Visualizing multiple genomic alteration events

To visualize multiple genomic alteration events, we recommend the plots provided by the Bioconductor package maftools (Mayakonda and Koeffler 2016). The listing below loads the mutation data (here shipped with the TCGAWorkflowData package rather than downloaded again) and prepares it for use with maftools.

The function read.maf is used to prepare the MAF data to be used with maftools. We also added clinical information that will be used in survival plots.

library(maftools)
# recovering data from TCGAWorkflowData package.
data(maf_lgg_gbm)

# To prepare for maftools we will also include clinical data
# For a mutant vs WT survival analysis 
# get indexed clinical patient data for GBM samples
gbm_clin <- GDCquery_clinic(project = "TCGA-GBM", type = "Clinical")
# get indexed clinical patient data for LGG samples
lgg_clin <- GDCquery_clinic(project = "TCGA-LGG", type = "Clinical")
# Bind the results; as the columns might not be the same,
# we use plyr::rbind.fill to keep all columns from both tables
clinical <- plyr::rbind.fill(gbm_clin,lgg_clin)
colnames(clinical)[grep("submitter_id",colnames(clinical))] <- "Tumor_Sample_Barcode"

# we need to create a binary variable: 1 is dead, 0 is not dead
plyr::count(clinical$vital_status)
clinical$Overall_Survival_Status <- 1 # dead
clinical$Overall_Survival_Status[which(clinical$vital_status != "Dead")] <- 0

# If patient is not dead we don't have days_to_death (NA)
# we will set it as the last day we know the patient is still alive
clinical$time <- clinical$days_to_death
clinical$time[is.na(clinical$days_to_death)] <- clinical$days_to_last_follow_up[is.na(clinical$days_to_death)]

# Create object to use in maftools
maf <- read.maf(
  maf = maf, 
  clinicalData = clinical, 
  isTCGA = TRUE
)
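The event and time coding used above can be checked on a small table before applying it to the real clinical data. The following is a minimal sketch on made-up values; the data frame `toy_clin` is hypothetical and only mirrors the column names used in the listing. Note that, unlike the listing above, this sketch explicitly codes a missing `vital_status` as 0 (censored), which is an assumption about how such patients should be treated.

```r
# Toy clinical table; values are made up for illustration only.
toy_clin <- data.frame(
  vital_status = c("Dead", "Alive", "Dead", NA),
  days_to_death = c(320, NA, 45, NA),
  days_to_last_follow_up = c(NA, 900, NA, 150)
)

# Event indicator: 1 = dead, 0 = anything else (including missing status)
toy_clin$Overall_Survival_Status <- as.integer(
  !is.na(toy_clin$vital_status) & toy_clin$vital_status == "Dead"
)

# Follow-up time: days to death if known, otherwise days to last follow-up
toy_clin$time <- ifelse(
  is.na(toy_clin$days_to_death),
  toy_clin$days_to_last_follow_up,
  toy_clin$days_to_death
)

toy_clin[, c("vital_status", "Overall_Survival_Status", "time")]
```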

We can plot a MAF summary.

plotmafSummary(
  maf = maf,
  rmOutlier = TRUE,
  addStat = 'median',
  dashboard = TRUE
)

We can draw an oncoplot of the 20 most frequently mutated genes and annotate it with clinical metadata (here, the tissue or organ of origin).

oncoplot(
  maf = maf,
  top = 20,
  legendFontSize = 8,
  clinicalFeatures = c("tissue_or_organ_of_origin")
)

We can also perform survival analysis by grouping samples from MAF based on mutation status of given gene(s).

plot <- mafSurvival(
  maf = maf,
  genes = "TP53",
  time = 'time',
  Status = 'Overall_Survival_Status',
  isTCGA = TRUE
)

## TP53 
##  355 
##     Group medianTime     N
##    <char>      <num> <int>
## 1: Mutant      635.5   354
## 2:     WT      448.0   521

Transcriptomic analysis

Pre-Processing Data

The LGG and GBM data used for the following transcriptomic analysis were downloaded using TCGAbiolinks. We downloaded only primary tumor (TP) samples, which resulted in 516 LGG samples and 156 GBM samples, and prepared them as two separate RSE (RangedSummarizedExperiment) objects, each saved as an R object with a filename that includes both the cancer name and the name of the platform used for the gene expression data.

query_exp_lgg <- GDCquery(
  project = "TCGA-LGG",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification", 
  workflow.type = "STAR - Counts"
)
# Get only first 20 samples to make example faster
query_exp_lgg$results[[1]] <- query_exp_lgg$results[[1]][1:20,]
GDCdownload(query_exp_lgg)
exp_lgg <- GDCprepare(
  query = query_exp_lgg
)

query_exp_gbm <- GDCquery(
  project = "TCGA-GBM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification", 
  workflow.type = "STAR - Counts"
)
# Get only first 20 samples to make example faster
query_exp_gbm$results[[1]] <- query_exp_gbm$results[[1]][1:20,]
GDCdownload(query_exp_gbm)
exp_gbm <- GDCprepare(
  query = query_exp_gbm
)

To pre-process the data, we first searched for possible outliers using the TCGAanalyze_Preprocessing function, which performs an Array-Array Intensity Correlation (AAIC). In this way, we defined a square symmetric matrix of Pearson correlations among all samples in each cancer type (LGG or GBM). No sample fell below the correlation cutoff (cor.cut = 0.6), so none was flagged as a possible outlier.

Second, using the TCGAanalyze_Normalization function, which encompasses the functions of the EDASeq package, we normalized mRNA transcripts.

This function implements two kinds of procedures. Within-lane normalization adjusts for the GC-content effect (or other gene-level effects) on read counts using loess robust local regression, global scaling, or full-quantile normalization (Risso et al. 2011). Between-lane normalization adjusts for distributional differences between lanes (e.g., sequencing depth) using global scaling or full-quantile normalization (Bullard et al. 2010).
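The outlier screen described above can be sketched in base R on simulated data. This is only an illustration of the idea (a Pearson correlation matrix among samples compared against a cutoff); summarizing each sample by its mean correlation with the others is an assumption and may differ from what TCGAanalyze_Preprocessing does internally. All object names are hypothetical.

```r
set.seed(1)
# Simulated expression matrix: 200 genes x 6 samples (not TCGA data)
base_profile <- rnorm(200, mean = 8, sd = 2)
toy_expr <- sapply(1:5, function(i) base_profile + rnorm(200, sd = 0.5))
toy_expr <- cbind(toy_expr, rnorm(200, mean = 8, sd = 2))  # unrelated sample
colnames(toy_expr) <- paste0("S", 1:6)

# Square symmetric matrix of Pearson correlations among samples
sample_cor <- cor(toy_expr, method = "pearson")

# Mean correlation of each sample with all the others
diag(sample_cor) <- NA
mean_cor <- rowMeans(sample_cor, na.rm = TRUE)

# Samples below the cutoff are candidate outliers
cor_cut <- 0.6
outliers <- names(mean_cor)[mean_cor < cor_cut]
outliers
```

In this simulation only the unrelated sample is expected to fall below the cutoff, because the other five share a common expression profile.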

data("TCGA_LGG_Transcriptome_20_samples")
data("TCGA_GBM_Transcriptome_20_samples")

exp_lgg_preprocessed <- TCGAanalyze_Preprocessing(
  object = exp_lgg,
  cor.cut = 0.6,    
  datatype = "unstranded",
  filename = "LGG_IlluminaHiSeq_RNASeqV2.png"
)

exp_gbm_preprocessed <- TCGAanalyze_Preprocessing(
  object = exp_gbm,
  cor.cut = 0.6, 
  datatype = "unstranded",
  filename = "GBM_IlluminaHiSeq_RNASeqV2.png"
)
exp_preprocessed <- cbind(
  exp_lgg_preprocessed, 
  exp_gbm_preprocessed
)

exp_normalized <- TCGAanalyze_Normalization(
  tabDF = exp_preprocessed,
  geneInfo = TCGAbiolinks::geneInfoHT,
  method = "gcContent"
) # 60513   40

exp_filtered <- TCGAanalyze_Filtering(
  tabDF = exp_normalized,
  method = "quantile",
  qnt.cut =  0.25
)  # 44630   40

exp_filtered_lgg <- exp_filtered[
  ,substr(colnames(exp_filtered),1,12) %in% lgg_clin$bcr_patient_barcode
]

exp_filtered_gbm <-   exp_filtered[
  ,substr(colnames(exp_filtered),1,12) %in% gbm_clin$bcr_patient_barcode
]

diff_expressed_genes <- TCGAanalyze_DEA(
  mat1 = exp_filtered_lgg,
  mat2 = exp_filtered_gbm,
  Cond1type = "LGG",
  Cond2type = "GBM",
  fdr.cut = 0.01 ,
  logFC.cut = 1,
  method = "glmLRT"
)

# Number of differentially expressed genes (DEG)
nrow(diff_expressed_genes)

[1] 9599
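The quantile filter applied above (qnt.cut = 0.25) removes roughly the lowest quarter of genes. A minimal sketch of that idea on simulated counts follows; ranking genes by their mean across samples is an assumption about the criterion, and the object names are hypothetical.

```r
set.seed(42)
# Simulated normalized counts: 100 genes x 4 samples (not TCGA data)
toy_counts <- matrix(
  rpois(400, lambda = rep(seq(1, 100), times = 4)),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("S", 1:4))
)

qnt_cut <- 0.25
gene_means <- rowMeans(toy_counts)
threshold <- quantile(gene_means, probs = qnt_cut)

# Keep genes whose mean expression exceeds the chosen quantile
toy_filtered <- toy_counts[gene_means > threshold, , drop = FALSE]
dim(toy_filtered)
```

The same proportion is visible in the dimensions noted in the listing above, where 60513 genes are reduced to 44630 after filtering.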

EA: enrichment analysis

To understand the biological processes underlying the DEGs, we performed an enrichment analysis using the TCGAanalyze_EAcomplete function.

#-------------------  4.2 EA: enrichment analysis             --------------------
ansEA <- TCGAanalyze_EAcomplete(
  TFname = "DEA genes LGG Vs GBM", 
  RegulonList = diff_expressed_genes$gene_name
)

TCGAvisualize_EAbarplot(
  tf = rownames(ansEA$ResBP),
  filename = NULL,
  GOBPTab = ansEA$ResBP,
  nRGTab = diff_expressed_genes$gene_name,
  nBar = 20
)

TCGAvisualize_EAbarplot(
  tf = rownames(ansEA$ResBP),
  filename = NULL,
  GOCCTab = ansEA$ResCC,
  nRGTab = diff_expressed_genes$gene_name,
  nBar = 20
)

TCGAvisualize_EAbarplot(
  tf = rownames(ansEA$ResBP),
  filename = NULL,
  GOMFTab = ansEA$ResMF,
  nRGTab = diff_expressed_genes$gene_name,
  nBar = 20
)

TCGAvisualize_EAbarplot(
  tf = rownames(ansEA$ResBP),
  filename = NULL,
  PathTab = ansEA$ResPat,
  nRGTab = rownames(diff_expressed_genes),
  nBar = 20
)

TCGAvisualize_EAbarplot outputs a bar chart, as shown in the figure, with the number of genes for the main categories of three ontologies (i.e., GO:biological process, GO:cellular component, and GO:molecular function).

The figure also shows the canonical pathways significantly overrepresented (enriched) by the DEGs. The most statistically significant canonical pathways identified in the DEGs are ranked according to their FDR-corrected p-value (-log10, colored bars) and the ratio of list genes found in each pathway over the total number of genes in that pathway (ratio, red line).
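To make the notions of over-representation and ratio concrete, the following sketch tests a single pathway with a one-sided Fisher's exact test in base R. The counts are hypothetical, and using Fisher's test here is an assumption for illustration; it is not claimed to be the exact procedure implemented in TCGAanalyze_EAcomplete.

```r
# Hypothetical numbers, for illustration only
n_universe <- 20000  # genes in the background
n_pathway  <- 100    # background genes annotated to the pathway
n_deg      <- 500    # differentially expressed genes
n_overlap  <- 12     # DEGs annotated to the pathway

# 2 x 2 contingency table (filled column by column)
contingency <- matrix(
  c(n_overlap, n_deg - n_overlap,
    n_pathway - n_overlap, n_universe - n_deg - n_pathway + n_overlap),
  nrow = 2,
  dimnames = list(
    c("in_pathway", "not_in_pathway"),
    c("DEG", "not_DEG")
  )
)

# One-sided test: are DEGs over-represented in the pathway?
p_value <- fisher.test(contingency, alternative = "greater")$p.value

# Ratio of list genes found in the pathway over all genes in the pathway
ratio <- n_overlap / n_pathway

# With many pathways, the p-values would then be corrected, e.g.
# p.adjust(all_p_values, method = "fdr")
c(p_value = p_value, ratio = ratio)
```

Under the null hypothesis about 2.5 DEGs would be expected in this pathway (500 x 100 / 20000), so an overlap of 12 yields a small p-value.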

PEA: Pathways enrichment analysis

To verify whether the genes found have a specific role in a pathway, the Bioconductor package pathview (Luo and Brouwer 2013) can be used. The listing below shows an example of how to use it. It can receive, for example, a named vector of genes with their expression levels, the pathway.id (which can be found in the KEGG database), the species (’hsa’ for Homo sapiens) and the limits for the gene expression.

library(SummarizedExperiment)

# DEGs TopTable
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(
  FC_FDR_table_mRNA = diff_expressed_genes,
  typeCond1 = "LGG",
  typeCond2 = "GBM",
  TableCond1 = exp_filtered[,colnames(exp_filtered_lgg)],
  TableCond2 = exp_filtered[,colnames(exp_filtered_gbm)]
)

dataDEGsFiltLevel$GeneID <- 0

library(clusterProfiler)
# Converting Ensembl gene IDs to Entrez gene IDs and gene symbols
eg <- as.data.frame(
  bitr(
    dataDEGsFiltLevel$mRNA,
    fromType = "ENSEMBL",
    toType = c("ENTREZID","SYMBOL"),
    OrgDb = "org.Hs.eg.db"
  )
)
eg <- eg[!duplicated(eg$SYMBOL),]
eg <- eg[order(eg$SYMBOL,decreasing=FALSE),]

dataDEGsFiltLevel <- dataDEGsFiltLevel[dataDEGsFiltLevel$mRNA %in% eg$ENSEMBL,]
dataDEGsFiltLevel <- dataDEGsFiltLevel[eg$ENSEMBL,]
rownames(dataDEGsFiltLevel) <- eg$SYMBOL

all(eg$SYMBOL == rownames(dataDEGsFiltLevel))

[1] TRUE

dataDEGsFiltLevel$GeneID <- eg$ENTREZID

dataDEGsFiltLevel_sub <- subset(dataDEGsFiltLevel, select = c("GeneID", "logFC"))
genelistDEGs <- as.numeric(dataDEGsFiltLevel_sub$logFC)
names(genelistDEGs) <- dataDEGsFiltLevel_sub$GeneID
library(pathview)
# pathway.id: hsa05214 is the glioma pathway
# limit: sets the limit for gene expression legend and color
hsa05214 <- pathview::pathview(
  gene.data  = genelistDEGs,
  pathway.id = "hsa05214",
  species    = "hsa",
  limit = list(gene = as.integer(max(abs(genelistDEGs))))
)

The red genes are up-regulated and the green genes are down-regulated in the LGG samples compared to GBM.

Pathways enrichment analysis: glioma pathway. Red defines genes that are up-regulated and green defines genes that are down-regulated.

Inference of gene regulatory networks

Starting with the set of differentially expressed genes, we infer gene regulatory networks using the following state-of-the-art inference algorithms: ARACNE (Margolin et al. 2006), CLR (Faith et al. 2007), MRNET (Meyer et al. 2007) and C3NET (Altay and Emmert-Streib 2010). These methods are based on mutual information and use different heuristics to infer the edges of the network. They are available via the Bioconductor/CRAN packages minet (Meyer, Lafitte, and Bontempi 2008) and c3net (Altay and Emmert-Streib 2010), respectively. Many gene regulatory interactions have been experimentally validated and published. These ’known’ interactions can be accessed using different tools and databases such as BioGrid (Stark et al. 2006) or GeneMANIA (Montojo et al. 2010). However, this knowledge is far from complete and in most cases contains only a small subset of the real interactome. The quality of the inferred networks can be assessed by comparing the inferred interactions to those that have been validated. This comparison results in a confusion matrix, as presented in the table below.

Confusion matrix, comparing inferred network to network of validated interactions.
	validated	not validated/non-existing
inferred	TP	FP
not inferred	FN	TN

Different quality measures can then be computed such as the false positive rate \[fpr=\frac{FP}{FP+TN},\] the true positive rate (also called recall) \[tpr=\frac{TP}{TP+FN}\] and the precision \[p=\frac{TP}{TP+FP}.\] The performance of an algorithm can then be summarized using ROC (false positive rate versus true positive rate) or PR (precision versus recall) curves.
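These measures can be computed directly from two adjacency matrices. The sketch below uses a made-up four-gene example (the matrices `inferred` and `validated` are hypothetical) and counts each unordered gene pair once.

```r
# Toy adjacency matrices (1 = edge); not derived from the TCGA data
genes <- c("A", "B", "C", "D")
inferred <- matrix(0, 4, 4, dimnames = list(genes, genes))
validated <- inferred
inferred["A", "B"] <- inferred["A", "C"] <- inferred["B", "D"] <- 1
validated["A", "B"] <- validated["C", "D"] <- 1

# Compare each unordered gene pair once (upper triangle)
idx <- upper.tri(inferred)
TP <- sum(inferred[idx] == 1 & validated[idx] == 1)
FP <- sum(inferred[idx] == 1 & validated[idx] == 0)
FN <- sum(inferred[idx] == 0 & validated[idx] == 1)
TN <- sum(inferred[idx] == 0 & validated[idx] == 0)

# Quality measures as defined in the text
fpr <- FP / (FP + TN)
tpr <- TP / (TP + FN)
precision <- TP / (TP + FP)

c(TP = TP, FP = FP, FN = FN, TN = TN, fpr = fpr, tpr = tpr, precision = precision)
```

Here one of the three inferred edges is validated, so the precision is 1/3, while both the true and false positive rates are 0.5.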

A weakness of this type of comparison is that an edge missing from the set of known interactions can mean either that an experimental validation was attempted and did not show any regulatory mechanism or (more likely) that no validation has been attempted yet.

In the following, we ran the network inference on the 2,901 differentially expressed genes identified in Section “Transcriptomic analysis”.

Retrieving known interactions

We obtained a set of known interactions from the BioGrid database.

There are 3,941 unique interactions between the 2,901 differentially expressed genes.

Using differentially expressed genes from TCGAbiolinks workflow

We start this analysis by inferring one gene set for the LGG data.

### read biogrid info (available in TCGAWorkflowData as "biogrid")
### Check last version in https://thebiogrid.org/download.php 
file <- "https://downloads.thebiogrid.org/Download/BioGRID/Latest-Release/BIOGRID-ALL-LATEST.tab2.zip"
if(!file.exists(gsub("zip","txt",basename(file)))){
  downloader::download(file,basename(file))
  unzip(basename(file),junkpaths =TRUE)
}

tmp.biogrid <- vroom::vroom(
  dir(pattern = "BIOGRID-ALL.*\\.txt")
)

### plot details (colors & symbols)
mycols <- c('#e41a1c','#377eb8','#4daf4a','#984ea3','#ff7f00','#ffff33','#a65628')

### load network inference libraries
library(minet)
library(c3net)

### differentially expressed genes identified using TCGAbiolinks
# we will use only a subset (first 30 genes) of it to make the example faster
names.genes.de <- rownames(diff_expressed_genes)[1:30]

data(biogrid)
net.biogrid.de <- getAdjacencyBiogrid(tmp.biogrid, names.genes.de)

mydata <- exp_filtered_lgg[names.genes.de, ]

### infer networks
t.mydata <- t(mydata)
net.aracne <- minet(t.mydata, method = "aracne")
net.mrnet <- minet(t.mydata, method = "mrnet")
net.clr <- minet(t.mydata, method = "clr")
net.c3net <- c3net(mydata)

### validate compared to biogrid network
tmp.val <- list(
  validate(net.aracne, net.biogrid.de), 
  validate(net.mrnet, net.biogrid.de),
  validate(net.clr, net.biogrid.de), 
  validate(net.c3net, net.biogrid.de)
)

### plot roc and compute auc for the different networks
dev1 <- show.roc(tmp.val[[1]],cex=0.3,col=mycols[1],type="l")
res.auc <- auc.roc(tmp.val[[1]])
for(count in 2:length(tmp.val)){
  show.roc(tmp.val[[count]],device=dev1,cex=0.3,col=mycols[count],type="l")
  res.auc <- c(res.auc, auc.roc(tmp.val[[count]]))
}

legend(
  "bottomright", 
  legend = paste(c("aracne","mrnet","clr","c3net"), signif(res.auc,4), sep=": "),
  col = mycols[1:length(tmp.val)],
  lty = 1, 
  bty = "n" 
)
# Please, uncomment this line to produce the pdf files.
# dev.copy2pdf(width=8,height=8,device = dev1, file = paste0("roc_biogrid_",cancertype,".pdf"))

ROC with corresponding AUC for inferred LGG networks compared to BioGrid interactions

In the figure above, the obtained ROC curves and the corresponding areas under the curve (AUC) are presented. It can be observed that CLR and MRNET perform best when comparing the inferred network with known interactions from the BioGrid database.
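The four algorithms compared above all start from pairwise mutual information between genes. As a rough illustration of that quantity, the sketch below computes a simple plug-in estimate on simulated, discretized expression values. The helper functions `mutual_information` and `discretize` are hypothetical and written for this example; minet ships its own estimators, which are not reproduced here.

```r
# Empirical (plug-in) mutual information, in nats, between two discrete vectors
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)
  px <- rowSums(joint)
  py <- colSums(joint)
  expected <- outer(px, py)
  nonzero <- joint > 0
  sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
}

# Discretize a numeric vector into equal-frequency bins
discretize <- function(v, bins = 5) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, length.out = bins + 1)),
      include.lowest = TRUE, labels = FALSE)
}

set.seed(7)
# Simulated expression of three genes across 500 samples (not TCGA data)
gene_a <- rnorm(500)
gene_b <- gene_a + rnorm(500, sd = 0.3)  # strongly dependent on gene_a
gene_c <- rnorm(500)                     # independent of gene_a

mi_ab <- mutual_information(discretize(gene_a), discretize(gene_b))
mi_ac <- mutual_information(discretize(gene_a), discretize(gene_c))
c(mi_ab = mi_ab, mi_ac = mi_ac)
```

The dependent pair should score clearly higher than the independent pair; the inference heuristics then differ in how they turn such a matrix of scores into network edges.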

Epigenetic analysis

DNA methylation is an important component in numerous cellular processes, such as embryonic development, genomic imprinting, X-chromosome inactivation, and preservation of chromosome stability (Phillips 2008).

In mammals, DNA methylation is found sparsely but globally, distributed in definite CpG sequences throughout the entire genome. The exception is CpG islands (CGIs), short interspersed DNA sequences that are enriched for GC. These islands are normally found at sites of transcription initiation, and their methylation can lead to gene silencing (Deaton and Bird 2011).

Thus, the investigation of DNA methylation is crucial to understanding regulatory gene networks in cancer, as DNA methylation represses transcription (Robertson 2005). Therefore, the detection of DMRs (Differentially Methylated Regions) can help us investigate regulatory gene networks.

This section describes the analysis of DNA methylation using the Bioconductor package TCGAbiolinks (Colaprico et al. 2016). For this analysis, and due to the time required to perform it, we selected only 10 LGG samples and 10 GBM samples that have both DNA methylation data from Infinium HumanMethylation450 and gene expression from Illumina HiSeq 2000 RNA Sequencing Version 2 analysis. We started by checking the mean DNA methylation of different groups of samples, then performed a DMR analysis in which we searched for regions of possible biological significance (e.g., regions that are methylated in one group and unmethylated in the other). After finding these regions, they can be visualized using heatmaps.

Visualizing the mean DNA methylation of each patient

It should be highlighted that some pre-processing of the DNA methylation data was done. The DNA methylation data from the 450k platform have three types of probes: cg (CpG loci), ch (non-CpG loci) and rs (SNP assay). The last type of probe can be used for sample identification and tracking and should be excluded from differential methylation analysis according to the Illumina manual. Therefore, the rs probes were removed. Probes on chromosomes X and Y were also removed to eliminate potential artifacts originating from the presence of a different proportion of males and females (Marabita et al. 2013). The last pre-processing step was to remove probes with at least one NA value.
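The three filters just described can be sketched on a small matrix of beta values. The probe names, the chromosome annotation `probe_chr` and the values below are made up for illustration; with real data the chromosome would come from the rowRanges of the SummarizedExperiment object.

```r
# Toy beta-value matrix: 6 probes x 3 samples (values are made up)
toy_beta <- matrix(
  c(0.10, 0.20, 0.15,
    0.80, 0.85, NA,
    0.50, 0.55, 0.60,
    0.30, 0.35, 0.40,
    0.90, 0.95, 0.92,
    0.05, 0.07, 0.06),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("cg0001", "cg0002", "rs0003", "cg0004", "ch0005", "cg0006"),
    c("S1", "S2", "S3")
  )
)
# Hypothetical chromosome annotation, one entry per probe
probe_chr <- c("chr1", "chr2", "chr3", "chrX", "chr9", "chrY")

keep <- !grepl("^rs", rownames(toy_beta)) &  # drop SNP (rs) probes
  !(probe_chr %in% c("chrX", "chrY")) &      # drop sex chromosomes
  rowSums(is.na(toy_beta)) == 0              # drop probes with any NA

toy_beta_clean <- toy_beta[keep, , drop = FALSE]
rownames(toy_beta_clean)
```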

After this pre-processing step, we can use the TCGAvisualize_meanMethylation function to look at the mean DNA methylation of each patient in each group. It receives as arguments a SummarizedExperiment object with the DNA methylation data and the arguments groupCol and subgroupCol, which should be two columns from the sample information matrix of the SummarizedExperiment object (accessed by the colData function).

#----------------------------
# Obtaining DNA methylation
#----------------------------
# Samples
lgg.samples <- matchedMetExp("TCGA-LGG", n = 10)
gbm.samples <- matchedMetExp("TCGA-GBM", n = 10)
samples <- c(lgg.samples,gbm.samples)

#-----------------------------------
# 1 - Methylation
# ----------------------------------
# For DNA methylation it is quicker in this case to download the tar.gz file
# and get the samples we want instead of downloading files by files
query <- GDCquery(
  project = c("TCGA-LGG","TCGA-GBM"),
  data.category = "DNA Methylation",
  platform = "Illumina Human Methylation 450",
  data.type = "Methylation Beta Value",
  barcode = samples
)
GDCdownload(query)
met <- GDCprepare(
  query = query, 
  save = FALSE
)

# We will use only chr9 to make the example faster
met <- subset(met,subset = as.character(seqnames(met)) %in% c("chr9"))
# This data is available in the package (object elmerExample)

data(elmerExample)
#----------------------------
# Mean methylation
#----------------------------
# Plot a boxplot for the groups in the project_id column of the
# SummarizedExperiment object

# remove probes with NA (similar to na.omit)
met <- met[rowSums(is.na(assay(met))) == 0,]

df <- data.frame(
  "Sample.mean" = colMeans(assay(met), na.rm = TRUE),
  "groups" = met$project_id
)

library(ggpubr)
ggpubr::ggboxplot(
  data = df,
  y = "Sample.mean",
  x = "groups",
  color = "groups",
  add = "jitter",
  ylab = "Mean DNA methylation (beta-values)",
  xlab = ""
) + stat_compare_means()

The figure above shows the mean DNA methylation of each sample in the GBM group and in the LGG group. This genome-wide view of the data highlights a difference between the two groups of tumors.

Searching for differentially methylated CpG sites

The next step is to define differentially methylated CpG sites between the two groups. This can be done using the TCGAanalyze_DMC function (see listing below). The DNA methylation data (level 3) are presented in the form of beta-values that use a scale ranging from 0.0 (probes completely unmethylated) up to 1.0 (probes completely methylated).

To find these differentially methylated CpG sites, the function first calculates the difference between the mean DNA methylation (mean of the beta-values) of each group for each probe. Second, it tests for differential methylation between the two groups using the Wilcoxon test, adjusting the p-values by the Benjamini-Hochberg method. The arguments of TCGAanalyze_DMC were set to require a minimum absolute beta-value difference of 0.15 and an adjusted p-value of less than \(0.05\).

After these tests, the function creates a volcano plot (x-axis: difference of mean DNA methylation, y-axis: statistical significance) to help users identify the differentially methylated CpG sites and returns the object with the results in the rowRanges.

#------- Searching for differentially methylated CpG sites     ----------
dmc <- TCGAanalyze_DMC(
  data = met,
  groupCol = "project_id", # a column in the colData matrix
  group1 = "TCGA-GBM", # a type of the disease type column
  group2 = "TCGA-LGG", # a type of the disease column
  p.cut = 0.05,
  diffmean.cut = 0.15,
  save = FALSE,
  legend = "State",
  plot.filename = "LGG_GBM_metvolcano.png",
  cores = 1 # if set to 1 there will be a progress bar
)
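The two-step procedure (difference of means, then Wilcoxon test with Benjamini-Hochberg adjustment) can be reproduced in base R on simulated beta values. This is a sketch of the logic only, under the same cutoffs as above; the object names and the status labels are assumptions and need not match the output of TCGAanalyze_DMC.

```r
set.seed(123)
n_probes <- 50
# Simulated beta values: 10 samples per group (not TCGA data)
group1 <- matrix(runif(n_probes * 10, 0.4, 0.6), nrow = n_probes)
group2 <- matrix(runif(n_probes * 10, 0.4, 0.6), nrow = n_probes)
# Make the first 5 probes hypermethylated in group 1
group1[1:5, ] <- runif(5 * 10, 0.8, 0.95)
rownames(group1) <- rownames(group2) <- paste0("cg", seq_len(n_probes))

# Step 1: difference of mean beta values per probe
diff_mean <- rowMeans(group1) - rowMeans(group2)

# Step 2: Wilcoxon rank-sum test per probe, then Benjamini-Hochberg adjustment
p_raw <- sapply(seq_len(n_probes), function(i) {
  wilcox.test(group1[i, ], group2[i, ])$p.value
})
p_adj <- p.adjust(p_raw, method = "BH")

# Step 3: apply both cutoffs (adjusted p < 0.05 and |difference| > 0.15)
status <- ifelse(
  p_adj < 0.05 & abs(diff_mean) > 0.15,
  ifelse(diff_mean > 0, "Hypermethylated", "Hypomethylated"),
  "Not Significant"
)
table(status)
```

Plotting `diff_mean` against `-log10(p_adj)` would give the same kind of volcano plot that the function saves to plot.filename.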

The figure below shows the volcano plot produced by the TCGAanalyze_DMC call above. This plot aids the user in selecting relevant thresholds, as we search for candidate biological DMRs.

Volcano plot: searching for differentially methylated CpG sites (x-axis:difference of mean DNA methylation, y-axis: statistical significance)

To visualize the level of DNA methylation of these probes across all samples, we use heatmaps that can be generated by the Bioconductor package ComplexHeatmap (Z., n.d.). To create a heatmap using the ComplexHeatmap package, the user should provide at least one matrix with the DNA methylation levels. Annotation layers can also be added and placed at the bottom, top, left side and right side of the heatmap to provide additional metadata description. The listing below shows the code to produce the heatmap of the DNA methylation data.

#--------------------------
# DNA Methylation heatmap
#-------------------------
library(ComplexHeatmap)
clinical <- plyr::rbind.fill(
  gbm_clin,
  lgg_clin
)

# get the probes that are Hypermethylated or Hypomethylated
# met is the same object of the section 'DNA methylation analysis'
status.col <- "status"
probes <- rownames(dmc)[grep("hypo|hyper",dmc$status,ignore.case = TRUE)]
sig.met <- met[probes,]


# top annotation, which samples are LGG and GBM
# We will add clinical data as annotation of the samples
# we will sort the clinical data to have the same order of the DNA methylation matrix
clinical.ordered <- clinical[match(substr(colnames(sig.met),1,12),clinical$bcr_patient_barcode),]

ta <- HeatmapAnnotation(
  df = clinical.ordered[, c("primary_diagnosis", "gender", "vital_status", "race")],
  col = list(
    disease = c("LGG" = "grey", "GBM" = "black"),
    gender = c("male" = "blue", "female" = "pink")
  )
)

# row annotation: add the status for LGG in relation to GBM
# For example: status.gbm.lgg Hypomethylated means that the
# mean DNA methylation of probes for LGG are hypomethylated
# compared to GBM ones.
# drop = FALSE keeps the single status column as a data frame
ra <- rowAnnotation(
  df = dmc[probes, status.col, drop = FALSE],
  col = list(
    "status.TCGA.GBM.TCGA.LGG" =
      c(
        "Hypomethylated" = "orange",
        "Hypermethylated" = "darkgreen"
      )
  ),
  width = unit(1, "cm")
)

heatmap  <- Heatmap(
  matrix = assay(sig.met),
  name = "DNA methylation",
  col = matlab::jet.colors(200),
  show_row_names = FALSE,
  cluster_rows = TRUE,
  cluster_columns = FALSE,
  show_column_names = FALSE,
  bottom_annotation = ta,
  column_title = "DNA Methylation"
) 
# Save to PNG
png("heatmap.png",width = 600, height = 400)
draw(heatmap, annotation_legend_side =  "bottom")
dev.off()
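
The step that builds clinical.ordered relies on the first 12 characters of a sample barcode identifying the patient. The sketch below illustrates that alignment on toy data; the barcodes and the toy_clinical table are made up for illustration and are not part of the workflow data.

```r
# Toy sample barcodes (made up); the first 12 characters identify the patient
sample_barcodes <- c(
  "TCGA-AA-0001-01A-11D",
  "TCGA-BB-0002-01A-11D",
  "TCGA-CC-0003-01A-11D"
)

# Hypothetical clinical table, deliberately stored in a different order
toy_clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-CC-0003", "TCGA-AA-0001", "TCGA-BB-0002"),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Reduce each sample barcode to its patient identifier, then look up the
# row of the clinical table that describes that patient
patient_ids <- substr(sample_barcodes, 1, 12)
idx <- match(patient_ids, toy_clinical$bcr_patient_barcode)
toy_ordered <- toy_clinical[idx, ]

# A patient missing from the clinical table would produce an NA row,
# so it is worth checking before passing the table to an annotation
stopifnot(!anyNA(idx))
```

After this step, row i of toy_ordered describes column i of the matrix, which is what an annotation data frame needs.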

Motif analysis

Motif discovery is the attempt to extract small sequence signals hidden within largely non-functional intergenic sequences. These short nucleotide signals (6-15 bp) might have biological significance, as they can be used to control the expression of genes. These sequences are called regulatory motifs. The Bioconductor package rGADEM (Droit et al. 2015; Li 2009) provides an efficient de novo motif discovery algorithm for large-scale genomic sequence data.

The user may be interested in looking for unique signatures in the differentially methylated regions to identify candidate transcription factors that could bind to these elements, which are affected by the accumulation or absence of DNA methylation. For this analysis we use a window of 100 bases before and after the probe location. The analysis returns an object that contains all relevant information about the motif analysis (i.e., consensus sequence, PWM, chromosome, p-value, etc.).

Using the Bioconductor package motifStack (Ou et al. 2013), it is possible to generate a graphic representation of multiple motifs with different similarity scores.

library(rGADEM)
library(BSgenome.Hsapiens.UCSC.hg19)
library(motifStack)
library(SummarizedExperiment)
library(dplyr)

probes <- rowRanges(met)[rownames(dmc)[grep("hypo|hyper",dmc$status,ignore.case = TRUE)],]

# Get hypo/hyper methylated probes and make a 200bp window 
# surrounding each probe.
sequence <- GRanges(
  seqnames = as.character(seqnames(probes)),
  IRanges(
    start = ranges(probes) %>% as.data.frame() %>% dplyr::pull("start") - 100,
    end = ranges(probes) %>% as.data.frame() %>% dplyr::pull("end") + 100), 
  strand = "*"
)
#look for motifs
gadem <- GADEM(sequence, verbose = FALSE, genome = Hsapiens)

top 3 4, 5-mers: 20 40 60 top 3 4, 5-mers: 20 40 60 top 3 4, 5-mers: 20 40 60 top 3 4, 5-mers: 20 40 60

# How many motifs were found?
nMotifs(gadem)

[1] 3

# get the number of occurrences
nOccurrences(gadem)

[1] 268 183 137

# view all sequences consensus
consensus(gadem)

[1] "nGsnGGGGsnGGrGssnGGGs" "nAAAAAnrArAn" "nCCCAGGsmn"

# Print motif
pwm <- getPWM(gadem)
pfm  <- new("pfm",mat = pwm[[1]],name = "Novel Site 1")
plotMotifLogo(pfm)

# Number of instances of motif 1?
length(gadem@motifList[[1]]@alignList)

[1] 268
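
The consensus strings above mix upper-case bases with lower-case letters such as n, s, r and m. A minimal sketch, assuming those lower-case letters are standard IUPAC ambiguity codes, shows how a consensus can be turned into a regular expression for a quick scan of a sequence. The helper consensus_to_regex and the test sequences are hypothetical and are not part of rGADEM.

```r
# Standard IUPAC nucleotide codes mapped to regular-expression character classes
iupac <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]", K = "[GT]", M = "[AC]",
  B = "[CGT]", D = "[AGT]", H = "[ACT]", V = "[ACG]", N = "[ACGT]"
)

# Hypothetical helper: translate a consensus string into a regular expression
consensus_to_regex <- function(consensus) {
  codes <- strsplit(toupper(consensus), "")[[1]]
  stopifnot(all(codes %in% names(iupac)))
  paste(iupac[codes], collapse = "")
}

pattern <- consensus_to_regex("nCCCAGGsmn")

# Made-up sequences: the first contains an instance of the motif, the second does not
hit  <- grepl(pattern, "TTACCCAGGCAGTT")
miss <- grepl(pattern, "TTACCCAGGTAGTT")
```

A regular expression ignores the position-specific weights stored in the PWM, so this is only a rough check and not a substitute for the scores reported by the motif analysis.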

Integrative (Epigenomic & Transcriptomic) analysis

Recent studies have shown that a deep integrative analysis can aid researchers in identifying and extracting biological insight from high-throughput data (Phillips 2008; Shi et al. 2014; Rhodes and Chinnaiyan 2005). In this section, we will introduce a Bioconductor package called ELMER to identify regulatory enhancers using gene expression, DNA methylation data and motif analysis. In addition, we show how to integrate the results from the previous sections with important epigenomic data derived from both the ENCODE and Roadmap projects.

ChIP-seq analysis

ChIP-seq is used primarily to determine how transcription factors and other chromatin-associated proteins influence phenotype-affecting mechanisms. Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. The aim is to explore datasets with significant overlap in order to infer co-regulation or transcription factor complexes for further investigation. A summary of the role associated with each histone mark is shown in the table below.

Histone marks	Role
Histone H3 lysine 4 trimethylation (H3K4me3)	Promoter regions (Heintzman et al. 2007; Bernstein et al. 2005)
Histone H3 lysine 4 monomethylation (H3K4me1)	Enhancer regions (Heintzman et al. 2007)
Histone H3 lysine 36 trimethylation (H3K36me3)	Transcribed regions
Histone H3 lysine 27 trimethylation (H3K27me3)	Polycomb repression (Bonasio, Tu, and Reinberg 2010)
Histone H3 lysine 9 trimethylation (H3K9me3)	Heterochromatin regions (Peters et al. 2003)
Histone H3 acetylated at lysine 27 (H3K27ac)	Increase activation of genomic elements (Heintzman et al. 2009; Rada-Iglesias et al. 2011; Creyghton et al. 2010)
Histone H3 lysine 9 acetylation (H3K9ac)	Transcriptional activation (Nishida et al. 2006)

In addition, ChIP-seq data are available in the Roadmap database and can be obtained through the AnnotationHub package (T. D. Morgan M Carlson M and S., n.d.) or from the Roadmap web portal. The table below describes all the Roadmap files that are available through AnnotationHub.

File	Description
fc.signal.bigwig	Bigwig File containing fold enrichment signal tracks
pval.signal.bigwig	Bigwig File containing -log10(p-value) signal tracks
hotspot.fdr0.01.broad.bed.gz	Broad domains on enrichment for DNase-seq for consolidated epigenomes
hotspot.broad.bed.gz	Broad domains on enrichment for DNase-seq for consolidated epigenomes
broadPeak.gz	Broad ChIP-seq peaks for consolidated epigenomes
gappedPeak.gz	Gapped ChIP-seq peaks for consolidated epigenomes
narrowPeak.gz	Narrow ChIP-seq peaks for consolidated epigenomes
hotspot.fdr0.01.peaks.bed.gz	Narrow DNasePeaks for consolidated epigenomes
hotspot.all.peaks.bed.gz	Narrow DNasePeaks for consolidated epigenomes
.macs2.narrowPeak.gz	Narrow DNasePeaks for consolidated epigenomes
coreMarks_mnemonics.bed.gz	15 state chromatin segmentations
mCRF_FractionalMethylation.bigwig	MeDIP/MRE(mCRF) fractional methylation calls
RRBS_FractionalMethylation.bigwig	RRBS fractional methylation calls
WGBS_FractionalMethylation.bigwig	Whole genome bisulphite fractional methylation calls

After obtaining the ChIP-seq data, we can identify overlaps between the peaks and the regions identified in the starburst plot. The narrowPeak files are the ones selected for this step. For a complete pipeline with ChIP-seq data, Bioconductor provides excellent tutorials, and we encourage our readers to review the following article (Aleksandra Pekowska 2015). The first step, shown in the listing below, is to download the ChIP-seq data. The function query receives as arguments the AnnotationHub database (ah) and a list of keywords used to search the data: EpigenomeRoadMap selects the Roadmap database, consolidated selects only the consolidated epigenomes, brain selects the brain samples, E068 is one of the epigenomes for the brain (keywords can be seen in the summary table) and narrowPeak selects the type of file. The downloaded data are processed data from an integrative analysis of 111 reference human epigenomes (Kundaje et al. 2015).

library(ChIPseeker)
library(pbapply)
library(ggplot2)

#------------------ Working with ChipSeq data ---------------
# Step 1: download histone marks for a brain sample.
#------------------------------------------------------------
# loading annotation hub database
library(AnnotationHub)
ah = AnnotationHub()

# Searching for brain consolidated epigenomes in the roadmap database
bpChipEpi_brain <- query(ah , c("EpigenomeRoadMap", "narrowPeak", "chip", "consolidated","brain","E068"))
# Get chip-seq data
histone.marks <- pblapply(names(bpChipEpi_brain), function(x) {ah[[x]]})
names(histone.marks) <- names(bpChipEpi_brain) 
# OBS: histone.marks is available in TCGAWorkflowData package
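
To make the keyword search more concrete, the sketch below mimics it in base R on a toy metadata table. It assumes that a record is kept only when every keyword matches somewhere in its metadata, which is how we read the AnnotationHub documentation; the record identifiers, titles and the helper match_all_keywords are hypothetical.

```r
# Hypothetical metadata records standing in for AnnotationHub entries
toy_hub <- data.frame(
  id = c("AH0001", "AH0002", "AH0003"),
  title = c(
    "E068-H3K4me1.narrowPeak.gz",
    "E068-H3K4me1.broadPeak.gz",
    "E003-H3K4me1.narrowPeak.gz"
  ),
  tags = "EpigenomeRoadMap consolidated chip",
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the records whose metadata matches ALL keywords
match_all_keywords <- function(meta, keywords) {
  # Collapse every record into one searchable string
  row_text <- apply(meta, 1, paste, collapse = " ")
  # One logical vector per keyword, combined with a logical AND
  keep <- Reduce(`&`, lapply(keywords, function(k) {
    grepl(k, row_text, ignore.case = TRUE)
  }))
  meta[keep, , drop = FALSE]
}

toy_hits <- match_all_keywords(
  toy_hub,
  c("EpigenomeRoadMap", "narrowPeak", "E068")
)
```

Each extra keyword can only narrow the result, which is why adding the epigenome identifier and the file type is enough to isolate a handful of files.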

The ChIPseeker package (Yu, Wang, and He 2015) implements functions that use ChIP-seq data to retrieve the nearest genes around a peak and to annotate the genomic region of the peak, among others. It also provides several visualization functions to summarize the coverage of the peaks, the average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS and overlap of peaks or genes.

After downloading the histone marks, it is useful to verify the average profile of peaks binding to hypomethylated and hypermethylated regions, which helps the user better understand the regions found in the DMR analysis. For this purpose, we use the brain-specific histone marks obtained from the Roadmap database through the AnnotationHub package, and ChIPseeker to visualize how histone modifications are enriched in hypomethylated and hypermethylated regions. The listing below shows example code that prepares the data for the enrichment heatmap and the average profile of peaks binding to those regions.

data(histoneMarks)
# Create a GR object based on the hypo/hypermethylated probes.
probes <- keepStandardChromosomes(
  rowRanges(met)[rownames(dmc)[dmc$status %in% c("Hypermethylated in TCGA-GBM", "Hypomethylated in TCGA-GBM")],]
)
# Defining a window of 3kbp - 3kbp_probe_3kbp
# to make it work with ChIPseeker package version "1.31.3.900"
attributes(probes)$type <- "start_site"
attributes(probes)$downstream <- 3000
attributes(probes)$upstream <- 3000
probes <- GenomicRanges::resize(probes,6001,fix = "center") 

### Profile of ChIP peaks binding to TSS regions
# First of all, to calculate the profile of ChIP peaks binding to TSS regions, we should
# prepare the TSS regions, which are defined as the flanking sequence of the TSS sites.
# Then align the peaks that are mapping to these regions and generate the tagMatrix.
tagMatrixList <- pbapply::pblapply(histone.marks, function(x) {
  getTagMatrix(keepStandardChromosomes(x), windows = probes, weightCol = "score")
})
# change names retrieved with the following command: basename(bpChipEpi_brain$title)
names(tagMatrixList) <- c("H3K4me1","H3K4me3", "H3K9ac", "H3K9me3", "H3K27ac",  "H3K27me3", "H3K36me3")

To plot the enrichment heatmap, use the function tagHeatmap:

tagHeatmap(tagMatrixList)

To plot the average profile of peaks binding to those regions, use plotAvgProf:

p <- plotAvgProf(tagMatrixList, xlim = c(-3000,3000), xlab = "Genomic Region (5'->3', centered on CpG)")
# We are centering on the CpG instead of the TSS, so we'll change the labels manually
p <- p + scale_x_continuous(
  breaks = c(-3000,-1500,0,1500,3000),
  labels = c(-3000,-1500,"CpG",1500,3000)
)

library(ggthemes)
p + theme_few() + scale_colour_few(name = "Histone marks") +  guides(colour = guide_legend(override.aes = list(size=4)))


The hypomethylated and hypermethylated regions are enriched for H3K4me3, H3K9ac, H3K27ac, and H3K4me1, which indicates regions of enhancers, promoters and increased activation of genomic elements. However, these regions are associated neither with transcribed regions nor with Polycomb repression, as the H3K36me3 and H3K27me3 heatmaps do not show an enrichment near position 0, and the average profile also does not show a peak at position 0.

Enhancer Linking by Methylation/Expression Relationship

Recently, many studies have suggested that enhancers play a major role as regulators of cell-specific phenotypes, leading to alterations in transcriptomes related to diseases (Giorgio et al. 2015; Gröschel et al. 2014; Sur et al. 2012; Lijing Yao, Berman, and Farnham 2015). To investigate regulatory enhancers that can be located at long distances upstream or downstream of target genes, Bioconductor offers the Enhancer Linking by Methylation/Expression Relationship (ELMER) package. This package is designed to combine DNA methylation and gene expression data from human tissues to infer multi-level cis-regulatory networks. It uses DNA methylation to identify enhancers and correlates their state with the expression of nearby genes to identify one or more transcriptional targets. Transcription factor (TF) binding site analysis of enhancers is coupled with expression analysis of all TFs to infer upstream regulators. This package can be easily applied to publicly available TCGA cancer datasets and to custom DNA methylation and gene expression datasets (L Yao et al. 2015, @ChedraouiSilva148726).

ELMER analysis has five main steps:

1. Identification of distal probes on HM450K or EPIC.
2. Identification of distal probes with significant differential DNA methylation (i.e. DMCs) in tumor vs. normal samples.
3. Identification of putative target gene(s) for differentially methylated distal probes.
4. Identification of enriched motifs within a set of probes in significant probe-gene pairs.
5. Identification of master regulator Transcription Factors (TF) for each enriched motif.

This section shows how to use ELMER to analyze TCGA data using as example LGG and GBM samples.

Preparing the data for ELMER package

The listing below shows how to use TCGAbiolinks (Colaprico et al. 2016) to search, download and prepare the data for the ELMER package. Due to time and memory constraints, we will use in this example only data from 10 LGG patients and 10 GBM patients that have both DNA methylation and gene expression data. These samples are the same used in the previous steps.

To perform ELMER analyses, we need to populate a MultiAssayExperiment with a DNA methylation matrix or SummarizedExperiment object from the HM450K or EPIC platform; a gene expression matrix or SummarizedExperiment object for the same samples; a matrix mapping DNA methylation samples to gene expression samples; and a matrix with sample metadata (i.e. clinical data, molecular subtype, etc.). If TCGA data are used, the last two matrices will be automatically generated. If using non-TCGA data, the matrix with sample metadata should be provided with at least a column with a patient identifier and another one identifying the group used for the analysis. If the samples in the methylation and expression matrices are not in the same order with the same names, a matrix mapping each patient identifier to its DNA methylation samples and gene expression samples should also be provided to the createMAE function. Based on the reference genome selected, metadata for the DNA methylation probes, such as genomic coordinates, will be added from http://zwdzwd.github.io/InfiniumAnnotation (Zhou, Laird, and Shen 2016); and metadata for gene expression and annotation is added from the Ensembl database (Yates et al. 2015) using biomaRt (Durinck et al. 2009).
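
For non-TCGA data, the two extra tables can be prepared as ordinary data frames. The sketch below is only an illustration under stated assumptions: the column names primary, assay and colname and the assay labels follow our reading of the ELMER documentation, while the patient identifiers, the group column and the matrix column names are made up.

```r
# Hypothetical column names of the two matrices: same patients, different labels
patients <- c("P1", "P2", "P3", "P4")
met_cols <- paste0(patients, "_met")
exp_cols <- paste0(patients, "_rna")

# Sample metadata: one row per patient, with an identifier and a group column
col_data <- data.frame(
  primary = patients,
  group = c("A", "A", "B", "B"),   # hypothetical group labels
  row.names = patients,
  stringsAsFactors = FALSE
)

# Mapping between each patient and the columns of each matrix
sample_map <- data.frame(
  assay = rep(c("DNA methylation", "Gene expression"), each = length(patients)),
  primary = rep(patients, times = 2),
  colname = c(met_cols, exp_cols),
  stringsAsFactors = FALSE
)

# Every mapped sample must belong to a patient described in the metadata
stopifnot(all(sample_map$primary %in% col_data$primary))

# These tables would then be handed to createMAE() together with the two
# matrices, through its colData and sampleMap arguments and with TCGA = FALSE
# (assumption: argument names as we understand them from the ELMER manual).
```

Keeping the patient identifier in a single column of both tables is what allows the methylation and expression columns to be paired even when their names differ.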

#----------- 8.3 Identification of Regulatory Enhancers   -------
library(TCGAbiolinks)
# Samples: primary solid tumor w/ DNA methylation and gene expression
lgg.samples <- matchedMetExp("TCGA-LGG", n = 10)
gbm.samples <- matchedMetExp("TCGA-GBM", n = 10)
samples <- c(lgg.samples,gbm.samples)

#-----------------------------------
# 1 - Methylation
# ----------------------------------
query_met <- GDCquery(
  project = c("TCGA-LGG","TCGA-GBM"),
  data.category = "DNA Methylation",
  platform = "Illumina Human Methylation 450",
  data.type = "Methylation Beta Value",
  barcode = samples
)
GDCdownload(query_met)
met <- GDCprepare(query_met, save = FALSE)
met <- subset(met,subset = as.character(GenomicRanges::seqnames(met)) %in% c("chr9"))

#-----------------------------------
# 2 - Expression
# ----------------------------------
query_exp <- GDCquery(
  project = c("TCGA-LGG","TCGA-GBM"),
  data.category = "Gene expression",
  data.type = "Gene expression quantification",
  platform = "Illumina HiSeq", 
  file.type  = "results", 
  legacy = TRUE, 
  barcode =  samples
)
GDCdownload(query_exp)
exp <- GDCprepare(query_exp, save = FALSE)
save(exp, met, gbm.samples, lgg.samples, file = "elmer.example.rda", compress = "xz")

Probes from HumanMethylationEPIC (EPIC) array and Infinium HumanMethylation450 (HM450) array are removed from the analysis if they have either internal SNPs close to the \(3'\) end of the probe; non-unique mapping to the bisulfite-converted genome; or off-target hybridization due to partial overlap with non-unique elements (Zhou, Laird, and Shen 2017). This probe metadata information is included in ELMER.data package, populated from the source file at http://zwdzwd.github.io/InfiniumAnnotation (Zhou, Laird, and Shen 2017). To limit ELMER to the analysis of distal elements, probes located in regions of \(\pm2 kb\) around transcription start sites (TSSs) were removed. The distal elements are retrieved using the get.feature.probe function.

library(ELMER)
library(MultiAssayExperiment)
library(GenomicRanges)
distal.probes <- get.feature.probe(
  genome = "hg19", 
  met.platform = "450K"
)

# Recover the data created in the last step
data(elmerExample)
rownames(exp) <- values(exp)$ensembl_gene_id
mae <- createMAE(
  exp = assay(exp), 
  met = met,
  save = TRUE,
  linearize.exp = TRUE,
  save.filename = "mae.rda",
  filter.probes = distal.probes,
  met.platform = "450K",
  genome = "hg19",
  TCGA = TRUE
)
mae

## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 2:
##  [1] DNA methylation: RangedSummarizedExperiment with 2970 rows and 20 columns
##  [2] Gene expression: RangedSummarizedExperiment with 20153 rows and 20 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files

ELMER analysis

After preparing the data into an MAE object, we executed the five ELMER steps for the hypo direction, which means we are searching for distal enhancer probes hypomethylated in the GBM group compared to the LGG group. The code is shown below. A description of how these distal enhancer probes are identified is found in the ELMER.data vignette. As the number of samples is small, we loosened some of the cut-off values.

cores <- 1 # you can use more cores if you want
group.col <- "project_id"
group1 <- "TCGA-GBM"
group2 <- "TCGA-LGG"

# Available directions are hypo and hyper, we will use only hypo
# due to speed constraint
direction <- "hypo"
dir.out <- paste0("elmer/",direction)
dir.create(dir.out, showWarnings = FALSE, recursive = TRUE)

Supervised and unsupervised analysis mode

ELMER is designed to identify differences between two sets of samples within a given dataset. Before we start our analysis, it is useful to understand the two analysis modes available in ELMER 2.0 (supervised and unsupervised), which affect how the samples are selected for each comparison.

In the original version (L Yao et al. 2015), the first step - identification of differentially methylated CpG probes (DMCs) - was hard-coded to identify DMCs between non-cancer vs. cancer samples, and the subsequent step was unsupervised, identifying changes within any subset of tumors. In ELMER 2.0, we generalize these strategies so that they are applicable to any paired dataset, including disease vs. healthy tissue for any disease type.

In the Unsupervised mode, as in ELMER 1.0, it is assumed that at least one of our groups is heterogeneous (e.g., Breast Invasive Carcinoma primary solid tumor samples might belong to the basal, Luminal A or Her2 molecular subtypes). For that reason, for each comparison, samples from the two most extreme quintiles based on DNA methylation levels are used, generating both the \(M\) (methylated) and the \(U\) (unmethylated) groups. In the Supervised mode, it is assumed that our groups are homogeneous. For that reason, the \(U\) and \(M\) groups are defined strictly by sample group labels, and all samples in each group are used (Silva et al. 2018).
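
The difference between the two modes can be sketched in a few lines of base R. This is a simplified illustration of the description above, not ELMER's implementation: the function define_groups, its arguments and the beta values are hypothetical, and the supervised branch assumes the hypo direction (group 1 as \(U\), group 2 as \(M\)).

```r
# Hypothetical helper: define the U (unmethylated) and M (methylated) groups
define_groups <- function(beta, labels, group1, group2,
                          mode = c("supervised", "unsupervised"),
                          min_subgroup_frac = 0.2) {
  mode <- match.arg(mode)
  keep <- labels %in% c(group1, group2)
  beta <- beta[keep]
  labels <- labels[keep]
  if (mode == "supervised") {
    # Groups follow the sample labels and every sample is used
    return(list(
      U = names(beta)[labels == group1],
      M = names(beta)[labels == group2]
    ))
  }
  # Unsupervised: take the two extreme fractions of the methylation ranking
  n <- ceiling(length(beta) * min_subgroup_frac)
  ord <- order(beta)
  list(U = names(beta)[head(ord, n)], M = names(beta)[tail(ord, n)])
}

# Made-up beta values of one probe in ten samples
beta <- setNames(
  c(0.05, 0.10, 0.40, 0.35, 0.45, 0.90, 0.85, 0.50, 0.55, 0.60),
  paste0("S", 1:10)
)
labels <- rep(c("TCGA-GBM", "TCGA-LGG"), each = 5)

sup <- define_groups(beta, labels, "TCGA-GBM", "TCGA-LGG", mode = "supervised")
unsup <- define_groups(beta, labels, "TCGA-GBM", "TCGA-LGG", mode = "unsupervised")
```

With ten samples and a fraction of 20%, the unsupervised groups contain two samples each, whereas the supervised groups contain all five samples of each label.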

In the next steps, we will perform the ELMER analysis using the supervised mode, which compares all GBM samples to all LGG samples in every analysis.

Identification of differentially methylated CpGs (DMCs)

For each distal probe, samples of each group (group 1 and group 2) are ranked by their DNA methylation beta values. The samples in the lower quintile (the 20% of samples with the lowest methylation levels) of each group are used to test whether the probe is hypomethylated in group 1 compared to group 2, using an unpaired one-tailed t-test. The 20% is a parameter of the get.diff.meth function called minSubgroupFrac. For the (ungrouped) cancer case, this is set to 20% as in (L Yao et al. 2015), because we typically wanted to be able to detect a specific molecular subtype among the tumor samples; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power (high enough sample numbers to yield t-test p-values that could overcome multiple hypothesis corrections, yet low enough to be able to capture changes in individual molecular subtypes occurring in 20% or more of the cases). This number can be set arbitrarily as an input to the get.diff.meth function and should be tuned based on sample sizes in individual studies.

As we are dealing with a low number of samples and we are comparing two cancer types, the minSubgroupFrac parameter is set to 100% to use all samples in this analysis.
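
The effect of minSubgroupFrac can be illustrated with a simplified base R sketch of the test described above. It is not ELMER's code: the helpers lowest_fraction and test_hypo and the beta values are hypothetical, and a floor of two samples per group is an assumption added so that the t-test can be computed.

```r
# Made-up beta values of one probe in two groups of ten samples
beta_g1 <- c(0.10, 0.15, 0.12, 0.20, 0.18, 0.60, 0.65, 0.70, 0.62, 0.68)
beta_g2 <- c(0.70, 0.75, 0.72, 0.80, 0.78, 0.74, 0.77, 0.79, 0.71, 0.76)

# Hypothetical helper: keep the given fraction of samples with the lowest values
lowest_fraction <- function(x, frac) {
  n <- max(2, ceiling(length(x) * frac))  # assumption: at least 2 samples
  sort(x)[seq_len(n)]
}

# Hypothetical helper: unpaired one-tailed t-test for hypomethylation in group 1
test_hypo <- function(g1, g2, min_subgroup_frac = 0.2) {
  t.test(
    lowest_fraction(g1, min_subgroup_frac),
    lowest_fraction(g2, min_subgroup_frac),
    alternative = "less"   # alternative: mean of group 1 is lower
  )$p.value
}

p_quintile <- test_hypo(beta_g1, beta_g2, min_subgroup_frac = 0.2)
p_all <- test_hypo(beta_g1, beta_g2, min_subgroup_frac = 1.0)
```

In this toy example, half of group 1 is hypomethylated. The quintile-based test only looks at the most hypomethylated samples, while the test with a fraction of 1 compares the full groups, as done in this workflow.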

#--------------------------------------
# STEP 3: Analysis                     |
#--------------------------------------
# Step 3.1: Get diff methylated probes |
#--------------------------------------
message("Get diff methylated probes")
Sig.probes <- get.diff.meth(
  data = mae, 
  group.col = group.col,
  group1 = group1,
  group2 =  group2,
  minSubgroupFrac = 1.0, # Use all samples
  sig.dif = 0.2, # default is 0.3
  diff.dir = direction, # Search for hypomethylated probes in group 1
  cores = cores, 
  dir.out = dir.out, 
  pvalue = 0.1
)

datatable(
  data = Sig.probes[1:10,], 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = TRUE
)

Identification of putative target gene(s) for differentially methylated distal probes

For each differentially methylated distal probe (DMC), the closest 10 upstream genes and the closest 10 downstream genes are tested for inverse correlation between methylation of the probe and expression of the gene (the total number of flanking genes, 20 by default, can be changed using the numFlankingGenes parameter). To select these genes, the probe-gene distance is defined as the distance from the probe to the transcription start site specified by the ENSEMBL gene level annotations (Yates et al. 2015) accessed via the R/Bioconductor package biomaRt (Durinck et al. 2009; Durinck et al. 2005). By choosing a constant number of genes to test for each probe, our goal is to avoid systematic false positives for probes in gene-rich regions. This is especially important given the highly non-uniform gene density of mammalian genomes. Thus, exactly 20 statistical tests were performed for each probe, as follows.
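
Selecting a fixed number of flanking genes per probe can be sketched as follows. This is a minimal illustration on a single chromosome, not the GetNearGenes implementation: the function nearest_flanking, the gene names and the coordinates are hypothetical.

```r
# Hypothetical helper: pick the closest genes on each side of a probe,
# based on the signed distance between the probe and each TSS
nearest_flanking <- function(probe_pos, tss, n_each_side = 2) {
  dist <- tss - probe_pos                           # negative: TSS on the left
  left <- sort(dist[dist < 0], decreasing = TRUE)   # closest on the left first
  right <- sort(dist[dist >= 0])                    # closest on the right first
  c(head(left, n_each_side), head(right, n_each_side))
}

# Made-up TSS coordinates on one chromosome
tss <- c(
  GENE_A = 1000, GENE_B = 4000, GENE_C = 9000,
  GENE_D = 15000, GENE_E = 21000, GENE_F = 30000
)

flanking <- nearest_flanking(probe_pos = 12000, tss = tss)
```

Because the number of genes per side is constant, a probe in a gene-dense region is tested against the same number of genes as a probe in a gene desert.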

For each probe-gene pair, the samples (all samples from both groups) are divided into two groups: the M group, which consists of the upper methylation quintile (the 20% of samples with the highest methylation at the enhancer probe), and the U group, which consists of the lowest methylation quintile (the 20% of samples with the lowest methylation). The 20th-percentile cutoff is a configurable parameter, minSubgroupFrac, in the get.pair function. As with its usage in the get.diff.meth function, the default value of 20% is a balance, allowing for the identification of changes in a molecular subtype making up a minority (i.e. 20%) of cases, while also yielding enough statistical power to make strong predictions. For larger sample sizes or other experimental designs, this could be set even lower.

For each candidate probe-gene pair, the Mann-Whitney U test is used to test the null hypothesis that overall gene expression in group M is greater than or equal to that in group U. This non-parametric test was used in order to minimize the effects of expression outliers, which can occur across a very wide dynamic range. For each probe-gene pair tested, the raw p-value \(P_r\) is corrected for multiple hypothesis testing using a permutation approach as follows. The gene in the pair is held constant, and x random methylation probes are chosen to perform the same one-tailed U test, generating a set of x permutation p-values \(P_p\). We chose the x random probes only from among those that were “distal” (farther than \(2kb\) from an annotated transcription start site), in order to draw these null-model probes from the same set as the probe being tested. An empirical p-value \(P_e\) was calculated using the following formula (which introduces a pseudo-count of 1):

\[ P_e = \frac{\mathrm{num}(P_p \leq P_r) + 1}{x + 1} \]
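
The one-tailed U test and the empirical p-value can be reproduced on toy numbers with base R. The expression values, the permutation p-values and the helper empirical_p are made up for illustration; in a real analysis the permutation p-values come from repeating the same test with random distal probes.

```r
# Made-up expression of one gene in the M and U groups of a probe
expr_M <- c(2.1, 2.5, 1.9)
expr_U <- c(5.2, 6.0, 5.7)

# One-tailed Mann-Whitney U test; the alternative is that expression in M
# is lower than in U (null hypothesis: M is greater than or equal to U)
p_raw <- wilcox.test(expr_M, expr_U, alternative = "less")$p.value

# Hypothetical helper implementing the empirical p-value with pseudo-count 1
empirical_p <- function(p_raw, p_perm) {
  (sum(p_perm <= p_raw) + 1) / (length(p_perm) + 1)
}

# Made-up p-values obtained from x = 4 random distal probes
p_perm <- c(0.50, 0.20, 0.04, 0.90)
p_emp <- empirical_p(p_raw, p_perm)
```

The pseudo-count keeps \(P_e\) strictly positive, so its smallest possible value is \(1/(x+1)\). This is why a small number of permutations cannot yield a very small empirical p-value.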

Notice that in the Supervised mode, no additional filtering is necessary to ensure that the M and U group segregate by sample group labels. The two sample groups are segregated by definition, since these probes were selected for their differential methylation, with the same directionality, between the two groups.

In our example, we will reduce the number of closest genes evaluated to 4 (the 2 closest upstream genes and the 2 closest downstream genes). The mode is set to “supervised”, which will set group 1 (GBM samples) as the \(U\) group and group 2 (LGG samples) as the \(M\) group. Also, to make the example faster, the number of permutations is reduced to 5 (permu.size = 5), the raw p-value cut-off is loosened to 0.1 and the empirical p-value cut-off to 0.5, but in a real analysis it is recommended to use the default values.

#-------------------------------------------------------------
# Step 3.2: Identify significant probe-gene pairs            |
#-------------------------------------------------------------
# Collect the 4 nearest genes (2 on each side) for Sig.probes
message("Get nearby genes")
nearGenes <- GetNearGenes(
  data = mae,
  numFlankingGenes = 4, # default is 20 genes
  probes = Sig.probes$probe
)

length(Sig.probes$probe)

## [1] 293

dim(nearGenes)

## [1] 1172    5

head(nearGenes)

ID	GeneID	Symbol	Distance	Side
cg00049440	ENSG00000198887	SMC5	-56838	L2
cg00049440	ENSG00000119138	KLF9	0	L1
cg00049440	ENSG00000083067	TRPM3	117335	R1
cg00049440	ENSG00000135048	TMEM2	1271638	R2
cg00352576	ENSG00000188523	C9orf171	-5492	L2
cg00352576	ENSG00000125492	BARHL1	3374	L1

message("Get anti correlated probes-genes")
pair <- get.pair(
  data = mae,
  group.col = group.col,
  group1 = group1,
  group2 =  group2,
  nearGenes = nearGenes,
  mode = "supervised",
  minSubgroupFrac = 1, # % of samples to use in to create groups U/M
  raw.pvalue = 0.1,   # default is 0.001
  Pe = 0.5, # Please set to 0.001 to get significant results
  filter.probes = TRUE, # See preAssociationProbeFiltering function
  filter.percentage = 0.05,
  save = FALSE, # Create CSV file
  filter.portion = 0.3,
  dir.out = dir.out,
  diff.dir = direction,
  cores = cores,
  label = direction
)

datatable(
  pair[1:10,], 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = TRUE
)

Motif enrichment analysis

To identify enriched motifs and potential upstream regulatory TFs, HOCOMOCO v11 (Kulakovskiy et al. 2016) TF binding models were used as input for HOMER (Heinz et al. 2010) to find motif occurrences in a \(\pm 250bp\) region around each probe from the EPIC and HM450 arrays. Transcription factor (TF) binding models are available at http://hocomoco.autosome.ru/downloads (using the HOMER-specific format with threshold score levels corresponding to p-value \(< 10^{-4}\)).

A motif enrichment analysis is performed using Fisher’s exact test with Benjamini-Hochberg correction for multiple hypothesis testing, using all distal probes as the background.

A probe set was considered significantly enriched for a particular motif if the lower bound of the 95% confidence interval of the Odds Ratio was greater than \(1.1\) (specified by option lower.OR, \(1.1\) is default), and the motif occurred at least 10 times (specified by option min.incidence, \(10\) is default) in the probe set.
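
The enrichment criteria can be illustrated with base R's fisher.test and p.adjust. This is a sketch under stated assumptions rather than the get.enriched.motif implementation: the counts for the selected probes echo the first row of the result table shown later (29 of 68 probes), whereas the background counts and the additional p-values are hypothetical.

```r
# 2 x 2 table of probes with and without the motif
motif_table <- matrix(
  c(29, 39,      # selected probes: motif present, motif absent
    500, 2400),  # background distal probes (hypothetical counts)
  nrow = 2,
  dimnames = list(
    motif = c("present", "absent"),
    probes = c("selected", "background")
  )
)

ft <- fisher.test(motif_table)
lower_or <- ft$conf.int[1]   # lower bound of the 95% confidence interval

# Enrichment criteria described in the text (lower.OR = 1.1, min.incidence = 10)
is_enriched <- lower_or > 1.1 && motif_table["present", "selected"] >= 10

# Benjamini-Hochberg correction across motifs; the last three p-values are
# hypothetical results for other motifs
p_values <- c(ft$p.value, 0.04, 0.20, 0.75)
fdr <- p.adjust(p_values, method = "BH")
```

Using the lower bound of the confidence interval instead of the point estimate penalizes motifs that occur in only a few probes, where the Odds Ratio is poorly estimated.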

#-------------------------------------------------------------
# Step 3.3: Motif enrichment analysis on the selected probes |
#-------------------------------------------------------------
enriched.motif <- get.enriched.motif(
  data = mae,
  probes = pair$Probe, 
  dir.out = dir.out, 
  label = direction,
  pvalue = 1, # default is FDR < 0.05
  min.incidence = 10,
  lower.OR = 1.1
)
# One of the outputs of the previous function is a file with the motif, OR and number of probes
# It will be used for plotting purposes
motif.enrichment <- read.csv(paste0(dir.out,"/getMotif.",direction, ".motif.enrichment.csv"))

head(enriched.motif[names(enriched.motif)[1]]) ## probes in the given set that have the first motif.

## $ZN524_HUMAN.H11MO.0.D
##  [1] "cg13787850" "cg14159539" "cg13857678" "cg13493766" "cg02602455"
##  [6] "cg14560364" "cg13562069" "cg26787801" "cg14262955" "cg14121282"
## [11] "cg13663116" "cg21210721" "cg12819393" "cg14186948" "cg14223856"
## [16] "cg00505502" "cg13753488" "cg10962166" "cg13595495" "cg14567085"
## [21] "cg26756625" "cg04681525" "cg14061378" "cg19264651" "cg00938819"
## [26] "cg14103872" "cg03916189" "cg13512951" "cg06752260"

motif.enrichment %>% head %>% gt::gt()

motif	NumOfProbes	PercentageOfProbes	lowerOR	upperOR	OR	p.value	FDR	TF.family	TF.subfamily	TF.family.member	TF.subfamily.member
ZN524_HUMAN.H11MO.0.D	29	0.4264706	1.505958	4.261003	2.546084	0.0003546740	0.1699217	More than 3 adjacent zinc finger factors{2.3.3}	ZNF524-like factors{2.3.3.15}	BCL6;BCL6B;CTCF;CTCFL;FEZF1;FEZF2;GFI1;GFI1B;GLI1;GLI2;GLI3;GLI4;GLIS1;GLIS2;GLIS3;HKR1;MTF1;MYNN;MZF1;OSR2;OVOL1;OVOL2;PLAG1;PLAGL1;PLAGL2;PRDM1;PRDM14;PRDM6;SCRT1;SCRT2;SNAI1;SNAI2;SNAI3;WT1;YY1;YY2;ZBTB12;ZBTB14;ZBTB18;ZBTB20;ZBTB26;ZBTB42;ZBTB45;ZBTB47;ZBTB48;ZBTB49;ZBTB6;ZBTB7A;ZBTB7B;ZBTB7C;ZFP14;ZFP2;ZFP28;ZFP30;ZFP37;ZFP42;ZFP64;ZFP69;ZFP69B;ZFP82;ZFP91;ZFX;ZIC1;ZIC2;ZIC3;ZIC4;ZIC5;ZIK1;ZIM3;ZKSCAN1;ZKSCAN2;ZKSCAN3;ZKSCAN4;ZNF121;ZNF124;ZNF133;ZNF136;ZNF138;ZNF14;ZNF140;ZNF143;ZNF146;ZNF148;ZNF155;ZNF157;ZNF160;ZNF169;ZNF175;ZNF177;ZNF18;ZNF180;ZNF181;ZNF2;ZNF20;ZNF212;ZNF213;ZNF214;ZNF221;ZNF222;ZNF223;ZNF224;ZNF225;ZNF226;ZNF227;ZNF229;ZNF230;ZNF232;ZNF233;ZNF234;ZNF235;ZNF24;ZNF25;ZNF250;ZNF257;ZNF26;ZNF260;ZNF263;ZNF264;ZNF268;ZNF274;ZNF276;ZNF28;ZNF280A;ZNF280B;ZNF280C;ZNF280D;ZNF281;ZNF282;ZNF283;ZNF284;ZNF285;ZNF286A;ZNF286B;ZNF3;ZNF30;ZNF300;ZNF302;ZNF317;ZNF32;ZNF320;ZNF322;ZNF324;ZNF324B;ZNF329;ZNF331;ZNF333;ZNF33A;ZNF33B;ZNF343;ZNF345;ZNF347;ZNF350;ZNF354A;ZNF354B;ZNF362;ZNF366;ZNF383;ZNF384;ZNF394;ZNF397;ZNF398;ZNF404;ZNF41;ZNF410;ZNF419;ZNF420;ZNF431;ZNF432;ZNF436;ZNF439;ZNF44;ZNF440;ZNF442;ZNF443;ZNF446;ZNF449;ZNF45;ZNF460;ZNF468;ZNF479;ZNF484;ZNF490;ZNF500;ZNF502;ZNF524;ZNF525;ZNF528;ZNF543;ZNF544;ZNF546;ZNF547;ZNF548;ZNF549;ZNF554;ZNF555;ZNF557;ZNF558;ZNF559;ZNF561;ZNF562;ZNF563;ZNF564;ZNF566;ZNF567;ZNF568;ZNF57;ZNF570;ZNF571;ZNF572;ZNF577;ZNF581;ZNF582;ZNF583;ZNF585A;ZNF586;ZNF589;ZNF595;ZNF599;ZNF600;ZNF605;ZNF607;ZNF611;ZNF613;ZNF614;ZNF615;ZNF616;ZNF619;ZNF620;ZNF621;ZNF625;ZNF627;ZNF649;ZNF652;ZNF653;ZNF665;ZNF667;ZNF669;ZNF670;ZNF672;ZNF679;ZNF680;ZNF683;ZNF689;ZNF692;ZNF701;ZNF705D;ZNF705E;ZNF705G;ZNF708;ZNF709;ZNF71;ZNF710;ZNF713;ZNF721;ZNF727;ZNF729;ZNF736;ZNF75A;ZNF75D;ZNF76;ZNF763;ZNF764;ZNF765;ZNF768;ZNF77;ZNF771;ZNF773;ZNF774;ZNF776;ZNF777;ZNF780A;ZNF780B;ZNF782;ZNF785;
ZNF799;ZNF805;ZNF808;ZNF81;ZNF813;ZNF816;ZNF823;ZNF829;ZNF836;ZNF841;ZNF844;ZNF845;ZNF846;ZNF85;ZNF853;ZNF860;ZNF878;ZNF891;ZNF99;ZSCAN16;ZSCAN2;ZSCAN22;ZSCAN23;ZSCAN29;ZSCAN31;ZSCAN32;ZSCAN4;ZSCAN5A;ZSCAN5B;ZSCAN5C;ZSCAN9;ZXDA;ZXDB;ZXDC	ZNF524
EHF_HUMAN.H11MO.0.B	21	0.3088235	1.445554	4.440354	2.572671	0.0008670141	0.1699217	Ets-related factors{3.5.2}	EHF-like factors{3.5.2.4}	EHF;ELF1;ELF2;ELF3;ELF4;ELF5;ELK1;ELK3;ELK4;ERF;ERG;ETS1;ETS2;ETV1;ETV2;ETV3;ETV3L;ETV4;ETV5;ETV6;ETV7;FEV;FLI1;GABPA;SPDEF;SPI1;SPIB;SPIC	EHF;ELF3;ELF5
ELF2_HUMAN.H11MO.0.C	31	0.4558824	1.429042	4.007392	2.400113	0.0006700769	0.1699217	Ets-related factors{3.5.2}	Elf-1-like factors{3.5.2.3}	EHF;ELF1;ELF2;ELF3;ELF4;ELF5;ELK1;ELK3;ELK4;ERF;ERG;ETS1;ETS2;ETV1;ETV2;ETV3;ETV3L;ETV4;ETV5;ETV6;ETV7;FEV;FLI1;GABPA;SPDEF;SPI1;SPIB;SPIC	ELF1;ELF2
ELF5_HUMAN.H11MO.0.A	27	0.3970588	1.412208	4.045152	2.407506	0.0008815652	0.1699217	Ets-related factors{3.5.2}	EHF-like factors{3.5.2.4}	EHF;ELF1;ELF2;ELF3;ELF4;ELF5;ELK1;ELK3;ELK4;ERF;ERG;ETS1;ETS2;ETV1;ETV2;ETV3;ETV3L;ETV4;ETV5;ETV6;ETV7;FEV;FLI1;GABPA;SPDEF;SPI1;SPIB;SPIC	EHF;ELF3;ELF5
PLAL1_HUMAN.H11MO.0.D	35	0.5147059	1.339891	3.735913	2.235187	0.0014715503	0.2198588	More than 3 adjacent zinc finger factors{2.3.3}	PLAG factors{2.3.3.25}	BCL6;BCL6B;CTCF;CTCFL;FEZF1;FEZF2;GFI1;GFI1B;GLI1;GLI2;GLI3;GLI4;GLIS1;GLIS2;GLIS3;HKR1;MTF1;MYNN;MZF1;OSR2;OVOL1;OVOL2;PLAG1;PLAGL1;PLAGL2;PRDM1;PRDM14;PRDM6;SCRT1;SCRT2;SNAI1;SNAI2;SNAI3;WT1;YY1;YY2;ZBTB12;ZBTB14;ZBTB18;ZBTB20;ZBTB26;ZBTB42;ZBTB45;ZBTB47;ZBTB48;ZBTB49;ZBTB6;ZBTB7A;ZBTB7B;ZBTB7C;ZFP14;ZFP2;ZFP28;ZFP30;ZFP37;ZFP42;ZFP64;ZFP69;ZFP69B;ZFP82;ZFP91;ZFX;ZIC1;ZIC2;ZIC3;ZIC4;ZIC5;ZIK1;ZIM3;ZKSCAN1;ZKSCAN2;ZKSCAN3;ZKSCAN4;ZNF121;ZNF124;ZNF133;ZNF136;ZNF138;ZNF14;ZNF140;ZNF143;ZNF146;ZNF148;ZNF155;ZNF157;ZNF160;ZNF169;ZNF175;ZNF177;ZNF18;ZNF180;ZNF181;ZNF2;ZNF20;ZNF212;ZNF213;ZNF214;ZNF221;ZNF222;ZNF223;ZNF224;ZNF225;ZNF226;ZNF227;ZNF229;ZNF230;ZNF232;ZNF233;ZNF234;ZNF235;ZNF24;ZNF25;ZNF250;ZNF257;ZNF26;ZNF260;ZNF263;ZNF264;ZNF268;ZNF274;ZNF276;ZNF28;ZNF280A;ZNF280B;ZNF280C;ZNF280D;ZNF281;ZNF282;ZNF283;ZNF284;ZNF285;ZNF286A;ZNF286B;ZNF3;ZNF30;ZNF300;ZNF302;ZNF317;ZNF32;ZNF320;ZNF322;ZNF324;ZNF324B;ZNF329;ZNF331;ZNF333;ZNF33A;ZNF33B;ZNF343;ZNF345;ZNF347;ZNF350;ZNF354A;ZNF354B;ZNF362;ZNF366;ZNF383;ZNF384;ZNF394;ZNF397;ZNF398;ZNF404;ZNF41;ZNF410;ZNF419;ZNF420;ZNF431;ZNF432;ZNF436;ZNF439;ZNF44;ZNF440;ZNF442;ZNF443;ZNF446;ZNF449;ZNF45;ZNF460;ZNF468;ZNF479;ZNF484;ZNF490;ZNF500;ZNF502;ZNF524;ZNF525;ZNF528;ZNF543;ZNF544;ZNF546;ZNF547;ZNF548;ZNF549;ZNF554;ZNF555;ZNF557;ZNF558;ZNF559;ZNF561;ZNF562;ZNF563;ZNF564;ZNF566;ZNF567;ZNF568;ZNF57;ZNF570;ZNF571;ZNF572;ZNF577;ZNF581;ZNF582;ZNF583;ZNF585A;ZNF586;ZNF589;ZNF595;ZNF599;ZNF600;ZNF605;ZNF607;ZNF611;ZNF613;ZNF614;ZNF615;ZNF616;ZNF619;ZNF620;ZNF621;ZNF625;ZNF627;ZNF649;ZNF652;ZNF653;ZNF665;ZNF667;ZNF669;ZNF670;ZNF672;ZNF679;ZNF680;ZNF683;ZNF689;ZNF692;ZNF701;ZNF705D;ZNF705E;ZNF705G;ZNF708;ZNF709;ZNF71;ZNF710;ZNF713;ZNF721;ZNF727;ZNF729;ZNF736;ZNF75A;ZNF75D;ZNF76;ZNF763;ZNF764;ZNF765;ZNF768;ZNF77;ZNF771;ZNF773;ZNF774;ZNF776;ZNF777;ZNF780A;ZNF780B;ZNF782;ZNF785;ZNF799;
ZNF805;ZNF808;ZNF81;ZNF813;ZNF816;ZNF823;ZNF829;ZNF836;ZNF841;ZNF844;ZNF845;ZNF846;ZNF85;ZNF853;ZNF860;ZNF878;ZNF891;ZNF99;ZSCAN16;ZSCAN2;ZSCAN22;ZSCAN23;ZSCAN29;ZSCAN31;ZSCAN32;ZSCAN4;ZSCAN5A;ZSCAN5B;ZSCAN5C;ZSCAN9;ZXDA;ZXDB;ZXDC	PLAG1;PLAGL1
GSC_HUMAN.H11MO.0.D	7	0.1029412	1.330059	8.042670	3.542114	0.0063899599	0.3284439	Paired-related HD factors{3.1.3}	GSC{3.1.3.9}	ALX1;ALX3;ALX4;ARGFX;ARX;CRX;DMBX1;DPRX;DRGX;DUX4;DUXA;ESX1;GSC;GSC2;HESX1;ISX;LEUTX;MIXL1;NOBOX;OTP;OTX1;OTX2;PHOX2A;PHOX2B;PITX1;PITX2;PITX3;PROP1;PRRX1;PRRX2;RAX;RAX2;RHOXF1;RHOXF2;SEBOX;SHOX;SHOX2;TPRX1;UNCX;VSX1;VSX2	GSC2;GSC

Identification of master regulator TFs

When a group of enhancers is coordinately altered in a specific sample subset, this is often the result of an altered upstream master regulator transcription factor in the gene regulatory network. ELMER tries to identify such transcription factors corresponding to each of the TF binding motifs enriched from the previous analysis step. For each enriched motif, ELMER takes the average DNA methylation of all distal probes (in significant probe-gene pairs) that contain that motif occurrence (within a \(\pm 250bp\) region) and compares this average DNA methylation to the expression of each gene annotated as a human TF.

In the Unsupervised mode, a statistical test is performed for each motif-TF pair, as follows. All samples are divided into two groups: the \(M\) group, which consists of the 20% of samples with the highest average methylation at all motif-adjacent probes, and the \(U\) group, which consists of the 20% of samples with the lowest methylation. This step is performed by the get.TFs function, which takes minSubgroupFrac as an input parameter, again with a default of 20%. For each candidate motif-TF pair, the Mann-Whitney U test is used to test the null hypothesis that overall gene expression in group \(M\) is greater than or equal to that in group \(U\). This non-parametric test is used in order to minimize the effects of expression outliers, which can occur across a very wide dynamic range. For each motif tested, this results in a raw p-value (\(P_r\)) for each of the human TFs.
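The unsupervised test can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not ELMER's implementation: the matrix `meth`, the vector `tf_expr`, and the variable `min_subgroup_frac` (standing in for minSubgroupFrac) are hypothetical names introduced only for this example.

```r
set.seed(42)
n_samples <- 50
n_probes  <- 8

# Hypothetical data: beta values for probes carrying one motif (rows = probes)
# and expression of one candidate TF that decreases as methylation increases.
sample_level <- runif(n_samples, min = 0.1, max = 0.9)
meth <- t(sapply(seq_len(n_probes), function(i) {
  pmin(pmax(sample_level + rnorm(n_samples, sd = 0.05), 0), 1)
}))
tf_expr <- 10 - 6 * sample_level + rnorm(n_samples, sd = 0.5)

# Average methylation per sample across all probes with the motif
avg_meth <- colMeans(meth)

# M group: 20% most methylated samples; U group: 20% least methylated samples
min_subgroup_frac <- 0.2
n_group <- round(n_samples * min_subgroup_frac)
ord   <- order(avg_meth)
u_idx <- head(ord, n_group)
m_idx <- tail(ord, n_group)

# One-sided Mann-Whitney U test.
# Null: expression in M >= expression in U; alternative: M is lower than U.
res   <- wilcox.test(tf_expr[m_idx], tf_expr[u_idx], alternative = "less")
raw_p <- res$p.value
```

Here `raw_p` plays the role of \(P_r\) for a single motif-TF pair; in practice the test would be repeated for every human TF.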

All TFs are ranked by their \(-\log_{10}(P_r)\) values, and those falling within the top 5% of this ranking are considered candidate upstream regulators. The best upstream TFs that are known to recognize the specific binding motif are automatically extracted as putative regulatory TFs, and rank-ordered plots are created to visually inspect these relationships, as shown in the example below. Because the same motif can be recognized by many transcription factors of the same binding domain family, we define these relationships at both the family and subfamily classification level using the classifications from the TFClass database (Wingender, Schoeps, and Dönitz 2013).
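The ranking step can be illustrated with a short base R sketch. The p-values, TF names, and family membership below are made up for the example and do not come from ELMER or TFClass.

```r
set.seed(1)

# Hypothetical raw p-values for 200 TFs tested against one motif
raw_p <- setNames(runif(200), paste0("TF", seq_len(200)))
raw_p["TF7"] <- 1e-8   # one strongly associated TF

# Association score and ranking (highest score first)
score  <- -log10(raw_p)
ranked <- sort(score, decreasing = TRUE)

# Keep the top 5% of the ranking as candidate upstream regulators
n_top   <- round(0.05 * length(ranked))
top_tfs <- names(ranked)[seq_len(n_top)]

# Hypothetical family of the motif's TF; intersecting it with the top 5%
# mimics how family-level candidates are reported
motif_family     <- c("TF7", "TF50", "TF120")
potential_family <- top_tfs[top_tfs %in% motif_family]
```

Because `top_tfs` keeps the ranking order, the first element of `potential_family` is the highest-ranked TF of that family.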

The Supervised mode uses the same approach as described for the identification of putative target gene(s) step. The \(U\) and \(M\) groups are each one of the labeled sample groups, and the minSubgroupFrac parameter is set to 100% so that all samples from both groups are used in the statistical test.
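In the supervised setting the groups come from the sample labels rather than from methylation quantiles. A minimal base R sketch, assuming simulated expression values and the hypo direction (probes less methylated in GBM, so GBM acts as \(U\) and LGG as \(M\)); all variable names are hypothetical:

```r
set.seed(7)

# Hypothetical labels and TF expression for 30 samples
group   <- rep(c("GBM", "LGG"), each = 15)
tf_expr <- c(rnorm(15, mean = 8), rnorm(15, mean = 5))

# Every sample of each labeled group enters the test (fraction = 100%)
u_expr <- tf_expr[group == "GBM"]   # less methylated group
m_expr <- tf_expr[group == "LGG"]   # more methylated group

# Same one-sided Mann-Whitney U test as in the unsupervised mode
raw_p <- wilcox.test(m_expr, u_expr, alternative = "less")$p.value
```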

#-------------------------------------------------------------
# Step 3.4: Identifying regulatory TFs                        |
#-------------------------------------------------------------
TF <- get.TFs(
  data = mae, 
  group.col = group.col,
  group1 = group1,
  group2 =  group2,
  mode = "supervised",
  enriched.motif = enriched.motif,
  dir.out = dir.out, 
  cores = cores,
  save.plots = FALSE,
  diff.dir = direction,
  label = direction
)

# One of the outputs of the previous function is a file with the ranking of TFs
# for each motif. It will be used for plotting purposes
TF.meth.cor <- get(load(paste0(dir.out,"/getTF.",direction,".TFs.with.motif.pvalue.rda")))

The output of this step is a data frame with the following columns:

motif: the enriched motif name.
top_5percent_TFs: the TFs ranked in the top 5%.
potential.TFs.family: the TFs from “top_5percent_TFs” that belong to the same family as the TF of the motif.
top.potential.TFs.family: the highest-ranked TF belonging to the same family as the TF of the motif (the first TF in the potential.TFs.family column).
potential.TFs.subfamily and top.potential.TFs.subfamily: the same as potential.TFs.family and top.potential.TFs.family, but using the subfamily classification instead (the classification can be checked at http://hocomoco11.autosome.ru/human/mono?full=true).
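The TF columns store several names in one semicolon-separated string, so they usually need to be split before further use. The sketch below works on a made-up one-row data frame shaped like the columns listed above; the motif and TF names are placeholders, and `split_tfs` is a hypothetical helper, not an ELMER function.

```r
# Hypothetical one-row result with the columns described above
tf_result <- data.frame(
  motif = "MOTIF_A",
  top_5percent_TFs = "TF1;TF2;TF3;TF4;TF5",
  potential.TFs.family = "TF2;TF4",
  top.potential.TFs.family = "TF2",
  stringsAsFactors = FALSE
)

# Split a semicolon-separated field into a character vector
split_tfs <- function(x) unlist(strsplit(x, ";", fixed = TRUE))

top_tfs    <- split_tfs(tf_result$top_5percent_TFs[1])
family_tfs <- split_tfs(tf_result$potential.TFs.family[1])

# The top family TF is expected to be the first entry of the family list
top_matches <- identical(family_tfs[1], tf_result$top.potential.TFs.family[1])
```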

datatable(
  TF, 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = FALSE
)

datatable(
  TF.meth.cor[1:10,1:6], 
  options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
  rownames = TRUE
)

ELMER visualization functions

The pairs identified by ELMER can be visualized through a heatmap using the function heatmapPairs. The first heatmap shows the DNA methylation levels, while the second heatmap shows the gene expression levels of the target genes. The last heatmap shows the distance between the probe and the gene TSS.

heatmapPairs(
  data = mae, 
  group.col = group.col,
  group1 = group1, 
  group2 = group2, 
  annotation.col = c("gender"),
  pairs = pair,
  filename =  NULL
)

When ELMER identifies the enriched motifs for the distal enhancer probes that are significantly differentially methylated and linked to a putative target gene, it plots the odds ratio (x-axis) for each motif found.

The list of enriched motifs for the hypo direction (probes hypomethylated in the GBM group compared to the LGG group) with a lower bound of the odds ratio (lowerOR) >= 1.1 is shown in the figure below.

motif.enrichment.plot(
  motif.enrichment = motif.enrichment, 
  save = FALSE,
  significant = list(lowerOR = 1.1)
) # Keep only motifs with lowerOR >= 1.1 in the plot

After finding the enriched motifs, ELMER identifies regulatory transcription factors (TFs) whose expression is associated with DNA methylation at motifs. ELMER automatically creates a TF ranking plot for each enriched motif. The plot ranks all TFs by their association score \(-\log_{10}(p\text{-value})\) between TF expression and DNA methylation at the motif. The top 3 TFs and the TFs classified in the same family and subfamily according to the TFClass database are highlighted. The dashed line marks the top 5% of all TFs.

TF.rank.plot(motif.pvalue = TF.meth.cor, motif = TF$motif[1], save = FALSE)

Also, for each motif, we can take a look at the most relevant TFs. For example, the figure below shows the average DNA methylation level of sites with the first motif plotted against the expression of the first four TFs from its top 5% list. We can see that the GBM samples have a lower average methylation level at sites with the motif and a higher average expression of the TFs.

png("TF.png",width = 800, height = 400)
scatter.plot(
  data = mae, 
  category = group.col, 
  save = FALSE, 
  lm_line = TRUE,
  byTF = list(
    TF = unlist(stringr::str_split(TF[1,"top_5percent_TFs"],";"))[1:4], 
    probe = enriched.motif[[TF$motif[1]]]
  )
)
dev.off()

Conclusion

This workflow outlines how one can use specific Bioconductor packages for the analysis of cancer genomics and epigenomics data derived from the TCGA. In addition, we highlight the importance of using ENCODE and Roadmap data to inform on the biology of the non-coding elements defined by functional roles in gene regulation. We introduced the TCGAbiolinks and RTCGAToolbox Bioconductor packages to illustrate how one can acquire TCGA-specific data, followed by key steps for genomic analysis using the GAIA and maftools packages, for transcriptomic analysis using the TCGAbiolinks and pathview packages, and for DNA methylation analysis using the TCGAbiolinks package. The inference of gene regulatory networks was also introduced using the minet package. Finally, we introduced the Bioconductor packages AnnotationHub, ChIPseeker, ComplexHeatmap, and ELMER to illustrate how one can acquire ENCODE/Roadmap data and integrate these data with the results obtained from analyzing TCGA data to identify and characterize candidate regulatory enhancers associated with cancer.

Session Information

pander::pander(sessionInfo(), compact = FALSE)

R version 4.4.0 RC (2024-04-16 r86468)

Platform: x86_64-pc-linux-gnu

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_GB, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages:

grid
stats4
stats
graphics
grDevices
utils
datasets
methods
base

other attached packages:

sesameData(v.1.21.10)
ExperimentHub(v.2.11.3)
AnnotationHub(v.3.11.5)
BiocFileCache(v.2.11.2)
dbplyr(v.2.5.0)
MultiAssayExperiment(v.1.29.3)
ELMER(v.2.27.0)
ELMER.data(v.2.27.0)
pbapply(v.1.7-2)
ChIPseeker(v.1.39.0)
dplyr(v.1.1.4)
motifStack(v.1.47.1)
BSgenome.Hsapiens.UCSC.hg19(v.1.4.3)
rGADEM(v.2.51.0)
seqLogo(v.1.69.0)
BSgenome(v.1.71.4)
rtracklayer(v.1.63.3)
BiocIO(v.1.13.1)
Biostrings(v.2.71.6)
XVector(v.0.43.1)
ComplexHeatmap(v.2.19.0)
ggpubr(v.0.6.0)
ggplot2(v.3.5.1)
c3net(v.1.1.1.1)
igraph(v.2.0.3)
minet(v.3.61.0)
pathview(v.1.43.1)
clusterProfiler(v.4.11.1)
maftools(v.2.19.0)
SummarizedExperiment(v.1.33.3)
Biobase(v.2.63.1)
GenomicRanges(v.1.55.4)
GenomeInfoDb(v.1.39.14)
IRanges(v.2.37.1)
S4Vectors(v.0.41.7)
BiocGenerics(v.0.49.1)
MatrixGenerics(v.1.15.1)
matrixStats(v.1.3.0)
TCGAbiolinks(v.2.31.4)
DT(v.0.33)
TCGAWorkflowData(v.1.27.0)
TCGAWorkflow(v.1.29.0)
BiocStyle(v.2.31.0)

loaded via a namespace (and not attached):

R.methodsS3(v.1.8.2)
dichromat(v.2.0-0.1)
vroom(v.1.6.5)
progress(v.1.2.3)
nnet(v.7.3-19)
poweRlaw(v.0.80.0)
vctrs(v.0.6.5)
digest(v.0.6.35)
png(v.0.1-8)
shape(v.1.4.6.1)
BiocBaseUtils(v.1.5.2)
ggrepel(v.0.9.5)
deldir(v.2.0-4)
magick(v.2.8.3)
MASS(v.7.3-60.2)
reshape(v.0.8.9)
reshape2(v.1.4.4)
foreach(v.1.5.2)
qvalue(v.2.35.0)
withr(v.3.0.0)
xfun(v.0.43)
ggfun(v.0.1.4)
survival(v.3.6-4)
memoise(v.2.0.1)
gson(v.0.1.0)
systemfonts(v.1.0.6)
ragg(v.1.3.0)
KEGGgraph(v.1.63.0)
tidytree(v.0.4.6)
GlobalOptions(v.0.1.2)
gtools(v.3.9.5)
DNAcopy(v.1.77.0)
R.oo(v.1.26.0)
Formula(v.1.2-5)
prettyunits(v.1.2.0)
RTCGAToolbox(v.2.33.3)
KEGGREST(v.1.43.1)
promises(v.1.3.0)
httr(v.1.4.7)
EDASeq(v.2.37.0)
downloader(v.0.4)
rstatix(v.0.7.2)
restfulr(v.0.0.15)
ps(v.1.7.6)
rstudioapi(v.0.16.0)
archive(v.1.1.8)
UCSC.utils(v.0.99.7)
generics(v.0.1.3)
DOSE(v.3.29.2)
base64enc(v.0.1-3)
processx(v.3.8.4)
curl(v.5.2.1)
zlibbioc(v.1.49.3)
ggraph(v.2.2.1)
polyclip(v.1.10-6)
GenomeInfoDbData(v.1.2.12)
SparseArray(v.1.3.7)
xtable(v.1.8-4)
stringr(v.1.5.1)
ade4(v.1.7-22)
pracma(v.2.4.4)
doParallel(v.1.0.17)
evaluate(v.0.23)
S4Arrays(v.1.3.7)
hms(v.1.1.3)
bookdown(v.0.39)
colorspace(v.2.1-0)
filelock(v.1.0.3)
Rgraphviz(v.2.47.0)
magrittr(v.2.0.3)
readr(v.2.1.5)
later(v.1.3.2)
viridis(v.0.6.5)
ggtree(v.3.11.2)
lattice(v.0.22-6)
XML(v.3.99-0.16.1)
shadowtext(v.0.1.3)
cowplot(v.1.1.3)
Hmisc(v.5.1-2)
pillar(v.1.9.0)
nlme(v.3.1-164)
iterators(v.1.0.14)
pwalign(v.0.99.2)
caTools(v.1.18.2)
compiler(v.4.4.0)
stringi(v.1.8.3)
GenomicAlignments(v.1.39.5)
plyr(v.1.8.9)
crayon(v.1.5.2)
abind(v.1.4-5)
gridGraphics(v.0.5-1)
locfit(v.1.5-9.9)
org.Hs.eg.db(v.3.19.1)
graphlayouts(v.1.1.1)
bit(v.4.0.5)
chromote(v.0.2.0)
fastmatch(v.1.1-4)
textshaping(v.0.3.7)
codetools(v.0.2-20)
crosstalk(v.1.2.1)
bslib(v.0.7.0)
TxDb.Hsapiens.UCSC.hg19.knownGene(v.3.2.2)
biovizBase(v.1.51.0)
GetoptLong(v.1.0.5)
plotly(v.4.10.4)
mime(v.0.12)
RaggedExperiment(v.1.27.2)
splines(v.4.4.0)
circlize(v.0.4.16)
Rcpp(v.1.0.12)
TCGAbiolinksGUI.data(v.1.23.0)
HDO.db(v.0.99.1)
interp(v.1.1-6)
knitr(v.1.46)
blob(v.1.2.4)
utf8(v.1.2.4)
BiocVersion(v.3.19.1)
clue(v.0.3-65)
AnnotationFilter(v.1.27.0)
RJSONIO(v.1.3-1.9)
fs(v.1.6.4)
checkmate(v.2.3.1)
Gviz(v.1.47.1)
ggsignif(v.0.6.4)
ggplotify(v.0.1.2)
tibble(v.3.2.1)
Matrix(v.1.7-0)
statmod(v.1.5.0)
tzdb(v.0.4.0)
tweenr(v.2.0.3)
pkgconfig(v.2.0.3)
tools(v.4.4.0)
cachem(v.1.0.8)
RSQLite(v.2.3.6)
viridisLite(v.0.4.2)
rvest(v.1.0.4)
DBI(v.1.2.2)
fastmap(v.1.1.1)
rmarkdown(v.2.26)
scales(v.1.3.0)
gt(v.0.10.1)
Rsamtools(v.2.19.4)
broom(v.1.0.5)
sass(v.0.4.9)
patchwork(v.1.2.0)
BiocManager(v.1.30.22)
VariantAnnotation(v.1.49.7)
graph(v.1.81.1)
carData(v.3.0-5)
rpart(v.4.1.23)
farver(v.2.1.1)
mgcv(v.1.9-1)
tidygraph(v.1.3.1)
scatterpie(v.0.2.2)
yaml(v.2.3.8)
latticeExtra(v.0.6-30)
foreign(v.0.8-86)
ggthemes(v.5.1.0)
cli(v.3.6.2)
purrr(v.1.0.2)
lifecycle(v.1.0.4)
backports(v.1.4.1)
BiocParallel(v.1.37.1)
annotate(v.1.81.2)
gtable(v.0.3.5)
rjson(v.0.2.21)
limma(v.3.59.10)
parallel(v.4.4.0)
ape(v.5.8)
edgeR(v.4.1.33)
jsonlite(v.1.8.8)
TFBSTools(v.1.41.1)
bitops(v.1.0-7)
bit64(v.4.0.5)
yulab.utils(v.0.1.4)
matlab(v.1.0.4)
CNEr(v.1.39.1)
highr(v.0.10)
jquerylib(v.0.1.4)
GOSemSim(v.2.29.2)
R.utils(v.2.12.3)
lazyeval(v.0.2.2)
pander(v.0.6.5)
htmltools(v.0.5.8.1)
enrichplot(v.1.23.2)
GO.db(v.3.19.1)
rappdirs(v.0.3.3)
tinytex(v.0.50)
ensembldb(v.2.27.1)
glue(v.1.7.0)
TFMPvalue(v.0.0.9)
httr2(v.1.0.1)
RCurl(v.1.98-1.14)
treeio(v.1.27.1)
jpeg(v.0.1-10)
gridExtra(v.2.3)
boot(v.1.3-30)
R6(v.2.5.1)
tidyr(v.1.3.1)
gplots(v.3.1.3.1)
labeling(v.0.4.3)
GenomicFeatures(v.1.55.4)
cluster(v.2.1.6)
grImport2(v.0.3-1)
aplot(v.0.2.2)
DirichletMultinomial(v.1.45.0)
DelayedArray(v.0.29.9)
tidyselect(v.1.2.1)
plotrix(v.3.8-4)
ProtGenerics(v.1.35.4)
htmlTable(v.2.4.2)
ggforce(v.0.4.2)
xml2(v.1.3.6)
car(v.3.1-2)
AnnotationDbi(v.1.65.2)
munsell(v.0.5.1)
KernSmooth(v.2.23-22)
data.table(v.1.15.4)
websocket(v.1.4.1)
aroma.light(v.3.33.0)
htmlwidgets(v.1.6.4)
fgsea(v.1.29.2)
RColorBrewer(v.1.1-3)
hwriter(v.1.3.2.1)
biomaRt(v.2.59.1)
rlang(v.1.1.3)
ShortRead(v.1.61.4)
Cairo(v.1.6-2)
fansi(v.1.0.6)

Author contributions

HN conceived the study. HN, MC and GB provided direction on the design of the Transcriptomics, Genomics, master regulatory networks and DNA methylation workflows. TCS developed and tested sections “Experimental data”, “DNA methylation analysis”, “Motif analysis” and “Integrative analysis”. AC developed and tested section “Transcriptomic analysis”. CO developed and tested the section “Inference of gene regulatory networks”. FDA developed and tested section “Genomic analysis”. TCS, AC, CO, and FDA prepared the first draft of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content. Also, AC, TCS, CO, MC, GB, and HN are authors of the TCGAbiolinks package and MC is the author of the GAIA package.

Competing interests

No competing interests were disclosed.

Grant information

The project was supported by the São Paulo Research Foundation (FAPESP) (2015/02844-7 and 2016/01389-7 to T.C.S. & H.N. and 2015/07925-5 to H.N.), the BridgeIRIS project, funded by INNOVIRIS, Region de Bruxelles Capitale, Brussels, Belgium, and by GENomic profiling of Gastrointestinal Inflammatory-Sensitive CANcers (GENGISCAN), Belgian FNRS PDR (T100914F to G.B.). Funding for open access charge: São Paulo Research Foundation (FAPESP) (2015/07925-5).

Acknowledgements

We are grateful to all the authors of the packages used in this article. We would also like to thank the GDC Support Team for their help, which was necessary to update the TCGAbiolinks package to use the GDC API.

References

Pekowska, Aleksandra, and Simon Anders. 2015. “ChIP-Seq Analysis Basics.”

Altay, Gökmen, and Frank Emmert-Streib. 2010. “Inferring the Conservative Causal Core of Gene Regulatory Networks.” BMC Systems Biology 4 (1): 132.

Bernstein, B. E., J. A. Stamatoyannopoulos, J. F. Costello, B. Ren, A. Milosavljevic, A. Meissner, M. Kellis, et al. 2010. “The NIH Roadmap Epigenomics Mapping Consortium.” Nat. Biotechnol. 28 (10): 1045–8.

Bernstein, Bradley E, Michael Kamal, Kerstin Lindblad-Toh, Stefan Bekiranov, Dione K Bailey, Dana J Huebert, Scott McMahon, et al. 2005. “Genomic Maps and Comparative Analysis of Histone Modifications in Human and Mouse.” Cell 120 (2): 169–81.

Bonasio, Roberto, Shengjiang Tu, and Danny Reinberg. 2010. “Molecular Signals of Epigenetic States.” Science 330 (6004): 612–16.

Bullard, James H, Elizabeth Purdom, Kasper D Hansen, and Sandrine Dudoit. 2010. “Evaluation of Statistical Methods for Normalization and Differential Expression in mRNA-Seq Experiments.” BMC Bioinformatics 11 (1): 94.

Ceccarelli, Michele, Floris P. Barthel, Tathiane M. Malta, Thais S. Sabedot, Sofie R. Salama, Bradley A. Murray, Olena Morozova, et al. 2016. “Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma.” Cell 164 (3): 550–63. https://doi.org/10.1016/j.cell.2015.12.028.

Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot, et al. 2016. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of TCGA Data.” Nucleic Acids Research 44 (8): e71. https://doi.org/10.1093/nar/gkv1507.

Consortium, ENCODE Project, and others. 2011. “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE).” PLoS Biol 9 (4): e1001046.

Creyghton, Menno P, Albert W Cheng, G Grant Welstead, Tristan Kooistra, Bryce W Carey, Eveline J Steine, Jacob Hanna, et al. 2010. “Histone H3K27ac Separates Active from Poised Enhancers and Predicts Developmental State.” Proceedings of the National Academy of Sciences 107 (50): 21931–6.

Davis, Caleb F, Christopher J Ricketts, Min Wang, Lixing Yang, Andrew D Cherniack, Hui Shen, Christian Buhay, et al. 2014. “The Somatic Genomic Landscape of Chromophobe Renal Cell Carcinoma.” Cancer Cell 26 (3): 319–30.

Deaton, Aimée M, and Adrian Bird. 2011. “CpG Islands and the Regulation of Transcription.” Genes & Development 25 (10): 1010–22.

Droit, A, R Gottardo, G Roberston, and L Li. 2015. “RGADEM: De Novo Motif Discovery.”

Durinck, Steffen, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma, and Wolfgang Huber. 2005. “BioMart and Bioconductor: A Powerful Link Between Biological Databases and Microarray Data Analysis.” Bioinformatics 21 (16): 3439–40.

Durinck, Steffen, Paul T Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the R/Bioconductor Package biomaRt.” Nature Protocols 4 (8): 1184–91.

Faith, Jeremiah J, Boris Hayete, Joshua T Thaden, Ilaria Mogno, Jamey Wierzbowski, Guillaume Cottarel, Simon Kasif, James J Collins, and Timothy S Gardner. 2007. “Large-Scale Mapping and Validation of Escherichia Coli Transcriptional Regulation from a Compendium of Expression Profiles.” PLoS Biol 5 (1): e8.

Fingerman, I. M., L. McDaniel, X. Zhang, W. Ratzat, T. Hassan, Z. Jiang, R. F. Cohen, and G. D. Schuler. 2011. “NCBI Epigenomics: a new public resource for exploring epigenomic data sets.” Nucleic Acids Res. 39 (Database issue): D908–912.

Giorgio, Elisa, Daniel Robyr, Malte Spielmann, Enza Ferrero, Eleonora Di Gregorio, Daniele Imperiale, Giovanna Vaula, et al. 2015. “A Large Genomic Deletion Leads to Enhancer Adoption by the Lamin B1 Gene: A Second Path to Autosomal Dominant Leukodystrophy (ADLD).” Human Molecular Genetics, ddv065.

Gröschel, Stefan, Mathijs A Sanders, Remco Hoogenboezem, Elzo de Wit, Britta AM Bouwman, Claudia Erpelinck, Vincent HJ van der Velden, et al. 2014. “A Single Oncogenic Enhancer Rearrangement Causes Concomitant EVI1 and GATA2 Deregulation in Leukemia.” Cell 157 (2): 369–81.

Hawkins, R. D., G. C. Hon, and B. Ren. 2010. “Next-generation genomics: an integrative approach.” Nat. Rev. Genet. 11 (7): 476–86.

Heintzman, Nathaniel D, Gary C Hon, R David Hawkins, Pouya Kheradpour, Alexander Stark, Lindsey F Harp, Zhen Ye, et al. 2009. “Histone Modifications at Human Enhancers Reflect Global Cell-Type-Specific Gene Expression.” Nature 459 (7243): 108–12.

Heintzman, Nathaniel D, Rhona K Stuart, Gary Hon, Yutao Fu, Christina W Ching, R David Hawkins, Leah O Barrera, et al. 2007. “Distinct and Predictive Chromatin Signatures of Transcriptional Promoters and Enhancers in the Human Genome.” Nature Genetics 39 (3): 311–18.

Heinz, Sven, Christopher Benner, Nathanael Spann, Eric Bertolino, Yin C Lin, Peter Laslo, Jason X Cheng, Cornelis Murre, Harinder Singh, and Christopher K Glass. 2010. “Simple Combinations of Lineage-Determining Transcription Factors Prime Cis-Regulatory Elements Required for Macrophage and B Cell Identities.” Molecular Cell 38 (4): 576–89.

Huber, Wolfgang, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21.

Kannan, Lavanya, Marcel Ramos, Angela Re, Nehme El-Hachem, Zhaleh Safikhani, Deena MA Gendoo, Sean Davis, et al. 2015. “Public Data and Open Source Tools for Multi-Assay Genomic Investigation of Disease.” Briefings in Bioinformatics, bbv080.

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1): D116–D125.

Kundaje, Anshul, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, et al. 2015. “Integrative Analysis of 111 Reference Human Epigenomes.” Nature 518 (7539): 317–30.

Li, Leping. 2009. “GADEM: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery.” Journal of Computational Biology 16 (2): 317–29.

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (14): 1830–1.

Marabita, Francesco, Malin Almgren, Maléne E Lindholm, Sabrina Ruhrmann, Fredrik Fagerström-Billai, Maja Jagodic, Carl J Sundberg, et al. 2013. “An Evaluation of Analysis Pipelines for DNA Methylation Profiling Using the Illumina HumanMethylation450 BeadChip Platform.” Epigenetics 8 (3): 333–46.

Margolin, Adam A, Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky, Riccardo D Favera, and Andrea Califano. 2006. “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” BMC Bioinformatics 7 (Suppl 1): S7.

Mayakonda, Anand, and H Phillip Koeffler. 2016. “Maftools: Efficient Analysis, Visualization and Summarization of MAF Files from Large-Scale Cohort Based Cancer Studies.” bioRxiv, 052662.

Mermel, Craig H, Steven E Schumacher, Barbara Hill, Matthew L Meyerson, Rameen Beroukhim, Gad Getz, and others. 2011. “GISTIC2.0 Facilitates Sensitive and Confident Localization of the Targets of Focal Somatic Copy-Number Alteration in Human Cancers.” Genome Biol 12 (4): R41.

Meyer, Patrick E, Kevin Kontos, Frederic Lafitte, and Gianluca Bontempi. 2007. “Information-Theoretic Inference of Large Transcriptional Regulatory Networks.” EURASIP Journal on Bioinformatics and Systems Biology 2007: 8–8.

Meyer, Patrick E., Frédéric Lafitte, and Gianluca Bontempi. 2008. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9 (1): 1–10. https://doi.org/10.1186/1471-2105-9-461.

Montojo, J, K Zuberi, H Rodriguez, F Kazi, G Wright, S L Donaldson, Q Morris, and G D Bader. 2010. “GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop.” Bioinformatics 26 (22): 2927–8.

Morgan M, Hester J, Obenchain V, and Pagès H. n.d. “SummarizedExperiment: SummarizedExperiment Container. R Package Version 1.1.0.” http://bioconductor.org/packages/SummarizedExperiment/.

Morgan M, Tenenbaum D, Carlson M, and Arora S. n.d. “AnnotationHub: Client to Access AnnotationHub Resources. R Package Version 2.2.2.” http://bioconductor.org/packages/AnnotationHub/.

Network, Cancer Genome Atlas, and others. 2012a. “Comprehensive Molecular Characterization of Human Colon and Rectal Cancer.” Nature 487 (7407): 330–37.

———. 2012b. “Comprehensive Molecular Portraits of Human Breast Tumours.” Nature 490 (7418): 61–70.

———. 2015a. “Comprehensive Genomic Characterization of Head and Neck Squamous Cell Carcinomas.” Nature 517 (7536): 576–82.

———. 2015b. “Genomic Classification of Cutaneous Melanoma.” Cell 161 (7): 1681–96.

Network, Cancer Genome Atlas Research, and others. 2012. “Comprehensive Genomic Characterization of Squamous Cell Lung Cancers.” Nature 489 (7417): 519–25.

———. 2013. “Comprehensive Molecular Characterization of Clear Cell Renal Cell Carcinoma.” Nature 499 (7456): 43–49.

———. 2014a. “Comprehensive Molecular Characterization of Gastric Adenocarcinoma.” Nature 513 (7517): 202–9.

———. 2014b. “Comprehensive Molecular Characterization of Gastric Adenocarcinoma.” Nature 513 (7517): 202–9.

———. 2014c. “Comprehensive Molecular Profiling of Lung Adenocarcinoma.” Nature 511 (7511): 543–50.

———. 2014d. “Integrated Genomic Characterization of Papillary Thyroid Carcinoma.” Cell 159 (3): 676–90.

———. 2015. “The Molecular Taxonomy of Primary Prostate Cancer.” Cell 163 (4): 1011–25.

———. 2016. “Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma.” N Engl J Med 2016 (374): 135–45.

Nishida, Hiromi, Takahiro Suzuki, Shinji Kondo, Hisashi Miura, Yu-ichi Fujimura, and Yoshihide Hayashizaki. 2006. “Histone H3 Acetylated at Lysine 9 in Promoter Is Associated with Low Nucleosome Density in the Vicinity of Transcription Start Site in Human Cell.” Chromosome Research 14 (2): 203–11.

Ou, J, M Brodsky, S Wolfe, and LJ Zhu. 2013. “MotifStack: Plot Stacked Logos for Single or Multiple DNA, RNA and Amino Acid Sequence.”

Peters, Antoine HFM, Stefan Kubicek, Karl Mechtler, Roderick J O’Sullivan, Alwin AHA Derijck, Laura Perez-Burgos, Alexander Kohlmaier, et al. 2003. “Partitioning and Plasticity of Repressive Histone Methylation States in Mammalian Chromatin.” Molecular Cell 12 (6): 1577–89.

Phillips, Theresa. 2008. “The Role of Methylation in Gene Expression.” Nature Education 1 (1): 116.

Rada-Iglesias, Alvaro, Ruchi Bajpai, Tomek Swigut, Samantha A Brugmann, Ryan A Flynn, and Joanna Wysocka. 2011. “A Unique Chromatin Signature Uncovers Early Developmental Enhancers in Humans.” Nature 470 (7333): 279–83.

Rhodes, Daniel R, and Arul M Chinnaiyan. 2005. “Integrative Analysis of the Cancer Transcriptome.” Nature Genetics 37: S31–S37.

Risso, Davide, Katja Schwartz, Gavin Sherlock, and Sandrine Dudoit. 2011. “GC-Content Normalization for RNA-Seq Data.” BMC Bioinformatics 12 (1): 480.

Robertson, Keith D. 2005. “DNA Methylation and Human Disease.” Nature Reviews Genetics 6 (8): 597–610.

Samur, Mehmet Kemal. 2014. “RTCGAToolbox: A New Tool for Exporting TCGA Firehose Data.”

Shi, Xingjie, Jin Liu, Jian Huang, Yong Zhou, BenChang Shia, and Shuangge Ma. 2014. “Integrative Analysis of High-Throughput Cancer Studies with Contrasted Penalization.” Genetic Epidemiology 38 (2): 144–51.

Silva, TC, A Colaprico, C Olsen, F D’Angelo, G Bontempi, M Ceccarelli, and H Noushmehr. 2016. “TCGA Workflow: Analyze Cancer Genomics and Epigenomics Data Using Bioconductor Packages [Version 2; Referees: 1 Approved, 1 Approved with Reservations].” F1000Research 5 (1542). https://doi.org/10.12688/f1000research.8923.2.

Silva, Tiago C, Simon G Coetzee, Nicole Gull, Lijing Yao, Dennis J Hazelett, Houtan Noushmehr, De-Chen Lin, and Benjamin P Berman. 2018. “ELMER V.2: An R/Bioconductor Package to Reconstruct Gene Regulatory Networks from DNA Methylation and Transcriptome Profiles.” Bioinformatics, bty902. https://doi.org/10.1093/bioinformatics/bty902.

Stark, Chris, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. 2006. “BioGRID: A General Repository for Interaction Datasets.” Nucleic Acids Research 34 (suppl 1): D535–D539. https://doi.org/10.1093/nar/gkj109.

Sur, Inderpreet Kaur, Outi Hallikas, Anna Vähärautio, Jian Yan, Mikko Turunen, Martin Enge, Minna Taipale, Auli Karhu, Lauri A Aaltonen, and Jussi Taipale. 2012. “Mice Lacking a MYC Enhancer That Includes Human SNP rs6983267 Are Resistant to Intestinal Tumors.” Science 338 (6112): 1360–3.

Weinstein, John N, Eric A Collisson, Gordon B Mills, Kenna R Mills Shaw, Brad A Ozenberger, Kyle Ellrott, Ilya Shmulevich, et al. 2013. “The Cancer Genome Atlas Pan-Cancer Analysis Project.” Nature Genetics 45 (10): 1113–20.

Wingender, Edgar, Torsten Schoeps, and Jürgen Dönitz. 2013. “TFClass: An Expandable Hierarchical Classification of Human Transcription Factors.” Nucleic Acids Research 41 (D1): D165–D170.

Yao, Lijing, Benjamin P Berman, and Peggy J Farnham. 2015. “Demystifying the Secret Mission of Enhancers: Linking Distal Regulatory Elements to Target Genes.” Critical Reviews in Biochemistry and Molecular Biology 50 (6): 550–73.

Yao, L, H Shen, PW Laird, PJ Farnham, and BP Berman. 2015. “Inferring Regulatory Element Landscapes and Transcription Factor Networks from Cancer Methylomes.” Genome Biology 16 (1): 105–5.

Yates, Andrew, Wasiu Akanni, M Ridwan Amode, Daniel Barrell, Konstantinos Billis, Denise Carvalho-Silva, Carla Cummins, et al. 2015. “Ensembl 2016.” Nucleic Acids Research, gkv1157.

Yu, Guangchuang, Li-Gen Wang, and Qing-Yu He. 2015. “ChIPseeker: An R/Bioconductor Package for ChIP Peak Annotation, Comparison and Visualization.” Bioinformatics, btv145.

Z., Gu. n.d. “ComplexHeatmap: Making Complex Heatmaps. R Package Version 1.7.1.” https://github.com/jokergoo/ComplexHeatmap.

Zheng, Siyuan, Andrew D Cherniack, Ninad Dewal, Richard A Moffitt, Ludmila Danilova, Bradley A Murray, Antonio M Lerario, et al. 2016. “Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma.” Cancer Cell 29 (5): 723–36.

Zhou, Wanding, Peter W Laird, and Hui Shen. 2016. “Comprehensive Characterization, Annotation and Innovative Use of Infinium DNA Methylation BeadChip Probes.” Nucleic Acids Research, gkw967.

Zhou, Wanding, Peter W. Laird, and Hui Shen. 2017. “Comprehensive Characterization, Annotation and Innovative Use of Infinium DNA Methylation BeadChip Probes.” Nucleic Acids Research 45 (4): e22. https://doi.org/10.1093/nar/gkw967.