About This Document

Package name TCGAWorkflow
Built with Bioconductor (R) 3.6 (3.4.2)
Last Built Wed, 20 Dec 2017 12:06:06 -0800
Last Modified Wed, 20 Dec 2017 10:38:53 -0800 (r132263)
Source Package TCGAWorkflow_0.99.85.tar.gz
Windows Binary TCGAWorkflow_0.99.85.zip
R Script TCGAWorkflow.R

To install this workflow under Bioconductor 3.6, start R and enter:

source("http://bioconductor.org/workflows.R")
workflowInstall("TCGAWorkflow")

TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

Tiago C. Silva, Antonio Colaprico, Catharina Olsen, Fulvio D’Angelo, Gianluca Bontempi, Michele Ceccarelli, and Houtan Noushmehr

1 About

This workflow is based on the article: TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages (Silva et al. 2016). Due to time and space limitations, we downloaded only a subset of the data; for a real analysis, please use all the data available. The data used in the examples are available in the package TCGAWorkflowData.

1.1 Installation

To be able to execute all the steps of this workflow please install it with the following code:

source("http://bioconductor.org/workflows.R")
workflowInstall("TCGAWorkflow")

1.2 Loading packages

At the beginning of each section, the packages required to execute the code will be loaded. However, the following packages are required for all sections.

  • TCGAWorkflowData: this package contains the data necessary to execute each of the analysis steps. It is a subset of the downloaded data, used to make the examples run faster. For a real analysis, please use all the data available.
  • DT: we will use it to visualize the results.

library(TCGAWorkflowData)
library(DT)

2 Abstract

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no single comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. The need to integrate these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer.

To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM).

All the package landing pages used in this workflow can be found through the biocViews interface.

Keywords: Epigenomics, Genomics, Cancer, non-coding, TCGA, ENCODE, Roadmap, Bioinformatics.

3 Introduction

Cancer is a complex genetic disease spanning multiple molecular events such as point mutations, structural variations, translocations and activation of epigenetic and transcriptional signatures and networks. The effects of these events take place at different spatial and temporal scales, with interlayer communications and feedback mechanisms creating a highly complex dynamic system. To gain insight into the biology of tumors, most of the research in cancer genomics is aimed at integrating the observations at multiple molecular scales and analyzing their interplay. Even if many tumors share similar recurrent genomic events, their relationships with the observed phenotype are often not understood. For example, although we know that the majority of the most aggressive forms of brain tumors, such as glioma, harbor a mutation in a single gene (IDH), the mechanistic explanation of the activation of its characteristic epigenetic and transcriptional signatures is still far from being well characterized. Moreover, network-based strategies have recently emerged as an effective framework for the discovery of functional disease drivers that act as main regulators of cancer phenotypes. Here we describe a comprehensive workflow that integrates many Bioconductor packages in order to analyze and integrate the multiplicity of molecular observation layers in large-scale cancer datasets.

Indeed, recent technological developments have allowed the deposition of large amounts of genomic and epigenomic data, such as gene expression, DNA methylation, and genomic localization of transcription factors, in freely available public repositories maintained by international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap) (Hawkins, Hon, and Ren 2010). An overview of the three consortia is given below:

  • The Cancer Genome Atlas (TCGA): The TCGA consortium, which is a National Institutes of Health (NIH) initiative, makes publicly available molecular and clinical information for more than 30 types of human cancers, including exome (variant analysis), single nucleotide polymorphism (SNP), DNA methylation, transcriptome (mRNA), microRNA (miRNA) and proteome data. Sample types available at TCGA are: primary solid tumors, recurrent solid tumors, blood derived normal and tumor, metastatic, and solid tissue normal (Weinstein et al. 2013).

  • The Encyclopedia of DNA Elements (ENCODE): Founded in 2003 by the National Human Genome Research Institute (NHGRI), the project aims to build a comprehensive list of functional elements that have an active role in the genome, including regulatory elements that govern gene expression. Biosamples include immortalized cell lines, tissues, primary cells and stem cells (Consortium and others 2011).

  • The NIH Roadmap Epigenomics Mapping Consortium: This was launched with the goal of producing a public resource of human epigenomic data to support basic biology and disease-oriented research. Roadmap maps DNA methylation, histone modifications, chromatin accessibility, and small RNA transcripts in stem cells and primary ex vivo tissues (Fingerman et al. 2011; Bernstein et al. 2010).

Briefly, these three consortia provide large-scale epigenomic data generated on a variety of microarray and next-generation sequencing (NGS) platforms. Each consortium encompasses specific types of biological information on a specific type of tissue or cell and, when analyzed together, they provide an invaluable opportunity for research laboratories to better understand the developmental progression of normal cells to a cancer state at the molecular level and, importantly, to correlate these phenotypes with the tissue of origin.

Although there exists a wealth of possibilities (Kannan et al. 2015) for accessing cancer-associated data, Bioconductor represents the most comprehensive set of open source, updated and integrated professional tools for the statistical analysis of large-scale genomic data. Thus, we propose our workflow within Bioconductor to describe how to download, process, analyze and integrate cancer data to answer specific cancer-related questions. However, there is no single tool that comprehensively integrates sequence and mutation information, epigenomic state and gene expression within the context of gene regulatory networks to identify oncogenic drivers and characterize altered pathways during cancer progression. Therefore, our workflow presents several Bioconductor packages to work with genomic and epigenomic data.

4 Methods

4.1 Access to the data

TCGA data is accessible via the NCI Genomic Data Commons (GDC) data portal, the GDC Legacy Archive and the Broad Institute’s GDAC Firehose. The GDC Data Portal provides access to the subset of TCGA data that has been harmonized against GRCh38 (hg38) using the GDC Bioinformatics Pipelines, which provide methods for the standardization of biospecimen and clinical data, the re-alignment of DNA and RNA sequence data against a common reference genome build (GRCh38), and the generation of derived data. The GDC Legacy Archive, in contrast, provides access to an unmodified copy of data that was previously stored in CGHub (Wilks et al. 2014) and in the TCGA Data Portal hosted by the TCGA Data Coordinating Center (DCC), which uses GRCh37 (hg19) and GRCh36 (hg18) as references.

The data previously stored in CGHub, the TCGA Data Portal and the Broad Institute’s GDAC Firehose were provided as different levels or tiers, defined in terms of a specific combination of processing level (raw, normalized, integrated) and access level (controlled or open access). Level 1 indicated raw and controlled data, level 2 indicated processed and controlled data, level 3 indicated segmented or interpreted data with open access, and level 4 indicated region of interest data with open access. While the TCGA data portal provided level 1 to 3 data, Firehose only provides level 3 and 4. An explanation of the different levels can be found at the TCGA Wiki. However, the GDC data portal no longer uses this level-based classification model. Instead, a new data model was created; it is described in the GDC documentation. In this new model, data can be open or controlled access. GDC open access data does not require authentication or authorization and generally includes high-level genomic data that is not individually identifiable, as well as most clinical and all biospecimen data elements. GDC controlled access data requires dbGaP authorization and eRA Commons authentication and generally includes individually identifiable data such as low-level genomic sequencing data, germline variants, SNP6 genotype data, and certain clinical data elements. The process to obtain access to controlled data is described on the GDC web site.
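
The legacy level scheme described above can be written down as a small lookup table. The following base R sketch is only an illustration of the text: the object names (legacy_levels, levels_by_source) are hypothetical and are not part of TCGAbiolinks or RTCGAToolbox.

```r
# Legacy TCGA data levels as described in the paragraph above.
legacy_levels <- data.frame(
  level   = 1:4,
  content = c("raw", "processed", "segmented or interpreted",
              "region of interest"),
  access  = c("controlled", "controlled", "open", "open"),
  stringsAsFactors = FALSE
)

# Levels offered by each legacy source named in the text
# (hypothetical helper structure, for illustration only).
levels_by_source <- list(
  "TCGA data portal" = 1:3,
  "Firehose"         = 3:4
)

# Levels that could be obtained without controlled-access authorization
open_levels <- legacy_levels$level[legacy_levels$access == "open"]

# For each source, the offered levels that were open access
open_by_source <- lapply(levels_by_source, intersect, open_levels)
print(open_by_source)
```

Under this reading of the text, everything Firehose provides falls into the open-access levels, whereas only level 3 of the former TCGA data portal did.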

Finally, the data provided by GDC data portal and GDC Legacy Archive can be accessed using Bioconductor package TCGAbiolinks, while the data provided by Firehose can be accessed by Bioconductor package RTCGAToolbox.

The next steps describe how one could use TCGAbiolinks and RTCGAToolbox to download clinical, genomics, transcriptomics and epigenomics data, as well as subtype information and GISTIC results (i.e., identified genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth). All the data used in this workflow use the Genome Reference Consortium human genome build 37 (hg19) as reference.

4.1.1 Downloading data from TCGA data portal

The Bioconductor package TCGAbiolinks (Colaprico et al. 2016) has three main functions, GDCquery, GDCdownload and GDCprepare, which should be used sequentially to search, download and load the data as an R object, respectively.

GDCquery uses the GDC API to search the data for a given project and data category, and filters the results by samples, sample type, file type and other features if requested by the user. This function returns an object with a summary table of the results found (samples, files and other useful information) and the arguments used in the query. The most important GDCquery arguments are:

  • project: a GDC project (TCGA-UCS, TCGA-LGG, TARGET-AML, etc.)
  • data.category: a data category (Transcriptome Profiling, Copy Number Variation, DNA methylation, Gene expression, etc.)
  • data.type: a data type (Gene expression quantification, Isoform Expression Quantification, miRNA Expression Quantification, Copy Number Segment, Masked Copy Number Segment, etc.)
  • workflow.type: a GDC workflow type (HTSeq - Counts, HTSeq - FPKM-UQ, HTSeq - FPKM)
  • legacy: selects whether to use the legacy database or the harmonized database
  • file.type: a file type for searches in the legacy database (hg18.seg, hg19.seg, nocnv_hg18.seg, nocnv_hg19.seg, rsem.genes.results, rsem.genes.normalized_results, etc.)
  • platform: a platform for searches in the legacy database (HumanMethylation27, Genome_Wide_SNP_6, IlluminaHiSeq_RNASeqV2, etc.)

A complete list of possible entries for these arguments can be found in the TCGAbiolinks vignette. Listing 1 shows an example of this function.

After the search step, the user can download the data using the GDCdownload function, which can use either the GDC API or the GDC client tools to download the samples. The downloaded data will be saved in a directory with the project name and a sub-folder with the data.category, for example “TCGA-GBM/DNA_methylation”.

Finally, GDCprepare transforms the downloaded data into a SummarizedExperiment object (Huber et al. 2015) or a data frame. If the summarizedExperiment argument is set to TRUE, TCGAbiolinks will add to the object clinical information and sub-type information, which was defined by The Cancer Genome Atlas (TCGA) Research Network reports (the full list of papers can be seen in the TCGAquery_subtype section of the TCGAbiolinks vignette). Listing 1 shows how to use these functions to download DNA methylation and gene expression data from the GDC legacy database, and Listing 2 shows how to download copy number variation from the harmonized data portal. Other examples that access the harmonized data can be found in the TCGAbiolinks vignette.

library(TCGAbiolinks)
# Listing 1: DNA methylation and gene expression from the GDC legacy database
# Note: the data in the legacy database has been aligned to hg19
query.met.gbm <- GDCquery(project = "TCGA-GBM", 
                          legacy = TRUE,
                          data.category = "DNA methylation",
                          platform = "Illumina Human Methylation 450", 
                          barcode = c("TCGA-76-4926-01B-01D-1481-05", "TCGA-28-5211-01C-11D-1844-05"))
GDCdownload(query.met.gbm)

met.gbm.450 <- GDCprepare(query = query.met.gbm,
                          save = TRUE, 
                          save.filename = "gbmDNAmet450k.rda",
                          summarizedExperiment = TRUE)

query.met.lgg <- GDCquery(project = "TCGA-LGG", 
                          legacy = TRUE,
                          data.category = "DNA methylation",
                          platform = "Illumina Human Methylation 450",
                          barcode = c("TCGA-HT-7879-01A-11D-2399-05", "TCGA-HT-8113-01A-11D-2399-05"))
GDCdownload(query.met.lgg)
met.lgg.450 <- GDCprepare(query = query.met.lgg,
                          save = TRUE, 
                          save.filename = "lggDNAmet450k.rda",
                          summarizedExperiment = TRUE)
met.gbm.lgg <- SummarizedExperiment::cbind(met.lgg.450, met.gbm.450)


query.exp.lgg <- GDCquery(project = "TCGA-LGG", 
                          legacy = TRUE,
                          data.category = "Gene expression",
                          data.type = "Gene expression quantification",
                          platform = "Illumina HiSeq", 
                          file.type = "results",
                          sample.type = "Primary solid Tumor")
GDCdownload(query.exp.lgg)
exp.lgg <- GDCprepare(query = query.exp.lgg, save = TRUE, save.filename = "lggExp.rda")

query.exp.gbm <- GDCquery(project = "TCGA-GBM", 
                          legacy = TRUE,
                          data.category = "Gene expression",
                          data.type = "Gene expression quantification",
                          platform = "Illumina HiSeq", 
                          file.type = "results",
                          sample.type = "Primary solid Tumor")
GDCdownload(query.exp.gbm)
exp.gbm <- GDCprepare(query = query.exp.gbm, save = TRUE, save.filename = "gbmExp.rda")
exp.gbm.lgg <- SummarizedExperiment::cbind(exp.lgg, exp.gbm)
#-----------------------------------------------------------------------------
# Listing 2: Data.category: Copy number variation aligned to hg38
#-----------------------------------------------------------------------------
query <- GDCquery(project = "TCGA-ACC",
                  data.category = "Copy Number Variation",
                  data.type = "Copy Number Segment",
                  barcode = c( "TCGA-OR-A5KU-01A-11D-A29H-01", "TCGA-OR-A5JK-01A-11D-A29H-01"))
GDCdownload(query)
data <- GDCprepare(query)

query <- GDCquery("TCGA-ACC",
                  "Copy Number Variation",
                  data.type = "Masked Copy Number Segment",
                  sample.type = c("Primary solid Tumor")) # see the barcodes with getResults(query)$cases
GDCdownload(query)
data <- GDCprepare(query)
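
The barcode and sample.type arguments used in the listings above both relate to the TCGA barcode, whose fourth field starts with a two-digit sample type code. The following base R sketch shows one way such barcodes could be split into fields. It assumes the standard seven-field aliquot barcode layout, and the helper parse_tcga_barcode is hypothetical (it is not a TCGAbiolinks function).

```r
# Split TCGA aliquot barcodes into some of their fields.
# Assumes the seven-field layout, e.g. TCGA-76-4926-01B-01D-1481-05.
parse_tcga_barcode <- function(barcodes) {
  parts <- strsplit(barcodes, "-", fixed = TRUE)
  stopifnot(all(lengths(parts) == 7))
  fields <- do.call(rbind, parts)
  data.frame(
    barcode     = barcodes,
    project     = fields[, 1],
    tss         = fields[, 2],               # tissue source site
    participant = fields[, 3],
    patient     = substr(barcodes, 1, 12),   # patient-level barcode
    sample_code = substr(fields[, 4], 1, 2), # two-digit sample type code
    vial        = substr(fields[, 4], 3, 3),
    stringsAsFactors = FALSE
  )
}

# A few sample type codes (not a complete list)
sample_types <- c("01" = "Primary Solid Tumor",
                  "02" = "Recurrent Solid Tumor",
                  "10" = "Blood Derived Normal",
                  "11" = "Solid Tissue Normal")

# Two of the barcodes used in Listing 1
parsed <- parse_tcga_barcode(c("TCGA-76-4926-01B-01D-1481-05",
                               "TCGA-HT-7879-01A-11D-2399-05"))
parsed$sample_type <- unname(sample_types[parsed$sample_code])
print(parsed[, c("patient", "sample_code", "vial", "sample_type")])
```

A table like this can help check that the samples returned by a query are of the intended sample type before downloading them.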

If a SummarizedExperiment object was chosen, the data can be accessed with three different accessors: assay for the data matrix, rowRanges to get the genomic ranges and annotation of each row (feature), and colData to get the sample information (patient, batch, sample type, etc.) (Huber et al. 2015; Morgan M and H., n.d.). An example is shown in the listing below.

library(SummarizedExperiment)

# Load object from TCGAWorkflowData package
# This object will be created in later sections
data(GBMIllumina_HiSeq) 

# get expression matrix
data <- assay(gbm.exp)
datatable(data[1:10,], 
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = TRUE)
# get genes information
genes.info <- rowRanges(gbm.exp)
genes.info
## GRanges object with 21022 ranges and 4 metadata columns:
##                          seqnames                 ranges strand |      gene_id
##                             <Rle>              <IRanges>  <Rle> |  <character>
##                   A1BG|1    chr19   [58856544, 58864865]      - |         A1BG
##                    A2M|2    chr12   [ 9220260,  9268825]      - |          A2M
##                   NAT1|9     chr8   [18027986, 18081198]      + |         NAT1
##                  NAT2|10     chr8   [18248755, 18258728]      + |         NAT2
##              SERPINA3|12    chr14   [95058395, 95090983]      + | RP11-986E7.7
##                      ...      ...                    ...    ... .          ...
##     NCRNA00182|100302692     chrX [ 73183790,  73513409]      - |          FTX
##   TMED7-TICAM2|100302736     chr5 [114914339, 114961858]      - | TMED7-TICAM2
##   TMED7-TICAM2|100302736     chr5 [114949205, 114968689]      - |        TMED7
##   TMED7-TICAM2|100302736     chr5 [114914339, 114961876]      - |       TICAM2
##   LOC100303728|100303728     chrX [118599997, 118603061]      - |  SLC25A5-AS1
##                          entrezgene ensembl_gene_id
##                           <numeric>     <character>
##                   A1BG|1          1 ENSG00000121410
##                    A2M|2          2 ENSG00000175899
##                   NAT1|9          9 ENSG00000171428
##                  NAT2|10         10 ENSG00000156006
##              SERPINA3|12         12 ENSG00000273259
##                      ...        ...             ...
##     NCRNA00182|100302692  100302692 ENSG00000230590
##   TMED7-TICAM2|100302736  100302736 ENSG00000251201
##   TMED7-TICAM2|100302736  100302736 ENSG00000134970
##   TMED7-TICAM2|100302736  100302736 ENSG00000243414
##   LOC100303728|100303728  100303728 ENSG00000224281
##                                                                    transcript_id.transcript_id_TCGA-28-1753-01A-01R-1850-01
##                                                                                                                 <character>
##                   A1BG|1                                                                              uc002qsd.3,uc002qsf.1
##                    A2M|2                                                                   uc001qvj.1,uc001qvk.1,uc009zgk.1
##                   NAT1|9 uc003wyq.2,uc003wyr.2,uc003wys.2,uc003wyt.2,uc003wyu.2,uc003wyv.2,uc010ltc.2,uc010ltd.2,uc011kyl.1
##                  NAT2|10                                                                                         uc003wyw.1
##              SERPINA3|12                       uc001ydo.3,uc001ydp.2,uc001ydq.2,uc001ydr.2,uc001yds.2,uc010avf.1,uc010avg.2
##                      ...                                                                                                ...
##     NCRNA00182|100302692                                                                              uc004ebr.1,uc010nlq.1
##   TMED7-TICAM2|100302736                                                                              uc003krd.2,uc003kre.2
##   TMED7-TICAM2|100302736                                                                              uc003krd.2,uc003kre.2
##   TMED7-TICAM2|100302736                                                                              uc003krd.2,uc003kre.2
##   LOC100303728|100303728                                                                              uc004ere.1,uc004erg.1
##   -------
##   seqinfo: 24 sequences from an unspecified genome; no seqlengths
# get sample information
sample.info <- colData(gbm.exp)
datatable(as.data.frame(sample.info), 
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
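
The three accessors above return pieces that stay aligned with each other: the columns of the assay matrix correspond to the rows of the sample information, and the rows of the matrix correspond to the feature annotation. The sketch below mimics that relationship with plain base R objects. It is an analogy only and does not use the SummarizedExperiment class. All values are toy data, and the sample_type column and the subset_samples helper are hypothetical.

```r
# Toy expression matrix: rows are genes, columns are samples
# (matrix() fills column by column).
expr <- matrix(
  c(10, 0, 5,
    20, 3, 7,
    15, 1, 9),
  nrow = 3,
  dimnames = list(c("A1BG|1", "A2M|2", "NAT1|9"),
                  c("S1", "S2", "S3"))
)

# Feature annotation, one row per gene (analogous to rowRanges)
gene_info <- data.frame(
  gene_id  = c("A1BG", "A2M", "NAT1"),
  seqnames = c("chr19", "chr12", "chr8"),
  row.names = rownames(expr),
  stringsAsFactors = FALSE
)

# Sample annotation, one row per sample (analogous to colData)
sample_info <- data.frame(
  sample_type = c("tumor", "tumor", "normal"),
  row.names = colnames(expr),
  stringsAsFactors = FALSE
)

# Subset samples while keeping the matrix and the annotation in sync
subset_samples <- function(expr, sample_info, keep) {
  stopifnot(identical(colnames(expr), rownames(sample_info)))
  list(expr        = expr[, keep, drop = FALSE],
       sample_info = sample_info[keep, , drop = FALSE])
}

tumor <- subset_samples(expr, sample_info, sample_info$sample_type == "tumor")
print(tumor$expr)
```

With a real SummarizedExperiment object this bookkeeping is handled by the class itself, which is the main reason to prefer it over separate matrices and data frames.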

The clinical data can be obtained using TCGAbiolinks through two methods. The first one downloads only the indexed GDC clinical data, which includes diagnoses (vital status, days to death, age at diagnosis, days to last follow up, days to recurrence), treatments (days to treatment, treatment id, therapeutic agents, treatment intent type), demographic (gender, race, ethnicity) and exposures (cigarettes per day, weight, height, alcohol history) information. This indexed clinical data can be obtained using the function GDCquery_clinic, as described in the listing below. This function has two arguments: project (“TCGA-GBM”, “TARGET-AML”, etc.) and type (“Clinical” or “Biospecimen”).

The second method downloads the XML files with all clinical data for the patient and retrieves the desired information from them. This gives access to all the clinical data available, which includes patient (tumor tissue site, histological type, gender, vital status, days to birth, days to last follow up, etc.), drug (days to drug therapy start, days to drug therapy end, therapy types, drug name), radiation (days to radiation therapy start, days to radiation therapy end, radiation type, radiation dosage), new tumor event (days to new tumor event after initial treatment, new neoplasm event type, additional pharmaceutical therapy), follow up (primary therapy outcome success, follow up treatment success, vital status, days to last follow up, date of form completion), stage event (pathologic stage) and admin (batch number, project code, disease code, Biospecimen Core Resource).

# get indexed clinical patient data for GBM samples
gbm_clin <- GDCquery_clinic(project = "TCGA-GBM", type = "Clinical")

# get indexed clinical patient data for LGG samples
lgg_clin <- GDCquery_clinic(project = "TCGA-LGG", type = "Clinical")

# Bind the results. As the columns might not be the same,
# we will use plyr::rbind.fill to keep all columns from both tables
clinical <- plyr::rbind.fill(gbm_clin,lgg_clin)
datatable(clinical[1:10,], options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)

# Fetch clinical data directly from the clinical XML files.
# if barcode is not set, it will consider all samples.
# We only set it to make the example faster
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Clinical",
                  barcode = c("TCGA-08-0516","TCGA-02-0317")) 
GDCdownload(query)
clinical <- GDCprepare_clinic(query, clinical.info = "patient")
datatable(clinical, options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)
clinical.drug <- GDCprepare_clinic(query, clinical.info = "drug")
datatable(clinical.drug, options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)
clinical.radiation <- GDCprepare_clinic(query, clinical.info = "radiation")
datatable(clinical.radiation, options = list(scrollX = TRUE,  keys = TRUE), rownames = FALSE)
clinical.admin <- GDCprepare_clinic(query, clinical.info = "admin")
datatable(clinical.admin, options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)
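
The row-binding step used for the indexed clinical tables keeps every column found in either table and fills the gaps with NA. As a rough illustration of that behaviour, the base R sketch below does the same for two toy data frames. The helper rbind_fill_base is hypothetical and is not the plyr implementation. The column names and values are made up for the example.

```r
# Row-bind two data frames, keeping the union of their columns and
# filling columns missing from one of them with NA.
rbind_fill_base <- function(a, b) {
  all_cols <- union(names(a), names(b))
  pad <- function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, all_cols, drop = FALSE]
  }
  rbind(pad(a), pad(b))
}

# Toy clinical tables whose columns only partly overlap
gbm_toy <- data.frame(submitter_id = c("P1", "P2"),
                      vital_status = c("dead", "alive"),
                      stringsAsFactors = FALSE)
lgg_toy <- data.frame(submitter_id = "P3",
                      age_at_diagnosis = 40,
                      stringsAsFactors = FALSE)

combined <- rbind_fill_base(gbm_toy, lgg_toy)
print(combined)
```

In practice plyr::rbind.fill, as used in the listing above, is the more robust choice, since it also handles factors and more than two inputs.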

Mutation information is stored in two types of Mutation Annotation Format (MAF) files: Protected and Somatic (or Public) MAF files, which are derived from the GDC annotated VCF files. Annotated VCF files often have variants reported on multiple transcripts, whereas the protected MAF (*protected.maf) only reports the most critically affected one, and the Somatic MAFs (*somatic.maf) are further processed to remove low quality and potential germline variants. To download Somatic MAF data using TCGAbiolinks, the GDCquery_Maf function is provided (see listing below).

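# Download the somatic MAF for LGG produced by the MuTect2 pipeline
LGGmut <- GDCquery_Maf(tumor = "LGG", pipelines = "mutect2")
# Alternatively, load the copy shipped with TCGAWorkflowData
data(mafMutect2LGGGBM)
datatable(LGGmut[1:10,], options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)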
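
Once a MAF table is loaded, a common first look is to count in how many samples each gene is mutated. The base R sketch below does this on a toy table, assuming only the standard MAF column names Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode. The rows are invented for illustration and are not TCGA results.

```r
# Toy MAF-like table (made-up rows, standard MAF column names)
maf_toy <- data.frame(
  Hugo_Symbol = c("IDH1", "IDH1", "TP53", "ATRX", "TP53", "IDH1"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Missense_Mutation", "Frame_Shift_Del",
                             "Silent", "Missense_Mutation"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)

# Drop silent variants
coding <- maf_toy[maf_toy$Variant_Classification != "Silent", ]

# Count each gene at most once per sample
per_sample <- unique(coding[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])

# Number of samples in which each gene is mutated, most frequent first
mutated_samples <- sort(table(per_sample$Hugo_Symbol), decreasing = TRUE)
print(mutated_samples)
```

The same three steps should apply to a real MAF data frame such as LGGmut, provided it carries these columns.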

Finally, the Cancer Genome Atlas (TCGA) Research Network has reported integrated genome-wide studies of various diseases (ACC (Zheng et al. 2016), BRCA (C. G. A. Network and others 2012b), COAD (C. G. A. Network and others 2012a), GBM (Ceccarelli et al. 2016), HNSC (C. G. A. Network and others 2015a), KICH (Davis et al. 2014), KIRC (Network and others 2013), KIRP (Network and others 2016), LGG (Ceccarelli et al. 2016), LUAD (Network and others 2014c), LUSC (Network and others 2012), PRAD (Network and others 2015), READ (C. G. A. Network and others 2012a), SKCM (C. G. A. Network and others 2015b), STAD (Network and others 2014a), THCA (Network and others 2014d) and UCEC (Network and others 2014b)), which classified them into different subtypes. This classification can be retrieved using the TCGAquery_subtype function or by accessing the sample information in the SummarizedExperiment object created by the GDCprepare function, which automatically incorporates it into the object.

gbm.subtypes <- TCGAquery_subtype(tumor = "gbm")
datatable(gbm.subtypes[1:10,], options = list(scrollX = TRUE, keys = TRUE), rownames = FALSE)

4.1.2 Downloading data from Broad TCGA GDAC

The Bioconductor package RTCGAToolbox (Samur 2014) provides access to Firehose Level 3 and 4 data through the function getFirehoseData. The following arguments allow users to select the version and tumor type of interest:

  • dataset - Tumor to download. A complete list of possibilities can be viewed with the getFirehoseDatasets function.

  • runDate - Stddata run dates. Dates can be viewed with the getFirehoseRunningDates function.

  • gistic2_Date - Analyze run dates. Dates can be viewed with the getFirehoseAnalyzeDates function.

The following arguments can be used to select the data types to download: RNAseq_Gene, Clinic, miRNASeq_Gene, RNAseq2_Gene_Norm, CNA_SNP, CNV_SNP, CNA_Seq, CNA_CGH, Methylation, Mutation, mRNA_Array, miRNA_Array, and RPPA.

By default, RTCGAToolbox allows users to download up to 500 MB worth of data. To increase the download size limit, use the fileSizeLimit argument. An example is found in the listing below. The getData function allows users to access the downloaded data as an S4Vector object (see the last two lines of the listing below).

library(RTCGAToolbox)
# Get the last run dates
lastRunDate <- getFirehoseRunningDates()[1]

# get clinical and mutation data for GBM
# (DNA methylation and RNAseq2 are switched off in this call)
gbm.data <- getFirehoseData(dataset = "GBM",
                            runDate = lastRunDate, gistic2_Date = getFirehoseAnalyzeDates(1),
                            Methylation = FALSE, Clinic = TRUE, 
                            RNAseq2_Gene_Norm = FALSE, Mutation = TRUE,
                            fileSizeLimit = 10000)

gbm.mut <- getData(gbm.data,"Mutations")
gbm.clin <- getData(gbm.data,"Clinical")

Finally, RTCGAToolbox can access level 4 data, which can be handy when the user requires GISTIC results. GISTIC is used to identify genes targeted by somatic copy-number alterations (SCNAs) (Mermel et al. 2011).

# Download GISTIC results
lastanalyzedate <- getFirehoseAnalyzeDates(1)
gistic <- getFirehoseData("GBM",gistic2_Date = lastanalyzedate)

# get GISTIC results
gistic.allbygene <- getData(gistic,type = "GISTIC", CN = "All")
gistic.thresholedbygene <- getData(gistic,type = "GISTIC", CN = "Thresholed")
# Load the GISTIC objects shipped with TCGAWorkflowData
data(GBMGistic)
datatable(gistic.allbygene,
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
datatable(gistic.thresholedbygene,
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5), 
          rownames = FALSE)
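
Thresholded GISTIC tables code each gene and sample as a discrete call, conventionally -2 (deep deletion), -1 (shallow deletion), 0 (no change), 1 (low-level gain) and 2 (high-level amplification). The base R sketch below summarizes such calls per gene on a toy table. It assumes a layout with three annotation columns (named Gene.Symbol, Locus.ID and Cytoband here) followed by one column per sample. The sample names and calls are invented for illustration.

```r
# Toy thresholded-by-gene table (made-up calls, assumed column layout)
gistic_toy <- data.frame(
  Gene.Symbol = c("EGFR", "CDKN2A", "PTEN"),
  Locus.ID    = c(1956, 1029, 5728),
  Cytoband    = c("7p11.2", "9p21.3", "10q23.31"),
  S1 = c(2, -2, -1),
  S2 = c(2, -2, -1),
  S3 = c(1,  0, -1),
  S4 = c(0, -2,  0),
  stringsAsFactors = FALSE
)

# Keep only the per-sample calls, indexed by gene symbol
annotation_cols <- c("Gene.Symbol", "Locus.ID", "Cytoband")
calls <- as.matrix(gistic_toy[, setdiff(names(gistic_toy), annotation_cols)])
rownames(calls) <- gistic_toy$Gene.Symbol

# Fraction of samples with a high-level amplification or a deep deletion
amp_fraction <- rowMeans(calls == 2)
del_fraction <- rowMeans(calls == -2)
print(data.frame(amp_fraction, del_fraction))
```

Before applying this to gistic.thresholedbygene, it is worth checking its actual column names, since they may differ from the ones assumed here.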