Intra-tumor heterogeneity (ITH) is now thought to be a key factor underlying therapeutic failure and drug resistance, and it has attracted increasing attention in cancer research. Here, we present an R package, MesKit, for characterizing cancer genomic ITH and inferring the history of tumor evolution. MesKit provides a wide range of analyses, including ITH evaluation, enrichment analysis, mutational signature analysis and clonal evolution analysis, by implementing well-established computational and statistical methods. The source code and documentation are freely available on GitHub (https://github.com/Niinleslie/MesKit). We also developed a Shiny application for easier analysis and visualization.
To cite MesKit, enter citation("MesKit") in the R console:
MesKit: a tool kit for dissecting cancer evolution from multi-region derived tumor biopsies via somatic mutations (Submitted)
To analyze with MesKit, you need to provide:
- a MAF file (*.maf / *.maf.gz). Required
Note: Patient_ID and Tumor_Sample_Barcode should be consistent across all input files.
Mutation Annotation Format (MAF) files are tab-delimited text files that aggregate mutation information from VCF files. In addition to the standard MAF columns, the input MAF file (*.maf or *.maf.gz) for MesKit must contain two extra columns, Patient_ID and Tumor_ID. Allowed values for the Variant_Classification column are listed on the Mutation Annotation Format page.
MesKit requires the following fields in the input MAF file.
Mandatory fields: Hugo_Symbol, Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele2, Ref_allele_depth, Alt_allele_depth, VAF, Tumor_Sample_Barcode, Patient_ID, Tumor_ID
Note: Multi-region samples from a single tumor share the same Tumor_ID, such as “primary”, “metastasis” or “lymph”. In addition, values in the Hugo_Symbol field do not have to be official HUGO gene symbols.
Example MAF file
## Hugo_Symbol Chromosome Start_Position End_Position Variant_Classification
## 1 CFAP74 1 1880545 1880545 Intron
## 2 TFAP2A 6 10159520 10159520 IGR
## 3 IGSF21 1 18605309 18605309 Intron
## Variant_Type Reference_Allele Tumor_Seq_Allele2 Ref_allele_depth
## 1 SNP C A 16
## 2 SNP T A 29
## 3 SNP A C 144
## Alt_allele_depth VAF Tumor_Sample_Barcode Patient_ID Tumor_ID
## 1 3 0.1578 T1 HCC5647 Primary
## 2 17 0.3695 T1 HCC6952 Primary
## 3 19 0.1165 T4 HCC8031 Primary
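As a quick sanity check before calling readMaf, the mandatory columns listed above can be verified in base R. The sketch below is not part of MesKit: missing_maf_cols and example_maf are hypothetical names, and example_maf simply re-enters the three example rows. It also assumes that VAF corresponds to Alt_allele_depth / (Ref_allele_depth + Alt_allele_depth), which holds for these rows to within the four decimals shown.

```r
# Mandatory MAF columns, as listed in this section
mandatory_maf_cols <- c(
  "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
  "Variant_Classification", "Variant_Type", "Reference_Allele",
  "Tumor_Seq_Allele2", "Ref_allele_depth", "Alt_allele_depth", "VAF",
  "Tumor_Sample_Barcode", "Patient_ID", "Tumor_ID"
)

# Hypothetical helper (not a MesKit function): return the mandatory
# columns that are absent from a data frame read from a MAF file
missing_maf_cols <- function(maf_df) {
  setdiff(mandatory_maf_cols, colnames(maf_df))
}

# The three example rows shown above (length-1 values are recycled)
example_maf <- data.frame(
  Hugo_Symbol = c("CFAP74", "TFAP2A", "IGSF21"),
  Chromosome = c("1", "6", "1"),
  Start_Position = c(1880545, 10159520, 18605309),
  End_Position = c(1880545, 10159520, 18605309),
  Variant_Classification = c("Intron", "IGR", "Intron"),
  Variant_Type = "SNP",
  Reference_Allele = c("C", "T", "A"),
  Tumor_Seq_Allele2 = c("A", "A", "C"),
  Ref_allele_depth = c(16, 29, 144),
  Alt_allele_depth = c(3, 17, 19),
  VAF = c(0.1578, 0.3695, 0.1165),
  Tumor_Sample_Barcode = c("T1", "T1", "T4"),
  Patient_ID = c("HCC5647", "HCC6952", "HCC8031"),
  Tumor_ID = "Primary",
  stringsAsFactors = FALSE
)

missing_maf_cols(example_maf)  # character(0): nothing is missing

# Assumed relation: VAF = alt depth / (ref depth + alt depth),
# compared with a tolerance because the example VAF has 4 decimals
vaf_from_depth <- with(example_maf,
                       Alt_allele_depth / (Ref_allele_depth + Alt_allele_depth))
all(abs(vaf_from_depth - example_maf$VAF) < 1e-3)
```

A MAF file on disk could be loaded for the same check with read.delim(path, comment.char = "#"), which skips the #version header lines some MAF files carry, assuming no data field itself contains "#".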
By default, the input CCF file has six mandatory fields: Patient_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, CCF and CCF_Std/CCF_CI_High (the last one is required for identifying clonal/subclonal mutations). The Chromosome field must use the same format in mafFile and ccfFile: either both without a prefix (e.g. 1) or both prefixed with “chr” (e.g. chr1). Notably, if the CCF file contains variants other than SNVs, it should also include the Reference_Allele and Tumor_Seq_Allele2 columns.
Example CCF file
## Patient_ID Tumor_Sample_Barcode Chromosome Start_Position CCF
## 1 HCC5647 T4 22 43190575 0.6112993
## 2 HCC5647 T5 22 43190575 0.6239556
## 3 HCC5647 T3 22 43190575 0.5121414
## 4 HCC5647 T1 22 43190575 0.6891924
## 5 HCC5647 T4 5 178224614 0.7668806
## CCF_Std
## 1 0.19713556
## 2 0.19523997
## 3 0.17751275
## 4 0.21622254
## 5 0.09085722
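Because the Chromosome field must follow the same convention in the MAF and CCF files, a mismatch can be detected and fixed before reading. In this sketch, chrom_style and strip_chr are hypothetical helpers (not MesKit functions), maf_chrom reuses the example MAF values, and ccf_chrom stands for a hypothetical CCF file that uses the “chr” prefix.

```r
# Hypothetical helper: report whether chromosome names use the "chr"
# prefix ("chr1"), no prefix ("1"), or a mixture of both
chrom_style <- function(chrom) {
  has_prefix <- startsWith(as.character(chrom), "chr")
  if (all(has_prefix)) "chr" else if (!any(has_prefix)) "bare" else "mixed"
}

# Hypothetical helper: drop a leading "chr" so names become bare
strip_chr <- function(chrom) sub("^chr", "", as.character(chrom))

maf_chrom <- c("1", "6", "1")             # as in the example MAF file
ccf_chrom <- c("chr22", "chr22", "chr5")  # a CCF file using the other style

chrom_style(maf_chrom)  # "bare"
chrom_style(ccf_chrom)  # "chr"

# After stripping, both files follow the same convention
identical(chrom_style(maf_chrom), chrom_style(strip_chr(ccf_chrom)))  # TRUE
```

Stripping the prefix rather than adding it is an arbitrary choice; either direction works as long as both files end up in the same style.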
The segmentation file is a tab-delimited file with the following 6 or 7 columns:
- Patient_ID: ID of the patient
- Tumor_Sample_Barcode: barcode of the tumor sample
- Chromosome: chromosome name or ID
- Start_Position: genomic start position of the segment (1-indexed)
- End_Position: genomic end position of the segment (1-indexed)
- Segment_Mean/CopyNumber: segment mean value or absolute integer copy number
Note: Positions are in base-pair units.
Example Segmentation file
## Patient_ID Tumor_Sample_Barcode Chromosome Start_Position End_Position
## 1 HCC5647 T1 1 138488 6479452
## 2 HCC5647 T1 1 6504488 120906360
## 3 HCC5647 T1 1 144921930 157805992
## 4 HCC5647 T1 1 157809143 160394321
## 5 HCC5647 T1 1 160604266 165877230
## CopyNumber
## 1 2
## 2 2
## 3 6
## 4 2
## 5 8
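Assuming 1-indexed, inclusive coordinates, a segment spans End_Position - Start_Position + 1 base pairs. The sketch below applies this to the five example segments and labels each one relative to an assumed diploid baseline of two copies. The Length_bp and State columns and the gain/loss rule are illustrative, not MesKit output, and the rule only makes sense for absolute CopyNumber values, not for Segment_Mean.

```r
# The five example segments shown above
seg <- data.frame(
  Patient_ID = "HCC5647",
  Tumor_Sample_Barcode = "T1",
  Chromosome = "1",
  Start_Position = c(138488, 6504488, 144921930, 157809143, 160604266),
  End_Position = c(6479452, 120906360, 157805992, 160394321, 165877230),
  CopyNumber = c(2, 2, 6, 2, 8),
  stringsAsFactors = FALSE
)

# 1-indexed, inclusive coordinates: both end points belong to the segment
seg$Length_bp <- seg$End_Position - seg$Start_Position + 1

# Illustrative labels, assuming a diploid baseline of 2 copies
seg$State <- ifelse(seg$CopyNumber > 2, "gain",
                    ifelse(seg$CopyNumber < 2, "loss", "neutral"))

seg[, c("Start_Position", "End_Position", "CopyNumber", "Length_bp", "State")]
```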
Install the latest version of this package by typing the commands below in the R console:
The readMaf function creates Maf/MafList objects by reading a MAF file and, optionally (but recommended), cancer cell fraction (CCF) data. The refBuild parameter sets the version of the Homo sapiens reference genome ("hg18", "hg19" or "hg38").
library(MesKit)
# Example MAF and CCF files shipped with MesKit
maf.File <- system.file("extdata/", "HCC_LDC.maf", package = "MesKit")
ccf.File <- system.file("extdata/", "HCC_LDC.ccf.tsv", package = "MesKit")
# Maf object generation
maf <- readMaf(mafFile = maf.File, refBuild = "hg19")
# Maf object with CCF information
maf <- readMaf(mafFile = maf.File,
               ccfFile = ccf.File,
               refBuild = "hg19")
To explore genomic alterations during cancer progression with a multi-region sequencing approach, MesKit provides the classifyMut function to categorize mutations. Mutations can be classified either by their sharing pattern across samples or by their clonal status (which requires CCF data), as specified by the class option. Additionally, the classByTumor option can be used to reveal the mutational profile within tumors.
# Driver genes of HCC collected from [IntOGen](https://www.intogen.org/search) (v.2020.2)
driverGene.file <- system.file("extdata/", "IntOGen-DriverGenes_HC.tsv", package = "MesKit")
driverGene <- as.character(read.table(driverGene.file)$V1)
# Classify the mutations of patient HCC8257 by their sharing pattern ("SP") across samples
mut.class <- classifyMut(maf, class = "SP", patient.id = "HCC8257")
head(mut.class)
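The idea behind classifying mutations by sharing pattern can be illustrated with a conceptual base-R sketch. It is not MesKit's implementation: classify_sharing, the calls table and the Mut_ID key are hypothetical (in practice a key could be built by pasting Chromosome, Start_Position, Reference_Allele and Tumor_Seq_Allele2), and the Public/Shared/Private labels are assumptions that may not match the categories classifyMut reports.

```r
# Hypothetical long-format calls for one patient:
# one row per mutation detected in a given sample
calls <- data.frame(
  Mut_ID = c("m1", "m1", "m1", "m2", "m2", "m3"),
  Tumor_Sample_Barcode = c("T1", "T2", "T3", "T1", "T3", "T2"),
  stringsAsFactors = FALSE
)

# Sketch: label each mutation by how many of the patient's samples carry it.
# 'samples' should list every sample, including any without mutations.
classify_sharing <- function(calls,
                             samples = unique(calls$Tumor_Sample_Barcode)) {
  n_total <- length(samples)
  # number of distinct samples carrying each mutation (named by Mut_ID)
  per_mut <- sapply(split(calls$Tumor_Sample_Barcode, calls$Mut_ID),
                    function(s) length(unique(s)))
  label <- ifelse(per_mut == n_total, "Public",
                  ifelse(per_mut == 1, "Private", "Shared"))
  data.frame(Mut_ID = names(per_mut),
             N_Samples = as.integer(per_mut),
             Class = unname(label),
             stringsAsFactors = FALSE)
}

classify_sharing(calls)
```

Passing samples explicitly matters: if a sample carries no mutations it does not appear in calls, and counting samples from calls alone would then overstate how widely a mutation is shared.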
The plotMutProfile function visualizes the mutational profiles of tumor samples.
# specify the order of patients and samples via "patient.id" and "sampleOrder" respectively
patientOrder <- c("HCC5647", "HCC6690", "HCC6046")
sampleOrder <- list("HCC5647" = c("T1", "T4", "T5", "T3"))
plotMutProfile(
  maf,
  patient.id = patientOrder,
  sampleOrder = sampleOrder,
  class = "SP",
  geneList = driverGene
)
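Conceptually, a mutational profile plot summarizes which genes are mutated in which samples. The following sketch builds such a gene-by-sample count matrix in base R, restricted to a gene list and with columns in a chosen sample order, mirroring the roles that geneList and sampleOrder play above. The muts data, gene_list and sample_order are hypothetical, and this is not how plotMutProfile works internally.

```r
# Hypothetical mutation calls for one patient
muts <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "TP53", "CTNNB1", "ARID1A", "MUC16"),
  Tumor_Sample_Barcode = c("T1", "T4", "T5", "T1", "T5", "T4"),
  stringsAsFactors = FALSE
)
gene_list <- c("TP53", "CTNNB1", "ARID1A", "AXIN1")  # hypothetical gene list
sample_order <- c("T1", "T4", "T5")                  # desired column order

# Keep only mutations in listed genes
kept <- muts[muts$Hugo_Symbol %in% gene_list, ]

# Count mutations per gene and sample; factor levels fix the row and
# column order and keep unmutated genes (e.g. AXIN1) as all-zero rows
mut_matrix <- table(
  Gene = factor(kept$Hugo_Symbol, levels = gene_list),
  Sample = factor(kept$Tumor_Sample_Barcode, levels = sample_order)
)

mut_matrix
mut_matrix > 0  # TRUE where a gene is mutated in that sample
```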