This vignette introduces the Bioconductor package iPath (individualized pathway analysis), which identifies highly predictive biomarkers for clinical outcomes. The analysis has two major steps: calculating the personalized iES for each sample and each pathway, and investigating whether the stratified tumor samples are associated with clinical outcomes. iPath quantifies the magnitude of alteration in a particular pathway at the individual sample level. Our goal is to understand cancer one tumor sample at a time.
To install this package, start R (version > “4.0”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("iPath")
If you have any iPath-related questions, please post them to the GitHub Issues section of iPath at https://github.com/suke18/iPath/issues. Your feedback helps us improve iPath.
Identifying biomarkers to predict the clinical outcomes of individual patients is a fundamental problem in clinical oncology. Multiple single-gene biomarkers have already been identified and are used in the clinic. However, multiple oncogenes or tumor-suppressor genes are involved in the process of tumorigenesis. Additionally, the efficacy of single-gene biomarkers is limited by the highly variable expression levels measured by high-throughput assays. We hypothesize that, in individual tumor samples, the disruption of transcription homeostasis in key pathways or gene sets plays an important role in tumorigenesis and has profound implications for the patient's clinical outcome. We devised a computational method named iPath to identify, at the individual sample level, which pathways or gene sets significantly deviate from their norms. We conducted a pan-cancer analysis and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications.
iPath requires a normalized expression matrix with rows representing genes and columns representing samples. To preprocess the expression matrix, iPath filters out genes based on their standard deviation (sd) across samples. Here, we use a subsample of the TCGA PRAD dataset for illustration. iPath also requires a gene set database (GSDB) as a second input, which can be obtained from the MSigDB database.
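The sd-based filtering described above can be illustrated with a minimal base R sketch. It uses a simulated matrix rather than the PRAD data, and the cutoff `sd_cutoff` is a hypothetical value chosen for this example; the threshold and the exact rule that iPath applies internally are not specified here.

```r
# Simulated stand-in for a normalized expression matrix:
# rows are genes, columns are samples (this is not the real PRAD data).
set.seed(1)
exprs <- matrix(rnorm(200 * 10, mean = 5, sd = 1), nrow = 200,
                dimnames = list(paste0("gene", 1:200), paste0("sample", 1:10)))

# Make the first 20 genes nearly constant so the filter has something to drop.
exprs[1:20, ] <- 5 + rnorm(20 * 10, sd = 0.01)

# Hypothetical cutoff (assumption): keep genes whose sd across samples exceeds 0.1.
sd_cutoff <- 0.1
gene_sd <- apply(exprs, 1, sd)
exprs_filtered <- exprs[gene_sd > sd_cutoff, , drop = FALSE]

# Genes remaining after the filter, by samples.
dim(exprs_filtered)
```

Genes with almost no variation across samples carry little information about pathway perturbation, which is the usual motivation for removing them before scoring.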
The PRAD_data dataset loads three objects: the RPKM expression matrix (prad_exprs), the corresponding phenotype information (prad_inds), and a simulated clinical dataset (prad_cli). prad_inds is a binary vector with 0 representing a normal sample and 1 representing a tumor sample.
library(iPath)
data(PRAD_data)
dim(prad_exprs)
data(GSDB_example)
head(prad_cli)
The core of iPath is the calculation of the iES for each patient and pathway. The function iES_cal2 requires two inputs: an expression matrix and a gene set database (GSDB). It returns a matrix of iES values, with rows corresponding to pathways and columns corresponding to samples.
iES_mat = iES_cal2(Y = prad_exprs, GSDB = GSDB_example)
iES_mat[1:2, 1:4]
After computing the iES matrix, it is important to investigate whether the normal-like and perturbed groups differ significantly in survival outcomes. To classify the tumor samples, we use the normal samples as a reference and fit a Gaussian mixture. The investigation is conducted separately for each pathway.
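To make the classification idea concrete, here is a minimal base R sketch for a single pathway. It is an illustration under stated assumptions, not iPath's implementation: the scores are simulated, the reference component is fixed at the mean and sd of the normal samples, a second component is estimated from the tumor scores by EM, and the 0.5 posterior cutoff is a hypothetical choice.

```r
# Simulated scores for one pathway (assumption: not real iES values).
set.seed(42)
normal_scores <- rnorm(30, mean = 0, sd = 1)
tumor_scores  <- c(rnorm(40, mean = 0, sd = 1),   # tumors resembling normals
                   rnorm(20, mean = 4, sd = 1))   # tumors with a shifted score

# Reference component: fixed at the normal samples' mean and sd.
mu0 <- mean(normal_scores)
sd0 <- sd(normal_scores)

# Second ("perturbed") component: start at the tumor score furthest from the reference.
mu1 <- tumor_scores[which.max(abs(tumor_scores - mu0))]
sd1 <- sd(tumor_scores)
p1  <- 0.5

for (iter in 1:100) {
  # E-step: posterior probability that each tumor belongs to the perturbed component.
  d0 <- (1 - p1) * dnorm(tumor_scores, mu0, sd0)
  d1 <- p1 * dnorm(tumor_scores, mu1, sd1)
  post1 <- d1 / (d0 + d1)

  # M-step: update only the perturbed component and the mixing weight.
  p1  <- mean(post1)
  mu1 <- sum(post1 * tumor_scores) / sum(post1)
  sd1 <- max(sqrt(sum(post1 * (tumor_scores - mu1)^2) / sum(post1)), 1e-3)
}

# Hypothetical decision rule: posterior above 0.5 means "perturbed".
group <- ifelse(post1 > 0.5, "perturbed", "normal-like")
table(group)
```

Fixing the reference component at the normal samples' parameters is one simple way to let the normals anchor the mixture; other formulations, such as fitting both components jointly, are equally plausible.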
The iES_surv function takes the iES matrix from the iES_cal2 step, the clinical data, and the binary vector indicating the patient phenotypes (0 represents a normal sample and 1 represents a tumor sample).
surv_outcomes = iES_surv(iES_mat = iES_mat, cli = prad_cli, indVec = prad_inds)
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Loglik converged before variable 1 ; coefficient may be infinite.
head(surv_outcomes)
#>             nPerturb   c-index        coef      pval
#> SimPathway1        8 0.5418719  -0.6874565 0.5041147
#> SimPathway2        5 0.5615764 -18.2110687 0.1970832
#> SimPathway3       11 0.4802956  -0.2586764 0.7386278
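The columns reported above (nPerturb, c-index, coef, pval) resemble the summary of a Cox proportional hazards model comparing the perturbed and normal-like groups. The sketch below reproduces that kind of summary for one simulated pathway with the survival package. The data, the variable names, and the use of the Wald p-value are assumptions made for illustration; how iES_surv derives its columns internally is not documented here.

```r
library(survival)

set.seed(7)
n <- 60
# Hypothetical grouping for one pathway: 1 = perturbed, 0 = normal-like.
group <- rep(c(0, 1), each = n / 2)
# Simulated survival times: the perturbed group is given a higher hazard.
time <- rexp(n, rate = ifelse(group == 1, 0.2, 0.1))
# 1 = event observed, 0 = censored.
status <- rbinom(n, size = 1, prob = 0.8)

# Cox model with group membership as the only covariate.
fit <- coxph(Surv(time, status) ~ group)
fit_sum <- summary(fit)

# Collect the same four quantities shown in the iES_surv output.
result <- c(
  nPerturb  = sum(group == 1),
  `c-index` = unname(fit_sum$concordance["C"]),
  coef      = unname(fit_sum$coefficients[1, "coef"]),
  pval      = unname(fit_sum$coefficients[1, "Pr(>|z|)"])
)
result
```

A very large absolute coefficient, such as the one reported for SimPathway2 alongside the coxph warning, typically arises when one group is small and has few or no events, so such rows deserve cautious interpretation.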
We also provide two forms of visualization for iES scores. One is the waterfall plot (water_fall), in which samples are ranked from the smallest to the largest iES; the other is the density plot (density_fall).
water_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)
density_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)
iES_survPlot(iES_mat = iES_mat, cli = prad_cli, gs_str = "SimPathway1", indVec = prad_inds, title = TRUE)