1 Installation and help

1.1 Install iPath

To install this package, start R (version > 4.0) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("iPath")

1.2 Help for iPath

If you have any iPath-related questions, please post them in the GitHub Issues section of iPath at https://github.com/suke18/iPath/issues. Such feedback helps improve iPath.

2 Introduction

2.1 Background

Identifying biomarkers to predict the clinical outcomes of individual patients is a fundamental problem in clinical oncology. Multiple single-gene biomarkers have already been identified and used in the clinic. However, multiple oncogenes or tumor-suppressor genes are involved in the process of tumorigenesis. Additionally, the efficacy of single-gene biomarkers is limited by the highly variable expression levels measured by high-throughput assays. In this study, we hypothesize that in individual tumor samples, the disruption of transcription homeostasis in key pathways or gene sets plays an important role in tumorigenesis and has profound implications for the patient’s clinical outcome. We devised a computational method named iPath to identify, at the individual sample level, which pathways or gene sets significantly deviate from their norms. We conducted a pan-cancer analysis and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications.

2.2 Citation

3 Calculate iES

iPath requires a normalized expression matrix with rows representing genes and columns representing samples. To preprocess the expression matrix, iPath filters out genes based on their standard deviations (sd). Here, we use a sample of the TCGA PRAD dataset for illustration. iPath also requires a gene set database (GSDB) as a second input, which can be obtained from the MSigDB database.
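The sd-based filtering described above can be illustrated with a minimal base R sketch. The matrix is simulated, and the cutoff `sd_cutoff` is a hypothetical value chosen for illustration; the exact rule and threshold iPath applies internally are not stated here.

```r
# Simulated stand-in for a normalized expression matrix (genes x samples).
set.seed(1)
exprs <- matrix(rnorm(6 * 5, mean = 5), nrow = 6,
                dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))
exprs["gene3", ] <- 5   # a constant gene, so its sd is exactly 0

# Per-gene standard deviation across samples.
gene_sd <- apply(exprs, 1, sd)

# Hypothetical cutoff (an assumption, not iPath's documented default).
sd_cutoff <- 0.1
exprs_filtered <- exprs[gene_sd > sd_cutoff, , drop = FALSE]

dim(exprs_filtered)
```

Genes with almost no variation across samples, such as the constant `gene3` here, carry little information about deviation from the norm and are dropped before scoring.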

3.1 Load the data

The PRAD_data dataset contains three objects: the RPKM expression matrix (prad_exprs), the corresponding phenotype information (prad_inds), and a simulated clinical dataset (prad_cli). prad_inds is a binary vector with 0 representing a normal sample and 1 representing a tumor sample.
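A loading step might look like the sketch below. It assumes that iPath is installed and that both PRAD_data and GSDB_example (the object passed as GSDB later on) are available through `data()`; the guard variable `ipath_available` is introduced here only so that the snippet does nothing when the package is missing.

```r
# Assumption: iPath ships PRAD_data and GSDB_example as loadable datasets.
ipath_available <- requireNamespace("iPath", quietly = TRUE)

if (ipath_available) {
    library(iPath)
    data(PRAD_data)      # expected to provide prad_exprs, prad_inds, prad_cli
    data(GSDB_example)   # example gene set database

    print(dim(prad_exprs))    # genes x samples
    print(table(prad_inds))   # 0 = normal, 1 = tumor
    print(head(prad_cli))     # simulated clinical data
}
```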


3.2 Calculate iES per sample per pathway

The core of iPath is to calculate the iES score for each patient and pathway. The function iES_cal2 requires two inputs: an expression matrix and a gene set database (GSDB). It returns a matrix of iES scores with rows corresponding to the pathways and columns corresponding to the samples.

iES_mat = iES_cal2(Y = prad_exprs, GSDB = GSDB_example)
iES_mat[1:2, 1:4]

4 Test association with survival outcomes

After computing the iES matrix, it is important to investigate whether the normal-like and perturbed groups differ significantly in survival outcomes. To classify the tumor samples, we use the normal samples as a reference by fitting a Gaussian mixture. The investigation is conducted for each pathway individually. The iES_surv function takes the iES matrix from the iES_cal2 step, the clinical data, and a binary vector indicating the patient phenotypes (for example, 0 represents a normal sample and 1 represents a tumor sample).

surv_outcomes = iES_surv(iES_mat = iES_mat, cli = prad_cli, indVec = prad_inds)
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Loglik converged before variable 1 ; coefficient may be infinite.
#>             nPerturb   c-index        coef      pval
#> SimPathway1        8 0.5418719  -0.6874565 0.5041147
#> SimPathway2        5 0.5615764 -18.2110687 0.1970832
#> SimPathway3       11 0.4802956  -0.2586764 0.7386278
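To make the columns of this table concrete, the sketch below reproduces the general idea for a single pathway on simulated data. It is not the internal code of iES_surv: the mixture step is replaced by a simpler rule (a single Gaussian fitted to the normal samples, with tumors outside mean ± 2 sd called perturbed), and the clinical column names `time` and `status` are hypothetical. The Cox model comes from the survival package.

```r
library(survival)

set.seed(42)
n_normal <- 20
n_tumor  <- 40

# Simulated iES values for one pathway: tumors are a mix of normal-like
# and shifted (perturbed) samples.
ies_normal <- rnorm(n_normal, mean = 0, sd = 1)
ies_tumor  <- c(rnorm(25, mean = 0, sd = 1), rnorm(15, mean = 4, sd = 1))

# Simplified stand-in for the mixture step: one Gaussian fitted to the
# normal samples; tumors outside mean +/- 2 sd are labeled perturbed.
mu    <- mean(ies_normal)
sigma <- sd(ies_normal)
perturbed <- as.integer(abs(ies_tumor - mu) > 2 * sigma)

# Simulated clinical data for the tumor samples (hypothetical columns).
cli <- data.frame(
    time      = rexp(n_tumor, rate = ifelse(perturbed == 1, 0.2, 0.1)),
    status    = rbinom(n_tumor, size = 1, prob = 0.7),
    perturbed = perturbed
)

# Cox proportional hazards model: perturbed vs normal-like tumors.
fit <- coxph(Surv(time, status) ~ perturbed, data = cli)
res <- summary(fit)

c(nPerturb = sum(perturbed),
  c.index  = unname(res$concordance["C"]),
  coef     = unname(res$coefficients[, "coef"]),
  pval     = unname(res$coefficients[, "Pr(>|z|)"]))
```

When one group contains very few samples or no events, the Cox fit can fail to converge to a finite coefficient, which is consistent with the warning and the large negative coef reported for SimPathway2 above.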

5 Data visualization

5.1 waterfall

We also provide two forms of visualization for iES scores. One is the waterfall plot ranked from the smallest to the largest.

water_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)
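For readers who want to see what a waterfall plot encodes, here is a minimal base R sketch on simulated scores. It only illustrates the ranking idea and is not the implementation of water_fall; the colors and labels are arbitrary choices.

```r
set.seed(7)
# Simulated iES scores for one pathway and a matching 0/1 phenotype vector.
ies  <- c(rnorm(10, mean = 0), rnorm(20, mean = 2))
inds <- rep(c(0, 1), times = c(10, 20))   # 0 = normal, 1 = tumor

# Waterfall: sort scores from smallest to largest, color bars by phenotype.
ord <- order(ies)
barplot(ies[ord],
        col    = ifelse(inds[ord] == 1, "firebrick", "steelblue"),
        border = NA,
        xlab   = "Samples (ranked)",
        ylab   = "iES",
        main   = "Waterfall of simulated iES")
legend("topleft", legend = c("normal", "tumor"),
       fill = c("steelblue", "firebrick"), bty = "n")
```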

The other is the density plot.

density_fall(iES_mat, gs_str = "SimPathway2", indVec = prad_inds)