The CaDrA package currently supports four scoring functions to search for subsets of genomic features that are likely associated with a specific outcome of interest (e.g., protein expression or pathway activity):

  1. Kolmogorov-Smirnov Method (ks)
  2. Conditional Mutual Information Method (revealer)
  3. Wilcoxon Rank-Sum Method (wilcox)
  4. Custom - A User-Defined Scoring Method (custom)
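
As a rough illustration of what the rank-based scores measure, the base-R sketch below compares the input scores of samples that carry a hypothetical feature against those that do not, using stats::ks.test() and stats::wilcox.test(). The toy data and all variable names are assumptions made for illustration; this is not CaDrA's internal implementation of the ks or wilcox methods.

```r
set.seed(123)

# Toy input scores for 100 samples (hypothetical data)
scores <- rnorm(100)

# Hypothetical binary feature: present (1) in the 20 highest-scoring samples
feature <- as.integer(rank(-scores) <= 20)

# Split the scores by feature status
with_feature    <- scores[feature == 1]
without_feature <- scores[feature == 0]

# Two-sample Kolmogorov-Smirnov test: do the two score distributions differ?
ks_res <- ks.test(with_feature, without_feature)

# Wilcoxon rank-sum test: are scores shifted upward in samples with the feature?
wilcox_res <- wilcox.test(with_feature, without_feature, alternative = "greater")

c(ks_p = ks_res$p.value, wilcox_p = wilcox_res$p.value)
```

A feature whose 1s cluster among the highest-scoring samples yields small p-values under both tests, which is the kind of signal a search over features can rank on.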

Below, we run candidate_search() over the top 3 starting features using each of the four scoring functions described above.

Important Note:

1 Load packages
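
A minimal sketch of this step, assuming the three packages referenced elsewhere in this walkthrough (CaDrA, pheatmap, and SummarizedExperiment) are the ones needed. It checks that each is installed before attaching it; the variable names are hypothetical.

```r
# Packages referenced in this walkthrough
pkgs <- c("CaDrA", "pheatmap", "SummarizedExperiment")

# Check which packages are installed without attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach the packages for the rest of the session
  for (p in pkgs) library(p, character.only = TRUE)
}
```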


2 Load required datasets

  1. A binary feature matrix, also known as the Feature Set (e.g., somatic mutations, copy number alterations, or chromosomal translocations). The 1/0 row vectors indicate the presence/absence of ‘omics’ features in the samples. The Feature Set can be a matrix or an object of class SummarizedExperiment from the SummarizedExperiment package.
  2. A vector of continuous scores (or Input Scores) representing a functional response of interest (e.g., protein expression or pathway activity).
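
The shape of these two inputs can be sketched with toy data in base R. The object names (toy_FS, toy_scores) and dimensions are hypothetical, and naming the scores by sample is an assumption made here so that the two inputs can be checked against each other.

```r
set.seed(1)

n_features <- 5
n_samples  <- 8

# Binary feature matrix: rows are features, columns are samples (toy values)
toy_FS <- matrix(
  rbinom(n_features * n_samples, size = 1, prob = 0.3),
  nrow = n_features,
  dimnames = list(paste0("feature_", seq_len(n_features)),
                  paste0("sample_", seq_len(n_samples)))
)

# Continuous input scores, one per sample, named by sample
toy_scores <- setNames(rnorm(n_samples), colnames(toy_FS))

# Basic consistency checks between the two inputs
stopifnot(all(toy_FS %in% c(0, 1)))
stopifnot(identical(colnames(toy_FS), names(toy_scores)))
```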

# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input scores (dataset name sim_Scores assumed)
data(sim_Scores)

3 Heatmap of simulated feature set

The simulated dataset, sim_FS, comprises 1000 genomic features and 100 sample profiles. It contains 10 left-skewed (i.e., True Positive or TP) features and 990 uniformly distributed (i.e., True Null or TN) features. Below is a heatmap of the first 100 features.

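# Extract the binary feature matrix from the SummarizedExperiment object
mat <- SummarizedExperiment::assay(sim_FS)

# Plot the first 100 features; white = absent (0), red = present (1)
pheatmap::pheatmap(mat[1:100, ], color = c("white", "red"), cluster_rows = FALSE, cluster_cols = FALSE)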
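
To make the distinction between left-skewed and uniformly distributed features concrete, the base-R sketch below builds a small toy matrix of both kinds. It assumes samples are ordered from highest to lowest input score, so that "left-skewed" means the 1s concentrate in the leftmost columns. The weights, sizes, and helper name are arbitrary choices for illustration, not the procedure used to generate sim_FS.

```r
set.seed(42)

n_samples <- 100   # samples, assumed ordered from highest to lowest input score
n_tp      <- 10    # left-skewed (true positive) features
n_null    <- 90    # uniformly distributed (null) features
n_events  <- 20    # number of 1s per feature (arbitrary choice)

# Sampling weights that favor the top-ranked (leftmost) samples
left_weights <- exp(-seq_len(n_samples) / 15)

# Build one binary row with n_events 1s placed according to the given weights
make_feature <- function(weights = NULL) {
  row <- integer(n_samples)
  row[sample(n_samples, n_events, prob = weights)] <- 1L
  row
}

tp_rows   <- t(replicate(n_tp, make_feature(left_weights)))
null_rows <- t(replicate(n_null, make_feature()))
toy_mat   <- rbind(tp_rows, null_rows)

# Average sample position of the 1s in each feature
mean_position <- apply(toy_mat, 1, function(r) mean(which(r == 1)))
c(tp = mean(mean_position[1:n_tp]), null = mean(mean_position[-(1:n_tp)]))
```

The left-skewed rows have a lower average position for their 1s than the null rows, which is the pattern the heatmap shows as red cells clustered toward the left.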