iModMix is a novel network-based approach that integrates multi-omics data, including metabolomics, proteomics, and transcriptomics data, into a unified network to unveil associations across various layers of omics data and reveal significant associations with phenotypes of interest.
iModMix can incorporate both identified and unidentified metabolites, addressing a key limitation of existing methodologies.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("iModMix")Note: to install development version:
iModMix includes an interactive Shiny application that allows users to explore multi-omics integration results in a user-friendly interface. The app can be launched locally from R or accessed online.
To launch the app from your R session, simply run:
This will open the app in your default web browser. Make sure the package is installed and loaded before running the command.
You can also explore the app without installing anything by visiting the hosted version: https://imodmix.moffitt.org/ This online version provides access to the same interactive features, including module visualization, enrichment analysis, and metadata exploration.
Below is an example showing the step-by-step analysis of two datasets through iModMix package: We used 24 normal and 52 tumor clear cell renal cell carcinoma (ccRCC, RC20 dataset) samples (Golkaram et al. 2022; Tang et al. 2023; Benedetti et al. 2023) as the example case study. (https://doi.org/10.5281/zenodo.13988161).
Input data:
# Load the package
library(iModMix)
# Get the path to the expression data file
pathMetabExp <- system.file("Example_data/ccRCC4_Data/Metab_exp.rds", package = "iModMix")
# Load the expression data
dataExp1 <- readRDS(pathMetabExp)
# Check the expression data
dataExp1[1:5, 1:5]## Feature_ID RC20.MSKC.00573 RC20.MSKC.00574 RC20.MSKC.00575 RC20.MSKC.00576
## 1 Metab373 19.03183 19.15170 18.94049 18.77380
## 2 Metab807 13.34715 13.34715 13.34715 13.34715
## 3 Metab806 13.21118 13.21118 13.21118 13.21118
## 4 Metab808 14.32570 14.32570 14.32570 14.32570
## 5 Metab197 21.29544 21.30204 20.00029 21.41036
# Get the path to the Metadata file
pathMetadata <- system.file("Example_data/ccRCC4_Data/Metadata.rds", package = "iModMix")
# Load the Metadata
metadata <- readRDS(pathMetadata)
# Check the Metadata
head(metadata)## Sample TN Histology
## 1 RC20.MSKC.00573 Tumor Clear Cell
## 2 RC20.MSKC.00574 Tumor Clear Cell
## 3 RC20.MSKC.00575 Tumor Clear Cell
## 4 RC20.MSKC.00576 Tumor Clear Cell
## 5 RC20.MSKC.00577 Tumor Clear Cell
## 6 RC20.MSKC.00578 Normal Clear Cell
The iModMix downstream pipeline requires an input matrix of RNA-seq counts. The load_data function performs the following steps: * Transposes the input data so that rows represent samples and columns represent features * Applies feature selection (SD below the 25th percentile) * Handles missing values (KNN) This step is optional. If the input matrix already has samples as rows, features as columns, and contains no missing values, users may skip it.
## Metab373 Metab807 Metab806 Metab808 Metab197
## RC20.MSKC.00573 0.47187064 -0.8291436 -0.9792751 -0.7464603 0.1386495
## RC20.MSKC.00574 0.65414303 -0.8291436 -0.9792751 -0.7464603 0.1433431
## RC20.MSKC.00575 0.33297295 -0.8291436 -0.9792751 -0.7464603 -0.7821402
## RC20.MSKC.00576 0.07949406 -0.8291436 -0.9792751 -0.7464603 0.2203568
## RC20.MSKC.00577 0.17085992 -0.8291436 -0.9792751 -0.7464603 -0.5085522
Calculations for partial correlation is usually the slowest step. Partial correlation is a method of analyzing the relationship between two variables when other variables are present. Graphical Lasso (Glasso) is used to estimate the partial correlation and captures only direct associations. Below is a preview of the sparse partial correlation of the first five features.
# Perform Partial correlation
parcorData1 <- fctPartialCors(loadData1, rho = 0.25)
parcorData1[1:5, 1:5]## Metab373 Metab807 Metab806 Metab808 Metab197
## Metab373 NA 0.0000000 0.005422781 0.0000000 0
## Metab807 0.000000000 NA -0.319434636 -0.2664755 0
## Metab806 0.005422781 -0.3194346 NA -0.2225263 0
## Metab808 0.000000000 -0.2664755 -0.222526343 NA 0
## Metab197 0.000000000 0.0000000 0.000000000 0.0000000 NA
Hierarchical clustering is used to identify common neighbors between the features. Calculations are determined using the topographical overlap matrix (TOM) and are based on the sparse partial correlations. Hierarchical clustering is visualized as a dendrogram.
The dendrogram plot is publication-ready and displays a dendrogram of features and modules. Axes: The vertical axis (y-axis) represents the dissimilarity between features, while the horizontal axis (x-axis) shows the modules. Branches: Each line in the dendrogram represents a feature. Features that are closer in the hierarchy (i.e., joined at a lower height in the dendrogram) have more similar expression profiles.
# Perform hierarchical clustering
hcData1 <- fctHierarchicalCluster(parcorMat = parcorData1, tom = TRUE, minModuleSize = 10)## Allowing parallel execution with up to 71 working processes.
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
hcClu <- hcData1$hclustTree
hcMod <- as.matrix(hcData1$dynamicMods_numeric)
WGCNA::plotDendroAndColors(
dendro = hcClu,
colors = hcMod,
dendroLabels = FALSE,
hang = 0.03,
addGuide = TRUE,
guideHang = 0.05,
groupLabels = "Modules",
main = "Feature dendrogram and module assignments"
)Let’s see the cluster assignments. Hierarchical clustering generates multiple modules (clusters) to which each feature is assigned. The table below details the following columns:
# Perform cluster assignments
clusterAssigData1 <- fctClusterAssignments(as.data.frame(hcData1$cluster_assignments))
head(clusterAssigData1)## feature cluster col feature_name
## 1 Metab373 cluster_000026 #AA5831 Metab373
## 2 Metab807 cluster_000030 #F481BD Metab807
## 3 Metab806 cluster_000030 #F481BD Metab806
## 4 Metab808 cluster_000030 #F481BD Metab808
## 5 Metab197 cluster_000019 #FFAD12 Metab197
## 6 Metab445 cluster_000017 #F27913 Metab445
The 904 features (identified metabolites) were assigned into 34 modules. The first principal component (PC1) is calculated for each module, referred to as an eigenfeature. Eigenfeatures are useful for
# Obtain Eigenfeatures
eigenData1 <- fctEigengenes(loadData1, hcData1$cluster_assignments[, 3])
eigengenesData1 <- eigenData1$module_eigenmetab_Me
eigengenesData1[1:5, 1:5]## ME#3B88A0 ME#3C7AB3 ME#419486 ME#46A06B ME#4BAC50
## RC20.MSKC.00573 -0.09452106 0.04054216 -0.09858921 0.1633222 0.067529213
## RC20.MSKC.00574 -0.05317106 -0.02626292 -0.04779067 0.1090938 0.019922717
## RC20.MSKC.00575 -0.17118698 -0.06062323 -0.12805564 0.1473363 -0.074034973
## RC20.MSKC.00576 -0.08161202 -0.04465726 -0.06401626 0.1345405 -0.001244864
## RC20.MSKC.00577 -0.15085973 -0.05007704 -0.10988318 0.1474912 -0.069908716
Since this is purely an example of how to run the entire analysis from start to finish, we are going to limit our analysis to the first 1000 features.
# Get the path to the expression data file
pathRNAexp <- system.file("Example_data/ccRCC4_Data/RNA_exp.rds", package = "iModMix")
# Load the expression data
dataExp2 <- readRDS(pathRNAexp)
dataExp2 <- dataExp2[1:1000, ]
# Check the expression data
dataExp2[1:5, 1:5]## Feature_ID RC20.MSKC.00573 RC20.MSKC.00574 RC20.MSKC.00575 RC20.MSKC.00576
## 1 A1BG 304 372 277 344
## 2 NAT2 7 10 6 12
## 3 ADA 787 1039 644 1041
## 4 CDH2 11991 7172 14457 9964
## 5 AKT3 13076 13876 10976 12730
Calculate correlated matrix
## A1BG NAT2 ADA CDH2 AKT3
## A1BG NA 0 0 0.0000000 0.0000000
## NAT2 0 NA 0 0.0000000 0.0000000
## ADA 0 0 NA 0.0000000 0.0000000
## CDH2 0 0 0 NA -0.2301523
## AKT3 0 0 0 -0.2301523 NA
Calculate hierarchical clustering
# Perform hierarchical clustering
hcData2 <- fctHierarchicalCluster(parcorMat = parcorData2, tom = TRUE, minModuleSize = 10)## Allowing parallel execution with up to 71 working processes.
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
# Access the hierarchical clustering tree and module assignments
hcClu2 <- hcData2$hclustTree
hcMod2 <- as.matrix(hcData2$dynamicMods_numeric)
# Plot the dendrogram
WGCNA::plotDendroAndColors(
dendro = hcClu2,
colors = hcMod2,
dendroLabels = FALSE,
hang = 0.03,
addGuide = TRUE,
guideHang = 0.05,
groupLabels = "Modules",
main = "Feature dendrogram and module assignments"
)Perform network construction and module eigengene calculation
# Perform cluster assignment
clusterAssigData2 <- fctClusterAssignments(as.data.frame(hcData2$cluster_assignments))
eigenData2 <- fctEigengenes(loadData2, hcData2$cluster_assignments[, 3])
eigengenesData2 <- eigenData2$module_eigenmetab_Me
eigengenesData2[1:5, 1:5]## ME#3982AE ME#3D8B9A ME#419486 ME#459D71 ME#4674A9
## RC20.MSKC.00573 0.10017350 -0.02712216 0.1993203 0.10061396 0.04785642
## RC20.MSKC.00574 0.03155250 -0.03207484 0.1931456 0.05334562 0.04834352
## RC20.MSKC.00575 0.11538115 -0.04139072 0.2011653 0.11052656 0.04044095
## RC20.MSKC.00576 0.04221625 -0.04664415 0.1922938 0.05054639 0.01798802
## RC20.MSKC.00577 0.09979154 -0.04398240 0.1641929 0.12994397 0.02970695
Statistical analysis by Students t-test compares phenotypes chosen as variable of interest and the user can specify a significance threshold for the p-value, the default p-value is set to 0.05. The eigenfeatures of each module, determined previously, are used as predictors. A data frame is generated with the following columns
Here, TN variable (Tumor, Normal) is selected as the phenotype variable of interest
Boxplots are automatically generated at the bottom for significant eigenfeatures, with outliers identified as individual dots and a legend is provided to describe the compared phenotypes.
# Perform Classification
ClassificationData <- fctPerformClassification(
eigengeneData = eigengenesData1,
metadata = metadata,
phenotypeVariable = "TN",
significanceThreshold = 0.09
)
ClassificationData$result[1:10, ]## Variable Class Result_t Result_pValue
## Comparison.t...2 ME#3C7AB3 Tumor vs Normal 6.68590e+00 0.00e+00
## Comparison.t...5 ME#4BAC50 Tumor vs Normal 5.53910e+00 0.00e+00
## Comparison.t...10 ME#904A67 Tumor vs Normal 9.61810e+00 0.00e+00
## Comparison.t...11 ME#91569A Tumor vs Normal 1.32229e+01 0.00e+00
## Comparison.t...12 ME#999999 Tumor vs Normal 7.37100e+00 0.00e+00
## Comparison.t...13 ME#A7558A Tumor vs Normal -4.45020e+00 0.00e+00
## Comparison.t...14 ME#AA5831 Tumor vs Normal -7.81820e+00 0.00e+00
## Comparison.t...15 ME#AF93A2 Tumor vs Normal 1.02016e+01 0.00e+00
## Comparison.t...16 ME#B6742A Tumor vs Normal -5.01910e+00 0.00e+00
## Comparison.t...19 ME#C06162 Tumor vs Normal 1.03488e+01 0.00e+00
## Adjusted_pValue
## Comparison.t...2 0.00e+00
## Comparison.t...5 0.00e+00
## Comparison.t...10 0.00e+00
## Comparison.t...11 0.00e+00
## Comparison.t...12 0.00e+00
## Comparison.t...13 0.00e+00
## Comparison.t...14 0.00e+00
## Comparison.t...15 0.00e+00
## Comparison.t...16 0.00e+00
## Comparison.t...19 0.00e+00
# Plot BoxPlot
selectedVariable <- "TN"
levels <- unique(metadata[[selectedVariable]])
classLabel <- paste(levels, collapse = " vs ")
plot <- ClassificationData$plots[[1]]
plot <- plot +
ggplot2::labs(
title = classLabel, fill = as.factor(levels),
x = "Variables",
y = "Class"
) +
ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90, hjust = 1)
)
plotThis step perform the integration of the datasets previously analyzed using the eigenfeatures obtained from the steps above.
# Summarize into list the different eigenfeatures
eigengenesList <- list(eigengenesData1, eigengenesData2)
clusterList <- list(clusterAssigData1, clusterAssigData2)
IntegrationData1Data2 <- fctModulesCorrelation(eigengenesList, clusterList, threshold = 0.6)
TopCorrelations <- IntegrationData1Data2[["Top_cor_Prot_metab"]]
head(TopCorrelations)## from to value
## cor_Data1_Data2.1345 D1#C06162 D2#FF9609 -0.6106
## cor_Data1_Data2.1351 D1#E1C62F D2#FF9609 -0.6104
## cor_Data1_Data2.1209 D1#C06162 D2#EE83BB -0.6089
## cor_Data1_Data2.1132 D1#904A67 D2#E6D030 -0.6081
## cor_Data1_Data2.1134 D1#999999 D2#E6D030 -0.6033
# Plot correlation
CorrelationPlot <- IntegrationData1Data2$Correlation_Plot
hist(CorrelationPlot[[1]], main = "Correlation: Data 1 / Data 2 ")# Plot Network
nodes <- as.data.frame(IntegrationData1Data2$nodes)
edges <- as.data.frame(IntegrationData1Data2$edges)
n <- IntegrationData1Data2$n
shapes <- c("diamond", "triangle", "dot")
colors <- c("orange", "darkgreen", "darkblue")
network <- visNetwork::visNetwork(nodes = nodes, edges = edges, width = "100%", height = "800px")
network <- visNetwork::visLegend(network,
useGroups = FALSE,
addNodes = data.frame(
label = paste0("Data", 1:n, " Modules"),
shape = shapes[1:n], color = colors[1:n]
),
addEdges = data.frame(label = "Correlation", shape = "line", length = 200, color = "darkgreen")
)
network <- visNetwork::visInteraction(network, navigationButtons = TRUE)
networkEnrichment analysis can be performed if annotation data for proteomics or transcriptomics is uploaded with the column Symbol available. The drop-down menu displays available libraries for pathway analysis. Choose a library to automatically amend the dataset cluster descriptions on table below. For example, selecting “GO_Biological_Process_2023” will annotate cluster descriptions with enriched biological processes.
To perform enrichment analysis programmatically, you can use the function fctAssignmentGenesEnrichr(). Under column enriched_Term the most highly correlated pathway is displayed and in the following columns, along with enriched_Genes, and p-values as determined by Enrichr. The following code shows how to run the enrichment analysis using transcriptomics data (data2). This chunk is not evaluated to avoid long loading times during vignette rendering.
# Perform enrichment analysis per module
selectedDatabase <- "GO_Biological_Process_2023"
clusterAssignmentsData2Enrich <- fctAssignmentGenesEnrichr(clusterAssignmentsProtGenes = clusterAssigData2, database = selectedDatabase)we show a representative subset of the enrichment results:
Enrichment Analysis Results[1:5,]
| feature | cluster | col | feature_name | enriched_Term | enriched_Overlap | enriched_Genes | enriched_P.value | enriched_Adjusted.P.value |
|---|---|---|---|---|---|---|---|---|
| A1BG | cluster_000005 | #66628D | A1BG | Cilium Disassembly | 1/5 | HDAC6 | 0.008968455 | 0.09813736 |
| NAT2 | cluster_000041 | #DD87B4 | NAT2 | NA | NA | NA | NA | NA |
| ADA | cluster_000017 | #91569A | ADA | Purine Ribonucleoside Metabolic Process | 1/6 | ADA | 0.007477447 | 0.08081181 |
| CDH2 | cluster_000019 | #B45B76 | CDH2 | Detection Of Muscle Stretch | 1/6 | CDH2 | 0.007179245 | 0.11473611 |
| AKT3 | cluster_000019 | #B45B76 | AKT3 | Detection Of Muscle Stretch | 1/6 | CDH2 | 0.007179245 | 0.11473611 |
This table summarizes enriched biological processes associated with selected modules. The full enrichment analysis can be reproduced by uncommenting the function call above.
To ensure compatibility with Bioconductor standards, iModMix supports input data structured as SummarizedExperiment objects. This allows users to integrate metabolomics or other omics data using standardized containers that facilitate downstream analysis and interoperability with other Bioconductor packages.
Below is an example of how to convert your expression matrix and metadata into a SummarizedExperiment object:
# Load required library
library(SummarizedExperiment)
# Assuming:
# - 'dataExp1' has features in rows and samples in columns
# - 'Feature_ID' is the first column of dataExp1
# - 'metadata' has sample information with 'Sample' column matching column names of dataExp1 (except Feature_ID)
# Create SummarizedExperiment object
se <- fctiModMix2SE(dataExp1, metadata)
# Check structure
se## class: SummarizedExperiment
## dim: 904 76
## metadata(0):
## assays(1): counts
## rownames(904): Metab373 Metab807 ... Metab744 Metab429
## rowData names(0):
## colnames(76): RC20.MSKC.00573 RC20.MSKC.00574 ... RC20.MSKC.00653
## RC20.MSKC.00659
## colData names(3): Sample TN Histology
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SummarizedExperiment_1.39.2 Biobase_2.69.1
## [3] GenomicRanges_1.61.5 Seqinfo_0.99.2
## [5] IRanges_2.43.5 S4Vectors_0.47.4
## [7] BiocGenerics_0.55.4 generics_0.1.4
## [9] MatrixGenerics_1.21.0 matrixStats_1.5.0
## [11] iModMix_0.99.3 knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3 attempt_0.3.1
## [4] rlang_1.1.6 magrittr_2.0.4 compiler_4.5.1
## [7] RSQLite_2.4.3 png_0.1-8 vctrs_0.6.5
## [10] reshape2_1.4.4 stringr_1.5.2 pkgconfig_2.0.3
## [13] crayon_1.5.3 fastmap_1.2.0 backports_1.5.0
## [16] XVector_0.49.1 labeling_0.4.3 promises_1.3.3
## [19] rmarkdown_2.30 preprocessCore_1.71.2 purrr_1.1.0
## [22] bit_4.6.0 xfun_0.53 cachem_1.1.0
## [25] jsonlite_2.0.0 blob_1.2.4 later_1.4.4
## [28] DelayedArray_0.35.3 parallel_4.5.1 cluster_2.1.8.1
## [31] R6_2.6.1 golem_0.5.1 bslib_0.9.0
## [34] stringi_1.8.7 RColorBrewer_1.1-3 rpart_4.1.24
## [37] jquerylib_0.1.4 Rcpp_1.1.0 iterators_1.0.14
## [40] WGCNA_1.73 base64enc_0.1-3 httpuv_1.6.16
## [43] Matrix_1.7-4 splines_4.5.1 nnet_7.3-20
## [46] tidyselect_1.2.1 abind_1.4-8 rstudioapi_0.17.1
## [49] dichromat_2.0-0.1 yaml_2.3.10 doParallel_1.0.17
## [52] codetools_0.2-20 lattice_0.22-7 tibble_3.3.0
## [55] plyr_1.8.9 shiny_1.11.1 withr_3.0.2
## [58] KEGGREST_1.49.2 S7_0.2.0 glassoFast_1.0.1
## [61] evaluate_1.0.5 foreign_0.8-90 survival_3.8-3
## [64] Biostrings_2.77.2 pillar_1.11.1 checkmate_2.3.3
## [67] foreach_1.5.2 ggplot2_4.0.0 scales_1.4.0
## [70] xtable_1.8-4 glue_1.8.0 Hmisc_5.2-4
## [73] tools_4.5.1 data.table_1.17.8 visNetwork_2.1.4
## [76] fastcluster_1.3.0 grid_4.5.1 impute_1.83.0
## [79] tidyr_1.3.1 AnnotationDbi_1.71.2 colorspace_2.1-2
## [82] htmlTable_2.4.3 Formula_1.2-5 cli_3.6.5
## [85] config_0.3.2 S4Arrays_1.9.1 dplyr_1.1.4
## [88] gtable_0.3.6 dynamicTreeCut_1.63-1 sass_0.4.10
## [91] digest_0.6.37 SparseArray_1.9.1 htmlwidgets_1.6.4
## [94] farver_2.1.2 memoise_2.0.1 htmltools_0.5.8.1
## [97] lifecycle_1.0.4 httr_1.4.7 GO.db_3.22.0
## [100] mime_0.13 bit64_4.6.0-1