Introduction to MetID

Xuchen Wang



Metabolomics offers the opportunity to characterize complex diseases. Using both LC-MS and GC-MS increases coverage of the metabolome by taking advantage of their complementary features. Although these platforms detect numerous ions, only a small subset of the metabolites corresponding to these ions can be identified. The vast majority are either unknowns or “known-unknowns”. We therefore propose a network-based approach to improve the identification of significant ions detected by LC-MS. Specifically, it uses a probabilistic framework to determine the identities of known-unknowns by prioritizing their putative metabolite IDs. It does so by exploiting the inter-dependent relationships between metabolites in biological organisms, based on knowledge from pathways and biochemical networks. The R package MetID implements this algorithm.

The main function in this package is get_scores_for_LC_MS. See ?get_scores_for_LC_MS for documentation. This function takes an input dataset and assigns a score to each putative identification. The input dataset must meet the requirements given in that documentation, in particular the expected column names.


This example shows how to use the function get_scores_for_LC_MS with a small dataset, demo1. This dataset contains only 3 compounds and is documented in ?demo1. Note: the scores are only meaningful for a dataset with a large number of compounds, so the result for demo1 illustrates the workflow only and should not be interpreted.

The steps are: load the MetID package, load the demo1 dataset, check the form of demo1, and change its column names.

The original column names of demo1 do not meet the requirements of get_scores_for_LC_MS, so they must be changed before the function is called.
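The renaming step can be sketched in base R as follows. This is a minimal illustration, not the package's own code: rename_and_check is a hypothetical helper, the names in required_cols are placeholders for the names listed in ?get_scores_for_LC_MS, and toy is a made-up stand-in for demo1. The last part runs only if MetID is installed and just inspects demo1.

```r
# Hypothetical helper (not part of MetID): rename the columns of a data frame
# and stop early if a required column name is still missing afterwards.
rename_and_check <- function(df, new_names, required) {
  stopifnot(length(new_names) == ncol(df))
  colnames(df) <- new_names
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  df
}

# ASSUMPTION: placeholder names; take the real list from ?get_scores_for_LC_MS.
required_cols <- c("query_m.z", "exact_m.z", "kegg_id", "pubchem_cid")

# Toy stand-in for demo1; values and original column names are invented.
toy <- data.frame(
  mz    = c(100.1, 200.2, 300.3),
  exact = c(100.1, 200.2, 300.3),
  kegg  = c("K1", "K2", "K3"),
  cid   = c("P1", "P2", "P3"),
  stringsAsFactors = FALSE
)

renamed <- rename_and_check(toy, required_cols, required_cols)
print(colnames(renamed))

# With MetID installed, inspect the real dataset before renaming it.
if (requireNamespace("MetID", quietly = TRUE)) {
  library(MetID)
  data("demo1", package = "MetID")
  print(dim(demo1))
  print(colnames(demo1))
  # After renaming, demo1 is passed to get_scores_for_LC_MS; the remaining
  # arguments are described in ?get_scores_for_LC_MS.
}
```

Checking the names before the scoring call gives a clear error message if a column was mistyped or left out.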

Other data sources

We also include a large dataset, demo2, which generates meaningful scores. In addition to data frames, MetID works with data stored in other forms, such as csv files and text files.
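For data stored in a file, one option is to read it into a data frame with base R and confirm that the column names survived the import. This is a sketch under stated assumptions: the toy table and its values are invented, and the two column names are placeholders. How get_scores_for_LC_MS itself is pointed at a csv or text file is described in ?get_scores_for_LC_MS and is not shown here.

```r
# Toy table with made-up values; the column names are placeholders.
toy <- data.frame(
  query_m.z = c(100.1, 200.2, 300.3),
  kegg_id   = c("K1", "K2", "K3"),
  stringsAsFactors = FALSE
)

# Write the table to a temporary csv file and a tab-delimited text file.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(toy, csv_path, row.names = FALSE)
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read both files back into data frames.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_txt <- read.delim(txt_path, stringsAsFactors = FALSE)

# The column names should be unchanged by the round trip.
print(colnames(from_csv))
print(colnames(from_txt))
```

Note that read.csv and read.delim rewrite column names that are not syntactically valid in R unless check.names = FALSE is set, so it is worth inspecting colnames() after every import.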