Package: MsFeatures
Authors: Johannes Rainer [aut, cre] (ORCID: https://orcid.org/0000-0002-6977-7147)
Last modified: 2024-10-23 23:43:10.886749
Compiled: Tue Oct 29 18:28:01 2024

1 Introduction

Electrospray ionization (ESI) is commonly used in mass spectrometry (MS)-based metabolomics to generate ions from compounds so that the MS instrument can detect them. Ionization can generate different ions (adducts) of the same original compound, which are then reported as separate MS features with different mass-to-charge ratios (m/z). To reduce data set complexity (and to aid subsequent annotation steps) it is advisable to group features that presumably represent signal from the same original compound into a single entity.

The MsFeatures package provides key concepts and functions for this feature grouping. Methods are implemented for base R objects as well as for Bioconductor’s SummarizedExperiment class. See also the description of the general grouping concept on the package webpage for more information. Additional grouping methodology is expected to be implemented in other R packages for data objects with additional LC-MS related information, such as the XCMSnExp object in the xcms package. The implementation for SummarizedExperiment provided in this package can be used as a reference for such additional methodology.

After definition of the feature groups, the QFeatures package could be used to aggregate their abundances into a single signal.

2 Installation

The package can be installed with the BiocManager package. To install BiocManager use install.packages("BiocManager") and, after that, BiocManager::install("MsFeatures") to install this package.

3 Mass Spectrometry Feature Grouping

Features from the same originating compound inherit its characteristics, including its retention time (for LC-MS or GC-MS experiments) and abundance/intensity. For the latter, features from the same compound are expected to show a similar pattern of feature values/abundances across samples.

The MsFeatures package defines the groupFeatures method to perform MS feature grouping based on the provided input data and a parameter object which selects and defines the feature grouping algorithm. This algorithm is supposed to assign individual features to a (single) feature group. Currently two feature grouping approaches are implemented:

  • SimilarRtimeParam: group features based on similar retention times.
  • AbundanceSimilarityParam: group features based on similar feature values/abundances across samples.

Additional algorithms, e.g. ones that also consider differences in features’ m/z values that match expected ions/adducts or isotopes, may be implemented in the future in this or other packages.
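To make the idea behind retention-time-based grouping more concrete, the following base R sketch assigns features to groups whenever consecutive (sorted) retention times differ by no more than a tolerance. This is only an illustration under stated assumptions: the helper group_by_rt, its tol argument and the retention times are made up for this example, and the code is not the algorithm that SimilarRtimeParam uses.

```r
## Hypothetical helper, not part of MsFeatures: features whose sorted
## retention times differ by at most `tol` from their neighbour end up
## in the same group.
group_by_rt <- function(rt, tol = 2) {
    ord <- order(rt)
    ## A new group starts whenever the gap to the previous (sorted)
    ## retention time exceeds `tol`.
    grp_sorted <- cumsum(c(TRUE, diff(rt[ord]) > tol))
    ## Map the group labels back to the original order of the features.
    grp <- integer(length(rt))
    grp[ord] <- grp_sorted
    grp
}

## Made-up retention times for five features.
rt <- c(10.2, 50.1, 10.9, 49.8, 120.0)
rt_groups <- group_by_rt(rt, tol = 2)

## List the feature indices belonging to each group.
split(seq_along(rt), rt_groups)
```

One consequence of comparing only neighbouring values is that groups can chain: a long run of closely spaced retention times ends up in one group even if its first and last member differ by more than the tolerance.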

In this document we demonstrate the feature grouping functionality on a simple toy data set used also in the xcms package with the raw data being provided in the faahKO data package. This data set consists of samples from 4 mice with knock-out of the fatty acid amide hydrolase (FAAH) and 4 wild type mice. Pre-processing of this data set is described in detail in the xcms vignette of the xcms package. Below we load all required packages and the result from this pre-processing which is provided as a SummarizedExperiment within this package and can be loaded with data(se).

library(MsFeatures)
library(SummarizedExperiment)

data("se")

Before performing the feature grouping we inspect the result object. Feature properties and definitions can be accessed with rowData, the feature abundances with assay.

rowData(se)
## DataFrame with 225 rows and 11 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001     200.1     200.1     200.1   2901.63   2880.73   2922.53         2
## FT002     205.0     205.0     205.0   2789.39   2782.30   2795.36         8
## FT003     206.0     206.0     206.0   2788.73   2780.73   2792.86         7
## FT004     207.1     207.1     207.1   2718.12   2713.21   2726.70         7
## FT005     219.1     219.1     219.1   2518.82   2517.40   2520.81         3
## ...         ...       ...       ...       ...       ...       ...       ...
## FT221    591.30     591.3     591.3   3005.03   2992.87   3006.05         5
## FT222    592.15     592.1     592.3   3022.11   2981.91   3107.59         6
## FT223    594.20     594.2     594.2   3418.16   3359.10   3427.90         3
## FT224    595.25     595.2     595.3   3010.15   2992.87   3013.77         6
## FT225    596.20     596.2     596.2   2997.91   2992.87   3002.95         2
##              KO        WT            peakidx  ms_level
##       <numeric> <numeric>             <list> <integer>
## FT001         2         0  287, 679,1577,...         1
## FT002         4         4     47,272,542,...         1
## FT003         3         4     32,259,663,...         1
## FT004         4         3     19,249,525,...         1
## FT005         1         2  639, 788,1376,...         1
## ...         ...       ...                ...       ...
## FT221         2         3    349,684,880,...         1
## FT222         1         3     86,861,862,...         1
## FT223         1         2  604, 985,1543,...         1
## FT224         2         3     67,353,876,...         1
## FT225         0         2  866,1447,1643,...         1
head(assay(se))
##        ko15.CDF   ko16.CDF   ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001  159738.1  506848.88  113441.08  169955.6  216096.6  145509.7  230477.9
## FT002 1924712.0 1757150.96 1383416.72 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.67  162897.19  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.66  343897.76  208002.8  364609.8  360908.9  223322.5
## FT005  135978.5   25524.79   71530.84  107348.5  223951.8  134398.9  190203.8
## FT006  286221.4  289908.23  164008.97  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001  140551.30
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02

Columns "mzmed" and "rtmed" in the object’s rowData provide the m/z and retention time that characterize each feature. In total, 225 features are available in the present data set, and many of them most likely represent signal from different ions of the same compound. We aim to identify these based on the following assumptions about the LC-MS data:

  • Features (ions) of the same compound should have similar retention time.
  • The abundance of features (ions) of the same compound should have a similar pattern across samples, i.e. if a compound is highly concentrated in one sample and low in another, all of its ions should follow the same pattern.
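The second assumption can be checked by correlating feature abundances across samples. The base R sketch below does this for three features, using the values printed by head(assay(se)) above; FT002 and FT003 also have nearly identical "rtmed" values (2789.39 and 2788.73). The log2 transformation and the cut-off of 0.7 are assumptions made for this illustration and are not necessarily what AbundanceSimilarityParam uses.

```r
## Abundances of three features across the eight samples, copied from
## the head(assay(se)) output shown above.
abund <- rbind(
    FT002 = c(1924712.0, 1757150.96, 1383416.72, 1180288.2,
              2129885.1, 1634342.0, 1623589.2, 1354004.93),
    FT003 = c(213659.3, 289500.67, 162897.19, 178285.7,
              253825.6, 241844.4, 240606.0, 185399.47),
    FT005 = c(135978.5, 25524.79, 71530.84, 107348.5,
              223951.8, 134398.9, 190203.8, 84772.92)
)
colnames(abund) <- c("ko15", "ko16", "ko21", "ko22",
                     "wt15", "wt16", "wt21", "wt22")

## cor() correlates columns, so the matrix is transposed to put features
## in columns. Log-transforming first is an assumption of this sketch,
## meant to dampen the influence of single very high values.
feature_cor <- cor(t(log2(abund)))
round(feature_cor, 2)

## Hypothetical cut-off: treat two features as similar above 0.7.
similar <- feature_cor > 0.7
```

With only eight samples such correlations are noisy, so any cut-off should be treated as a tuning parameter rather than a fixed rule.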

As detailed in the general grouping concept, the feature grouping implemented in MsFeatures is by default intended to be used as a stepwise approach in which each groupFeatures call further sub-groups (and thus refines) previously defined feature groups. This makes it possible either to use a single algorithm for the feature grouping or to build a feature grouping pipeline by combining different algorithms. In our example we first perform an initial grouping of features based on similar retention time and subsequently refine these feature groups by also requiring similarity of feature values across samples.
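The stepwise logic itself is independent of any particular algorithm and can be sketched in a few lines of base R. The helper refine_groups and the group labels below are hypothetical and only illustrate how a second criterion splits existing groups without ever merging them; this is not how groupFeatures is implemented internally.

```r
## Hypothetical helper, not part of MsFeatures: split each existing group
## further according to a second criterion and relabel the result.
refine_groups <- function(groups, sub) {
    key <- paste(groups, sub, sep = ".")
    ## match() against unique() numbers the group/sub-group combinations
    ## in order of first appearance.
    match(key, unique(key))
}

## Made-up labels for six features.
rt_group    <- c(1, 1, 1, 2, 2, 3)  # step 1: similar retention time
abund_group <- c(1, 1, 2, 1, 1, 1)  # step 2: similar abundance pattern

refined <- refine_groups(rt_group, abund_group)

## Cross-tabulate to see how the initial groups were split.
table(initial = rt_group, refined = refined)
```

Because the refined label depends on both inputs, two features can only share a refined group if they already shared the initial one, which is the sense in which each step refines the previous result.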

Note that it would also be possible to perform the grouping only on a subset of features instead of the full data set. An example is provided in the last section of this vignette.

3.1 Grouping of features by similar retention time

The most intuitive and simple way to group LC-MS features is based on their retention times: ionization of the compounds happens after the LC and thus all ions from the same compound should have the same retention time. The plot below shows the retention times (and m/z) of all features from the present data set.

plot(rowData(se)$rtmed, rowData(se)$mzmed,
     xlab = "retention time", ylab = "m/z", main = "features",
     col = "#00000060")
grid()