1 Abstract

In this document we discuss mass spectrometry (MS) data handling and access using Bioconductor’s MSnbase package (Gatto and Lilley 2012) and walk through the preprocessing of an (untargeted) LC-MS toy data set using the xcms package (Smith et al. 2006). The preprocessing comprises chromatographic peak detection, sample alignment and peak correspondence. Particular emphasis is placed on defining data set-dependent values for the most important settings of popular preprocessing methods.

2 Introduction

Preprocessing is the first step in the analysis of GC/LC-MS based untargeted metabolomics experiments. Its aim is to quantify the signals from ion species measured in a sample and to match these entities across the samples of an experiment. The resulting two-dimensional matrix of feature abundances in all samples can then be processed further, e.g. by normalizing the data to remove sampling differences, batch effects or injection order-dependent signal drifts. Another crucial step in untargeted metabolomics analysis is the annotation of the (m/z-retention time) features to the actual ions and metabolites they represent. Note that data normalization and annotation are not covered in this document.
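To make the shape of the preprocessing result concrete, the following base R sketch builds a small feature abundance matrix by hand. The feature names, sample names and all values are made up for illustration and are not taken from the toy data set used later; the layout (features in rows, samples in columns) is an assumption chosen for this example.

```r
# Hypothetical feature abundance matrix: each row is one (m/z-retention time)
# feature, each column one sample. All values are invented.
abundances <- matrix(
    c(1520, 1480, 1610,
       320,   NA,  290,
      8700, 9100, 8850),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("FT1", "FT2", "FT3"),
                    c("sample_1", "sample_2", "sample_3")))

# A missing value (NA) means that no signal was quantified for that
# feature in that sample.
n_detected <- rowSums(!is.na(abundances))
n_detected
```

Here `n_detected` counts, for each feature, the number of samples with a quantified signal.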

People familiar with the concepts of mass spectrometry or LC-MS data analysis may jump directly to the next section.

2.1 Prerequisites

The analysis in this document requires an R version >= 3.6.0 and recent versions of the MSnbase and xcms packages.
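Whether these requirements are already met can be checked with a few lines of base R. This is only a sketch: it tests the R version and whether the two packages can be loaded, but it does not judge whether the installed package versions are recent enough. The variable names `r_ok` and `installed` are introduced here for illustration.

```r
# TRUE if the running R version is at least 3.6.0
r_ok <- getRversion() >= "3.6.0"

# For each required package, TRUE if it is installed and can be loaded
pkgs <- c("MSnbase", "xcms")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

r_ok
installed
```

Any package reported as `FALSE` needs to be installed before continuing.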

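# Install BiocManager from CRAN if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install the packages used in this document
BiocManager::install(c("xcms",
                       "MSnbase",
                       "msdata",
                       "magrittr",
                       "png"))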

2.2 Mass spectrometry

Mass spectrometry makes it possible to measure the abundances of charged molecules (ions) in a sample. Abundances are determined as ion counts for a specific mass-to-charge ratio (m/z). The measured signal is represented as a spectrum: intensities along m/z.
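As a minimal illustration of this representation, a spectrum can be sketched in base R as paired vectors of m/z values and intensities. The numbers below are invented and do not come from a real measurement; the names `spectrum` and `base_peak` are used only for this example.

```r
# Hypothetical toy spectrum: made-up m/z values and their ion counts
spectrum <- data.frame(
    mz        = c(100.05, 150.10, 200.12, 250.20),
    intensity = c(1200, 8500, 300, 4100))

# The most intense signal in the spectrum
base_peak <- spectrum[which.max(spectrum$intensity), ]
base_peak

# Draw the spectrum as vertical lines: intensities along m/z
plot(spectrum$mz, spectrum$intensity, type = "h",
     xlab = "m/z", ylab = "intensity")
```

Each vertical line in the plot corresponds to the ion count recorded at one m/z value.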