CSAMA 2022

Content

  • Introduction to metabolomics
  • Types of metabolomics data
  • Handling and processing metabolomics data
  • Annotation of metabolomics data

Metabolite? Metabolism?

  • Glycolysis: key metabolic pathway common to all cells.
  • Generates energy by converting glucose to pyruvate.

  • Metabolites: intermediates and products of cellular processes.

Metabolomics?

  • Large-scale study of small molecules (metabolites) in a system (cell, tissue, organism, …).

Putting metabolomics into context:

  • Genome: what can happen.
  • Transcriptome: what appears to be happening.
  • Proteome: what makes it happen.
  • Metabolome: what actually happened.

Properties of the metabolome:

  • Metabolome is highly dynamic.
  • Metabolome influenced by genetic and environmental factors.

Influence from genetic factors

  • mGWAS: associations between genetic variants and metabolite concentrations.

  • Significant association between variant and carnitine, acetylcarnitine and butyrylcarnitine.
  • SLC22A5: carnitine transporter.
  • Genetic variant in this gene has influence on its function.

Environmental influence

Where can we measure metabolites?

  • Blood (serum):
    • insights into general physiological state.
    • venous blood/capillary (arterial) blood.
  • Cell extracts:
    • insights into mitochondrial metabolism.
  • Cell supernatant:
    • what did cells consume?
  • Other: plants, urine, food, soil, water (environmental sciences) …

How can we measure metabolites?

  • Nuclear Magnetic Resonance (NMR) - not covered here.
  • Mass spectrometry (MS)-based metabolomics.
  • Metabolites small enough to be directly measured by MS.
  • Most metabolites uncharged - need to create ions first.

Two main setups to measure metabolites:

  • targeted: quantitative measurement of selected metabolites.
  • untargeted: semi-quantitative measurement of all metabolites (detectable with the setup) in a sample.

Mass Spectrometry (MS)

  • Problem: unable to distinguish between metabolites with the same/similar mass-to-charge ratio (m/z).
  • Solution: additional separation of metabolites prior to MS.

Liquid chromatography

  • Sample is dissolved in a fluid (mobile phase).

  • Mobile phase carries analytes through column (stationary phase).

  • Separation based on affinity for the column’s stationary phase.

  • Commonly used: RPLC (reversed-phase LC) and HILIC (hydrophilic interaction liquid chromatography).

Liquid Chromatography Mass Spectrometry (LC-MS)

We gain an additional dimension:

  • retention time.
  • LC-MS: analyze data along retention time.
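Analyzing the data along retention time can be illustrated with an extracted ion chromatogram. The following is a minimal base R sketch, not the workflow used in the course: the `peaks` table and all its values are made up for illustration, and the 10 ppm window is an assumed setting.

```r
# Toy LC-MS data: one row per measured peak (all values are made up).
peaks <- data.frame(
    rt        = c(10, 10, 11, 11, 12, 12, 13, 13),
    mz        = c(195.0877, 217.0696, 195.0879, 217.0690,
                  195.0875, 301.1410, 195.0878, 301.1412),
    intensity = c(100, 20, 850, 150, 400, 60, 50, 30)
)

# Keep only the peaks within an m/z window around the ion of interest.
target_mz <- 195.0877
ppm <- 10  # assumed window width in parts per million
in_window <- abs(peaks$mz - target_mz) <= target_mz * ppm / 1e6

# Sum the intensities per retention time to get the chromatogram.
eic <- aggregate(intensity ~ rt, data = peaks[in_window, ], FUN = sum)

# Retention time at the apex of the chromatographic peak.
apex_rt <- eic$rt[which.max(eic$intensity)]
```

Plotting `eic$intensity` against `eic$rt` would show the chromatographic peak of this ion.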

Mass Spectrometry Data in R

  • Data stored in mzML files.
  • To load such data into R:

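library(MSnbase)  # provides readMSData()
library(Spectra)  # provides Spectra() and MsBackendMzR()

# `fl`: path(s) to the mzML file(s).
ms <- readMSData(fl, mode = "onDisk")
sps <- Spectra(fl, backend = MsBackendMzR())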

LC-MS data analysis

  • Untargeted metabolomics (label-free proteomics).

  • LC-MS preprocessing:
    • chromatographic peak detection
    • alignment
    • correspondence
## DataFrame with 779 rows and 6 columns
##               mz        rt  sample_1  sample_2  sample_3  sample_4
##        <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001    326.378    25.409  4654.057  4755.993  4750.997   4671.01
## FT002    134.096    16.513   767.239   926.769   791.133    852.95
## ...          ...       ...       ...       ...       ...       ...
## FTM936   501.383   137.340   7767.00   7871.74   7869.51   7697.32
## FTM937   612.404    28.094   1667.49   1640.29   1676.65   1652.04

But wait - what are we actually measuring?

  • LC-MS features characterized by their m/z and retention time.
  • Goal: annotate these features to metabolites (compounds).

One step back: ionization

name      formula    exactmass  [M+H]+  [M+Na]+
Caffeine  C8H10N4O2  194.1      195.1   217.1

  • The neutral molecule carries no charge and cannot be detected by MS.
  • Electrospray ionization (ESI) creates ions, potentially several per compound.
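The adduct m/z values in the table follow from adding the adduct's mass to the exact mass of the neutral molecule. A minimal base R sketch, assuming singly charged ions and commonly tabulated approximate adduct masses (the constants below are not taken from the slides):

```r
# Monoisotopic (exact) mass of caffeine, C8H10N4O2.
exactmass <- c(Caffeine = 194.0804)

# Approximate mass added by each adduct for a singly charged positive ion
# (assumed values: proton 1.007276; sodium minus one electron 22.989218).
adduct_mass <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989218)

# m/z = (M + adduct mass) / charge, with charge = 1 here.
adduct_mz <- outer(exactmass, adduct_mass, "+")

# Rounded to one decimal, as shown in the table.
round(adduct_mz, 1)
```

Rounded to one decimal, these values agree with the 195.1 and 217.1 listed for caffeine.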

Annotation using m/z values

name      formula    exactmass  [M+H]+  [M+Na]+
Caffeine  C8H10N4O2  194.1      195.1   217.1

  • The neutral molecule carries no charge and cannot be detected by MS.
  • Electrospray ionization (ESI) creates ions, potentially several per compound.
  • Match measured m/z against these reference values.

library(MetaboAnnotation)  # provides matchValues() and Mass2MzParam()

mtch <- matchValues(query, target,
                    Mass2MzParam(c("[M+H]+", "[M+Na]+")))
  • query: experimental m/z values.
  • target: reference masses.
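What such a match does can be sketched in base R as a comparison within a relative (ppm) tolerance. This is a simplified illustration, not the implementation behind `matchValues()`; the feature names, the m/z values, and the 10 ppm tolerance are assumptions.

```r
# Measured feature m/z values (hypothetical) and reference adduct m/z values.
query_mz  <- c(FT1 = 195.0880, FT2 = 217.0692, FT3 = 301.1410)
target_mz <- c("Caffeine [M+H]+" = 195.0877, "Caffeine [M+Na]+" = 217.0696)

ppm <- 10  # assumed accepted deviation, in parts per million

# Pairwise deviation in ppm between each measured and each reference m/z.
ppm_error <- sweep(abs(outer(query_mz, target_mz, "-")), 2, target_mz, "/") * 1e6

# Report all pairs within the tolerance.
hits <- which(ppm_error <= ppm, arr.ind = TRUE)
matches <- data.frame(
    feature   = names(query_mz)[hits[, "row"]],
    reference = names(target_mz)[hits[, "col"]],
    ppm_error = round(ppm_error[hits], 2)
)
```

A relative tolerance is used because the measurement error of MS instruments generally scales with m/z.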

A little additional complication

name          formula    exactmass  [M+H]+  [M+Na]+
Caffeine      C8H10N4O2  194.1      195.1   217.1
Enprofylline  C8H10N4O2  194.1      195.1   217.1

  • Enprofylline: asthma treatment agent.
  • Compounds can have same formula - how to distinguish?
  • They differ by their structure.

Using m/z and retention time

  • Structure can influence a compound's polarity, so compounds with the same formula can separate in the LC and elute at different retention times.
  • Annotation using m/z and retention time:

mtch <- matchValues(query, target,
                    Mass2MzRtParam(c("[M+H]+", "[M+Na]+")))
  • Requires reference retention times for the compounds of interest.
  • These are specific to the instrument setup and the lab.
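Adding retention time as a second criterion can be sketched in base R as follows. This is an illustration under assumptions, not the implementation behind `matchValues()`: the retention times are invented, and `ppm` and `tolerance_rt` are hypothetical settings.

```r
# Hypothetical measured features and reference values; both compounds
# share the same [M+H]+ m/z, and the retention times are made up.
query  <- data.frame(feature = c("FT1", "FT2"),
                     mz = c(195.0879, 195.0876), rt = c(62, 118))
target <- data.frame(name = c("Caffeine", "Enprofylline"),
                     mz = c(195.0877, 195.0877), rt = c(60, 120))

ppm <- 10          # assumed m/z tolerance in parts per million
tolerance_rt <- 5  # assumed retention time tolerance in seconds

# All query/target combinations.
idx <- expand.grid(q = seq_len(nrow(query)), t = seq_len(nrow(target)))

# A pair matches only if both m/z and retention time agree.
ok_mz <- abs(query$mz[idx$q] - target$mz[idx$t]) <= target$mz[idx$t] * ppm / 1e6
ok_rt <- abs(query$rt[idx$q] - target$rt[idx$t]) <= tolerance_rt

matches <- data.frame(feature = query$feature[idx$q],
                      name = target$name[idx$t])[ok_mz & ok_rt, ]
```

On m/z alone every feature would match both compounds; the retention time criterion leaves one candidate per feature.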

Using MS2 spectra

  • We can fragment ions to get some information on their structure.
  • LC-MS -> LC-MS/MS.
  • MS instrument selects ions for fragmentation and records MS2 spectrum.

  • LC-MS/MS data: MS1 for quantification, MS2 for annotation.

Using MS2 spectra

  • If we have MS2 spectra associated to features, we can match them against reference spectra.
  • Calculate similarity scores and compare spectra.

Spectra similarity calculation in R

simmat <- compareSpectra(a, b)

mtch <- matchSpectra(query, target,
                     CompareSpectraParam())
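The idea behind a spectral similarity score can be sketched in base R with a simple cosine similarity on matched peaks. This is a simplified illustration and not necessarily the score computed by `compareSpectra()`; the two spectra, the helper `cosine_similarity()`, and the 0.02 m/z tolerance are hypothetical.

```r
# Two MS2 spectra as matrices with columns "mz" and "intensity" (made-up values).
a <- cbind(mz = c(110.07, 138.07, 195.09), intensity = c(30, 100, 60))
b <- cbind(mz = c(110.07, 138.06, 163.04, 195.09), intensity = c(25, 90, 10, 70))

# Hypothetical helper: cosine similarity after matching peaks by m/z.
cosine_similarity <- function(x, y, tolerance = 0.02) {
    # For each peak in x, take the intensity of the closest peak in y
    # if it lies within the m/z tolerance, otherwise 0. (Simplification:
    # one peak in y could be matched by several peaks in x.)
    y_matched <- vapply(x[, "mz"], function(mz) {
        d <- abs(y[, "mz"] - mz)
        if (min(d) <= tolerance) y[which.min(d), "intensity"] else 0
    }, numeric(1))
    # Cosine of the angle between the intensity vectors; unmatched
    # peaks still contribute to the norms and lower the score.
    sum(x[, "intensity"] * y_matched) /
        sqrt(sum(x[, "intensity"]^2) * sum(y[, "intensity"]^2))
}

score <- cosine_similarity(a, b)
```

Identical spectra give a score of 1, and spectra without any matching peaks give 0.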

SpectriPy: integrates Python MS libraries (matchms, MS2DeepScore) into Spectra-based workflows.

  • Limitation: availability of reference spectra.
  • Public reference databases are growing, collecting data shared by researchers.

Workshops

Thank you for your attention