Mass spectrometry and proteomics using Bioconductor

Laurent Gatto

Computational Proteomics Unit, University of Cambridge
https://lgatto.github.io
lg390@cam.ac.uk
@lgatt0

Licence

These slides are available under a Creative Commons CC-BY licence. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.

Table of contents

Mass spectrometry

Chromatogram

MS schematics

MS1 and MS2 spectra

MS data

Mass spec in R

In R

library("msdata")
library("mzR")
fls <- proteomics(full = TRUE)
basename(fl <- fls[2])
## [1] "MS3TMT10_01022016_32917-33481.mzML.gz"
rw <- openMSfile(fl)
rw
## Mass Spectrometry file handle.
## Filename:  MS3TMT10_01022016_32917-33481.mzML.gz 
## Number of scans:  565

Accessors

softwareInfo(rw)
## [1] "Xcalibur 2.0.1258.15"
str(spectra(rw, 10:11))
## List of 2
##  $ : num [1:725, 1:2] 172 173 174 175 176 ...
##  $ : num [1:779, 1:2] 195 197 209 213 216 ...
str(header(rw))
## 'data.frame':    565 obs. of  22 variables:
##  $ seqNum                  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ acquisitionNum          : int  32918 32919 32920 32921 32922 32923 32924 32925 32926 32927 ...
##  $ msLevel                 : int  1 2 2 2 2 3 2 2 3 2 ...
##  $ polarity                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ peaksCount              : int  48304 755 803 765 499 2540 685 858 1422 725 ...
##  $ totIonCurrent           : num  3.01e+09 1.03e+06 6.97e+05 1.22e+06 4.95e+05 ...
##  $ retentionTime           : num  4423 4423 4423 4423 4423 ...
##  $ basePeakMZ              : num  697 646 642 550 719 ...
##  $ basePeakIntensity       : num  1.73e+08 1.21e+05 6.04e+04 1.03e+05 1.15e+05 ...
##  $ collisionEnergy         : num  0 35 35 35 35 65 35 35 65 35 ...
##  $ ionisationEnergy        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ lowMZ                   : num  376 187 188 200 166 ...
##  $ highMZ                  : num  1515 1646 1701 1596 1375 ...
##  $ precursorScanNum        : int  0 32918 32918 32918 32918 0 32918 32918 0 32918 ...
##  $ precursorMZ             : num  0 652 647 673 575 ...
##  $ precursorCharge         : int  0 3 3 3 3 0 2 3 1 2 ...
##  $ precursorIntensity      : num  0 9841531 3921567 7623700 2357085 ...
##  $ mergedScan              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mergedResultScanNum     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mergedResultStartScanNum: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mergedResultEndScanNum  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ injectionTime           : num  0.00081 0.01084 0.01869 0.01233 0.00818 ...
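
The header is an ordinary data frame, so base R subsetting is enough to summarise an acquisition. The following is a minimal sketch, assuming a toy data frame `hd` (a hypothetical name) that copies only the first ten values of four columns from the output above instead of reading the file.

```r
# Toy stand-in for header(rw): the first ten scans shown above.
hd <- data.frame(
    seqNum          = 1:10,
    msLevel         = c(1L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L),
    collisionEnergy = c(0, 35, 35, 35, 35, 65, 35, 35, 65, 35),
    precursorCharge = c(0L, 3L, 3L, 3L, 3L, 0L, 2L, 3L, 1L, 2L)
)

# Number of scans per MS level.
n_per_level <- table(hd$msLevel)

# Keep the MS2 scans only and tabulate their precursor charge states.
ms2 <- hd[hd$msLevel == 2L, ]
charge_counts <- table(ms2$precursorCharge)

# Collision energy recorded at each MS level.
ce_by_level <- tapply(hd$collisionEnergy, hd$msLevel, unique)
```

On the real data, the same expressions should work with `hd <- header(rw)`.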

MSnExp

MSnExp objects, from the MSnbase package, manage raw MS experiments conveniently and efficiently.

library("MSnbase")
# On-disk mode; in later MSnbase releases, use readMSData(fl, mode = "onDisk").
(x <- readMSData2(fl))
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.14 Mb
## - - - Spectra data - - -
##  MS level(s): 1 2 3 
##  Number of spectra: 565 
##  MSn retention times: 73:43 - 74:54 minutes
## - - - Processing information - - -
## Data loaded [Thu Jun 15 06:02:59 2017] 
##  MSnbase version: 2.3.3 
## - - - Meta data  - - -
## phenoData
##   rowNames: MS3TMT10_01022016_32917-33481.mzML.gz
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   MS3TMT10_01022016_32917-33481.mzML.gz 
## protocolData: none
## featureData
##   featureNames: X001.1 X002.1 ... X565.1 (565 total)
##   fvarLabels: fileIdx spIdx ... spectrum (27 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
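
The header reports `retentionTime` in seconds (4423 for the first scans), whereas the summary above prints the range as minutes:seconds. The following is a minimal sketch of that conversion, using a hypothetical helper `fmt_rt` and assuming that fractional seconds are truncated. It is not MSnbase's own formatting code.

```r
# Format retention times given in seconds as "minutes:seconds".
fmt_rt <- function(sec) {
    sec <- as.integer(floor(sec))   # assumption: drop fractional seconds
    sprintf("%d:%02d", sec %/% 60L, sec %% 60L)
}

fmt_rt(4423)
## [1] "73:43"
```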

Profile and centroided data

table(msLevel(x))
## 
##   1   2   3 
##  25 270 270
head(centroided(x))
## X001.1 X002.1 X003.1 X004.1 X005.1 X006.1 
##     NA     NA     NA     NA     NA     NA

table(iscent <- isCentroided(x), msLevel(x))
##        
##           1   2   3
##   FALSE  25   0 270
##   TRUE    0 270   0
centroided(x) <- iscent
head(centroided(x))
## X001.1 X002.1 X003.1 X004.1 X005.1 X006.1 
##  FALSE   TRUE   TRUE   TRUE   TRUE  FALSE
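
Profile spectra sample each peak at many closely spaced m/z values, while centroided spectra keep a single m/z value per peak. The sketch below illustrates the difference with a deliberately simple heuristic: the median gap between consecutive m/z values. The function name `looks_centroided`, the 0.05 threshold and the toy data are assumptions made for illustration. This is not the algorithm used by `isCentroided`.

```r
# TRUE when the typical gap between neighbouring m/z values is large,
# which suggests one value per peak (centroided data).
looks_centroided <- function(mz, threshold = 0.05) {
    median(diff(sort(mz))) > threshold
}

# Toy profile-mode data: three peaks, each sampled every 0.01 m/z.
profile_mz <- c(seq(126.0, 126.2, by = 0.01),
                seq(127.0, 127.2, by = 0.01),
                seq(128.0, 128.2, by = 0.01))

# Toy centroided data: one m/z value per peak.
centroid_mz <- c(126.1, 127.1, 128.1)

looks_centroided(profile_mz)
## [1] FALSE
looks_centroided(centroid_mz)
## [1] TRUE
```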

library("magrittr")
x2 <- x %>%
    filterMsLevel(2L) %>%
    filterMz(c(126, 132))
x2
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.08 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 270 
##  MSn retention times: 73:43 - 74:54 minutes
## - - - Processing information - - -
## Data loaded [Wed Jun 14 11:13:24 2017] 
## Filter: select MS level(s) 2 [Thu Jun 15 06:03:00 2017] 
## Filter: trim MZ [126..132] on MS level(s) 2. 
##  MSnbase version: 2.3.3 
## - - - Lazy processing queue content  - - -
##  o  filterMz 
## - - - Meta data  - - -
## phenoData
##   rowNames: MS3TMT10_01022016_32917-33481.mzML.gz
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   MS3TMT10_01022016_32917-33481.mzML.gz 
## protocolData: none
## featureData
##   featureNames: X002.1 X003.1 ... X563.1 (270 total)
##   fvarLabels: fileIdx spIdx ... spectrum (27 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
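
Conceptually, trimming to an m/z range keeps only the peaks of each spectrum whose m/z lies inside the window. The following is a base R sketch on a toy two-column peak matrix (m/z, intensity), shaped like those returned by `spectra()`. The helper `trim_mz` and the peak values are hypothetical, and whether MSnbase treats the bounds as inclusive is not verified here.

```r
# Keep the rows of a peak matrix whose m/z falls within mzlim (inclusive).
trim_mz <- function(pk, mzlim) {
    keep <- pk[, "mz"] >= mzlim[1] & pk[, "mz"] <= mzlim[2]
    pk[keep, , drop = FALSE]
}

# Toy peak list; the values are made up for illustration.
pk <- cbind(mz        = c(110.07, 126.13, 127.12, 131.14, 175.12, 230.17),
            intensity = c(1200, 5400, 5100, 4800, 900, 15000))

trimmed <- trim_mz(pk, c(126, 132))
```

Here `trimmed` holds the three peaks between m/z 126 and 132. The "Lazy processing queue" entry in the output above suggests that, for on-disk data, `filterMz` is only recorded at this stage and applied when the spectra are accessed.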

data(itraqdata)
itraqdata[[22]]
## Object of class "Spectrum2"
##  Precursor: 974.4916 
##  Retention time: 31:44 
##  Charge: 2 
##  MSn level: 2 
##  Peaks count: 3124 
##  Total ion count: 151732400

plot(itraqdata[[22]], full = TRUE, reporters = iTRAQ4)

itraqdata2 <- pickPeaks(itraqdata)
plot(itraqdata2[[22]], itraqdata2[[26]])
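
Peak picking reduces each profile-mode peak to a single centroid. The following is a simplified sketch of the idea, keeping strict local maxima above an intensity threshold in a simulated Gaussian profile. The helper `pick_local_maxima`, the threshold and the simulated signal are assumptions, and this is not the implementation behind `pickPeaks`.

```r
# Return the strict local maxima of a profile signal as a peak matrix.
# Assumes at least three data points.
pick_local_maxima <- function(mz, intensity, min_intensity = 0) {
    n <- length(intensity)
    inner <- 2:(n - 1)
    idx <- inner[intensity[inner] > intensity[inner - 1] &
                 intensity[inner] > intensity[inner + 1]]
    idx <- idx[intensity[idx] >= min_intensity]
    cbind(mz = mz[idx], intensity = intensity[idx])
}

# Simulated profile signal: two Gaussian peaks centred at m/z 114.1 and 115.1.
mz_grid <- seq(113.5, 115.7, by = 0.01)
signal <- 1000 * exp(-(mz_grid - 114.1)^2 / (2 * 0.02^2)) +
           600 * exp(-(mz_grid - 115.1)^2 / (2 * 0.02^2))

centroids <- pick_local_maxima(mz_grid, signal, min_intensity = 10)
```

`centroids` then contains one row per simulated peak, close to m/z 114.1 and 115.1.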

Proteomics
