Modified: 2017-06-15 07:02:28
Compiled: Thu Jun 15 07:02:43 2017

Abstract In this workflow, we will use R/Bioconductor packages to explore, process, visualise and understand mass spectrometry-based proteomics data, starting with raw data, and proceeding with identification and quantitation data, discussing some of their peculiarities compared to sequencing data along the way. The workflow is aimed at a beginner to intermediate level, such as, for example, seasoned R users who want to get started with mass spectrometry and proteomics, or proteomics practitioners who want to familiarise themselves with R and Bioconductor infrastructure.

This material available under a creative common CC-BY license. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.

1 Introduction

CSAMA 2017 Some part of this workshop require online access. The most bandwidth expensive operations are files downloads. Please copy the files from the local network and place them in your working directory1 If you are not sure what your working directory is, type getwd() or ask a helper.. This will stop them from being downloaded again.

Before we start:

If you identify typos, if there are parts that you would like to see expended or clarified, please let me know by telling me directly (during workshops), opening a github issue or by emailing me directly. Please do also briefly specify your background/familiarity with mass spectrometry and/or proteomics (beginner, intermediate or expert) so that I can update accordingly.

In recent years, there we have seen an increase in the number of packages to analyse mass spectrometry and proteomics data for R and Bioconductor, as well as an increase in total number of downloads.

Number of packages Number of downloads

It is also good to highlight that several of these package have become a group efforts, supported by several developers in the community. This post illustrates the various contributions to MSnbase; r Biocpkg("mzR") has benefited by a similar wide range of successful contributions. Both packages, and in particular mzR, are used by many others, and will be described in some detail in this workflow.

This workflow illustrates R / Bioconductor infrastructure for proteomics. Topics covered focus on support for open community-driven formats for raw data and identification results, packages for peptide-spectrum matching, data processing and analysis:

  • Exploring available infrastructure
  • Mass spectrometry data
  • Getting data from proteomics repositories
  • Handling raw MS data
  • Handling identification data
  • MS/MS database search
  • Analysing search results
  • High-level data interface
  • Quantitative proteomics
  • Importing third-party quantitation data
  • Data processing and analysis
  • Statistical analysis
  • Machine learning
  • Annotation
  • Other relevant packages/pipelines

Links to other packages and references are also documented. In particular, the vignettes included in the RforProteomics package also contains relevant material.

Other material

This workflow provides a general introduction to Bioconductor software for mass spectrometry and proteomics. If you are interested in

  • The application of machine learning to proteomics data, in particular spatial proteomics (i.e. the sub-cellular localisation), follow the tutorial vignette from pRoloc package, accessible with vignette("pRoloc-tutorial", package = "pRoloc") or online.
  • The analysis of identification data to retain the most reliable PSMs, see the MSnID vignette2 Section Analysing search results below is a summary of that vignette., accessible with vignette("msnid_vignette", package = "MSnID") or online. In addition, the vignettes of the msmsTest package describe how to analyse spectral counting data using packages dedicated for the analysis of high throughput sequencing data.
  • The analysis of MSE data independent acquisition (DIA), see the vignettes in the synapter package.
  • The processing and analysis of MALDI-MS data, read the MALDIquant introduction accessible with vignette("MALDIquant-intro", package = "MALDIquant") and available online.
  • The processing and analysis of imaging mass spectrometry (IMS), read the Carinal walkthrough vignette accessible with vignette("Cardinal-walkthrough", package = "Cardinal") and online.

Setup

CSAMA 2017 This section is not relevant, as you should already have installed all required packages.

The follow packages will be used throughout this documents. R version 3.3.1 or higher is required to install all the packages using BiocInstaller::biocLite.

library("mzR")
library("mzID")
library("MSnID")
library("MSnbase")
library("rpx")
library("MLInterfaces")
library("pRoloc")
library("pRolocdata")
library("MSGFplus")
library("rols")
library("hpar")

The most convenient way to install all the tutorials requirement (and more related content), is to install RforProteomics with all its dependencies.

library("BiocInstaller")
biocLite("RforProteomics", dependencies = TRUE)

Other packages of interest, such as rTANDEM or MSGFgui will be described later in the document but are not required to execute the code in this workflow.

Exploring available infrastructure

In Bioconductor version 3.6, there are respectively 92 proteomics, 62 mass spectrometry software packages and 17 mass spectrometry experiment packages. These respective packages can be extracted with the proteomicsPackages(), massSpectrometryPackages() and massSpectrometryDataPackages() and explored interactively, or looked at by exploring the respective biocViews on the Bioconductor web page.

library("RforProteomics")
pp <- proteomicsPackages()
display(pp)

Mass spectrometry data

Most community-driven formats described in the table are supported in R. We will see how to read and access these data in the following sections.

Type Format Package
raw mzML, mzXML, netCDF, mzData mzR (read)
identification mzIdentML mzR (read) and mzID (read)
quantitation mzQuantML
peak lists mgf MSnbase (read)
other mzTab MSnbase (read)

2 Accessing data

2.1 From the ProteomeXchange database

MS-based proteomics data is disseminated through the ProteomeXchange infrastructure, which centrally coordinates submission, storage and dissemination through multiple data repositories, such as the PRoteomics IDEntifications (PRIDE) database at the EBI for mass spectrometry-based experiments (including quantitative data, as opposed as the name suggests), PASSEL at the ISB for Selected Reaction Monitoring (SRM, i.e. targeted) data and the MassIVE resource. These data can be downloaded within R using the rpx package.

library("rpx")
pxannounced()
## 15 new ProteomeXchange annoucements
##     Data.Set    Publication.Data             Message
## 1  PXD004192 2017-06-14 12:23:35                 New
## 2  PXD004917 2017-06-14 12:18:17                 New
## 3  PXD004950 2017-06-14 12:09:15                 New
## 4  PXD004972 2017-06-14 12:04:02                 New
## 5  PXD004979 2017-06-14 11:58:10                 New
## 6  PXD005725 2017-06-14 11:55:09                 New
## 7  PXD006182 2017-06-14 11:38:47                 New
## 8  PXD003423 2017-06-14 11:19:48                 New
## 9  PXD005730 2017-06-14 09:38:08                 New
## 10 PXD004027 2017-06-14 07:26:48                 New
## 11 PXD005035 2017-06-14 07:05:21                 New
## 12 PXD003983 2017-06-13 15:32:39 Updated information
## 13 PXD004096 2017-06-13 15:16:20 Updated information
## 14 PXD005976 2017-06-13 15:06:09 Updated information
## 15 PXD006232 2017-06-13 15:05:36 Updated information

Using the unique PXD000001 identifier, we can retrieve the relevant metadata that will be stored in a PXDataset object. The names of the files available in this data can be retrieved with the pxfiles accessor function.

px <- PXDataset("PXD000001")
px
## Object of class "PXDataset"
##  Id: PXD000001 with 12 files
##  [1] 'F063721.dat' ... [12] 'generated'
##  Use 'pxfiles(.)' to see all files.
pxfiles(px)
##  [1] "F063721.dat"                                                         
##  [2] "F063721.dat-mztab.txt"                                               
##  [3] "PRIDE_Exp_Complete_Ac_22134.xml.gz"                                  
##  [4] "PRIDE_Exp_mzData_Ac_22134.xml.gz"                                    
##  [5] "PXD000001_mztab.txt"                                                 
##  [6] "README.txt"                                                          
##  [7] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML" 
##  [8] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML"
##  [9] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"         
## [10] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw"           
## [11] "erwinia_carotovora.fasta"                                            
## [12] "generated"

Other metadata for the px data set:

pxtax(px)
## [1] "Erwinia carotovora"
pxurl(px)
## [1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD000001"
pxref(px)
## [1] "Gatto L, Christoforou A. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):42-51. Review"

Data files can then be downloaded with the pxget function. Below, we retrieve the raw data file. The file is downloaded3 If the file is already available, it is not downloaded a second time. in the working directory and the name of the file is return by the function and stored in the mzf variable for later use.

fn <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
mzf <- pxget(px, fn)
## Downloading 1 file
## /home/lg390/Documents/Teaching/bioc-ms-prot/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML already present.
mzf
## [1] "/home/lg390/Documents/Teaching/bioc-ms-prot/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"

2.2 From AnnotationHub

CSAMA 2017 Executing this section will cause raw data file to be downloaded and stored as part of the AnnotationHub’s local resource. The mzML file that is available on the local network is identical to the one accessed here.

AnnotationHub is a cloud resource set up and managed by the Bioconductor project that serves various omics datasets. It is possible to contribute and access (albeit currently only a limited number of) proteomics data.

library("AnnotationHub")
## 
## Attaching package: 'AnnotationHub'
## The following object is masked from 'package:Biobase':
## 
##     cache
ah <- AnnotationHub()
## snapshotDate(): 2017-06-08
query(ah, "proteomics")
## AnnotationHub with 4 records
## # snapshotDate(): 2017-06-08 
## # $dataprovider: PRIDE
## # $species: Erwinia carotovora
## # $rdataclass: AAStringSet, MSnSet, mzRident, mzRpwiz
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## #   tags, rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH49006"]]' 
## 
##             title                                                         
##   AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
##   AH49007 | PXD000001: Peptide-level quantitation data                    
##   AH49008 | PXD000001: raw mass spectrometry data                         
##   AH49009 | PXD000001: MS-GF+ identiciation data
ms <- ah[["AH49008"]]
ms
## Mass Spectrometry file handle.
## Filename:  55314 
## Number of scans:  7534

The data contains 7534 spectra - 1431 MS1 spectra and 6103 MS2 spectra. The file name, 55314, is not very descriptive because the data originates from the AnnotationHub cloud repository. If the data was read from a local file, is would be named as the mzML (or mzXML) file (see below).

3 Handling raw MS data

The mzR package provides an interface to the proteowizard C/C++ code base to access various raw data files, such as mzML, mzXML, netCDF, and mzData. The data is accessed on-disk, i.e it is not loaded entirely in memory, and only when explicitly requested. The three main functions are openMSfile to create a file handle to a raw data file, header to extract metadata about the spectra contained in the file and peaks to extract one or multiple spectra of interest. Other functions such as instrumentInfo, or runInfo can be used to gather general information about a run.

Below, we access the raw data file downloaded in the previous section and open a file handle that will allow us to extract data and metadata of interest.

library("mzR")
basename(mzf)
## [1] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
ms <- openMSfile(mzf)
ms
## Mass Spectrometry file handle.
## Filename:  TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML 
## Number of scans:  7534

The object loaded from AnnotationHub in the previous section is of the same type, and was also created by the openMSfile function. All operations below can equally be applied to it.

The header function returns the metadata of all available peaks:

hd <- header(ms)
dim(hd)
## [1] 7534   22
names(hd)
##  [1] "seqNum"                   "acquisitionNum"          
##  [3] "msLevel"                  "polarity"                
##  [5] "peaksCount"               "totIonCurrent"           
##  [7] "retentionTime"            "basePeakMZ"              
##  [9] "basePeakIntensity"        "collisionEnergy"         
## [11] "ionisationEnergy"         "lowMZ"                   
## [13] "highMZ"                   "precursorScanNum"        
## [15] "precursorMZ"              "precursorCharge"         
## [17] "precursorIntensity"       "mergedScan"              
## [19] "mergedResultScanNum"      "mergedResultStartScanNum"
## [21] "mergedResultEndScanNum"   "injectionTime"

We can extract metadata and scan data for scan 1000 as follows:

hd[1000, ]
##      seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent
## 1000   1000           1000       2        1        274       1048554
##      retentionTime basePeakMZ basePeakIntensity collisionEnergy
## 1000      1106.916    136.061            164464              45
##      ionisationEnergy    lowMZ   highMZ precursorScanNum precursorMZ
## 1000                0 104.5467 1370.758              992    683.0817
##      precursorCharge precursorIntensity mergedScan mergedResultScanNum
## 1000               2           689443.7          0                   0
##      mergedResultStartScanNum mergedResultEndScanNum injectionTime
## 1000                        0                      0    0.05521463
head(peaks(ms, 1000))
##          [,1]     [,2]
## [1,] 104.5467 308.9326
## [2,] 104.5684 308.6961
## [3,] 108.8340 346.7183
## [4,] 109.3928 365.1236
## [5,] 110.0345 616.7905
## [6,] 110.0703 429.1975
plot(peaks(ms, 1000), type = "h", xlab = "M/Z", ylab = "Intensity")
Manual extraction and plotting of an MS spectrum

Manual extraction and plotting of an MS spectrum

Exercise Identify the index of the MS2 spectrum with the highest precursor intensity (see the precursorIntensity feature variable) and plot it as illustrated above.

i <- which.max(hd$precursorIntensity)
sp <- spectra(ms, i)
plot(sp, type = "h", xlab = "M/Z", ylab = "Intensity")

plot of chunk answ1

Exercise Are the MS1 and MS2 spectra in profile mode or centroided? Choose any spectra and plot the region around a peak to find out.

par(mfrow = c(1, 2))
bpmz <- hd[i, "basePeakMZ"]
plot(sp, type = "h", xlim = c(bpmz - 0.5, bpmz + 0.5),
     xlab = "M/Z", ylab = "Intensity",
     main = "Centroided")
## Spectrum 100 is MS1
sp1 <- peaks(ms, 100)
plot(sp1, type = "l", xlim = c(536,540),
     xlab = "M/Z", ylab = "Intensity",
     main = "Profile mode")

plot of chunk answ2

4 Handling identification data

The RforProteomics package distributes a small identification result file that we load and parse using infrastructure from the mzR package.

f <- dir(system.file("extdata", package = "RforProteomics"),
         pattern = "mzid", full.names=TRUE)
basename(f)
## [1] "TMT_Erwinia.mzid.gz"

Along the lines of what is available for raw data, it provides support fasta parsing mzIdentML files with the openIDfile function. As for raw data, the underlying C/C++ code comes from the proteowizard.

library("mzR")
id1 <- openIDfile(f)
id1
## Identification file handle.
## Filename:  TMT_Erwinia.mzid.gz 
## Number of psms:  5655

Various data can be extracted from the identification object. The peptide spectrum matches (PSMs) and the identification scores can be accessed as a data.frame with psms and score respectively.

softwareInfo(id1)
## [1] "MS-GF+ Beta (v10072) "                      
## [2] "ProteoWizard MzIdentML 3.0.501 ProteoWizard"
enzymes(id1)
##      name nTermGain cTermGain minDistance missedCleavages
## 1 Trypsin                               0            1000
fid1 <- mzR::psms(id1)
head(fid1)
##   spectrumID chargeState rank passThreshold experimentalMassToCharge
## 1  scan=5782           3    1          TRUE                1080.2325
## 2  scan=6037           3    1          TRUE                1002.2089
## 3  scan=5235           3    1          TRUE                1189.2836
## 4  scan=5397           3    1          TRUE                 960.5365
## 5  scan=6075           3    1          TRUE                1264.3409
##   calculatedMassToCharge                            sequence modNum
## 1              1080.2321 PVQIQAGEDSNVIGALGGAVLGGFLGNTIGGGSGR      0
## 2              1002.2115        TQVLDGLINANDIEVPVALIDGEIDVLR      0
## 3              1189.2800   TKGLNVMQNLLTAHPDVQAVFAQNDEMALGALR      0
## 4               960.5365         SQILQQAGTSVLSQANQVPQTVLSLLR      0
## 5              1264.3419 PIIGDNPFVVVLPDVVLDESTADQTQENLALLISR      0
##   isDecoy post pre start end DatabaseAccess DBseqLength DatabaseSeq
## 1   FALSE    S   R    50  84        ECA1932         155            
## 2   FALSE    R   K   288 315        ECA1147         434            
## 3   FALSE    A   R   192 224        ECA0013         295            
## 4   FALSE    -   R   264 290        ECA1731         290            
## 5   FALSE    F   R   119 153        ECA1443         298            
##                                         DatabaseDescription acquisitionNum
## 1                        ECA1932 outer membrane lipoprotein           5782
## 2                                    ECA1147 trigger factor           6037
## 3                ECA0013 ribose-binding periplasmic protein           5235
## 4                                         ECA1731 flagellin           5397
## 5      ECA1443 UTP--glucose-1-phosphate uridylyltransferase           6075
##  [ reached getOption("max.print") -- omitted 1 row ]
sc1 <- mzR::score(id1)
head(sc1)
##   spectrumID MS.GF.RawScore MS.GF.DeNovoScore MS.GF.SpecEValue
## 1  scan=5782            147               174     3.764831e-27
## 2  scan=6037            214               245     6.902626e-26
## 3  scan=5235            211               264     1.778789e-25
## 4  scan=5397            154               178     1.792541e-24
## 5  scan=6075            188               252     1.510364e-23
## 6  scan=5761            123               138     1.618941e-23
##   MS.GF.EValue
## 1 5.430080e-21
## 2 9.943751e-20
## 3 2.564787e-19
## 4 2.581753e-18
## 5 2.178423e-17
## 6 2.329453e-17

The mzID package, has similar functionality to parse identification files, and was the first one to provide such capabilities in R. The main difference with mzR is that is parses the files using the XMLpackage and reads the whole data into memory rather than relying on proteowizard, and is slower.

Exercise Data wrangling with identification data; the standard tidyverse tools are fit for purpose here. Extract and combine the PSMs and their scores as described above and combine them. From the available data, calculate the length of each peptide (you can use nchar with the peptide sequence sequence) and the number of peptides for each protein (DatabaseDescription). Plot the length of the proteins for their respective number of peptides. Optionally, stratify the plot by the peptide e-value score (MS.GF.EValue) using for example cut to define bins.

suppressPackageStartupMessages(library("dplyr"))
fid1$peplen <- nchar(as.character(fid1$sequence))
ids <- as_tibble(full_join(fid1, sc1))
## Joining, by = "spectrumID"
npeps <- ids %>% group_by(DatabaseDescription) %>% tally 
ids <- full_join(ids, npeps)
## Joining, by = "DatabaseDescription"
library("ggplot2")
ggplot(ids, aes(x = n, y = DBseqLength)) + geom_point()
Identifcation data wrangling 1

Identifcation data wrangling 1

ids$evalBins <- cut(ids$MS.GF.EValue, summary(ids$MS.GF.EValue))
ggplot(ids, aes(x = n, y = DBseqLength, color = peplen)) +
    geom_point() +
    facet_wrap(~ evalBins)

Identifcation data wrangling 2

6 Analysing search results

The MSnID package can be used for post-search filtering of MS/MS identifications. One starts with the construction of an MSnID object that is populated with identification results that can be imported from a data.frame or from mzIdenML files. Here, we will use the example identification data provided with the package.

mzids <- system.file("extdata", "c_elegans.mzid.gz", package="MSnID")
basename(mzids)
## [1] "c_elegans.mzid.gz"

We start by loading the package, initialising the MSnID object, and add the identification result from our mzid file (there could of course be more that one).

library("MSnID")
msnid <- MSnID(".")
## Note, the anticipated/suggested columns in the
## peptide-to-spectrum matching results are:
## -----------------------------------------------
## accession
## calculatedMassToCharge
## chargeState
## experimentalMassToCharge
## isDecoy
## peptide
## spectrumFile
## spectrumID
msnid <- read_mzIDs(msnid, mzids)
## Loaded cached data
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 12263 at 36 % FDR
## #peptides: 9489 at 44 % FDR
## #accessions: 7414 at 76 % FDR

Printing the MSnID object returns some basic information such as

  • Working directory.
  • Number of spectrum files used to generate data.
  • Number of peptide-to-spectrum matches and corresponding FDR.
  • Number of unique peptide sequences and corresponding FDR.
  • Number of unique proteins or amino acid sequence accessions and corresponding FDR.

The package then enables to define, optimise and apply filtering based for example on missed cleavages, identification scores, precursor mass errors, etc. and assess PSM, peptide and protein FDR levels. To properly function, it expects to have access to the following data

## [1] "accession"                "calculatedMassToCharge"  
## [3] "chargeState"              "experimentalMassToCharge"
## [5] "isDecoy"                  "peptide"                 
## [7] "spectrumFile"             "spectrumID"

which are indeed present in our data:

names(msnid)
##  [1] "spectrumID"                "scan number(s)"           
##  [3] "acquisitionNum"            "passThreshold"            
##  [5] "rank"                      "calculatedMassToCharge"   
##  [7] "experimentalMassToCharge"  "chargeState"              
##  [9] "MS-GF:DeNovoScore"         "MS-GF:EValue"             
## [11] "MS-GF:PepQValue"           "MS-GF:QValue"             
## [13] "MS-GF:RawScore"            "MS-GF:SpecEValue"         
## [15] "AssumedDissociationMethod" "IsotopeError"             
## [17] "isDecoy"                   "post"                     
## [19] "pre"                       "end"                      
## [21] "start"                     "accession"                
## [23] "length"                    "description"              
## [25] "pepSeq"                    "modified"                 
## [27] "modification"              "idFile"                   
## [29] "spectrumFile"              "databaseFile"             
## [31] "peptide"

Here, we summarise a few steps and redirect the reader to the package’s vignette for more details:

6.1 Analysis of peptide sequences

Cleaning irregular cleavages at the termini of the peptides and missing cleavage site within the peptide sequences. The following two function call create the new numMisCleavages and numIrrCleabages columns in the MSnID object

msnid <- assess_termini(msnid, validCleavagePattern="[KR]\\.[^P]")
msnid <- assess_missed_cleavages(msnid, missedCleavagePattern="[KR](?=[^P$])")

6.2 Trimming the data

Now, we can use the apply_filter function to effectively apply filters. The strings passed to the function represent expressions that will be evaluated, this keeping only PSMs that have 0 irregular cleavages and 2 or less missed cleavages.

msnid <- apply_filter(msnid, "numIrregCleavages == 0")
msnid <- apply_filter(msnid, "numMissCleavages <= 2")
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 7838 at 17 % FDR
## #peptides: 5598 at 23 % FDR
## #accessions: 3759 at 53 % FDR

6.3 Parent ion mass errors

Using "calculatedMassToCharge" and "experimentalMassToCharge", the mass_measurement_error function calculates the parent ion mass measurement error in parts per million.

summary(mass_measurement_error(msnid))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -2184.0640    -0.6992     0.0000    17.6146     0.7512  2012.5178

We then filter any matches that do not fit the +/- 20 ppm tolerance

msnid <- apply_filter(msnid, "abs(mass_measurement_error(msnid)) < 20")
summary(mass_measurement_error(msnid))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -19.7797  -0.5866   0.0000  -0.2970   0.5713  19.6758

6.4 Filtering criteria

Filtering of the identification data will rely on

  • -log10 transformed MS-GF+ Spectrum E-value, reflecting the goodness of match experimental and theoretical fragmentation patterns
msnid$msmsScore <- -log10(msnid$`MS-GF:SpecEValue`)
  • the absolute mass measurement error (in ppm units) of the parent ion
msnid$absParentMassErrorPPM <- abs(mass_measurement_error(msnid))

MS2 filters are handled by a special MSnIDFilter class objects, where individual filters are set by name (that is present in names(msnid)) and comparison operator (>, <, = , …) defining if we should retain hits with higher or lower given the threshold and finally the threshold value itself.

filtObj <- MSnIDFilter(msnid)
filtObj$absParentMassErrorPPM <- list(comparison="<", threshold=10.0)
filtObj$msmsScore <- list(comparison=">", threshold=10.0)
show(filtObj)
## MSnIDFilter object
## (absParentMassErrorPPM < 10) & (msmsScore > 10)

We can then evaluate the filter on the identification data object, which return the false discovery rate and number of retained identifications for the filtering criteria at hand.

evaluate_filter(msnid, filtObj)
##           fdr    n
## PSM         0 3807
## peptide     0 2455
## accession   0 1009

6.5 Filter optimisation

Rather than setting filtering values by hand, as shown above, these can be set automativally to meet a specific false discovery rate.

filtObj.grid <- optimize_filter(filtObj, msnid, fdr.max=0.01,
                                method="Grid", level="peptide",
                                n.iter=500)
show(filtObj.grid)
## MSnIDFilter object
## (absParentMassErrorPPM < 3) & (msmsScore > 7.4)
evaluate_filter(msnid, filtObj.grid)
##                   fdr    n
## PSM       0.004097561 5146
## peptide   0.006447651 3278
## accession 0.021996616 1208

Filters can eventually be applied (rather than just evaluated) using the apply_filter function.

msnid <- apply_filter(msnid, filtObj.grid)
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 5146 at 0.41 % FDR
## #peptides: 3278 at 0.64 % FDR
## #accessions: 1208 at 2.2 % FDR

And finally, identifications that matched decoy and contaminant protein sequences are removed

msnid <- apply_filter(msnid, "isDecoy == FALSE") 
msnid <- apply_filter(msnid, "!grepl('Contaminant',accession)")
show(msnid)
## MSnID object
## Working directory: "."
## #Spectrum Files:  1 
## #PSMs: 5117 at 0 % FDR
## #peptides: 3251 at 0 % FDR
## #accessions: 1179 at 0 % FDR

The resulting filtered identification data can be exported to a data.frame or to a dedicated MSnSet data structure for quantitative MS data, described below, and further processed and analyses using appropriate statistical tests.

7 High-level data interface

The above sections introduced low-level interfaces to raw and identification results. The MSnbase package provides abstractions for raw data through the MSnExp class and containers for quantification data via the MSnSet class. Both store

  1. the actual assay data (spectra or quantitation matrix, see below), accessed with spectra (or the [, [[ operators) or exprs;
  2. sample metadata, accessed as a data.frame with pData;
  3. feature metadata, accessed as a data.frame with fData.

Another useful slot is processingData, accessed with processingData(.), that records all the processing that objects have undergone since their creation (see examples below).

The readMSData2 (there is also a readMSData variant - see the benchmarking vignette[^Open it with vignette("benchmarking", package = "MSnbase") or read it online] in MSnbase for their differences) will parse the raw data, extract all MS spectra (by default, but individual MS levels can selected, as illustrated below) and construct an MS experiment object of class MSnExp.

library("MSnbase")
rawFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
               full.name = TRUE, pattern = "mzXML$")
basename(rawFile)
## [1] "dummyiTRAQ.mzXML"
msexp <- readMSData2(rawFile, msLevel = 2L)
msexp
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.03 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 5 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded [Thu Jun 15 07:03:12 2017] 
## Filter: select MS level(s) 2 [Thu Jun 15 07:03:12 2017] 
##  MSnbase version: 2.3.3 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: X1.1 X2.1 ... X5.1 (5 total)
##   fvarLabels: fileIdx spIdx ... spectrum (27 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'

MS2 spectra can be extracted as a list of Spectrum2 objects with the spectra accessor or as a subset of the original MSnExp data with the [ operator. Individual spectra can be accessed with [[.

length(msexp)
## [1] 5
msexp[1:2]
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.03 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 2 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded [Thu Jun 15 07:03:12 2017] 
## Filter: select MS level(s) 2 [Thu Jun 15 07:03:12 2017] 
##  MSnbase version: 2.3.3 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: X1.1 X2.1
##   fvarLabels: fileIdx spIdx ... spectrum (27 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
msexp[[2]]
## Object of class "Spectrum2"
##  Precursor: 546.9586 
##  Retention time: 25:2 
##  Charge: 3 
##  MSn level: 2 
##  Peaks count: 1012 
##  Total ion count: 56758067

The identification results stemming from the same raw data file can then be used to add PSM matches.

fData(msexp)
##      fileIdx spIdx centroided smoothed seqNum acquisitionNum msLevel
## X1.1       1     1         NA       NA      1              1       2
## X2.1       1     2         NA       NA      2              2       2
## X3.1       1     3         NA       NA      3              3       2
##      polarity originalPeaksCount totIonCurrent retentionTime basePeakMZ
## X1.1        1               2921     182542000       1501.35    645.374
## X2.1        1               1012      16488100       1501.59    703.338
## X3.1        1               2125      41969700       1501.85    117.115
##      basePeakIntensity collisionEnergy ionisationEnergy lowMZ   highMZ
## X1.1          16292100              40                0   100 2016.660
## X2.1           2150600              40                0   100 1707.060
## X3.1           2332700              40                0   100 1701.750
##      precursorScanNum precursorMZ precursorCharge precursorIntensity
## X1.1                0    645.3741               3           47659400
## X2.1                0    546.9586               3           26356100
## X3.1                0    645.3741               2           23432400
##      mergedScan mergedResultScanNum mergedResultStartScanNum
## X1.1          0                   0                        0
## X2.1          0                   0                        0
## X3.1          0                   0                        0
##      mergedResultEndScanNum injectionTime spectrum
## X1.1                      0             0        1
## X2.1                      0             0        2
## X3.1                      0             0        3
##  [ reached getOption("max.print") -- omitted 2 rows ]
## find path to a mzIdentML file
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
                 full.name = TRUE, pattern = "dummyiTRAQ.mzid")
basename(identFile)
## [1] "dummyiTRAQ.mzid"
msexp <- addIdentificationData(msexp, identFile)
fvarLabels(msexp)
##  [1] "fileIdx"                   "spIdx"                    
##  [3] "centroided"                "smoothed"                 
##  [5] "seqNum"                    "acquisitionNum"           
##  [7] "msLevel"                   "polarity"                 
##  [9] "originalPeaksCount"        "totIonCurrent"            
## [11] "retentionTime"             "basePeakMZ"               
## [13] "basePeakIntensity"         "collisionEnergy"          
## [15] "ionisationEnergy"          "lowMZ"                    
## [17] "highMZ"                    "precursorScanNum"         
## [19] "precursorMZ"               "precursorCharge"          
## [21] "precursorIntensity"        "mergedScan"               
## [23] "mergedResultScanNum"       "mergedResultStartScanNum" 
## [25] "mergedResultEndScanNum"    "injectionTime"            
## [27] "spectrum"                  "scan number(s)"           
## [29] "passthreshold"             "rank"                     
## [31] "calculatedmasstocharge"    "experimentalmasstocharge" 
## [33] "chargestate"               "ms-gf:denovoscore"        
## [35] "ms-gf:evalue"              "ms-gf:rawscore"           
## [37] "ms-gf:specevalue"          "assumeddissociationmethod"
## [39] "isotopeerror"              "isdecoy"                  
## [41] "post"                      "pre"                      
## [43] "end"                       "start"                    
## [45] "accession"                 "length"                   
## [47] "description"               "pepseq"                   
## [49] "modified"                  "modification"             
## [51] "idFile"                    "databaseFile"             
## [53] "nprot"                     "npep.prot"                
## [55] "npsm.prot"                 "npsm.pep"

The readMSData2 (readMSdata) and addIdentificationData make use of the mzR and mzID packages to access the raw and identification data.

Exercise Are the spectra in the msexp object centroided? This time, instead of looking at the spectra, we can query (and set) this information directly using the centroided function (and centroided<- replacement function). The isCentroided function can be used to assess the mode from the data.

centroided(msexp)
## X1.1 X2.1 X3.1 X4.1 X5.1 
##   NA   NA   NA   NA   NA
isCentroided(msexp)
## [1] FALSE FALSE FALSE FALSE FALSE
centroided(msexp) <- isCentroided(msexp)

Spectra and (parts of) experiments can be extracted and plotted.

msexp[[1]]
## Object of class "Spectrum2"
##  Precursor: 645.3741 
##  Retention time: 25:1 
##  Charge: 3 
##  MSn level: 2 
##  Peaks count: 2921 
##  Total ion count: 668170086
plot(msexp[[1]], full=TRUE)
Plotting a object of class Spectrum.

Plotting a object of class Spectrum.

As this data was labeled with iTRAQ4 isobaric tags, we can highlight these four peaks of interest with

plot(msexp[[1]], full=TRUE, reporters = iTRAQ4)
Plotting a object of class Spectrum with reporter ions.

Plotting a object of class Spectrum with reporter ions.

msexp[1:3]
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.03 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 3 
##  MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded [Thu Jun 15 07:03:12 2017] 
## Filter: select MS level(s) 2 [Thu Jun 15 07:03:12 2017] 
##  MSnbase version: 2.3.3 
## - - - Meta data  - - -
## phenoData
##   rowNames: dummyiTRAQ.mzXML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: X1.1 X2.1 X3.1
##   fvarLabels: fileIdx spIdx ... npsm.pep (56 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
plot(msexp[1:3], full=TRUE)
Plotting a object of class MSnExp

Plotting a object of class MSnExp

Exercise Repeat the previous combination of raw and identification data with the TMT_Erwinia mzML and mzid files that were imported from ProteomeXchange and generated using MSGFplus respectively. Here, we know that the MS1 and MS2 level are in profile mode and centroided respectively. Read the readMSData2 documentation to set the centroided. argument to set these attributes appropriately.

tmterw <- readMSData2(mzf, centroided = c(FALSE, TRUE))
table(centroided(tmterw), msLevel(tmterw))
##        
##            1    2
##   FALSE 1431    0
##   TRUE     0 6103
## Let's filter MS2 spectra after reading (optional) and add the
## identification data using pipes
library("magrittr")
tmterw <- tmterw %>%
    filterMsLevel(2L) %>%
    addIdentificationData(idres)
tmterw
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 3.06 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of spectra: 6103 
##  MSn retention times: 3:22 - 60:2 minutes
## - - - Processing information - - -
## Data loaded [Wed Jun 14 08:30:08 2017] 
## Filter: select MS level(s) 2 [Wed Jun 14 08:30:08 2017] 
##  MSnbase version: 2.3.3 
## - - - Meta data  - - -
## phenoData
##   rowNames:
##     TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML
##   varLabels: sampleNames
##   varMetadata: labelDescription
## Loaded from:
##   TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML 
## protocolData: none
## featureData
##   featureNames: X0227.1 X0228.1 ... X7534.1 (6103 total)
##   fvarLabels: fileIdx spIdx ... npsm.pep (58 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'

8 Visualisation

8.1 Visualising raw data

The importance of flexible access to specialised data becomes visible in the figure below5 Figure and code taken from the RforProteomics visualisation vignette. Not only can we access specific data and understand/visualise them, but we can transverse all the data and extracted/visualise/understand structured slices of data.

Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum.

Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum.

In this code chunks we start by selecting relevant spectra of interest. We will focus on the first MS1 spectrum acquired after 30 minutes of retention time.

## (1) Open raw data file
ms <- openMSfile(mzf)
## (2) Extract the header information
hd <- header(ms)
## (3) MS1 spectra indices
ms1 <- which(hd$msLevel == 1)
## (4) Select MS1 spectra with retention time between 30 and 35 minutes
rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
## (5) Indices of the 1st and 2nd MS1 spectra after 30 minutes
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
## (6) Interleaved MS2 spectra
ms2 <- (i+1):(j-1)

Now now extract and plot all relevant information:

  1. The upper panel represents the chromatogram of the /home/lg390/Documents/Teaching/bioc-ms-prot/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML raw data file, produced by plotting the data returned by the chromatogram function.
plot(chromatogram(ms)[[1]], type = "l")
A chromatogram.

A chromatogram.

  1. We concentrate at a specific retention time, 30:1 minutes (1800.6836 seconds)
plot(chromatogram(ms)[[1]], type = "l")
abline(v = hd[i, "retentionTime"]/60, col = "red")
A chromatogram with highlighted retention time of interest.

A chromatogram with highlighted retention time of interest.

  1. This corresponds to the 2807th MS1 spectrum, shown on the second row of figures.
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
The MS1 spectrum at the retention time of interest.

The MS1 spectrum at the retention time of interest.

  1. The ions that were selected for MS2 are highlighted by vertical lines. These are represented in the bottom part of the figure.
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
abline(v = hd[ms2, "precursorMZ"],
       col = c("#FF000080",
           rep("#12121280", 9)))
The MS1 spectrum at the retention time of interest, highlighting the peptides selected for MS2.

The MS1 spectrum at the retention time of interest, highlighting the peptides selected for MS2.

  1. On the right, we zoom on the isotopic envelope of one peptide in particular (the one highlighted with a red line).
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5))
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")
Isotopic envelope of a peptide.

Isotopic envelope of a peptide.

  1. A final loop through the relevant MS2 spectra plots the length(ms2) MS2 spectra highlighted above.
par(mfrow = c(5, 2), mar = c(2, 2, 0, 1))
for (ii in ms2) {
    p <- peaks(ms, ii)
    plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
    legend("topright", legend = paste0("Prec M/Z\n",
                           round(hd[ii, "precursorMZ"], 2)),
           bty = "n", cex = .8)
}
MS2 spectra of all selected precursors from the previous MS1 spectrum.

MS2 spectra of all selected precursors from the previous MS1 spectrum.

Putting it all together:

layout(lout)
par(mar=c(4,2,1,1))

plot(chromatogram(ms)[[1]], type = "l")
abline(v = hd[i, "retentionTime"]/60, col = "red")
par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
       col = c("#FF000080",
           rep("#12121280", 9)))

par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
     yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")

par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
    p <- peaks(ms, ii)
    plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
    legend("topright", legend = paste0("Prec M/Z\n",
                           round(hd[ii, "precursorMZ"], 2)),
           bty = "n", cex = .8)
}
Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum.

Visual summary of a full round of MS1 and MS2 acquisition for specific MS1 spectrum.

Below, we illustrate some additional visualisation and animations of raw MS data, also taken from the RforProteomics visualisation vignette. On the left, we have a heatmap visualisation of a MS map and a 3 dimensional representation of the same data. On the right, 2 MS1 spectra in blue and the set of interleaves 10 MS2 spectra.

## (1) MS space heaptmap
M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
## 1
ff <- colorRampPalette(c("yellow", "steelblue"))
trellis.par.set(regions=list(col=ff(100)))
m1 <- plot(M, aspect = 1, allTicks = FALSE)
## (2) Same data as (1), in 3 dimenstion
M@map[msMap(M) == 0] <- NA
m2 <- plot3D(M, rgl = FALSE)
## (3) The 2 MS1 and 10 interleaved MS2 spectra from above
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)
## 1
m3 <- plot3D(M2)
grid.arrange(m1, m2, m3, ncol = 3)
Plotting MS maps along retention time, MZ range and intensity.

Plotting MS maps along retention time, MZ range and intensity.

Below, we have animations build from extracting successive slices as above.

MS animation 1 MS animation 2

8.2 Visualising identification data

Annotated spectra and comparing spectra.

par(mfrow = c(1, 2))
itraqdata2 <- pickPeaks(itraqdata, verbose = FALSE)
s <- "SIGFEGDSIGR"
plot(itraqdata2[[14]], s, main = s)
plot(itraqdata2[[25]], itraqdata2[[28]], sequences = rep("IMIDLDGTENK", 2))
Annotating and comparing MS2 spectra.

Annotating and comparing MS2 spectra.

The annotation of spectra is obtained by simulating fragmentation of a peptide and matching observed peaks to fragments:

calculateFragments("SIGFEGDSIGR")
##            mz  ion type pos z         seq
## 1    88.03931   b1    b   1 1           S
## 2   201.12337   b2    b   2 1          SI
## 3   258.14483   b3    b   3 1         SIG
## 4   405.21324   b4    b   4 1        SIGF
## 5   534.25583   b5    b   5 1       SIGFE
## 6   591.27729   b6    b   6 1      SIGFEG
## 7   706.30423   b7    b   7 1     SIGFEGD
## 8   793.33626   b8    b   8 1    SIGFEGDS
## 9   906.42032   b9    b   9 1   SIGFEGDSI
## 10  963.44178  b10    b  10 1  SIGFEGDSIG
## 11 1119.54289  b11    b  11 1 SIGFEGDSIGR
## 12  175.11895   y1    y   1 1           R
## 13  232.14041   y2    y   2 1          GR
## 14  345.22447   y3    y   3 1         IGR
## 15  432.25650   y4    y   4 1        SIGR
## 16  547.28344   y5    y   5 1       DSIGR
##  [ reached getOption("max.print") -- omitted 20 rows ]

Visualising a pair of spectra means that we can access them, and that, in addition to plotting, we can manipulate them and perform computations. The two spectra corresponding to the IMIDLDGTENK peptide, for example have 22 common peaks, a correlation of 0.198 and a dot product of 0.21 (see ?compareSpectra for details).

9 Quantitative proteomics

There are a wide range of proteomics quantitation techniques that can broadly be classified as labelled vs. label-free, depending whether the features are labelled prior the MS acquisition and the MS level at which quantitation is inferred, namely MS1 or MS2.

Label-free Labelled
MS1 XIC SILAC, 15N
MS2 Counting iTRAQ, TMT

In terms of raw data quantitation, most efforts have been devoted to MS2-level quantitation. Label-free XIC quantitation has however been addressed in the frame of metabolomics data processing by the xcms infrastructure.

Below is a list of suggested packages for some common proteomics quantitation technologies:

  • Isobaric tagging (iTRAQ and TMT): MSnbase and isobar.
  • Label-free: xcms (metabolomics).
  • Counting: MSnbase and MSnID for peptide-spectrum matching confidence assessment.
  • N14N15 for heavy Nitrogen-labelled data.

Isobaric tagging

An MSnExp is converted to an MSnSet by the quantitation method. Below, we use the iTRAQ 4-plex isobaric tagging strategy (defined by the iTRAQ4 parameter; other tags are available: see ?ReporterIons) and the max method to calculate the use the maximum of the reporter peak for quantitation.

plot(msexp[[1]], full=TRUE, reporters = iTRAQ4)
MS2 spectrum and it’s iTRAQ4 reporter ions.

MS2 spectrum and it’s iTRAQ4 reporter ions.

msset <- quantify(msexp, method = "max", reporters = iTRAQ4)

The figure below give a schematics of an MSnSet instance and the relation between the assay data and the respective feature and sample metadata, accessible respectively with the exprs, fData and pData functions.

MSnSet structure

MSnSet structure

New columns can be added to the metadata slots.

exprs(msset)
##      iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
## X1.1   706555.7   685055.1   929016.1   668245.2
## X2.1   260663.7   212745.0   163782.8   239142.7
## X3.1  2213566.0  2069209.6  2204032.2  2331846.8
## X4.1   616043.4   705976.6   671828.8   666845.6
## X5.1  1736128.2  1787622.5  1795311.8  1825523.0
pData(msset)
##                  mz reporters
## iTRAQ4.114 114.1112    iTRAQ4
## iTRAQ4.115 115.1083    iTRAQ4
## iTRAQ4.116 116.1116    iTRAQ4
## iTRAQ4.117 117.1150    iTRAQ4
pData(msset)$groups <- rep(c("Treat", "Cond"), each = 2)
pData(msset)
##                  mz reporters groups
## iTRAQ4.114 114.1112    iTRAQ4  Treat
## iTRAQ4.115 115.1083    iTRAQ4  Treat
## iTRAQ4.116 116.1116    iTRAQ4   Cond
## iTRAQ4.117 117.1150    iTRAQ4   Cond

Another useful slot is processingData, accessed with processingData(.), that records all the processing that objects have undergone since their creation.

processingData(msset)
## - - - Processing information - - -
## Fast iTRAQ4 quantitation by max [Thu Jun 15 07:03:36 2017] 
##  MSnbase version: 2.3.3

See also The isobar package supports quantitation from centroided mgf peak lists or its own tab-separated files that can be generated from Mascot and Phenyx vendor files.

Spectral counting

Other MS2 quantitation methods available in quantify include the (normalised) spectral index SI and (normalised) spectral abundance factor SAF or simply a simple count method6 The code below is for illustration only - it doesn’t make much sense to perform any of these quantitations on such a multiplexed data.

exprs(si <- quantify(msexp, method = "SIn"))     
##         dummyiTRAQ.mzXML
## ECA0510      0.003588641
## ECA1028      0.001470129
exprs(saf <- quantify(msexp, method = "NSAF"))
##         dummyiTRAQ.mzXML
## ECA0510        0.6235828
## ECA1028        0.3764172

Note that spectra that have not been assigned any peptide (NA) or that match non-unique peptides (npsm > 1) are discarded in the counting process.

As shown above, the MSnID package enables to explore and assess the confidence of identification data using mzid files. A subset of all peptide-spectrum matches, that pass a specific false discovery rate threshold can them be converted to an MSnSet, where the number of peptide occurrences are used to populate the assay data.

10 Importing third-party quantitation data

The Proteomics Standard Initiative (PSI) mzTab file format is aimed at providing a simpler (than XML formats) and more accessible file format to the wider community. It is composed of a key-value metadata section and peptide/protein/small molecule tabular sections. These data can be imported with the readMzTabData function7 We specify version 0.9 (which generates the warning) to fit with the version of that file. For recent files, the version argument should be ignored to use the importer for the current file version 1.0..

mztf <- pxget(px, "F063721.dat-mztab.txt")
## Downloading 1 file
(mzt <- readMzTabData(mztf, what = "PEP", version = "0.9"))
## Warning: Version 0.9 is deprecated. Please see '?readMzTabData' and '?
## MzTab' for details.
## MSnSet (storageMode: lockedEnvironment)
## assayData: 1528 features, 6 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: sub[1] sub[2] ... sub[6] (6 total)
##   varLabels: abundance
##   varMetadata: labelDescription
## featureData
##   featureNames: 1 2 ... 1528 (1528 total)
##   fvarLabels: sequence accession ... uri (14 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
## mzTab read: Tue Jun 13 17:55:11 2017 
##  MSnbase version: 2.3.3

It is also possible to import arbitrary spreadsheets as MSnSet objects into R with the readMSnSet2 function. The main 2 arguments of the function are (1) a text-based spreadsheet and (2) column names of indices that identify the quantitation data. The latter can be queried with the getEcols function.

csv <- dir(system.file ("extdata" , package = "pRolocdata"),
           full.names = TRUE, pattern = "pr800866n_si_004-rep1.csv")
getEcols(csv, split = ",")
##  [1] "\"Protein ID\""              "\"FBgn\""                   
##  [3] "\"Flybase Symbol\""          "\"No. peptide IDs\""        
##  [5] "\"Mascot score\""            "\"No. peptides quantified\""
##  [7] "\"area 114\""                "\"area 115\""               
##  [9] "\"area 116\""                "\"area 117\""               
## [11] "\"PLS-DA classification\""   "\"Peptide sequence\""       
## [13] "\"Precursor ion mass\""      "\"Precursor ion charge\""   
## [15] "\"pd.2013\""                 "\"pd.markers\""
ecols <- 7:10
res <- readMSnSet2(csv, ecols)
head(exprs(res))
##   area.114 area.115 area.116 area.117
## 1 0.379000 0.281000 0.225000 0.114000
## 2 0.420000 0.209667 0.206111 0.163889
## 3 0.187333 0.167333 0.169667 0.476000
## 4 0.247500 0.253000 0.320000 0.179000
## 5 0.216000 0.183000 0.342000 0.259000
## 6 0.072000 0.212333 0.573000 0.142667
head(fData(res))
##   Protein.ID        FBgn Flybase.Symbol No..peptide.IDs Mascot.score
## 1    CG10060 FBgn0001104    G-ialpha65A               3       179.86
## 2    CG10067 FBgn0000044         Act57B               5       222.40
## 3    CG10077 FBgn0035720        CG10077               5       219.65
## 4    CG10079 FBgn0003731           Egfr               2        86.39
## 5    CG10106 FBgn0029506        Tsp42Ee               1        52.10
## 6    CG10130 FBgn0010638      Sec61beta               2        79.90
##   No..peptides.quantified PLS.DA.classification Peptide.sequence
## 1                       1                    PM                 
## 2                       9                    PM                 
## 3                       3                                       
## 4                       2                    PM                 
## 5                       1                              GGVFDTIQK
## 6                       3              ER/Golgi                 
##   Precursor.ion.mass Precursor.ion.charge     pd.2013 pd.markers
## 1                                                  PM    unknown
## 2                                                  PM    unknown
## 3                                             unknown    unknown
## 4                                                  PM    unknown
## 5            626.887                    2 Phenotype 1    unknown
## 6                                            ER/Golgi         ER

11 Data processing and analysis

11.1 Raw data processing

For raw data processing look at MSnbase’s clean, smooth, pickPeaks, removePeaks and trimMz for MSnExp and spectra processing methods.

As an illustration, we show the pickPeaks function on the itraqdata data.

library("ggplot2") ## for coord_cartesian
data(itraqdata)
plot(itraqdata[[10]], full = TRUE) +
    coord_cartesian(xlim = c(915, 925))
Peak picking: profile mode.

Peak picking: profile mode.

itraqdata2 <- pickPeaks(itraqdata)
plot(itraqdata2[[10]], full = TRUE, w1 = 0.05) +
    coord_cartesian(xlim = c(915, 925))
Peak picking: centroided.

Peak picking: centroided.

The MALDIquant and xcms packages also features a wide range of raw data processing methods on their own ad hoc data instance types.

11.2 Processing and normalisation

Each different types of quantitative data will require their own pre-processing and normalisation steps. Both isobar and MSnbase allow to correct for isobaric tag impurities normalise the quantitative data.

data(itraqdata)
qnt <- quantify(itraqdata, method = "trap", reporters = iTRAQ4)
impurities <- matrix(c(0.929, 0.059, 0.002, 0.000,
                       0.020, 0.923, 0.056, 0.001,
                       0.000, 0.030, 0.924, 0.045,
                       0.000, 0.001, 0.040, 0.923),
                     nrow = 4, byrow = TRUE)
## or, using makeImpuritiesMatrix()
## impurities <- makeImpuritiesMatrix(4)
qnt <- purityCorrect(qnt, impurities)
processingData(qnt)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
## Updated from version 0.3.0 to 0.3.1 [Fri Jul  8 20:23:25 2016] 
## iTRAQ4 quantification by trapezoidation: Thu Jun 15 07:03:41 2017 
## Purity corrected: Thu Jun 15 07:03:41 2017 
##  MSnbase version: 1.1.22

Various normalisation methods can be applied the MSnSet instances using the normalise method: variance stabilisation (vsn), quantile (quantiles), median or mean centring (center.media or center.mean), …

qnt <- normalise(qnt, "quantiles")
processingData(qnt)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
## Updated from version 0.3.0 to 0.3.1 [Fri Jul  8 20:23:25 2016] 
## iTRAQ4 quantification by trapezoidation: Thu Jun 15 07:03:41 2017 
## Purity corrected: Thu Jun 15 07:03:41 2017 
## Normalised (quantiles): Thu Jun 15 07:03:41 2017 
##  MSnbase version: 1.1.22

The combineFeatures method combines spectra/peptides quantitation values into protein data. The grouping is defined by the groupBy parameter, which is generally taken from the feature metadata (protein accessions, for example).

gb <- fData(qnt)$ProteinDescription
prt <- combineFeatures(qnt, groupBy = gb, fun = "median")
processingData(prt)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
## Updated from version 0.3.0 to 0.3.1 [Fri Jul  8 20:23:25 2016] 
## iTRAQ4 quantification by trapezoidation: Thu Jun 15 07:03:41 2017 
## Purity corrected: Thu Jun 15 07:03:41 2017 
## Normalised (quantiles): Thu Jun 15 07:03:41 2017 
## Combined 55 features into 39 using median: Thu Jun 15 07:03:41 2017 
##  MSnbase version: 2.3.3

Missing values

Finally, proteomics data analysis is generally hampered by missing values. Missing data imputation is a sensitive operation whose success will be guided by many factors, such as degree and (non-)random nature of the missingness.

Below, we load an MSnSet with missing values, count the number missing and non-missing values.

data(naset)
table(is.na(naset))
## 
## FALSE  TRUE 
## 10254   770

The naplot figure will reorder cells within the data matrix so that the experiments and features with many missing values will be grouped towards the top and right of the heatmap, and barplots at the top and right summarise the number of missing values in the respective samples (column) and rows (rows).

naplot(naset)
Overview of missing values.

Overview of missing values.

The importance of missing values in a dataset will depend on the quantitation technology employed. Label-free quantitation in particular can suffer from a very high number of missing values.

Missing value in MSnSet instances can be filtered out with the filterNA functions. By default, it removes features that contain at least NA value.

## remove features with missing values
tmp <- filterNA(naset)
processingData(tmp)
## - - - Processing information - - -
## Subset [689,16][301,16] Thu Jun 15 07:03:41 2017 
## Removed features with more than 0 NAs: Thu Jun 15 07:03:41 2017 
## Dropped featureData's levels Thu Jun 15 07:03:41 2017 
##  MSnbase version: 1.15.6

It is of course possible to impute missing values (?impute). This is however not a straightforward thing, as is likely to dramatically fail when a high proportion of data is missing (10s of %)8 Note that when using limma for instance, downstream analyses can handle missing values. Still, it is recommended to explore missingness as part of the exploratory data analysis.. But also, there are two types of mechanisms resulting in missing values in LC/MSMS experiments.

  • Missing values resulting from absence of detection of a feature, despite ions being present at detectable concentrations. For example in the case of ion suppression or as a result from the stochastic, data-dependent nature of the MS acquisition method. These missing value are expected to be randomly distributed in the data and are defined as missing at random (MAR) or missing completely at random (MCAR).

  • Biologically relevant missing values, resulting from the absence or the low abundance of ions (below the limit of detection of the instrument). These missing values are not expected to be randomly distributed in the data and are defined as missing not at random (MNAR).

Random and non-random missing values.

Random and non-random missing values.

Different imputation methods are more appropriate to different classes of missing values (as documented in this paper). Values missing at random, and those missing not at random should be imputed with different methods.

Root-mean-square error (RMSE) observations standard deviation ratio (RSR), KNN and MinDet imputation. Lower (blue) is better. (See here for details)

Root-mean-square error (RMSE) observations standard deviation ratio (RSR), KNN and MinDet imputation. Lower (blue) is better. (See here for details)

Generally, it is recommended to use hot deck methods (nearest neighbour (left), maximum likelihood, …) when data are missing at random.Conversely, MNAR features should ideally be imputed with a left-censor (minimum value (right), but not zero, …) method.

## impute missing values using knn imputation
tmp <- impute(naset, method = "knn")
## Warning in knnimp(x, k, maxmiss = rowmax, maxp = maxp): 12 rows with more than 50 % entries missing;
##  mean imputation used for these rows
processingData(tmp)
## - - - Processing information - - -
## Data imputation using knn Thu Jun 15 07:03:42 2017 
##   Using default parameters 
##  MSnbase version: 1.15.6

There are various methods to perform data imputation, as described in ?impute.

12 Statistical analysis

R in general and Bioconductor in particular are well suited for the statistical analysis of data of quantitative proteomics data. Several packages provide dedicated resources for proteomics data:

  • MSstats: A set of tools for statistical relative protein significance analysis in Data dependent (DDA), SRM and Data independent acquisition (DIA) experiments. Data stored in data.frame or MSnSet objects can be used as input.

  • msmsTests: Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. All can be readily applied on MSnSet instances produced, for example by MSnID.

  • isobar also provides dedicated infrastructure for the statistical analysis of isobaric data.

13 Machine learning

The MLInterfaces package provides a unified interface to a wide range of machine learning algorithms. Initially developed for microarray and ExpressionSet instances, the pRoloc package enables application of these algorithms to MSnSet data.

13.1 Dimensionality reduction

Dimensionality reduction is very frequently used to summarise high-dimensional data. Below we will use principal component analysis (PCA), but other methods can be applied. Below, we will use the plot2D function from the pRoloc package9 While originally developed for the analysis of spatial/organelle proteomics data in mind, it is applicable many use cases., that will extract the expression values in the assay data, perform dimensionality reduction, an produce the scatter plot.

Let’s first use plot2D to visualise the pattern in 20 protein quantitation values (initial 20 dimensional data). Here, we use an example from spatial proteomics, where the quantitative protein profiles reflect the proteins sub-cellular localisation (from Christoforou et al, 2016, see also Breckels et al, 2016 for more data analysis background). We will use the known localisation of some proteins (marker proteins) to annotate the plot (using the fcol argument).

library("pRoloc")
library("pRolocdata")
data(hyperLOPIT2015)
plot2D(hyperLOPIT2015, fcol = "markers")
addLegend(hyperLOPIT2015, fcol = "markers", cex = .7)
PCA plot for protein sub-cellular localisation.

PCA plot for protein sub-cellular localisation.

In other cases, we want to visualise the relation of samples. plot2D uses the rows of the data to perform dimensionality reduction. To use the columns, we just need to transpose the MSnSet. By doing so, the pData becomes the fData and vice versa.

Let’s use a time-course experiment on stem cells (Mulvey et al. 2015). Below, we use the times (time points) variable to set colours and rep (replicate numbers) to set the plotting characters.

data(mulvey2015)
head(pData(mulvey2015))
##           rep times cond
## rep1_0hr    1     1    1
## rep1_16hr   1     2    1
## rep1_24hr   1     3    1
## rep1_48hr   1     4    1
## rep1_72hr   1     5    1
## rep1_XEN    1     6    1
plot2D(t(mulvey2015),  fcol = "times", fpch = "rep", cex = 2)
addLegend(t(mulvey2015),  fcol = "times")

PCA plots for sample in a time-course experiment. ## Classification

The example below uses knn with the 5 closest neighbours and the MLInterfaces package as an illustration to classify proteins of unknown sub-cellular localisation to one of 9 possible organelles.

library("MLInterfaces")
library("pRolocdata")
data(dunkley2006)
traininds <- which(fData(dunkley2006)$markers != "unknown")
ans <- MLearn(markers ~ ., data = t(dunkley2006), knnI(k = 5), traininds)
ans
## MLInterfaces classification output container
## The call was:
## MLearn(formula = markers ~ ., data = t(dunkley2006), .method = knnI(k = 5), 
##     trainInd = traininds)
## Predicted outcome distribution for test set:
## 
##      ER lumen   ER membrane         Golgi Mitochondrion       Plastid 
##             5           140            67            51            29 
##            PM      Ribosome           TGN       vacuole 
##            89            31             6            10 
## Summary of scores on test set (use testScores() method for details):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  1.0000  1.0000  0.9332  1.0000  1.0000

13.2 Clustering

A wide range of classification and clustering algorithms are also available, as described in the ?MLearn documentation page, used below.

kcl <- MLearn( ~ ., data = dunkley2006, kmeansI, centers = 12)
kcl
## clusteringOutput: partition table
## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
##  55  28  29  18  60  30  44 145  78 116  33  53 
## The call that created this object was:
## MLearn(formula = ~., data = dunkley2006, .method = kmeansI, centers = 12)
plot(kcl, exprs(dunkley2006))
Kmeans clustering using r Biocpkg('MLInterfaces') with an MSnSet object.

Kmeans clustering using r Biocpkg('MLInterfaces') with an MSnSet object.

14 Annotation

All the Bioconductor annotation infrastructure, such as biomaRt, GO.db, organism specific annotations, .. are directly relevant to the analysis of proteomics data. A total of 191 ontologies, including some proteomics-centred annotations such as the PSI Mass Spectrometry Ontology, Molecular Interaction (PSI MI 2.5) or Protein Modifications are available through the rols

library("rols")
res <- OlsSearch(q = "ESI", ontology = "MS", exact = TRUE)
res
## Object of class 'OlsSearch':
##   ontolgy: MS 
##   query: ESI 
##   requested: 20 (out of 1)
##   response(s): 0

There is a single exact match (default is to retrieve 20 results), that can be retrieved and coerced to a Terms or data.frame object with

res <- olsSearch(res)
as(res, "Terms")
## Object of class 'Terms' with 1 entries
##  From the MS ontology
## MS:1000073
as(res, "data.frame")
##                                             id
## 1 ms:http://purl.obolibrary.org/obo/MS_1000073
##                                         iri short_form     obo_id
## 1 http://purl.obolibrary.org/obo/MS_1000073 MS_1000073 MS:1000073
##                     label
## 1 electrospray ionization
##                                                                                                                                                                                                                                                                                                                                                                                                                                                  description
## 1 A process in which ionized species in the gas phase are produced from an analyte-containing solution via highly charged fine droplets, by means of spraying the solution from a narrow-bore needle tip at atmospheric pressure in the presence of a high electric field. When a pressurized gas is used to aid in the formation of a stable spray, the term pneumatically assisted electrospray ionization is used. The term ion spray is not recommended.
##   ontology_name ontology_prefix  type is_defining_ontology
## 1            ms              MS class                 TRUE

Data from the Human Protein Atlas is available via the hpar package.

15 More information

15.2 Other relevant packages/pipelines

  • Analysis of post translational modification with isobar.
  • Processing and analysis or isobaric tagging mass spectrometry with isobar and MSnbase.
  • Analysis of spatial proteomics data and, more generally, supervised, semi-supervised learning, and transfer learning using the pRoloc package.
  • Analysis of MALDI data with the MALDIquant package.
  • Access to the Proteomics Standard Initiative Common QUery InterfaCe with the PSICQUIC package.
  • Cardinal: A mass spectrometry imaging toolbox for statistical analysis.
  • MSnID: Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications.
  • protViz: Visualising and Analysing Mass Spectrometry Related Data in Proteomics
  • aLFQ: Estimating Absolute Protein Quantities from Label-Free LC-MS/MS Proteomics Data.
  • protiq: Protein (identification and) quantification based on peptide evidence.
  • MSstats: Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments

Data Independent Acquisition

  • Analysis of label-free data from a Synapt G2 (including ion mobility) with synapter.
  • SWATH2stats: Transform and Filter SWATH Data for Statistical Packages and
  • specL: Prepare Peptide Spectrum Matches for Use in Targeted Proteomics
  • SwathXtend: SWATH extended library generation and statistical data analysis

16 Session Information

sessionInfo()
## R Under development (unstable) (2017-02-25 r72256)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/atlas-base/libf77blas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] ggplot2_2.2.1         dplyr_0.7.0           AnnotationHub_2.9.5  
##  [4] knitr_1.16            hpar_1.19.0           rols_2.5.0           
##  [7] MSGFplus_1.11.0       pRolocdata_1.15.0     rpx_1.13.1           
## [10] MSnID_1.11.0          mzID_1.15.0           pRoloc_1.17.3        
## [13] MLInterfaces_1.57.0   cluster_2.0.6         annotate_1.55.0      
## [16] XML_3.98-1.7          AnnotationDbi_1.39.1  IRanges_2.11.5       
## [19] S4Vectors_0.15.4      gridExtra_2.2.1       lattice_0.20-35      
## [22] RforProteomics_1.15.0 BiocInstaller_1.27.2  MSnbase_2.3.3        
## [25] ProtGenerics_1.9.0    BiocParallel_1.11.2   mzR_2.11.3           
## [28] Rcpp_0.12.11          Biobase_2.37.2        BiocGenerics_0.23.0  
## [31] BiocStyle_2.5.2      
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.0               plyr_1.8.4                   
##   [3] lazyeval_0.2.0                GSEABase_1.39.0              
##   [5] splines_3.4.0                 ggvis_0.4.3                  
##   [7] digest_0.6.12                 foreach_1.4.3                
##   [9] htmltools_0.3.6               viridis_0.4.0                
##  [11] gdata_2.18.0                  magrittr_1.5                 
##  [13] memoise_1.1.0                 doParallel_1.0.10            
##  [15] sfsmisc_1.1-1                 limma_3.33.2                 
##  [17] rda_1.0.2-2                   R.utils_2.5.0                
##  [19] lpSolve_5.6.13                prettyunits_1.0.2            
##  [21] colorspace_1.3-2              RCurl_1.95-4.8               
##  [23] jsonlite_1.5                  hexbin_1.27.1                
##  [25] graph_1.55.0                  genefilter_1.59.0            
##  [27] lme4_1.1-13                   impute_1.51.0                
##  [29] survival_2.41-3               iterators_1.0.8              
##  [31] glue_1.0.0                    gtable_0.2.0                 
##  [33] zlibbioc_1.23.0               MatrixModels_0.4-1           
##  [35] R.cache_0.12.0                car_2.1-4                    
##  [37] kernlab_0.9-25                prabclus_2.2-6               
##  [39] DEoptimR_1.0-8                SparseM_1.77                 
##  [41] scales_0.4.1                  vsn_3.45.0                   
##  [43] mvtnorm_1.0-6                 DBI_0.6-1                    
##  [45] viridisLite_0.2.0             xtable_1.8-2                 
##  [47] progress_1.1.2                proxy_0.4-17                 
##  [49] mclust_5.3                    preprocessCore_1.39.0        
##  [51] httr_1.2.1                    sampling_2.8                 
##  [53] htmlwidgets_0.8               threejs_0.2.2                
##  [55] gplots_3.0.1                  FNN_1.1                      
##  [57] RColorBrewer_1.1-2            fpc_2.1-10                   
##  [59] modeltools_0.2-21             pkgconfig_2.0.1              
##  [61] R.methodsS3_1.7.1             flexmix_2.3-14               
##  [63] nnet_7.3-12                   caret_6.0-76                 
##  [65] labeling_0.3                  rlang_0.1.1.9000             
##  [67] reshape2_1.4.2                munsell_0.4.3                
##  [69] mlbench_2.1-1                 biocViews_1.45.0             
##  [71] tools_3.4.0                   msdata_0.17.0                
##  [73] RSQLite_1.1-2                 pls_2.6-0                    
##  [75] evaluate_0.10                 stringr_1.2.0                
##  [77] yaml_2.1.14                   ModelMetrics_1.1.0           
##  [79] robustbase_0.92-7             caTools_1.17.1               
##  [81] randomForest_4.6-12           dendextend_1.5.2             
##  [83] RBGL_1.53.0                   nlme_3.1-131                 
##  [85] quantreg_5.33                 whisker_0.3-2                
##  [87] mime_0.5                      R.oo_1.21.0                  
##  [89] xml2_1.1.1                    biomaRt_2.33.2               
##  [91] compiler_3.4.0                pbkrtest_0.4-7               
##  [93] curl_2.6                      interactiveDisplayBase_1.15.0
##  [95] e1071_1.6-8                   affyio_1.47.0                
##  [97] tibble_1.3.3                  stringi_1.1.5                
##  [99] highr_0.6                     trimcluster_0.1-2            
## [101] Matrix_1.2-10                 nloptr_1.0.4                 
## [103] gbm_2.1.3                     RUnit_0.4.31                 
## [105] MALDIquant_1.16.2             data.table_1.10.4            
## [107] bitops_1.0-6                  httpuv_1.3.3                 
## [109] R6_2.2.1                      pcaMethods_1.69.0            
## [111] affy_1.55.0                   hwriter_1.3.2                
## [113] KernSmooth_2.23-15            gridSVG_1.5-1                
## [115] codetools_0.2-15              MASS_7.3-47                  
## [117] gtools_3.5.0                  assertthat_0.2.0             
## [119] interactiveDisplay_1.15.0     Category_2.43.0              
## [121] rprojroot_1.2                 diptest_0.75-7               
## [123] mgcv_1.8-17                   grid_3.4.0                   
## [125] rpart_4.1-11                  class_7.3-14                 
## [127] minqa_1.2.4                   rmarkdown_1.5                
## [129] Rtsne_0.13                    shiny_1.0.3                  
## [131] base64enc_0.1-3