This vignette describes how to analyse mass-spectrometry based differential localisation experiments using the BANDLE method (Crook et al. 2021). Data should be stored as lists of MSnSet
s. There is also features for quality control and visualisation of results. Please see other vignettes for convergence and other methodology.
bandle 1.8.0
In this vignette we use a real-life biological use-case to demonstrate how to analyse mass-spectrometry based proteomics data using the Bayesian ANalysis of Differential Localisation Experiments (BANDLE) method.
As mentioned in “Vignette 1: Getting Started with BANDLE” data from mass
spectrometry based proteomics methods most commonly yield a matrix of
measurements where we have proteins/peptides/peptide spectrum matches
(PSMs) along the rows, and samples/fractions along the columns. To use bandle
the data must be stored as a MSnSet
, as implemented in the Bioconductor
MSnbase package. Please see the relevant vignettes in
MSnbase for constructing these data containers.
The data used in this vignette has been published in Mulvey et al. (2021) and is currently
stored as MSnSet
instances in the the pRolocdata package. We
will load it in the next section.
In this workflow we analyse the data produced by Mulvey et al. (2021). In this experiment triplicate hyperLOPIT experiments (Mulvey et al. 2017) were conducted on THP-1 human leukaemia cells where the samples were analysed and collected (1) when cells were unstimulated and then (2) following 12 hours stimulation with LPS (12h-LPS).
In the following code chunk we load 4 of the datasets from the study: 2 replicates of the unstimulated and 2 replicates of the 12h-LPS stimulated samples. Please note to adhere to Bioconductor vignette build times we only load 2 of the 3 replicates for each condition to demonstrate the BANDLE workflow.
library("pRolocdata")
data("thpLOPIT_unstimulated_rep1_mulvey2021")
data("thpLOPIT_unstimulated_rep3_mulvey2021")
data("thpLOPIT_lps_rep1_mulvey2021")
data("thpLOPIT_lps_rep3_mulvey2021")
By typing the names of the datasets we get a MSnSet
data summary. For
example,
thpLOPIT_unstimulated_rep1_mulvey2021
## MSnSet (storageMode: lockedEnvironment)
## assayData: 5107 features, 20 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: unstim_rep1_set1_126_cyto unstim_rep1_set1_127N_F1.4 ...
## unstim_rep1_set2_131_F24 (20 total)
## varLabels: Tag Treatment ... Fraction (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: A0AVT1 A0FGR8-2 ... Q9Y6Y8 (5107 total)
## fvarLabels: Checked_unst.r1.s1 Confidence_unst.r1.s1 ... markers (107
## total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
## - - - Processing information - - -
## Loaded on Tue Jan 12 14:46:48 2021.
## Normalised to sum of intensities.
## MSnbase version: 2.14.2
thpLOPIT_lps_rep1_mulvey2021
## MSnSet (storageMode: lockedEnvironment)
## assayData: 4879 features, 20 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: lps_rep1_set1_126_cyto lps_rep1_set1_127N_F1.4 ...
## lps_rep1_set2_131_F25 (20 total)
## varLabels: Tag Treatment ... Fraction (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: A0A0B4J2F0 A0AVT1 ... Q9Y6Y8 (4879 total)
## fvarLabels: Checked_lps.r1.s1 Confidence_lps.r1.s1 ... markers (107
## total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
## - - - Processing information - - -
## Loaded on Tue Jan 12 14:46:57 2021.
## Normalised to sum of intensities.
## MSnbase version: 2.14.2
We see that the datasets thpLOPIT_unstimulated_rep1_mulvey2021
and
thpLOPIT_lps_rep1_mulvey2021
contain 5107 and 4879 proteins respectively,
across 20 TMT channels. The data is accessed through different slots of the
MSnSet
(see str(thpLOPIT_unstimulated_rep1_mulvey2021)
for all available
slots). The 3 main slots which are used most frequently are those that contain
the quantitation data, the features i.e. PSM/peptide/protein information and the
sample information, and these can be accessed using the functions exprs
,
fData
, and pData
, respectively.
First, let us load the bandle
package along with some other R packages needed
for visualisation and data manipulation,
library("bandle")
library("pheatmap")
library("viridis")
library("dplyr")
library("ggplot2")
To run bandle
there are a few minimal requirements that the data must fulfill.
list
of MSnSet
instancesIf we use the dim
function we see that the datasets we have loaded have the
same number of channels but a different number of proteins per experiment.
dim(thpLOPIT_unstimulated_rep1_mulvey2021)
## [1] 5107 20
dim(thpLOPIT_unstimulated_rep3_mulvey2021)
## [1] 5733 20
dim(thpLOPIT_lps_rep1_mulvey2021)
## [1] 4879 20
dim(thpLOPIT_lps_rep3_mulvey2021)
## [1] 5848 20
We use the function commonFeatureNames
to extract proteins that are common
across all replicates. This function has a nice side effect which is that it
also wraps the data into a list
, ready for input into bandle
.
thplopit <- commonFeatureNames(c(thpLOPIT_unstimulated_rep1_mulvey2021, ## unstimulated rep
thpLOPIT_unstimulated_rep3_mulvey2021, ## unstimulated rep
thpLOPIT_lps_rep1_mulvey2021, ## 12h-LPS rep
thpLOPIT_lps_rep3_mulvey2021)) ## 12h-LPS rep
## 3727 features in common
We now have our list of MSnSet
s ready for bandle
with 3727 proteins common
across all 4 replicates/conditions.
thplopit
## Instance of class 'MSnSetList' containig 4 objects.
We can visualise the data using the plot2D
function from pRoloc
## create a character vector of title names for the plots
plot_id <- c("Unstimulated 1st rep", "Unstimulated 2nd rep",
"12h-LPS 1st rep", "12h-LPS 2nd rep")
## Let's set the stock colours of the classes to plot to be transparent
setStockcol(NULL)
setStockcol(paste0(getStockcol(), "90"))
## plot the data
par(mfrow = c(2,2))
for (i in seq(thplopit))
plot2D(thplopit[[i]], main = plot_id[i])
addLegend(thplopit[[4]], where = "topleft", cex = .75)