readSCP
scp 1.16.0
scp data framework

Our data structure relies on two curated data classes: QFeatures
(Gatto and Vanderaa 2023) and SingleCellExperiment
(Amezquita et al. 2020). QFeatures is dedicated to the manipulation
and processing of MS-based quantitative data. It explicitly records
the successive processing steps, which allows users to navigate up and
down the different MS levels. SingleCellExperiment is an efficient data
container that serves as an interface to state-of-the-art methods and
algorithms for single-cell data. Our framework combines the two
classes to benefit from their respective advantages.
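As a rough illustration of how the two classes fit together, the sketch below builds a tiny QFeatures object by hand, with a SingleCellExperiment as its single set. The matrix values, the feature and sample names, the set name psms and the SampleType annotation are all invented for illustration. In a real analysis the object would be built from the input table rather than by hand.

```r
library(QFeatures)
library(SingleCellExperiment)

# Toy quantification matrix: 3 features (rows) x 2 samples (columns).
# All names and values are invented for illustration.
m <- matrix(
  c(10, 12, 8, 9, 11, 7),
  nrow = 3,
  dimnames = list(c("psm1", "psm2", "psm3"), c("cellA", "cellB"))
)

# Wrap the matrix in a SingleCellExperiment (this is one set).
sce <- SingleCellExperiment(assays = list(intensity = m))

# Minimal sample annotation, one row per sample.
cd <- DataFrame(SampleType = c("cell", "cell"), row.names = colnames(m))

# Store the set in a QFeatures object under the hypothetical name "psms".
qf <- QFeatures(list(psms = sce), colData = cd)
qf
```

The set can be retrieved with qf[["psms"]] and remains a SingleCellExperiment, so methods written for that class can still be applied to it.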
Because mass spectrometry (MS)-based single-cell proteomics (SCP) only
captures the proteome of between one and a few tens of single cells in
a single run, the data are usually acquired across many MS batches.
Therefore, the data for each run should conceptually be stored in its
own container, which we call a set. The expected input for working with
the scp package is quantification data of peptide-to-spectrum matches
(PSMs). These data can then be processed to reconstruct peptide and
protein data. The links between related features across different sets
are stored to facilitate manipulation and visualization of PSM, peptide
and protein data. This is conceptually shown below.
The main input table required for starting an analysis with scp is
called the assayData.
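To make the notion of a set concrete, here is a minimal base R sketch that splits a toy PSM table into one data frame per MS run. The table, its column names (run, peptide, intensity) and its values are invented for illustration. It only mimics the one-container-per-run idea and is not meant to reflect how scp implements it.

```r
# Toy PSM table: column names and values are invented for illustration.
psms <- data.frame(
  run = c("run1", "run1", "run2", "run2", "run2"),
  peptide = c("PEPA", "PEPB", "PEPA", "PEPC", "PEPD"),
  intensity = c(10.2, 8.1, 9.7, 5.5, 7.3)
)

# One data frame ("set") per MS run.
sets <- split(psms, psms$run)

# Names of the sets and number of PSMs in each.
names(sets)
vapply(sets, nrow, integer(1))
```

Keeping each run in its own container means the same peptide can appear in several sets (PEPA in this toy table), which is why the links between related features need to be tracked.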
assayData table

The assayData table is generated after the identification and
quantification of the MS spectra by pre-processing software such as
MaxQuant, ProteomeDiscoverer or MSFragger (the list of available
software is much longer). As an example, we will use a data table
generated by MaxQuant. The table is available from the scp package and
is called mqScpData (for MaxQuant-generated SCP data).
library(scp)
data("mqScpData")
dim(mqScpData)
#> [1] 1361 149
In this toy example, there are 1361 rows corresponding to features (quantified PSMs) and 149 columns corresponding to different data fields recorded by MaxQuant during the processing of the MS spectra. There are three types of columns:
- Quantification columns (quantCols): 1 to n (depending on technology)
- Run identifier column (runCol): e.g. file name
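The column types can be explored directly in the example table. The sketch below assumes that the quantification columns follow the MaxQuant naming Reporter.intensity.<n> and that the run identifier is stored in the Raw.file column. Both names are assumptions about this particular MaxQuant output, and other software will use different column names.

```r
library(scp)
data("mqScpData")

# Quantification columns: assumed to be named Reporter.intensity.<n>,
# one column per channel.
quantCols <- grep("^Reporter\\.intensity\\.\\d+$", colnames(mqScpData),
                  value = TRUE)
quantCols

# Run identifier column: assumed to be the raw file name.
runCol <- "Raw.file"

# Number of PSMs recorded for each run.
table(mqScpData[[runCol]])

# Peek at the quantification values of the first few PSMs.
head(mqScpData[, quantCols])
```

Anchoring the pattern with ^ and $ keeps related columns whose names merely contain the same prefix out of the selection.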