1 The scp package

The scp package is used to process and analyse mass spectrometry-based single cell proteomics (SCP) data. It relies on the QFeatures (Gatto (2020)) package to manage and process SingleCellExperiment (Lun and Risso (2020)) objects.

`scp` relies on `SingleCellExperiment` and `QFeatures` objects.

(#fig:scp_framework)scp relies on SingleCellExperiment and QFeatures objects.

This vignette will guide you through the common steps of mass spectrometry-based single-cell proteomics data analysis. To start, we load the scp package.


We also load ggplot2, magrittr and dplyr for convenient data manipulation and plotting.


2 Read in SCP data

The workflow starts with reading in the tabular quantification data generated by, for example, MaxQuant (Cox and Mann (2008)). We created a small example data by subsetting the MaxQuant table provided in the SCoPE2 preprint (Specht et al. (2019)). The mqScpData table is a typical example of what you would get after reading in a CSV file using read.csv or read.table. See ?mqScpData for more information about the table content.

#> [1] 1197  139
mqScpData[1:10, 1:5]
#>                Sequence Length Modifications      Modified.sequence
#> 2            AANLNSIIHR     10    Unmodified           _AANLNSIIHR_
#> 3              AARATGLR      8    Unmodified             _AARATGLR_
#> 4             AAVLLEQER      9    Unmodified            _AAVLLEQER_
#> 5           AAYEAELGDAR     11    Unmodified          _AAYEAELGDAR_
#> 6               AFEAVDK      7    Unmodified              _AFEAVDK_
#> 7               AGESWDK      7    Unmodified              _AGESWDK_
#> 8              AGLYGLPR      8    Unmodified             _AGLYGLPR_
#> 9           AGQAVDDFIEK     11    Unmodified          _AGQAVDDFIEK_
#> 10          AGQAVDDFIEK     11    Unmodified          _AGQAVDDFIEK_
#>    Deamidation..N..Probabilities
#> 1                           <NA>
#> 2                           <NA>
#> 3                           <NA>
#> 4                           <NA>
#> 5                           <NA>
#> 6                           <NA>
#> 7                           <NA>
#> 8                           <NA>
#> 9                           <NA>
#> 10                          <NA>

In order to convert this tabular data to a scp-compatible QFeatures object, we need to provide a metadata table where rows contain sample information and columns must contain at least:

  • The name of the batch the sample was acquired in
  • The name of the channel the sample was acquired in

Any additional information about the samples will be stored in the colData.

We provide an example of such a data frame. It was formatted from the annotation table provided in the SCoPE2 preprint. See ?sampleAnnotation for more information about the table content.

#>                              Set Channel SampleType lcbatch sortday digest
#> 1          190222S_LCA9_X_FP94BM     RI1    Carrier    LCA9      s8      N
#> 2          190222S_LCA9_X_FP94BM     RI2  Reference    LCA9      s8      N
#> 3          190222S_LCA9_X_FP94BM     RI3     Unused    LCA9      s8      N
#> 4          190222S_LCA9_X_FP94BM     RI4   Monocyte    LCA9      s8      N
#> 5          190222S_LCA9_X_FP94BM     RI5      Blank    LCA9      s8      N
#> 6          190222S_LCA9_X_FP94BM     RI6   Monocyte    LCA9      s8      N
#> 7          190222S_LCA9_X_FP94BM     RI7 Macrophage    LCA9      s8      N
#> 8          190222S_LCA9_X_FP94BM     RI8 Macrophage    LCA9      s8      N
#> 9          190222S_LCA9_X_FP94BM     RI9 Macrophage    LCA9      s8      N
#> 10         190222S_LCA9_X_FP94BM    RI10 Macrophage    LCA9      s8      N
#> 11         190222S_LCA9_X_FP94BM    RI11 Macrophage    LCA9      s8      N
#> 12         190222S_LCA9_X_FP94BM    RI12     Unused    LCA9      s8      N
#> 13         190222S_LCA9_X_FP94BM    RI13     Unused    LCA9      s8      N
#> 14         190222S_LCA9_X_FP94BM    RI14     Unused    LCA9      s8      N
#> 15         190222S_LCA9_X_FP94BM    RI15     Unused    LCA9      s8      N
#> 16         190222S_LCA9_X_FP94BM    RI16     Unused    LCA9      s8      N
#> 17        190321S_LCA10_X_FP97AG     RI1    Carrier   LCA10      s8      Q
#> 18        190321S_LCA10_X_FP97AG     RI2  Reference   LCA10      s8      Q
#> 19        190321S_LCA10_X_FP97AG     RI3     Unused   LCA10      s8      Q
#> 20        190321S_LCA10_X_FP97AG     RI4 Macrophage   LCA10      s8      Q
#> 21        190321S_LCA10_X_FP97AG     RI5   Monocyte   LCA10      s8      Q
#> 22        190321S_LCA10_X_FP97AG     RI6 Macrophage   LCA10      s8      Q
#> 23        190321S_LCA10_X_FP97AG     RI7 Macrophage   LCA10      s8      Q
#> 24        190321S_LCA10_X_FP97AG     RI8 Macrophage   LCA10      s8      Q
#> 25        190321S_LCA10_X_FP97AG     RI9 Macrophage   LCA10      s8      Q
#> 26        190321S_LCA10_X_FP97AG    RI10 Macrophage   LCA10      s8      Q
#> 27        190321S_LCA10_X_FP97AG    RI11 Macrophage   LCA10      s8      Q
#> 28        190321S_LCA10_X_FP97AG    RI12     Unused   LCA10      s8      Q
#> 29        190321S_LCA10_X_FP97AG    RI13     Unused   LCA10      s8      Q
#> 30        190321S_LCA10_X_FP97AG    RI14     Unused   LCA10      s8      Q
#> 31        190321S_LCA10_X_FP97AG    RI15     Unused   LCA10      s8      Q
#> 32        190321S_LCA10_X_FP97AG    RI16     Unused   LCA10      s8      Q
#> 33  190914S_LCB3_X_16plex_Set_21     RI1    Carrier    LCB3      s9      R
#> 34  190914S_LCB3_X_16plex_Set_21     RI2  Reference    LCB3      s9      R
#> 35  190914S_LCB3_X_16plex_Set_21     RI3     Unused    LCB3      s9      R
#> 36  190914S_LCB3_X_16plex_Set_21     RI4     Unused    LCB3      s9      R
#> 37  190914S_LCB3_X_16plex_Set_21     RI5 Macrophage    LCB3      s9      R
#> 38  190914S_LCB3_X_16plex_Set_21     RI6 Macrophage    LCB3      s9      R
#> 39  190914S_LCB3_X_16plex_Set_21     RI7      Blank    LCB3      s9      R
#> 40  190914S_LCB3_X_16plex_Set_21     RI8   Monocyte    LCB3      s9      R
#> 41  190914S_LCB3_X_16plex_Set_21     RI9 Macrophage    LCB3      s9      R
#> 42  190914S_LCB3_X_16plex_Set_21    RI10   Monocyte    LCB3      s9      R
#> 43  190914S_LCB3_X_16plex_Set_21    RI11      Blank    LCB3      s9      R
#> 44  190914S_LCB3_X_16plex_Set_21    RI12 Macrophage    LCB3      s9      R
#> 45  190914S_LCB3_X_16plex_Set_21    RI13 Macrophage    LCB3      s9      R
#> 46  190914S_LCB3_X_16plex_Set_21    RI14 Macrophage    LCB3      s9      R
#> 47  190914S_LCB3_X_16plex_Set_21    RI15 Macrophage    LCB3      s9      R
#> 48  190914S_LCB3_X_16plex_Set_21    RI16 Macrophage    LCB3      s9      R
#> 49 190321S_LCA10_X_FP97_blank_01     RI1      Blank   LCA10      s8   <NA>
#> 50 190321S_LCA10_X_FP97_blank_01     RI2      Blank   LCA10      s8   <NA>
#> 51 190321S_LCA10_X_FP97_blank_01     RI3      Blank   LCA10      s8   <NA>
#> 52 190321S_LCA10_X_FP97_blank_01     RI4      Blank   LCA10      s8   <NA>
#> 53 190321S_LCA10_X_FP97_blank_01     RI5      Blank   LCA10      s8   <NA>
#> 54 190321S_LCA10_X_FP97_blank_01     RI6      Blank   LCA10      s8   <NA>
#> 55 190321S_LCA10_X_FP97_blank_01     RI7      Blank   LCA10      s8   <NA>
#> 56 190321S_LCA10_X_FP97_blank_01     RI8      Blank   LCA10      s8   <NA>
#> 57 190321S_LCA10_X_FP97_blank_01     RI9      Blank   LCA10      s8   <NA>
#> 58 190321S_LCA10_X_FP97_blank_01    RI10      Blank   LCA10      s8   <NA>
#> 59 190321S_LCA10_X_FP97_blank_01    RI11      Blank   LCA10      s8   <NA>
#> 60 190321S_LCA10_X_FP97_blank_01    RI12      Blank   LCA10      s8   <NA>
#> 61 190321S_LCA10_X_FP97_blank_01    RI13      Blank   LCA10      s8   <NA>
#> 62 190321S_LCA10_X_FP97_blank_01    RI14      Blank   LCA10      s8   <NA>
#> 63 190321S_LCA10_X_FP97_blank_01    RI15      Blank   LCA10      s8   <NA>
#> 64 190321S_LCA10_X_FP97_blank_01    RI16      Blank   LCA10      s8   <NA>

The two tables are supplied to the readSCP function.

scp <- readSCP(quantTable = mqScpData,
               metaTable = sampleAnnotation,
               channelCol = "Channel",
               batchCol = "Set")
#> Loading data as a 'SingleCellExperiment' object
#> Splitting data based on 'Set'
#> Formatting sample metadata (colData)
#> Formatting data as a 'QFeatures' object

As indicated by the output on the console, readSCP proceeds as follows:

  1. If quantTable is the path to a CSV file, it reads the file using read.csv. The table is converted to a SingleCellExperiment object. readSCP needs to know in which field(s) the quantative data is stored. Those field name(s) is/are provided by the channelCol field in the metaData table. So in this example, the column names hodling the quantitative data in mqScpData are stored in the column named Channel in sampleAnnotation.

  2. The SingleCellExperiment object is then split according to batch. The split is performed depending on the batchCol field in quantTable, in this case the data will be split according to the Set column in mqScpData.

  3. The sample metadata is generated from the metaTable. Note that in order for readSCP to correctly match the feature data with the metadata, metaTable must also contain the batchCol field with batch names.

  4. Finally, the split feature data and the sample metadata are stored in a single QFeatures object.

Here is a compact overview of the data:

#> An instance of class QFeatures containing 4 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 334 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 433 rows and 16 columns 
#>  [3] 190321S_LCA10_X_FP97_blank_01: SingleCellExperiment with 109 rows and 16 columns 
#>  [4] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 321 rows and 16 columns

We can see that the scp object we created is a QFeatures object containing 4 assays. Each assay has an associated name, this is the batch name that was used for splitting. We can also see that each assay is a SingleCellExperiment object. The rows represent the peptide to spectrum matches (PSMs), the number vary depending on the batch. Finally, all three assays contains 16 columns that correspond to the 16 TMT channels recorded during the 4 MS runs.

3 Clean missing data

Single-cell (proteomics or transcriptomics) data contains many zeros. The zeros can be biological zeros or technical zeros and differentiating between the two types is very hard. To avoid artefacts in dowstream steps, we replace the zeros by the missing value NA. The zeroIsNA function takes the QFeatures object and the name(s) or index/indices of the assay(s) to clean and automatically replaces any zero in the selected quantitative data by NA.

scp <- zeroIsNA(scp, i = 1:4)

4 Filter PSMs

A common steps in SCP is to filter out low-confidence PSMs. Each PSM assay contains feature meta-information that are stored in the assay’s rowData. The QFeatures package allows to quickly filter the rows of an assay by using these information. The available variables in the rowData are listed below for each assay.

#> CharacterList of length 4
#> [["190222S_LCA9_X_FP94BM"]] Sequence Length ... participated peptide
#> [["190321S_LCA10_X_FP97AG"]] Sequence Length ... participated peptide
#> [["190321S_LCA10_X_FP97_blank_01"]] Sequence Length ... participated peptide
#> [["190914S_LCB3_X_16plex_Set_21"]] Sequence Length ... participated peptide

4.1 Filter features based on feature metadata

Below are some examples of criteria that are used to identify low-confidence. The information is readily available since this was computed by MaxQuant:

  • Remove PSMs that are matched to contaminants
  • Remove PSMs that are matched to the decoy database
  • Keep PSMs that exhibit a high PIF (parental ion fraction), indicative of the purity of a spectrum

We can perform this filtering using the filterFeatures function. filterFeatures automatically accesses the feature metadata and selects the rows that meet the provided condition(s). For instance, Reverse != "+" keeps the rows for which the Reverse variable in the rowData is not "+" (i.e. the PSM is not matched to the decoy database).

scp <- filterFeatures(scp,
                      ~ Reverse != "+" &
                        !grepl("REV|CON", protein) &
                        Potential.contaminant != "+" &
                        !is.na(PIF) & PIF > 0.8)

4.2 Filter assays based on detected features

To avoid proceeding with failed runs, another interesting filter is to remove assays with too few features. If a batch contains less than, for example, 150 features we can then suspect something wrong happened in that batch and it should be removed. Using dims, we can query the dimensions (hence the number of features and the number of samples) of all assays contained in the dataset.

#>      190222S_LCA9_X_FP94BM 190321S_LCA10_X_FP97AG 190321S_LCA10_X_FP97_blank_01
#> [1,]                   253                    282                            57
#> [2,]                    16                     16                            16
#>      190914S_LCB3_X_16plex_Set_21
#> [1,]                          183
#> [2,]                           16

Actually, a QFeatures object can be seen as a three-order array: \(features \times samples \times assay\). Hence, QFeatures supports three-order subsetting x[rows, columns, assays]. We first select the assay that have sufficient PSMs (the number of rows is greater than 150), and then subset the scp object for the assays that meet the criterion.

keepAssay <- dims(scp)[1, ] > 150
scp <- scp[, , keepAssay]
#> harmonizing input:
#>   removing 16 sampleMap rows not in names(experiments)
#>   removing 16 colData rownames not in sampleMap 'primary'
#> An instance of class QFeatures containing 3 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 253 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 282 rows and 16 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 183 rows and 16 columns

Notice the 190321S_LCA10_X_FP97_blank_01 sample was removed because it did not contain sufficient features, as expected from a blank sample. This could also have been the case for failed runs.

4.3 Filter features based on SCP metrics

Another type of filtering is specific to SCP. In the SCoPE2 analysis, the authors suggest a filters based on the sample to carrier ratio (SCR), that is the reporter ion intensity of a single-cell sample divided by the reporter ion intensity of the carrier channel (200 cells) from the same batch. It is expected that the carrier intensities are much higher than the single-cell intensities.

The SCR can be computed using the computeSCR function from scp. The function must be told which channels are the samples that must be divided and which channel contains the carrier. This information is provided in the sample metadata and is accessed using the colData, under the SampleType field.

colData(scp)[, "SampleType"] %>%
#> .
#>      Blank    Carrier Macrophage   Monocyte  Reference     Unused 
#>          3          3         20          5          3         14

In this dataset, SampleType gives the type of sample that is present in each TMT channel. The SCoPE2 protocole includes 5 types of samples:

  • The carrier channels (Carrier) contain 200 cell equivalents and are meant to boost the peptide identification rate.
  • The normalization channels (Reference) contain 5 cell equivalents and are used to partially correct for between-run variation.
  • The unused channels (Unused) are channels that are left empty due to isotopic cross-contamination by the carrier channel.
  • The blank channels (Blank) contain samples that do not contain any cell but are processed as single-cell samples.
  • The single-cell sample channels contain the single-cell samples of interest, that are macrophage (Macrophage) or monocyte (Monocyte).

The computeSCR function expects the following input:

  • The QFeatures dataset
  • The assay name(s) or index/indices for which the SCR should be computed
  • The sample metadata variable pointing to the channel annotation
  • A string pattern (following regular expression syntax) that uniquely identifies the carrier channel in each batch
  • A string pattern (following regular expression syntax) that identifies the samples to divide

The function creates an field .meanSCR and stores it in the rowData of each assay.

scp <- computeSCR(scp,
                  i = 1:3,
                  colDataCol = "SampleType",
                  carrierPattern = "Carrier",
                  samplePattern = "Macrophage|Monocyte")

Before applying the filter, we plot the distribution of the average SCR. We can extract the .meanSCR variable from the rowData of several assays using the rowDataToDF. It takes the rowData field(s) of interest and returns a DataFrame table.

scp %>%
  rowDataToDF(i = 1:3,
              vars = ".meanSCR") %>%
  data.frame %>%
  ggplot(aes(x = .meanSCR)) +
  geom_histogram() +
  geom_vline(xintercept = 0.1)

Most values are close to 0.02 as expected since the experimental ratio is 1/200. There are a few point that have higher signal than expected. We therefore filter out those points using a cutoff of 0.1 using again the filterFeatures functions.

scp <- filterFeatures(scp,
                      ~ !is.na(.meanSCR) &
                        .meanSCR < 0.1)

4.4 Filter out PSMs with high false discovery rate

Finally, a last PSM filter criterion is the false discovery rate (FDR) for identification. Filtering on the PEP is too conservative (Käll et al. (2008)) so we provide the computeFDR function to convert PEPs to FDR. Beside the dataset and the assay(s) for which to compute the FDR, we also need to give the feature grouping variable, here the peptide sequence, and the variable containing the PEPs. Those are contained in the feature metadata.

scp <- computeFDR(scp,
                  i = 1:3,
                  groupCol = "peptide",
                  pepCol = "dart_PEP")

Note that a new variable .FDR containing the computed FDRs is added to the rowData. We filter the PSMs that have an associated peptide FDR smaller than 1%.

scp <- filterFeatures(scp,
                      ~ .FDR < 0.01)

5 Process the PSM data

5.1 Relative reporter ion intensity

In order to partialy correct for between-run variation, SCoPE2 suggests computing relative reporter ion intensities. This means that intensities measured for single-cells are divided by the reference channel containing 5-cell equivalents. We use the divideByReference function that divides channels of interest by the reference channel. Similarly to computeSCR, we can point to the samples and the reference columns in each assay using the annotation contained in the colData.

We here divide all columns (using the regular expression wildcard .) by the reference channel (Reference).

scp <- divideByReference(scp,
                         i = 1:3,
                         colDataCol = "SampleType",
                         samplePattern = ".",
                         refPattern = "Reference")

6 Aggregate PSM data to peptide data

Now that the PSM assays are processed, we can aggregate them to peptides. This is performed using the aggregateFeaturesOverAssays function. For each assay, the function aggregates several PSMs into a unique peptide. This is best illustrated by the figure below.

Conceptual illustration of feature aggregation.

(#fig:features_aggregation)Conceptual illustration of feature aggregation.

Remember there currently are three assays containing the PSM data.

#> An instance of class QFeatures containing 3 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 216 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns

The PSMs are aggregated over the fcol feature variable, here peptides. We also need to supply an aggregating function that will tell how to combine the quantitative data of the PSMs to aggregate. We here use the median value. We name the aggregated assays using the original names and appending peptides_ at the start.

scp <- aggregateFeaturesOverAssays(scp,
                                   i = 1:3,
                                   fcol = "peptide",
                                   name = paste0("peptides_", names(scp)),
                                   fun = matrixStats::colMedians, na.rm = TRUE)

Notice that 3 new assays were created in the scp object. Those new assays contain the aggregated features while the three first assays are unchanged. This allows to keep track of the data processing.

#> An instance of class QFeatures containing 6 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 216 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 214 rows and 16 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns

Under the hood, the QFeatures architecture preserves the relationship between the aggregated assays. See ?AssayLinks for more information on relationships between assays.

7 Join the SCoPE2 sets in one assay

Up to now, we kept the data belonging to each MS run in separate assays. We now combine all batches into a single assay. This is done using the joinAssays function from the QFeatures package. Note that we now use the aggregated assays, so assay 4 to 6.

scp <- joinAssays(scp,
                  i = 4:6,
                  name = "peptides")
#> harmonizing input:
#>   removing 48 sampleMap rows not in names(experiments)

In this case, one new assay is created in the scp object that combines the data from assay 4 to 6. The samples are always distinct so the number of column in the new assay (here \(48\)) will always equals the sum of the columns in the assays to join (here \(16 + 16 + 16\)). The feature in the joined assay might contain less features than the sum of the rows of the assays to join since common features between assays are joined in a single row.

#> An instance of class QFeatures containing 7 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 216 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 214 rows and 16 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns 
#>  [7] peptides: SingleCellExperiment with 358 rows and 48 columns

8 Filter single-cells

Another common step in single-cell data analysis pipelines is to remove low-quality cells. After subseting for the samples of interest, we will use 2 metrics: the median relative intensities per cell and the median coefficient of variation (CV) per cell.

8.1 Filter samples of interest

We first subset the cells of interest, that is the blank samples (sc_0), the macrophages (sc_m0) and the monocytes (sc_u).

We extract the SingleCellExperiment assay with the joined peptides, subset the channels corresponding to blank, macrophage or monocytes and add it as a new assay in the QFeatures object. However, the sample annotation is contained in the colData of the QFeatures dataset, but we need to access it from the SingeCellExperiment object. Therefore, we provide the transferColDataToAssay to copy the sample metadata from the QFeatures to a target assay.

#> DataFrame with 48 rows and 0 columns
scp <- transferColDataToAssay(scp, "peptides")
#> DataFrame with 48 rows and 6 columns
#>                                             Set     Channel  SampleType
#>                                     <character> <character> <character>
#> 190222S_LCA9_X_FP94BM_RI1         190222S_LC...         RI1     Carrier
#> 190222S_LCA9_X_FP94BM_RI2         190222S_LC...         RI2   Reference
#> 190222S_LCA9_X_FP94BM_RI3         190222S_LC...         RI3      Unused
#> 190222S_LCA9_X_FP94BM_RI4         190222S_LC...         RI4    Monocyte
#> 190222S_LCA9_X_FP94BM_RI5         190222S_LC...         RI5       Blank
#> ...                                         ...         ...         ...
#> 190914S_LCB3_X_16plex_Set_21_RI12 190914S_LC...        RI12  Macrophage
#> 190914S_LCB3_X_16plex_Set_21_RI13 190914S_LC...        RI13  Macrophage
#> 190914S_LCB3_X_16plex_Set_21_RI14 190914S_LC...        RI14  Macrophage
#> 190914S_LCB3_X_16plex_Set_21_RI15 190914S_LC...        RI15  Macrophage
#> 190914S_LCB3_X_16plex_Set_21_RI16 190914S_LC...        RI16  Macrophage
#>                                       lcbatch     sortday      digest
#>                                   <character> <character> <character>
#> 190222S_LCA9_X_FP94BM_RI1                LCA9          s8           N
#> 190222S_LCA9_X_FP94BM_RI2                LCA9          s8           N
#> 190222S_LCA9_X_FP94BM_RI3                LCA9          s8           N
#> 190222S_LCA9_X_FP94BM_RI4                LCA9          s8           N
#> 190222S_LCA9_X_FP94BM_RI5                LCA9          s8           N
#> ...                                       ...         ...         ...
#> 190914S_LCB3_X_16plex_Set_21_RI12        LCB3          s9           R
#> 190914S_LCB3_X_16plex_Set_21_RI13        LCB3          s9           R
#> 190914S_LCB3_X_16plex_Set_21_RI14        LCB3          s9           R
#> 190914S_LCB3_X_16plex_Set_21_RI15        LCB3          s9           R
#> 190914S_LCB3_X_16plex_Set_21_RI16        LCB3          s9           R

Once the metadata is transfered, we can subset the SingleCellExperiment assay.

sce <- scp[["peptides"]]
sce <- sce[, sce$SampleType %in% c("Blank", "Macrophage", "Monocyte")]

Then we add this assay back into the QFeatures object. This is done using the addAssay function. We also preserve the links between dfeatures using the addAssayLinkOneToOne. Since the features did not change (only the samples did), one-to-one links between features are added.

scp %>%
  addAssay(y = sce,
           name = "peptides_filter1") %>%
  addAssayLinkOneToOne(from = "peptides",
                       to = "peptides_filter1") ->

Note the added assay peptides_filter1 contains the same number of rows than its parent assay peptides, but less samples as 20 samples are not single-cells or blank samples bu rather unused, reference or carrier samples.

#> An instance of class QFeatures containing 8 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 216 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 240 rows and 16 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns 
#>  ...
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 142 rows and 16 columns 
#>  [7] peptides: SingleCellExperiment with 358 rows and 48 columns 
#>  [8] peptides_filter1: SingleCellExperiment with 358 rows and 28 columns

8.2 Filter based on the median relative intensity

We compute the median relative reporter ion intensity for each cell separately and apply a filter based on this statistic. This procedure recalls that of library size filtering commonly performed in scRNA-Seq data analysis, where the library size is the sum of the counts in each single cell. We will store the median intensity in the colData of the peptides_filter1 assay (so the SingleCellExperiment object) because this metric is specific to that assay. The medians are computed on the quantitative data using colMedians that requires a data matrix. The data matrix can be extracted from a SingleCellExperiment using the assay function.

scp[["peptides_filter1"]] %>%
  assay %>%
  matrixStats::colMedians(na.rm = TRUE) ->

Looking at the distribution of the median per cell can highlight low-quality cells.

scp[["peptides_filter1"]] %>%
  colData %>%
  data.frame %>%
  ggplot(aes(x = MedianRI, fill = SampleType)) +
  geom_boxplot() +