1 Installation

To install this package, start R and enter (uncommented):

# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# 
# BiocManager::install("CytoPipeline")

Note that CytoPipeline imports ggplot2 (>= 3.4.1).
The version requirement is due to a bug in version 3.4.0., affecting ggplot2::geom_hex().

2 Introduction

The CytoPipeline package provides infrastructure to support the definition, run and standardized visualization of pre-processing and quality control pipelines for flow cytometry data. This infrastructure consists of two main S4 classes, i.e. CytoPipeline and CytoProcessingStep, as well as dedicated wrapper functions around selected third-party package methods often used to implement these pre-processing steps.

In the following sections, we demonstrate how to create a CytoPipeline object implementing a simple pre-processing pipeline, how to run it and how to retrieve and visualize the results after each step.

3 Example dataset

The example dataset that will be used throughout this vignette is derived from a reference public dataset accompanying the OMIP-021 (Optimized Multicolor Immunofluorescence Panel 021) article (Gherardin et al. 2014).

A sub-sample of this public dataset is built-in in the CytoPipeline package, as the OMIP021 dataset. See the MakeOMIP021Samples.R script for more details on how the OMIP021 dataset was created. This script is to be found in the script subdirectory in the CytoPipeline package installation path.

Note that in the CytoPipelinepackage, as in the current vignette, matrices of flow cytometry events intensities are stored as flowCore::flowFrame objects (Ellis B 2022).

4 Example of pre-processing and QC pipelines

Let’s assume that we want to pre-process the two samples of the OMIP021 dataset, and let’s assume that we want to compare what we would obtain when pre-processing these files using two different QC methods.

In the first pre-processing pipeline, we will use the flowAI QC method (Monaco et al. 2016), while in the second pipeline, we will use the PeacoQC method (Emmaneel et al. 2021). Note that when we here refer to QC method, we mean the algorithm used to ensure stability (stationarity) of the channel signals in time.

In both pipelines, the first part consists in estimating appropriate scale transformation functions for all channels present in the sample flowFrame. In order to do this, we propose the following scale transformation processing queue (Fig. 1):

  • reading the two samples .fcs files
  • removing the margin events from each file
  • applying compensation for each file
  • aggregating and sub-sampling from each file
  • estimating the scale transformations from the aggregated and sub-sampled data
Scale transform processing queue

Figure 1: Scale transform processing queue

When this first part is done, one can apply pre-processing for each file one by one. However, depending on the choice of QC method, the order of steps needs to be slightly different:

  • when using flowAI, it is advised to eliminate the ‘bad events’ starting from raw data (see (Monaco et al. 2016))
  • when using PeacoQC, it is advised to eliminate the ‘bad events’ from already compensated and scale transformed data (see (Emmaneel et al. 2021))

Therefore, we propose the following pre-processing queues represented in Fig. 2.

Pre-processing queue for two different pipeline settings

Figure 2: Pre-processing queue for two different pipeline settings

5 Building the CytoPipeline

CytoPipeline is the central S4 class used in the CytoPipeline package to represent a flow cytometry pre-processing pipeline. The main slots of CytoPipeline objects are :

  • an experimentName, which gives a name to a particular user definition of a pre-processing pipeline. The experiment here, is not related to an assay experiment, but refers to a specific way to design a pipeline. For example, in the current use case, we will define two experimentNames, one to refer to the flowAI pipeline, and another one to refer to the PeacoQC pipeline (see previous section);

  • a vector of sampleFiles, which are .fcs raw data files on which one need to run the pre-processing pipeline;

  • two processing queues, i.e. a scaleTransformProcessingQueue, and a flowFramesPreProcessingQueue, which correspond to the two parts described in previous section. Each of these queues are composed of one or several CytoProcessingStep objects, will be processed in linear sequence, the output of one step being the input of the next step.

Note there are important differences between the two processing queues. On the one hand, the scaleTransformProcessingQueue takes the vector of all sample files as an input, and will be executed first, and only once. On the other hand, the flowFramesPreProcessingQueue will be run after the scale transformation processing queue, on each sample file one after the other, within a loop. The final output of the scaleTransformProcessingQueue, which should be a flowCore::tranformList, is also provided as input to the flowFramesPreProcessingQueue, by convention.

In the next subsections, we show the different steps involved in creating a CytoPipeline object.

5.1 preliminaries: paths definition

In the following code, rawDataDir refers to the directory in which the .fcs raw data files are stored. workDir will be used as root directory to store the disk cache. Indeed, when running the CytoPipeline objects, all the different step outputs will be stored in a BiocFileCache instance, in a sub-directory that will be created in workDirand of which the name will be set to the pipeline experimentName.

library(CytoPipeline)

# raw data
rawDataDir <- system.file("extdata", package = "CytoPipeline")
# output files
workDir <- suppressMessages(base::tempdir())

5.2 first method: step by step, using CytoPipeline methods

In this sub-section, we build a CytoPipeline object and successively add CytoProcessingStep objects to the two different processing queues. We do this for the PeacoQC pipeline.

# main parameters : sample files and output files

experimentName <- "OMIP021_PeacoQC"
sampleFiles <- file.path(rawDataDir, list.files(rawDataDir,
                                                pattern = "Donor"))

pipL_PeacoQC <- CytoPipeline(experimentName = experimentName,
                             sampleFiles = sampleFiles)

### SCALE TRANSFORMATION STEPS ###

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "scale transform",
        CytoProcessingStep(
            name = "flowframe_read",
            FUN = "readSampleFiles",
            ARGS = list(
                whichSamples = "all",
                truncate_max_range = FALSE,
                min.limit = NULL
            )
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "scale transform",
        CytoProcessingStep(
            name = "remove_margins",
            FUN = "removeMarginsPeacoQC",
            ARGS = list()
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "scale transform",
        CytoProcessingStep(
            name = "compensate",
            FUN = "compensateFromMatrix",
            ARGS = list(matrixSource = "fcs")
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "scale transform",
        CytoProcessingStep(
            name = "flowframe_aggregate",
            FUN = "aggregateAndSample",
            ARGS = list(
                nTotalEvents = 10000,
                seed = 0
            )
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "scale transform",
        CytoProcessingStep(
            name = "scale_transform_estimate",
            FUN = "estimateScaleTransforms",
            ARGS = list(
                fluoMethod = "estimateLogicle",
                scatterMethod = "linear",
                scatterRefMarker = "BV785 - CD3"
            )
        )
    )

### FLOW FRAME PRE-PROCESSING STEPS ###

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "flowframe_read",
            FUN = "readSampleFiles",
            ARGS = list(
                truncate_max_range = FALSE,
                min.limit = NULL
            )
        )
    )


pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "remove_margins",
            FUN = "removeMarginsPeacoQC",
            ARGS = list()
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "compensate",
            FUN = "compensateFromMatrix",
            ARGS = list(matrixSource = "fcs")
        )
    )

pipL_PeacoQC <-
    addProcessingStep(
        pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "perform_QC",
            FUN = "qualityControlPeacoQC",
            ARGS = list(
                preTransform = TRUE,
                min_cells = 150, # default
                max_bins = 500, # default
                step = 500, # default,
                MAD = 6, # default
                IT_limit = 0.55, # default
                force_IT = 150, # default
                peak_removal = 0.3333, # default
                min_nr_bins_peakdetection = 10 # default
            )
        )
    )

pipL_PeacoQC <-
    addProcessingStep(
        pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "remove_doublets",
            FUN = "removeDoubletsCytoPipeline",
            ARGS = list(
                areaChannels = c("FSC-A", "SSC-A"),
                heightChannels = c("FSC-H", "SSC-H"),
                nmads = c(3, 5))
            )
    )
                    

                
pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "remove_debris",
            FUN = "removeDebrisManualGate",
            ARGS = list(
                FSCChannel = "FSC-A",
                SSCChannel = "SSC-A",
                gateData =  c(73615, 110174, 213000, 201000, 126000,
                              47679, 260500, 260500, 113000, 35000)
            )
        )
    )

pipL_PeacoQC <-
    addProcessingStep(pipL_PeacoQC,
        whichQueue = "pre-processing",
        CytoProcessingStep(
            name = "remove_dead_cells",
            FUN = "removeDeadCellsManualGate",
            ARGS = list(
                FSCChannel = "FSC-A",
                LDMarker = "L/D Aqua - Viability",
                gateData = c(0, 0, 250000, 250000,
                             0, 650, 650, 0)
            )
        )
    )

5.3 second method: in one go, using JSON file input

In this sub-section, we build the flowAI pipeline, this time using a JSON file as an input. Note that the experimentName and sampleFiles are here specified in the JSON file itself. This is not necessary, as one could well specify the processing steps only in the JSON file, and pass the experimentName and sampleFiles directly in the CytoPipeline constructor.

jsonDir <- rawDataDir

# creation on CytoPipeline object,
# using json file as input
pipL_flowAI <-
  CytoPipeline(file.path(jsonDir, "OMIP021_flowAI_pipeline.json"),
               experimentName = "OMIP021_flowAI",
               sampleFiles = sampleFiles)

6 Executing pipelines

6.1 Executing PeacoQC pipeline

Note: executing the next statement might generate some warnings.
These are generated by the PeacoQC method, are highly dependent on the shape of the data investigated, and can safely be ignored here.

# execute PeacoQC pipeline
execute(pipL_PeacoQC, path = workDir)
## #####################################################
## ### running SCALE TRANSFORMATION processing steps ###
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor1.fcs
## Warning in PeacoQC::RemoveMargins(ff, channels = channel4Margins,
## channel_specifications = PQCChannelSpecs): More than 10.12 % is considered as a
## margin event in file Donor1.fcs . This should be verified.
## Removing margins from file : Donor2.fcs
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [flowframe_aggregate] ...
## Proceeding with step 5 [scale_transform_estimate] ...
## #####################################################
## ### NOW PRE-PROCESSING FILE /tmp/RtmpiQgRgJ/Rinst17bf2c1a112f5d/CytoPipeline/extdata/Donor1.fcs...
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor1.fcs
## Warning in PeacoQC::RemoveMargins(ff, channels = channel4Margins,
## channel_specifications = PQCChannelSpecs): More than 10.12 % is considered as a
## margin event in file Donor1.fcs . This should be verified.
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [perform_QC] ...
## Applying PeacoQC method...
## Starting quality control analysis for Donor1.fcs
## Warning in FindIncreasingDecreasingChannels(breaks, ff, channels, plot, : There
## seems to be an increasing or decreasing trend in a channel for Donor1.fcs .
## Please inspect this in the overview figure.
## Calculating peaks
## Warning in PeacoQC::PeacoQC(ff = ffIn, channels = channel4QualityControl, :
## There are not enough bins for a robust isolation tree analysis.
## MAD analysis removed 38.81% of the measurements
## The algorithm removed 38.81% of the measurements
## Proceeding with step 5 [remove_doublets] ...
## Proceeding with step 6 [remove_debris] ...
## Proceeding with step 7 [remove_dead_cells] ...
## #####################################################
## ### NOW PRE-PROCESSING FILE /tmp/RtmpiQgRgJ/Rinst17bf2c1a112f5d/CytoPipeline/extdata/Donor2.fcs...
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor2.fcs
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [perform_QC] ...
## Applying PeacoQC method...
## Starting quality control analysis for Donor2.fcs
## Warning in FindIncreasingDecreasingChannels(breaks, ff, channels, plot, : There
## seems to be an increasing or decreasing trend in a channel for Donor2.fcs .
## Please inspect this in the overview figure.
## Calculating peaks
## Warning in PeacoQC::PeacoQC(ff = ffIn, channels = channel4QualityControl, :
## There are not enough bins for a robust isolation tree analysis.
## MAD analysis removed 9.57% of the measurements
## The algorithm removed 9.57% of the measurements
## Proceeding with step 5 [remove_doublets] ...
## Proceeding with step 6 [remove_debris] ...
## Proceeding with step 7 [remove_dead_cells] ...

6.2 Executing flowAI pipeline

Note: again this might generate some warnings, due to flowAI.
These are highly dependent on the shape of the data investigated, and can safely be ignored here.

# execute flowAI pipeline
execute(pipL_flowAI, path = workDir)
## #####################################################
## ### running SCALE TRANSFORMATION processing steps ###
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...