This vignette describes the functionality implemented in the CytoPipeline
package. CytoPipeline
provides support for automation and visualization of flow cytometry data analysis pipelines. In the current state, the package focuses on the pre-processing and quality control part. This vignette is distributed under a CC BY-SA 4.0 license.
CytoPipeline 1.5.1
To install this package, start R and enter (uncommented):
# if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#
# BiocManager::install("CytoPipeline")
Note that CytoPipeline imports ggplot2 (>= 3.4.1).
The version requirement is due to a bug in version 3.4.0.,
affecting ggplot2::geom_hex()
.
The CytoPipeline
package provides infrastructure to support the definition,
run and standardized visualization of pre-processing and quality control
pipelines for flow cytometry data. This infrastructure consists of two main S4
classes, i.e. CytoPipeline
and CytoProcessingStep
, as well as dedicated
wrapper functions around selected third-party package methods often used to
implement these pre-processing steps.
In the following sections, we demonstrate how to create a CytoPipeline
object implementing a simple pre-processing pipeline, how to run it and
how to retrieve and visualize the results after each step.
The example dataset that will be used throughout this vignette is derived from a reference public dataset accompanying the OMIP-021 (Optimized Multicolor Immunofluorescence Panel 021) article (Gherardin et al. 2014).
A sub-sample of this public dataset is built-in in the CytoPipeline
package, as the OMIP021 dataset.
See the MakeOMIP021Samples.R
script for more details
on how the OMIP021
dataset was created. This script is to be found
in the script
subdirectory in the CytoPipeline
package installation path.
Note that in the CytoPipeline
package, as in the current vignette,
matrices of flow cytometry events intensities are stored as
flowCore::flowFrame
objects (Ellis B 2022).
Let’s assume that we want to pre-process the two samples of the OMIP021
dataset, and let’s assume that we want to compare what we would obtain when
pre-processing these files using two different QC methods.
In the first pre-processing pipeline, we will use the flowAI QC method (Monaco et al. 2016), while in the second pipeline, we will use the PeacoQC method (Emmaneel et al. 2021). Note that when we here refer to QC method, we mean the algorithm used to ensure stability (stationarity) of the channel signals in time.
In both pipelines, the first part consists in estimating appropriate scale
transformation functions for all channels present in the sample flowFrame
.
In order to do this, we propose the following scale transformation processing
queue (Fig. 1):
.fcs
filesWhen this first part is done, one can apply pre-processing for each file one by one. However, depending on the choice of QC method, the order of steps needs to be slightly different:
Therefore, we propose the following pre-processing queues represented in Fig. 2.
CytoPipeline
is the central S4 class used in the CytoPipeline
package to
represent a flow cytometry pre-processing pipeline. The main slots of
CytoPipeline
objects are :
an experimentName
, which gives a name to a particular user definition
of a pre-processing pipeline. The experiment here, is not related to an assay
experiment, but refers to a specific way to design a pipeline. For example, in
the current use case, we will define two experimentName
s, one to refer to the
flowAI pipeline, and another one to refer to the PeacoQC pipeline
(see previous section);
a vector of sampleFiles
, which are .fcs
raw data files on which one need
to run the pre-processing pipeline;
two processing queues, i.e. a scaleTransformProcessingQueue
, and a
flowFramesPreProcessingQueue
, which correspond to the two parts described in
previous section. Each of these queues are composed of one or several
CytoProcessingStep
objects, will be processed in linear sequence, the output
of one step being the input of the next step.
Note there are important differences between the two processing queues.
On the one hand, the scaleTransformProcessingQueue
takes the vector of all
sample files as an input, and will be executed first, and only once.
On the other hand, the flowFramesPreProcessingQueue
will be run after the
scale transformation processing queue, on each sample file one after the other,
within a loop. The final output of the scaleTransformProcessingQueue
, which
should be a flowCore::tranformList
, is also provided as input to the
flowFramesPreProcessingQueue
, by convention.
In the next subsections, we show the different steps involved in creating a
CytoPipeline
object.
In the following code, rawDataDir
refers to the directory in which the .fcs
raw data files are stored. workDir
will be used as root directory to store
the disk cache. Indeed, when running the CytoPipeline
objects, all the
different step outputs will be stored in a BiocFileCache
instance, in a
sub-directory that will be created in workDir
and of which the name will be
set to the pipeline experimentName
.
library(CytoPipeline)
# raw data
rawDataDir <- system.file("extdata", package = "CytoPipeline")
# output files
workDir <- suppressMessages(base::tempdir())
In this sub-section, we build a CytoPipeline
object and successively add
CytoProcessingStep
objects to the two different processing queues. We do this
for the PeacoQC pipeline.
# main parameters : sample files and output files
experimentName <- "OMIP021_PeacoQC"
sampleFiles <- file.path(rawDataDir, list.files(rawDataDir,
pattern = "Donor"))
pipL_PeacoQC <- CytoPipeline(experimentName = experimentName,
sampleFiles = sampleFiles)
### SCALE TRANSFORMATION STEPS ###
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "scale transform",
CytoProcessingStep(
name = "flowframe_read",
FUN = "readSampleFiles",
ARGS = list(
whichSamples = "all",
truncate_max_range = FALSE,
min.limit = NULL
)
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "scale transform",
CytoProcessingStep(
name = "remove_margins",
FUN = "removeMarginsPeacoQC",
ARGS = list()
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "scale transform",
CytoProcessingStep(
name = "compensate",
FUN = "compensateFromMatrix",
ARGS = list(matrixSource = "fcs")
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "scale transform",
CytoProcessingStep(
name = "flowframe_aggregate",
FUN = "aggregateAndSample",
ARGS = list(
nTotalEvents = 10000,
seed = 0
)
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "scale transform",
CytoProcessingStep(
name = "scale_transform_estimate",
FUN = "estimateScaleTransforms",
ARGS = list(
fluoMethod = "estimateLogicle",
scatterMethod = "linear",
scatterRefMarker = "BV785 - CD3"
)
)
)
### FLOW FRAME PRE-PROCESSING STEPS ###
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "flowframe_read",
FUN = "readSampleFiles",
ARGS = list(
truncate_max_range = FALSE,
min.limit = NULL
)
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "remove_margins",
FUN = "removeMarginsPeacoQC",
ARGS = list()
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "compensate",
FUN = "compensateFromMatrix",
ARGS = list(matrixSource = "fcs")
)
)
pipL_PeacoQC <-
addProcessingStep(
pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "perform_QC",
FUN = "qualityControlPeacoQC",
ARGS = list(
preTransform = TRUE,
min_cells = 150, # default
max_bins = 500, # default
step = 500, # default,
MAD = 6, # default
IT_limit = 0.55, # default
force_IT = 150, # default
peak_removal = 0.3333, # default
min_nr_bins_peakdetection = 10 # default
)
)
)
pipL_PeacoQC <-
addProcessingStep(
pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "remove_doublets",
FUN = "removeDoubletsCytoPipeline",
ARGS = list(
areaChannels = c("FSC-A", "SSC-A"),
heightChannels = c("FSC-H", "SSC-H"),
nmads = c(3, 5))
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "remove_debris",
FUN = "removeDebrisManualGate",
ARGS = list(
FSCChannel = "FSC-A",
SSCChannel = "SSC-A",
gateData = c(73615, 110174, 213000, 201000, 126000,
47679, 260500, 260500, 113000, 35000)
)
)
)
pipL_PeacoQC <-
addProcessingStep(pipL_PeacoQC,
whichQueue = "pre-processing",
CytoProcessingStep(
name = "remove_dead_cells",
FUN = "removeDeadCellsManualGate",
ARGS = list(
FSCChannel = "FSC-A",
LDMarker = "L/D Aqua - Viability",
gateData = c(0, 0, 250000, 250000,
0, 650, 650, 0)
)
)
)
In this sub-section, we build the flowAI pipeline, this time using a JSON
file as an input. Note that the experimentName
and sampleFiles
are here
specified in the JSON file itself. This is not necessary, as
one could well specify the processing steps only in the JSON file, and
pass the experimentName
and sampleFiles
directly
in the CytoPipeline
constructor.
jsonDir <- rawDataDir
# creation on CytoPipeline object,
# using json file as input
pipL_flowAI <-
CytoPipeline(file.path(jsonDir, "OMIP021_flowAI_pipeline.json"),
experimentName = "OMIP021_flowAI",
sampleFiles = sampleFiles)
Note: executing the next statement might generate some warnings.
These are generated by the PeacoQC method
, are highly dependent
on the shape of the data investigated, and can safely be ignored here.
# execute PeacoQC pipeline
execute(pipL_PeacoQC, path = workDir)
## #####################################################
## ### running SCALE TRANSFORMATION processing steps ###
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor1.fcs
## Warning in PeacoQC::RemoveMargins(ff, channels = channel4Margins,
## channel_specifications = PQCChannelSpecs): More than 10.12 % is considered as a
## margin event in file Donor1.fcs . This should be verified.
## Removing margins from file : Donor2.fcs
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [flowframe_aggregate] ...
## Proceeding with step 5 [scale_transform_estimate] ...
## #####################################################
## ### NOW PRE-PROCESSING FILE /tmp/RtmpiQgRgJ/Rinst17bf2c1a112f5d/CytoPipeline/extdata/Donor1.fcs...
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor1.fcs
## Warning in PeacoQC::RemoveMargins(ff, channels = channel4Margins,
## channel_specifications = PQCChannelSpecs): More than 10.12 % is considered as a
## margin event in file Donor1.fcs . This should be verified.
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [perform_QC] ...
## Applying PeacoQC method...
## Starting quality control analysis for Donor1.fcs
## Warning in FindIncreasingDecreasingChannels(breaks, ff, channels, plot, : There
## seems to be an increasing or decreasing trend in a channel for Donor1.fcs .
## Please inspect this in the overview figure.
## Calculating peaks
## Warning in PeacoQC::PeacoQC(ff = ffIn, channels = channel4QualityControl, :
## There are not enough bins for a robust isolation tree analysis.
## MAD analysis removed 38.81% of the measurements
## The algorithm removed 38.81% of the measurements
## Proceeding with step 5 [remove_doublets] ...
## Proceeding with step 6 [remove_debris] ...
## Proceeding with step 7 [remove_dead_cells] ...
## #####################################################
## ### NOW PRE-PROCESSING FILE /tmp/RtmpiQgRgJ/Rinst17bf2c1a112f5d/CytoPipeline/extdata/Donor2.fcs...
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...
## Removing margins from file : Donor2.fcs
## Proceeding with step 3 [compensate] ...
## Proceeding with step 4 [perform_QC] ...
## Applying PeacoQC method...
## Starting quality control analysis for Donor2.fcs
## Warning in FindIncreasingDecreasingChannels(breaks, ff, channels, plot, : There
## seems to be an increasing or decreasing trend in a channel for Donor2.fcs .
## Please inspect this in the overview figure.
## Calculating peaks
## Warning in PeacoQC::PeacoQC(ff = ffIn, channels = channel4QualityControl, :
## There are not enough bins for a robust isolation tree analysis.
## MAD analysis removed 9.57% of the measurements
## The algorithm removed 9.57% of the measurements
## Proceeding with step 5 [remove_doublets] ...
## Proceeding with step 6 [remove_debris] ...
## Proceeding with step 7 [remove_dead_cells] ...
Note: again this might generate some warnings, due to flowAI.
These are highly dependent on the shape of the data investigated,
and can safely be ignored here.
# execute flowAI pipeline
execute(pipL_flowAI, path = workDir)
## #####################################################
## ### running SCALE TRANSFORMATION processing steps ###
## #####################################################
## Proceeding with step 1 [flowframe_read] ...
## Proceeding with step 2 [remove_margins] ...