The flowAI package allows to perform quality control on flow cytometry data in order to warrant superior results for both manual and automated downstream analysis. The quality control can be done in an Automatic or Interactive way. The principle behind these two methods is complementary and takes into account three different properties, i.e. 1) flow rate, 2) signal acquisition and 3) dynamic range. This vignette guides you step by step on how to use flowAI.
flowAI 1.36.0
The flowAI package allows to perform quality control on flow cytometry data in order to warrant superior results for both manual and automated downstream analysis. The package is built on the functions: 1) flow_auto_qc
, for the automatic analysis and 2) flow_iQC()
, for the interactive analysis.
The full pipeline of our quality control procedure includes the removal of events having anomalous values when looking at three aspect of a flow cytometry analysis:
The evaluation of these aspects makes it possible to remove the technical variability derived from surges or deviations in the flow rate, defective laser-detection system, data range limitations and other technical issues.
After the installation of the package in your local system you can load the package.
require(flowAI)
For this documentation or for other testing purposes we use a small built-in dataset. The dataset was manually created extracting a subsample of cells and channels from three FCS files that were part of an aging study of a Singaporean cohort. The dataset is stored as a flowSet object.
data(Bcells)
Bcells
## A flowSet with 3 experiments.
##
## column names(13): FSC-A FSC-H ... PE-Cy7-A Time
To select FCS files from your working directory you can create a character vector of
the files you want to analyze calling the function dir
with "*fcs$" as regular expression
for the pattern argument.
setwd(...)
fcsfiles <- dir(".", pattern="*fcs$")
The automatic method is implemented in the function flow_auto_qc
.
The following calls show how to perform the quality control with default
settings on the FCS files in your folder and in the toy dataset that comes with the FlowAI packages.
The flowAI package depends on the flowCore package for the handling of the FCS files in the R environment. The flowCore package provides two main classes, flowFrame
and flowSet
. The Bcells
object is an instance of the flowSet
class and contains a set of three FCS files that taken singuarly are instances of the flowFrame
object. The flow_auto_qc
function can be called on either one of the flowCore objects, flowSet
and flowFrame
, and on a character vector of the fcs files:
resQC <- flow_auto_qc(Bcells) # using a flowSet
resQC <- flow_auto_qc(Bcells[[1]]) # using a flowFrame
resQC <- flow_auto_qc(fcsfiles) # using a character vector
When a character vector is used to call the flow_auto_qc
function, a flowSet
object is automatically generated since the creation of the histogram for the cell number comparison depends on it.
Therefore, to avoid memory saturation, we suggest to split large datasets in batches that are compatible with the hardware specifications of your computer system.
For example, if you want batches of maximum 2 gigabytes you can use:
GbLimit <- 2 # decide the limit in gigabyte for your batches of FCS files
size_fcs <- file.size(fcsfiles)/1024/1024/1024 # it calculates the size in gigabytes for each FCS file
groups <- ceiling(sum(size_fcs)/GbLimit)
cums <- cumsum(size_fcs)
batches <- cut(cums, groups)
Then you can run your analysis on the batches using a for-loop:
for(i in 1:groups){
flow_auto_qc(fcsfiles[which(batches == levels(batches)[i])], output = 0)
}
When setting the output argument to 0 (or any other value apart 1 and 2), no R objects are returned.
After the quality control, the automatic method generates by default a new FCS file containing an additional parameter where the low quality events have a value higher than 10,000, similarly to the flowClean flagging method. Alternatively, a new FCS containing only the high quality events can be generated.
Moreover, flowAI can be implemented in automatic pipelines of analysis through the returned objects of the flowFrame
or flowSet
class
Remember that there are several arguments that you can set to improve the quality control results obtained on your dataset.
Moreover, with the argument remove_from
it is possible to perform partial quality control on only one or two of the above mentioned properties (flow rate, signal acquisition and dynamic range).
The function flow_auto_qc
generates a report for each FCS file, in both a graphic and tabular format, to evaluate the performance of the algorithms in the detection of the anomalies.
We suggest to run the automatic method first with default settings. If the results are not satisfying you can either modify the settings or use the interactive method flow_iQC
.
The interactive method is implemented as a Shiny app and is executed through
the flow_iQC()
command on the R environment.
For performance and clearness reasons, it allows to analyze one file at a time only.
Once you open the Shiny app on your web browser, you can upload the FCS file from the top part of the left hand side panel.
Here, we give an example of the results obtained after performing the quality control on the first FCS file of the Bcells dataset.
The summary information of the FCS file analyzed is reported in the first section of the automatically generated report or on the left hand side panel of the flow_iQC
Shiny app. The summary information contains the name of the file, the number of events and the total percentage of anomalies detected and removed.
The following information were obtained from the automatically generated report of our example:
Input File Name: Bcells1
Number of Events: 64562
The anomalies were removed from: Flow Rate, Flow Signal and Flow Margin Anomalies Detected in total: 23% Number of high quality events: 49535
If the dataset has more than three FCS files, the automatic method will produce a histogram containing the number of events for each file. The bar in blue correspond to the FCS file whose quality control analysis is described in the remaining part of the report.
The flow rate is reconstructed using the keyword $TIMESTEP contained in FCS files with version equal or greater to 3. By default the analysis is performed using a timestep of 1/10 of a second. flow_auto_qc
uses an anomaly detection algorithm to detect and remove the data acquired during flow rate surges and shift from the median value. The algorithm is based on the Generalized ESD outlier detection method optimized to work on time series data. The anomalies automatically detected are circled in green.
flow_iQC
allows to manually select the most stable region of the flow rate.
For each channel, the median of the signal of equally-sized bins of events is reported as a Levy-Jennings-type graph. The mean and standard deviation of the median should remain constant over the course of the analysis. flow_auto_qc
uses a changepoint detection method to verify the stability of the signal. Precisely, a shift in the median or the variance is detected by the Binary Segmentation algorithm of the changepoint
package. In the resulting plot, the region that passed the quality control is highlighted in yellow.