0.1 Purpose

The purpose of this package is to provide the infrastructure to store, represent and exchange gated flow data. By this we mean accessing the samples, groups, transformations, compensation matrices, gates, and population statistics in the gating tree, which is represented as a GatingSet object in R.

The GatingSet can be built from scratch within R or imported from flowJo XML workspaces (i.e. .xml or .wsp files) or GatingML files. Note that we cannot import .jo files directly; you will first have to save them in XML workspace format.

0.2 Import flowJo workspace

The following section walks through opening and importing a flowJo workspace.

0.2.1 Opening a Workspace

We represent flowJo workspaces using flowjo_workspace objects. We only need to know the path to, and filename of, the flowJo workspace.

library(flowWorkspace)
# Locate the example workspace shipped with the flowWorkspaceData package
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)

To open this workspace we need the CytoML package:

library(CytoML)
## Warning: replacing previous import 'ncdfFlow::filter' by 'dplyr::filter' when
## loading 'CytoML'
ws <- open_flowjo_xml(wsfile)
ws
## File location:  /home/biocbuild/bbs-3.11-bioc/R/library/flowWorkspaceData/extdata/A2004Analysis.xml 
## 
## Groups in Workspace
##          Name Num.Samples
## 1 All Samples           2

We see that this is a version 2.0 workspace file. Its location and filename are printed. Additionally, you are notified that the workspace file is open. This refers to the fact that the XML document is internally represented using C data structures from the XML package. After importing the file, the workspace must be explicitly closed using flowjo_ws_close() in order to free that memory.

For example, the list of samples in a workspace can be accessed by:

fj_ws_get_samples(ws)
##   sampleID                       name count pop.counts
## 1        1 a2004_O1T2pb05i_A1_A01.fcs 61832         19
## 2        2 a2004_O1T2pb05i_A2_A02.fcs 45363         19

The compID column tells you which compensation matrix to apply to a group of files, and similarly, based on the name of the compensation matrix, which transformations to apply.

And the groups can be accessed by:

fj_ws_get_sample_groups(ws)
##     groupName groupID sampleID
## 1 All Samples       0        1
## 2 All Samples       0        2
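
Both accessors return ordinary data frames that share a sampleID column, as the printed output above suggests, so they can be combined with base R. The following is a minimal sketch, assuming the same A2004Analysis.xml workspace; the variable names samples, groups and sample_groups are illustrative only.

```r
library(CytoML)

# Re-open the example workspace so that this block runs on its own.
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)

samples <- fj_ws_get_samples(ws)       # one row per sample
groups <- fj_ws_get_sample_groups(ws)  # one row per group/sample pair

# Join on the shared sampleID column to see each sample next to its group.
sample_groups <- merge(groups, samples, by = "sampleID")
sample_groups[, c("groupName", "name", "count")]
```

A join like this is handy when a workspace contains several groups and you want to check which samples belong to the group you are about to import.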

Keywords stored in an XML workspace can also be retrieved by:

sn <- "a2004_O1T2pb05i_A1_A01.fcs"
fj_ws_get_keywords(ws, sn)[1:5]
## $`$BEGINANALYSIS`
## [1] "0"
## 
## $`$BEGINDATA`
## [1] "3803"
## 
## $`$BEGINSTEXT`
## [1] "0"
## 
## $`$BTIM`
## [1] "09:20:24"
## 
## $`$BYTEORD`
## [1] "4,3,2,1"
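
The result is a named list, so an individual keyword can be pulled out by name instead of by position. A small sketch, assuming the same workspace and sample as above; $TOT is the standard FCS keyword holding the total event count, and kw is an illustrative variable name.

```r
library(CytoML)

# Re-open the example workspace so that this block runs on its own.
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)

sn <- "a2004_O1T2pb05i_A1_A01.fcs"
kw <- fj_ws_get_keywords(ws, sn)

length(kw)       # number of keywords stored for this sample
head(names(kw))  # keyword names, including the `$`-prefixed FCS keywords
kw[["$TOT"]]     # look up a single keyword by name
```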

0.2.2 Parsing the Workspace

The calls above all query the XML file directly. To get more information about the gating tree, we need to parse the XML workspace into R data structures that represent the information therein. Calling flowjo_to_gatingset() presents the user with a list of groups in the workspace file, from which one group must be chosen for import. Why only one? Because of the way flowJo handles data transformation and compensation: each group of samples is associated with a compensation matrix and a specific data transformation, and these are applied to all samples in the group.

When a particular group of samples is imported, the package generates a GatingHierarchy for each sample, describing the set of gates applied to the data (note: polygon, rectangle, quadrant, oval, and boolean gates are supported). The set of GatingHierarchies for the group of samples is stored in a GatingSet object.

Calling flowjo_to_gatingset() is quite verbose, informing the user as each gate is created. The parsing can also be done non-interactively by specifying which group to import directly in the function call via the name argument (either an index or a group name). The optional argument execute (TRUE or FALSE) specifies whether to load, compensate, and transform the data and compute statistics immediately after parsing the XML tree. The argument path can be used to specify where the FCS files are stored.

gs <- flowjo_to_gatingset(ws, name = 1) # import the first group
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
## invalid zeroChan: -2147483648
## caused by the invalid biexp parameters!Downcast the biexp to Calibration table instead!
#Lots of output here suppressed for the vignette.
gs
## A GatingSet with 2 samples

We have generated a GatingSet with 2 samples, each of which has 19 associated gates.
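
The non-interactive options described above can be combined in a single call. The sketch below selects the group by name rather than by index and sets execute = FALSE, so that only the gating tree is parsed and the FCS data are not loaded. "All Samples" is the group name printed for this workspace; gs_tree is an illustrative variable name, and the exact behaviour may differ between package versions.

```r
library(flowWorkspace)
library(CytoML)

# Re-open the example workspace so that this block runs on its own.
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)

# Select the group by name and skip loading, compensating and gating the data.
gs_tree <- flowjo_to_gatingset(ws, name = "All Samples", execute = FALSE)
gs_tree

# To gate the data as well, keep execute = TRUE and point `path` at the
# folder that holds the FCS files, for example:
# gs_full <- flowjo_to_gatingset(ws, name = "All Samples", path = path)
```

Parsing without executing is a quick way to inspect the gate definitions when the FCS files are large or not available locally.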

To list the samples stored in the GatingSet:

sampleNames(gs)
## [1] "a2004_O1T2pb05i_A1_A01.fcs_61832" "a2004_O1T2pb05i_A2_A02.fcs_45363"

Note that this differs from the earlier call to fj_ws_get_samples on the workspace: that call lists all samples stored in the XML file, whereas sampleNames lists only the samples that were actually parsed, because not every sample in the XML file is necessarily imported. Also note the extra string _xxx appended to each sample name. This is because the additional.keys argument defaults to '$TOT', so the value of that keyword (the total event count) is appended. See more details on these parsing options in How to parse a flowJo workspace.
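
To make the naming pattern concrete, the sketch below rebuilds the GatingSet sample names from the name and count values printed earlier by fj_ws_get_samples. It uses base R only and merely illustrates the pattern; it is not a description of how the package constructs the names internally.

```r
# Values copied from the fj_ws_get_samples() output shown earlier.
samples <- data.frame(
  name  = c("a2004_O1T2pb05i_A1_A01.fcs", "a2004_O1T2pb05i_A2_A02.fcs"),
  count = c(61832L, 45363L)
)

# Append the event count to each file name, separated by "_".
gs_names <- paste(samples$name, samples$count, sep = "_")
gs_names
## [1] "a2004_O1T2pb05i_A1_A01.fcs_61832" "a2004_O1T2pb05i_A2_A02.fcs_45363"
```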

0.3 Import gatingML

We currently support gatingML 2.0 files exported from the Cytobank system. Parsing can be done with one convenience function, cytobank2GatingSet from the CytoML package, which simply takes the file paths of the gatingML file and the FCS files.

xmlfile <- system.file("extdata/cytotrol_tcell_cytobank.xml", package = "CytoML")
fcsFiles <- list.files(system.file("extdata", package = "flowWorkspaceData"),
                       pattern = "CytoTrol", full.names = TRUE)
gs1 <- cytobank2GatingSet(xmlfile, fcsFiles)

If you want to dive into the details and sub-steps of the parsing process, see the vignette of CytoML.

0.4 Basics on GatingSet

Subsets of a GatingSet can be accessed using the standard R subset syntax [.

gs[1]
## A GatingSet with 1 samples

At this point we have parsed the workspace file and generated the gating hierarchy associated with each sample imported from the file. The data have been loaded, compensated, and transformed in the workspace, and gating has been executed. The resulting GatingSet contains a replicated analysis of the original flowJo workspace. It should be noted, however, that because GatingSet is a pure reference class, this sort of subsetting does not copy the underlying data but rather uses a view of it.
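
The following sketch illustrates a few ways of subsetting and what they return. It assumes the same example workspace; first, same, gh and gs_copy are illustrative names. gs_clone() is used here on the assumption that an independent copy is wanted instead of a view.

```r
library(flowWorkspace)
library(CytoML)

# Re-create the GatingSet so that this block runs on its own.
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)

# Subset by position or by sample name; both return a GatingSet view.
first <- gs[1]
same <- gs[sampleNames(gs)[1]]
sampleNames(first)

# Double brackets extract a single GatingHierarchy instead.
gh <- gs[[1]]
class(gh)

# gs_clone() makes a deep copy that no longer shares data with `gs`.
gs_copy <- gs_clone(first)
```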

We can plot the gating tree:

plot(gs)
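
The same tree can also be inspected as text, which is convenient in scripts or when no graphics device is available. A minimal sketch, assuming the GatingSet parsed above; nodes and counts are illustrative variable names.

```r
library(flowWorkspace)
library(CytoML)

# Re-create the GatingSet so that this block runs on its own.
path <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(path, pattern = "A2004Analysis.xml", full.names = TRUE)
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)

# Population (node) paths, shortened to the shortest unambiguous form.
nodes <- gs_get_pop_paths(gs, path = "auto")
nodes

# Event counts per population and sample.
counts <- gs_pop_get_count_fast(gs)
head(counts)
```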