1 Introduction

GladiaTOX is an open-source solution for HCS data processing and reporting that extends the tcpl package (ToxCast pipeline; Filer et al., 2016). In addition to tcpl’s functionalities (multiple dose-response fitting and best fit selection), GladiaTOX

1.1 Installation and package load

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GladiaTOX")

1.2 Database configuration

The GladiaTOX installation includes the deployment of a SQLite database (the file sql/gladiatoxdb.sqlite in the package installation directory). This file contains the database structure, already initialized with the content needed for data processing (e.g., processing method entries).

The first step after database deployment is to configure the access parameters. The default configuration already points to this SQLite database, so no additional configuration is needed to complete the example below. The default database path is assigned to the variable sqlite_src:

sqlite_src <- file.path(system.file(package="GladiaTOX"), "sql",
                        "gladiatoxdb.sqlite")

The gtoxConf configuration command below initializes all necessary variables.

# sqlite database location
gtoxConf(   drvr = "SQLite",
            host = NA,
            user = NA,
            pass = NULL,
            db = sqlite_src)
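Before relying on this configuration, it can help to confirm that the SQLite file is actually present. The following is a minimal sketch using base R only; the variable names pkg_dir and db_found are illustrative and not part of the package.

```r
# Resolve the path to the bundled SQLite file; system.file() returns ""
# when the package is not installed.
pkg_dir <- system.file(package = "GladiaTOX")
sqlite_src <- file.path(pkg_dir, "sql", "gladiatoxdb.sqlite")

# TRUE only if the package is installed and the file is present
db_found <- nzchar(pkg_dir) && file.exists(sqlite_src)
if (!db_found) {
    message("SQLite database not found; check the GladiaTOX installation")
}
```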

This database will be used in the next sections to load and process a second study phase. The SQLite database can be seen as a sample database for the following example.

Users who plan to use the package in production and for multiple studies are advised to install the MySQL database (sql/gladiatoxdb_structure.mysql). This file contains the SQL instructions to create and initialize a MySQL database.

If you install the database schema or move the SQLite database, you must run the configuration command again and point it to the new database location. For example, if you deploy the database schema provided in the sql folder, change the driver (drvr) to MySQL. You may also need to configure your user name and password.

Below is an example of a configuration call pointing to a MySQL database called my_gl_database at local.host:

gtoxConf(   drvr = "MySQL",
            host = "local.host",
            user = "username",
            pass = "********",
            db = "my_gl_database")

1.3 Deployed database

The deployed database already contains a fully processed study with:

  • asid: 1 (assay source id), the unique study identifier
  • asnm: SampleStudy (assay source name), the name of the study
  • asph: PhaseI (assay source phase), the study phase

Calling gtoxLoadAsid() lists all studies available in the database.

# List available studies
gtoxLoadAsid()
#>    asid        asnm   asph
#> 1:    1 SampleStudy PhaseI

2 Data and metadata for vignette

In this section we explore one simple way of loading data into the database. The following chunks prepare the metadata and data as R objects (data.frame) prior to database loading.

The following command loads the data for the vignette. It loads three objects into the environment; the content of each is described in the following sections. These are the objects users need to prepare before study data can be loaded. In particular, the dat object contains the raw data as fetched from the instrument database. That database is only accessible within the company, so its content has been exported and saved in an R data file. Fields not used by the code are omitted.

load(system.file("extdata", "data_for_vignette.rda", package="GladiaTOX"))

2.1 plate: plate metadata

The plate object stores metadata with plate information.

Most columns have self-explanatory names and content: plate is the plate number (usually an integer); tube is the well location (H1 is row 8, column 1); well_type is the content type of the well (c is the positive control, t a treatment, n the negative control); endpoint contains assay names without the exposure duration appended; u_boxtrack is a plate identifier used to join the plate metadata table with the raw data table before the data is loaded into the GladiaTOX database.

print(head(plate), row.names = FALSE)
#>     stimulus stimulus concentration exposure duration plate tube well_type
#>  o-anisidine               10000 uM               24h     1   A1         t
#>  o-anisidine                5000 uM               24h     1   B1         t
#>  o-anisidine                1000 uM               24h     1   C1         t
#>  o-anisidine                 200 uM               24h     1   D1         t
#>  o-anisidine             0.00007 uM               24h     1   E1         t
#>  o-anisidine           0.0000006 uM               24h     1   F1         t
#>  vehicle_name       study study.phase cell type    endpoint exposure date
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>          EtOH SampleStudy     PhaseII      NHBE GSH content    2014-06-17
#>  plate_set Biological Replicate smkid well format           assay       Date
#>          0                    1           96-well GSH content_24h 2014-03-17
#>          0                    1           96-well GSH content_24h 2014-03-17
#>          0                    1           96-well GSH content_24h 2014-03-17
#>          0                    1           96-well GSH content_24h 2014-03-17
#>          0                    1           96-well GSH content_24h 2014-03-17
#>          0                    1           96-well GSH content_24h 2014-03-17
#>   u_boxtrack
#>  S-000031334
#>  S-000031334
#>  S-000031334
#>  S-000031334
#>  S-000031334
#>  S-000031334
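The relation between a well label in the tube column and its row and column position (H1 is row 8, column 1) can be sketched as follows. This is an illustration only: well_to_index is a hypothetical helper, not a GladiaTOX function, and it assumes single-letter row labels as on a 96-well plate.

```r
# Convert well labels such as "H1" into 1-based row and column indexes.
# well_to_index is a hypothetical helper, not a GladiaTOX function.
well_to_index <- function(tube) {
    row_letter <- toupper(substr(tube, 1, 1))
    data.frame(
        tube = tube,
        rowi = match(row_letter, LETTERS),                 # "A" -> 1, "H" -> 8
        coli = as.integer(substr(tube, 2, nchar(tube))),   # digits after the letter
        stringsAsFactors = FALSE
    )
}

idx <- well_to_index(c("A1", "H1", "B12"))
print(idx)
```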

2.2 chnmap: assay metadata and channel mapping

The second metadata table contains assay mapping information. In the example below two assays are shown: Cytotoxicity (TIER1) and DNA damage (pH2AX).

Five endpoints are part of the cytotoxicity assay (e.g., Cell count, Cell membrane permeability). Two endpoints are part of the DNA damage assay. Since multiple endpoints can be read from the same plate, each of them is read on a separate channel. The Channel column will also be used later to join the metadata and data tables.

print(head(chnmap, 7), row.names = FALSE)
#>                 Assay                         Endpoint
#>  Cytotoxicity (TIER1)                       Cell count
#>  Cytotoxicity (TIER1)       Cell membrane permeability
#>  Cytotoxicity (TIER1) Mitochondrial membrane potential
#>  Cytotoxicity (TIER1)               Mitochondrial mass
#>  Cytotoxicity (TIER1)             Cytochrome C release
#>    DNA damage (pH2AX)                       Cell count
#>    DNA damage (pH2AX)               DNA damage (pH2AX)
#>                           Channel
#>  SelectedObjectCountPerValidField
#>              MEAN_CircAvgIntenCh2
#>          MEAN_RingSpotAvgIntenCh3
#>         MEAN_RingSpotTotalAreaCh3
#>              MEAN_CircAvgIntenCh4
#>  SelectedObjectCountPerValidField
#>              MEAN_CircAvgIntenCh2

The contents of plate and chnmap are then combined to generate the assay table. In the assay table, assay and endpoint names are concatenated with time points to generate the assay entries for the database.

assay <- buildAssayTab(plate, chnmap)
print(head(assay, 4), row.names = FALSE)
#>                assay timepoint                            component
#>   Ox stress (DHE)_4h        4h  Ox stress (DHE)_Oxidative stress_4h
#>   Ox stress (DHE)_4h        4h  Ox stress (DHE)_Oxidative stress_4h
#>  Ox stress (DHE)_24h       24h Ox stress (DHE)_Oxidative stress_24h
#>  Ox stress (DHE)_24h       24h Ox stress (DHE)_Oxidative stress_24h
#>                                 endpoint              channel
#>   Ox stress (DHE)_Oxidative stress_4h_up MEAN_CircAvgIntenCh2
#>   Ox stress (DHE)_Oxidative stress_4h_dn MEAN_CircAvgIntenCh2
#>  Ox stress (DHE)_Oxidative stress_24h_up MEAN_CircAvgIntenCh2
#>  Ox stress (DHE)_Oxidative stress_24h_dn MEAN_CircAvgIntenCh2
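The naming pattern visible in the output above can be reproduced with plain string concatenation. The sketch below only mirrors the printed names; it is not the implementation of buildAssayTab, and the input values are typed by hand.

```r
# Toy inputs; values mirror the printed example but are typed by hand.
assay_nm  <- "Ox stress (DHE)"
endpt_nm  <- "Oxidative stress"
timepoint <- c("4h", "24h")

# assay = Assay_timepoint; component = Assay_Endpoint_timepoint
naming <- data.frame(
    assay     = paste(assay_nm, timepoint, sep = "_"),
    timepoint = timepoint,
    component = paste(assay_nm, endpt_nm, timepoint, sep = "_"),
    stringsAsFactors = FALSE
)

# Each component yields two endpoints, one per direction (_up and _dn).
# merge() without shared columns returns the Cartesian product.
directions <- data.frame(direction = c("up", "dn"), stringsAsFactors = FALSE)
naming <- merge(naming, directions)
naming$endpoint <- paste(naming$component, naming$direction, sep = "_")
print(naming[, c("assay", "component", "endpoint")])
```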

2.3 dat: image quantification raw data

The data table is an export from the image quantification instrument.

This table contains the raw fluorescence quantification values (measure_val); rowi and coli are the row and column indexes; machine_name is the channel name, used to join this table with the assay table above; u_boxtrack is the plate identifier, used to join this table with the plate table.

print(head(dat), row.names = FALSE)
#>  measure_val rowi coli                     machine_name  u_boxtrack
#>       134.15    1    1 SelectedObjectCountPerValidField S-000031358
#>       118.50    1    2 SelectedObjectCountPerValidField S-000031358
#>       139.05    1    3 SelectedObjectCountPerValidField S-000031358
#>       214.50    1    4 SelectedObjectCountPerValidField S-000031358
#>       226.55    1    5 SelectedObjectCountPerValidField S-000031358
#>       229.75    1    6 SelectedObjectCountPerValidField S-000031358
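The two join keys described above (machine_name against the channel, u_boxtrack against the plate) can be illustrated with base R merge() on toy tables. All values below are hypothetical, and the sketch is simplified: the real plate table has one row per well, so the join performed by the package also involves the well position.

```r
# Toy tables with hypothetical values; only the join keys matter here.
dat_toy <- data.frame(
    measure_val  = c(134.15, 118.50, 52.10),
    machine_name = c("SelectedObjectCountPerValidField",
                     "SelectedObjectCountPerValidField",
                     "MEAN_CircAvgIntenCh2"),
    u_boxtrack   = c("S-01", "S-01", "S-02"),
    stringsAsFactors = FALSE
)
chn_toy <- data.frame(
    channel  = c("SelectedObjectCountPerValidField", "MEAN_CircAvgIntenCh2"),
    endpoint = c("Cell count", "DNA damage (pH2AX)"),
    stringsAsFactors = FALSE
)
plate_toy <- data.frame(
    u_boxtrack = c("S-01", "S-02"),
    study      = c("SampleStudy", "SampleStudy"),
    stringsAsFactors = FALSE
)

# Attach the endpoint through the channel name, then the plate metadata
joined <- merge(dat_toy, chn_toy, by.x = "machine_name", by.y = "channel")
joined <- merge(joined, plate_toy, by = "u_boxtrack")
print(joined)
```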

3 Database loading

In this section, data and metadata are loaded into the GladiaTOX database. First, set the study name and the phase of the new study phase to be loaded and processed.

## Set study parameters
std.nm <- "SampleStudy" # study name
phs.nm <- "PhaseII" # study phase

3.1 Register study info in database

The following code registers the content of the metadata tables in the database, including assays, endpoints, treatments and controls. gtoxLoadAsid() is called before and after loadAnnot() to list the studies available in the database and to show that the new study phase has been added.

## List of studies before loading
gtoxLoadAsid()
#>    asid        asnm   asph
#> 1:    1 SampleStudy PhaseI

## Load annotation in gtoxDB
loadAnnot(plate, assay, NULL)
#> [1] TRUE

## List of studies after loading
gtoxLoadAsid()
#>    asid        asnm    asph
#> 1:    1 SampleStudy  PhaseI
#> 2:    2 SampleStudy PhaseII

The loadAnnot function call registers multiple study parameters in the database and creates a new assay source id (asid). The asid identifies the (study name, study phase) pair and is used to load the raw data of the study, process the study and generate reports.

3.2 Load raw data in database

The asid just created can be retrieved by querying the database and specifying the study name and phase.

# Get assay source ID
asid = gtoxLoadAsid(fld = c("asnm", "asph"), val = list(std.nm, phs.nm))$asid
asid
#> [1] 2

The asid and dat objects are the inputs to the prepareDatForDB function, which joins the metadata stored in the database with the raw data stored in the dat object.

Raw data is then loaded into the database with the gtoxWriteData function. The study with asid 2 is now ready to be processed.

# Prepare and load data
dat <- prepareDatForDB(asid, dat)
gtoxWriteData(dat[ , list(acid, waid, wllq, rval)], lvl = 0, type = "mc")
#> Completed delete cascade for 34 ids (0.07 secs)
#> [1] TRUE

4 Quality control: data processing and reporting

Metadata and data are now registered in the database. The next step is to select the processing methods to apply to the data. There are multiple levels of processing (see gtoxLoadMthd(lvl=3) for details). The function assignDefaultMthds is a shortcut that assigns the methods for all levels at once. The default methods should suit most users.

assignDefaultMthds(asid = asid)
#> [1] TRUE

With the default selection, raw data is normalized by computing the log2 fold change of values in each well against the median of the corresponding controls.
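The normalization described above can be sketched on toy data with base R. The values are made up, and treating the negative control wells (well_type n) of a single plate as the "corresponding controls" is an assumption of this sketch, not a statement about the package internals.

```r
# Toy raw values for one plate; well types follow the vignette's coding
# (n = negative control, t = treatment). Values are made up.
wells <- data.frame(
    well_type = c("n", "n", "n", "t", "t"),
    rval      = c(100, 110, 90, 200, 50),
    stringsAsFactors = FALSE
)

# Median of the control wells (assumed here to be the n wells)
ctrl_median <- median(wells$rval[wells$well_type == "n"])

# log2 fold change of every well against the control median
wells$resp <- log2(wells$rval / ctrl_median)
print(wells)
```

With these values the control median is 100, so a well at twice the control level gets a response of 1 and a well at half the control level gets -1.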

4.1 Compute the noise band

The package computes a noise band to discriminate concentration series that are active from those that are not. To compute the noise band, the vehicle data must first be processed and normalized by running the following code:

# Run level 1 to level 3 functions
res <- gtoxRun(asid = asid, slvl = 1, elvl = 3, mc.cores = 2)

The default behaviour is to compute noise band margins separately for each endpoint. Margins correspond to 3 times the baseline median absolute deviation of vehicle responses. The following code computes the cutoffs and stores them in the database.

# Extract assay endpoints ids of the study
aeids <- gtoxLoadAeid(fld = "asid", val = asid)$aeid
# Compute the vehicle median absolute deviation for each endpoint;
# endpoints that raise an error return NULL
tmp <- mapply(function(xx) {
    tryCatch(
        gtoxCalcVmad(inputs = xx, aeid = xx,
                     notes = "computed within study"),
        error = function(e) NULL)
    },
    as.integer(aeids))
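The margin rule stated above (3 times the baseline MAD of the vehicle responses) can be illustrated on toy data. This is a sketch under assumptions: the responses are made up, the band is treated as symmetric around zero, and stats::mad() is used with its default scaling constant, which may differ from what the package does internally.

```r
# Made-up normalized vehicle responses for one endpoint
vehicle_resp <- c(-0.10, 0.00, 0.10, 0.05, -0.05)

# stats::mad() scales by 1.4826 by default; whether the package applies
# the same scaling is an assumption of this sketch.
bmad   <- mad(vehicle_resp)
margin <- 3 * bmad

# A response falls outside the noise band when its magnitude exceeds the margin
resp    <- c(0.05, 0.30, -0.40)
outside <- abs(resp) > margin
print(data.frame(resp = resp, outside = outside))
```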

Once the database is populated with the noise band margins, the data for all chemicals can be processed.

# Apply all functions from level 1 to level 6
res <- gtoxRun(asid = asid, slvl = 1, elvl = 6, mc.cores = 2)

In the original work (Filer et al., 2016), the default behaviour is to compute noise band margins from the responses at the two lowest concentrations of each series, which assumes that no response is observed at those concentrations. The current package removes that assumption and extends the list of functionalities. The database design was modified accordingly.
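For contrast, the low-concentration baseline described for the original work can be sketched as follows. This is an illustration on made-up data, not the tcpl implementation; the column names spid, conc and resp are chosen here for the example.

```r
# Made-up concentration series for two samples (spid is an illustrative id)
series <- data.frame(
    spid = rep(c("chemA", "chemB"), each = 4),
    conc = rep(c(0.1, 1, 10, 100), times = 2),
    resp = c(0.02, -0.03, 0.40, 1.20, 0.01, 0.04, 0.10, 0.90),
    stringsAsFactors = FALSE
)

# Rank concentrations within each series and keep the two lowest
conc_rank <- ave(series$conc, series$spid, FUN = rank)
low_resp  <- series$resp[conc_rank <= 2]

# Baseline MAD computed from the low-concentration wells only
# (stats::mad() default scaling constant of 1.4826 is used)
bmad_low <- mad(low_resp)
print(bmad_low)
```

If a chemical is already active at its lowest concentrations, those responses inflate bmad_low, which is the limitation that a vehicle-based baseline avoids.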

4.2 Quality control report

Quality control is the means of checking the quality of the data produced in the lab. Each experimental plate is checked, and plates that do not pass are filtered out. Quality control is commonly based on visual inspection. The package provides functions to generate a self-contained PDF file with plate heatmaps and positive control plots.

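## QC report
## Assumption: outdir is not defined in this excerpt; any writable
## directory works, tempdir() is used here as a placeholder
outdir <- tempdir()
gtoxReport(type = "qc", asid = asid, report_author = "report author",
           report_title = "Vignette QC report", odir = outdir)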

An example plate heatmap, as included in the quality control PDF report, is shown below. The following code extracts the ID of the plate to plot.

# Define assay component and extract assay component ID
acnm <- "DNA damage (pH2AX)_DNA damage (pH2AX)_4h"
acid <- gtoxLoadAcid(fld=c("asid", "acnm"), val=list(asid,acnm))[, acid]
# Extract assay plate ID corresponding to plate name S-000031351
apid <- gtoxLoadApid()[u_boxtrack == "S-000031351", apid]
# Load level 2 data (Raw data before normalization)
l2 <- gtoxLoadData(lvl = 2L, fld = "acid", val = acid)

The plate heatmap is generated with the following code.

gtoxPlotPlate(dat = l2, apid = apid, id = acid)