1 Why Do We Need A New Class?

The current implementation for the @treatmentResponse slot in a PharmacoSet has some limitations.

Firstly, it does not natively support dose-response experiments with multiple drugs and/or cancer cell lines. As a result, we have not been able to include such data in a PharmacoSet thus far.

Secondly, drug combination data has the potential to scale to high dimensionality. As a result we need an object that is highly performant to ensure computations on such data can be completed in a timely manner.

To resolve these issues, we designed and implemented the TreatmentResponseExperiment (or TRE for short).

2 Design Philosophy

The current use case is supporting drug combination experiments in PharmacoGx, but we wanted to create something flexible enough to fit other use cases. As such, we have used the generic term ‘treatment’ to refer to any experimental intervention one can conduct on a set of samples. In the context of PharmacoGx, a treatment represents the application of one or more anti-cancer compounds to a cancer cell-line. The resulting viability of this cell-line after treatment is the response metric. We hope that the implementation of our class is general enough to support other use cases. For example, the TreatmentResponseExperiment class is also being adopted for radiation dose-response experiments in cancer cell-lines in RadioGx, as well as for investigating compound toxicity in healthy human and rat cell-lines in ToxicoGx.

Our design takes aspects of the SummarizedExperiment and MultiAssayExperiment classes and implements them using the data.table package, which provides an R API to a rich set of tools for scalable, high performance data processing implemented in C.

3 Anatomy of a TreatmentResponseExperiment

3.1 Class Diagram

We have borrowed directly from the SummarizedExperiment class for the rowData, colData, metadata and assays slot names. We also implemented the SummarizedExperiment accessor methods for the TreatmentResponseExperiment. Therefore the interface should be familiar to users of common Bioconductor packages.

3.2 Object Structure and Cardinality

There are, however, some important differences which make this object more flexible when dealing with high dimensional data.

Unlike a SummarizedExperiment, there are three distinct subgroups of columns in rowData and colData.

The first are the rowKey and colKey, which are implemented internally to map between each assay observation and its associated treatments or samples (rows or columns); these will not be returned by the accessors by default. The second are the rowIDs and colIDs, which hold all of the information necessary to uniquely identify a row or column and are used to generate the rowKey and colKey. Finally, there are the rowMeta and colMeta columns, which store any additional data about treatments or samples not required to uniquely identify a row in either table.
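The three column groups can be illustrated with plain data.table code. The following is only a minimal sketch of the idea, not the internal implementation of the class: the column names (drug1id, drug1dose, drug1target, cellid, tissue) and all values are hypothetical, and the keys are generated here with data.table's .GRP symbol as one possible approach.

```r
library(data.table)

# Hypothetical raw dose-response observations (all values are made up)
raw <- data.table(
    drug1id     = c("drugA", "drugA", "drugB", "drugB"),
    drug1dose   = c(0.1, 1.0, 0.1, 1.0),
    drug1target = c("EGFR", "EGFR", "MEK", "MEK"),
    cellid      = c("cell1", "cell1", "cell2", "cell2"),
    tissue      = c("lung", "lung", "breast", "breast"),
    viability   = c(98, 60, 95, 40)
)

# ID columns uniquely identify a treatment (row) or a sample (column);
# meta columns carry extra annotations that are not needed for identity
rowIDs  <- c("drug1id", "drug1dose")
rowMeta <- "drug1target"
colIDs  <- "cellid"
colMeta <- "tissue"

# One row per unique treatment, with an integer rowKey derived from the rowIDs
rowData <- unique(raw[, c(rowIDs, rowMeta), with = FALSE])
rowData[, rowKey := .GRP, by = rowIDs]

# One row per unique sample, with an integer colKey derived from the colIDs
colData <- unique(raw[, c(colIDs, colMeta), with = FALSE])
colData[, colKey := .GRP, by = colIDs]
```

In this sketch, rowData has four rows (two drugs at two doses each) and colData has two rows (one per cell line); the integer keys are what an assay table would store in place of the full set of ID columns.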

Within the TreatmentResponseExperiment, an assayIndex is stored in the @.intern slot which maps between unique combinations of rowKey and colKey and the experimental observations in each assay. This relationship is maintained using a separate primary key for each assay, which can map to one or more rowKey and colKey combinations. For assays containing raw experimental observations, each assay row will generally map to one and only one combination of rowKey and colKey. However, for metrics computed over experimental observations, it may be desirable to summarize over some of the rowID and/or colID columns. In this case, the relationship between the summarized rows and the metadata stored in the rowData and colData slots is retained in the assayIndex.

Also worth noting is the cardinality between rowData and colData for a given assay within the assays list. As indicated by the lower connection between these tables and an assay, for each row or column key there may be zero or more rows in the assay table. Conversely, for each row in the assay there may be zero or one key in colData or rowData. When combined, the rowKey and colKey for a given row in an assay become a composite key which uniquely identifies an observation.
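The key relationships described in this section can be sketched with small data.table objects. This is an illustration under stated assumptions rather than the actual layout of the @.intern slot: the table and column names (sensitivity, summaryKey, viability_summary) and all values are hypothetical.

```r
library(data.table)

# Hypothetical key tables: three treatments and two samples
rowData <- data.table(
    rowKey    = 1:3,
    drug1id   = c("drugA", "drugA", "drugB"),
    drug1dose = c(0.1, 1.0, 0.1)
)
colData <- data.table(colKey = 1:2, cellid = c("cell1", "cell2"))

# Raw assay: each row maps to exactly one (rowKey, colKey) combination
sensitivity <- data.table(
    rowKey    = c(1L, 2L, 1L, 2L),
    colKey    = c(1L, 1L, 2L, 2L),
    viability = c(98, 60, 95, 40)
)

# The composite key (rowKey, colKey) uniquely identifies an observation
stopifnot(anyDuplicated(sensitivity, by = c("rowKey", "colKey")) == 0L)

# Cardinality: a key may have zero or more assay rows (rowKey 3 has none)
obs_per_row <- table(factor(sensitivity$rowKey, levels = rowData$rowKey))

# Summarized assay: average over dose, so one summary row per drug and cell
annotated <- merge(sensitivity, rowData, by = "rowKey")
annotated <- merge(annotated, colData, by = "colKey")
annotated[, summaryKey := .GRP, by = .(drug1id, cellid)]

# A sketch of an index: each summary key maps to several key combinations
assayIndex <- annotated[, .(rowKey, colKey, summaryKey)]
viability_summary <- annotated[,
    .(mean_viability = mean(viability)), by = summaryKey
]
```

Here each summaryKey maps to two (rowKey, colKey) combinations, so the summarized rows can still be traced back to the treatment and sample metadata through the index table.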

4 Constructing a TreatmentResponseExperiment

To deal with the complex kinds of experimental designs which can be stored in a LongTable, we have engineered a new object to help document and validate the way data is mapped from raw data files, as a single large data.frame or data.table, to the various slots of a TreatmentResponseExperiment object.

4.1 The DataMapper Class

The DataMapper is an abstract class, which means it cannot be instantiated. Its purpose is to provide a description of the concept of a DataMapper and define a basic interface for any classes inheriting from it. A DataMapper is simply a way to map columns from some raw data file to the slots of an S4 class. It is similar to a schema in SQL in that it defines the valid parts of an object (analogously a SQL table), but differs in that no types are specified or enforced at this time.

This object is not important for general users, but may be useful for other developers who want to map from some raw data to some S4 class. In this case, any derived data mapper should inherit from the DataMapper abstract class. Only one slot is defined by default, a list or List in the @rawdata slot. An accessor method, rawdata(DataMapper), is defined to assign and retrieve the raw data from your mapper object.
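For developers unfamiliar with abstract classes in S4, the pattern can be sketched as follows. This is not the package's source code: the class, generic and slot names (MapperSketch, CsvMapperSketch, rawdataSketch, idCols) are hypothetical stand-ins chosen to avoid clashing with the real DataMapper and rawdata definitions, and only the getter half of an accessor is shown.

```r
library(methods)

# Hypothetical abstract (virtual) mapper holding raw data in a list slot
setClass("MapperSketch",
    contains = "VIRTUAL",
    slots = c(rawdata = "list")
)

# Hypothetical getter shared by every class inheriting from MapperSketch
setGeneric("rawdataSketch",
    function(object) standardGeneric("rawdataSketch"))
setMethod("rawdataSketch", "MapperSketch",
    function(object) object@rawdata)

# A virtual class cannot be instantiated; new() signals an error
res <- tryCatch(
    new("MapperSketch"),
    error = function(e) "cannot instantiate"
)

# A concrete subclass inherits the slot and the accessor method
setClass("CsvMapperSketch",
    contains = "MapperSketch",
    slots = c(idCols = "character")
)
mapper <- new("CsvMapperSketch", rawdata = list(a = 1:3), idCols = "a")
rawdataSketch(mapper)
```

Defining the accessor on the virtual parent means every derived mapper exposes its raw data the same way, which is the kind of shared interface an abstract class is meant to provide.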

4.2 The TREDataMapper Class

The TREDataMapper class is the first concrete sub-class of a DataMapper. It is the object which defines how to go from a single data.frame or data.table of raw experimental data to a properly formatted and valid TreatmentResponseExperiment object. This is accomplished by defining various mappings, which let the user decide which columns from rawdata should go into which slots of the object. Each slot ma