1 API
2 Overview of the MultiAssayExperiment class
3 A simple example where sampleMap isn’t provided
4 Creating a MultiAssayExperiment object: a rich example
5 RangedRaggedAssay class
6 Integrated subsetting across experiments
7 rownames and colnames
8 Requirements for support of additional data classes
9 Methods for MultiAssayExperiment
10 Wishlist
11 sessionInfo()

A built html version of this vignette is available.

1 API

See the API wiki by typing:

API()

A Shiny App that browses the API is also available:

API(shiny=TRUE)

2 Overview of the `MultiAssayExperiment` class

Here is an overview of the class and its constructors and extractors:

empty <- MultiAssayExperiment()
empty

## A "MultiAssayExperiment" object of 0 listed
##  experiments with no user-defined names and respective classes. 
##  Containing an "Elist" class object of length 0:  
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

slotNames(empty)

## [1] "Elist"     "pData"     "sampleMap" "metadata"  "drops"

We explain the role of each of these components of MultiAssayExperiment below.

2.1 Elist: experimental data

The Elist slot and class is the driver for the MultiAssayExperiment class, as it contains the experimental data. It is a SimpleList (similar to the base list), with one element per data type.

class(Elist(empty)) # Elist

## [1] "Elist"
## attr(,"package")
## [1] "MultiAssayExperiment"

The elements of Elist can contain ID-based and range-based data class that support a minimum of methods, see API() for details on using data classes not discussed here. The following classes are known to work as Elist elements, to support a good variety of data types:

matrix: the base data class, can be used for ID-based datasets such as gene expression summarized per-gene, microRNA, metabolomics, or microbiome data.
ExpressionSet: A richer representation for ID-based datasets, could be used for the same types of data as matrix, but storing additional assay-level metadata.
SummarizedExperiment: Another rich representation for ID-based matrix-like datasets
RangedSummarizedExperiment: For rectangular range-based datasets, meaning that one set of genomic ranges are assayed for multiple samples. Could be used for gene expression, methylation, or other data types referring to genomic positions.
RangedRaggedAssay: inherits from GRangesList, for ranged-based ragged arrays, meaning that a potentially different set of genomic ranges are assayed for each sample. A typical example would be segmented copy number, where segmentation of copy number alterations occurs and different genomic locations in each sample.

The datasets contained in elements of Elist must have column names and row names. The column names correspond to samples, and are used to match assay data to specimen metadata stored in pData, as explained in the following two sections.

[: standard square bracket subsetting, with a single comma. It is assumed that values before the comma subset rows, and values after the comma subset columns.
colnames(): corresponding to experimental samples
rownames(): corresponding to features such as genes, proteins, etc
dim(): returns a vector of the number of rows and number of columns

2.2 pData: primary data

The MultiAssayExperiment keeps one set of “primary” metadata that describes the biological specimens experimental subject, patients, etc. For the sake of simplicity, we will refer to each experimental subject as a patient.

class(pData(empty)) # DataFrame

## [1] "DataFrame"
## attr(,"package")
## [1] "S4Vectors"

pData contains one row per patient or specimen, and one column per variable of clinical or experimental metadata. rownames(pData) must provide the identifiers used for the patients or specimens.

In the following examples we will have assays for 4 patients, with the following bit of metadata relating to the patients. Although pData is stored internally as a DataFrame, the MultiAssayExperiment constructor will automatically convert a data.frame.

patient.data <- data.frame(sex=c("M", "F", "M", "F"),
    age=38:41,
    row.names=c("Jack", "Jill", "Bob", "Barbara"))
patient.data

##         sex age
## Jack      M  38
## Jill      F  39
## Bob       M  40
## Barbara   F  41

2.3 Note on the flexibility of the `DataFrame`

For many typical purposes the DataFrame and data.frame behave equivalently; but the Dataframe is more flexible as it allows any vector-like data type to be stored in its columns. The flexibility of the DataFrame permits, for example, storing multiple dose-response values for a single cell line, even if the number of doses and responses is not consistent across all cell lines. Doses could be stored in one column of pData as a SimpleList, and responses in another column, also as a SimpleList. Or, dose-response values could be stored in a single column of pData as a two-column matrix for each cell line.

2.4 sampleMap: relating pData to multiple assays

sampleMap is a DataFrame that provides a map between the “primary” data (pData) and the experimental assays:

class(sampleMap(empty)) # DataFrame

## [1] "DataFrame"
## attr(,"package")
## [1] "S4Vectors"

The sampleMap provides an unambiguous map from every experimental observation to one and only one row in pData. It is, however, permissible for a row of pData to be associated with multiple experimental observations or no observations at all. In other words, there is a “many-to-one” mapping from experimental observations to rows of pData, and a “one-to-any-number” mapping from rows of pData to experimental observations. pData has three columns, with the following column names:

primary provides the “primary” sample names. All values in this column must also be present in the rownames of pData(MultiAssayExperiment). In this example, allowable values in this column are “Jack”, “Jill”, “Barbara”, and “Bob”.
assay provides the sample names used by experimental datasets, which in practice are often different than the primary sample names. For each assay, all column names must be found in this column. Otherwise, those assays would be orphaned: it would be impossible to match them up to samples in the overall experiment. As mentioned above, duplicated values are allowed, to represent replicates with the same overall experiment-level annotation.
assayname provides the names of the different experiments / assays performed. These are user-defined, with the only requirement that the names of the Elist, where the experimental assays are stored, must be contained in this column.

This design is motivated by the following situations:

It allows flexibility for any amount of technical replication and biological replication (such as tumor and matched normal for a single patient) of individual assays.
It allows missing observations (such as RNA-seq performed only for some of the patients).
It allows the use of different identifiers to be used for patients / specimens and for each assay. These different identifiers are matched unambiguously, and consistency between them is maintained during subsetting and re-ordering.

2.5 metadata

Can be any class, for storing study-wide metadata, such as citation information. It can also be NULL, as with the empty MultiAssayExperiment:

class(metadata(empty)) # NULL (class "ANY")

## [1] "NULL"

2.6 drops

Maintains provenance of rows and columns dropped during subsetting operations.

3 A simple example where sampleMap isn’t provided

If each assay uses the same colnames (ie, if the same sample identifiers are used for each experiment), a simple list of these datasets is sufficient for the MultiAssayExperiment() constructor function. It is not necessary for them to have the same rownames or colnames:

exprss1 <- matrix(rnorm(16), ncol = 4,
        dimnames = list(sprintf("ENST00000%i", sample(288754:290000, 4)),
                c("Jack", "Jill", "Bob", "Bobby")))
exprss2 <- matrix(rnorm(12), ncol = 3,
        dimnames = list(sprintf("ENST00000%i", sample(288754:290000, 4)),
                c("Jack", "Jane", "Bob")))
doubleExp <- list("methyl 2k"  = exprss1, "methyl 3k" = exprss2)
simpleMultiAssay <- MultiAssayExperiment(Elist=doubleExp)

## Warning in MultiAssayExperiment(Elist = doubleExp): neither sampleMap nor
## pData provided, sampleMap will be generated

simpleMultiAssay

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] methyl 2k: "matrix" - 4 rows, 4 columns 
##  [2] methyl 3k: "matrix" - 4 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

In the above example, the user did not provide the pData argument so the constructor function filled it with a trivial DataFrame:

pData(simpleMultiAssay)

## DataFrame with 5 rows and 1 column
##          pheno1
##       <logical>
## Jack         NA
## Jill         NA
## Bob          NA
## Bobby        NA
## Jane         NA

But the pData can be provided. Here, note that any assay sample (column) that cannot be mapped to a corresponding row in the provided pData gets dropped. This is part of ensuring internal validity of the MultiAssayExperiment.

simpleMultiAssay2 <- MultiAssayExperiment(Elist=doubleExp, pData=patient.data)

## Warning in MultiAssayExperiment(Elist = doubleExp, pData = patient.data):
## sampleMap not provided, trying to generate sampleMap...

## Warning in .generateMap(pData, newElist): Data from rows:
##  Bobby - methyl 2k
##  Jane - methyl 3k
## dropped due to missing phenotype data

simpleMultiAssay2

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] methyl 2k: "matrix" - 4 rows, 3 columns 
##  [2] methyl 3k: "matrix" - 4 rows, 2 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

pData(simpleMultiAssay2)

## DataFrame with 4 rows and 2 columns
##              sex       age
##         <factor> <integer>
## Jack           M        38
## Jill           F        39
## Bob            M        40
## Barbara        F        41

4 Creating a `MultiAssayExperiment` object: a rich example

In this section we demonstrate all core supported data classes, using different sample ID conventions for each assay, with primary pData. The currently supported classes of datasets are matrix, ExpressionSet, SummarizedExperiment, RangedSummarizedExperiment, and RangedRaggedAssay.

4.1 Create toy datasets demonstrating all supported data types

We have three matrix-like datasets. First let’s say expression data, which in this example we represent as an ExpressionSet:

library(Biobase)
(arraydat <- matrix(seq(101, 108), ncol=4,
                    dimnames=list(c("ENST00000294241", "ENST00000355076"),
                                  c("array1", "array2", "array3", "array4"))))

##                 array1 array2 array3 array4
## ENST00000294241    101    103    105    107
## ENST00000355076    102    104    106    108

arraypdat <- as(data.frame(slope53=rnorm(4),
                           row.names=c("array1", "array2", "array3",
                                       "array4")), "AnnotatedDataFrame")
exprdat <- ExpressionSet(assayData=arraydat, phenoData=arraypdat)
exprdat

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 4 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: array1 array2 array3 array4
##   varLabels: slope53
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:

The following map matches pData sample names to exprdata sample names. Note that row orders aren’t initially matched up, and this is OK.

(exprmap <- data.frame(primary=rownames(patient.data)[c(1, 2, 4, 3)],
                       assay=c("array1", "array2", "array3", "array4"),
                       stringsAsFactors = FALSE))

##   primary  assay
## 1    Jack array1
## 2    Jill array2
## 3 Barbara array3
## 4     Bob array4

Now methylation data, which we will represent as a matrix. It uses gene identifiers also, but measures a partially overlapping set of genes. For fun, let’s store this as a simple matrix. Also, it contains a replicate for one of the patients.

(methyldat <-
   matrix(1:10, ncol=5,
          dimnames=list(c("ENST00000355076", "ENST00000383706"),
                        c("methyl1", "methyl2", "methyl3",
                          "methyl4", "methyl5"))))

##                 methyl1 methyl2 methyl3 methyl4 methyl5
## ENST00000355076       1       3       5       7       9
## ENST00000383706       2       4       6       8      10

The following map matches pData sample names to methyldat sample names.

(methylmap <- data.frame(primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
              assay = c("methyl1", "methyl2", "methyl3", "methyl4", "methyl5"),
              stringsAsFactors = FALSE))

##   primary   assay
## 1    Jack methyl1
## 2    Jack methyl2
## 3    Jill methyl3
## 4 Barbara methyl4
## 5     Bob methyl5

Now we have a microRNA platform, which has no common identifiers with the other datasets, and which we also represent as a matrix. It is also missing data for Jill. Just for fun, let’s use the same sample naming convention as we did for arrays.

(microdat <- matrix(201:212, ncol=3,
                    dimnames=list(c("hsa-miR-21", "hsa-miR-191",
                                    "hsa-miR-148a", "hsa-miR148b"),
                                  c("micro1", "micro2", "micro3"))))

##              micro1 micro2 micro3
## hsa-miR-21      201    205    209
## hsa-miR-191     202    206    210
## hsa-miR-148a    203    207    211
## hsa-miR148b     204    208    212

And the following map matches pData sample names to microdat sample names.

(micromap <- data.frame(primary = c("Jack", "Barbara", "Bob"),
                        assay = c("micro1", "micro2", "micro3"),
                        stringsAsFactors = FALSE))

##   primary  assay
## 1    Jack micro1
## 2 Barbara micro2
## 3     Bob micro3

Let’s include a RangedRaggedAssay, which is defined in this package and extends GRangesList. This is intended for data such as segmented copy number, which provide genomic ranges that may be different for each sample. We start with a GRangesList, which will later be converted automatically by the MultiAssayExperiment constructor function.

suppressPackageStartupMessages(library(GenomicRanges))
## completely encompasses ENST00000355076
gr1 <-
  GRanges(seqnames = "chr3", ranges = IRanges(58000000, 59502360),
          strand = "+", score = 5L, GC = 0.45)
## first is within ENST0000035076
gr2 <-
  GRanges(seqnames = c("chr3", "chr3"),
          ranges = IRanges(c(58493000, 3), width=9000),
          strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
gr3 <-
  GRanges(seqnames = c("chr1", "chr2"),
          ranges = IRanges(c(1, 4), c(3, 9)),
          strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3)
names(grl) <- c("snparray1", "snparray2", "snparray3")
grl

## GRangesList object of length 3:
## $snparray1 
## GRanges object with 1 range and 2 metadata columns:
##       seqnames               ranges strand |     score        GC
##          <Rle>            <IRanges>  <Rle> | <integer> <numeric>
##   [1]     chr3 [58000000, 59502360]      + |         5      0.45
## 
## $snparray2 
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames               ranges strand | score  GC
##   [1]     chr3 [58493000, 58501999]      + |     3 0.3
##   [2]     chr3 [       3,     9002]      - |     4 0.5
## 
## $snparray3 
## GRanges object with 2 ranges and 2 metadata columns:
##       seqnames ranges strand | score  GC
##   [1]     chr1 [1, 3]      - |     6 0.4
##   [2]     chr2 [4, 9]      - |     2 0.1
## 
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths

The following data.frame matches pData sample to the GRangesList:

(rangemap <- data.frame(primary = c("Jack", "Jill", "Jill"),
                        assay = c("snparray1", "snparray2", "snparray3"),
                        stringsAsFactors = FALSE))

##   primary     assay
## 1    Jack snparray1
## 2    Jill snparray2
## 3    Jill snparray3

Finally, we create a dataset of class RangedSummarizedExperiment:

library(SummarizedExperiment)
nrows <- 5; ncols <- 4
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(2, nrows - 2)),
                     IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), nrows, TRUE),
                     feature_id=sprintf("ID\\%03d", 1:nrows))
names(rowRanges) <- letters[1:5]
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 2),
                     row.names= c("mysnparray1", "mysnparray2",
                                  "mysnparray3", "mysnparray4"))
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)

(rangemap2 <-
   data.frame(primary = c("Jack", "Jill", "Bob", "Barbara"),
              assay = c("mysnparray1", "mysnparray2", "mysnparray3",
                        "mysnparray4"), stringsAsFactors = FALSE))

##   primary       assay
## 1    Jack mysnparray1
## 2    Jill mysnparray2
## 3     Bob mysnparray3
## 4 Barbara mysnparray4

4.2 creating a sampleMap

The MultiAssayExperiment constructor function can create the sampleMap automatically if a single naming convention is used, but in this example it cannot because we used platform-specific sample identifiers (e.g. mysnparray1, etc). So we must provide an ID map that matches the samples of each experiment back to the pData, as a three-column data.frame or DataFrame with three columns named “primary”, “assay”, and “assayname”. Here we start with a list:

listmap <- list(exprmap, methylmap, micromap, rangemap, rangemap2)
names(listmap) <- c("Affy", "Methyl 450k", "Mirna", "CNV gistic", "CNV gistic2")
listmap

## $Affy
##   primary  assay
## 1    Jack array1
## 2    Jill array2
## 3 Barbara array3
## 4     Bob array4
## 
## $`Methyl 450k`
##   primary   assay
## 1    Jack methyl1
## 2    Jack methyl2
## 3    Jill methyl3
## 4 Barbara methyl4
## 5     Bob methyl5
## 
## $Mirna
##   primary  assay
## 1    Jack micro1
## 2 Barbara micro2
## 3     Bob micro3
## 
## $`CNV gistic`
##   primary     assay
## 1    Jack snparray1
## 2    Jill snparray2
## 3    Jill snparray3
## 
## $`CNV gistic2`
##   primary       assay
## 1    Jack mysnparray1
## 2    Jill mysnparray2
## 3     Bob mysnparray3
## 4 Barbara mysnparray4

and use the convenience function listToMap to convert the list of data.frame objects to a valid object for the sampleMap:

dfmap <- listToMap(listmap)
dfmap

## DataFrame with 19 rows and 3 columns
##         primary       assay   assayname
##     <character> <character> <character>
## 1          Jack      array1        Affy
## 2          Jill      array2        Affy
## 3       Barbara      array3        Affy
## 4           Bob      array4        Affy
## 5          Jack     methyl1 Methyl 450k
## ...         ...         ...         ...
## 15         Jill   snparray3  CNV gistic
## 16         Jack mysnparray1 CNV gistic2
## 17         Jill mysnparray2 CNV gistic2
## 18          Bob mysnparray3 CNV gistic2
## 19      Barbara mysnparray4 CNV gistic2

Note, dfmap can be reverted to a list with another provided function:

mapToList(dfmap, "assayname")

4.3 Experimental data as a `list()`

Create an named list of experiments for the MultiAssay function. All of these names must be found within in the third column of dfmap:

objlist <- list("Affy" = exprdat, "Methyl 450k" = methyldat,
                "Mirna" = microdat, "CNV gistic" = grl, "CNV gistic2" = rse)

4.4 Creation of the `MultiAssayExperiment` class object

We recommend using the MultiAssayExperiment() constructor function:

myMultiAssay <- MultiAssayExperiment(objlist, patient.data, dfmap)
myMultiAssay

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 5 columns 
##  [3] Mirna: "matrix" - 4 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

The following extractor functions can be used to get extract data from the object:

Elist(myMultiAssay)

## "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 5 columns 
##  [3] Mirna: "matrix" - 4 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns

pData(myMultiAssay)

## DataFrame with 4 rows and 2 columns
##              sex       age
##         <factor> <integer>
## Jack           M        38
## Jill           F        39
## Bob            M        40
## Barbara        F        41

sampleMap(myMultiAssay)

## DataFrame with 19 rows and 3 columns
##         primary       assay   assayname
##     <character> <character> <character>
## 1          Jack      array1        Affy
## 2          Jill      array2        Affy
## 3       Barbara      array3        Affy
## 4           Bob      array4        Affy
## 5          Jack     methyl1 Methyl 450k
## ...         ...         ...         ...
## 15         Jill   snparray3  CNV gistic
## 16         Jack mysnparray1 CNV gistic2
## 17         Jill mysnparray2 CNV gistic2
## 18          Bob mysnparray3 CNV gistic2
## 19      Barbara mysnparray4 CNV gistic2

metadata(myMultiAssay)

## NULL

Note that the Elist class extends the SimpleList class to add some validity checks specific to MultiAssayExperiment. It can be used like a list.

4.5 Helper function to create a `MultiAssayExperiment` object

The PrepMultiAssay function helps diagnose common problems when creating a MultiAssayExperiment object. It provides error messages and/or warnings in instances where names (either colnames or Elist element names) are inconsistent with those found in the sampleMap. Input arguments are the same as those in the MultiAssayExperiment (i.e., Elist, pData, sampleMap). The resulting output of the PrepMultiAssay function is a list of inputs including a “drops” element for names that were not able to be matched.

Instances where Elist is created without names will prompt an error from PrepMultiAssay. Named Elists are essential for checks in MultiAssayExperiment.

objlist3 <- objlist
(names(objlist3) <- NULL)

## NULL

try(PrepMultiAssay(objlist3, patient.data, dfmap)$Elist)

Non-matching names may also be present in the Elist elements and the “assayname” column of the sampleMap. If names only differ by case and are identical and unique, names will be standardized to lower case and replaced.

names(objlist3) <- toupper(names(objlist))
names(objlist3)

## [1] "AFFY"        "METHYL 450K" "MIRNA"       "CNV GISTIC"  "CNV GISTIC2"

unique(dfmap[, "assayname"])

## [1] "Affy"        "Methyl 450k" "Mirna"       "CNV gistic"  "CNV gistic2"

PrepMultiAssay(objlist3, patient.data, dfmap)$Elist

## Names in the Elist do not match sampleMap assaynames
## standardizing will be attempted...

##  - names set to lowercase

## "Elist" class object of length 5: 
##  [1] affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] methyl 450k: "matrix" - 2 rows, 5 columns 
##  [3] mirna: "matrix" - 4 rows, 3 columns 
##  [4] cnv gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] cnv gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns

When colnames in the Elist cannot be matched back to the primary data (pData), these will be dropped and added to the drops element.

exampleMap <- sampleMap(simpleMultiAssay2)
sapply(doubleExp, colnames)

## $`methyl 2k`
## [1] "Jack"  "Jill"  "Bob"   "Bobby"
## 
## $`methyl 3k`
## [1] "Jack" "Jane" "Bob"

exampleMap

## DataFrame with 5 rows and 3 columns
##       primary       assay   assayname
##   <character> <character> <character>
## 1        Jack        Jack   methyl 2k
## 2        Jill        Jill   methyl 2k
## 3         Bob         Bob   methyl 2k
## 4        Jack        Jack   methyl 3k
## 5         Bob         Bob   methyl 3k

PrepMultiAssay(doubleExp, patient.data, exampleMap)$drops

## Not all colnames in the Elist are found in the sampleMap, 
##   dropping samples from Elist...

## methyl 2k methyl 3k 
##   "Bobby"    "Jane"

## $`columns.methyl 2k`
## [1] "Bobby"
## 
## $`columns.methyl 3k`
## [1] "Jane"

A similar operation is performed for checking “primary” sampleMap names and pData rownames. In this example, we add a row corresponding to “Joe” that does not have a match in the experiment data.

exMap <- rbind(dfmap,
               DataFrame(primary = "Joe",
                         assay = "Joe",
                         assayname = "New methyl"))
PrepMultiAssay(objlist, patient.data, exMap)$drops

## Warning in PrepMultiAssay(objlist, patient.data, exMap): Lengths of names
## in the Elist and sampleMap are not equal

## Not all names in the primary column of the sampleMap
##   could be matched to the pData rownames; see $drops

## DataFrame with 1 row and 3 columns
##       primary       assay   assayname
##   <character> <character> <character>
## 1         Joe         Joe  New methyl

## $sampleMap_rows
## DataFrame with 1 row and 3 columns
##       primary       assay   assayname
##   <character> <character> <character>
## 1         Joe         Joe  New methyl

To create a MultiAssayExperiment from the results of the PrepMultiAssay function, take each corresponding element from the resulting list and enter them as arguments to the MultiAssayExperiment constructor function.

prepped <- PrepMultiAssay(objlist, patient.data, exMap)

## Warning in PrepMultiAssay(objlist, patient.data, exMap): Lengths of names
## in the Elist and sampleMap are not equal

## Not all names in the primary column of the sampleMap
##   could be matched to the pData rownames; see $drops

## DataFrame with 1 row and 3 columns
##       primary       assay   assayname
##   <character> <character> <character>
## 1         Joe         Joe  New methyl

preppedMulti <- MultiAssayExperiment(prepped$Elist, prepped$pData, prepped$sampleMap)
preppedMulti

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 5 columns 
##  [3] Mirna: "matrix" - 4 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

5 `RangedRaggedAssay` class

Note that the GRangesList got converted to a RangedRaggedAssay, a class intended for data such as segmented copy number that is provides different genomic ranges for each sample. RangedRaggedAssay is defined by this package and inherits from GRangesList:

methods(class="RangedRaggedAssay")

##   [1] !                   !=                  $                  
##   [4] $<-                 %in%                <                  
##   [7] <=                  ==                  >                  
##  [10] >=                  Filter              NROW               
##  [13] ROWNAMES            Reduce              [                  
##  [16] [<-                 [[                  [[<-               
##  [19] aggregate           anyNA               append             
##  [22] as.character        as.complex          as.data.frame      
##  [25] as.env              as.integer          as.list            
##  [28] as.logical          as.matrix           as.numeric         
##  [31] as.raw              assay               by                 
##  [34] c                   classNameForDisplay coerce             
##  [37] coerce<-            colnames            colnames<-         
##  [40] countOverlaps       coverage            dim                
##  [43] disjoin             do.call             droplevels         
##  [46] duplicated          elementMetadata     elementMetadata<-  
##  [49] elementNROWS        elementType         end                
##  [52] end<-               endoapply           eval               
##  [55] expand              expand.grid         extractROWS        
##  [58] findOverlaps        flank               getHits            
##  [61] getListElement      head                high2low           
##  [64] ifelse              intersect           is.na              
##  [67] is.unsorted         isDisjoint          isEmpty            
##  [70] lapply              length              lengths            
##  [73] match               mcols               mcols<-            
##  [76] mendoapply          metadata            metadata<-         
##  [79] mstack              names               names<-            
##  [82] narrow              ncol                nrow               
##  [85] order               overlapsAny         parallelSlotNames  
##  [88] pcompare            pcompareRecursively pintersect         
##  [91] promoters           psetdiff            punion             
##  [94] range               ranges              ranges<-           
##  [97] rank                reduce              relist             
## [100] rename              rep                 rep.int            
## [103] replaceROWS         resize              restrict           
## [106] rev                 revElements         rowRanges<-        
## [109] rownames            sapply              score              
## [112] score<-             seqinfo             seqinfo<-          
## [115] seqlevelsInUse      seqnames            seqnames<-         
## [118] setdiff             shift               shiftApply         
## [121] show                showAsCell          sort               
## [124] split               split<-             splitAsList        
## [127] stack               start               start<-            
## [130] strand              strand<-            subset             
## [133] subsetByColumn      subsetByOverlaps    subsetByRow        
## [136] table               tail                tapply             
## [139] union               unique              unlist             
## [142] unsplit             updateObject        values             
## [145] values<-            width               width<-            
## [148] window              window<-            with               
## [151] within              xtabs               xtfrm              
## [154] zipdown            
## see '?methods' for accessing help and source code

getMethod("colnames", "RangedRaggedAssay")

## Method Definition:
## 
## function (x, do.NULL = TRUE, prefix = "col") 
## {
##     .local <- function (x) 
##     base::names(x)
##     .local(x)
## }
## <environment: namespace:MultiAssayExperiment>
## 
## Signatures:
##         x                  
## target  "RangedRaggedAssay"
## defined "RangedRaggedAssay"

It has some additional methods that are required for any data class contained in a MultiAssayExperiment:

class(Elist(myMultiAssay)[[4]])

## [1] "RangedRaggedAssay"
## attr(,"package")
## [1] "MultiAssayExperiment"

rownames(Elist(myMultiAssay)[[4]])

## [1] "1" "2" "3" "4" "5"

colnames(Elist(myMultiAssay)[[4]])

## [1] "snparray1" "snparray2" "snparray3"

One of the requirements for the assay method (specifically for this RangedRaggedAssay Elist element) is that the metadata have a score column from which to obtain values for the resulting assay matrix. Here we add ficticious values to such column contained within list elements. See assay,RangedRaggedAssay,ANY-method documentation.

metadata(Elist(myMultiAssay)[[4]]) <- list(snparray1 = DataFrame(score = 1),
                                           snparray2 = DataFrame(score = 1),
                                           snparray3 = DataFrame(score = 3))
assay(Elist(myMultiAssay)[[4]], background = 2)

##                          snparray1 snparray2 snparray3
## chr3:58000000-59502360:+         1         2         2
## chr3:58493000-58501999:+         2         1         2
## chr3:3-9002:-                    2         1         2
## chr1:1-3:-                       2         2         3
## chr2:4-9:-                       2         2         3

6 Integrated subsetting across experiments

The core functionality of MultiAssayExperiment is to allow subsetting by assay, rownames, and colnames, across all experiments simultaneously while guaranteeing continued matching of samples.

6.1 Subsetting samples / columns

Experimental samples are stored in the rows of pData but the columns of elements of Elist, so when we refer to subsetting by columns, we are referring to columns of the experimental assays. Subsetting by samples / columns will be more obvious after recalling the pData:

pData(myMultiAssay)

## DataFrame with 4 rows and 2 columns
##              sex       age
##         <factor> <integer>
## Jack           M        38
## Jill           F        39
## Bob            M        40
## Barbara        F        41

Subsetting by samples identifies the selected samples in rows of the pData DataFrame, then selects all columns of the Elist corresponding to these rows. Here we use an integer to keep the first two rows of pData, and all experimental assays associated to those two primary samples:

subsetByColumn(myMultiAssay, 1:2)

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 2 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 3 columns 
##  [3] Mirna: "matrix" - 4 rows, 1 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 2 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

Note that the above operation keeps different numbers of columns / samples from each assay, reflecting the reality that some samples may not have been assayed in all experiments, and may have replicates in some.

Subsetting the primary identifiers using a character vector corresponding to some rownames of pData returns the same result:

subsetByColumn(myMultiAssay, c("Jack", "Jill"))

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 2 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 3 columns 
##  [3] Mirna: "matrix" - 4 rows, 1 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 2 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

Columns can be subset using a logical:

malesMultiAssay <- subsetByColumn(myMultiAssay, pData(myMultiAssay)$sex=="M")
pData(malesMultiAssay)

## DataFrame with 2 rows and 2 columns
##           sex       age
##      <factor> <integer>
## Jack        M        38
## Bob         M        40

Note that selecting male patients from all assays could have been accomplished equivalently using the square bracket:

myMultiAssay[, pData(myMultiAssay)$sex=="M", ]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 2 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 3 columns 
##  [3] Mirna: "matrix" - 4 rows, 2 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 1 rows, 1 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 2 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

Finally, for special use cases you can exert detail control of which samples to select using a list or CharacterList, which is just a convenient form of a list containing character vectors.

allsamples <- colnames(myMultiAssay)
allsamples

## CharacterList of length 5
## [["Affy"]] array1 array2 array3 array4
## [["Methyl 450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
## [["Mirna"]] micro1 micro2 micro3
## [["CNV gistic"]] snparray1 snparray2 snparray3
## [["CNV gistic2"]] mysnparray1 mysnparray2 mysnparray3 mysnparray4

Now let’s get rid of the Methyl 450k arrays 3-5, a couple different but equivalent ways:

allsamples[["Methyl 450k"]] <- allsamples[["Methyl 450k"]][-3:-5]
myMultiAssay[, as.list(allsamples), ]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 2 columns 
##  [3] Mirna: "matrix" - 4 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

6.2 Subsetting assays

You can select certain assays / experiments using subset, by providing a character, logical, or integer vector. An example using character:

subsetByAssay(myMultiAssay, c("Affy", "CNV gistic"))

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

Examples using logical and integer:

is.cnv = grepl("CNV", names(Elist(myMultiAssay)))
is.cnv

## [1] FALSE FALSE FALSE  TRUE  TRUE

subsetByAssay(myMultiAssay, is.cnv)

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [2] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

subsetByAssay(myMultiAssay, which(is.cnv))

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [2] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

subsetByRow, subsetByColumn, and subsetByAssay are endogenous operations, in that it always returns another MultiAssayExperiment object. Use assay(myMultiAssay) to retrieve the experimental data in an ordinary list of datasets as their original classes.

6.3 Subsetting rows (features) by IDs, integers, or logicals

Rows of the assays correspond to assay features or measurements, such as genes. Regardless of whether the assay is ID-based (e.g. matrix, ExpressionSet) or range-based (e.g. RangedSummarizedExperiment, RangedRaggedAssay), they can be subset using any of:

a character vector of IDs that will be matched to rownames in each assay
an integer vector that will select rows of this position from each assay. This probably doesn’t make sense unless every Elist element represents the same measurements in the same order and will generate an error if any of the integer elements exceeds the number of rows in any Elist element. The most likely use of integer subsetting would as a “head()” function, for example to look at the first 6 rows of each assay.
a logical vector that will be passed directly to the row subsetting operation for each assay. A warning is issued if this results in recycling for any of the assays.
a list or CharacterList of the same length as Elist. Each element of the subsetting list will be passed on exactly to subset rows of the corresponding element of Elist.

Again, this operation always returns a MultiAssayExperiment class, unless “drop=TRUE” is passed to subset, with any Elist element not containing the feature having zero rows.

For example, return a MultiAssayExperiment where Affy and Methyl 450k contain only ENST0000035076 row, and “Mirna” and “CNV gistic” have zero rows: (drop argument is set to TRUE by default)

featSubsetted0 <- subsetByRow(myMultiAssay, "ENST00000355076")
class(featSubsetted0)

## [1] "MultiAssayExperiment"
## attr(,"package")
## [1] "MultiAssayExperiment"

class(Elist(featSubsetted0))

## [1] "Elist"
## attr(,"package")
## [1] "MultiAssayExperiment"

Elist(featSubsetted0)

## "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 1 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 1 rows, 5 columns 
##  [3] Mirna: "matrix" - 0 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 0 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 0 rows, 4 columns

In the following, Affy ExpressionSet keeps both rows but with their order reversed, and Methyl 450k keeps only its second row.

featSubsetted <-
  subsetByRow(myMultiAssay, c("ENST00000355076", "ENST00000294241"))
exprs(Elist(myMultiAssay)[[1]])

##                 array1 array2 array3 array4
## ENST00000294241    101    103    105    107
## ENST00000355076    102    104    106    108

exprs(Elist(featSubsetted)[[1]])

##                 array1 array2 array3 array4
## ENST00000355076    102    104    106    108
## ENST00000294241    101    103    105    107

6.4 Subsetting rows (features) by `GenomicRanges`

For MultiAssayExperiment objects containing range-based objects (currently RangedSummarizedExperiment and RangedRaggedAssay), these can be subset using a GRanges object, for example:

gr <- GRanges(seqnames = c("chr1"), strand = c("-", "+", "-"),
              ranges = IRanges(start = c(1, 4, 6), width = 3))

Now do the subsetting. The function doing the work here is IRanges::subsetByOverlaps - see its arguments for flexible types of subsetting by range. The first three arguments here are for subset, the rest passed on to IRanges::subsetByOverlaps through “…”:

subsetted <- subsetByRow(myMultiAssay, gr, maxgap = 2L, type = "within")
Elist(subsetted)

## "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 0 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 0 rows, 5 columns 
##  [3] Mirna: "matrix" - 0 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 1 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 0 rows, 4 columns

6.5 Subsetting by square bracket `[`

The bracket method for the MultiAssayExperiment is equivalent but more compact than the subsetBy*() methods. The three positions within the bracket operator indicate rows, columns, and assays, respectively (pseudocode):

myMultiAssay[rows, columns, assays]

For example, to select the gene ENST00000355076:

myMultiAssay["ENST00000355076", , ]

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] Affy: "ExpressionSet" - 1 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 1 rows, 5 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

The above operation works across all types of assays, whether ID-based (e.g. matrix, ExpressionSet, SummarizedExperiment) or range-based (e.g. RangedSummarizedExperiment, RangedRaggedAssay).

You can subset by rows, columns, and assays in a single bracket operation, and they will be performed in that order (rows, then columns, then assays):

myMultiAssay["ENST00000355076", 1:2, c("Affy", "Methyl 450k")]

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] Affy: "ExpressionSet" - 1 rows, 2 columns 
##  [2] Methyl 450k: "matrix" - 1 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

6.6 Subsetting by character, integer, and logical

By columns - character, integer, and logical are all allowed, for example:

myMultiAssay[, "Jack", ]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 1 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 2 columns 
##  [3] Mirna: "matrix" - 4 rows, 1 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 1 rows, 1 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 1 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

myMultiAssay[, 1, ]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 1 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 2 columns 
##  [3] Mirna: "matrix" - 4 rows, 1 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 1 rows, 1 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 1 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

myMultiAssay[, c(TRUE, FALSE, FALSE, FALSE), ]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 1 columns 
##  [2] Methyl 450k: "matrix" - 2 rows, 2 columns 
##  [3] Mirna: "matrix" - 4 rows, 1 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 1 rows, 1 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 1 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

By assay - character, integer, and logical are allowed:

myMultiAssay[, , "Mirna"]

## A "MultiAssayExperiment" object of 1 listed
##  experiment with a user-defined name and respective class. 
##  Containing an "Elist" class object of length 1: 
##  [1] Mirna: "matrix" - 4 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

myMultiAssay[, , 3]

## A "MultiAssayExperiment" object of 1 listed
##  experiment with a user-defined name and respective class. 
##  Containing an "Elist" class object of length 1: 
##  [1] Mirna: "matrix" - 4 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

myMultiAssay[, , c(FALSE, FALSE, TRUE, FALSE, FALSE)]

## A "MultiAssayExperiment" object of 1 listed
##  experiment with a user-defined name and respective class. 
##  Containing an "Elist" class object of length 1: 
##  [1] Mirna: "matrix" - 4 rows, 3 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

6.7 the “drop” argument

Specify drop=FALSE to keep assays with zero rows or zero columns, e.g.:

myMultiAssay["ENST00000355076", , , drop=FALSE]

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 1 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 1 rows, 5 columns 
##  [3] Mirna: "matrix" - 0 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 0 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 0 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

Using the default drop=TRUE, assays with no rows or no columns are removed:

myMultiAssay["ENST00000355076", , , drop=TRUE]

## A "MultiAssayExperiment" object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 2: 
##  [1] Affy: "ExpressionSet" - 1 rows, 4 columns 
##  [2] Methyl 450k: "matrix" - 1 rows, 5 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

7 rownames and colnames

rownames and colnames return a CharacterList of rownames and colnames across all the assays. A CharacterList is just an alternative to list when each element contains a character vector, that provides a nice show method:

rownames(myMultiAssay)

## CharacterList of length 5
## [["Affy"]] ENST00000294241 ENST00000355076
## [["Methyl 450k"]] ENST00000355076 ENST00000383706
## [["Mirna"]] hsa-miR-21 hsa-miR-191 hsa-miR-148a hsa-miR148b
## [["CNV gistic"]] 1 2 3 4 5
## [["CNV gistic2"]] a b c d e

colnames(myMultiAssay)

## CharacterList of length 5
## [["Affy"]] array1 array2 array3 array4
## [["Methyl 450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
## [["Mirna"]] micro1 micro2 micro3
## [["CNV gistic"]] snparray1 snparray2 snparray3
## [["CNV gistic2"]] mysnparray1 mysnparray2 mysnparray3 mysnparray4

8 Requirements for support of additional data classes

Any data classes in the Elist object must support the following methods:

colnames()
rownames()
[
dim()

Here is what happens if one of the methods doesn’t:

objlist2 <- objlist
objlist2[[2]] <- data.frame(objlist2[[2]])
invalid.obj <- try(MultiAssayExperiment(objlist2, patient.data, dfmap))
invalid.obj

## A "MultiAssayExperiment" object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an "Elist" class object of length 5: 
##  [1] Affy: "ExpressionSet" - 2 rows, 4 columns 
##  [2] Methyl 450k: "data.frame" - 2 rows, 5 columns 
##  [3] Mirna: "matrix" - 4 rows, 3 columns 
##  [4] CNV gistic: "RangedRaggedAssay" - 5 rows, 3 columns 
##  [5] CNV gistic2: "RangedSummarizedExperiment" - 5 rows, 4 columns 
## To access slots use: 
##  Elist() - to obtain the "Elist" of experiment instances 
##  pData() - for the primary/phenotype "DataFrame" 
##  sampleMap() - for the sample availability "DataFrame" 
##  metadata() - for the metadata object of 'ANY' class 
## See also: subsetByAssay(), subsetByRow(), subsetByColumn()

9 Methods for MultiAssayExperiment

The following methods are defined for MultiAssayExperiment:

methods(class="MultiAssayExperiment")

##  [1] Elist          Elist<-        [              assay         
##  [5] colnames       getHits        isEmpty        length        
##  [9] metadata       names          pData          pData<-       
## [13] rownames       sampleMap      sampleMap<-    show          
## [17] subsetByAssay  subsetByColumn subsetByRow   
## see '?methods' for accessing help and source code

10 Wishlist

c() function for adding new assays to existing MultiAssayExperiment
- e.g. c(myMultiAssay, neweset)
- require that sample names in the new object match pData sample names
- require that sample names in the new object already exist in pData
Figure out how to support a “long-and-skinny” SQL database
“mergeDups” function to merge duplicate samples in any assay
- For matrix-like objects, it is clear how to do this. Default would be simple mean of the columns, but could allow user-specified functions.
- For GRangesList, it’s not obvious how to merge duplicates. Just concatenate?

11 sessionInfo()

sessionInfo()

## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] SummarizedExperiment_1.3.2   GenomicRanges_1.25.0        
## [3] GenomeInfoDb_1.9.2           IRanges_2.7.0               
## [5] S4Vectors_0.11.0             Biobase_2.33.0              
## [7] BiocGenerics_0.19.0          MultiAssayExperiment_0.101.2
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5          knitr_1.13           XVector_0.13.0      
##  [4] magrittr_1.5         zlibbioc_1.19.0      xtable_1.8-2        
##  [7] R6_2.1.2             stringr_1.0.0        tools_3.3.0         
## [10] shinydashboard_0.5.1 htmltools_0.3.5      yaml_2.1.13         
## [13] digest_0.6.9         shiny_0.13.2         formatR_1.4         
## [16] evaluate_0.9         mime_0.4             rmarkdown_0.9.6     
## [19] stringi_1.0-1        httpuv_1.3.3

MultiAssayExperiment primary vignette

MultiAssay Special Interest Group

May 15, 2016

1 API

2 Overview of the `MultiAssayExperiment` class

2.1 Elist: experimental data

2.2 pData: primary data

2.3 Note on the flexibility of the `DataFrame`

2.4 sampleMap: relating pData to multiple assays

2.5 metadata

2.6 drops

3 A simple example where sampleMap isn’t provided

4 Creating a `MultiAssayExperiment` object: a rich example

4.1 Create toy datasets demonstrating all supported data types

4.2 creating a sampleMap

4.3 Experimental data as a `list()`

4.4 Creation of the `MultiAssayExperiment` class object

4.5 Helper function to create a `MultiAssayExperiment` object

5 `RangedRaggedAssay` class

6 Integrated subsetting across experiments

6.1 Subsetting samples / columns

6.2 Subsetting assays

6.3 Subsetting rows (features) by IDs, integers, or logicals

6.4 Subsetting rows (features) by `GenomicRanges`

6.5 Subsetting by square bracket `[`

6.6 Subsetting by character, integer, and logical

6.7 the “drop” argument

7 rownames and colnames

8 Requirements for support of additional data classes

9 Methods for MultiAssayExperiment

10 Wishlist

11 sessionInfo()

MultiAssayExperiment primary vignette

MultiAssay Special Interest Group

May 15, 2016

1 API

2 Overview of the MultiAssayExperiment class

2.1 Elist: experimental data

2.2 pData: primary data

2.3 Note on the flexibility of the DataFrame

2.4 sampleMap: relating pData to multiple assays

2.5 metadata

2.6 drops

3 A simple example where sampleMap isn’t provided

4 Creating a MultiAssayExperiment object: a rich example

4.1 Create toy datasets demonstrating all supported data types

4.2 creating a sampleMap

4.3 Experimental data as a list()

4.4 Creation of the MultiAssayExperiment class object

4.5 Helper function to create a MultiAssayExperiment object

5 RangedRaggedAssay class

6 Integrated subsetting across experiments

6.1 Subsetting samples / columns

6.2 Subsetting assays

6.3 Subsetting rows (features) by IDs, integers, or logicals

6.4 Subsetting rows (features) by GenomicRanges

6.5 Subsetting by square bracket [

6.6 Subsetting by character, integer, and logical

6.7 the “drop” argument

7 rownames and colnames

8 Requirements for support of additional data classes

9 Methods for MultiAssayExperiment

10 Wishlist

11 sessionInfo()

2 Overview of the `MultiAssayExperiment` class

2.3 Note on the flexibility of the `DataFrame`

4 Creating a `MultiAssayExperiment` object: a rich example

4.3 Experimental data as a `list()`

4.4 Creation of the `MultiAssayExperiment` class object

4.5 Helper function to create a `MultiAssayExperiment` object

5 `RangedRaggedAssay` class

6.4 Subsetting rows (features) by `GenomicRanges`

6.5 Subsetting by square bracket `[`