Contents

1 Introduction

This vignette assumes that the user is familiar with the basics of the SummarizedBenchmark framework for benchmarking computational methods described in the SummarizedBenchmark: Introduction vignette (the “introduction”). New users should start from the package description and example presented in the introduction before moving on to more advanced topics, including the class descriptions in this vignette. Here, we describe the anatomy of the BenchDesign and SummarizedBenchmark classes in a little more detail than was presented in the introduction.

1.1 Example Case Study

The classes are described in more detail using the same example benchmark of methods for multiple comparisons adjustment described in the introduction. Briefly, three methods, the Bonferroni correction, Benjamini-Hochberg step-up procedure, and Storey’s FDR q-value were benchmarked on a collection of 50 simulated two-sample t-tests. The simulated t-test results which serve as inputs to the methods are included with the package in the tdat data set.

The code above is identical to what was used in the introduction to construct the BenchDesign, b, and the SummarizedBenchmark results, sb. If any of this is unclear, consider revisting the SummarizedBenchmark: Introduction vignette.

2 BenchDesign

BenchDesign objects are composed of a data set and methods. Formally, the methods are stored in the BenchDesign as a BDMethodList and the data as a BDData object. As would be expected, the BDMethodList is a list (List) of BDMethod objects, each containing the definition of a method to be compared in the benchmark experiment. The general structure is illustrated in the figure below.

BenchDesign class structure

BenchDesign class structure

Continuing with the BenchDesign constructed above, we can access the list of methods and the data set stored in the object using the BDMethodList() and BDData() functions.

## BenchDesign Method List (BDMethodList) --------------------- 
## list of 3 methods 
##   method:  bonf; func: p.adjust       
##   method:    BH; func: p.adjust       
##   method:    qv; func: qvalue::qvalue
## BenchDesign Data (BDData) ---------------------------------- 
##   type: data 
##   data:
##    H test_statistic effect_size         pval        SE
## 1  1     -3.2083437 -1.17151466 4.872282e-03 0.3651463
## 2  0      0.1692236  0.07321912 8.675080e-01 0.4326768
## 3  1     -5.7077940 -1.81715381 2.061521e-05 0.3183636
## 4  1     -1.9805856 -1.09107836 6.313031e-02 0.5508867
## 5  1     -1.0014358 -0.37726058 3.298895e-01 0.3767197
## 6  1     -0.9190433 -0.47583669 3.702252e-01 0.5177522
## 7  1     -1.3184486 -0.72750221 2.038846e-01 0.5517866
## 8  1     -3.8492346 -1.47787871 1.175116e-03 0.3839409
## 9  1     -3.4836213 -1.35474550 2.651401e-03 0.3888900
## 10 0      0.6238195  0.25470364 5.405719e-01 0.4082970
##  ... with 40 more rows.

2.1 BDMethod and BDMethodList

The list of methods inherits from the SimpleList class, and supports several useful accessor and setter features.

## BenchDesign Method (BDMethod) ------------------------------ 
##   method: p.adjust
##     function (p, method = p.adjust.methods, n = length(p)... 
##   parameters:
##     p : pval 
##     method : "bonferroni" 
##   post:
##     none
##   meta:
##     none

This interface allows for adding new methods by creating a new named entry in the BDMethodList of the BenchDesign object.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: data 
##     names: H, test_statistic, effect_size, pval, SE 
##   benchmark methods:
##     method:   bonf; func: p.adjust       
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust

Methods can also be removed by setting the list value to NULL. An equivalent convenience function, dropMethod() also exists for removing methods using syntax similar to addMethod().

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: data 
##     names: H, test_statistic, effect_size, pval, SE 
##   benchmark methods:
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust

Each BDMethod object in a BDMethodList encapsulates a method to be benchmarked. The contents of the object are all of the arguments passed through the addMethod() convenience function used in the example above. New BDMethod objects can be constructed directly using the BDMethod() constructor, which takes the same input parameters as addMethod().

## BenchDesign Method (BDMethod) ------------------------------ 
##   method: p.adjust
##     function (p, method = p.adjust.methods, n = length(p)... 
##   parameters:
##     p : pval 
##     method : "bonferroni" 
##   post:
##     none
##   meta:
##     none

Directly modifying the BDMethodList object provides an alternative approach (aside form using addMethod()) to adding methods to a BenchDesign object.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: data 
##     names: H, test_statistic, effect_size, pval, SE 
##   benchmark methods:
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust       
##     method:   bonf; func: p.adjust

2.2 BDData

The BDData object is a simple object which only contains two slots, @type and @data. The @type of a BDData object can be either data or hash. If the @type is data, the @data slot is a standard list or data.frame object of raw data. However, if the @type is hash, then the @data slot is just a MD5 hash of some raw data object computed using digest::digest(). As described next in Section 3, SummarizedBenchmark objects also contain the BenchDesign object used to generate the benchmark results. Often, the original raw data can be large, and saving the full data set as part of the SummarizedBenchmark object can be undesirable and unnecessary. While the raw data set is needed during the benchmark experiment, by default, the BDData is converted to a MD5 hash beforing stored the BenchDesign as part of a SummarizedBenchmark object. Using this approach, it is still possible to confirm whether a certain data set was used for a benchmarking experiment without having to store copies of the raw data with every SummarizedBenchmark.

Returning to the BenchDesign example from above, the BDData object of a BenchDesign can also be similarly extracted and modified. As with BDMethod objects, the data set may be removed by setting the value to NULL.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     NULL
##   benchmark methods:
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust       
##     method:   bonf; func: p.adjust

A new data set can be constructed using the BDData() function and used to replace the data set in a BenchDesign object.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: data 
##     names: H, test_statistic, effect_size, pval, SE 
##   benchmark methods:
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust       
##     method:   bonf; func: p.adjust

The helper function hashBDData() can be called on either BDData or BenchDesign objects to convert the data to a MD5 hash. By default, this function is called on the BenchDesign object during buildBench() before storing the object in the resulting SummarizedBenchmark.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: md5hash 
##     MD5 hash: 09867f2696d3178a8f3bc038917042e1 
##   benchmark methods:
##     method:     BH; func: p.adjust       
##     method:     qv; func: qvalue::qvalue 
##     method:  bonf2; func: p.adjust       
##     method:   bonf; func: p.adjust

We drop the "bonf2" method from the object to return to the same set of methods used to generate the SummarizedBenchmark object.

3 SummarizedBenchmark

As described above, the SummarizedBenchmark class builds on the existing SummarizedExperiment class and Bioconductor infrastructure. By doing so, saved results are tightly linked to metadata. Thus, it is possible, for example, to easily subset the results without losing references to the respective metadata. For example, the code below extracts the data for only the first two methods.

## DataFrame with 2 rows and 6 columns
##         func.pkg func.pkg.vers func.pkg.manual     param.p param.method
##      <character>   <character>       <logical> <character>  <character>
## bonf       stats         4.0.0           FALSE        pval "bonferroni"
## BH         stats         4.0.0           FALSE        pval         "BH"
##      session.idx
##        <numeric>
## bonf           1
## BH             1

Building on the SummarizedExperiment class, in addition to the slot for performanceMetrics described above, the SummarizedBenchmark class also includes a slot which stores a copy of the BenchDesign object used to generate the results with buildBench(). The BenchDesign can be accessed by simply passing the object to BenchDesign().

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: md5hash 
##     MD5 hash: 09867f2696d3178a8f3bc038917042e1 
##   benchmark methods:
##     method:  bonf; func: p.adjust       
##     method:    BH; func: p.adjust       
##     method:    qv; func: qvalue::qvalue

Notice, however, unlike the original BenchDesign above, the data is stored as a MD5 hash. To prevent bloat when the data set is large, by default, only a MD5 hash computed using digest::digest() is stored with the benchmarking results. This behavior can be changed by setting keepData = TRUE when calling buildBench(), in which case, the complete data set will be stored as part of the BenchDesign and SummarizedBenchmark objects. A simple helper function, compareBDData(), can be used to verify that while the data type may be different between the data originally used to construct the SummarizedBenchmark and the data stored with the SummarizedBenchmark, the referenced data sets are the same (i.e. have matching MD5 hash values).

## $data
## [1] TRUE
## 
## $type
## [1] FALSE

Finally, information about the R session where the methods were executed is stored in the metadata of the SummarizedBenchmark as an entry in a list called sessions. The package supports rerunning benchmark experiments, and for this purpose, each new session is stored as a new entry in the sessions list. Since this experiment has only been executed once, the list only has a single entry in the sessions list. In addition to storing the session info and list of methods evaluated during the session, the list also keeps track of all parameters in the buildBench() call, i.e. truthCols = "H".

## $methods
## [1] "bonf" "BH"   "qv"  
## 
## $results
## $results$bonf
## $results$bonf$default
## [1] "success"
## 
## 
## $results$BH
## $results$BH$default
## [1] "success"
## 
## 
## $results$qv
## $results$qv$default
## [1] "success"
## 
## 
## 
## $parameters
## $parameters$truthCols
## [1] "H"
## 
## $parameters$ftCols
## NULL
## 
## $parameters$sortIDs
## [1] FALSE
## 
## $parameters$keepData
## [1] FALSE
## 
## $parameters$catchErrors
## [1] TRUE
## 
## $parameters$parallel
## [1] FALSE
## 
## $parameters$BPPARAM
## class: MulticoreParam
##   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
##   bpexportglobals: TRUE
##   bplogdir: NA
##   bpresultdir: NA
##   cluster type: FORK
## 
## 
## $sessionInfo
## ─ Session info ────────────────────────────────────────────────────────────
##  setting  value                                             
##  version  R Under development (unstable) (2019-10-24 r77329)
##  os       Ubuntu 18.04.3 LTS                                
##  system   x86_64, linux-gnu                                 
##  ui       X11                                               
##  language (EN)                                              
##  collate  C                                                 
##  ctype    en_US.UTF-8                                       
##  tz       America/New_York                                  
##  date     2019-11-05                                        
## 
## ─ Packages ────────────────────────────────────────────────────────────────
##  package                * version   date       lib source        
##  acepack                  1.4.1     2016-10-29 [2] CRAN (R 4.0.0)
##  annotate                 1.65.0    2019-11-05 [2] Bioconductor  
##  AnnotationDbi            1.49.0    2019-11-05 [2] Bioconductor  
##  AnnotationHub            2.19.1    2019-11-05 [2] Bioconductor  
##  ashr                     2.2-39    2019-10-17 [2] CRAN (R 4.0.0)
##  assertthat               0.2.1     2019-03-21 [2] CRAN (R 4.0.0)
##  backports                1.1.5     2019-10-02 [2] CRAN (R 4.0.0)
##  base64enc                0.1-3     2015-07-28 [2] CRAN (R 4.0.0)
##  beeswarm                 0.2.3     2016-04-25 [2] CRAN (R 4.0.0)
##  Biobase                * 2.47.0    2019-11-05 [2] Bioconductor  
##  BiocFileCache            1.11.0    2019-11-05 [2] Bioconductor  
##  BiocGenerics           * 0.33.0    2019-11-05 [2] Bioconductor  
##  BiocManager              1.30.9    2019-10-23 [2] CRAN (R 4.0.0)
##  BiocNeighbors            1.5.1     2019-11-05 [2] Bioconductor  
##  BiocParallel           * 1.21.0    2019-11-05 [2] Bioconductor  
##  BiocSingular             1.3.0     2019-11-05 [2] Bioconductor  
##  BiocStyle              * 2.15.0    2019-11-05 [2] Bioconductor  
##  BiocVersion              3.11.0    2019-11-05 [2] Bioconductor  
##  bit                      1.1-14    2018-05-29 [2] CRAN (R 4.0.0)
##  bit64                    0.9-7     2017-05-08 [2] CRAN (R 4.0.0)
##  bitops                   1.0-6     2013-08-17 [2] CRAN (R 4.0.0)
##  blob                     1.2.0     2019-07-09 [2] CRAN (R 4.0.0)
##  bookdown                 0.14      2019-10-01 [2] CRAN (R 4.0.0)
##  checkmate                1.9.4     2019-07-04 [2] CRAN (R 4.0.0)
##  cli                      1.1.0     2019-03-19 [2] CRAN (R 4.0.0)
##  cluster                  2.1.0     2019-06-19 [2] CRAN (R 4.0.0)
##  codetools                0.2-16    2018-12-24 [2] CRAN (R 4.0.0)
##  colorspace               1.4-1     2019-03-18 [2] CRAN (R 4.0.0)
##  crayon                 * 1.3.4     2017-09-16 [2] CRAN (R 4.0.0)
##  curl                     4.2       2019-09-24 [2] CRAN (R 4.0.0)
##  data.table               1.12.6    2019-10-18 [2] CRAN (R 4.0.0)
##  DBI                      1.0.0     2018-05-02 [2] CRAN (R 4.0.0)
##  dbplyr                   1.4.2     2019-06-17 [2] CRAN (R 4.0.0)
##  DelayedArray           * 0.13.0    2019-11-05 [2] Bioconductor  
##  DelayedMatrixStats       1.9.0     2019-11-05 [2] Bioconductor  
##  DESeq2                 * 1.27.2    2019-11-05 [2] Bioconductor  
##  digest                 * 0.6.22    2019-10-21 [2] CRAN (R 4.0.0)
##  doParallel               1.0.15    2019-08-02 [2] CRAN (R 4.0.0)
##  dplyr                  * 0.8.3     2019-07-04 [2] CRAN (R 4.0.0)
##  edgeR                  * 3.29.0    2019-11-05 [2] Bioconductor  
##  ellipsis                 0.3.0     2019-09-20 [2] CRAN (R 4.0.0)
##  evaluate                 0.14      2019-05-28 [2] CRAN (R 4.0.0)
##  ExperimentHub            1.13.1    2019-11-05 [2] Bioconductor  
##  fansi                    0.4.0     2018-10-05 [2] CRAN (R 4.0.0)
##  fastmap                  1.0.1     2019-10-08 [2] CRAN (R 4.0.0)
##  fitdistrplus             1.0-14    2019-01-23 [2] CRAN (R 4.0.0)
##  foreach                  1.4.7     2019-07-27 [2] CRAN (R 4.0.0)
##  foreign                  0.8-72    2019-08-02 [2] CRAN (R 4.0.0)
##  Formula                  1.2-3     2018-05-03 [2] CRAN (R 4.0.0)
##  genefilter               1.69.0    2019-11-05 [2] Bioconductor  
##  geneplotter              1.65.0    2019-11-05 [2] Bioconductor  
##  GenomeInfoDb           * 1.23.0    2019-11-05 [2] Bioconductor  
##  GenomeInfoDbData         1.2.2     2019-10-25 [2] Bioconductor  
##  GenomicRanges          * 1.39.1    2019-11-05 [2] Bioconductor  
##  ggbeeswarm               0.6.0     2017-08-07 [2] CRAN (R 4.0.0)
##  ggplot2                * 3.2.1     2019-08-10 [2] CRAN (R 4.0.0)
##  glue                     1.3.1     2019-03-12 [2] CRAN (R 4.0.0)
##  gridExtra                2.3       2017-09-09 [2] CRAN (R 4.0.0)
##  gtable                   0.3.0     2019-03-25 [2] CRAN (R 4.0.0)
##  Hmisc                    4.2-0     2019-01-26 [2] CRAN (R 4.0.0)
##  htmlTable                1.13.2    2019-09-22 [2] CRAN (R 4.0.0)
##  htmltools                0.4.0     2019-10-04 [2] CRAN (R 4.0.0)
##  htmlwidgets              1.5.1     2019-10-08 [2] CRAN (R 4.0.0)
##  httpuv                   1.5.2     2019-09-11 [2] CRAN (R 4.0.0)
##  httr                     1.4.1     2019-08-05 [2] CRAN (R 4.0.0)
##  interactiveDisplayBase   1.25.0    2019-11-05 [2] Bioconductor  
##  IRanges                * 2.21.1    2019-11-05 [2] Bioconductor  
##  irlba                    2.3.3     2019-02-05 [2] CRAN (R 4.0.0)
##  iterators                1.0.12    2019-07-26 [2] CRAN (R 4.0.0)
##  knitr                    1.25      2019-09-18 [2] CRAN (R 4.0.0)
##  labeling                 0.3       2014-08-23 [2] CRAN (R 4.0.0)
##  later                    1.0.0     2019-10-04 [2] CRAN (R 4.0.0)
##  lattice                  0.20-38   2018-11-04 [2] CRAN (R 4.0.0)
##  latticeExtra             0.6-28    2016-02-09 [2] CRAN (R 4.0.0)
##  lazyeval                 0.2.2     2019-03-15 [2] CRAN (R 4.0.0)
##  lifecycle                0.1.0     2019-08-01 [2] CRAN (R 4.0.0)
##  limma                  * 3.43.0    2019-11-05 [2] Bioconductor  
##  locfit                   1.5-9.1   2013-04-20 [2] CRAN (R 4.0.0)
##  lsei                     1.2-0     2017-10-23 [2] CRAN (R 4.0.0)
##  magrittr               * 1.5       2014-11-22 [2] CRAN (R 4.0.0)
##  MASS                     7.3-51.4  2019-03-31 [2] CRAN (R 4.0.0)
##  Matrix                   1.2-17    2019-03-22 [2] CRAN (R 4.0.0)
##  matrixStats            * 0.55.0    2019-09-07 [2] CRAN (R 4.0.0)
##  mclust                 * 5.4.5     2019-07-08 [2] CRAN (R 4.0.0)
##  memoise                  1.1.0     2017-04-21 [2] CRAN (R 4.0.0)
##  mgcv                     1.8-30    2019-10-24 [2] CRAN (R 4.0.0)
##  mime                     0.7       2019-06-11 [2] CRAN (R 4.0.0)
##  mixsqp                   0.2-2     2019-10-16 [2] CRAN (R 4.0.0)
##  munsell                  0.5.0     2018-06-12 [2] CRAN (R 4.0.0)
##  nlme                     3.1-141   2019-08-01 [2] CRAN (R 4.0.0)
##  nnet                     7.3-12    2016-02-02 [2] CRAN (R 4.0.0)
##  npsurv                   0.4-0     2017-10-14 [2] CRAN (R 4.0.0)
##  pillar                   1.4.2     2019-06-29 [2] CRAN (R 4.0.0)
##  pkgconfig                2.0.3     2019-09-22 [2] CRAN (R 4.0.0)
##  plyr                     1.8.4     2016-06-08 [2] CRAN (R 4.0.0)
##  promises                 1.1.0     2019-10-04 [2] CRAN (R 4.0.0)
##  pscl                     1.5.2     2017-10-10 [2] CRAN (R 4.0.0)
##  purrr                    0.3.3     2019-10-18 [2] CRAN (R 4.0.0)
##  qvalue                   2.19.0    2019-11-05 [2] Bioconductor  
##  R6                       2.4.0     2019-02-14 [2] CRAN (R 4.0.0)
##  rappdirs                 0.3.1     2016-03-28 [2] CRAN (R 4.0.0)
##  RColorBrewer             1.1-2     2014-12-07 [2] CRAN (R 4.0.0)
##  Rcpp                     1.0.2     2019-07-25 [2] CRAN (R 4.0.0)
##  RCurl                    1.95-4.12 2019-03-04 [2] CRAN (R 4.0.0)
##  reshape2                 1.4.3     2017-12-11 [2] CRAN (R 4.0.0)
##  rlang                  * 0.4.1     2019-10-24 [2] CRAN (R 4.0.0)
##  rmarkdown                1.16      2019-10-01 [2] CRAN (R 4.0.0)
##  rpart                    4.1-15    2019-04-12 [2] CRAN (R 4.0.0)
##  RSQLite                  2.1.2     2019-07-24 [2] CRAN (R 4.0.0)
##  rstudioapi               0.10      2019-03-19 [2] CRAN (R 4.0.0)
##  rsvd                     1.0.2     2019-07-29 [2] CRAN (R 4.0.0)
##  S4Vectors              * 0.25.0    2019-11-05 [2] Bioconductor  
##  scales                   1.0.0     2018-08-09 [2] CRAN (R 4.0.0)
##  scater                 * 1.15.1    2019-11-05 [2] Bioconductor  
##  scRNAseq               * 2.1.1     2019-11-03 [2] Bioconductor  
##  sessioninfo            * 1.1.1     2018-11-05 [2] CRAN (R 4.0.0)
##  shiny                    1.4.0     2019-10-10 [2] CRAN (R 4.0.0)
##  SingleCellExperiment   * 1.9.0     2019-11-05 [2] Bioconductor  
##  splatter               * 1.11.0    2019-11-05 [2] Bioconductor  
##  SQUAREM                  2017.10-1 2017-10-07 [2] CRAN (R 4.0.0)
##  stringi                  1.4.3     2019-03-12 [2] CRAN (R 4.0.0)
##  stringr                * 1.4.0     2019-02-10 [2] CRAN (R 4.0.0)
##  SummarizedBenchmark    * 2.5.0     2019-11-05 [1] Bioconductor  
##  SummarizedExperiment   * 1.17.0    2019-11-05 [2] Bioconductor  
##  survival                 2.44-1.1  2019-04-01 [2] CRAN (R 4.0.0)
##  tibble                 * 2.1.3     2019-06-06 [2] CRAN (R 4.0.0)
##  tidyr                  * 1.0.0     2019-09-11 [2] CRAN (R 4.0.0)
##  tidyselect               0.2.5     2018-10-11 [2] CRAN (R 4.0.0)
##  truncnorm                1.0-8     2018-02-27 [2] CRAN (R 4.0.0)
##  tximport               * 1.15.0    2019-11-05 [2] Bioconductor  
##  UpSetR                 * 1.4.0     2019-05-22 [2] CRAN (R 4.0.0)
##  utf8                     1.1.4     2018-05-24 [2] CRAN (R 4.0.0)
##  vctrs                    0.2.0     2019-07-05 [2] CRAN (R 4.0.0)
##  vipor                    0.4.5     2017-03-22 [2] CRAN (R 4.0.0)
##  viridis                  0.5.1     2018-03-29 [2] CRAN (R 4.0.0)
##  viridisLite              0.3.0     2018-02-01 [2] CRAN (R 4.0.0)
##  withr                    2.1.2     2018-03-15 [2] CRAN (R 4.0.0)
##  xfun                     0.10      2019-10-01 [2] CRAN (R 4.0.0)
##  XML                      3.98-1.20 2019-06-06 [2] CRAN (R 4.0.0)
##  xtable                   1.8-4     2019-04-21 [2] CRAN (R 4.0.0)
##  XVector                  0.27.0    2019-11-05 [2] Bioconductor  
##  yaml                     2.2.0     2018-07-25 [2] CRAN (R 4.0.0)
##  zeallot                  0.1.0     2018-01-28 [2] CRAN (R 4.0.0)
##  zlibbioc                 1.33.0    2019-11-05 [2] Bioconductor  
## 
## [1] /tmp/RtmpwKtOBM/Rinst745054f56f78
## [2] /home/biocbuild/bbs-3.11-bioc/R/library

4 Reproducibility and Replicability

While the BenchDesign and SummarizedBenchmark classes are designed to enable easier reproduction and replication of benchmark experiments, they are not meant to completely replace properly written and documented analyses. If any functions or packages are needed for the methods defined in the BenchDesign, they must be available and sourced or loaded when running (or re-running) the analyses. As an example, suppose we have the following locally defined method, mymethod(), which depends on another locally defined, but unavailable, function, mysubmethod().

The top level mymethod() is captured in the corresponding BDMethod and BenchDesign. No errors are thrown since including mymethod() in a BenchDesign object does not require evaluating the function.

## function(x) {
##     mysubmethod(x)
## }

However, when buildBench() is called, the method naturally fails because the necessary mysubmethod() is not available in the current session.

## !! error caught in buildBench !!
## !! error in main function of method: 'method1'
## !!  original message: 
## !!  could not find function "mysubmethod"
## <simpleError in buildBench(mybd): 
## No method returned valid values for any assays!
## >

Therefore, if any locally defined functions are necessary for a method, they should be available along with the BenchDesign object or defined at the top level. Additionally, for reproducibility and clarity, we generally recommend keeping functions as “self-contained” as possible, and never relying on variables defined externally, e.g. in the global enviroment. Scoping with functions can be confusing, and it may not always be clear where constants are being evaluated. As an example, consider the following simple multiplying function.

While experienced R programmers may know which value of m (5, 2 or 10) will be used when buildBench() is called, for many users, this is less obvious.

##      methodr
## [1,]       2
## [2,]       4
## [3,]       6
## [4,]       8
## [5,]      10

The answer is the value assigned most recently in the environment where mymult() was defined, R_GlobalEnv, the global environment (m = 2). Note, however, that if the BenchDesign is saved and reloaded in a new session, whatever value of m is defined in the global environment of the new session will be used. If m is not defined in the new session, an error will be thrown. In this case, m should either be explicitly defined within the function or passed as a second input variable of mymult() and defined with params = in the BDMethod.

While it may be possible to aggressively capture all code and environment variables defined with each method in a BenchDesign, it is not within the scope of this package, and furthermore, may not be the best appraoch to constructing a reproducible benchmark. While redundant, for emphasis, we reiterate that the SummarizedBenchmark framework is a solution for structuring benchmark experiments which complements, but does not replace, well documented and organized code.

5 BenchDesign Features

Several less prominent, but hopefully useful features are also implemented for working with objects of the BenchDesign, BDMethod, and BDMethodList classes.

5.1 Specifying Method Metadata

As seen in the example above, metadata for methods are stored in the colData() of SummarizedBenchmark objects. Several default metadata columns are populated in the colData() of the SummarizedBenchmark object generated by a call to buildBench(). However, sometimes it may be useful to include additional metadata columns beyond just the defaults. While this can be accomplished manually by modifying the colData() of the SummarizedBenchmark object post hoc, method metadata can also be specified at the addMethod() step using the meta = optional parameter. The meta = parameter accepts a named list of metadata information. Each list entry will be added to the colData() as a new column, and should be the same type across methods to prevent possible errors. To avoid collisions between metadata columns specified with meta = and the default set of columns, metadata specified using meta = will be added to colData() with meta. prefixed to the column name.

As an example, we construct a BenchDesign object again using the multiple hypothesis testing example. The BenchDesign is created with two methods, Bonferroni correction and the Benjamini-Hochberg step-up procedure. Each method is specified with the optional meta = parameter noting the type of control provided, either familywise error rate (FWER) or false discovery rate (FDR). We can verify that the manually defined metadata column (meta.type) is available in the colData() of the newly generated SummarizedBenchmark.

## DataFrame with 2 rows and 7 columns
##         func.pkg func.pkg.vers func.pkg.manual     param.p param.method
##      <character>   <character>       <logical> <character>  <character>
## bonf       stats         4.0.0           FALSE        pval "bonferroni"
## BH         stats         4.0.0           FALSE        pval         "BH"
##         meta.type session.idx
##       <character>   <numeric>
## bonf FWER control           1
## BH    FDR control           1

While all methods in this example had the meta = option specified, this is not necessary. It is completely acceptable to specify the meta = parameter for only a subset of methods. Missing entries will be recorded as NAs in the colData().

5.2 Specifying Version Metadata

Arguably, two of the most important pieces of metadata stored in the colData() of the SummarizedBenchmark returned by buildBench() are the relevant package name and version (pkg_name, pkg_vers). Determining the package name and version requires the primary “workhorse” function of the method be directly specified as func = in the addMethod() call. In some cases, this may not be possible, e.g. if the “workhorse” function is a wrapper. However, there still might exist an important function for which we would like to track the package name and version. The meta parameter can help.

The meta = parameter will handle the following named list entries as special values: pkg_name, pkg_vers, pkg_func. First, if values are specified for pkg_name and pkg_vers in meta =, these will overwrite the values determined from func =. To trace the source of pkg_name and pkg_vers information, the func.pkg.manual column of the colData will be set to TRUE (the default value is FALSE). Alternatively, a function can be passed to meta = as pkg_func. This function will be used to determine both pkg_name and pkg_vers, and will take precendence over manually specified pkg_name and pkg_vers values. If pkg_func is specified, it will be included in the colData() as a new column with the same name, and the vers_src column will be set to "meta_func". **Note: the function should be wrapped in rlang::quo to be properly parsed.

We illustrate the behavior when using either pkg_func or pkg_name and pkg_vers with the meta optional parameter using the same multiple testing example from above. First, notice that Storey’s q-value method was previously included in the BenchDesign with a post = argument to extract q-values from the result of calling qvalue::qvalue on the data.

## qv --------------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     p : pval 
##   post:
##     default : function (x) { ... 
##   meta:
##     none

We can avoid specifying post = by using the following wrapper function in place of the direct call to qvalue::qvalue.

However, we see that while the results for the two methods are the same, the colData() of the resulting SummarizedBenchmark object is missing the valuable package name and version information for the wrapper-based approach.

##                qv   qv_wrapper
## [1,] 0.0079813228 0.0079813228
## [2,] 0.3290660527 0.3290660527
## [3,] 0.0003753518 0.0003753518
## [4,] 0.0442094809 0.0442094809
## [5,] 0.1716134119 0.1716134119
## [6,] 0.1757029652 0.1757029652
## DataFrame with 2 rows and 5 columns
##               func.pkg func.pkg.vers func.pkg.manual     param.p session.idx
##            <character>   <character>       <logical> <character>   <numeric>
## qv              qvalue        2.19.0           FALSE        pval           1
## qv_wrapper          NA            NA           FALSE        pval           1

This can be fixed by specifying either the pkg_func or pkg_name and pkg_vers metadata values.

## DataFrame with 4 rows and 5 columns
##               func.pkg func.pkg.vers func.pkg.manual     param.p session.idx
##            <character>   <character>       <logical> <character>   <numeric>
## qv              qvalue        2.19.0           FALSE        pval           1
## qv_wrapper          NA            NA           FALSE        pval           1
## qv_pkgfunc      qvalue        2.19.0            TRUE        pval           1
## qv_pkgname      qvalue        2.19.0            TRUE        pval           1

The func.pkg and func.pkg.vers columns of the colData() are now populated with the correct information, and furthermore, the func.pkg.manual column is set to TRUE for the two latter methods.

5.3 Modifying Methods in a BenchDesign

Modifying the defintion of a method after it has been added to a BenchDesign is supported by the modifyMethod() function. The BenchDesign object created in the multiple testing example above, b, includes a method called qv.

## qv --------------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     p : pval 
##   post:
##     default : function (x) { ... 
##   meta:
##     none

Suppose we wish to specify an additional pi0 = parameter for the qvalue::qvalue method as well as add a new metadata tag. This can be accomplished by passing both new parameters to the params = parameter of modifyMethod() as a rlang::quos(..) list. By default, all entries in the rlang::quos(..) list are assumed to be parameters for the main function of the method. However, func, post, or meta values can also be modified by included in their new values in the list using the special keywords, bd.func =, bd.post =, or bd.meta =. In the example below, we both add a new parameter , pi0 = and modify the meta = specification.

## qv --------------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     p : pval 
##     pi0 : 0.9 
##   post:
##     default : function (x) { ... 
##   meta:
##     type : "FDR control"

We can see that the original method parameters, namely p = pval, is still defined for method, while the new method parameter pi0 = 0.9 has been added along with the new metadata specification. However, sometimes it may be desirable to completely overwrite all function parameters for a method, e.g. if we decide to completely redefine the parameters of a method. This may occur if some parameters were optional and originally specified, but no longer necessary. All function parameters can be overwritten by specifying .overwrite = TRUE.

## qv --------------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     pi0 : 0.9 
##   post:
##     default : function (x) { ... 
##   meta:
##     type : "FDR control"

Now the originally specified p = pval is no longer included in the method definition. Obviously, this is not reasonable in this setting as the p parameter is a necessary parameter to run the method. Note that specifications of bd.meta = will always overwrite the existing metadata values of the method, even when .overwrite = FALSE.

5.4 Duplicating Methods in a BenchDesign

In addition to comparing multiple methods, a benchmark study may also involve comparing a single method across several parameter settings. The expandMethod() function provides the capability to take a method already defined in the BenchDesign, and expand it to multiple methods with differing parameter values in the BenchDesign object. In the following example, expandMethod() is used to duplicate the qv method with only the pi0 parameter modified. By default, the expanded method, in this case qv, is dropped. To prevent this from happening, .replace = FALSE can be specified.

Expanding the qv method with expandMethod() is accomplished by specifying any new parameters of the variant methods to params = as a rlang::quos(..) list. Here, two new variants of the qv method, qv-p90 and qv-p10, are created with new values of the pi0 parameter.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: md5hash 
##     MD5 hash: 09867f2696d3178a8f3bc038917042e1 
##   benchmark methods:
##     method:      BH; func: p.adjust       
##     method:      qv; func: qvalue::qvalue 
##     method:    bonf; func: p.adjust       
##     method:  qv_p90; func: qvalue::qvalue 
##     method:  qv_p10; func: qvalue::qvalue
## qv_p90 ----------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     p : pval 
##     pi0 : 0.9 
##   post:
##     default : function (x) { ... 
##   meta:
##     none
## qv_p10 ----------------------------------------------------- 
## BenchDesign Method (BDMethod) ------------------------------ 
##   method: qvalue::qvalue
##     function (p, fdr.level = NULL, pfdr = FALSE, lfdr.out... 
##   parameters:
##     p : pval 
##     pi0 : 0.1 
##   post:
##     default : function (x) { ... 
##   meta:
##     none

When only a single parameter is modifed as in the example above, an alternative shorthand is provided. The parameter can be specified with onlyone = and the corresponding values specified with the new method names to params =.

## BenchDesign ------------------------------------------------ 
##   benchmark data:
##     type: md5hash 
##     MD5 hash: 09867f2696d3178a8f3bc038917042e1 
##   benchmark methods:
##     method:      BH; func: p.adjust       
##     method:      qv; func: qvalue::qvalue 
##     method:    bonf; func: p.adjust       
##     method:  qv_p90; func: qvalue::qvalue 
##     method:  qv_p10; func: qvalue::qvalue

The resulting modified BenchDesign object is the same.

6 SummarizedBenchmark Features

Aside from the standard SummarizedBenchmark

6.1 Manually Constructing

Throughout our documentation, SummarizedBenchmark objects are constructed exclusively using our recommended workflow by running buildBench() with a BenchDesign object. This recommended approach enables users to perform benchmark comparisons while automatically keeping track of parameters and software metadata. However, it is also possible to skip this recommended approach and construct a SummarizedBenchmark object directly from standard S3 data objects.

Using data from the iCOBRApackage (Soneson and Robinson 2016), this part of the vignette demonstrates how to build SummarizedBenchmark objects from S3 objects. The dataset contains differential expression results of three different methods (limma, edgeR and DESeq2) applied to a simulated RNA-seq dataset.

The process of building a SummarizedBenchmark object is similar to the process of constructing a SummarizedExperiment object. To build a SummarizedBenchmark object, three main objects are required: (1) a list where each element corresponds to a data.frame, (2) a DataFrame with annotations of the methods, and (3) when available, a DataFrame of ground truths.

In the SummarizedBenchmark object, each output of the methods is considered a different assay. For example, using the differential expression dataset example, we can define two assays, q-values and estimated log fold changes. For each assay, we arrange the output of the different methods as a matrix where each column corresponds to a method and each row corresponds to each feature (in this case, genes). We will need a list in which each of it’s element corresponds to an assay.

##                     edgeR      voom     DESeq2
## ENSG00000000457 0.2833295 0.1375146 0.04760152
## ENSG00000000460 0.2700376 0.2826398 0.08717737
## ENSG00000000938 0.9232794 0.9399669 0.99856748
##                      edgeR       voom      DESeq2
## ENSG00000000457 0.22706080 0.23980596 0.274811698
## ENSG00000000460 0.31050478 0.27677537 0.359226203
## ENSG00000000938 0.04985025 0.03344113 0.001526355

Since these are simulated data, the ground truths for both assays are known and available. We can format these as a DataFrame where each column corresponds to an assay and each row corresponds to a feature.

## DataFrame with 6 rows and 2 columns
##                    qvalue              logFC
##                 <integer>          <numeric>
## ENSG00000000457         0                  0
## ENSG00000000460         0                  0
## ENSG00000000938         0                  0
## ENSG00000000971         0                  0
## ENSG00000001460         1   3.93942204325368
## ENSG00000001461         1 0.0602244207363294

Finally, the method names are also reformatted as a DataFrame of simple annotation data of the methods.

## DataFrame with 3 rows and 1 column
##        method
##   <character>
## 1       edgeR
## 2        voom
## 3      DESeq2

A SummarizedBenchmark is built using the following call to the SummarizedBenchmark() constructor.

7 Next Steps

This vignette described the structure of the BenchDesign and SummarizedBenchmark classes which underly the SummarizedBenchmark package, as well as several complex and non-standard features of both classes. Examples of using these classes and the SummarizedBenchmark framework can be found in the SummarizedBenchmark: Full Case Study vignette as well as in the other Case Study vignettes. More advanced features are also described in various Feature vignettes.

References

Soneson, Charlotte, and Mark D Robinson. 2016. “iCOBRA: Open, Reproducible, Standardized and Live Method Benchmarking.” Nature Methods 13 (4):283–83. https://doi.org/10.1038/nmeth.3805.