1 Introduction

The AWAggregatorData package contains the data associated with the AWAggregator R package. It includes two pre-trained random forest models: one uses the average coefficient of variation (CV) as a feature, and the other does not. It also contains the peptide-spectrum matches (PSMs) in Benchmark Sets 1 to 3, derived from the psm.tsv output files generated by FragPipe, which were used to train the random forest models.

2 Overview of Package Data

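The AWAggregatorData package provides five resources through ExperimentHub:

  ID      Title                Description
  EH9637  benchmark.set.1.rds  PSMs in Benchmark Set 1
  EH9638  benchmark.set.2.rds  PSMs in Benchmark Set 2
  EH9639  benchmark.set.3.rds  PSMs in Benchmark Set 3
  EH9640  regr.rds             Pre-trained model with the average coefficient of variation (CV) as a feature
  EH9641  regr.no.CV.rds       Pre-trained model without CV as a feature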

3 Installation

if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

BiocManager::install('ExperimentHub')
BiocManager::install('AWAggregatorData')
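
The query comment in the next section notes that the AWAggregatorData records require Bioconductor 3.21 or later. As a minimal sketch, the check below compares a version string against that minimum. The helper name `meets_min_bioc` is hypothetical and is not part of BiocManager or AWAggregatorData.

```r
# Hypothetical helper (not part of any package): TRUE when `version` is at
# least `minimum`. package_version() compares components as integers.
meets_min_bioc <- function(version, minimum = "3.21") {
  package_version(version) >= package_version(minimum)
}

meets_min_bioc("3.22")  # newer than the minimum
meets_min_bioc("3.9")   # older: 9 < 21 when compared component-wise

# With BiocManager installed, the running version could be checked with:
# meets_min_bioc(as.character(BiocManager::version()))
```

Comparing with `package_version()` rather than as plain strings or numbers avoids treating 3.9 as newer than 3.21.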

4 Load Data from ExperimentHub

The data are stored in ExperimentHub. Information about the available datasets can be retrieved with the query function:

library(ExperimentHub)
## Loading required package: BiocGenerics
## Loading required package: generics
## 
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
##     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min
## Loading required package: AnnotationHub
## Loading required package: BiocFileCache
## Loading required package: dbplyr
eh = ExperimentHub()
query(eh, 'AWAggregatorData') # Requires Bioconductor version 3.21 or later
## ExperimentHub with 5 records
## # snapshotDate(): 2025-07-17
## # $dataprovider: University of British Columbia
## # $species: NA
## # $rdataclass: data.frame, ranger
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH9637"]]' 
## 
##            title              
##   EH9637 | benchmark.set.1.rds
##   EH9638 | benchmark.set.2.rds
##   EH9639 | benchmark.set.3.rds
##   EH9640 | regr.rds           
##   EH9641 | regr.no.CV.rds
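
Because the `EH` identifiers are easy to mistype, it can help to look a record up by its title instead. The sketch below works on a plain named character vector copied from the query output above, so it runs without a network connection. The helper `id_for_title` is hypothetical, and the commented line that builds the vector from a live query result is an assumption about how the hub object would be used.

```r
# Titles keyed by ExperimentHub ID, copied from the query output above.
records <- c(
  EH9637 = "benchmark.set.1.rds",
  EH9638 = "benchmark.set.2.rds",
  EH9639 = "benchmark.set.3.rds",
  EH9640 = "regr.rds",
  EH9641 = "regr.no.CV.rds"
)

# Hypothetical helper: return the single ID whose title matches exactly,
# and stop with an error if there is no match or more than one.
id_for_title <- function(records, title) {
  hit <- names(records)[records == title]
  if (length(hit) != 1L) {
    stop("expected exactly one record titled '", title,
         "', found ", length(hit))
  }
  hit
}

id_for_title(records, "regr.no.CV.rds")

# With a live hub, the same vector could be built from the query result
# (assumes `res <- query(eh, 'AWAggregatorData')`):
# records <- setNames(res$title, names(res))
```

Exact matching is used on purpose: a pattern match on "regr" would hit both model files.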

The datasets and pre-trained models can be downloaded as follows:

# Benchmark Set 1
df = eh[['EH9637']]
## see ?AWAggregatorData and browseVignettes('AWAggregatorData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# Benchmark Set 2
df = eh[['EH9638']]
## see ?AWAggregatorData and browseVignettes('AWAggregatorData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# Benchmark Set 3
df = eh[['EH9639']]
## see ?AWAggregatorData and browseVignettes('AWAggregatorData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# Pre-trained model incorporating the average coefficient of variation (CV) as 
# a feature
regr = eh[['EH9640']]
## see ?AWAggregatorData and browseVignettes('AWAggregatorData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
# Pre-trained model excluding CV as a feature (stored under its own name so
# that the first model is not overwritten)
regr.no.CV = eh[['EH9641']]
## see ?AWAggregatorData and browseVignettes('AWAggregatorData') for documentation
## downloading 1 resources
## retrieving 1 resource
## loading from cache
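
The calls above fetch one resource at a time. When all five are needed in the same session, they can be kept side by side in a named list. The following is a minimal sketch under stated assumptions: `load_records` is a hypothetical helper, and the fetch step is passed in as a function so the example can run without a hub connection.

```r
# IDs keyed by the name each resource should have in the result.
ids <- c(
  benchmark.set.1 = "EH9637",
  benchmark.set.2 = "EH9638",
  benchmark.set.3 = "EH9639",
  regr            = "EH9640",
  regr.no.CV      = "EH9641"
)

# Hypothetical helper: apply `fetch` to every ID. lapply() keeps the names
# of `ids`, so the result is a list named after the resources.
load_records <- function(ids, fetch) {
  lapply(ids, fetch)
}

# With a live hub (assumes `eh <- ExperimentHub()`):
# resources <- load_records(ids, function(id) eh[[id]])
# resources$regr.no.CV   # pre-trained model excluding CV
```

Passing `fetch` as an argument also makes it easy to substitute a stub when checking the surrounding code offline.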
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] AWAggregatorData_0.99.4 ExperimentHub_2.99.5    AnnotationHub_3.99.6   
## [4] BiocFileCache_2.99.6    dbplyr_2.5.0            BiocGenerics_0.55.1    
## [7] generics_0.1.4          BiocStyle_2.37.1       
## 
## loaded via a namespace (and not attached):
##  [1] toOrdinal_1.3-0.0    KEGGREST_1.49.1      xfun_0.53           
##  [4] bslib_0.9.0          httr2_1.2.1          lattice_0.22-7      
##  [7] Biobase_2.69.0       vctrs_0.6.5          tools_4.5.1         
## [10] stats4_4.5.1         curl_7.0.0           tibble_3.3.0        
## [13] AnnotationDbi_1.71.1 RSQLite_2.4.3        blob_1.2.4          
## [16] pkgconfig_2.0.3      Matrix_1.7-3         S4Vectors_0.47.0    
## [19] lifecycle_1.0.4      stringr_1.5.1        compiler_4.5.1      
## [22] brio_1.1.5           Biostrings_2.77.2    progress_1.2.3      
## [25] Seqinfo_0.99.2       htmltools_0.5.8.1    sass_0.4.10         
## [28] yaml_2.3.10          tidyr_1.3.1          pillar_1.11.0       
## [31] crayon_1.5.3         jquerylib_0.1.4      cachem_1.1.0        
## [34] tidyselect_1.2.1     digest_0.6.37        stringi_1.8.7       
## [37] dplyr_1.1.4          purrr_1.1.0          bookdown_0.44       
## [40] BiocVersion_3.22.0   grid_4.5.1           fastmap_1.2.0       
## [43] cli_3.6.5            magrittr_2.0.3       withr_3.0.2         
## [46] prettyunits_1.2.0    filelock_1.0.3       rappdirs_0.3.3      
## [49] bit64_4.6.0-1        rmarkdown_2.29       XVector_0.49.0      
## [52] httr_1.4.7           Peptides_2.4.6       bit_4.6.0           
## [55] ranger_0.17.0        AWAggregator_0.99.4  png_0.1-8           
## [58] hms_1.1.3            memoise_2.0.1        evaluate_1.0.4      
## [61] knitr_1.50           IRanges_2.43.0       testthat_3.2.3      
## [64] rlang_1.1.6          Rcpp_1.1.0           glue_1.8.0          
## [67] DBI_1.2.3            BiocManager_1.30.26  jsonlite_2.0.0      
## [70] R6_2.6.1