1 Institute for Computational Biomedicine, Heidelberg University

1 Available datasets

To see a full list of datasets call the omnipath_show_db function:

library(OmnipathR)
omnipath_show_db()
## # A tibble: 16 × 10
##    name        last_used           lifetime package loader loader_param latest_param loaded db           key  
##    <chr>       <dttm>                 <dbl> <chr>   <chr>  <list>       <list>       <lgl>  <list>       <chr>
##  1 Gene Ontol… 2022-04-26 17:30:29      300 Omnipa… go_on… <named list> <named list> TRUE   <named list> go_b…
##  2 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_f…
##  3 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  4 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  5 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_s…
##  6 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  7 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_d…
##  8 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  9 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 10 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 11 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 12 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 13 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 14 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_y…
## 15 GO annotat… NA                       300 Omnipa… go_an… <named list> <lgl [1]>    FALSE  <lgl [1]>    goa_…
## 16 UniProt-Ge… NA                       300 Omnipa… unipr… <named list> <lgl [1]>    FALSE  <lgl [1]>    up_g…

It returns a tibble in which each dataset has a human-readable name and a key that can be used to refer to it. The tibble also shows whether the dataset is currently loaded, when it was last used, and the loader function with its arguments.
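
Because the printed tibble truncates the key column, it can help to pull the keys out in full. The following is a minimal sketch that assumes dplyr is installed; the variable names db_keys and loaded_dbs are only illustrative.

```r
library(OmnipathR)
library(dplyr)

# Keys are truncated in the printed tibble, so pull the column to see them in full
db_keys <- omnipath_show_db() %>% pull(key)
db_keys

# Names, keys and timestamps of the datasets currently held in memory
loaded_dbs <- omnipath_show_db() %>%
    filter(loaded) %>%
    select(name, key, last_used)
loaded_dbs
```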

2 Access a dataset

Datasets are accessed with the get_db function. Ideally, call this function every time you use the dataset: the first call loads it, and subsequent calls return the already loaded copy. This way each access is registered and extends the expiry time. Let’s load the human UniProt-GeneSymbol table; above we see that its key is up_gs_human.

up_gs <- get_db('up_gs_human')
up_gs
## # A tibble: 20,348 × 2
##    From   To     
##    <chr>  <chr>  
##  1 P51451 BLK    
##  2 A6H8Y1 BDP1   
##  3 O60885 BRD4   
##  4 Q9Y3X0 CCDC9  
##  5 P22223 CDH3   
##  6 Q9BXJ4 C1QTNF3
##  7 P09871 C1S    
##  8 Q9ULX7 CA14   
##  9 Q53TS8 C2CD6  
## 10 Q01518 CAP1   
## # … with 20,338 more rows
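
One way to follow the advice of calling get_db at every use is to fetch the dataset inside the function that needs it, instead of keeping a long-lived copy in a global variable. The sketch below assumes dplyr is installed; uniprot_to_symbol is a hypothetical helper written for this example, not part of OmnipathR.

```r
library(OmnipathR)
library(dplyr)

# Hypothetical helper: translate UniProt IDs to gene symbols.
# get_db is called on every invocation, so each lookup registers
# an access to the dataset.
uniprot_to_symbol <- function(ids) {
    get_db('up_gs_human') %>%
        filter(From %in% ids) %>%
        pull(To)
}

symbols <- uniprot_to_symbol(c('P51451', 'O60885'))
symbols
```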

This dataset is a two-column data frame of SwissProt IDs and gene symbols. Looking at the dataset list again, we find that it is now loaded and its last_used timestamp is set to the time we called get_db:

omnipath_show_db()
## # A tibble: 16 × 10
##    name        last_used           lifetime package loader loader_param latest_param loaded db           key  
##    <chr>       <dttm>                 <dbl> <chr>   <chr>  <list>       <list>       <lgl>  <list>       <chr>
##  1 Gene Ontol… 2022-04-26 17:30:29      300 Omnipa… go_on… <named list> <named list> TRUE   <named list> go_b…
##  2 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_f…
##  3 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  4 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  5 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_s…
##  6 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  7 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_d…
##  8 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  9 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 10 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 11 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 12 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 13 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 14 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_y…
## 15 GO annotat… NA                       300 Omnipa… go_an… <named list> <lgl [1]>    FALSE  <lgl [1]>    goa_…
## 16 UniProt-Ge… 2022-04-26 17:30:31      300 Omnipa… unipr… <named list> <named list> TRUE   <tibble>     up_g…

The table above also contains a reference to the dataset itself and the arguments passed to the loader function:

d <- omnipath_show_db()
d %>% dplyr::pull(db) %>% magrittr::extract2(16)
## # A tibble: 20,348 × 2
##    From   To     
##    <chr>  <chr>  
##  1 P51451 BLK    
##  2 A6H8Y1 BDP1   
##  3 O60885 BRD4   
##  4 Q9Y3X0 CCDC9  
##  5 P22223 CDH3   
##  6 Q9BXJ4 C1QTNF3
##  7 P09871 C1S    
##  8 Q9ULX7 CA14   
##  9 Q53TS8 C2CD6  
## 10 Q01518 CAP1   
## # … with 20,338 more rows
d %>% dplyr::pull(latest_param) %>% magrittr::extract2(16)
## $to
## [1] "genesymbol"
## 
## $organism
## [1] 9606
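
The positional index 16 depends on the order of the rows. A variant that selects the row by its key may be more robust; this is only a sketch, assuming dplyr is installed and that the key column holds the full key up_gs_human.

```r
library(OmnipathR)
library(dplyr)

up_gs <- get_db('up_gs_human')  # make sure the dataset is loaded

# Select the row by key instead of by position
entry <- omnipath_show_db() %>% filter(key == 'up_gs_human')

# Each list column now holds a single element
entry %>% pull(db) %>% magrittr::extract2(1)
entry %>% pull(latest_param) %>% magrittr::extract2(1)
```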

If we call get_db again, the timestamp is updated, resetting the expiry counter:

up_gs <- get_db('up_gs_human')
omnipath_show_db()
## # A tibble: 16 × 10
##    name        last_used           lifetime package loader loader_param latest_param loaded db           key  
##    <chr>       <dttm>                 <dbl> <chr>   <chr>  <list>       <list>       <lgl>  <list>       <chr>
##  1 Gene Ontol… 2022-04-26 17:30:29      300 Omnipa… go_on… <named list> <named list> TRUE   <named list> go_b…
##  2 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_f…
##  3 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  4 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_a…
##  5 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_s…
##  6 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  7 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_d…
##  8 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_c…
##  9 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 10 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 11 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_m…
## 12 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 13 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_p…
## 14 Gene Ontol… NA                       300 Omnipa… go_on… <named list> <lgl [1]>    FALSE  <lgl [1]>    go_y…
## 15 GO annotat… NA                       300 Omnipa… go_an… <named list> <lgl [1]>    FALSE  <lgl [1]>    goa_…
## 16 UniProt-Ge… 2022-04-26 17:30:41      300 Omnipa… unipr… <named list> <named list> TRUE   <tibble>     up_g…
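
To make the expiry arithmetic concrete, here is a minimal base R sketch using the last_used and lifetime values from the table above. It assumes that the lifetime is measured in seconds and that a dataset expires once the time elapsed since last_used exceeds it; the vignette does not confirm either detail. The value of now is made up for illustration.

```r
# Values taken from the last row of the table above
last_used <- as.POSIXct('2022-04-26 17:30:41', tz = 'UTC')
lifetime <- 300  # assumption: seconds

# Hypothetical moment at which we check the dataset
now <- as.POSIXct('2022-04-26 17:33:00', tz = 'UTC')

elapsed <- as.numeric(difftime(now, last_used, units = 'secs'))
remaining <- lifetime - elapsed
remaining  # 161 under the assumptions above
```

Under these assumptions, calling get_db at the hypothetical time now would move last_used forward and restore the full 300.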

3 Where are the loaded datasets?

The loaded datasets live in an environment that belongs to the OmnipathR package. Normally users don’t need to access this environment. As we see below, looking directly at the environment reveals the same information that omnipath_show_db presents:

OmnipathR:::omnipath.env$db$up_gs_human
## $name
## [1] "UniProt-GeneSymbol table (human)"
## 
## $last_used
## [1] "2022-04-26 17:30:41 EDT"
## 
## $lifetime
## [1] 300
## 
## $package
## [1] "OmnipathR"
## 
## $loader
## [1] "uniprot_full_id_mapping_table"
## 
## $loader_param
## $loader_param$to
## [1] "genesymbol"
## 
## $loader_param$organism
## [1] 9606
## 
## 
## $latest_param
## $latest_param$to
## [1] "genesymbol"
## 
## $latest_param$organism
## [1] 9606
## 
## 
## $loaded
## [1] TRUE
## 
## $db
## # A tibble: 20,348 × 2
##    From   To     
##    <chr>  <chr>  
##  1 P51451 BLK    
##  2 A6H8Y1 BDP1   
##  3 O60885 BRD4   
##  4 Q9Y3X0 CCDC9  
##  5 P22223 CDH3   
##  6 Q9BXJ4 C1QTNF3
##  7 P09871 C1S    
##  8 Q9ULX7 CA14   
##  9 Q53TS8 C2CD6  
## 10 Q01518 CAP1   
## # … with 20,338 more rows
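
The general idea of keeping datasets in an environment, loading them on first use and stamping each access, can be sketched in base R. This is a toy illustration and not OmnipathR's implementation: cache, get_cached and toy_loader are hypothetical names, and the expiry check assumes lifetimes in seconds.

```r
# A private environment standing in for the package-level store
cache <- new.env()

# Hypothetical accessor: load on first use, reuse afterwards,
# and refresh the timestamp on every access
get_cached <- function(key, loader, lifetime = 300) {
    entry <- cache[[key]]
    expired <- !is.null(entry) &&
        difftime(Sys.time(), entry$last_used, units = 'secs') > lifetime
    if (is.null(entry) || expired) {
        entry <- list(db = loader(), lifetime = lifetime)
    }
    entry$last_used <- Sys.time()
    cache[[key]] <- entry
    entry$db
}

# Toy loader that counts how many times it actually runs
n_loads <- 0
toy_loader <- function() {
    n_loads <<- n_loads + 1
    data.frame(From = 'P51451', To = 'BLK')
}

first <- get_cached('toy', toy_loader)
second <- get_cached('toy', toy_loader)
n_loads  # the loader ran only once; the second call reused the stored copy
```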

4 How to extend the expiry period?

The default expiry of datasets is given by the option omnipath.db_lifetime. Setting the option affects only the current session; calling omnipath_save_config afterwards saves it to the default config file, so that it stays valid in all subsequent sessions.

options(omnipath.db_lifetime = 600)
omnipath_save_config()
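
For a change limited to the current session, the option can be set without calling omnipath_save_config, read back with getOption, and restored afterwards. The sketch below uses only base R functions; the variable names old and current are illustrative.

```r
# Change the lifetime for this session only (no omnipath_save_config call)
old <- options(omnipath.db_lifetime = 600)

# Read the value back
current <- getOption('omnipath.db_lifetime')
current

# Restore the previous value
options(old)
```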

5 Where are the datasets defined?

The built-in dataset definitions are in a JSON file shipped with the package. The easiest way to view it is through the git web interface.
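
If the package is installed, the JSON files it ships can also be located on disk. The sketch below only lists candidate files with base R functions; it does not assume a particular file name, so the listing may contain JSON files other than the dataset definitions.

```r
# Installation directory of the package (an empty string if it is not installed)
pkg_dir <- system.file(package = 'OmnipathR')

# All JSON files shipped with the package
json_files <- list.files(
    pkg_dir,
    pattern = '\\.json$',
    recursive = TRUE,
    full.names = TRUE
)
json_files
```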

6 How to add custom datasets?

Currently there is no API for this, but it would be easy to implement: it would be a matter of providing a JSON file similar to the one above, or calling a function. Please open an issue if you are interested in this feature.

7 Session information

sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] OmnipathR_3.4.0  BiocStyle_2.24.0
## 
## loaded via a namespace (and not attached):
##  [1] progress_1.2.2      tidyselect_1.1.2    xfun_0.30           bslib_0.3.1         purrr_0.3.4        
##  [6] vctrs_0.4.1         generics_0.1.2      htmltools_0.5.2     yaml_2.3.5          utf8_1.2.2         
## [11] rlang_1.0.2         jquerylib_0.1.4     pillar_1.7.0        later_1.3.0         glue_1.6.2         
## [16] withr_2.5.0         DBI_1.1.2           rappdirs_0.3.3      bit64_4.0.5         readxl_1.4.0       
## [21] lifecycle_1.0.1     stringr_1.4.0       cellranger_1.1.0    evaluate_0.15       knitr_1.38         
## [26] tzdb_0.3.0          fastmap_1.1.0       parallel_4.2.0      curl_4.3.2          fansi_1.0.3        
## [31] Rcpp_1.0.8.3        readr_2.1.2         backports_1.4.1     checkmate_2.1.0     BiocManager_1.30.17
## [36] vroom_1.5.7         jsonlite_1.8.0      bit_4.0.4           hms_1.1.1           digest_0.6.29      
## [41] stringi_1.7.6       bookdown_0.26       dplyr_1.0.8         cli_3.3.0           tools_4.2.0        
## [46] magrittr_2.0.3      logger_0.2.2        sass_0.4.1          tibble_3.1.6        crayon_1.5.1       
## [51] tidyr_1.2.0         pkgconfig_2.0.3     ellipsis_0.3.2      xml2_1.3.3          prettyunits_1.1.1  
## [56] assertthat_0.2.1    rmarkdown_2.14      httr_1.4.2          R6_2.5.1            igraph_1.3.1       
## [61] compiler_4.2.0