Exploring a MgDb Object

Nathan D. Olson

2018-10-30

The MgDb class in the metagenomeFeatures package holds the sequences and taxonomic information for a 16S rRNA database. This vignette demonstrates the class methods for exploring and subsetting an MgDb-class object using gg85, the demonstration database (Greengenes 13.8, 85% OTUs) included in the metagenomeFeatures package. MgDb-class objects for full databases are provided in separate packages, such as the greengenes13.5MgDb package.

Demonstration MgDb-class Object

library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()
gg85
## MgDb object:[1] "Metadata"
## |ACCESSION_DATE: Mon Apr  2 13:30:09 2018
## |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
## |DB_TYPE_NAME: GreenGenes
## |DB_VERSION: 13.8 85% OTUS
## |DB_TYPE_VALUE: MgDb
## |DB_SCHEMA_VERSION: 2.0
## [1] "Sequence Data:"
## [1] "DECIPHER formatted seqDB"
## [1] "Taxonomy Data:"
## # Source:   table<Seqs> [?? x 11]
## # Database: sqlite 3.22.0
## #   [/tmp/Rtmp941GRZ/Rinst75cf5201c4be/metagenomeFeatures/extdata/gg13.8_85.sqlite]
##    row_names identifier description Keys  Kingdom Phylum Class Ord   Family
##        <int> <chr>      <chr>       <chr> <chr>   <chr>  <chr> <chr> <chr> 
##  1         1 MgDb       1111561     1111… k__Bac… p__Pr… c__G… o__L… f__   
##  2         2 MgDb       1111421     1111… k__Bac… p__Pr… c__A… o__R… f__   
##  3         3 MgDb       1111090     1111… k__Bac… p__Ac… c__N… o__N… f__Ni…
##  4         4 MgDb       1110893     1110… k__Bac… p__Ba… c__[… o__[… f__Sa…
##  5         5 MgDb       1110814     1110… k__Bac… p__BR… c__   o__   f__   
##  6         6 MgDb       1110088     1110… k__Bac… p__Pr… c__G… o__   f__   
##  7         7 MgDb       1109993     1109… k__Bac… p__Ch… c__D… o__   f__   
##  8         8 MgDb       1109948     1109… k__Bac… p__Pl… c__[… o__B… f__W4 
##  9         9 MgDb       1109493     1109… k__Bac… p__Pl… c__v… o__   f__   
## 10        10 MgDb       1109328     1109… k__Bac… p__Ch… c__A… o__S… f__   
## # ... with more rows, and 2 more variables: Genus <chr>, Species <chr>
## [1] "Tree Data:"
## 
## Phylogenetic tree with 5088 tips and 5087 internal nodes.
## 
## Tip labels:
##  4479984, 540377, 811993, 823988, 4397176, 4446470, ...
## 
## Rooted; includes branch lengths.

MgDb Methods

taxa_keytypes, taxa_columns, and taxa_keys

taxa_keytypes(gg85)
##  [1] "row_names"   "identifier"  "description" "Keys"        "Kingdom"    
##  [6] "Phylum"      "Class"       "Ord"         "Family"      "Genus"      
## [11] "Species"
taxa_columns(gg85)
## [1] "Keys"    "Kingdom" "Phylum"  "Class"   "Ord"     "Family"  "Genus"  
## [8] "Species"
head(taxa_keys(gg85, keytype = c("Kingdom")))
## # A tibble: 6 x 1
##   Kingdom    
##   <chr>      
## 1 k__Bacteria
## 2 k__Bacteria
## 3 k__Bacteria
## 4 k__Bacteria
## 5 k__Bacteria
## 6 k__Bacteria

Select Methods

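The select method, mgDb_select, retrieves database entries for a specified taxonomic group or list of sequence IDs. It can return the taxonomic information, the sequence data, or everything at once; as the last example shows, selecting all returns a list with taxa, seq, and tree elements.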

Selecting taxonomic information
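The call that produced the output below is not shown in the rendered vignette. The following is a minimal sketch of such a query, not necessarily the author's exact call. It assumes that `mgDb_select()` takes `type`, `keys`, and `keytype` arguments, that the two families are Vibrionaceae and Enterobacteriaceae (inferred from the truncated `f__Vibri…` and `f__Enter…` values), and that Greengenes names can be given without the `f__` prefix. The name `taxa_subset` is illustrative only; see `?mgDb_select` to confirm the interface for your package version.

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumed call: the family names are inferred from the truncated output
# and may need the Greengenes "f__" prefix in some package versions.
taxa_subset <- mgDb_select(gg85,
                           type = "taxa",
                           keys = c("Vibrionaceae", "Enterobacteriaceae"),
                           keytype = "Family")
taxa_subset
```

The result is a tibble with one row per matching database entry, so it can be filtered further with standard data frame tools.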

## # A tibble: 27 x 8
##    Keys   Kingdom  Phylum    Class     Ord      Family    Genus   Species 
##    <chr>  <chr>    <chr>     <chr>     <chr>    <chr>     <chr>   <chr>   
##  1 10479… k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  2 818108 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  3 651366 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  4 592303 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Pro… s__     
##  5 575794 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  6 559954 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  7 368586 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__     s__     
##  8 289174 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Ple… s__shig…
##  9 268585 k__Bact… p__Prote… c__Gamma… o__Ente… f__Enter… g__Cit… s__     
## 10 232927 k__Bact… p__Prote… c__Gamma… o__Vibr… f__Vibri… g__     s__     
## # ... with 17 more rows

Selecting sequence information
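The sequence output below likewise appears without its call. A hedged sketch, assuming the same two families as in the taxonomic query and that `type = "seq"` requests sequence data (the family names and the variable name `seq_subset` are assumptions, not taken from the source):

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumed call: same keys as the taxonomic query, but asking for sequences.
seq_subset <- mgDb_select(gg85,
                          type = "seq",
                          keys = c("Vibrionaceae", "Enterobacteriaceae"),
                          keytype = "Family")
seq_subset
```

The returned object is a DNAStringSet whose names are the database keys, which makes it straightforward to match sequences back to rows of the taxonomy table.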

##   A DNAStringSet instance of length 27
##      width seq                                         names               
##  [1]  1366 ATTGAACGCTGGCGGCAGGC...GTGAATACGTTCCCGGGCCT 1047956
##  [2]  1410 ACGGTACACAGAGAGCTTGC...TTCGGGAGGGCGCTTACCAC 818108
##  [3]  1421 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 651366
##  [4]  1453 AGTCGAGCGGTAACAGTGGG...CATGACTGGGGGAAGTCGTA 592303
##  [5]  1419 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 575794
##  ...   ... ...
## [23]  1383 TGGGAAACTGCCTGATGGAG...AACCTTCGGGAGGGCGGTTT 4336809
## [24]  1443 GGGTGAGTAATGTCTGGGAA...GGTTGCAAAAGAAGTAGGTA 656881
## [25]  1563 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4371215
## [26]  1392 GCGGCGGACGGGTGAGTAAT...TGGGTAGTTTAACCTTCGGG 4375861
## [27]  1389 TCGTGCGGTAATAGAGGAAC...AGCAAGTAGTTTAACCTAAA 4443068

Selecting all
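The combined output below is also shown without its call. A sketch under stated assumptions: `type = "all"` is assumed to request every data type, and the genus key `"Vibrio"` is a guess based on the truncated `g__Vi…` values and the Vibrionaceae family in the output. The variable name `all_subset` is illustrative only.

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumed call: the genus name is inferred from the truncated output
# ("g__Vi…") and is not confirmed by the source.
all_subset <- mgDb_select(gg85,
                          type = "all",
                          keys = "Vibrio",
                          keytype = "Genus")

# The result is a list; each element can be pulled out by name.
all_subset$taxa
all_subset$seq
all_subset$tree
```

Requesting everything in one call keeps the taxonomy, sequences, and tree restricted to the same set of entries.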

## $taxa
## # A tibble: 2 x 8
##   Keys   Kingdom   Phylum     Class      Ord      Family    Genus  Species
##   <chr>  <chr>     <chr>      <chr>      <chr>    <chr>     <chr>  <chr>  
## 1 661785 k__Bacte… p__Proteo… c__Gammap… o__Vibr… f__Vibri… g__Vi… s__    
## 2 43758… k__Bacte… p__Proteo… c__Gammap… o__Vibr… f__Vibri… g__Vi… s__    
## 
## $seq
##   A DNAStringSet instance of length 2
##     width seq                                          names               
## [1]  1420 AGAGTTTGATCATGGCTCAGA...TTCATGACTGGGGTGAAGTC 661785
## [2]  1392 GCGGCGGACGGGTGAGTAATG...TGGGTAGTTTAACCTTCGGG 4375861
## 
## $tree
## 
## Phylogenetic tree with 2 tips and 1 internal nodes.
## 
## Tip labels:
## [1] "661785"  "4375861"
## 
## Rooted; includes branch lengths.

Session Information

sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] metagenomeFeatures_2.2.0 Biobase_2.42.0          
## [3] BiocGenerics_0.28.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.19      XVector_0.22.0    compiler_3.5.1   
##  [4] pillar_1.3.0      dbplyr_1.2.2      bindr_0.1.1      
##  [7] zlibbioc_1.28.0   tools_3.5.1       digest_0.6.18    
## [10] bit_1.1-14        nlme_3.1-137      RSQLite_2.1.1    
## [13] evaluate_0.12     memoise_1.1.0     tibble_1.4.2     
## [16] lattice_0.20-35   pkgconfig_2.0.2   rlang_0.3.0.1    
## [19] cli_1.0.1         DBI_1.0.0         yaml_2.2.0       
## [22] bindrcpp_0.2.2    stringr_1.3.1     dplyr_0.7.7      
## [25] knitr_1.20        IRanges_2.16.0    Biostrings_2.50.0
## [28] S4Vectors_0.20.0  stats4_3.5.1      rprojroot_1.3-2  
## [31] bit64_0.9-7       grid_3.5.1        tidyselect_0.2.5 
## [34] glue_1.3.0        R6_2.3.0          fansi_0.4.0      
## [37] rmarkdown_1.10    DECIPHER_2.10.0   purrr_0.2.5      
## [40] blob_1.1.1        magrittr_1.5      backports_1.1.2  
## [43] htmltools_0.3.6   assertthat_0.2.0  ape_5.2          
## [46] utf8_1.1.4        stringi_1.2.4     lazyeval_0.2.1   
## [49] crayon_1.3.4