Using Bioconductor for Annotation

Bioconductor has extensive facilities for mapping between microarray probe, gene, pathway, gene ontology, homology and other annotations.

Bioconductor has built-in representations of GO, KEGG, vendor, and other annotations, and can easily access NCBI, Biomart, UCSC, and other sources.

Package Types

Bioconductor contains many different types of annotation packages. You can browse the currently available types on the Bioconductor web site.

There are packages that contain annotation data about a particular microarray platform (ChipDb), packages that contain gene-centered data about an organism (OrgDb), and packages that contain genome-centered data about an organism's transcriptome (TranscriptDb). This document covers typical uses of the most popular kinds of annotation package, and describes a newer meta-package that wraps access to several different kinds of packages (OrganismDb).

Sample ChipDb Workflow

The following example illustrates a typical R / Bioconductor session using a ChipDb-style package for information about a specific type of microarray. It continues the differential expression workflow, taking a 'top table' of differentially expressed probesets and discovering the genes probed and the Gene Ontology terms with which they are annotated.

## Affymetrix Human Genome U95 V 2.0 array IDs of interest; these might be
## obtained from
##
##   tbl <- topTable(efit, coef=2)
##   ids <- tbl[["ID"]]
##
## as part of a more extensive workflow.
> ids <- c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at")

## load libraries as sources of annotation
> library("hgu95av2.db")

## To list the kinds of things that can be retrieved, use the columns method
## (named cols in older AnnotationDbi releases).
> columns(hgu95av2.db)

## To list the kinds of things that can be used as keys 
## use the keytypes method
> keytypes(hgu95av2.db)

## To extract viable keys of a particular kind, use the keys method.
> head(keys(hgu95av2.db, keytype="ENTREZID"))

## the select method allows you to map probe ids to ENTREZ gene ids...
> select(hgu95av2.db, ids, "ENTREZID", "PROBEID")
   PROBEID ENTREZID
1 39730_at       25
2  1635_at       25
3  1674_at     7525
4 40504_at     5445
5 40202_at      687

## ... and to GENENAME etc.
> select(hgu95av2.db, ids, c("ENTREZID","GENENAME"), "PROBEID")
   PROBEID ENTREZID                                           GENENAME
1 39730_at       25     c-abl oncogene 1, non-receptor tyrosine kinase
2  1635_at       25     c-abl oncogene 1, non-receptor tyrosine kinase
3  1674_at     7525 v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1
4 40504_at     5445                                      paraoxonase 2
5 40202_at      687                              Kruppel-like factor 9
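
Two of the probesets above (39730_at and 1635_at) map to the same Entrez gene (25), so a table keyed by probeset can list a gene more than once. The following is a minimal base R sketch of grouping probesets by gene. It assumes the select() result has been stored in a data frame; the names anno, probes_by_gene and multi_probe_genes are illustrative, and the values are copied from the output shown above rather than retrieved from the package.

```r
# Values copied from the select() output above; in a live session this
# data frame would be the return value of select().
anno <- data.frame(
  PROBEID  = c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at"),
  ENTREZID = c("25", "25", "7525", "5445", "687"),
  stringsAsFactors = FALSE
)

# Group probeset IDs by the gene they interrogate.
probes_by_gene <- split(anno$PROBEID, anno$ENTREZID)

# Genes measured by more than one probeset.
multi_probe_genes <- names(probes_by_gene)[lengths(probes_by_gene) > 1]
multi_probe_genes
```

How repeated genes should be handled (keeping one probeset, or summarizing across them) depends on the downstream analysis; this sketch only identifies them.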

## find and extract the GO ids associated with the first id
> res <- select(hgu95av2.db, ids[1], "GO", "PROBEID")
> head(res)
   PROBEID         GO EVIDENCE ONTOLOGY
1 39730_at GO:0000115      TAS       BP
2 39730_at GO:0000287      IDA       MF
3 39730_at GO:0003677      NAS       MF
4 39730_at GO:0003785      TAS       MF
5 39730_at GO:0004515      TAS       MF
6 39730_at GO:0004713      IDA       MF

## use GO.db to find the Terms associated with those GOIDs
> library("GO.db")
> head(select(GO.db, res$GO, "TERM", "GOID"))
        GOID                                                                   TERM
1 GO:0000115  regulation of transcription involved in S phase of mitotic cell cycle
2 GO:0000287                                                  magnesium ion binding
3 GO:0003677                                                            DNA binding
4 GO:0003785                                                  actin monomer binding
5 GO:0004515                     nicotinate-nucleotide adenylyltransferase activity
6 GO:0004713                                       protein tyrosine kinase activity
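
The two results above can be combined into a single table with base R. The sketch below joins the GO annotations to their terms and then keeps only the molecular function (MF) entries. The data frames go_hits and go_terms are illustrative stand-ins built from the values printed above; in a live session they would be the two select() results.

```r
# Values copied from the two select() outputs above.
go_hits <- data.frame(
  PROBEID  = "39730_at",
  GO       = c("GO:0000115", "GO:0000287", "GO:0003677",
               "GO:0003785", "GO:0004515", "GO:0004713"),
  EVIDENCE = c("TAS", "IDA", "NAS", "TAS", "TAS", "IDA"),
  ONTOLOGY = c("BP", "MF", "MF", "MF", "MF", "MF"),
  stringsAsFactors = FALSE
)
go_terms <- data.frame(
  GOID = go_hits$GO,
  TERM = c(
    "regulation of transcription involved in S phase of mitotic cell cycle",
    "magnesium ion binding",
    "DNA binding",
    "actin monomer binding",
    "nicotinate-nucleotide adenylyltransferase activity",
    "protein tyrosine kinase activity"
  ),
  stringsAsFactors = FALSE
)

# Join the annotations to their terms on the GO identifier.
annotated <- merge(go_hits, go_terms, by.x = "GO", by.y = "GOID")

# Keep only the molecular function annotations.
mf <- annotated[annotated$ONTOLOGY == "MF", c("PROBEID", "GO", "TERM")]
mf
```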

Sample OrgDb Workflow

The organism-wide, gene-centered packages (OrgDb packages) all contain gene-centered data for an organism. These packages are accessed in much the same way as the platform-based (ChipDb) packages previously discussed. Because they are general, they do not contain information, such as probe IDs, that relates to a specific platform.

The important thing to understand is that the same methods apply. For example, you can look up information in this way:

> library(org.Hs.eg.db)
> keys <- head(keys(org.Hs.eg.db), n=2)
> cols <- c("PFAM","GO", "SYMBOL")
> select(org.Hs.eg.db, keys, cols, keytype="ENTREZID")
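
When only a single column is needed, AnnotationDbi also provides mapIds(), which returns a vector rather than a data frame. This function is not part of the session above; the following is a minimal sketch, guarded so that it does nothing when org.Hs.eg.db is not installed. The variable names gene_ids and symbols are illustrative.

```r
symbols <- NULL
if (nzchar(system.file(package = "org.Hs.eg.db"))) {
  suppressPackageStartupMessages(library(org.Hs.eg.db))

  # Take the first two Entrez gene IDs known to the package.
  gene_ids <- head(keys(org.Hs.eg.db, keytype = "ENTREZID"), n = 2)

  # multiVals = "first" keeps one value per key, so the result has
  # exactly one element for each input key.
  symbols <- mapIds(org.Hs.eg.db, keys = gene_ids, column = "SYMBOL",
                    keytype = "ENTREZID", multiVals = "first")
}
symbols
```

select() remains the better choice when several columns are wanted at once or when every one-to-many match should be kept.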

Sample TranscriptDb Workflow

Sample OrganismDb Workflow

Sample AnnotationHub Workflow

Installation and Use

Follow installation instructions to start using these packages. To install the annotations associated with the Affymetrix Human Genome U95 V 2.0, and with Gene Ontology, use

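## On Bioconductor 3.8 and later, install through the BiocManager package
## (it replaces the older biocLite script).
> install.packages("BiocManager")
> BiocManager::install(c("hgu95av2.db", "GO.db"))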

Package installation is required only once per R installation. View a full list of available software and annotation packages.

To use the AnnotationDbi and GO.db packages, evaluate the commands

> library("AnnotationDbi")
> library("GO.db")

These commands are required once in each R session.

Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

> help(package="GO.db")
> ?select

to obtain an overview of help on the GO.db package, and the select method. The AnnotationDbi package is used by most .db packages. View the vignettes in the AnnotationDbi package with

> browseVignettes(package="AnnotationDbi")

to view vignettes, which provide a more comprehensive introduction to package functionality. Use

> help.start()

to open a web page containing comprehensive help resources.
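
Vignettes can also be listed at the console without opening a browser. The sketch below wraps the standard vignette() function from utils in a small helper. The helper name list_vignettes is hypothetical and not part of any Bioconductor package.

```r
# Hypothetical helper: return the names of the vignettes shipped with an
# installed package, or character(0) if the package is not installed.
list_vignettes <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    return(character(0))
  }
  info <- vignette(package = pkg)   # lists vignettes; does not open a browser
  unname(info$results[, "Item"])
}

list_vignettes("AnnotationDbi")
```

An individual vignette can then be opened by name, for example with vignette("name", package = "AnnotationDbi"), where "name" is one of the values returned.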

Annotation Resources

The following links guide the user through key annotation packages. Users interested in how to create custom chip packages should see the vignettes in the AnnotationForge package. The AnnotationDbi, OrganismDbi and GenomicFeatures packages contain additional information on how to use some of the extra tools provided. You can also refer to the complete list of annotation packages.

Key Packages

Types of Annotation Packages

Fred Hutchinson Cancer Research Center