Contents

This vignette includes answers and supporting materials that address frequently asked questions (FAQs), especially those posted on the phyloseq issues tracker.

For most issues the phyloseq issues tracker should suffice; but occasionally there are questions that are asked repeatedly enough that it becomes appropriate to canonize the answer here in this vignette. This is both (1) to help users find solutions more quickly, and (2) to mitigate redundancy on the issues tracker.

All users are encouraged to perform a google search and review other questions/responses to both open and closed issues on the phyloseq issues tracker before seeking an active response by posting a new issue.

library("phyloseq"); packageVersion("phyloseq")
## [1] '1.50.0'
library("ggplot2"); packageVersion("ggplot2")
## [1] '3.5.1'
theme_set(theme_bw())

1 - I tried reading my biom file using phyloseq, but it didn’t work. What’s wrong?

The most common cause for this errors is derived from a massive change to the way biom files are stored on disk. There are currently two “versions” of the biom-format, each of which stores data very differently. The original format – and original support in phyloseq – was for biom-format version 1 based on JSON.

The latest version – version 2 – is based on the HDF5 file format, and this new biom format version recently become the default file output format for popular workflows like QIIME.

1.1 Good News: HDF5-biom should be supported in next release

The biomformat package is the Bioconductor incarnation of R package support for the biom file format, written by Paul McMurdie (phyloseq author) and Joseph Paulson (metagenomeSeq author). Although it has been available on GitHub and BioC-devel for many months now, the first release version of biomformat on Bioconductor will be in April 2016. In that same release, phyloseq will switch over from the JSON-only biom package hosted on CRAN to this new package, biomformat, which simultaneously supports biom files based on either HDF5 or JSON.

This difference will be largely opaque to users, and phyloseq will “just work” after the next release in April.

Use the import_biom function to read your recent QIIME or other biom-format data.

Additional back details are described in Issue 443.

1.2 HDF5 (Version 2.0) biom-format: biomformat

As just described, HDF5 biom format is currently supported in the development version of phyloseq, via the new beta/development package called biomformat on BioC-devel and GitHub:

https://github.com/joey711/biomformat

If you need to use HDF5-based biom format files immediately and cannot wait for the upcoming release, then you should install the development version of the biomformat package by following the instructions at the link above.

1.3 Not every data component is included in .biom files

Even though the biom-format supports the self-annotated inclusion of major components like that taxonomy table and sample data table, many tools that generate biom-format files (like QIIME, MG-RAST, mothur, etc.) do not export this data, even if you provided the information in your data input files. The reason for this boggles me, and I’ve shared my views on this with QIIME developers, but there nevertheless seems to be no plan to include your sample data in the ouput biom file.

Furthermore, even though I have proposed it to the biom-format team, there is currently no support (or timeline for support) for inclusion of a phylogenetic tree within a “.biom” file.

A number of tutorials are available demonstrating how one can add components to a phyloseq object after it has been created/imported. The following tutorial is especially relevant

http://joey711.github.io/phyloseq-demo/import-biom-sd-example.html

Which makes use of the following functions:

  • import_qiime_sample_data
  • merge_phyloseq

2 - microbio_me_qiime() returned an error. What’s wrong?

2.1 The QIIME-DB Server is Permanently Down.

The QIIME-DB server is permanently down.

Users are suggested to migrate their queries over to Qiita.

Indeed, the previous link to microbio.me/qiime now sends users to the new Qiita website.

2.2 An interface to Qiita is Planned.

Stay tuned. The Qiita API needs to be released by the Qiita developers first. The phyloseq developers have no control over this, as we are not affiliated directly with the QIIME developers. Once there is an official Qiita API with documentation, an interface for phyloseq will be added.

We found the microbio_me_qiime() function to be very convenient while the QIIME-DB server lasted. Hopefully an equivalent is hosted soon.

3 - I want a phyloseq graphic that looks like…

Great!

Every plot function in phyloseq returns a ggplot2 object. When these objects are “printed” to standard output in an R session, for instance,

data(esophagus)
plot_tree(esophagus)

then the graphic is rendered in the current graphic device.

Alternatively, if you save the object output from a phyloseq plot_ function as a variable in your session, then you can further modify it, interactively, at your leisure. For instance,

p1 = plot_tree(esophagus, color = "Sample")
p1

p1 + 
  ggtitle("This is my title.") +
  annotate("text", 0.25, 3, 
           color = "orange",
           label = "my annotation")

There are lots of ways for you to generate custom graphics with phyloseq as a starting point.

The following sections list some of my favorites.

3.1 Modify the ggplot object yourself.

For example, the plot_ordination() examples tutorial provides several examples of using additional ggplot2 commands to modify/customize the graphic encoded in the ggplot2 object returned by plot_ordination.

The ggplo2 documentation is the current and canonical online reference for creating, modifying, and developing with ggplot2 objects.

For simple changes to aesthetics and aesthetic mapping, the aesthetic specifications vignette is a useful resource.

3.2 psmelt and ggplot2

The psmelt function converts your phyloseq object into a table (data.frame) that is very friendly for defining a custom ggplot2 graphic. This function was originally created as an internal (not user-exposed) tool within phyloseq to enable a DRY approach to building ggplot2 graphics from microbiome data represented as phyloseq objects.

When applicable, the phyloseq plot_ family of functions use psmelt. This function is now a documented and user-accessible function in phyloseq – for the main purpose of enabling users to create their own ggplot2 graphics as needed.

There are lots of great documentation examples for ggplot2 at

The following are two very simple examples of using psmelt to define your own ggplot2 object “from scratch”. It should be evident that you could include further ggplot2 commands to modify each plot further, as you see fit.

data("esophagus")
mdf = psmelt(esophagus)
# Simple bar plot. See plot_bar() for more.
ggplot(mdf, aes(x = Sample, 
                y = Abundance)) + 
  geom_bar(stat = "identity", position = "stack", color = "black")

# Simple heat map. See plot_heatmap() for more.
ggplot(mdf, aes(x = Sample, 
                y = OTU, 
                fill = Abundance)) +
  geom_raster()