Package Guidelines

Introduction

The Bioconductor project promotes high-quality, well-documented, and interoperable software. These guidelines help to achieve this objective; they are not meant to put undue burden on package authors, and authors having difficulty satisfying the guidelines should seek advice on the bioc-devel mailing list.

Package maintainers are urged to follow these guidelines as closely as possible when developing Bioconductor packages.

General instructions for producing packages can be found in the Writing R Extensions manual, available from within R (RShowDoc("R-exts")) or on the R web site.

Types of Packages

Most packages contributed by users are software packages that perform analytic calculations. Users also contribute annotation and experiment data packages. Annotation packages are database-like packages that provide information linking identifiers (e.g., Entrez gene names or Affymetrix probe IDs) to other information (e.g., chromosomal location, Gene Ontology category). Experiment data packages provide data sets that are used, often by software packages, to illustrate particular analyses. An excellent practice is to develop a software package, and to provide or use an existing experiment data package to give a comprehensive illustration of the methods in the software package. The guidelines below apply to all packages, but annotation and experiment data packages are not required to conform to the space limitations of software packages. Developers wishing to contribute annotation or experiment data packages should seek additional support associated with package submission.

Version of Bioconductor and R

Package developers should always use the devel version of Bioconductor when developing and testing packages to be contributed.

Depending on the R release cycle, using Bioconductor devel may or may not also involve using the devel version of R. See the how-to on using the devel version of Bioconductor for up-to-date information.

Correctness, Space and Time

Bioconductor packages must pass R CMD build (or R CMD INSTALL --build) and pass R CMD check with no errors and no warnings using a recent R-devel. Authors should also try to address all notes that arise during build or check.

Do not use filenames that differ only in case, as not all file systems are case-sensitive.

The source package resulting from running R CMD build should occupy less than 4 MB on disk. The package should require less than 5 minutes to run R CMD check --no-build-vignettes. Using the --no-build-vignettes option ensures that the vignette is built only once.

Vignette and man page examples should not use more than 2 GB of memory, since R cannot allocate more than this on 32-bit Windows.

Package Name

Choose a descriptive name. An easy way to check whether your name is already in use is to confirm that the following command fails:

source("http://bioconductor.org/biocLite.R")
biocLite("MyPackage")

Avoid names that are easily confused with existing package names, or that imply a temporal (e.g., ExistingPackage2) or qualitative (e.g., ExistingPackagePlus) relationship.

License

The "License:" field in the DESCRIPTION file should preferably refer to a standard license (see opensource.org or Wikipedia) using one of R's standard specifications. Be specific about any version that applies (e.g., GPL-2). Core Bioconductor packages are typically licensed under Artistic-2.0. To specify a non-standard license, include a file named LICENSE in your package (containing the full terms of your license) and use the string "file LICENSE" (without the double quotes) in the "License:" field of your DESCRIPTION file.
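
As a rough illustration of the two forms the "License:" field can take, the sketch below writes two DESCRIPTION fragments as text and reads the field back with read.dcf(). The package name MyPackage, the other field values, and the helper read_license() are assumptions made up for this example.

```r
# Two hypothetical DESCRIPTION fragments, written inline for illustration.
standard <- "Package: MyPackage\nVersion: 0.99.0\nLicense: Artistic-2.0"
custom   <- "Package: MyPackage\nVersion: 0.99.0\nLicense: file LICENSE"

# read.dcf() parses DESCRIPTION-style (DCF) text; a text connection
# stands in for the real file here.
read_license <- function(txt) {
    con <- textConnection(txt)
    on.exit(close(con))
    unname(read.dcf(con, fields = "License")[1, "License"])
}

read_license(standard)  # a standard, versioned license specification
read_license(custom)    # points at a LICENSE file shipped in the package
```

In a real package the same check could be run against the DESCRIPTION file itself by passing its path to read.dcf().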

Package Content

Packages must

Package Dependencies

Reuse, rather than re-implement or duplicate, well-tested functionality from other packages. Specify package dependencies in the DESCRIPTION file, listed as follows

S4 Classes and Methods

Reuse existing S4 classes and generics where possible. This encourages interoperability and simplifies your own package development. If your data requires a new representation or function, carefully design an S4 class or generic so that other package developers with similar needs will be able to reuse your hard work, and so that users of related packages will be able to seamlessly use your data structures. Do not hesitate to ask on the bioc-devel mailing list for advice.

We recommend the following structure/layout:

A Collate: field in the DESCRIPTION file may be necessary to order class and method definitions appropriately during package installation.
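
The following is a minimal sketch of the S4 advice in this section, not a prescribed layout. It defines a small class, adds methods to the existing length and show generics, and introduces a new generic only where none fits. The class ScoreSet, its slots, and the generic topScore are hypothetical names chosen for illustration.

```r
library(methods)

# Hypothetical class: a labelled set of numeric scores.
setClass("ScoreSet",
    representation(scores = "numeric", label = "character"),
    validity = function(object) {
        if (length(object@label) != 1L)
            "'label' must be a single string"
        else
            TRUE
    })

# Constructor function, so that users need not call new() directly.
ScoreSet <- function(scores, label) {
    new("ScoreSet", scores = scores, label = label)
}

# Reuse the existing 'length' and 'show' generics rather than inventing new ones.
setMethod("length", "ScoreSet", function(x) length(x@scores))
setMethod("show", "ScoreSet", function(object) {
    cat("ScoreSet", sQuote(object@label), "with", length(object), "scores\n")
})

# Define a new generic only where no existing one fits (hypothetical name).
setGeneric("topScore", function(x, ...) standardGeneric("topScore"))
setMethod("topScore", "ScoreSet", function(x, ...) max(x@scores))

s <- ScoreSet(c(0.2, 0.9, 0.5), "demo")
length(s)
topScore(s)
```

In a package, the class definition, generics, and methods would typically sit in separate files, which is where the ordering of files at installation time becomes relevant.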

Vectorized Calculations

Many R operations are performed on the whole object, not just the elements of the object (e.g., sum(x), not x[1] + x[2] + ...). In particular, relatively few situations require an explicit for loop.
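
A small sketch of the difference, using made-up data: the explicit loop and the vectorized calls compute the same total, but the vectorized forms are shorter and operate on the whole vector at once.

```r
x <- c(1.5, 2.5, 3.0, 4.0)

# Explicit loop: works, but is verbose and slow for long vectors.
total_loop <- 0
for (xi in x) {
    total_loop <- total_loop + xi
}

# Vectorized equivalents operate on the whole object at once.
total_vec <- sum(x)      # same total as the loop
squares   <- x^2         # element-wise arithmetic, no loop needed
above_two <- x[x > 2]    # logical subsetting replaces a loop with an if()
```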

End-User Messages

Graphics Device

Use dev.new() to start a graphics device if necessary. Avoid x11() and X11(), because they can only be called on machines that have access to an X server.

The Vignette

A vignette demonstrates how to accomplish non-trivial tasks embodying the core functionality of your package. There are two common types of vignettes. A Sweave vignette is an .Rnw file that contains LaTeX and chunks of R code. Each R code chunk starts with a line <<>>= and ends with @. Each chunk is evaluated during R CMD build, prior to LaTeX compilation to a PDF document. An R markdown vignette is similar to a Sweave vignette, but uses markdown instead of LaTeX for structuring text sections and results in HTML output. The knitr package can process most Sweave and all R markdown vignettes, producing pleasing output. Refer to Writing package vignettes for technical details. See the BiocStyle package for a convenient way to use common macros and a standard style.

A vignette provides reproducibility: the vignette produces the same results as copying the corresponding commands into an R session. It is therefore essential that the vignette embed R code between <<>>= and @; short-cuts (e.g., using a LaTeX verbatim environment, or using the Sweave eval=FALSE flag, or equivalent tricks in markdown) undermine the benefit of vignettes.

All packages are expected to have at least one vignette. Vignettes go in the vignettes directory of the package.

Citations

Appropriate citations must be included in help pages (e.g., in the See Also section) and vignettes; this aspect of documentation is no different from any scientific endeavor. The file inst/CITATION can be used to specify how a package is to be cited.
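
As a hedged sketch of what an inst/CITATION file might contain, the code below builds a single entry with bibentry() from the utils package. Every bibliographic detail (title, author, note) is a placeholder invented for illustration and does not refer to a real publication.

```r
# Hypothetical contents of inst/CITATION; all details are placeholders.
entry <- bibentry(
    bibtype = "Manual",
    title   = "MyPackage: A Hypothetical Example Package",
    author  = person("First", "Author"),
    note    = "R package version 0.99.0"
)

# Printing shows the formatted reference that users would see.
print(entry)
```

Once the package is installed, citation("MyPackage") would return the entries defined in that file.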

Version Numbering

All Bioconductor packages use an x.y.z version scheme. The following rules apply:

When first submitted to Bioconductor, a package usually has version 0.99.0. For more details, see Version Numbering Standards.
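
To illustrate how x.y.z versions behave, the sketch below uses base R's package_version() to parse and compare version strings. The version 0.99.1 is only a hypothetical follow-up to the initial 0.99.0; this section does not state the actual bumping rules, and the example does not assume them.

```r
# package_version() parses x.y.z strings and compares them component-wise.
submitted <- package_version("0.99.0")
bumped    <- package_version("0.99.1")  # hypothetical later version

submitted < bumped    # later versions compare as greater
unlist(submitted)     # the individual components x, y, z

# Comparison is numeric per component, not textual,
# so 0.99.10 is newer than 0.99.9.
package_version("0.99.10") > package_version("0.99.9")
```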

C or Fortran Code

If the package contains C or Fortran code, it should adhere to the standards and methods described in the System and foreign language interfaces section of the Writing R Extensions manual. In particular:

Use of external libraries whose functionality is redundant with libraries already supported is strongly discouraged. In cases where the external library is complex, the author may need to supply pre-built binary versions for some platforms.

Unit Tests

Unit tests are highly recommended. We find them indispensable for both package development and maintenance. Examples and explanations are provided here.
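
A minimal sketch of a unit test, assuming base R only rather than any particular testing framework. The function scaleToUnit() and the test function are hypothetical. A script like this could live under the package's tests directory, where R CMD check runs it and reports a failure if it raises an error.

```r
# Hypothetical package function under test: rescale values to [0, 1].
scaleToUnit <- function(x) {
    stopifnot(is.numeric(x), length(x) > 0L)
    rng <- range(x)
    if (rng[1] == rng[2])
        return(rep(0, length(x)))   # constant input maps to all zeros
    (x - rng[1]) / (rng[2] - rng[1])
}

# Plain-R tests covering a typical case, an edge case, and bad input.
test_scaleToUnit <- function() {
    stopifnot(isTRUE(all.equal(scaleToUnit(c(2, 4, 6)), c(0, 0.5, 1))))
    stopifnot(identical(scaleToUnit(c(3, 3)), c(0, 0)))
    # Invalid input should raise an error rather than return a value.
    res <- tryCatch(scaleToUnit("a"), error = function(e) "error")
    stopifnot(identical(res, "error"))
    invisible(TRUE)
}

test_scaleToUnit()
```

A dedicated unit testing package offers richer reporting, but the structure of the tests stays the same: a known input, an expected result, and a check that fails loudly.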

Duplication of Packages in CRAN and Bioconductor

Authors are strongly discouraged from placing their package into both CRAN and Bioconductor. This avoids burdening the author with extra work and confusing the user.

Package Author and Maintainer Responsibilities

Acceptance of packages into Bioconductor brings with it ongoing responsibility for package maintenance. These responsibilities include:

All authors mentioned in the package DESCRIPTION file are entitled to modify package source code. Changes to package authorship require consent of all authors.

Source Code & Build Reports

Source code is stored in svn (user: readonly, pass: readonly).

Software packages are built and checked nightly. Build reports:

 

Fred Hutchinson Cancer Research Center