4 Important Bioconductor Package Development Features
4.1 biocViews
Packages added to the Bioconductor Project require a biocViews: field in their DESCRIPTION file. The field name "biocViews" is case-sensitive and must begin with a lower-case 'b'.
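The following base R sketch parses a hypothetical DESCRIPTION excerpt with read.dcf() to show why the exact field name matters. The package name (myPkg) and the chosen terms are illustrative assumptions, not recommendations.

```r
# Hypothetical DESCRIPTION excerpt; package name and terms are illustrative only.
desc_lines <- c(
  "Package: myPkg",
  "Version: 0.99.0",
  "biocViews: CopyNumberVariation, Sequencing"
)

# read.dcf() accepts a connection, so parse the lines from a text connection.
con <- textConnection(desc_lines)
desc <- read.dcf(con)
close(con)

# Field names are matched case-sensitively: "biocViews" exists, "BiocViews" does not.
has_field <- "biocViews" %in% colnames(desc)
has_wrong_case <- "BiocViews" %in% colnames(desc)

# The field is a comma-separated list; split it into individual terms.
terms <- trimws(strsplit(desc[1, "biocViews"], ",")[[1]])
print(terms)
```

A tool that looks up "biocViews" exactly, as above, would not find a field spelled "BiocViews".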
biocViews terms are "keywords" used to describe a given package. They are broadly divided into four categories, representing the types of packages present in the Bioconductor Project:
- Software
- Annotation Data
- Experiment Data
- Workflow
biocViews are available for the release and devel branches of Bioconductor. The devel branch has a check box under the tree structure which, when checked, displays biocViews that are defined but not used by any package, in addition to biocViews that are in use.
See also the DESCRIPTION section.
4.1.1 Motivation
One can use biocViews for two broad purposes:
1. A researcher might want to identify all packages in the Bioconductor Project that are related to a specific purpose. For example, one may want to look for all packages related to "Copy Number Variants".
2. During development, a package contributor can "tag" their package with biocViews so that someone looking for packages (as in scenario 1) can easily find it.
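As a toy illustration of the first purpose, the sketch below filters a hypothetical index of packages by a biocViews term. The package names and their tags are invented; real searches are done on the biocViews pages, not with this helper (find_by_view is a hypothetical name).

```r
# Hypothetical index: package names and tags are invented for illustration.
pkg_views <- list(
  pkgA = c("CopyNumberVariation", "Sequencing"),
  pkgB = c("RNASeq", "DifferentialExpression"),
  pkgC = c("CopyNumberVariation", "Microarray")
)

# Return the names of all packages tagged with the given term (exact match).
find_by_view <- function(index, term) {
  hits <- vapply(index, function(views) term %in% views, logical(1))
  names(index)[hits]
}

cnv_pkgs <- find_by_view(pkg_views, "CopyNumberVariation")
print(cnv_pkgs)
```

Here pkgB is never found for this query because its contributor did not tag it with CopyNumberVariation, which is the second purpose in practice.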
4.1.2 biocViews during new package development
Visit the 'devel' biocViews when you are in the process of adding biocViews to your new package. Identify as many terms from the hierarchy as are appropriate. Prefer 'leaf' terms at the end of the hierarchy over more inclusive terms. Remember to check the box that displays all available terms.
Please Note:
- Your package will belong to only one part of the Bioconductor Project (Software, Annotation Data, Experiment Data, Workflow), so choose biocViews from that category only.
- biocViews listed in your package must match the terms in the biocViews hierarchy exactly (e.g., spelling, capitalization).
When you submit your new package for review, it is checked and built by the Bioconductor Project. We check the following for biocViews:
- The package contributor has added biocViews.
- The biocViews are valid.
- The package contributor has added biocViews from only one of the categories.
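The three checks above can be sketched in base R against a hypothetical, heavily trimmed vocabulary. This is not how the Bioconductor build system or BiocCheck implements them; check_biocviews and the term lists are illustrative assumptions that only show the rules (terms are present, match exactly, and come from one category).

```r
# Hypothetical, heavily trimmed vocabulary; the real hierarchy is much larger.
vocab <- list(
  Software       = c("CopyNumberVariation", "Sequencing", "RNASeq", "Microarray"),
  AnnotationData = c("Organism", "ChipManufacturer"),
  ExperimentData = c("CancerData", "SequencingData"),
  Workflow       = c("BasicWorkflow", "SingleCellWorkflow")
)

check_biocviews <- function(terms, vocab) {
  # Check 1: at least one term has been supplied.
  if (length(terms) == 0L)
    return("no biocViews terms supplied")
  # Check 2: every term matches the vocabulary exactly (case-sensitive).
  invalid <- setdiff(terms, unlist(vocab, use.names = FALSE))
  if (length(invalid) > 0L)
    return(paste("invalid terms:", paste(invalid, collapse = ", ")))
  # Check 3: all terms come from a single top-level category.
  used <- vapply(vocab, function(v) any(terms %in% v), logical(1))
  cats <- names(vocab)[used]
  if (length(cats) > 1L)
    return(paste("terms span several categories:", paste(cats, collapse = ", ")))
  "OK"
}

check_biocviews(c("CopyNumberVariation", "Sequencing"), vocab)
check_biocviews("copynumbervariation", vocab)   # wrong capitalization
check_biocviews(c("RNASeq", "CancerData"), vocab)  # mixes two categories
```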
If you receive a "RECOMMENDED" message about any of these biocViews checks after you have submitted your package, you can try correcting the issue yourself following the directions given here, or ask your package reviewer for more information.
If a developer thinks a biocViews term should be added to the current list of acceptable terms, please email bioc-devel@r-project.org with the proposed term, where in the hierarchy it should be placed, and the justification for adding it.
4.2 Common Bioconductor Methods and Classes
We strongly recommend reusing existing methods for importing data, and reusing established classes for representing data. Here are some suggestions for importing different file types, followed by commonly used Bioconductor classes. For more classes and functionality, also try searching biocViews for your data type.
4.2.1 Importing data
- GTF, GFF, BED, BigWig, etc. – rtracklayer::import()
- VCF – VariantAnnotation::readVcf()
- SAM / BAM – Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()
- FASTA – Biostrings::readDNAStringSet()
- FASTQ – ShortRead::readFastq()
- MS data (XML-based and mgf formats) – Spectra::Spectra(), Spectra::Spectra(source = MsBackendMgf::MsBackendMgf())
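As a minimal sketch of reusing an existing importer rather than writing a custom parser, the following reads a small FASTA file with Biostrings::readDNAStringSet(). It assumes Biostrings is installed; the file name and sequences are invented and written to a temporary location.

```r
library(Biostrings)

# Write a tiny, invented FASTA file to a temporary location.
fasta <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "GGCCTTAA"), fasta)

# Import it into a DNAStringSet; FASTA headers become the element names.
dna <- readDNAStringSet(fasta)
print(dna)
print(width(dna))
print(names(dna))
```

The result is a standard Bioconductor class, so downstream code can use existing Biostrings methods instead of operating on raw character vectors.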
4.2.2 Common Classes
- Rectangular feature x sample data – SummarizedExperiment::SummarizedExperiment() (RNA-seq count matrix, microarray, …)
- Genomic coordinates – GenomicRanges::GRanges() (1-based, closed interval)
- Genomic coordinates from multiple samples – GenomicRanges::GRangesList()
- Ragged genomic coordinates – RaggedExperiment::RaggedExperiment()
- DNA / RNA / AA sequences – Biostrings::*StringSet()
- Gene sets – BiocSet::BiocSet(), GSEABase::GeneSet(), GSEABase::GeneSetCollection()
- Multi-omics data – MultiAssayExperiment::MultiAssayExperiment()
- Single cell data – SingleCellExperiment::SingleCellExperiment()
- Spatial transcriptomics data – SpatialExperiment::SpatialExperiment()
- Mass spec data – Spectra::Spectra()
- File formats – BiocIO::`BiocFile-class`
In general, a package will not be accepted if it does not show interoperability with the current Bioconductor ecosystem.
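The sketch below, assuming GenomicRanges and SummarizedExperiment are installed, builds a GRanges for two invented features and stores a toy count matrix alongside them in a SummarizedExperiment. The width() values illustrate the 1-based, closed-interval convention: a range from 1 to 50 covers 50 bases. Feature names, coordinates, counts and sample labels are all made up.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Two invented features on chr1; ranges are 1-based and closed at both ends.
features <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 101), end = c(50, 200)),
  gene_id = c("geneA", "geneB")
)
names(features) <- features$gene_id
print(width(features))  # 1..50 is 50 bases, 101..200 is 100 bases

# Toy feature x sample count matrix with matching row and column names.
counts <- matrix(c(10L, 0L, 5L, 3L, 8L, 2L), nrow = 2,
                 dimnames = list(names(features), c("s1", "s2", "s3")))
sample_info <- DataFrame(condition = c("ctrl", "ctrl", "treat"),
                         row.names = colnames(counts))

# Bundle assay, feature coordinates and sample annotation in one object.
se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = features, colData = sample_info)
print(dim(se))
print(assay(se, "counts"))
```

Because rowRanges(se) is a GRanges, range-based operations from GenomicRanges apply directly to the features of the experiment, which is the kind of interoperability reviewers look for.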
4.3 Vignette
Every submitted Bioconductor package should have at least one Rmd (preferred) or Rnw vignette, ideally using BiocStyle::html_document as the output format. The vignette should include evaluated R package code and a detailed introduction/abstract section that motivates the package's inclusion in Bioconductor and, when appropriate, reviews and compares it with existing Bioconductor packages of similar functionality or scope. See the vignette documentation section for more details.
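A minimal sketch of the top of an Rmd vignette that renders with BiocStyle::html_document, written to a temporary file from R so the skeleton is easy to inspect. The title, the package name myPkg and the text are placeholders. It is also assumed, as is typical for Rmd vignettes, that the package lists knitr, rmarkdown and BiocStyle in Suggests and sets VignetteBuilder: knitr in its DESCRIPTION.

```r
# Placeholder vignette skeleton; title, package name and text are hypothetical.
vignette_lines <- c(
  "---",
  "title: \"Introduction to myPkg\"",
  "output: BiocStyle::html_document",
  "vignette: >",
  "  %\\VignetteIndexEntry{Introduction to myPkg}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---",
  "",
  "# Introduction",
  "",
  "Motivation for the package and comparison with related Bioconductor packages.",
  "",
  "```{r}",
  "library(myPkg)  # evaluated package code goes in chunks like this one",
  "```"
)

# Write the skeleton to a temporary file and print its YAML header.
vignette_file <- file.path(tempdir(), "myPkg-intro.Rmd")
writeLines(vignette_lines, vignette_file)
cat(readLines(vignette_file, n = 8), sep = "\n")
```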