This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

rnaseqcomp Benchmark for RNA-seq Quantification Pipelines

Several quantitative and visualized benchmarks for RNA-seq quantification pipelines. Two-replicate quantifications for genes, transcripts, junctions or exons by each pipeline, together with the necessary meta information, should be organized into a numeric matrix in order to proceed with the evaluation.

INSPEcT Analysis of 4sU-seq and RNA-seq time-course data

INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-Course experiments) analyses 4sU-seq and RNA-seq time-course data in order to evaluate synthesis, processing and degradation rates and to assess, via modeling, the rates that determine changes in mature mRNA levels.

Prize Prize: an R package for prioritization estimation based on analytic hierarchy process

High-throughput studies often produce long lists of genes and proteins of interest, and it is difficult to study and validate all of them. Analytic Hierarchy Process (AHP) offers a novel approach to narrowing down long lists of candidates by prioritizing them based on how well they meet the research goal. AHP is a mathematical technique for organizing and analyzing complex decisions where multiple criteria are involved. The technique structures problems into a hierarchy of elements, and helps to specify numerical weights representing the relative importance of each element. The numerical weight or priority derived for each element allows users to find alternatives that best suit their goal and their understanding of the problem.
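As an illustration of how AHP turns pairwise judgments into numerical weights, the sketch below approximates the principal eigenvector of a pairwise comparison matrix by power iteration. It is a minimal Python sketch of the general AHP technique, not the Prize package's API; the function name `ahp_weights` and the example judgments are hypothetical.

```python
def ahp_weights(pairwise, iterations=100):
    """Return priority weights (summing to 1) for a pairwise comparison matrix.

    pairwise[i][j] states how much more important element i is than element j,
    so pairwise[j][i] is expected to be its reciprocal. The weights approximate
    the principal eigenvector, obtained here by power iteration.
    """
    n = len(pairwise)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        scores = [sum(pairwise[i][j] * weights[j] for j in range(n))
                  for i in range(n)]
        total = sum(scores)
        weights = [s / total for s in scores]
    return weights


# Hypothetical judgments for three criteria (illustrative values only).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])
```

For a perfectly consistent matrix, where every entry equals the ratio of two underlying weights, the procedure recovers those weights exactly.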

XBSeq Test for differential expression for RNA-seq data

We developed a novel algorithm, XBSeq, in which a statistical model was established based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The mapped reads in non-exonic regions are considered sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, true expression signals, assumed to be governed by the negative binomial distribution, can be delineated, thus enabling the accurate detection of differentially expressed genes.

CNVPanelizer Reliable CNV detection in targeted sequencing applications

A method that allows for the use of a collection of non-matched normal tissue samples. Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. As inspired by random forest, this is combined with a procedure that subsamples the amplicons associated with each of the targeted genes. The obtained information allows us to reliably classify the copy number aberrations on the gene level.

fCI f-divergence Cutoff Index

fCI (f-divergence Cutoff Index) finds DEGs in transcriptomic and proteomic data. It identifies DEGs by computing the difference between the distribution of fold-changes for the control-control and remaining (non-differential) case-control gene expression ratio data. fCI provides several advantages compared to existing methods.

IONiseR Quality Assessment Tools for Oxford Nanopore MinION data

IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.

erma epigenomic road map adventures

Software and data to support epigenomic road map adventures.

PGA A package for identification of novel peptides by customized database derived from RNA-Seq

This package provides functions for construction of customized protein databases based on RNA-Seq data, database searching, post-processing and report generation. This kind of customized protein database includes both the reference database (such as RefSeq or Ensembl) and the novel peptide sequences from RNA-Seq data.

mirIntegrator Integrating microRNA expression into signaling pathways for pathway analysis

Tools for augmenting signaling pathways to perform pathway analysis of microRNA and mRNA expression levels.


Given single-cell RNA-seq data and true experiment time of cells or pseudo-time cell ordering, SEPA provides convenient functions for users to assign genes to different gene expression patterns, such as constant, monotone increasing, and increasing then decreasing. SEPA then performs GO enrichment analysis to analyze the functional roles of genes with the same or similar patterns.

hierGWAS Assessing statistical significance in predictive GWA studies

Testing individual SNPs, as well as arbitrarily large groups of SNPs in GWA studies, using a joint model of all SNPs. The method controls the FWER, and provides an automatic, data-driven refinement of the SNP clusters to smaller groups or single markers.

mtbls2 Böttcher et al. (2004) Comparative LC/MS-based profiling of silver nitrate-treated Arabidopsis thaliana leaves of wild-type and cyp79B2 cyp79B3 double knockout plants

Indole-3-acetaldoxime (IAOx) represents an early intermediate of the biosynthesis of a variety of indolic secondary metabolites including the phytoanticipin indol-3-ylmethyl glucosinolate and the phytoalexin camalexin (3-thiazol-2'-yl-indole). Arabidopsis thaliana cyp79B2 cyp79B3 double knockout plants are completely impaired in the conversion of tryptophan to indole-3-acetaldoxime and do not accumulate IAOx-derived metabolites any longer. Consequently, comparative analysis of wild-type and cyp79B2 cyp79B3 plant lines has the potential to explore the complete range of IAOx-derived indolic secondary metabolites.

caOmicsV Visualization of multi-dimensional cancer genomics data

The caOmicsV package provides methods to visualize multi-dimensional cancer genomics data, including patient information, gene expressions, DNA methylations, DNA copy number variations, and SNP/mutations, in matrix layout or network layout.

RareVariantVis Visualization of rare variants in whole genome sequencing data

Genomic variants can be analyzed and visualized using many tools. Unfortunately, the number of tools for global interrogation of variants is limited. The RareVariantVis package aims to present genomic variants (especially rare ones) in a global, per-chromosome way. Visualization is performed in two ways: a standard mode that outputs PNG figures and an interactive mode that uses the JavaScript d3 package. Interactive visualization allows the analysis of trio/family data, for example in the search for causative variants in rare Mendelian diseases.

ELMER Inferring Regulatory Element Landscapes and Transcription Factor Networks Using Cancer Methylomes

ELMER is designed to use DNA methylation and gene expression from a large number of samples to infer the regulatory element landscape and transcription factor network in primary tissue.

OperaMate An R package of Data Importing, Processing and Analysis for Opera High Content Screening System

OperaMate is a flexible R package dealing with the data generated by PerkinElmer's Opera High Content Screening System. The functions include the data importing, normalization and quality control, hit detection and function analysis.

acde Artificial Components Detection of Differentially Expressed Genes

This package provides a multivariate inferential analysis method for detecting differentially expressed genes in gene expression data. It uses artificial components, close to the data's principal components but with an exact interpretation in terms of differential genetic expression, to identify differentially expressed genes while controlling the false discovery rate (FDR). The methods on this package are described in the vignette or in the article 'Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments' by J. P. Acosta, L. Lopez-Kleine and S. Restrepo (2015, pending publication).

CausalR Causal Reasoning on Biological Networks

Causal Reasoning algorithms for biological networks, including predictions, scoring, p-value calculation and ranking

RTCGAToolbox A new tool for exporting TCGA Firehose data

Managing data from large scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time consuming step for research projects. Several efforts, such as the Firehose project, make TCGA pre-processed data publicly available via web services and data portals, but using them still requires managing, downloading and preparing the data for subsequent steps. We developed an open source and extensible R based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other analysis pipelines for subsequent data analysis.

SummarizedExperiment SummarizedExperiment container

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

BEclear Correct for batch effects in DNA methylation data

Provides some functions to detect and correct for batch effects in DNA methylation data. The core function "BEclear" is based on latent factor models and can also be used to predict missing values in any other matrix containing real numbers.

EMDomics Earth Mover's Distance for Differential Analysis of Genomics Data

The EMDomics algorithm is used to perform a supervised two-class analysis to measure the magnitude and statistical significance of observed continuous genomics data between two groups. Usually the data will be gene expression values from array-based or sequence-based experiments, but data from other types of experiments can also be analyzed (e.g. copy number variation). Traditional methods like Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA) use significance tests based on summary statistics (mean and standard deviation) of the two distributions. This approach lacks power to identify expression differences between groups that show high levels of intra-group heterogeneity. The Earth Mover's Distance (EMD) algorithm instead computes the "work" needed to transform one distribution into the other, thus providing a metric of the overall difference in shape between two distributions. Permutation of sample labels is used to generate q-values for the observed EMD scores.
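The idea of scoring a gene by the "work" needed to move one distribution onto the other, and then permuting sample labels, can be sketched for a single gene as follows. This is a simplified Python illustration, not the EMDomics implementation: it assumes two equal-sized groups with equal weights (in which case the one-dimensional EMD reduces to the mean absolute difference between sorted values), and it returns a permutation p-value for one gene rather than q-values across genes. The function names are hypothetical.

```python
import random


def emd_1d(a, b):
    """Earth Mover's Distance between two equal-sized 1-D samples.

    With equal sample sizes and equal weights, the optimal transport plan
    pairs the sorted values, so the EMD is the mean absolute difference.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized groups")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Estimate how often shuffled labels give an EMD at least as large."""
    rng = random.Random(seed)
    observed = emd_1d(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if emd_1d(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


group_a = [0.0, 1.0, 2.0]
group_b = [1.0, 2.0, 3.0]
print(emd_1d(group_a, group_b))  # every value shifts by exactly 1
```

Because the score compares whole distributions rather than means alone, two groups with the same mean but different shapes can still receive a non-zero EMD.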

edge Extraction of Differential Gene Expression

The edge package implements methods for carrying out differential expression analyses of genome-wide gene expression studies. Significance testing using the optimal discovery procedure and generalized likelihood ratio tests (equivalent to F-tests and t-tests) are implemented for general study designs. Special functions are available to facilitate the analysis of common study designs, including time course experiments. Other packages such as snm, sva, and qvalue are integrated in edge to provide a wide range of tools for gene expression analysis.

pwOmics Pathway-based data integration of omics data

pwOmics performs pathway-based level-specific data comparison of matching omics data sets based on pre-analysed user-specified lists of differential genes/transcripts and proteins. A separate downstream analysis of proteomic data including pathway identification and enrichment analysis, transcription factor identification and target gene identification is opposed to the upstream analysis starting with gene or transcript information as basis for identification of upstream transcription factors and regulators. The cross-platform comparative analysis allows for comprehensive analysis of single time point experiments and time-series experiments by providing static and dynamic analysis tools for data integration.

similaRpeak similaRpeak: Metrics to estimate a level of similarity between two ChIP-Seq profiles

This package calculates metrics which assign a level of similarity between ChIP-Seq profiles.

msa Multiple Sequence Alignment

This package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

RnBeads RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

flowVS Variance stabilization in flow cytometry (and microarrays)

Per-channel variance stabilization from a collection of flow cytometry samples by Bartlett's test for homogeneity of variances. The approach is applicable to microarray data as well.
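For readers unfamiliar with the test named above, the following Python sketch computes the standard Bartlett statistic for homogeneity of variances across several groups. It only illustrates the statistic itself; how flowVS applies it during variance stabilization is not shown here, and the function name `bartlett_statistic` is hypothetical.

```python
import math


def bartlett_statistic(groups):
    """Bartlett's statistic for equality of variances across k groups.

    Under the null hypothesis of equal variances it is approximately
    chi-squared distributed with k - 1 degrees of freedom. Every group
    needs at least two values and a non-zero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)

    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((x - mean) ** 2 for x in g) / (len(g) - 1))

    # Pooled variance, weighted by each group's degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)

    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (
        3 * (k - 1)
    )
    return numerator / correction


# Groups with very different spreads give a large statistic.
print(bartlett_statistic([[1, 2, 3], [10, 20, 30]]))
```

A statistic close to zero indicates that the group variances are similar, which is the situation a variance-stabilizing transformation aims for.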

ENCODExplorer A compilation of ENCODE metadata

This package allows users to quickly access ENCODE project file metadata and provides helper functions to query the ENCODE REST API, download ENCODE datasets and save the database in SQLite format.

CAnD Perform Chromosomal Ancestry Differences (CAnD) Analyses

Functions to perform the non-parametric and parametric CAnD tests on a set of ancestry proportions. For a particular ancestral subpopulation, a user will supply the estimated ancestry proportion for each sample, and each chromosome or chromosomal segment of interest. A p-value for each chromosome as well as an overall CAnD p-value will be returned for each test. Plotting functions are also available.

diffHic Differential analysis of Hi-C data

Detects differential interactions across biological conditions in a Hi-C experiment. Methods are provided for read alignment and data pre-processing into interaction counts. Statistical analysis is based on edgeR and supports normalization and filtering. Several visualization options are also available.

FlowRepositoryR FlowRepository R Interface

This package provides an interface to search and download data and annotations from FlowRepository. It uses the FlowRepository programming interface to communicate with a FlowRepository server.

R3CPET 3CPET: Finding Co-factor Complexes in ChIA-PET experiment using a Hierarchical Dirichlet Process

The package provides a method to infer the set of proteins that are most likely to work together to maintain chromatin interactions, given the results of a ChIA-PET experiment.

pandaR PANDA algorithm

Runs PANDA, an algorithm for discovering novel network structure by combining information from multiple complementary data sources.

ENmix Data preprocessing and quality control for Illumina HumanMethylation450 BeadChip

The Illumina HumanMethylation450 BeadChip has a complex array design, and the measurement is subject to experimental variations. The ENmix package provides tools to preprocess data from the array. It includes functions to improve data quality and to prepare clean datasets for EWAS and other DNA methylation analyses. ENmix uses the same data structure as the R package minfi, is compatible with several R packages, such as minfi and wateRmelon, and provides complementary functions for data quality control, background correction, inter-array normalization and data confounder exploration. In particular, the package incorporates a novel model-based background correction method, ENmix, which was demonstrated to outperform alternative background correction methods. To support large scale data analysis, the package also provides multi-processor parallel computing wrappers for some commonly used data preprocessing methods, such as BMIQ probe design type bias correction and ComBat batch effect correction.

soGGi Visualise ChIP-seq, MNase-seq and motif occurrence as aggregate plots Summarised Over Grouped Genomic Intervals

The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files as well as PWM, rlelist, GRanges and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation and arithmetic operation on and between summary plot objects as well as grouping and subsetting of plots by GRanges objects and user supplied metadata. Plots are created using the ggplot2 library to allow user defined manipulation of the returned plot object. Coupled together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.

MethTargetedNGS Perform Methylation Analysis on Next Generation Sequencing Data

Perform step by step methylation analysis of Next Generation Sequencing data.

conumee Enhanced copy-number variation analysis using Illumina 450k methylation arrays

This package contains a set of processing and plotting methods for performing copy-number variation (CNV) analysis using Illumina 450k methylation arrays.

RCyjs Display and manipulate graphs in Cytoscape.js

Interactive viewing and exploration of graphs, connecting R to Cytoscape.js

TPP Analyze thermal proteome profiling (TPP) experiments

Analyze thermal proteome profiling (TPP) experiments with varying temperatures (TR) or compound concentrations (CCR).

NanoStringQCPro Quality metrics and data processing methods for NanoString mRNA gene expression data

NanoStringQCPro provides a set of quality metrics that can be used to assess the quality of NanoString mRNA gene expression data -- i.e. to identify outlier probes and outlier samples. It also provides different background subtraction and normalization approaches for this data. It outputs suggestions for flagging samples/probes and an easily sharable html quality control output.

GoogleGenomics R Client for Google Genomics API

Provides an R package to interact with the Google Genomics API.

CopywriteR Copy number information from targeted sequencing using off-target reads

CopywriteR extracts DNA copy number information from targeted sequencing by utilizing off-target sequence reads. It allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. Thereby, CopywriteR constitutes a widely applicable alternative to available tools.

cogena co-expressed gene-set enrichment analysis

Gene set enrichment analysis is a valuable tool for the study of molecular mechanisms that underpin complex biological traits. As the method is conventionally used on entire omic datasets, such as transcriptomes, it may be dominated by pathways and processes that are substantially represented in a dataset. However, the approach may overlook smaller scale, but highly correlated, cellular events that may be of great biological relevance. In order to detect these discrete molecular triggers, we developed a tool, co-expressed gene-set enrichment analysis (cogena), for clustering differentially expressed genes and identification of highly correlated molecular expression clusters. Cogena offers the user a range of clustering methods, including hierarchical clustering, model based clustering and self-organised mapping, based on different distance metrics like correlation and mutual information. After obtaining and visualising clusters, cogena performs gene set enrichment. These gene sets can be sourced from the Molecular Signatures Database (MSigDB) or user-defined gene sets. By performing gene set enrichment across expression clusters, we find considerable enhancement in the resolution of molecular signatures in omic data at the cluster level compared to the whole.

BrowserVizDemo BrowserVizDemo: How to subclass BrowserViz

A BrowserViz subclassing example, xy plotting in the browser using d3

SVM2CRM SVM2CRM: support vector machine for cis-regulatory elements detections

Detection of cis-regulatory elements using svm implemented in LiblineaR.

RUVcorr Removal of unwanted variation for gene-gene correlations and related analysis

RUVcorr allows the global removal of unwanted variation (ridged version of RUV) to be applied to real and simulated gene expression data.

OmicsMarkeR Classification and Feature Selection for 'Omics' Datasets

Tools for classification and feature selection for 'omics' level datasets. It is a tool to provide multiple multivariate classification and feature selection techniques complete with multiple stability metrics and aggregation techniques. It is primarily designed for analysis of metabolomics datasets but potentially extendable to proteomics and transcriptomics applications.

mogsa Multiple omics data integration and gene set analysis

This package provides a method for doing gene set analysis based on multiple omics data.

FISHalyseR FISHalyseR a package for automated FISH quantification

FISHalyseR provides functionality to process and analyse digital cell culture images, in particular to quantify FISH probes within nuclei. Furthermore, it extracts the spatial location of each nucleus as well as each probe, enabling spatial co-localisation analysis.

DMRcaller Differentially Methylated Regions caller

Uses Bisulfite sequencing data in two conditions and identifies differentially methylated regions between the conditions in CG and non-CG context. The input is the CX report files produced by Bismark and the output is a list of DMRs stored as GRanges objects.

regioneR Association analysis of genomic regions based on permutation tests

regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.

pmm Parallel Mixed Model

The Parallel Mixed Model (PMM) approach is suitable for hit selection and cross-comparison of RNAi screens generated in experiments that are performed in parallel under several conditions. For example, we could think of the measurements or readouts from cells under RNAi knock-down, which are infected with several pathogens or which are grown from different cell lines.

ComplexHeatmap Making Complex Heatmaps

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential structures. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports self-defined annotation graphics.

RBM RBM: a R package for microarray and RNA-Seq data analysis

Use A Resampling-Based Empirical Bayes Approach to Assess Differential Expression in Two-Color Microarrays and RNA-Seq data sets.

podkat Position-Dependent Kernel Association Test

This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.

LowMACA LowMACA - Low frequency Mutation Analysis via Consensus Alignment

The LowMACA package is a simple suite of tools to investigate and analyze the mutation profile of several proteins or pfam domains via consensus alignment. You can conduct a hypothesis-driven exploratory analysis using our package simply by providing a set of genes or pfam domains of interest.

rcellminer rcellminer: Molecular Profiles and Drug Response for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which has been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

gtrellis Genome Level Trellis Layout

Genome level Trellis graph visualizes genomic data conditioned by genomic categories (e.g. chromosomes). For each genomic category, multiple dimensional data which are represented as tracks describe different features from different aspects. This package provides high flexibility to arrange genomic categories and add self-defined graphics in the plot.

ensembldb Utilities to create and use an Ensembl based annotation database

The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, the ensembldb package also provides a filter framework that allows retrieving annotations for specific entries, such as genes encoded in a chromosome region or transcript models of lincRNA genes.

TIN Transcriptome instability analysis

The TIN package implements a set of tools for transcriptome instability analysis based on exon expression profiles. Deviating exon usage is studied in the context of splicing factors to analyse to what degree transcriptome instability is correlated to splicing factor expression. In the transcriptome instability correlation analysis, the data is compared to both random permutations of alternative splicing scores and expression of random gene sets.

InPAS Identification of Novel alternative PolyAdenylation Sites (PAS)

Alternative polyadenylation (APA) is one of the important post-transcriptional regulation mechanisms which occurs in most human genes. InPAS facilitates the discovery of novel APA sites from RNAseq data. It leverages cleanUpdTSeq to fine tune identified APA sites.

GENESIS GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015): a Principal Components Analysis with genome-wide SNP genotype data for robust population structure inference in samples with related individuals (known or cryptic).

bamsignals Extract read count signals from bam files

This package allows efficient extraction of count vectors from indexed bam files. It counts the number of reads in given genomic ranges and it computes read profiles and coverage profiles. It also handles paired-end data.
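The counting step described above can be pictured with a toy Python sketch. It does not read BAM files and is not the bamsignals API: it assumes each read has already been reduced to a single position per chromosome, that ranges are closed and 1-based, and the function name `count_reads` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_reads(read_positions, ranges):
    """Count reads falling inside each genomic range.

    read_positions: iterable of (chromosome, position) pairs.
    ranges: iterable of (chromosome, start, end) with closed, 1-based bounds.
    Returns one count per range, in the order the ranges were given.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in read_positions:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = []
    for chrom, start, end in ranges:
        positions = by_chrom.get(chrom, [])
        # Binary search finds the slice of sorted positions within [start, end].
        counts.append(bisect_right(positions, end) - bisect_left(positions, start))
    return counts


reads = [("chr1", 5), ("chr1", 10), ("chr1", 15), ("chr2", 7)]
regions = [("chr1", 1, 10), ("chr1", 11, 20), ("chr2", 1, 5), ("chr3", 1, 100)]
print(count_reads(reads, regions))
```

Sorting positions once per chromosome keeps each range query logarithmic, which reflects the benefit of working from an indexed file rather than scanning every read for every range.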

SIMAT GC-SIM-MS data processing and analysis tool

This package provides a pipeline for analysis of GC-MS data acquired in selected ion monitoring (SIM) mode. The tool also provides guidance in choosing appropriate fragments for the targets of interest by using an optimization algorithm. This is done by considering overlapping peaks from a library provided by the user.

RNAprobR An R package for analysis of massive parallel sequencing based RNA structure probing data

This package facilitates analysis of Next Generation Sequencing data for which positional information with a single nucleotide resolution is a key. It allows for applying different types of relevant normalizations, data visualization and export in a table or UCSC compatible bedgraph file.

netbenchmark Benchmarking of several gene network inference methods

This package implements a benchmarking of several gene network inference algorithms from gene expression data.

MatrixRider Obtain total affinity and occupancies for binding site matrices on a given sequence

Calculates a single number for a whole sequence that reflects the propensity of a DNA binding protein to interact with it. The DNA binding protein has to be described with a PFM matrix, for example one obtained from JASPAR.

LEA LEA: an R package for Landscape and Ecological Association Studies

LEA is an R package dedicated to landscape genomics and ecological association tests. LEA can run analyses of population structure and genome scans for local adaptation. It includes statistical methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (snmf, pca); and identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (lfmm), and controlling the false discovery rate. LEA is mainly based on optimized C programs that can scale with the dimension of very large data sets.

immunoClust immunoClust - Automated Pipeline for Population Detection in Flow Cytometry

Model based clustering and meta-clustering of Flow Cytometry Data

diggit Inference of Genetic Variants Driving Cellular Phenotypes

Inference of Genetic Variants Driving Cellular Phenotypes by the DIGGIT algorithm

canceR A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC.

The package is a user-friendly interface based on the cgdsr and other modeling packages to explore, compare, and analyse all available Cancer Data (Clinical data, Gene Mutation, Gene Methylation, Gene Expression, Protein Phosphorylation, Copy Number Alteration) hosted by the Computational Biology Center at Memorial-Sloan-Kettering Cancer Center (MSKCC).

muscle Multiple Sequence Alignment with MUSCLE

MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.

BrowserViz BrowserViz: interactive R/browser graphics using websockets and JSON

Interactive graphics in a web browser from R, using websockets and JSON

Rhtslib HTSlib high-throughput sequencing library as an R package

This package provides version 1.1 of the 'HTSlib' C library for high-throughput sequence analysis. The package is primarily useful to developers of other R packages who wish to make use of HTSlib. Motivation and instructions for use of this package are in the vignette, vignette(package="Rhtslib", "Rhtslib").

skewr Visualize Intensities Produced by Illumina's Human Methylation 450k BeadChip

The skewr package is a tool for visualizing the output of the Illumina Human Methylation 450k BeadChip to aid in quality control. It creates a panel of nine plots. Six of the plots represent the density of either the methylated intensity or the unmethylated intensity given by one of three subsets of the 485,577 total probes. These subsets include Type I-red, Type I-green, and Type II. The remaining three distributions give the density of the Beta-values for these same three subsets. Each of the nine plots optionally displays the distributions of the "rs" SNP probes and the probes associated with imprinted genes as series of 'tick' marks located above the x-axis.

sigsquared Gene signature generation for functionally validated signaling pathways

By leveraging statistical properties (log-rank test for survival) of patient cohorts defined by binary thresholds, poor-prognosis patients are identified by the sigsquared package via optimization over a cost function reducing type I and II error.

SELEX Functions for analyzing SELEX-seq data

Tools for quantifying DNA binding specificities based on SELEX-seq data

ProtGenerics S4 generic functions for Bioconductor proteomics infrastructure

S4 generic functions needed by Bioconductor proteomics packages.

BubbleTree A method to elucidate purity and clonality in tumors using copy number ratio and allele frequency

BubbleTree utilizes homogenous pertinent somatic copy number alterations (SCNAs) as markers of tumor clones to extract estimates of tumor ploidy, purity and clonality.

rGREAT Client for GREAT Analysis

This package automates GREAT (Genomic Regions Enrichment of Annotations Tool) analysis by constructing an HTTP POST request according to the user's input and automatically retrieving the results from the GREAT web server.

birte Bayesian Inference of Regulatory Influence on Expression (biRte)

Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. biRte uses regulatory networks of TFs, miRNAs and possibly other factors, together with mRNA, miRNA and other available expression data to predict the relative influence of a regulator on the expression of its target genes. Inference is done in a Bayesian modeling framework using Markov-Chain-Monte-Carlo. A special feature is the possibility for follow-up network reverse engineering between active regulators.

HIBAG HLA Genotype Imputation with Attribute Bagging

HIBAG is a software package for imputing HLA types using SNP data; it relies on a training set of HLA and SNP genotypes. Researchers can use HIBAG with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique that improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
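
As a rough illustration of attribute bagging as described above (bootstrap aggregating plus random variable selection), the following Python sketch may help. It is not HIBAG's implementation: the base classifier here is a plain 1-nearest-neighbour rule swapped in for HIBAG's haplotype-based inference, and the function names, genotype codes and allele labels are all hypothetical.

```python
import random
from collections import Counter


def fit_attribute_bagging(X, y, n_members=25, n_features=2, seed=0):
    """Build an ensemble; each member sees a bootstrap sample of rows
    and a random subset of columns (the "attributes")."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    ensemble = []
    for _ in range(n_members):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # bootstrap sample
        cols = rng.sample(range(n_cols), n_features)           # random variable selection
        training = [([X[r][c] for c in cols], y[r]) for r in rows]
        ensemble.append((cols, training))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the members; each member is a 1-NN classifier
    using Hamming distance on its own column subset (an assumption made
    for this sketch only)."""
    votes = Counter()
    for cols, training in ensemble:
        probe = [x[c] for c in cols]
        _, label = min(
            training,
            key=lambda item: sum(a != b for a, b in zip(item[0], probe)),
        )
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: rows are samples, columns are SNP genotypes (0/1/2).
X = [[0, 0, 1, 2], [0, 1, 1, 2], [2, 2, 0, 0], [2, 1, 0, 0]]
y = ["A*01", "A*01", "A*02", "A*02"]

ensemble = fit_attribute_bagging(X, y)
prediction = predict(ensemble, [0, 0, 1, 2])
```

Each ensemble member is trained on different rows and different SNP columns, which is what gives the aggregated vote its stability.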

sincell R package for the statistical assessment of cell state hierarchies from single-cell RNA-seq data

Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell states that might be captured by single-cell RNA-seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data can be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cell-clustering step). The Sincell R package implements a methodological toolbox allowing flexible workflows under this framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA-seq. Graphical representations and functional association tests are provided to interpret hierarchies.

Cardinal A mass spectrometry imaging toolbox for statistical analysis

Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.

GreyListChIP Grey Lists -- Mask Artefact Regions Based on ChIP Inputs

Identify regions of ChIP experiments with high signal in the input, which lead to spurious peaks during peak calling. Remove reads aligning to these regions prior to peak calling, for cleaner ChIP analysis.

IVAS Identification of genetic Variants affecting Alternative Splicing

Identification of genetic variants affecting alternative splicing.

cytofkit cytofkit: an integrated analysis pipeline for mass cytometry data

An integrated mass cytometry data analysis pipeline that enables simultaneous illustration of cellular diversity and progression.

seq2pathway a novel tool for functional gene-set (or termed as pathway) analysis of next-generation sequencing data

Seq2pathway is a novel tool for functional gene-set (also termed pathway) analysis of next-generation sequencing data, consisting of "seq2gene" and "gene2path" components. The seq2gene component links sequence-level measurements of genomic regions (including SNPs or point mutation coordinates) to gene-level scores, and the gene2pathway component summarizes gene scores to pathway scores for each sample. The seq2gene component can assign both coding and non-exon regions to a broader range of neighboring genes than only the nearest one, thus facilitating the study of functional non-coding regions. The gene2pathway component takes into account the degree of significance of gene members within a pathway compared with those outside the pathway. The output of seq2pathway is a general structure of quantitative pathway-level scores, thus allowing one to functionally interpret datasets such as RNA-seq, ChIP-seq, GWAS, and data derived from other next-generation sequencing experiments.

ggtree a phylogenetic tree viewer for different types of tree annotations

ggtree extends the ggplot2 plotting system, which implements the grammar of graphics. ggtree is designed for visualizing phylogenetic trees and different types of associated annotation data.

parglms support for parallelized estimation of GLMs/GEEs

Support for parallelized estimation of GLMs/GEEs, catering for dispersed data.

seqPattern Visualising oligonucleotide patterns and motif occurrences across a set of sorted sequences

Visualising oligonucleotide patterns and sequence motif occurrences across a large set of sequences centred at a common reference point and sorted by a user-defined feature.

MeSHSim MeSH(Medical Subject Headings) Semantic Similarity Measures

Provides measures of semantic similarity over MeSH headings and MEDLINE documents.

mAPKL A Hybrid Feature Selection method for gene expression data

We propose a hybrid feature selection (FS) method, mAP-KL, which combines multiple hypothesis testing and the affinity propagation (AP) clustering algorithm with the Krzanowski & Lai cluster quality index to select a small yet informative subset of genes.

gdsfmt R Interface to CoreArray Genomic Data Structure (GDS) Files

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and use a hierarchical structure to store multiple scalable array-oriented data sets with metadata. It is suited for large-scale datasets, especially data that are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of fewer than 8 bits, since a single genetic/genomic variant, such as a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are also supported, with relatively efficient random access. A GDS file can be read in parallel by multiple R processes, supported by the parallel package.
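
To make the point about sub-byte integers concrete, here is a minimal Python sketch that packs 2-bit values four to a byte. It only illustrates the general idea; it does not reflect the GDS file layout or gdsfmt's internals, and the function names and genotype coding are assumptions made for the example.

```python
def pack_2bit(values):
    """Pack integers in 0..3 into bytes, four values per byte (lowest bits first)."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError(f"value {v} does not fit in 2 bits")
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit values from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


# Hypothetical SNP genotype codes: 0, 1, 2 alternate-allele copies; 3 = missing.
genotypes = [0, 1, 2, 3, 2, 1]
packed = pack_2bit(genotypes)                   # 6 values fit in 2 bytes
restored = unpack_2bit(packed, len(genotypes))  # round-trips to the input
```

Stored this way, six genotypes take two bytes instead of six, a fourfold saving over one byte per value.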

TRONCO TRONCO, a package for TRanslational ONCOlogy

Genotype-level cancer progression models describe the ordering of accumulating mutations, e.g., somatic mutations / copy number variations, during cancer development. These graphical models help understand the causal structure involving events promoting cancer progression, possibly predicting complex patterns characterising genomic progression of a cancer. Reconstructed models can be used to better characterise genotype-phenotype relations, and suggest novel targets for therapy design. TRONCO (TRanslational ONCOlogy) is an R package aimed at collecting state-of-the-art algorithms to infer progression models from cross-sectional data, i.e., data collected from independent patients which do not necessarily incorporate any evident temporal information. These algorithms require a binary input matrix where: (i) each row represents a patient genome, (ii) each column represents an event relevant to the progression (selected a priori), and (iii) a 0/1 value models the absence/presence of a certain mutation in a certain patient. The current first version of TRONCO implements the CAPRESE algorithm (Cancer PRogression Extraction with Single Edges) to infer possible progression models arranged as trees; cf. Inferring tree causal models of cancer progression with probability raising, L. Olde Loohuis, G. Caravagna, A. Graudenzi, D. Ramazzotti, G. Mauri, M. Antoniotti and B. Mishra. PLoS One, to appear. This vignette shows how to use TRONCO to infer a tree model of ovarian cancer progression from CGH data of copy number alterations (classified as gains or losses over chromosome arms). The dataset used is available in the SKY/M-FISH database.
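
The binary input matrix described above can be sketched in a few lines of Python. The patient IDs and event names below are invented for illustration, and the probability-raising check is only the textbook condition P(b | a) > P(b | not a) evaluated on raw frequencies; it is not the CAPRESE algorithm and is not taken from TRONCO's code.

```python
# Hypothetical cross-sectional data: the set of events observed in each patient.
patients = {
    "P1": {"gain_8q"},
    "P2": {"gain_8q", "loss_5q"},
    "P3": {"gain_8q", "loss_5q"},
    "P4": set(),
}
events = ["gain_8q", "loss_5q"]

# Rows are patients, columns are events, entries are 0/1 for absence/presence.
matrix = [[1 if e in patients[p] else 0 for e in events] for p in sorted(patients)]


def raises_probability(matrix, a, b):
    """Return True if P(b | a) > P(b | not a), using raw frequencies.
    Returns None when either conditioning group is empty."""
    with_a = [row[b] for row in matrix if row[a] == 1]
    without_a = [row[b] for row in matrix if row[a] == 0]
    if not with_a or not without_a:
        return None
    return sum(with_a) / len(with_a) > sum(without_a) / len(without_a)


# Does observing event 0 (gain_8q) raise the probability of event 1 (loss_5q)?
pr = raises_probability(matrix, 0, 1)
```

In this toy data, loss_5q appears only in patients that also carry gain_8q, so the check holds.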

RnaSeqSampleSize RnaSeqSampleSize

The RnaSeqSampleSize package provides a sample size calculation method, based on the negative binomial model and the exact test, for differential expression analysis of RNA-seq data.

gespeR Gene-Specific Phenotype EstimatoR

Estimates gene-specific phenotypes from off-target confounded RNAi screens. The phenotype of each siRNA is modeled based on on-targeted and off-targeted genes, using a regularized linear regression model.

coMET coMET: visualisation of regional epigenome-wide association scan (EWAS) results and DNA co-methylation patterns.

Visualisation of EWAS results in a genomic region. In addition to phenotype-association P-values, coMET also generates plots of co-methylation patterns and provides a series of annotation tracks. It can be applied to other omic-wide association scans, and to any species, as long as the data can be translated to the genomic level.
