This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

IWTomics Interval-Wise Testing for Omics Data

Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset.

GRridge Better prediction by use of co-data: Adaptive group-regularized ridge regression

This package allows the use of multiple sources of co-data (e.g. external p-values, gene lists, annotation) to improve prediction of binary, continuous and survival response using (logistic, linear or Cox) group-regularized ridge regression. It also facilitates post-hoc variable selection and prediction diagnostics by cross-validation using ROC curves and AUC.

MWASTools MWASTools: an integrated pipeline to perform metabolome-wide association studies

MWAS provides a complete pipeline to perform metabolome-wide association studies. Key functionalities of the package include: quality control analysis of metabonomic data; MWAS using different association models (partial correlations; generalized linear models); model validation using non-parametric bootstrapping; visualization of MWAS results; NMR metabolite identification using STOCSY.

phosphonormalizer Compensates for the bias introduced by median normalization in phosphoproteomics

It uses the overlap between enriched and non-enriched datasets to compensate for the bias introduced in global phosphorylation after applying median normalization.

BPRMeth Model higher-order methylation profiles

The BPRMeth package uses the Binomial Probit Regression likelihood to model methylation profiles and extract higher-order features. These features precisely quantify the shape of a methylation profile. Using these higher-order features across promoter-proximal regions, we construct a powerful predictor of gene expression. These features are also used to cluster promoter-proximal regions using the EM algorithm.

yarn YARN: Robust Multi-Condition RNA-Seq Preprocessing and Normalization

Expedite large RNA-Seq analyses using a combination of previously developed tools. YARN is meant to make it easier for the user to perform basic mis-annotation quality control, filtering, and condition-aware normalization. YARN leverages many Bioconductor tools and statistical techniques to account for the large heterogeneity and sparsity found in very large RNA-seq experiments.

fCCAC functional Canonical Correlation Analysis to evaluate Covariance between nucleic acid sequencing datasets

An application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq).

CCPROMISE PROMISE analysis with Canonical Correlation for Two Forms of High Dimensional Genetic Data

Perform canonical correlation between two forms of high dimensional genetic data, and associate the first component of each form of data with a specific biologically interesting pattern of associations with multiple endpoints. A probe level analysis is also implemented.

proFIA Preprocessing of FIA-HRMS data

Flow Injection Analysis coupled to High-Resolution Mass Spectrometry is a promising approach for high-throughput metabolomics. FIA-HRMS data, however, cannot be pre-processed with current software tools which rely on liquid chromatography separation, or handle low resolution data only. Here we present the proFIA package, which implements a new methodology to pre-process FIA-HRMS raw data (netCDF, mzData, mzXML, and mzML) including noise modelling and injection peak reconstruction, and generate the peak table. The workflow includes noise modelling, band detection and filtering, then signal matching and missing value imputation. The peak table can then be exported as a .tsv file for further analysis. Visualisations to assess the quality of the data and of the signal made are easily produced.

yamss Tools for high-throughput metabolomics

Tools to analyze and visualize high-throughput metabolomics data acquired using chromatography-mass spectrometry. These tools preprocess data in a way that enables reliable and powerful differential analysis.

regsplice Regularization-Based Methods for Detection of Differential Exon Usage

Statistical methods for detection of differential exon usage in RNA-seq and exon microarray data sets, using L1 regularization (lasso) to improve power.

StarBioTrek StarBioTrek

StarBioTrek provides methods to measure pathway activity and cross-talk among pathways, also integrating information from network data.

TVTB TVTB: The VCF Tool Box

The package provides S4 classes and methods to filter, summarise and visualise genetic variation data stored in VCF files. In particular, the package extends the FilterRules class (S4Vectors package) to define new classes of filter rules applicable to the various slots of VCF objects. Functionalities are integrated and demonstrated in a Shiny web-application, the Shiny Variant Explorer (tSVE).

M3Drop Michaelis-Menten Modelling of Dropouts in single-cell RNASeq

This package fits a Michaelis-Menten model to the pattern of dropouts in single-cell RNASeq data. This model is used as a null to identify significantly variable (i.e. differentially expressed) genes for use in downstream analysis, such as clustering cells.
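
The idea of a Michaelis-Menten null for dropouts can be sketched in a few lines. This is a minimal illustration in Python, not the M3Drop API: it assumes the commonly stated form P = 1 - S / (K + S), where S is a gene's mean expression and K is a fitted constant. The function names and the median-based estimate of K are hypothetical choices made for this example only.

```python
import statistics


def mm_dropout_rate(mean_expr, k):
    """Expected dropout rate under the assumed Michaelis-Menten form 1 - S / (K + S)."""
    return 1.0 - mean_expr / (k + mean_expr)


def estimate_k(mean_exprs, dropout_rates):
    """Crude K estimate (illustrative only).

    Rearranging P = K / (K + S) gives K = S * P / (1 - P) for each gene;
    the median over genes is used as a simple robust summary.
    """
    per_gene = [
        s * p / (1.0 - p)
        for s, p in zip(mean_exprs, dropout_rates)
        if 0.0 < p < 1.0  # genes with P of exactly 0 or 1 carry no usable estimate
    ]
    return statistics.median(per_gene)


# Hypothetical per-gene summaries: mean expression and observed dropout fraction.
means = [1.0, 5.0, 20.0]
observed = [5.0 / 6.0, 0.5, 0.2]

k_hat = estimate_k(means, observed)
# Residuals: genes with far more dropouts than the null predicts are candidates.
residuals = [p - mm_dropout_rate(s, k_hat) for s, p in zip(means, observed)]
print(k_hat, residuals)
```

Genes whose observed dropout rate sits well above the fitted curve are the ones such a null model would flag as variable. How significance is assessed is left out of this sketch.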

meshes MeSH Enrichment and Semantic analyses

MeSH (Medical Subject Headings) is the NLM controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH terms were associated by Entrez Gene ID by three methods, gendoo, gene2pubmed and RBBH. This association is fundamental for enrichment and semantic analyses. meshes supports enrichment analysis (over-representation and gene set enrichment analysis) of gene list or whole expression profile. The semantic comparisons of MeSH terms provide quantitative ways to compute similarities between genes and gene groups. meshes implemented five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively and supports more than 70 species.

annotatr Annotation of Genomic Regions to Genomic Annotations

Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.

crisprseekplus crisprseekplus

Bioinformatics platform containing interface to work with offTargetAnalysis and compare2Sequences in the CRISPRseek package, and GUIDEseqAnalysis.

HelloRanges Introduce *Ranges to bedtools users

Translates bedtools command-line invocations to R code calling functions from the Bioconductor *Ranges infrastructure. This is intended to educate novice Bioconductor users and to compare the syntax and semantics of the two frameworks.

MutationalPatterns Studying patterns in base substitution catalogues

An extensive toolset for the characterization and visualization of a wide range of mutational patterns in base substitution data.

anamiR An integrated analysis package of miRNA and mRNA expression data

This package is intended to identify potential miRNA-target gene interactions from miRNA and mRNA expression data. It contains functions for statistical tests, databases of miRNA-target gene interactions and functional analysis.

psichomics Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation

Package with a Shiny-based graphical interface for the integrated analysis of alternative splicing data from The Cancer Genome Atlas (TCGA). This tool interactively performs survival, principal components and differential splicing analyses with direct incorporation of clinical features (such as tumour stage or survival) associated with TCGA samples.

MoonlightR Identify oncogenes and tumor suppressor genes from omics data

Motivation: The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). Results: We present an R/bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Eventually, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments.

matter A framework for rapid prototyping with binary data on disk

Memory-efficient reading, writing, and manipulation of structured binary data on disk as vectors, matrices, and arrays. This package is designed to be used as a back-end for Cardinal for working with high-resolution mass spectrometry imaging data.

PathoStat PathoStat Statistical Microbiome Analysis Package

The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.

MAST Model-based Analysis of Single Cell Transcriptomics

Methods and models for handling zero-inflated single cell assay data.

flowPloidy Analyze flow cytometer data to determine sample ploidy

Determine sample ploidy via flow cytometry histogram analysis. Reads Flow Cytometry Standard (FCS) files via the flowCore bioconductor package, and provides functions for determining the DNA ploidy of samples based on internal standards.

KEGGlincs Visualize all edges within a KEGG pathway and overlay LINCS data [option]

See what is going on 'under the hood' of KEGG pathways by explicitly re-creating the pathway maps from information obtained from KGML files.

geneXtendeR Optimal Gene Extensions From Histone Modification ChIP-seq Data

geneXtendeR is designed to optimally annotate a histone modification ChIP-seq peak input file with functionally important genomic features (e.g., genes associated with peaks) based on optimization calculations. geneXtendeR optimally extends the boundaries of every gene in a genome by some genomic distance (in DNA base pairs) for the purpose of flexibly incorporating cis-regulatory elements (CREs), such as enhancers and promoters, as well as downstream elements that are important to the function of the gene relative to an epigenetic histone modification ChIP-seq dataset. geneXtendeR computes optimal gene extensions tailored to the broadness of the specific epigenetic mark (e.g., H3K9me1, H3K27me3), as determined by a user-supplied ChIP-seq peak input file. As such, geneXtendeR maximizes the signal-to-noise ratio of locating genes closest to and directly under peaks. By performing a computational expansion of this nature, ChIP-seq reads that would initially not map strictly to a specific gene can now be optimally mapped to the regulatory regions of the gene, thereby implicating the gene as a potential candidate, and thereby making the ChIP-seq experiment more successful. Such an approach becomes particularly important when working with epigenetic histone modifications that have inherently broad peaks.

CancerInSilico An R interface for computational modeling of tumor progression

The CancerInSilico package provides an R interface for running mathematical models of tumor progression. This package has the underlying models implemented in C++ and the output and analysis features implemented in R.

statTarget Statistical Analysis of Metabolite Profile

An easy-to-use tool that provides a graphical user interface for quality-control-based signal shift correction, integration of metabolomic data from multi-batch experiments, and comprehensive statistical analysis in non-targeted or targeted metabolomics.

DEsubs DEsubs: an R package for flexible identification of differentially expressed subpathways using RNA-seq expression experiments

DEsubs is a network-based systems biology package that extracts disease-perturbed subpathways within a pathway network as recorded by RNA-seq experiments. It contains an extensive and customizable framework covering a broad range of operation modes at all stages of the subpathway analysis, enabling a case-specific approach. The operation modes refer to the pathway network construction and processing, the subpathway extraction, visualization and enrichment analysis with regard to various biological and pharmacological features. Its capabilities render it a tool-guide for both the modeler and experimentalist for the identification of more robust systems-level biomarkers for complex diseases.

GOpro Find the most characteristic gene ontology terms for groups of human genes

Find the most characteristic gene ontology terms for groups of human genes. This package was created as a part of the thesis which was developed under the auspices of MI^2 Group (http://mi2.mini.pw.edu.pl/, https://github.com/geneticsMiNIng).

SPLINTER Splice Interpreter Of Transcripts

SPLINTER provides tools to analyze alternative splicing sites, interpret outcomes based on sequence information, select and design primers for site validation and give a visual representation of the event to guide downstream experiments.

SIMLR SIMLR: Single-cell Interpretation via Multi-kernel LeaRning

Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than do existing dimension reduction methods. Additionally, SIMLR demonstrates high sensitivity and accuracy on high-throughput peripheral blood mononuclear cells (PBMC) data sets generated by the GemCode single-cell technology from 10x Genomics.

FitHiC Confidence estimation for intra-chromosomal contact maps

Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.

Pi Leveraging Genetic Evidence to Prioritise Drug Targets at the Gene, Pathway and Network Level

Priority index or Pi is developed as a genomic-led target prioritisation system, with the focus on leveraging human genetic data to prioritise potential drug targets at the gene, pathway and network level. The long term goal is to use such information to enhance early-stage target validation. Based on evidence of disease association from genome-wide association studies (GWAS), this prioritisation system is able to generate evidence to support identification of the specific modulated genes (seed genes) that are responsible for the genetic association signal by utilising knowledge of linkage disequilibrium (co-inherited genetic variants), distance of associated variants from the gene, and evidence of independent genetic association with gene expression in disease-relevant tissues, cell types and states. Seed genes are scored in an integrative way, quantifying the genetic influence. Scored seed genes are subsequently used as baits to rank seed genes plus additional (non-seed) genes; this is achieved by iteratively exploring the global connectivity of a gene interaction network. Genes with the highest priority are further used to identify/prioritise pathways that are significantly enriched with highly prioritised genes. Prioritised genes are also used to identify a gene network interconnecting highly prioritised genes and a minimal number of less prioritised genes (which act as linkers bringing together highly prioritised genes).

uSORT uSORT: A self-refining ordering pipeline for gene selection

This package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data. It incorporates data pre-processing, preliminary PCA gene selection, preliminary cell ordering, feature selection, refined cell ordering, and post-analysis interpretation and visualization.

bigmelon Illumina methylation array analysis for large experiments

Methods for working with Illumina arrays using gdsfmt.

synergyfinder Calculate and Visualize Synergy Scores for Drug Combinations

Efficient implementations for all the popular synergy scoring models for drug combinations, including HSA, Loewe, Bliss and ZIP and visualization of the synergy scores as either a two-dimensional or a three-dimensional interaction surface over the dose matrix.
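
Two of the reference models named above are simple enough to sketch directly. The following Python is an illustration only and is not the synergyfinder interface. It assumes drug effects are expressed as fractional inhibition in [0, 1]; the function names and the example numbers are hypothetical. Under Bliss independence the expected combined effect is a + b - a*b, under HSA (highest single agent) it is max(a, b), and the synergy score is the observed effect minus the expected one.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combination effect if the two drugs act independently (Bliss)."""
    return effect_a + effect_b - effect_a * effect_b


def hsa_expected(effect_a, effect_b):
    """Expected combination effect under the highest-single-agent (HSA) model."""
    return max(effect_a, effect_b)


def synergy_matrix(observed, single_a, single_b, expected_fn):
    """Observed minus expected effect for every cell of the dose matrix.

    observed[i][j] is the combination effect at dose i of drug A and dose j
    of drug B; single_a[i] and single_b[j] are the single-agent effects.
    """
    return [
        [observed[i][j] - expected_fn(single_a[i], single_b[j])
         for j in range(len(single_b))]
        for i in range(len(single_a))
    ]


# Hypothetical 2 x 2 dose matrix of fractional inhibition.
single_a = [0.1, 0.4]
single_b = [0.2, 0.5]
observed = [[0.30, 0.60],
            [0.55, 0.80]]

bliss_scores = synergy_matrix(observed, single_a, single_b, bliss_expected)
hsa_scores = synergy_matrix(observed, single_a, single_b, hsa_expected)
print(bliss_scores)
print(hsa_scores)
```

Positive scores indicate more inhibition than the reference model predicts. The resulting matrix is what would be drawn as an interaction surface over the doses. Loewe and ZIP need fitted dose-response curves and are not covered by this sketch.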

BaalChIP BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes

The package offers functions to process multiple ChIP-seq BAM files and detect allele-specific events. It computes allele counts at individual variants (SNPs/SNVs), implements extensive QC steps to remove problematic variants, and utilizes a Bayesian framework to identify statistically significant allele-specific events. BaalChIP is able to account for copy number differences between the two alleles, a known phenotypical feature of cancer samples.

readat Functionality to Read and Manipulate SomaLogic ADAT files

This package contains functionality to import, transform and annotate data from ADAT files generated by the SomaLogic SOMAscan platform.

BiocWorkflowTools Tools to aid the development of Bioconductor Workflow packages

Provides functions to ease the transition between Rmarkdown and LaTeX documents when authoring a Bioconductor Workflow.

signeR Empirical Bayesian approach to mutational signature discovery

The signeR package provides an empirical Bayesian approach to mutational signature discovery. It is designed to analyze single nucleotide variation (SNV) counts in cancer genomes, but can also be applied to other features. Functionalities to characterize signatures or genome samples according to exposure patterns are also provided.

LINC co-expression of lincRNAs and protein-coding genes

This package provides methods to compute co-expression networks of lincRNAs and protein-coding genes. Biological terms associated with the sets of protein-coding genes predict the biological contexts of lincRNAs according to the 'Guilty by Association' approach.

gCrisprTools Suite of Functions for Pooled Crispr Screen QC and Analysis

Set of tools for evaluating pooled high-throughput screening experiments, typically employing CRISPR/Cas9 or shRNA expression cassettes. Contains methods for interrogating library and cassette behavior within an experiment, identifying differentially abundant cassettes, aggregating signals to identify candidate targets for empirical validation, hypothesis testing, and comprehensive reporting.

MetaboSignal MetaboSignal: a network-based approach to overlay and explore metabolic and signaling KEGG pathways

MetaboSignal is an R package that allows merging, analyzing and customizing metabolic and signaling KEGG pathways. It is a network-based approach designed to explore the topological relationship between genes (signaling- or enzymatic-genes) and metabolites, representing a powerful tool to investigate the genetic landscape and regulatory networks of metabolic phenotypes.

philr Phylogenetic partitioning based ILR transform for metagenomics data

PhILR is short for Phylogenetic Isometric Log-Ratio Transform. This package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts). Specifically this package allows analysis of compositional data where the parts can be related through a phylogenetic tree (as is common in microbiota survey data) and makes available the Isometric Log Ratio transform built from the phylogenetic tree and utilizing a weighted reference measure.
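
As a rough illustration of how a tree induces ILR coordinates, each internal node splits the taxa beneath it into two groups, and the coordinate for that node is a balance between the groups. The Python below is a sketch under simplifying assumptions and is not the philr API: it uses the standard unweighted balance sqrt(r*s/(r+s)) * ln(g(x+)/g(x-)), where g is the geometric mean and r, s are the group sizes. It omits the weighted reference measure mentioned above. The three-taxon tree and the abundances are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ilr_balance(group_up, group_down):
    """Unweighted ILR balance between two groups of compositional parts."""
    r, s = len(group_up), len(group_down)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(group_up) / geometric_mean(group_down))


# Hypothetical relative abundances for three taxa on the tree ((A, B), C).
abundance = {"A": 0.2, "B": 0.3, "C": 0.5}

# One balance per internal node: the root splits {A, B} from {C},
# and the inner node splits {A} from {B}.
root_balance = ilr_balance([abundance["A"], abundance["B"]], [abundance["C"]])
inner_balance = ilr_balance([abundance["A"]], [abundance["B"]])
print(root_balance, inner_balance)
```

Because each balance depends only on ratios of parts, rescaling the whole sample (for example, counts instead of proportions) leaves the coordinates unchanged. Zero counts would need handling before taking logarithms.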

geneAttribution Identification of candidate genes associated with genetic variation

Identification of the most likely gene or genes through which variation at a given genomic locus in the human genome acts. The most basic functionality assumes that the closer a gene is to the input locus, the more likely the gene is to be causative. Additionally, any empirical data that links genomic regions to genes (e.g. eQTL or genome conformation data) can be used if it is supplied in the UCSC .BED file format.

YAPSA Yet Another Package for Signature Analysis

This package provides functions and routines useful in the analysis of somatic signatures (cf. L. Alexandrov et al., Nature 2013). In particular, functions to perform a signature analysis with known signatures (LCD = linear combination decomposition) and a signature analysis on stratified mutational catalogue (SMC = stratify mutational catalogue) are provided.

eegc Engineering Evaluation by Gene Categorization (eegc)

This package has been developed to evaluate cellular engineering processes for direct differentiation of stem cells or conversion (transdifferentiation) of somatic cells to primary cells based on high throughput gene expression data screened either by DNA microarray or RNA sequencing. The package takes gene expression profiles as inputs from three types of samples: (i) somatic or stem cells to be (trans)differentiated (input of the engineering process), (ii) induced cells to be evaluated (output of the engineering process) and (iii) target primary cells (reference for the output). The package performs differential gene expression analysis for each pair-wise sample comparison to identify and evaluate the transcriptional differences among the 3 types of samples (input, output, reference). The ideal goal is to have induced and primary reference cell showing overlapping profiles, both very different from the original cells.

Anaquin Statistical analysis of sequins

The project is intended to support the use of sequins (synthetic sequencing spike-in controls) owned and made available by the Garvan Institute of Medical Research. The goal is to provide a standard open source library for quantitative analysis, modelling and visualization of spike-in controls.

alpine alpine

Fragment sequence bias modeling and correction for RNA-seq transcript abundance estimation.

PharmacoGx Analysis of Large-Scale Pharmacogenomic Data

Contains a set of functions to perform large-scale analysis of pharmacogenomic data.

GAprediction Prediction of gestational age with Illumina HumanMethylation450 data

GAprediction predicts gestational age using Illumina HumanMethylation450 CpG data.

SVAPLSseq SVAPLSseq-An R package to adjust for the hidden factors of variability in differential gene expression studies based on RNAseq data

The package contains functions that are intended for the identification of differentially expressed genes between two groups of samples from RNAseq data after adjusting for various hidden biological and technical factors of variability.

CancerSubtypes Cancer subtypes identification, validation and visualization based on genomic data

CancerSubtypes integrates the current common computational biology methods for cancer subtypes identification and provides a standardized framework for cancer subtype analysis based on the genomic datasets.

GRmetrics Calculate growth-rate inhibition (GR) metrics

Functions for calculating and visualizing growth-rate inhibition (GR) metrics.
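
For readers unfamiliar with the metric, a GR value compares the growth rate of treated cells with that of untreated controls over the same assay. The Python below is a minimal sketch and not the GRmetrics interface. It assumes the usual definition GR(c) = 2^(log2(x(c)/x0) / log2(x_ctrl/x0)) - 1, where x0 is the cell count at the time of treatment, x_ctrl the untreated count at the end of the assay and x(c) the treated count at concentration c. The function and argument names are hypothetical.

```python
import math


def gr_value(treated_count, control_count, initial_count):
    """Growth-rate inhibition value for one concentration (assumed definition).

    1 means no effect, 0 means complete cytostasis, and negative values
    mean the treated population shrank below its starting size.
    """
    treated_doublings = math.log2(treated_count / initial_count)
    control_doublings = math.log2(control_count / initial_count)
    return 2.0 ** (treated_doublings / control_doublings) - 1.0


# Hypothetical counts: 100 cells at treatment, 400 in the untreated control.
print(gr_value(400.0, 400.0, 100.0))  # treated grows like control
print(gr_value(200.0, 400.0, 100.0))  # partial growth inhibition
print(gr_value(100.0, 400.0, 100.0))  # no net growth
print(gr_value(50.0, 400.0, 100.0))   # net cell loss
```

Normalizing by the control's number of doublings is what makes the value comparable across cell lines that divide at different rates.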

switchde Switch-like differential expression across single-cell trajectories

Inference and detection of switch-like differential expression across single-cell RNA-seq trajectories.

CellMapper Predict genes expressed selectively in specific cell types

Infers cell type-specific expression based on co-expression similarity with known cell type marker genes. Can make accurate predictions using publicly available expression data, even when a cell type has not been isolated before.

IPO Automated Optimization of XCMS Data Processing parameters

The outcome of XCMS data processing strongly depends on the parameter settings. IPO (`Isotopologue Parameter Optimization`) is a parameter optimization tool that is applicable for different kinds of samples and liquid chromatography coupled to high resolution mass spectrometry devices, fast and free of labeling steps. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiment. The resulting scores are evaluated using response surface models.

CVE Cancer Variant Explorer

Shiny app for interactive variant prioritisation in precision cancer medicine. The input file for CVE is the output file of the recently released Oncotator Variant Annotation tool summarising variant-centric information from 14 different publicly available resources relevant for cancer research. Interactive prioritisation in CVE is based on known germline and cancer variants, DNA repair genes and functional prediction scores. An optional feature of CVE is the exploration of the tumour-specific pathway context that is facilitated using co-expression modules generated from publicly available transcriptome data. Finally, druggability of prioritised variants is assessed using the Drug Gene Interaction Database (DGIdb).

BayesKnockdown BayesKnockdown: Posterior Probabilities for Edges from Knockdown Data

A simple, fast Bayesian method for computing posterior probabilities for relationships between a single predictor variable and multiple potential outcome variables, incorporating prior probabilities of relationships. In the context of knockdown experiments, the predictor variable is the knocked-down gene, while the other genes are potential targets. Can also be used for differential expression/2-class data.

recount Explore and download data from the recount project

Explore and download data from the recount project available at https://jhubiostatistics.shinyapps.io/recount/. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junctions level, the raw counts, the phenotype metadata used, the urls to the sample coverage bigWig files or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using http://bioconductor.org/packages/derfinder you can perform annotation-agnostic differential expression analyses with the data from the recount project as described at http://biorxiv.org/content/early/2016/08/08/068478.

ASpli Analysis of alternative splicing using RNA-Seq

Integrative pipeline for the analysis of alternative splicing using RNAseq.

ctsGE Clustering of Time Series Gene Expression data

Methodology for supervised clustering of potentially many predictor variables, such as genes, in time series datasets. Provides functions that help the user assign genes to a predefined set of model profiles.

MODA MODA: MOdule Differential Analysis for weighted gene co-expression network

MODA can be used to estimate and construct condition-specific gene co-expression networks, and identify differentially expressed subnetworks as conserved or condition specific modules which are potentially associated with relevant biological processes.

SNPediaR Query data from SNPedia

SNPediaR provides some tools for downloading and parsing data from the SNPedia web site. The implemented functions allow users to import the wiki text available in SNPedia pages and to extract the most relevant information out of them. If some information in the downloaded pages is not automatically processed by the library functions, users can easily implement their own parsers to access it in an efficient way.

GeneGeneInteR Tools for Testing Gene-Gene Interaction at the Gene Level

The aim of this package is to propose several methods for testing gene-gene interaction in case-control association studies. Such a test can be done by aggregating SNP-SNP interaction tests performed at the SNP level (SSI) or by using gene-gene multidimensional methods (GGI). The package also proposes tools for a graphic display of the results.

ImpulseDE Detection of DE genes in time series data using impulse models

ImpulseDE is suited to capture single impulse-like patterns in high throughput time series datasets. By fitting a representative impulse model to each gene, it reports differentially expressed genes whether across time points in a single experiment or between two time courses from two experiments. To optimize the running time, the code makes use of clustering steps and multi-threading.

RCAS RNA Centric Annotation System

RCAS is an automated system that provides dynamic genome annotations for custom input files that contain transcriptomic regions. Such transcriptomic regions could be, for instance, peak regions detected by CLIP-Seq analysis that detect protein-RNA interactions, RNA modifications (alias the epitranscriptome), CAGE-tag locations, or any other collection of target regions at the level of the transcriptome. RCAS is designed as a reporting tool for the functional analysis of RNA-binding sites detected by high-throughput experiments. It takes as input a BED format file containing the genomic coordinates of the RNA binding sites and a GTF file that contains the genomic annotation features usually provided by publicly available databases such as Ensembl and UCSC. RCAS performs overlap operations between the genomic coordinates of the RNA binding sites and the genomic annotation features and produces in-depth annotation summaries such as the distribution of binding sites with respect to gene features (exons, introns, 5'/3' UTR regions, exon-intron boundaries, promoter regions, and whole transcripts). Moreover, by detecting the collection of targeted transcripts, RCAS can produce functional annotation tables for enriched gene sets (annotated by the Molecular Signatures Database) and GO terms. Because identifying enriched sequence motifs is one of the most important questions that arise during protein-RNA interaction analysis, RCAS has a module for detecting sequence motifs enriched in the targeted regions of the transcriptome. A full interactive report in HTML format can be generated that contains interactive figures and tables that are ready for publication purposes.

ASAFE Ancestry Specific Allele Frequency Estimation

Given admixed individuals' bi-allelic SNP genotypes and ancestry pairs (where each ancestry can take one of three values) for multiple SNPs, perform an EM algorithm to deal with the fact that SNP genotypes are unphased with respect to ancestry pairs, in order to estimate ancestry-specific allele frequencies for all SNPs.

rDGIdb R Wrapper for DGIdb

The rDGIdb package provides a wrapper for the Drug Gene Interaction Database (DGIdb). For simplicity, the wrapper query function and output resemble the user interface and results format provided on the DGIdb website (http://dgidb.genome.wustl.edu/).

methylKit DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Perl is needed only to read SAM files.

DeepBlueR DeepBlueR

Accessing the DeepBlue Epigenetics Data Server through R.

LOBSTAHS Lipid and Oxylipin Biomarker Screening through Adduct Hierarchy Sequences

LOBSTAHS is a multifunction package for screening, annotation, and putative identification of mass spectral features in large, HPLC-MS lipid datasets. In silico data for a wide range of lipids, oxidized lipids, and oxylipins can be generated from user-supplied structural criteria with a database generation function. LOBSTAHS then applies these databases to assign putative compound identities to features in any high-mass accuracy dataset that has been processed using xcms and CAMERA. Users can then apply a series of orthogonal screening criteria based on adduct ion formation patterns, chromatographic retention time, and other properties, to evaluate and assign confidence scores to this list of preliminary assignments. During the screening routine, LOBSTAHS rejects assignments that do not meet the specified criteria, identifies potential isomers and isobars, and assigns a variety of annotation codes to assist the user in evaluating the accuracy of each assignment.

AMOUNTAIN Active modules for multilayer weighted gene co-expression networks: a continuous optimization approach

A weighted gene co-expression network (WGCN) is a purely data-driven gene network that can be constructed from expression profiles alone. Different layers in such networks may represent different time points, multiple conditions or various species. AMOUNTAIN aims to search for active modules in multi-layer WGCNs using a continuous optimization approach.

clusterExperiment Compare clusterings for single-cell sequencing

This package provides functions for running and comparing many different clusterings of single-cell sequencing data.

MGFR Marker Gene Finder in RNA-seq data

The package is designed to detect marker genes from RNA-seq data.

CytoML GatingML interface for openCyto

This package is designed to use GatingML2.0 as the standard format for exchanging gated data with other software platforms.

fgsea Fast Gene Set Enrichment Analysis

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain finer-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used.
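
To illustrate why the number of permutations limits p-value resolution, the Python sketch below computes an empirical permutation p-value with the usual +1 pseudocount and applies a Benjamini-Hochberg correction. It is not fgsea's algorithm or API; the function names are hypothetical and the choice of Benjamini-Hochberg as the correction method is an assumption.

```python
def empirical_pvalue(observed, null_scores):
    """One-sided permutation p-value with a +1 pseudocount.

    Counts null scores at least as large as the observed score.
    """
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def smallest_pvalue(n_permutations):
    """Smallest p-value attainable with n_permutations permutations."""
    return 1 / (n_permutations + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 999 permutations no p-value can fall below 1/1000, so a gene set can only survive correction across many tests if enough permutations are run.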

qsea IP-seq data analysis and visualization

qsea (quantitative sequencing enrichment analysis) was developed as the successor of the MEDIPS package for analyzing data derived from methylated DNA immunoprecipitation (MeDIP) experiments followed by sequencing (MeDIP-seq). However, qsea provides several functionalities for the analysis of other kinds of quantitative sequencing data (e.g. ChIP-seq, MBD-seq, CMS-seq and others) including calculation of differential enrichment between groups of samples.

ccmap Combination Connectivity Mapping

Finds drugs and drug combinations that are predicted to reverse or mimic gene expression signatures. These drugs might reverse diseases or mimic healthy lifestyles.

normr Normalization and difference calling in ChIP-seq data

Robust normalization and difference calling procedures for ChIP-seq and alike data. Read counts are modeled jointly as a binomial mixture model with a user-specified number of components. A fitted background estimate accounts for the effect of enrichment in certain regions and, therefore, represents an appropriate null hypothesis. This robust background is used to identify significantly enriched or depleted regions.

chromstaR Combinatorial and Differential Chromatin State Analysis for ChIP-Seq Data

This package implements functions for combinatorial and differential analysis of ChIP-seq data. It includes uni- and multivariate peak-calling, export to genome browser viewable files, and functions for enrichment analyses.

esetVis Visualizations of expressionSet Bioconductor object

Utility functions for visualization of an expressionSet (or SummarizedExperiment) Bioconductor object, including spectral map, t-SNE and linear discriminant analysis. Static plots are available via the ggplot2 package, and interactive plots via the ggvis or rbokeh packages.

MetCirc MetCirc - a workflow to analyse and visualise metabolomics data

MetCirc comprises a workflow to interactively explore metabolomics data: create MSP, bin m/z values, calculate similarity between precursors and visualise similarities.

crossmeta Cross Platform Meta-Analysis of Microarray Data

Implements cross-platform and cross-species meta-analyses of Affymetrix, Illumina, and Agilent microarray data. This package automates common tasks such as downloading, normalizing, and annotating raw GEO data. A user interface makes it easy to select control and treatment samples for each contrast and study. This input is used for subsequent surrogate variable analysis (which models unaccounted sources of variation) and differential expression analysis. The final meta-analysis of differential expression values can include genes measured in only a subset of studies.

sights Statistics and dIagnostic Graphs for HTS

SIGHTS is a suite of normalization methods, statistical tests, and diagnostic graphical tools for high throughput screening (HTS) assays. HTS assays use microtitre plates to screen large libraries of compounds for their biological, chemical, or biochemical activity.

geneplast Evolutionary and plasticity analysis based on orthologous groups distribution

Geneplast is designed for evolutionary and plasticity analysis based on orthologous groups distribution in a given species tree. It uses Shannon information theory and orthologs abundance to estimate the Evolutionary Plasticity Index. Additionally, it implements the Bridge algorithm to determine the evolutionary root of a given gene based on its orthologs distribution.
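
As a small illustration of the information-theoretic ingredient mentioned above, the Python sketch below computes the Shannon entropy of ortholog counts across species for one orthologous group. It does not compute the Evolutionary Plasticity Index, whose exact formula is not given here, and the function name and input layout are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts.

    counts[i] is assumed to be the number of orthologs of one
    orthologous group found in species i.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

An orthologous group spread evenly over many species has high entropy, while one confined to a single species has entropy zero.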

covEB Empirical Bayes estimate of block diagonal covariance matrices

Uses Bayesian methods to estimate correlation matrices, assuming that they can be written and estimated as block diagonal matrices. These block diagonal matrices are determined using shrinkage parameters: values below the parameter are set to zero.
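
The thresholding step can be sketched in Python as follows, reading the description as "entries below the shrinkage parameter become zero, and the surviving non-zero pattern defines the blocks". This is an interpretation rather than covEB's implementation: comparing absolute values is an assumption, the function name is hypothetical, and the empirical Bayes estimation within each block is not shown.

```python
def threshold_blocks(corr, threshold):
    """Zero out small off-diagonal correlations and find the resulting blocks.

    corr is a square list of lists; returns (shrunk_matrix, blocks), where
    blocks is a list of sorted index lists, one per diagonal block.
    """
    n = len(corr)
    shrunk = [
        [corr[i][j] if i == j or abs(corr[i][j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Blocks are the connected components of the non-zero pattern.
    seen, blocks = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, block = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(n):
                if j not in seen and shrunk[i][j] != 0.0:
                    seen.add(j)
                    stack.append(j)
        blocks.append(sorted(block))
    return shrunk, blocks
```

Reordering the variables so that each block's indices are contiguous would then give a matrix that is visibly block diagonal.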

MADSEQ Mosaic Aneuploidy Detection and Quantification using Massive Parallel Sequencing Data

The MADSEQ package provides a group of hierarchical Bayesian models for the detection of mosaic aneuploidy, the inference of the type of aneuploidy and also for the quantification of the fraction of aneuploid cells in the sample.

GEM GEM: fast association study for the interplay of Gene, Environment and Methylation

Tools for genome-wide analysis of EWAS, methQTL and GxE.

SRGnet SRGnet: An R package for studying synergistic response genes from transcriptomics data

We developed SRGnet to analyze synergistic regulatory mechanisms in transcriptome profiles that act to enhance the overall cell response to combinations of mutations, drugs or environmental exposures. This package can be used to identify regulatory modules downstream of synergistic response genes, prioritize synergistic regulatory genes that may be potential intervention targets, and contextualize gene perturbation experiments.

netprioR A model for network-based prioritisation of genes

A model for semi-supervised prioritisation of genes integrating network data, phenotypes and additional prior knowledge about TP and TN gene labels from the literature or experts.

Pigengene Infers biological signatures from gene expression data

Pigengene package provides an efficient way to infer biological signatures from gene expression profiles. The signatures are independent from the underlying platform, e.g., the input can be microarray or RNA Seq data. It can even infer the signatures using data from one platform, and evaluate them on the other. Pigengene identifies the modules (clusters) of highly coexpressed genes using coexpression network analysis, summarizes the biological information of each module in an eigengene, learns a Bayesian network that models the probabilistic dependencies between modules, and builds a decision tree based on the expression of eigengenes.

bioCancer Interactive Multi-Omics Cancers Data Visualization and Analysis

bioCancer is a Shiny app for interactively visualizing and analysing multi-assay cancer genomic data.

msPurity Automated Evaluation of Precursor Ion Purity for Mass Spectrometry Based Fragmentation in Metabolomics

Assess the contribution of the targeted precursor in fragmentation acquired or anticipated isolation windows using a metric called "precursor purity". Also provides simple processing steps (averaging, filtering, blank subtraction, etc.) for DI-MS data. Works for both LC-MS(/MS) and DI-MS(/MS) data.
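
A simplified notion of precursor purity can be sketched in Python as the fraction of the total intensity inside an isolation window that belongs to the targeted precursor. This is only an illustration under stated assumptions: msPurity's actual metric may differ (for example in how the window is weighted or how scans are combined), and the function name, the symmetric window and the m/z tolerance are all hypothetical.

```python
def precursor_purity(peaks, precursor_mz, window_half_width, mz_tolerance=0.01):
    """Fraction of isolation-window intensity contributed by the precursor.

    peaks is a list of (mz, intensity) pairs from one scan. Peaks within
    mz_tolerance of precursor_mz are counted as the precursor itself.
    """
    in_window = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - precursor_mz) <= window_half_width
    ]
    total = sum(intensity for _, intensity in in_window)
    if total == 0:
        return 0.0
    target = sum(
        intensity
        for mz, intensity in in_window
        if abs(mz - precursor_mz) <= mz_tolerance
    )
    return target / total
```

A value near 1 means the window contains almost nothing but the precursor, while lower values indicate contaminating ions that would contribute to the fragmentation spectrum.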

covRNA Multivariate Analysis of Transcriptomic Data

This package provides the fourth-corner and RLQ analysis methods for large-scale transcriptomic data.

dSimer Integration of Disease Similarity Methods

dSimer is an R package that implements nine methods for measuring disease-disease similarity, including a standard cosine similarity measure and eight function-based methods. The disease similarity matrix obtained from these nine methods can be visualized as a heatmap or a network. dSimer also provides biological data widely used in studies of disease-disease associations.
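
For readers unfamiliar with the cosine measure mentioned above, the Python sketch below computes cosine similarity between two diseases represented as numeric feature vectors and assembles a pairwise similarity matrix. The vector representation, the function names and the handling of all-zero vectors are assumptions for illustration and are not taken from dSimer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(profiles):
    """Pairwise cosine similarities for a dict of name -> feature vector."""
    names = list(profiles)
    return {
        a: {b: cosine_similarity(profiles[a], profiles[b]) for b in names}
        for a in names
    }
```

The nested dictionary returned by similarity_matrix is the kind of object that could then be drawn as a heatmap or thresholded into a network.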

Director A dynamic visualization tool of multi-level data

Director is an R package designed to streamline the visualization of molecular effects in regulatory cascades. It utilizes the R package htmltools and a modified Sankey plugin of the JavaScript library D3 to provide a fast and easy, browser-enabled solution to discovering potentially interesting downstream effects of regulatory and/or co-expressed molecules. The diagrams are robust, interactive, and packaged as highly portable HTML files that eliminate the need for third-party software to view them. This gives scientists a straightforward way to interpret the data produced, and gives bioinformatics developers an alternative means to present relevant data.
