This is a list of the last 100 packages added to Bioconductor and available in the development version of Bioconductor. The list is also available as an RSS Feed.

Onassis OnASSIs Ontology Annotation and Semantic SImilarity software

A package that allows the annotation of text with ontology terms (mainly from OBO ontologies) and the computation of semantic similarity measures based on the structure of the ontology between different annotated samples.

zFPKM A suite of functions to facilitate zFPKM transformations

Perform the zFPKM transform on RNA-seq FPKM data. This algorithm is based on the publication by Hart et al., 2013 (Pubmed ID 24215113). Reference recommends using zFPKM > -3 to select expressed genes. Validated with encode open/closed chromosome data. Works well for gene level data using FPKM or TPM. Does not appear to calibrate well for transcript level data.

CEMiTool Finds Gene Co-expression Modules

CEMiTool improves WGCNA gene co-expression module discovery by optimizing network parameter selection. This package also perfoms enrichment analysis with the discovered modules.

RVS Computes estimates of the probability of related individuals sharing a rare variant

Rare Variant Sharing (RVS) implements tests of association and linkage between rare genetic variant genotypes and a dichotomous phenotype, e.g. a disease status, in family samples. The tests are based on probabilities of rare variant sharing by relatives under the null hypothesis of absence of linkage and association between the rare variants and the phenotype and apply to single variants or multiple variants in a region (e.g. gene-based test).

BiocSklearn interface to python sklearn via Rstudio reticulate

This package provides interfaces to selected sklearn elements.

rexposome Exposome exploration and outcome data analysis

Package that allows to explore the exposome and to perform association analyses between exposures and health outcomes.

GENIE3 GEne Network Inference with Ensemble of trees

This package implements the GENIE3 algorithm for inferring gene regulatory networks from expression data.

SCnorm Normalization of single cell RNA-seq data

This package implements SCnorm — a method to normalize single-cell RNA-seq data.

HiCcompare HiCcompare: Joint normalization and comparative analysis of multiple Hi-C datasets

HiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. It accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format which are available from several sources. HiCcompare is designed to give the user the ability to perform a comparative analysis on the 3-Dimensional structure of the genomes of cells in different biological states.`HiCcompare` differs from other packages that attempt to compare Hi-C data in that it works on processed data in chromatin interaction matrix format instead of pre-processed sequencing data. In addition, `HiCcompare` provides a non-parametric method for the joint normalization and removal of biases between two Hi-C datasets for the purpose of comparative analysis. `HiCcompare` also provides a simple yet robust permutation method for detecting differences between Hi-C datasets.

EpiDISH Epigenetic Dissection of Intra-Sample-Heterogeneity

EpiDISH is a R package to infer the proportions of a priori known cell subtypes present in a sample representing a mixture of such cell-types. Inference proceeds via one of 3 methods (Robust Partial Correlations-RPC, Cibersort (CBS), Constrained Projection (CP)), as determined by user.

epivizrChart R interface to epiviz web components

This package provides an API for interactive visualization of genomic data using epiviz web components. Objects in R/BioConductor can be used to generate interactive R markdown/notebook documents or can be visualized in the R Studio's default viewer.

oneSENSE One-Dimensional Soli-Expression by Nonlinear Stochastic Embedding (OneSENSE)

A graphical user interface that facilitates the dimensional reduction method based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm, for categorical analysis of mass cytometry data. With One-SENSE, measured parameters are grouped into predefined categories, and cells are projected onto a space composed of one dimension for each category. Each dimension is informative and can be annotated through the use of heatplots aligned in parallel to each axis, allowing for simultaneous visualization of two catergories across a two-dimensional plot. The cellular occupancy of the resulting plots alllows for direct assessment of the relationships between the categories.

diffuStats Diffusion scores on biological networks

Label propagation approaches are a widely used procedure in computational biology for giving context to molecular entities using network data. Node labels, which can derive from gene expression, genome-wide association studies, protein domains or metabolomics profiling, are propagated to their neighbours in the network, effectively smoothing the scores through prior annotated knowledge and prioritising novel candidates. The R package diffuStats contains a collection of diffusion kernels and scoring approaches that facilitates their computation and benchmarking.

beachmat Compiling Bioconductor to Handle Each Matrix Type

Provides a consistent C++ class interface for a variety of commonly used matrix types, including sparse and HDF5-backed matrices.

consensusOV Gene expression-based subtype classification for high-grade serous ovarian cancer

This package implements four major subtype classifiers for high-grade serous (HGS) ovarian cancer as described by Helland et al. (PLoS One, 2011), Bentink et al. (PLoS One, 2012), Verhaak et al. (J Clin Invest, 2013), and Konecny et al. (J Natl Cancer Inst, 2014). In addition, the package implements a consensus classifier, which consolidates and improves on the robustness of the proposed subtype classifiers, thereby providing reliable stratification of patients with HGS ovarian tumors of clearly defined subtype.

SingleCellExperiment S4 Classes for Single Cell Data

Defines a S4 class for storing data from single-cell experiments. This includes specialized methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries.

ClusterJudge Judging Quality of Clustering Methods using Mutual Information

ClusterJudge implements the functions, examples and other software published as an algorithm by Gibbons, FD and Roth FP. The article is called "Judging the Quality of Gene Expression-Based Clustering Methods Using Gene Annotation" and it appeared in Genome Research, vol. 12, pp1574-1581 (2002). See package?ClusterJudge for an overview.

OPWeight Optimal p-value weighting with independent information

This package perform weighted-pvalue based multiple hypothesis test and provides corresponding information such as ranking probability, weight, significant tests, etc . To conduct this testing procedure, the testing method apply a probabilistic relationship between the test rank and the corresponding test effect size.

phenopath Genomic trajectories with heterogeneous genetic and environmental backgrounds

PhenoPath infers genomic trajectories (pseudotimes) in the presence of heterogeneous genetic and environmental backgrounds and tests for interactions between them.

scmap A tool for unsupervised projection of single cell RNA-seq data

Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues since the technology allows researchers to define cell-types using unsupervised clustering of the transcriptome. However, due to differences in experimental methods and computational analyses, it is often challenging to directly compare the cells identified in two different experiments. scmap is a method for projecting cells from a scRNA-seq experiment on to the cell-types identified in a different experiment.

scoreInvHap Get inversion status in predefined regions

scoreInvHap can get the samples' inversion status of known inversions. SNPfier uses SNP data as input and requires the following information about the inversion: genotype frequencies in the different haplotypes, R2 between the region SNPs and inversion status and heterozygote genotypes in the reference. The package include this data for two well known inversions (8p23 and 17q21.31) and for two additional regions.

Rhdf5lib hdf5 library as an R package

Provides C and C++ hdf5 libraries.

RProtoBufLib C++ headers and static libraries of Protocol buffers

This package provides the headers and static library of Protocol buffers 2.6.0 for other R packages to compile and link against.

mfa Bayesian hierarchical mixture of factor analyzers for modelling genomic bifurcations

MFA models genomic bifurcations using a Bayesian hierarchical mixture of factor analysers.

coexnet coexnet: An R package to build CO-EXpression NETworks from Microarray Data

Extracts the gene expression matrix from GEO DataSets (.CEL files) as a AffyBatch object. Additionally, can make the normalization process using two different methods (vsn and rma). The summarization (pass from multi-probe to one gene) uses two different criteria (Maximum value and Median of the samples expression data) and the process of gene differentially expressed analisys using two methods (sam and acde). The construction of the co-expression network can be conduced using two different methods, Pearson Correlation Coefficient (PCC) or Mutual Information (MI) and choosing a threshold value using a graph theory approach.

RTNsurvival Survival analysis using transcriptional networks inferred by the RTN package

RTNsurvival is a tool for integrating regulons generated by the RTN package with survival information for the same set of samples (i.e. a cohort). It visualizes the integrated data using 2-tailed Gene Set Enrichment Analysis (GSEA2) approach and survival plots [@Castro2016]. This package is implemented using S4 classes and methods for plots and analyses. There are two main survival analyses generated by RTNsurvival: a Cox Proportional Hazards approach used to model regulons as predictors of survival time, and a Kaplan-Meier analysis showing the stratification of a cohort based on the regulon activity. For a given regulon, the 2-tailed GSEA approach computes a differential Enrichment Score (dES) for each individual sample, and the dES distribution of all samples is then used to assess the survival statistics for the cohort. The plots can be fine-tuned to the user's specifications.

zinbwave Zero-Inflated Negative Binomial Model for RNA-Seq Data

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.

Scale4C Scale4C: an R/Bioconductor package for scale-space transformation of 4C-seq data

Scale4C is an R/Bioconductor package for scale-space transformation and visualization of 4C-seq data. The scale-space transformation is a multi-scale visualization technique to transform a 2D signal (e.g. 4C-seq reads on a genomic interval of choice) into a tesselation in the scale space (2D, genomic position x scale factor) by applying different smoothing kernels (Gauss, with increasing sigma). This transformation allows for explorative analysis and comparisons of the data's structure with other samples.

IsoformSwitchAnalyzeR An R package to Identify, Annoatate and Visialize Isoform Switches with Functional Consequences (from RNA-seq data)

Since 2010 state-of-the-art bioinformatics tools have allowed researchers to reconstruct and quantify full length transcripts from RNA-seq data. Such geneome wide isoform resolution data has the potential to facilitate both genome wide analysis of alternative isoform usage and identification of isoform switching. Unfortunatly these types of analysis are still only rarely done and/or reported - in fact only 11% of articles analyzing RNA-seq data publised start 2016 performed analy isoform analysis. We hypothesis that there are 3 reasons why RNA-seq data is not used to its full potential: 1) There is still a lack of tools that can identify isoform switches with isoform resolution - thereby identifying the exact isoforms involved in a switch. 2) Although there are many very good tools to perform sequence analysis there is no common framework, which allows for integration of the analysis provided by these tools. 3) There is a lack of tools facilitating easy and article ready visual visualization of isoform switches. To all 3 problems we developed IsoformSwitchAnalyzeR. IsoformSwitchAnalyzeR is an easy to use R package that enableswhich enables statistical identification as well as visualization of isoform switches with predicted functional consequences from RNA-seq data

motifmatchr Fast Motif Matching in R

Quickly find motif matches for many motifs and many sequences. Wraps C++ code from the MOODS motif calling library, which was developed by Pasi Rastas, Janne Korhonen, and Petri Martinmäki.

miRBaseConverter A comprehensive and high-efficiency tool for converting and retrieving the information of miRNAs in different miRBase versions

A comprehensive tool for converting and retrieving the miRNA Name, Accession, Sequence, Version, History and Family information in different miRBase versions. It can process a huge number of miRNAs in a short time without other depends.

methInheritSim Simulating Whole-Genome Inherited Bisulphite Sequencing Data

Simulate a multigeneration methylation case versus control experiment with inheritance relation using a real control dataset.

IntramiRExploreR Predicting Targets for Drosophila Intragenic miRNAs

Intra-miR-ExploreR, an integrative miRNA target prediction bioinformatics tool, identifies targets combining expression and biophysical interactions of a given microRNA (miR). Using the tool, we have identified targets for 92 intragenic miRs in D. melanogaster, using available microarray expression data, from Affymetrix 1 and Affymetrix2 microarray array platforms, providing a global perspective of intragenic miR targets in Drosophila. Predicted targets are grouped according to biological functions using the DAVID Gene Ontology tool and are ranked based on a biologically relevant scoring system, enabling the user to identify functionally relevant targets for a given miR.

ndexr NDEx R client library

This package offers an interface to NDEx servers, e.g. the public server at http://ndexbio.org/. It can retrieve and save networks via the API. Networks are offered as RCX object and as igraph representation.

DASC Detecting hidden batch factors through data adaptive adjustment for biological effects

The package is used for identifying batches in high-dimensional dataset.

GA4GHshiny Shiny application for interacting with GA4GH-based data servers

GA4GHshiny package provides an easy way to interact with data servers based on Global Alliance for Genomics and Health (GA4GH) genomics API through a Shiny application. It also integrates with Beacon Network.

msgbsR msgbsR: methylation sensitive genotyping by sequencing (MS-GBS) R functions

Pipeline for the anaysis of a MS-GBS experiment.

TSRchitect Promoter identification from large-scale TSS profiling data

In recent years, large-scale transcriptional sequence data has yielded considerable insights into the nature of gene expression and regulation in eukaryotes. Techniques that identify the 5' end of mRNAs, most notably CAGE, have mapped the promoter landscape across a number of model organisms. Due to the variability of TSS distributions and the transcriptional noise present in datasets, precisely identifying the active promoter(s) for genes from these datasets is not straightforward. TSRchitect allows the user to efficiently identify the putative promoter (the transcription start region, or TSR) from a variety of TSS profiling data types, including both single-end (e.g. CAGE) as well as paired-end (RAMPAGE, PEAT). Along with the coordiantes of identified TSRs, TSRchitect also calculates the width, abundance and Shape Index, and handles biological replicates for expression profiling. Finally, TSRchitect imports annotation files, allowing the user to associate identified promoters with genes and other genomic features. Three detailed examples of TSRchitect's utility are provided in the User's Guide, included with this package.

RITAN Rapid Integration of Term Annotation and Network resources

Tools for comprehensive gene set enrichment and extraction of multi-resource high confidence subnetworks.

pgca PGCA: An Algorithm to Link Protein Groups Created from MS/MS Data

Protein Group Code Algorithm (PGCA) is a computationally inexpensive algorithm to merge protein summaries from multiple experimental quantitative proteomics data. The algorithm connects two or more groups with overlapping accession numbers. In some cases, pairwise groups are mutually exclusive but they may still be connected by another group (or set of groups) with overlapping accession numbers. Thus, groups created by PGCA from multiple experimental runs (i.e., global groups) are called "connected" groups. These identified global protein groups enable the analysis of quantitative data available for protein groups instead of unique protein identifiers.

ATACseqQC ATAC-seq Quality Control

ATAC-seq, an assay for Transposase-Accessible Chromatin using sequencing, is a rapid and sensitive method for chromatin accessibility analysis. It was developed as an alternative method to MNase-seq, FAIRE-seq and DNAse-seq. Comparing to the other methods, ATAC-seq requires less amount of the biological samples and time to process. In the process of analyzing several ATAC-seq dataset produced in our labs, we learned some of the unique aspects of the quality assessment for ATAC-seq data.To help users to quickly assess whether their ATAC-seq experiment is successful, we developed ATACseqQC package partially following the guideline published in Nature Method 2013 (Greenleaf et al.), including diagnostic plot of fragment size distribution, proportion of mitochondria reads, nucleosome positioning pattern, and CTCF or other Transcript Factor footprints.

heatmaps Flexible Heatmaps for Functional Genomics and Sequence Features

This package provides functions for plotting heatmaps of genome-wide data across genomic intervals, such as ChIP-seq signals at peaks or across promoters. Many functions are also provided for investigating sequence features.

ideal Interactive Differential Expression AnaLysis

This package provides functions for an Interactive Differential Expression AnaLysis of RNA-sequencing datasets, to extract quickly and effectively information downstream the step of differential expression. A Shiny application encapsulates the whole package.

PPInfer Inferring functionally related proteins using protein interaction networks

Interactions between proteins occur in many, if not most, biological processes. Most proteins perform their functions in networks associated with other proteins and other biomolecules. This fact has motivated the development of a variety of experimental methods for the identification of protein interactions. This variety has in turn ushered in the development of numerous different computational approaches for modeling and predicting protein interactions. Sometimes an experiment is aimed at identifying proteins closely related to some interesting proteins. A network based statistical learning method is used to infer the putative functions of proteins from the known functions of its neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions.

cellbaseR Querying annotation data from the high performance Cellbase web services

This R package makes use of the exhaustive RESTful Web service API that has been implemented for the Cellabase database. It enable researchers to query and obtain a wealth of biological information from a single database saving a lot of time. Another benefit is that researchers can easily make queries about different biological topics and link all this information together as all information is integrated.

metavizr R Interface to the metaviz web app for interactive metagenomics data analysis and visualization

This package provides Websocket communication to the metaviz web app (http://metaviz.cbcb.umd.edu) for interactive visualization of metagenomics data. Objects in R/bioc interactive sessions can be displayed in plots and data can be explored using a facetzoom visualization. Fundamental Bioconductor data structures are supported (e.g., MRexperiment objects), while providing an easy mechanism to support other data structures. Visualizations (using d3.js) can be easily added to the web app as well.

BLMA BLMA: A package for bi-level meta-analysis

Suit of tools for bi-level meta-analysis. The package can be used in a wide range of applications, including general hypothesis testings, differential expression analysis, functional analysis, and pathway analysis.

biotmle Targeted Learning for Biomarker Discovery with Moderated Statistics

This package facilitates the discovery of biomarkers from biological sequencing data (e.g., microarrays, RNA-seq) based on the associations of potential biomarkers with exposure and outcome variables by implementing an estimation procedure that combines a generalization of the moderated t-statistic with asymptotically linear statistical parameters estimated via targeted minimum loss-based estimation (TMLE).

methylInheritance Permutation-Based Analysis associating Conserved Differentially Methylated Elements from One Generation to the Next to a Treatment Effect

Permutation analysis, based on Monte Carlo sampling, for testing the hypothesis that the number of conserved differentially methylated elements, between several generations, is associated to an effect inherited from a treatment and that stochastic effect can be dismissed.

ImpulseDE2 Differential expression analysis of longitudinal count data sets

ImpulseDE2 is a differential expression algorithm for longitudinal count data sets which arise in sequencing experiments such as RNA-seq, ChIP-seq, ATAC-seq and DNaseI-seq. ImpulseDE2 is based on a negative binomial noise model with dispersion trend smoothing by DESeq2 and uses the impulse model to constrain the mean expression trajectory of each gene. The impulse model was empirically found to fit global expression changes in cells after environmental and developmental stimuli and is therefore appropriate in most cell biological scenarios. The constraint on the mean expression trajectory prevents overfitting to small expression fluctuations. Secondly, ImpulseDE2 has higher statistical testing power than generalized linear model-based differential expression algorithms which fit time as a categorial variable if more than six time points are sampled because of the fixed number of parameters.

semisup Detecting SNPs with interactive effects on a quantitative trait

This R packages moves away from testing interaction terms, and move towards testing whether an individual SNP is involved in any interaction. This reduces the multiple testing burden to one test per SNP, and allows for interactions with unobserved factors. Analysing one SNP at a time, it splits the individuals into two groups, based on the number of minor alleles. If the quantitative trait differs in mean between the two groups, the SNP has a main effect. If the quantitative trait differs in distribution between some individuals in one group and all other individuals, it possibly has an interactive effect. Implicitly, the membership probabilities may suggest potential interacting variables.

IMAS Integrative analysis of Multi-omics data for Alternative Splicing

Integrative analysis of Multi-omics data for Alternative splicing.

GenomicScores Infrastructure to work with genomewide position-specific scores

Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.

CATALYST Cytometry dATa anALYSis Tools

Mass cytometry (CyTOF) uses heavy metal isotopes rather than fluorescent tags as reporters to label antibodies, thereby substantially decreasing spectral overlap and allowing for examination of over 50 parameters at the single cell level. While spectral overlap is significantly less pronounced in CyTOF than flow cytometry, spillover due to detection sensitivity, isotopic impurities, and oxide formation can impede data interpretability. We designed CATALYST (Cytometry dATa anALYSis Tools) to provide a pipeline for preprocessing of cytometry data, including i) normalization using bead standards, ii) single-cell deconvolution, and iii) bead-based compensation.

basecallQC Working with Illumina Basecalling and Demultiplexing input and output files

The basecallQC package provides tools to work with Illumina bcl2Fastq (versions >= 2.1.7) software.Prior to basecalling and demultiplexing using the bcl2Fastq software, basecallQC functions allow the user to update Illumina sample sheets from versions <= 1.8.9 to >= 2.1.7 standards, clean sample sheets of common problems such as invalid sample names and IDs, create read and index basemasks and the bcl2Fastq command. Following the generation of basecalled and demultiplexed data, the basecallQC packages allows the user to generate HTML tables, plots and a self contained report of summary metrics from Illumina XML output files.

RaggedExperiment Representation of Sparse Experiments and Assays Across Samples

This package provides a flexible representation of copy number, mutation, and other data that fit into the ragged array schema for genomic location data. The basic representation of such data provides a rectangular flat table interface to the user with range information in the rows and samples/specimen in the columns.

banocc Bayesian ANalysis Of Compositional Covariance

BAnOCC is a package designed for compositional data, where each sample sums to one. It infers the approximate covariance of the unconstrained data using a Bayesian model coded with `rstan`. It provides as output the `stanfit` object as well as posterior median and credible interval estimates for each correlation element.

BioMedR Generating Various Molecular Representations for Chemicals, Proteins, DNAs/RNAs and Their Interactions

The BioMedR package offers an R/Bioconductor package generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions.

RIVER R package for RIVER (RNA-Informed Variant Effect on Regulation)

An implementation of a probabilistic modeling framework that jointly analyzes personal genome and transcriptome data to estimate the probability that a variant has regulatory impact in that individual. It is based on a generative model that assumes that genomic annotations, such as the location of a variant with respect to regulatory elements, determine the prior probability that variant is a functional regulatory variant, which is an unobserved variable. The functional regulatory variant status then influences whether nearby genes are likely to display outlier levels of gene expression in that person. See the RIVER website for more information, documentation and examples.

GenomicDataCommons NIH / NCI Genomic Data Commons Access

Programmatically access the NIH / NCI Genomic Data Commons RESTful service.

NADfinder Call wide peaks for sequencing data

Nucleolus is an important structure inside the nucleus in eukaryotic cells. It is the site for transcribing rDNA into rRNA and for assembling ribosomes, aka ribosome biogenesis. In addition, nucleoli are dynamic hubs through which numerous proteins shuttle and contact specific non-rDNA genomic loci. Deep sequencing analyses of DNA associated with isolated nucleoli (NAD- seq) have shown that specific loci, termed nucleolar- associated domains (NADs) form frequent three- dimensional associations with nucleoli. NAD-seq has been used to study the biological functions of NAD and the dynamics of NAD distribution during embryonic stem cell (ESC) differentiation. Here, we developed a Bioconductor package NADfinder for bioinformatic analysis of the NAD-seq data, including normalization, smoothing, peak calling, peak trimming and annotation.

AnnotationFilter Facilities for Filtering Bioconductor Annotation Resources

This package provides class and other infrastructure to implement filters for manipulating Bioconductor annotation resources. The filters will be used by ensembldb, Organism.dplyr, and other packages.

wiggleplotr Make read coverage plots from BigWig files

Tools to visualise read coverage from sequencing experiments together with genomic annotations (genes, transcripts, peaks). Introns of long transcripts can be rescaled to a fixed length for better visualisation of exonic read coverage.

flowTime Annotation and analysis of biological dynamical systems using flow cytometry

This package was developed for analysis of both dynamic and steady state experiments examining the function of gene regulatory networks in yeast (strain W303) expressing fluorescent reporter proteins using a BD Accuri C6 and SORP cytometers. However, the functions are for the most part general and may be adapted for analysis of other organisms using other flow cytometers. Functions in this package facilitate the annotation of flow cytometry data with experimental metadata, as is requisite for dissemination and general ease-of-use. Functions for creating, saving and loading gate sets are also included. In the past, we have typically generated summary statistics for each flowset for each timepoint and then annotated and analyzed these summary statistics. This method loses a great deal of the power that comes from the large amounts of individual cell data generated in flow cytometry, by essentially collapsing this data into a bulk measurement after subsetting. In addition to these summary functions, this package also contains functions to facilitate annotation and analysis of steady-state or time-lapse data utilizing all of the data collected from the thousands of individual cells in each sample.

STROMA4 Assign Properties to TNBC Patients

This package estimates four stromal properties identified in TNBC patients in each patient of a gene expression datasets. These stromal property assignments can be combined to subtype patients. These four stromal properties were identified in Triple negative breast cancer (TNBC) patients and represent the presence of different cells in the stroma: T-cells (T), B-cells (B), stromal infiltrating epithelial cells (E), and desmoplasia (D). Additionally this package can also be used to estimate generative properties for the Lehmann subtypes, an alternative TNBC subtyping scheme (PMID: 21633166).

GA4GHclient A Bioconductor package for accessing GA4GH API data servers

GA4GHclient provides an easy way to access public data servers through Global Alliance for Genomics and Health (GA4GH) genomics API. It provides low-level access to GA4GH API and translates response data into Bioconductor-based class objects.

netReg Network-Regularized Regression Models

netReg fits linear regression models using network-penalization. Graph prior knowledge, in the form of biological networks, is being incorporated into the likelihood of the linear model. The networks describe biological relationships such as co-regulation or dependency of the same transcription factors/metabolites/etc. yielding a part sparse and part smooth solution for coefficient profiles.

cydar Using Mass Cytometry for Differential Abundance Analyses

Identifies differentially abundant populations between samples and groups in mass cytometry data. Provides methods for counting cells into hyperspheres, controlling the spatial false discovery rate, and visualizing changes in abundance in the high-dimensional marker space.

motifcounter R package for analysing TFBSs in DNA sequences

'motifcounter' provides functionality to compute the statistics related with motif matching and counting of motif matches in DNA sequences. As an input, 'motifcounter' requires a motif in terms of a position frequency matrix (PFM). Furthermore, a set of DNA sequences is required to estimated a higher-order background model (BGM). The package provides functions to investigate the the per-position and per strand log-likelihood scores between the PFM and the BGM across a given sequence of set of sequences. Furthermore, the package facilitates motif matching based on an automatically derived score threshold. To this end the distribution of scores is efficiently determined and the score threshold is chosen for a user-prescribed significance level. This allows to control for the false positive rate. Moreover, 'motifcounter' implements a motif match enrichment test based on two the number of motif matches that are expected in random DNA sequences. Motif enrichment is facilitated by either a compound Poisson approximation or a combinatorial approximation of the motif match counts. Both models take higher-order background models, the motif's self-similarity, and hits on both DNA strands into account. The package is in particular useful for long motifs and/or relaxed choices of score thresholds, because the implemented algorithms efficiently bypass the need for enumerating a (potentially huge) set of DNA words that can give rise to a motif match.

BiocFileCache Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.

BioCor Functional similarities

Calculates functional similarities based on the pathways described on KEGG and REACTOME or in gene sets. These similarities can be calculated for pathways or gene sets, genes, or clusters and combined with other similarities. They can be used to improve networks, gene selection, testing relationships...

DaMiRseq Data Mining for RNA-seq data: normalization, feature selection and classification

The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for classification purposes.. The package accepts any kind of data presented as a table of raw counts and allows including covariates that occur with the experimental setting. A series of functions enable the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing the unwanted source of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a ``Stacking'' ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plot.

branchpointer Prediction of intronic splicing branchpoints

Predicts branchpoint probability for sites in intronic branchpoint windows. Queries can be supplied as intronic regions; or to evaluate the effects of mutations, SNPs.

MaxContrastProjection Perform a maximum contrast projection of 3D images along the z-dimension into 2D

A problem when recording 3D fluorescent microscopy images is how to properly present these results in 2D. Maximum intensity projections are a popular method to determine the focal plane of each pixel in the image. The problem with this approach, however, is that out-of-focus elements will still be visible, making edges and fine structures difficult to detect. This package aims to resolve this problem by using the contrast around a given pixel to determine the focal plane, allowing for a much cleaner structure detection than would be otherwise possible. For convenience, this package also contains functions to perform various other types of projections, including a maximum intensity projection.

multiOmicsViz Plot the effect of one omics data on other omics data along the chromosome

Calculate the spearman correlation between the source omics data and other target omics data, identify the significant correlations and plot the significant correlations on the heat map in which the x-axis and y-axis are ordered by the chromosomal location.

pathprint Pathway fingerprinting for analysis of gene expression arrays

Algorithms to convert a gene expression array provided as an expression table or a GEO reference to a 'pathway fingerprint', a vector of discrete ternary scores representing high (1), low(-1) or insignificant (0) expression in a suite of pathways.

IntEREst Intron-Exon Retention Estimator

This package performs Intron-Exon Retention analysis on RNA-seq data (.bam files).

epiNEM epiNEM

epiNEM is an extension of the original Nested Effects Models (NEM). EpiNEM is able to take into account double knockouts and infer more complex network signalling pathways.

hicrep Measuring the reproducibility of Hi-C data

Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance-dependence. We present a novel reproducibility measure that systematically takes these features into consideration. This measure can assess pairwise differences between Hi-C matrices under a wide range of settings, and can be used to determine optimal sequencing depth. Compared to existing approaches, it consistently shows higher accuracy in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages than existing approaches. This R package `hicrep` implements our approach.

GISPA GISPA: Method for Gene Integrated Set Profile Analysis

GISPA is a method intended for the researchers who are interested in defining gene sets with similar, a priori specified molecular profile. GISPA method has been previously published in Nucleic Acid Research (Kowalski et al., 2016; PMID: 26826710).

rqt rqt: utilities for gene-level meta-analysis

Despite the recent advances of modern GWAS methods, it still remains an important problem of addressing calculation an effect size and corresponding p-value for the whole gene rather than for single variant. The R- package rqt offers gene-level GWAS meta-analysis. For more information, see: "Gene-set association tests for next-generation sequencing data" by Lee et al (2016), Bioinformatics, 32(17), i611-i619, .

MIGSA Massive and Integrative Gene Set Analysis

Massive and Integrative Gene Set Analysis. The MIGSA package allows to perform a massive and integrative gene set analysis over several expression and gene sets simultaneously. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required to perform both singular and gene set enrichment analyses in an integrative manner by means of the best available methods, i.e. dEnricher and mGSZrespectively. The greatest strengths of this big omics data tool are the availability of several functions to explore, analyze and visualize its results in order to facilitate the data mining task over huge information sources. MIGSA package also provides several functions that allow to easily load the most updated gene sets from several repositories.

Organism.dplyr dplyr-based Access to Bioconductor Annotation Resources

This package provides an alternative interface to Bioconductor 'annotation' resources, in particular the gene identifier mapping functionality of the 'org' packages (e.g., org.Hs.eg.db) and the genome coordinate functionality of the 'TxDb' packages (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).

EventPointer An effective identification of alternative splicing events using junction arrays and RNA-Seq data

EventPointer is an R package to identify alternative splicing events that involve either simple (case-control experiment) or complex experimental designs such as time course experiments and studies including paired-samples. The algorithm can be used to analyze data from either junction arrays (Affymetrix Arrays) or sequencing data (RNA-Seq). The software returns a data.frame with the detected alternative splicing events: gene name, type of event (cassette, alternative 3',...,etc), genomic position, statistical significance and increment of the percent spliced in (Delta PSI) for all the events. The algorithm can generate a series of files to visualize the detected alternative splicing events in IGV. This eases the interpretation of results and the design of primers for standard PCR validation.

twoddpcr Classify 2-d Droplet Digital PCR (ddPCR) data and quantify the number of starting molecules

The twoddpcr package takes Droplet Digital PCR (ddPCR) droplet amplitude data from Bio-Rad's QuantaSoft and can classify the droplets. A summary of the positive/negative droplet counts can be generated, which can then be used to estimate the number of molecules using the Poisson distribution. This is the first open source package that facilitates the automatic classification of general two channel ddPCR data. Previous work includes 'definetherain' (Jones et al., 2014) and 'ddpcRquant' (Trypsteen et al., 2015) which both handle one channel ddPCR experiments only. The 'ddpcr' package available on CRAN (Attali et al., 2016) supports automatic gating of a specific class of two channel ddPCR experiments only.

timescape Patient Clonal Timescapes

TimeScape is an automated tool for navigating temporal clonal evolution data. The key attributes of this implementation involve the enumeration of clones, their evolutionary relationships and their shifting dynamics over time. TimeScape requires two inputs: (i) the clonal phylogeny and (ii) the clonal prevalences. Optionally, TimeScape accepts a data table of targeted mutations observed in each clone and their allele prevalences over time. The output is the TimeScape plot showing clonal prevalence vertically, time horizontally, and the plot height optionally encoding tumour volume during tumour-shrinking events. At each sampling time point (denoted by a faint white line), the height of each clone accurately reflects its proportionate prevalence. These prevalences form the anchors for bezier curves that visually represent the dynamic transitions between time points.

cellscape Explores single cell copy number profiles in the context of a single cell tree

CellScape facilitates interactive browsing of single cell clonal evolution datasets. The tool requires two main inputs: (i) the genomic content of each single cell in the form of either copy number segments or targeted mutation values, and (ii) a single cell phylogeny. Phylogenetic formats can vary from dendrogram-like phylogenies with leaf nodes to evolutionary model-derived phylogenies with observed or latent internal nodes. The CellScape phylogeny is flexibly input as a table of source-target edges to support arbitrary representations, where each node may or may not have associated genomic data. The output of CellScape is an interactive interface displaying a single cell phylogeny and a cell-by-locus genomic heatmap representing the mutation status in each cell for each locus.

mapscape mapscape

MapScape integrates clonal prevalence, clonal hierarchy, anatomic and mutational information to provide interactive visualization of spatial clonal evolution. There are four inputs to MapScape: (i) the clonal phylogeny, (ii) clonal prevalences, (iii) an image reference, which may be a medical image or drawing and (iv) pixel locations for each sample on the referenced image. Optionally, MapScape can accept a data table of mutations for each clone and their variant allele frequencies in each sample. The output of MapScape consists of a cropped anatomical image surrounded by two representations of each tumour sample. The first, a cellular aggregate, visually displays the prevalence of each clone. The second shows a skeleton of the clonal phylogeny while highlighting only those clones present in the sample. Together, these representations enable the analyst to visualize the distribution of clones throughout anatomic space.

Logolas Flexible and Customized Logo Plots using symbols, alphabets, numbers and alphanumeric strings

Produces logo plots of a variety of symbols and names comprising English alphabets, numerics and punctuations. Can be used for sequence motif generation, mutation pattern generation, protein amino acid geenration and symbol strength representation in any generic context.

TCGAbiolinksGUI "TCGAbiolinksGUI: A Graphical User Interface to analyze cancer molecular and clinical data"

"TCGAbiolinksGUI: A Graphical User Interface to analyze cancer molecular and clinical data. A demo version of GUI is found in https://tcgabiolinksgui.shinyapps.io/tcgabiolinks/"

swfdr Science-wise false discovery rate and proportion of true null hypotheses estimation

This package allows users to estimate the science-wise false discovery rate from Jager and Leek, "Empirical estimates suggest most published medical research is true," 2013, Biostatistics, using an EM approach due to the presence of rounding and censoring. It also allows users to estimate the proportion of true null hypotheses in the presence of covariates, using a regression framework, as per Boca and Leek, "A regression framework for the proportion of true null hypotheses," 2015, bioRxiv preprint.

REMP Repetitive Element Methylation Prediction

Machine learing-based tools to predict DNA methylation of locus-specific repetitive elements (RE) by learning surrounding genetic and epigenetic information. These tools provide genomewide and single-base resolution of DNA methylation prediction on RE that are difficult to measure using array-based or sequencing-based platforms, which enables epigenome-wide association study (EWAS) and differentially methylated region (DMR) analysis on RE.

geneClassifiers Application of gene classifiers

This packages aims for easy accessible application of classifiers which have been published in literature using an ExpressionSet as input.

sampleClassifier Sample Classifier

The package is designed to classify gene expression profiles.

RTNduals Analysis of co-regulatory network motifs and inference of 'dual regulons'

RTNduals is a tool that searches for possible co-regulatory loops between regulon pairs generated by the RTN package. It compares the shared targets in order to infer 'dual regulons', a new concept that tests whether regulon pairs agree on the predicted downstream effects.

MCbiclust Massive correlating biclusters for gene expression data and associated methods

Custom made algorithm and associated methods for finding, visualising and analysing biclusters in large gene expression data sets. Algorithm is based on with a supplied gene set of size n, finding the maximum strength correlation matrix containing m samples from the data set.

POST Projection onto Orthogonal Space Testing for High Dimensional Data

Perform orthogonal projection of high dimensional data of a set, and statistical modeling of phenotye with projected vectors as predictor.

discordant The Discordant Method: A Novel Approach for Differential Correlation

Discordant is a method to determine differential correlation of molecular feature pairs from -omics data using mixture models. Algorithm is explained further in Siska et al.

karyoploteR Plot customizable linear genomes displaying arbitrary data

karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimicks many R base graphics functions coupling them with a coordinate change function automatically mapping the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.

splatter Simple Simulation of Single-cell RNA Sequencing Data

Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented. Parameters can be estimated from real data and functions are provided for comparing real and simulated datasets.

Source Code & Build Reports »

Source code is stored in svn (user: readonly, pass: readonly).

Software packages are built and checked nightly. Build reports:

 

Development Version »

Bioconductor packages under development:

Developer Resources: