flowFit Estimate proliferation in cell-tracking dye studies
This package estimates the proliferation of a cell population in cell-tracking dye studies. The package uses an R implementation of the Levenberg-Marquardt algorithm (minpack.lm) to fit a set of peaks (corresponding to different generations of cells) over the proliferation-tracking dye distribution in a FACS experiment.
clonotypeR Identify and analyse B and T cell receptors at a high throughput.
Identify and analyse B and T cell receptors at a high throughput. The genes encoding T cell receptors and B cell receptors (the antibodies) are created by somatic recombination, generating an immense combination of V, (D) and J segments. Additional processes during the recombination create extra sequence diversity between the V and J segments. Collectively, this hyper-variable region is called the CDR3 loop. The purpose of this package is to process and quantitatively analyse millions of V-CDR3-J combinations, called clonotypes, from multiple libraries.
XVector Representation and manipulation of external sequences
Memory efficient S4 classes for storing sequences "externally" (behind an R external pointer, or on disk).
sSeq Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size
The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method-of-moments (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated and used to test the hypothesis that the expected expression of a gene in the two conditions follows the same NB distribution.
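As a rough illustration of the shrinkage idea described for sSeq, the sketch below computes a method-of-moments dispersion from the NB relation var = mu + phi * mu^2 and then pulls it towards a target. The linear form of the shrinkage, the weight `delta` and both function names are assumptions made for this example; they are not sSeq's actual estimator or API, and the choice of target is not reproduced here.

```python
from statistics import mean, variance


def mm_dispersion(counts):
    """Method-of-moments NB dispersion for one gene.

    Uses var = mu + phi * mu^2, so phi = (s^2 - mu) / mu^2,
    truncated at zero when the sample variance is below the mean.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    return max((s2 - mu) / mu ** 2, 0.0)


def shrink(phi_mm, target, delta):
    """Hypothetical linear shrinkage of phi_mm towards target.

    delta = 0 keeps the initial estimate, delta = 1 returns the target.
    """
    return delta * target + (1 - delta) * phi_mm


if __name__ == "__main__":
    phi = mm_dispersion([2, 4, 6, 8])
    print(phi, shrink(phi, target=0.1, delta=0.5))
```

With few replicates the per-gene estimate is noisy, which is why borrowing strength from a common target can help.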
SplicingGraphs Create, manipulate, visualize splicing graphs, and assign RNA-seq reads to them
This package allows the user to create, manipulate, and visualize splicing graphs and their bubbles based on a gene model for a given organism. Additionally it allows the user to assign RNA-seq reads to the edges of a set of splicing graphs, and to summarize them.
BiSeq Processing and analyzing bisulfite sequencing data
The BiSeq package provides useful classes and functions to handle and analyze targeted bisulfite sequencing (BS) data such as reduced-representation bisulfite sequencing (RRBS) data. In particular, it implements an algorithm to detect differentially methylated regions (DMRs). The package takes already aligned BS data from one or multiple samples.
triplex Search and visualize intramolecular triplex-forming sequences in DNA
This package provides functions for identification and visualization of potential intramolecular triplex patterns in DNA sequences. The main functionality is to detect the positions of subsequences capable of folding into an intramolecular triplex (H-DNA) in a much larger sequence. The potential H-DNA (triplexes) should be made of as many canonical nucleotide triplets as possible. The package includes visualization showing the exact base-pairing in 1D, 2D or 3D.
prebs Probe region expression estimation for RNA-seq data for improved microarray comparability
The prebs package aims at making RNA-sequencing (RNA-seq) data more comparable to microarray data. The comparability is achieved by summarizing sequencing-based expressions of probe regions using a modified version of the RMA algorithm. The pipeline takes mapped reads in BAM format as input and produces either gene expressions or original microarray probe set expressions as output.
piano Platform for integrative analysis of omics data
Piano performs gene set analysis using various statistical methods, from different gene level statistics and a wide range of gene-set collections. Furthermore, the Piano package contains functions for combining the results of multiple runs of gene set analyses.
MMDiff Statistical Testing for ChIP-Seq data sets
This package detects statistically significant differences between read enrichment profiles in different ChIP-Seq samples. To take advantage of shape differences it uses kernel methods (Maximum Mean Discrepancy, MMD).
GENE.E Interact with GENE-E from R
Interactive exploration of matrices in GENE-E.
PAPi Predict metabolic pathway activity based on metabolomics data
The Pathway Activity Profiling - PAPi - is an R package for predicting the activity of metabolic pathways based solely on a metabolomics data set containing a list of metabolites identified and their respective abundances in different biological samples. PAPi generates hypotheses that improve the final biological interpretation. See Aggio, R.B.M; Ruggiero, K. and Villas-Boas, S.G. (2010) - Pathway Activity Profiling (PAPi): from metabolite profile to metabolic pathway activity. Bioinformatics.
antiProfiles Implementation of gene expression anti-profiles
Implements gene expression anti-profiles as described in Corrada Bravo et al., BMC Bioinformatics 2012, 13:272 doi:10.1186/1471-2105-13-272.
copynumber Segmentation of single- and multi-track copy number data by penalized least squares regression.
Penalized least squares regression is applied to fit piecewise constant curves to copy number data to locate genomic regions of constant copy number. Procedures are available for individual segmentation of each sample, joint segmentation of several samples and joint segmentation of the two data tracks from SNP-arrays. Several plotting functions are available for visualization of the data and the segmentation results.
geNetClassifier classify diseases and build associated gene networks using gene expression profiles
Comprehensive package to automatically train a multi-class SVM classifier based on gene expression data. Provides transparent selection of gene markers, their coexpression networks, and an interface to query the classifier.
SeqGSEA Gene Set Enrichment Analysis (GSEA) of RNA-Seq Data: integrating differential expression and splicing
The package provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses the negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, the statistical significance of each gene's differential expression and splicing can also be assessed separately.
DASiR Distributed Annotation System in R
R package for programmatic retrieval of information from DAS servers
metagenomeSeq Statistical analysis for sparse high-throughput sequencing
metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
AnnotationHub A client for retrieving Bioconductor objects from AnnotationHub
A client for retrieving data from the Bioconductor AnnotationHub online services.
ROntoTools R Onto-Tools suite
Suite of tools for functional analysis
eiR Accelerated similarity searching of small molecules
The eiR package provides utilities for accelerated structure similarity searching of very large small molecule data sets using an embedding and indexing approach.
CNORfeeder Integration of CellNOptR to add missing links
This package integrates literature-constrained and data-driven methods to infer signalling networks from perturbation experiments. It extends a given network with links derived from the data via various inference methods, and uses information on physical interactions of proteins to guide and validate the integration of links.
BaseSpaceR R SDK for BaseSpace RESTful API
A rich R interface to Illumina's BaseSpace cloud computing environment, enabling the fast development of data analysis and visualisation tools.
SPEM S-system parameter estimation method
This package can optimize the parameters in S-system models given time series data.
SeqArray Big Data Management of Genome-wide Sequencing Variants
Big data management of genome-wide variants using the CoreArray library, where genotypic data and annotations are stored in an array-oriented manner, offering efficient access of genetic variants using the R language.
RNASeqPower Sample size for RNAseq studies
Sample size calculations for RNA-seq studies.
pathview a tool set for pathway based data integration and visualization
Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview integrates seamlessly with pathway and gene set analysis tools for large-scale and fully automated analysis.
jmosaics Joint analysis of multiple ChIP-Seq data sets
jmosaics detects enriched regions of ChIP-seq data sets jointly.
HCsnip Semi-supervised adaptive-height snipping of the Hierarchical Clustering tree
Decomposes a given hierarchical clustering tree into non-overlapping clusters in a semi-supervised way by using available patient follow-up information as guidance. Contains functions for snipping the HC tree, various cluster quality evaluation criteria, assigning new patients to one of the two given HC trees, testing the significance of clusters with a permutation argument, and cluster visualization using each sample's molecular entropy.
epigenomix Epigenetic and gene expression data normalization and integration with mixture models
A package for the integrative analysis of microarray based gene expression and histone modification data obtained by ChIP-seq. The package provides methods for data preprocessing and matching as well as methods for fitting Bayesian mixture models in order to detect genes with differences in both data types.
dexus DEXUS - Identifying Differential Expression in RNA-Seq Studies with Unknown Conditions or without Replicates
DEXUS identifies differentially expressed genes in RNA-Seq data under all possible study designs such as studies without replicates, without sample groups, and with unknown conditions. DEXUS also works for known conditions, for example for RNA-Seq data with two or multiple conditions. RNA-Seq read count data can be provided both by the S4 class Count Data Set and by read count matrices. Differentially expressed transcripts can be visualized by heatmaps, in which unknown conditions, replicates, and sample groups are also indicated. This software is fast since the core algorithm is written in C. For very large data sets, a parallel version of DEXUS is provided in this package. DEXUS is a statistical model that is selected in a Bayesian framework by an EM algorithm. DEXUS does not need replicates to detect differentially expressed transcripts, since the replicates (or conditions) are estimated by the EM method for each transcript. The method provides an informative/non-informative value to extract differentially expressed transcripts at a desired significance level or power.
Identifying distinct subpopulations through multiscale time series analysis
ARRmNormalization Adaptive Robust Regression normalization for Illumina methylation data
Perform the Adaptive Robust Regression method (ARRm) for the normalization of methylation data from the Illumina Infinium HumanMethylation 450k assay.
gCMAPWeb A web interface for gene-set enrichment analyses
The gCMAPWeb R package provides a graphical user interface for the gCMAP package. gCMAPWeb uses the Rook package and can be used either on a local machine, leveraging R's internal web server, or run on a dedicated rApache web server installation. gCMAPWeb allows users to search their own data sources, and instructions to generate reference datasets from public repositories are included with the package. The package supports three common types of analyses, specifically queries with 1. one or two sets of query gene identifiers, whose members are expected to show changes in gene expression in a consistent direction. For example, an up-regulated gene set might contain genes activated by a transcription factor, a down-regulated gene set targets repressed by the same factor. 2. a single set of query gene identifiers, whose members are expected to show divergent differential expression (non-directional query). For example, members of a particular signaling pathway, some of which may be up- and some down-regulated in response to a stimulus. 3. a query with the complete results of a differential expression profiling experiment. For example, gene identifiers and z-scores from a previous perturbation experiment. gCMAPWeb accepts three types of identifiers: Entrez IDs, gene symbols and microarray probe IDs, and can be configured to work with any species supported by Bioconductor. For each query submission, significantly similar reference datasets will be identified and reported in graphical and tabular form.
SANTA Spatial Analysis of Network Associations
This package provides methods for measuring the strength of association between a network and a phenotype. It does this by measuring clustering of the phenotype across the network. Vertices can also be individually ranked by their strength of association with high-weight vertices.
proteinProfiles Protein Profiling
Significance assessment for distance measures of time-course protein profiles
DESeq2 Differential gene expression analysis based on the negative binomial distribution
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution
chimera A package for detection and secondary analysis of fusion products
This package facilitates the characterisation of fusion product events. It imports fusion results from the following fusion finders: bellerophontes, deFuse, FusionFinder, FusionHunter, mapSplice, tophat-fusion, FusionMap, STAR.
CAGEr Analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining
Preprocessing of CAGE sequencing data, identification and normalization of transcription start sites and downstream analysis of transcription start sites clusters (promoters).
biomvRCNS Copy Number study and Segmentation for multivariate biological data
In this package, a Hidden Semi Markov Model (HSMM) and one homogeneous segmentation model are designed and implemented for segmenting genomic data, with the aim of assisting in transcript detection using high throughput technology like RNA-seq or tiling array, and copy number analysis using aCGH or sequencing.
cisPath Visualization of the Shortest Functional Paths between Proteins.
cisPath is an R package for identification and visualization of the shortest functional paths between proteins in the protein-protein interaction network.
MineICA Analysis of an ICA decomposition obtained on genomics data
The goal of MineICA is to make it easier to interpret a decomposition obtained by Independent Component Analysis on transcriptomic data. It helps the biological interpretation of the components by studying their association with variables (e.g. sample annotations) and gene sets, and enables the comparison of components from different datasets using correlation-based graphs.
lpNet Linear Programming Model for Network Inference
lpNet takes perturbation data as input and generates an LP model which allows the inference of signaling networks. For parameter identification either leave-one-out cross-validation or stratified n-fold cross-validation can be used.
SomatiCA SomatiCA: identifying, characterizing, and quantifying somatic copy number aberrations from cancer genome sequencing
SomatiCA is a software suite that is capable of identifying, characterizing, and quantifying somatic CNAs from cancer genome sequencing. First, it uses read depths and lesser allele frequencies (LAF) from mapped short sequence reads to segment the genome and identify candidate CNAs. Second, SomatiCA estimates the admixture rate from the relative copy-number profile of tumor-normal pair by a Bayesian finite mixture model. Third, SomatiCA quantifies absolute somatic copy-number and subclonality for each genomic segment to guide its characterization. Results from SomatiCA can be further integrated with single nucleotide variations (SNVs) to get a better understanding of the tumor evolution.
rBiopaxParser Parses BioPax files and represents them in R
Parses BioPAX files and represents them in R. At the moment, BioPAX level 2 and level 3 are supported.
HTSFilter Filter replicated high-throughput transcriptome sequencing data
This package implements a filtering procedure for replicated transcriptome sequencing data based on a global Jaccard similarity index in order to identify genes with low, constant levels of expression across one or more experimental conditions.
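To make the Jaccard-based filtering idea concrete, here is a minimal sketch for two replicates: counts are binarized at a threshold, the Jaccard index compares which genes exceed it in each replicate, and a threshold is picked from a candidate list by maximizing that index. Choosing the threshold by maximization, restricting to a single pair of replicates, and the function names are all assumptions for illustration; this is not HTSFilter's interface.

```python
def jaccard(rep_a, rep_b, threshold):
    """Jaccard index of the gene sets with counts above threshold.

    rep_a and rep_b are per-gene counts for two replicates, in the
    same gene order. Returns 0.0 when no gene exceeds the threshold.
    """
    above_a = {i for i, x in enumerate(rep_a) if x > threshold}
    above_b = {i for i, x in enumerate(rep_b) if x > threshold}
    union = above_a | above_b
    if not union:
        return 0.0
    return len(above_a & above_b) / len(union)


def best_threshold(rep_a, rep_b, candidates):
    """Candidate threshold giving the highest replicate similarity."""
    return max(candidates, key=lambda s: jaccard(rep_a, rep_b, s))


if __name__ == "__main__":
    a = [0, 1, 50, 60, 3]
    b = [1, 0, 55, 40, 0]
    s = best_threshold(a, b, [0, 5, 45])
    # Genes at or below the chosen threshold in both replicates
    # would be the ones filtered out.
    print(s, jaccard(a, b, s))
```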
MethylSeekR Segmentation of Bis-seq data
This is a package for the discovery of regulatory regions from Bis-seq data
DrugVsDisease Comparison of disease and drug profiles using Gene set Enrichment Analysis
This package generates ranked lists of differential gene expression for either disease or drug profiles. Input data can be downloaded from Array Express or GEO, or from local CEL files. Ranked lists of differential expression and associated p-values are calculated using Limma. Enrichment scores (Subramanian et al. PNAS 2005) are calculated to a reference set of default drug or disease profiles, or a set of custom data supplied by the user. A network visualisation of significant scores is output in Cytoscape format.
PathNet An R package for pathway analysis using topological information
PathNet uses topological information present in pathways and differential expression levels of genes (obtained from microarray experiment) to identify pathways that are 1) significantly enriched and 2) associated with each other in the context of differential expression. The algorithm is described in: PathNet: A tool for pathway analysis using topological information. Dutta B, Wallqvist A, and Reifman J. Source Code for Biology and Medicine 2012 Sep 24;7(1):10.
clipper Gene set analysis exploiting pathway topology
clipper is a package for topological gene set analysis. It implements a two-step empirical approach based on the exploitation of graph decomposition into a junction tree to reconstruct the most relevant signal path. In the first step clipper selects significant pathways according to statistical tests on the means and the concentration matrices of the graphs derived from pathway topologies. Then, it "clips" the whole pathway identifying the signal paths having the greatest association with a specific phenotype.
casper Characterization of Alternative Splicing based on Paired-End Reads
Infer alternative splicing from paired-end RNA-seq data. The model is based on counting paths across exons, rather than pairwise exon connections, and estimates the fragment size and start distributions non-parametrically, which improves estimation precision.
KEGGREST Client-side REST access to KEGG
A package that provides a client interface to the KEGG REST server. Based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.
QuasR Quantify and Annotate Short Reads in R
This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest.
Rbowtie R bowtie wrapper
This package provides an R wrapper around the popular bowtie short read aligner and around SpliceMap, a de novo splice junction discovery and alignment tool. The package is used by the QuasR Bioconductor package. We recommend using the QuasR package instead of using Rbowtie directly.
rTANDEM Encapsulate X!Tandem in R.
This package encapsulates X!Tandem in R. In its most basic functionality, this package allows calling tandem(input) from R, just as tandem.exe /path/to/input.xml would be used to run X!Tandem from the command line. Classes are also provided for taxonomy and parameters objects, and methods are provided to convert xml files to R objects and vice versa. This package is the first step in an attempt to provide a reliable workflow for proteomics analysis in R.
RSVSim RSVSim: an R/Bioconductor package for the simulation of structural variations
RSVSim is a package for the simulation of deletions, insertions, inversions, tandem-duplications and translocations of various sizes in any genome available as FASTA-file or BSgenome data package. SV breakpoints can be placed uniformly across the whole genome, with a bias towards repeat regions and regions of high homology (for hg19) or at user-supplied coordinates.
iBMQ integrated Bayesian Modeling of eQTL data
integrated Bayesian Modeling of eQTL data
GraphPAC Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach.
Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model.
ensemblVEP R Interface to Ensembl Variant Effect Predictor
Query the Ensembl Variant Effect Predictor via the perl API
wateRmelon Illumina 450 methylation array normalization and metrics
15 flavours of betas and three performance metrics, with methods for objects produced by methylumi, minfi and IMA packages.
bumphunter Bump Hunter
Tools for finding bumps in genomic data
DriverNet Drivernet: uncovering somatic driver mutations modulating transcriptional networks in cancer
DriverNet is a package to predict functionally important driver genes in cancer by integrating genome data (mutation and copy number variation data) and transcriptome data (gene expression data). The different kinds of data are combined by an influence graph, which is a gene-gene interaction network deduced from pathway data. A greedy algorithm is used to find the possible driver genes, which may be mutated in a larger number of patients; these mutations push the expression values of the connected genes to extreme values.
pRoloc A unifying bioinformatics framework for spatial proteomics
This package implements pattern recognition techniques on quantitative mass spectrometry data to infer protein sub-cellular localisation.
pvca Principal Variance Component Analysis (PVCA)
This package contains the function to assess the batch sources by fitting all "sources" as random effects including two-way interaction terms in the Mixed Model (depends on the lme4 package) to selected principal components, which were obtained from the original data correlation matrix. This package accompanies the book "Batch Effects and Noise in Microarray Experiments", chapter 12.
BiocParallel Bioconductor facilities for parallel evaluation
This package provides modified versions and novel implementations of functions for parallel evaluation, tailored to use with Bioconductor objects.
plrs Piecewise Linear Regression Splines (PLRS) for the association between DNA copy number and gene expression
The present package implements a flexible framework for modeling the relationship between DNA copy number and gene expression data using Piecewise Linear Regression Splines (PLRS).
UniProt.ws R Interface to UniProt Web Services
A collection of functions for retrieving, processing and repackaging data from the UniProt web services.
SNAGEE Signal-to-Noise applied to Gene Expression Experiments
Signal-to-Noise applied to Gene Expression Experiments. Signal-to-noise ratios can be used as a proxy for quality of gene expression studies and samples. The SNRs can be calculated on any gene expression data set as long as gene IDs are available; no access to the raw data files is necessary. This makes it possible to flag problematic studies and samples in any public data set.
illuminaio Parsing Illumina microarray output files
Tools for parsing Illumina's microarray output files, including IDAT.
RIPSeeker RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments
Infer and discriminate RIP peaks from RIP-seq alignments using two-state HMM with negative binomial emission probability. While RIPSeeker is specifically tailored for RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within this self-contained software package comprehensively addressing issues ranging from post-alignment processing to visualization and annotation.
matchBox Utilities to compute, compare, and plot the agreement between ordered vectors of features (i.e. distinct genomic experiments). The package includes Correspondence-At-the-TOP (CAT) analysis.
The matchBox package enables comparing ranked vectors of features, merging multiple datasets, removing redundant features, using CAT-plots and Venn diagrams, and computing statistical significance.
rSFFreader rSFFreader reads in sff files generated by Roche 454 and Life Sciences Ion Torrent sequencers
rSFFreader reads sequence, qualities and clip point values from sff files generated by Roche 454 and Life Sciences Ion Torrent sequencers. The plan is to also write out sff files and to read in flowgrams with some utils
chroGPS chroGPS: visualizing the epigenome
We provide intuitive maps to visualize the association between genetic elements, with emphasis on epigenetics. The approach is based on Multi-Dimensional Scaling. We provide several sensible distance metrics, and adjustment procedures to remove systematic biases typically observed when merging data obtained under different technologies or genetic backgrounds.
NOISeq Exploratory analysis and differential expression for RNA-seq data
Analysis of RNA-seq expression data or other similar kind of data. Exploratory plots to evaluate saturation, count distribution, expression per chromosome, type of detected features, features length, etc. Differential expression between two experimental conditions with no parametric assumptions.
TransView Read density map construction and accession. Visualization of ChIPSeq and RNASeq data sets.
This package provides efficient tools to generate, access and display read densities of sequencing based data sets such as from RNA-Seq and ChIP-Seq.
Rcade R-based analysis of ChIP-seq And Differential Expression - a tool for integrating a count-based ChIP-seq analysis with differential expression summary data.
Rcade (which stands for "R-based analysis of ChIP-seq And Differential Expression") is a tool for integrating ChIP-seq data with differential expression summary data, through a Bayesian framework. A key application is in identifying the genes targeted by a transcription factor of interest - that is, we collect genes that are associated with a ChIP-seq peak, and differential expression under some perturbation related to that TF.
HTSeqGenie An NGS analysis pipeline.
Libraries to perform NGS analysis.
HMMcopy Copy number prediction with correction for GC and mappability bias for HTS data
Corrects GC and mappability biases for readcounts (i.e. coverage) in non-overlapping windows of fixed length for single whole genome samples, yielding a rough estimate of copy number for further analysis. Designed for rapid correction of high coverage whole genome tumour and normal samples.
CNORode ODE add-on to CellNOptR
ODE add-on to CellNOptR
bigmemoryExtras An extension of the bigmemory package with added safety, convenience, and a factor class.
This package defines a "BigMatrix" ReferenceClass which adds safety and convenience features to the filebacked.big.matrix class from the bigmemory package. BigMatrix protects against segfaults by monitoring and gracefully restoring the connection to on-disk data and it also protects against accidental data modification with a filesystem-based permissions system. We provide utilities for using BigMatrix-derived classes as assayData matrices within the Biobase package's eSet family of classes. BigMatrix provides some optimizations related to attaching to, and indexing into, file-backed matrices with dimnames. Additionally, the package provides a "BigMatrixFactor" class, a file-backed matrix with factor properties.
CNORdt Add-on to CellNOptR: Discretized time treatments
This add-on to the package CellNOptR handles time-course data, as opposed to steady state data in CellNOptR. It scales the simulation step to allow comparison and model fitting for time-course data. Future versions will optimize delays and strengths for each edge.
agilp Agilent expression array processing package
Provides a pipeline for the low-level analysis of gene expression microarray data, primarily Agilent data.
SCAN.UPC Single-channel array normalization (SCAN) and Universal Probability of expression Codes (UPC)
SCAN is a microarray normalization method to facilitate personalized-medicine workflows. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal Probability of expression Codes (UPC) method is an extension of SCAN that generates probability-of-expression values. These values can be interpreted as the probability that a given genomic feature (e.g., gene, transcript) is expressed above the background in a given sample. The UPC method can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.
ReportingTools Tools for making reports in various formats
The ReportingTools software package enables users to easily display reports of analysis results generated from sources such as microarray and sequencing data. The package allows users to create HTML pages that may be viewed on a web browser such as Safari, or in other formats readable by programs such as Excel. Users can generate tables with sortable and filterable columns, make and display plots, and link table entries to other data sources such as NCBI or larger plots within the HTML page. Using the package, users can also produce a table of contents page to link various reports together for a particular project that can be viewed in a web browser.
DeconRNASeq Deconvolution of Heterogeneous Tissue Samples for mRNA-Seq data
DeconRNASeq is an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It models expression levels from heterogeneous cell populations in mRNA-Seq as the weighted average of expression from different constituting cell types and predicts cell type proportions of single expression profiles.
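The weighted-average model can be illustrated with the simplest possible case of two cell types. In the sketch below a mixed profile is p times one signature plus (1 - p) times the other, and p is recovered by least squares and clipped to [0, 1]. The two-type restriction, the closed-form solution and the function names are assumptions chosen to keep the example short; the package itself handles more cell types and is not reproduced here.

```python
def mix(sig_a, sig_b, p):
    """Per-gene expression of a mixture with fraction p of type A."""
    return [p * a + (1 - p) * b for a, b in zip(sig_a, sig_b)]


def estimate_proportion(mixture, sig_a, sig_b):
    """Least-squares estimate of the type A fraction.

    Minimizes sum((m - (p*a + (1-p)*b))**2) over p, which gives
    p = sum((m-b)*(a-b)) / sum((a-b)**2), then clips to [0, 1].
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, sig_a, sig_b))
    den = sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))
    if den == 0:
        raise ValueError("signatures are identical; proportion is undefined")
    return min(max(num / den, 0.0), 1.0)


if __name__ == "__main__":
    sig_a = [10.0, 0.0, 5.0]   # made-up signature for cell type A
    sig_b = [0.0, 20.0, 5.0]   # made-up signature for cell type B
    observed = mix(sig_a, sig_b, 0.3)
    print(estimate_proportion(observed, sig_a, sig_b))
```

Genes with identical expression in both signatures (the third gene here) carry no information about the proportion, which is why marker genes matter for deconvolution.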
motifStack Plot stacked logos for single or multiple DNA, RNA and amino acid sequence
The motifStack package is designed for graphic representation of multiple motifs with different similarity scores. It works with both DNA/RNA sequence motif and amino acid sequence motif. In addition, it provides the flexibility for users to customize the graphic parameters such as the font type and symbol colors.
gCMAP Tools for Connectivity Map-like analyses
The gCMAP package provides a toolkit for comparing differential gene expression profiles through gene set enrichment analysis. Starting from normalized microarray or RNA-seq gene expression values (stored in lists of ExpressionSet and CountDataSet objects) the package performs differential expression analysis using the limma or DESeq packages. Supplying a simple list of gene identifiers, global differential expression profiles or data from complete experiments as input, users can use a unified set of several well-known gene set enrichment analysis methods to retrieve experiments with similar changes in gene expression. To take into account the directionality of gene expression changes, gCMAPQuery introduces the SignedGeneSet class, directly extending GeneSet from the GSEABase package. To increase performance of large queries, multiple gene sets are stored as sparse incidence matrices within CMAPCollection eSets. gCMAP offers implementations of 1. Fisher's exact test (Fisher, J R Stat Soc, 1922) 2. The "connectivity map" method (Lamb et al, Science, 2006) 3. Parametric and non-parametric t-statistic summaries (Jiang & Gentleman, Bioinformatics, 2007) and 4. Wilcoxon / Mann-Whitney rank sum statistics (Wilcoxon, Biometrics Bulletin, 1945) as well as wrappers for the 5. camera (Wu & Smyth, Nucleic Acid Res, 2012) 6. mroast and romer (Wu et al, Bioinformatics, 2010) functions from the limma package and 7. wraps the gsea method from the mgsa package (Bauer et al, NAR, 2010). All methods return CMAPResult objects, an S4 class inheriting from AnnotatedDataFrame, containing enrichment statistics as well as annotation data and providing simple high-level summary plots.
Risa Converting experimental metadata from ISA-tab into Bioconductor data structures
The Investigation / Study / Assay (ISA) tab-delimited format is a general purpose framework with which to collect and communicate complex metadata (e.g. sample characteristics, technologies used, type of measurements made) from experiments employing a combination of technologies, spanning from traditional approaches to high-throughput techniques. Risa provides access to metadata and data in ISA-Tab format and builds Bioconductor data structures from them. Currently, data generated from microarray, flow cytometry and metabolomics-based (i.e. mass spectrometry) assays are supported. The package is extendable and efforts are under way to support metadata associated with proteomics assays.
hpar Human Protein Atlas in R
A simple interface to and data from the Human Protein Atlas project.
hapFabia hapFabia: Identification of very short segments of identity by descent (IBD) characterized by rare variants in large sequencing data
A package to identify very short IBD segments in large sequencing data by FABIA biclustering. Two haplotypes are identical by descent (IBD) if they share a segment that both inherited from a common ancestor. Current IBD methods reliably detect long IBD segments because many minor alleles in the segment are concordant between the two haplotypes. However, many cohort studies contain unrelated individuals who share only short IBD segments. This package provides software to identify short IBD segments in sequencing data. Knowledge of short IBD segments is relevant for phasing of genotyping data, association studies, and for population genetics, where they shed light on the evolutionary history of humans. The package supports VCF formats, is based on sparse matrix operations, and provides visualization of haplotype clusters in different formats.
OrganismDbi Software to enable the smooth interfacing of different database packages.
The package enables a simple unified interface to several annotation packages, each of which has its own schema, by taking advantage of the fact that each of these packages implements a select method.
lmdme Linear Model decomposition for Designed Multivariate Experiments
Linear ANOVA decomposition of multivariate designed experiments, implemented on top of limma lmFit. Features: i) flexible formula-type interface, ii) fast limma-based implementation, iii) p-values for each estimated coefficient level in each factor, iv) F values for factor effects and v) plotting functions for PCA and PLS.
fmcsR Flexible Maximum Common Substructure (FMCS) Searching
The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, resulting in the identification of more common features in their source structures, as well as a higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities to use the FMCS algorithm for pairwise compound comparisons, structure similarity searching and clustering.
methyAnalysis DNA methylation data analysis and visualization
The methyAnalysis package supports DNA methylation data analysis and visualization. A new class is defined to keep the chromosome location information together with the data. The current version of the package mainly focuses on analyzing Illumina Infinium methylation array data, but most methods can be generalized to other methylation array or sequencing data.
CNORfuzzy Addon to CellNOptR: Fuzzy Logic
This package is an extension to CellNOptR. It contains additional functionality needed to simulate and train a prior knowledge network to experimental data using constrained fuzzy logic (cFL, rather than Boolean logic as is the case in CellNOptR). Additionally, this package will contain functions for compiling multiple optimization results (either Boolean or cFL).
EasyqpcR EasyqpcR for low-throughput real-time quantitative PCR data analysis
This package is based on the qBase algorithms published by Hellemans et al. in 2007. The EasyqpcR package allows you to easily import qPCR data files as described in the vignette. Thereafter, you can calculate amplification efficiencies, relative quantities and their standard errors, normalization factors based on the best reference genes chosen (using the SLqPCR package), and then the normalized relative quantities, the NRQs scaled to your control and their standard errors. This package has been created for low-throughput qPCR data analysis.
GeneNetworkBuilder Build Regulatory Network from ChIP-chip/ChIP-seq and Expression Data
Application for discovering direct or indirect targets of transcription factors using ChIP-chip or ChIP-seq, and microarray or RNA-seq gene expression data. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, together with the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.
MotifDb An Annotated Collection of Protein-DNA Binding Sequence Motifs
More than 2000 annotated position frequency matrices from five public sources, for multiple organisms.
ChIPXpress ChIPXpress: enhanced transcription factor target gene identification from ChIP-seq and ChIP-chip data using publicly available gene expression profiles
ChIPXpress takes as input predicted TF-bound genes from ChIPx data and uses a corresponding database of gene expression profiles downloaded from NCBI GEO to rank the TF-bound targets by how likely each gene is to be a functional TF target.
RMassBank Workflow to process tandem MS files and build MassBank records
Workflow to process tandem MS files and build MassBank records. Functions include automated extraction of tandem MS spectra, formula assignment to tandem MS fragments, recalibration of tandem MS spectra with assigned fragments, spectrum cleanup, automated retrieval of compound information from Internet databases, and export to MassBank records.