IHW Independent Hypothesis Weighting
Independent hypothesis weighting (IHW) is a multiple testing procedure that increases power compared to the method of Benjamini and Hochberg by assigning data-driven weights to each hypothesis. The input to IHW is a two-column table of p-values and covariates. The covariate can be any continuous-valued or categorical variable that is thought to be informative on the statistical properties of each hypothesis test, while it is independent of the p-value under the null hypothesis.
GenVisR Genomic Visualizations in R
Produce highly customizable publication quality graphics for genomic data primarily at the cohort level.
iCARE A Tool for Individualized Coherent Absolute Risk Estimation (iCARE)
An R package to compute Individualized Coherent Absolute Risk Estimators.
flowAI Automatic and interactive quality control for flow cytometry data
The package is able to perform an automatic or interactive quality control on .fcs data acquired using flow cytometry instruments. By evaluating three different properties: 1) flow rate, 2) signal acquisition, 3) dynamic range, the quality control enables the detection and removal of anomalies.
EmpiricalBrownsMethod Uses Brown's method to combine p-values from dependent tests
Combining P-values from multiple statistical tests is common in bioinformatics. However, this procedure is non-trivial for dependent P-values. This package implements an empirical adaptation of Brown’s Method (an extension of Fisher’s Method) for combining dependent P-values which is appropriate for highly correlated data sets found in high-throughput biological experiments.
PanVizGenerator Generate PanViz visualisations from your pangenome
SC3 Single-Cell Consensus Clustering
Interactive tool for clustering and analysis of single cell RNA-Seq data.
JunctionSeq JunctionSeq: A Utility for Detection of Differential Exon and Splice-Junction Usage in RNA-Seq data
A Utility for Detection and Visualization of Differential Exon or Splice-Junction Usage in RNA-Seq data.
cellTree Inference and visualisation of Single-Cell RNA-seq data as a hierarchical tree structure
This packages computes a Latent Dirichlet Allocation (LDA) model of single-cell RNA-seq data and builds a compact tree modelling the relationship between individual cells over time or space.
ggcyto Visualize Cytometry data with ggplot
With the dedicated fority method implemented for flowSet, ncdfFlowSet and GatingSet classes, both raw and gated flow cytometry data can be plotted directly with ggplot. ggcyto wrapper and some customed layers also make it easy to add gates and population statistics to the plot.
tofsims Import, process and analysis of Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) imaging data
This packages offers a pipeline for import, processing and analysis of ToF-SIMS 2D image data. Import of Iontof and Ulvac-Phi raw or preprocessed data is supported. For rawdata, mass calibration, peak picking and peak integration exist. General funcionality includes data binning, scaling, image subsetting and visualization. A range of multivariate tools common in the ToF-SIMS community are implemented (PCA, MCR, MAF, MNF). An interface to the bioconductor image processing package EBImage offers image segmentation functionality.
GSALightning Fast Permutation-based Gene Set Analysis
GSALightning provides a fast implementation of permutation-based gene set analysis for two-sample problem. This package is particularly useful when testing simultaneously a large number of gene sets, or when a large number of permutations is necessary for more accurate p-values estimation.
QuaternaryProd Computes the Quaternary Product Scoring Statistic for Signed and Unsigned Causal Graphs
One of the most challenging problems in computational biology is inference of active regulatory cascades under specific molecular and environmental perturbations. As our understanding of regulatory mechanisms grows, larger networks of causal and non-causal biological interactions become available. However, only a fraction of regulatory interactions are tagged as either increasing or decreasing a target transcript. This leads to networks with a mix of signed and unsigned interactions. In this package, we provide a statistical method (Quaternary Product Scoring Statistic) that will infer likely upstream regulators given (1) a set of up- and down-regulated transcripts from a specific biological experiment and (2) a mixed network of regulatory interactions potentially relevant in the current biological context.
CrispRVariants Tools for counting and visualising mutations in a target location
CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.
splineTCDiffExpr Time-course differential gene expression data analysis using spline regression models followed by gene association network reconstruction
This package provides functions for differential gene expression analysis of gene expression time-course data. Natural cubic spline regression models are used. Identified genes may further be used for pathway enrichment analysis and/or the reconstruction of time dependent gene regulatory association networks.
kimod A k-tables approach to integrate multiple Omics-Data
This package allows to work with mixed omics data (transcriptomics, proteomics, microarray-chips, rna-seq data), introducing the following improvements: distance options (for numeric and/or categorical variables) for each of the tables, bootstrap resampling techniques on the residuals matrices for all methods, that enable perform confidence ellipses for the projection of individuals, variables and biplot methodology to project variables (gene expression) on the compromise. Since the main purpose of the package is to use these techniques to omic data analysis, it includes an example data from four different microarray platforms (i.e.,Agilent, Affymetrix HGU 95, Affymetrix HGU 133 and Affymetrix HGU 133plus 2.0) on the NCI-60 cell lines.NCI60_4arrays is a list containing the NCI-60 microarray data with only few hundreds of genes randomly selected in each platform to keep the size of the package small. The data are the same that the package omicade4 used to implement the co-inertia analysis. The references in packages follow the style of the APA-6th norm.
lpsymphony Symphony integer linear programming solver in R
This package was derived from Rsymphony_0.1-17 from CRAN. These packages provide an R interface to SYMPHONY, an open-source linear programming solver written in C++. The main difference between this package and Rsymphony is that it includes the solver source code (SYMPHONY version 5.6), while Rsymphony expects to find header and library files on the users' system. Thus the intention of lpsymphony is to provide an easy to install interface to SYMPHONY. For Windows, precompiled DLLs are included in this package.
transcriptR An Integrative Tool for ChIP- And RNA-Seq Based Primary Transcripts Detection and Quantification
The differences in the RNA types being sequenced have an impact on the resulting sequencing profiles. mRNA-seq data is enriched with reads derived from exons, while GRO-, nucRNA- and chrRNA-seq demonstrate a substantial broader coverage of both exonic and intronic regions. The presence of intronic reads in GRO-seq type of data makes it possible to use it to computationally identify and quantify all de novo continuous regions of transcription distributed across the genome. This type of data, however, is more challenging to interpret and less common practice compared to mRNA-seq. One of the challenges for primary transcript detection concerns the simultaneous transcription of closely spaced genes, which needs to be properly divided into individually transcribed units. The R package transcriptR combines RNA-seq data with ChIP-seq data of histone modifications that mark active Transcription Start Sites (TSSs), such as, H3K4me3 or H3K9/14Ac to overcome this challenge. The advantage of this approach over the use of, for example, gene annotations is that this approach is data driven and therefore able to deal also with novel and case specific events. Furthermore, the integration of ChIP- and RNA-seq data allows the identification all known and novel active transcription start sites within a given sample.
profileScoreDist Profile score distributions
Regularization and score distributions for position count matrices.
dcGSA Distance-correlation based Gene Set Analysis for longitudinal gene expression profiles
Distance-correlation based Gene Set Analysis for longitudinal gene expression profiles. In longitudinal studies, the gene expression profiles were collected at each visit from each subject and hence there are multiple measurements of the gene expression profiles for each subject. The dcGSA package could be used to assess the associations between gene sets and clinical outcomes of interest by fully taking advantage of the longitudinal nature of both the gene expression profiles and clinical outcomes.
normalize450K Preprocessing of Illumina Infinium 450K data
Precise measurements are important for epigenome-wide studies investigating DNA methylation in whole blood samples, where effect sizes are expected to be small in magnitude. The 450K platform is often affected by batch effects and proper preprocessing is recommended. This package provides functions to read and normalize 450K '.idat' files. The normalization corrects for dye bias and biases related to signal intensity and methylation of probes using local regression. No adjustment for probe type bias is performed to avoid the trade-off of precision for accuracy of beta-values.
biomformat An interface package for the BIOM file format
This is an R package for interfacing with the BIOM format. This package includes basic tools for reading biom-format files, accessing and subsetting data tables from a biom object (which is more complex than a single table), as well as limited support for writing a biom-object back to a biom-format file. The design of this API is intended to match the python API and other tools included with the biom-format project, but with a decidedly "R flavor" that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions/methods.
BioQC Detect tissue heterogeneity in expression profiles with gene sets
BioQC performs quality control of high-throughput expression data based on tissue gene signatures
FamAgg Pedigree Analysis and Familial Aggregation
Framework providing basic pedigree analysis and plotting utilities as well as a variety of methods to evaluate familial aggregation of traits in large pedigrees.
iCOBRA Comparison and Visualization of Ranking and Assignment Methods
This package provides functions for calculation and visualization of performance metrics for evaluation of ranking and binary classification (assignment) methods. It also contains a shiny application for interactive exploration of results.
Chicago CHiCAGO: Capture Hi-C Analysis of Genomic Organization
A pipeline for analysing Capture Hi-C data.
multiClust A collection of gene feature selection and clustering analysis algorithms
Whole transcriptomic profiles are useful for studying the expression levels of thousands of genes across samples. Clustering algorithms are used to identify patterns in these profiles to determine clinically relevant subgroups. Feature selection is a critical integral part of the process. Currently, there are many feature selection and clustering methods to identify the relevant genes and perform clustering of samples. However, choosing the appropriate methods is difficult as recent work demonstrates that no method is the clear winner. Hence, we present an R-package called `multiClust` that allows researchers to experiment with the choice of combination of methods for gene selection and clustering with ease. In addition, using multiClust, we present the merit of gene selection and clustering methods in the context of clinical relevance of clustering, specifically clinical outcome. Our integrative R- package contains: 1. A function to read in gene expression data and format appropriately for analysis in R. 2. Four different ways to select the number of genes a. Fixed b. Percent c. Poly d. GMM 3. Four gene ranking options that order genes based on different statistical criteria a. CV_Rank b. CV_Guided c. SD_Rank d. Poly 4. Two ways to determine the cluster number a. Fixed b. Gap Statistic 5. Two clustering algorithms a. Hierarchical clustering b. K-means clustering 6. A function to calculate average gene expression in each sample cluster 7. A function to correlate sample clusters with clinical outcome Order of Function use: 1. input_file, a function to read-in the gene expression file and assign gene probe names as the rownames. 2. number_probes, a function to determine the number of probes to select for in the gene feature selection process. 3. probe_ranking, a function to select for gene probes using one of the available gene probe ranking options. 4. number_clusters, a function to determine the number of clusters to be used to cluster genes and samples. 5. cluster_analysis, a function to perform Kmeans or Hierarchical clustering analysis of the selected gene expression data. 6. avg_probe_exp, a function to produce a matrix containing the average expression of each gene probe within each sample cluster. 7. surv_analysis, a function to produce Kaplan-Meier Survival Plots of selected gene expression data.
consensusSeekeR Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges
This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable as well as the number of experiences in which a feature must be present in a potential region to tag this region as a consensus region.
globalSeq Testing for association between RNA-Seq and high-dimensional data
The method may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size.
scde Single Cell Differential Expression
The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).
R4RNA An R package for RNA visualization and analysis
A package for RNA basepair analysis, including the visualization of basepairs as arc diagrams for easy comparison and annotation of sequence and structure. Arc diagrams can additionally be projected onto multiple sequence alignments to assess basepair conservation and covariation, with numerical methods for computing statistics for each.
CNPBayes Bayesian mixture models for copy number polymorphisms
Bayesian hierarchical mixture models for batch effects and copy number.
subSeq Subsampling of high-throughput sequencing count data
Subsampling of high throughput sequencing count data for use in experiment design and analysis.
biobroom Turn Bioconductor objects into tidy data frames
This package contains methods for converting standard objects constructed by bioinformatics packages, especially those in Bioconductor, and converting them to tidy data. It thus serves as a complement to the broom package, and follows the same the tidy, augment, glance division of tidying methods. Tidying data makes it easy to recombine, reshape and visualize bioinformatics analyses.
miRcomp Tools to assess and compare miRNA expression estimatation methods
Based on a large miRNA dilution study, this package provides tools to read in the raw amplification data and use these data to assess the performance of methods that estimate expression from the amplification curves.
RCy3 Display and manipulate graphs in Cytoscape >= 3.3.0
Vizualize, analyze and explore graphs, connecting R to Cytoscape (>= 3.3.0).
SNPhood SNPhood: Investigate, quantify and visualise the epigenomic neighbourhood of SNPs using NGS data
To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly *Bioconductor* R package to investigate and visualize the local neighborhood of a set of SNPs of interest for NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq or RNA-Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and to cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest.
RiboProfiling Ribosome Profiling Data Analysis: from BAM to Data Representation and Interpretation
Starting with a BAM file, this package provides the necessary functions for quality assessment, read start position recalibration, the counting of reads on CDS, 3'UTR, and 5'UTR, plotting of count data: pairs, log fold-change, codon frequency and coverage assessment, principal component analysis on codon coverage.
GeneBreak Gene Break Detection
Recurrent breakpoint gene detection on copy number aberration profiles.
DChIPRep DChIPRep - Analysis of chromatin modification ChIP-Seq data with replication
The DChIPRep package implements a methodology to assess differences between chromatin modification profiles in replicated ChIP-Seq studies as described in Chabbert et. al - http://www.dx.doi.org/10.15252/msb.20145776.
GUIDEseq GUIDE-seq analysis pipeline
The package implements GUIDE-seq analysis workflow including functions for obtaining unique insertion sites (proxy of cleavage sites), estimating the locations of the insertion sites, aka, peaks, merging estimated insertion sites from plus and minus strand, and performing off target search of the extended regions around insertion sites.
SWATH2stats Transform and Filter SWATH Data for Statistical Packages
This package is intended to transform SWATH data from the OpenSWATH software into a format readable by other statistics packages while performing filtering, annotation and FDR estimation.
SISPA SISPA: Method for Sample Integrated Set Profile Analysis
Sample Integrated Gene Set Analysis (SISPA) is a method designed to define sample groups with similar gene set enrichment profiles.
SICtools Find SNV/Indel differences between two bam files with near relationship
This package is to find SNV/Indel differences between two bam files with near relationship in a way of pairwise comparison thourgh each base position across the genome region of interest. The difference is inferred by fisher test and euclidean distance, the input of which is the base count (A,T,G,C) in a given position and read counts for indels that span no less than 2bp on both sides of indel region.
Prostar Provides a GUI for DAPAR
This package provides a GUI interface for DAPAR.
pathVar Methods to Find Pathways with Significantly Different Variability
This package contains the functions to find the pathways that have significantly different variability than a reference gene set. It also finds the categories from this pathway that are significant where each category is a cluster of genes. The genes are separated into clusters by their level of variability.
metagenomeFeatures Exploration of marker-gene sequence taxonomic annotations
metagenomeFeatures was developed for use in exploring the taxonomic annotations for a marker-gene metagenomic sequence dataset. The package can be used to explore the taxonomic composition of a marker-gene database or annotated sequences from a marker-gene metagenome experiment.
MEAL Perform methylation analysis
Package to integrate methylation and expression data. It can also perform methylation or expression analysis alone. Several plotting functionalities are included as well as a new region analysis based on redundancy analysis. Effect of SNPs on a region can also be estimated.
lfa Logistic Factor Analysis for Categorical Data
LFA is a method for a PCA analogue on Binomial data via estimation of latent structure in the natural parameter.
Imetagene A graphical interface for the metagene package
This package provide a graphical user interface to the metagene package. This will allow people with minimal R experience to easily complete metagene analysis.
iCheck QC Pipeline and Data Analysis Tools for High-Dimensional Illumina mRNA Expression Data
QC pipeline and data analysis tools for high-dimensional Illumina mRNA expression data.
gcatest Genotype Conditional Association TEST
GCAT is an association test for genome wide association studies that controls for population structure under a general class of trait. models.
DAPAR Tools for the Differential Analysis of Proteins Abundance with R
This package contains a collection of functions for the visualisation and the statistical analysis of proteomic data.
TarSeqQC TARgeted SEQuencing Experiment Quality Control
The package allows the representation of targeted experiment in R. This is based on current packages and incorporates functions to do a quality control over this kind of experiments and a fast exploration of the sequenced regions. An xlsx file is generated as output.
The package is designed for visualization of RNA-related genomic features with respect to the landmarks of RNA transcripts, i.e., transcription starting site, start codon, stop codon and transcription ending site.
FindMyFriends Microbial Comparative Genomics in R
A framework for doing microbial comparative genomics in R. The main purpose of the package is assisting in the creation of pangenome matrices where genes from related organisms are grouped by similarity, as well as the analysis of these data. FindMyFriends provides many novel approaches to doing pangenome analysis and supports a gene grouping algorithm that scales linearly, thus making the creation of huge pangenomes feasible.
EnrichedHeatmap Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement Enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.
dupRadar Assessment of duplication rates in RNA-Seq datasets
Duplication rate quality control for RNA-Seq datasets.
DNABarcodes A tool for creating and analysing DNA barcodes used in Next Generation Sequencing multiplexing experiments
The package offers a function to create DNA barcode sets capable of correcting insertion, deletion, and substitution errors. Existing barcodes can be analysed regarding their minimal, maximal and average distances between barcodes. Finally, reads that start with a (possibly mutated) barcode can be demultiplexed, i.e., assigned to their original reference barcode.
DiffLogo DiffLogo: A comparative visualisation of sequence motifs
DiffLogo is an easy-to-use tool to visualize motif differences.
RTCGA The Cancer Genome Atlas Data Integration
The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care. RTCGA package offers download and integration of the variety and volume of TCGA data using patient barcode key, what enables easier data possession. This may have an benefcial infuence on impact on development of science and improvement of patients' treatment. Furthermore, RTCGA package transforms TCGA data to tidy form which is convenient to use.
ProteomicsAnnotationHubData Transform public proteomics data resources into Bioconductor Data Structures
These recipes convert a variety and a growing number of public proteomics data sets into easily-used standard Bioconductor data structures.
motifbreakR A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 22).
LOLA Location overlap analysis for enrichment of genomic ranges
Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This make is possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.
iGC An integrated analysis package of Gene expression and Copy number alteration
This package is intended to identify differentially expressed genes driven by Copy Number Alterations from samples with both gene expression and CNA data.
AnnotationHubData Transform public data resources into Bioconductor Data Structures
These recipes convert a wide variety and a growing number of public bioinformatic data sets into easily-used standard Bioconductor data structures.
sevenbridges R Client for Seven Bridges Platform API and CWL Tool builder in R
R client and utilities for Seven Bridges platoform API, from cancer genomics cloud to other Seven Bridges supported platforms.
GEOsearch is an extendable search engine for NCBI GEO (Gene Expression Omnibus). Instead of directly searching the term, GEOsearch can find all the gene names contained in the search term and search all the alias of the gene names simultaneously in GEO database. GEOsearch also provides other functions such as summarizing common biology keywords in the search results.
ldblock data structures for linkage disequilibrium measures in populations
Define data structures for linkage disequilibrium measures in populations.
Path2PPI Prediction of pathway-related protein-protein interaction networks
Package to predict protein-protein interaction (PPI) networks in target organisms for which only a view information about PPIs is available. Path2PPI predicts PPI networks based on sets of proteins which can belong to a certain pathway from well-established model organisms. It helps to combine and transfer information of a certain pathway or biological process from several reference organisms to one target organism. Path2PPI only depends on the sequence similarity of the involved proteins.
myvariant Accesses MyVariant.info variant query and annotation services
MyVariant.info is a comprehensive aggregation of variant annotation resources. myvariant is a wrapper for querying MyVariant.info services
ChIPComp Quantitative comparison of multiple ChIP-seq datasets
ChIPComp detects differentially bound sharp binding sites across multiple conditions considering matching control.
BBCAnalyzer BBCAnalyzer: an R/Bioconductor package for visualizing base counts
BBCAnalyzer is a package for visualizing the relative or absolute number of bases, deletions and insertions at defined positions in sequence alignment data available as bam files in comparison to the reference bases. Markers for the relative base frequencies, the mean quality of the detected bases, known mutations or polymorphisms and variants called in the data may additionally be included in the plots.
TCGAbiolinks TCGAbiolinks: An R/Bioconductor package for integrative analysis with TCGA data
The aim of TCGAbiolinks is : i) facilitate the TCGA open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) allow the user to download a specific version of the data and thus to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
ABAEnrichment Gene expression enrichment in human brain regions
The package ABAEnrichment is designed to test for enrichment of user defined candidate genes in the set of expressed genes in different human brain regions. The core function 'aba_enrich' integrates the expression of the candidate gene set (averaged across donors) and the structural information of the brain using an ontology, both provided by the Allen Brain Atlas project. 'aba_enrich' interfaces the ontology enrichment software FUNC to perform the statistical analyses. Additional functions provided in this package like 'get_expression' and 'plot_expression' facilitate exploring the expression data.
synlet Hits Selection for Synthetic Lethal RNAi Screen Data
Select hits from synthetic lethal RNAi screen data. For example, there are two identical celllines except one gene is knocked-down in one cellline. The interest is to find genes that lead to stronger lethal effect when they are knocked-down further by siRNA. Quality control and various visualisation tools are implemented. Four different algorithms could be used to pick up the interesting hits. This package is designed based on 384 wells plates, but may apply to other platforms with proper configuration.
NanoStringDiff Differential Expression Analysis of NanoString nCounter Data
This Package utilizes a generalized linear model(GLM) of the negative binomial family to characterize count data and allows for multi-factor design. NanoStrongDiff incorporate size factors, calculated from positive controls and housekeeping controls, and background level, obtained from negative controls, in the model framework so that all the normalization information provided by NanoString nCounter Analyzer is fully utilized.
metaX An R package for metabolomic data analysis
The package provides a integrated pipeline for mass spectrometry- based metabolomic data analysis. It includes the stages peak detection, data preprocessing, normalization, missing value imputation, univariate statistical analysis, multivariate statistical analysis such as PCA and PLS-DA, metabolite identification, pathway analysis, power analysis, feature selection and modeling, data quality assessment.
eudysbiome pseudo-cartesian plot and contingency test on 16S Microbial data
eudysbiome a package that permits to annotate the differential genera as harmful/harmless based on their ability to contribute to host diseases (as indicated in literature) or unknown based on their ambiguous genus classification. Further, the package statistically measures the eubiotic (harmless genera increase or harmful genera decrease) or dysbiotic(harmless genera decrease or harmful genera increase) impact of a given treatment or environmental change on the (gut-intestinal, GI) microbiome in comparison to the microbiome of the reference condition.
Oscope Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq
Oscope is a statistical pipeline developed to identifying and recovering the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.
variancePartition Quantify and interpret divers of variation in multilevel gene expression experiments
Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables.
destiny Creates diffusion maps
Create and plot diffusion maps.
HilbertCurve Making 2D Hilbert Curve
Hilbert curve is a type of space-filling curves that fold one dimensional axis into a two dimensional space, but with still keep the locality. This package aims to provide a easy and flexible way to visualize data through Hilbert curve.
LedPred Learning from DNA to Predict enhancers
This package aims at creating a predictive model of regulatory sequences used to score unknown sequences based on the content of DNA motifs, next-generation sequencing (NGS) peaks and signals and other numerical scores of the sequences using supervised classification. The package contains a workflow based on the support vector machine (SVM) algorithm that maps features to sequences, optimize SVM parameters and feature number and creates a model that can be stored and used to score the regulatory potential of unknown sequences.
traseR GWAS trait-associated SNP enrichment analyses in genomic intervals
traseR performs GWAS trait-associated SNP enrichment analyses in genomic intervals using different hypothesis testing approaches, also provides various functionalities to explore and visualize the results.
OGSA Outlier Gene Set Analysis
OGSA provides a global estimate of pathway deregulation in cancer subtypes by integrating the estimates of significance for individual pathway members that have been identified by outlier analysis.
miRLAB Dry lab for exploring miRNA-mRNA relationships
Provide tools exploring miRNA-mRNA relationships, including popular miRNA target prediction methods, ensemble methods that integrate individual methods, functions to get data from online resources, functions to validate the results, and functions to conduct enrichment analyses.
genotypeeval QA/QC of a gVCF or VCF file
Takes in a gVCF or VCF and reports metrics to assess quality of calls.
ropls PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data
Latent variable modeling with Principal Component Analysis (PCA) and Partial Least Squares (PLS) are powerful methods for visualization, regression, classification, and feature selection of omics data where the number of variables exceeds the number of samples and with multicollinearity among variables. Orthogonal Partial Least Squares (OPLS) enables to separately model the variation correlated (predictive) to the factor of interest and the uncorrelated (orthogonal) variation. While performing similarly to PLS, OPLS facilitates interpretation. Successful applications of these chemometrics techniques include spectroscopic data such as Raman spectroscopy, nuclear magnetic resonance (NMR), mass spectrometry (MS) in metabolomics and proteomics, but also transcriptomics data. In addition to scores, loadings and weights plots, the package provides metrics and graphics to determine the optimal number of components (e.g. with the R2 and Q2 coefficients), check the validity of the model by permutation testing, detect outliers, and perform feature selection (e.g. with Variable Importance in Projection or regression coefficients). The package can be accessed via a user interface on the Workflow4Metabolomics.org online resource for computational metabolomics (built upon the Galaxy environment).
rCGH Comprehensive Pipeline for Analyzing and Visualizing Array-Based CGH Data
A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data is in a suitable format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to segmenting and annotating genes. This package provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.
DEMAND predicts Drug MoA by interrogating a cell context specific regulatory network with a small number (N >= 6) of compound-induced gene expression signatures, to elucidate specific proteins whose interactions in the network is dysregulated by the compound.
rnaseqcomp Benchmarks for RNA-seq Quantification Pipelines
Several quantitative and visualized benchmarks for RNA-seq quantification pipelines. Two-condition quantifications for genes, transcripts, junctions or exons by each pipeline with nessasery meta information should be organizd into numeric matrices in order to proceed the evaluation.
INSPEcT Analysis of 4sU-seq and RNA-seq time-course data
INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-Course experiments) analyses 4sU-seq and RNA-seq time-course data in order to evaluate synthesis, processing and degradation rates and asses via modeling the rates that determines changes in mature mRNA levels.
Prize Prize: an R package for prioritization estimation based on analytic hierarchy process
The high throughput studies often produce large amounts of numerous genes and proteins of interest. While it is difficult to study and validate all of them. Analytic Hierarchy Process (AHP) offers a novel approach to narrowing down long lists of candidates by prioritizing them based on how well they meet the research goal. AHP is a mathematical technique for organizing and analyzing complex decisions where multiple criteria are involved. The technique structures problems into a hierarchy of elements, and helps to specify numerical weights representing the relative importance of each element. Numerical weight or priority derived from each element allows users to find alternatives that best suit their goal and their understanding of the problem.
XBSeq Test for differential expression for RNA-seq data
We developed a novel algorithm, XBSeq, where a statistical model was established based on the assumption that observed signals are the convolution of true expression signals and sequencing noises. The mapped reads in non-exonic regions are considered as sequencing noises, which follows a Poisson distribution. Given measureable observed and noise signals from RNA-seq data, true expression signals, assuming governed by the negative binomial distribution, can be delineated and thus the accurate detection of differential expressed genes.
CNVPanelizer Reliable CNV detection in targeted sequencing applications
A method that allows for the use of a collection of non-matched normal tissue samples. Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. As inspired by random forest, this is combined with a procedure that subsamples the amplicons associated with each of the targeted genes. The obtained information allows us to reliably classify the copy number aberrations on the gene level.
fCI f-divergence Cutoff Index for Differential Expression Analysis in Transcriptomics and Proteomics
(f-divergence Cutoff Index), is to find DEGs in the transcriptomic & proteomic data, and identify DEGs by computing the difference between the distribution of fold-changes for the control-control and remaining (non-differential) case-control gene expression ratio data. fCI provides several advantages compared to existing methods.
IONiseR Quality Assessment Tools for Oxford Nanopore MinION data
IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell.
erma epigenomic road map adventures
Software and data to support epigenomic road map adventures.
PGA An package for identification of novel peptides by customized database derived from RNA-Seq
This package provides functions for construction of customized protein databases based on RNA-Seq data with/without genome guided, database searching, post-processing and report generation. This kind of customized protein database includes both the reference database (such as Refseq or ENSEMBL) and the novel peptide sequences form RNA-Seq data.
Source Code & Build Reports »
Source code is stored in
Software packages are built and checked nightly. Build reports: