CSAMA 2017: Statistical Data Analysis for Genome-Scale Biology
June 11-16, 2017
Bressanone-Brixen, Italy
URL: http://www.huber.embl.de/csama2017/
Lecturers: Jennifer Bryan, RStudio and UBC; Vincent J. Carey, Harvard Medical School; Laurent Gatto, University of Cambridge; Wolfgang Huber, European Molecular Biology Laboratory (EMBL), Heidelberg; Martin Morgan, Roswell Park Cancer Institute, Buffalo; Johannes Rainer, European Academy of Bozen (EURAC); Charlotte Soneson, University of Zurich; Levi Waldron, CUNY School of Public Health at Hunter College, New York.
Teaching Assistants: Simone Bell, EMBL, Heidelberg; Vladislav Kim, EMBL, Heidelberg; Lori Shepherd, RPCI, Buffalo; Mike L. Smith, EMBL, Heidelberg.
Resources
Source: GitHub
Monday, June 12
Lectures
- Introduction to R and Bioconductor (html).
- Computing with Sequences and Ranges (html).
- Tabular data management (pdf).
- Annotation resources (html); EnsemblDb (pdf).
Labs
- Introduction to Bioconductor: Introduction (html), R (html), Bioconductor (html), Data representations (html), Annotation (html) (data: ALLphenoData.tsv, BRFSS-subset.csv, symgo.csv).
- Use of Git and GitHub with R, RStudio, and R Markdown (html).
Tuesday, June 13
Lectures
- Basics of sequence alignment and aligners (pdf).
- RNA-Seq data analysis and differential expression (pdf).
- New workflows for RNA-seq (pdf).
- Hypothesis testing (pdf).
Labs
Wednesday, June 14
Lectures
- Multiple testing (txt).
- Linear models (basic intro) (html).
- Experimental design, batch effects and confounding (pdf).
- Robust statistics: median, MAD, rank test, Spearman, robust linear model (html).
Thursday, June 15
Lectures
- Visualization, the grammar of graphics and ggplot2 (pdf).
- Mass spec proteomics (html) and metabolomics (pdf).
- Clustering and classification (html).
- Resampling: cross-validation, bootstrap, and permutation tests (html).
- Analysis of microbiome marker gene data (pdf).
Labs
- Mass spec proteomics & metabolomics: Proteomics (html), Metabolomics (html).
- MultiAssayExperiment (html), cheatsheet (pdf).
Friday, June 16
Lectures
- Gene set enrichment analysis (pdf).
- Working with large-scale (pdf) and remote (html) data.
- Developer Practices, Writing functions (html).
- Developer Practices, Writing packages (html).
Labs
- Graphics (pdf) (data: diabetes.csv).
- Machine learning (supervised) (pdf).
- Large data, performance, and parallelization; large-scale efficient computation with genomic intervals (html).