- 1 Overview
- 1.1 Basic concepts
- 1.2 Yeast cell cycle: phenotypic transitions (Lee, Rinaldi et al., *Science* 2002)
- 1.3 Expression clusters
- 1.4 Species and organ of origin: microarrays and orthologues (McCall et al., *NAR* 2012)
- 1.5 Question
- 1.6 Species, organ of origin, and batch: RNA-seq and orthologues (Lin et al., *PNAS* 2014)
- 1.7 Three data analysis problems
- 1.8 Clustering concept
- 1.9 Classification methods
- 1.10 Statistical concepts to master

- 2 Cluster analysis concepts
- 2.1 Interactive exploration of clustering
- 2.2 Exploring clusters with tissue-of-origin data
- 2.3 Some definitions
- 2.4 Example: Euclidean distance
- 2.5 What is the ward.D2 agglomeration method?
- 2.6 What is the Jaccard similarity coefficient?
- 2.7 What is the bootstrap distribution of a statistic?
- 2.8 What is the bootstrap distribution of a statistic?
- 2.9 How to use the bootstrap distribution?
- 2.10 Bootstrap distributions of Jaccard
- 2.11 Now that we know the definitions:
- 2.12 Summary
- 2.13 Road map
- 2.14 Yeast cell cycle: phenotypic transitions
- 2.15 Yeast cell cycle: regulatory model
- 2.16 A data extract: *S. cerevisiae* colony synchronized with alpha pheromone
- 2.17 Raw trajectories for some of the genes in MCM cluster
- 2.18 A pattern of interest (“prototype”, but not in the data)
- 2.19 Formalism for the basal oscillator prototype
- 2.20 One possible form for \(U_g(t)\) for \(g\) a basal oscillator
- 2.21 Application of the distance concept
- 2.22 Computing distances to basal oscillator pattern
- 2.23 The nearest gene
- 2.24 The distribution of distances
- 2.25 “Top ten!”
- 2.26 Is it a cluster?
- 2.27 Definition from ?silhouette
- 2.28 Realizations of an unstructured grouping scheme
- 2.29 Trajectories from the arbitrary groups
- 2.30 The silhouette plot
- 2.31 Recap
- 2.32 Another exemplar
- 2.33 Solution
- 2.34 The most hyperbasal gene
- 2.35 “Top ten!”
- 2.36 Silhouette continuation
- 2.37 Caveats
- 2.38 Hierarchical clustering
- 2.39 Filtering
- 2.40 limma for trigonometric regression fits
- 2.41 Interactive interface
- 2.42 Tuning hclust: dendrogram structure
- 2.43 Projection with labels
- 2.44 Characteristic traces, raw expression data
- 2.45 Summary on clustering

- 3 Classification concepts
- 3.1 On classification methods with genomic data
- 3.2 BiocViews: StatisticalMethod
- 3.3 Conceptual basis for methods covered in the talk
- 3.4 A method on the boundary: linear discriminant analysis
- 3.5 Notes on LDA
- 3.6 Other approaches, issues
- 3.7 Application to the tissue-of-origin data
- 3.8 On leukemia data
- 3.9 On leukemia data, 2class
- 3.10 Summary

- Organisms are assayed on multiple features
- *Variability* in feature measures exhibits *structure*
- Clustering:
- For some grouping, between-group variation is larger than within-group variation
- Our goals are to find, evaluate, and interpret such groupings

- Classification:
- Organisms are sorted into classes and labeled
- Rules for classification are maps from features to labels
- Our goals are to find, evaluate, and interpret such rules
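
The two ideas above can be made concrete with a minimal sketch in plain Python. It assumes squared Euclidean distance as the measure of variation and a nearest-centroid rule as the classifier; both are illustrative choices, not the methods used later in the talk. The function names (`variation_split`, `nearest_centroid_rule`) and the four toy observations are made up for this example.

```python
from statistics import fmean


def centroid(rows):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_rows(features, labels):
    """Collect feature vectors by group label."""
    groups = {}
    for row, lab in zip(features, labels):
        groups.setdefault(lab, []).append(row)
    return groups


def variation_split(features, labels):
    """Return (within, between) sums of squares for a given grouping."""
    overall = centroid(features)
    within = 0.0
    between = 0.0
    for rows in group_rows(features, labels).values():
        c = centroid(rows)
        within += sum(sq_dist(r, c) for r in rows)
        between += len(rows) * sq_dist(c, overall)
    return within, between


def nearest_centroid_rule(features, labels):
    """Build a rule: a function mapping a feature vector to a label."""
    centroids = {lab: centroid(rows)
                 for lab, rows in group_rows(features, labels).items()}

    def rule(x):
        return min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))

    return rule


# Made-up toy data: four organisms, two features each.
features = [[1.0, 2.0], [1.0, 4.0], [5.0, 2.0], [5.0, 4.0]]
by_first = ["a", "a", "b", "b"]   # grouping aligned with feature 1
by_second = ["a", "b", "a", "b"]  # grouping aligned with feature 2

print(variation_split(features, by_first))   # (4.0, 16.0)
print(variation_split(features, by_second))  # (16.0, 4.0)

rule = nearest_centroid_rule(features, by_first)
print(rule([0.5, 3.0]), rule([6.0, 3.0]))    # a b
```

In this toy case the total variation is 20 under either grouping, but only the first grouping puts most of it between groups, which is the sense in which it counts as a cluster structure. The rule returned by `nearest_centroid_rule` is an ordinary function from features to a label.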

- How do you expect gene expression time series to cluster?

- Spellman et al., *MBC* 1998; dendrogram on left: bottom half labeled MCM

Multivariate analysis of the yeast cell cycle uses the gene expression trajectory over time as the data vector

Multivariate analysis of tissue of origin data uses a snapshot of the transcriptome as the data vector
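
As a hedged illustration of the two kinds of data vector, the sketch below uses made-up numbers and hypothetical names (`gene1`, `liver_1`, and so on). In the time-course setting each gene contributes one vector indexed by time point; in the tissue setting each sample contributes one vector indexed by gene. The same Euclidean distance applies to both, but it compares different things.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length data vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Time course (toy values): one vector per gene, entries are time points.
time_course = {
    "gene1": [0.0, 1.0, 0.0, -1.0],
    "gene2": [0.0, 2.0, 0.0, -2.0],  # same phase as gene1, larger amplitude
    "gene3": [1.0, 0.0, -1.0, 0.0],  # shifted phase
}

# Tissue snapshot (toy values): one vector per sample, entries are genes.
snapshots = {
    "liver_1": [5.0, 1.0, 0.0],
    "liver_2": [5.0, 1.0, 1.0],
    "brain_1": [1.0, 4.0, 0.0],
}

# Distance between genes: compares trajectories over time.
print(euclidean(time_course["gene1"], time_course["gene3"]))  # 2.0

# Distance between samples: compares transcriptome snapshots.
print(euclidean(snapshots["liver_1"], snapshots["liver_2"]))  # 1.0
print(euclidean(snapshots["liver_1"], snapshots["brain_1"]))  # 5.0
```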

- Should the same methods be used for visualization and interpretation? Why or why not?
- roles of convenience and agnosticism
- roles of biological *knowledge* and potential for corroboration