`r knitr::opts_chunk$set(tidy=FALSE); options(width=130)`

# Summary

1. R Language

    - Large data requires efficient programming.
    - Efficient programming benefits from understanding more of how R
      works 'under the hood'. Correctness is always more important than
      speed.
    - Parallel evaluation is a secondary approach to gaining
      performance.

2. Objects

    - Objects allow co-ordinated manipulation of complex inter-related
      data; objects are pervasive in R.
    - Formal S4 objects provide structure that benefits
      interoperability between related classes, while enabling
      experienced users and package developers to rapidly re-use
      existing concepts and code.
    - S4 objects are used extensively (and to good effect) in
      Bioconductor; it pays to understand key classes and their
      manipulation.

3. C (and other) languages

    - Two reasons for writing C code are (1) to interface with existing
      libraries and (2) to implement high-performance algorithms.
    - Writing C code has many drawbacks, including the large time
      investment needed to develop the code, implementations that often
      undermine R concepts such as the handling of NAs, and the
      introduction of catastrophic or subtle memory bugs. These
      considerations should discourage us from embarking on projects
      that involve C code except as a last resort.
    - For algorithm implementation, one quickly graduates from the
      relative simplicity of the `.C` interface to the flexibility of
      the `.Call` interface (requiring significant understanding of R's
      internal representation) to [Rcpp][]-style programming that masks
      some of the complexity of interacting with R while exposing the
      object-oriented facilities of C++.

4. Databases and external data representations

    - Processing data from non-R formats can be efficient and
      powerful. Bioconductor packages use SQL databases to store gene
      and genome annotations, and XML to query web-based resources.
    - SQL represents a great solution for querying relational
      data. Straightforward solutions easily scale to data with 100k
      rows but, as with R, exploiting larger SQL data resources requires
      a non-trivial understanding of SQL and of the database engine in
      use.
    - XML, and in particular XPath, provide a very flexible way to
      query web-based resources or to interoperate with other
      software. The [XML][] package has a unique event parsing
      mechanism for iterating through large XML objects.

[Rcpp]: http://cran.r-project.org/web/packages/Rcpp/index.html
[XML]: http://cran.r-project.org/web/packages/XML/index.html
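
The 'R Language' points on efficient programming can be made concrete with a small sketch. The three functions below are hypothetical helpers written for this illustration, not taken from the course material. They compute the same result by growing a vector, by pre-allocating and filling it, and by a single vectorized expression. Agreement is checked before any timing, in keeping with 'correctness is always more important than speed'.

```r
## Hypothetical helpers for illustration only.

## Inefficient: the result is copied each time it grows.
squares_grow <- function(n) {
    result <- numeric(0)
    for (i in seq_len(n))
        result <- c(result, i^2)
    result
}

## Better: allocate the full result once, then fill it in place.
squares_prealloc <- function(n) {
    result <- numeric(n)
    for (i in seq_len(n))
        result[i] <- i^2
    result
}

## Idiomatic: a single vectorized expression, no explicit loop.
squares_vec <- function(n)
    seq_len(n)^2

n <- 10000

## Correctness first: all three implementations must agree.
stopifnot(
    identical(squares_grow(n), squares_prealloc(n)),
    identical(squares_prealloc(n), squares_vec(n))
)

## Only then compare speed; timings vary by machine, so none are quoted.
system.time(squares_grow(n))
system.time(squares_prealloc(n))
system.time(squares_vec(n))
```

The gap between the first version and the other two typically widens as `n` increases, because the growing version re-copies the accumulated result on every iteration.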
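
As a minimal sketch of the 'Objects' points, the following defines a small S4 class whose two slots must stay co-ordinated. The class `GeneScores`, its slots, and its accessors are assumptions made up for this example; they are not Bioconductor classes. A validity method enforces the relationship between the slots, and a subsetting method keeps them in step.

```r
library(methods)

## Hypothetical class: gene identifiers paired with numeric scores.
setClass("GeneScores",
    representation(gene = "character", score = "numeric"),
    validity = function(object) {
        if (length(object@gene) != length(object@score))
            "'gene' and 'score' must have the same length"
        else TRUE
    })

## Constructor, so users never need to call new() directly.
GeneScores <- function(gene = character(), score = numeric())
    new("GeneScores", gene = gene, score = score)

## Accessors hide the slot layout from users of the class.
setGeneric("gene", function(x, ...) standardGeneric("gene"))
setMethod("gene", "GeneScores", function(x, ...) x@gene)

setGeneric("score", function(x, ...) standardGeneric("score"))
setMethod("score", "GeneScores", function(x, ...) x@score)

setMethod("length", "GeneScores", function(x) length(x@gene))

## Subsetting updates both slots together; initialize() re-checks validity.
setMethod("[", "GeneScores", function(x, i, j, ..., drop = TRUE) {
    initialize(x, gene = x@gene[i], score = x@score[i])
})

setMethod("show", "GeneScores", function(object) {
    cat("class:", class(object), "length:", length(object), "\n")
})

gs <- GeneScores(c("BRCA1", "TP53", "EGFR"), c(2.1, -0.4, 1.7))
top <- gs[score(gs) > 1]
gene(top)    # "BRCA1" "EGFR"
```

Because client code goes through `gene()`, `score()`, and `[` rather than touching slots, the internal representation could change later without breaking that code. An attempt such as `GeneScores("BRCA1", c(1, 2))` fails the validity check and signals an error.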