Authoring document for R

Introduction

This first section introduces some tools to author documents that contain analysis results produced with R. Bringing data, results and their interpretation together is essential to ensure reproducibility of an analysis pipeline and to communicate these results comprehensively to collaborators. A popular solution for this is literate programming, a technique and set of tools that make it possible to

  1. Write text and code within a single document. Here we will use the simple markdown syntax and include R code chunks; such documents are denoted R markdown and have the Rmd extension.
  2. Extract (called tangling) and execute the code.
  3. Replace the original code chunks with their results in the original document (called weaving).
  4. Compile the document into a final easily read format such as pdf or html.

Steps 2 to 4 can be executed individually or automated into a single command such as knit2html from the knitr package, or by using editors such as Rstudio.
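Steps 2 and 3 can be sketched with the knitr functions purl (tangle) and knit (weave). The example below is only an illustration: the file names and the content of the small Rmd document are made up, and everything is written to a temporary directory.

```r
library("knitr")

## Build a minimal R markdown document in a temporary directory.
## The file names and the chunk content are made up for this sketch.
fence <- strrep("`", 3)
rmd <- file.path(tempdir(), "example.Rmd")
writeLines(c("A tiny example", "",
             paste0(fence, "{r sum}"),
             "x <- 1 + 1",
             "x",
             fence),
           rmd)

## Step 2 (tangle): extract the R code into a script
rfile <- purl(rmd, output = file.path(tempdir(), "example.R"),
              quiet = TRUE)

## Step 3 (weave): run the chunk and write its result into a md file
mdfile <- knit(rmd, output = file.path(tempdir(), "example.md"),
               quiet = TRUE)

## Inspect the woven markdown
readLines(mdfile)
```

The resulting md file can then be compiled into the final format (step 4) with whichever converter is available in your installation.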

Other document types and frameworks that combine a programming language and an authoring language include Sweave and knitr Rnw files (combining LaTeX and R), IPython for python and other languages, orgmode, …

The second section introduces the ReportingTools package, in particular generation of interactive tables.

The last part very briefly presents the shiny package by describing a simple application and suggesting a few ways to improve it.

The test data used throughout this document is a DESeq2 result, available with

library("rauthoring")
data(res)

R Markdown

Markdown syntax

The figures below, taken from the Rstudio markdown (v2) tutorial, illustrate basic markdown syntax and its output using Rstudio.

Markdown basics

R code chunks

To include R code in the R markdown file, the native code chunk syntax is augmented with code chunk tags inside {r, ...}, as illustrated below:

Markdown code chunk

The following optional code chunk options are available:

To execute in-line code, use ` r 1+1` (no space between ` and r).
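As a small illustration of a chunk option combined with in-line code, the sketch below knits a made-up document directly from a character vector using the text argument of knitr's knit function. The chunk label (hidden), the variable x and the sentence are assumptions chosen for this example; echo = FALSE is the knitr option that hides the chunk's source code in the output.

```r
library("knitr")

fence <- strrep("`", 3)

## A made-up document: one chunk with the echo option switched off,
## followed by a sentence that uses in-line code
doc <- c(paste0(fence, "{r hidden, echo = FALSE}"),
         "x <- 1 + 1",
         fence,
         "",
         "The value of x is `r x`.")

## Weave the document; the result is returned as a character string
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

The chunk is still executed, so x is available to the in-line expression, but its code does not appear in the woven text.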

From Rmd to html

If you are using Rstudio, the simplest way to generate your final output is to open your Rmd file and click the Knit HTML (or Knit PDF, …) button. From R, you can use the knit2html function and give the Rmd as input. Both options will use the knit function to weave the document and generate the markdown md file that includes the code outputs. The generation of the final output will depend on the software version you use: either markdown::markdownToHTML or the more recent rmarkdown::render functions.

Exercise: Experiment with R markdown and the features described so far. Either create your Rmd file from scratch or use the simple template from the rauthoring package with rmdtemplate().

More markdown

library("rauthoring")
library("DESeq2")
data(res)
class(res)
## [1] "DESeqResults"
## attr(,"package")
## [1] "DESeq2"
head(res)
## log2 fold change (MAP): treatment DPN vs Control 
## Wald test p-value: treatment DPN vs Control 
## DataFrame with 6 rows and 6 columns
##                  baseMean log2FoldChange     lfcSE      stat    pvalue
##                 <numeric>      <numeric> <numeric> <numeric> <numeric>
## ENSG00000000003  613.8196       -0.01722   0.08665  -0.19868   0.84252
## ENSG00000000005    0.5502       -0.10344   1.09363  -0.09458   0.92465
## ENSG00000000419  304.0482       -0.01695   0.09517  -0.17807   0.85867
## ENSG00000000457  183.5157       -0.09654   0.12138  -0.79533   0.42642
## ENSG00000000460  207.4336        0.35004   0.14375   2.43497   0.01489
## ENSG00000000938   11.1598       -0.06361   0.44907  -0.14166   0.88735
##                      padj
##                 <numeric>
## ENSG00000000003    0.9764
## ENSG00000000005        NA
## ENSG00000000419    0.9797
## ENSG00000000457    0.8887
## ENSG00000000460    0.2728
## ENSG00000000938        NA

The most recent version of R Markdown also has a table syntax described here. Such tables can be generated with the helper function knitr::kable and are then displayed as html tables. Instead of copy/pasting the output of kable into the R markdown document, one can specify results='asis' so that the output of the code chunk is interpreted as is, i.e. the raw results from R are written into the output document.

library("knitr")
kable(head(res))
## 
## 
## |                | baseMean| log2FoldChange|  lfcSE|    stat| pvalue|   padj|
## |:---------------|--------:|--------------:|------:|-------:|------:|------:|
## |ENSG00000000003 | 613.8196|        -0.0172| 0.0867| -0.1987| 0.8425| 0.9764|
## |ENSG00000000005 |   0.5502|        -0.1034| 1.0936| -0.0946| 0.9246|     NA|
## |ENSG00000000419 | 304.0482|        -0.0169| 0.0952| -0.1781| 0.8587| 0.9797|
## |ENSG00000000457 | 183.5157|        -0.0965| 0.1214| -0.7953| 0.4264| 0.8887|
## |ENSG00000000460 | 207.4336|         0.3500| 0.1438|  2.4350| 0.0149| 0.2728|
## |ENSG00000000938 |  11.1598|        -0.0636| 0.4491| -0.1417| 0.8874|     NA|
                 baseMean  log2FoldChange   lfcSE     stat  pvalue    padj
ENSG00000000003  613.8196         -0.0172  0.0867  -0.1987  0.8425  0.9764
ENSG00000000005    0.5502         -0.1034  1.0936  -0.0946  0.9246      NA
ENSG00000000419  304.0482         -0.0169  0.0952  -0.1781  0.8587  0.9797
ENSG00000000457  183.5157         -0.0965  0.1214  -0.7953  0.4264  0.8887
ENSG00000000460  207.4336          0.3500  0.1438   2.4350  0.0149  0.2728
ENSG00000000938   11.1598         -0.0636  0.4491  -0.1417  0.8874      NA
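The format argument of kable controls the kind of table that is produced. The sketch below uses the cars data set that ships with R as a stand-in for the DESeq2 results (an assumption made so that the example runs without the rauthoring package) and produces the same table once as markdown and once as html.

```r
library("knitr")

## The built-in cars data set stands in for the DESeq2 results
x <- head(cars, 3)

## Markdown table: one character string per table line
tab_md <- kable(x, format = "markdown")
tab_md

## The same table as raw html
tab_html <- kable(x, format = "html")
tab_html
```

Inside an R markdown document, either result can be written into the output as is.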

Try out some of these additional features in your own Rmd file or look at this vignette's source code. Find where it is with system.file("doc/rauthoring.Rmd", package = "rauthoring").

ReportingTools

The Bioconductor ReportingTools package provides additional features to generate reports. From the Bioc landing page:

The ReportingTools software package enables users to easily
display reports of analysis results generated from sources such as
microarray and sequencing data. The package allows users to create
HTML pages that may be viewed on a web browser such as Safari, or
in other formats readable by programs such as Excel. Users can
generate tables with sortable and filterable columns, make and
display plots, and link table entries to other data sources such
as NCBI or larger plots within the HTML page. Using the package,
users can also produce a table of contents page to link various
reports together for a particular project that can be viewed in a
web browser. For more examples, please visit our site:
http://research-pub.gene.com/ReportingTools.

For example, to create an html report that contains an interactive html table:

library("ReportingTools")
htmlRep <- HTMLReport(shortName = "htmltable",
                      reportDirectory = "./reports")
publish(res, htmlRep)
finish(htmlRep)
browseURL("./reports/htmltable.html")

Generate the ReportingTools dynamic html table as shown above.

ReportingTools also integrates seamlessly with knitr and R markdown documents, as illustrated in the knitrreptools vignette.

See the ReportingTools vignettes for more details and other use cases:

vignette(package = "ReportingTools")

Web applications with shiny

shiny is an R package that makes it possible to build interactive web applications from R. These apps are composed of a user interface part and a server back end, which are saved in the ui.R and server.R files respectively. Both files are stored in a directory, app1 below. Both sides continuously react to each other's updates, hence the term reactive programming.

Server

The server back end defines the server functionality through the shinyServer function, which itself takes an anonymous function as input with parameters input and output. The former corresponds to data that stems from the user interface (see below) and the latter defines the server output that will be available to the interface.

library("shiny")
library("rauthoring")
library("DESeq2")
data(res)
res <- data.frame(res)[1:1000, ]


shinyServer(function(input, output) {
    ## Expression that generates a histogram. The expression is
    ## wrapped in a call to renderPlot to indicate that:
    ##
    ##  1) It is "reactive" and therefore should re-execute automatically
    ##     when inputs change
    ##  2) Its output type is a plot

    output$hist <- renderPlot({
        ## draw the histogram with the specified number of bins
        hist(res$pvalue, breaks = input$breaks,
             col = 'darkgray', border = 'white',
             main = "DESeq p-values")
    })
})

Interface

The interface is handled by the shinyUI function, which draws the interface layout. Below, we create a fluidPage layout composed of a title and a side bar layout. The latter contains a side bar and a main panel. The side bar gets a slider input widget that ranges from 10 to 100 with a default value of 50; the value will be available in the server input as input$breaks. The main panel is composed of a plot renderer that displays the output "hist", which is returned by the server back end.

library("shiny")

## Define UI for application that draws a histogram
ui <- fluidPage(
    ## Application title
    titlePanel("DESeq2 results"),

    ## Sidebar with a slider input for the number of bins
    sidebarLayout(
        sidebarPanel(
            sliderInput("breaks",
                        "Number of breaks:",
                        min = 10,
                        max = 100,
                        value = 50)
        ),

        ## Show a plot of the generated distribution
        mainPanel(
            plotOutput("hist")
        )
    )
)

shinyUI(ui)

To start this application use the runApp function with the name of the directory containing the server and interface definitions.

runApp("./app1")

Create the ui.R and server.R files, store them in a directory, and start your shiny app.

Let's try to improve our simple first app. A suggestion is available by running the app2() function. The additional widgets added to the interface are radioButtons, a numericInput field (side bar) and a verbatimTextOutput (main panel). The server back end takes advantage of two new inputs to add a vertical line to the histogram and displays a table of significant/non-significant features.
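One possible shape for such an improved app is sketched below. It is not the code behind app2(), only a guess at how the widgets named above could be wired together. The input names pcol and alpha, the output name signif, the widget labels and the simulated p-values (used here in place of the DESeq2 results so that the sketch runs on its own) are all assumptions. For brevity, the interface and the server are combined with shinyApp instead of being stored in separate ui.R and server.R files.

```r
library("shiny")

## Simulated p-values stand in for the DESeq2 results (assumption)
set.seed(1)
res <- data.frame(pvalue = runif(1000))
res$padj <- p.adjust(res$pvalue, method = "BH")

ui <- fluidPage(
    titlePanel("DESeq2 results"),
    sidebarLayout(
        sidebarPanel(
            sliderInput("breaks", "Number of breaks:",
                        min = 10, max = 100, value = 50),
            ## hypothetical input: which column to display
            radioButtons("pcol", "Values to display:",
                         choices = c("p-values" = "pvalue",
                                     "adjusted p-values" = "padj")),
            ## hypothetical input: significance threshold
            numericInput("alpha", "Significance threshold:",
                         value = 0.05, min = 0, max = 1, step = 0.01)
        ),
        mainPanel(
            plotOutput("hist"),
            verbatimTextOutput("signif")
        )
    )
)

server <- function(input, output) {
    output$hist <- renderPlot({
        hist(res[[input$pcol]], breaks = input$breaks,
             col = "darkgray", border = "white",
             main = input$pcol, xlab = input$pcol)
        ## vertical line at the chosen threshold
        abline(v = input$alpha, col = "red", lwd = 2)
    })
    output$signif <- renderPrint({
        ## counts of non-significant (FALSE) and significant (TRUE) features
        table(significant = res[[input$pcol]] < input$alpha)
    })
}

app <- shinyApp(ui = ui, server = server)
## In an interactive session, start the application with runApp(app)
```

The same ui and server definitions could equally be split into ui.R and server.R files and started with runApp on the containing directory, as for the first app.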

A few useful links :

Session information

## R version 3.1.0 (2014-04-10)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] XML_3.98-1.1              ReportingTools_2.4.0     
##  [3] AnnotationDbi_1.26.0      Biobase_2.24.0           
##  [5] RSQLite_0.11.4            DBI_0.2-7                
##  [7] knitr_1.6                 DESeq2_1.4.5             
##  [9] RcppArmadillo_0.4.300.8.0 Rcpp_0.11.2              
## [11] GenomicRanges_1.16.3      GenomeInfoDb_1.0.2       
## [13] IRanges_1.22.9            BiocGenerics_0.10.0      
## [15] rauthoring_0.2.2         
## 
## loaded via a namespace (and not attached):
##  [1] AnnotationForge_1.6.1    BBmisc_1.6              
##  [3] BSgenome_1.32.0          BatchJobs_1.2           
##  [5] BiocParallel_0.6.1       Biostrings_2.32.0       
##  [7] Category_2.30.0          Formula_1.1-1           
##  [9] GO.db_2.14.0             GOstats_2.30.0          
## [11] GSEABase_1.26.0          GenomicAlignments_1.0.1 
## [13] GenomicFeatures_1.16.2   Hmisc_3.14-4            
## [15] MASS_7.3-33              Matrix_1.1-4            
## [17] PFAM.db_2.14.0           R.methodsS3_1.6.1       
## [19] R.oo_1.18.0              R.utils_1.32.4          
## [21] RBGL_1.40.0              RColorBrewer_1.0-5      
## [23] RCurl_1.95-4.1           RJSONIO_1.2-0.2         
## [25] Rsamtools_1.16.1         VariantAnnotation_1.10.4
## [27] XVector_0.4.0            annotate_1.42.0         
## [29] biomaRt_2.20.0           biovizBase_1.12.1       
## [31] bitops_1.0-6             brew_1.0-6              
## [33] caTools_1.17             cluster_1.15.2          
## [35] codetools_0.2-8          colorspace_1.2-4        
## [37] dichromat_2.0-0          digest_0.6.4            
## [39] edgeR_3.6.2              evaluate_0.5.5          
## [41] fail_1.2                 foreach_1.4.2           
## [43] formatR_0.10             genefilter_1.46.1       
## [45] geneplotter_1.42.0       ggbio_1.12.4            
## [47] ggplot2_1.0.0            graph_1.42.0            
## [49] grid_3.1.0               gridExtra_0.9.1         
## [51] gtable_0.1.2             htmltools_0.2.4         
## [53] httpuv_1.3.0             hwriter_1.3             
## [55] iterators_1.0.7          lattice_0.20-29         
## [57] latticeExtra_0.6-26      limma_3.20.5            
## [59] locfit_1.5-9.1           markdown_0.7            
## [61] munsell_0.4.2            plyr_1.8.1              
## [63] proto_0.3-10             reshape2_1.4            
## [65] rtracklayer_1.24.2       scales_0.2.4            
## [67] sendmailR_1.1-2          shiny_0.10.0.9001       
## [69] splines_3.1.0            stats4_3.1.0            
## [71] stringr_0.6.2            survival_2.37-7         
## [73] tools_3.1.0              xtable_1.7-3            
## [75] zlibbioc_1.10.0
