--- title: "Creating Your Docker Container and Command Line Interface (with docopt) " output: BiocStyle::html_document: toc: true toc_depth: 4 number_sections: false highlight: haddock css: style.css --- ```{r include=FALSE} library(BiocStyle) knitr::opts_chunk$set(eval = TRUE) ``` ## Introduction __In progress__ In this tutorial, we will go through ways to - Make native command line interface in R There are many fun ways to do it, here I am more focused for R developers. ### Existing docker repos Before you create any, make sure you don't re-invent the wheel and use the best base image for your container as your tool chain may save your lots of time later on. #### Rocker Project Official R Docker images is called "Rocker"" project and is on [github](https://github.com/rocker-org/rocker), please visit the page to find more details and Dockerfile. | Image | Description | | ------------- |:-------------:| | rocker/r-base | base package to build from| | rocker/r-devel | base plus R-devel from SVN | | rocker/rstudio | base plus RStudio Server | | rocker/hadleyverse | rstudio + Hadley's packages, LaTeX | | rocker/ropensci | hadleyverse + rOpenSci packages | | rocker/r-devel-san | base, SVN's R-devel and SAN | #### Bioconductor Images Bioconductor have a nice [page](http://bioconductor.org/help/docker/) about the official docker images, please read for more details. | Image | | ---- | | bioconductor/devel_base | bioconductor/devel_core | bioconductor/devel_flow | | bioconductor/devel_microarray | | bioconductor/devel_proteomics | | bioconductor/devel_sequencing | | bioconductor/release_base | | bioconductor/release_core | | bioconductor/release_flow | | bioconductor/release_microarray | | bioconductor/release_proteomics | | bioconductor/release_sequencing | To understand the image quickly here is the short instruction for the image name: - __release__ images are based on `rocker/rstudio` - __devel__ images are based on `rocker/rstudio-daily` - __base__: Contains R, RStudio, and BiocInstaller + system dependencies. - __core__: base + a selection of core. - __flow__: core + all packages tagged with the _FlowCytometry_ biocView. - __microarray__: core + all packages tagged with the _Microarray_ biocView. - __proteomics__: core + all packages tagged with the _Proteomics_ biocView. - __sequencing__: core + all packages tagged with the _Sequencing_ biocView. #### Dockerhub [Dockerhub](https://hub.docker.com/) also provide public/private repos, you can search existing tools without building yourself, it's very likely some popular tool already have docker container well maintained there. #### Seven Bridges Docker Registry Tutorial comming soon. Example Seven Bridges registry: - __SevenBridges__ : `images.sbgenomics.com/[:]` - __Cancer Genomics Cloud__: `cgc-images.sbgenomics.com/[:]` ## Tutorial: random number generator Our goal here is to making a cwl app to generate unif random numbers, yes, the core function is `runif()`, it's native function in R. ```{r} runif runif(10) ``` ### Using docopt package In R, we also have a nice implementation in a package called `docopt`, developed by *Edwin de Jonge*. Check out its [tutorial](https://github.com/docopt/docopt.R) on github. So let's quickly create a command line interface for our R scripts with a dummy example. Let's turn the uniform distribution function `runif` into a command line tool. when you check out the help page for `runif`, here is the key information you want to mark down. ``` Usage runif(n, min = 0, max = 1) Arguments n number of observations. If length(n) > 1, the length is taken to be the number required. min, max lower and upper limits of the distribution. Must be finite. ``` I will add one more parameter to set seed, here is the R script file called `runif.R`. At the beginning of the commadn line script, I use docopt standard to write my tool help. ```{r} 'usage: runif.R [--n= --min= --max= --seed=] options: --n= number of observations. If length(n) > 1, the length is taken to be the number required [default: 1]. --min= lower limits of the distribution. Must be finite [default: 0]. --max= upper limits of the distribution. Must be finite [default: 1]. --seed= seed for set.seed() function [default: 1]' -> doc library(docopt) ``` Let's first do some testing in your R session before you make it a full functional command line tool. ```{r} docopt(doc) #with no argumetns provided docopt(doc, "--n 10 --min=3 --max=5") ``` Looks like it works, now let's add main function script for this command line tool. ```{r} opts <- docopt(doc) set.seed(opts$seed) runif(n = as.integer(opts$n), min = as.numeric(opts$min), max = as.numeric(opts$max)) ``` Add Shebang at the top of the file, this is a complete example for `runif.R` command line will be like this ```{r, eval=FALSE} #!/usr/bin/Rscript 'usage: runif.R [--n= --min= --max= --seed=] options: --n= number of observations. If length(n) > 1, the length is taken to be the number required [default: 1]. --min= lower limits of the distribution. Must be finite [default: 0]. --max= upper limits of the distribution. Must be finite [default: 1]. --seed= seed for set.seed() function [default: 1]' -> doc library(docopt) opts <- docopt(doc) set.seed(opts$seed) runif(n = as.integer(opts$n), min = as.numeric(opts$min), max = as.numeric(opts$max)) ``` Let's test this command line. ``` $ runif.R --help Loading required package: methods usage: runif.R [--n= --min= --max= --seed=] options: --n= number of observations. If length(n) > 1, the length is taken to be the number required [default: 1]. --min= lower limits of the distribution. Must be finite [default: 0]. --max= upper limits of the distribution. Must be finite [default: 1]. --seed= seed for set.seed() function [default: 1] $ runif.R Loading required package: methods [1] 0.2655087 $ runif.R Loading required package: methods [1] 0.2655087 $ runif.R --seed=123 --n 10 --min=1 --max=100 Loading required package: methods [1] 29.470174 79.042208 41.488715 88.418723 94.106261 5.510093 53.282443 [8] 89.349485 55.592066 46.204859 ``` For full example you can check my github [example](https://github.com/tengfei/docker/tree/master/runif) ### Quick command line interface with commandArgs (position and named args) For advanced users, please read another tutorial "Creating Your Docker Container and Command Line Interface (with docopt)", "docopt" is more formal way to construct your command line interface, but there is a quick way to make command line interface here using just `commandArgs` Suppose I already have a R script like this using position mapping the arguments 1. numbers 2. min 3. max ```{r, eval = TRUE, comment=''} fl <- system.file("docker/sevenbridges/src", "runif2spin.R", package = "sevenbridges") cat(readLines(fl), sep = '\n') ``` Ignore the comment part, I will introduce spin/stich later. My base command will be somethine like ``` Rscript runif2spin.R 10 30 50 ``` I just describe my tool in this way ```{r} library(sevenbridges) library(readr) fd <- fileDef(name = "runif.R", content = read_file(fl)) rbx <- Tool(id = "runif", label = "runif", hints = requirements(docker(pull = "rocker/r-base"), cpu(1), mem(2000)), requirements = requirements(fd), baseCommand = "Rscript runif.R", stdout = "output.txt", inputs = list(input(id = "number", type = "integer", position = 1), input(id = "min", type = "float", position = 2), input(id = "max", type = "float", position = 3)), outputs = output(id = "random", glob = "output.txt")) ``` Now copy-paste the json into your project app and run it in the cloud to test it How about named argumentments? I will still recommend use "docopt" package, but for simple way. ```{r, eval = TRUE, comment=''} fl <- system.file("docker/sevenbridges/src", "runif_args.R", package = "sevenbridges") cat(readLines(fl), sep = '\n') ``` ``` Rscript runif_args.R --n=10 --min=30 --max=50 ``` I just describe my tool in this way, note, I use `separate=FALSE` and add `=` to my prefix as a hack. ```{r} library(readr) fd <- fileDef(name = "runif.R", content = read_file(fl)) rbx <- Tool(id = "runif", label = "runif", hints = requirements(docker(pull = "rocker/r-base"), cpu(1), mem(2000)), requirements = requirements(fd), baseCommand = "Rscript runif.R", stdout = "output.txt", inputs = list(input(id = "number", type = "integer", separate = FALSE, prefix = "--n="), input(id = "min", type = "float", separate = FALSE, prefix = "--min="), input(id = "max", type = "float", separate = FALSE, prefix = "--max=")), outputs = output(id = "random", glob = "output.txt")) ``` ### Quick report: Spin and Stich Alternative, you can use spin/stich from knitr to generate report directly from a Rscript with special format. For example, let's use above example ```{r, eval = TRUE, comment=''} fl <- system.file("docker/sevenbridges/src", "runif_args.R", package = "sevenbridges") cat(readLines(fl), sep = '\n') ``` You command is something like this ``` Rscript -e "rmarkdown::render(knitr::spin('runif_args.R', FALSE))" --args --n=100 --min=30 --max=50 ``` And so I describe my tool like this with docker image `rocker/hadleyverse` this contians knitr and rmarkdown package. ```{r} library(readr) fd <- fileDef(name = "runif.R", content = read_file(fl)) rbx <- Tool(id = "runif", label = "runif", hints = requirements(docker(pull = "rocker/hadleyverse"), cpu(1), mem(2000)), requirements = requirements(fd), baseCommand = "Rscript -e \"rmarkdown::render(knitr::spin('runif.R', FALSE))\" --args", stdout = "output.txt", inputs = list(input(id = "number", type = "integer", separate = FALSE, prefix = "--n="), input(id = "min", type = "float", separate = FALSE, prefix = "--min="), input(id = "max", type = "float", separate = FALSE, prefix = "--max=")), outputs = list(output(id = "stdout", type = "file", glob = "output.txt"), output(id = "random", type = "file", glob = "*.csv"), output(id = "report", type = "file", glob = "*.html"))) ``` You will get a report in the end ### Executable report with R markdown (advanced) We cannot really make a Rmarkdown file executable in it by simply put ``` #!/bin/bash/Rscript ``` In your markdown Of course, we can figure out a way to do it in `liftr` or `knitr`. But rmarkdown allow you to pass parameters to your Rmardown template, please read this tutorial [Parameterized Reports](http://rmarkdown.rstudio.com/developer_parameterized_reports.html). This doesn't solve my problem that I want to directly describe command line interface in the markdown template. However, here is alternative method: Create an command line interface to pass `params` from docopt into `rmarkdown::render()` function. In this way, we can pass as many as possible parameters from command line interface into our Rmarkdown template. So here we go, here is updated methods and it's also what I use for another tutorial about RNA-seq workflow. ```{r, eval = TRUE} fl <- system.file("docker/sevenbridges/src/", "runif.R", package = "sevenbridges") ``` Here is the current content of command line interface ```{r, comment='', eval = TRUE, echo = FALSE} cat(readLines(fl), sep = '\n') ``` And here is the report template ```{r, comment='', eval = TRUE, echo = FALSE} fl <- system.file("docker/sevenbridges/report/", "report.Rmd", package = "sevenbridges") cat(readLines(fl), sep = '\n') ``` ## Setup dockerhub automated build To make things more reproducible and explicit and automatic, you can do a autohook to automatically build your container/image on docker hub. Here is what I do 1. I created some project called 'docker' on my github and it has all container that crated from a Dockerfile, for example, tengfei/docker/runif, please go [here](https://github.com/tengfei/docker/tree/master/runif) to check it out 2. This folder root has a Dockerfile and subfolders for extra materials I added at build time, like script or report template. 3. login your dockerhub account, following this [tutorial](https://docs.docker.com/docker-hub/builds/) to make "automated build" from your github account. Make sure you input the right location for your Dockerfile, by customizing it. 4. Then you will have auto-build every time you push a new update in github. 5. Start using your docker image like 'tengfei/runif' 6. Feel free to push it onto your SevenBridges platform registry as well. ## More examples There are more examples under 'inst/docker' folder, you can check out how to describe command line and build docker, how to make report template. You can read the online github [code](https://github.com/sbg/sevenbridges-r/tree/master/inst/docker). Or you can read another tutorial about how we wrap RNA-seq workflow from bioconductor.