Chapter 3 Beyond R Basics

Here we briefly outline resources for taking your R programming to the next level, including resources for learning about package development. We also outline some companions to R that are good to know not only for package development, but also for running your own bioinformatic pipelines, enabling you to use a broader array of tools to go from raw data to preprocessed data before working in R.

3.1 Becoming an R Expert

A deeper dive into the finer details of the R programming language is provided by the book Advanced R. While targeted at more experienced R users and programmers, this book represents a comprehensive compendium of more advanced concepts, and touches on some of the paradigms used extensively by developers throughout Bioconductor, specifically programming with S4.

Eventually, you’ll reach the point where you have your own collection of functions and datasets, and where you will be writing your own packages. Luckily, there’s a guide for just that, with the book R Packages. Packages are great even if just for personal use, and of course, with some polishing, can eventually become available on CRAN or Bioconductor. Furthermore, they are also a great way of putting together code associated with a manuscript, promoting reproducible, accessible computing practices, something we all strive for in our work.

For many of the little details that are oft forgotten learning about R, the aptly named What They Forgot to Teach You About R is a great read for learning about the little things such as file naming, maintaining an R installation, and reproducible analysis habits.

Finally, we save the most intriguing resource for last - another book for those on the road to becoming an R expert is R Inferno, which dives into many of the unique quirks of R. Warning: this book goes very deep into the painstaking details of R.

3.2 Becoming an R/Bioconductor Developer

While learning to use Bioconductor tools is a very welcoming experience, unfortunately there is no central resource for navigating the plethora of gotchas and paradigms associated with developing for Bioconductor. Based on conversations with folks involved in developing for Bioconductor, much of this knowledge is hard won and fairly spread out. This however is beginning to change with more recent efforts led by the Bioconductor team, and while this book represents an earnest effort towards addressing the user perspective, it is currently out of scope to include a deep dive about the developer side.

For those looking to get started with developing packages for Bioconductor, it is important to first become acquainted with developing standalone R packages. To this end, the R Packages book provides a deep dive into the details of constructing your own package, as well as details regarding submission of a package to CRAN. For programming practices,

With that, some resources that are worth looking into to get started are the BiocWorkshops repository under the Bioconductor Github provides a book composed of workshops that have been hosted by Bioconductor team members and contributors. These workshops center around learning, using, and developing for Bioconductor. A host of topics are also available via the Learn module on the Bioconductor website. Finally, the Bioconductor developers portal contains a bevy of individual resources and guides for experienced R developers.

3.3 Nice Companions for R

While not essential for our purposes, many bioinformatic tools for processing raw sequencing data require knowledge beyond just R to install, run, and import their results into R for further analysis. The most important of which are basic knowledge of the Shell/Bash utilities, for working with bioinformatic pipelines and troubleshooting (R package) installation issues.

Additionally, for working with packages or software that are still in development and not hosted on an official repository like CRAN or Bioconductor, knowledge of Git - a version control system - and the popular GitHub hosting service is helpful. This enables you to not only work with other people’s code, but also better manage your own code to keep track of changes.

3.3.1 Shell/Bash

Datacamp and other interactive online resources such as Codecademy are great places to learn some of these extra skills. We highly recommend learning Shell/Bash, as it is the starting point for most bioinformatic processing pipelines.

3.3.2 Git

We would recommend learning Git next, a system for code versioning control which underlies the popular GitHub hosting service, where many of the most popular open source tools are hosted. Learning Git is essential not only for keeping track of your own code, but also for using, managing, and contributing to open source software projects.

For a more R centric look at using Git (and Github), we highly recommend checking out Happy Git and Github for the useR.

3.3.3 Other Languages

A frequent question that comes up is “What else should I learn besides R?” Firstly, we believe that honing your R skills is first and foremost, and beyond just R, learning Shell/Bash and Git covered in the Nice Companions for R section are already a great start. For those just getting started, these skills should become comfortable in practice before moving on.

However, there are indeed benefits to going beyond just R. At a basic level, learning other programming languages helps broaden one’s perspective - similar to learning multiple spoken or written languages, learning about other programming languages (even if only in a cursory manner) helps one identify broader patterns that may be applicable across languages.

At an applied level, work within and outside of R has made it ever more friendly now than ever before with multi-lingual setups and teams, enabling the use of the best tool for the job at hand. For example, Python is another popular language used in both data science and a broader array of applications as well. R now supports a native Python interface via the reticulate package, enabling access to tools developed originally in Python such as the popular TensorFlow framework for machine learning applications. C++ is frequently used natively in R as well via Rcpp in packages to massively accelerate computations. Finally, multiple languages are supported in code documents and reports through R Markdown.