1 Objectives

Abstract: This two-hour workshop is meant to empower Cancer Moonshot research labs to tackle their bioinformatic analysis challenges. For the first hour and a half, we’ll work through a hands-on workflow for single-cell assay analysis. We’ll introduce data import, management, and interactive visualization using Bioconductor tools like iSEE. After seeing how to work with one assay, we’ll briefly explore approaches to integrating different assays. In the final half hour we’ll go beyond Bioconductor tools for single-cell analysis: we’ll assemble a panel to discuss possible strategies for data analysis challenges submitted (before the workshop) to the organizers.

Goal: Empower Cancer Moonshot Research Labs to tackle their bioinformatic analysis challenges

Objectives of this workshop:

  1. Learn the basics of R and Bioconductor
  2. Participate in the exploration of immuno-oncology relevant data using R / Bioconductor
  3. Tour additional directions possible in R / Bioconductor
  4. Discuss challenges and opportunities in immuno-oncology bioinformatics

2 R and Bioconductor 101

2.1 R

What we will learn

  • Working with R functions, variables, vectors and data structures
  • Using packages to extend base R capabilities
  • Getting help

R

  • standard and advanced statistical analysis
  • high quality visualizations
  • interactivity

Vectors, variables, and functions

x = rnorm(100)
mean(x)
## [1] -0.1471021
var(x)
## [1] 0.7680572
hist(x)
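
A minimal sketch of why working on whole vectors is convenient: the same standardization is written once with an explicit loop and once as a single vectorized expression. The seed and the helper name `standardize` are illustrative assumptions, not part of the workshop material.

```r
set.seed(123)                 # assumption: fixed seed so the sketch is reproducible
x = rnorm(100)

# loop version: handle one element at a time
z_loop = numeric(length(x))
for (i in seq_along(x))
    z_loop[i] = (x[i] - mean(x)) / sd(x)

# vectorized version: arithmetic applies to every element at once
standardize = function(v) (v - mean(v)) / sd(v)   # hypothetical helper
z_vec = standardize(x)

all.equal(z_loop, z_vec)      # compare the two results
```

The vectorized form is shorter and avoids the bookkeeping of an index variable, which is the usual reason to prefer it in R.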

Managing data: classes and methods

y = x + rnorm(100)
df = data.frame(x, y)
plot(y ~ x, df)
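
As a rough illustration of what "classes and methods" means here, the sketch below rebuilds a data.frame like `df` and inspects it. It assumes a fixed seed (not used in the workshop code), and the variable name `upper` is made up for the example.

```r
set.seed(123)                 # assumption: fixed seed for reproducibility
x = rnorm(100)
y = x + rnorm(100)
df = data.frame(x, y)

class(df)                     # "data.frame"
dim(df)                       # 100 rows, 2 columns
head(df, 3)                   # first rows; x and y stay aligned

upper = df[df$x > 0, ]        # subset rows; both columns are kept together
summary(upper$y)              # summary() picks a method based on its argument
```

Keeping `x` and `y` in one object means a single subsetting step cannot leave the two vectors out of sync.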

Visualization

fit = lm(y ~ x, df)
anova(fit)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## x          1  69.876  69.876  59.719 9.617e-12 ***
## Residuals 98 114.668   1.170                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(y ~ x, df)
abline(fit)
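
The fitted model is itself an object with a class and methods. The sketch below, which assumes a fixed seed and a hypothetical `new_x` data.frame, shows a few common ways to pull information out of an `lm` fit; the numbers will differ from the output shown above because the random data differ.

```r
set.seed(123)                 # assumption: fixed seed for reproducibility
x = rnorm(100)
y = x + rnorm(100)
df = data.frame(x, y)
fit = lm(y ~ x, df)

class(fit)                    # "lm"
coef(fit)                     # estimated intercept and slope
new_x = data.frame(x = c(-1, 0, 1))   # hypothetical new observations
predict(fit, new_x)           # values on the fitted regression line
```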

Extending base R: packages

library(ggplot2)
ggplot(df, aes(x, y)) + 
    geom_point() +
    geom_smooth(method="lm")

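Packages such as ggplot2 are distributed through CRAN (Comprehensive R Archive Network) and can be installed with install.packages("ggplot2").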

Help!

  • ?lm
  • browseVignettes("ggplot2") / vignette(package="ggplot2")
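
Beyond help pages and vignettes, a few base R functions can help when the exact function name or its arguments are not known. This is only a sketch of some options; the search terms are arbitrary examples.

```r
apropos("vignette")           # objects on the search path whose name matches "vignette"
args(rnorm)                   # argument list of a function, without opening its help page
names(formals(rnorm))         # the same argument names as a character vector
```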

What we learned

  • Vectors simplify R expressions
  • Structures like data.frame help manage data
  • Packages provide many different extensions
  • Help is available through ?, browseVignettes() and other means

2.2 Bioconductor

What we will learn

  • Discover and use Bioconductor packages
  • Work with SummarizedExperiment for data management
  • Use annotation packages to map between gene identifiers

Bioconductor

  • More than 1800 R packages for statistical analysis and comprehension of high-throughput genomic data
  • Bulk and single-cell RNA-seq, epigenetic and other microarrays, called variants, flow cytometry, proteomics, …
  • Widely used (>1/2 million unique IP downloads / year), highly cited (>33,000 PubMedCentral citations), well-respected
  • NIH funded – NHGRI (core, cloud), ITCR (multi-assay, annotation- and experiment-hub), IOTN (immuno-oncology data coordinating center)

Resources

Data management

  • Extensive resources to import and operate on standard file formats, e.g. BED, VCF, BAM, …

Domain-specific workflows, e.g., bulk RNA-seq differential expression

  • Load example data, pre-formatted.
library(airway)
data(airway)      # load example data
airway
## class: RangedSummarizedExperiment 
## dim: 64102 8 
## metadata(1): ''
## assays(1): counts
## rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
## rowData names(0):
## colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
## colData names(9): SampleName cell ... Sample BioSample
  • A Bioconductor SummarizedExperiment object, providing coordinated data management