# 1 Objectives

Abstract: This two-hour workshop is meant to empower Cancer Moonshot research labs to tackle their bioinformatic analysis challenges. For the first hour and a half, we’ll work through a hands-on workflow for single-cell assay analysis. We’ll introduce data import, management, and interactive visualization using Bioconductor tools like iSEE. After seeing how to work with one assay, we’ll briefly explore approaches to integrating different assays. In the final ½ hour we’ll go beyond Bioconductor tools for single-cell analysis. We’ll assemble a panel to discuss possible strategies for data analysis challenges submitted (before the workshop) to the organizers.

Goal: Empower Cancer Moonshot Research Labs to tackle their bioinformatic analysis challenges

Objectives, this workshop:

1. Learn the basics of R and Bioconductor
2. Participate in the exploration of immuno-oncology relevant data using R / Bioconductor
3. Tour additional directions possible in R / Bioconductor
4. Discuss challenges and opportunities in immuno-oncology bioinformatics.

# 2 R and Bioconductor 101

## 2.1 R

What we will learn

• Working with R functions, variables, vectors and data structures
• Using packages to extend base R capabilities
• Getting help

R

• standard and advanced statistical analysis
• high quality visualizations
• interactivity

Vectors, variables, and functions

``````x = rnorm(100)
mean(x)
## [1] -0.1471021
var(x)
## [1] 0.7680572
hist(x)``````
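
As a small illustration of why vectors matter, the sketch below applies arithmetic, comparison, and subsetting to all 100 values at once, with no explicit loop. The seed value and the names `centered` and `positive` are illustrative choices, not part of the workshop material.

```r
set.seed(123)                # arbitrary seed, only so the draws are reproducible
x = rnorm(100)               # 100 draws from a standard normal distribution

centered = x - mean(x)       # arithmetic is applied to every element
positive = x > 0             # comparison returns a logical vector of length 100

sum(positive)                # TRUE counts as 1, so this counts the positive draws
head(x[positive], 3)         # logical subsetting keeps elements where TRUE
range(centered)              # smallest and largest centered values
```

Because each operation works on the whole vector, the code reads much like the statistical description of the calculation.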

Managing data: classes and methods

``````y = x + rnorm(100)
df = data.frame(x, y)
plot(y ~ x, df)``````
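
To make the idea of classes and methods a little more concrete, here is a minimal sketch that inspects the `data.frame` and shows one generic function, `summary()`, behaving differently depending on the class of its argument. The seed and the name `above` are assumptions made for illustration.

```r
set.seed(123)                # arbitrary seed for reproducibility
x = rnorm(100)
y = x + rnorm(100)
df = data.frame(x, y)

class(df)                    # the class determines which methods apply
dim(df)                      # 100 rows (observations), 2 columns (variables)
head(df, 3)                  # first three rows
summary(df)                  # summary() dispatches to the data.frame method
summary(df$x)                # ...and to a different method for a numeric vector

above = df[df$x > 0, ]       # keep rows where x is positive, all columns
nrow(above)
```

Keeping `x` and `y` in one object means that a row subset such as `above` stays coordinated across both columns.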

Visualization

``````fit = lm(y ~ x, df)
anova(fit)
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## x          1  69.876  69.876  59.719 9.617e-12 ***
## Residuals 98 114.668   1.170
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(y ~ x, df)
abline(fit)``````
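
The fitted model is itself an object with a class, and functions such as `anova()` and `abline()` know how to work with it. A short sketch, assuming the same simulated data as above (the seed is arbitrary), of a few other accessors that base R provides for `lm` objects:

```r
set.seed(123)                # arbitrary seed for reproducibility
x = rnorm(100)
y = x + rnorm(100)
df = data.frame(x, y)

fit = lm(y ~ x, df)
class(fit)                   # "lm"
coef(fit)                    # estimated intercept and slope
summary(fit)$r.squared       # proportion of variance in y explained by x
head(residuals(fit), 3)      # observed minus fitted values
```

Using accessors like `coef()` and `residuals()` avoids reaching into the internal structure of the object.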

Extending base R: packages

``````library(ggplot2)
ggplot(df, aes(x, y)) +
    geom_point() +
    geom_smooth(method="lm")``````

CRAN (Comprehensive R Archive Network)
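
CRAN packages are installed once with `install.packages()` and then attached in each session with `library()`. The sketch below only checks whether a package is available and reports what to do, rather than installing anything; the variable names `pkg` and `installed` are illustrative.

```r
pkg = "ggplot2"

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed = requireNamespace(pkg, quietly = TRUE)

if (installed) {
    library(ggplot2)                 # attach the package for this session
    print(packageVersion(pkg))       # report which version is in use
} else {
    # one-time installation from CRAN, best run at an interactive prompt
    message("Run install.packages(\"", pkg, "\") to install from CRAN")
}
```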

Help!

• `?lm`
• `browseVignettes("ggplot2")` / `vignette(package="ggplot2")`
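
Beyond help pages and vignettes, a few base R functions are useful for finding one's way around. This is a sketch of some options rather than a complete list; the name `hits` is illustrative.

```r
# At an interactive prompt, ?lm and help("lm") open the same help page.

args(lm)                     # quick look at the arguments a function accepts
hits = apropos("anova")      # names on the search path that contain "anova"
hits
methods(class = "lm")        # functions with methods for objects of class "lm"
example("mean")              # run the examples from the help page of mean()
```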

What we learned

• Vectors simplify R expressions
• Structures like `data.frame` help manage data
• Packages provide many different extensions
• Help is available through `?`, `browseVignettes()` and other means

## 2.2 Bioconductor

What we will learn

• Discover and use Bioconductor packages
• Work with `SummarizedExperiment` for data management
• Use annotation packages to map between gene identifiers

Bioconductor

• More than 1800 R packages for statistical analysis and comprehension of high-throughput genomic data
• Bulk and single-cell RNA-seq, epigenetic and other microarrays, called variants, flow cytometry, proteomics, …
• Widely used (>1/2 million unique IP downloads / year), highly cited (>33,000 PubMedCentral citations), well-respected
• NIH funded – NHGRI (core, cloud), ITCR (multi-assay, annotation- and experiment-hub), IOTN (immuno-oncology data coordinating center)

Resources

Data management

• Extensive resources to import and operate on standard file formats, e.g., BED, VCF, BAM, …

Domain-specific workflows, e.g., bulk RNA-seq differential expression

``````library(airway)
data(airway)    # load the example data set into the workspace
airway
## class: RangedSummarizedExperiment
## dim: 64102 8
## assays(1): counts
## rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
## rowData names(0):
## colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
## colData names(9): SampleName cell ... Sample BioSample``````
• A Bioconductor `SummarizedExperiment` object, providing coordinated data management
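
A minimal sketch of what "coordinated" means in practice: the count matrix and the sample annotations live in one object, so subsetting the samples subsets both together. It assumes the `SummarizedExperiment` and `airway` packages are installed, and that the sample annotations include the `cell` and `dex` columns (`dex` is not visible in the abbreviated display above); the names `counts` and `treated` are illustrative.

```r
# Assumes the Bioconductor packages are installed, e.g. via
# BiocManager::install(c("SummarizedExperiment", "airway"))
library(SummarizedExperiment)
library(airway)
data(airway)                              # load the example data set

counts = assay(airway, "counts")          # feature x sample matrix of read counts
dim(counts)

colData(airway)[, c("cell", "dex")]       # per-sample annotations

# Two-dimensional subsetting: keep all features, only dexamethasone-treated samples
treated = airway[, airway$dex == "trt"]
dim(treated)

# Assay columns and sample annotations remain in the same order after subsetting
identical(colnames(assay(treated)), rownames(colData(treated)))
```

Working through accessors such as `assay()` and `colData()` helps avoid the bookkeeping errors that arise when counts and sample descriptions are kept in separate, independently edited tables.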