Usually, work is organized into a directory with:
R objects written to disk (.rds files) that represent final results or intermediate ‘checkpoints’ (data/ALL-cleaned.rds). Read the data into an R session using readRDS().
Use setwd() to navigate to the folder containing the scripts/, extdata/ folders.
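A minimal sketch of the checkpoint idea, assuming a small made-up data frame and a temporary file in place of a real project path; saveRDS() and readRDS() are the base R functions for writing and reading a single object.

```r
# Hypothetical cleaned data standing in for a real intermediate result
cleaned <- data.frame(
    Year = factor(c(1990, 2010)),
    Weight = c(70.5, 82.1)
)

# A temporary file is used here; in a project this might be a path under data/
checkpoint <- tempfile(fileext = ".rds")
saveRDS(cleaned, checkpoint)

# Later (possibly in a new session): restore the object under any name
restored <- readRDS(checkpoint)
all.equal(cleaned, restored)
```

Unlike save() and load(), this pair stores one object and leaves the choice of variable name to the reader of the file.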
R can also save the state of the current session (it prompts when you choose to quit() R), and can view and save the history() of the current session; I do not find these to be helpful in my own work flows.
All the functionality we have been using comes from packages that are automatically loaded when R starts. Loaded packages are on the search() path.
search()
##  [1] ".GlobalEnv"           "package:RColorBrewer" "package:BiocStyle"    "package:stats"
##  [5] "package:graphics"     "package:grDevices"    "package:utils"        "package:datasets"
##  [9] "package:methods"      "Autoloads"            "package:base"
Additional packages may be installed in R’s libraries. Use installed.packages() or the RStudio interface to see installed packages. To use these packages, it is necessary to attach them to the search path with library(), e.g., library(survival) for survival analysis.
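As a sketch of what attaching does to the search path, the example below uses the splines package as a stand-in (an assumption made so the example runs on any R installation, since splines ships with R but is not attached by default). The same pattern applies to any installed package.

```r
# 'splines' ships with R but is not attached by default;
# it stands in here for any installed package
"package:splines" %in% search()   # typically FALSE in a fresh session

library(splines)                  # attach the package to the search path
"package:splines" %in% search()

# Functions from the package can now be used by name
basis <- bs(1:10, df = 3)
dim(basis)
```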
There are many thousands of R packages, and not all of them are installed in a single installation. Important repositories include CRAN and Bioconductor.
A package needs to be installed once, and then can be used in any R session.
Load the BRFSS-subset.csv data
path <- "extdata/BRFSS-subset.csv"  # or file.choose()
brfss <- read.csv(path)
Clean it by coercing Year to factor
brfss$Year <- factor(brfss$Year)
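To illustrate why the coercion matters, here is a small sketch on a made-up data frame (the values are hypothetical, not taken from the BRFSS file). Once Year is a factor, R treats it as a grouping variable rather than a number.

```r
# Hypothetical stand-in for the survey data: Year is read as a number
toy <- data.frame(
    Year = c(1990, 2010, 1990, 2010),
    Weight = c(60, 75, 65, 80)
)
class(toy$Year)            # numeric before coercion

toy$Year <- factor(toy$Year)
levels(toy$Year)           # the distinct years become the factor levels
table(toy$Year)            # counts per level

# Factors drive grouping in summaries and plots
by_year <- tapply(toy$Weight, toy$Year, mean)
by_year
```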
Useful for quick exploration during a normal work flow.
Many graphical options are documented in ?par, but they are often provided as arguments to plot() and related functions.
Construct complicated plots by layering information, e.g., points, regression line, annotation.
brfss2010Male <- subset(brfss, (Year == 2010) & (Sex == "Male"))
fit <- lm(Weight ~ Height, brfss2010Male)
plot(Weight ~ Height, brfss2010Male, main="2010, Males")
abline(fit, lwd=2, col="blue")
points(180, 90, pch=20, cex=3, col="red")
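The layering can be tried without the survey file. The sketch below uses simulated heights and weights (purely illustrative values) and shows that abline() draws the line defined by the fitted coefficients, while points() adds a further layer on top.

```r
# Simulated heights (cm) and weights (kg); values are purely illustrative
set.seed(1)
height <- rnorm(50, mean = 175, sd = 7)
weight <- -60 + 0.8 * height + rnorm(50, sd = 5)
toy <- data.frame(Height = height, Weight = weight)

fit <- lm(Weight ~ Height, toy)
coef(fit)                               # intercept and slope of the fitted line

plot(Weight ~ Height, toy)              # layer 1: the points
abline(fit, lwd = 2, col = "blue")      # layer 2: line defined by coef(fit)

# layer 3: mark the fitted weight at a height of 180
at180 <- predict(fit, data.frame(Height = 180))
points(180, at180, pch = 20, cex = 3, col = "red")
```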
Approach to complicated graphics: create a grid of panels (e.g., par(mfrow=c(1, 2))), populate with plots, restore original layout.
brfssFemale <- subset(brfss, Sex=="Female")
opar = par(mfrow=c(2, 1))     # layout: 2 'rows' and 1 'column'
hist(                         # first panel -- 1990
    brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
    main = "Female, 1990")
hist(                         # second panel -- 2010
    brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
    main = "Female, 2010")
par(opar) # restore original layout
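One way to make the save-and-restore pattern harder to forget is to wrap it in a function. The sketch below is only an illustration: two_panel_hist() is a hypothetical helper, and the data are simulated. It relies on par() returning the previous settings and on on.exit() running when the function ends.

```r
# Hypothetical helper: two stacked histograms, restoring the layout afterwards
two_panel_hist <- function(x, y, titles = c("first", "second")) {
    opar <- par(mfrow = c(2, 1))   # par() returns the previous settings
    on.exit(par(opar))             # restore them even if a plot fails
    hist(x, main = titles[1])
    hist(y, main = titles[2])
    invisible(NULL)
}

set.seed(1)
two_panel_hist(rnorm(100, 60, 10), rnorm(100, 70, 12),
               titles = c("Group A", "Group B"))
par("mfrow")   # back to the layout in effect before the call
```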
‘Grammar of graphics’
Specify data and ‘aesthetics’ (aes()) to be plotted
Add layers (geom_*()) of information
library(ggplot2)
ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
Capture a plot and augment it
plt <- ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
plt + labs(title = "2010 Male")
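A sketch of the same idea on simulated data, assuming the ggplot2 package is installed. The column names mirror the example above, but the values are made up. It shows that a plot is an ordinary R object: adding to it produces a new object, and nothing is drawn until the object is printed.

```r
library(ggplot2)

# Simulated data; column names mirror the example above but values are made up
set.seed(1)
toy <- data.frame(Height = rnorm(50, 175, 7))
toy$Weight <- -60 + 0.8 * toy$Height + rnorm(50, sd = 5)

# Building the object draws nothing; it is a description of the plot
plt <- ggplot(toy, aes(x = Height, y = Weight)) +
    geom_point() +
    geom_smooth(method = "lm")

# Adding to it creates a new object and leaves 'plt' unchanged
titled <- plt + labs(title = "Simulated data")
themed <- titled + theme_bw()

print(themed)   # printing (or autoprinting at the prompt) renders the plot
```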
facet_*() for layouts
ggplot(brfssFemale, aes(x=Height, y=Weight)) + geom_point() + geom_smooth(method="lm") + facet_grid(. ~ Year)
Choose display to emphasize relevant aspects of data
ggplot(brfssFemale, aes(Weight, fill=Year)) + geom_density(alpha=.2)