# 1 First Impressions

Type values and mathematical formulas into R’s command prompt

1 + 1
## [1] 2

Assign values to symbols (variables)

x = 1
x + x
## [1] 2

Invoke functions such as c(), which takes any number of values and returns a single vector

x = c(1, 2, 3)
x
## [1] 1 2 3

R functions, such as sqrt(), often operate efficiently on vectors

y = sqrt(x)
y
## [1] 1.000000 1.414214 1.732051
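
As a rough illustration of what 'operating on vectors' means, the sketch below compares the single vectorized call with an explicit loop over the elements. The names y_vectorized and y_loop are made up for this example.

```r
x <- c(1, 2, 3)

# Vectorized: one call handles every element of x
y_vectorized <- sqrt(x)

# Element by element: fill a pre-allocated result inside a loop
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
    y_loop[i] <- sqrt(x[i])
}

# Both approaches produce the same values
all.equal(y_vectorized, y_loop)
```

The vectorized form is shorter and avoids the bookkeeping of the loop, which is usually why it is preferred in R code.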

There are often several ways to accomplish a task in R

x = c(1, 2, 3)
x
## [1] 1 2 3
x <- c(4, 5, 6)
x
## [1] 4 5 6
x <- 7:9
x
## [1] 7 8 9
10:12 -> x
x
## [1] 10 11 12

Sometimes R does ‘surprising’ things that can be fun to figure out

x <- c(1, 2, 3) -> y
x
## [1] 1 2 3
y
## [1] 1 2 3
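
One possible way to start figuring this out is to ask R how it parsed the expression, without evaluating it. This is only a sketch; the variable name parsed is made up here, and quote() simply captures the expression so its pieces can be inspected.

```r
# Capture the expression without evaluating it
parsed <- quote(x <- c(1, 2, 3) -> y)

parsed[[1]]    # the outermost function in the parsed call
parsed[[2]]    # its first argument: the target of the outer assignment
parsed[[3]]    # its second argument: the value of the outer assignment

# Running the original line assigns the same vector to both symbols
x <- c(1, 2, 3) -> y
identical(x, y)
```

Printing parsed[[3]] shows how R rewrote the right-hand part of the line, which is most of the answer to the puzzle.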

# 2 R Data types: vector and list

‘Atomic’ vectors

• Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)

people <- c("Brian", "Jim", "Herve", "Dan", "Val", "Martin")
people
## [1] "Brian"  "Jim"    "Herve"  "Dan"    "Val"    "Martin"
• Atomic vectors can be named

population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
population
##   Buffalo Rochester  New York
##    259000    210000   8400000
log10(population)
##   Buffalo Rochester  New York
##  5.413300  5.322219  6.924279
• Statistical concepts like NA (not available)

truthiness <- c(TRUE, FALSE, NA)
truthiness
## [1]  TRUE FALSE    NA
• Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)

!truthiness
## [1] FALSE  TRUE    NA
truthiness | !truthiness
## [1] TRUE TRUE   NA
truthiness & !truthiness
## [1] FALSE FALSE    NA
• Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)

undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
undefined_numeric_values
## [1]   NA  NaN  NaN  Inf -Inf
sqrt(undefined_numeric_values)
## Warning in sqrt(undefined_numeric_values): NaNs produced
## [1]  NA NaN NaN Inf NaN
• Common string manipulations

toupper(people)
## [1] "BRIAN"  "JIM"    "HERVE"  "DAN"    "VAL"    "MARTIN"
substr(people, 1, 3)
## [1] "Bri" "Jim" "Her" "Dan" "Val" "Mar"
• R is a green consumer: it recycles short vectors to align with long vectors

x <- 1:3
x * 2            # '2' (vector of length 1) recycled to c(2, 2, 2)
## [1] 2 4 6
truthiness | NA
## [1] TRUE   NA   NA
truthiness & NA
## [1]    NA FALSE    NA
• It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)

substr(tolower(people), 1, 3)
## [1] "bri" "jim" "her" "dan" "val" "mar"
population[population < 1000000]
##   Buffalo Rochester
##    259000    210000
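
The points above about vector types and recycling can be checked directly. The following is a small sketch rather than part of the original session; the names long, short and recycled are made up, and typeof() is used to report each vector's underlying type.

```r
# typeof() reports the underlying type of an atomic vector
typeof(1:3)                  # "integer"
typeof(c(1, 2, 3))           # "double", R's name for numeric / floating-point
typeof(c(TRUE, FALSE, NA))   # "logical"
typeof("Brian")              # "character"

# Recycling also applies when the shorter vector has more than one element:
# c(10, 20) is repeated to match the length of 1:6
long <- 1:6
short <- c(10, 20)
recycled <- long + short
recycled
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, so it is worth checking lengths when a result looks odd.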

Lists

• The list type can contain other vectors, including other lists

frenemies = list(
friends=c("Larry", "Richard", "Vivian"),
enemies=c("Dick", "Mik")
)
frenemies
## $friends
## [1] "Larry"   "Richard" "Vivian"
##
## $enemies
## [1] "Dick" "Mik"
• [ subsets one list to create another list, [[ extracts a list element

frenemies[1]
## $friends
## [1] "Larry"   "Richard" "Vivian"
frenemies[c("enemies", "friends")]
## $enemies
## [1] "Dick" "Mik"
##
## $friends
## [1] "Larry"   "Richard" "Vivian"
frenemies[["enemies"]]
## [1] "Dick" "Mik"

Factors

• Character-like vectors, but with values restricted to specific levels

sex = factor(c("Male", "Male", "Female"), levels=c("Female", "Male", "Hermaphrodite"))
sex
## [1] Male   Male   Female
## Levels: Female Male Hermaphrodite
sex == "Female"
## [1] FALSE FALSE  TRUE
table(sex)
## sex
##        Female          Male Hermaphrodite
##             1             2             0
sex[sex == "Female"]
## [1] Female
## Levels: Female Male Hermaphrodite

# 3 Classes: data.frame and beyond

Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet

x = rnorm(1000)       # 1000 random normal deviates
y = x + rnorm(1000)   # another 1000 deviates, as a function of x
plot(y ~ x)           # relationship between x and y

Convenient to manipulate them together

• data.frame(): like columns in a spreadsheet

df = data.frame(X=x, Y=y)
head(df)           # first 6 rows
##            X           Y
## 1 -1.7569371 -0.70884344
## 2 -1.6527157 -1.97487316
## 3 -0.5161684 -1.36055768
## 4  0.2218860  0.09724608
## 5 -0.6661832 -1.82587026
## 6 -0.5512824  0.71819197
plot(Y ~ X, df)    # same as above

• See all data with View(df). Summarize data with summary(df)

summary(df)
##        X                  Y
##  Min.   :-3.27963   Min.   :-5.20065
##  1st Qu.:-0.71917   1st Qu.:-1.02837
##  Median :-0.06830   Median :-0.08605
##  Mean   :-0.06072   Mean   :-0.09962
##  3rd Qu.: 0.64606   3rd Qu.: 0.90735
##  Max.   : 2.77080   Max.   : 4.37988
• Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those values greater than 0

positiveX = df[df$X > 0,]
head(positiveX)
##            X           Y
## 4  0.2218860  0.09724608
## 9  0.6701959  0.82361589
## 10 1.1216619  1.49955242
## 14 0.6156470  0.11297448
## 15 0.2805778 -1.84736727
## 16 0.7633320 -1.63962235
plot(Y ~ X, positiveX)

class(df)
## [1] "data.frame"
dim(df)
## [1] 1000    2
colnames(df)
## [1] "X" "Y"
• matrix() is a related class, where all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements between columns can be different types).
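
To make the contrast between data.frame and matrix concrete, here is a minimal sketch using a tiny made-up data.frame (scores, with hypothetical columns name and value). It is an illustration under those assumptions, not part of the original session.

```r
# A small, made-up data.frame whose columns have different types
scores <- data.frame(
    name = c("a", "b", "c"),
    value = c(-1.5, 0.5, 2.0),
    stringsAsFactors = FALSE   # keep 'name' as character in any R version
)

# Each column has its own type
sapply(scores, class)

# Coordinated subsetting: keep only the rows where value is greater than 0
positive <- scores[scores$value > 0, ]
positive$name

# [ returns a data.frame, while [[ (like $) extracts the column itself
class(scores["value"])
class(scores[["value"]])

# A matrix holds a single type, so mixed columns are converted to character
m <- as.matrix(scores)
typeof(m)
```

The [ versus [[ behaviour here mirrors the list example earlier, since a data.frame stores its columns much like the elements of a list.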

A scatterplot makes one want to fit a linear model (do a regression analysis)

• Use a formula to describe the relationship between variables
• Variables in the formula are looked up in the second argument (here, df)

fit <- lm(Y ~ X, df)
• Visualize the points, and add the regression line

plot(Y ~ X, df)
abline(fit, col="red", lwd=3)

• Summarize the fit as an ANOVA table

anova(fit)
## Analysis of Variance Table
##
## Response: Y
##            Df Sum Sq Mean Sq F value    Pr(>F)
## X           1 1040.0 1039.96  1022.2 < 2.2e-16 ***
## Residuals 998 1015.4    1.02
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• Introspection – what class is fit? What methods can I apply to an object of that class?

class(fit)
## [1] "lm"
methods(class=class(fit))
##  [1] add1           alias          anova          case.names     coerce         confint
##  [7] cooks.distance deviance       dfbeta         dfbetas        drop1          dummy.coef
## [13] effects        extractAIC     family         formula        hatvalues      influence
## [19] initialize     kappa          labels         logLik         model.frame    model.matrix
## [25] nobs           plot           predict        print          proj           qr
## [31] residuals      rstandard      rstudent       show           simulate       slotsFromS3
## [37] summary        variable.names vcov
## see '?methods' for accessing help and source code
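
As a follow-up to the introspection step, the sketch below applies a few of these functions to a fitted model. It re-simulates the data so that it runs on its own; the seed value and the name new_points are arbitrary choices made for this example, and coef() is used alongside the listed methods to extract the estimates.

```r
set.seed(123)                    # arbitrary seed, only for reproducibility
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)

fit <- lm(Y ~ X, df)             # variables in the formula come from df

coef(fit)                        # estimated intercept and slope
confint(fit)                     # confidence intervals for both estimates
head(residuals(fit))             # first few residuals

# Predict Y at new values of X; the column name must match the formula
new_points <- data.frame(X = c(-1, 0, 1))
predicted <- predict(fit, new_points)
predicted
```

Because y was simulated as x plus noise, the slope estimate should land close to 1 and the intercept close to 0.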

# 4 Help!

Help is available in RStudio or interactively

• Check out the help page for rnorm()

?rnorm
• ‘Usage’ section describes how the function can be used

rnorm(n, mean = 0, sd = 1)
• Arguments, some with default values. Arguments matched first by name, then position

• ‘Arguments’ section describes what the arguments are supposed to be

• ‘Value’ section describes return value

• ‘Examples’ section illustrates use

• Help pages often include citations to relevant technical documentation, references to related functions, and obscure details

• Can be intimidating, but in the end actually very useful
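
The argument-matching rule described above can be tried out with rnorm() itself. This is a small sketch; the seed and the names by_position, by_name and mixed are made up for illustration.

```r
# Formal argument names, as shown in the 'Usage' section of ?rnorm
names(formals(rnorm))

set.seed(1)                                  # arbitrary seed, reset before each call
by_position <- rnorm(5, 10, 2)               # n = 5, mean = 10, sd = 2

set.seed(1)
by_name <- rnorm(sd = 2, mean = 10, n = 5)   # named arguments, in any order

set.seed(1)
mixed <- rnorm(5, sd = 2, mean = 10)         # names matched first, then position

# All three calls describe the same request, so the draws are identical
identical(by_position, by_name)
identical(by_position, mixed)
```

Naming arguments other than the first one or two tends to make calls easier to read, even though positional matching would also work.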