The R language

Motivation and relevance

'Atomic' vectors

data.frame(), matrix()

Attributes

str() (including 2nd argument) and dput()

Exercise: what's a factor()?

environment()

Exercise: finding variables

Exercise: NAMESPACE

function

Argument basics

Function environments

Exercise: bank account: explain…

account <- function(initial=0) {
    available <- initial
    list(deposit=function(amount) {
        available <<- available + amount
        available
    }, balance=function() {
         available
    })
}
my_acct <- account()
my_acct$deposit(100)
## [1] 100
your_acct <- account(20)
my_acct$deposit(200)
## [1] 300
my_acct$balance()
## [1] 300
your_acct$balance()
## [1] 20

Primitives and essential 'classes' and 'methods'

vector
   o length, [, [<-, [[, [[<-, names, names<-, class, class<-, ...
-- raw()                   RAWSXP
-- logical()               LGLSXP
-- numeric()               REALSXP
   -- integer()            INTSXP
-- complex()               CPLXSXP
-- character()             STRSXP
-- list()                  VECSXP
   -- data.frame()
   -- ... many S3 objects
-- structure()
   -- array()
      -- matrix()
-- expression()            EXPRSXP
environment (new.env())    ENVSXP
   o ls
   o [[, [[<-
closure (e.g., function)   CLOSSXP
S4 class                   S4SXP
...

Deeper…

Copy-on-write

Some tools

Exercise: explain…

x <- 1:5; tracemem(x)
x[1] <- 2L
x[1] <- 2

x <- y <- seq(1, 5); tracemem(x)
x[1] <- 2L

df <- data.frame(x=1:5, y=5:1)
tracemem(df); tracemem(df$x)
df[1,1] <- 2

m <- matrix(1:10, 2); tracemem(m)
m[1, 1] <- 2L

f <- function(x) x[1]
g <- function(x) { x[1] <- 2L; x }
tracemem(x <- 1:5); f(x)
tracemem(x <- 1:5); g(x)

Data representation

x <- 1:5
.Internal(inspect(x))
## @1050cc088 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5

Exercise: Use .Internal(inspect()) to discover other common S-expression types, in addition to INTSXP. Some examples:

.Internal(inspect(pi))
.Internal(inspect(data.frame()))
.Internal(inspect(function() {}))
.Internal(inspect(expression(1 + 2)))

Garbage collection

Uses NAMED rather than reference counts

.Internal(inspect(1:5))
## @105199ff8 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
.Internal(inspect(x <- 1:5))
## @1050f6838 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 1,2,3,4,5
.Internal(inspect(y <- x <- 1:5))
## @1051bc768 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5

Copy-on-write illusion

Styles of programming

Declarative vs. imperative

Example from Rowe:

'clamp' data so that values are no greater than 5 standard deviations from the mean.

Data:

set.seed(123)
x <- rnorm(10000000)

Find values: Declarative

x[abs(x) > 5 * sd(x)]
## [1] -5.051  5.213  5.348  5.227

Imperative

ans <- numeric()
for (xi in x)
    if (xi > 5 * sd(x))
        ans <- c(ans, xi)

Clamp: Declarative

x[abs(x) > 5 * sd(x)] <- 5 * sd(x)

Imperative

for (i in seq_along(x))
    if (abs(x[i]) > 5 * sd(x))
        x[i] <- 5 * sd(x)

Question: What are the merits of declarative vs. imperative styles?

Functional

Exercise: Few R functions are truly functional, but its possible to to recognize 'more' versus 'less' functional ways of writing R code. For the following,

df <- data.frame(x=1:5, y=5:1)
x0 <- sapply(names(df), function(x) sqrt(df[[x]]))
x1 <- sapply(names(df), function(x, df) sqrt(df[[x]]), df)
x3 <- sapply(df, function(x, fun) fun(x), sqrt)
x2 <- sapply(df, sqrt)

Object-oriented