A quick tour of R

R is an essential language for the data scientist. Its strengths include:

Capability: It offers a huge set of functionalities
Community: It has a large, ever-growing community of users
Performance: It is fast on vectorized operations (when the data fits in main memory)

This is a quick tour of R.

Basic arithmetic and logic operators

# arithmetic
1 + 2 * 4 - 2 / 2

## [1] 8

# integer division
31 %/% 3

## [1] 10

# modulus
31 %% 3

## [1] 1

# exponents
2^10

## [1] 1024

# comparison
1 == 1

## [1] TRUE

1 != 1

## [1] FALSE

1 < 1

## [1] FALSE

1 <= 1

## [1] TRUE

# logic operators
# conjunction
TRUE  & TRUE

## [1] TRUE

FALSE & TRUE

## [1] FALSE

TRUE  & FALSE

## [1] FALSE

FALSE & FALSE

## [1] FALSE

# disjunction
TRUE  | TRUE

## [1] TRUE

FALSE | TRUE

## [1] TRUE

TRUE  | FALSE

## [1] TRUE

FALSE | FALSE

## [1] FALSE

# negation
!TRUE

## [1] FALSE

!FALSE

## [1] TRUE

# exclusive disjunction
xor(TRUE, TRUE)

## [1] FALSE

xor(TRUE, FALSE)

## [1] TRUE

xor(FALSE, TRUE)

## [1] TRUE

xor(FALSE, FALSE)

## [1] FALSE

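For conjunction and disjunction we have a shorter (& and |) and a longer (&& and ||) form. The shorter form performs element-wise comparisons in much the same way as arithmetic operators. The longer form operates on single values: it evaluates left to right and stops as soon as the result is determined (short-circuit evaluation). This makes the longer form the appropriate choice for conditions in if and while statements; since R 4.3.0 it raises an error when given a vector of length greater than one.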
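The difference between the two forms can be seen in a small sketch. The variable names are illustrative only, and stop() is used merely as an expression that would fail if it were ever evaluated:

```r
# Short form: element-wise, returns one value per pair of elements
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
elementwise

# Long form: works on single values and short-circuits.
# The right-hand side is never evaluated, so stop() raises no error.
and_result <- FALSE && stop("never evaluated")
or_result  <- TRUE  || stop("never evaluated")
and_result
or_result
```

Here the left-hand side alone decides the result of && and ||, so the call to stop() is skipped in both cases.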

There are a few special values:

the value NA (not available) is used to represent missing values;
the value NULL is the null object (not to be confused with NULL in databases);
the value Inf stands for positive infinity;
the value NaN (not a number) is the result of a computation that makes no sense.

NA & TRUE

## [1] NA

NA & FALSE

## [1] FALSE

NA | TRUE

## [1] TRUE

NA | FALSE

## [1] NA

!NA

## [1] NA

2^1024

## [1] Inf

1/0

## [1] Inf

0 / 0

## [1] NaN

Inf - Inf

## [1] NaN
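These special values cannot be detected reliably with ==, since any comparison with NA yields NA. A minimal sketch of the base R predicates for testing them (the vector v is just an illustrative example):

```r
v <- c(1, NA, NaN, Inf)

na_flags     <- is.na(v)      # TRUE for NA and also for NaN
nan_flags    <- is.nan(v)     # TRUE only for NaN
finite_flags <- is.finite(v)  # FALSE for NA, NaN, and Inf

na_flags
nan_flags
finite_flags

null_flag <- is.null(NULL)    # NULL is tested with is.null
na_compare <- NA == NA        # yields NA, not TRUE: use is.na instead
```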

Of course, you may use variables to store values and results of expressions. There are 3 equivalent ways to assign a value to a variable (I prefer the first one, but the second one is the one you should use!):

x = 45
x <- 45 # this is the politically correct one!
45 -> x

To print the value of a variable, just type it:

x

## [1] 45

# or
print(x)

## [1] 45

# or print the structure of the object
str(x)

##  num 45

Data structures

R stores data in the following data structures: vector, list, matrix, array, and data frame.

Dim Homogeneous     Heterogeneous
1d  Atomic vector   List
2d  Matrix          Data frame
nd  Array

Check the type of an object with the typeof function.

typeof(x)

## [1] "double"

Vectors

A vector is a sequence of elements of the same type. Vector indexes start at 1 (not 0). Construct vectors with the c or seq functions:

c(0, 1, 1, 2, 3, 5, 8)

## [1] 0 1 1 2 3 5 8

seq(1, 10, 1)

##  [1]  1  2  3  4  5  6  7  8  9 10

seq(2, 10, 2)

## [1]  2  4  6  8 10

Numeric vectors (double or integer vectors):

x = c(1, 2, 3)
typeof(x)

## [1] "double"

is.double(x)

## [1] TRUE

y = c(1L, 2L, 3L)
typeof(y)

## [1] "integer"

is.integer(y)

## [1] TRUE

is.numeric(x)

## [1] TRUE

is.numeric(y)

## [1] TRUE

Operations on numeric vectors:

# element-wise sum
c(1, 2, 3, 4) + c(10, 20, 30, 40)

## [1] 11 22 33 44

# element-wise product
c(1, 2, 3, 4) * c(10, 20, 30, 40)

## [1]  10  40  90 160

# scalar product
c(1, 2, 3, 4) %*% c(10, 20, 30, 40)

##      [,1]
## [1,]  300

If the two vectors have different lengths, the shorter one is repeated until it matches the longer one (recycling):

c(1, 2, 3, 4) + 10

## [1] 11 12 13 14

c(1, 2, 3, 4) + c(10, 20)

## [1] 11 22 13 24
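Recycling can be made explicit with rep_len, which repeats a vector up to a given length. The sketch below (with illustrative variable names) suggests that the implicit and explicit forms agree, and shows what happens when the longer length is not a multiple of the shorter one:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# Explicit recycling: y is repeated until it has the length of x
y_recycled <- rep_len(y, length(x))
y_recycled

same_result <- identical(x + y, x + y_recycled)
same_result

# Lengths 3 and 2: R still recycles, but emits a warning
# (suppressed here so that the example runs quietly)
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
partial
```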

Character vectors are vectors of strings:

c("This", "class", "is", "really", "terrific!")

## [1] "This"      "class"     "is"        "really"    "terrific!"

Logical vectors are vectors of Boolean values (TRUE and FALSE):

x = c(TRUE, FALSE, TRUE, FALSE)
y = !x
x

## [1]  TRUE FALSE  TRUE FALSE

y

## [1] FALSE  TRUE FALSE  TRUE

x & y

## [1] FALSE FALSE FALSE FALSE

x | y

## [1] TRUE TRUE TRUE TRUE

xor(x, y)

## [1] TRUE TRUE TRUE TRUE

Check the length of a vector with:

length(c(1, 2, 3))

## [1] 3

You may refer to members of a vector in several ways (mind the : operator):

a = 11:20
a

##  [1] 11 12 13 14 15 16 17 18 19 20

a[5]

## [1] 15

a[c(1, 5, 10)]

## [1] 11 15 20

a[-1]

## [1] 12 13 14 15 16 17 18 19 20

a[-c(1, 5, 10)]

## [1] 12 13 14 16 17 18 19

a > 15

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

a[a > 15]

## [1] 16 17 18 19 20

All subsetting operators can be combined with assignment to modify selected values of the input vector:

a[1] = 100
a[10] = 200
a

##  [1] 100  12  13  14  15  16  17  18  19 200

a[c(1, 10)] = c(10, 20)
a

##  [1] 10 12 13 14 15 16 17 18 19 20

a[a > 15] = Inf
a

##  [1]  10  12  13  14  15 Inf Inf Inf Inf Inf

All elements of an atomic vector must be of the same type, so when you attempt to combine different types they are coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, and character.

as.integer(FALSE)

## [1] 0

as.integer(TRUE)

## [1] 1

x = c(TRUE, TRUE, FALSE, FALSE)
sum(x)

## [1] 2

mean(x)

## [1] 0.5

as.logical(0)

## [1] FALSE

as.logical(1)

## [1] TRUE

as.double(0L)

## [1] 0

as.integer(0.5)

## [1] 0

as.character(0.5)

## [1] "0.5"

as.double("0.5")

## [1] 0.5

as.numeric("a")

## Warning: NAs introduced by coercion

## [1] NA
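The same coercion rules apply implicitly when values of different types are combined with c. A short sketch (the variable names are illustrative):

```r
# Each call mixes two types; the result takes the more flexible one
t1 <- typeof(c(TRUE, 1L))    # logical + integer
t2 <- typeof(c(1L, 2.5))     # integer + double
t3 <- typeof(c(2.5, "a"))    # double + character
c(t1, t2, t3)

# Mixing all four types yields a character vector
mixed <- c(TRUE, 2L, 3.5, "four")
mixed
```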

Finally, vector elements can have names:

x = c(a = 1, b = 2, c = 3)
# or
x = c(1, 2, 3)
names(x) = c("a", "b", "c")
x["a"]

## a 
## 1

x[c("a", "b")]

## a b 
## 1 2

A factor is a vector that can contain only predefined values, and is used to store categorical variables. Factors are built on top of integer vectors using the levels attribute, which defines the set of allowed values.

x = factor(c("male", "female", "female", "male", "male"))
x

## [1] male   female female male   male  
## Levels: female male

typeof(x)

## [1] "integer"

levels(x)

## [1] "female" "male"

table(x)

## x
## female   male 
##      2      3

# You can't use values that are not levels
x[1] = "unknown"

## Warning in `[<-.factor`(`*tmp*`, 1, value = "unknown"): invalid factor
## level, NA generated

x

## [1] <NA>   female female male   male  
## Levels: female male

Factors are useful when you know the possible values a variable may take, even if you don’t see all values in a given dataset. Before R 4.0.0, most data loading functions in base R (and data.frame itself) automatically converted character vectors to factors. This is suboptimal, because there is no way for those functions to know the set of all possible levels or their optimal order. In those versions, use the argument stringsAsFactors = FALSE to suppress this behaviour; from R 4.0.0 onwards it is the default. In either case, convert character vectors to factors manually, using your knowledge of the data.
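When the full set of values is known in advance, it can be passed explicitly through the levels argument of factor. A minimal sketch, assuming a hypothetical third category "unknown" that does not occur in the data:

```r
sex <- c("male", "female", "female", "male", "male")

# Declare all admissible levels, including one not present in the data
f <- factor(sex, levels = c("female", "male", "unknown"))
levels(f)

# table() reports the unused level with a count of zero
counts <- table(f)
counts

# Assigning a declared level raises no warning
f[1] <- "unknown"
as.character(f)
```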

Lists

A list is a sequence of elements that may have different types. It is a recursive structure, since it can contain other lists.

l = list(thing = "hat", size = 8.25, female = TRUE)
l

## $thing
## [1] "hat"
## 
## $size
## [1] 8.25
## 
## $female
## [1] TRUE

str(l)

## List of 3
##  $ thing : chr "hat"
##  $ size  : num 8.25
##  $ female: logi TRUE

# an element
l$thing

## [1] "hat"

l[[1]]

## [1] "hat"

l[["thing"]]

## [1] "hat"

# a sublist
l[c(1, 2)]

## $thing
## [1] "hat"
## 
## $size
## [1] 8.25

l[c("thing", "size")]

## $thing
## [1] "hat"
## 
## $size
## [1] 8.25

Mind that l[1] is a sub-list containing only the first component of the list:

l[1]

## $thing
## [1] "hat"

typeof(l[1])

## [1] "list"

l[[1]]

## [1] "hat"

typeof(l[[1]])

## [1] "character"

“If list x is a train carrying objects, then x[[5]] is the object in car 5; x[5] is car number 5.”

You can add and remove elements from a list as follows:

l = list(a = 1, b = 2)
l$c = 3
l

## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3

l$c = NULL
l

## $a
## [1] 1
## 
## $b
## [1] 2

A list may contain vectors:

l = list(thing = "hat", prices = c(8.25, 10.5), female = "TRUE")
l

## $thing
## [1] "hat"
## 
## $prices
## [1]  8.25 10.50
## 
## $female
## [1] "TRUE"

l$prices[1]

## [1] 8.25

A list may contain other lists:

l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
str(l)

## List of 3
##  $ : num 1
##  $ :List of 3
##   ..$ : num 1
##   ..$ : num 2
##   ..$ : num 3
##  $ :List of 3
##   ..$ : chr "a"
##   ..$ : num 1
##   ..$ :List of 2
##   .. ..$ : chr "TRUE"
##   .. ..$ : chr "FALSE"

Find:

the list list(1, 2, 3)
the element 1 of list list(1, 2, 3)
the element TRUE of list list("TRUE", "FALSE")
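One possible solution, shown as a sketch: each [[ ]] step descends one level into the nested list (the list is redefined here so that the example is self-contained).

```r
l <- list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))

# the list list(1, 2, 3) is the second component
inner <- l[[2]]
str(inner)

# its first element
first <- l[[2]][[1]]
first

# the element "TRUE": third component, then its third component, then the first
deep <- l[[3]][[3]][[1]]
deep
```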

You can combine lists:

a = list(1, 2, 3)
b = list(3, 2, 1)
c(a, b)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 2
## 
## [[6]]
## [1] 1

Matrix and array

A matrix is a 2-dimensional vector. Hence all elements of a matrix must have the same type. Typically matrices contain numbers.

M = matrix(data = 1:9, nrow = 3, byrow = TRUE)
M

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

N = matrix(data = 1:9, ncol = 3)
N

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

nrow(M)

## [1] 3

ncol(M)

## [1] 3

dim(M)

## [1] 3 3

You can also create a matrix by setting the dim attribute of a vector:

x = 1:9
x

## [1] 1 2 3 4 5 6 7 8 9

dim(x) = c(3, 3)
x

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Accessing the matrix elements:

# element
M[1, 2]

## [1] 2

# first row
M[1, ]

## [1] 1 2 3

# first column
M[ ,1]

## [1] 1 4 7

# sub-matrix
M[1:2, 1:2]

##      [,1] [,2]
## [1,]    1    2
## [2,]    4    5

M[-3, -3]

##      [,1] [,2]
## [1,]    1    2
## [2,]    4    5

# diagonal
diag(M)

## [1] 1 5 9

diag(M) = 0
M

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    4    0    6
## [3,]    7    8    0

Operations on matrices:

# element-wise sum
M + N

##      [,1] [,2] [,3]
## [1,]    1    6   10
## [2,]    6    5   14
## [3,]   10   14    9

# element-wise product
M * N

##      [,1] [,2] [,3]
## [1,]    0    8   21
## [2,]    8    0   48
## [3,]   21   48    0

# matrix product
M %*% N

##      [,1] [,2] [,3]
## [1,]   13   28   43
## [2,]   22   52   82
## [3,]   23   68  113

# matrix transpose
t(M)

##      [,1] [,2] [,3]
## [1,]    0    4    7
## [2,]    2    0    8
## [3,]    3    6    0

# matrix inverse
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow=3, byrow=TRUE)
D = solve(C)
D

##      [,1] [,2] [,3]
## [1,]    1   -1    1
## [2,]   -1    1    0
## [3,]    0    1   -1

D %*% C

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

C %*% D

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

# linear systems C x = b
C

##      [,1] [,2] [,3]
## [1,]    1    0    1
## [2,]    1    1    1
## [3,]    1    1    0

b = c(2, 1, 3)
# the system is:
# x1      + x3 = 2
# x1 + x2 + x3 = 1
# x1 + x2      = 3
x = solve(C,b)
x

## [1]  4 -1 -2

C %*% x

##      [,1]
## [1,]    2
## [2,]    1
## [3,]    3

# matrix spectrum
spectrum = eigen(C)
spectrum$vectors

##            [,1]       [,2]       [,3]
## [1,] -0.4151581 -0.4743098 -0.6026918
## [2,] -0.7480890 -0.2110877  0.7515444
## [3,] -0.5176936  0.8546767  0.2682231

spectrum$values

## [1]  2.2469796 -0.8019377  0.5549581

spectrum2 = eigen(t(C))
spectrum2$vectors

##            [,1]       [,2]       [,3]
## [1,] -0.7480890  0.2110877 -0.7515444
## [2,] -0.4151581  0.4743098  0.6026918
## [3,] -0.5176936 -0.8546767 -0.2682231

spectrum2$values

## [1]  2.2469796 -0.8019377  0.5549581
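As a sanity check, each eigenvalue and its eigenvector should satisfy C v = lambda v up to floating-point error. A small sketch that verifies this for the first pair (v1 and lambda1 are illustrative names):

```r
C <- matrix(c(1,0,1, 1,1,1, 1,1,0), nrow = 3, byrow = TRUE)
spectrum <- eigen(C)

v1 <- spectrum$vectors[, 1]    # first eigenvector (a column)
lambda1 <- spectrum$values[1]  # the matching eigenvalue

# all.equal compares with a numerical tolerance rather than exactly
check <- all.equal(as.vector(C %*% v1), lambda1 * v1)
check
```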

You can add rows with rbind and add columns with cbind:

M

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    4    0    6
## [3,]    7    8    0

rbind(M, 10:12)

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    4    0    6
## [3,]    7    8    0
## [4,]   10   11   12

# rbind returns a new matrix; M itself is unchanged
M

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    4    0    6
## [3,]    7    8    0

# modify M with
M = rbind(M, 10:12)
M

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    4    0    6
## [3,]    7    8    0
## [4,]   10   11   12

M = cbind(M, seq(4, 16, 4))
M

##      [,1] [,2] [,3] [,4]
## [1,]    0    2    3    4
## [2,]    4    0    6    8
## [3,]    7    8    0   12
## [4,]   10   11   12   16

Rows and columns of a matrix can have names:

rownames(M) = letters[1:nrow(M)]
colnames(M) = LETTERS[1:ncol(M)]
M

##    A  B  C  D
## a  0  2  3  4
## b  4  0  6  8
## c  7  8  0 12
## d 10 11 12 16

M["a", "A"]

## [1] 0

M["a", ]

## A B C D 
## 0 2 3 4

M[ ,"A"]

##  a  b  c  d 
##  0  4  7 10

An array is a multi-dimensional vector:

A = array(1:27, dim=c(3, 3, 3))
A

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   10   13   16
## [2,]   11   14   17
## [3,]   12   15   18
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   19   22   25
## [2,]   20   23   26
## [3,]   21   24   27

A[ , , 1]

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

A[ , 1, 1]

## [1] 1 2 3

A[1 , , 1]

## [1] 1 4 7

A[1, 1, 1]

## [1] 1

Data frames

A data frame is a list of vectors (called columns) of the same length but possibly of different types. A data frame is like a database table. Each column has a name and contains elements of the same type. A data frame is a mix of list and matrix structures: like a list, elements (columns) can have different types. Like a matrix, columns have the same length.

team = c("Inter", "Milan", "Roma", "Palermo")
score = c(59, 58, 53, 46)
win = c(17, 17, 15, 13)
tie = c(8, 7, 8, 7)
lost = c(3, 4, 5, 8)

league = data.frame(team, score, win, tie, lost, stringsAsFactors = FALSE)
league

##      team score win tie lost
## 1   Inter    59  17   8    3
## 2   Milan    58  17   7    4
## 3    Roma    53  15   8    5
## 4 Palermo    46  13   7    8

Accessing data frame elements:

# first row
league[1, ]

##    team score win tie lost
## 1 Inter    59  17   8    3

# first column
league[ ,1]

## [1] "Inter"   "Milan"   "Roma"    "Palermo"

league[ ,"team"]

## [1] "Inter"   "Milan"   "Roma"    "Palermo"

league[1:2, 1:2]

##    team score
## 1 Inter    59
## 2 Milan    58

league[1:2, c("team", "score")]

##    team score
## 1 Inter    59
## 2 Milan    58

You can combine data frames as with matrices:

rbind(league, data.frame(team = "Lazio", score = 44, win = 12, tie = 8, lost = 8))

##      team score win tie lost
## 1   Inter    59  17   8    3
## 2   Milan    58  17   7    4
## 3    Roma    53  15   8    5
## 4 Palermo    46  13   7    8
## 5   Lazio    44  12   8    8

cbind(league, goals = c(45, 43, 38, 36))

##      team score win tie lost goals
## 1   Inter    59  17   8    3    45
## 2   Milan    58  17   7    4    43
## 3    Roma    53  15   8    5    38
## 4 Palermo    46  13   7    8    36

A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list. This means that a data frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.

# a data frame is a list
typeof(league)

## [1] "list"

league$team

## [1] "Inter"   "Milan"   "Roma"    "Palermo"

league[[1]]

## [1] "Inter"   "Milan"   "Roma"    "Palermo"

league[league$team == "Inter", ]

##    team score win tie lost
## 1 Inter    59  17   8    3

league[league$score == max(league$score), ]

##    team score win tie lost
## 1 Inter    59  17   8    3

nrow(league)

## [1] 4

ncol(league)

## [1] 5

rownames(league)

## [1] "1" "2" "3" "4"

colnames(league)

## [1] "team"  "score" "win"   "tie"   "lost"
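The relationships between names(), colnames(), length(), and ncol() can be checked directly. A sketch on a reduced two-column version of the league data frame:

```r
league <- data.frame(
  team = c("Inter", "Milan", "Roma", "Palermo"),
  score = c(59, 58, 53, 46),
  stringsAsFactors = FALSE
)

# names() and colnames() return the same column names
same_names <- identical(names(league), colnames(league))
same_names

# length() counts the columns of the underlying list
same_length <- length(league) == ncol(league)
same_length

# dim() gives rows first, then columns
shape <- dim(league)
shape
```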

Since a data frame is a list of vectors, and a vector can be a list, we can make data frames with list columns, and hence also data frames whose elements are data frames (nested data frames):

# a data frame with a list column
df1 = data.frame(
  x = I(list(a = 1:3, b = 4:6)), 
  y = c("Hello", "Venus"),
  stringsAsFactors = FALSE
)
str(df1)

## 'data.frame':    2 obs. of  2 variables:
##  $ x:List of 2
##   ..$ a: int  1 2 3
##   ..$ b: int  4 5 6
##   ..- attr(*, "class")= chr "AsIs"
##  $ y: chr  "Hello" "Venus"

df2 = data.frame(
  x = I(list(a = 3:1, b = 6:4)), 
  y = c("Hello", "Jupiter"),
  stringsAsFactors = FALSE
)
str(df2)

## 'data.frame':    2 obs. of  2 variables:
##  $ x:List of 2
##   ..$ a: int  3 2 1
##   ..$ b: int  6 5 4
##   ..- attr(*, "class")= chr "AsIs"
##  $ y: chr  "Hello" "Jupiter"

# a data frame with data frame elements
df = data.frame(
  x = I(list(Venus = df1, Jupiter = df2)), 
  y = c("Hello", "Worlds"),
  stringsAsFactors = FALSE
)
str(df)

## 'data.frame':    2 obs. of  2 variables:
##  $ x:List of 2
##   ..$ Venus  :'data.frame':  2 obs. of  2 variables:
##   .. ..$ x:List of 2
##   .. .. ..$ a: int  1 2 3
##   .. .. ..$ b: int  4 5 6
##   .. .. ..- attr(*, "class")= chr "AsIs"
##   .. ..$ y: chr  "Hello" "Venus"
##   ..$ Jupiter:'data.frame':  2 obs. of  2 variables:
##   .. ..$ x:List of 2
##   .. .. ..$ a: int  3 2 1
##   .. .. ..$ b: int  6 5 4
##   .. .. ..- attr(*, "class")= chr "AsIs"
##   .. ..$ y: chr  "Hello" "Jupiter"
##   ..- attr(*, "class")= chr "AsIs"
##  $ y: chr  "Hello" "Worlds"

Conditional and repetition

R is an object-oriented functional programming language. Conditional statements take the form:

x = 49
if (x %% 7 == 0) x else -x

## [1] 49

Looping constructs include while and for:

x = 108
i = 2
while (i <= x/2) {
  if (x %% i == 0) print(i)
  i = i + 1
}

## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54

for (i in 2:(x/2)) {
  if (x %% i == 0) print(i)
}

## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)


# we know the output length
output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[i] <- mean(df[[i]])          # 3. body
}

# Unknown output length (expensive solution!)
means <- c(0, 1, 2)
output <- double()
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output <- c(output, rnorm(n, means[i]))
}

# Unknown output length (efficient solution!)
output <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output[[i]] <- rnorm(n, means[i])
}
output <- unlist(output)
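A note on the loop sequence: 1:length(x) and seq_along(x) agree for non-empty vectors but differ when x is empty, which is why seq_along is generally the safer choice. A minimal sketch (the variable names are illustrative):

```r
empty <- double()

# 1:0 counts down, so a loop over it would run twice
bad_seq <- 1:length(empty)
bad_seq

# seq_along returns an empty sequence, so the loop body never runs
good_seq <- seq_along(empty)
good_seq

iterations <- 0
for (i in good_seq) iterations <- iterations + 1
iterations
```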

# Unknown sequence length
flip <- function() sample(c("T", "H"), 1)

flips <- 0
nheads <- 0
difficulty <- 10

while (nheads < difficulty) {
  if (flip() == "H") {
    nheads <- nheads + 1
  } else {
    nheads <- 0
  }
  flips <- flips + 1
}
flips

## [1] 1050

Functions

You may use built-in functions:

log

## function (x, base = exp(1))  .Primitive("log")

args(log)

## function (x, base = exp(1)) 
## NULL

log(x = 128, base = 2)

## [1] 7

log(base = 2, x = 128)

## [1] 7

log(128, 2)

## [1] 7

log(128)

## [1] 4.85203

Or define your own functions:

euclidean = function(x=0, y=0) {sqrt(x^2 + y^2)}

euclidean(1, 1)

## [1] 1.414214

euclidean(1)

## [1] 1

euclidean()

## [1] 0

operate = function(x, y, op) {
  switch(op, 
         plus = x + y,
         minus = x - y,
         times = x * y,
         divide = x / y,
         stop("Unknown op!")
  )
}

operate(6, 3, op="plus")

## [1] 9

operate(6, 3, op="minus")

## [1] 3

operate(6, 3, op="times")

## [1] 18

operate(6, 3, op="divide")

## [1] 2

# Try:
# operate(6, 2, op="log")
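An unknown operation falls through to the final unnamed argument of switch, which calls stop. A sketch of how the resulting error could be caught with tryCatch (operate is repeated here so that the example is self-contained):

```r
operate <- function(x, y, op) {
  switch(op,
         plus = x + y,
         minus = x - y,
         times = x * y,
         divide = x / y,
         stop("Unknown op!")  # default branch for any other value of op
  )
}

# Capture the error message instead of aborting
msg <- tryCatch(
  operate(6, 2, op = "log"),
  error = function(e) conditionMessage(e)
)
msg
```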

Functions may be recursive:

factorial = function(x) {
 if (x == 0) 1 else x * factorial(x-1)
}
factorial(5)

## [1] 120

You may write functionals, which are functions that take other functions as arguments:

g = function(f, n) {
  total = 0  # running total; avoids shadowing the built-in sum()
  for (i in 0:n) total = total + f(i)
  return(total)
}
 
g(factorial, 5)

## [1] 154
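The same computation can be written without an explicit loop. A sketch using sapply, where g2 is a hypothetical name and R's built-in factorial stands in for the recursive definition:

```r
# Apply f to each of 0, 1, ..., n and add up the results
g2 <- function(f, n) sum(sapply(0:n, f))

result <- g2(factorial, 5)
result
```

Any function of one numeric argument can be passed in place of factorial, for example g2(sqrt, 4).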

An application of functionals and iteration is the set of apply-like functionals:

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# apply mean to each column of data frame, returns a list
lapply(df, mean)

## $a
## [1] 0.2929172
## 
## $b
## [1] -0.5021485
## 
## $c
## [1] 0.07003155
## 
## $d
## [1] 0.1669388

# apply mean to each column of data frame, returns an atomic vector
sapply(df, mean)

##           a           b           c           d 
##  0.29291722 -0.50214855  0.07003155  0.16693883

mtx <- cbind(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)


# apply mean to each column of matrix, returns an atomic vector
apply(mtx, 2, mean)

##           a           b           c           d 
## -0.56884439 -0.32228197  0.02740099  0.43095743

# apply mean to each row of matrix, returns an atomic vector
apply(mtx, 1, mean)

##  [1]  0.09692097 -0.41739437  0.24424219 -0.41219123  0.33956363
##  [6] -0.21294377 -0.08479776  0.27154629  0.30138952 -1.20825534
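When the type of each result is known in advance, vapply is a stricter alternative to sapply, and colMeans is a specialised shortcut for column means. A sketch on a small deterministic data frame (the values are made up for illustration):

```r
df_small <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# FUN.VALUE declares that each call must return a single number;
# vapply stops with an error otherwise
col_means <- vapply(df_small, mean, FUN.VALUE = numeric(1))
col_means

# The built-in shortcut gives the same values
shortcut <- colMeans(df_small)
shortcut
```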

You may define your own binary operators using functions:

'%()%' = function(x, y) {(x + y)^2}
2 %()% 3

## [1] 25

Plot

# a data matrix
M = matrix(c(
   c(1200, 1190, 1100, 1120, 890),
   c(6200, 6690, 6700, 7120, 7150),
   c(8900, 8790, 8760, 8800, 9010),
   c(3300, 3490, 3660, 4300, 4510),
   c(2190, 2000, 1890, 1740, 1500)), ncol = 5
)  

rownames(M) = 2014:2018
colnames(M) = LETTERS[1:5]
M

##         A    B    C    D    E
## 2014 1200 6200 8900 3300 2190
## 2015 1190 6690 8790 3490 2000
## 2016 1100 6700 8760 3660 1890
## 2017 1120 7120 8800 4300 1740
## 2018  890 7150 9010 4510 1500

# barplot
barplot(M[1,])

# stacked barplot
barplot(M, legend=TRUE)

# juxtaposed barplot
barplot(M, beside=TRUE, legend=TRUE)

# histogram
x = rnorm(1000)
hist(x, probability=TRUE, main="Histogram of a normal sample")
rug(x)

# density plot
plot(density(x), main="Density of a normal sample")
rug(x)

# boxplot
# If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes.
boxplot(x, range = 1.5)

boxplot(x, range = 0)

# scatter plot
x = rnorm(100)
y = rnorm(100)
plot(x, y)

y = x + runif(100)
plot(x, y)

Scripts and packages

You can store a script of commands in a file, possibly a remote one, and evaluate it using the source command. R comes with a number of packages, some of which are loaded by default.

# installed packages
(.packages(all.available=TRUE))

# loaded packages
(.packages())

# install a package
install.packages("igraph")

# load a package
library(igraph)
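The source command can be tried out with a temporary file. In this sketch the script contents and the variable names are made up for illustration, and the script is evaluated in a separate environment so that it does not touch the workspace:

```r
# Write a three-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("a <- 6", "b <- 7", "answer <- a * b"), script)

# Evaluate the script in its own environment
env <- new.env()
source(script, local = env)
env$answer

# Remove the temporary file
unlink(script)
```

Calling source(script) without the local argument would instead create a, b, and answer in the global environment.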

Getting help and quitting

?log
?'+'
??"regression"

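When quitting an interactive session with q(), R asks whether to save the workspace. If you accept, it is saved in the files .RData (environment) and .Rhistory (command history) in the current working directory.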