R is an essential language for the data scientist. This is a quick tour of R.
# arithmetic
1 + 2 * 4 - 2 / 2
## [1] 8
# integer division
31 %/% 3
## [1] 10
# modulus
31 %% 3
## [1] 1
# exponents
2^10
## [1] 1024
# comparison
1 == 1
## [1] TRUE
1 != 1
## [1] FALSE
1 < 1
## [1] FALSE
1 <= 1
## [1] TRUE
# logic operators
# conjunction
TRUE & TRUE
## [1] TRUE
FALSE & TRUE
## [1] FALSE
TRUE & FALSE
## [1] FALSE
FALSE & FALSE
## [1] FALSE
# disjunction
TRUE | TRUE
## [1] TRUE
FALSE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
# negation
!TRUE
## [1] FALSE
!FALSE
## [1] TRUE
# exclusive disjunction
xor(TRUE, TRUE)
## [1] FALSE
xor(TRUE, FALSE)
## [1] TRUE
xor(FALSE, TRUE)
## [1] TRUE
xor(FALSE, FALSE)
## [1] FALSE
For conjunction and disjunction there is a shorter form (& and |) and a longer form (&& and ||). The shorter form works elementwise on vectors, in much the same way as the arithmetic operators. The longer form expects single values (since R 4.3.0, operands of length greater than one are an error), evaluates left to right, and stops as soon as the result is determined. This short-circuit behaviour makes the longer form the natural choice in the conditions of if and while statements.
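A minimal sketch of the difference between the two forms. The helper noisy() is hypothetical, introduced only to reveal whether its argument gets evaluated:

```r
# noisy() is a hypothetical helper: it reports when it is called
noisy <- function(value) {
  message("evaluated")
  value
}

# shorter form: works element by element, both operands are always evaluated
c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE
FALSE & noisy(TRUE)              # prints "evaluated", result is FALSE

# longer form: the right operand is skipped once the result is known
FALSE && noisy(TRUE)             # FALSE, noisy() is never called
TRUE || noisy(FALSE)             # TRUE, noisy() is never called
```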
There are a few special values:
NA (not available) is used to represent missing values;
NULL is the null object (not to be confused with NULL in databases);
Inf stands for positive infinity;
NaN (not a number) is the result of a computation that makes no sense.
NA & TRUE
## [1] NA
NA & FALSE
## [1] FALSE
NA | TRUE
## [1] TRUE
NA | FALSE
## [1] NA
!NA
## [1] NA
2^1024
## [1] Inf
1/0
## [1] Inf
0 / 0
## [1] NaN
Inf - Inf
## [1] NaN
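Because special values propagate through most operations, they are usually detected with dedicated predicates rather than with ==. A short sketch using base R functions only:

```r
NA == NA                          # NA: comparing with a missing value is itself missing
is.na(NA)                         # TRUE
is.nan(0 / 0)                     # TRUE
is.na(0 / 0)                      # TRUE: NaN also counts as missing
is.finite(1 / 0)                  # FALSE
is.null(NULL)                     # TRUE
length(NULL)                      # 0
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2: missing values are dropped first
```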
Of course, you may use variables to store values and results of expressions. There are 3 equivalent ways to assign a value to a variable (I prefer the first one, but the second one is the one you should use!):
x = 45
x <- 45 # this is the politically correct one!
45 -> x
To print the value of a variable, just type it:
x
## [1] 45
# or
print(x)
## [1] 45
# or print the structure of the object
str(x)
## num 45
| Dim | Homogeneous | Heterogeneous |
|-----|-------------|---------------|
| 1d | Atomic vector | List |
| 2d | Matrix | Data frame |
| nd | Array | |
Check the type of data with the function typeof.
typeof(x)
## [1] "double"
A vector is a sequence of elements with the same type. Vector indexes start at 1 (not 0). Construct vectors with the c or seq functions:
c(0, 1, 1, 2, 3, 5, 8)
## [1] 0 1 1 2 3 5 8
seq(1, 10, 1)
## [1] 1 2 3 4 5 6 7 8 9 10
seq(2, 10, 2)
## [1] 2 4 6 8 10
Numeric vectors (double or integer vectors):
x = c(1, 2, 3)
typeof(x)
## [1] "double"
is.double(x)
## [1] TRUE
y = c(1L, 2L, 3L)
typeof(y)
## [1] "integer"
is.integer(y)
## [1] TRUE
is.numeric(x)
## [1] TRUE
is.numeric(y)
## [1] TRUE
Operations on numeric vectors:
# element-wise sum
c(1, 2, 3, 4) + c(10, 20, 30, 40)
## [1] 11 22 33 44
# element-wise product
c(1, 2, 3, 4) * c(10, 20, 30, 40)
## [1] 10 40 90 160
# scalar product
c(1, 2, 3, 4) %*% c(10, 20, 30, 40)
## [,1]
## [1,] 300
If the two vectors have different lengths, the shorter one is repeated until it matches the length of the longer one (recycling):
c(1, 2, 3, 4) + 10
## [1] 11 12 13 14
c(1, 2, 3, 4) + c(10, 20)
## [1] 11 22 13 24
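Recycling also applies when the longer length is not a multiple of the shorter one, but in that case R issues a warning. A small sketch:

```r
# lengths 4 and 2: the shorter vector is recycled exactly twice, no warning
c(1, 2, 3, 4) * c(1, 10)         # 1 20 3 40

# lengths 4 and 3: recycling still happens (10 20 30 10), but R warns
# because 4 is not a multiple of 3
c(1, 2, 3, 4) + c(10, 20, 30)    # 11 22 33 14, with a warning
```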
Character vectors are vectors of strings:
c("This", "class", "is", "really", "terrific!")
## [1] "This" "class" "is" "really" "terrific!"
Logical vectors are vectors of Boolean values (TRUE and FALSE):
x = c(TRUE, FALSE, TRUE, FALSE)
y = !x
x
## [1] TRUE FALSE TRUE FALSE
y
## [1] FALSE TRUE FALSE TRUE
x & y
## [1] FALSE FALSE FALSE FALSE
x | y
## [1] TRUE TRUE TRUE TRUE
xor(x, y)
## [1] TRUE TRUE TRUE TRUE
Check the length of a vector with:
length(c(1, 2, 3))
## [1] 3
You may refer to members of a vector in several ways (mind the : operator):
a = 11:20
a
## [1] 11 12 13 14 15 16 17 18 19 20
a[5]
## [1] 15
a[c(1, 5, 10)]
## [1] 11 15 20
a[-1]
## [1] 12 13 14 15 16 17 18 19 20
a[-c(1, 5, 10)]
## [1] 12 13 14 16 17 18 19
a > 15
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
a[a > 15]
## [1] 16 17 18 19 20
All subsetting operators can be combined with assignment to modify selected values of the input vector:
a[1] = 100
a[10] = 200
a
## [1] 100 12 13 14 15 16 17 18 19 200
a[c(1, 10)] = c(10, 20)
a
## [1] 10 12 13 14 15 16 17 18 19 20
a[a > 15] = Inf
a
## [1] 10 12 13 14 15 Inf Inf Inf Inf Inf
All elements of an atomic vector must be of the same type, so when you attempt to combine different types they are converted to the most flexible one (coercion). Types from least to most flexible are: logical, integer, double, and character.
as.integer(FALSE)
## [1] 0
as.integer(TRUE)
## [1] 1
x = c(TRUE, TRUE, FALSE, FALSE)
sum(x)
## [1] 2
mean(x)
## [1] 0.5
as.logical(0)
## [1] FALSE
as.logical(1)
## [1] TRUE
as.double(0L)
## [1] 0
as.integer(0.5)
## [1] 0
as.character(0.5)
## [1] "0.5"
as.double("0.5")
## [1] 0.5
as.numeric("a")
## Warning: NAs introduced by coercion
## [1] NA
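The explicit conversions above have an implicit counterpart: when c() receives mixed types, the result takes the most flexible type among its arguments. A brief sketch:

```r
typeof(c(TRUE, 1L))      # "integer": logical is promoted to integer
typeof(c(1L, 2.5))       # "double": integer is promoted to double
typeof(c(2.5, "a"))      # "character": double is promoted to character
c(TRUE, 1L, 2.5, "a")    # "TRUE" "1" "2.5" "a"
```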
Finally, vector elements can have names:
x = c(a = 1, b = 2, c = 3)
# or
x = c(1, 2, 3)
names(x) = c("a", "b", "c")
x["a"]
## a
## 1
x[c("a", "b")]
## a b
## 1 2
A factor is a vector that can contain only predefined values, and is used to store categorical variables. Factors are built on top of integer vectors using the levels attribute, which defines the set of allowed values.
x = factor(c("male", "female", "female", "male", "male"))
x
## [1] male female female male male
## Levels: female male
typeof(x)
## [1] "integer"
levels(x)
## [1] "female" "male"
table(x)
## x
## female male
## 2 3
# You can't use values that are not levels
x[1] = "unknown"
## Warning in `[<-.factor`(`*tmp*`, 1, value = "unknown"): invalid factor
## level, NA generated
x
## [1] <NA> female female male male
## Levels: female male
Factors are useful when you know the possible values a variable may take, even if you don’t see all values in a given dataset. Before R 4.0.0, most data loading functions automatically converted character vectors to factors. This is suboptimal, because there’s no way for those functions to know the set of all possible levels or their optimal order. On those versions, use the argument stringsAsFactors = FALSE to suppress this behaviour (it is the default since R 4.0.0), and then manually convert character vectors to factors using your knowledge of the data.
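A sketch of the manual conversion, on a hypothetical sample in which one of the known levels never occurs. Passing levels explicitly keeps the unobserved value in the set of allowed values:

```r
# hypothetical character data: "female" is a valid value but is not observed
sex_values <- c("male", "male", "male")

# declare the full set of levels from knowledge of the data
sex <- factor(sex_values, levels = c("female", "male"))

levels(sex)   # "female" "male"
table(sex)    # female: 0, male: 3
```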
A list is a sequence of elements that may have different types. It is a recursive structure, since it can contain other lists.
l = list(thing = "hat", size = 8.25, female = TRUE)
l
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
##
## $female
## [1] TRUE
str(l)
## List of 3
## $ thing : chr "hat"
## $ size : num 8.25
## $ female: logi TRUE
# an element
l$thing
## [1] "hat"
l[[1]]
## [1] "hat"
l[["thing"]]
## [1] "hat"
# a sublist
l[c(1, 2)]
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
l[c("thing", "size")]
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
Mind that l[1] is a sub-list containing only the first component of the list:
l[1]
## $thing
## [1] "hat"
typeof(l[1])
## [1] "list"
l[[1]]
## [1] "hat"
typeof(l[[1]])
## [1] "character"
“If list x is a train carrying objects, then x[[5]] is the object in car 5; x[5] is car number 5.”
You can add and remove elements from a list as follows:
l = list(a = 1, b = 2)
l$c = 3
l
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 3
l$c = NULL
l
## $a
## [1] 1
##
## $b
## [1] 2
A list may contain vectors:
l = list(thing = "hat", prices = c(8.25, 10.5), female = "TRUE")
l
## $thing
## [1] "hat"
##
## $prices
## [1] 8.25 10.50
##
## $female
## [1] "TRUE"
l$prices[1]
## [1] 8.25
A list may contain other lists:
l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
str(l)
## List of 3
## $ : num 1
## $ :List of 3
## ..$ : num 1
## ..$ : num 2
## ..$ : num 3
## $ :List of 3
## ..$ : chr "a"
## ..$ : num 1
## ..$ :List of 2
## .. ..$ : chr "TRUE"
## .. ..$ : chr "FALSE"
Find the following components of l:
the list list(1, 2, 3);
the element 1 of the list list(1, 2, 3);
the element TRUE of the list list("TRUE", "FALSE").
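One possible answer to the exercise, as a sketch. It rebuilds the same nested list so that it runs on its own, and uses [[ ]] repeatedly to descend one level at a time:

```r
l <- list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))

l[[2]]              # the list list(1, 2, 3)
l[[2]][[1]]         # its first element, 1
l[[3]][[3]][[1]]    # "TRUE", the first element of the innermost list
```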
You can combine lists:
a = list(1, 2, 3)
b = list(3, 2, 1)
c(a, b)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 3
##
## [[5]]
## [1] 2
##
## [[6]]
## [1] 1
A matrix is a 2-dimensional vector. Hence all elements of a matrix must have the same type. Typically matrices contain numbers.
M = matrix(data = 1:9, nrow = 3, byrow = TRUE)
M
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
N = matrix(data = 1:9, ncol = 3)
N
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
nrow(M)
## [1] 3
ncol(M)
## [1] 3
dim(M)
## [1] 3 3
You can also create a matrix by setting the attribute dim of a vector:
x = 1:9
x
## [1] 1 2 3 4 5 6 7 8 9
dim(x) = c(3, 3)
x
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
Accessing the matrix elements:
# element
M[1, 2]
## [1] 2
# first row
M[1, ]
## [1] 1 2 3
# first column
M[ ,1]
## [1] 1 4 7
# sub-matrix
M[1:2, 1:2]
## [,1] [,2]
## [1,] 1 2
## [2,] 4 5
M[-3, -3]
## [,1] [,2]
## [1,] 1 2
## [2,] 4 5
# diagonal
diag(M)
## [1] 1 5 9
diag(M) = 0
M
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
Operations on matrices:
# element-wise sum
M + N
## [,1] [,2] [,3]
## [1,] 1 6 10
## [2,] 6 5 14
## [3,] 10 14 9
# element-wise product
M * N
## [,1] [,2] [,3]
## [1,] 0 8 21
## [2,] 8 0 48
## [3,] 21 48 0
# matrix product
M %*% N
## [,1] [,2] [,3]
## [1,] 13 28 43
## [2,] 22 52 82
## [3,] 23 68 113
# matrix transpose
t(M)
## [,1] [,2] [,3]
## [1,] 0 4 7
## [2,] 2 0 8
## [3,] 3 6 0
# matrix inverse
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow=3, byrow=TRUE)
D = solve(C)
D
## [,1] [,2] [,3]
## [1,] 1 -1 1
## [2,] -1 1 0
## [3,] 0 1 -1
D %*% C
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
C %*% D
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
# linear systems C x = b
C
## [,1] [,2] [,3]
## [1,] 1 0 1
## [2,] 1 1 1
## [3,] 1 1 0
b = c(2, 1, 3)
# the system is:
# x1 + x3 = 2
# x1 + x2 + x3 = 1
# x1 + x2 = 3
x = solve(C,b)
x
## [1] 4 -1 -2
C %*% x
## [,1]
## [1,] 2
## [2,] 1
## [3,] 3
# matrix spectrum
spectrum = eigen(C)
spectrum$vectors
## [,1] [,2] [,3]
## [1,] -0.4151581 -0.4743098 -0.6026918
## [2,] -0.7480890 -0.2110877 0.7515444
## [3,] -0.5176936 0.8546767 0.2682231
spectrum$values
## [1] 2.2469796 -0.8019377 0.5549581
spectrum2 = eigen(t(C))
spectrum2$vectors
## [,1] [,2] [,3]
## [1,] -0.7480890 0.2110877 -0.7515444
## [2,] -0.4151581 0.4743098 0.6026918
## [3,] -0.5176936 -0.8546767 -0.2682231
spectrum2$values
## [1] 2.2469796 -0.8019377 0.5549581
You can add rows with rbind and add columns with cbind:
M
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
rbind(M, 10:12)
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
## [4,] 10 11 12
# this makes a copy of M
M
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
# modify M with
M = rbind(M, 10:12)
M
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
## [4,] 10 11 12
M = cbind(M, seq(4, 16, 4))
M
## [,1] [,2] [,3] [,4]
## [1,] 0 2 3 4
## [2,] 4 0 6 8
## [3,] 7 8 0 12
## [4,] 10 11 12 16
Rows and columns of a matrix can have names:
rownames(M) = letters[1:nrow(M)]
colnames(M) = LETTERS[1:ncol(M)]
M
## A B C D
## a 0 2 3 4
## b 4 0 6 8
## c 7 8 0 12
## d 10 11 12 16
M["a", "A"]
## [1] 0
M["a", ]
## A B C D
## 0 2 3 4
M[ ,"A"]
## a b c d
## 0 4 7 10
An array is a multi-dimensional vector:
A = array(1:27, dim=c(3, 3, 3))
A
## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 10 13 16
## [2,] 11 14 17
## [3,] 12 15 18
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 19 22 25
## [2,] 20 23 26
## [3,] 21 24 27
A[ , , 1]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
A[ , 1, 1]
## [1] 1 2 3
A[1 , , 1]
## [1] 1 4 7
A[1, 1, 1]
## [1] 1
A data frame is a list of vectors (called columns) of the same length but possibly of different types. A data frame is like a database table. Each column has a name and contains elements of the same type. A data frame is a mix of list and matrix structures: like a list, elements (columns) can have different types. Like a matrix, columns have the same length.
team = c("Inter", "Milan", "Roma", "Palermo")
score = c(59, 58, 53, 46)
win = c(17, 17, 15, 13)
tie = c(8, 7, 8, 7)
lost = c(3, 4, 5, 8)
league = data.frame(team, score, win, tie, lost, stringsAsFactors = FALSE)
league
## team score win tie lost
## 1 Inter 59 17 8 3
## 2 Milan 58 17 7 4
## 3 Roma 53 15 8 5
## 4 Palermo 46 13 7 8
Accessing data frame elements:
# first row
league[1, ]
## team score win tie lost
## 1 Inter 59 17 8 3
# first column
league[ ,1]
## [1] "Inter" "Milan" "Roma" "Palermo"
league[ ,"team"]
## [1] "Inter" "Milan" "Roma" "Palermo"
league[1:2, 1:2]
## team score
## 1 Inter 59
## 2 Milan 58
league[1:2, c("team", "score")]
## team score
## 1 Inter 59
## 2 Milan 58
You can combine data frames as with matrices:
rbind(league, data.frame(team = "Lazio", score = 44, win = 12, tie = 8, lost = 8))
## team score win tie lost
## 1 Inter 59 17 8 3
## 2 Milan 58 17 7 4
## 3 Roma 53 15 8 5
## 4 Palermo 46 13 7 8
## 5 Lazio 44 12 8 8
cbind(league, goals = c(45, 43, 38, 36))
## team score win tie lost goals
## 1 Inter 59 17 8 3 45
## 2 Milan 58 17 7 4 43
## 3 Roma 53 15 8 5 38
## 4 Palermo 46 13 7 8 36
A data frame is the most common way of storing data in R, and if used systematically makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list. This means that a data frame has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.
# a data frame is a list
typeof(league)
## [1] "list"
league$team
## [1] "Inter" "Milan" "Roma" "Palermo"
league[[1]]
## [1] "Inter" "Milan" "Roma" "Palermo"
league[league$team == "Inter", ]
## team score win tie lost
## 1 Inter 59 17 8 3
league[league$score == max(league$score), ]
## team score win tie lost
## 1 Inter 59 17 8 3
nrow(league)
## [1] 4
ncol(league)
## [1] 5
rownames(league)
## [1] "1" "2" "3" "4"
colnames(league)
## [1] "team" "score" "win" "tie" "lost"
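The list-like and matrix-like properties described above can be checked directly. A small sketch on a hypothetical two-column data frame d:

```r
# d is a hypothetical data frame, used only for illustration
d <- data.frame(id = 1:3, label = c("a", "b", "c"))

is.list(d)                          # TRUE: a data frame is a list
length(d)                           # 2: the number of columns
identical(length(d), ncol(d))       # TRUE
identical(names(d), colnames(d))    # TRUE
nrow(d)                             # 3
```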
Since a data frame is a list of vectors, and a vector can be a list, we can make data frames with list columns, and hence also data frames whose elements are data frames (nested data frames):
# a data frame with a list column
df1 = data.frame(
  x = I(list(a = 1:3, b = 4:6)),
  y = c("Hello", "Venus"),
  stringsAsFactors = FALSE
)
str(df1)
## 'data.frame': 2 obs. of 2 variables:
## $ x:List of 2
## ..$ a: int 1 2 3
## ..$ b: int 4 5 6
## ..- attr(*, "class")= chr "AsIs"
## $ y: chr "Hello" "Venus"
df2 = data.frame(
  x = I(list(a = 3:1, b = 6:4)),
  y = c("Hello", "Jupiter"),
  stringsAsFactors = FALSE
)
str(df2)
## 'data.frame': 2 obs. of 2 variables:
## $ x:List of 2
## ..$ a: int 3 2 1
## ..$ b: int 6 5 4
## ..- attr(*, "class")= chr "AsIs"
## $ y: chr "Hello" "Jupiter"
# a data frame with data frame elements
df = data.frame(
  x = I(list(Venus = df1, Jupiter = df2)),
  y = c("Hello", "Worlds"),
  stringsAsFactors = FALSE
)
str(df)
## 'data.frame': 2 obs. of 2 variables:
## $ x:List of 2
## ..$ Venus :'data.frame': 2 obs. of 2 variables:
## .. ..$ x:List of 2
## .. .. ..$ a: int 1 2 3
## .. .. ..$ b: int 4 5 6
## .. .. ..- attr(*, "class")= chr "AsIs"
## .. ..$ y: chr "Hello" "Venus"
## ..$ Jupiter:'data.frame': 2 obs. of 2 variables:
## .. ..$ x:List of 2
## .. .. ..$ a: int 3 2 1
## .. .. ..$ b: int 6 5 4
## .. .. ..- attr(*, "class")= chr "AsIs"
## .. ..$ y: chr "Hello" "Jupiter"
## ..- attr(*, "class")= chr "AsIs"
## $ y: chr "Hello" "Worlds"
R is an object-oriented functional programming language. Conditional statements take the form:
x = 49
if (x %% 7 == 0) x else -x
## [1] 49
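The if statement tests a single condition. As a hedged sketch of two common variations: branches can be chained with else if, and the base function ifelse applies the same kind of choice to every element of a vector. The function sign_label is a hypothetical name used only here:

```r
# chained conditions on a single value
sign_label <- function(v) {
  if (v > 0) {
    "positive"
  } else if (v < 0) {
    "negative"
  } else {
    "zero"
  }
}
sign_label(-3)                # "negative"

# elementwise choice on a vector
x <- c(49, 50, 56)
ifelse(x %% 7 == 0, x, -x)    # 49 -50 56
```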
Looping constructs include while and for:
x = 108
i = 2
while (i <= x/2) {
  if (x %% i == 0) print(i)
  i = i + 1
}
## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54
for (i in 2:(x/2)) {
  if (x %% i == 0) print(i)
}
## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# we know the output length
output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[i] <- mean(df[[i]])          # 3. body
}
# Unknown output length (expensive solution!)
means <- c(0, 1, 2)
output <- double()
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output <- c(output, rnorm(n, means[i]))
}
# Unknown output length (efficient solution!)
output <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output[[i]] <- rnorm(n, means[i])
}
output <- unlist(output)
# Unknown sequence length
flip <- function() sample(c("T", "H"), 1)
flips <- 0
nheads <- 0
difficulty <- 10
while (nheads < difficulty) {
  if (flip() == "H") {
    nheads <- nheads + 1
  } else {
    nheads <- 0
  }
  flips <- flips + 1
}
flips
## [1] 1050
You may use built-in functions:
log
## function (x, base = exp(1)) .Primitive("log")
args(log)
## function (x, base = exp(1))
## NULL
log(x = 128, base = 2)
## [1] 7
log(base = 2, x = 128)
## [1] 7
log(128, 2)
## [1] 7
log(128)
## [1] 4.85203
Or define your own functions:
euclidean = function(x=0, y=0) {sqrt(x^2 + y^2)}
euclidean(1, 1)
## [1] 1.414214
euclidean(1)
## [1] 1
euclidean()
## [1] 0
operate = function(x, y, op) {
  switch(op,
    plus = x + y,
    minus = x - y,
    times = x * y,
    divide = x / y,
    stop("Unknown op!")
  )
}
operate(6, 3, op="plus")
## [1] 9
operate(6, 3, op="minus")
## [1] 3
operate(6, 3, op="times")
## [1] 18
operate(6, 3, op="divide")
## [1] 2
# Try:
# operate(6, 2, op="log")
Functions may be recursive:
factorial = function(x) {
  if (x == 0) 1 else x * factorial(x-1)
}
factorial(5)
## [1] 120
You may write functionals, that is, functions that take other functions as arguments:
g = function(f, n) {
  sum = 0
  for (i in 0:n) sum = sum + f(i)
  return(sum)
}
g(factorial, 5)
## [1] 154
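The function passed to a functional need not have a name. A minimal sketch, where sum_over is a hypothetical functional with the same logic as g, called with an anonymous function and with the built-in sqrt:

```r
# sum_over() is a hypothetical name; it adds up f(0), f(1), ..., f(n)
sum_over <- function(f, n) {
  total <- 0
  for (i in 0:n) total <- total + f(i)
  total
}

sum_over(function(i) i^2, 3)   # 0 + 1 + 4 + 9 = 14
sum_over(sqrt, 4)              # sqrt(0) + sqrt(1) + sqrt(2) + sqrt(3) + sqrt(4)
```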
An application of functionals and iteration is the set of apply-like functionals:
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# apply mean to each column of data frame, returns a list
lapply(df, mean)
## $a
## [1] 0.2929172
##
## $b
## [1] -0.5021485
##
## $c
## [1] 0.07003155
##
## $d
## [1] 0.1669388
# apply mean to each column of data frame, returns an atomic vector
sapply(df, mean)
## a b c d
## 0.29291722 -0.50214855 0.07003155 0.16693883
mtx <- cbind(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# apply mean to each column of matrix, returns an atomic vector
apply(mtx, 2, mean)
## a b c d
## -0.56884439 -0.32228197 0.02740099 0.43095743
# apply mean to each row of matrix, returns an atomic vector
apply(mtx, 1, mean)
## [1] 0.09692097 -0.41739437 0.24424219 -0.41219123 0.33956363
## [6] -0.21294377 -0.08479776 0.27154629 0.30138952 -1.20825534
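Base R also provides vapply, a variant of sapply in which the caller declares the type and length of each result, so an unexpected return value raises an error instead of silently changing the output shape. A sketch on a hypothetical data frame d (the seed is arbitrary and only makes the run reproducible):

```r
set.seed(1)  # arbitrary seed, an assumption of this sketch
d <- data.frame(a = rnorm(10), b = rnorm(10))

# each call to mean must return exactly one number
col_means <- vapply(d, mean, FUN.VALUE = numeric(1))
col_means

# same values as sapply, and as the specialised colMeans
all.equal(col_means, sapply(d, mean))
all.equal(col_means, colMeans(d))
```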
You may define your own binary operators using functions:
'%()%' = function(x, y) {(x + y)^2}
2 %()% 3
## [1] 25
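Any name of the form %name% can be used in this way, and the resulting operator remains an ordinary function. A brief sketch with a hypothetical operator %cat% that joins two strings:

```r
# %cat% is a hypothetical operator defined only for this example
`%cat%` <- function(a, b) paste(a, b)

"data" %cat% "science"          # "data science"

# an operator can also be called like a regular function
`%cat%`("data", "science")      # "data science"
```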
# a data matrix
M = matrix(c(
  c(1200, 1190, 1100, 1120, 890),
  c(6200, 6690, 6700, 7120, 7150),
  c(8900, 8790, 8760, 8800, 9010),
  c(3300, 3490, 3660, 4300, 4510),
  c(2190, 2000, 1890, 1740, 1500)), ncol = 5
)
rownames(M) = 2014:2018
colnames(M) = LETTERS[1:5]
M
## A B C D E
## 2014 1200 6200 8900 3300 2190
## 2015 1190 6690 8790 3490 2000
## 2016 1100 6700 8760 3660 1890
## 2017 1120 7120 8800 4300 1740
## 2018 890 7150 9010 4510 1500
# barplot
barplot(M[1,])
# stacked barplot
barplot(M, legend=TRUE)
# juxtaposed barplot
barplot(M, beside=TRUE, legend=TRUE)
# histogram
x = rnorm(1000)
hist(x, probability=TRUE, main="Histogram of a normal sample")
rug(x)
# density plot
plot(density(x), main="Density of a normal sample")
rug(x)
# boxplot
# If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes.
boxplot(x, range = 1.5)
boxplot(x, range = 0)
# scatter plot
x = rnorm(100)
y = rnorm(100)
plot(x, y)
y = x + runif(100)
plot(x, y)
You can store a script of commands in a file, possibly a remote one, and evaluate the script using the source command. R comes with a number of packages, some of which are loaded by default.
# installed packages
(.packages(all.available=TRUE))
# loaded packages
(.packages())
# install a package
install.packages("igraph")
# load a package
library(igraph)
?log
?'+'
??"regression"
When quitting, R offers to save the workspace in the files .RData (environment) and .Rhistory (command history).
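The source command mentioned earlier can be tried with a throwaway script. A minimal sketch, in which the script content and the variable names are hypothetical and the file is a temporary one created just for the illustration:

```r
# write a two-line script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("radius <- 2", "area <- pi * radius^2"), script)

# evaluate the script; by default its variables land in the global workspace
source(script)
area            # 12.56637

unlink(script)  # remove the temporary file
```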