R is a free software environment for statistical computing and graphics. R is an essential language for the data scientist. Its strengths include:
RStudio is an integrated development environment (IDE) for R. It includes:
# help on the log function
?log
# install or update a package (only once!)
install.packages("igraph")
# load a package (when you need it)
library(igraph)
# list all packages for which an update is available
old.packages()
# update all available packages
update.packages()
Arithmetic operators: sum (+), minus (-), product (*), division (/), integer division (%/%), modulus (%%), exponent (^).
Comparison operators: equal (==), different (!=), less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=).
Logical operators: conjunction (&), disjunction (|), negation (!), exclusive disjunction (xor).
Explain the following mismatch between math and R:
\[ (\sqrt{2}) ^ 2 \stackrel{?}{=} 2\]
sqrt(2) ^ 2 == 2
## [1] FALSE
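The computer uses finite-precision binary arithmetic. The binary representation of \(\sqrt{2}\) has infinitely many digits, so sqrt(2) is rounded to the nearest representable double. Squaring that rounded value gives a number slightly different from 2.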
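A practical consequence: comparing doubles with == is fragile. The sketch below uses all.equal() instead, which compares up to a small tolerance. It is wrapped in isTRUE() because all.equal() returns a character description of the difference, not FALSE, when the values differ.

```r
# the rounding error is tiny, but not zero
sqrt(2)^2 - 2                    # 4.440892e-16
# exact comparison fails...
sqrt(2)^2 == 2                   # FALSE
# ...while a comparison with a small tolerance succeeds
isTRUE(all.equal(sqrt(2)^2, 2))  # TRUE
```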
Define the xor operator in terms of conjunction (&), disjunction (|), and negation (!).
x = TRUE
y = TRUE
# first solution
(x | y) & !(x & y)
## [1] FALSE
# second solution
(x & !y) | (y & !x)
## [1] FALSE
x = TRUE
y = FALSE
(x | y) & !(x & y)
## [1] TRUE
(x & !y) | (y & !x)
## [1] TRUE
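The code above checks only two input combinations. The sketch below covers all of them: it builds the full truth table with expand.grid() and compares both formulas against R's built-in xor(). The column names first, second and builtin are illustrative.

```r
# all four combinations of two logical values
tt = expand.grid(x = c(TRUE, FALSE), y = c(TRUE, FALSE))
# the two solutions, applied element-wise to the columns
tt$first = (tt$x | tt$y) & !(tt$x & tt$y)
tt$second = (tt$x & !tt$y) | (tt$y & !tt$x)
# R's built-in exclusive disjunction, for comparison
tt$builtin = xor(tt$x, tt$y)
tt
# TRUE if both solutions agree with xor() on every row
all(tt$first == tt$builtin & tt$second == tt$builtin)
```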
R has a few special values:
NA (not available) is used to represent missing values;
NULL is the null object (not to be confused with NULL in databases, which corresponds to R's NA);
Inf stands for positive infinity (and -Inf for negative infinity);
NaN (not a number) is the result of a computation that makes no sense, such as 0 / 0.
NA & TRUE
## [1] NA
NA & FALSE
## [1] FALSE
NA | TRUE
## [1] TRUE
NA | FALSE
## [1] NA
!NA
## [1] NA
2^1024
## [1] Inf
1/0
## [1] Inf
0 / 0
## [1] NaN
Inf - Inf
## [1] NaN
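Each special value has its own test function. The sketch below uses illustrative values to show them, along with two common pitfalls: is.na() also returns TRUE for NaN, and == cannot detect a missing value.

```r
# each special value has its own test
is.na(NA)          # TRUE
is.null(NULL)      # TRUE
is.infinite(-Inf)  # TRUE
is.nan(0 / 0)      # TRUE
# NaN also counts as missing, but NA is not NaN
is.na(NaN)         # TRUE
is.nan(NA)         # FALSE
# NA is a value of length 1; NULL is the absence of a value
length(NA)         # 1
length(NULL)       # 0
c(1, NULL, 3)      # NULL disappears: 1 3
# == cannot detect missing values: the result is itself NA
NA == NA           # NA
```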
Of course, you may use variables to store values. There are three ways to assign a value to a variable, all equivalent at the top level:
x = 42 # my favourite
x <- 42 # this is the politically correct one!
42 -> x # used rarely
# print x
x
## [1] 42
# print the structure of x (with its type)
str(x)
## num 42
R has four main atomic types:
# double (double-precision number)
x = 108.801
typeof(x)
## [1] "double"
# integer (integer number)
x = 108L
typeof(x)
## [1] "integer"
# character (a string of characters)
x = "108L"
typeof(x)
## [1] "character"
# logical (a Boolean, either TRUE or FALSE)
x = TRUE
typeof(x)
## [1] "logical"
The main data structures used in R include:

| Dim | Homogeneous   | Heterogeneous |
|-----|---------------|---------------|
| 1d  | atomic vector | list          |
| 2d  | matrix        | data frame    |
A vector is a sequence of elements with the same type. Vector indexes start at 1 (not 0).
# create a vector with the c() function
c(1, 3, 5, 7)
## [1] 1 3 5 7
# concatenate vectors
c(c(1, 3), c(5, 7))
## [1] 1 3 5 7
# element-wise sum
c(1, 2, 3, 4) + c(10, 20, 30, 40)
## [1] 11 22 33 44
# recycling: the single value 10 is repeated to match the vector length
10 + c(1, 2, 3, 4)
## [1] 11 12 13 14
# element-wise product
c(1, 2, 3, 4) * c(10, 20, 30, 40)
## [1]  10  40  90 160
# recycling
10 * c(1, 2, 3, 4)
## [1] 10 20 30 40
# scalar product (the result is a 1x1 matrix)
c(1, 2, 3, 4) %*% c(10, 20, 30, 40)
##      [,1]
## [1,]  300
x = c(TRUE, FALSE, TRUE, FALSE)
# wrapping an assignment in parentheses also prints the result
(y = !x)
## [1] FALSE  TRUE FALSE  TRUE
x & y
## [1] FALSE FALSE FALSE FALSE
x | y
## [1] TRUE TRUE TRUE TRUE
xor(x, y)
## [1] TRUE TRUE TRUE TRUE
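Recycling also applies between two vectors of different lengths. In this small sketch, when the longer length is a multiple of the shorter one, R repeats the shorter vector silently. Otherwise it still recycles but issues a warning.

```r
# lengths 4 and 2: c(10, 20) is repeated twice, silently
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
# lengths 3 and 2: R still recycles, but warns
c(1, 2, 3) + c(10, 20)      # 11 22 13, with a warning
```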
You may refer to members of a vector in several ways:
primes = c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
primes[5]
## [1] 11
primes[c(1, 5, 10)]
## [1] 2 11 29
primes[-1]
## [1] 3 5 7 11 13 17 19 23 29
primes[-c(1, 5, 10)]
## [1] 3 5 7 13 17 19 23
primes > 15
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
primes[primes > 15]
## [1] 17 19 23 29
# modify the vector
primes[primes > 15] = Inf
primes
## [1] 2 3 5 7 11 13 Inf Inf Inf Inf
All elements of an atomic vector must have the same type. When you combine values of different types, they are converted to the most flexible type among them (this is called coercion). Types from least to most flexible are: logical, integer, double, and character.
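A few examples of coercion that follow this order, plus explicit conversion with the as.* functions. Note that as.numeric() returns NA, with a warning, when the string is not a number.

```r
# logical + integer -> integer (TRUE becomes 1L)
typeof(c(TRUE, 2L))      # "integer"
# integer + double -> double
typeof(c(1L, 2.5))       # "double"
# anything + character -> character
c(TRUE, 1L, 2.5, "a")    # "TRUE" "1" "2.5" "a"
# explicit conversion
as.numeric("3.14")       # 3.14
as.numeric("abc")        # NA, with a warning
```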
In arithmetic, logical values are coerced to numbers: TRUE becomes 1 and FALSE becomes 0.
x = c(TRUE, TRUE, FALSE, FALSE)
# how many TRUE?
sum(x)
## [1] 2
# proportion of TRUE
mean(x)
## [1] 0.5
Vector elements can have names:
x = c(a = 1, b = 2, c = 3)
# or, equivalently
x = c(1, 2, 3)
names(x) = c("a", "b", "c")
x["a"]
## a
## 1
x[c("a", "b")]
## a b
## 1 2
Given a vector of the integers from 0 to 100, select all numbers that are: even; even or divisible by 5; odd and divisible by 7. (Hint: use the : operator to generate the vector.)
# vector
x = 0:100
# even
x[x %% 2 == 0]
## [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
## [20] 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
## [39] 76 78 80 82 84 86 88 90 92 94 96 98 100
# even or divisible by 5
x[x %% 2 == 0 | x %% 5 == 0]
## [1] 0 2 4 5 6 8 10 12 14 15 16 18 20 22 24 25 26 28 30
## [20] 32 34 35 36 38 40 42 44 45 46 48 50 52 54 55 56 58 60 62
## [39] 64 65 66 68 70 72 74 75 76 78 80 82 84 85 86 88 90 92 94
## [58] 95 96 98 100
# odd and divisible by 7
x[x %% 2 == 1 & x %% 7 == 0]
## [1] 7 21 35 49 63 77 91
Write a logical condition that is TRUE if the number is prime (Hint: take advantage of the all function).
n = 109
n == 2L || all(n %% 2:(n-1) != 0)
## [1] TRUE
n = 111
n == 2L || all(n %% 2:(n-1) != 0)
## [1] FALSE
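The condition above tests every candidate divisor from 2 to n - 1. A common refinement is to test divisors only up to \(\lfloor\sqrt{n}\rfloor\): if n = a b with a ≤ b, then a ≤ \(\sqrt{n}\). The sketch below applies it using a function definition (functions are covered later in this tutorial) and Filter() to keep the primes below 30. The helper name is_prime is hypothetical, not part of base R.

```r
# hypothetical helper: trial division up to floor(sqrt(n))
is_prime = function(n) {
  if (n < 2) return(FALSE)   # 0 and 1 are not prime
  if (n < 4) return(TRUE)    # 2 and 3 are prime
  all(n %% 2:floor(sqrt(n)) != 0)
}
is_prime(109)   # TRUE
is_prime(111)   # FALSE (111 = 3 * 37)
# primes below 30
Filter(is_prime, 0:30)
```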
A factor is a vector that can contain only predefined values, and is used to store categorical variables (for instance sex or religion).
Factors are built on top of integer vectors using the levels attribute, which defines the set of allowed values.
x = factor(c("male", "female", "female", "male", "male"))
x
## [1] male female female male male
## Levels: female male
typeof(x)
## [1] "integer"
levels(x)
## [1] "female" "male"
# assigning a value that is not a level
# issues a warning and produces an NA
x[1] = "unknown"
x
## [1] <NA> female female male male
## Levels: female male
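To allow a value such as "unknown" without producing an NA, one option is to declare it as a level when creating the factor, as in this sketch. It also shows the integer codes stored underneath, and table(), which counts observations per level.

```r
# declare the allowed values up front, including "unknown"
x = factor(c("male", "female", "female", "male", "male"),
           levels = c("female", "male", "unknown"))
x[1] = "unknown"     # no warning: "unknown" is a level
x
# the integer codes stored underneath (positions in levels(x))
as.integer(x)        # 3 1 1 2 2
# counts per level
table(x)
```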
A list is a sequence of elements that might have different types.
# create a list
l = list(thing = "hat", size = 8.25, female = TRUE)
# print the list
l
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
##
## $female
## [1] TRUE
str(l)
## List of 3
##  $ thing : chr "hat"
##  $ size  : num 8.25
##  $ female: logi TRUE
# an element
l$thing
## [1] "hat"
l[[1]]
## [1] "hat"
# a sublist
l[c("thing", "size")]
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
l[c(1, 2)]
## $thing
## [1] "hat"
##
## $size
## [1] 8.25
“If list x is a train carrying objects, then x[[5]] is the object in car 5; x[5] is car number 5.”
# a sublist containing the first element of the list
l[1]
## $thing
## [1] "hat"
typeof(l[1])
## [1] "list"
# the first element of the list
l[[1]]
## [1] "hat"
typeof(l[[1]])
## [1] "character"
List elements can be of any type, lists included. A list that contains other lists is called a nested list.
l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
str(l)
## List of 3
##  $ : num 1
##  $ :List of 3
##   ..$ : num 1
##   ..$ : num 2
##   ..$ : num 3
##  $ :List of 3
##   ..$ : chr "a"
##   ..$ : num 1
##   ..$ :List of 2
##   .. ..$ : chr "TRUE"
##   .. ..$ : chr "FALSE"
Consider the list:
l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
Find: the element list(1, 2, 3); the element 1 of list(1, 2, 3); the element "TRUE" of list("TRUE", "FALSE").
l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
l[[2]]
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
l[[2]][[1]]
## [1] 1
l[[3]][[3]][[1]]
## [1] "TRUE"
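As a shortcut, [[ also accepts a vector of indices and descends one level per index (recursive indexing). The chains above can therefore be written as a single call:

```r
l = list(1, list(1, 2, 3), list("a", 1, list("TRUE", "FALSE")))
# same as l[[3]][[3]][[1]]
l[[c(3, 3, 1)]]   # "TRUE"
# same as l[[2]][[1]]
l[[c(2, 1)]]      # 1
```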
Write a list that encodes the Porphyrian Tree. Then select its insensitive branch.
substance = list(
  immaterial = "spirit",
  material = list(
    body = list(
      inanimate = "mineral",
      animate = list(
        living = list(
          insensitive = "plant",
          sensitive = list(
            irrational = "beast",
            rational = list(human = c("Arendt", "Butler", "Barad")))))))
)
str(substance)
## List of 2
##  $ immaterial: chr "spirit"
##  $ material  :List of 1
##   ..$ body:List of 2
##   .. ..$ inanimate: chr "mineral"
##   .. ..$ animate  :List of 1
##   .. .. ..$ living:List of 2
##   .. .. .. ..$ insensitive: chr "plant"
##   .. .. .. ..$ sensitive  :List of 2
##   .. .. .. .. ..$ irrational: chr "beast"
##   .. .. .. .. ..$ rational  :List of 1
##   .. .. .. .. .. ..$ human: chr [1:3] "Arendt" "Butler" "Barad"
substance$material$body$animate$living$insensitive
## [1] "plant"
A matrix is a 2-dimensional atomic vector: all its elements have the same type, and they are arranged in rows and columns of equal length.
# by row
M = matrix(data = 1:9, nrow = 3, byrow = TRUE)
M
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
# by column (the default)
N = matrix(data = 1:9, ncol = 3)
N
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
nrow(M)
## [1] 3
ncol(M)
## [1] 3
dim(M)
## [1] 3 3
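Under the hood, a matrix is an atomic vector with a dim attribute. Setting that attribute on a plain vector turns it into a matrix, filled by column just as matrix() does by default. A minimal sketch:

```r
v = 1:6
is.matrix(v)            # FALSE: a plain integer vector
# give v two dimensions: 2 rows and 3 columns
dim(v) = c(2, 3)
is.matrix(v)            # TRUE
v
identical(v, matrix(1:6, nrow = 2))   # TRUE
# dropping the dim attribute recovers the plain vector
as.vector(v)
```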
M
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
# element in row 1 and column 2
M[1, 2]
## [1] 2
# first row
M[1, ]
## [1] 1 2 3
# first column
M[, 1]
## [1] 1 4 7
# sub-matrix
M[1:2, 1:2]
##      [,1] [,2]
## [1,]    1    2
## [2,]    4    5
M[-3, -3]
##      [,1] [,2]
## [1,]    1    2
## [2,]    4    5
# diagonal
diag(M)
## [1] 1 5 9
# random entries: your values will differ
P = matrix(data = runif(9), nrow = 3, byrow = TRUE)
P
##           [,1]       [,2]       [,3]
## [1,] 0.9027998 0.20863525 0.06612994
## [2,] 0.5501463 0.75075613 0.74441485
## [3,] 0.2105642 0.08571193 0.40182803
# add a column (this returns a new matrix)
cbind(P, c(0, 0, 0))
##           [,1]       [,2]       [,3] [,4]
## [1,] 0.9027998 0.20863525 0.06612994    0
## [2,] 0.5501463 0.75075613 0.74441485    0
## [3,] 0.2105642 0.08571193 0.40182803    0
# P itself is unchanged
P
##           [,1]       [,2]       [,3]
## [1,] 0.9027998 0.20863525 0.06612994
## [2,] 0.5501463 0.75075613 0.74441485
## [3,] 0.2105642 0.08571193 0.40182803
# to modify the matrix, reassign it
P = cbind(P, c(0, 0, 0))
P
##           [,1]       [,2]       [,3] [,4]
## [1,] 0.9027998 0.20863525 0.06612994    0
## [2,] 0.5501463 0.75075613 0.74441485    0
## [3,] 0.2105642 0.08571193 0.40182803    0
# add a row
P = rbind(P, c(0, 0, 0, 0))
P
##           [,1]       [,2]       [,3] [,4]
## [1,] 0.9027998 0.20863525 0.06612994    0
## [2,] 0.5501463 0.75075613 0.74441485    0
## [3,] 0.2105642 0.08571193 0.40182803    0
## [4,] 0.0000000 0.00000000 0.00000000    0
M
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
N
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
# element-wise sum
M + N
##      [,1] [,2] [,3]
## [1,]    2    6   10
## [2,]    6   10   14
## [3,]   10   14   18
# element-wise product
M * N
##      [,1] [,2] [,3]
## [1,]    1    8   21
## [2,]    8   25   48
## [3,]   21   48   81
# matrix product
M %*% N
##      [,1] [,2] [,3]
## [1,]   14   32   50
## [2,]   32   77  122
## [3,]   50  122  194
# matrix transpose
M
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
t(M)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
# matrix inverse
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow=3, byrow=TRUE)
D = solve(C)
D
##      [,1] [,2] [,3]
## [1,]    1   -1    1
## [2,]   -1    1    0
## [3,]    0    1   -1
D %*% C
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
C %*% D
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
# linear system C x = b
C
##      [,1] [,2] [,3]
## [1,]    1    0    1
## [2,]    1    1    1
## [3,]    1    1    0
b = c(2, 1, 3)
# the system is:
# x1 + x3 = 2
# x1 + x2 + x3 = 1
# x1 + x2 = 3
x = solve(C, b)
x
## [1]  4 -1 -2
C %*% x
##      [,1]
## [1,]    2
## [2,]    1
## [3,]    3
# matrix spectrum
spectrum = eigen(C)
# columns are the eigenvectors
spectrum$vectors
##            [,1]       [,2]       [,3]
## [1,] -0.4151581 -0.4743098 -0.6026918
## [2,] -0.7480890 -0.2110877  0.7515444
## [3,] -0.5176936  0.8546767  0.2682231
# eigenvalues
spectrum$values
## [1]  2.2469796 -0.8019377  0.5549581
# check the first eigenpair: C x = lambda x
x = spectrum$vectors[, 1]
lambda = spectrum$values[1]
lambda * x
## [1] -0.9328517 -1.6809406 -1.1632470
C %*% x
##            [,1]
## [1,] -0.9328517
## [2,] -1.6809406
## [3,] -1.1632470
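The check above compares one eigenpair by eye. The sketch below checks all pairs at once: with the eigenvectors as the columns of V, the relation C V = V diag(λ) must hold. all.equal() is used so the comparison tolerates floating-point rounding.

```r
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow = 3, byrow = TRUE)
spectrum = eigen(C)
V = spectrum$vectors
# column j of C %*% V equals lambda_j times column j of V
isTRUE(all.equal(C %*% V, V %*% diag(spectrum$values)))   # TRUE
```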
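Check that the sum of the eigenvalues of C equals its trace (the sum of its diagonal elements), and compute their product, which is the determinant of C (use prod).
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow=3, byrow=TRUE)
(v = eigen(C)$values)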
## [1] 2.2469796 -0.8019377 0.5549581
sum(v)
## [1] 2
sum(diag(C))
## [1] 2
prod(v)
## [1] -1
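Base R also provides det(), which computes the determinant directly from an LU factorization rather than from the eigenvalues. The quick sketch below checks that it agrees with the product of the eigenvalues. It writes base::det explicitly in case det has been redefined in the session.

```r
C = matrix(c(1,0,1, 1,1,1, 1,1,0), nrow = 3, byrow = TRUE)
v = eigen(C)$values
# determinant computed by base R
base::det(C)
# agrees with the product of the eigenvalues, up to rounding
isTRUE(all.equal(prod(v), base::det(C)))
```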
A data frame is a list of vectors of equal length (its columns). A data frame is like a database table:
name = c("John", "Samuel", "Uma", "Bruce", "Tim")
age = c(23, 31, 17, 41, 25)
married = c(TRUE, FALSE, FALSE, TRUE, TRUE)
pulp = data.frame(name, age, married)
pulp
##     name age married
## 1   John  23    TRUE
## 2 Samuel  31   FALSE
## 3    Uma  17   FALSE
## 4  Bruce  41    TRUE
## 5    Tim  25    TRUE
# first row
pulp[1, ]
##   name age married
## 1 John  23    TRUE
# first column
# matrix style
pulp[, 1]
## [1] "John" "Samuel" "Uma" "Bruce" "Tim"
pulp[, "name"]
## [1] "John" "Samuel" "Uma" "Bruce" "Tim"
# list style (remember a data frame is a list)
pulp$name
## [1] "John" "Samuel" "Uma" "Bruce" "Tim"
pulp[[1]]
## [1] "John" "Samuel" "Uma" "Bruce" "Tim"
# filtering
pulp[pulp$name == "Uma", ]
##   name age married
## 3  Uma  17   FALSE
pulp[pulp$age < 18, ]
##   name age married
## 3  Uma  17   FALSE
pulp[pulp$married == TRUE, "name"]
## [1] "John" "Bruce" "Tim"
Extract from the pulp data frame the names of the adults who are not married.
pulp[pulp$married == FALSE & pulp$age >= 18, "name"]
## [1] "Samuel"
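A condition written with bare column names, such as married == FALSE & age >= 18, works inside pulp[...] only if vectors named married and age happen to exist in the workspace (as they do right after building pulp). The brackets do not look up column names. Two base R functions, with() and subset(), do evaluate column names inside the data frame, as sketched below.

```r
pulp = data.frame(
  name = c("John", "Samuel", "Uma", "Bruce", "Tim"),
  age = c(23, 31, 17, 41, 25),
  married = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)
# with() evaluates the expression using the columns of pulp
with(pulp, name[!married & age >= 18])
# subset() returns the matching rows as a data frame
subset(pulp, !married & age >= 18, select = name)
```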
Since a data frame is a list, and lists can contain other lists as elements, you can create nested data frames, that is, data frames with a column whose elements are themselves data frames.
# a data frame
Venus = data.frame(
  x = c(17, 19),
  y = c("Hello", "Venus")
)
# a data frame
Jupiter = data.frame(
  x = c(21, 23),
  y = c("Hello", "Jupiter")
)
# a nested data frame
# I() treats the object 'as is'
worlds = data.frame(
  x = I(list(Venus, Jupiter)),
  y = c("Hello", "Worlds")
)
str(worlds)
## 'data.frame': 2 obs. of 2 variables:
##  $ x:List of 2
##   ..$ :'data.frame': 2 obs. of 2 variables:
##   .. ..$ x: num 17 19
##   .. ..$ y: chr "Hello" "Venus"
##   ..$ :'data.frame': 2 obs. of 2 variables:
##   .. ..$ x: num 21 23
##   .. ..$ y: chr "Hello" "Jupiter"
##   ..- attr(*, "class")= chr "AsIs"
##  $ y: chr "Hello" "Worlds"
worlds$x[[1]]
##    x     y
## 1 17 Hello
## 2 19 Venus
worlds$x[[2]]
##    x       y
## 1 21   Hello
## 2 23 Jupiter
R is a Turing-complete (functional) programming language.
It includes conditional statements:
x = 49
if (x %% 7 == 0) x else -x
## [1] 49
And loops:
x = 108
i = 2
while (i <= x/2) {
  if (x %% i == 0) print(i)
  i = i + 1
}
## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54
for (i in 2:(x/2)) {
  if (x %% i == 0) print(i)
}
## [1] 2
## [1] 3
## [1] 4
## [1] 6
## [1] 9
## [1] 12
## [1] 18
## [1] 27
## [1] 36
## [1] 54
df = data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# we know the sequence and output lengths:
# create a vector of a given size
output = vector("double", ncol(df))
for (i in 1:ncol(df)) {
  output[i] = mean(df[[i]])
}
# we know the sequence length but we do NOT know the output length
means = c(0, 1, 2)
# slow solution: start from a vector of doubles of length 0
output = double()
for (i in 1:length(means)) {
  n = sample(1:100, 1)
  # dynamically grow the vector (slow)
  output = c(output, rnorm(n, means[i]))
}
# faster solution: create a list with length(means) elements
output = vector("list", length(means))
for (i in 1:length(means)) {
  n = sample(1:100, 1)
  output[[i]] = rnorm(n, means[i])
}
# unlist the list into a vector
output = unlist(output)
# we do not know the sequence length:
# flip a coin until difficulty heads in a row appear
flips = 0
nheads = 0
difficulty = 10
while (nheads < difficulty) {
  if (sample(c("T", "H"), 1) == "H") {
    nheads = nheads + 1
  } else {
    nheads = 0
  }
  flips = flips + 1
}
# random: the number of flips varies between runs
flips
## [1] 2837
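Most of the time you can perform your task by applying vectorized functions instead of writing explicit loops. This is typically shorter and faster, since functions such as sum do the iteration in compiled code.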
x = 1:100
# compute the sum (bad)
s = 0
for (i in 1:length(x)) {
  s = s + x[i]
}
s
## [1] 5050
# compute the sum (good)
sum(x)
## [1] 5050
# even faster: the closed form n(n+1)/2, valid because x is 1:n
n = length(x)
n * (n+1) / 2
## [1] 5050
You may use built-in functions:
log
## function (x, base = exp(1)) .Primitive("log")
args(log)
## function (x, base = exp(1))
## NULL
log(x = 128, base = 2)
## [1] 7
log(base = 2, x = 128)
## [1] 7
log(128, 2)
## [1] 7
log(2, 128)
## [1] 0.1428571
log(128)
## [1] 4.85203
Or define your own functions:
euclidean = function(x = 0, y = 0) {
  sqrt(x^2 + y^2)
}
euclidean(1, 1)
## [1] 1.414214
euclidean(1)
## [1] 1
euclidean()
## [1] 0
Or define your own binary operators using functions:
# xor as a user-defined binary operator
'%()%' = function(x, y) {
  (x | y) & !(x & y)
}
TRUE %()% TRUE
## [1] FALSE
TRUE %()% FALSE
## [1] TRUE
FALSE %()% TRUE
## [1] TRUE
FALSE %()% FALSE
## [1] FALSE
Functions may be recursive:
# note: this definition masks base R's factorial()
factorial = function(x) {
  if (x == 0) 1 else x * factorial(x - 1)
}
factorial(5)
## [1] 120
You may write functionals, that is, functions that take other functions as arguments:
# compute f(1) + f(2) + ... + f(n)
g = function(f, n) {
  total = 0
  for (i in 1:n) total = total + f(i)
  return(total)
}
g(factorial, 5)
## [1] 153
The apply family of functions is an application of functionals to iteration:
# random data: your values will differ
df = data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# apply mean to each column of the data frame, returns a list
lapply(df, mean)
## $a
## [1] -0.03719727
##
## $b
## [1] -0.1672439
##
## $c
## [1] -0.3510924
##
## $d
## [1] 0.388815
# apply mean to each column of the data frame, returns an atomic vector
sapply(df, mean)
##           a           b           c           d
## -0.03719727 -0.16724386 -0.35109241  0.38881500
# apply a function to each element of a vector
sapply(1:100, function(x) {x^2})
## [1] 1 4 9 16 25 36 49 64 81 100 121 144
## [13] 169 196 225 256 289 324 361 400 441 484 529 576
## [25] 625 676 729 784 841 900 961 1024 1089 1156 1225 1296
## [37] 1369 1444 1521 1600 1681 1764 1849 1936 2025 2116 2209 2304
## [49] 2401 2500 2601 2704 2809 2916 3025 3136 3249 3364 3481 3600
## [61] 3721 3844 3969 4096 4225 4356 4489 4624 4761 4900 5041 5184
## [73] 5329 5476 5625 5776 5929 6084 6241 6400 6561 6724 6889 7056
## [85] 7225 7396 7569 7744 7921 8100 8281 8464 8649 8836 9025 9216
## [97] 9409 9604 9801 10000
mtx <- cbind(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
# apply mean to each column (MARGIN = 2) of the matrix, returns an atomic vector
apply(mtx, 2, mean)
##           a           b           c           d
##  0.06472701 -0.38940661  0.45435349 -0.36427525
# apply mean to each row (MARGIN = 1) of the matrix, returns an atomic vector
apply(mtx, 1, mean)
## [1] 0.36985815 0.53755424 0.06308799 -0.81283658 -0.12011102 0.41473053
## [7] 0.89464644 -1.02795479 -0.36259165 -0.54288674
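A related functional is vapply(). It works like sapply() but takes a FUN.VALUE template describing what each call must return (here a single double), and it stops with an error otherwise, which makes the result type predictable. A sketch on a fresh random data frame; the error string in the handler is illustrative, not R's own message.

```r
df = data.frame(a = rnorm(10), b = rnorm(10))
# one double per column, named after the columns
vapply(df, mean, FUN.VALUE = numeric(1))
# range() returns two values per column, violating the template,
# so vapply() signals an error (caught here to show it)
res = tryCatch(
  vapply(df, range, FUN.VALUE = numeric(1)),
  error = function(e) "vapply() rejected the result"
)
res
```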
Write a function that, given a square matrix \(A\) and an integer \(n \geq 0\), computes the power \(A^n\) (use the diag function to build the identity matrix).
power = function(A, n) {
  k = nrow(A)
  # identity matrix of size k
  I = diag(k)
  if (n == 0) return(I)
  if (n == 1) return(A)
  B = A
  for (i in 2:n) {
    B = B %*% A
  }
  return(B)
}
A = matrix(c(1,2,0, 0,3,0, 0,5,1), nrow=3, byrow=TRUE)
power(A, 5)
##      [,1] [,2] [,3]
## [1,]    1  242    0
## [2,]    0  243    0
## [3,]    0  605    1
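The loop above performs n - 1 matrix products. A common alternative is exponentiation by squaring, which needs only on the order of log2(n) products. It repeatedly squares A and multiplies into the result the squares that correspond to the 1 bits of n. A sketch; the name power_fast is hypothetical.

```r
# hypothetical helper: A^n by repeated squaring
power_fast = function(A, n) {
  result = diag(nrow(A))   # A^0 is the identity
  square = A               # holds A^(2^k) at step k
  while (n > 0) {
    # if the lowest bit of n is 1, multiply this power of A in
    if (n %% 2 == 1) result = result %*% square
    square = square %*% square
    n = n %/% 2              # drop the lowest bit
  }
  result
}
A = matrix(c(1,2,0, 0,3,0, 0,5,1), nrow=3, byrow=TRUE)
power_fast(A, 5)
```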
The determinant of a square matrix is the product of the eigenvalues of the matrix.
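Write a function det that computes the determinant of a square matrix from its eigenvalues, and a function detn that computes the determinant of \(A^n\), using \(\det(A^n) = \det(A)^n\) (use prod).
# note: this definition masks base R's det()
det = function(A) {
  v = eigen(A)$values
  return(prod(v))
}
detn = function(A, n) {
  v = eigen(A)$values
  return(prod(v)^n)
}
(A = matrix(c(1,2,0, 0,3,0, 0,5,1), nrow=3, byrow=TRUE))
##      [,1] [,2] [,3]
## [1,]    1    2    0
## [2,]    0    3    0
## [3,]    0    5    1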
det(A)
## [1] 3
detn(A, 5)
## [1] 243
# a data matrix
M = matrix(c(
  c(1200, 1190, 1100, 1120, 890),
  c(6200, 6690, 6700, 7120, 7150),
  c(8900, 8790, 8760, 8800, 9010),
  c(3300, 3490, 3660, 4300, 4510),
  c(2190, 2000, 1890, 1740, 1500)),
  ncol = 5
)
# give names to rows
rownames(M) = 2014:2018
# give names to columns
colnames(M) = LETTERS[1:5]
M
##         A    B    C    D    E
## 2014 1200 6200 8900 3300 2190
## 2015 1190 6690 8790 3490 2000
## 2016 1100 6700 8760 3660 1890
## 2017 1120 7120 8800 4300 1740
## 2018  890 7150 9010 4510 1500
# barplot of the first row (2014)
barplot(M[1, ])
# stacked barplot
barplot(M, legend.text = TRUE)
# juxtaposed barplot
barplot(M, beside = TRUE, legend.text = TRUE)
# histogram
x = rnorm(1000)
hist(x, probability = TRUE, main = "Histogram of a normal sample")
# add a rug: a tick mark for each observation
rug(x)
# density plot
plot(density(x), main = "Density of a normal sample")
rug(x)
# boxplot
# If range is positive, the whiskers extend to the most extreme
# data point which is no more than range times the interquartile
# range from the box. A value of zero causes the whiskers to extend
# to the data extremes.
x = rnorm(1000)
boxplot(x, range = 1.5)
boxplot(x, range = 0)
# scatter plot
x = rnorm(100)
y = rnorm(100)
plot(x, y)
y = x + runif(100)
plot(x, y)
R Markdown files are designed to be used in three ways: