R as a programming language

> "Hello World!"
[1] "Hello World!"

> 1 + 1
[1] 2

> (1 + (2 * 4) - 3) / 2
[1] 3

> # integer division
> 31 %/% 3
[1] 10

> # modulus
> 31 %% 3
[1] 1

> # exponents
> 2^10
[1] 1024

> # comparison
> 1 == 1
[1] TRUE

> 1 != 1
[1] FALSE

> 1 < 1
[1] FALSE

> 1 <= 1
[1] TRUE

> # logical comparison

> FALSE & TRUE
[1] FALSE

> FALSE && TRUE
[1] FALSE

> TRUE | FALSE
[1] TRUE

> TRUE || FALSE
[1] TRUE

> !TRUE
[1] FALSE

> xor(TRUE, FALSE)
[1] TRUE

> xor(TRUE, TRUE)
[1] FALSE

Conjunction and disjunction each have a shorter and a longer form. The shorter form (& and |) operates elementwise on vectors, in much the same way as arithmetic operators. The longer form (&& and ||) works on single values: it evaluates left to right, and evaluation proceeds only until the result is determined.
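The difference between the two forms can be sketched as follows. The function c, which builds a vector, and the function stop, which raises an error, are base R functions; the variable names are only illustrative.

```r
# Short forms work elementwise on whole vectors.
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, TRUE)  # TRUE FALSE TRUE

# Long forms short-circuit: the right operand is never evaluated here,
# so the call to stop() does not raise an error.
and.result = FALSE && stop("not reached")
or.result = TRUE || stop("not reached")
and.result  # FALSE
or.result   # TRUE
```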

There are a few special values. The value NA (not available) represents a missing value. It should not be confused with the value NULL, which is the null object. The value Inf stands for positive infinity:

> 2^1024
[1] Inf

> 1 / 0
[1] Inf

The value NaN (not a number) is the result of a computation that makes no sense:

> 0 / 0
[1] NaN

> Inf - Inf
[1] NaN
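A short sketch may help tell the special values apart. It uses only base R functions (is.na, is.null, is.nan, is.finite, sum, length); the vector v is an illustrative name.

```r
v = c(1, NA, 3)
is.na(v)               # FALSE TRUE FALSE
sum(v)                 # NA: missing values propagate
sum(v, na.rm = TRUE)   # 4

length(NA)             # 1: NA is a value that occupies a slot
length(NULL)           # 0: NULL is the empty object
is.null(NULL)          # TRUE
length(c(1, NULL, 3))  # 2: NULL vanishes when combined

is.nan(0 / 0)          # TRUE
is.na(0 / 0)           # TRUE: NaN also counts as missing
is.finite(1 / 0)       # FALSE
```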

Of course, you may use variables to store values. There are 3 equivalent ways to assign a value to a variable (we will use the first one):

> x = 1
> x <- 1
> 1 -> x

To print the value of a variable, just type it:

> x
[1] 1

In R, any number is in fact a vector of length 1. The [1] means that the index of the first item displayed in the row is 1. Vector indexes start at 1 (not 0). Construct longer vectors with the c (combine) function:

> c(0, 1, 1, 2, 3, 5, 8)
[1] 0 1 1 2 3 5 8

or using the : operator:

> 0:99
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
 [51] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
 [76] 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
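Besides c and the : operator, base R offers the functions seq and rep for building regular vectors. They are not covered in the text above, so take the following as a small supplementary sketch with illustrative variable names:

```r
evens = seq(0, 10, by = 2)             # fixed step
evens                                  # 0 2 4 6 8 10
quarters = seq(0, 1, length.out = 5)   # fixed number of points
quarters                               # 0.00 0.25 0.50 0.75 1.00
repeated = rep(c(1, 2), times = 3)     # repeat a whole vector
repeated                               # 1 2 1 2 1 2
countdown = 10:1                       # the : operator also counts down
countdown                              # 10 9 8 7 6 5 4 3 2 1
```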

Operations on two vectors are performed element by element:

> c(1, 2, 3, 4) + c(10, 20, 30, 40)
[1] 11 22 33 44

> c(1, 2, 3, 4) * c(10, 20, 30, 40)
[1]  10  40  90 160

If the two vectors have different lengths, the shorter one is repeated (recycled) until it matches the length of the longer one:

> c(1, 2, 3, 4) + 10
[1] 11 12 13 14

> c(1, 2, 3, 4) + c(10, 20)
[1] 11 22 13 24
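When the longer length is not a multiple of the shorter one, the operation still goes through but R issues a warning. The sketch below captures that warning with the base functions tryCatch and conditionMessage; the name result is illustrative.

```r
# Lengths 3 and 2: the shorter vector is reused, but not a whole number of times.
result = tryCatch(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) conditionMessage(w)
)
result  # the text of the warning, since tryCatch intercepted it

# Suppressing the warning shows the sum itself.
partial = suppressWarnings(c(1, 2, 3) + c(10, 20))
partial  # 11 22 13
```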

Character vectors are vectors of strings:

> c("This", "class", "is", "really", "terrific!")
[1] "This"      "class"     "is"        "really"    "terrific!"

You may refer to members of a vector in several ways:

a = 1:10 * 2
> a
 [1]  2  4  6  8 10 12 14 16 18 20

> a[5]
[1] 10

> a[c(1, 5, 10)]
[1]  2 10 20

> a[a > 10]
[1] 12 14 16 18 20

Notice that in the last example we used a vector of Booleans:

> a > 10
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
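A few more indexing idioms, sketched with base R only (negative indexes, the function which, and assignment through an index):

```r
a = 1:10 * 2
a[-1]               # all but the first item: 4 6 8 10 12 14 16 18 20
a[-(1:3)]           # drop the first three: 8 10 12 14 16 18 20
which(a > 10)       # positions of the TRUE entries: 6 7 8 9 10
a[a > 10 & a < 18]  # combine conditions elementwise: 12 14 16
a[2] = 0            # assign through an index
a[1:3]              # 2 0 6
```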

An array is a multi-dimensional vector:

a = array(1:12, dim=c(3, 4))

# or, equivalently:
a = 1:12
dim(a) = c(3, 4)

> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

> a[2,3]
[1] 8

> a[c(1,3), 2:4]
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    6    9   12

> a[1,]
[1]  1  4  7 10

> a[,1]
[1] 1 2 3

a = array(1:18, dim=c(3, 3, 2))

> a[, , 1]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> a[, , 2]
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

A matrix is a 2-dimensional array:

m = matrix(data = 1:12, nrow=3, ncol=4)

# or, equivalently
m = matrix(data = 1:12, nrow=3)

> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

m = matrix(data = 1:12, nrow=3, byrow=TRUE)

> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
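Arithmetic operators act on matrices element by element, just as they do on vectors, while the base operator %*% computes the matrix product. A minimal sketch on a small 2 x 2 matrix (the variable names are illustrative):

```r
m = matrix(1:4, nrow = 2)  # columns (1, 2) and (3, 4)
dim(m)                     # 2 2
t(m)                       # transpose: columns (1, 3) and (2, 4)
elementwise = m * m        # columns (1, 4) and (9, 16)
product = m %*% m          # columns (7, 10) and (15, 22)
```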

A list is a data type that allows mixing data of different types:

l = list(thing="hat", size=8.25)

> l
$thing
[1] "hat"

$size
[1] 8.25

> l$thing
[1] "hat"

> l[["thing"]]
[1] "hat"

> l[[1]]
[1] "hat"

Note that l[1] is a sub-list containing only the first component of list l:

> l[1]
$thing
[1] "hat"

> l[1]$thing
[1] "hat"

> l[1][[1]]
[1] "hat"
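The distinction between single and double brackets, and the way components are added and removed, can be sketched like this (the component name color is made up for the example):

```r
l = list(thing = "hat", size = 8.25)
class(l[1])       # "list": single brackets keep the list wrapper
class(l[[1]])     # "character": double brackets extract the component
l$color = "red"   # add a component by assigning to a new name
names(l)          # "thing" "size" "color"
l$size = NULL     # assigning NULL removes a component
length(l)         # 2
```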

A data frame is a list of named vectors of the same length. A data frame is like a database table.

team = c("Inter", "Milan", "Roma", "Palermo")
score = c(59, 58, 53, 46)
win = c(17, 17, 15, 13)
tie = c(8, 7, 8, 7)
lost = c(3, 4, 5, 8)

league = data.frame(team, score, win, tie, lost, stringsAsFactors = TRUE)
> league
     team score win tie lost
1   Inter    59  17   8    3
2   Milan    58  17   7    4
3    Roma    53  15   8    5
4 Palermo    46  13   7    8

> league[1,]
   team score win tie lost
1 Inter    59  17   8    3

> league[,2]
[1] 59 58 53 46

> league[,"score"]
[1] 59 58 53 46

> league[1:3, c("team", "score")]
   team score
1 Inter    59
2 Milan    58
3  Roma    53

> league$score
[1] 59 58 53 46

> league$score == max(league$score)
[1]  TRUE FALSE FALSE FALSE

> league[league$score == max(league$score), ]
   team score win tie lost
1 Inter    59  17   8    3

> league$team
[1] Inter   Milan   Roma    Palermo
Levels: Inter Milan Palermo Roma

> as.vector(league$team)
[1] "Inter"   "Milan"   "Roma"    "Palermo"

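Notice that league$team is printed here as a factor. This is what data.frame produces when strings are converted to factors, which was the default before R 4.0.0; in R 4.0.0 and later, pass stringsAsFactors = TRUE to data.frame to get the same result. A factor is a collection of items with a small set of repeated values, called levels. Factors are implemented efficiently by mapping levels to integers. They are typically used to store categorical data. For instance: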

poll.results = factor(c("Berlusconi", "Berlusconi", "Bersani", "Casini", "Bersani"))

> poll.results
[1] Berlusconi Berlusconi Bersani    Casini     Bersani   
Levels: Berlusconi Bersani Casini

> levels(poll.results)
[1] "Berlusconi" "Bersani"    "Casini" 
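To see the integer mapping behind a factor, and to count how often each level occurs, something along these lines should work. The functions as.integer, nlevels and table are base R; the name counts is illustrative.

```r
poll.results = factor(c("Berlusconi", "Berlusconi", "Bersani", "Casini", "Bersani"))
codes = as.integer(poll.results)  # the integer codes behind the levels
codes                             # 1 1 2 3 2
nlevels(poll.results)             # 3
counts = table(poll.results)      # frequency of each level
counts[["Berlusconi"]]            # 2
counts[["Casini"]]                # 1
```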

R is in fact both an object-oriented and a functional programming language. Conditional statements take the form:

x = 49
> if (x %% 7 == 0) x else -x
[1] 49
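The if statement tests a single condition. To apply the same choice to every element of a vector, base R provides the function ifelse. A small sketch with illustrative names:

```r
x = c(14, 15, 21)
signed = ifelse(x %% 7 == 0, x, -x)  # keep multiples of 7, negate the rest
signed                               # 14 -15 21
```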

Looping constructs include while and for:

x = 99
i = 2
while (i < x) {
  if (x %% i == 0) print(i)
  i = i + 1
}

[1] 3
[1] 9
[1] 11
[1] 33

x = 99
for (i in 2:(x-1)) {
  if (x %% i == 0) print(i)
}

[1] 3
[1] 9
[1] 11
[1] 33
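Because arithmetic and comparison operators are elementwise, the same divisors can be found without an explicit loop. This is one possible sketch, not the only way to write it; the names candidates and divisors are illustrative.

```r
x = 99
candidates = 2:(x - 1)
# x %% candidates computes all remainders at once;
# the logical vector then selects the exact divisors.
divisors = candidates[x %% candidates == 0]
divisors  # 3 9 11 33
```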

You may use built-in functions:

> log(128, 2)
[1] 7

> args(log)
function (x, base = exp(1)) 
NULL

> log(x=128, base=2)
[1] 7

> log(base=2, x=128)
[1] 7

> log(exp(1)^2)
[1] 2

Or define your own functions:

f = function(x=0, y=0) {sqrt(x^2 + y^2)}
> f
function(x=0, y=0) {sqrt(x^2 + y^2)}

> args(f)
function (x = 0, y = 0) 
NULL

> f(1, 1)
[1] 1.414214

> f(1)
[1] 1

> f()
[1] 0

Functions may be recursive:

factorial = function(x) {
 if (x == 0) 1 else x * factorial(x-1)
}
 
> factorial(5)
[1] 120

You may use functions as arguments to other functions:

g = function(f, n) {
  total = 0
  for (i in 0:n) total = total + f(i)
  return(total)
}
 
> g(factorial, 5)
[1] 154
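Base R already ships functions that take other functions as arguments. The sketch below uses sapply, which applies a function to each element of a vector, together with an anonymous function; base::factorial is the built-in factorial, used here so the example does not depend on the definition above.

```r
squares = sapply(1:5, function(i) i^2)  # anonymous function as argument
squares                                 # 1 4 9 16 25

total = sum(sapply(0:5, base::factorial))
total                                   # 154, the same sum of factorials from 0 to 5
```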

You may define your own binary operators using functions:

'%my%' = function(x, y) {2 * x + 2 * y}
> 1 %my% 5
[1] 12
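Any name of the form %...% can be used in this way. As a further illustration, here is a hypothetical string concatenation operator built on the base function paste0 (the name %+% is made up for this sketch and is not part of base R):

```r
# Backticks let us assign to a name that is not a regular identifier.
`%+%` = function(a, b) paste0(a, b)

greeting = "Hello" %+% " " %+% "World!"  # applied left to right
greeting                                 # "Hello World!"
```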

You can store a script of commands in a possibly remote file and evaluate the script using the source command:

source("script.R")

R comes with a number of packages, some of which are loaded by default (like package base). To see all available packages run:

(.packages(all.available=TRUE))

To see loaded packages:

(.packages())

To load an available package:

library(stats4)

Packages not available within the R installation can be found on repositories like CRAN and Bioconductor. They can be downloaded and installed from the R console (remember to load them after installation if you want to use them):

install.packages("igraph")

or from the command line:

R CMD INSTALL igraph_0.5.3.tar.gz

To remove a package:

remove.packages("igraph")
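Before loading a package it can be useful to check whether it is installed. One way to do this, sketched here with the base function requireNamespace, is shown below. The package stats4 is used because it normally ships with R; the variable name has.stats4 is illustrative.

```r
# TRUE if the package can be found, FALSE otherwise (no error is raised).
has.stats4 = requireNamespace("stats4", quietly = TRUE)
has.stats4

# Guard pattern: attach the package only when it is actually installed.
if (has.stats4) {
  library(stats4)
}
attached = "stats4" %in% .packages()  # is it on the list of loaded packages?
attached
```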

Getting help:

?log
?'+'
??"regression"

Clear the screen with the key combination CTRL+L.

Quit:

q()

The workspace is saved in the files .RData (environment) and .Rhistory (command history). The global environment is the default workspace. Use the function exists to check whether a name exists in the environment, the function objects to list all names, and the function remove to remove an object.
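The three functions can be tried out as follows. The variable name tmp.value is made up for this sketch.

```r
tmp.value = 42                        # create an object in the current environment
exists("tmp.value")                   # TRUE
listed = "tmp.value" %in% objects()   # objects() returns the names as strings
listed                                # TRUE
remove(tmp.value)
exists("tmp.value")                   # FALSE
```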