Elo method was coined by the physics professor and excellent chess player Arpad Elo. In 1970, FIDE, the World Chess Federation, agreed to adopt the Elo Rating System.
The method works as follows. Suppose that players \(i\) and \(j\) match. Let \(s_{i,j}\) be the actual score of \(i\) in the match against \(j\). We have that:
Notice that the actual score \(s_{j,i}\) of \(j\) in the match against \(i\) is \(1 - s_{i,j}\). Let \(\mu_{i,j}\) be the expected score of \(i\) in the match against \(j\). We have that:
\[ \begin{array}{lll} \mu_{i,j} & = & \frac{1}{1 + 10^{-(r_i - r_j) / \zeta}} = \frac{10^{r_i / \zeta}}{10^{r_i / \zeta} + 10^{r_j / \zeta}} \\\\ \end{array} \]
with \(r_i\) and \(r_j\) the ratings of \(i\) and \(j\) before the match and \(\zeta\) is a constant. Notice that the expected score \(\mu_{j,i}\) of \(j\) in the match against \(i\) is \(1 - \mu_{i,j}\).
We assume that initially all player ratings are equal to 0. When players \(i\) and \(j\) match, the new ratings \(r_i\) of \(i\) and \(r_j\) of \(j\) are modified using the following update rule:
\[ \begin{array}{lll} r_{i} & \leftarrow & r_i + \kappa (s_{i,j} - \mu_{i,j}) \\ r_j & \leftarrow & r_j + \kappa (s_{j,i} - \mu_{j,i}) \end{array} \]
where \(\kappa\) is a constant.
The Elo thesis is:
If a player performs as expected, it gains nothing. If it performs better than expected, it is rewarded, while if it performs poorer than expected, it is penalized.
According to the movie The social network by David Fincher, it appears that the Elo’s method formed the basis for rating people on Zuckerberg’s Web site Facemash, which was the predecessor of Facebook. This challenge is ispired by Chess ratings - Elo versus the Rest of the World Kaggle competition.
library(tidyverse)
# put games into a data frame
games = read_csv("data.csv")
group_by(games, Score) %>%
summarize(n = n(), pn = n / nrow(games))
## # A tibble: 3 × 3
## Score n pn
## <dbl> <int> <dbl>
## 1 0 15224 0.234
## 2 0.5 28666 0.441
## 3 1 21163 0.325
# excluding draws
games2 = filter(games, Score != 0.5)
group_by(games2, Score) %>%
summarize(n = n(), pn = n / nrow(games2))
## # A tibble: 2 × 3
## Score n pn
## <dbl> <int> <dbl>
## 1 0 15224 0.418
## 2 1 21163 0.582
# players are identified by integer numbers from 1.
# Some numbers are missing since the corresponding player was not sampled.
players = sort(unique(c(games$White, games$Black)))
ratingWhite = group_by(games, White) %>%
summarise(matchesWhite = n(), pointsWhite = sum(Score))
ratingBlack = group_by(games, Black) %>%
summarise(matchesBlack = n(), pointsBlack = sum(1-Score))
rating =
tibble(player = players) %>%
left_join(ratingWhite, join_by(player == White)) %>%
left_join(ratingBlack, join_by(player == Black)) %>%
mutate(pointsWhite = ifelse(is.na(pointsWhite), 0, pointsWhite),
pointsBlack = ifelse(is.na(pointsBlack), 0, pointsBlack),
matchesWhite = ifelse(is.na(matchesWhite), 0, matchesWhite),
matchesBlack = ifelse(is.na(matchesBlack), 0, matchesBlack)) %>%
mutate(points = pointsWhite + pointsBlack, matches = matchesWhite + matchesBlack)
ggplot(rating) +
geom_histogram(aes(x = points))
summary(rating$points)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 3.00 8.91 9.00 167.50
After a match between \(i\) and \(j\) the overall increase of rating in the system is:
\[ \kappa (s_{i,j} - \mu_{i,j}) + \kappa (s_{j,i} - \mu_{j,i}) = \kappa (s_{i,j} + s_{j,i}) - \kappa (\mu_{i,j} + \mu_{j,i}) = \kappa - \kappa = 0 \]
## Elo rating system
# INPUT
# games: a game *matrix* with columns White, Black and Score
# Players are integer numbers starting at 1
# The matrix is sorted in chronological order of the matches
# z: logistic parameter
# k: update factor
# OUTPUT
# r: rating vector
elo = function(games, z = 400, k = 25) {
# number of players
# (players are integer numbers starting at 1)
n = max(c(games[, "White"], games[, "Black"]))
# number of games
m = nrow(games)
# rating vector
r = rep(0, n)
# iterate through games
for (i in 1:m) {
score = games[i, "Score"]
white = games[i, "White"]
black = games[i, "Black"]
# compute update
spread = r[white] - r[black]
mu = 1 / (1 + 10^(-spread / z))
update = k * (score - mu)
# update ratings
r[white] = r[white] + update
r[black] = r[black] - update
}
return(r)
}
elo_gpt <- function(games, z = 400, k = 25) {
# Identify the unique players and initialize their ratings to 1500
players <- unique(c(games[, "White"], games[, "Black"]))
n_players <- max(players)
r <- rep(0, n_players) # Rating vector initialized to 1500 for each player
# Loop through each game in chronological order
for (i in 1:nrow(games)) {
# Extract game details
white <- games[i, "White"]
black <- games[i, "Black"]
score <- games[i, "Score"]
# Calculate the expected scores for White and Black using the logistic formula
expected_white <- 1 / (1 + 10^((r[black] - r[white]) / z))
expected_black <- 1 - expected_white
# Update the ratings based on the actual game outcome
r[white] <- r[white] + k * (score - expected_white)
r[black] <- r[black] + k * ((1 - score) - expected_black)
}
# Return the final ratings vector
return(r)
}
games_matrix = as.matrix(games)
r1 = elo(games_matrix)
r2 = elo_gpt(games_matrix)
r1[1:10]
## [1] 101.770151 -1.810541 -20.047513 -10.191793 -43.468542 12.567685
## [7] -12.714334 -13.013391 -21.115185 -86.026878
r2[1:10]
## [1] 101.770151 -1.810541 -20.047513 -10.191793 -43.468542 12.567685
## [7] -12.714334 -13.013391 -21.115185 -86.026878
# check residuals
sum(abs(r1 - r2))
## [1] 9.669154e-12
# compare performance
library(microbenchmark)
microbenchmark(elo(games_matrix), elo_gpt(games_matrix), times = 10)
## Unit: milliseconds
## expr min lq mean median uq max
## elo(games_matrix) 224.8306 225.4506 234.8613 227.0837 228.7867 294.3227
## elo_gpt(games_matrix) 256.7985 259.3296 262.7149 260.4569 261.7662 279.3780
## neval
## 10
## 10
eloVector = elo(games_matrix)
eloRating = tibble(player = 1:length(eloVector), elo = eloVector)
rating = left_join(rating, eloRating)
# check sum is 0
sum(rating$elo)
## [1] 9.298673e-13
summary(rating$elo)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -146.456 -22.873 -6.005 0.000 12.476 326.950
ggplot(rating) +
geom_histogram(aes(x = elo))
ggplot(rating, aes(x = matches, y = points)) +
geom_point(alpha = 0.2) +
geom_smooth(se=FALSE) +
theme_bw()
cor(rating$matches, rating$points)
## [1] 0.9900841
ggplot(rating, aes(x = matches, y = elo)) +
geom_point(alpha = 0.2) +
geom_smooth(se=FALSE) +
theme_bw()
ggplot(filter(rating, matches > 20, matches < 200), aes(x = matches, y = elo)) +
geom_point(alpha = 0.2) +
geom_smooth(se=FALSE) +
theme_bw()
intermediate = filter(rating, matches > 20, matches < 200)
cor.test(intermediate$matches, intermediate$elo)
##
## Pearson's product-moment correlation
##
## data: intermediate$matches and intermediate$elo
## t = 35.782, df = 1726, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6246670 0.6788635
## sample estimates:
## cor
## 0.6525992