The Elo method was devised by Arpad Elo, a physics professor and excellent chess player. In 1970, FIDE, the World Chess Federation, agreed to adopt the Elo Rating System.

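The method works as follows. Suppose that players \(i\) and \(j\) match. Let \(s_{i,j}\) be the actual score of \(i\) in the match against \(j\). We have that:

\[ s_{i,j} = \begin{cases} 1 & \text{if } i \text{ wins} \\ 1/2 & \text{if the match is a draw} \\ 0 & \text{if } i \text{ loses} \end{cases} \]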

Notice that the actual score \(s_{j,i}\) of \(j\) in the match against \(i\) is \(1 - s_{i,j}\). Let \(\mu_{i,j}\) be the expected score of \(i\) in the match against \(j\). We have that:

\[ \begin{array}{lll} \mu_{i,j} & = & \frac{1}{1 + 10^{-(r_i - r_j) / \zeta}} = \frac{10^{r_i / \zeta}}{10^{r_i / \zeta} + 10^{r_j / \zeta}} \end{array} \]

where \(r_i\) and \(r_j\) are the ratings of \(i\) and \(j\) before the match and \(\zeta\) is a constant. Notice that the expected score \(\mu_{j,i}\) of \(j\) in the match against \(i\) is \(1 - \mu_{i,j}\).
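To get a feel for the formula, here is a minimal sketch of the expected score in base R. The function name `expected_score` and the default \(\zeta = 400\) are illustrative choices (the value 400 is the one suggested in the challenges), not part of the original analysis.

```r
# Expected score of player i against player j (logistic curve in base 10).
# `expected_score` is an illustrative name, not part of the original analysis.
expected_score <- function(r_i, r_j, zeta = 400) {
  1 / (1 + 10^(-(r_i - r_j) / zeta))
}

expected_score(0, 0)     # equal ratings: 0.5
expected_score(400, 0)   # 400 points ahead: 10/11, about 0.909

# The two expected scores of a match sum to 1.
expected_score(100, 250) + expected_score(250, 100)
```

With \(\zeta = 400\), a rating advantage of 400 points corresponds to odds of 10 to 1 in favour of the stronger player.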

We assume that initially all player ratings are equal to 0. When players \(i\) and \(j\) match, their ratings \(r_i\) and \(r_j\) are updated using the following rule:

\[ \begin{array}{lll} r_{i} & \leftarrow & r_i + \kappa (s_{i,j} - \mu_{i,j}) \\ r_j & \leftarrow & r_j + \kappa (s_{j,i} - \mu_{j,i}) \end{array} \]

where \(\kappa\) is a constant.
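The update rule can be sketched for a single match as follows. The helper `elo_update` is a hypothetical name introduced for illustration; the defaults \(\zeta = 400\) and \(\kappa = 25\) are the values suggested in the challenges.

```r
# One Elo update for a single match between i and j.
# s_ij is the actual score of i (1 win, 0.5 draw, 0 loss).
# `elo_update` is a hypothetical helper, not part of the original analysis.
elo_update <- function(r_i, r_j, s_ij, zeta = 400, kappa = 25) {
  mu_ij <- 1 / (1 + 10^(-(r_i - r_j) / zeta))  # expected score of i
  mu_ji <- 1 - mu_ij                           # expected score of j
  s_ji  <- 1 - s_ij                            # actual score of j
  c(i = r_i + kappa * (s_ij - mu_ij),
    j = r_j + kappa * (s_ji - mu_ji))
}

# Two unrated players, i wins: i gains 12.5 and j loses 12.5.
elo_update(0, 0, 1)

# A favourite rated 400 points higher gains little by winning (about 2.27)...
elo_update(400, 0, 1)
# ...and loses a lot by losing (about 22.73).
elo_update(400, 0, 0)
```

In each case the gain of one player equals the loss of the other, so the sum of the two ratings is unchanged by the match.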

The Elo thesis is:

If a player performs as expected, it gains nothing. If it performs better than expected, it is rewarded, while if it performs worse than expected, it is penalized.

According to David Fincher's movie The Social Network, Elo's method formed the basis for rating people on Mark Zuckerberg's website Facemash, which was the predecessor of Facebook.

This challenge is inspired by the Kaggle competition Chess ratings - Elo versus the Rest of the World.

Challenges

  1. An interesting property of Elo’s ratings is that the sum of all player ratings is always 0. Formally show this property. (Hint: use the fact that \(s_{i,j} + s_{j,i} = 1\) and \(\mu_{i,j} + \mu_{j,i} = 1\))
  2. Does White have an advantage over Black, that is, is there a first-mover advantage?
  3. Visualize the number of matches by month.
  4. Compute the player point rating and observe its distribution.
  5. Compute the player Elo rating and observe its distribution (set \(\zeta = 400\) and \(\kappa = 25\)). Verify the 0-sum property for Elo’s ratings.
  6. Are point and Elo ratings correlated? Are top Elo players overlapping with top point players?
  7. Test if the number of played games has an effect on the Elo player rating. Why?
# Analysis
library(tidyverse)
library(modelr)
games = read_csv("training_data.csv")

The sum of ratings is always 0

After a match between \(i\) and \(j\) the overall increase of rating in the system is:

\[ \kappa (s_{i,j} - \mu_{i,j}) + \kappa (s_{j,i} - \mu_{j,i}) = \kappa (s_{i,j} + s_{j,i}) - \kappa (\mu_{i,j} + \mu_{j,i}) = \kappa - \kappa = 0 \]

Does White have an advantage over Black?

ngames = nrow(games)
group_by(games, Score) %>% 
  summarize(n = n(), pn = n / ngames)
## # A tibble: 3 x 3
##   Score     n    pn
##   <dbl> <int> <dbl>
## 1   0   15224 0.234
## 2   0.5 28666 0.441
## 3   1   21163 0.325
# excluding draws
games2 = filter(games, Score != 0.5) 
ngames2 = nrow(games2)
group_by(games2, Score) %>% 
  summarize(n = n(), pn = n / ngames2)
## # A tibble: 2 x 3
##   Score     n    pn
##   <dbl> <int> <dbl>
## 1     0 15224 0.418
## 2     1 21163 0.582
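
The proportions above suggest an advantage for White. One way to check that the gap is not due to chance is an exact binomial test on the decisive games, sketched below with the counts copied from the tables. The test assumes that games are independent, which is only an approximation here because the same players appear in many games.

```r
# Counts taken from the tables above (Score is White's score).
white_wins <- 21163
black_wins <- 15224
draws      <- 28666

# Exact binomial test on decisive games.
# H0: White wins half of the decisive games.
decisive <- binom.test(white_wins, white_wins + black_wins, p = 0.5)
decisive$estimate   # share of decisive games won by White
decisive$conf.int   # 95% confidence interval for that share
decisive$p.value

# Average score of White over all games, draws counted as half a point.
white_mean_score <- (white_wins + 0.5 * draws) /
  (white_wins + black_wins + draws)
white_mean_score
```

A confidence interval that lies entirely above 0.5 would support a first-mover advantage under the stated assumption.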

Visualize the number of matches by month

ggplot(count(games, Month), aes(x = Month, y = n)) +
  geom_line() +
  geom_smooth(se = FALSE)

Compute the player point rating and observe its distribution

players = sort(unique(c(games$White, games$Black)))

pW = group_by(games, White) %>%
  summarise(matchesWhite = n(), pointsWhite = sum(Score))

pB = group_by(games, Black) %>%
  summarise(matchesBlack = n(), pointsBlack = sum(1-Score))

rating = 
  tibble(player = players) %>% 
  left_join(pW, by = c("player" = "White")) %>% 
  left_join(pB, by = c("player" = "Black")) %>% 
  mutate(pointsWhite = ifelse(is.na(pointsWhite), 0, pointsWhite), 
         pointsBlack = ifelse(is.na(pointsBlack), 0, pointsBlack),
         matchesWhite = ifelse(is.na(matchesWhite), 0, matchesWhite), 
         matchesBlack = ifelse(is.na(matchesBlack), 0, matchesBlack)) %>% 
  mutate(points = pointsWhite + pointsBlack, matches = matchesWhite + matchesBlack)
  

ggplot(rating) + geom_histogram(aes(x = points))

Compute the player Elo rating and observe its distribution

## Elo
# INPUT
# games: matches, one row per game with columns White, Black and Score
# z: logistic scale parameter (zeta)
# k: update factor (kappa)
# OUTPUT
# r: rating vector, indexed by player ID
elo = function(games, z = 400, k = 25) {
  
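  # the rating vector is indexed by player ID, so its length is the
  # largest ID (this assumes that IDs are positive integers)
  players = unique(c(games$White, games$Black))
  n = max(players)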

  # number of games
  m = nrow(games)
  
  # old rating vector
  rold = rep(0, n)
  
  # new rating vector
  rnew = rep(0, n)
  
  for (i in 1:m) {

    # White player
    # compute update
    score = games[[i, "Score"]]
    spread = rold[games[[i, "White"]]] - rold[games[[i, "Black"]]]
    mu = 1 / (1 + 10^(-spread / z))
    update = k * (score - mu)
    # update rating
    rnew[games[[i,"White"]]] = rold[games[[i,"White"]]] + update
    
    # Black player
    # compute update
    score = 1 - games[[i, "Score"]]
    spread = rold[games[[i, "Black"]]] - rold[games[[i, "White"]]]
    mu = 1 / (1 + 10^(-spread / z))
    update = k * (score - mu)
    # update rating
    rnew[games[[i,"Black"]]] = rold[games[[i,"Black"]]] + update
    
    # update old ratings
    rold[games[[i,"White"]]] = rnew[games[[i,"White"]]]
    rold[games[[i,"Black"]]] = rnew[games[[i,"Black"]]]
  }
  return(rnew)
}


e = elo(games)
eloRating = tibble(player = 1:length(e), elo = e)
rating = left_join(rating, eloRating, by = "player")

# check sum is 0
sum(rating$elo)
## [1] 1.829814e-12
ggplot(rating) + geom_histogram(aes(x = elo), boundary = 0)

count(rating, cut_width(elo, 5, boundary = 0), sort = TRUE)
## # A tibble: 80 x 2
##    `cut_width(elo, 5, boundary = 0)`     n
##    <fct>                             <int>
##  1 (-15,-10]                          1038
##  2 (-5,0]                              784
##  3 (10,15]                             552
##  4 (-25,-20]                           500
##  5 (0,5]                               481
##  6 (-10,-5]                            330
##  7 (-35,-30]                           274
##  8 (-20,-15]                           260
##  9 (-30,-25]                           258
## 10 (-40,-35]                           247
## # … with 70 more rows
summary(rating$elo)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -146.456  -22.873   -6.005    0.000   12.476  326.950

Are point and Elo ratings correlated?

ggplot(rating, aes(x = points, y = elo)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

ggplot(filter(rating, points > 100), aes(x = points, y = elo)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  theme_bw()

top10Points = 
  arrange(rating, -points) %>% 
  head(10) %>% 
  select(player)

top10Elo = 
  arrange(rating, -elo) %>% 
  head(10) %>% 
  select(player)

intersect(top10Elo, top10Points)
## # A tibble: 1 x 1
##   player
##    <dbl>
## 1     64
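
The plots can be complemented with a correlation coefficient and a reusable top-k overlap. The sketch below runs on a small hypothetical data frame `toy` so that it is self-contained; its numbers are made up for illustration, and in the analysis the `rating` tibble would take its place. The helper `top_k` is also an illustrative name.

```r
# Hypothetical toy ratings; in the analysis this would be the `rating` tibble.
toy <- data.frame(
  player = 1:6,
  points = c(12, 3.5, 40, 7, 22, 1),
  elo    = c(35, -10, 20, 5, 60, -25)
)

# Pearson measures linear association, Spearman compares rankings only.
pearson  <- cor(toy$points, toy$elo, method = "pearson")
spearman <- cor(toy$points, toy$elo, method = "spearman")
c(pearson = pearson, spearman = spearman)

# IDs of the k best players according to a given rating column.
top_k <- function(df, column, k) {
  df$player[order(-df[[column]])][seq_len(k)]
}

# Players that are in the top 3 under both ratings.
overlap <- intersect(top_k(toy, "points", 3), top_k(toy, "elo", 3))
overlap
```

Spearman's coefficient is often the more natural choice here, since the point rating grows with the number of games played and its relation to Elo need not be linear.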

Test if the number of played games has an effect on the Elo player rating

ggplot(rating, aes(x = matches, y = elo)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

ggplot(rating, aes(x = matches, y = points)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()
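
One possible reason for an effect follows from the update rule itself: every rating starts at 0 and a single game changes it by less than \(\kappa\), so a player with \(m\) games has a rating smaller than \(\kappa m\) in absolute value. The sketch below illustrates this bound on a hypothetical scenario, a player who wins every game against opponents rated 0; the function `winning_streak` is an illustrative name and the scenario is not taken from the data.

```r
# Rating of a player who starts at 0 and wins n games in a row against
# opponents rated 0 (a hypothetical scenario, chosen only to show the bound).
winning_streak <- function(n, zeta = 400, kappa = 25) {
  r <- 0
  for (g in seq_len(n)) {
    mu <- 1 / (1 + 10^(-(r - 0) / zeta))  # expected score against rating 0
    r <- r + kappa * (1 - mu)             # a win has actual score 1
  }
  r
}

n_games <- c(1, 5, 20, 100)
ratings <- sapply(n_games, winning_streak)

# The rating grows with the number of games but stays below kappa * matches.
data.frame(matches = n_games, elo = ratings, bound = 25 * n_games)
```

Players with few games therefore cannot be far from 0, whatever their true strength, which is consistent with extreme Elo ratings appearing only among players with many games.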