The Elo method was devised by Arpad Elo, a physics professor and excellent chess player. In 1970, FIDE, the World Chess Federation, agreed to adopt the Elo Rating System.

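The method works as follows. Suppose that players \(i\) and \(j\) play a match. Let \(s_{i,j}\) be the actual score of \(i\) in the match against \(j\). We have that:

\[ s_{i,j} = \begin{cases} 1 & \text{if } i \text{ wins} \\ 0.5 & \text{if the match is a draw} \\ 0 & \text{if } i \text{ loses} \end{cases} \]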

Notice that the actual score \(s_{j,i}\) of \(j\) in the match against \(i\) is \(1 - s_{i,j}\). Let \(\mu_{i,j}\) be the expected score of \(i\) in the match against \(j\). We have that:

\[ \begin{array}{lll} \mu_{i,j} & = & \frac{1}{1 + 10^{-(r_i - r_j) / \zeta}} = \frac{10^{r_i / \zeta}}{10^{r_i / \zeta} + 10^{r_j / \zeta}} \end{array} \]

where \(r_i\) and \(r_j\) are the ratings of \(i\) and \(j\) before the match and \(\zeta\) is a constant. Notice that the expected score \(\mu_{j,i}\) of \(j\) in the match against \(i\) is \(1 - \mu_{i,j}\).
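As a minimal sketch of the expected score formula, the R function below evaluates \(\mu_{i,j}\) for two ratings. The name `expected_score` and the default \(\zeta = 400\) are choices made here for illustration.

```r
# Expected score of a player rated r_i against a player rated r_j
# (hypothetical helper; zeta = 400 is assumed as the default)
expected_score = function(r_i, r_j, zeta = 400) {
  1 / (1 + 10^(-(r_i - r_j) / zeta))
}

expected_score(0, 0)     # equal ratings: 0.5
expected_score(400, 0)   # 10/11, about 0.909
expected_score(0, 400)   # 1/11, about 0.091

# the two expected scores of a match always add up to 1
expected_score(400, 0) + expected_score(0, 400)
```

With \(\zeta = 400\), a rating gap of 400 points corresponds to odds of 10 to 1 in favor of the stronger player.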

We assume that initially all player ratings are equal to 0. When players \(i\) and \(j\) play a match, their ratings \(r_i\) and \(r_j\) are updated using the following rule:

\[ \begin{array}{lll} r_{i} & \leftarrow & r_i + \kappa (s_{i,j} - \mu_{i,j}) \\ r_j & \leftarrow & r_j + \kappa (s_{j,i} - \mu_{j,i}) \end{array} \]

where \(\kappa\) is a constant.
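To make the update rule concrete, here is a small sketch of a single update in R. The function name `elo_update` and the defaults \(\zeta = 400\) and \(\kappa = 25\) are assumptions for illustration. It uses the fact that \(s_{j,i} - \mu_{j,i} = -(s_{i,j} - \mu_{i,j})\), so the two players move by the same amount in opposite directions.

```r
# One Elo update for a match between i and j (hypothetical helper)
# s_ij is the actual score of i: 1 (win), 0.5 (draw) or 0 (loss)
elo_update = function(r_i, r_j, s_ij, zeta = 400, kappa = 25) {
  mu_ij = 1 / (1 + 10^(-(r_i - r_j) / zeta))  # expected score of i
  delta = kappa * (s_ij - mu_ij)              # gain of i, loss of j
  c(r_i = r_i + delta, r_j = r_j - delta)
}

elo_update(0, 0, 1)     # r_i = 12.5, r_j = -12.5
elo_update(0, 0, 0.5)   # a draw between equals changes nothing
elo_update(400, 0, 1)   # the favorite wins and gains only about 2.27
```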

The Elo thesis is:

If a player performs as expected, it gains nothing. If it performs better than expected, it is rewarded, while if it performs worse than expected, it is penalized.

According to the movie The Social Network by David Fincher, Elo’s method formed the basis for rating people on Zuckerberg’s Web site Facemash, the predecessor of Facebook. This challenge is inspired by the Kaggle competition Chess ratings - Elo versus the Rest of the World.

Challenges

  1. Does White have an advantage over Black, that is, is there a first-mover advantage?
  2. Compute the player point rating. How are points distributed? Why?
  3. An interesting property of Elo’s ratings is that the sum of all player ratings is always 0. Formally show this property. (Hint: use the fact that \(s_{i,j} + s_{j,i}=1\) and \(\mu_{i,j} + \mu_{j,i} =1\))
  4. Compute the player Elo rating (set \(\zeta = 400\) and \(\kappa = 25\)) (Hint: take advantage of the 0-sum property to optimize the code). Observe its distribution: why is it different from the point distribution? Finally, verify the 0-sum property for Elo’s ratings.
  5. Are point and Elo ratings correlated? Are top Elo players overlapping with top point players?
  6. Test if the number of played games has an effect on the Elo and point player rating. Why?
library(tidyverse)
# put games into a data frame
games = read_csv("data.csv")

Does White have an advantage over Black?

group_by(games, Score) %>% 
  summarize(n = n(), pn = n / nrow(games))
## # A tibble: 3 × 3
##   Score     n    pn
##   <dbl> <int> <dbl>
## 1   0   15224 0.234
## 2   0.5 28666 0.441
## 3   1   21163 0.325
# excluding draws
games2 = filter(games, Score != 0.5) 
group_by(games2, Score) %>% 
  summarize(n = n(), pn = n / nrow(games2))
## # A tibble: 2 × 3
##   Score     n    pn
##   <dbl> <int> <dbl>
## 1     0 15224 0.418
## 2     1 21163 0.582
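
The counts above suggest that White wins more often than Black. As a hedged sketch, one way to check whether this gap could be due to chance is an exact binomial test on the decisive games, with the counts copied by hand from the tables above. The variable names are illustrative, and the test treats games as independent, which is only an approximation.

```r
# counts taken from the summary tables above
wins_white = 21163
wins_black = 15224
draws = 28666

# H0: among decisive games, White and Black are equally likely to win
test = binom.test(wins_white, wins_white + wins_black, p = 0.5)
test$estimate   # share of decisive games won by White, about 0.582
test$p.value    # far below any usual significance level

# average score of White over all games, a draw counting half a point
white_mean_score = (wins_white + 0.5 * draws) / (wins_white + draws + wins_black)
white_mean_score   # about 0.546
```

Under these assumptions, the first-mover advantage is small per game but clearly not a sampling accident.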

Compute the player point rating and observe its distribution

# players are identified by integer numbers from 1. 
# Some numbers are missing since the corresponding player was not sampled.
players = sort(unique(c(games$White, games$Black)))

ratingWhite = group_by(games, White) %>%
  summarise(matchesWhite = n(), pointsWhite = sum(Score))

ratingBlack = group_by(games, Black) %>%
  summarise(matchesBlack = n(), pointsBlack = sum(1-Score))

rating = 
  tibble(player = players) %>% 
  left_join(ratingWhite, join_by(player == White)) %>% 
  left_join(ratingBlack, join_by(player == Black)) %>% 
  mutate(pointsWhite = ifelse(is.na(pointsWhite), 0, pointsWhite), 
         pointsBlack = ifelse(is.na(pointsBlack), 0, pointsBlack),
         matchesWhite = ifelse(is.na(matchesWhite), 0, matchesWhite), 
         matchesBlack = ifelse(is.na(matchesBlack), 0, matchesBlack)) %>% 
  mutate(points = pointsWhite + pointsBlack, matches = matchesWhite + matchesBlack)
  

ggplot(rating) + 
  geom_histogram(aes(x = points))

summary(rating$points)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00    3.00    8.91    9.00  167.50
  • the point distribution is a typical long-tail distribution associated with human talent: many players have few points (the median is 3) and a few players have an extraordinary number of points (the maximum is 167.5)
  • notice that the number of points cannot go below 0, hence many players gather around this lower bound

Formally show the 0-sum property

After a match between \(i\) and \(j\) the overall increase of rating in the system is:

\[ \kappa (s_{i,j} - \mu_{i,j}) + \kappa (s_{j,i} - \mu_{j,i}) = \kappa (s_{i,j} + s_{j,i}) - \kappa (\mu_{i,j} + \mu_{j,i}) = \kappa - \kappa = 0 \]
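
Since all ratings start at 0 and every match adds 0 to the total, the sum of the ratings stays at 0. The following sketch checks this numerically on simulated games. The number of players, the number of games and the random results are made up for illustration. Each player is deliberately updated from its own actual and expected score, so the check does not assume the property it tests.

```r
set.seed(42)
n = 20        # simulated players
m = 500       # simulated games
zeta = 400
kappa = 25

# expected score of a player rated a against a player rated b
expected = function(a, b) 1 / (1 + 10^(-(a - b) / zeta))

r = rep(0, n)          # all ratings start at 0
drift = numeric(m)     # sum of all ratings after each game

for (g in seq_len(m)) {
  pair = sample(n, 2)              # two distinct players
  i = pair[1]
  j = pair[2]
  s = sample(c(0, 0.5, 1), 1)      # random score of i against j
  mu_ij = expected(r[i], r[j])     # both expectations use pre-match ratings
  mu_ji = expected(r[j], r[i])
  r[i] = r[i] + kappa * (s - mu_ij)
  r[j] = r[j] + kappa * ((1 - s) - mu_ji)
  drift[g] = sum(r)
}

# largest deviation of the total from 0: only floating-point noise is expected
max(abs(drift))
```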

Compute the player Elo rating and observe its distribution

# Elo rating system
# INPUT
# games: a game *matrix* with columns White, Black and Score
#        Players are integer numbers starting at 1
#        The matrix is sorted in chronological order of the matches
# z: logistic parameter
# k: update factor
# OUTPUT
# r: rating vector
elo = function(games, z = 400, k = 25) {
  
  # number of players 
  # (players are integer numbers starting at 1)
  n = max(c(games[, "White"], games[, "Black"]))

  # number of games
  m = nrow(games)
  
  # rating vector
  r = rep(0, n)
  
  # iterate through games
  for (i in seq_len(m)) {
    score = games[i, "Score"]
    white = games[i, "White"]
    black = games[i, "Black"]

    # compute update
    spread = r[white] - r[black]
    mu = 1 / (1 + 10^(-spread / z))
    update = k * (score - mu)
    
    # update ratings
    r[white] = r[white] + update
    r[black] = r[black] - update
  
  }
  return(r)
}
games_matrix = as.matrix(games)
eloVector = elo(games_matrix)
eloRating = tibble(player = seq_along(eloVector), elo = eloVector)
# join on the player identifier explicitly
rating = left_join(rating, eloRating, join_by(player))

# check sum is 0
sum(rating$elo)
## [1] 9.298673e-13
summary(rating$elo)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -146.456  -22.873   -6.005    0.000   12.476  326.950
ggplot(rating) + 
  geom_histogram(aes(x = elo))

  • since the Elo rating can both increase and decrease, the Elo distribution is roughly symmetric, centered on a value slightly below 0 (the median is about -6, while the mean is 0)
  • notice that the right tail is longer than the left one (the maximum is about 327, the minimum about -146): both represent a form of talent

Are point and Elo ratings correlated?

ggplot(rating, aes(x = points, y = elo)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

ggplot(filter(rating, points > 100), aes(x = points, y = elo)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  theme_bw()

top10Points = 
  arrange(rating, -points) %>% 
  head(10) %>% 
  select(player)

top10Elo = 
  arrange(rating, -elo) %>% 
  head(10) %>% 
  select(player)

intersect(top10Elo, top10Points)
## # A tibble: 1 × 1
##   player
##    <dbl>
## 1     64
  • points and Elo ratings are positively correlated only up to a certain number of points
  • indeed, it is much more difficult for good players to increase their Elo ratings: they are expected to win, so a win earns them little
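
Because the relation between the two ratings is not linear, a rank-based measure may describe it better than a plain linear correlation. The sketch below, in base R, computes the Spearman correlation and the size of the top-k overlap. The helper `top_k_overlap` and the toy numbers are invented for illustration and are not taken from the games file.

```r
# Hypothetical helper: how many players are in the top k of both ratings
top_k_overlap = function(players, a, b, k = 10) {
  top_a = players[order(a, decreasing = TRUE)][seq_len(k)]
  top_b = players[order(b, decreasing = TRUE)][seq_len(k)]
  length(intersect(top_a, top_b))
}

# toy ratings (made up)
toy = data.frame(
  player = 1:6,
  points = c(50, 40, 30, 20, 10, 5),
  elo    = c(10, 80, 60, 20, -20, -30)
)

# rank correlation between the two ratings
rho = cor(toy$points, toy$elo, method = "spearman")
rho

# players 2 and 3 are in both top 3 lists
top_k_overlap(toy$player, toy$points, toy$elo, k = 3)   # 2
```

With the `rating` table built above, the same calls would take `rating$player`, `rating$points` and `rating$elo` as arguments.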

Test if the number of played games has an effect on the Elo and point player ratings

ggplot(rating, aes(x = matches, y = points)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

cor(rating$matches, rating$points)
## [1] 0.9900841
ggplot(rating, aes(x = matches, y = elo)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

ggplot(filter(rating, matches > 20, matches < 200), aes(x = matches, y = elo)) +
  geom_point(alpha = 0.2) +
  geom_smooth(se=FALSE) +
  theme_bw()

intermediate = filter(rating, matches > 20, matches < 200)
cor.test(intermediate$matches, intermediate$elo)
## 
##  Pearson's product-moment correlation
## 
## data:  intermediate$matches and intermediate$elo
## t = 35.782, df = 1726, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6246670 0.6788635
## sample estimates:
##       cor 
## 0.6525992
  • the more a player plays, the more points it collects: the number of matches and the number of points are almost perfectly correlated (0.99)
  • this is not true for Elo ratings. Newcomers are weak players and they do not scale up their Elo score as they play (typically, they get defeated as expected). Seasoned players are strong ones and, as observed above, for them it is harder to increase their Elo score. Those in the middle have the best chance to increase their Elo scores as they play more games: among players with more than 20 and fewer than 200 matches, the correlation between matches and Elo rating is about 0.65