Elo's method

The Elo's system is an iterative method coined by the physics professor and excellent chess player Arpad Elo. Let $S_{i,j}$ be the score of team $i$ against team $j$; for instance, in chess a win is given a score of 1 and a draw a score of 1/2 (and a defeat a score of 0). Notice that $S_{i,j} + S_{j,i} = 1$. Let $\mu_{i,j}$ be the number of points that team $i$ is expected to score against team $j$; this is typically computed as a logistic function of the difference of ratings between the players, for instance, $$\mu_{i,j} = \frac{1}{1 + 10^{-d_{i,j} / \zeta}},$$ where $d_{i,j} = r_i(old) - r_j(old)$ and $\zeta$ is a constant (in the chess world $\zeta = 400$). Then, when teams $i$ and $j$ match, the new ranks $r_i(new)$ of team $i$ and $r_j(new)$ of team $j$ are updated as follows: $$ r_i(new) = r_i(old) + \kappa (S_{i,j} - \mu_{i,j}) \\ r_j(new) = r_j(old) + \kappa (S_{j,i} - \mu_{j,i}) $$ where $\kappa$ is a constant (for instance, in chess $\kappa = 25$ for new players).

Hence, beating a stronger player has a larger reward than beating a weaker one. Notice the intriguing similarity of Elo's update equation with equation defining temporal Massey's method. According to the movie The social network by David Fincher, it appears that the Elo's method formed the basis for rating people on Zuckerberg's Web site Facemash, which was the predecessor of Facebook.

What is the meaning of constants $\kappa$ and $\zeta$? As for parameter $\kappa$, it balances the difference between actual and expected scores against prior ratings. If $\kappa$ is too large, then too much volatile are the ratings. On the other hand, if $\kappa$ is too small, then too much stagnant are the ratings. As for parameter $\zeta$, notice that $$\mu_{i,j} = \frac{1}{1 + 10^{-d_{i,j} / \zeta}} = \frac{10^{r_{i} / \zeta}}{10^{r_{i} / \zeta} + 10^{r_{j} / \zeta}}$$ hence $$\mu_{i,j} + \mu_{j,i} = 1$$ and $$\frac{\mu_{i,j}}{\mu_{j,i}} = \frac{10^{r_{i} / \zeta}}{10^{r_{j} / \zeta}} $$ hence $$\mu_{i,j} = \mu_{j,i} 10^{(r_{i} - r_{j}) / \zeta} $$ This means that for every $\zeta$ rating points of advantage that player $i$ has over player $j$, the probability of player $i$ beating player $j$ is expected to be 10 times greater than the probability of player $j$ beating player $i$.

Instead of building ratings based just on wins and loses, Elo's can use the number of points $P_{i,j}$ that $i$ scores against $j$ by defining the score that team $i$ makes against team $j$ as $$S_{i,j} = \frac{P_{i,j} + 1}{P_{i,j} + P_{j,i} + 2}$$ Hence $0 < S_{i,j} < 1$ and $S_{i,j} + S_{j,i} = 1$. Notice that in this case, winning with a little point margin against a week player can in fact decrease your rating, and losing with a little point margin against a strong player can in fact increase your rating.

Another interesting property of Elo's ratings is that the sum of all player ratings is always a constant. To show this let us compute the sum of ratings before and after a match between players $i$ and $j$. Recall that $S_{i,j} + S_{j,i} = 1$ and $\mu_{i,j} + \mu_{j,i} = 1$. It follows that $$\kappa (S_{i,j} - \mu_{i,j}) + \kappa (S_{j,i} - \mu_{j,i}) = 0 $$ that is, the rating update for $i$ is the negative of the rating update of $j$: what player $i$ gains is what player $j$ loses or viceversa. It follows that after the match between $i$ and $j$ the sum of player ratings is the same as before the match, that is: $$\sum_k r_k(old) = \sum_k r_k(new) $$ In particular if we assume that the initial rating is 0 for all players, we have that the sum and mean of Elo's ratings is always 0.

The following code plots the logistic function:


logistic = function(x, z=400) {1 / (1 + 10^(-x / z))}
curve(logistic, -1000, 1000, ylab="logistic")
abline(h = 0.5, lty=2)
abline(v = 0, lty=2)

The following user-defined function elo computes the Elo's method. If also computes the foresight prediction accuracy, possibly using a home-field advantage.


##  Elo
# INPUT
# matches: matches data frame
# teams: team names
# zeta: logistic parameter
# k: update factor
# scores: use scores of matches?
# hf: home-field advantage
# OUTPUT
# r: rating vector
# wins: number of victories
# foreseen: number of foreseen victories
# accuracy: percentage of foreseen victories
elo = function(matches, teams, z = 400, k = 25, scores=FALSE, hf=0) {
  # number of teams
  n = length(teams)
  
  # number of matches
  m = dim(matches)[1]
  
  # old rating vector
  rold = rep(0, n)
  
  # new rating vector
  rnew = rep(0, n)
  
  wins = 0
  foreseen = 0
  
  for (i in 1:m) {
    
    
    # compute prediction accuracy
    spread = matches[i,"score1"] - matches[i,"score2"] 
    if (spread > 0) {
      wins = wins + 1
      if ((rold[matches[i,"team1"]] + hf) > rold[matches[i,"team2"]]) {
        foreseen = foreseen + 1
      }
    }
    if (spread < 0) {
      wins = wins + 1
      if (rold[matches[i,"team2"]] > (rold[matches[i,"team1"]] + hf)) {
        foreseen = foreseen + 1
      }
    }
    
    # team 1
    # compute score
    if (scores == FALSE) {
      spread = matches[i,"score1"] - matches[i,"score2"]
      if (spread > 0) {
        score = 1
      }
      if (spread == 0) {
        score = 0.5
      }
      if (spread < 0) {
        score = 0
      }
    } else {
      score = (matches[i,"score1"] + 1) / (matches[i,"score1"] + matches[i,"score2"] + 2)
    }
    
    # compute update
    delta = rold[matches[i,"team1"]] - rold[matches[i,"team2"]]
    mu = 1 / (1 + 10^(-delta / z))
    update = k * (score - mu)
    
    # update rating
    rnew[matches[i,"team1"]] = rold[matches[i,"team1"]] + update
    
    # team 2
    # compute score
    if (scores == FALSE) {
      spread = matches[i,"score2"] - matches[i,"score1"]
      if (spread > 0) {
        score = 1
      }
      if (spread == 0) {
        score = 0.5
      }
      if (spread < 0) {
        score = 0
      }
    } else {
      score = (matches[i,"score2"] + 1) / (matches[i,"score1"] + matches[i,"score2"] + 2)
    }
    
    # compute update
    delta = rold[matches[i,"team2"]] - rold[matches[i,"team1"]]
    mu = 1 / (1 + 10^(-delta / z))
    update = k * (score - mu)
    
    # update new rating
    rnew[matches[i,"team2"]] = rold[matches[i,"team2"]] + update

    # update old ratings
    rold[matches[i,"team1"]] = rnew[matches[i,"team1"]]
    rold[matches[i,"team2"]] = rnew[matches[i,"team2"]]
  }
  return(list(r=rnew, foreseen = foreseen, wins = wins, accuracy = foreseen / wins))
}