Overview

  • temporal networks are graphs in which the edges have attached the timestamp of creation of the tie between nodes
  • a temporal network is a dynamic network, evolving in time; while a static network is a single frame, a temporal network is a movie (a sequence of consecutive frames)
  • centrality methods should take into consideration the additional temporal information attached to edges
  • in particular an endorsement of node A coming from node B at time t should increase the centrality of A according to the centrality of B at endorsing time t
  • static centrality methods are not time-aware and hence they miss the dynamic evolution of centrality

Sport metrics

  • in college football, say Notre Dame is highly touted early in the season, and Miami beats top-ranked Notre Dame in September
  • then Notre Dame suffers some injuries (physical or psychological) and loses three more games
  • traditional methods will miss the fact that Miami beat Notre Dame when they were at full strength and confident, whereas a time-aware metric would catch this difference

Art metrics

  • suppose that a collector bought (for the same price) two artworks A and B from the same artist but at different times: artwork A when the artist was unknown and artwork B after the artist became popular
  • the collector expects a larger increase in centrality from the second purchase, since they acquired a piece from a more renowned artist
  • static centrality methods do not distinguish between the two purchases
  • a time-aware metric, on the other hand, would distinguish between the two scenarios, assigning the collector different centrality gains, proportional to the artist’s centrality at the time of each sale

Elo

The Elo system is an iterative rating method devised by the physics professor and strong chess player Arpad Elo. The Elo thesis is:

If a player performs as expected, they gain nothing. If they perform better than expected, they are rewarded, while if they perform worse than expected, they are penalized.

According to the movie The Social Network by David Fincher, it appears that the Elo method formed the basis for rating people on Mark Zuckerberg’s Web site Facemash, which was the predecessor of Facebook.

Math

Let \(S_{i,j}\) be the score of team \(i\) against team \(j\); for instance, in chess a win is given a score of 1, a draw a score of 1/2, and a defeat a score of 0. Notice that \[S_{i,j} + S_{j,i} = 1\]

Let \(\mu_{i,j}\) be the number of points that team \(i\) is expected to score against team \(j\); this is typically computed as a logistic function of the difference of ratings between the players: \[\mu_{i,j} = \frac{1}{1 + 10^{-d_{i,j} / \zeta}} = \frac{10^{r_{i} / \zeta}}{10^{r_{i} / \zeta} + 10^{r_{j} / \zeta}}\] where \(d_{i,j} = r_i - r_j\) and \(\zeta\) is a constant (in the chess world \(\zeta = 400\)).

We assume that initially all team ratings are equal to 0. Then, when teams \(i\) and \(j\) match, the new ratings \(r_i\) of team \(i\) and \(r_j\) of team \(j\) are updated as follows: \[ r_i \leftarrow r_i + \kappa (S_{i,j} - \mu_{i,j}) \\ r_j \leftarrow r_j + \kappa (S_{j,i} - \mu_{j,i}) \] where \(\kappa\) is a constant (for instance, in chess \(\kappa = 25\) for new players).
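As a minimal sketch of one update step, using the chess constants above, suppose two new players (both rated 0) meet and the first one wins:

```r
# One Elo update step (sketch): both players start at 0 and
# player i beats player j (S_ij = 1), with chess constants.
zeta = 400
kappa = 25
r_i = 0
r_j = 0
# expected score of i against j: 0.5, since the ratings are equal
mu_ij = 1 / (1 + 10^(-(r_i - r_j) / zeta))
# i performed better than expected and is rewarded
r_i = r_i + kappa * (1 - mu_ij)        # 0 + 25 * 0.5 = 12.5
# j performed worse than expected and is penalized
r_j = r_j + kappa * (0 - (1 - mu_ij))  # 0 - 25 * 0.5 = -12.5
```

Note that the winner’s gain exactly equals the loser’s loss, anticipating the zero-sum property proved below.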

Math

What is the meaning of constants \(\kappa\) and \(\zeta\)?

As for parameter \(\kappa\), it balances the difference between actual and expected scores against prior ratings.

  • If \(\kappa\) is too large, then the ratings are too volatile.
  • On the other hand, if \(\kappa\) is too small, then the ratings are too stagnant.

As for parameter \(\zeta\), notice that \[\mu_{i,j} = \frac{1}{1 + 10^{-d_{i,j} / \zeta}} = \frac{10^{r_{i} / \zeta}}{10^{r_{i} / \zeta} + 10^{r_{j} / \zeta}}\]

hence \[\mu_{i,j} + \mu_{j,i} = 1\] and \[\frac{\mu_{i,j}}{\mu_{j,i}} = \frac{10^{r_{i} / \zeta}}{10^{r_{j} / \zeta}} \] hence \[\mu_{i,j} = \mu_{j,i} 10^{(r_{i} - r_{j}) / \zeta}\]

This means that:

for every \(\zeta\) rating points of advantage that player \(i\) has over player \(j\), the probability of player \(i\) beating player \(j\) is expected to be 10 times greater than the probability of player \(j\) beating player \(i\).
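This factor-of-10 property can be checked numerically, giving player \(i\) exactly \(\zeta\) rating points of advantage:

```r
# With exactly zeta rating points of advantage, the ratio of
# expected scores mu_ij / mu_ji equals 10.
zeta = 400
r_i = 400  # i has exactly zeta points of advantage over j
r_j = 0
mu_ij = 1 / (1 + 10^(-(r_i - r_j) / zeta))  # about 0.909
mu_ji = 1 - mu_ij                           # about 0.091
ratio = mu_ij / mu_ji                       # equals 10
```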

Math

Instead of building ratings based just on wins and losses, the Elo method can use the number of points \(P_{i,j}\) that \(i\) scores against \(j\) by defining the score that team \(i\) makes against team \(j\) as \[S_{i,j} = \frac{P_{i,j} + 1}{P_{i,j} + P_{j,i} + 2}\]

Hence \(0 < S_{i,j} < 1\) and \(S_{i,j} + S_{j,i} = 1\).

Notice that in this case, winning by a small point margin against a weak player can in fact decrease your rating, and losing by a small point margin against a strong player can in fact increase your rating.
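The first scenario can be sketched as follows, with a hypothetical strong player (rated 400) narrowly beating a weak one (rated 0) by a single point:

```r
# A narrow win against a much weaker opponent can lower the
# winner's rating: the actual score falls short of the expectation.
zeta = 400
kappa = 25
r_i = 400  # i is much stronger than j
r_j = 0
P_ij = 1   # i wins, but only by one point
P_ji = 0
S_ij = (P_ij + 1) / (P_ij + P_ji + 2)       # 2/3, the actual score
mu_ij = 1 / (1 + 10^(-(r_i - r_j) / zeta))  # about 0.909, the expected score
update = kappa * (S_ij - mu_ij)             # negative: i's rating drops
```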

Play

Assuming that initially all Elo ratings are equal to 0, prove that at any time: \[\sum_i r_i = 0\]

Solution

Let \(\delta_{i,j} = \kappa (S_{i,j} - \mu_{i,j})\) and \(\delta_{j,i} = \kappa (S_{j,i} - \mu_{j,i})\). We have that:

\[ \delta_{i,j} + \delta_{j,i} = \kappa (S_{i,j} - \mu_{i,j} + S_{j,i} - \mu_{j,i}) = \kappa (1 - 1) = 0 \] It follows that after every match the cumulative rating added to the system is 0.

Code

##  Elo rating system
# INPUT
# games: a game *matrix* with columns White, Black and Score
#        Players are integer numbers starting at 1
#        The matrix is sorted in chronological order
# z: logistic parameter (zeta)
# k: update factor
# OUTPUT
# r: rating vector
elo = function(games, z = 400, k = 25) {
  
  # number of players 
  # (players are integer numbers starting at 1)
  n = max(c(games[, "White"], games[, "Black"]))

  # number of games
  m = nrow(games)
  
  # rating vector
  r = rep(0, n)
  
  # iterate through games
  for (i in 1:m) {
    score = games[i, "Score"]
    white = games[i, "White"]
    black = games[i, "Black"]

    # compute update
    spread = r[white] - r[black]
    mu = 1 / (1 + 10^(-spread / z))
    update = k * (score - mu)
    
    # update ratings
    r[white] = r[white] + update
    r[black] = r[black] - update
    
  }
  return(r)
}
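As a sanity check of the zero-sum property proved above, we can replay the update rule over a few hypothetical matches and verify that the ratings always sum to 0 (the inlined loop mirrors the body of the elo function):

```r
# Zero-sum check (sketch): apply the Elo update over a few matches
# among three players and verify that ratings always sum to 0.
zeta = 400
kappa = 25
r = c(0, 0, 0)
# columns: white, black, white's score (hypothetical games)
games = rbind(c(1, 2, 1), c(2, 3, 0), c(1, 3, 0.5))
for (g in 1:nrow(games)) {
  w = games[g, 1]
  b = games[g, 2]
  s = games[g, 3]
  mu = 1 / (1 + 10^(-(r[w] - r[b]) / zeta))
  r[w] = r[w] + kappa * (s - mu)
  r[b] = r[b] - kappa * (s - mu)
  stopifnot(abs(sum(r)) < 1e-9)  # cumulative rating stays 0
}
```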

Time-aware HITS

We provide a time-aware version of HITS and apply it to the art market.

We consider a simplified view of the art market (and resulting sales network) as bipartite between the roles of the artist and the collector.

Artists create and sell artworks; they are the sources of art. Collectors purchase and collect artworks; they have some sense of where good art is.

We posit that there exists a mutual reinforcement mechanism defining the dominant figures among artists and collectors, which can be summarized as follows:

Important collectors purchase works of important artists; important artists sell their works to important collectors.

Time-aware HITS

  • we consider the weighted sales network where links go from buyer to seller and are weighted with the sale price
  • one proposal is to define the artist centrality as the authority centrality and the collector centrality as the hub centrality over the sale network
  • the proposal, however, misses a fundamental ingredient of art markets: the importance of timing

Time-aware HITS

We hence develop a time-aware extension of HITS:

  • initially, at time 0, all actors of the gallery have null rating
  • then, at each sale, we update the rating of the selling artist using the sale price and the rating that the buying collector had before the sale
  • similarly, we update the rating of the buying collector using the sale price and the rating that the selling artist had before the sale

Math

Suppose that artist \(i\) sells to collector \(j\) an artwork at price \(p\) at time \(t > 0\).

We update the artist rating \(x_{i}(t)\) of artist \(i\) at time \(t\) as well as the collector rating \(y_{j}(t)\) of collector \(j\) at time \(t\) using the following interrelated formulas:

\[ \begin{array}{lcl} x_{i}(t) & = & x_{i}(t-1) + P(p, t-1) \cdot P(y_{j}(t-1), t-1) \\ y_{j}(t) & = & y_{j}(t-1) + P(p, t-1) \cdot P(x_{i}(t-1), t-1) \end{array} \]

  • \(P(p, t-1)\) is the percentile of price \(p\) with respect to the distribution of gallery prices up to time \(t-1\). Hence, it is a factor from 0 to 1 that ponders the sale price within the price history of the gallery.
  • \(P(x_{i}(t-1), t-1)\) is the percentile of the rating of artist \(i\) at time \(t-1\) with respect to the distribution of ratings of artists that sold at least one piece by time \(t-1\)
  • \(P(y_{j}(t-1), t-1)\) is the percentile of the rating of collector \(j\) at time \(t-1\) with respect to the distribution of ratings of collectors that bought at least one piece by time \(t-1\)
  • these factors, also lying between 0 and 1, weight the artist/collector rating relative to similar ratings accrued so far
  • it is worth mentioning that we opted for a percentile approach since we noticed that both gallery prices and ratings display a right-skewed distribution, for which the mean is not a good indicator of the average case

Code

# TAHITS - Time-Aware HITS

# INPUT
# endorsements: data frame with timestamped endorsements. 
#               Columns are: from, to, amount, timestamp

# OUTPUT
# x: artist rating vector
# y: collector rating vector

TAHITS = function(endorsements) {
  
  # number of artists
  n = max(c(endorsements$from, endorsements$to))
  
  # number of events
  m = nrow(endorsements)
  
  # artist rating vector
  x = rep(0, n)
  # collector rating vector
  y = rep(0, n)
  
  # price reward vector
  rew = rep(0, m)
  # artist rating reward vector
  rewx = rep(0, m)
  # collector rating reward vector
  rewy = rep(0, m)
  
  # artist reward vector
  xw = rep(0, m)
  # collector reward vector
  yw = rep(0, m)


  # percentile function
  ecdf_fun = function(x, perc) ecdf(x)(perc)
  
  # iterate through events
  for (i in 1:m) {
    
    # 1. get seller, buyer and price
    seller = endorsements[i, "to"]
    buyer = endorsements[i, "from"]
    price = endorsements[i, "amount"]

    # 2. compute price reward
    if (i == 1) {
      rew[i] = 0.5
    } else {
      rew[i] = ecdf_fun(endorsements[1:(i-1),]$amount, price)
    }
    
    # 3. compute rating rewards for seller and buyer
    # only artists with at least one sale
    v = x[x > 0]
    if (length(v) == 0) {
      rewx[i] = 0.5
    } else {
      rewx[i] = ecdf_fun(v, x[seller])
    }

    # only collectors with at least one purchase
    v = y[y > 0]
    if (length(v) == 0) {
      rewy[i] = 0.5
    } else {
      rewy[i] = ecdf_fun(v, y[buyer])
    }
    
    # 4. compute artist reward using 
    #    price reward and collector rating reward
    xw[i] = rew[i] * rewy[i]

    # 5. compute collector reward using 
    #    price reward and artist reward
    yw[i] = rew[i] * rewx[i]

    # 6. update seller artist and buyer collector 
    #    ratings with respective rewards
    x[seller] = x[seller] + xw[i] 
    y[buyer] = y[buyer] + yw[i]
  }
  
  return(list(x = x, y = y))
}
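A single TAHITS update can be traced by hand with base R’s ecdf function, which computes the percentiles used in the formulas. The price and rating histories below are hypothetical, chosen only to illustrate the mechanism:

```r
# One TAHITS update (sketch) with hypothetical histories:
# an artist sells to a collector at price p, and both ratings
# grow by the product of the relevant percentiles.
prices = c(100, 200, 500, 1000)       # gallery prices before this sale
artist_ratings = c(0.4, 0.9)          # artists with at least one prior sale
collector_ratings = c(0.3, 0.7, 1.2)  # collectors with at least one prior purchase
p = 800                               # current sale price
x_i = 0.9                             # current rating of the selling artist
y_j = 0.7                             # current rating of the buying collector
Pp = ecdf(prices)(p)                  # price percentile: 3 of 4 prices are lower, so 0.75
Px = ecdf(artist_ratings)(x_i)        # artist rating percentile
Py = ecdf(collector_ratings)(y_j)     # collector rating percentile
x_i = x_i + Pp * Py                   # artist gains price reward times collector reward
y_j = y_j + Pp * Px                   # collector gains price reward times artist reward
```

The artist’s gain depends on the collector’s standing at sale time, and vice versa, which is precisely the mutual reinforcement the method encodes.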

Dig deeper