Characteristic polynomial

  • the eigenvalues of an \(n \times n\) matrix \(A\) are the roots of the characteristic polynomial of \(A\): \[p(\lambda) = \det(A - \lambda I)\]
  • since \(p(\lambda)\) has degree \(n\), there are \(n\) eigenvalues, but some of them can be complex, even if \(A\) is real, and some can be repeated (however, all the eigenvalues of a symmetric matrix are real); a quick check in R follows
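
As a check, the roots of the characteristic polynomial can be compared with the output of eigen (a minimal sketch in R on a hypothetical \(2 \times 2\) matrix; eigen and polyroot are base R functions):

A = matrix(c(2, 1, 1, 2), nrow = 2)
eigen(A)$values                  # 3 and 1
# characteristic polynomial: det(A - lambda I) = lambda^2 - 4 lambda + 3
polyroot(c(3, -4, 1))            # the same roots, 1 and 3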

Multiplicity

  • the algebraic multiplicity \(\mu_\lambda\) of an eigenvalue \(\lambda\) is the number of times it is repeated as a root of the characteristic polynomial
  • the geometric multiplicity \(\gamma_\lambda\) of an eigenvalue \(\lambda\) is the number of linearly independent eigenvectors associated with \(\lambda\)
  • it holds that \(1 \leq \gamma_\lambda \leq \mu_\lambda \leq n\)
  • if \(\mu_\lambda = \gamma_\lambda = 1\) then \(\lambda\) is said to be simple (the sketch below shows a case where \(\gamma_\lambda < \mu_\lambda\))
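
For instance, the geometric multiplicity can be computed as \(n\) minus the rank of \(A - \lambda I\) (a minimal sketch on a hypothetical Jordan block, where the two multiplicities differ):

A = rbind(c(2, 1),
          c(0, 2))
eigen(A)$values                 # 2, 2: algebraic multiplicity 2
n = nrow(A)
n - qr(A - 2 * diag(n))$rank    # geometric multiplicity: 1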

Eigenvalues and eigenvectors

  • a vector \(x \neq 0\) is a right eigenvector of a square matrix \(A\) if \(Ax = \lambda x\), where \(\lambda\) is a scalar called the eigenvalue associated with \(x\)
  • a row vector \(x \neq 0\) is a left eigenvector of a square matrix \(A\) if \(xA = \lambda x\), where \(\lambda\) is a scalar called the eigenvalue associated with \(x\)
  • unless \(A\) is symmetric, left eigenvectors are different from right eigenvectors
  • however, the set of eigenvalues associated with right and left eigenvectors is exactly the same
  • the definition of eigenvector has a very intuitive geometrical meaning: if \(x\) is a (right) eigenvector of \(A\), then \(Ax\) has the same direction as \(x\). The action of \(A\) on \(x\) is, in some sense, quite polite (see the check below)
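
A small numeric check of these definitions (a sketch on a hypothetical non-symmetric matrix; the left eigenvectors of \(A\) are the right eigenvectors of \(A^T\)):

A = rbind(c(0, 1, 0),
          c(0, 0, 1),
          c(1, 1, 0))
e = eigen(A)                    # right eigenvectors: A x = lambda x
lambda = e$values[1]
x = e$vectors[, 1]
A %*% x - lambda * x            # ~ 0 (x keeps its direction under A)
eigen(t(A))$values              # left eigenvectors yield the same eigenvalues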

Spectrum and spectral radius

  • the set of the distinct eigenvalues of \(A\) is called the spectrum of \(A\) and denoted by \(\sigma(A)\)
  • the spectral radius of \(A\) is the nonnegative number \[\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|\]

Irreducibility

  • a square matrix \(A\) is said to be irreducible if its graph \(g_A\) is strongly connected, that is, if there exists a directed path between any ordered pair of nodes of the graph
  • in particular, a symmetric matrix \(A\) is irreducible if its graph \(g_A\) is connected, that is, if there exists an undirected path between any pair of nodes of the graph (see the snippet below)
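
In igraph, irreducibility can thus be checked with is_connected (a minimal sketch on a hypothetical 3-node graph):

library(igraph)
g = make_graph(c(1,2, 2,3, 3,1), directed = TRUE)   # a directed cycle
is_connected(g, mode = "strong")                    # TRUE: the adjacency matrix is irreducible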

Perron-Frobenius Theorem

If a nonnegative matrix \(A\) is irreducible then:

  1. the spectral radius \(r = \rho(A) > 0\) and \(r \in \sigma(A)\)
  2. \(r\) is a simple eigenvalue of \(A\), hence its associated eigenvector is unique (up to a multiplicative constant)
  3. the unique right eigenvector \(x\) such that \(Ax = rx\) is positive
  4. the unique left eigenvector \(y\) such that \(yA = ry\) is positive
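
A quick numeric illustration of the theorem (a sketch on a hypothetical irreducible nonnegative matrix, the adjacency matrix of a triangle):

A = rbind(c(0, 1, 1),
          c(1, 0, 1),
          c(1, 1, 0))
e = eigen(A)
max(abs(e$values))              # r = rho(A) = 2, which is itself an eigenvalue
e$vectors[, 1]                  # the dominant eigenvector has all entries of the same sign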

Eigenvector centrality

  • a natural extension of degree centrality is eigenvector centrality
  • in-degree centrality awards one centrality point for every link a node receives
  • but not all vertices are equivalent: some are more relevant than others, and, reasonably, endorsements from important nodes count more

Eigenvector centrality

The eigenvector centrality thesis reads:

A node is important if it is linked to by other important nodes

Math

Let \(A = (a_{i,j})\) be the adjacency matrix of a graph. The eigenvector centrality \(x_{i}\) of node \(i\) is given by:

\[x_i = \frac{1}{\lambda} \sum_k a_{k,i} \, x_k\]

where we assume the unknown \(\lambda \neq 0\). In matrix form we have:

\[\lambda x = x A\]

Math

The centrality vector \(x\) is a left eigenvector of the adjacency matrix \(A\) associated with the eigenvalue \(\lambda\).

Notice that \(\lambda x = x A\) is not a linear system since both \(\lambda\) and \(x\) are variables.

It is wise to choose \(\lambda\) as the eigenvalue of matrix \(A\) that is largest in absolute value.

By virtue of the Perron-Frobenius theorem for non-negative matrices, this choice guarantees the following desirable property:

If the adjacency matrix is irreducible, or equivalently if the graph is (strongly) connected, then the eigenvector solution is both unique and positive.

Power method

The power method can be used to solve the eigenvector centrality problem.

Let \(m(v)\) denote the signed component of maximal magnitude of vector \(v\). If there is more than one maximal component, let \(m(v)\) be the first one. For instance, \(m(-3,3,2) = -3\).
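
In R, \(m(v)\) can be written as follows (a one-line sketch; which.max returns the first index in case of ties):

m = function(v) v[which.max(abs(v))]
m(c(-3, 3, 2))   # -3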

Let \(x^{(0)} = e\), where \(e\) is the vector of all 1’s. For \(k \geq 1\):

  1. compute \(x^{(k)} = x^{(k-1)} A\);
  2. normalize \(x^{(k)} = x^{(k)} / m(x^{(k)})\);

until the desired precision is achieved.

It holds that:

  • \(x^{(k)}\) converges to the dominant eigenvector of \(A\)
  • \(m(x^{(k)})\) converges to the dominant eigenvalue of \(A\)

Power method

Let us sort the eigenvalues of \(A\) in order of decreasing magnitude:

\[|\lambda_1| \geq |\lambda_2| \geq \ldots \geq |\lambda_n|\]

Then:

  • the power method converges when \(|\lambda_1| > |\lambda_2|\)
  • the rate of convergence is the rate at which \((\lambda_2 / \lambda_1)^k\) goes to \(0\)
  • in particular if \(|\lambda_2 / \lambda_1|\) is close to 1, the convergence is very slow

Code

Function eigen_centrality computes eigenvector centrality for a graph using the ARPACK package.

Notice that this implementation does not use the power method, but the Implicitly Restarted Arnoldi Method (IRAM) or, in the case of symmetric matrices, the corresponding variant of the Lanczos algorithm.

library(igraph)
g = read_graph(file = "EVGraphUndirected.gml", format = "gml")
coords = layout_in_circle(g)
plot(g, layout = coords, 
     vertex.size = 20, 
     vertex.color = "white")

ev = eigen_centrality(g)
(x = ev$vector)
##         A         B         C         D         E         F         G         H 
## 0.1332886 0.9460927 0.2395318 0.5264579 1.0000000 0.4927119 0.4927119 0.4927119 
##         I         L         M 
## 0.4927119 0.2531801 0.2531801
(lambda = ev$value)
## [1] 3.949758
plot(g, layout = coords, 
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(x, 2),
     vertex.label.cex = 0.80)

# check
A = as_adjacency_matrix(g, sparse = FALSE)
as.vector(x %*% A)
##  [1] 0.5264579 3.7368371 0.9460927 2.0793813 3.9497581 1.9460927 1.9460927
##  [8] 1.9460927 1.9460927 1.0000000 1.0000000
lambda * x
##         A         B         C         D         E         F         G         H 
## 0.5264579 3.7368371 0.9460927 2.0793813 3.9497581 1.9460927 1.9460927 1.9460927 
##         I         L         M 
## 1.9460927 1.0000000 1.0000000

Play

Code the power method in R and test it on the Bull graph.

Solution

# Eigenvector centrality (power method)
# INPUT
# g = graph
# t = precision
# OUTPUT
# A list with:
# vector = centrality vector
# value = eigenvalue
# iter = number of iterations

eigenvectorCentrality = function(g, t) {
  A = as_adjacency_matrix(g)
  n = vcount(g)
  x0 = rep(0, n)
  x1 = rep(1, n)
  eps = 1/10^t
  iter = 0
  while (sum(abs(x0 - x1)) > eps) {
    x0 = x1
    x1 = as.vector(x1 %*% A)     # x^(k) = x^(k-1) A
    m = x1[which.max(abs(x1))]   # m(v): signed component of maximal magnitude
    x1 = x1 / m                  # normalize
    iter = iter + 1
  } 
  return(list(vector = x1, value = m, iter = iter))
}
g = make_graph("Bull")
plot(g)
eigenvectorCentrality(g, 6)

## $vector
## [1] 0.8685172 1.0000000 1.0000000 0.4342586 0.4342586
## 
## $value
## [1] 2.302775
## 
## $iter
## [1] 26

Play

  1. Build a bipartite graph
  2. Run the power method on the graph. Does it converge?
  3. Explain in theory the absence of convergence

Solution

g = make_bipartite_graph(types = c(0,0,0, 1,1,1), 
                         edges = c(1,4, 1,5, 1,6, 2,4, 3,5))

plot(g)

# INFINITE COMPUTATION
#eigenvectorCentrality(g, 6)

A = as_adjacency_matrix(g, sparse = FALSE)
sort(abs(eigen(A)$values), decreasing = TRUE)
## [1] 1.9318517 1.9318517 1.0000000 1.0000000 0.5176381 0.5176381
eigen_centrality(g)$vector
## [1] 1.0000000 0.3660254 0.3660254 0.7071068 0.7071068 0.5176381

The spectrum of a bipartite graph is symmetric about zero: if \(\lambda\) is an eigenvalue, then so is \(-\lambda\). Hence \(|\lambda_1| = |\lambda_2|\), as the output above shows, and the power method does not converge.

Katz centrality

  • a practical problem with eigenvector centrality is that it works well only if the graph is (strongly) connected
  • real undirected networks typically have a large connected component, of size proportional to the network size. However, real directed networks typically do not have a comparably large strongly connected component.
  • if a directed network is not strongly connected, then only vertices that are in strongly connected components of at least two nodes or in the out-component of such components can have non-zero eigenvector centrality
  • for instance, in a directed acyclic graph, all nodes get null eigenvector centrality

Katz centrality

  • a way to work around this problem is to give each node a small amount of centrality for free, regardless of its position in the network, which it can then transfer to other nodes
  • it follows that highly linked nodes have high centrality, regardless of the centrality of the linkers
  • however, nodes that receive few links may still have high centrality if the linkers have large centrality

Katz centrality

The Katz centrality thesis is then:

A node is important if it is linked from other important nodes or if it is highly linked.

This method was proposed by Leo Katz (A new status index derived from sociometric analysis. Psychometrika, 1953) and later refined by Charles H. Hubbell (An input-output approach to clique identification. Sociometry, 1965).

Math

Let \(A = (a_{i,j})\) be the adjacency matrix of a directed graph.

The Katz centrality \(x_{i}\) of node \(i\) is given by:

\[x_i = \alpha \sum_k a_{k,i} \, x_k + \beta\]

where \(\alpha\) and \(\beta\) are positive constants. In matrix form we have:

\[x = \alpha x A + \beta\]

where \(\beta\) is now a vector whose elements are all equal to a given positive constant.

This is no longer an eigenvector problem but a linear system, since \(\alpha\) and \(\beta\) are constants.

Notice that the centrality vector \(x\) is defined by two components:

  • an endogenous component that takes into consideration the network topology
  • an exogenous component that is independent of the network structure

Math

Notice that \(x = \alpha x A + \beta\) is equivalent to \(x (I - \alpha A) = \beta\).

If matrix \((I - \alpha A)\) is invertible then:

\[x = \beta (I - \alpha A)^{-1}\]

If we wish to make use of the Katz centrality we must first choose values for the constants \(\alpha\) and \(\beta\).

  • if we let \(\alpha \rightarrow 0\), then only the constant term survives and all vertices have the same centrality \(\beta\)
  • as we increase \(\alpha\) from zero the centralities increase and eventually there comes a point at which they diverge
  • this happens at the point where \((I - \alpha A)\) is singular, i.e., when \(\mathrm{det}(I - \alpha A) = 0\)
  • rewriting this condition as \[\mathrm{det}(A - \alpha^{-1} I) = 0\] we see that it is simply the characteristic equation, whose roots \(\alpha^{-1}\) are the eigenvalues of the adjacency matrix
  • as \(\alpha\) increases from 0 (hence \(\alpha^{-1}\) decreases), the determinant first crosses zero when \(\alpha^{-1} = \lambda_1\), the largest eigenvalue of \(A\), or alternatively when \(\alpha = 1/\lambda_1\)
  • thus, we should choose a value \(0 < \alpha < 1/\rho(A)\), where \(\rho(A)\) is the spectral radius of \(A\); a direct check of this recipe follows below
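
The recipe can be checked directly (a sketch on a hypothetical small directed graph; the closed form \(x = \beta (I - \alpha A)^{-1}\) is computed with base R and compared with igraph's alpha_centrality):

library(igraph)
g = make_graph(c(1,2, 2,3, 3,1, 1,3), directed = TRUE)
A = as_adjacency_matrix(g, sparse = FALSE)
n = vcount(g)
r = max(abs(eigen(A)$values))                        # spectral radius rho(A)
alpha = 0.5 / r                                      # any 0 < alpha < 1/rho(A) works
beta = rep(1, n)
x = as.vector(beta %*% solve(diag(n) - alpha * A))   # x = beta (I - alpha A)^{-1}
x
alpha_centrality(g, alpha = alpha, exo = 1)          # the same values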

Math

Recall that \[\sum_{i=0}^{\infty} x^i = \frac{1}{1-x}\] whenever \(|x| < 1\). This result also holds for matrices, with the condition \(|x| < 1\) replaced by the requirement that the spectral radius be less than one.

Hence, if \(\rho(\alpha A) = \alpha \rho(A) < 1\), that is, \(\alpha < 1/ \rho(A)\), we can express the Katz centrality as follows:

\[x = \beta (I - \alpha A)^{-1} = \beta \sum_{i=0}^{\infty} (\alpha A)^i = \beta + \beta \alpha A + \beta \alpha^2 A^2 + \ldots \]

  • recall that the entries of \(A^i\) count the paths of length \(i\) between nodes of the graph
  • the Katz centrality of a node is hence the number of weighted paths reaching the node in the network, plus an exogenous factor: a generalization of the in-degree measure, which counts only paths of length one
  • long paths are weighted less than short ones through the attenuation factor \(\alpha\); this is reasonable, since endorsements devalue over long chains of links
  • for small (close to 0) values of \(\alpha\) the contribution given by paths longer than one rapidly declines, and thus Katz scores are mainly influenced by short paths (mostly in-degrees) and by the exogenous factor of the system
  • when the damping factor is large, long paths are devalued smoothly, and Katz scores are more influenced by the endogenous topological part of the system (a truncated-series check follows below)
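
The series expansion can be verified numerically by truncation (a sketch reusing g, A, alpha and beta from the previous snippet):

x_series = beta
term = beta
for (i in 1:50) {
  term = as.vector(alpha * (term %*% A))   # beta alpha^i A^i
  x_series = x_series + term
}
x_series                                   # approaches the exact Katz vector above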

Math

  • as for the value of constant \(\beta\), it is possible to assign to each node \(i\) a different minimum amount of status \(\beta_i\)
  • nodes with higher exogenous status are, for some reason, privileged from the start
  • for instance, in the setting of the Web, we might bias the result towards topics, like contemporary art or ballet, that are of particular interest to the user to which the Web page ranking is served

Code

Function alpha_centrality computes Katz centrality for a graph:

g = read_graph(file = "EVGraphDirected.gml", format = "gml")
coords = layout_in_circle(g)

plot(g,
     layout = coords,
     vertex.size = 20, 
     vertex.color = "white",
     edge.arrow.size = 0.4)

A = as_adjacency_matrix(g, sparse = FALSE)
eig = eigen(A)$values
r = max(abs(eig))
alpha = 0.85 / r
(x =  alpha_centrality(g, alpha = alpha, exo = 1))
##         A         B         C         D         E         F         G         H 
##  17.73198 203.77891 174.21208  19.68468  21.98198  19.68468   1.00000   1.00000 
##         I         L         M 
##   1.00000   1.00000   1.00000
plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(x, 0),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

alpha = 0.5 / r
x =  alpha_centrality(g, alpha = alpha, exo = 1)

plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(x, 2),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

alpha = 0.15 / r
x =  alpha_centrality(g, alpha = alpha, exo = 1)

plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(x, 2),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

PageRank

  • a potential problem with Katz centrality is the following: if a node with high centrality links many others then all those others get high centrality
  • in many cases, however, it means less if a node is only one among many to be linked
  • the centrality gained by virtue of receiving a link from an important node should be diluted if the important vertex is very magnanimous with endorsements
  • PageRank is an adjustment of Katz centrality that takes into consideration this issue
  • it was proposed (and patented) by Sergey Brin and Larry Page (The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 1998).

PageRank

There are three distinct factors that determine the PageRank of a node:

  1. the number of links it receives
  2. the centrality of the linkers
  3. the link propensity of the linkers

PageRank

The PageRank thesis might be summarized as follows:

A node is important if it is linked from other important and link parsimonious nodes or if it is highly linked.

Math

Let \(A = (a_{i,j})\) be the adjacency matrix of a directed graph.

The PageRank centrality \(x_{i}\) of node \(i\) is given by:

\[x_i = \alpha \sum_k \frac{a_{k,i}}{d_k} \, x_k + \beta\]

where \(\alpha\) and \(\beta\) are constants and \(d_k\) is the out-degree of node \(k\) if such degree is positive, or \(d_k = 1\) if the out-degree of \(k\) is null.

In matrix form we have:

\[x = \alpha x D^{-1}A + \beta\]

where \(\beta\) is now a vector whose elements are all equal to a given positive constant and \(D^{-1}\) is a diagonal matrix with \(i\)-th diagonal element equal to \(1/d_{i}\).

It follows that \(x\) can be computed as:

\[x = \beta (I - \alpha D^{-1}A)^{-1}\]

The damping factor \(\alpha\) and the personalization vector \(\beta\) play the same role seen for Katz centrality. In particular, \(\alpha\) should be chosen between \(0\) and \(1 / \rho(D^{-1}A)\); a direct check follows below.
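
A direct check of this formulation (a sketch on a hypothetical small directed graph with no zero out-degree nodes; the linear system is solved with base R and compared with igraph's page_rank):

library(igraph)
g = make_graph(c(1,2, 1,3, 2,3, 3,1), directed = TRUE)
A = as_adjacency_matrix(g, sparse = FALSE)
n = vcount(g)
d = pmax(rowSums(A), 1)                              # out-degrees, with d = 1 for null out-degree
P = A / d                                            # D^{-1} A: each row divided by its out-degree
alpha = 0.85
beta = rep((1 - alpha) / n, n)
x = as.vector(beta %*% solve(diag(n) - alpha * P))   # x = beta (I - alpha D^{-1}A)^{-1}
x / sum(x)                                           # normalized to sum 1
page_rank(g, damping = alpha)$vector                 # the same values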

An alternative formulation: the random surfer

Consider the matrix \[Q = D^{-1}A\]

and let us call \(P\) the matrix \(Q\) in which we replace any row full of 0s – which correspond to nodes with zero out-degree – with a row full of \(1/n\), where \(n\) is the size of \(A\).

Notice that \(P\) is row-stochastic, that is, each row sums to 1. Let \[G = \alpha P + (1 - \alpha) T\] where \(T\) is a matrix full of \(1/n\) and \(\alpha > 0\) is a constant called the damping factor. Since \(T\) is also row-stochastic and \(G\) is a convex combination of row-stochastic matrices, \(G\) is also row-stochastic.

Moreover, \(G\) is irreducible, since its graph is complete and hence strongly connected.

The PageRank centrality can be computed as the vector \(x\) such that \[x = x G\]

Since \(G\) is stochastic, its dominant eigenvalue is \(\lambda_1 = 1\), hence \(x\) is the dominant eigenvector of matrix \(G\). By virtue of the Perron-Frobenius theorem, the solution \(x\) is unique and positive.

Matrix \(G\) is called the Google matrix. It models a random surfer of the Web. With probability \(\alpha\), historically set to \(0.85\), it follows the links on the visited Web pages, and with probability \(1 - \alpha\) it is teleported to a random Web page.

Moreover, the PageRank vector \(x\) is also the stationary vector of a Markov chain with transition matrix \(G\). Hence:

The PageRank of a node is the long-run fraction of time a random surfer of the graph (Web) spends visiting the node.

Notice that:

  • a node with many neighbors will be visited frequently
  • if some neighbors of a node are central (hence visited often) and parsimonious in their links, then the node will be visited often

Assuming that \(x e = 1\), that is, vector \(x\) sums to one, we have that

\[x G = x (\alpha P + (1 - \alpha) T) = \alpha x P + e \, (1 - \alpha)/n \]

hence \[x = x G\] is equivalent to the linear system formulation \[x = \alpha x P + e \, (1 - \alpha)/n\] with \(\alpha\) equal to the damping factor and \(\beta = e \, (1 - \alpha)/n\).
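
The random surfer formulation can be checked explicitly (a sketch reusing g, A, n and alpha from the previous snippet):

Q = A / pmax(rowSums(A), 1)
Q[rowSums(A) == 0, ] = 1 / n              # replace zero rows with 1/n: Q becomes P
G = alpha * Q + (1 - alpha) / n           # G = alpha P + (1 - alpha) T
x = rep(1 / n, n)
for (i in 1:100) x = as.vector(x %*% G)   # power method: x converges to the solution of x = x G
x
page_rank(g, damping = alpha)$vector      # the same values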

Code

Function page_rank computes PageRank centrality for a graph:

g = read_graph(file = "EVGraphDirected.gml", format = "gml")
coords = layout_in_circle(g)

plot(g,
     layout = coords,
     vertex.size = 20, 
     vertex.color = "white",
     edge.arrow.size = 0.4)

pr =  page_rank(g)
# equivalent to:
pr =  page_rank(g, 
                damping = 0.85, 
                personalized = rep(1, vcount(g)))

(x = pr$vector)
##          A          B          C          D          E          F          G 
## 0.03278149 0.38440095 0.34291029 0.03908709 0.08088569 0.03908709 0.01616948 
##          H          I          L          M 
## 0.01616948 0.01616948 0.01616948 0.01616948
plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(x, 2),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

pr =  page_rank(g, damping = 0.15)
(x = pr$vector)
##          A          B          C          D          E          F          G 
## 0.08478337 0.12976638 0.09789382 0.08472679 0.12595853 0.08472679 0.07842886 
##          H          I          L          M 
## 0.07842886 0.07842886 0.07842886 0.07842886
plot(g, 
     layout = coords,
     vertex.size = 30, 
     vertex.color = "white",
     vertex.label = round(x, 3),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

Dig deeper

PageRank centrality, regarded as a ranking measure, is a remarkably old method. Early pioneers of this technique are:

  • Wassily W. Leontief (The Structure of American Economy, 1919-1929. Harvard University Press, 1941)
  • John R. Seeley (The net of reciprocal influence: A problem in treating sociometric data. The Canadian Journal of Psychology, 1949).

Read the full story.

HITS

  • so far, a node is important if it contains valuable content and hence receives many links from other important sources
  • nodes with no incoming links accumulate, in the best case, only a minimum amount of centrality, regardless of how many other useful information sources they reference
  • one can argue that a node is important also because it links to other important vertices
  • for instance, a review paper may refer to other authoritative sources: it is important because it tells us where to find trustworthy information
  • or an important art collector can tell us what is good art

HITS

Thus, there are now two types of central nodes:

  • authorities, that contain reliable information on the topic of interest
  • hubs, that tell us where to find authoritative information.
  • a node may be both an authority and a hub: for instance, a review paper may be highly cited because it contains useful content, and it may also cite other useful sources
  • or an important art collector may be also a successful artist

HITS

The Kleinberg centrality thesis reads:

A node is an authority if it is linked to by hubs; it is a hub if it links to authorities.

This method was conceived by Jon M. Kleinberg (Authoritative sources in a hyperlinked environment. In ACM-SIAM Symposium on Discrete Algorithms, 1998).

Math

Let \(A = (a_{i,j})\) be the adjacency matrix of a directed graph. The authority centrality \(x_{i}\) of node \(i\) is given by:

\[x_i = \alpha \sum_k a_{k,i} \, y_k\]

and hub centrality \(y_{i}\) of node \(i\) is given by:

\[y_i = \beta \sum_k a_{i,k} \, x_k\]

where \(\alpha\) and \(\beta\) are constants. In matrix form we have:

\[\begin{array}{lcl} x & = & \alpha y A \\ y & = & \beta x A^T \\ \end{array}\]

or, combining the two:

\[\begin{array}{lcl} \lambda x & = & x A^T A \\ \lambda y & = & y A A^T \\ \end{array}\]

where \(\lambda = (\alpha \beta)^{-1}\).

Hence the authority vector is an eigenvector of the authority matrix \(A^T A\) and the hub vector is an eigenvector of the hub matrix \(A A^T\) (the matrices have the same eigenvalues).
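
A direct check (a sketch on a hypothetical small directed graph; the dominant eigenvectors of the authority and hub matrices are compared with igraph's scores, which are normalized to have maximum value 1):

library(igraph)
g = make_graph(c(1,2, 1,3, 2,3, 3,1), directed = TRUE)
A = as_adjacency_matrix(g, sparse = FALSE)
auth = eigen(t(A) %*% A)$vectors[, 1]   # dominant eigenvector of the authority matrix
hub = eigen(A %*% t(A))$vectors[, 1]    # dominant eigenvector of the hub matrix
abs(auth) / max(abs(auth))              # matches authority_score(g)$vector
abs(hub) / max(abs(hub))                # matches hub_score(g)$vector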

Play

  1. show how to compute the hub vector \(y\) using the authority vector \(x\) (using only one vector-matrix product)
  2. show how to compute the authority vector \(x\) using the hub vector \(y\) (using only one vector-matrix product)

Solution

  1. Notice that \(\lambda x A^T = x A^T (A A^T)\). Hence \(x A^T\) is an eigenvector of the hub matrix \(A A^T\) with the same eigenvalue \(\lambda\). Thus, once the authority vector \(x\) has been computed, we can get the hub vector as \(y = x A^T\), or, equivalently, the hub score of \(i\) is the sum of the authority scores of the nodes linked to by \(i\).
  2. Similarly, \(x = y A\), or, equivalently, the authority score of \(i\) is the sum of the hub scores of the nodes that link to \(i\) (a numeric check follows below).
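
A numeric check of these relations (a sketch reusing g, A, auth and hub from the previous snippet):

y = as.vector(auth %*% t(A))   # y = x A^T: hub scores from authority scores
abs(y) / max(abs(y))           # matches the hub vector up to normalization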

Code

Functions authority_score and hub_score compute authority and hub centralities:

g = read_graph(file = "EVGraphDirected.gml", format = "gml")
coords = layout_in_circle(g)

plot(g,
     layout = coords,
     vertex.size = 20, 
     vertex.color = "white",
     edge.arrow.size = 0.4)

authority =  authority_score(g)$vector
hub =  hub_score(g)$vector

round(authority, 2)
##    A    B    C    D    E    F    G    H    I    L    M 
## 0.10 1.00 0.00 0.11 0.85 0.11 0.00 0.00 0.00 0.00 0.00
round(hub, 2)
##    A    B    C    D    E    F    G    H    I    L    M 
## 0.00 0.00 0.54 0.60 0.67 1.00 1.00 1.00 1.00 0.46 0.46
plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(authority, 2),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)

plot(g, 
     layout = coords,
     vertex.size = 25, 
     vertex.color = "white",
     vertex.label = round(hub, 2),
     vertex.label.cex = 0.80,
     edge.arrow.size = 0.4)