A large volume of research on networks has been devoted to the concept of centrality. This research addresses the question:
Which are the most important nodes or edges in a network?
A large volume of research on networks has been devoted to the concept of centrality. This research addresses the question:
Which are the most important nodes or edges in a network?
The thesis of degree centrality reads as follows:
A node is important if it has many neighbors, or, in the directed case, if there are many other nodes that link to it.
Let \(A = (a_{i,j})\) be the adjacency matrix of a directed graph and \(e\) is a vector with all components equal to unity.
The in-degree centrality \(x_{i}\) of node \(i\) is given by:
\[x_{i} = \sum_k a_{k,i}\]
or in matrix form:
\[x = e A\]
The out-degree centrality \(y_{i}\) of node \(i\) is given by:
\[y_{i} = \sum_k a_{i,k}\]
or in matrix form:
\[y = e A^T\]
degree
computes unweighted degree centralitystrength
computes weighted degree centralitylibrary(igraph) edges = c(1,3, 1,4, 2,5, 3,2, 4,4, 4,5, 5,3, 5,6, 5,4) dg = graph(edges, directed=TRUE) coords = layout_with_fr(dg) plot(dg, layout=coords, vertex.size = 20, vertex.color = "lightblue", edge.width = 3, edge.color = "black")
degree(dg, mode = "in")
## [1] 0 1 2 3 2 1
degree(dg, mode = "out")
## [1] 2 1 1 2 3 0
degree(dg, mode = "total")
## [1] 2 2 3 5 5 1
edges = c(1,2, 1,5, 2,3, 2,4, 3,4, 3,5, 3,6) wg = graph(edges, directed=TRUE) E(wg)$weight = c(1, 5, 2, 8, 4, 5, 6) coords = layout_with_fr(wg) plot(wg, layout=coords, vertex.size = 20, vertex.color = "lightblue", edge.width = 1, edge.color = "grey", edge.label = E(wg)$weight)
strength(wg, mode = "in")
## [1] 0 1 2 12 10 6
strength(wg, mode = "out")
## [1] 6 10 15 0 0 0
strength(wg, mode = "total")
## [1] 6 11 17 12 10 6
Write a function degreeCentrality
that, given an adjacency matrix \(A\), computes the in, out and total degree of the corresponding graph (without using the degree
function).
# Degree centrality # INPUT # g = graph # mode = degree mode degreeCentrality = function(A, mode) { n = nrow(A) e = rep(1, n); if (mode == "in") x = e %*% A if (mode == "out") x = e %*% t(A) if (mode == "total") x = e %*% (A + t(A)); return(as.vector(x)); } (A = as_adjacency_matrix(dg, sparse=FALSE))
## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 0 0 1 1 0 0 ## [2,] 0 0 0 0 1 0 ## [3,] 0 1 0 0 0 0 ## [4,] 0 0 0 1 1 0 ## [5,] 0 0 1 1 0 1 ## [6,] 0 0 0 0 0 0
degreeCentrality(A, mode = "in")
## [1] 0 1 2 3 2 1
degreeCentrality(A, mode = "out")
## [1] 2 1 1 2 3 0
degreeCentrality(A, mode = "total")
## [1] 2 2 3 5 5 1
The friendship paradox is a phenomenon observed by sociologist Scott L. Feld. It claims that, on average,
I have fewer friends than my friends have :-(
Implications of the friendship paradox include:
On a social network, it formally says that the mean node degree:
\[\mu_1 = \frac{\sum_i k_i}{n}\]
is less than or equal to the mean neighbors degree:
\[\mu_2 = \frac{\sum_{i,j} a_{i,j} k_j}{\sum_{i,j} a_{i,j}}\]
We have that:
\[\mu_2 = \frac{\sum_{i,j} a_{i,j} k_j}{\sum_{i,j} a_{i,j}} = \frac{\sum_{j} \sum_{i} a_{i,j} k_j}{\sum_{j} \sum_{i} a_{i,j}} = \frac{\sum_j k_j \sum_i a_{i,j}}{\sum_j k_j} = \frac{\sum_j k_j \cdot k_j}{\sum_i k_i} = \frac{\langle k^2 \rangle}{\langle k \rangle}\]
where
Also recall that:
\[\mathrm{Var}(X) = \\ E[(X - E[X])^2] = \\ E[X^2 - 2XE[X] + E[X]^2] = \\ E[X^2] - 2E[X]E[X] + E[X]^2 = \\ E[X^2] - E[X]^2\] In other words, the variance of X is equal to the mean of the square of X minus the square of the mean of X.
\[\mu_2 - \mu_1 = \frac{\langle k^2 \rangle}{\langle k \rangle} - \langle k \rangle = \frac{\langle k^2 \rangle - \langle k \rangle^2}{\langle k \rangle} = \frac{\sigma^{2}}{\langle k \rangle} \geq 0\]
Since both the variance \(\sigma^2\) and the mean \(\langle k \rangle\) of the degree distribution are positive quantities, we have that the inequality holds.
In particular we have that \(\mu_1 = \mu_2\) if and only if \(\sigma^2 = 0\) if and only if all nodes have the same degree (the graph is regular). On the other hand, when the distribution of degrees is skewed (the typical case), the inequality is proper.
The reason for the inequality is that hub nodes, which are nodes with a high degree, are counted once in \(\mu_1\), but are counted many times (precisely their degree times) in \(\mu_2\). Hence the latter mean is biased towards hub nodes.
library(igraph) # scale-free network g = sample_pa(100, m=2) d = degree(g) sd(d)
## [1] 5.847938
(mu1 = mean(d))
## [1] 3.94
(mu2 = mean(d*d) / mean(d))
## [1] 12.53299
#random network g = sample_gnm(100, 200) d = degree(g) sd(d)
## [1] 2.247333
(mu1 = mean(d))
## [1] 4
(mu2 = mean(d*d) / mean(d))
## [1] 5.25
Closeness centrality measures the mean distance from a vertex to other vertices.
Given nodes \(i\) and \(j\), the distance \(d_{i,j}\) between them is the length of a shortest path from \(i\) to \(j\).
The mean distance for vertex \(i\) to others is:
\[l_i = \frac{1}{n-1} \sum_{j\neq i} d_{i,j}\] and the closeness centrality for \(i\) is:
\[C_i = \frac{1}{l_i} = \frac{n-1}{\sum_{j\neq i} d_{i,j}}\]
The maximum value for mean distance, hence the minimum for closeness, occurs for a root of a chain graph, a graph where the \(n\) nodes form a linear sequence or chain, where the roots are the two ends of the chain:
\[\frac{1}{n-1} \sum_{i=1}^{n-1} i = \frac{(n-1) n}{2 (n-1)} = \frac{n}{2}\]
The minimum value for mean distance, hence the maximum for closeness, occurs for the central node of a star graph, a network composed of a vertex attached to \(n-1\) other vertices, whose only connection is with the central node:
\[\frac{1}{n-1} \sum_{i=1}^{n-1} 1 = \frac{n-1}{n-1} = 1\]
Function closeness
computes closeness centrality (this function assigns a distance equal to the number of nodes of the graph to pairs of nodes that are not reachable).
g = make_lattice(c(5,5)) plot(g)
x = closeness(g) names(x) = 1:vcount(g) round(x, 3)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 ## 0.010 0.012 0.013 0.012 0.010 0.012 0.014 0.015 0.014 0.012 0.013 0.015 0.017 ## 14 15 16 17 18 19 20 21 22 23 24 25 ## 0.015 0.013 0.012 0.014 0.015 0.014 0.012 0.010 0.012 0.013 0.012 0.010
Let \(n_{s,t}^{i}\) be the number of shortest paths from \(s\) to \(t\) that pass through \(i\) and let \(n_{s,t}\) be the total number of geodesic paths from \(s\) to \(t\).
Then the betweenness centrality of vertex \(i\) is:
\[b_i = \sum_{\begin{array}{l} s\neq t \\ s \neq i \\ t \neq i\end{array}} w_{s,t}^{i} = \sum_{\begin{array}{l} s\neq t \\ s \neq i \\ t \neq i\end{array}} \frac{n_{s,t}^{i}}{n_{s,t}}\]
where the formula counts undirected paths in only one direction and, by convention, the ratio \(w_{s,t}^{i} = 0\) if \(n_{s,t} = 0\).
Notice that each pair of vertex \(s, t\) contribute to the sum for \(i\) with a weight \(w_{s,t}^{i}\) between 0 and 1 expressing the betweenness of \(i\) with respect to the pair \(s, t\).
The maximum possible value for betweenness occurs for the central node of a star graph. All paths go through the central vertex, hence its betweenness is
\[\sum_{i=1}^{n-2} i = (n-2) (n-1) / 2\]
At the other end of the scale, the smallest possible value for betweenness occurs for a leaf node (a node of degree 1). There are no shortest paths containing the leaf node, hence its betweenness is 0.
Function betweenness
computes betweenness centrality.
g = make_star(5, mode = "undirected") plot(g)
x = betweenness(g) names(x) = 1:vcount(g) round(x, 1)
## 1 2 3 4 5 ## 6 0 0 0 0
g = make_lattice(c(5,5)) plot(g)
x = betweenness(g) names(x) = 1:vcount(g) round(x, 0)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ## 3 17 22 17 3 17 45 55 45 17 22 55 66 55 22 17 45 55 45 17 3 17 22 17 3
Typically, shortest paths are considered in the definition of both closeness and betweenness. There are two drawbacks of this approach:
In many applications, however, it is reasonable to consider the abundance and the length of all paths of the network, since communication on the network is enhanced as soon as more routes are possible, in particular if these pathways are short.
Current flow centrality addresses these issues, you can read more in this article: Resistance distance, closeness, and betweenness.