Geodesic paths and distances

  • a shortest path, or geodesic path, between two nodes in a graph is a path with the minimum number of edges
  • the length of a geodesic path is called geodesic distance or shortest distance.

Small worlds

  • one large-scale property of networks is the average geodesic distance between pairs of nodes in the network
  • one of the most discussed network phenomena is the small-world effect: in most real network the typical geodesic distance is surprisingly short, in particular when compared with the number of nodes of the network
  • check the small-world effect using the ErdÅ‘s number on MathSciNet collaboration distance tool

Milgram’s experiment

  • The first experimental evidence of the small-world phenomenon is made by psychologist Stanley Milgram in the 1960s with his now-famous small-world experiment
  • Milgram was interested to quantifying the typical distance between actors in a social network
  • He sent a set of packages, 96 in total, to recipients randomly chosen from the telephone directory in the US town of Omaha, Nebraska. The packages contained written instructions asking the recipients to attempt to get an included passport to a specified target individual, a friend of Milgram’s who lived in Boston, Massachusetts, over a thousand miles away
  • The only information supplied about the target was his name, geographic location, and occupation
  • But the passport holders were not allowed simply to send the passport to the given address. Instead, they were asked to pass it to someone they knew on a first-name basis and more specifically the person in this category who they felt would stand the best chance of getting the passport to the intended target
  • Each receiver was asked to repeat the process so that after a succession of such steps, a path on the social network of acquaintances, the passport would find its way to the target person
  • The length of such path provided an upper bound on the geodesic distance in the social network between the starting and ending individuals.
  • Of the 96 passports sent out, 18 found their way to the target in Boston
  • Milgram asked participants to record in the passport each step of the path taken. He found that the mean length for completed paths was 5.9 steps
  • This result is the origin of the idea of the six degrees of separation, the popular belief that there are only about six steps between any two people in the world
  • The phrase six degrees of separation is in fact more recent and comes from a popular Broadway play by John Guare, later made into a film, in which the lead character discusses Milgram’s work.

Milgram’s experiment: from local to global information

  • Having the complete network at disposal, the problem of finding the shortest path between two nodes is easy to resolve using a breath-first visit of the graph
  • Kleinberg noticed that, despite participants in Milgram’s experiment had only local information about their neighbourhood of acquaintances, they managed someway to find remarkably short paths to the target, which is a global property of the social graph
  • This happened because participants had a good knowledge about which of their friends was the closest to the target in terms of geodesic distance and they chose to send the passport to this closest friend

Small world effect

  • the small-world effect has a substantial implication for networked systems
  • suppose a rumor (or a disease) is spread over a social network
  • clearly it will reach people much faster if it is only about six steps from any person to any other than if it is a hundred
  • similarly, the speed with which one gets a response from another computer on the Internet depends on how many hops data packets have to make as they traverse the network

The small world size

  • in fact, once one looks deeply into the mathematics of networks, one discovers that the small-world effect is not so surprising after all
  • mathematical models of networks suggest that path lengths in networks should typically scale as \(\log n\) with the number \(n\) of network nodes, and should therefore tend to remain small even for large networks
  • the rough idea behind this short distances is that the number of nodes one encounters in a breath-first visit of a graph starting from a source node typically increases exponentially with the distance to the source node, hence the distance increases logarithmically in the number of visited nodes
  • it is observed that distances on a graph normally distribute; this means that the average distance represents a typical value of all distances in the network
  • furthermore, since the distribution of distances drops off rapidly around the mean, the time-consuming computation of the exact average distance can be approximated by computing the average distance on a relatively small random sample of node pairs
  • both random and preferential attachment network models generate small-world networks

Code

library(igraph)

# Zachary graph
g = make_graph("Zachary")

# number of nodes
vcount(g)
## [1] 34
# average distance (degree of separation)
mean_distance(g)
## [1] 2.4082
# maximum geodesic distance (diameter)
diameter(g)
## [1] 5
# highlight the diameter
d = get_diameter(g)
V(g)$color  = "white"
E(g)$color = "grey"
E(g)$width = 1
E(g, path=d)$color = "red"
E(g, path=d)$width = 2
V(g)[d]$color = "red"
plot(g, vertex.label = NA, vertex.size=5)

# histogram of distances
distance_table(g)
## $res
## [1]  78 265 137  73   8
## 
## $unconnected
## [1] 0
paths = distance_table(g)$res
names(paths) = 1:length(paths)
barplot(paths / sum(paths), xlab="Distance", ylab="Frequency")