Small-world networks

A shortest path, or geodesic path, between two nodes in a graph is a path with the minimum number of edges. If the graph is weighted, it is a path with the minimum sum of edge weights. The length of a geodesic path is called geodesic distance or shortest distance. Geodesic paths are not necessarily unique, but the geodesic distance is well-defined since all geodesic paths have the same length.

One large-scale property of networks is the average geodesic distance between pairs of nodes in the network. One of the most discussed network phenomena is the small-world effect. In most real network the typical geodesic distance is surprisingly short, in particular when compared with the number of nodes of the network.

The first experimental evidence of the small-world phenomenon is made by psychologist Stanley Milgram in the 1960s with his now-famous small-world experiment. Milgram was interested to quantifying the typical distance between actors in a social network. The geodesic distance between two vertices in a network is the length, in terms of number of edges, of a shortest path connecting the two nodes. Milgram wanted to measure the typical geodesic distance in real networks and to do this he concocted the following experiment. Milgram sent a set of packages, 96 in total, to recipients randomly chosen from the telephone directory in the US town of Omaha, Nebraska. The packages contained written instructions asking the recipients to attempt to get an included passport to a specified target individual, a friend of Milgram's who lived in Boston, Massachusetts, over a thousand miles away. The only information supplied about the target was his name, address, and occupation. But the passport holders were not allowed simply to send the passport to the given address. Instead, they were asked to pass it to someone they knew on a first-name basis and more specifically the person in this category who they felt would stand the best chance of getting the passport to the intended target. Each receiver was asked to repeat the process so that after a succession of such steps, a path on the social network of acquaintances, the passport would find its way to the target person. The length of such path provided an upper bound on the geodesic distance in the social network between the starting and ending individuals.

Of the 96 passports sent out, 18 found their way to the target in Boston. Milgram asked participants to record in the passport each step of the path taken. He found that the mean length for completed paths was 5.9 steps. This result is the origin of the idea of the six degrees of separation, the popular belief that there are only about six steps between any two people in the world. The phrase six degrees of separation is in fact more recent and comes from a popular Broadway play by John Guare, later made into a film, in which the lead character discusses Milgram's work.

For many reasons Milgram's experiment should be taken with a large pinch of salt. Milgram used only a single target; all the initial recipients were in a single town; there is no guarantee that the chains took the shortest possible route to the target; finally, most of the chains were not completed. Even so, the fundamental result that vertex pairs on social networks tend on average to be connected by short paths with respect to the number of nodes of the network is now widely accepted, and has moreover been shown to extend to many other kind of networks.

Milgram found that most of the passports that did find their way to the target did so via just three of the target's friends, that Milgram called sociometric superstars. This phenomenon, that is the fact that a large number of the paths from any vertex to the rest of the network go through a limited number of neighbours of the vertex, is sometimes referred to as the funneling effect.

Having the complete network at disposal, the problem of finding the shortest path between two nodes is easy to resolve using a breath-first visit of the graph. Kleinberg noticed that, despite participants in Milgram's experiment had only local information about their neighbourhood of acquaintances, they managed someway to find remarkably short paths to the target. This happened because participants had a good knowledge about which of their friends was the closest to the target in terms of geodesic distance and they chose to send the passport to this closest friend. Related to this, Killworth and Bernard made a reverse small-world experiment in which they asked participants to imagine that they were taking part is a small-world experiment. Participants were asked what they wanted to know about the target person in order to make a decision about whom to forward the message. Interestingly, the three characteristics more often indicated were the same three pieces of information that Milgram provided in his original experiment, namely name, geographic location, and occupation.

The small-world effect has a substantial implication for networked systems. Suppose a rumor (or a disease) is spread over a social network. Clearly it will reach people much faster if it is only about six steps from any person to any other than if it is a hundred. Similarly, the speed with which one gets a response from another computer on the Internet depends on how many hops data packets have to make as they traverse the network.

In fact, once one looks deeply into the mathematics of networks, one discovers that the small-world effect is not so surprising after all. Mathematical models of networks suggest that path lengths in networks should typically scale as $\log n$ with the number $n$ of network nodes, and should therefore tend to remain small even for large networks. The rough idea behind this short distances is that the number of nodes one encounters in a breath-first visit of a graph starting from a source node typically increases exponentially with the distance to the source node, hence the distance increases logarithmically in the number of visited nodes. Both random and preferential attachment network models generate small-world networks.

One can also examine the diameter of the network, which is the largest geodesic distance in the graph. The diameter is usually found to be very small as well and calculations using network models suggest that it should scale logarithmically with the number of nodes as the average geodesic distance. The diameter is, however, a less useful measure for real networks than the average distance, since it really only measure the distance between one pair of vertices at the extreme end of the distribution of distances. Moreover, the diameter is sensible to small changes to the set of nodes, which makes it a poor indicator of the behavior of the network as a whole.

For example, we computed the geodesic distances for all pairs of nodes in the computer science collaboration network. The next barplot shows the share of geodesics having a given length. Notice that geodesics have typically very short lengths compared to the number of nodes: 19% of geodesics have length 5, 33% have length 6, and 26% have length 7. The average geodesic distance is 6.41, and, interestingly, distances normally distribute around this peak. The largest distance is also remarkably small: 23 (there are 8 different geodesics with this length).

These are good news for the computing community: not only the collaboration network is mostly connected, but the average distance is short, and the longest one is not that longer. This means that scientific information can spread quickly on the great majority of the computing community through its collaboration network.

The fact that distances normally distribute is interesting because it means that the average distance of 6 links represents a typical value of all distances in the network. Furthermore, since the distribution of distances drops off rapidly around the mean, the time-consuming computation of the exact average distance can be approximated by computing the average distance on a relatively small random sample of node pairs.