Epidemics on networks

Epidemics

This part is taken from Chapter 21 of book Networks, crowds and markets by David Easley and Jon Kleinberg

the study of epidemic disease has always been a topic where biological issues mix with social ones
when we talk about epidemic disease, we will be thinking of contagious diseases caused by biological pathogens — things like influenza, measles, and sexually transmitted diseases, which spread from person to person
epidemics can pass explosively through a population, or they can persist over long time periods at low levels; they can experience sudden flare-ups or even wave-like cyclic patterns of increasing and decreasing prevalence

Diseases and networks that transmit them

the patterns by which epidemics spread through groups of people is determined by the properties of the pathogen carrying it — including its contagiousness, the length of its infectious period, and its severity
but also by network structures within the population it is affecting
the social network within a population — recording who knows whom — determines a lot about how the disease is likely to spread from one person to another
but more generally, the opportunities for a disease to spread are given by a contact network: there is a node for each person, and an edge if two people come into contact with each other in a way that makes it possible for the disease to spread from one to the other
for a highly contagious disease, involving airborne transmission based on coughs and sneezes, the contact network will include a huge number of links, including any pair of people who sat together on a bus or an airplane
for a disease requiring close contact, or a sexually transmitted disease, the contact network will be much sparser, with many fewer pairs of people connected by links

Non-human populations

contact networks are also important in understanding how diseases spread through animal populations, with researchers tracing out the interactions within livestock populations during epidemics such as the 2001 foot-and-mouth outbreak in the United Kingdom
as well as plant populations, where the affected individuals occupy fixed locations and diseases tend to have a much clearer spatial footprint
similar models have been employed for studying the spread of computer viruses, with malicious software spreading between computers across an underlying communication network

The random transmission assumption

with diseases there is a lack of decision-making in the transmission of the disease from one person to another
the process is sufficiently complex and unobservable at the person-to-person level that it is most useful to model it as random
that is, we will generally assume that when two people are directly linked in the contact network, and one of them has the disease, there is a given probability that he or she will pass it to the other
this use of randomness allows us to abstract away questions about the mechanics of how one person catches a disease from another for which we have no useful simple models

Branching processes

We begin with perhaps the simplest model of contagion, which we refer to as a branching process. It works as follows:

First wave. Suppose that a person carrying a new disease enters a population, and transmits it to each person he meets independently with a probability of \(p\). Further, suppose that he meets \(k\) people while he is contagious; let’s call these \(k\) people the first wave of the epidemic. Based on the random transmission of the disease from the initial person, some of the people in the first wave may get infected with the disease, while others may not
Second wave. Now, each person in the first wave goes out into the population and meets \(k\) different people, resulting in a second wave of \(k \cdot k = k^2\) people. Each infected person in the first wave passes the disease independently to each of the \(k\) second-wave people they meet, again independently with probability \(p\)
Subsequent waves. Further waves are formed in the same way, by having each person in the current wave meet \(k\) new people, passing the disease to each independently with probability \(p\)

The basic reproductive number

there are really only two possibilities for a disease in the branching process model: it reaches a wave where it infects no one, thus dying out after a finite number of steps
or it continues to infect people in every wave, proceeding infinitely through the contact network
and it turns out that there is a simple condition to tell these two possibilities apart, based on a quantity called the basic reproductive number of the disease
the basic reproductive number, denoted \(R_0\), is the expected number of new cases of the disease caused by a single individual
since in our model everyone meets \(k\) new people and infects each with probability \(p\), the basic reproductive number here is given by \[R_0 = p \cdot k\]

A dichotomy for branching processes

The outcome of the disease in a branching process model is determined by whether the basic reproductive number is smaller or larger than 1.

Claim: If \(R_0 < 1\), then with probability 1, the disease dies out after a finite number of waves. If \(R_0 > 1\), then with probability greater than 0 the disease persists by infecting at least one person in each wave.

A dichotomy for branching processes

when \(R_0 < 1\), the disease isn’t able to replenish itself: each infected person produces less than one new case in expectation, and so the size of the outbreak is constantly trending downward
when \(R_0 > 1\), on the other hand, the size of outbreak is constantly trending upward
notice, however, that even when \(R_0 > 1\), the conclusion is simply that the disease persists with positive probability, not with absolute certainty
in other words, even an ultra-contagious disease can simply get unlucky and vanish from the population before it has a chance to really get going

Play

Around the critical value \(R_0 = 1\), it can be worth investing large amounts of effort even to produce small shifts in the basic reproductive number.

Considering that \(R_0\) is the product of the two terms \(p\) and \(k\), propose public-health measures to reduce \(R_0\).

Solution

maintaining a social distance among individuals, quarantining infectious people, or even locking down regions or entire countries, which reduces the quantity \(k\)
encouraging behavioral measures such as better sanitary practices to reduce the spread of germs (using masks, washing hands, and so on), which reduces the quantity \(p\)

Analysis of branching processes

Recall the branching process model: each infected individual meets \(k\) others and infects each with probability \(p\).

Thus, the expected number of new cases of the disease caused by each infected individual is \[R_0 = p \cdot k,\] the basic reproductive number.

We want to show that the persistence of the disease depends critically on whether \(R_0\) is smaller or larger than 1, a notion that we will formulate as follows.

Recall that the population in this model is organized into a tree in which every node is connected to \(k\) nodes just below it. Let \(q_n\) denote the probability that the epidemic survives for at least \(n\) waves — in other words, that some individual in the \(n\)-th level of the tree becomes infected.

Let \(q^∗\) be the limit of \(q_n\) as \(n\) goes to infinity:

\[\lim_{n \rightarrow \infty} q_n = q^* \]

We can think of \(q^*\) as the probability that the disease persists indefinitely. We will prove the following claim.

Claim: (a) If \(R_0 < 1\) then \(q^∗ = 0\). (b) If \(R_0 > 1\) then \(q^∗ > 0\).

A formula for \(q_n\)

The quantity \(q_n\) depends on three more fundamental quantities:

the number of contacts per individual \(k\),
the contagion probability \(p\), and
the level of the tree \(n\).

In fact, it’s difficult to write down a direct formula for \(q_n\) in terms of these quantities, but it’s not hard to express \(q_n\) in terms of \(q_{n-1}\).

Consider the root node, and let’s first ask what it would take for the following event to hold:

(*) The disease spreads through the root node’s first contact j and then continues to persist down to n levels in the part of the tree reachable through j.

Therefore, the probability of the event (∗) is \(p \cdot q_{n−1}\). Or, taking the complementary view, event (∗) fails to hold with probability \[1 − p \cdot q_{n−1}\]

Now, there is a copy of event (∗) for each of the direct contacts of the root node, and each fails to hold with the same probability \(1 − p \cdot q_{n−1}\). Since they’re independent, the probability that they all fail to hold is: \[(1 − p \cdot q_{n−1})^k\]

But this probability is also \(1 − q_n\), since by the definition of \(q_n\), the quantity \(1 − q_n\) is exactly the probability that the disease fails to persist to \(n\) levels.

Therefore, we get:

\[1 − q_n = (1 − p \cdot q_{n−1})^k\] and solving for \(q_n\):

\[q_n = 1 - (1 − p \cdot q_{n−1})^k\] Since we are assuming that the root is infected, and we can treat the root as level 0 of the tree, we have \(q_0 = 1\); this simply says that the root is infected with probability 1. How do we solve for such a recursive equation?

Down the rabbit hole: following the values \(q_n\) to a limit

If we define the function \[f(x) = 1 − (1 − p x)^k\] then we can write \[q_n = 1 - (1 − p \cdot q_{n−1})^k\] as follows: \[q_n = f(q_{n−1})\]

This suggests a very clean, purely algebraic way of formulating our question about \(q^∗\), the limit of \(q_n\). We simply want to study the sequence of values \[1,\ f(1),\ f(f(1)),\ f(f(f (1))),\ \ldots\] obtained by applying \(f\) repeatedly.

Interestingly, this sequence converges to the fix point \(x^*\) of \(f\) such that \[x^* = f(x^*)\]

This means that \(q^*\), the limit of \(q_n\) as \(n\) goes to infinity, is precisely the fix point \(x^*\) of function \(f\).

Down the rabbit hole: following the values \(q_n\) to a limit

Here are some basic facts about \[f(x) = 1 − (1 − p x)^k\] that help in producing this plot:

first, \(f(0) = 0\) and \(f(1) = 1 - (1−p)^k < 1\). This means that the plot of \(f\) passes through the origin, but lies below the line \(y = x\) once x = 1
second, the derivative of f is \(f'(x) = pk(1 − px)^k\). Notice that as \(x\) ranges between 0 and 1, the quantity \(f'(x)\) is positive but monotonically decreasing. This means that \(f\) has an increasing but concave shape
finally, the slope of f at \(x = 0\) is equal to \(f'(0) = pk = R_0\).

The case \(R_0 > 1\)

When \(R_0 > 1\), \(y = f(x)\) starts out above \(y = x\) for small positive values of \(x\) but ends up below it by the time we get to \(x = 1\).

We conclude that \(y = f(x)\) must cross \(y = x\) somewhere in the interval between 0 and 1, at a point \(x^∗ > 0\).

Now, using this plot, let’s take a geometric view of the sequence of values \[1,\ f(1),\ f(f(1)),\ f(f(f (1))),\ \ldots\]

if we’re currently at a particular point \((x, x)\) on the line \(y = x\), we first move vertically to the curve \(y = f(x)\); this puts us at the point \((x, f(x))\)
we then move horizontally back to the line \(y = x\); this puts us at the point \((f(x),f(x))\) as desired.
continuing this process, we pass through all the points in the sequence \(x, f(x), f(f (x)), \ldots\) along the line \(y = x\)

If we start this from \(x = 1\), the process converges to a point \((x^∗, x^∗)\), with \(0 < x^∗ < 1\), where the line \(y = x\) meets the curve \(y = f(x)\).

This shows the part (b) of the claim, that is if \(R_0 > 1\), then \(q^∗ > 0\) and the disease persists.

The case \(R_0 < 1\)

Similarly, when \(R_0 < 1\), \(y = f(x)\) starts out below \(y = x\) and still ends up below it by the time we get to \(x = 1\).

We conclude that \(y = f(x)\) do not cross \(y = x\) somewhere in the interval between 0 and 1.

With a similar reasoning we see that in this case the process converges to 0. This shows the part (a) of the claim, that is if \(R_0 < 1\) then \(q^∗ = 0\) and the disease stops.

Play

What happens when \(R_0 = 1\)?

Solution

Quite mysteriously, David Easley and Jon Kleinberg, the authors of the book from which this part is taken, say nothing about this case.

However, function \(f(x)\) starts out at 0 with the same slope than \(x\), but immediately after 0 the slope of \(f(x)\) is less than the slope of \(x\), hence, in my assessment, a reasoning similar to the case \(R_0 < 1\) holds also in the case \(R_0 = 1\).

The SIR epidemic model

We now develop an epidemic model that can be applied to any network structure. To do this, we preserve the basic ingredients of the branching process model at the level of individual nodes, but make the contact structure much more general.

An individual node in the branching process model goes through three potential stages during the course of the epidemic:

Susceptible (S): Before the node has caught the disease, it is susceptible to infection from its neighbors
Infectious (I): Once the node has caught the disease, it is infectious and has some probability of infecting each of its susceptible neighbors
Removed (R): After a particular node has experienced the full infectious period, this node is removed from consideration, since it no longer poses a threat of future infection

The SIR epidemic model

using this three-stage life cycle for the disease at each node, we now define a model for epidemics on networks
we are given a directed graph representing the contact network; so an edge pointing from \(v\) to \(w\) in the graph means that if \(v\) becomes infected at some point, the disease has the potential to spread directly to \(w\)
to represent a symmetric contact between people, where either has the potential to directly infect the other, we can put in directed edges pointing each way

The SIR epidemic model

The progress of the epidemic is controlled by the contact network structure and by two additional quantities: \(p\) (the probability of contagion) and \(t_I\) (the length of the infection):

time is discrete and evolves in integer steps of equal length
initially, some nodes are in the \(I\) state and all others are in the \(S\) state
each node \(v\) that enters the \(I\) state remains infectious for a fixed number of steps \(t_I\)
during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
after \(t_I\) steps, node \(v\) is no longer infectious or susceptible to further bouts of the disease; we describe it as removed (R), since it is now an inert node in the contact network that can no longer either catch or transmit the disease

The SIR epidemic model

the SIR model is clearly most appropriate for a disease that each individual only catches once in their lifetime
after being infected, a node is removed either because it has acquired lifetime immunity or because the disease has killed it
since each infectious node recovers after some time, on a finite graph after a finite number of steps the disease no longer spreads: each node either never caught the disease (hence it is in state S) or caught the disease and later recovered (hence it is in state R)
thus a key question with an SIR epidemic on a given contact network is to understand how long the outbreak will last, and how many individuals will be affected at different points in time
given a particular contact network on which the disease can spread, the intensity of the epidemic is a function of the probability of contagion \(p\) and the length of the infection \(t_I\)
- if the probability of contagion \(p\) is large, the peak of infections is higher and reached early in time
- if the length of the infection \(t_I\) is large, the desease lasts longer before vanishing
notice also that the branching process model is a special case of the SIR model: it simply corresponds to the SIR model where \(t_I = 1\) and the contact network is an infinite tree, with each node connected to a fixed number of neighbors in the level below

The basic reproductive number

on infinite networks that do not have a tree structure, the simple dichotomy in epidemic behavior determined by the basic reproductive number \(R_0\) does not necessarily hold
in fact, it is not hard to construct an example showing how this dichotomy breaks down
consider the infinite network above and assume there are an infinite number of layers
recall that \(R_0\) the expected number of new cases caused by a randomly chosen node in the network
if the transmission probability is \(p = 2/3\), since each node has \(k = 2\) successor nodes, we have \(R_0 = p \cdot k = 4/3 > 1\)
however, in each layer, there are four edges leading to the next layer, and each will independently fail to transmit the disease with probability \(1/3\)
therefore, with probability \((1/3)^4 = 1/81\), all four edges will fail to transmit the disease — and at this point, these four edges become a roadblock guaranteeing the disease can never reach the portion of the network beyond them
hence, with probability 1, the disease must come to an end after a finite number of layers
however, the notion of the basic reproductive number is still a useful heuristic guide: even when epidemiological modelers do not have a precise condition governing when an epidemic will persist and when it will die out, they find \(R_0\) to be a useful approximate indication of the spreading power of the disease

The network effect

as we said, given a contact network, the intensity of the epidemic depends on the probability of contagion \(p\) and the length of the infection \(t_I\)
however, also the topology of the network has a clear effect on the diffusion of the disease
scale-free networks are more vulnerable to epidemics than random ones, since hub nodes play a dual role: they can easily catch the disease and at the same time they can infect a large number of peers
it turns out that important topological quantities for the diffusion of the disease are the quadratic mean of degrees \[\langle k^2 \rangle = \frac{\sum_i k_{i}^{2}}{n}\] as well as the mean of degrees \[\langle k \rangle = \frac{\sum_i k_{i}}{n}\] and in particular the quantity \[\beta \tau = \ln \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k^2 \rangle - 2 \langle k \rangle} > 0\]
in particular, the lower the quantity \(\beta \tau\) the more likely is the diffusion of the epidemic on the network
notice that in scale-free networks the degrees of nodes are highly skewed, with a significant fraction of nodes (the hubs of the network) with a very large degree
these networks tend to have a large quadratic mean degree \(\langle k^2 \rangle\) compared to the degree mean \(\langle k \rangle\) and hence the value of \(\beta \tau\) tends to 0 since the argument of the logarithm tends to 1 from above
this is not the case for random graphs where node degrees tend to be roughly similar, and hence the argument of the logarithm is larger than 1 in particular for small values of the mean degree \(\langle k \rangle\)
notice, however, that in random graphs with large mean degree, the quadratic mean degree \(\langle k^2 \rangle\) dominates the mean degree \(\langle k \rangle\), hence the argument of the logarithm is again close to 1 and the value of \(\beta \tau\) is close to 0, hence the disease is more severe also on a random graph
this makes sense since with a large mean degree each node have more neighbors that can be infected

library(igraph)

betatau = function(g) {
  d = degree(g)
  summary(d)
  k2 = mean(d * d)
  k = mean(d)
  log( (k2 - k)/(k2 - 2*k) )
}

betatau(sample_pa(1000, m = 4, power = 1, directed = FALSE))

## [1] 0.06919996

# same number of edges 
betatau(sample_gnm(1000, 4000))

## [1] 0.1339569

# increase density
betatau(sample_gnm(1000, 6000))

## [1] 0.08640578

# even more...
betatau(sample_gnm(1000, 8000))

## [1] 0.06430241

The SIS epidemic model

so far we have been considering models for epidemics in which each individual contracts the disease at most once
however, a simple variation on these models allows us to reason about epidemics where nodes can be reinfected multiple times
to represent such epidemics, we have nodes that simply alternate between two possible states:
- Susceptible (S), and
- Infectious (I)
there is no Removed (R) state here; rather, after a node is done with the Infectious state, it cycles back to the Susceptible state and is ready to catch the disease again

The SIS epidemic model

The mechanics of the model SIS process as follows:

initially, some nodes are in the I state and all others are in the S state
each node \(v\) that enters the I state remains infectious for a fixed number of steps \(t_I\)
during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
after \(t_I\) steps, node \(v\) is no longer infectious, and it returns to the \(S\) state

SIS and SIR models on finite graphs

an SIR epidemic on a finite graph is burning through a bounded supply of nodes — since nodes can never be reinfected — and therefore it must come to an end after a relatively small number of steps
an SIS epidemic, on the other hand, can run for an extremely long time as it cycles through the nodes potentially multiple times
however, on a finite graph, there will eventually (with probability 1) come a point in time when all contagion attempts simultaneously fail for \(t_I\) steps in a row, and at this point it will be over
thus, again, a key question with an SIS epidemic on a given contact network is to understand how long the outbreak will last, and how many individuals will be affected at different points in time
the intensity of the epidemic is again a function of the probability of contagion \(p\) and the length of the infection \(t_I\)

The SIRS epidemic model

SIRS combine elements of the SIR and SIS models in a simple way, so that after an infected node recovers, it passes back to the S state after an immunity (or removal) time \(t_R\). In detail, the model works as follows:

initially, some nodes are in the I state and all others are in the S state
each node \(v\) that enters the I state remains infectious for a fixed number of steps \(t_I\)
during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
after \(t_I\) steps, node \(v\) is no longer infectious. It then enters the R state for a fixed number of steps \(t_R\). During this time, it cannot be infected with the disease, nor does it transmit the disease to other nodes
after \(t_R\) steps in the R state, node \(v\) returns to the S state
in this case we have three parameters: both the contagious probability \(p\) and the infection time \(t_I\) are positively correlated with the severity of the disease, while the immunity time \(t_R\) is negatively correlated with the severity of the disease

Synchronization

we now look at a new issue in the global dynamics of a disease — the tendency of epidemics for certain diseases to synchronize across a population, sometimes producing strong oscillations in the number of affected individuals over time
such effects are well-known for diseases including measles and syphilis
the crucial ingredients behind synchronization appear to be a combination of temporary immunity and long-range links (weak ties) in the contact network
roughly, long-range links produce coordination in the timing of flare-ups across dispersed parts of the network
the temporary immunity produces a network-wide deficit in the number and connectivity of susceptible individuals, yielding a large trough in the size of the outbreak that directly follows the peak from the earlier flare-ups

Synchronization and small-world networks

temporary immunity can produce oscillations in very localized parts of the network, with patches of immunity following large numbers of infections in a concentrated area
but for this to produce large fluctuations that can be seen at the level of the full network, the flare-ups of the disease have to be coordinated so that they happen at roughly the same time in many different places
a natural mechanism to produce this kind of coordination is to have a network that is rich in long-range connections, linking otherwise far-apart sections of the network
in a small-world networks, many of the links are strong ties, local and clustered ties connecting nodes with very similar social and geographic characteristics
however, some links are long-range links, corresponding to weak ties that link very different parts of the network
long-range links make it possible for things that happen in one part of the network to quickly affect what is happening elsewhere and are responsible for the synchronization effect observed on SIRS epidemic models

The SIR model in igraph

library(igraph)

# random graph
g = sample_gnm(1000, 4000)

# SIR model simulation
# beta: Non-negative scalar. The rate of infection of an individual 
# that is susceptible and has a single infected neighbor
# gamma: Positive scalar. The rate of recovery of an infected individual 
sm = sir(g, beta=0.25, gamma=0.5, no.sim = 500)

# plot infected
plot(sm, comp = "NI")

# plot susceptibles
plot(sm, comp = "NS")

# plot removed
plot(sm, comp = "NR")

# median and quantile of S, I and R
median(sm)

## $NS
##    [0,0.541] (0.541,1.08]  (1.08,1.62]  (1.62,2.16]  (2.16,2.71]  (2.71,3.25] 
##        998.0        992.0        982.0        962.0        929.0        874.0 
##  (3.25,3.79]  (3.79,4.33]  (4.33,4.87]  (4.87,5.41]  (5.41,5.95]  (5.95,6.49] 
##        799.0        711.0        611.0        511.0        420.0        342.0 
##  (6.49,7.03]  (7.03,7.58]  (7.58,8.12]  (8.12,8.66]   (8.66,9.2]   (9.2,9.74] 
##        275.0        227.0        190.0        162.0        143.0        129.0 
##  (9.74,10.3]  (10.3,10.8]  (10.8,11.4]  (11.4,11.9]  (11.9,12.4]    (12.4,13] 
##        119.0        112.0        107.0        104.0        100.0         97.0 
##    (13,13.5]  (13.5,14.1]  (14.1,14.6]  (14.6,15.2]  (15.2,15.7]  (15.7,16.2] 
##         95.0         94.0         93.0         91.0         91.0         91.0 
##  (16.2,16.8]  (16.8,17.3]  (17.3,17.9]  (17.9,18.4]  (18.4,18.9]  (18.9,19.5] 
##         90.0         92.0         90.0         90.0         89.0         90.0 
##    (19.5,20]    (20,20.6]  (20.6,21.1]  (21.1,21.6]  (21.6,22.2]  (22.2,22.7] 
##         90.0         88.0         89.0         91.0         90.5         89.0 
##  (22.7,23.3]  (23.3,23.8]  (23.8,24.4]  (24.4,24.9]  (24.9,25.4]    (25.4,26] 
##         87.0         87.0         89.0         92.5         88.0         90.5 
##    (26,26.5]  (26.5,27.1]  (27.1,27.6]  (27.6,28.1]  (28.1,28.7]  (28.7,29.2] 
##         82.0         81.0         72.0         82.0         97.0         86.0 
##  (29.2,29.8]  (29.8,30.3]  (30.3,30.8]  (30.8,31.4]  (31.4,31.9]  (31.9,32.5] 
##        110.0         96.0         99.0         96.0         65.0         64.0 
##    (32.5,33]    (33,33.6]  (33.6,34.1]  (34.1,34.6]  (34.6,35.2]  (35.2,35.7] 
##         79.5         62.0         62.0           NA           NA           NA 
##  (35.7,36.3]  (36.3,36.8]  (36.8,37.3]  (37.3,37.9]  (37.9,38.4]    (38.4,39] 
##           NA           NA         62.0           NA           NA         91.0 
##    (39,39.5] 
##         90.0 
## 
## $NI
##    [0,0.541] (0.541,1.08]  (1.08,1.62]  (1.62,2.16]  (2.16,2.71]  (2.71,3.25] 
##          2.0          7.0         14.0         28.0         51.0         90.0 
##  (3.25,3.79]  (3.79,4.33]  (4.33,4.87]  (4.87,5.41]  (5.41,5.95]  (5.95,6.49] 
##        140.0        195.0        248.0        291.0        308.0        312.0 
##  (6.49,7.03]  (7.03,7.58]  (7.58,8.12]  (8.12,8.66]   (8.66,9.2]   (9.2,9.74] 
##        305.0        288.0        265.0        234.0        200.0        168.0 
##  (9.74,10.3]  (10.3,10.8]  (10.8,11.4]  (11.4,11.9]  (11.9,12.4]    (12.4,13] 
##        137.0        113.0         91.0         74.0         59.0         47.0 
##    (13,13.5]  (13.5,14.1]  (14.1,14.6]  (14.6,15.2]  (15.2,15.7]  (15.7,16.2] 
##         38.0         30.0         23.0         18.0         15.0         11.0 
##  (16.2,16.8]  (16.8,17.3]  (17.3,17.9]  (17.9,18.4]  (18.4,18.9]  (18.9,19.5] 
##          9.0          8.0          6.0          5.0          3.5          3.0 
##    (19.5,20]    (20,20.6]  (20.6,21.1]  (21.1,21.6]  (21.6,22.2]  (22.2,22.7] 
##          3.0          2.0          1.0          1.0          1.0          0.5 
##  (22.7,23.3]  (23.3,23.8]  (23.8,24.4]  (24.4,24.9]  (24.9,25.4]    (25.4,26] 
##          1.0          0.0          0.0          0.0          0.0          0.0 
##    (26,26.5]  (26.5,27.1]  (27.1,27.6]  (27.6,28.1]  (28.1,28.7]  (28.7,29.2] 
##          0.0          0.0          4.0          0.0          0.0          0.0 
##  (29.2,29.8]  (29.8,30.3]  (30.3,30.8]  (30.8,31.4]  (31.4,31.9]  (31.9,32.5] 
##          0.0          0.0          0.0          2.0          2.5          3.0 
##    (32.5,33]    (33,33.6]  (33.6,34.1]  (34.1,34.6]  (34.6,35.2]  (35.2,35.7] 
##          2.5          5.0          3.0           NA           NA           NA 
##  (35.7,36.3]  (36.3,36.8]  (36.8,37.3]  (37.3,37.9]  (37.9,38.4]    (38.4,39] 
##           NA           NA          1.0           NA           NA          0.0 
##    (39,39.5] 
##          0.0 
## 
## $NR
##    [0,0.541] (0.541,1.08]  (1.08,1.62]  (1.62,2.16]  (2.16,2.71]  (2.71,3.25] 
##          0.0          1.0          4.0          9.0         19.0         35.0 
##  (3.25,3.79]  (3.79,4.33]  (4.33,4.87]  (4.87,5.41]  (5.41,5.95]  (5.95,6.49] 
##         61.0         95.0        139.0        195.0        256.0        326.0 
##  (6.49,7.03]  (7.03,7.58]  (7.58,8.12]  (8.12,8.66]   (8.66,9.2]   (9.2,9.74] 
##        401.0        471.0        538.0        602.0        657.0        704.0 
##  (9.74,10.3]  (10.3,10.8]  (10.8,11.4]  (11.4,11.9]  (11.9,12.4]    (12.4,13] 
##        744.0        776.0        803.0        823.0        842.0        854.0 
##    (13,13.5]  (13.5,14.1]  (14.1,14.6]  (14.6,15.2]  (15.2,15.7]  (15.7,16.2] 
##        865.0        875.0        882.0        888.0        892.0        896.0 
##  (16.2,16.8]  (16.8,17.3]  (17.3,17.9]  (17.9,18.4]  (18.4,18.9]  (18.9,19.5] 
##        899.0        900.0        902.0        905.0        907.0        907.0 
##    (19.5,20]    (20,20.6]  (20.6,21.1]  (21.1,21.6]  (21.6,22.2]  (22.2,22.7] 
##        908.0        910.0        910.0        908.5        908.0        909.5 
##  (22.7,23.3]  (23.3,23.8]  (23.8,24.4]  (24.4,24.9]  (24.9,25.4]    (25.4,26] 
##        912.0        913.0        910.5        905.5        912.0        908.5 
##    (26,26.5]  (26.5,27.1]  (27.1,27.6]  (27.6,28.1]  (28.1,28.7]  (28.7,29.2] 
##        918.0        919.0        926.0        918.0        901.0        914.0 
##  (29.2,29.8]  (29.8,30.3]  (30.3,30.8]  (30.8,31.4]  (31.4,31.9]  (31.9,32.5] 
##        890.0        904.0        901.0        902.0        932.5        933.0 
##    (32.5,33]    (33,33.6]  (33.6,34.1]  (34.1,34.6]  (34.6,35.2]  (35.2,35.7] 
##        918.0        933.0        935.0           NA           NA           NA 
##  (35.7,36.3]  (36.3,36.8]  (36.8,37.3]  (37.3,37.9]  (37.9,38.4]    (38.4,39] 
##           NA           NA        937.0           NA           NA        909.0 
##    (39,39.5] 
##        910.0

quantile(sm, comp = "NI", prob = 0.50)

##    [0,0.541] (0.541,1.08]  (1.08,1.62]  (1.62,2.16]  (2.16,2.71]  (2.71,3.25] 
##          2.0          7.0         14.0         28.0         51.0         90.0 
##  (3.25,3.79]  (3.79,4.33]  (4.33,4.87]  (4.87,5.41]  (5.41,5.95]  (5.95,6.49] 
##        140.0        195.0        248.0        291.0        308.0        312.0 
##  (6.49,7.03]  (7.03,7.58]  (7.58,8.12]  (8.12,8.66]   (8.66,9.2]   (9.2,9.74] 
##        305.0        288.0        265.0        234.0        200.0        168.0 
##  (9.74,10.3]  (10.3,10.8]  (10.8,11.4]  (11.4,11.9]  (11.9,12.4]    (12.4,13] 
##        137.0        113.0         91.0         74.0         59.0         47.0 
##    (13,13.5]  (13.5,14.1]  (14.1,14.6]  (14.6,15.2]  (15.2,15.7]  (15.7,16.2] 
##         38.0         30.0         23.0         18.0         15.0         11.0 
##  (16.2,16.8]  (16.8,17.3]  (17.3,17.9]  (17.9,18.4]  (18.4,18.9]  (18.9,19.5] 
##          9.0          8.0          6.0          5.0          3.5          3.0 
##    (19.5,20]    (20,20.6]  (20.6,21.1]  (21.1,21.6]  (21.6,22.2]  (22.2,22.7] 
##          3.0          2.0          1.0          1.0          1.0          0.5 
##  (22.7,23.3]  (23.3,23.8]  (23.8,24.4]  (24.4,24.9]  (24.9,25.4]    (25.4,26] 
##          1.0          0.0          0.0          0.0          0.0          0.0 
##    (26,26.5]  (26.5,27.1]  (27.1,27.6]  (27.6,28.1]  (28.1,28.7]  (28.7,29.2] 
##          0.0          0.0          4.0          0.0          0.0          0.0 
##  (29.2,29.8]  (29.8,30.3]  (30.3,30.8]  (30.8,31.4]  (31.4,31.9]  (31.9,32.5] 
##          0.0          0.0          0.0          2.0          2.5          3.0 
##    (32.5,33]    (33,33.6]  (33.6,34.1]  (34.1,34.6]  (34.6,35.2]  (35.2,35.7] 
##          2.5          5.0          3.0           NA           NA           NA 
##  (35.7,36.3]  (36.3,36.8]  (36.8,37.3]  (37.3,37.9]  (37.9,38.4]    (38.4,39] 
##           NA           NA          1.0           NA           NA          0.0 
##    (39,39.5] 
##          0.0

Play

Compare the SIR on random and scale-free graphs:

generate a random and a scale-free graph with the same number of nodes and edges;
compare the quantity \(\beta \tau = \ln \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k^2 \rangle - 2 \langle k \rangle}\)
simulate the SIR model on the two networks and compare how fast arrive and how high are the peaks of infections

Hint: investigate the power parameter of the sample_pa function and increase it to a value larger than 1 (the default)

Solution

g = sample_gnm(1000, 4000)
betatau(g)

## [1] 0.1339164

sm = sir(g, beta=0.25, gamma=0.5, no.sim = 500)
plot(sm, comp = "NI")

#quantile(sm, comp = "NI", prob = 0.50)

g = sample_pa(1000, m = 4, power = 3, directed = FALSE)
betatau(g)

## [1] 0.002263923

sm = sir(g, beta=0.25, gamma=0.5, no.sim = 500)
plot(sm, comp = "NI")

#quantile(sm, comp = "NI", prob = 0.50)

Using the SIR mode and comparing with the random model, in scale-free networks the outbreak of the disease is:

faster (it arrives earlier)
higher (it inflects more nodes)
shorter (it lasts less)

The vaccination problem

suppose we have a disease spreading on a given contact network and a limited quantity of vaccine doses to distribute over the nodes of the network
with what strategy do you choose the nodes to vaccinate?
the benchmark strategy is randomness: choose at random the nodes to vaccinate
however, given the information of the topology of the network, a more effective strategy probably exists

The grandmother and grandchild dilemma

The dilemma is well described by the following excerpt of Wired paper The Vulnerable Can Wait. Vaccinate the Super-Spreaders First:

“Let’s say you’ve got one course of the vaccine and two people to choose between: Candidate 1 is a college student who doesn’t social distance, wears his mask slung beneath his chin, and plays beer pong all weekend at underground frat parties. Candidate 2 is his 87-year-old widowed grandmother, who lives on her own and has barely been out of the house since March. If your goal is to protect the more vulnerable person, you should vaccinate grandma. If your goal is to reduce transmission, you should vaccinate the frat bro. From society’s perspective, he’s a jerk; from the network’s, he’s a hub.”

“Essentially there are two approaches to using a vaccine. One is to protect individuals by vaccinating them, and the other is to reduce transmission and therefore protect the population.”

The AIDS epidemic in Africa

A similar dilemma held with AIDS epidemic in sub-Saharan Africa, where the US government had just announced an ambitious program to combat the disease.

“The Bush administration was giving the treatment to mothers with children because that sounds really good, and it’s soft and cozy. But what our measurement has shown is, no, no, no, you should actually give the HIV drugs to prostitutes, because those are the ones who are the biggest hubs when it comes to the spread of HIV - Albert-László Barabási”

We live in a scale-free world

if the contacts in the network were random, each node would have roughly the same degree, hence the random vaccination strategy would be as effective as any other strategy
however, the dilemma stems from the fact that the world we live in, more particularly the contact network at hand, is not random
on the contrary, the network is probably scale-free, with a long tail degree distribution in which few nodes (the vital few or super-spreaders) have a large number of contacts and the majority of nodes (the trivial many or vulnerable) have a tiny number of contacts
we have seen that scale-free networks are more vulnerable than random networks to epidemics, meaning the outbreak of the disease is faster and larger
moreover, as we have seen when talking about percolation, removing nodes in decreasing order of degree has a larger effect on the size of the resulting giant component then adopting a random removal approach

From local to global information

“To knock out the super-spreaders, the ideal target for a vaccine would be someone with many contacts in different settings – someone with a big, multigenerational family, a job that led to a lot of mixing with strangers, and a busy social life - Christopher Cox”

the practical problem is that we do not know the whole contact network and hence cannot detect the hubs, or super-spreaders, of the disease
can you find the hubs without a complete map?
we have already seen with the Milgram’s small world experiment that it’s possible to estimate a global property of a network (in this case the average separation distance among two nodes) using only local information about the nodes (the most likely neighbor to reach a target node)

Acquaintance immunization strategy

in 2003, during the first SARS epidemic, Shlomo Havlin, a physicist at Bar-Ilan University near Tel Aviv, proposed one of the most ingenious solutions to this problem
in a paper called Efficient Immunization Strategies for Computer Networks and Populations, Havlin and two colleagues argued that you could achieve global effects on a complex network using only local knowledge
all you had to do was follow a simple script:

Take a random sample of a population, ask each individual to name a single acquaintance, and vaccinate the acquaintance.

Acquaintance immunization and the friendship paradox

acquaintance immunization works because of a phenomenon known as the friendship paradox, which holds that, on average, your friends have more friends than you do
so, by selecting one on your friends, you have a bigger change to land on a hub
the very act of asking someone to choose a friend, any friend, played out over hundreds or thousands of iterations, leads inevitably to the most connected people

Play

Imagine a star graph with one center and \(n\) leaves. Suppose that, besides the core node, each leaf has also a second contact, different for every leaf.

Now ask each leaf to randomly choose one neighbor.

What is the probability for the core node to be chosen?
What is the probability for any contact of a leaf to be chosen?

Solution

For each leaf, the core node is chosen with probability \(1/2\). Hence, it is chosen \(n /2\) times after the \(n\) choices of the leaves. It follows that in general the core node is chosen with probability \(\frac{n/2}{n} = 1/2\).

For each leaf, the leaf’s contact node is chosen with probability \(1/2\), but, since it has no other contacts, \(1 /2\) is also the number of times it is chosen after the \(n\) choices of the leaves. It follows that in general the leaf’s contact node is chosen with probability \(\frac{1/2}{n} = \frac{1}{2n}\).

Using the acquaintance immunization strategy, the probability that the core node is not vaccinated is \(\frac{1}{2^n}\), hence the probability that it is vaccinated is \(1 - \frac{1}{2^n}\) with tends to 1 as \(n\) grows.

Is the clever solution the best solution?

a drawback inherent in all network-based immunization strategies, by their very nature, is that they require the cascading effects of interventions reaching across an entire population
but as soon as you have a disease that’s afflicting millions—or billions—of people, the stakes are too high to start experimenting
with lives on the line, who would choose an immunization plan that has never been tested outside of computer models and college campuses?

“There are some clever things you could try, but I think for a lot of reasons it makes sense to try to be not too clever - Marc Lipsitch”

“The highest-risk populations are clearly defined, and it would be foolish not to protect them directly. Once we have protected the high-risk strata, but we don’t have the resources to immediately blanket the rest of the population with vaccine—at that point, differential strategies might be beneficial - Alessandro Vespignani”

Dig deeper

Chapter 21 of book Networks, crowds and markets
Chapter 16 of book Networks
The Vulnerable Can Wait. Vaccinate the Super-Spreaders First