Epidemics

  • the study of epidemic disease has always been a topic where biological issues mix with social ones
  • when we talk about epidemic disease, we will be thinking of contagious diseases caused by biological pathogens — things like influenza, measles, and sexually transmitted diseases, which spread from person to person
  • epidemics can pass explosively through a population, or they can persist over long time periods at low levels; they can experience sudden flare-ups or even wave-like cyclic patterns of increasing and decreasing prevalence

Diseases and networks that transmit them

  • the patterns by which epidemics spread through groups of people is determined by the properties of the pathogen carrying it — including its contagiousness, the length of its infectious period, and its severity
  • but also by network structures within the population it is affecting
  • the social network within a population — recording who knows whom — determines a lot about how the disease is likely to spread from one person to another
  • but more generally, the opportunities for a disease to spread are given by a contact network: there is a node for each person, and an edge if two people come into contact with each other in a way that makes it possible for the disease to spread from one to the other
  • for a highly contagious disease, involving airborne transmission based on coughs and sneezes, the contact network will include a huge number of links, including any pair of people who sat together on a bus or an airplane
  • for a disease requiring close contact, or a sexually transmitted disease, the contact network will be much sparser, with many fewer pairs of people connected by links

Non-human populations

  • contact networks are also important in understanding how diseases spread through animal populations, with researchers tracing out the interactions within livestock populations during epidemics such as the 2001 foot-and-mouth outbreak in the United Kingdom
  • as well as plant populations, where the affected individuals occupy fixed locations and diseases tend to have a much clearer spatial footprint
  • similar models have been employed for studying the spread of computer viruses, with malicious software spreading between computers across an underlying communication network

The random transmission assumption

  • with diseases there is a lack of decision-making in the transmission of the disease from one person to another
  • the process is sufficiently complex and unobservable at the person-to-person level that it is most useful to model it as random
  • that is, we will generally assume that when two people are directly linked in the contact network, and one of them has the disease, there is a given probability that he or she will pass it to the other
  • this use of randomness allows us to abstract away questions about the mechanics of how one person catches a disease from another for which we have no useful simple models

Branching processes

We begin with perhaps the simplest model of contagion, which we refer to as a branching process. It works as follows:

  • First wave. Suppose that a person carrying a new disease enters a population, and transmits it to each person he meets independently with a probability of \(p\). Further, suppose that he meets \(k\) people while he is contagious; let’s call these \(k\) people the first wave of the epidemic. Based on the random transmission of the disease from the initial person, some of the people in the first wave may get infected with the disease, while others may not
  • Second wave. Now, each person in the first wave goes out into the population and meets \(k\) different people, resulting in a second wave of \(k \cdot k = k^2\) people. Each infected person in the first wave passes the disease independently to each of the \(k\) second-wave people they meet, again independently with probability \(p\)
  • Subsequent waves. Further waves are formed in the same way, by having each person in the current wave meet \(k\) new people, passing the disease to each independently with probability \(p\)

The basic reproductive number

  • there are really only two possibilities for a disease in the branching process model: it reaches a wave where it infects no one, thus dying out after a finite number of steps
  • or it continues to infect people in every wave, proceeding infinitely through the contact network
  • and it turns out that there is a simple condition to tell these two possibilities apart, based on a quantity called the basic reproductive number of the disease
  • the basic reproductive number, denoted \(R_0\), is the expected number of new cases of the disease caused by a single individual
  • since in our model everyone meets \(k\) new people and infects each with probability \(p\), the basic reproductive number here is given by \[R_0 = p \cdot k\]

A dichotomy for branching processes

The outcome of the disease in a branching process model is determined by whether the basic reproductive number is smaller or larger than 1.

Claim: If \(R_0 < 1\), then with probability 1, the disease dies out after a finite number of waves. If \(R_0 > 1\), then with probability greater than 0 the disease persists by infecting at least one person in each wave.

A dichotomy for branching processes

  • when \(R_0 < 1\), the disease isn’t able to replenish itself: each infected person produces less than one new case in expectation, and so the size of the outbreak is constantly trending downward
  • when \(R_0 > 1\), on the other hand, the size of outbreak is constantly trending upward
  • notice, however, that even when \(R_0 > 1\), the conclusion is simply that the disease persists with positive probability, not with absolute certainty
  • in other words, even an ultra-contagious disease can simply get unlucky and vanish from the population before it has a chance to really get going

Public-health measures

  • this suggests that around the critical value \(R_0 = 1\), it can be worth investing large amounts of effort even to produce small shifts in the basic reproductive number
  • since \(R_0\) is the product of the two terms \(p\) and \(k\), it is in fact easy to interpret two basic kinds of public-health measures in terms of reductions to \(R_0\)
    • maintaining a social distance among individuals, quarantining infectious people, or even locking down regions or entire countries, which reduces the quantity \(k\)
    • encouraging behavioral measures such as better sanitary practices to reduce the spread of germs (using masks, washing hands, and so on), which reduces the quantity \(p\)

Analysis of branching processes

Recall the branching process model: each infected individual meets \(k\) others and infects each with probability \(p\).

Thus, the expected number of new cases of the disease caused by each infected individual is \[R_0 = p \cdot k,\] the basic reproductive number.

We want to show that the persistence of the disease depends critically on whether \(R_0\) is smaller or larger than 1, a notion that we will formulate as follows.

Recall that the population in this model is organized into a tree in which every node is connected to \(k\) nodes just below it. Let \(q_n\) denote the probability that the epidemic survives for at least \(n\) waves — in other words, that some individual in the \(n\)-th level of the tree becomes infected.

Let \(q^∗\) be the limit of \(q_n\) as \(n\) goes to infinity; we can think of this as the probability that the disease persists indefinitely. We will prove the following claim.

Claim: (a) If \(R_0 < 1\) then \(q^∗ = 0\). (b) If \(R_0 > 1\) then \(q^∗ > 0\).

A formula for \(q_n\)

The quantity \(q_n\) depends on three more fundamental quantities:

  • the number of contacts per individual \(k\),
  • the contagion probability \(p\), and
  • the level of the tree \(n\).

In fact, it’s difficult to write down a direct formula for \(q_n\) in terms of these quantities, but it’s not hard to express \(q_n\) in terms of \(q_{n-1}\).

Consider the root node, and let’s first ask what it would take for the following event to hold:

(*) The disease spreads through the root node’s first contact j and then continues to persist down to n levels in the part of the tree reachable through j.

Therefore, the probability of the event (∗) is \(p \cdot q_{n−1}\). Or, taking the complementary view, event (∗) fails to hold with probability \[1 − p \cdot q_{n−1}\]

Now, there is a copy of event (∗) for each of the direct contacts of the root node, and each fails to hold with the same probability \(1 − p \cdot q_{n−1}\). Since they’re independent, the probability that they all fail to hold is: \[(1 − p \cdot q_{n−1})^k\]

But this probability is also \(1 − q_n\), since by the definition of \(q_n\), the quantity \(1 − q_n\) is exactly the probability that the disease fails to persist to \(n\) levels.

Therefore, and solving for \(q_n\) we get:

\[1 − q_n = (1 − p \cdot q_{n−1})^k\] and solving for \(q_n\):

\[q_n = 1 - (1 − p \cdot q_{n−1})^k\] Since we are assuming that the root is infected, and we can treat the root as level 0 of the tree, we have \(q_0 = 1\); this simply says that the root is infected with probability 1. How do we solve for such a recursive equation?

Down the rabbit hole: following the values \(q_n\) to a limit

If we define the function \[f(x) = 1 − (1 − p x)^k\] then we can write \[q_n = 1 - (1 − p \cdot q_{n−1})^k\] as follows: \[q_n = f(q_{n−1})\]

This suggests a very clean, purely algebraic way of formulating our question about \(q^∗\), the limit of \(q_n\). We simply want to study the sequence of values \[1,\ f(1),\ f(f(1)),\ f(f(f (1))),\ \ldots\] obtained by applying \(f\) repeatedly.

Interestingly, this sequence converges to the fix point \(x^*\) of \(f\) such that \[x^* = f(x^*)\]

Down the rabbit hole: following the values \(q_n\) to a limit

Here are some basic facts about \[f(x) = 1 − (1 − p x)^k\] that help in producing this plot:

  • first, \(f(0) = 0\) and \(f(1) = 1 - (1−p)^k < 1\). This means that the plot of \(f\) passes through the origin, but lies below the line \(y = x\) once x = 1
  • second, the derivative of f is \(f'(x) = pk(1 − px)^k\). Notice that as \(x\) ranges between 0 and 1, the quantity \(f'(x)\) is positive but monotonically decreasing. This means that \(f\) has an increasing but concave shape
  • finally, the slope of f at \(x = 0\) is equal to \(f'(0) = pk = R_0\).

The case \(R_0 > 1\)

When \(R_0 > 1\), \(y = f(x)\) starts out above \(y = x\) for small positive values of \(x\) but ends up below it by the time we get to \(x = 1\).

We conclude that \(y = f(x)\) must cross \(y = x\) somewhere in the interval between 0 and 1, at a point \(x^∗ > 0\).

Now, using this plot, let’s take a geometric view of the sequence of values \[1,\ f(1),\ f(f(1)),\ f(f(f (1))),\ \ldots\]

  1. if we’re currently at a particular point \((x, x)\) on the line \(y = x\), we first move vertically to the curve \(y = f(x)\); this puts us at the point \((x, f(x))\)
  2. we then move horizontally back to the line \(y = x\); this puts us at the point \((f(x),f(x))\) as desired.
  3. continuing this process, we pass through all the points in the sequence \(x, f(x), f(f (x)), \ldots\) along the line \(y = x\)

If we start this from \(x = 1\), the process converges to a point \((x^∗, x^∗)\), with \(0 < x^∗ < 1\), where the line \(y = x\) meets the curve \(y = f(x)\).

This shows the part (b) of the claim, that is if \(R_0 > 1\), then \(q^∗ > 0\) and the disease persists.

The case \(R_0 < 1\)

Similarly, when \(R_0 < 1\), \(y = f(x)\) starts out below \(y = x\) and still ends up below it by the time we get to \(x = 1\).

We conclude that \(y = f(x)\) do not cross \(y = x\) somewhere in the interval between 0 and 1.

With a similar reasoning we see that in this case the process converges to 0. This shows the part (a) of the claim, that is if \(R_0 < 1\) then \(q^∗ = 0\) and the disease stops.

Challenge

What happens when \(R_0 = 1\)?

The SIR epidemic model

We now develop an epidemic model that can be applied to any network structure. To do this, we preserve the basic ingredients of the branching process model at the level of individual nodes, but make the contact structure much more general.

An individual node in the branching process model goes through three potential stages during the course of the epidemic:

  • Susceptible (S): Before the node has caught the disease, it is susceptible to infection from its neighbors
  • Infectious (I): Once the node has caught the disease, it is infectious and has some probability of infecting each of its susceptible neighbors
  • Removed (R): After a particular node has experienced the full infectious period, this node is removed from consideration, since it no longer poses a threat of future infection

The SIR epidemic model

  • using this three-stage life cycle for the disease at each node, we now define a model for epidemics on networks
  • we are given a directed graph representing the contact network; so an edge pointing from \(v\) to \(w\) in the graph means that if \(v\) becomes infected at some point, the disease has the potential to spread directly to \(w\)
  • to represent a symmetric contact between people, where either has the potential to directly infect the other, we can put in directed edges pointing each way

The SIR epidemic model

The progress of the epidemic is controlled by the contact network structure and by two additional quantities: \(p\) (the probability of contagion) and \(t_I\) (the length of the infection):

  • time is discrete and evolves in integer steps of equal length
  • initially, some nodes are in the \(I\) state and all others are in the \(S\) state
  • each node \(v\) that enters the \(I\) state remains infectious for a fixed number of steps \(t_I\)
  • during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
  • after \(t_I\) steps, node \(v\) is no longer infectious or susceptible to further bouts of the disease; we describe it as removed (R), since it is now an inert node in the contact network that can no longer either catch or transmit the disease

The SIR epidemic model

  • the SIR model is clearly most appropriate for a disease that each individual only catches once in their lifetime
  • after being infected, a node is removed either because it has acquired lifetime immunity or because the disease has killed it
  • since each infectious node recovers after some time, on a finite graph after a finite number of steps the disease no longer spreads: each node either never caught the disease (hence it is in state S) or caught the disease and later recovered (hence it is in state R)
  • thus a key question with an SIR epidemic on a given contact network is to understand how long the outbreak will last, and how many individuals will be affected at different points in time
  • given a particular contact network on which the disease can spread, the intensity of the epidemic is a function of the probability of contagion \(p\) (the larger, the more severe the epidemic) and the length of the infection \(t_I\) (the smaller, the less severe the epidemic)
  • notice also that the branching process model is a special case of the SIR model: it simply corresponds to the SIR model where \(t_I = 1\) and the contact network is an infinite tree, with each node connected to a fixed number of neighbors in the level below

The basic reproductive number

  • on infinite networks that do not have a tree structure, the simple dichotomy in epidemic behavior determined by the basic reproductive number \(R_0\) does not necessarily hold
  • in fact, it is not hard to construct an example showing how this dichotomy breaks down
  • consider the infinite network above and assume there are an infinite number of layers
  • recall that \(R_0\) the expected number of new cases caused by a randomly chosen node in the network
  • if the transmission probability is \(p = 2/3\), since each node has \(k = 2\) successor nodes, we have \(R_0 = p \cdot k = 4/3 > 1\)
  • however, in each layer, there are four edges leading to the next layer, and each will independently fail to transmit the disease with probability \(1/3\)
  • therefore, with probability \((1/3)^4 = 1/81\), all four edges will fail to transmit the disease — and at this point, these four edges become a roadblock guaranteeing the disease can never reach the portion of the network beyond them
  • hence, with probability 1, the disease must come to an end after a finite number of layers
  • however, the notion of the basic reproductive number is still a useful heuristic guide: even when epidemiological modelers do not have a precise condition governing when an epidemic will persist and when it will die out, they find \(R_0\) to be a useful approximate indication of the spreading power of the disease

The network effect

  • as we said, given a contact network, the intensity of the epidemic depends on the probability of contagion \(p\) and the length of the infection \(t_I\)
  • however, also the topology of the network has a clear effect on the diffusion of the disease
  • important quantities are the quadratic mean of degrees \[\langle k^2 \rangle = \frac{\sum_i k_{i}^{2}}{n}\] as well as the mean of degrees \[\langle k \rangle = \frac{\sum_i k_{i}}{n}\] and in particular the quantity \[\beta \tau = \ln \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k^2 \rangle - 2 \langle k \rangle} > 0\]
  • in particular, the lower the quantity \(\beta \tau\) the more likely is the diffusion of the epidemic on the network
  • notice that in scale-free networks the degrees of nodes are highly skewed, with a significant fraction of nodes (the hubs of the network) with a very large degree
  • these networks tend to have a large quadratic degree mean \(\langle k^2 \rangle\) compared to the degree mean \(\langle k \rangle\) and hence the value of \(\beta \tau\) tends to 0 since the argument of the logarithm tends to 1 from above
  • this is not the case for random graphs where node degrees tend to be roughly similar
  • summing up, scale-free networks are more vulnerable to epidemics than random ones

The SIS epidemic model

  • so far we have been considering models for epidemics in which each individual contracts the disease at most once
  • however, a simple variation on these models allows us to reason about epidemics where nodes can be reinfected multiple times
  • to represent such epidemics, we have nodes that simply alternate between two possible states:
    • Susceptible (S), and
    • Infectious (I)
  • there is no Removed (R) state here; rather, after a node is done with the Infectious state, it cycles back to the Susceptible state and is ready to catch the disease again

The SIS epidemic model

The mechanics of the model SIS process as follows:

  • initially, some nodes are in the I state and all others are in the S state
  • each node \(v\) that enters the I state remains infectious for a fixed number of steps \(t_I\)
  • during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
  • after \(t_I\) steps, node \(v\) is no longer infectious, and it returns to the \(S\) state

SIS and SIR models on finite graphs

  • an SIR epidemic on a finite graph is burning through a bounded supply of nodes — since nodes can never be reinfected — and therefore it must come to an end after a relatively small number of steps
  • an SIS epidemic, on the other hand, can run for an extremely long time as it cycles through the nodes potentially multiple times
  • however, on a finite graph, there will eventually (with probability 1) come a point in time when all contagion attempts simultaneously fail for \(t_I\) steps in a row, and at this point it will be over
  • thus, again, a key question with an SIS epidemic on a given contact network is to understand how long the outbreak will last, and how many individuals will be affected at different points in time
  • the intensity of the epidemic is again a function of the probability of contagion \(p\) (the larger, the more severe the epidemic) and the length of the infection \(t_I\) (the smaller, the less severe the epidemic)

The SIRS epidemic model

SIRS combine elements of the SIR and SIS models in a simple way, so that after an infected node recovers, it passes back to the S state. In detail, the model works as follows:

  • initially, some nodes are in the I state and all others are in the S state
  • each node \(v\) that enters the I state remains infectious for a fixed number of steps \(t_I\)
  • during each of these \(t_I\) steps, \(v\) has a probability \(p\) of passing the disease to each of its susceptible neighbors
  • after \(t_I\) steps, node \(v\) is no longer infectious. It then enters the R state for a fixed number of steps \(t_R\). During this time, it cannot be infected with the disease, nor does it transmit the disease to other nodes
  • after \(t_R\) steps in the R state, node \(v\) returns to the S state

Synchronization

  • we now look at a new issue in the global dynamics of a disease — the tendency of epidemics for certain diseases to synchronize across a population, sometimes producing strong oscillations in the number of affected individuals over time
  • such effects are well-known for diseases including measles and syphilis
  • the crucial ingredients behind synchronization appear to be a combination of temporary immunity and long-range links (weak ties) in the contact network
  • roughly, long-range links produce coordination in the timing of flare-ups across dispersed parts of the network
  • the temporary immunity produces a network-wide deficit in the number and connectivity of susceptible individuals, yielding a large trough in the size of the outbreak that directly follows the peak from the earlier flare-ups

Synchronization and small-world networks

  • temporary immunity can produce oscillations in very localized parts of the network, with patches of immunity following large numbers of infections in a concentrated area
  • but for this to produce large fluctuations that can be seen at the level of the full network, the flare-ups of the disease have to be coordinated so that they happen at roughly the same time in many different places
  • a natural mechanism to produce this kind of coordination is to have a network that is rich in long-range connections, linking otherwise far-apart sections of the network
  • in a small-world networks, many of the links are strong ties, local and clustered ties connecting nodes with very similar social and geographic characteristics
  • however, some links are long-range links, corresponding to weak ties that link very different parts of the network
  • long-range links make it possible for things that happen in one part of the network to quickly affect what is happening elsewhere and are responsible for the synchronization effect observed on SIRS epidemic models

Challenge

  1. implement in R the models SIR, SIS and SIRS
  2. run them on networks generated with the random model and with the preferential attachment model
  3. track the nodes in different states (S, I and R). Do you notice any difference between the two network models?

The SIR model in igraph

library(igraph)

# random graph
g = sample_gnm(100, 200)

# SIR model simulation
# beta: Non-negative scalar. The rate of infection of an individual 
# that is susceptible and has a single infected neighbor
# gamma: Positive scalar. The rate of recovery of an infected individual 
sm = sir(g, beta=5, gamma=1, no.sim = 100)

# plot infected
plot(sm, comp = "NI")

# plot susceptibles
plot(sm, comp = "NS")

# plot removed
plot(sm, comp = "NR")

# median and quantile of S, I and R
median(sm)
## $NS
##     [0,0.268] (0.268,0.537] (0.537,0.805]  (0.805,1.07]   (1.07,1.34] 
##          82.0          41.0          10.0           3.0           2.0 
##   (1.34,1.61]   (1.61,1.88]   (1.88,2.15]   (2.15,2.41]   (2.41,2.68] 
##           2.0           2.0           2.0           2.0           2.0 
##   (2.68,2.95]   (2.95,3.22]   (3.22,3.49]   (3.49,3.76]   (3.76,4.02] 
##           2.0           2.0           2.0           2.0           2.0 
##   (4.02,4.29]   (4.29,4.56]   (4.56,4.83]    (4.83,5.1]    (5.1,5.37] 
##           1.0           2.0           2.0           1.0           2.0 
##   (5.37,5.63]    (5.63,5.9]    (5.9,6.17]   (6.17,6.44]   (6.44,6.71] 
##           2.0           1.0           1.5           1.0           1.5 
##   (6.71,6.98]   (6.98,7.24]   (7.24,7.51]   (7.51,7.78]   (7.78,8.05] 
##           2.0           2.5            NA            NA           1.5 
##   (8.05,8.32]   (8.32,8.59]   (8.59,8.85]   (8.85,9.12]   (9.12,9.39] 
##            NA            NA            NA           1.0           0.0 
## 
## $NI
##     [0,0.268] (0.268,0.537] (0.537,0.805]  (0.805,1.07]   (1.07,1.34] 
##          17.0          52.0          66.0          57.0          44.0 
##   (1.34,1.61]   (1.61,1.88]   (1.88,2.15]   (2.15,2.41]   (2.41,2.68] 
##          34.0          26.0          20.0          15.0          11.0 
##   (2.68,2.95]   (2.95,3.22]   (3.22,3.49]   (3.49,3.76]   (3.76,4.02] 
##           8.0           6.0           5.0           4.0           3.0 
##   (4.02,4.29]   (4.29,4.56]   (4.56,4.83]    (4.83,5.1]    (5.1,5.37] 
##           2.0           2.0           1.0           1.0           1.0 
##   (5.37,5.63]    (5.63,5.9]    (5.9,6.17]   (6.17,6.44]   (6.44,6.71] 
##           0.0           0.0           0.0           0.0           0.0 
##   (6.71,6.98]   (6.98,7.24]   (7.24,7.51]   (7.51,7.78]   (7.78,8.05] 
##           0.0           0.0            NA            NA           0.5 
##   (8.05,8.32]   (8.32,8.59]   (8.59,8.85]   (8.85,9.12]   (9.12,9.39] 
##            NA            NA            NA           0.0           0.0 
## 
## $NR
##     [0,0.268] (0.268,0.537] (0.537,0.805]  (0.805,1.07]   (1.07,1.34] 
##           1.0           6.0          20.0          39.0          54.0 
##   (1.34,1.61]   (1.61,1.88]   (1.88,2.15]   (2.15,2.41]   (2.41,2.68] 
##          64.0          72.0          79.0          83.0          87.0 
##   (2.68,2.95]   (2.95,3.22]   (3.22,3.49]   (3.49,3.76]   (3.76,4.02] 
##          90.0          92.0          93.0          94.0          95.0 
##   (4.02,4.29]   (4.29,4.56]   (4.56,4.83]    (4.83,5.1]    (5.1,5.37] 
##          96.0          96.0          97.0          97.0          97.0 
##   (5.37,5.63]    (5.63,5.9]    (5.9,6.17]   (6.17,6.44]   (6.44,6.71] 
##          98.0          98.0          98.5          99.0          98.5 
##   (6.71,6.98]   (6.98,7.24]   (7.24,7.51]   (7.51,7.78]   (7.78,8.05] 
##          98.0          97.5            NA            NA          98.0 
##   (8.05,8.32]   (8.32,8.59]   (8.59,8.85]   (8.85,9.12]   (9.12,9.39] 
##            NA            NA            NA          99.0         100.0
quantile(sm, comp = "NI", prob = 0.50)
##     [0,0.268] (0.268,0.537] (0.537,0.805]  (0.805,1.07]   (1.07,1.34] 
##          17.0          52.0          66.0          57.0          44.0 
##   (1.34,1.61]   (1.61,1.88]   (1.88,2.15]   (2.15,2.41]   (2.41,2.68] 
##          34.0          26.0          20.0          15.0          11.0 
##   (2.68,2.95]   (2.95,3.22]   (3.22,3.49]   (3.49,3.76]   (3.76,4.02] 
##           8.0           6.0           5.0           4.0           3.0 
##   (4.02,4.29]   (4.29,4.56]   (4.56,4.83]    (4.83,5.1]    (5.1,5.37] 
##           2.0           2.0           1.0           1.0           1.0 
##   (5.37,5.63]    (5.63,5.9]    (5.9,6.17]   (6.17,6.44]   (6.44,6.71] 
##           0.0           0.0           0.0           0.0           0.0 
##   (6.71,6.98]   (6.98,7.24]   (7.24,7.51]   (7.51,7.78]   (7.78,8.05] 
##           0.0           0.0            NA            NA           0.5 
##   (8.05,8.32]   (8.32,8.59]   (8.59,8.85]   (8.85,9.12]   (9.12,9.39] 
##            NA            NA            NA           0.0           0.0

Comparing SIR on random and scale-free graphs

Scale-free networks are more vulnerable than random networks to epidemics, meaning the outbreak of the disease is faster and larger (at least in the SIR epidemic model).

g = sample_gnm(100, 400)
hist(degree(g))

d = degree(g)
k2 = mean(d * d)
k = mean(d)
log( (k2 - k)/(k2 - 2*k) )
## [1] 0.1370599
sm = sir(g, beta=0.25, gamma=0.5, no.sim = 500)
plot(sm, comp = "NI")

quantile(sm, comp = "NI", prob = 0.50)
##    [0,1.18] (1.18,2.35] (2.35,3.53] (3.53,4.71] (4.71,5.89] (5.89,7.06] 
##           4          16          27          32          29          24 
## (7.06,8.24] (8.24,9.42] (9.42,10.6] (10.6,11.8]   (11.8,13]   (13,14.1] 
##          18          12           8           5           4           2 
## (14.1,15.3] (15.3,16.5] (16.5,17.7] (17.7,18.8]   (18.8,20]   (20,21.2] 
##           2           1           0           0           0           0 
## (21.2,22.4] (22.4,23.5] (23.5,24.7] (24.7,25.9] (25.9,27.1] (27.1,28.3] 
##           0           0           0           0          NA           0
g = sample_pa(100, m = 4, power = 3, directed = FALSE)
hist(degree(g))

d = degree(g)
k2 = mean(d * d)
k = mean(d)
log( (k2 - k)/(k2 - 2*k) )
## [1] 0.02415576
sm = sir(g, beta=0.25, gamma=0.5, no.sim = 500)
plot(sm, comp = "NI")

quantile(sm, comp = "NI", prob = 0.50)
##   [0,1]   (1,2]   (2,3]   (3,4]   (4,5]   (5,6]   (6,7]   (7,8]   (8,9]  (9,10] 
##    15.0    32.0    37.0    34.0    25.0    18.0    12.0     8.0     5.5     3.0 
## (10,11] (11,12] (12,13] (13,14] (14,15] (15,16] (16,17] (17,18] (18,19] (19,20] 
##     2.0     1.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 
## (20,21] (21,22] (22,23] (23,24] (24,25] (25,26] (26,27] (27,28] (28,29] 
##     0.0     0.0      NA      NA      NA      NA      NA      NA     0.0

The vaccination problem

  • suppose we have a disease spreading on a given contact network and a limited quantity of vaccine doses to distribute over the nodes of the network
  • with what strategy do you choose the nodes to vaccinate?
  • the benchmark strategy is randomness: choose at random the nodes to vaccinate
  • however, given the information of the topology of the network, a more effective strategy probably exists

The grandmother and grandchild dilemma

The dilemma is well described by the following excerpt of Wired paper The Vulnerable Can Wait. Vaccinate the Super-Spreaders First:

“Let’s say you’ve got one course of the vaccine and two people to choose between: Candidate 1 is a college student who doesn’t social distance, wears his mask slung beneath his chin, and plays beer pong all weekend at underground frat parties. Candidate 2 is his 87-year-old widowed grandmother, who lives on her own and has barely been out of the house since March. If your goal is to protect the more vulnerable person, you should vaccinate grandma. If your goal is to reduce transmission, you should vaccinate the frat bro. From society’s perspective, he’s a jerk; from the network’s, he’s a hub.”

“Essentially there are two approaches to using a vaccine. One is to protect individuals by vaccinating them, and the other is to reduce transmission and therefore protect the population.”

The AIDS epidemic in Africa

A similar dilemma held with AIDS epidemic in sub-Saharan Africa, where the US government had just announced an ambitious program to combat the disease.

“The Bush administration was giving the treatment to mothers with children because that sounds really good, and it’s soft and cozy. But what our measurement has shown is, no, no, no, you should actually give the HIV drugs to prostitutes, because those are the ones who are the biggest hubs when it comes to the spread of HIV.”

We live in a scale-free world

  • if the contacts in the network were random, each node would have roughly the same degree, hence the random vaccination strategy would be as effective as any other strategy
  • however, the dilemma stems from the fact that the world we live in, more particularly the contact network at hand, is not random
  • on the contrary, the network is probably scale-free, with a long tail degree distribution in which few nodes (the vital few or super-spreaders) have a large number of contacts and the majority of nodes (the trivial many or vulnerable) have a tiny number of contacts
  • we have seen that scale-free networks are more vulnerable than random networks to epidemics, meaning the outbreak of the disease is faster and larger
  • moreover, as we have seen when talking about percolation, removing nodes in decreasing order of degree has a larger effect on the size of the resulting giant component then adopting a random removal approach

From local to global information

“To knock out the super-spreaders, the ideal target for a vaccine would be someone with many contacts in different settings – someone with a big, multigenerational family, a job that led to a lot of mixing with strangers, and a busy social life.”

  • the practical problem is that we do not know the whole contact network and hence cannot detect the hubs, or super-spreaders, of the disease
  • can you find the hubs without a complete map?
  • we have already seen with the Milgram’s small world experiment that it’s possible to estimate a global property of a network (in this case the average separation distance among two nodes) using only local information about the nodes (the most likely neighbor to reach a target node)

Acquaintance immunization strategy

  • in 2003, during the first SARS epidemic, Shlomo Havlin, a physicist at Bar-Ilan University near Tel Aviv, proposed one of the most ingenious solutions to this problem
  • in a paper called Efficient Immunization Strategies for Computer Networks and Populations, Havlin and two colleagues argued that you could achieve global effects on a complex network using only local knowledge
  • all you had to do was follow a simple script:

Take a random sample of a population, ask each individual to name a single acquaintance, and vaccinate the acquaintance.

Acquaintance immunization and the friendship paradox

  • acquaintance immunization works because of a phenomenon known as the friendship paradox, which holds that, on average, your friends have more friends than you do
  • so, by selecting one on your friends, you have a bigger change to land on a hub
  • the very act of asking someone to choose a friend, any friend, played out over hundreds or thousands of iterations, leads inevitably to the most connected people

Is the clever solution the best solution?

  • a drawback inherent in all network-based immunization strategies, by their very nature, is that they require the cascading effects of interventions reaching across an entire population
  • but as soon as you have a disease that’s afflicting millions—or billions—of people, the stakes are too high to start experimenting
  • with lives on the line, who would choose an immunization plan that has never been tested outside of computer models and college campuses?

“There are some clever things you could try, but I think for a lot of reasons it makes sense to try to be not too clever.”

“Once we have protected the high-risk strata, but we don’t have the resources to immediately blanket the rest of the population with vaccine—at that point, differential strategies might be beneficial.”

Dig deeper