Connected components

A connected component of an undirected graph is a maximal set of nodes such that each pair of nodes is connected by a path.

  • connected components form a partition of the set of graph vertices, meaning that connected components are:
    1. non-empty
    2. pairwise disjoints
    3. the union of connected components forms the set of all vertices
  • equivalently, we can say that the relation that associates two nodes if and only if they belong to the same connected component is an equivalence relation, that is it is:
    1. reflexive
    2. symmetric
    3. transitive

The giant component

  • in real undirected networks, we typically find that there is a large component (the giant component) that fills most of the network
  • both the random model and the preferential attachment model generate networks with giant components
  • in particular, random graphs with \(n\) nodes and edge probability \(p\) show a giant component as soon as \(p > 1/n\)
  • it follows that if the mean node degree is at least one, \(p (n-1) \geq 1\), then \(p \geq 1/(n-1) > 1/n\), and we have the giant component

The giant component

library(igraph)
n = 100
p = 2/n
g = sample_gnp(n, p)
plot(g, vertex.size = 3, vertex.color = "black", vertex.label=NA)

n = 100
p = 1/n
g = sample_gnp(n, p)
plot(g, vertex.size = 3, vertex.color = "black", vertex.label=NA)

Play

Name at least three networks in which the giant component tipically covers the entire graph.

Solution

The Internet is a good example: there would be no point in being part of the Internet if you are not part of its largest component, since that would mean that you are disconnected from and unable to communicate with almost everyone else.

Other examples include power grids, train routes, and electronic circuits.

Biconnected components and more

\(k\)-component is a maximal set of vertices such that each is reachable from each of the others by at least \(k\) node-independent paths (paths that do not share any node unless the source and the target ones).

  • a 1-component is just an ordinary component; a 2-component is called bicomponent; a 3-component is called tricomponent
  • for \(k \geq 2\), a \(k\)-component is nested within a \(k-1\) component: every pair of nodes connected by \(k\) independent paths is also connected by \(k-1\) independent paths
  • for \(k \geq 2\), \(k\)-components may overlap and they do not define a partition of (or equivalence relation on) the set of vertices
  • for \(k \geq 3\), the \(k\)-components in a network can be non-contiguous

Play

  • find a graph with two overlapping bicomponents
  • find a graph with a non-contiguous tricomponent

Solution

Connectivity

  • the number of node-independent paths between two nodes is also known as the connectivity of the two nodes
  • the minimum connectivity of any pair of nodes in a graph is the connectivity (or cohesion) of the graph
  • connectivity of two nodes can be thought as a measure of how strongly connected the two nodes are

Connectivity

  • the smallest set of vertices that would have to be removed in order to disconnect two nodes not linked by an edge is called the minimum cut set for the two vertices
  • it holds that the size of the minimum cut set for two nodes (not linked by an edge) is exactly the connectivity of the two nodes
  • it follows that a \(k\)-component remains connected as soon as less than \(k\) nodes (and the incident edges) are removed
  • for instance, a bicomponent is still connected if we remove any of its nodes
  • for these reasons, the idea of \(k\)-component is associated with the idea of network robustness
  • for instance, one would hope that most of the Internet network backbone is a \(k\)-component with high \(k\), so that it would be difficult for routers on the backbone to lose connection with one another

Strongly connected components

Two vertices are in the same strongly connected component if each can reach and is reachable from the other along a directed path

  • strongly connected components form a partition of the vertex set and define an equivalence relation among nodes
  • not all directed networks have a large strongly connected component
  • in particular, any acyclic network has no strongly connected components of size greater than one
  • real-life networks that are (almost) acyclic are article citation networks and food webs
  • associated with each strongly connected component is an out-component: the set of vertices that can be reached from the strongly connected component but that cannot reach it
  • and an in-component: the set of vertices that can reach the strongly connected component but that are note reachable from it

The bow-tie structure of the Web

Code – Connected components

n = 100
p = 2/n
g = sample_gnp(n, p)

# connected?
is_connected(g)
## [1] FALSE
# get the components 
(c = components(g))
## $membership
##   [1]  1  2  3  3  3  3  3  3  4  3  3  3  3  3  3  3  3  5  3  6  3  7  8
##  [24]  3  3  3  3  3  3  3  3  9  3  3  3  3  3 10 11  3  3  3  3  3 12  8
##  [47]  3  3 13  3  5  3  3  3  3 14  3  3  3  3  3  3  3 14  3 15 16  3  3
##  [70]  3  3 17 11 18  3 19  3  3 20  9  3  3  3 21  3  3 22  3  3  3  3  3
##  [93]  3 23  3  3  3 24 25 26
## 
## $csize
##  [1]  1  1 70  1  2  1  1  2  2  1  2  1  1  2  1  1  1  1  1  1  1  1  1
## [24]  1  1  1
## 
## $no
## [1] 26
# get the giant component
nodes = which(c$membership == which.max(c$csize))

# color in red the nodes in the giant component
V(g)$color = "white"
V(g)[nodes]$color = "black"
plot(g, vertex.size = 3, vertex.label=NA)

Code – Bicomponents

n = 100
p = 3/n
g = sample_gnp(n, p)

bc = biconnected_components(g)

# number of bicomponents
bc$no
## [1] 23
# print nodes of biconnected components
bcn = bc$components
bcn[[1]]
## + 2/100 vertices, from 3b61d80:
## [1] 13  6
# print edges of biconnected components
bce = bc$component_edges
bce[[1]]
## + 1/144 edge from 3b61d80:
## [1] 6--13
# size of bicomponents
size = sapply(bcn, length)
table(size) 
## size
##  2 71 
## 22  1
# highlight the largest bi-component
giant = bcn[[which.max(size)]]
V(g)$color  = "white"
V(g)[giant]$color = "black"
plot(g, vertex.label=NA, vertex.size=5)

k-connected components

g = make_graph("Zachary")

# connectivity of the graph
vertex_connectivity(g)
## [1] 1
# this is computationally **heavy**
b = cohesive_blocks(g)

# print hieararchy of blocks
print(b)
## Cohesive block structure:
## B-1         c 1, n 34
## '- B-2      c 2, n 28   oooo...ooo ..oooo.ooo oooooooooo oooo 
##    '- B-4   c 4, n  5   oooo...o.. .......... .......... .... 
##    '- B-5   c 3, n  7   ooo.....o. .......... .......... o.oo 
##    '- B-7   c 4, n  5   oooo...... ...o...... .......... .... 
##    '- B-8   c 3, n 10   ..o....... .......... ...ooo.ooo .ooo 
## '- B-3      c 2, n  6   o...ooo... o.....o... .......... .... 
##    '- B-6   c 3, n  5   o...ooo... o......... .......... ....
# plot hieararchy of blocks
plot_hierarchy(b) 

# nodes from blocks
bn = blocks(b)

# graphs from blocks
bg = graphs_from_cohesive_blocks(b, g)

# plot most cohesive graph
c = cohesion(b)
h = bg[[which.max(c)]]
plot(h, vertex.size = 3, vertex.label=NA)

Code – Strongly connected components

n = 10
p = 2/n
g = sample_gnp(n, p, directed=TRUE)
coords = layout_with_fr(g)
plot(g, layout=coords, 
     vertex.size = 6, vertex.label=NA, 
     edge.arrow.size = 0.5)

# strong components
c = components(g, mode="strong")
nodes = which(c$membership == which.max(c$csize))

# color in red the nodes in the giant component
V(g)$color = "white"
V(g)[nodes]$color = "black"
plot(g, layout=coords, 
     vertex.size = 6, vertex.label=NA, 
     edge.arrow.size = 0.5)

Percolation

  • related to network connectivity and robustness is percolation, one of the simplest of processes taking place on networks
  • imagine taking a network and removing some fraction of its vertices, along with the edges connected to those vertices: this process is called percolation
  • one of the goals of percolation theory on networks is to understand how the knock-on effects of vertex removal affect the network as a whole

Failure of Internet routers

  • the failure of routers on the Internet, for instance, can be formally represented by removing the corresponding vertices and their attached edges from a network representation of the Internet.
  • in fact, about 3% of the routers on the Internet are non-functional for one reason or another at any one time, and it is a question of some practical interest what effect this will have on the performance of the network

Vaccination

  • another example of a percolation phenomenon is the vaccination or immunization of individuals against the spread of disease
  • diseases spread through populations over the networks of contacts between individuals, but if an individual is vaccinated against a disease and therefore cannot catch it, then that individual does not contribute to the spread of the disease
  • of course, the individual is still present in the network, but, from the point of view of the spread of the disease, might as well be absent, and hence the vaccination process can again be formally represented by removing vertices

Percolation strategies

  • there is more than one way in which vertices can be removed from a network
  • in the simplest case they could be removed purely at random
  • one popular alternative removal scheme is to remove vertices according to their degree in decreasing order
  • this approach turns out to make an effective vaccination strategy for the control of disease
  • the vaccination of an individual in a population not only prevents that individual from becoming infected but also prevents them from infecting others
  • any other node centrality measures can be used as an alternative and might be more efficient

Play

Consider the following code for the percolation process on a network. It increamentally remove from the graph a given number of nodes from a given vector of nodes and compute the size of the giant component after each removal

# percolation removes nodes from a graph and computes 
# the size of the giant connected component
# INPUT
# g: graph to percolate
# size: number of nodes to remove 
# d: removal vector
# OUTPUT
# giant: a vector with sizes of giant components when nodes are removed
percolate = function(g, size, d) {
  
  giant = vector()
  
  # initial size of giant component
  c = components(g)
  giant[1] = max(c$csize)
  
  # find vital nodes
  names(d) = 1:length(d)
  d = sort(d, decreasing=TRUE)
  vital = as.integer(names(d[1:size]))
  
  # compoute size of giant component after incremental removal 
  for (i in 1:size) {
    c = components(delete_vertices(g, vital[1:i]))
    giant[i+1] = max(c$csize)
  }
  
  return(giant)
  
}

Test the percolation process on random and preferential attachment networks: do you see any difference? Why?

Solution

# Preferential attachment graph
g = sample_pa(n = 100, m = 2, directed=FALSE)

# resilience
size = vcount(g)/2
# random
rand = percolate(g, size, d = sample(V(g), size))    
# degree
deg = percolate(g, size, d = degree(g))    
# pagerank
pr = percolate(g, size, d=page_rank(g)$vector)    
# betweenness
bet = percolate(g, size, d = betweenness(g))    

plot(0:size, deg, type = "l", col=1, 
     xlab="Number of removed nodes", 
     ylab="Size of giant component")
lines(0:size, pr, col=2)
lines(0:size, bet, col=3)
lines(0:size, rand, col=4)
lines(0:size, rep(vcount(g)/2, size+1), lty=2)
legend(x = "bottomleft", 
       legend = c("deg", "pr", "btw", "rand"), lty = 1, col = 1:4)

# random model
g = sample_gnp(n=100, p=5/100)

# resilience
size = vcount(g)/2
# random
rand = percolate(g, size, d = sample(V(g), size))    
# degree
deg = percolate(g, size, d = degree(g))    
# pagerank
pr = percolate(g, size, d=page_rank(g)$vector)    
# betweenness
bet = percolate(g, size, d = betweenness(g))    

plot(0:size, deg, type = "l", col=1, 
     xlab="Number of removed nodes", 
     ylab="Size of giant component")
lines(0:size, pr, col=2)
lines(0:size, bet, col=3)
lines(0:size, rand, col=4)
lines(0:size, rep(vcount(g)/2, size+1), lty=2)
legend(x = "bottomleft", 
       legend = c("deg", "pr", "btw", "rand"), lty = 1, col = 1:4)