Jose A. Rodriguez of the University of Barcelona created a network of the individuals involved in the bombing of commuter trains in Madrid on March 11, 2004. Rodriguez used press accounts in the two major Spanish daily newspapers (El Pais and El Mundo) to reconstruct the terrorist network. The names included are those of the people suspected of having participated, together with their relatives. Rodriguez specified four kinds of ties linking the individuals involved.

The four kinds of ties were added together, giving a strength-of-connection index that ranges from 1 to 4.
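
As a minimal sketch of how such an index could be built, assume a hypothetical table with one 0/1 column per kind of tie. The column names `tie_a` to `tie_d` and all values below are placeholders for illustration, not the names or data of the original dataset. The weight of a pair is then the row sum of the four indicators.

```r
# Hypothetical tie indicators for three pairs of individuals.
# Column names tie_a..tie_d are placeholders for the four kinds of ties.
pairs = data.frame(
  from  = c(1, 1, 2),
  to    = c(2, 3, 3),
  tie_a = c(1, 1, 0),
  tie_b = c(1, 0, 0),
  tie_c = c(1, 0, 1),
  tie_d = c(1, 0, 1)
)

# strength of connection = number of tie kinds present between the pair
pairs$weight = rowSums(pairs[, c("tie_a", "tie_b", "tie_c", "tie_d")])

# only pairs with at least one tie become edges, so weights fall in 1..4
edges = pairs[pairs$weight > 0, c("from", "to", "weight")]
edges
```

A table shaped like `edges` (two endpoint columns plus a `weight` column) is what `graph_from_data_frame` expects as its first argument.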

Dataset

Data challenges

  1. Use similarity among pairs of terrorists to detect the most similar and most dissimilar individuals
  2. Use clustering and dissimilarity as distance to discover the terrorist cells and highlight the cells in the terrorist network using different colors

Use similarity among pairs of terrorists to detect the most similar and most dissimilar individuals

library(tidyverse)
library(igraph)
library(ggraph)
terrorists = read_csv("nodes.csv")
ties = read_csv("ties.csv")

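# make an undirected graph; vertices are named 1..n, which assumes
# that the id column of terrorists runs from 1 to n
g = graph_from_data_frame(ties, directed = FALSE, vertices = tibble(id = 1:nrow(terrorists)))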

# remove isolated nodes
isolated = which(degree(g) == 0)
g = delete_vertices(g, isolated)
# similarity as (Pearson) correlation among columns
A = as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
S = cor(A)
# remove self similarity (set the diagonal to zero)
diag(S) = 0

# tidy similarity matrix (map matrix to graph and graph to data frame)
# matrix to graph
sigma_graph = graph_from_adjacency_matrix(S, mode = "undirected", weighted = TRUE)

# graph to data frame (vertex names come back as characters, so convert to integer ids)
sigma = as_tibble(igraph::as_data_frame(sigma_graph, what = "edges")) %>%
  rename(x = from, y = to, similarity = weight) %>%
  mutate(x = as.integer(x), y = as.integer(y)) %>%
  left_join(terrorists, join_by(x == id)) %>%
  left_join(terrorists, join_by(y == id))

# most similar pairs
head(arrange(sigma, desc(similarity)))
## # A tibble: 6 × 5
##       x     y similarity name.x              name.y           
##   <dbl> <dbl>      <dbl> <chr>               <chr>            
## 1    43    51      1     Abddenabi Koujma    Anuar Asri Rifaat
## 2    13    14      0.906 Mohamed Atta        Ramzi Binalshibh 
## 3     4     5      0.881 Vinay Kholy         Suresh Kumar     
## 4    12    58      0.855 Abu Musad Alsakaoui Shakur           
## 5    12    13      0.847 Abu Musad Alsakaoui Mohamed Atta     
## 6    12    15      0.836 Abu Musad Alsakaoui Mohamed Belfatmi
# most dissimilar pairs
head(arrange(sigma, similarity))
## # A tibble: 6 × 5
##       x     y similarity name.x         name.y             
##   <dbl> <dbl>      <dbl> <chr>          <chr>              
## 1     3    21     -0.276 Mohamed Chaoui José Emilio Suárez 
## 2     1    21     -0.271 Jamal Zougam   José Emilio Suárez 
## 3     3    64     -0.235 Mohamed Chaoui Emilio Llamo       
## 4     3    65     -0.235 Mohamed Chaoui Ivan Granados      
## 5     3    66     -0.235 Mohamed Chaoui Raul Gonzales Perez
## 6     3    67     -0.235 Mohamed Chaoui El Gitanillo
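
To see what the correlation-based similarity measures, here is a small sketch on a hypothetical 4-node weighted adjacency matrix (the values are made up for illustration and the names `A_toy` and `S_toy` are not part of the analysis above). Two nodes are similar when they connect to the same neighbors with similar weights, whether or not they are tied to each other.

```r
# Toy symmetric weighted adjacency matrix (made-up values).
# Nodes 1 and 2 are not connected to each other,
# but both are tied to nodes 3 and 4 with the same weights.
A_toy = matrix(c(0, 0, 2, 1,
                 0, 0, 2, 1,
                 2, 2, 0, 0,
                 1, 1, 0, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(as.character(1:4), as.character(1:4)))

# Pearson correlation among columns
S_toy = cor(A_toy)

# drop self similarity
diag(S_toy) = 0

round(S_toy, 3)
```

In this toy matrix, nodes 1 and 2 have identical columns, so their similarity is 1 even though there is no edge between them, while nodes 1 and 3 have complementary neighborhoods and a negative similarity. The same mechanism can explain a similarity of exactly 1 in the real network: two individuals whose ties to everyone else are perfectly correlated.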

Use clustering and dissimilarity as distance to discover the terrorist cells

# distance matrix
# (correlation in [-1, 1] becomes a distance in [0, 2])
D = 1 - S

# distance object
d = as.dist(D)

# average-linkage clustering method
cc = hclust(d, method = "average")

# cut dendrogram at 4 clusters
cells = as.factor(cutree(cc, k = 4))

# plot graph with clusters
ggraph(g) + 
  geom_edge_link(aes(alpha = weight)) + 
  geom_node_point(aes(color = cells)) +
  theme_graph()
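
The clustering steps can also be traced on a toy example. The sketch below uses a hypothetical 4-node similarity matrix (made-up values; the `_toy` names are not part of the analysis above) and follows the same recipe: turn similarity into distance, run average-linkage clustering, and cut the dendrogram into a chosen number of groups.

```r
# Toy similarity matrix for four nodes (made-up values):
# nodes 1-2 are alike, nodes 3-4 are alike, the two groups are unlike.
S_toy = matrix(c( 0.0,  0.9, -0.2, -0.1,
                  0.9,  0.0, -0.3, -0.2,
                 -0.2, -0.3,  0.0,  0.8,
                 -0.1, -0.2,  0.8,  0.0),
               nrow = 4, byrow = TRUE)

# similarity to distance; as.dist() reads only the lower triangle,
# so the values on the diagonal do not matter
d_toy = as.dist(1 - S_toy)

# average-linkage clustering, then cut the dendrogram into 2 groups
cc_toy = hclust(d_toy, method = "average")
cells_toy = cutree(cc_toy, k = 2)
cells_toy
```

Here nodes 1 and 2 land in one group and nodes 3 and 4 in the other. The number of groups `k` is a choice made by the analyst; inspecting the dendrogram with `plot(cc_toy)` is one way to judge whether a given cut is reasonable.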