Tidy network data?

  • there’s a discrepancy between network data and the tidy data idea, in that network data cannot in any meaningful way be encoded as a single tidy data frame
  • on the other hand, both node and edge data by itself fits very well within the tidy concept as each node and edge is, in a sense, a single observation
  • thus, a close approximation of tidiness for network data is two tidy data frames, one describing the node data and one describing the edge data
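
As a minimal sketch of the two-table idea, the following base R snippet builds a node table and an edge table by hand and resolves the edge endpoints back to node names. The three nodes and their attributes are hypothetical toy data, not part of the dolphin network used later.

```r
# node table: one row per node (hypothetical toy data)
nodes = data.frame(
  name = c("Ann", "Bob", "Cy"),
  sex  = c("F", "M", "M"),
  stringsAsFactors = FALSE
)

# edge table: one row per edge; from/to are row positions in `nodes`
edges = data.frame(
  from = c(1, 1, 2),
  to   = c(2, 3, 3),
  type = c("friendship", "love", "friendship"),
  stringsAsFactors = FALSE
)

# resolve the integer endpoints to node names with plain indexing
edge_names = data.frame(
  from = nodes$name[edges$from],
  to   = nodes$name[edges$to],
  type = edges$type,
  stringsAsFactors = FALSE
)
edge_names
```

Each table is tidy on its own: every row is one observation (a node or an edge) and every column is one variable.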

tidygraph

  • tidygraph is an entry into the tidyverse that provides a tidy framework for network (graph) data
  • tidygraph provides an approach to manipulate node and edge data frames using the interface defined in the dplyr package
  • moreover, it provides tidy interfaces to many common graph algorithms, building on the igraph network analysis toolkit
  • it is developed by Thomas Lin Pedersen

ggraph

  • ggraph is an extension of ggplot2 that implements a visualization grammar for network data
  • it provides a huge variety of geoms for drawing nodes and edges, along with an assortment of layouts making it possible to produce a very wide range of network visualization types
  • while tidygraph provides a manipulation and analysis grammar for network data (like dplyr for tabular data), ggraph offers a visualization grammar (like ggplot2 for tabular data)
  • it is developed by Thomas Lin Pedersen

Read the graph with tidygraph

Let’s read a dolphin network:

  1. a set of nodes representing dolphins (dolphin_nodes.csv)
  2. a set of edges representing ties among dolphins (dolphin_edges.csv)

The tidygraph package represents the graph as a pair of data frames:

  • a data frame for nodes containing information about the nodes in the graph
  • a data frame for edges containing information about the edges in the graph. The terminal nodes of each edge must be encoded either in columns named from and to, or in the first two columns, as integers. These integers are row positions in the node data frame.
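
The sketch below illustrates the edge encoding on a hypothetical three-node graph. It contrasts integer endpoints with endpoints given as node names, a form the directed example later in these notes also relies on; the toy tables and object names are assumptions made for illustration.

```r
library(tidygraph)

# hypothetical toy node table
nodes = data.frame(name = c("a", "b", "c"), stringsAsFactors = FALSE)

# endpoints as integer row positions in `nodes`
edges_idx = data.frame(from = c(1, 2), to = c(2, 3))

# the same edges, with endpoints written as node names
edges_chr = data.frame(
  from = c("a", "b"),
  to   = c("b", "c"),
  stringsAsFactors = FALSE
)

g_idx = tbl_graph(nodes = nodes, edges = edges_idx, directed = FALSE)
g_chr = tbl_graph(nodes = nodes, edges = edges_chr, directed = FALSE)

# in both graphs the stored edge table holds integer endpoints
g_idx %>% activate(edges) %>% as_tibble()
g_chr %>% activate(edges) %>% as_tibble()
```
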
library(tidyverse)
library(tidygraph)
library(ggraph)

# setting the graph theme
set_graph_style()

nodes = read_csv("dolphin_nodes.csv")
edges = read_csv("dolphin_edges.csv")

nodes
## # A tibble: 62 × 2
##    name       sex  
##    <chr>      <chr>
##  1 Beak       M    
##  2 Beescratch M    
##  3 Bumper     M    
##  4 CCL        F    
##  5 Cross      M    
##  6 DN16       F    
##  7 DN21       M    
##  8 DN63       M    
##  9 Double     F    
## 10 Feather    M    
## # ℹ 52 more rows
edges
## # A tibble: 159 × 2
##        x     y
##    <dbl> <dbl>
##  1     4     9
##  2     6    10
##  3     7    10
##  4     1    11
##  5     3    11
##  6     6    14
##  7     7    14
##  8    10    14
##  9     1    15
## 10     4    15
## # ℹ 149 more rows
# add a synthetic edge type, assigned at random for illustration
edges = 
  edges %>% 
  mutate(type = sample(c("love", "friendship"), 
                       nrow(edges), 
                       replace = TRUE) )

# make a tidy graph
dolphin = tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
dolphin
## # A tbl_graph: 62 nodes and 159 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 62 × 2 (active)
##    name       sex  
##    <chr>      <chr>
##  1 Beak       M    
##  2 Beescratch M    
##  3 Bumper     M    
##  4 CCL        F    
##  5 Cross      M    
##  6 DN16       F    
##  7 DN21       M    
##  8 DN63       M    
##  9 Double     F    
## 10 Feather    M    
## # ℹ 52 more rows
## #
## # Edge Data: 159 × 3
##    from    to type      
##   <int> <int> <chr>     
## 1     4     9 love      
## 2     6    10 friendship
## 3     7    10 love      
## # ℹ 156 more rows
# extract node and edge data frames from the graph
as.list(dolphin)
# extract node data frame from the graph
as.list(dolphin)$nodes
# extract edge data frame from the graph
as.list(dolphin)$edges
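
An alternative way to pull out one of the two tables, sketched here on a hypothetical three-node graph, is to activate the component of interest and convert it with as_tibble(). The object names toy, node_df and edge_df are illustrative only.

```r
library(tidygraph)

# hypothetical toy graph
toy = tbl_graph(
  nodes = data.frame(name = c("a", "b", "c"), stringsAsFactors = FALSE),
  edges = data.frame(from = c(1, 2), to = c(2, 3)),
  directed = FALSE
)

# as_tibble() returns the currently active component
node_df = toy %>% activate(nodes) %>% as_tibble()
edge_df = toy %>% activate(edges) %>% as_tibble()

node_df
edge_df
```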

ggraph components

ggraph builds upon three core concepts that are quite easy to understand:

  • the layout defines how nodes are placed on the plot. ggraph has access to all layout functions available in igraph and many more
  • the nodes are the connected entities in the graph structure. These can be plotted using the geom_node_*() family of geoms
  • the edges are the connections between the entities in the graph structure. These can be visualized using the geom_edge_*() family of geoms
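
To make the role of the layout concrete, the sketch below computes a layout explicitly with create_layout() and inspects it before plotting. The six-node ring graph is a hypothetical example, and "circle" is just one of the available layout names.

```r
library(tidygraph)
library(ggraph)

# hypothetical toy graph: a ring of six nodes
ring = create_ring(6)

# the layout is a data frame with one row per node and x/y coordinates
lay = create_layout(ring, layout = "circle")
head(as.data.frame(lay)[, c("x", "y")])

# a precomputed layout can be passed to ggraph() in place of the graph;
# edges and nodes are then added as separate layers
p = ggraph(lay) +
  geom_edge_link() +
  geom_node_point()
# print(p) draws the plot
```

Separating the three pieces this way shows that the geoms only draw what the layout has already positioned.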

ggraph basics

# basic plot
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point()

# plot edge type
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point()

# plot node sex
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point(aes(shape = sex))

# plot node name
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point() + 
  geom_node_text(aes(label = name), repel = TRUE)

Faceting

Faceting creates sub-plots according to the values of a qualitative attribute of nodes or edges.

# facet edges by type
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point() +
  facet_edges(~type)

# facet nodes by sex
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point(aes(color = sex)) +
  facet_nodes(~sex)

# facet both nodes and edges: rows by edge type, columns by node sex
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point() +
  facet_graph(type~sex) + 
  th_foreground(border = TRUE)

Directed graphs

# directed graphs
package = tibble(
  name = c("igraph", "ggraph", "dplyr", "ggplot", "tidygraph")
)

tie = tibble(
  from = c("igraph", "ggplot", "igraph", "dplyr", "ggraph"),
  to =   c("tidygraph", "ggraph", "tidygraph", "tidygraph", "tidygraph")
)

tidy = tbl_graph(nodes = package, edges = tie, directed = TRUE)


# use arrows for directions
ggraph(tidy, layout = "graphopt") + 
    geom_edge_link(aes(start_cap = label_rect(node1.name), 
                       end_cap = label_rect(node2.name)), 
                   arrow = arrow(type = "closed", 
                                 length = unit(3, "mm"))) + 
    geom_node_text(aes(label = name))

# use edge alpha to indicate direction: 
# each edge fades from light (source end) to dark (target end)
# after_stat() replaces stat(), which is deprecated in recent ggplot2
ggraph(tidy, layout = "graphopt") + 
    geom_edge_link(aes(start_cap = label_rect(node1.name), 
                       end_cap = label_rect(node2.name), 
                       alpha = after_stat(index)), 
                   show.legend = FALSE) + 
    geom_node_text(aes(label = name))

Hierarchical layouts

# This dataset contains the graph that describes the class 
# hierarchy for the Flare visualization library
head(flare$vertices)
##                                           name size             shortName
## 1 flare.analytics.cluster.AgglomerativeCluster 3938  AgglomerativeCluster
## 2   flare.analytics.cluster.CommunityStructure 3812    CommunityStructure
## 3  flare.analytics.cluster.HierarchicalCluster 6714   HierarchicalCluster
## 4            flare.analytics.cluster.MergeEdge  743             MergeEdge
## 5  flare.analytics.graph.BetweennessCentrality 3534 BetweennessCentrality
## 6           flare.analytics.graph.LinkDistance 5731          LinkDistance
head(flare$edges)
##                      from                                           to
## 1 flare.analytics.cluster flare.analytics.cluster.AgglomerativeCluster
## 2 flare.analytics.cluster   flare.analytics.cluster.CommunityStructure
## 3 flare.analytics.cluster  flare.analytics.cluster.HierarchicalCluster
## 4 flare.analytics.cluster            flare.analytics.cluster.MergeEdge
## 5   flare.analytics.graph  flare.analytics.graph.BetweennessCentrality
## 6   flare.analytics.graph           flare.analytics.graph.LinkDistance
# flare class hierarchy
graph = tbl_graph(edges = flare$edges, nodes = flare$vertices)

# dendrogram
ggraph(graph, layout = "dendrogram") + 
  geom_edge_diagonal()

# circular dendrogram
# notice the "dynamic" variable leaf: it is computed by the layout, not stored in the node data
ggraph(graph, layout = "dendrogram", circular = TRUE) + 
  geom_edge_diagonal() + 
  geom_node_point(aes(filter = leaf)) + 
  coord_fixed()

# rectangular tree map
# notice the "dynamic" variable depth, also computed by the layout
ggraph(graph, layout = "treemap", weight = size) + 
  geom_node_tile(aes(fill = depth), size = 0.25)

# circular tree map
ggraph(graph, layout = "circlepack", weight = size) + 
  geom_node_circle(aes(fill = depth), size = 0.25, n = 50) + 
  coord_fixed()

# icicle
ggraph(graph, layout = "partition") + 
  geom_node_tile(aes(y = -y, fill = depth))

# sunburst (circular icicle)
ggraph(graph, layout = "partition", circular = TRUE) +
  geom_node_arc_bar(aes(fill = depth)) +
  coord_fixed()

Network analysis with tidygraph

  • the data frame graph representation can be easily augmented with metrics computed on the graph
  • before computing a metric on nodes or edges use the activate() function to activate either node or edge data frames
  • use dplyr verbs filter, arrange and mutate to manipulate the graph

dolphin = 
  dolphin %>% 
  activate(nodes) %>% 
  mutate(degree = centrality_degree()) %>% 
  filter(degree > 0) %>% 
  arrange(-degree) %>% 
  activate(edges) %>% 
  mutate(betweenness = centrality_edge_betweenness(), 
         # .N() gives access to the node data while the edges are active
         homo = (.N()$sex[from] == .N()$sex[to])) %>% 
  arrange(-betweenness)

dolphin
## # A tbl_graph: 62 nodes and 159 edges
## #
## # An undirected simple graph with 1 component
## #
## # Edge Data: 159 × 5 (active)
##     from    to type       betweenness homo 
##    <int> <int> <chr>            <dbl> <lgl>
##  1    10    17 love              283. FALSE
##  2    13    29 love              219. FALSE
##  3     6    10 friendship        184. TRUE 
##  4     2    17 love              181. TRUE 
##  5    17    48 love              173. TRUE 
##  6     9    48 love              146. FALSE
##  7    20    29 friendship        144. TRUE 
##  8    10    32 love              129. TRUE 
##  9    17    43 love              113. FALSE
## 10     7    17 love              105. TRUE 
## # ℹ 149 more rows
## #
## # Node Data: 62 × 3
##   name    sex   degree
##   <chr>   <chr>  <dbl>
## 1 Grin    F         12
## 2 SN4     F         11
## 3 Topless M         11
## # ℹ 59 more rows
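
The edge-level use of .N() above can be checked by hand on a graph small enough to read at a glance. The following is a sketch on a hypothetical three-node graph; the node names, sexes and the object name toy are assumptions for illustration.

```r
library(tidygraph)
library(dplyr)

# hypothetical toy graph: a - b - c
nodes = data.frame(
  name = c("a", "b", "c"),
  sex  = c("F", "F", "M"),
  stringsAsFactors = FALSE
)
edges = data.frame(from = c(1, 2), to = c(2, 3))

toy = tbl_graph(nodes = nodes, edges = edges, directed = FALSE) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  activate(edges) %>%
  # look up the sex of both endpoints in the node table
  mutate(homo = .N()$sex[from] == .N()$sex[to])

node_df = toy %>% activate(nodes) %>% as_tibble()
edge_df = toy %>% activate(edges) %>% as_tibble()

node_df$degree  # a and c have one neighbour, b has two
edge_df$homo    # a-b joins two F nodes, b-c joins an F and an M node
```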

Analyse and visualize network: centrality

The tidygraph and ggraph packages can be combined in a single pipeline that performs analysis and visualization in one go.

dolphin %>% 
  activate(nodes) %>%
  mutate(pagerank = centrality_pagerank()) %>%
  activate(edges) %>%
  mutate(betweenness = centrality_edge_betweenness()) %>%
  ggraph() +
  geom_edge_link(aes(alpha = betweenness)) +
  geom_node_point(aes(size = pagerank, colour = pagerank)) + 
  # draw colour as a legend instead of a colour bar, 
  # so that it can merge with the size legend
  scale_colour_gradient(guide = "legend")

Analyse and visualize network: communities

# visualize communities of nodes
dolphin %>% 
  activate(nodes) %>%
  mutate(community = as.factor(group_louvain())) %>% 
  ggraph() + 
  geom_edge_link() + 
  geom_node_point(aes(colour = community), size = 5)
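
To see what group_louvain() returns without plotting, the sketch below runs it on a hypothetical toy graph with an obvious community structure: two 4-cliques joined by a single bridge edge. The graph and the object names are assumptions for illustration.

```r
library(tidygraph)
library(dplyr)

# hypothetical toy graph: cliques on nodes 1-4 and 5-8, plus a bridge 4-5
clique_a = t(combn(1:4, 2))
clique_b = t(combn(5:8, 2))
edges = as.data.frame(rbind(clique_a, clique_b, c(4, 5)))
names(edges) = c("from", "to")
nodes = data.frame(name = letters[1:8], stringsAsFactors = FALSE)

toy = tbl_graph(nodes = nodes, edges = edges, directed = FALSE) %>%
  activate(nodes) %>%
  mutate(community = as.factor(group_louvain()))

# one community label per node
membership = toy %>% activate(nodes) %>% as_tibble()
table(membership$community)
```

The labels are stored as an ordinary node column, so they can be filtered, counted or mapped to aesthetics like any other attribute.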