Graph Theory

A network is a collection of vertices joined by edges. Vertices and edges are also called nodes and links in computer science, sites and bonds in physics, and actors and ties in sociology. We will mostly use the terms nodes and edges. In mathematics, a network is called graph and it is typically represented as a square matrix. Given a graph $G$ with $n$ nodes numbered from $1$ to $n$, the adjacency matrix $A = (a_{i,j})$ of $G$ is a square $n \times n$ matrix such that $a_{i,j} = 1$ if there exists an edge joining nodes $i$ and $j$, and $a_{i,j} = 0$ otherwise.

Undirected and directed graphs

A graph is undirected if edges have no direction. If there is an edge from $i$ to $j$ in an undirected graph, then there is also an edge from $j$ to $i$. This means that the adjacency matrix of an undirected graph is symmetric: $a_{i,j} = a_{j,i}$ for every pair $i,j$, or $A = A^T$, where $A^T$ is the transpose of $A$. If there is and edge between $i$ and $j$, then $i$ and $j$ are said to be neighbors. The neighbors of node $i$ are the 1s of row (or column) $i$ of $A$. Here is an example of an undirected graph and the corresponding symmetric adjacency matrix:

$$ \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ \end{pmatrix} $$

A graph is directed if edges have a direction. If there is an edge from $i$ to $j$ in an directed graph, then there is not necessarily an edge from $j$ to $i$, but it might exist. This means that the adjacency matrix of an directed graph is not necessarily symmetric. If there is and edge from $i$ to $j$, then $i$ is a predecessor of $j$ and $j$ is a successor of $i$. The predecessors of node $i$ are the 1s on column $i$ of $A$ and the successors of node $i$ are the 1s on row $i$ of $A$. Here is an example of a directed graph and the corresponding non-symmetric adjacency matrix (notice that node 1 has no predecessor and node 6 has no successor):

$$ \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ \end{pmatrix} $$

Loops (also known as self-loops or self-edges) are edges $(i,i)$ from a node $i$ to itself. A loop corresponds to a diagonal entry in the adjacency matrix of the graph.

Simple and multigraphs

In the rare cases there can be more than one edge between the same pair of vertices; we refer to those edges collectively as a multiedge. A network that has neither self-edges (loops) nor multiedges is called a simple graph. A network with multiedges is called a multigraph. A network with self-edges (and no multiedges) has no particular name. A multiedge is represented by setting the corresponding matrix element equal to the multiplicity of the edge. Here is an example of an undirected multigraph and the corresponding symmetric adjacency matrix:

$$ \begin{pmatrix} 0 & 1 & 0 & 0 & 3 & 0 \\ 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 3 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 2 \\ \end{pmatrix} $$

Notice that a simple loop in an undirected graph is represented by setting to 2 the corresponding matrix entry. Why? Edges in an undirected graph are represented with two entries $a_{i,j}$ and $a_{j,i}$ in the adjacency matrix. This is the case also for self-edges (in this case both entries coincide). One can also have multiple self-edges. Such edges are represented by setting the corresponding diagonal element of the adjacency matrix equal to twice the multiplicity of the edge.

Unweighted and weighted graphs

Many of the networks we will study are unweighted, that is, they have edges that form simple on/off connections between vertices. Either they are there or they are not. In some situations, however, it is useful to represent edges as having a strength, weight, or value to them. Thus in the Internet edges might have weights representing the amount of data flowing along them or their bandwidth. In a food web predator-prey interactions might have weights measuring total energy flow between prey and predator. In a social network connections might have weights representing frequency of contact between actors. The weights in a weighted network are usually positive real numbers, but there is no reason in theory why they should not be negative. For example, it is common in social network theory to construct networks of social relations between people in which positive edge weights denote friendship or other cordial relationships and negative ones represent animosity.

These weighted networks can be represented by giving the elements of the adjacency matrix values equal to the weights of the corresponding connections. An example of an undirected weighted graph and the corresponding symmetric non-boolean adjacency matrix follows:

$$ \begin{pmatrix} 0 & -1 & 0 & 0 & 0.5 & 0 \\ -1 & 0 & 2 & 2.8 & 0 & 0 \\ 0 & 2 & 0 & -4 & 5 & 6.2 \\ 0 & 2.8 & -4 & 0 & 0 & 0 \\ 0.5 & 0 & 5 & 0 & 0 & 0 \\ 0 & 0 & 6.2 & 0 & 0 & 0 \\ \end{pmatrix} $$

Paths and cycles

A path in a network is any sequence of nodes such that every consecutive pair of vertices in the sequence is connected by an edge in the network. In layman's terms a path is a route across the network that runs from node to node along the edges of the network. Paths can be defined for both directed and undirected networks. In a directed network, each edge traversed by a path must be traversed in the correct direction for that edge. In an undirected network edges can be traversed in either direction. A simple path is a path with no repetitions of nodes and edges allowed. The length of a path is the number of edges of the path.

Given two nodes, there may be many paths between them. The shortest path is the one with the minimum number of edges. Notice that the shortest path is not necessarily unique and may not exist if there are no paths between the nodes.

An undirected graph is connected if for each pair of nodes $i$ and $j$ there is a path between them. A directed graph is strongly connected if for each pair of nodes $i$ and $j$ there is a path from $i$ to $j$ and another path from $j$ to $i$. The maximal sets of nodes that are connected in a graph are called (strongly) connected components. A (strongly) connected graph has a unique (strongly) connected component.

A cycle is a path that starts and ends at the same vertex. A simple cycle is a path with no repetitions of nodes and edges allowed, other than the repetition of the starting and ending node. Notice that, in simple undirected graphs, there are no simple paths of length one and two; hence in such graphs the shortest possible simple cycle has length 3 (on the other hand, in undirected graphs with loops there are simple cycles of length 1 - any loop - and in undirected multigraphs there are simple cycles of length 2 - any multiedge.). A directed graph without simple cycles is called acyclic directed graph. An undirected graph without simple cycles is called forest. A connected forest is a tree. Examples follow:

It is straightforward to calculate the number of (not necessarily simple) paths of a given length on a simple graph in terms of its adjacency matrix. Notice that $$N_{i,j}^{(1)} = [A]_{i,j}$$ is the number of paths of length 1 (edges) from $i$ to $j$. Moreover, the product $a_{i,k} a_{k,j}$ is $1$ if and only if there is a path of length 2 from $i$ to $j$ (that goes through $k$). Hence, the number of paths of length 2 from $i$ to $j$ is: $$N_{i,j}^{(2)} = \sum_{k=1}^{n} a_{i,k} a_{k,j} = [A^{2}]_{i,j}$$ where $[A^{2}]_{i,j}$ is the $(i,j)$ entry of matrix $A^2 = A A$. More generally, the number of paths of length $r$ from $i$ to $j$ is: $$N_{i,j}^{(r)} = [A^{r}]_{i,j}$$ Notice that $[A^{r}]_{i,i}$ is the number of (not necessarily simple) cycles of length $r$ that start and end at the same node $i$.

Let us investigate the meaning of projection matrices $AA^T$ and $A^TA$. In undirected graphs, $A = A^T$, and hence $AA = AA^T = A^T A$. Notice that $[AA]_{i,j}$ is the number of common neighbors of nodes $i$ and $j$ and, in particular, $[AA]_{i,i}$ is the number of neighbors of $i$. In directed networks, however, $A$ and $A^T$ are generally different, and hence $AA$, $AA^T$, and $A^TA$ are also generally different matrices. We have that: $$[AA^T]_{i,j} = \sum_{k=1}^{n} a_{i,k} a_{j,k}$$ is the number of common successors of $i$ and $j$ ($[AA^T]_{i,i}$ is the number of successors of $i$) and $$[A^TA]_{i,j} = \sum_{k=1}^{n} a_{k,i} a_{k,j}$$ is the number of common predecessors of $i$ and $j$ ($[A^TA]_{i,i}$ is the number of predecessors of $i$). Notice that $AA^T$ and $A^T A$ are symmetric matrices: $(AA^T)^T = (A^T)^T A^T = A A^T$ and similarly for $A^T A$. We can consider $AA^T$ and $A^T A$ as adjacency matrices of weighted undirected graphs (with loops). Given a directed graph $G$ with adjacency matrix $A$, the weighted undirected graphs associated with $AA^T$ and $A^TA$ are called projections of $G$. An interesting problem is: do the projections of a graph univocally determine the graph? Or: are there two non-isomorphic graphs that have the same projections? Two graphs are isomorphic if they are the same graph as soon as we ignore node labeling. Here is a directed graph and its two projections:

Bipartite graphs and projections

A bipartite network, also called a two-mode network in the sociology literature, is a graph where there are two kinds of vertices, and edges run from nodes of different types only. Any network in which the vertices are connected together by common membership of groups of some kind can be represented in this way. In sociology such networks are called affiliation networks: directors sitting on the boards of companies, scientists coauthoring papers, and film actors appearing together in films are all examples of such networks.

The equivalent of an adjacency matrix for a bipartite network is a rectangular matrix called an incidence matrix. If $n$ is the number of people or other participants in the network and $g$ is the number of groups, then the incidence matrix $B$ is a $g \times n$ matrix having elements $B_{i,j} = 1$ if group $i$ contains participant $j$ and $B_{i,j} = 0$ otherwise. In the following an example of a bipartite graph with $n=7$ participants and $g=4$ groups, as well as the corresponding $4 \times 7$ incidence matrix:

$$ \begin{matrix} & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 8 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 9 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 10 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\ 11 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ \end{matrix} $$

Notice that a bipartite graph is a special case of undirected graph, hence it can be represented with a symmetric adjacency matrix as well. Let $G$ be a bipartite graph with $n$ participant nodes numbered $1, 2, \ldots, n$ and $g$ group nodes numeberd $n+1, n+2, \ldots, n+g$. Let $B$ be the $g \times n$ incidence matrix of $G$. Then the adjacency matrix $A$ of $G$ is a symmetric $(n+g) \times (n+g)$ matrix of the form: $$A = \begin{pmatrix} 0 & B^T \cr B & 0 \end{pmatrix}$$ where $0$ are square matrices of size $n \times n$ (upper) and $g \times g$ (lower).

Although a bipartite network may give the most complete representation of a particular network it is often convenient to work with direct connections between vertices of just one type. We can use the bipartite network to infer such connections, creating a one-mode projection from the two-mode bipartite form. As an example, consider the case of the films and actors. We can perform a projection onto the actors alone by constructing the $n$-vertex network in which the vertices represent actors and two actors are connected by an edge if they have appeared together in a film. The weight of the edge is the number of movies in which the two actors have co-appeared. The actor projection graph has weighted self-edges on each node, the weights being the number of films the actor has played. The corresponding one-mode projection onto the films would be the $g$-vertex network where the vertices represent films and two films are connected if they share a common actor. The weight of the edge is the number of actors shared by the two movies. The film projection graph has weighted self-edges on each node, the weights being the number of actors casted in the film.

If $B$ is the incidence matrix of a bipartite graph, the one-mode projection graph on participants (actors in the above example) is the $n \times n$ symmetric matrix $B^T B$, and the one-mode projection graph on groups (movies in the above example) is the $g \times g$ symmetric matrix $B B^T$. The two projections of the above bipartite graph are shown below (self-edges and all weights are omitted):

An interesting problem is: do the projections of a bipartite graph univocally determine the graph? Or: are there two non-isomorphic bipartite graphs that have the same projections? This generalizes the above mentioned projection problem on directed graphs to rectangular matrices.

Regular and regularizable graphs

In a weighted undirected graph, the weighted degree of a node is the sum of weights of edges incident in the node. In a weighted directed graph, we distinguish between weighted out-degree (the sum of weights of edges leaving the node) and weighted in-degree (the sum of weights of edges entering the node). These are generalizations of the notions of degree in unweighted graphs.

An unweighted undirected graph is regular if there is an integer $r > 0$ such that all nodes have degree equal to $r$. An unweighted directed graph is regular if there is an integer $r > 0$ such that all nodes have out-degree and in-degree equal to $r$. An unweighted undirected graph is regularizable if the edges of the graph can be weighted with positive integers and in the resulting weighted graph all nodes have weighted degree equal to $r > 0$. An unweighted directed graph is regularizable if the edges of the graph can be weighted with positive integers and in the resulting weighted graph all nodes have weighted out-degree and weighted in-degree equal to $r > 0$.

Clearly, a regular graph is regularizable but there are regularizable graphs that are not regular. Moreover, not all graphs are regularizable. For instance, it is easy to see that a star graph with at least 3 nodes is not regularizable.

Given an unweighed directed graph $G$, a spanning cycle forest $F$ is a subgraph of $G$ such that:

  1. $F$ spans $G$, that is, $G$ and $F$ have the same nodes;
  2. the strongly connected components of $F$ are simple cycles.

Given an unweighed undirected graph $G$, a spanning cycle forest $F$ is a subgraph of $G$ such that:

  1. $F$ spans $G$, that is, $G$ and $F$ have the same nodes;
  2. the connected components of $F$ are single edges or simple cycles (including loops).

The Sinkhorn-Knopp theorem proves that a graph is regularizable if and only if for every edge of the graph there exists a spanning cycle forest that contains the edge.

Consider the following two undirected graphs. The left graph contains a spanning cycle forest formed by edges $(1,2)$ and $(3,4)$. However, the graph is not regularizable since edges $(1,3)$ and $(1,4)$ do not belong to any spanning cycle forest. The graph on the right is regularizable: each edge belongs to the spanning cycle forest formed by the tringle that contains the edge plus the opposite edge. If $\alpha > 0$ and we label the outer edges with $3 \alpha$ and the inner edges with $2 \alpha$ we have a positive regularizability solution with regularization degree $8 \alpha$. Clearly, the graph is not regular.

Consider the following two directed graphs. A left graph contains a spanning cycle forest: the loop plus the 3-cycle make a spanning cycle forest. However, the remaining edges (those on the 2-cycle) are not contained in any spanning cycle forest, hence the graph is not regularizable. The graph of the right is regularizable: the loop plus the 3-cycle and the two 2-cycles form two distinct spanning cycle forests that cover all edges. It is easy to see that the graph is not regular.