People have, it appears, a strong tendency to associate with others whom they perceive as being similar to themselves in some way. This tendency is called homophily or assortative mixing. More rarely, we also encounters disassortative mixing, the tendency of people to associate with others who are unlike them. The most familiar example of disassortativity is mixing by gender in sexual contact networks: the majority of sexual relationships are heterosexual, although some homosexual relationships also occur. Assortative mixing is mainly a property of social networks, although not exclusively.
Assortative mixing can be distinguished in mixing by enumerative or scalar characteristics. An enumerative characteristic has a finite set of possible values. Examples are gender, race, and nationality. Given an enumerative characteristic, each node of the network can be assigned to a type according to the value of the characteristic for the node. The network is assortative if a significant fraction of the edges run between vertices of the same type.
Let us focus on undirected multi-graphs, that is, graphs that allow self-edges (edges involving the same node) and multi-edges (more than one simple edge between two vertices). A measure of assortativity is the number of edges that run between vertices of the same type minus the number of such edges we would expect to find if the configuration model is assumed, that is if edges were positioned at random while preserving the vertex degrees. Let us denote $c_i$ the type of vertex $i$ and $\delta(c_i,c_j) = 1$ if $c_i = c_j$ and $\delta(c_i,c_j) = 0$ otherwise. Hence, the number of edges that run between vertices of the same type is: $$ \sum_{(i,j) \in E} \delta(c_i, c_j) = \frac{1}{2} \sum_{i,j} a_{i,j} \delta(c_i, c_j) $$ where $E$ is the set of edges of the graph and $a_{i,j}$ is the actual number of edges between $i$ and $j$, which is zero or more (notice that each undirected edge is represented by two pairs in the second sum, hence the factor one-half).
The expected number of edges that run between vertices of the same type is: $$\frac{1}{2} \sum_{i,j} \frac{k_i k_j}{2m} \delta(c_i, c_j)$$ where $k_i$ and $k_j$ are the degrees of $i$ and $j$, while $m$ is the number of edges of the graph. Notice that $k_i k_j / 2m$ is the expected number of edges between vertices $i$ and $j$ in the configuration model assumption. Consider a particular edge attached to vertex $i$. The probability that this edge goes to node $j$ is $k_j / 2m$, since the number of edges attached to $j$ is $k_j$ and the total number of edge ends in the network is $2m$ (the sum of all node degrees). Since node $i$ has $k_i$ edges attached to it, the expected number of edges between $i$ and $j$ is $k_i k_j / 2m$. Hence the difference between the actual and expected number of edges connecting nodes of the same type, expressed as a fraction with respect to the total number of edges $m$, is called modularity, and given by:
$$Q = \frac{1}{2m} \sum_{i,j} \left(a_{i,j} - \frac{k_i k_j}{2m}\right) \delta(c_i, c_j)$$The modularity takes positive values if there are more edges between same-type vertices than expected, and negative values if there are less. We can normalize modularity by dividing it by the maximum value it can have, that is the modularity of a perfectly mixed network, which is a network where all edges run between nodes of the same type. In this network, the actual number of edges with nodes of the same type is of course $m$, hence the modularity results: $$Q_{max} = \frac{1}{2 m} \left( 2m - \sum_{i,j} \frac{k_i k_j}{2m} \delta(c_i, c_j) \right)$$ and the normalized modularity is: $$\frac{Q}{Q_{max}} = \frac{\sum_{i,j} (a_{i,j} - k_i k_j / 2m) \delta(c_i, c_j)}{2m - \sum_{i,j} (k_i k_j / 2m) \delta(c_i, c_j)}$$
This is equivalent to partition the graph into communities corresponding to the types of the enumerative characteristic and compute the modularity of the division. Function modularity computes the absolute modularity of the solution.
We can also investigate homophily according to scalar characteristics. These are features whose values can be sorted, like age or salary. If network nodes with similar values of a scalar characteristic tend to be connected together more often that those with different values then the network is assortatively mixed according to such characteristic.
A measure of assortative mixing according to a scalar characteristic is the covariance. In probability theory and statistics, covariance is a measure of how much two random variables $X$ and $Y$ change together. Variance is a special case of the covariance when the two variables are identical. It is:
$$cov(X,Y) = E[(X - E[X]) (Y - E[Y])]$$where $E[\cdot]$ is the expected value of the random variable. If the variables change together (both above or below their means), then the covariance is positive, otherwise (one above the mean and the other below) it is negative.
The idea is to assign one variable $L$ (left) to one end of the edges of the network, and another variable $R$ (right) to the other end of the edges. These variables assume the values of the scalar quantity for the nodes at the ends of the edges (both edges $(i,j)$ and $(j,i)$ are considered in the undirected case). Let $x_i$ be the value for vertex $i$ of the scalar quantity. Consider the following example.
$$L = (x_1, x_1, x_1, x_2, x_2, x_3, x_4, x_3) \\ R = (x_2, x_3, x_4, x_3, x_1, x_1, x_1, x_2)$$ Notice that both lists $L$ and $R$ contain each value $x_i$ a number of times equal to the degree $k_i$ of node $i$. Hence the means $\mu_L$ of $L$ and $\mu_R$ of $R$ are equal. Similarly, the standard deviations $\sigma_L$ of $L$ and $\sigma_R$ of $R$ are the same.The covariance of $L$ and $R$ measures if the scalar quantities associated by the edge relationship vary in the same direction or in opposite directions. This is given by: $$cov(L,R) = \frac{\sum_{i,j} a_{i,j} (x_i - \mu) (x_j - \mu)}{\sum_{i,j} a_{i,j}}$$ where the mean $\mu = \mu_L = \mu_R$ is computed over the edges as follows: $$\mu = \frac{\sum_{i,j} a_{i,j} x_i}{\sum_{i,j} a_{i,j}} = \frac{\sum_{i} x_i \sum_j a_{i,j}}{\sum_{i} \sum_j a_{i,j}} = \frac{\sum_{i} k_{i} x_i}{\sum_{i} k_i}$$ The covariance is positive if values $x_i$ and $x_j$ for vertices connected by an edge tend to be both large (larger than the mean) or small (smaller than the mean), and negative if they vary in opposite directions.
Just as with the modularity measure, it is convenient to normalize the covariance so that it takes value one in a perfectly mixed network - one in which all edges fall between nodes with precisely the same values of $x_i$. Hence, putting $x_i = x_j$ in the covariance formula we have: $$cov_{max}(L,R) = \frac{\sum_{i,j} a_{i,j} (x_i - \mu)^2}{\sum_{i,j} a_{i,j}}$$ Notice that this quantity is the variance $\sigma^{2}_{L} = \sigma^{2}_{R}$ of values $x_i$ over the edges of the network. Since $\sigma^{2}_{L} = \sigma^{2}_{R} = \sigma_{L} \sigma_{R}$, the normalized assortativity measure is defined as the ratio: $$cor(L,R) = \frac{cov(L, R)}{\sigma_{L} \sigma_{R}} = \frac{\sum_{i,j} a_{i,j} (x_i - \mu) (x_j - \mu)}{\sum_{i,j} a_{i,j} (x_i - \mu)^2}$$ But this ratio is precisely the Pearson correlation coefficient for the scalar characteristic computed over the edges of the network. In this context, it is called assortativity coefficient. It varies between a maximum of 1 for perfectly assortative networks and a minimum of -1 for perfectly disassortative ones. A value of 0 implies uncorrelation of the values of the characteristic over the nodes of the edges. There could be, however, non-linear correlation; the correlation coefficient detects only linear correlation.
A special and important case of assortative mixing according to a scalar characteristic is mixing by degree. That is, the scalar property is the degree of a node. In a network with assortative mixing by degree the high-degree nodes will be preferentially connected to other high-degree nodes, and the low to low. In a social network, for example, we have assortative mixing if the gregarious people are friends with other gregarious people and the hermits with other hermits. We have disassortative mixing if the gregarious actors are hanging out with hermits.
The reason this particular case is interesting is that degree is itself a property of the network structure. Hence one property of the network (degree) influences another (positions of edges). This gives rise to some interesting features. An assortative network by degree has a core-periphery structure, with a core of high-degree nodes surrounded by a less dense periphery of nodes with low degrees. On the other hand, if a network is disassortative mixed, it contains star-like structures.
A measure of assortative mixing by degree is the assortativity coefficient where the quantities $x_i$ are now the degrees $k_i$:
$$cor(L,R) = \frac{\sum_{i,j} a_{i,j} (k_i - \mu) (k_j - \mu)}{\sum_{i,j} a_{i,j} (k_i - \mu)^2}$$The computation of the above sum costs $O(n^2)$. A less intensive way to compute the coefficient is by using the following formula: $$cor(L,R) = \frac{S_1 S_e - S_{2}^{2}}{S_1 S_3 - S_{2}^{2}}$$ where $$S_e = 2 \sum_{(i,j) \in E} k_i k_j, \, \, \, S_1 = \sum_i k_i, \, \, \, S_2 = \sum_i k_{i}^2, \, \, \, S_3 = \sum_i k_{i}^3$$ This version can be computed with $O(n+m)$ operations, which is lower than $O(n^2)$ if the network is sparse (as it is in real cases).
It appears that many real networks have a tendency to negative values of degree assortativity coefficient $r$. This because these networks are very often represented as simple graphs, that is, they do not allow multi-edges, and hence only a single edge may exist between two vertices. Graphs with only simple edges tend in the absence of other bias to show disassortative mixing by degree because the number of edges that can fall between high-degree vertex pairs is limited.
There is, however, one notable exception: social networks. There is a clear tendency for social networks to have positive assortativity coefficients. For example, the assortativity coefficient for the collaboration network of computer scientists is 0.17, and 0.30 if only journal collaboration is considered. Hence collaborative computer scientists tend to collaborate with other collaborative computer scientists, and solitary authors match preferentially with other solitary authors, in particular when they write journal papers.