Other community detection methods
There are several other community detection methods. Those implemented in igraph are briefly described below:
-
cluster_fast_greedy starts out with each vertex in our network in a one-vertex group of its own, and then successively amalgamate groups in pairs, choosing at each step the pair whose amalgamation gives the biggest increase in modularity, or the smallest decrease if no choice gives an increase. Eventually all vertices are amalgamated into a single large community and the algorithm ends. Then we go back over the states through which the network passed during the course of the algorithm and select the one with the highest value of the modularity.
-
cluster_edge_betweenness looks for the edges that lie between communities. If we can find and remove these edges, we will be left with just the isolated communities. To identify edges between communities one common approach is to use edge betweenness centrality, which counts the number of geodesic paths that run along edges. A less expensive alternative would be to look for edges that belong to an unusually small number of short loops.
-
cluster_walktrap is an approach based on random walks. The general idea is that if you perform random walks on the graph, then the walks are more likely to stay within the same community because there are only a few edges that lead outside a given community. This method runs short random walks of 3-4-5 steps (depending on one of its parameters) and uses the results of these random walks to merge separate communities in a bottom-up manner. You can use the modularity score to select where to cut the dendrogram.
-
cluster_spinglass is an approach from statistical physics, based on the so-called Potts model. In this model, each particle (i.e. vertex) can be in one of $k$ spin states, and the interactions between the particles (i.e. the edges of the graph) specify which pairs of vertices would prefer to stay in the same spin state and which ones prefer to have different spin states. The model is then simulated for a given number of steps, and the spin states of the particles in the end define the communities. The method is not particularly fast and not deterministic (because of the simulation itself), but has a tunable resolution parameter that determines the cluster sizes.
-
cluster_label_prop
is a simple approach in which every node is assigned one of $k$ labels. The method then proceeds iteratively and re-assigns labels to nodes in a way that each node takes the most frequent label of its neighbors in a synchronous manner. The method stops when the label of each node is one of the most frequent labels in its neighborhood. It is very fast but yields different results based on the initial configuration (which is decided randomly), therefore one should run the method a large number of times and then build a consensus labeling, which could be tedious.
-
cluster_infomap is based on information theoretic principles; it tries to build a grouping which provides the shortest description length for a random walk on the graph, where the description length is measured by the expected number of bits per vertex required to encode the path of a random walk. The underlying intuition is the following: when we describe a network as a set of interconnected communities, we are highlighting certain regularities of the network's structure while filtering out the relatively unimportant details. Thus, a modular description of a network can be viewed as a lossy compression of that network's topology. The best maps are those that convey a great deal of information while requiring minimal bandwidth; i.e., they are good compressions.
-
cluster_louvain
is based on the modularity measure and a hierarchical approach. Initially, each vertex is assigned to a community on its own. In every step, vertices are re-assigned to communities in a local, greedy way: each vertex is moved to the community with which it achieves the highest contribution to modularity. When no vertices can be reassigned, each community is considered a vertex on its own, and the process starts again with the merged communities. The process stops when there is only a single vertex left or when the modularity cannot be increased any more in a step. Leiden community detection in a recent improvement over Louvain algorithm (here an implementation in Python).
-
cluster_optimal calculates the optimal community structure for a graph, in terms of maximal modularity score.
The calculation is done by transforming the modularity maximization into an integer programming problem, and then calling the GLPK library to solve that. Graphs with up to fifty vertices should be fine, graphs with a couple of hundred vertices might be possible.