Design of OpenMP-based Parallel Louvain algorithm for community detection,
with no internally-disconnected communities.
Community detection entails the identification of clusters of vertices that exhibit stronger connections within themselves compared to the wider network. The Louvain method, a commonly utilized heuristic for this task, employs a two-step process comprising a local-moving phase and an aggregation phase. This process iteratively optimizes the modularity metric, a measure of community quality. Although widely used, the Louvain method has been noted for generating internally-disconnected and poorly connected communities. In response, Traag et al. introduce the Leiden algorithm, which incorporates an extra refinement phase between the local-moving and aggregation phases. This refinement step enables vertices to explore and potentially create sub-communities within the identified communities from the local-moving phase. In this report, we propose another BFS-based approach, GSP-Louvain, for addressing the same issue. This is different from a number of earlier works, which tackled internally-disconnected communities as a post-processing step.
Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, GSP-Louvain on 13 different graphs. GSP-Louvain surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 341×
, 83×
, and 6.1×
respectively, achieving a processing rate of 328M
edges/s on a 3.8𝐵
edge graph.
Below we plot the speedup of GSP-Louvain wrt original Leiden, igraph Leiden, and NetworKit Leiden.
Next, we compare the modularity of communities identified by the original Leiden algorithm, igraph Leiden, NetworKit Leiden, and GSP-Louvain. On average, GSP-Louvain achieves 0.3%
lower modularity than the original Leiden and igraph Leiden, respectively, and 25%
higher modularity than NetworKit Leiden, particularly evident on road networks and protein k-mer graphs.
Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. The original Leiden, igraph Leiden, and GSP-Louvain do not identify any communities that are internally-disconnected. However, on average, NetworKit Leiden exhibit fraction of disconnected communities amounting to 1.5×10^−2
, particularly on web graphs and social networks.
Refer to our technical reports for more details:
An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm.
Note
You can just copy main.sh
to your system and run it.
For the code, refer to main.cxx
.
The code structure of GSP-Louvain is as follows:
- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/louvian.hxx: Louvian algorithm functions
- inc/louvainSplit.hxx: Louvain with no disconnected communities
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph Property functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/symmetricize.hxx: Graph Symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs
Note that each branch in this repository contains code for a specific experiment. The main
branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.
- Fast unfolding of communities in large networks; Vincent D. Blondel et al. (2008)
- Community Detection on the GPU; Md. Naim et al. (2017)
- Scalable Static and Dynamic Community Detection Using Grappolo; Mahantesh Halappanavar et al. (2017)
- From Louvain to Leiden: guaranteeing well-connected communities; V.A. Traag et al. (2019)
- CS224W: Machine Learning with Graphs | Louvain Algorithm; Jure Leskovec (2021)
- The University of Florida Sparse Matrix Collection; Timothy A. Davis et al. (2011)
- Fetch-and-add using OpenMP atomic operations