Skip to content

Latest commit

 

History

History
108 lines (79 loc) · 5.65 KB

README.md

File metadata and controls

108 lines (79 loc) · 5.65 KB

Design of OpenMP-based Parallel Label Propagation Algorithm (LPA), aka RAK, for community detection.

In recent years, there has been an unprecedented increase in data collection, and their representation as graphs. Thus, there is a demand for efficient parallel algorithms designed to identify communities in large networks. The significance of the multicore/shared memory environment in this context is crucial, both due to its energy efficiency, and due to widespread availability of hardware with large memory capacities. Existing research on Label Propagation Algorithm (LPA) focuses on algorithmic optimizations and parallelization but lacks exploration of efficient data structures for selecting the most weighted label. Furthermore, the proposed techniques are dispersed across multiple papers, making it challenging for readers to grasp and implement them effectively. To address this, we introduce GVE-LPA, an optimized parallel implementation of LPA for shared memory multicores.

Below we plot the time taken by FLPA, igraph LPA, NetworKit LPA, and GVE-LPA on 13 different graphs. GVE-LPA surpasses FLPA, igraph LPA, and NetworKit LPA by 139×, 97,000×, and 40× respectively, achieving a processing rate of 1.4𝐵 edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GVE-LPA wrt FLPA, igraph LPA, and NetworKit LPA.

Next, we plot the modularity of communities identified by FLPA, igraph LPA, NetworKit LPA, and GVE-LPA. GVE-LPA on average obtains 6.6% / 0.2% higher modularity than FLPA and igraph LPA respectively (especially on web graphs), and 4.1% lower modularity than NetworKit LPA (especially on protein k-mer graphs with large number of vertices and a low average degree).

Finally, we plot the strong scaling behaviour of GVE-LPA. With doubling of threads, GVE-LPA exhibits an average performance scaling of 1.7×.

Refer to our technical report for more details:
GVE-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection in Shared Memory Setting.


Note

You can just copy main.sh to your system and run it.
For the code, refer to main.cxx.



Code structure

The code structure of GVE-LPA is as follows:

- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/rak.hxx: LPA/RAK community detection algorithm functions
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph Property functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/symmetricize.hxx: Graph Symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG