Distributed Graph Clustering Using Modularity and Map Equation

  • Michael Hamann
  • Ben Strasser
  • Dorothea Wagner
  • Tim Zeitz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11014)

Abstract

We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm, one for each measure. Both are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The two versions, DSLM-Mod for modularity and DSLM-Map for the map equation, differ only slightly, and adapting them to similar quality measures is straightforward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68 billion edges. Our algorithms are fast while detecting clusterings similar to those detected by other sequential, parallel, and distributed clustering algorithms. Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is up to an order of magnitude faster, and achieves better quality.
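
To make the internally-dense-externally-sparse principle concrete, the following minimal sketch evaluates the standard modularity score of a given clustering on a small undirected, unweighted graph. It is a sequential illustration only, not the distributed DSLM-Mod algorithm and not Thrill code. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges:      list of (u, v) pairs, each undirected edge listed once
                (assumed here: no self-loops, no parallel edges).
    cluster_of: dict mapping every node to its cluster id.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    internal = defaultdict(int)  # number of edges with both ends in the cluster
    volume = defaultdict(int)    # sum of node degrees in the cluster

    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Per cluster: fraction of internal edges minus the fraction expected
    # if edge endpoints were paired at random with the same degrees.
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "all" for v in range(6)}

q_split = modularity(edges, two_triangles)  # 5/14, roughly 0.357
q_all = modularity(edges, one_cluster)      # 0.0
```

Splitting the toy graph into its two triangles scores higher than placing all nodes in one cluster, which matches the intuition that a good cluster has many internal edges and few edges leaving it.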

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Michael Hamann (1)
  • Ben Strasser (1)
  • Dorothea Wagner (1)
  • Tim Zeitz (1)
  1. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
