Parallel Community Detection for Massive Graphs

  • E. Jason Riedy
  • Henning Meyerhenke
  • David Ediger
  • David A. Bader
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7203)

Abstract

Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore’s sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output’s modularity compares well with the standard sequential algorithms.

Keywords

Community detection parallel algorithm graph analysis 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Bader, D., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, W., Meuse, T.: HPCS SSCA#2 Graph Analysis Benchmark Specifications v1.1 (July 2005)Google Scholar
  2. 2.
    Bader, D., Madduri, K.: SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks. In: Proc. Int’l. Parallel and Distributed Processing Symp. (IPDPS 2008), Miami, FL (April 2008)Google Scholar
  3. 3.
    Bader, D., McCloskey, J.: Modularity and graph algorithms (September 2009), presented at UMBCGoogle Scholar
  4. 4.
    Berry, J., Hendrickson, B., LaViolette, R., Phillips, C.: Tolerating the community detection resolution limit with edge weighting. Phys. Rev. E 83, 056119 (2011)CrossRefGoogle Scholar
  5. 5.
    Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008 (2008)CrossRefGoogle Scholar
  6. 6.
    Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowledge and Data Engineering 20(2), 172–188 (2008)CrossRefGoogle Scholar
  7. 7.
    Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: Proc. 4th SIAM Intl. Conf. on Data Mining (SDM). SIAM, Orlando (2004)Google Scholar
  8. 8.
    Clauset, A., Newman, M., Moore, C.: Finding community structure in very large networks. Physical Review E 70(6), 66111 (2004)CrossRefGoogle Scholar
  9. 9.
    Facebook, Inc.: User statistics (October 2011), http://www.facebook.com/press/info.php?statistics
  10. 10.
    Fortunato, S., Barthélemy, M.: Resolution limit in community detection. Proc. of the National Academy of Sciences 104(1), 36–41 (2007)CrossRefGoogle Scholar
  11. 11.
    Fortunato, S.: Community detection in graphs. Physics Reports 486(3-5), 75–174 (2010)MathSciNetCrossRefGoogle Scholar
  12. 12.
    Gehweiler, J., Meyerhenke, H.: A distributed diffusive heuristic for clustering a virtual P2P supercomputer. In: Proc. 7th High-Performance Grid Computing Workshop (HGCW 2010) in Conjunction with 24th Intl. Parallel and Distributed Processing Symposium (IPDPS 2010). IEEE Computer Society (2010)Google Scholar
  13. 13.
    Hoepman, J.H.: Simple distributed weighted matchings. CoRR cs.DC/0410047 (2004)Google Scholar
  14. 14.
    Lozano, S., Duch, J., Arenas, A.: Analysis of large social datasets by community detection. The European Physical Journal - Special Topics 143, 257–259 (2007)CrossRefGoogle Scholar
  15. 15.
    Manne, F., Bisseling, R.: A Parallel Approximation Algorithm for the Weighted Maximum Matching Problem. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2007. LNCS, vol. 4967, pp. 708–717. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  16. 16.
    Newman, M.: Modularity and community structure in networks. Proc. of the National Academy of Sciences 103(23), 8577–8582 (2006)CrossRefGoogle Scholar
  17. 17.
    Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)CrossRefGoogle Scholar
  18. 18.
    Noack, A., Rotta, R.: Multi-level Algorithms for Modularity Clustering. In: Vahrenhold, J. (ed.) SEA 2009. LNCS, vol. 5526, pp. 257–268. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  19. 19.
    Novick, M.B.: Fast parallel algorithms for the modular decomposition. Tech. rep., Cornell University, Ithaca, NY, USA (1989)Google Scholar
  20. 20.
    NYSE Euronext: Consolidated volume in NYSE listed issues, 2010 - current (March 2011), http://www.nyxdata.com/nysedata/asp/factbook/viewer_edition.asp?mode=table&key=3139&category=3
  21. 21.
    Preis, R.: Linear Time \(\frac{1}{2}\)-Approximation Algorithm for Maximum Weighted Matching in General Graphs. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, pp. 259–269. Springer, Heidelberg (1999)CrossRefGoogle Scholar
  22. 22.
    Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. of the National Academy of Sciences 101(9), 2658 (2004)CrossRefGoogle Scholar
  23. 23.
    Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabási, A.L.: Hierarchical organization of modularity in metabolic networks. Science 297(5586), 1551–1555 (2002)CrossRefGoogle Scholar
  24. 24.
    Twitter, Inc.: Happy birthday Twitter! (March 2011), http://blog.twitter.com/2011/03/happy-birthday-twitter.html
  25. 25.
    Wakita, K., Tsurumi, T.: Finding community structure in mega-scale social networks. CoRR abs/cs/0702048 (2007)Google Scholar
  26. 26.
    Wilkinson, D.M., Huberman, B.A.: A method for finding communities of related genes. Proceedings of the National Academy of Sciences of the United States of America 101(suppl. 1), 5241–5248 (2004)CrossRefGoogle Scholar
  27. 27.
    Zhang, Y., Wang, J., Wang, Y., Zhou, L.: Parallel community detection on large networks with propinquity dynamics. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 997–1006. ACM, New York (2009)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • E. Jason Riedy
    • 1
  • Henning Meyerhenke
    • 1
    • 2
  • David Ediger
    • 1
  • David A. Bader
    • 1
  1. 1.Georgia Institute of TechnologyAtlantaUSA
  2. 2.Karlsruhe Institute of TechnologyKarlsruheGermany

Personalised recommendations