Community Discovery: Simple and Scalable Approaches

Part of the Human–Computer Interaction Series book series (HCIS)


The increasing size and complexity of online social networks have brought distinct challenges to the task of community discovery. A community discovery algorithm needs to be efficient, finishing without a prohibitive amount of running time. It should also be scalable, capable of handling large networks containing billions of edges or more. Furthermore, it should be effective, producing community assignments of high quality. In this chapter, we present a selection of algorithms that follow simple design principles and have proven highly effective and efficient in extensive empirical evaluations. We start by discussing a generic approach to community discovery that combines multilevel graph contraction with core clustering algorithms. Next, we describe the use of network sampling in community discovery, where the goal is to reduce the number of nodes and/or edges while retaining the network's underlying community structure. Finally, we review research efforts that leverage various parallel and distributed computing paradigms for community discovery, which can facilitate finding communities in tera- and peta-scale networks.
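To make the idea of multilevel graph contraction more concrete, the following is a minimal sketch of a single coarsening level. It assumes a greedy heavy-edge matching rule, which is one common coarsening heuristic and not necessarily the scheme used in the chapter. The function name `contract_once` and the dict-of-dicts graph representation are illustrative choices, not taken from the source.

```python
def contract_once(adj):
    """One coarsening level (illustrative sketch, not the chapter's method).

    Greedily pairs each node with its heaviest still-unmatched neighbour and
    merges every pair into one supernode.

    adj: {node: {neighbour: weight}}, a symmetric undirected weighted graph.
    Returns (coarse_adj, mapping), where mapping[node] is its supernode id.
    """
    mapping = {}
    next_id = 0
    for u in sorted(adj):  # fixed visiting order keeps the result deterministic
        if u in mapping:
            continue
        # Candidate partners: neighbours not yet absorbed into a supernode.
        free = [(w, v) for v, w in adj[u].items() if v != u and v not in mapping]
        mapping[u] = next_id
        if free:
            _, v = max(free)  # heaviest edge wins; ties go to the larger node id
            mapping[v] = next_id
        next_id += 1

    # Build the contracted graph; unmatched nodes become singleton supernodes.
    coarse = {c: {} for c in range(next_id)}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:  # edges inside a supernode disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w  # parallel edges add up
    return coarse, mapping
```

In a multilevel scheme, a step like this would be applied repeatedly until the graph is small enough for a core clustering algorithm, and the resulting assignment would then be projected back through each `mapping` to the original nodes.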





We are thankful to the editors and anonymous reviewers for their valuable comments, insightful suggestions, and constructive feedback, which greatly helped improve this article.

This work is supported by NSF Grants IIS-1111118, CCF-1240651, and DMS-1418265. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
