Knowledge and Information Systems

Volume 42, Issue 1, pp 181–213

Defining and evaluating network communities based on ground-truth

Regular Paper

Abstract

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, the intractability of methods for detecting them, and issues with evaluation that stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on the (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities, which we refer to as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology that allows us to compare and quantitatively evaluate how well different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness, and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate the task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods.
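To make the two best-performing scoring functions and the seed-node task concrete, here is a minimal Python sketch. It assumes the common definitions of conductance, f(S) = c_S / (2 m_S + c_S), where m_S counts edges inside S and c_S counts edges on its boundary, and of Triad participation ratio as the fraction of members of S that lie on a triangle inside S. The seed-node part is a simplified stand-in in the spirit of the PageRank-based local partitioning of Andersen, Chung and Lang: it runs plain power iteration over the whole graph rather than a local push approximation, and it does not reproduce the paper's parameter-free extension. The function names (build_graph, conductance, triad_participation_ratio, personalized_pagerank, sweep_community) and the parameters alpha, iters and max_volume_fraction are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Sketch only: hypothetical helpers, not the paper's implementation.


def build_graph(edges):
    """Undirected adjacency sets from an edge list; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)  # plain dict, so unknown nodes raise KeyError


def conductance(adj, S):
    """c_S / (2 m_S + c_S): share of the edge endpoints of S that leave S."""
    S = set(S)
    volume = sum(len(adj[u]) for u in S)                   # 2 m_S + c_S
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # c_S
    return cut / volume if volume else 0.0  # undefined at volume 0; 0.0 by convention


def triad_participation_ratio(adj, S):
    """Fraction of members of S that lie on at least one triangle inside S."""
    S = set(S)
    if not S:
        return 0.0

    def in_triangle(u):
        inside = [v for v in adj[u] if v in S]
        return any(b in adj[a] for a, b in combinations(inside, 2))

    return sum(1 for u in S if in_triangle(u)) / len(S)


def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """Power iteration for PageRank that jumps back to `seed` with probability alpha."""
    if seed not in adj:
        raise ValueError("seed must have at least one edge")
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha
        for u, mass in p.items():
            if mass:
                share = (1 - alpha) * mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        p = nxt
    return p


def sweep_community(adj, seed, max_volume_fraction=0.5, **ppr_kwargs):
    """Order nodes by p(u)/deg(u) and return the lowest-conductance prefix."""
    p = personalized_pagerank(adj, seed, **ppr_kwargs)
    order = sorted((u for u in p if p[u] > 0),
                   key=lambda u: p[u] / len(adj[u]), reverse=True)
    cap = max_volume_fraction * sum(len(nbrs) for nbrs in adj.values())
    best, best_phi, prefix, volume = None, float("inf"), set(), 0
    for u in order:
        volume += len(adj[u])
        if volume > cap:  # prefix volume only grows, so stop at the cap
            break
        prefix.add(u)
        phi = conductance(adj, prefix)
        if phi < best_phi:
            best, best_phi = set(prefix), phi
    return best, best_phi


# Two triangles joined by the bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
adj = build_graph(edges)
community, phi = sweep_community(adj, seed=1)
print(sorted(community), round(phi, 3), triad_participation_ratio(adj, community))
# prints [1, 2, 3] 0.143 1.0
```

The volume cap is a design choice of this sketch: under the c_S / (2 m_S + c_S) form, an entire connected component has conductance 0, so an uncapped sweep would always return the whole component instead of the seed's triangle.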

Keywords

Network communities, Ground-truth communities, Community detection, Modularity, Community scoring function

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Stanford University, Stanford, USA
