# A LexDFS-Based Approach on Finding Compact Communities

## Abstract

This article presents an efficient hierarchical clustering algorithm based on a graph traversal algorithm called LexDFS. This traversal has the property of going through the clustered parts of the graph in a small number of iterations, which makes them recognisable. The time complexity of our method is in *O*(*n* × log(*n*)). The method is simple to implement, and a thorough study shows that it outputs clusterings that are closer to some ground truths than those of its competitors. Experiments are also carried out to analyse the behaviour of the algorithm during execution on sample graphs. This article also features a quality function called *compactness*, which measures how efficient a cluster is for internal communication. We prove that this quality function has interesting theoretical properties.
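As a rough illustration of the traversal the abstract refers to, the sketch below implements the standard LexDFS labelling rule from the graph-searching literature: each unvisited vertex keeps a label listing the visit numbers of its already-visited neighbours, most recent first, and the next vertex visited is the one with the lexicographically largest label. The function name `lex_dfs`, the adjacency-dictionary input, and the tie-breaking by smallest vertex are assumptions made for this example. It is a naive quadratic-time version for readability, not the article's clustering method or its *O*(*n* × log(*n*)) implementation.

```python
def lex_dfs(adj, start):
    """Return a LexDFS visit order for a connected undirected graph.

    adj   -- dict mapping each vertex to a list of its neighbours (assumed format)
    start -- vertex visited first
    """
    # Label of a vertex: visit numbers of its visited neighbours, most recent first.
    labels = {v: [] for v in adj}
    unvisited = set(adj)
    order = []

    for i in range(1, len(adj) + 1):
        if i == 1:
            v = start
        else:
            # Lexicographically largest label wins; Python list comparison
            # already orders [3, 1] > [3] > [2, 1] > [].
            # Iterating over sorted vertices makes ties go to the smallest vertex
            # (an arbitrary choice made for this sketch).
            v = max(sorted(unvisited), key=lambda u: labels[u])
        unvisited.remove(v)
        order.append(v)
        # Prepend the current visit number to the label of each unvisited neighbour.
        for w in adj[v]:
            if w in unvisited:
                labels[w].insert(0, i)
    return order


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
two_triangles = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
print(lex_dfs(two_triangles, 0))  # [0, 1, 2, 3, 4, 5]
```

On this toy graph the traversal finishes the first triangle before crossing the bridge edge to the second, which is the kind of behaviour the abstract describes for clustered parts of a graph.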

## Keywords

Community detection, Compactness, LexDFS

## Notes

### Acknowledgements

The authors thank Loïck Lhote for his help with the proof of continuity.
