MDPCluster: a swarm-based community detection algorithm in large-scale graphs

Abstract

Social network analysis has become an important topic for researchers in sociology and computer science. Similarities among individuals form communities as the basic constitutions of social networks. Regarding the importance of communities, community detection is a fundamental step in the study of social networks typically modeled as large-scale graphs. Detecting communities in such large-scale graphs which generally suffers from the curse of dimensionality is the main objective followed in this study. An efficient modularity-based community detection algorithm called MDPCluster is introduced in order to detect communities in large-scale graphs in a timely manner. To address the high dimensionality problem, first, a Louvain-based algorithm is utilized by MDPCluster to distinguish initial communities as super-nodes and then a Modified Discrete Particle Swarm Optimization algorithm, called MDPSO is leveraged to detect communities through maximizing modularity measure. MDPSO discretizes Particle Swarm Optimization using the idea of transmission tendency and also escapes from premature convergence thereby a mutation operator inspired by Genetic Algorithm. To evaluate the proposed method, six standard datasets, i.e., American College Football, Books about US Politics, Amazon Product Co-purchasing, DBLP, GR-QC and HEP-TH have been employed. The first two are known as synthetic datasets whereas the rest are real-world datasets. In comparison to eight state-of-the-art algorithms, i.e., Stationary Genetic Algorithm, Generational Genetic Algorithm, Simulated Annealing-Stationary Genetic Algorithm, Simulated Annealing-Generational Genetic Algorithm, Grivan–Newman, Danon and Label Propagation Algorithm, the results indicate the superiorities of MDCluster in terms of modularity, Normalized Mutual Information and execution time as well.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16

Notes

  1. 1.

    www.amazon.com.

  2. 2.

    Modularity and an Improved Genetic Algorithm.

References

  1. 1.

    González-Bailón S (2013) Social science in the era of big data. Policy Int 5(2):147–160

    Article  Google Scholar 

  2. 2.

    Javadi, SHS, Gharani P, Khadivi S (2018) Detecting community structure in dynamic social networks using the concept of leadership. In: Sustainable interdependent networks. Springer, pp 97–118

  3. 3.

    Jiang JQ, McQuay LJ (2012) Modularity functions maximization with nonnegative relaxation facilitates community detection in networks. Physica A 391(3):854–865

    Article  Google Scholar 

  4. 4.

    Yang J, Leskovec J (2015) Defining and evaluating network communities based on ground-truth. Knowl Inf Syst 42(1):181–213

    Article  Google Scholar 

  5. 5.

    Kadirkamanathan V, Selvarajah K, Fleming PJ (2006) Stability analysis of the particle dynamics in particle swarm optimizer. IEEE Trans Evol Comput 10(3):245–255

    Article  Google Scholar 

  6. 6.

    Du K-L, Swamy M (2016) Particle swarm optimization. In: Search and optimization by metaheuristics. Springer, pp 153–173

  7. 7.

    Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimization. Swarm Intell 1(1):33–57

    Article  Google Scholar 

  8. 8.

    Yuan X, Wang L, Yuan Y (2008) Application of enhanced PSO approach to optimal scheduling of hydro system. Energy Convers Manag 49(11):2966–2972

    Article  Google Scholar 

  9. 9.

    Liu B, Wang L, Jin Y-H (2007) An effective PSO-based memetic algorithm for flow shop scheduling. IEEE Trans Syst Man Cybern Part B Cybern 37(1):18–27

    Article  Google Scholar 

  10. 10.

    Liu B, Wang L, Jin Y, Huang D (2005) Advances in particle swarmoptimization algorithm. Control Instrum Chem Ind 32(3):1–6

    Google Scholar 

  11. 11.

    Aung TT, Nyunt TTS (2019) Discrete artificial bee colony algorithm for community detection in social networks. In: Seventeenth international conference on computer applications (ICCA 2019)

  12. 12.

    Ghoshal AK, Das N, Bhattacharjee S, Chakraborty G (2019) A fast parallel genetic algorithm based approach for community detection in large networks. In: 2019 11th international conference on communication systems and networks (COMSNETS). IEEE, pp 95–101

  13. 13.

    Guerrero M, Montoya FG, Baños R, Alcayde A, Gil C (2017) Adaptive community detection in complex networks using genetic algorithms. Neurocomputing 266:101–113

    Article  Google Scholar 

  14. 14.

    Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106

    Article  Google Scholar 

  15. 15.

    Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

    Article  Google Scholar 

  16. 16.

    Traag VA, Waltman L, van Eck NJ (2019) From Louvain to Leiden: guaranteeing well-connected communities. Scientific Rep 9:e5233

    Article  Google Scholar 

  17. 17.

    Guimera R, Danon L, Diaz-Guilera A, Giralt F, Arenas A (2003) Self-similar community structure in a network of human interactions. Phys Rev E 68(6):065103

    Article  Google Scholar 

  18. 18.

    Que X, Checconi F, Petrini F, Gunnels JA (2015) Scalable community detection with the louvain algorithm. In: 2015 IEEE international on parallel and distributed processing symposium (IPDPS). IEEE, pp 28–37

  19. 19.

    Newman ME (2012) Communities, modules and large-scale structure in networks. Nat Phys 8(1):25

    Article  Google Scholar 

  20. 20.

    Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

    MATH  Article  Google Scholar 

  21. 21.

    Rossi F, Villa-Vialaneix N (2010) Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach. Neurocomputing 73(7–9):1142–1163

    Article  Google Scholar 

  22. 22.

    Wang R-S, Zhang S, Wang Y, Zhang X-S, Chen L (2008) Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing 72(1–3):134–141

    Article  Google Scholar 

  23. 23.

    Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008

    MATH  Article  Google Scholar 

  24. 24.

    Newman ME (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64(1):016132

    MathSciNet  Article  Google Scholar 

  25. 25.

    Shang R, Bai J, Jiao L, Jin C (2013) Community detection based on modularity and an improved genetic algorithm. Physica A 392(5):1215–1231

    Article  Google Scholar 

  26. 26.

    Moradi M, Parsa S (2019) An evolutionary method for community detection using a novel local search strategy. Physica A 523:457–475

    Article  Google Scholar 

  27. 27.

    Ying W et al (2019) Parallel conical area community detection using evolutionary multi-objective optimization. Processes 7(2):111

    Article  Google Scholar 

  28. 28.

    Ji J, Song X, Liu C, Zhang X (2013) Ant colony clustering with fitness perception and pheromone diffusion for community detection in complex networks. Physica A 392(15):3260–3272

    Article  Google Scholar 

  29. 29.

    Cai Q, Gong M, Shen B, Ma L, Jiao L (2014) Discrete particle swarm optimization for identifying community structures in signed social networks. Neural Netw 58:4–13

    Article  Google Scholar 

  30. 30.

    Hassan EA, Hafez AI, Hassanien AE, Fahmy AA (2014) Community detection algorithm based on artificial fish swarm optimization. In: Intelligent Systems’ 2014. Springer, pp 509–521

  31. 31.

    Wang Y, Xu W, Kang Q (2018) AFSMA: an enhanced artificial fish swarm algorithm based on mutuality for community detection. In: Proceedings of the 2nd international conference on big data research. ACM, pp 107–113

  32. 32.

    Li X, Wu X, Xu S, Qing S, Chang P-C (2019) A novel complex network community detection approach using discrete particle swarm optimization with particle diversity and mutation. Appl Soft Comput 81:105476

    Article  Google Scholar 

  33. 33.

    Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science, 1995. MHS’95. IEEE, pp 39–43

  34. 34.

    Kennedy R, Eberhart J (1995) Particle swarm optimization. In: Proceedings of IEEE international conference on neural networks IV, vol 1000, p 1995

  35. 35.

    Yang X-S (2014) Swarm intelligence based algorithms: a critical analysis. Evol Intell 7(1):17–28

    Article  Google Scholar 

  36. 36.

    Yang X-S, He X (2016) Nature-inspired optimization algorithms in engineering: overview and applications. In: Nature-inspired computation in engineering. Springer, pp 1–20

  37. 37.

    Yu YF, Li G, Xu C (2013) An improved particle swarm optimization algorithm. In: Applied mechanics and materials. Trans Tech Publ, vol 401, pp 1328–1335

  38. 38.

    Selvi V, Umarani DR (2010) Comparative analysis of ant colony and particle swarm optimization techniques. Int J Comput Appl 5(4):1–6

    Google Scholar 

  39. 39.

    Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: The 1998 IEEE international conference on evolutionary computation proceedings, 1998. IEEE World Congress on Computational Intelligence. IEEE, pp 69–73

  40. 40.

    Clerc M (2004) Discrete particle swarm optimization, illustrated by the traveling salesman problem. In: New optimization techniques in engineering. Springer, pp 219–239

  41. 41.

    Pan Q-K, Tasgetiren MF, Liang Y-C (2008) A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem. Comput Oper Res 35(9):2807–2839

    MathSciNet  MATH  Article  Google Scholar 

  42. 42.

    Strasser S, Goodman R, Sheppard J, Butcher S (2016) A new discrete particle swarm optimization algorithm. In: Proceedings of the genetic and evolutionary computation conference 2016. ACM, pp 53–60

  43. 43.

    Wu H, Nie C, Kuo F-C, Leung H, Colbourn CJ (2015) A discrete particle swarm optimization for covering array generation. IEEE Trans Evol Comput 19(4):575–591

    Article  Google Scholar 

  44. 44.

    Chatterjee S, Sarkar S, Hore S, Dey N, Ashour AS, Balas VE (2017) Particle swarm optimization trained neural network for structural failure prediction of multistoried RC buildings. Neural Comput Appl 28(8):2005–2016

    Article  Google Scholar 

  45. 45.

    Momeni E, Armaghani DJ, Hajihassani M, Amin MFM (2015) Prediction of uniaxial compressive strength of rock samples using hybrid particle swarm optimization-based artificial neural networks. Measurement 60:50–63

    Article  Google Scholar 

  46. 46.

    Lynn N, Suganthan PN (2015) Distance based locally informed particle swarm optimizer with dynamic population size. In: Proceedings of the 18th Asia Pacific symposium on intelligent and evolutionary systems-volume 2. Springer, pp 577–587

  47. 47.

    Meunier D, Lambiotte R, Fornito A, Ersche K, Bullmore ET (2009) Hierarchical modularity in human brain functional networks. Front Neuroinform 3:37

    Article  Google Scholar 

  48. 48.

    Pujol JM, Erramilli V, Rodriguez P (2009) Divide and conquer: partitioning online social networks. arXiv preprint arXiv:0905.4918

  49. 49.

    Raeder T, Chawla NV (2011) Market basket analysis with networks. Social Netw Anal Min 1(2):97–113

    Article  Google Scholar 

  50. 50.

    Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582

    Article  Google Scholar 

  51. 51.

    Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM Trans Knowl Discovery Data (TKDD) 1(1):2

    Article  Google Scholar 

  52. 52.

    Boseniuk T, Ebeling W (1990) Boltzmann-, Darwin-and Haeckel-strategies in optimization problems. In International conference on parallel problem solving from nature. Springer, pp 429–444

  53. 53.

    Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680

    MathSciNet  MATH  Article  Google Scholar 

  54. 54.

    Cai X, Shi Y, Zhu Y, Qiao Y, Hu F (2017) An algorithm Q-PSO for community detection in complex networks. In 2017 16th international symposium on distributed computing and applications to business, engineering and science (DCABES). IEEE, pp 76–79

  55. 55.

    Bilal S, Abdelouahab M (2017) Evolutionary algorithm and modularity for detecting communities in networks. Physica A 473:89–96

    MATH  Article  Google Scholar 

  56. 56.

    Gong M, Fu B, Jiao L, Du H (2011) Memetic algorithm for community detection in networks. Phys Rev E 84(5):056101

    Article  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Saeed Farzi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Figure 13 shows the pseudocode of PSO algorithm. In this algorithm, the main loop iterates as many times as total iterations or as it is known as T. In this loop, there is an inner loop which runs as many times as the number of individuals in the population, Np, in which velocities and positions are updated. The fitness function is calculated for all the individuals of the population as many times as the number of problem dimensions or |V|. The time complexity of calculating fitness function is \( O\left( {\left| V \right|^{2} } \right) \) in this paper. Therefore, the time complexity of PSO is \( O\left( {\left( {\left| V \right|*N_{p} + \left| V \right|^{2} *N_{p} } \right)*T} \right) \), and with some simplification, the time complexity of the algorithm is \( O\left( {\left( {\left| V \right|^{2} *N_{p} } \right)*T} \right) \).

In GA, the main loop iterates as many times as all the iterations T. in this loop, there is an inner loop which runs as many times as the number of individuals in population Np in which the fitness function for all individual is calculated and mutation and cross-over are applied on population by mutation and cross-over rates. The time complexity of fitness function in this problem (modularity) is \( O\left( {\left| V \right|^{2} } \right) \). Thus, time complexity of GA would be \( O\left( {\left( {\left| V \right|^{2} *N_{p} } \right)*T*(p_{c} *O\left( {cross over} \right) + p_{m} *O\left( {mutation} \right))} \right) \). With some simplifications we have \( O\left( {\left( {\left| V \right|^{2} *N_{p} } \right)*T} \right) \) for the time complexity of the algorithm.

See Fig. 17.

Fig. 17
figure17

Particle swarm optimization

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Fozuni Shirjini, M., Farzi, S. & Nikanjam, A. MDPCluster: a swarm-based community detection algorithm in large-scale graphs. Computing 102, 893–922 (2020). https://doi.org/10.1007/s00607-019-00787-4

Download citation

Keywords

  • Particle swarm optimization
  • Community detection
  • Swarm intelligence
  • Large scale graphs
  • Louvain algorithm

Mathematics Subject Classification

  • 05C85