Probabilistic Graphical Model Based Highly Scalable Directed Community Detection Algorithm

  • XiaoLong Deng
  • ZiXiang Nie
  • JiaYu Zhai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11607)


Community detection algorithms are essential for characterizing the statistical properties of complex networks, and they support the study of real networks such as online social networks and logistics distribution networks. Traditional community detection algorithms, however, concentrate on undirected networks and therefore cannot handle directionality, a significant characteristic of real networks. Based on the Information Transfer Probability method from the classic Probabilistic Graphical Model (PGM) theory of Turing Award winner Judea Pearl, we propose an efficient local directed community detection method named Information Transfer Gain (ITG), built on the basic information transfer triangles that compose the core structure of a community. Then, to process large-scale directed social networks efficiently, we propose a scalable distributed algorithm, Distributed Information Transfer Gain (DITG), based on the GraphX model in Spark. Finally, through extensive experiments on directed artificial network datasets and real social network datasets, we show that our algorithm achieves good precision and efficiency in a distributed environment compared with classical directed detection algorithms such as FastGN, OSLOM, and Infomap.


Distributed computing; Directed community detection; Information transfer gain; Probabilistic graphical model; Scalable algorithm



Thanks to the National Key Research and Development Program of China (No. 2018YFC0831306).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Beijing University of Posts and Telecommunications, Beijing, China
