Probabilistic Graphical Model Based Highly Scalable Directed Community Detection Algorithm

Deng, XiaoLong; Nie, ZiXiang; Zhai, JiaYu

doi:10.1007/978-3-030-26142-9_28

XiaoLong Deng (ORCID: orcid.org/0000-0002-8847-7174), ZiXiang Nie & JiaYu Zhai

Conference paper
First Online: 12 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Abstract

Community detection algorithms are essential for the statistical characterization of complex networks and contribute to the study of real networks such as online social networks and logistics distribution networks. However, traditional community detection algorithms concentrate on undirected networks and therefore cannot handle directionality, a significant characteristic of real networks. Based on the Information Transfer Probability method from the classic Probabilistic Graphical Model (PGM) theory of Turing Award winner Pearl, we propose an efficient local directed community detection method named Information Transfer Gain (ITG), built on the basic information transfer triangles that compose the core structure of a community. Then, to process large-scale directed social networks efficiently, we propose a scalable, distributed algorithm, Distributed Information Transfer Gain (DITG), based on the GraphX model in Spark. Finally, extensive experiments on directed artificial network datasets and real social network datasets show that our algorithm achieves good precision and efficiency in a distributed environment compared with classical directed detection algorithms such as FastGN, OSLOM and Infomap.

References

Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)
Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)
Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004). https://doi.org/10.1103/PhysRevE.70.066111
Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005). https://doi.org/10.1007/11569596_31
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007). https://doi.org/10.1103/PhysRevE.76.036106
Blondel, V.D., Guillaume, J.L., Lambiotte, R., et al.: Fast unfolding of communities in large networks. J. Stat. Mech: Theory Exp. 10, P10008 (2008). https://doi.org/10.1088/1742-5468/2008/10/P10008
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105(4), 1118–1123 (2008). https://doi.org/10.1073/pnas.0706851105
Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 056117 (2009). https://doi.org/10.1103/PhysRevE.80.056117
Gregory, S.: Finding overlapping communities in networks by label propagation. New J. Phys. 12(10), 103018 (2010)
Ahn, Y.-Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature 466(7307), 761–764 (2010)
Lancichinetti, A., Radicchi, F., Ramasco, J.J., Fortunato, S.: Finding statistically significant communities in networks. PLoS One 6(4), e18961 (2011)
Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: High quality, scalable and parallel community detection for large real graphs. In: The 23rd International Conference on World Wide Web, pp. 225–236. ACM, Seoul (2014). https://doi.org/10.1145/2566486.2568010
Levorato, V., Petermann, C.: Detection of communities in directed networks based on strongly p-connected components. In: 2011 International Conference on Computational Aspects of Social Networks (CASoN), pp. 211–216. IEEE, Salamanca (2011). https://doi.org/10.1109/cason.2011.6085946
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pp. 1–10, Berkeley, CA, USA (2010)
Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 587–596. ACM (2013)
Xin, R.S., Crankshaw, D., Dave, A., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: unifying data-parallel and graph-parallel analytics. CoRR abs/1402.2394 (2014)
Sun, P.G., Gao, L.: A framework of mapping undirected to directed graphs for community detection. Inf. Sci. 298, 330–343 (2015)
Zhang, X., Martin, T., Newman, M.E.: Identification of core-periphery structure in networks. Phys. Rev. E 91(3), 032803 (2015)
Liu, J., Aggarwal, C., Han, J.: On integrating network and community discovery. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 117–126. ACM (2015)
Newman, M.E.J.: Community detection in networks: modularity optimization and maximum likelihood are equivalent. CoRR abs/1606.02319 (2016)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)
Leskovec, J., Sosic, R.: SNAP: a general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol. (TIST) 8(1), 1 (2016)
Newman, M.E., Clauset, A.: Structure and inference in annotated networks. Nat. Commun 7, 11863 (2016)
Amdahl, G.M.: Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS Conference Proceedings, no. (30), pp. 483–485 (1967). https://doi.org/10.1145/1465482.1465560
Deng, X., Zhai, J.: Efficient vector influence clustering coefficient based directed community detection algorithm. IEEE Access 5, 17106–17116 (2017). https://doi.org/10.1109/access.2017.2740962
Deng, X., Dou, Y., Lv, T., Nguyen, Q.V.H.: A novel centrality cascading based edge parameter evaluation method for robust influence maximization. IEEE Access 5, 22119–22131 (2017). https://doi.org/10.1109/access.2017.2764750

Acknowledgment

This work was supported by the National Key Research and Development Program of China (No. 2018YFC0831306).

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
XiaoLong Deng, ZiXiang Nie & JiaYu Zhai

Corresponding author

Correspondence to XiaoLong Deng.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U.
Singapore Management University, Singapore, Singapore
Hady W. Lauw

Appendix

A. Triple and Triangle Structure Statistic Method

Tables 1 and 2 give the number of times the edge function calculation is repeated. This appendix describes how the triple and triangle structures are counted.

Figure 10 shows the basic structures on three vertices. To count triples, two numbers represent the two edges that connect the focus vertex to its two neighbours. Each number takes one of three values: 0, 1 or 2. 0 represents a bidirectional edge, while 1 and 2 represent an outgoing edge and an incoming edge respectively, with direction observed from the focus vertex. For a triangle structure, three numbers are used. The first two keep the same meaning, while the third describes the edge opposite the focus vertex, that is, the edge between its two neighbours. Again 0 denotes a bidirectional edge, and 1 and 2 denote the two possible directions between the two neighbours. With this encoding, counting the triple and triangle structures directly gives the number of times the edge function is repeated. We model the situation in Fig. 10(a) and obtain all nine ITG figures.

Figure 11 shows the nine basic subgraphs of Fig. 10(a), and Fig. 12 shows the other eighteen subgraphs of Fig. 10(b). All twenty-seven subgraphs are classified into two types of weighted triangles, which form the computational basis of Formula (1).
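The encoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the edge-set representation, and the example graph are hypothetical, and the choice that a third code of 1 means "first neighbour to second neighbour" is an assumption, since the vertex labels are not recoverable from the text. The sketch only shows that two ternary codes give 9 triple types and three ternary codes give 27 triangle types.

```python
from itertools import product

# Edge codes as described in the appendix, observed from the first vertex.
BIDIRECTIONAL, OUT, IN = 0, 1, 2


def edge_code(edges, u, v):
    """Return the code of the edge between u and v as seen from u, or None."""
    forward = (u, v) in edges
    backward = (v, u) in edges
    if forward and backward:
        return BIDIRECTIONAL
    if forward:
        return OUT
    if backward:
        return IN
    return None  # the two vertices are not connected


def triple_code(edges, focus, first, second):
    """Two-number code for the triple centred on the focus vertex."""
    codes = (edge_code(edges, focus, first), edge_code(edges, focus, second))
    return None if None in codes else codes


def triangle_code(edges, focus, first, second):
    """Three-number code; the third number describes the opposite edge.

    Assumption: 1 means first -> second and 2 means second -> first.
    """
    base = triple_code(edges, focus, first, second)
    opposite = edge_code(edges, first, second)
    if base is None or opposite is None:
        return None
    return base + (opposite,)


ALL_TRIPLE_CODES = list(product((0, 1, 2), repeat=2))    # 3^2 = 9 codes
ALL_TRIANGLE_CODES = list(product((0, 1, 2), repeat=3))  # 3^3 = 27 codes

# Hypothetical directed graph: a <-> b, a -> c, c -> b.
example_edges = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")}
example_code = triangle_code(example_edges, "a", "b", "c")
```

Here `example_code` evaluates to `(0, 1, 2)`: the edge to `b` is bidirectional, the edge to `c` is outgoing from the focus vertex, and the opposite edge runs from `c` to `b`. Counting how often each code occurs over all focus vertices would give structure counts of the kind the appendix describes.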

Table 11. All ITG computation in sub graphs of Figure A.10.

B. Parameter Details of Formula (2)

$$ \Theta_{1} = \frac{((r - 1)\delta + 1 + q)(d_{in} - 1)\delta}{(r + q)\left((r - 1)(r - 2)\delta^{3} + (d_{in} - 1)\delta + q(q - 1)\delta\omega + q(q + 1)\omega + d_{out}\omega\right)} $$

(2-1)

$$ \Theta_{2} = - \frac{(r - 1)(r - 2)\delta^{3}}{(r - 1)(r - 2)\delta^{3} + q(q - 1)\omega + q(r - 1)\delta\omega} \cdot \frac{(r + 1)\delta + q}{(r + q)(r - 1 + q)} $$

(2-2)

$$ \Theta_{3} = \frac{d_{in}(d_{in} - 1)\delta}{d_{in}(d_{in} - 1)\delta + d_{out}(d_{out} - 1)\omega + d_{out} d_{in}\omega} \cdot \frac{d_{in} + d_{out}}{r + d_{out}} $$

(2-3)

$$ q = \frac{b - d_{in}}{r} $$

(2-4)
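As a numerical check of Formulas (2-1) to (2-4), the following Python sketch transcribes them term by term. The function names and the sample values are hypothetical, and the sketch makes no claim about what the symbols r, b, δ, ω, d_in and d_out mean, since they are not defined in this appendix. It also does not guard against zero denominators.

```python
import math


def q_value(b, d_in, r):
    """Formula (2-4): q = (b - d_in) / r."""
    return (b - d_in) / r


def theta_1(r, q, d_in, d_out, delta, omega):
    """Formula (2-1)."""
    numerator = ((r - 1) * delta + 1 + q) * (d_in - 1) * delta
    inner = (
        (r - 1) * (r - 2) * delta**3
        + (d_in - 1) * delta
        + q * (q - 1) * delta * omega
        + q * (q + 1) * omega
        + d_out * omega
    )
    return numerator / ((r + q) * inner)


def theta_2(r, q, delta, omega):
    """Formula (2-2); note the leading minus sign."""
    cubic = (r - 1) * (r - 2) * delta**3
    first = cubic / (cubic + q * (q - 1) * omega + q * (r - 1) * delta * omega)
    second = ((r + 1) * delta + q) / ((r + q) * (r - 1 + q))
    return -first * second


def theta_3(r, d_in, d_out, delta, omega):
    """Formula (2-3)."""
    in_term = d_in * (d_in - 1) * delta
    first = in_term / (
        in_term + d_out * (d_out - 1) * omega + d_out * d_in * omega
    )
    second = (d_in + d_out) / (r + d_out)
    return first * second


# Hypothetical sample values, chosen only to keep the arithmetic simple.
r, b, d_in, d_out, delta, omega = 3, 5, 2, 1, 0.5, 0.25
q = q_value(b, d_in, r)
t1 = theta_1(r, q, d_in, d_out, delta, omega)
t2 = theta_2(r, q, delta, omega)
t3 = theta_3(r, d_in, d_out, delta, omega)
```

With these sample values, q = 1, and the three parameters come out as 0.25, -0.125 and 0.5 respectively, which can be verified by hand from the formulas.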

About this paper

Cite this paper

Deng, X., Nie, Z., Zhai, J. (2019). Probabilistic Graphical Model Based Highly Scalable Directed Community Detection Algorithm. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_28

DOI: https://doi.org/10.1007/978-3-030-26142-9_28
Published: 12 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science, Computer Science (R0)
