
A block-based generative model for attributed network embedding

Published in: World Wide Web

Abstract

Attributed network embedding has attracted considerable interest in recent years. It aims to learn task-independent, low-dimensional, and continuous vectors for nodes that preserve both topology and attribute information. Most existing methods, such as random-walk based methods and GCNs, mainly focus on local information, i.e., the attributes of the neighbours. Thus, they have been well studied for assortative networks (i.e., networks with communities) but ignore disassortative networks (i.e., networks with multipartite, hub, and hybrid structures), which are common in the real world. To model both assortative and disassortative networks, we propose a block-based generative model for attributed network embedding from a probability perspective. Specifically, the nodes are assigned to several blocks wherein the nodes in the same block share similar linkage patterns. These patterns can define assortative networks containing communities or disassortative networks with multipartite, hub, or any hybrid structures. To preserve the attribute information, we assume that each node has a hidden embedding related to its assigned block. We use a neural network to characterize the nonlinearity between node embeddings and node attributes. We perform extensive experiments on real-world and synthetic attributed networks. The results show that our proposed method consistently outperforms state-of-the-art embedding methods in both clustering and classification tasks, especially on disassortative networks.


Notes

  1. In this paper, cluster, group, and block are interchangeable.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant number 61876069; the Jilin Province Key Scientific and Technological Research and Development project under grant numbers 20180201067GX and 20180201044GX; the Jilin Province Natural Science Foundation under grant number 20200201036JC; the China Scholarship Council under grant numbers 201906170205 and 201906170208; and the Australian Research Council under grant number DP190101087.

Author information


Corresponding author

Correspondence to Bo Yang.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.


Appendix

In this appendix, we detail the derivation of the complete-data likelihood and the update rules for the parameters of our model.

Derivation of the complete-data likelihood

According to the generative process of attributed networks, the joint probability, i.e., the complete-data likelihood, is

$$ p(\pmb{X},\pmb{A},\pmb{Z},\pmb{c}|\pmb{\Pi},\pmb{\omega},\pmb{\sigma},\pmb{\mu})=p(\pmb{A}|\pmb{c},\pmb{\Pi})p(\pmb{X}|\pmb{Z})p(\pmb{Z}|\pmb{c},\pmb{\sigma},\pmb{\mu})p(\pmb{c}|\pmb{\omega}) $$
(13)

and each factor is defined as follows.

First, we know that the node assignments follow a multinomial distribution. The probability that node i belongs to block k is ωk, and the assignment of each node is independent. Thus, the probability of assigning all nodes, i.e., of obtaining the vector c = < c1, c2,...,cn >, is

$$ p(\pmb{c}|\pmb{\omega}) = \prod\limits_{i}\omega_{c_{i}}. $$
(14)

Then, given that node i belongs to block ci, its embedding follows a Gaussian distribution with mean \(\pmb {\mu }_{c_{i}}\) and standard deviation \(\pmb {\sigma }_{c_{i}}\). Thus, we have

$$ p(\pmb{Z}|\pmb{c},\pmb{\sigma},\pmb{\mu}) = \prod\limits_{id}\frac{1}{\sqrt{2\pi}\sigma_{c_{i}d}}e^{-\frac{(z_{id}-\mu_{c_{i}d})^{2}}{2\sigma_{c_{i}d}^{2}}}. $$
(15)

As for the probability of generating node attributes, if X ∈{0, 1}n×M, each attribute follows a Bernoulli distribution, i.e., the probability of node i having the m-th attribute is υim. Thus,

$$ p(\pmb{X}|\pmb{Z}) = \prod\limits_{im}\upsilon_{im}^{x_{im}}(1-\upsilon_{im})^{1-x_{im}}. $$
(16)

Similarly, if \(\pmb {X} \in \mathbb {R}^{n\times M}\), we can obtain

$$ p(\pmb{X}|\pmb{Z}) = \prod\limits_{im}\frac{1}{\sqrt{2\pi}\lambda_{im}}e^{-\frac{(x_{im}-\upsilon_{im})^{2}}{2\lambda_{im}^{2}}}. $$
(17)

Finally, the link between each pair of nodes follows a Bernoulli distribution, and the links are generated independently. Given the node assignments, the probability of node i connecting to node j is \(\pi _{c_{i}c_{j}}\). Thus, the probability of generating the links is

$$ p(\pmb{A}|\pmb{c},\pmb{\Pi}) = \prod\limits_{ij}\pi_{c_{i}c_{j}}^{a_{ij}}(1-\pi_{c_{i}c_{j}})^{1-a_{ij}}. $$
(18)
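The four generating steps (14)-(18) can be sketched end to end. The following is an illustrative simulation with made-up parameter values; in particular, the logistic map `Z @ W` is a hypothetical stand-in for the neural-network decoder that produces the attribute probabilities υ in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, D, M = 12, 3, 4, 6   # nodes, blocks, embedding dim, attributes

# Illustrative parameter values (not from the paper)
omega = np.array([0.5, 0.3, 0.2])          # block priors
Pi = rng.uniform(0.05, 0.9, size=(K, K))   # block-to-block link probabilities
mu = rng.normal(size=(K, D))               # per-block embedding means
sigma = np.full((K, D), 0.5)               # per-block embedding std devs
W = rng.normal(size=(D, M))                # stand-in for the decoder network

# (14) assign each node to a block: c_i ~ Multinomial(omega)
c = rng.choice(K, size=n, p=omega)
# (15) draw node embeddings: z_i ~ N(mu_{c_i}, sigma_{c_i}^2)
Z = mu[c] + sigma[c] * rng.normal(size=(n, D))
# (16) binary attributes: x_im ~ Bernoulli(upsilon_im), upsilon = decoder(Z)
upsilon = 1.0 / (1.0 + np.exp(-Z @ W))
X = (rng.random((n, M)) < upsilon).astype(int)
# (18) links: a_ij ~ Bernoulli(pi_{c_i c_j})
A = (rng.random((n, n)) < Pi[c][:, c]).astype(int)
```

Note that `Pi[c][:, c]` expands the K × K block matrix to the n × n matrix of pairwise link probabilities \(\pi_{c_i c_j}\).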

Using a network with binary attributes as an example, substituting (14)-(16) and (18) into (13) yields

$$ \begin{array}{@{}rcl@{}} &&p(\pmb{X},\pmb{A},\pmb{Z},\pmb{c}|\pmb{\Pi},\pmb{\omega},\pmb{\sigma},\pmb{\mu})\\ &&=\prod\limits_{ij}\pi_{c_{i}c_{j}}^{a_{ij}}(1-\pi_{c_{i}c_{j}})^{1-a_{ij}}\times\prod\limits_{im}\upsilon_{im}^{x_{im}}(1-\upsilon_{im})^{(1-x_{im})}\\ &&\quad\times\prod\limits_{id}\frac{1}{\sqrt{2\pi}\sigma_{c_{i}d}}e^{-\frac{(z_{id}-\mu_{c_{i}d})^{2}}{2\sigma_{c_{i}d}^{2}}}\times\prod\limits_{i}\omega_{c_{i}}. \end{array} $$
(19)
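For the binary-attribute case, the logarithm of (19) can be evaluated directly. A minimal sketch: `upsilon` is assumed to be the decoder's attribute probabilities for `Z`, and a small `eps` guards the logarithms.

```python
import numpy as np

def complete_data_log_likelihood(A, X, Z, c, Pi, omega, mu, sigma, upsilon):
    """Log of the complete-data likelihood (19): links + binary attributes
    + Gaussian embeddings + block assignments."""
    eps = 1e-12
    P = Pi[c][:, c]                                   # pi_{c_i c_j} for every pair
    ll_links = np.sum(A * np.log(P + eps) + (1 - A) * np.log(1 - P + eps))
    ll_attrs = np.sum(X * np.log(upsilon + eps)
                      + (1 - X) * np.log(1 - upsilon + eps))
    mu_c, sig_c = mu[c], sigma[c]                     # per-node Gaussian parameters
    ll_embed = np.sum(-0.5 * np.log(2 * np.pi) - np.log(sig_c + eps)
                      - (Z - mu_c) ** 2 / (2 * sig_c ** 2 + eps))
    ll_assign = np.sum(np.log(omega[c] + eps))        # log of prod_i omega_{c_i}
    return ll_links + ll_attrs + ll_embed + ll_assign
```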

Derivation of the parameter update rules

First, the terms related to τik in (7) are:

$$ \begin{array}{@{}rcl@{}} \mathcal{L}_{[\tau_{ik}]} &=& \sum\limits_{j}\sum\limits_{l}\tau_{ik}\tau_{jl}[a_{ij}\log\pi_{kl}+(1-a_{ij})\log(1-\pi_{kl})]\\ && -\frac{1}{2}\sum\limits_{d=1}^{D}\tau_{ik}\left(\log\sigma_{kd}^{2}+\frac{\hat{\sigma}_{id}^{2}}{\sigma_{kd}^{2}} +\frac{(\hat{\mu}_{id}-\mu_{kd})^{2}}{\sigma_{kd}^{2}}\right)\\ && + \tau_{ik}\log\frac{\omega_{k}}{\tau_{ik}}. \end{array} $$
(20)

Setting \(\frac {\partial {\mathscr{L}}_{[\tau _{ik}]}}{\partial \tau _{ik}}=0\), we can update τik by

$$ \begin{array}{@{}rcl@{}} \tau_{ik} \propto \exp&&\left(\sum\limits_{j}\sum\limits_{l}\tau_{jl}[a_{ij}\log\pi_{kl}+(1-a_{ij})\log(1-\pi_{kl})]\right.\\ && \left.\quad-\frac{1}{2}\sum\limits_{d=1}^{D}\left(\log\sigma_{kd}^{2}+\frac{\hat{\sigma}_{id}^{2}}{\sigma_{kd}^{2}} +\frac{(\hat{\mu}_{id}-\mu_{kd})^{2}}{\sigma_{kd}^{2}}\right) + \log\omega_{k}\right). \end{array} $$
(21)
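The update for τ above vectorizes naturally. A sketch under the assumption that `mu_hat` and `sigma_hat` hold the encoder's per-node posterior means and standard deviations \(\hat{\mu}_{id}\), \(\hat{\sigma}_{id}\); the quadratic term \((\hat{\mu}_{id}-\mu_{kd})^{2}/\sigma_{kd}^{2}\) is expanded so everything is a matrix product.

```python
import numpy as np

def update_tau(A, tau, Pi, omega, mu, sigma, mu_hat, sigma_hat):
    """One fixed-point pass of the variational update for tau (n x K).

    A: (n, n) adjacency; tau: (n, K) responsibilities; Pi: (K, K);
    omega: (K,); mu, sigma: (K, D); mu_hat, sigma_hat: (n, D)."""
    eps = 1e-12
    log_pi, log_1m_pi = np.log(Pi + eps), np.log(1 - Pi + eps)
    # Structural term: sum_j sum_l tau_jl [a_ij log pi_kl + (1-a_ij) log(1-pi_kl)]
    S = A @ tau @ log_pi.T + (1 - A) @ tau @ log_1m_pi.T          # (n, K)
    inv_var = 1.0 / (sigma ** 2 + eps)                            # (K, D)
    # sum_d (mu_hat_id - mu_kd)^2 / sigma_kd^2, expanded as a^2 - 2ab + b^2
    quad = ((mu_hat ** 2) @ inv_var.T
            - 2.0 * mu_hat @ (mu * inv_var).T
            + np.sum(mu ** 2 * inv_var, axis=1)[None, :])
    # Gaussian term: -1/2 sum_d (log sigma^2 + sigma_hat^2/sigma^2 + quad)
    G = -0.5 * (np.sum(np.log(sigma ** 2 + eps), axis=1)[None, :]
                + (sigma_hat ** 2) @ inv_var.T
                + quad)
    logits = S + G + np.log(omega + eps)[None, :]
    logits -= logits.max(axis=1, keepdims=True)   # stabilize before exponentiating
    t = np.exp(logits)
    return t / t.sum(axis=1, keepdims=True)       # normalize: rows sum to 1
```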

Then, we optimize πkl:

$$ \mathcal{L}_{[\pi_{kl}]} = \sum\limits_{ij}\tau_{ik}\tau_{jl}[a_{ij}\log\pi_{kl}+(1-a_{ij})\log(1-\pi_{kl})]. $$
(22)

Setting \(\frac {\partial {\mathscr{L}}_{[\pi _{kl}]}}{\partial \pi _{kl}}=0\), we obtain

$$ \pi_{kl} = \frac{{\sum}_{ij}\tau_{ik}\tau_{jl}a_{ij}}{{\sum}_{ij}\tau_{ik}\tau_{jl}}. $$
(23)

Next, the terms related to ωk are

$$ \mathcal{L}_{[\omega_{k}]}= \sum\limits_{i}\gamma_{ik}\log\omega_{k}. $$
(24)

Since \({\sum }_{k=1}^{K}\omega_{k}=1\), we take the derivative of \({\mathscr{L}}_{[\omega _{k}]} + \upbeta ({\sum }_{k}\omega _{k} - 1)\) with respect to ωk, where β is a Lagrange multiplier, and set it to zero. This gives the update formula for ωk:

$$ \omega_{k} = \frac{1}{n}\sum\limits_{i}\gamma_{ik}. $$
(25)

In the same way, we can obtain the terms related to μkd and σkd as follows:

$$ \mathcal{L}_{[\mu_{kd}]}=-\frac{1}{2}{\sum\limits_{i}^{n}}\gamma_{ik}\frac{(\hat{\mu}_{id}-\mu_{kd})^{2}}{\sigma_{kd}^{2}}, $$
(26)

and

$$ \mathcal{L}_{[\sigma_{kd}]}=-\frac{1}{2}{\sum\limits_{i}^{n}}\gamma_{ik}(\log\sigma_{kd}^{2}+\frac{\hat{\sigma}_{id}^{2}}{\sigma_{kd}^{2}}+\frac{(\hat{\mu}_{id}-\mu_{kd})^{2}}{\sigma_{kd}^{2}}). $$
(27)

We set \(\frac {\partial {\mathscr{L}}_{[\mu _{kd}]}}{\partial \mu _{kd}}=0\) and \(\frac {\partial {\mathscr{L}}_{[\sigma _{kd}]}}{\partial \sigma _{kd}}=0\), which yields the update rules for μkd and σkd:

$$ \mu_{kd}= \frac{{{\sum}_{i}^{n}}\tau_{ik}\hat{\mu}_{id}}{{{\sum}_{i}^{n}}\tau_{ik}}, $$
(28)

and

$$ \sigma_{kd}^{2}= \frac{{{\sum}_{i}^{n}}\tau_{ik}(\hat{\sigma}_{id}^{2}+(\hat{\mu}_{id}-\mu_{kd})^{2})}{{{\sum}_{i}^{n}}\tau_{ik}}, $$
(29)

respectively.
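All of these closed-form updates can be sketched as one M-step. A minimal numpy sketch, assuming the responsibilities γ and τ coincide (as the update rules above suggest) and that `mu_hat`, `sigma_hat` are the per-node posterior means and standard deviations.

```python
import numpy as np

def m_step(A, tau, mu_hat, sigma_hat):
    """Closed-form updates for Pi, omega, mu, and sigma^2 from the
    responsibilities tau (n x K) and the per-node posteriors."""
    eps = 1e-12
    n = tau.shape[0]
    Nk = tau.sum(axis=0)                               # effective block sizes
    # pi_kl = sum_ij tau_ik tau_jl a_ij / sum_ij tau_ik tau_jl
    Pi = (tau.T @ A @ tau) / (np.outer(Nk, Nk) + eps)
    # omega_k = (1/n) sum_i tau_ik
    omega = Nk / n
    # mu_kd = sum_i tau_ik mu_hat_id / sum_i tau_ik
    mu = (tau.T @ mu_hat) / (Nk[:, None] + eps)
    # sigma^2_kd = sum_i tau_ik (sigma_hat_id^2 + (mu_hat_id - mu_kd)^2) / sum_i tau_ik,
    # computed via E[x^2] - (E[x])^2 plus the posterior variances
    second = tau.T @ (sigma_hat ** 2 + mu_hat ** 2)
    sigma2 = np.maximum(second / (Nk[:, None] + eps) - mu ** 2, eps)
    return Pi, omega, mu, sigma2
```

Alternating this M-step with the τ update gives one round of the variational EM loop.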


About this article


Cite this article

Liu, X., Yang, B., Song, W. et al. A block-based generative model for attributed network embedding. World Wide Web 24, 1439–1464 (2021). https://doi.org/10.1007/s11280-021-00918-y

