Sampling informative context nodes for network embedding

  • Research Paper
Science China Information Sciences

Abstract

Several modern network embedding methods learn vector representations from sampled context nodes. Their sampling strategies are often carefully designed and controlled by specific parameters that let them adapt to different networks. However, a fundamental question remains: on a given network, why do some sampled contexts yield better vectors than others? We attempted to answer this question from an information-theoretic perspective. First, we defined the weighted entropy of the sampled context matrix, which measures the amount of information the matrix carries, and found that context matrices with higher weighted entropy generally produce better vectors. Second, we proposed maximum weighted entropy sampling methods that sample more informative context nodes and therefore produce more informative vectors. Extensive experiments on link prediction and node classification tasks confirm the effectiveness of the proposed methods.
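The abstract does not give the exact formula, so the following is a minimal sketch, assuming Guiaşu's classic weighted entropy, H_w = -Σ w·p·ln p, applied to a context count matrix normalized into a joint distribution over (node, context) pairs. The helper name `weighted_entropy`, the toy matrices, and the uniform weights are hypothetical illustrations, not the authors' definition or data; the paper's actual weighting scheme may differ.

```python
import math
from typing import Sequence

# A context count matrix: counts[i][j] = how often node j was sampled
# as a context node of node i.
Matrix = Sequence[Sequence[float]]


def weighted_entropy(counts: Matrix, weights: Matrix) -> float:
    """Guiasu-style weighted entropy of a context count matrix (hypothetical helper).

    Normalizes counts into p_ij = counts[i][j] / total and returns
    -sum_ij weights[i][j] * p_ij * ln(p_ij). Zero-count cells contribute 0,
    matching the limit p * ln(p) -> 0 as p -> 0.
    """
    if len(counts) != len(weights) or any(
        len(c_row) != len(w_row) for c_row, w_row in zip(counts, weights)
    ):
        raise ValueError("counts and weights must have the same shape")
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("context matrix must contain at least one sample")
    h = 0.0
    for c_row, w_row in zip(counts, weights):
        for c, w in zip(c_row, w_row):
            if c > 0:
                p = c / total
                h -= w * p * math.log(p)
    return h


if __name__ == "__main__":
    n = 3
    uniform = [[1.0] * n for _ in range(n)]  # hypothetical: all pairs weighted equally
    # Two toy context matrices for a 3-node network, each with 6 samples.
    concentrated = [[0, 6, 0], [0, 0, 0], [0, 0, 0]]  # one pair sampled 6 times
    spread = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # six distinct pairs, once each
    print(weighted_entropy(concentrated, uniform))  # 0.0
    print(weighted_entropy(spread, uniform))  # about 1.7918 (ln 6)
```

With uniform weights the quantity reduces to Shannon entropy, so the matrix whose six samples cover six distinct node-context pairs scores ln 6, while the matrix that repeats a single pair scores 0. Non-uniform weights would let some pairs count more than others.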

Acknowledgements

This work was supported in part by the Social Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 2018SJA0455), the National Natural Science Foundation of China (Grant No. 61472183), and the Social Science Foundation of Jiangsu Province (Grant No. 19TQD002).

Author information

Correspondence to Xin-Yu Dai.

About this article

Cite this article

Zhu, D., Dai, XY., Chen, J. et al. Sampling informative context nodes for network embedding. Sci. China Inf. Sci. 64, 212104 (2021). https://doi.org/10.1007/s11432-019-2635-8

  • DOI: https://doi.org/10.1007/s11432-019-2635-8
