World Wide Web

, Volume 22, Issue 6, pp 2799–2824 | Cite as

HSEM: highly scalable node embedding for link prediction in very large-scale social networks

  • Aakas Zhiyuli
  • Xun LiangEmail author
  • Yanfang Chen
Part of the following topical collections:
  1. Special Issue on Social Computing and Big Data Applications


Very large-scale social networks are typically sparse and dynamic and often have millions of nodes and billions of links. Link prediction in very large-scale networks is a challenging task for most existing methods. This paper investigates the link prediction problem in very large-scale networks. We propose a model, namely, a very large-scale network co-occurrence embedding algorithm (Hsem), which learns the co-occurrence features of node pairs to embed nodes into a vector with a lower and fixed dimension. A damping-based random walk algorithm is also developed to extract the hierarchical structural information of node pairs. Moreover, we design an incremental learning algorithm based on Hsem to meet the requirements of highly dynamic evolution in very large-scale networks, which significantly improves learning speed while maintaining accuracy. Finally, we present controlled experiments employing the proposed and baseline methods on eight challenging real-world large-scale datasets with a fixed dropout percentage and calculate the time overhead and receiver operating characteristics. The results show that Hsem performs comparably to current state-of-the-art methods and consistently outperforms the baselines in different experimental settings.


Link prediction Very large-scale social networks Link analysis Node embedding Co-occurrence learning 



This work is supported by the National Natural Science Foundation of China (Grant Nos. 71531012 and 71271211), the Natural Science Foundation of Beijing (Grant No. 4172032), the Outstanding Innovative Talents Cultivation Funded Programs 2018 of Renmin University of China, and the Opening Project of State Key Laboratory of Digital Publishing Technology of Founder Group.


  1. 1.
    Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Networks 25(3), 211–230 (2003) CrossRefGoogle Scholar
  2. 2.
    Balduini, M., Bozzon, A., Della Valle, E., Huang, Y., Houben, G.J.: Recommending venues using continuous predictive social media analytics. IEEE Internet Comput. 18(5), 28–35 (2014)CrossRefGoogle Scholar
  3. 3.
    Chen, Z., Chen, M., Weinberger, K.Q., Zhang, W.: Marginalized denoising for link prediction and multi-label learning. In: AAAI, pp. 1707–1713 (2015)Google Scholar
  4. 4.
    Dong, Y., Tang, J., Wu, S., Tian, J., Chawla, N.V., Rao, J., Cao, H.: Link prediction and recommendation across heterogeneous social networks. In: ICDM, pp. 181–190 (2012)Google Scholar
  5. 5.
    Ermis, B., Acar, E., Cemgil, A.T.: Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Min. Knowl. Disc. 29(1), 203–236 (2015)MathSciNetCrossRefGoogle Scholar
  6. 6.
    Fire, M., Tenenboim-Chekina, L., Puzis, R., Lesser, O., Rokach, L., Elovici, Y.: Computationally efficient link prediction in a variety of social networks. ACM Trans. Intell. Syst. Technol. 5(1), 10–25 (2013)CrossRefGoogle Scholar
  7. 7.
    Garcia-Gasulla, D., Cortés, U.: Link Prediction in Very Large Directed Graphs: Exploiting Hierarchical Properties in Parallel. KNOW@LOD (2014)Google Scholar
  8. 8.
    Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 855–864 (2016)Google Scholar
  9. 9.
    Hisano, R.: Semi-supervised graph embedding approach to dynamic link prediction arXiv preprint arXiv (2016)Google Scholar
  10. 10.
    Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: KDD, pp. 538–543 (2002)Google Scholar
  11. 11.
    Katz, L.: A new status index derived from sociometric analysis. Psychometrika (1953)Google Scholar
  12. 12.
    Konstas, I., Stathopoulos, V., Jose, J.M.: On social networks and collaborative recommendation. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 195–202 (2009)Google Scholar
  13. 13.
    Lefakis, L., Fleuret, F.: Reservoir Boosting : Between Online and Offline Ensemble Learning, pp. 1412–1420 (2013)Google Scholar
  14. 14.
    Li, J., Dani, H., Hu, X., Tang, J., Chang, Y., Liu, H.: Attributed network embedding for learning in a dynamic environment. arXiv:1706.01860 (2017)
  15. 15.
    Li, J., Xia, F., Wang, W., Chen, Z., Asabere, N. Y., Jiang, H.: ACRec: a co-authorship based random walk model for academic collaboration recommendation. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion (2014)Google Scholar
  16. 16.
    Liao, L., He, X., Zhang, H., Chua, T.: Attributed social network embedding. arXiv:1705.04969 (2017)
  17. 17.
    Liben-Nowell, D., Kleinberg, J.M.: The link-prediction problem for social networks. JASIST 58(7), 1019–1031 (2007)CrossRefGoogle Scholar
  18. 18.
    Lichtenwalter, R.N., Lussier, J.T., Chawla, N.V.: New Perspectives and Methods in Link Prediction. In: SIGKDD, pp. 243–252 (2010)Google Scholar
  19. 19.
    Lü, L., Jin, C. H.: Zhou, T.: Similarity index based on local paths for link prediction of complex networks. Phys. Rev. E 80(4), 046,122 (2009)CrossRefGoogle Scholar
  20. 20.
    Lü, L., Zhou, T.: Link prediction in complex networks: a survey. Phys. A Stat. Mech. Appl. 390(6), 1150–1170 (2011)CrossRefGoogle Scholar
  21. 21.
    Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. cs.CL, pp. 1–12 (2013)Google Scholar
  22. 22.
    Morin, F., Bengio, Y.: Hierarchical probabilistic neural network language model. In: Proceedings of the international workshop on artificial intelligence and statistics, pp. 246–252 (2005)Google Scholar
  23. 23.
    Nguyen, C.H., Mamitsuka, H.: Latent feature kernels for link prediction on sparse graphs. IEEE Trans. Neural Netw. Learn. Syst. 23(11), 1793–1804 (2012)CrossRefGoogle Scholar
  24. 24.
    Pan, J., Yang, H., Faloutsos, C., Duygulu, P.: Automatic multimedia cross-modal correlation discovery. In: SIGKDD, pp. 653–658 (2004)Google Scholar
  25. 25.
    Papadimitriou, A., Symeonidis, P., Manolopoulos, Y.: Fast and accurate link prediction in social networking systems. J. Syst. Softw. 85(9), 2119–2132 (2012)CrossRefGoogle Scholar
  26. 26.
    Raymond, R., Kashima, H.: Fast and Scalable Algorithms for Semi-Supervised Link Prediction on Static and Dynamic Graphs. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 131–147. Springer, Berlin (2010)CrossRefGoogle Scholar
  27. 27.
    Sarkar, P., Chakrabarti, D., Jordan, M.: Nonparametric link prediction in large scale dynamic networks. Electron J Stat 8(2), 2022–2065 (2014)MathSciNetCrossRefGoogle Scholar
  28. 28.
    Shin, K., Jung, J., Lee, S., Kang, U.: BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs. In: SIGMOD, pp. 1571–1585 (2015)Google Scholar
  29. 29.
    Symeonidis, P., Perentis, C.: Link prediction in multi-modal social networks. In: ECML/PKDD, vol. 8726, pp. 147–162 (2014)CrossRefGoogle Scholar
  30. 30.
    Tabourier, L., Libert, A.S., Lambiotte, R.: RankMerging: learning to rank in large-scale social networks. In: ECML-PKDD Workshop (2014)Google Scholar
  31. 31.
    Tang, J., Chang, S., Aggarwal, C.C., Liu, H.: Negative link prediction in social media. In: WSDM, pp. 87–96 (2015)Google Scholar
  32. 32.
    Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: Line: Large-Scale Information Network Embedding. In: Proceedings of the 24Th International Conference on World Wide Web, pp. 1067–1077 (2015)Google Scholar
  33. 33.
    Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1225–1234 (2016)Google Scholar
  34. 34.
    Wang, P., Xu, B., Wu, Y., Zhou, X.: Link prediction in social networks: the state-of-the-art. Sci. China Inf. Sci. 58(1), 1–38 (2015)Google Scholar
  35. 35.
    Wang, X., He, D., Chen, D., Xu, J.: Clustering-based collaborative filtering for link prediction. In: AAAI, pp. 332–338 (2015)Google Scholar
  36. 36.
    Wu, Z., Li, Y.: Link prediction based on multi-steps resource allocation. In: WI-IAT, pp. 355–360 (2014)Google Scholar
  37. 37.
    Xia, F., Chen, Z., Wang, W., Li, J.: MVCWalker: Random walk-based most valuable collaborators recommendation exploiting academic factors. IEEE Trans. Emerg. Topics Comput. 2(3), 364–375 (2014)CrossRefGoogle Scholar
  38. 38.
    Zhang, J., Fang, Z., Chen, W., Tang, J.: Diffusion of following links in microblogging networks. IEEE Trans. Knowl. Data Eng. 27(8), 2093–2106 (2015)CrossRefGoogle Scholar
  39. 39.
    Zhou, T., Lü, L., Zhang, Y.C.: Predicting missing links via local information. Europ. Phys. J. B. 71(4), 623–630 (2009)CrossRefGoogle Scholar
  40. 40.
    Zhou, X., Liang, X., Zhang, H., Ma, Y.: Cross-platform identification of anonymous identical users in multiple social media networks. IEEE Trans. Knowl. Data Eng. 28(2), 411–424 (2016)CrossRefGoogle Scholar
  41. 41.
    Zhu, L., Guo, D., Yin, J., Steeg, G.V., Galstyan, A.: Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Trans. Knowl. Data Eng. 28(10), 2765–2777 (2016)CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of InformationRenmin University of ChinaBeijingChina
  2. 2.School of Information Resource ManagementRenmin University of ChinaBeijingChina

Personalised recommendations