Supervised Link Prediction Using Random Walks

  • Yuechang Liu
  • Hanghang Tong
  • Lei Xie
  • Yong Tang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 568)


Networks have become an increasingly popular way to represent big data over the last few years. As a result, network-based analysis techniques are now applied to networks containing millions of nodes. Link prediction, which uncovers missing or unknown links between nodes, is an essential task in network analysis.

Random walk based methods have shown outstanding performance in this task. However, the primary bottleneck for such methods is adapting to networks with different structures and dynamics, and scaling to the size of the network. Inspired by Random Walk with Restart (RWR), a promising approach for link prediction, this paper proposes a set of path-based features and a supervised learning technique, called Supervised Random Walk with Restart (SRWR), to identify missing links. We show that, using these features, a classifier can successfully order the potential links by their closeness to the query node. A new type of heterogeneous network, called Generalized Bi-relation Network (GBN), is defined in this paper, upon which the novel structural features are introduced. Finally, experiments are performed on a disease-chemical-gene interaction network. The results show that SRWR significantly outperforms the standard RWR algorithm in terms of Area Under the ROC Curve (AUC) and performs as well as or better than the best algorithms in the field of gene prioritization.
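For readers unfamiliar with the baseline, the following is a minimal sketch of standard RWR used to rank candidate nodes for a query node. It is not the SRWR method or the path-based features proposed in the paper. The function names, the restart probability of 0.15, and the toy graph are assumptions made for illustration only.

```python
def random_walk_with_restart(adj, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Return the RWR proximity of every node to `query`.

    `adj` is a square adjacency matrix given as a list of lists.
    The restart probability 0.15 is an illustrative default, not a value
    taken from the paper.
    """
    n = len(adj)
    # Column sums for column-normalising the adjacency matrix;
    # isolated nodes get a divisor of 1.0 to avoid division by zero.
    col_sums = [sum(adj[i][j] for i in range(n)) or 1.0 for j in range(n)]
    r = [0.0] * n
    r[query] = 1.0
    for _ in range(max_iter):
        # r <- (1 - c) * W r + c * e, where e is the indicator of the query node.
        r_next = [
            (1 - restart) * sum(adj[i][j] / col_sums[j] * r[j] for j in range(n))
            + (restart if i == query else 0.0)
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


def rank_candidates(adj, query, **kwargs):
    """Order nodes not yet linked to `query` by decreasing RWR score."""
    scores = random_walk_with_restart(adj, query, **kwargs)
    candidates = [v for v in range(len(adj)) if v != query and adj[query][v] == 0]
    return sorted(candidates, key=lambda v: -scores[v])


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ranking = rank_candidates(toy_adj, query=0)
```

On the toy path graph, node 2 is ranked ahead of node 3 for query node 0 because it is closer to the query. A supervised approach such as the one described in the abstract would replace this single score with learned features.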


Keywords: Support Vector Machine, Heterogeneous Network, Link Prediction, Candidate Node, Query Node



This material is supported by the National Institutes of Health under grant number R01LM011986. The content of the information in this document does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This work is also supported in part by the National High-Technology Research and Development Program (863 Program) of China under Grant 2013AA01A212, National Science Foundation Grants 61272067 and 61370229, and a Jiaying University Grant (“Collaboration Mechanism and Application in Social Networks”).



Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  • Yuechang Liu (1, 3)
  • Hanghang Tong (2)
  • Lei Xie (4)
  • Yong Tang (1)

  1. South China Normal University, Guangzhou, China
  2. School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, USA
  3. School of Computer Science, Jiaying University, Meizhou, China
  4. Computer Science, Hunter College, The City University of New York, New York, USA
