
Link Prediction

  • Raf Guns
Chapter

Abstract

Social and information networks evolve according to certain regularities. Hence, given a network structure, some potential links are more likely to occur than others. This leads to the question of link prediction: how can one predict which links will occur in a future snapshot of the network and/or which links are missing from an incomplete network?

This chapter provides a practical overview of link prediction. We first outline the link prediction process and discuss its importance for applications such as recommendation and anomaly detection, as well as its relevance to theoretical questions. We then discuss the steps involved in performing link prediction, including preprocessing, predictor choice, and evaluation. These steps are illustrated with a small-scale case study of researcher collaboration, using the freely available linkpred tool.
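The three steps named above (preprocessing, predictor choice, evaluation) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the chapter's case study and not the linkpred tool's API: the toy collaboration data, the function names, the choice of a common-neighbors predictor, and the use of precision at k as the evaluation measure are all hypothetical choices made for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Preprocessing: build an undirected adjacency map, dropping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop carries no information about collaboration
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj):
    """Predictor: score each unlinked node pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked in the training network
        score = len(adj[u] & adj[v])
        if score > 0:
            scores[(u, v)] = score
    return scores


def precision_at_k(scores, future_edges, k):
    """Evaluation: fraction of the k top-scored pairs that occur in the later snapshot."""
    truth = {tuple(sorted(edge)) for edge in future_edges}
    # Sort by descending score; break ties alphabetically so the ranking is deterministic.
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]
    hits = sum(1 for pair in ranked if pair in truth)
    return hits / k if k else 0.0


# Hypothetical toy data: an earlier snapshot (training) and links that appear later (test).
training_edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("bob", "dan"), ("cat", "dan"), ("dan", "eve"),
]
test_edges = [("ann", "dan"), ("cat", "eve")]

adj = build_adjacency(training_edges)
scores = common_neighbors(adj)
precision = precision_at_k(scores, test_edges, k=2)
print(scores)
print(precision)
```

On this toy network the pair ann and dan shares two neighbors and is ranked first; it does appear in the later snapshot. Swapping in another predictor only requires replacing `common_neighbors` with a function that returns the same kind of score dictionary.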

Keywords

Networks; Network analysis; Link prediction


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Institute for Education and Information Sciences, IBW, University of Antwerp, Antwerpen, Belgium
