Social and information networks evolve according to certain regularities. Hence, given a network structure, some potential links are more likely to occur than others. This leads to the question of link prediction: how can one predict which links will occur in a future snapshot of the network and/or which links are missing from an incomplete network?
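To make the question concrete, the sketch below scores every unlinked pair of nodes by its number of common neighbours, one of the simplest structural predictors. Pairs with higher scores are treated as more likely to become linked, or more likely to be missing links. The toy edge list and the function name `common_neighbours_scores` are hypothetical illustrations. This is not the API of the linkpred tool.

```python
from itertools import combinations

def common_neighbours_scores(edges):
    """Score each unlinked node pair by its number of shared neighbours (hypothetical helper)."""
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    # Only pairs that are not already linked are candidate predictions.
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared > 0:
                scores[(u, v)] = shared
    # Highest score first; ties are broken by the pair itself for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical co-authorship edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(common_neighbours_scores(edges))
# [(('A', 'D'), 2), (('B', 'E'), 1), (('C', 'E'), 1)]
```

A and D share two collaborators (B and C), so they rank first as a likely future or missing link.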
This chapter provides a practical overview of link prediction. We first outline the link prediction process and discuss its importance for applications such as recommendation and anomaly detection, as well as its theoretical significance. We then discuss the steps involved in link prediction, including preprocessing, predictor choice, and evaluation. These steps are illustrated with a small-scale case study of research collaboration, using the freely available linkpred tool.
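The preprocessing, prediction, and evaluation steps can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not linkpred's implementation. It uses two hypothetical snapshots (`training` and `test`) of a collaboration network. Preprocessing is assumed to keep only test links that are new and whose endpoints already occur in the training network. The Jaccard coefficient serves as the predictor and precision among the top-k predictions as the evaluation measure. All function names are hypothetical, and other choices at each step are equally possible.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency map from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def preprocess_test(train_adj, test_edges):
    """Assumed preprocessing: keep only new links between nodes already in the training network."""
    truth = set()
    for u, v in test_edges:
        if u in train_adj and v in train_adj and v not in train_adj[u]:
            truth.add(tuple(sorted((u, v))))
    return truth

def jaccard_ranking(adj):
    """Rank unlinked pairs by |N(u) & N(v)| / |N(u) | N(v)|, highest first."""
    ranked = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, so not a prediction
        shared = adj[u] & adj[v]
        if shared:
            ranked.append(((u, v), len(shared) / len(adj[u] | adj[v])))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

def precision_at_k(ranked, truth, k):
    """Fraction of the top-k predicted pairs that are true new links."""
    top = [pair for pair, _ in ranked[:k]]
    return sum(pair in truth for pair in top) / len(top) if top else 0.0

# Hypothetical snapshots of a small collaboration network.
training = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
test = [("A", "D"), ("C", "E"), ("E", "F"), ("A", "B")]

train_adj = build_adjacency(training)
truth = preprocess_test(train_adj, test)   # {("A", "D"), ("C", "E")}
ranked = jaccard_ranking(train_adj)
print(precision_at_k(ranked, truth, 2))    # 0.5
```

Here ("E", "F") is discarded because F is absent from the training network, and ("A", "B") because it is not a new link. Swapping in a different predictor only requires replacing `jaccard_ranking`.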
Keywords: Networks, Network analysis, Link prediction