Abstract
Social and information networks evolve according to certain regularities. Hence, given a network structure, some potential links are more likely to occur than others. This leads to the question of link prediction: how can one predict which links will occur in a future snapshot of the network and/or which links are missing from an incomplete network?
This chapter provides a practical overview of link prediction. We outline the link prediction process and discuss its importance to applications such as recommendation and anomaly detection, as well as its significance to theoretical issues. We then discuss the steps involved in carrying out link prediction, including preprocessing, predictor choice, and evaluation. These steps are illustrated with a small-scale case study of researcher collaboration, using the freely available linkpred tool.
Notes
- 1. See, e.g., http://en.wikipedia.org/wiki/JSON and http://en.wikipedia.org/wiki/YAML.
Appendix: Usage as a Python Module
Linkpred can be used both as a standalone tool and as a Python module. Here we provide basic instructions for usage as a Python module.
Once linkpred is properly installed, we can open a Python console and load the module from within Python.
> python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import linkpred
The linkpred module is now loaded and can be used. First, let us open a network:
>>> G = linkpred.read_network("inf1990-2004.net")
11:49:00 - INFO - Reading file 'inf1990-2004.net'…
11:49:00 - INFO - Successfully read file.
We can now explore some properties of the network (stored in variable G), such as its number of nodes or links (see the NetworkX documentation at http://networkx.github.io/ for further information):
  >>> len(G) # number of nodes
  632
  >>> G.size() # number of links
  994
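As a quick sanity check on these two counts, the mean degree and density they imply can be worked out directly. The sketch below is only illustrative: it assumes the network is undirected with no self-loops or parallel links (the session above does not confirm this), and it is written for Python 3 rather than the Python 2 console shown here.

```python
# Counts taken from the console session above.
n_nodes = 632
n_links = 994

# In an undirected network every link contributes to the degree of two nodes.
mean_degree = 2.0 * n_links / n_nodes

# Density: observed links divided by the number of possible node pairs.
# Assumes a simple undirected graph (no self-loops, no parallel links).
max_links = n_nodes * (n_nodes - 1) / 2.0
density = n_links / max_links

print("mean degree: %.3f" % mean_degree)
print("density: %.5f" % density)
```

Under these assumptions the network is very sparse, which means the vast majority of node pairs are candidates for a new link.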
Predictors can be found in the linkpred.predictors submodule. Let us create a SimRank predictor for our network as an example. By setting only_new to True, we make sure that we only predict new links (i.e., links that are not present in the current network).
  >>> simrank = linkpred.predictors.SimRank(G, only_new=True)
The line above only sets up the predictor; it does not actually apply it to the network. To do that, we invoke the predict method. Predictor parameters can be set here; we will set c to 0.5.
  >>> simrank_results = simrank.predict(c=0.5)
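To make the role of c and of only_new more concrete, here is a minimal, self-contained sketch of the basic SimRank recursion, in which two nodes are similar to the extent that their neighbours are similar and c acts as a decay factor. This is not linkpred's implementation: the function names simrank and predict_new_links, the iterations parameter, and the toy network are all hypothetical, and the code targets Python 3 rather than the Python 2 console used in this appendix.

```python
from itertools import combinations


def simrank(adj, c=0.5, iterations=10):
    """Approximate SimRank scores on an undirected graph.

    adj maps each node to the set of its neighbours.
    Returns a dict keyed by (u, v) for every ordered pair of nodes.
    """
    nodes = sorted(adj)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new_sim = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[(u, v)] = 1.0
                elif not adj[u] or not adj[v]:
                    new_sim[(u, v)] = 0.0
                else:
                    # Average similarity over all neighbour pairs, decayed by c.
                    total = sum(sim[(a, b)] for a in adj[u] for b in adj[v])
                    new_sim[(u, v)] = c * total / (len(adj[u]) * len(adj[v]))
        sim = new_sim
    return sim


def predict_new_links(adj, c=0.5, iterations=10):
    """Rank unlinked node pairs by SimRank score (mimics only_new=True)."""
    sim = simrank(adj, c, iterations)
    scores = {
        (u, v): sim[(u, v)]
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u] and sim[(u, v)] > 0
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy network: a triangle A-B-C with D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
predictions = predict_new_links(toy, c=0.5)
for pair, score in predictions:
    print(pair, round(score, 4))
```

In this toy network only the pairs A-D and B-D are unlinked, so they are the only candidates returned. A lower c discounts similarity that is propagated through more distant neighbours more strongly.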
Finally, we take a look at the top five predictions and their scores.
  >>> top = simrank_results.top(5)
  >>> for authors, score in top.items():
  ...     print authors, score
  ...
  Tomizawa, H - Fujigaki, Y 0.188686630053
  Shirabe, M - Hayashi, T 0.143866427916
  Garfield, E - Fuseler, EA 0.148097050146
  Persson, O - Larsen, IM 0.138516589957
  Vanleeuwen, TN - Noyons, ECM 0.185040358711
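Note that the five pairs in the session above are not printed in descending order of score. If a ranked list is needed, a mapping from pairs to scores can be sorted explicitly. The following is a small sketch using only the Python 3 standard library: the helper top_k is hypothetical and not part of linkpred, and the scores are copied from the output above.

```python
import heapq

# Scores copied from the console session above.
scores = {
    "Tomizawa, H - Fujigaki, Y": 0.188686630053,
    "Shirabe, M - Hayashi, T": 0.143866427916,
    "Garfield, E - Fuseler, EA": 0.148097050146,
    "Persson, O - Larsen, IM": 0.138516589957,
    "Vanleeuwen, TN - Noyons, ECM": 0.185040358711,
}


def top_k(scores, k):
    """Return the k highest-scoring (pair, score) items, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


ranked = top_k(scores, 3)
for pair, score in ranked:
    print(pair, round(score, 4))
```

heapq.nlargest avoids sorting the full set of predictions, which can matter when the number of candidate pairs is large.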
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Guns, R. (2014). Link Prediction. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_2
DOI: https://doi.org/10.1007/978-3-319-10377-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10376-1
Online ISBN: 978-3-319-10377-8
eBook Packages: Computer Science, Computer Science (R0)