Link Prediction

Measuring Scholarly Impact

Abstract

Social and information networks evolve according to certain regularities. Hence, given a network structure, some potential links are more likely to occur than others. This leads to the question of link prediction: how can one predict which links will occur in a future snapshot of the network and/or which links are missing from an incomplete network?

This chapter provides a practical overview of link prediction. We outline the general link prediction process and discuss its importance to applications such as recommendation and anomaly detection, as well as its significance to theoretical issues. We then discuss the steps involved in performing link prediction, including preprocessing, predictor choice, and evaluation. These steps are illustrated with a small-scale case study of researcher collaboration, using the freely available linkpred tool.


Notes

  1. See, e.g., http://en.wikipedia.org/wiki/JSON and http://en.wikipedia.org/wiki/YAML.


Author information


Correspondence to Raf Guns.


Appendix: Usage as a Python Module

Linkpred can be used both as a standalone tool and as a Python module. Here we provide basic instructions for usage as a Python module.

Once linkpred is properly installed, we can open a Python console and load the module from within Python.

  > python
  Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import linkpred

The linkpred module is now loaded and can be used. First, let us open a network:

  >>> G = linkpred.read_network("inf1990-2004.net")
  11:49:00 - INFO - Reading file 'inf1990-2004.net'…
  11:49:00 - INFO - Successfully read file.

We can now explore some properties of the network (stored in variable G), such as its number of nodes or links (see the NetworkX documentation at http://networkx.github.io/ for further information):

  >>> len(G)  # number of nodes
  632
  >>> G.size()  # number of links
  994
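To make the node and link counts concrete without installing anything, here is a minimal sketch that parses a tiny network and counts both. It assumes the .net extension denotes a Pajek file with a `*Vertices` section followed by an `*Edges` section; the `parse_pajek` helper and the sample data are hypothetical and are not part of linkpred or NetworkX.

```python
import shlex


def parse_pajek(text):
    """Parse a minimal undirected Pajek network into an adjacency dict.

    Hypothetical helper for illustration only; real Pajek files can
    contain further sections and attributes that are ignored here.
    """
    labels = {}   # vertex id (as string) -> label
    adj = {}      # label -> set of neighbouring labels
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers look like "*Vertices 3" or "*Edges".
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps quoted labels together
        if section == "*vertices":
            labels[parts[0]] = parts[1]
            adj.setdefault(parts[1], set())
        elif section in ("*edges", "*arcs"):
            # Treat every link as undirected; ignore the optional weight.
            u, v = labels[parts[0]], labels[parts[1]]
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Made-up sample network with three authors and two links.
sample = '''*Vertices 3
1 "Author A"
2 "Author B"
3 "Author C"
*Edges
1 2 1.0
2 3 2.0
'''

adj = parse_pajek(sample)
n_nodes = len(adj)                                       # analogous to len(G)
n_links = sum(len(nbrs) for nbrs in adj.values()) // 2   # analogous to G.size()
print("%d nodes, %d links" % (n_nodes, n_links))
```

Each undirected link appears in the neighbour sets of both endpoints, which is why the sum of the set sizes is halved.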

Predictors can be found in the linkpred.predictors submodule. Let us create a SimRank predictor for our network as an example. By setting only_new to True, we make sure that we only predict new links (i.e., links that are not present in the current network).

  >>> simrank = linkpred.predictors.SimRank(G, only_new=True)

The line above only sets up the predictor; it does not actually apply it to the network. To do that, we invoke the predict method. Predictor parameters can be set here; we will set c to 0.5.

  >>> simrank_results = simrank.predict(c=0.5)
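For readers who want to see what such a predictor computes, the following is a minimal sketch of the basic SimRank iteration on a toy undirected graph, followed by a filter that keeps only pairs that are not yet linked. It assumes that the c parameter corresponds to SimRank's decay constant and that only_new amounts to discarding existing links; the function names, the toy data, and the fixed iteration count are hypothetical, and this is not linkpred's implementation.

```python
def simrank(adj, c=0.5, iterations=20):
    """Basic SimRank for an undirected graph given as {node: set(neighbours)}.

    The similarity of two distinct nodes is c times the average similarity
    of their neighbour pairs; every node has similarity 1 with itself.
    """
    nodes = sorted(adj)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                elif adj[a] and adj[b]:
                    total = sum(sim[(i, j)] for i in adj[a] for j in adj[b])
                    updated[(a, b)] = c * total / (len(adj[a]) * len(adj[b]))
                else:
                    # A node without neighbours is similar only to itself.
                    updated[(a, b)] = 0.0
        sim = updated
    return sim


def new_link_scores(adj, sim):
    """Keep one entry per unlinked pair with a positive score."""
    return {
        (a, b): score
        for (a, b), score in sim.items()
        if a < b and b not in adj[a] and score > 0
    }


# Toy network: a triangle A-B-C with an extra node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

sim = simrank(adj, c=0.5)
predictions = new_link_scores(adj, sim)
for (a, b), score in sorted(predictions.items()):
    print("%s - %s %.4f" % (a, b, score))
```

In this toy network the only unlinked pairs are A - D and B - D, so they are the only candidates that survive the filter. A lower c makes similarity decay faster with distance in the network.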

Finally, we take a look at the top five predictions and their scores.

  >>> top = simrank_results.top(5)
  >>> for authors, score in top.items():
  ...     print authors, score
  ...
  Tomizawa, H - Fujigaki, Y 0.188686630053
  Shirabe, M - Hayashi, T 0.143866427916
  Garfield, E - Fuseler, EA 0.148097050146
  Persson, O - Larsen, IM 0.138516589957
  Vanleeuwen, TN - Noyons, ECM 0.185040358711
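Note that the five predictions are not printed in descending order of score. If a ranked listing is wanted, the pairs can be sorted explicitly. The sketch below does this with a plain dictionary that holds the five scores shown above; the `top_k` helper is hypothetical and independent of linkpred's own top method.

```python
import heapq


def top_k(scores, k):
    """Return the k highest-scoring (pair, score) items, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# The five predictions from the listing above, as a plain dictionary.
scores = {
    "Tomizawa, H - Fujigaki, Y": 0.188686630053,
    "Shirabe, M - Hayashi, T": 0.143866427916,
    "Garfield, E - Fuseler, EA": 0.148097050146,
    "Persson, O - Larsen, IM": 0.138516589957,
    "Vanleeuwen, TN - Noyons, ECM": 0.185040358711,
}

ranked = top_k(scores, 5)
for authors, score in ranked:
    print("%s %s" % (authors, score))
```

Sorting at display time keeps the ranking independent of whatever iteration order the underlying container happens to use.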

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Guns, R. (2014). Link Prediction. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_2

  • DOI: https://doi.org/10.1007/978-3-319-10377-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10376-1

  • Online ISBN: 978-3-319-10377-8

  • eBook Packages: Computer Science (R0)
