Volume 119, Issue 2, pp 687–706

Formational bounds of link prediction in collaboration networks

  • Jinseok Kim
  • Jana Diesner


Link prediction in collaboration networks is often solved by identifying structural properties of pairs of existing nodes that are disconnected at one point in time and share a link later on. The maximum possible recall, or upper bound, of this approach is capped by the proportion of links that are formed among existing nodes embedded in these properties. Consequently, sustained links as well as links that involve one or two new network participants are typically not predicted. The purpose of this study is to highlight formational constraints that need to be considered to increase the practical value of link prediction methods targeting collaboration networks. In this study, we identify the distribution of basic link formation types in four large-scale, over-time collaboration networks, showing that, roughly speaking, 25% of links represent continued collaborations, 25% are new collaborations between existing authors, and 50% are formed between an existing author and a new network member. This implies that for collaboration networks, increasing the accuracy of computational link prediction solutions may not be a reasonable goal when the ratio of collaboration links that are eligible for the classic link prediction process is low.
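The link formation types and the recall ceiling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `classify_links` and `recall_upper_bound`, the category labels, and the toy edge lists are hypothetical. It also assumes that an "existing" author is anyone who appears in at least one link of the earlier period, and that the ceiling is the share of all later-period links that are new links between two existing authors.

```python
from collections import Counter


def classify_links(earlier_edges, later_edges):
    """Count later-period links by formation type (labels are illustrative)."""
    earlier = {frozenset(e) for e in earlier_edges}
    later = {frozenset(e) for e in later_edges}
    # Assumption: a node "exists" if it appears in any earlier-period link.
    existing_nodes = set().union(*earlier) if earlier else set()

    counts = Counter()
    for link in later:
        n_new = sum(1 for node in link if node not in existing_nodes)
        if link in earlier:
            counts["continued"] += 1             # sustained collaboration
        elif n_new == 0:
            counts["new_between_existing"] += 1  # eligible for classic prediction
        elif n_new == 1:
            counts["existing_with_new"] += 1     # one new participant
        else:
            counts["both_new"] += 1              # two new participants
    return counts


def recall_upper_bound(counts):
    """Share of later-period links a classic predictor could recover at best."""
    total = sum(counts.values())
    return counts["new_between_existing"] / total if total else 0.0


# Toy data, invented for illustration only.
earlier = [("a", "b"), ("b", "c"), ("c", "d")]
later = [("a", "b"), ("a", "c"), ("a", "e"), ("e", "f")]

counts = classify_links(earlier, later)
bound = recall_upper_bound(counts)
```

In this toy case each of the four later links falls into a different type, so only one link in four is reachable by a predictor that scores disconnected pairs of existing nodes, however accurate its scoring function is.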


Collaboration network · Link prediction · Network evolution · Link formation primitives · Preferential attachment



This work is supported, in part, by the Korea Institute of Science and Technology Information (KISTI). We would like to thank Vetle Torvik (University of Illinois at Urbana-Champaign), the American Physical Society, DBLP, and KISTI for providing datasets. We are also grateful to Mark E. J. Newman (University of Michigan) for providing code for disambiguating author names in APS data and to Raf Guns (University of Antwerp) for comments on link prediction processes in LinkPred.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  1. Institute for Research on Innovation and Science, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, USA
  2. School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, USA
