Scientometrics, Volume 101, Issue 2, pp 1461–1473

Recommending research collaborations using link prediction and random forest classifiers

  • Raf Guns
  • Ronald Rousseau

Abstract

We introduce a method to predict or recommend high-potential future (i.e., not yet realized) collaborations. The proposed method combines link prediction with machine learning techniques. First, a weighted co-authorship network is constructed. We then calculate scores for each node pair according to different measures called predictors. The resulting scores can be interpreted as indicating the likelihood of future linkage for the given node pair. To determine the relative merit of each predictor, we train a random forest classifier on older data. The same classifier can then generate predictions for newer data. The top predictions are treated as recommendations for future collaboration. We apply the technique to research collaborations between cities in Africa, the Middle East and South Asia, focusing on the topics of malaria and tuberculosis. Results show that the method yields accurate recommendations. Moreover, the method can be used to determine the relative strength of each predictor.
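
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four predictors (common neighbours, Jaccard, Adamic/Adar, and a weighted common-neighbours sum), the city labels A to F, the edge weights, and the helper names are all assumptions chosen for illustration, and scikit-learn's RandomForestClassifier stands in for whatever random forest configuration the paper actually used.

```python
from itertools import combinations
from math import log

from sklearn.ensemble import RandomForestClassifier

# Illustrative predictor set (an assumption, not necessarily the paper's).
PREDICTORS = ["common_neighbours", "jaccard", "adamic_adar", "weighted_common"]


def build_graph(edges):
    """Weighted undirected graph as a dict of dicts: graph[u][v] = weight."""
    graph = {}
    for u, v, weight in edges:
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def unlinked_pairs(graph):
    """All node pairs that are not yet connected (the candidate links)."""
    return [(u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]]


def scores(graph, u, v):
    """One score per predictor for the node pair (u, v)."""
    common = set(graph[u]) & set(graph[v])
    union = set(graph[u]) | set(graph[v])
    return [
        len(common),
        len(common) / len(union) if union else 0.0,
        # A common neighbour has degree >= 2, so log(degree) > 0.
        sum(1.0 / log(len(graph[z])) for z in common),
        sum(graph[u][z] + graph[v][z] for z in common),
    ]


def recommend(older, newer, top_n=3, seed=0):
    """Train on the older network, then rank unlinked pairs of the newer one."""
    old_graph = build_graph(older)
    later_links = {frozenset((u, v)) for u, v, _ in newer}
    train_pairs = unlinked_pairs(old_graph)
    X = [scores(old_graph, u, v) for u, v in train_pairs]
    # Label 1 if the pair, unlinked in the older data, is linked in the newer.
    y = [int(frozenset(pair) in later_links) for pair in train_pairs]

    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)

    new_graph = build_graph(newer)
    candidates = unlinked_pairs(new_graph)
    X_new = [scores(new_graph, u, v) for u, v in candidates]
    # Column of the positive class; requires both classes in the training labels.
    positive = list(forest.classes_).index(1)
    likelihood = forest.predict_proba(X_new)[:, positive]

    ranked = sorted(zip(candidates, likelihood), key=lambda item: -item[1])
    importances = dict(zip(PREDICTORS, forest.feature_importances_))
    return ranked[:top_n], importances


# Invented toy data: (city, city, number of co-authored papers).
period_1 = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2),
            ("C", "D", 1), ("D", "E", 2), ("E", "F", 1)]
period_2 = period_1 + [("A", "D", 1), ("B", "D", 1), ("C", "E", 2)]

recommendations, importances = recommend(period_1, period_2)
for (u, v), p in recommendations:
    print(f"{u}-{v}: {p:.2f}")
print(importances)
```

In this sketch the forest's feature importances play the role of the "relative strength of each predictor" mentioned in the abstract. On a network this small the ranking is not meaningful; the example only shows how the training and prediction periods fit together.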

Keywords

Collaboration; Networks; Link prediction; Machine learning; Random forest classifiers; Recommendation; Facilitator cities

Supplementary material

Supplementary material 1: 11192_2013_1228_MOESM1_ESM.doc (DOC, 3328 kB)

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2014

Authors and Affiliations

  1. Institute for Education and Information Sciences, IBW, University of Antwerp, Antwerp, Belgium
  2. KU Leuven, Leuven, Belgium
