Recommending research collaborations using link prediction and random forest classifiers

Scientometrics

Abstract

We introduce a method to predict or recommend high-potential future (i.e., not yet realized) collaborations. The proposed method is based on a combination of link prediction and machine learning techniques. First, a weighted co-authorship network is constructed. We then calculate a score for each node pair according to several measures, called predictors. Each score can be interpreted as indicating the likelihood of future linkage for the given node pair. To determine the relative merit of each predictor, we train a random forest classifier on older data. The same classifier can then generate predictions for newer data. The top predictions are treated as recommendations for future collaboration. We apply the technique to research collaborations between cities in Africa, the Middle East and South Asia, focusing on the topics of malaria and tuberculosis. Results show that the method yields accurate recommendations. Moreover, the method can be used to determine the relative strengths of each predictor.
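
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy network, the node labels `c1` to `c7`, the helper names, and the four predictors (weighted common neighbours, Jaccard, Adamic/Adar, preferential attachment) are all chosen here for illustration, and scikit-learn's `RandomForestClassifier` is assumed as the classifier.

```python
import math
from itertools import combinations

from sklearn.ensemble import RandomForestClassifier


def build_graph(edges):
    """Turn {(u, v): weight} into a symmetric weighted adjacency dict."""
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def unlinked_pairs(graph):
    """All node pairs (u < v) that are not yet connected."""
    return [(u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]]


def predictor_scores(graph, u, v):
    """Illustrative predictors; not necessarily those used in the paper."""
    common = set(graph[u]) & set(graph[v])
    union = set(graph[u]) | set(graph[v])
    weighted_cn = sum(graph[u][z] + graph[v][z] for z in common)
    jaccard = len(common) / len(union)
    # A common neighbour has degree >= 2, so the logarithm is positive.
    adamic_adar = sum(1.0 / math.log(len(graph[z])) for z in common)
    pref_attach = sum(graph[u].values()) * sum(graph[v].values())
    return [weighted_cn, jaccard, adamic_adar, pref_attach]


PREDICTORS = ["weighted common neighbours", "Jaccard",
              "Adamic/Adar", "preferential attachment"]

# Hypothetical toy data: weights are numbers of joint papers between cities.
older_edges = {("c1", "c2"): 3, ("c1", "c3"): 1, ("c2", "c3"): 2,
               ("c3", "c4"): 1, ("c4", "c5"): 2, ("c2", "c5"): 1,
               ("c5", "c6"): 1, ("c6", "c7"): 2, ("c4", "c7"): 1}
# Links that first appear in the later period (the training labels).
later_links = {("c2", "c4"): 1, ("c3", "c5"): 1, ("c5", "c7"): 1}

# Train on older data: predictors from the older network, labels from
# whether the pair became linked in the later period.
older = build_graph(older_edges)
train_pairs = unlinked_pairs(older)
X_train = [predictor_scores(older, u, v) for u, v in train_pairs]
y_train = [int(pair in later_links) for pair in train_pairs]

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Apply the same classifier to the newer network to score unrealized links.
newer = build_graph({**older_edges, **later_links})
candidates = unlinked_pairs(newer)
X_new = [predictor_scores(newer, u, v) for u, v in candidates]
link_prob = forest.predict_proba(X_new)[:, 1]

# The top-scoring pairs are the recommendations.
recommendations = sorted(zip(candidates, link_prob),
                         key=lambda item: item[1], reverse=True)[:3]
# Feature importances give one view of each predictor's relative merit.
importances = dict(zip(PREDICTORS, forest.feature_importances_))

for pair, prob in recommendations:
    print(pair, round(float(prob), 2))
for name, value in importances.items():
    print(name, round(float(value), 2))
```

On a network this small the ranking and importances carry no real meaning; the sketch only shows how predictor scores become features, how labels come from a later period, and how the trained forest is reused on newer data.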

Author information

Correspondence to Raf Guns.

Electronic supplementary material

Supplementary material 1 (DOC 3328 kb)

About this article

Cite this article

Guns, R., Rousseau, R. Recommending research collaborations using link prediction and random forest classifiers. Scientometrics 101, 1461–1473 (2014). https://doi.org/10.1007/s11192-013-1228-9

  • DOI: https://doi.org/10.1007/s11192-013-1228-9
