Recommending research collaborations using link prediction and random forest classifiers
We introduce a method to predict or recommend high-potential future (i.e., not yet realized) collaborations. The proposed method combines link prediction with machine learning techniques. First, a weighted co-authorship network is constructed. We then calculate scores for each node pair according to different measures, called predictors. The resulting scores can be interpreted as indicating the likelihood of future linkage for the given node pair. To determine the relative merit of each predictor, we train a random forest classifier on older data. The same classifier can then generate predictions for newer data. The top predictions are treated as recommendations for future collaboration. We apply the technique to research collaborations between cities in Africa, the Middle East and South Asia, focusing on the topics of malaria and tuberculosis. Results show that the method yields accurate recommendations. Moreover, the method can be used to determine the relative strength of each predictor.
Keywords: Collaboration; Networks; Link prediction; Machine learning; Random forest classifiers; Recommendation; Facilitator cities
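The first two steps described in the abstract (building a weighted co-authorship network and scoring node pairs with several predictors) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not name its predictors, so common neighbours, Jaccard, Adamic-Adar and a weighted common-neighbours variant are used as familiar stand-ins, and the node labels and paper list are made-up toy data. The random forest step is not shown; the final ranking uses a single predictor purely to show how scores turn into recommendations.

```python
import math
from itertools import combinations


def build_network(papers):
    """Weighted co-authorship network; edge weight = number of joint papers."""
    graph = {}
    for nodes in papers:
        for node in nodes:
            graph.setdefault(node, {})
        for a, b in combinations(sorted(set(nodes)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def predictor_scores(graph, a, b):
    """Score one node pair with several illustrative predictors (assumed, not the paper's list)."""
    shared = set(graph[a]) & set(graph[b])
    union = set(graph[a]) | set(graph[b])
    return {
        "common_neighbours": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        # A shared neighbour is linked to both a and b, so its degree is >= 2 and log > 0.
        "adamic_adar": sum(1.0 / math.log(len(graph[z])) for z in shared),
        # Weighted variant: average weight of the two links through each shared neighbour.
        "weighted_common_neighbours": sum(
            (graph[a][z] + graph[b][z]) / 2 for z in shared
        ),
    }


def score_unlinked_pairs(graph):
    """Compute predictor scores for every node pair that is not yet linked."""
    return {
        (a, b): predictor_scores(graph, a, b)
        for a, b in combinations(sorted(graph), 2)
        if b not in graph[a]
    }


# Toy data (hypothetical): each inner list is one paper's set of cities.
papers = [
    ["A", "B"], ["A", "B"], ["A", "C"], ["B", "C"],
    ["B", "D"], ["C", "D"], ["D", "E"],
]
graph = build_network(papers)
rows = score_unlinked_pairs(graph)

# Simplification: rank by one predictor instead of a trained classifier.
ranked = sorted(rows, key=lambda pair: rows[pair]["adamic_adar"], reverse=True)
for pair in ranked:
    print(pair, rows[pair])
```

In the approach the abstract describes, each row of predictor scores would serve as a feature vector for a random forest classifier trained on an older snapshot of the network, with the label indicating whether the pair later became linked. The classifier's output on a newer snapshot would then replace the single-predictor ranking used above.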