
CSTeller: forecasting scientific collaboration sustainability based on extreme gradient boosting


How two previously unacquainted scholars become collaborators has been studied extensively from the perspective of social network analysis. In academia, two scholars may collaborate more than once, which means that scientific collaboration is, to some extent, sustainable. However, little research has explored the sustainability of scientific collaboration. In this paper, we examine to what extent collaboration sustainability can be predicted. For this purpose, we devise CSTeller, a collaboration sustainability prediction model based on extreme gradient boosting. We propose to analyze the sustainability of scientific collaboration from two perspectives: collaboration duration and collaboration times. We investigate factors that may affect collaboration sustainability, drawn from scholars’ local properties and network properties, and adopt them as input features of CSTeller. Extensive experiments on two real scholarly datasets demonstrate the effectiveness of the proposed model. To the best of our knowledge, this is the first attempt to explore the mechanism of scientific collaboration from the perspective of sustainability. Our work may shed light on scientific collaboration analysis and benefit practical applications such as collaborator recommendation, since a scientific collaboration is not a one-shot deal.
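To make the two sustainability measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes that collaboration times is the number of papers a pair has co-authored and that collaboration duration is the span in years between their first and last joint paper; the abstract does not give these definitions. The toy records, the function names, and the choice of common neighbors as an example network property are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (publication year, list of authors).
papers = [
    (2010, ["A", "B"]),
    (2012, ["A", "B", "C"]),
    (2015, ["A", "B"]),
    (2013, ["B", "C"]),
    (2014, ["C", "D"]),
]

def pair_targets(papers):
    """Return {(u, v): (times, duration)} for every co-author pair.

    Assumed definitions: times = number of joint papers,
    duration = last joint year minus first joint year.
    """
    years = defaultdict(list)
    for year, authors in papers:
        # Sort so that each pair is always stored as (smaller, larger).
        for u, v in combinations(sorted(set(authors)), 2):
            years[(u, v)].append(year)
    return {pair: (len(ys), max(ys) - min(ys)) for pair, ys in years.items()}

def neighbors(papers):
    """Build the co-authorship adjacency sets of the collaboration network."""
    adj = defaultdict(set)
    for _, authors in papers:
        for u, v in combinations(set(authors), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def common_neighbors(adj, u, v):
    """One illustrative network property: number of shared co-authors."""
    return len(adj[u] & adj[v])

targets = pair_targets(papers)
adj = neighbors(papers)
print(targets[("A", "B")])              # (3, 5)
print(common_neighbors(adj, "A", "B"))  # 1
```

In a predictive setting, features such as `common_neighbors` would be computed from an earlier observation window than the targets, and a gradient boosting model would then be fitted on the resulting feature table.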






We thank Tong Gao for assistance with the experiments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61502071, 71774020, and 71473028, and by the Fundamental Research Funds for the Central Universities under Grant DUT18JC09.

Author information


Corresponding author

Correspondence to Bo Xu.

Additional information


This article belongs to the Topical Collection: Special Issue on Social Computing and Big Data Applications

Guest Editors: Xiaoming Fu, Hong Huang, Gareth Tyson, Lu Zheng, and Gang Wang


About this article


Cite this article

Wang, W., Xu, B., Liu, J. et al. CSTeller: forecasting scientific collaboration sustainability based on extreme gradient boosting. World Wide Web 22, 2749–2770 (2019).
