World Wide Web, Volume 22, Issue 6, pp 2749–2770

CSTeller: forecasting scientific collaboration sustainability based on extreme gradient boosting

  • Wei Wang
  • Bo Xu
  • Jiaying Liu
  • Zixin Cui
  • Shuo Yu
  • Xiangjie Kong
  • Feng Xia
Part of the following topical collections:
  1. Special Issue on Social Computing and Big Data Applications


The mechanism by which two scholars who are strangers become collaborators has been extensively studied from the perspective of social network analysis. In academia, two scholars may collaborate more than once, which means that scientific collaboration is to some extent sustainable. However, little research has explored the sustainability of scientific collaboration. In this paper, we examine to what extent collaboration sustainability can be predicted. For this purpose, we devise CSTeller, a collaboration sustainability prediction model based on extreme gradient boosting. We analyze the sustainability of scientific collaboration from two perspectives: collaboration duration and collaboration times. We investigate factors that may affect collaboration sustainability, drawn from scholars’ local properties and network properties, and adopt them as the input features of CSTeller. Extensive experiments on two real scholarly datasets demonstrate the effectiveness of the proposed model. To the best of our knowledge, this is the first attempt to explore the mechanism of scientific collaboration from the perspective of sustainability. Our work may shed light on scientific collaboration analysis and benefit practical applications such as collaborator recommendation, since a scientific collaboration is not a one-shot deal.
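To make the two sustainability measures concrete, the following is a minimal sketch of how collaboration times and collaboration duration could be derived from publication records, together with a few illustrative input features. The definitions used here (times as the number of coauthored papers, duration as the span in years between the first and last coauthored paper), the toy records, and the function names `pair_targets` and `pair_features` are all assumptions made for illustration; they are not taken from the paper, and the gradient boosting model itself is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (publication year, list of author names).
papers = [
    (2010, ["A", "B"]),
    (2011, ["A", "B", "C"]),
    (2013, ["A", "C"]),
    (2015, ["A", "B"]),
    (2015, ["C", "D"]),
]

def pair_targets(papers):
    """Return {pair: (times, duration)} for every coauthor pair.

    Assumed definitions (not taken from the paper):
      times    = number of papers the pair coauthored
      duration = last coauthorship year - first coauthorship year
    """
    years = defaultdict(list)
    for year, authors in papers:
        # Sort so each pair has one canonical key, e.g. ("A", "B").
        for pair in combinations(sorted(set(authors)), 2):
            years[pair].append(year)
    return {pair: (len(ys), max(ys) - min(ys)) for pair, ys in years.items()}

def pair_features(papers, pair):
    """Illustrative features: paper counts (local) and common coauthors (network)."""
    coauthors = defaultdict(set)
    paper_count = defaultdict(int)
    for _, authors in papers:
        unique = set(authors)
        for a in unique:
            paper_count[a] += 1
            coauthors[a] |= unique - {a}
    u, v = pair
    return {
        "papers_u": paper_count[u],
        "papers_v": paper_count[v],
        "common_coauthors": len((coauthors[u] & coauthors[v]) - {u, v}),
    }

targets = pair_targets(papers)
features = pair_features(papers, ("A", "B"))
```

In a full pipeline, feature dictionaries like the one above would form the rows of a training matrix and the `(times, duration)` values would serve as prediction targets for a gradient boosted tree model. Features would normally be computed only from data available before the prediction point to avoid leakage; this sketch does not enforce that.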


Scholarly big data; Deep learning; Relation mining; Coauthor network



We thank Tong Gao for assistance with the experiments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61502071, 71774020, and 71473028, and by the Fundamental Research Funds for the Central Universities under Grant DUT18JC09.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian, China
