Machine Learning Applied to the H Index of Colombian Authors with Publications in Scopus

  • Amelec Viloria
  • Jenny Paola Lis-Gutiérrez
  • Mercedes Gaitán-Angulo
  • Carmen Luisa Vásquez Stanescu
  • Tito Crissien
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 167)


Our research aims to establish how to predict the H index of Colombian authors with publications in Scopus until 2016. This cutoff was chosen because the number of documents indexed per year exceeded 10,000 and the number of cited documents was the highest. To this end, a quantitative, nonexperimental, cross-sectional, descriptive, explanatory, and predictive study was designed using supervised learning algorithms, which were applied to information from 8,840 Colombian authors. Among the findings, we highlight that: (i) Colombia ranks fifth among the countries of South America and the Caribbean in terms of the number of products and citations; (ii) the largest number of Colombian authors with products in Scopus until 2016 belonged to the area of natural sciences, followed by medical and health sciences; (iii) most of the Colombian authors were men (64.2%, or 5,442), and they have higher H indices than women; (iv) using random cross validation with 10 iterations, the methods with the best predictive performance, judged by R² and the mean absolute error (MAE), were: AdaBoost (96.6% and 0.397, respectively); Random Forest (96.8% and 0.431, respectively); KNN (94.4% and 0.525, respectively); Tree (94.9% and 0.53, respectively); and Neural Network (93.3% and 0.7, respectively); and (v) the variables that help predict the H index of Colombian authors, in addition to citations, are the quantity of products, the number of products in Q1, and international collaboration.
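To make the quantities in the abstract concrete, the following is a minimal Python sketch of the H index, the two reported scores (MAE and R²), and one possible reading of "random cross validation for 10 iterations" as repeated random train/test splits. It is not the authors' pipeline: the toy authors, the two features (number of papers and total citations), the 30% test fraction, and the small KNN regressor are all assumptions chosen for illustration.

```python
import random

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def random_cross_validation(X, y, iterations=10, test_fraction=0.3, k=3, seed=0):
    """Repeat a random train/test split and return the mean MAE and mean R^2."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    maes, r2s = [], []
    for _ in range(iterations):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        y_true = [y[i] for i in test_idx]
        y_pred = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        maes.append(mean_absolute_error(y_true, y_pred))
        r2s.append(r_squared(y_true, y_pred))
    return sum(maes) / iterations, sum(r2s) / iterations

# Hypothetical toy data: author n has n papers cited n, n-1, ..., 1 times.
toy_authors = [list(range(n, 0, -1)) for n in range(1, 21)]
X = [(len(papers), sum(papers)) for papers in toy_authors]  # assumed features
y = [h_index(papers) for papers in toy_authors]             # prediction target

mae, r2 = random_cross_validation(X, y)
print(f"mean MAE = {mae:.3f}, mean R^2 = {r2:.3f}")
```

Averaging the scores over the ten splits mirrors how a single R² and MAE figure can be reported per method. Swapping `knn_predict` for another regressor would allow a comparison of the kind listed in finding (iv).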


H index, Scopus, Academic publication, Scientific research, Machine learning, Colombia



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Amelec Viloria (1, corresponding author)
  • Jenny Paola Lis-Gutiérrez (2, 3)
  • Mercedes Gaitán-Angulo (4)
  • Carmen Luisa Vásquez Stanescu (5)
  • Tito Crissien (1)

  1. Universidad de la Costa, Barranquilla, Colombia
  2. Corporación Universitaria del Meta, Villavicencio, Colombia
  3. Universidad Nacional de Colombia, Bogotá, Colombia
  4. Corporación Universitaria de Salamanca, Barranquilla, Colombia
  5. Universidad Nacional Experimental Politécnica “Antonio José de Sucre”, Barquisimeto, Venezuela
