A data science-based framework to categorize academic journals


Academic journals play a significant role in disseminating new research insights and knowledge among scientists. The number of such journals has increased significantly in recent years. Scientists prefer to publish their scholarly work at reputed venues, and speed of publication is also an important factor for many when selecting a venue. Key indicators of a journal’s quality include the impact factor, Source Normalized Impact per Paper (SNIP), and Hirsch index (h-index). A journal’s ranking indicates its impact and quality relative to other venues in a specific discipline. Various measures can be used for ranking, such as field-specific statistics, intra-discipline ranking, or a combination of both. Earlier, journals were ranked manually through institutional lists created by academic leaders. Politicization, biases, and personal interests were the key issues with such categorization. Later, the process evolved into database systems based on the impact factor, SNIP, h-index, or a combination of these. These issues call for an external source for categorizing academic journals. This work presents a data science-based framework that evaluates journals on their key bibliometric indicators and categorizes them automatically. The current proposal is restricted to journals published in the computer science domain. The journal features considered in the proposed framework are: publisher, impact factor, website, CiteScore, SJR (SCImago Journal & Country Rank), SNIP, h-index, country, age, cited half-life, immediacy factor/index, Eigenfactor score, article influence score, open access, percentile, citations, acceptance rate, peer review, and the number of articles published yearly. A dataset of these 19 features is collected for 660 journals.
The dataset is preprocessed to fill in missing values and to scale the features. Three feature selection techniques, namely Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR), and Statistical Dependency (SD), are used to rank the aforementioned features. The dataset is then divided vertically into three sets: all features, the top nine features, and the bottom ten features. Next, two clustering techniques, k-means and k-medoids, are employed to find the optimum number of coherent groups in the dataset. Based on a rigorous evaluation, four groups of journals are identified. Two classifiers, k-Nearest Neighbor (k-NN) and an Artificial Neural Network (ANN), are then trained to predict the category of an unknown journal. The ANN shows an average accuracy of 82.85%. A descriptive analysis of the resulting clusters is also presented to gain insights into the four journal categories. The proposed framework provides an opportunity to categorize academic journals independently, using data science methods and multiple significant bibliometric indicators.
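
The pipeline described above can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation. It covers only three of the stages (preprocessing, k-means clustering, and k-NN prediction) and omits feature selection, k-medoids, and the ANN. Several details are assumptions made for illustration: the abstract does not say how missing values are filled or how features are scaled, so mean imputation and min-max scaling are used here. The toy values, the three columns, and every function name are hypothetical, and two clusters are used instead of four to keep the example small.

```python
import math
from collections import Counter

def impute_and_scale(rows):
    """Fill missing values (None) with the column mean, then min-max scale to [0, 1].
    Both choices are assumptions; the abstract does not name the methods used."""
    columns = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        mean = sum(present) / len(present)
        col = [mean if r[j] is None else r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # guard against constant columns
        columns.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*columns)]

def kmeans(points, k, iters=100):
    """Lloyd's k-means. The first k points seed the centroids so the run is
    deterministic (an illustrative simplification, not a recommended init)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(x) / len(members) for x in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def knn_predict(train, train_labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: [impact factor, h-index, SNIP]; None marks a missing value.
journals = [
    [0.5, 10.0, 0.4],
    [6.0, 150.0, 2.5],
    [0.7, None, 0.5],
    [5.5, 140.0, None],
    [0.6, 12.0, 0.45],
    [6.2, 160.0, 2.6],
]

scaled = impute_and_scale(journals)
labels, centroids = kmeans(scaled, k=2)
predicted = knn_predict(scaled, labels, [0.9, 0.9, 0.9], k=3)
print(labels, predicted)
```

In this sketch the cluster labels produced by k-means serve as training targets for the k-NN classifier, which mirrors the order of stages in the abstract: groups are found first, and a classifier is then trained to assign an unseen journal to one of them.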








Author information



Corresponding author

Correspondence to Zahid Halim.



Cite this article

Halim, Z., Khan, S. A data science-based framework to categorize academic journals. Scientometrics 119, 393–423 (2019). https://doi.org/10.1007/s11192-019-03035-w



Keywords

  • Journals categorization
  • Ranking
  • Data science
  • Clustering application