Volume 119, Issue 1, pp 393–423

A data science-based framework to categorize academic journals

  • Zahid Halim
  • Shafaq Khan


Academic journals play a significant role in disseminating new research insights and knowledge among scientists, and their number has increased significantly in recent years. Scientists prefer to publish their scholarly work in reputed venues, and many also weigh the speed of publication when selecting a venue. Key indicators of a journal's quality include the impact factor, the Source Normalized Impact per Paper (SNIP), and the Hirsch index (h-index). A journal's ranking indicates its impact and quality relative to other venues in the same discipline. Various measures can be used for ranking, such as field-specific statistics, intra-discipline ranking, or a combination of both. Earlier, journals were ranked manually, using institutional lists compiled by academic leaders; politicization, bias, and personal interests were the key issues with such categorization. The process later evolved into database systems based on the impact factor, SNIP, the h-index, or a combination of these. All of this created the need for an external source for categorizing academic journals. This work presents a data science-based framework that evaluates journals on their key bibliometric indicators and categorizes them automatically. The current proposal is restricted to journals in the computer science domain. The journal features considered in the proposed framework are: publisher, impact factor, website, CiteScore, SJR (SCImago Journal Rank), SNIP, h-index, country, age, cited half-life, immediacy factor/index, Eigenfactor score, article influence score, open access, percentile, citations, acceptance rate, peer review, and the number of articles published yearly. A dataset of these 19 features is collected for 660 journals.
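
The h-index listed among the features follows Hirsch's definition: an entity has index h if h of its papers have at least h citations each; for a journal, the citation counts are those of its articles. The abstract does not say how the h-index values in the dataset were obtained, so the minimal sketch below only illustrates the definition. The function name `h_index` and the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h items each have at least h citations (Hirsch's definition)."""
    ranked = sorted(citations, reverse=True)  # most-cited article first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` articles all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts for five articles of one journal.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

Sorting in descending order makes the check a single pass: once an article's count falls below its rank, no later article can extend h.
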
The dataset is preprocessed to fill in missing values and to scale the features. Three feature selection techniques, namely Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR), and Statistical Dependency (SD), are used to rank these features. The dataset is then divided vertically (by feature) into three sets: all features, the top nine features, and the bottom ten features. Next, two clustering techniques, k-means and k-medoids, are employed to find the optimum number of coherent groups in the dataset. Based on a rigorous evaluation, four groups of journals are identified. Two classifiers, k-Nearest Neighbor (k-NN) and an Artificial Neural Network (ANN), are then trained to predict the category of an unknown journal; the ANN achieves an average accuracy of 82.85%. A descriptive analysis of the resulting clusters is also presented to gain insight into the four journal categories. The proposed framework provides a way to categorize academic journals independently, using data science methods and multiple significant bibliometric indicators.
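
The cluster-then-classify pipeline described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the imputation or scaling method, so mean imputation and min-max scaling are assumed; feature selection, k-medoids, and the ANN are omitted; k-means uses a deterministic maximin seeding chosen here for reproducibility; and the number of clusters is fixed rather than searched. All function names are hypothetical and the journal values are invented toy data.

```python
from collections import Counter


def impute_mean(rows):
    """Assumed imputation: replace each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def fit_min_max(rows):
    """Return per-column (min, max) so unseen journals can be scaled identically."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lo, hi):
    """Assumed scaling: map each value to [0, 1]; constant columns map to 0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic maximin seeding: start at the first point, then keep adding
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy journals: [impact factor, h-index, acceptance rate %].
# None marks a missing value. These numbers are invented for illustration.
journals = [
    [5.0, 120, 15], [4.5, None, 18], [6.0, 140, None],
    [0.8, 20, 60], [1.0, 25, 55], [0.5, None, 70],
]
filled = impute_mean(journals)
lo, hi = fit_min_max(filled)
scaled = [apply_min_max(r, lo, hi) for r in filled]
labels, _ = k_means(scaled, k=2)  # the paper reports four groups on its real data

new_journal = apply_min_max([4.8, 110, 20], lo, hi)
print(labels, knn_predict(scaled, labels, new_journal))  # -> [0, 0, 0, 1, 1, 1] 0
```

Here the cluster assignments serve as category labels for the classifier, which then places an unseen journal into one of those categories, in the spirit of the two-stage design the abstract outlines. On real data, k would be chosen by evaluating a cluster-validity criterion over several candidate values; the abstract does not state which criterion was used.
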


Keywords: Journals categorization · Ranking · Data science · Clustering application

Supplementary material

Supplementary material 1: 11192_2019_3035_MOESM1_ESM.txt (TXT)


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  1. The Machine Intelligence Research Group (MInG), Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan
  2. School of Systems and Technology, Department of Computer Science, University of Management and Technology, Lahore, Pakistan
