Academic journals play a significant role in disseminating new research insights and knowledge among scientists, and their number has increased significantly in recent years. Scientists prefer to publish their scholarly work at reputed venues, and speed of publication is another important factor many consider when selecting a publication venue. Key indicators for evaluating a journal's quality include the impact factor, Source Normalized Impact per Paper (SNIP), and Hirsch index (h-index). A journal's ranking indicates its impact and quality relative to other venues in a specific discipline. Various measures can be used for ranking, such as field-specific statistics, intra-discipline ranking, or a combination of both. Earlier, journals were ranked manually through institutional lists created by academic leaders; politicization, biases, and personal interests were the key issues with such categorization. Later, the process evolved into database systems based on the impact factor, SNIP, h-index, or some combination of these. All of this demands an external means of categorizing academic journals. This work presents a data science-based framework that evaluates journals based on their key bibliometric indicators and categorizes them automatically. The current proposal is restricted to journals published in the computer science domain. The journal features considered in the proposed framework are: publisher, impact factor, website, CiteScore, SJR (SCImago Journal Rank), SNIP, h-index, country, age, cited half-life, immediacy index, Eigenfactor score, article influence score, open access, percentile, citations, acceptance rate, peer review, and the number of articles published yearly. A dataset of 660 journals covering these 19 features is collected.
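As an illustration of one of the indicators above, the h-index can be computed directly from a list of per-article citation counts. This is a minimal sketch (the function name and example counts are hypothetical, not from the paper):

```python
def h_index(citations):
    """Largest h such that at least h articles have >= h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i  # the i most-cited articles all have >= i citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Here the four most-cited articles each have at least four citations, but the fifth has only three, so the index is 4.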
The dataset is preprocessed to fill in missing values and to scale the features. Three feature selection techniques, namely Mutual Information (MI), minimum Redundancy Maximum Relevance (mRMR), and Statistical Dependency (SD), are used to rank the aforementioned features. The dataset is then divided vertically into three sets: all features, the top nine features, and the bottom ten features. Next, two clustering techniques, k-means and k-medoids, are employed to find the optimum number of coherent groups in the dataset. Based on a rigorous evaluation, four groups of journals are identified. This is followed by training two classifiers, k-NN (k-Nearest Neighbor) and an Artificial Neural Network (ANN), to predict the category of an unknown journal; the ANN shows an average accuracy of 82.85%. A descriptive analysis of the clusters is also presented to gain insights into the four journal categories. The proposed framework provides a way to independently categorize academic journals using data science methods and multiple significant bibliometric indicators.
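The preprocess-cluster-classify pipeline described above can be sketched with scikit-learn. This is only a schematic, under stated assumptions: synthetic random data stands in for the real 660 × 19 bibliometric matrix, silhouette score stands in for the paper's cluster evaluation, and the imputation strategy and network architecture are illustrative choices, not the authors' exact settings:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the 660 x 19 journal-feature matrix,
# with a few values missing at random.
X = rng.normal(size=(660, 19))
X[rng.random(X.shape) < 0.05] = np.nan

# Preprocessing: fill in missing values, then scale features to [0, 1].
X = SimpleImputer(strategy="mean").fit_transform(X)
X = MinMaxScaler().fit_transform(X)

# Search for a coherent number of clusters (the paper settles on four).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Train a small neural network to predict the cluster (journal category).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, labels)
print(best_k, round(clf.score(X, labels), 2))
```

On random data the silhouette criterion will not recover the paper's four groups; the sketch only shows how the stages fit together. The same scaffolding applies to k-medoids and k-NN, which scikit-learn does not ship in core.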
Halim, Z., Khan, S. A data science-based framework to categorize academic journals. Scientometrics 119, 393–423 (2019). https://doi.org/10.1007/s11192-019-03035-w
- Journals categorization
- Data science
- Clustering application