
Identification and causal analysis of predatory open access journals based on interpretable machine learning

Scientometrics

Abstract

Predatory journals are a relatively recent phenomenon that has drawn attention from the academic community over the last decade. As the open access (OA) movement has gained momentum, the indiscriminate growth of predatory journals has had significant negative impacts on academic communication, scholarly publishing, and the effective use of scientific resources. This growth poses a serious threat to the healthy development of the OA movement and undermines the integrity of research and the research ecosystem. Identifying predatory journals among the massive number of OA journals would help scholars avoid negative consequences for their finances, reputation, academic influence, and career advancement. Traditional methods for identifying predatory journals rely heavily on the knowledge of domain experts. However, many predatory journals exhibit latent and covert characteristics, and the number of OA journals is growing extremely rapidly, which makes it difficult for experts to pick out predatory journals from the vast pool of OA journals. This paper proposes an interpretable machine learning model for early warning of predatory OA journals, which identifies predatory journals through an ensemble of multiple machine learning algorithms. Specifically, the proposed methodology first constructs an OA journal early warning indicator system and integrates multiple machine learning algorithms to compute early warning values for OA journals. Then, the SHAP interpretability framework is introduced to analyze the causal factors behind the early warning risks in a novel way. To verify the accuracy of the model's causal factors, we conduct a comparative case-study analysis of domestic and foreign medical OA journals. The empirical analysis conducted in this study demonstrates the efficacy of the ensemble algorithm in accurately identifying the risk of predatory OA journals.
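The two-step idea in the abstract (an ensemble computes a warning value, then Shapley-style attributions explain it) can be illustrated with a minimal sketch. This is not the authors' model: the three indicator names, the toy scorers, and the single-baseline attribution are all assumptions made for illustration, and the SHAP library itself averages over a background dataset rather than one reference point.

```python
from itertools import combinations
from math import factorial

# Hypothetical indicators scaled to [0, 1]; NOT the paper's indicator system.
FEATURES = ["self_citation_rate", "fast_acceptance", "high_apc"]


def rule_scorer(x):
    # Toy rule: flag journals with both high self-citation and very fast acceptance.
    return 1.0 if x["self_citation_rate"] > 0.5 and x["fast_acceptance"] > 0.5 else 0.0


def linear_scorer(x):
    # Toy weighted sum; the weights are arbitrary illustrative values.
    return 0.5 * x["self_citation_rate"] + 0.3 * x["fast_acceptance"] + 0.2 * x["high_apc"]


def product_scorer(x):
    # Toy interaction between fast acceptance and high article processing charges.
    return x["fast_acceptance"] * x["high_apc"]


def ensemble_warning(x):
    # Ensemble by simple averaging (an assumption; the paper's integration may differ).
    scorers = (rule_scorer, linear_scorer, product_scorer)
    return sum(s(x) for s in scorers) / len(scorers)


def shapley_values(model, x, baseline):
    # Exact Shapley values: features outside a coalition take their baseline value.
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g] for g in FEATURES}
                without_f = {g: x[g] if g in subset else baseline[g] for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Made-up journal and reference point, for illustration only.
journal = {"self_citation_rate": 0.8, "fast_acceptance": 0.9, "high_apc": 0.4}
baseline = {"self_citation_rate": 0.1, "fast_acceptance": 0.2, "high_apc": 0.3}

warning = ensemble_warning(journal)
phi = shapley_values(ensemble_warning, journal, baseline)
print(f"warning value: {warning:.3f}")
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the journal's warning value and the baseline's, which is the property that lets each indicator's share of the risk be read off directly. Exact enumeration is exponential in the number of indicators, so it is only practical for small toy cases like this one.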



Acknowledgements

This work was supported by the China Scholarship Council.

Funding

Funding was provided by the 2020 Hubei Provincial Social Science Foundation Pre-Funded Projects (Grant No. 20ZD053) and the Social Science Foundation of Shaanxi Province (Grant No. 19CTQ030).

Author information

Contributions

MKL: Collected the data, contributed data or analysis tools, performed the analysis, and wrote the manuscript. WJH: Commented on the overall framework of the paper, provided revisions, and offered ideas. LTY: Collected experimental data, redid experiments, and wrote revisions. ZL: Conceived and designed the analysis, wrote the manuscript, designed the figures, and made other contributions.

Corresponding author

Correspondence to Keliang Mu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, J., Liu, T., Mu, K. et al. Identification and causal analysis of predatory open access journals based on interpretable machine learning. Scientometrics (2024). https://doi.org/10.1007/s11192-024-04969-6

  • DOI: https://doi.org/10.1007/s11192-024-04969-6
