Detecting impact factor manipulation with data mining techniques

Yang, Dong-Hui; Li, Xin; Sun, Xiaoxia; Wan, Jie

doi:10.1007/s11192-016-2144-6

Published: 11 October 2016

Scientometrics, volume 109, pages 1989–2005 (2016)

Dong-Hui Yang¹, Xin Li¹, Xiaoxia Sun¹ & Jie Wan²,³

Abstract

Deliberate manipulation of the impact factor is a significant threat to its fairness, and such behavior should be curbed by effective means. In this paper, data mining techniques are used to address this problem. First, ten features are collected into a feature set for nine normal journals and nine abnormal journals from 2005 to 2014. Then, three types of strong classification methods (k-nearest neighbor, decision tree and support vector machine) are adopted to learn classification models. In total, eight algorithms are run on the data set to find methods suitable for detecting impact factor manipulation. Finally, the two best-performing algorithms, with precision higher than 85 %, are selected and used to predict new journal samples. According to the results, random forest and one type of support vector machine are more suitable than k-nearest neighbor for detecting abnormal journals. When the two methods are applied to 90 other journals in nine disciplines from 2007 to 2014, they prove to be broadly applicable, and four journals are identified as manipulated in some years. The two data mining methods therefore offer journal managers intelligent and automatic ways to detect and ban impact factor manipulation.
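The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not list the ten features, so the data below are synthetic stand-ins with ten numeric columns. The leave-one-out evaluation, the value of k, and every function name are assumptions made for illustration. The k-nearest neighbor classifier is shown only because it fits in a few lines; the abstract reports that random forest and one type of support vector machine performed better.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def precision(y_true, y_pred, positive=1):
    """Fraction of samples predicted as `positive` that are truly positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted_pos:
        return 0.0
    hits = sum(1 for t in truths_of_predicted_pos if t == positive)
    return hits / len(truths_of_predicted_pos)


def make_synthetic_journal_years(n_per_class=45, n_features=10, seed=0):
    """Hypothetical data, NOT real journal data: label 1 = abnormal, 0 = normal."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def leave_one_out_precision(X, y, k=3):
    """Predict each sample from all the others, then score precision on the abnormal class."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return precision(y, preds)


if __name__ == "__main__":
    X, y = make_synthetic_journal_years()
    print("leave-one-out precision (abnormal class):", leave_one_out_precision(X, y))
```

Precision is used here because it is the measure the abstract reports: among journal-years flagged as abnormal, it is the share that are truly abnormal, which matters when a false accusation of manipulation is costly.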

Acknowledgments

The authors would like to thank the editor and anonymous referees for their constructive comments that substantially helped improve the quality and presentation of this paper. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71501040, 71473034), and the Fundamental Research Funds for the Central Universities (2242014K10020).

Author information

Authors and Affiliations

School of Economics and Management, Southeast University, Nanjing, 210096, People’s Republic of China
Dong-Hui Yang, Xin Li & Xiaoxia Sun
School of Energy Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
Jie Wan
Nanjing Qiuya Power Horizon Information Technology Company Limited, Nanjing, 210012, People’s Republic of China
Jie Wan

Corresponding author

Correspondence to Dong-Hui Yang.

About this article

Cite this article

Yang, DH., Li, X., Sun, X. et al. Detecting impact factor manipulation with data mining techniques. Scientometrics 109, 1989–2005 (2016). https://doi.org/10.1007/s11192-016-2144-6

Received: 22 June 2016
Published: 11 October 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11192-016-2144-6
