Abstract
As investment in scientific research has grown, the number of published papers has increased significantly, and evaluating the impact of papers has attracted extensive attention from scholars. Citation frequency is the most convenient and widely used index for measuring the academic influence of a paper. However, it reflects a paper's real impact only some time after publication. To identify highly cited papers at an early stage after publication, this paper collects data on 1025 academic papers published in 2007 in the library and information science discipline of the Web of Science database and extracts 24 predictive features covering three aspects: papers, authors, and journals. On this basis, 7 principal component vectors are constructed through PCA-based feature screening. These are combined with a BP neural network to build the PCA-BPNN classification model for predicting highly cited papers, which is then compared with 5 other models. The results show that the PCA-BPNN model achieves better prediction performance and provides an effective model for predicting paper influence.
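The pipeline described in the abstract (24 features reduced to 7 principal components, then classified by a BP neural network) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, replaces the Web of Science data with synthetic random features, defines "highly cited" as the top 10% of an invented score, and picks a single hidden layer of 10 units arbitrarily. Only the counts 1025, 24, and 7 come from the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 1025 papers x 24 predictive features.
X = rng.normal(size=(1025, 24))

# Hypothetical label (an assumption): a paper is "highly cited" if an
# invented noisy linear score falls in the top 10% of all papers.
score = X @ rng.normal(size=24) + rng.normal(scale=0.5, size=1025)
y = (score >= np.quantile(score, 0.9)).astype(int)

# Hold out 20% of the papers for evaluation, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize, project onto 7 principal components, then train a
# multilayer perceptron (fitted by backpropagation) on the components.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=7),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
n_components = model.named_steps["pca"].n_components_
print("principal components kept:", n_components)
print("test accuracy:", model.score(X_test, y_test))
```

Wrapping the three steps in one pipeline means the scaler and PCA are fitted on the training split only, so the held-out papers do not leak into the feature screening step. Because highly cited papers are a small minority, accuracy alone can be misleading; precision and recall on the positive class would be a more informative comparison across models.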
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, T., Duan, C. (2023). Research on the Prediction of Highly Cited Papers Based on PCA-BPNN. In: Agarwal, N., Kleiner, G.B., Sakalauskas, L. (eds) Modeling and Simulation of Social-Behavioral Phenomena in Creative Societies. MSBC 2022. Communications in Computer and Information Science, vol 1717. Springer, Cham. https://doi.org/10.1007/978-3-031-33728-4_12
DOI: https://doi.org/10.1007/978-3-031-33728-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33727-7
Online ISBN: 978-3-031-33728-4
eBook Packages: Computer Science, Computer Science (R0)