Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations

Qayyum, Faiza; Jamil, Harun; Iqbal, Naeem; Kim, DoHyeun; Afzal, Muhammad Tanvir

doi:10.1007/s11192-022-04530-3


Published: 21 October 2022

Scientometrics, Volume 127, pages 6471–6499 (2022)

Faiza Qayyum¹, Harun Jamil², Naeem Iqbal¹, DoHyeun Kim¹ & Muhammad Tanvir Afzal³


Abstract

Citation analysis-based systems are premised on the assumption that all citations are equally important. The scientific community argues that a citation may be made for divergent reasons and thus should not be treated on a par with every other citation. In this regard, a plethora of existing studies classify citations by their reasons. Presently, the community tends toward binary citation classification, with the aim of counting only citations made for important reasons when computing quantitative analysis-based measures. We argue that the outcomes yielded by contemporary state-of-the-art models cannot be deemed ideal, because most of them have been evaluated on data sets with a small number of instances, so their outcomes cannot be generalized. The scope of results from such approaches is restricted to a single domain, and the same models may exhibit entirely different behavior on other data sets. Most studies are dominated by content-based features evaluated with traditional classification models such as Support Vector Machine (SVM) and random forest (RF), while only a small number of studies employ metadata, which holds the potential to serve as a quintessential indicator of meaningful citations. In this study, we introduce a multilayer perceptron artificial neural network (MLP-ANN) binary citation classifier that exploits the best combinations of features formed from both sources (content and metadata). We also introduce a new benchmark data set from the electrical engineering domain, which is consolidated with two existing benchmark data sets for model evaluation. The results show that the proposed MLP model outperforms contemporary models, achieving a precision of 0.92.
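To make the general setup concrete, here is a minimal, dependency-free sketch of a binary MLP classifier over hybrid (content plus metadata) features, scored by precision, TP / (TP + FP). It is not the authors' model: the architecture (one tanh hidden layer with four units, trained by stochastic gradient descent on log loss), the four feature names, and the synthetic labelling rule are all illustrative assumptions. In practice a library implementation such as scikit-learn's `MLPClassifier` would normally replace the hand-written network, and SVM or RF baselines would be trained on the same split for comparison.

```python
"""Sketch: one-hidden-layer MLP binary classifier on hypothetical hybrid
citation features. Features, data, and hyperparameters are assumptions."""
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """Input -> tanh hidden layer -> sigmoid output, trained with SGD on log loss."""

    def __init__(self, n_in, n_hidden=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def fit(self, X, y, epochs=100):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                d_out = p - t  # d(log loss)/d(output pre-activation)
                for j, hj in enumerate(h):
                    # Backpropagate through tanh using the pre-update w2[j].
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= self.lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= self.lr * d_hid * xi
                    self.b1[j] -= self.lr * d_hid
                self.b2 -= self.lr * d_out
        return self

    def predict(self, X, threshold=0.5):
        return [1 if self._forward(x)[1] >= threshold else 0 for x in X]


def precision(y_true, y_pred):
    # Precision = TP / (TP + FP); defined here as 0.0 if nothing is predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def make_synthetic(n, seed, margin=0.1):
    # Hypothetical normalized features per citation:
    #   x[0] in-text citation count (content), x[1] cue-term score (content),
    #   x[2] author overlap (metadata),        x[3] title similarity (metadata).
    # Hypothetical rule: "important" (1) when x[0] + x[2] > 1.
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < n:
        x = [rng.random() for _ in range(4)]
        score = x[0] + x[2]
        if abs(score - 1.0) < margin:
            continue  # skip ambiguous points near the decision boundary
        X.append(x)
        y.append(1 if score > 1.0 else 0)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_synthetic(200, seed=1)
    X_test, y_test = make_synthetic(100, seed=2)
    model = TinyMLP(n_in=4).fit(X_train, y_train)
    print("precision:", precision(y_test, model.predict(X_test)))
```

Because the synthetic labels depend on one content feature and one metadata feature, the toy model only does well when it uses both kinds of input, which loosely mirrors the paper's motivation for hybrid features; the printed precision says nothing about the paper's reported 0.92.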



Acknowledgements

This research was supported by the Energy Cloud R&D Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2019M3F2A1073387), and by the Institute for Information & Communications Technology Promotion (IITP) (No. 2022-0-00980, Cooperative Intelligence Framework of Scene Perception for Autonomous IoT Device).

Funding

The authors have received no funding for the manuscript.

Author information

Authors and Affiliations

Computer Engineering Department, Jeju National University, Jeju, 63243, South Korea
Faiza Qayyum, Naeem Iqbal & DoHyeun Kim
Department of Electronic Engineering, Jeju National University, Jeju, 63243, South Korea
Harun Jamil
Department of Computing Sciences, Shifa Tameer-e-Milat University, Islamabad, 46000, Pakistan
Muhammad Tanvir Afzal


Corresponding authors

Correspondence to Faiza Qayyum or DoHyeun Kim.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

The authors have followed, and agree to, the code of ethics required for submitting a manuscript to the journal Scientometrics.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Qayyum, F., Jamil, H., Iqbal, N. et al. Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations. Scientometrics 127, 6471–6499 (2022). https://doi.org/10.1007/s11192-022-04530-3


Received: 16 December 2021
Accepted: 14 September 2022
Published: 21 October 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11192-022-04530-3

Keywords
