Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations

Scientometrics

Abstract

Citation analysis-based systems rest on the assumption that all citations are equally important. The scientific community argues that citations are made for divergent reasons and therefore should not be treated equally. Accordingly, many existing studies classify citations by the reason for citing. At present, the community leans toward binary citation classification, so that only important citations are considered when quantitative analysis-based measures are applied. We argue that the outcomes of contemporary state-of-the-art models cannot be deemed ideal: most of them were evaluated on data sets with a small number of instances, so their results cannot be generalized. The results of such approaches are also restricted to a single domain, and the models may behave entirely differently on other data sets. Most studies rely on content-based features evaluated with traditional classification models such as support vector machine (SVM) and random forest (RF), while only a few employ metadata, which has the potential to serve as a strong indicator of meaningful citations. In this study, we introduce a multilayer perceptron artificial neural network (MLP-ANN) binary citation classifier that exploits the best combinations of features drawn from both sources, content and metadata. We also introduce a new benchmark data set from the electrical engineering domain, which is combined with two existing benchmark data sets for model evaluation. The results show that the proposed MLP model outperforms contemporary models, achieving a precision of 0.92.
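
The idea of scoring hybrid feature combinations with an MLP can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the synthetic data, the network size, and the training settings are all assumptions chosen for illustration, and a one-hidden-layer perceptron written with the Python standard library stands in for whatever architecture the paper actually uses. The sketch enumerates every feature subset that draws on both content and metadata, trains a small binary classifier on each, and ranks the subsets by precision, i.e. TP / (TP + FP).

```python
import math
import random
from itertools import combinations

# Hypothetical feature names: the abstract does not list the actual features.
CONTENT = ["cue_term_score", "in_text_citation_count"]
METADATA = ["title_similarity", "shared_author_ratio"]


def hybrid_combinations(content, metadata):
    """Yield every subset with at least one feature from each source."""
    for i in range(1, len(content) + 1):
        for c in combinations(content, i):
            for j in range(1, len(metadata) + 1):
                for m in combinations(metadata, j):
                    yield c + m


def precision(y_true, y_pred):
    """TP / (TP + FP); 0.0 when no instance is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units, trained by plain SGD on log loss."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def fit(self, X, y, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, target in zip(X, y):
                hidden, out = self._forward(x)
                delta_out = out - target  # log-loss gradient at the output unit
                for k, h in enumerate(hidden):
                    # Backpropagate through hidden unit k before updating w2[k].
                    delta_h = delta_out * self.w2[k] * h * (1.0 - h)
                    self.w2[k] -= lr * delta_out * h
                    self.b1[k] -= lr * delta_h
                    for i, xi in enumerate(x):
                        self.w1[k][i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out
        return self

    def predict(self, X):
        return [1 if self._forward(x)[1] >= 0.5 else 0 for x in X]


def run_demo(seed=42):
    """Train one model per hybrid subset on synthetic data; return the best."""
    rng = random.Random(seed)
    names = CONTENT + METADATA
    X, y = [], []
    for _ in range(80):
        label = rng.randint(0, 1)
        # Synthetic assumption: important citations (1) score higher everywhere.
        X.append([rng.gauss(0.7 if label else 0.3, 0.15) for _ in names])
        y.append(label)
    X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]

    results = []
    for combo in hybrid_combinations(CONTENT, METADATA):
        cols = [names.index(name) for name in combo]
        select = lambda rows: [[row[c] for c in cols] for row in rows]
        model = TinyMLP(len(cols)).fit(select(X_train), y_train)
        score = precision(y_test, model.predict(select(X_test)))
        results.append((score, combo))
    return max(results)  # highest precision; ties broken by feature names


if __name__ == "__main__":
    best_precision, best_combo = run_demo()
    print(best_combo, round(best_precision, 2))
```

With two features per source the sketch evaluates nine hybrid subsets. A real evaluation would replace the synthetic rows with annotated citation pairs and would normally use cross-validation rather than a single train/test split.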

Acknowledgements

This research was supported by the Energy Cloud R&D Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT (2019M3F2A1073387), and by the Institute for Information & communications Technology Promotion (IITP) (No. 2022-0-00980, Cooperative Intelligence Framework of Scene Perception for Autonomous IoT Device).

Funding

The authors have received no funding for the manuscript.

Author information

Corresponding authors

Correspondence to Faiza Qayyum or DoHyeun Kim.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

The authors have followed, and agree to, the code of ethics required to submit a manuscript to the journal Scientometrics.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qayyum, F., Jamil, H., Iqbal, N. et al. Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations. Scientometrics 127, 6471–6499 (2022). https://doi.org/10.1007/s11192-022-04530-3

  • DOI: https://doi.org/10.1007/s11192-022-04530-3
