An NLP-based citation reason analysis using CCRO

Ihsan, Imran; Qadir, M. Abdul

doi:10.1007/s11192-021-03955-6

Published: 26 March 2021

Scientometrics, volume 126, pages 4769–4791 (2021)

952 Accesses
2 Citations

Abstract

Artificial Intelligence and Natural Language Processing have become major contributors to document classification and information extraction. Classifying citations into different classes has attracted considerable attention because of the large volume of citations available in digital libraries. Typical citation classification relies on sentiment analysis, in which various techniques are applied to citation texts to classify them mainly as “Positive”, “Negative”, or “Neutral”. However, there can be innumerable reasons why an author chooses to cite another work. The Citations’ Context and Reasons Ontology (CCRO) uses a clear scientific method to articulate eight basic reasons for citing, derived through an iterative process of sentiment analysis, collaborative meanings, and experts' opinions. Using CCRO, this paper adopts an ontology-based approach to extract citation reasons and instantiate ontology classes and properties on two corpora of citation sentences. One corpus is a publicly available dataset; the other was manually curated by the authors. The process has two parts. The first is an interface for manually annotating each citation text in the selected corpora with CCRO properties; a team of carefully selected annotators annotated each citation to achieve high inter-annotator agreement. The second part focuses on the automatic extraction of these reasons: using Natural Language Processing, a Mapping Graph, and the reporting verb in a citation sentence, the citation's reason is extracted and mapped onto a CCRO property. The manual and automatic mappings are then compared, and accuracy is calculated for both the publicly available corpus and the authors' own corpus.
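The second part of the process, together with the accuracy comparison, can be illustrated with a minimal sketch. It is not the authors' implementation: the verb lexicon, the tiny lemma table, the placeholder labels (`REASON_A`, `REASON_B`, `REASON_C`), and the example sentences are all invented for illustration, and a plain dictionary lookup stands in for the Mapping Graph and the NLP pipeline described in the abstract. The actual CCRO property names are not given here, so none are assumed.

```python
import re
from typing import List, Optional

# Hypothetical lexicon: reporting-verb lemma -> placeholder reason label.
# These labels are illustrative only; they are NOT the real CCRO properties.
VERB_TO_REASON = {
    "extend": "REASON_A",
    "use": "REASON_B",
    "criticize": "REASON_C",
}

# Tiny hand-written lemma table standing in for a real lemmatizer.
LEMMAS = {
    "extends": "extend", "extended": "extend",
    "uses": "use", "used": "use",
    "criticizes": "criticize", "criticized": "criticize",
}


def extract_reason(sentence: str) -> Optional[str]:
    """Return the label of the first known reporting verb, or None."""
    for token in re.findall(r"[a-z]+", sentence.lower()):
        lemma = LEMMAS.get(token, token)
        if lemma in VERB_TO_REASON:
            return VERB_TO_REASON[lemma]
    return None


def accuracy(manual: List[Optional[str]], automatic: List[Optional[str]]) -> float:
    """Fraction of sentences where the automatic label equals the manual one."""
    if not manual or len(manual) != len(automatic):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == a for m, a in zip(manual, automatic))
    return matches / len(manual)


# Invented citation sentences and invented manual annotations.
sentences = [
    "We extend the parser described in [1].",
    "Our system uses the corpus from [3].",
    "This approach was proposed in [7].",
]
manual_labels = ["REASON_A", "REASON_B", "REASON_C"]
automatic_labels = [extract_reason(s) for s in sentences]

print(automatic_labels)
print(accuracy(manual_labels, automatic_labels))
```

In this toy run the third sentence contains no verb from the lexicon, so it receives no label and counts as a mismatch against the manual annotation. A real system would need part-of-speech tagging to avoid treating a noun such as "use" as a reporting verb.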

Notes

NLTK: https://www.nltk.org/.
spaCy: https://spacy.io/.
FrameNet: https://framenet.icsi.berkeley.edu/fndrupal/.

Author information

Authors and Affiliations

Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan
Imran Ihsan & M. Abdul Qadir
Department of Creative Technologies, Faculty of Computing and AI, Air University, Islamabad, Pakistan
Imran Ihsan

Corresponding author

Correspondence to Imran Ihsan.

About this article

Cite this article

Ihsan, I., Qadir, M.A. An NLP-based citation reason analysis using CCRO. Scientometrics 126, 4769–4791 (2021). https://doi.org/10.1007/s11192-021-03955-6

Received: 06 July 2020
Accepted: 16 March 2021
Published: 26 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11192-021-03955-6
