Abstract
Artificial Intelligence and Natural Language Processing have become major contributors to document classification and information extraction. Classifying citations into different classes has attracted considerable attention because of the large volume of citations available in digital libraries. Typical citation classification relies on sentiment analysis, where various techniques are applied to citation texts, mainly to classify them as “Positive”, “Negative”, or “Neutral”. However, there can be innumerable reasons why an author cites another piece of research. The Citations’ Context and Reasons Ontology (CCRO) uses a clear scientific method to articulate eight basic reasons for citing, derived through an iterative process of sentiment analysis, collaborative meanings, and experts' opinions. Using CCRO, this paper adopts an ontology-based approach to extract the reasons for citations and to instantiate ontology classes and properties on two different corpora of citation sentences. One corpus is a publicly available dataset, while the other is our own manually curated corpus. The process has two steps. The first is an interface for manually annotating each citation text in the selected corpora with CCRO properties. A team of carefully selected annotators annotated each citation so as to achieve high inter-annotator agreement. The second step focuses on the automatic extraction of these reasons. Using Natural Language Processing, a Mapping Graph, and the reporting verb in a citation sentence, the citation's reason is extracted and mapped onto a CCRO property. Accuracy is then calculated by comparing the manual and automatic mappings, both for the publicly available corpus and for our own corpus of citation sentences.
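The evaluation step described above (map each citation's reporting verb to a property, then compare against the manual annotation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the verb table, the `PROPERTY_*` labels, and the function names are hypothetical placeholders, not the actual CCRO properties or the paper's Mapping Graph, and the reporting verb is assumed to have been extracted and lemmatized upstream.

```python
from typing import List, Optional

# Hypothetical verb-to-property table. The labels are placeholders,
# NOT the real CCRO properties, and the verbs are illustrative only.
VERB_TO_PROPERTY = {
    "extend": "PROPERTY_A",
    "use": "PROPERTY_B",
    "criticize": "PROPERTY_C",
}


def map_reason(reporting_verb: Optional[str]) -> Optional[str]:
    """Return the property for a lemmatized reporting verb, or None if unmapped."""
    if reporting_verb is None:
        return None
    return VERB_TO_PROPERTY.get(reporting_verb.lower())


def accuracy(manual: List[Optional[str]], automatic: List[Optional[str]]) -> float:
    """Fraction of citations where the automatic label equals the manual one."""
    if len(manual) != len(automatic):
        raise ValueError("label lists must have the same length")
    if not manual:
        return 0.0
    matches = sum(m == a for m, a in zip(manual, automatic))
    return matches / len(manual)


# Toy data: manual annotations and the reporting verb found in each sentence.
manual_labels = ["PROPERTY_A", "PROPERTY_B", "PROPERTY_C", "PROPERTY_B"]
verbs = ["extend", "use", "use", "adopt"]

automatic_labels = [map_reason(v) for v in verbs]
score = accuracy(manual_labels, automatic_labels)
print(f"accuracy = {score:.2f}")
```

In this sketch an unmapped verb yields `None` and therefore counts as a miss, which is one possible convention; whether the paper treats unmapped citations this way is not stated in the abstract.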
Notes
NLTK: https://www.nltk.org/.
spaCy: https://spacy.io/.
FrameNet: https://framenet.icsi.berkeley.edu/fndrupal/.
Cite this article
Ihsan, I., Qadir, M.A. An NLP-based citation reason analysis using CCRO. Scientometrics 126, 4769–4791 (2021). https://doi.org/10.1007/s11192-021-03955-6
DOI: https://doi.org/10.1007/s11192-021-03955-6