Abstract
Information from the textual context of citations in scientific articles has been studied and used in many applications by the research community, including topic modeling, sentiment analysis, scientific paper summarization and information retrieval. However, these applications suffer from the problem of correctly identifying the citation context window, and instead use the text in a fixed-size window around the citation mention. As a result, citation contexts may contain terms or other text that does not describe the cited work and should not be included in the citation context. Identifying such non-reference text in the citation context is a non-trivial yet significant task. In this paper, we attempt to identify and remove the non-reference text from citation contexts by developing a heuristic algorithm based on pruning the transition-based dependency parse tree. In our evaluation, the algorithm achieved 77% macro-precision, 83% macro-recall and 80% macro-F on a test dataset of 88 research articles with varying numbers of citations. Additionally, we find that for many of the cited articles in our test dataset, objective citation contexts outnumber subjective ones.
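The macro-averaged scores reported above can be illustrated with a minimal sketch. It assumes that scores are first computed per article over sets of token positions and then averaged with equal weight per article; the abstract does not state the exact unit of comparison or how macro-F was derived, so the function names (`prf`, `macro_average`) and the toy data are hypothetical.

```python
def prf(predicted, gold):
    """Precision, recall and F1 for one article, over sets of token positions."""
    tp = len(predicted & gold)  # tokens correctly kept as reference text
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(articles):
    """Average per-article scores so that every article counts equally."""
    scores = [prf(pred, gold) for pred, gold in articles]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical toy data: (predicted reference tokens, gold reference tokens).
articles = [
    ({1, 2, 3, 4}, {1, 2, 3}),  # one extra token kept: P = 0.75, R = 1.0
    ({5, 6}, {5, 6, 7, 8}),     # two tokens missed:    P = 1.0,  R = 0.5
]
macro_p, macro_r, macro_f = macro_average(articles)

# Harmonic mean of the macro-precision and macro-recall reported in the abstract.
harmonic_of_reported = 2 * 0.77 * 0.83 / (0.77 + 0.83)
```

The harmonic mean of the reported 77% and 83% is about 0.799, which is consistent with the reported 80% macro-F. A macro-F computed as the mean of per-article F scores, as in `macro_average`, could yield a similar figure, so this agreement does not settle which definition was used.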
Cite this article
Khalid, A., Alam, F. & Ahmed, I. Extracting reference text from citation contexts. Cluster Comput 21, 605–622 (2018). https://doi.org/10.1007/s10586-017-0954-9
DOI: https://doi.org/10.1007/s10586-017-0954-9