Cluster Computing, Volume 21, Issue 1, pp 605–622

Extracting reference text from citation contexts

  • Afsheen Khalid
  • Fakhri Alam
  • Imran Ahmed


Information from the textual context of citations in scientific articles has been studied and used in many applications by the research community, for example in topic modeling, sentiment analysis, scientific paper summarization and information retrieval. However, these applications struggle to identify the correct citation context window and instead use the text in a fixed-size window around the citation mention. As a result, citation contexts may contain terms or other text that does not describe the cited work and should not be included in the citation context. Identifying such non-reference text in the citation context is a non-trivial yet significant task. In this paper, we attempt to identify and remove the non-reference text from citation contexts by developing a heuristic algorithm based on pruning the transition-based dependency parse tree. In our evaluation, the algorithm achieved 77% macro-precision, 83% macro-recall and 80% macro-F on a testing dataset of 88 research articles with varying numbers of citations. Additionally, we find that for many of the cited articles in our testing dataset, objective citation contexts outnumber subjective ones.
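The abstract does not say how the macro scores were computed. A minimal sketch of one common convention follows, assuming that precision and recall are computed per article over sets of reference-text tokens, averaged across articles, and then combined into a single F-score. The function names and the toy token sets are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted reference-text tokens against gold tokens."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_scores(pairs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Average per-article precision and recall, then combine them into one F-score.

    Assumption: the F-score is the harmonic mean of the two macro averages.
    Averaging per-article F-scores is another convention and gives other values.
    """
    scores = [precision_recall(pred, gold) for pred, gold in pairs]
    macro_p = sum(p for p, _ in scores) / len(scores)
    macro_r = sum(r for _, r in scores) / len(scores)
    return macro_p, macro_r, f_score(macro_p, macro_r)


# Hypothetical toy data: (predicted tokens, gold tokens) for two articles.
toy = [
    ({"parser", "transition", "based"}, {"parser", "transition", "based", "dependency"}),
    ({"topic", "model", "however"}, {"topic", "model"}),
]
macro_p, macro_r, macro_f = macro_scores(toy)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f, 3))

# Harmonic mean of the reported macro-precision and macro-recall.
print(round(f_score(0.77, 0.83), 2))
```

Under this convention, the harmonic mean of the reported 77% macro-precision and 83% macro-recall rounds to 80%, which matches the reported macro-F. That agreement is consistent with the assumption but does not confirm it.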


Keywords: Citation contexts; Transition-based dependency parse tree; Objective citation contexts; Subjective citation contexts



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Institute of Management Sciences, Peshawar, Pakistan
