
Extracting reference text from citation contexts

Cluster Computing

Abstract

The research community has studied information from the textual context of citations in scientific articles and used it in many applications, including topic modeling, sentiment analysis, scientific paper summarization and information retrieval. However, these applications have difficulty identifying the correct citation context window, so they fall back on the text in a fixed-size window around the citation mention. As a result, a citation context may contain terms or other text that does not describe the cited work and should not be included. Identifying such non-reference text in the citation context is non-trivial, yet important. In this paper, we identify and remove non-reference text from citation contexts with a heuristic algorithm that prunes the transition-based dependency parse tree. On a test dataset of 88 research articles with varying numbers of citations, the algorithm achieved 77% macro-precision, 83% macro-recall and an 80% macro F-score. Additionally, we find that for many of the cited articles in the test dataset, objective citation contexts outnumber subjective ones.
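The contrast between a fixed-size window and dependency-tree pruning can be illustrated with a small Python sketch. The abstract does not spell out the authors' heuristic, so the rule below is a hypothetical stand-in, not the paper's algorithm. Starting from the citation marker, it climbs to the nearest clause head and keeps that head's subtree while cutting branches attached by clause-linking labels (`conj`, `cc`, `advcl`, `ccomp`, `punct`). The parse is hand-written in a spaCy-style format (labels such as `nsubj` and `dobj`; the root is its own head). The names `Token`, `extract_reference_text`, `fixed_window` and `macro_scores` are illustrative. The sketch also scores token sets per article and macro-averages them. It assumes the reported macro F-score is the harmonic mean of macro-precision and macro-recall, which is consistent with the abstract: 2 · 0.77 · 0.83 / 1.60 ≈ 0.80.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head; the root points to itself
    dep: str   # dependency label (spaCy-style names assumed)


# Hypothetical label sets; the paper's own pruning rules may differ.
CLAUSE_HEADS = {"ROOT", "conj", "advcl", "ccomp"}
PRUNE_LABELS = {"conj", "cc", "advcl", "ccomp", "punct"}


def children(tokens, i):
    """Indices of tokens whose head is i (ignoring the root's self-loop)."""
    return [j for j, t in enumerate(tokens) if t.head == i and j != i]


def clause_anchor(tokens, cite_idx):
    """Climb from the citation marker to the nearest clause head."""
    k = cite_idx
    while tokens[k].dep not in CLAUSE_HEADS and tokens[k].head != k:
        k = tokens[k].head
    return k


def extract_reference_text(tokens, cite_idx):
    """Keep the anchor's subtree, cutting any branch on a PRUNE_LABELS edge."""
    keep, stack = set(), [clause_anchor(tokens, cite_idx)]
    while stack:
        k = stack.pop()
        keep.add(k)
        stack.extend(c for c in children(tokens, k)
                     if tokens[c].dep not in PRUNE_LABELS)
    return keep


def fixed_window(tokens, cite_idx, size):
    """Baseline: every token within `size` positions of the citation marker."""
    return set(range(max(0, cite_idx - size),
                     min(len(tokens), cite_idx + size + 1)))


def macro_scores(pairs):
    """Macro-average token-level precision/recall over (predicted, gold) pairs."""
    ps, rs = [], []
    for pred, gold in pairs:
        tp = len(pred & gold)
        ps.append(tp / len(pred) if pred else 0.0)
        rs.append(tp / len(gold) if gold else 0.0)
    p, r = sum(ps) / len(ps), sum(rs) / len(rs)
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean (assumed)
    return p, r, f


# Hand-built parse of: "Smith [1] proposed a parser , but we use rules ."
sent = [
    Token("Smith", 2, "nsubj"), Token("[1]", 0, "appos"),
    Token("proposed", 2, "ROOT"), Token("a", 4, "det"),
    Token("parser", 2, "dobj"), Token(",", 2, "punct"),
    Token("but", 2, "cc"), Token("we", 8, "nsubj"),
    Token("use", 2, "conj"), Token("rules", 8, "dobj"),
    Token(".", 2, "punct"),
]


def show(idxs):
    return " ".join(sent[i].text for i in sorted(idxs))


pruned = extract_reference_text(sent, cite_idx=1)
window = fixed_window(sent, cite_idx=1, size=5)
print(show(pruned))  # Smith [1] proposed a parser
print(show(window))  # Smith [1] proposed a parser , but
```

On this toy sentence, a ±5-token window also captures ", but", which leads into the citing authors' own clause. The pruned subtree keeps only the text describing the cited work. With a real transition-based parser such as spaCy, the same traversal could be run over each token's `head` and `dep_` attributes instead of the hand-built list.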

Author information

Correspondence to Afsheen Khalid.

Cite this article

Khalid, A., Alam, F. & Ahmed, I. Extracting reference text from citation contexts. Cluster Comput 21, 605–622 (2018). https://doi.org/10.1007/s10586-017-0954-9
