A New Approach for Implicit Citation Extraction

  • Chaker Jebari
  • Manuel Jesús Cobo
  • Enrique Herrera-Viedma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11315)


Extracting implicit citations is becoming increasingly important because it is a fundamental step in many applications, such as paper summarization, citation sentiment analysis, and citation classification. This paper describes the limitations of previous work on citation extraction and proposes a new approach based on topic modeling and word embedding. As a first step, our approach uses the LDA technique to identify the topics discussed in the cited paper. Following the idea behind the Doc2Vec technique, our approach then proposes two models. The first, called Sentence2Vec, represents every sentence that follows an explicit citation; these sentences are the candidate implicit citation sentences. The second, called Topic2Vec, represents the topics covered in the cited paper. Based on the similarity between the Sentence2Vec and Topic2Vec representations, we can label each candidate sentence as an implicit citation or not.


Implicit citation extraction · Topic modeling · Doc2Vec · Sentence2Vec · Topic2Vec
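
The final labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how Sentence2Vec and Topic2Vec are trained, which similarity measure is used, or how the decision is made. The sketch therefore assumes that each topic is given as a list of top words (as LDA would typically produce), swaps in plain bag-of-words count vectors for the learned embeddings, and assumes cosine similarity with a fixed threshold. The names `embed`, `cosine`, `label_candidates`, and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy stand-in for a learned embedding: a bag-of-words count vector.

    Assumption: the paper's Sentence2Vec / Topic2Vec models would replace this.
    """
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_candidates(candidates, topics, threshold=0.3):
    """Label each candidate sentence as implicit (True) or not (False).

    candidates: sentences that follow an explicit citation.
    topics: one list of top words per topic of the cited paper.
    threshold: hypothetical cut-off on the best sentence-topic similarity.
    """
    vocabulary = sorted({word for topic in topics for word in topic})
    topic_vectors = [embed(" ".join(topic), vocabulary) for topic in topics]
    labels = []
    for sentence in candidates:
        sentence_vector = embed(sentence, vocabulary)
        # Compare the sentence with every topic and keep the best match.
        best = max(
            (cosine(sentence_vector, tv) for tv in topic_vectors),
            default=0.0,
        )
        labels.append((sentence, best >= threshold))
    return labels


if __name__ == "__main__":
    # Hypothetical topics and candidate sentences, for illustration only.
    topics = [["topic", "model", "lda"], ["word", "embedding", "vector"]]
    candidates = [
        "their lda topic model scales well",
        "we thank the reviewers for comments",
    ]
    for sentence, is_implicit in label_candidates(candidates, topics):
        print(is_implicit, sentence)
```

In this toy setting the first candidate shares all of its vocabulary words with the first topic and is labeled implicit, while the second shares none and is not. With learned embeddings the threshold would have to be tuned on annotated data.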



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Chaker Jebari (1, 2)
  • Manuel Jesús Cobo (3)
  • Enrique Herrera-Viedma (4)

  1. Computer Science Department, Tunis El Manar University, Tunis, Tunisia
  2. Information Technology Department, Colleges of Applied Sciences, Ibri, Oman
  3. Department of Computer Science and Engineering, University of Cádiz, Cádiz, Spain
  4. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
