Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models


Scientists learn early on how to cite scientific sources to support their claims. Sometimes, however, scientists have difficulty determining where a citation should be placed or, even worse, fail to cite a source altogether. Automatically detecting sentences that need a citation (i.e., citation worthiness) could address both of these issues, leading to more robust and well-constructed scientific arguments. Previous researchers have applied machine learning to this task but have used small datasets and models that do not take advantage of recent algorithmic developments such as attention mechanisms in deep learning. We hypothesize that we can develop highly accurate deep learning architectures that learn from large supervised datasets constructed from open access publications. In this work, we propose a bidirectional long short-term memory network with an attention mechanism and contextual information to detect sentences that need citations. We also produce a new, large dataset (PMOA-CITE) based on the PubMed Open Access Subset, which is orders of magnitude larger than previous datasets. Our experiments show that our architecture achieves state-of-the-art performance on the standard ACL-ARC dataset (\(F_{1}=0.507\)) and exhibits high performance (\(F_{1}=0.856\)) on the new PMOA-CITE dataset. Moreover, we show that the model can transfer learning across these datasets. We further use interpretable models to illuminate how specific language is used to promote and inhibit citations. We discover that the section in which a sentence appears and its surrounding sentences are crucial for our improved predictions. We further examine purported mispredictions of the model and uncover systematic human mistakes in citation behavior and source data. This opens the door for our model to check documents during pre-submission and pre-archival procedures. We discuss the limitations of our work and make the new dataset, the code, and a web-based tool available to the community.
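The abstract names the model's components without giving equations. As a hedged illustration only, the NumPy sketch below wires those components together in one common way: a forward and a backward LSTM over word embeddings, attention pooling in the style of Zhou et al. (2016), and a context vector appended before a logistic output. The gate layout, dimensions, random weights, and the one-hot `section` context feature are all assumptions made for illustration, not the authors' published architecture or hyperparameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, input_dim, hidden):
    """Random LSTM weights; the four gates are stacked as [input, forget, candidate, output]."""
    W = rng.normal(0.0, 0.1, (4 * hidden, input_dim))  # input-to-gates weights
    U = rng.normal(0.0, 0.1, (4 * hidden, hidden))     # hidden-to-gates weights
    b = np.zeros(4 * hidden)
    return W, U, b


def lstm_pass(xs, W, U, b, hidden):
    """Run one LSTM direction over xs (T x input_dim); return hidden states (T x hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def citation_worthiness(xs, context, params):
    """Score one sentence: BiLSTM -> attention pooling -> append context -> logistic output."""
    hidden = params["hidden"]
    fwd = lstm_pass(xs, *params["fwd"], hidden)
    # Run the backward LSTM on the reversed sentence, then flip so row t aligns with token t.
    bwd = lstm_pass(xs[::-1], *params["bwd"], hidden)[::-1]
    H = np.concatenate([fwd, bwd], axis=1)       # T x (2 * hidden)

    # Attention in the style of Zhou et al. (2016): alpha = softmax(w^T tanh(H)).
    scores = np.tanh(H) @ params["w_att"]        # one score per token
    alpha = np.exp(scores - scores.max())        # subtract max for numerical stability
    alpha /= alpha.sum()
    sentence_vec = np.tanh(alpha @ H)            # attention-weighted sum of states

    # Hypothetical contextual features (e.g., a one-hot section label) appended here.
    features = np.concatenate([sentence_vec, context])
    prob = sigmoid(features @ params["w_out"] + params["b_out"])
    return float(prob), alpha


# Usage with random weights and random stand-in "word embeddings".
rng = np.random.default_rng(0)
input_dim, hidden, n_sections = 8, 6, 4
params = {
    "hidden": hidden,
    "fwd": init_lstm(rng, input_dim, hidden),
    "bwd": init_lstm(rng, input_dim, hidden),
    "w_att": rng.normal(0.0, 0.1, 2 * hidden),
    "w_out": rng.normal(0.0, 0.1, 2 * hidden + n_sections),
    "b_out": 0.0,
}
tokens = rng.normal(size=(5, input_dim))      # a 5-token sentence
section = np.eye(n_sections)[0]               # hypothetical one-hot section feature
prob, alpha = citation_worthiness(tokens, section, params)
print(f"P(needs citation) = {prob:.3f}; attention = {np.round(alpha, 3)}")
```

The sketch covers only the forward computation. In a trained model the weights would be learned, for example by minimizing binary cross-entropy against citation labels, and the attention weights `alpha` are what one would inspect to see which tokens drove a prediction.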


Figs. 1–8 (images not reproduced)




Tong Zeng was funded by the China Scholarship Council #201706190067. Daniel E. Acuna was partially funded by the National Science Foundation Award #1800956.

Author information



Corresponding author

Correspondence to Daniel E. Acuna.


About this article


Cite this article

Zeng, T., Acuna, D.E. Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models. Scientometrics 124, 399–428 (2020).



  • Citation worthiness
  • Citation context
  • Deep learning