
SentCite: a sentence-level citation recommender based on the salient similarity among multiple segments

Scientometrics

Abstract

Making adequate citations efficiently is becoming more challenging as the volume of publications rapidly increases. In practice, citing appropriate references is a time-consuming task that demands considerable skill. Accordingly, many studies have tried to help by providing citation-oriented support. Within this field, citation recommendation is a significant research area because it addresses both the need for deep expertise and the problem of information overload. In this paper, we propose a sentence-level citation recommender, SentCite, that identifies the sentences that need links to references and recommends citations for them. SentCite employs a convolutional recurrent neural network to extract the citing sentences and recommends citations based on the salient similarity between those sentences and the abstract, full text, and in-link context of the target papers. Unlike in some other research in the big data domain, the number of quality papers available for recommendation in this application is very limited. We therefore propose undersampling with in-link context awareness to avoid overfitting. SentCite can recommend the most appropriate papers for the given sentences and outperforms other context-based methods, improving mean reciprocal rank (MRR) by 31.8%, mean average precision (MAP) by 30.1%, and normalized discounted cumulative gain (NDCG) by 33.8%.
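For readers unfamiliar with the three ranking metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions under binary relevance. It is not the authors' evaluation code. The function names, the paper identifiers, and the choice of binary relevance are assumptions made for illustration only.

```python
import math

# Illustrative sketch only: textbook definitions with binary relevance.
# `ranked` is a recommender's ordered list of paper ids for one citing sentence;
# `relevant` is the set of paper ids actually cited (both hypothetical here).

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant paper, or 0 if none is retrieved."""
    for rank, paper in enumerate(ranked, start=1):
        if paper in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant paper appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, paper in enumerate(ranked, start=1):
        if paper in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def ndcg(ranked, relevant):
    """Discounted cumulative gain divided by the gain of an ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, paper in enumerate(ranked, start=1)
              if paper in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mean_over_queries(metric, runs):
    """Average a per-sentence metric over (ranked, relevant) pairs: MRR, MAP, mean NDCG."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Two hypothetical citing sentences with their ranked recommendations.
runs = [
    (["p3", "p1", "p2"], {"p1"}),
    (["p4", "p5", "p6"], {"p4", "p6"}),
]
mrr = mean_over_queries(reciprocal_rank, runs)
map_score = mean_over_queries(average_precision, runs)
mean_ndcg = mean_over_queries(ndcg, runs)
```

Each metric is computed per citing sentence and then averaged, so a reported improvement is a comparison between two such averages. The abstract does not state the ranking cutoff or the relevance grading the authors used, so none is assumed here.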




Acknowledgements

This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006-040-MY3 and MOST 108-2511-H-006-009. It was also partially supported by the “Higher Education SPROUT Project” and the “Center for Innovative FinTech Business Models” of National Cheng Kung University (NCKU), sponsored by the Ministry of Education, Taiwan.

Author information

Corresponding author

Correspondence to Hei-Chia Wang.


About this article


Cite this article

Wang, HC., Cheng, JW. & Yang, CT. SentCite: a sentence-level citation recommender based on the salient similarity among multiple segments. Scientometrics 127, 2521–2546 (2022). https://doi.org/10.1007/s11192-022-04339-0


  • DOI: https://doi.org/10.1007/s11192-022-04339-0
