Abstract
Citing adequately and efficiently is becoming more challenging as the volume of publications grows rapidly. In practice, finding the appropriate references to cite is time-consuming and requires considerable skill. Accordingly, many studies have tried to help by providing citation-oriented support. Within this field, citation recommendation is a significant research area because it addresses both the need for deep expertise and the problem of information overload. In this paper, we propose a sentence-level citation recommender, SentCite, that identifies the sentences that need links to references and recommends citations for them. SentCite employs a convolutional recurrent neural network to extract the citing sentences and recommends citations based on the salient similarity between those sentences and the abstract, full text, and in-link context of the target papers. Unlike many applications in the big data domain, the number of quality papers available for recommendation in this setting is very limited. To avoid overfitting, we propose an undersampling approach with in-link context awareness. SentCite recommends the most appropriate papers for the given sentences and outperforms other context-based methods, improving mean reciprocal rank (MRR) by 31.8%, mean average precision (MAP) by 30.1%, and normalized discounted cumulative gain (NDCG) by 33.8%.
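For readers unfamiliar with the three ranking metrics named above, the following is a minimal sketch of how MRR, MAP, and NDCG are commonly computed with binary relevance. It is not the evaluation code used for SentCite; the function names, the paper identifiers, and the toy rankings are all hypothetical and serve only as illustration.

```python
import math

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, paper in enumerate(ranked, start=1):
        if paper in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, paper in enumerate(ranked, start=1):
        if paper in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def ndcg(ranked, relevant):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, paper in enumerate(ranked, start=1)
              if paper in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical toy data: each citing sentence has a ranked list of
# recommended papers and a set of papers it actually cites.
queries = [
    (["p3", "p1", "p7"], {"p1"}),
    (["p2", "p5", "p9"], {"p2", "p9"}),
]

mrr = mean(reciprocal_rank(r, rel) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
mean_ndcg = mean(ndcg(r, rel) for r, rel in queries)

print(f"MRR={mrr:.3f}  MAP={map_score:.3f}  NDCG={mean_ndcg:.3f}")
```

All three scores lie between 0 and 1 and reward placing the truly cited papers near the top of the list. MRR considers only the first correct recommendation, whereas MAP and NDCG take every correct recommendation into account.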
Acknowledgements
This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006-040-MY3 and MOST 108-2511-H-006-009. It was also partially supported by the “Higher Education SPROUT Project” and the “Center for Innovative FinTech Business Models” of National Cheng Kung University (NCKU), sponsored by the Ministry of Education, Taiwan.
Cite this article
Wang, HC., Cheng, JW. & Yang, CT. SentCite: a sentence-level citation recommender based on the salient similarity among multiple segments. Scientometrics 127, 2521–2546 (2022). https://doi.org/10.1007/s11192-022-04339-0
DOI: https://doi.org/10.1007/s11192-022-04339-0