Abstract
The rapid growth of scientific publications has made writing related work sections increasingly laborious. In this paper, we propose a fully automated approach that generates related work sections with a seq2seq neural network. The main goal of our work is to improve the abstractive generation of related work by introducing problem and method information, which serves as a pivot connecting the previous works discussed in a related work section and has been ignored by existing studies. More specifically, we employ a title-generation strategy to automatically obtain problem and method information from the given references, and we add this information as an additional feature to enhance the generation of related work. To verify the effectiveness and feasibility of our approach, we conduct a comparative experiment on publicly available datasets using several common neural summarizers. The experimental results indicate that introducing problem and method information improves the generation of related work, and that our approach substantially outperforms the informed baseline on ROUGE-1 and ROUGE-L. The case study shows that the problem and method information yields considerable topic coherence between the generated related work section and the original paper.
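The results above are reported in ROUGE-1 and ROUGE-L. As a rough illustration of what these two metrics measure, the following is a minimal sketch that computes their F1 variants for a single candidate and reference pair. It assumes lowercased whitespace tokenization with no stemming or stopword handling, so it is not the official ROUGE toolkit and its scores will differ from those reported in the paper; the function names are illustrative only.

```python
from collections import Counter


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def _tokens(text: str) -> list:
    # Assumption: simple lowercased whitespace tokenization, no stemming.
    return text.lower().split()


def rouge_1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = _tokens(candidate), _tokens(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = _tokens(candidate), _tokens(reference)
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat is on the mat"
    print(f"ROUGE-1 F1: {rouge_1(generated, gold):.4f}")
    print(f"ROUGE-L F1: {rouge_l(generated, gold):.4f}")
```

ROUGE-1 rewards shared vocabulary regardless of order, while ROUGE-L also rewards preserving word order, which is why the two are commonly reported together for summarization tasks.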
Acknowledgements
This work was partially supported by Major Projects of National Social Science Foundation of China (No. 17ZDA292).
Author information
Contributions
PL: Conceptualization, Methodology, Writing—Original Draft. WL: Conceptualization, Methodology, Formal analysis, Supervision. QC: Data Curation, Writing—Review & Editing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Li, P., Lu, W. & Cheng, Q. Generating a related work section for scientific papers: an optimized approach with adopting problem and method information. Scientometrics 127, 4397–4417 (2022). https://doi.org/10.1007/s11192-022-04458-8
DOI: https://doi.org/10.1007/s11192-022-04458-8