Characterizing human summarization strategies for text reuse and transformation in literature review writing

Abstract

Citations are useful signals of information salience, but little research has identified the patterns of information selection, transformation, and organization that they reflect. This paper investigated the summarization strategies followed in writing the literature review sections of information science research papers. We found that the strategies differ between the two major styles of literature review writing: descriptive and integrative. Descriptive literature reviews, which focus on individual descriptions of research papers, are more likely to reference the Method and Result sections of the cited paper and to copy-paste the referenced text. In contrast, integrative literature reviews, which synthesize the main ideas of many papers, contain more critiques and focus mainly on the Conclusion sections. These findings, based on a hand-annotated dataset, have the potential to scale up into a transformation-invariant neural architecture for scientific summarization that can generate different summaries of the input text with integrative or descriptive characteristics.

Author information

Corresponding author

Correspondence to Kokil Jaidka.

Cite this article

Jaidka, K., Khoo, C.S.G. & Na, J. Characterizing human summarization strategies for text reuse and transformation in literature review writing. Scientometrics 121, 1563–1582 (2019). https://doi.org/10.1007/s11192-019-03250-5

Keywords

  • Literature review writing
  • Scientific summarization
  • Discourse analysis
  • Citance
  • Abstracting
  • Citation analysis

Mathematics Subject Classification

  • 62H20