Robust Argumentative Zoning for Sensemaking in Scholarly Documents

  • Simone Teufel
  • Min-Yen Kan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6699)

Abstract

We present an automated approach to classify sentences of scholarly work with respect to their rhetorical function. While previous work on this task of argumentative zoning requires richly annotated input, our approach is robust to noise and can process raw text. Even when the input is noisy (for example, when it is obtained from optical character recognition or text extraction from PDF files), our robust classifier is largely accurate. We perform an in-depth study of our system with both clean and noisy inputs. We also give preliminary results from in situ acceptability testing when the classifier is embedded within a digital library reading environment.

Keywords

  • Digital Library
  • Optical Character Recognition
  • Minority Class
  • Maximum Entropy Model
  • Noisy Input
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Simone Teufel (1)
  • Min-Yen Kan (2)
  1. Computer Laboratory, University of Cambridge, UK
  2. Department of Computer Science, National University of Singapore, Singapore
