
Part of the book series: Studies in Computational Intelligence (SCI, volume 370)


Abstract

Discourse parsing of complex text types such as scientific research articles requires analysing an input document on linguistic and structural levels that go beyond the lexical discourse markers traditionally employed. This chapter describes a text-technological approach to discourse parsing. Discourse parsing, whose aim is to assign a discourse structure, is treated as adding a new annotation layer to input documents that are already marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, together with a short evaluation that compares the parser's output with reference annotations from our corpus and with recently developed systems designed for a similar task.
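The abstract frames discourse parsing as chart parsing over segments that already carry annotations from other layers. The Python sketch below illustrates that idea under explicit assumptions: it is a generic bottom-up (CKY-style) chart over elementary discourse segments, not the GAP implementation, and the `Segment` and `Tree` classes, the `anaphor` and `cue` layer attributes, and the two rules are all hypothetical. Each chart cell keeps every tree found for its span, so competing RST analyses survive until the whole input is covered.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    """An elementary discourse segment plus features from other annotation layers (hypothetical)."""
    text: str
    layers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Tree:
    """A binary RST-style tree covering segments start .. end-1."""
    start: int
    end: int
    relation: str = "leaf"
    nuclearity: str = ""              # "NS", "SN" or "NN" for relation nodes
    children: Tuple["Tree", ...] = ()

    def __str__(self) -> str:
        if not self.children:
            return f"e{self.start}"
        left, right = self.children
        return f"({self.relation}[{self.nuclearity}] {left} {right})"


# A rule looks at two adjacent subtrees and proposes (relation, nuclearity) or None.
Rule = Callable[[List[Segment], Tree, Tree], Optional[Tuple[str, str]]]


def anaphora_elaboration(segs: List[Segment], left: Tree, right: Tree) -> Optional[Tuple[str, str]]:
    # Hypothetical rule: an anaphor opening the right span suggests Elaboration, nucleus on the left.
    if segs[right.start].layers.get("anaphor") == "yes":
        return ("Elaboration", "NS")
    return None


def cue_contrast(segs: List[Segment], left: Tree, right: Tree) -> Optional[Tuple[str, str]]:
    # Hypothetical rule: a contrastive cue opening the right span suggests a multinuclear Contrast.
    if segs[right.start].layers.get("cue") in {"however", "but"}:
        return ("Contrast", "NN")
    return None


def chart_parse(segs: List[Segment], rules: List[Rule]) -> List[Tree]:
    """Fill chart[(i, j)] with every tree for segments i..j-1; return trees spanning the input."""
    n = len(segs)
    chart: Dict[Tuple[int, int], List[Tree]] = {(i, i + 1): [Tree(i, i + 1)] for i in range(n)}
    for length in range(2, n + 1):            # shorter spans are complete before longer ones
        for i in range(n - length + 1):
            j = i + length
            edges: List[Tree] = []
            for k in range(i + 1, j):         # every split into two adjacent sub-spans
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        for rule in rules:
                            result = rule(segs, left, right)
                            if result is not None:
                                relation, nuclearity = result
                                edges.append(Tree(i, j, relation, nuclearity, (left, right)))
            chart[(i, j)] = edges
    return chart.get((0, n), [])


if __name__ == "__main__":
    segments = [
        Segment("GAP is a generic annotation parser."),
        Segment("It applies rules to annotated input.", {"anaphor": "yes"}),
        Segment("However, its rules are written by hand.", {"cue": "however"}),
    ]
    for tree in chart_parse(segments, [anaphora_elaboration, cue_contrast]):
        print(tree)
```

On these three example segments the sketch prints two competing analyses, `(Elaboration[NS] e0 (Contrast[NN] e1 e2))` and `(Contrast[NN] (Elaboration[NS] e0 e1) e2)`. Choosing between them would require further constraints or scoring, which this sketch deliberately omits.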




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lobin, H., Lüngen, H., Hilbert, M., Bärenfänger, M. (2011). Processing Text-Technological Resources in Discourse Parsing. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_3


  • DOI: https://doi.org/10.1007/978-3-642-22613-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22612-0

  • Online ISBN: 978-3-642-22613-7

  • eBook Packages: Engineering, Engineering (R0)
