Abstract
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that, besides the lexical and grammatical discourse markers traditionally employed in discourse parsing, cues derived from the logical and generic document structure and from the thematic structure of a text must be taken into account. An approach is presented to modelling these types of linguistic information as XML-based multi-layer annotations and to representing additional knowledge sources text-technologically. Quantitative and qualitative corpus analyses make it possible to derive cues and constraints for automatic discourse analysis. The proposed representations also serve as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Lüngen, H., Bärenfänger, M., Hilbert, M., Lobin, H., Puskás, C. (2010). Discourse Relations and Document Structure. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_6
DOI: https://doi.org/10.1007/978-90-481-3331-4_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3330-7
Online ISBN: 978-90-481-3331-4
eBook Packages: Computer Science (R0)