Discourse Segmentation of German Written Texts

  • Harald Lüngen
  • Csilla Puskás
  • Maja Bärenfänger
  • Mirco Hilbert
  • Henning Lobin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles are presented, as well as an evaluation of test runs.
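
As a rough illustration of what punctuation-based segmentation can look like, the following Python sketch splits a text into sentences and then opens a new segment at each comma that is followed by a subordinating conjunction. It is not the segmenter described in the paper, which also draws on morphology, syntax, and logical document structure. The function name `segment` and the short cue list `SUBORDINATORS` are assumptions made for this example only.

```python
import re

# Hypothetical, deliberately small cue list; it is not taken from the paper.
SUBORDINATORS = {"weil", "dass", "obwohl", "wenn", "während", "nachdem", "damit"}


def segment(text):
    """Split text into candidate minimal discourse segments (toy heuristic)."""
    segments = []
    # Step 1: sentence boundaries at '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        # Step 2: every comma is a candidate clause boundary.
        parts = re.split(r",\s*", sentence)
        current = parts[0]
        for part in parts[1:]:
            words = part.split()
            first_word = words[0].lower() if words else ""
            if first_word in SUBORDINATORS:
                # A subordinator after the comma opens a new segment.
                segments.append(current + ",")
                current = part
            else:
                # Otherwise the comma is kept inside the current segment
                # (for example, in an enumeration).
                current += ", " + part
        segments.append(current)
    return segments


if __name__ == "__main__":
    example = "Er blieb zu Hause, weil es regnete. Sie kaufte Brot, Milch und Käse."
    for seg in segment(example):
        print(seg)
```

A heuristic of this kind cannot tell a relative clause from an apposition or an enumeration without further cues, which is one reason a definition of segments would also need morphological and syntactic information.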


Relative Clause; Main Clause; Matrix Clause; Segmentation Error; Subordinate Clause
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Harald Lüngen (1)
  • Csilla Puskás (1)
  • Maja Bärenfänger (1)
  • Mirco Hilbert (1)
  • Henning Lobin (1)

  1. FB 05 – Applied and Computational Linguistics, Justus-Liebig-Universität Gießen
