UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework

  • Ngo Xuan Bach
  • Nguyen Le Minh
  • Akira Shimazu
Conference paper

DOI: 10.1007/978-3-642-33983-7_25

Volume 7614 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Xuan Bach N., Le Minh N., Shimazu A. (2012) UDRST: A Novel System for Unlabeled Discourse Parsing in the RST Framework. In: Isahara H., Kanzaki K. (eds) Advances in Natural Language Processing. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg

Abstract

This paper presents UDRST, an unlabeled discourse parsing system in the RST framework. UDRST consists of a segmentation model and a parsing model. The segmentation model uses subtree features to rerank the N-best outputs of a base segmenter, which is a CRF model over syntactic and lexical features. For the parsing model, we present two algorithms for building a discourse tree from a segmented text: an incremental algorithm and a dual decomposition algorithm. Our system achieves an unlabeled score of 77.3% on the standard test set of the RST Discourse Treebank corpus, an improvement of 5.0% over HILDA [6], a state-of-the-art discourse parsing system.
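
To make the reranking step concrete, the following is a minimal, generic sketch of N-best reranking in Python. It is not the authors' implementation: the segmentation encoding, the `rerank` function, the `toy_features` extractor, and the feature weights are all hypothetical stand-ins, and the real system uses subtree features whose details are not given in this abstract.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical encoding: a segmentation is the tuple of token indices
# at which a discourse unit ends.
Segmentation = Tuple[int, ...]


def rerank(
    nbest: Sequence[Tuple[Segmentation, float]],
    extract_features: Callable[[Segmentation], Dict[str, float]],
    weights: Dict[str, float],
    base_weight: float = 1.0,
) -> Segmentation:
    """Return the candidate with the highest combined score.

    Combined score = base_weight * base segmenter score
                     + dot product of reranker weights and features.
    """
    def total(item: Tuple[Segmentation, float]) -> float:
        seg, base_score = item
        feats = extract_features(seg)
        rerank_score = sum(weights.get(name, 0.0) * value
                           for name, value in feats.items())
        return base_weight * base_score + rerank_score

    return max(nbest, key=total)[0]


def toy_features(seg: Segmentation) -> Dict[str, float]:
    # Assumption for illustration only: a single feature that counts
    # boundaries. A real reranker would extract subtree features here.
    return {"num_boundaries": float(len(seg))}


# Three candidate segmentations with made-up base segmenter log-scores.
nbest = [((3, 7), -1.2), ((3, 5, 7), -1.5), ((7,), -2.0)]

best_base = rerank(nbest, toy_features, weights={})
best_reranked = rerank(nbest, toy_features, weights={"num_boundaries": 0.5})
```

With empty weights the function simply returns the base segmenter's top candidate; with a nonzero weight, the reranker can promote a candidate that the base model ranked lower, which is the point of reranking.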

Keywords

Discourse Parsing, Dual Decomposition, Rhetorical Structure Theory, RST, UDRST

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ngo Xuan Bach
    • 1
  • Nguyen Le Minh
    • 1
  • Akira Shimazu
    • 1
  1. School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan