A Symbolic Approach for Automatic Detection of Nuclearity and Rhetorical Relations among Intra-sentence Discourse Segments in Spanish

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)


Automatic discourse analysis is currently a very active research topic, since it supports several applications, such as automatic summarization, automatic translation, and information extraction. Rhetorical Structure Theory (RST) is the most widely used theory in this area. Nevertheless, there are few studies on this subject for Spanish. In this paper we present the first system that assigns nuclearity and rhetorical relations to intra-sentence discourse segments in Spanish texts. To carry out the research, we analyze the learning corpus of the RST Spanish Treebank, a corpus of manually annotated specialized texts, in order to build a list of lexical and syntactic patterns that mark rhetorical relations. The system is implemented using this list of patterns and a discourse segmenter called DiSeg. To evaluate the system, we apply it to the test corpus of the RST Spanish Treebank. The automatic and manual rhetorical analyses of each sentence are compared in terms of recall and precision, with positive results.
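The pattern-based labeling and the precision/recall comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three marker patterns, the relation names attached to them, the `label_segment`, `annotate`, and `precision_recall` helpers, and the tuple format of an annotation are hypothetical and are not taken from the paper's pattern list or from DiSeg.

```python
import re

# Hypothetical marker patterns (assumption): each maps a segment-initial
# Spanish discourse marker to a relation and the nuclearity of the segment.
PATTERNS = [
    (re.compile(r"^\s*porque\b", re.IGNORECASE), "Cause", "satellite"),
    (re.compile(r"^\s*aunque\b", re.IGNORECASE), "Concession", "satellite"),
    (re.compile(r"^\s*para\b", re.IGNORECASE), "Purpose", "satellite"),
]


def label_segment(segment):
    """Return (relation, nuclearity) for the first matching pattern, else None."""
    for regex, relation, nuclearity in PATTERNS:
        if regex.search(segment):
            return relation, nuclearity
    return None


def annotate(segments):
    """Label already-segmented text; returns a set of (index, relation, nuclearity)."""
    found = set()
    for index, segment in enumerate(segments):
        label = label_segment(segment)
        if label is not None:
            found.add((index, label[0], label[1]))
    return found


def precision_recall(automatic, manual):
    """Compare automatic and manual annotation sets; returns (precision, recall)."""
    correct = len(automatic & manual)
    precision = correct / len(automatic) if automatic else 0.0
    recall = correct / len(manual) if manual else 0.0
    return precision, recall


# Toy example: segments as a discourse segmenter might deliver them (assumption).
segments = ["El sistema falla", "porque el corpus es pequeño", "aunque funciona"]
automatic = annotate(segments)
manual = {(1, "Cause", "satellite"), (2, "Condition", "satellite")}
precision, recall = precision_recall(automatic, manual)
```

In this toy run the automatic analysis proposes two labels, one of which agrees with the manual one, so both precision and recall are 0.5. Exact-match comparison of whole annotations is only one possible scoring choice; the paper does not specify its matching criterion in the abstract.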


Nuclearity; Rhetorical Relations; Intra-sentence Discourse Segments; Rhetorical Structure Theory; Corpus; Symbolic Approach; Spanish





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra, Barcelona, Spain
  2. Laboratoire Informatique d’Avignon, Université d’Avignon et des Pays de Vaucluse, Avignon Cedex 9, France
  3. Département de génie informatique, École Polytechnique de Montréal, Montréal, Canada
  4. Instituto de Ingeniería, Universidad Nacional Autónoma de México, Mexico D.F., Mexico
