Language Resources and Evaluation, Volume 49, Issue 2, pp 263–309

A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora

  • Mikel Iruskieta
  • Iria da Cunha
  • Maite Taboada
Original Paper


Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for comparing rhetorical structures in different languages and to specify why translated texts may differ in their rhetorical structures. To achieve these aims, we carried out a contrastive analysis of a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory. We propose a method to describe the main linguistic differences among the rhetorical structures of the three languages in the two annotation stages (segmentation and rhetorical analysis). We present a new type of comparison that has important advantages over the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators. Using this new method, we show how translation strategies affect discourse structure.


Keywords: Annotation evaluation, Discourse analysis, Rhetorical Structure Theory, Translation strategies



This work has been partially financed by the Spanish projects RICOTERM 4 (FFI2010-21365-C03-01) and APLE 2 (FFI2012-37260), and a Juan de la Cierva Grant (JCI-2011-09665) to Iria da Cunha. Maite Taboada was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (261104-2008). Mikel Iruskieta was supported by the following projects: OPENMT-2 (TIN2009-14675-C03-01) [Spanish Ministry], Ber2Tek (IE12-333) [Basque Government] and IXA group (GIU09/19) [University of the Basque Country]. We would like to thank the anonymous reviewers for their comments and suggestions, Nynke van der Vliet for her feedback on the evaluation method, Esther Miranda for designing the website, and Oier Lopez de Lacalle for helping with the scripts to calculate the statistics.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Mikel Iruskieta (1)
  • Iria da Cunha (2)
  • Maite Taboada (3)

  1. Department of Didactics of Language and Literature, University of the Basque Country, Leioa, Spain
  2. University Institute for Applied Linguistics, Universitat Pompeu Fabra, Barcelona, Spain
  3. Department of Linguistics, Simon Fraser University, Burnaby, Canada
