RST Signalling Corpus: a corpus of signals of coherence relations

  • Original Paper
Language Resources and Evaluation

Abstract

We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10, 2015), a corpus annotated for signals of coherence relations. The corpus is built on the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07, 2002), which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers, which are considered to be the most typical (or sometimes the only) type of signal in discourse, but also for a wide array of other signals, such as reference, lexical, semantic, syntactic, graphical and genre features, as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.

Notes

  1. Most of the examples in the paper were extracted from the RST Discourse Treebank (Carlson et al. 2002). The content inside the square brackets following an example refers to the file number in the RST Discourse Treebank from which the example has been taken.

  2. A combined span comprises two or more spans (Elementary Discourse Units, or EDUs), and is represented by the starting span and the ending span, with a hyphen between them.

  3. TextLink, Structuring Discourse in Multilingual Europe, COST Action IS1312, http://textlink.ii.metu.edu.tr/.

  4. https://www.ldc.upenn.edu/.

  5. Conventions for interpreting examples from the RST-DT: The text within square brackets denotes a span. Each pair of square brackets is followed by either the uppercase character N, referring to the nucleus span, or the uppercase character S, referring to the satellite span. A pair of spans (N and S, or N and N) is followed by a dash and the name of the relation that holds between the spans. The square brackets at the end contain the file number of the source document, and the location of the relation in the document. The signal under discussion is underlined.

  6. The Contingency relation is also signalled here by the DM ‘but’.

  7. We chose to use the entire infinitival clause as the relevant signal rather than the infinitive particle ‘to’, as the particle can be confused with the preposition ‘to’.

  8. A combined signal is represented within parentheses, including two features conjoined by the ‘+’ symbol. For example, a combined signal, containing feature 1 and feature 2, is represented in the following form: (feature 1 + feature 2).

  9. The original taxonomy also included ten types of combined signals. See Taboada and Das (2013) for more information on our pilot study.
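
The notation described in notes 2 and 8 can be illustrated with a minimal Python sketch. It is not part of the corpus or its tooling: the function names `parse_span` and `parse_combined_signal` are hypothetical, and the sketch assumes that spans are written as integers, which the notes do not state.

```python
def parse_span(text):
    """Return (start, end) for a span written as '5' or a combined span '5-8'.

    Assumption: spans are identified by integers; a combined span is the
    starting span and the ending span joined by a hyphen (note 2).
    """
    parts = text.strip().split("-")
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"not a span: {text!r}")
    if start > end:
        raise ValueError(f"span ends before it starts: {text!r}")
    return start, end


def parse_combined_signal(text):
    """Return the two features of a combined signal '(feature 1 + feature 2)'.

    Follows note 8: two features inside parentheses, conjoined by '+'.
    """
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"combined signal must be parenthesised: {text!r}")
    features = [part.strip() for part in text[1:-1].split("+")]
    if len(features) != 2 or not all(features):
        raise ValueError(f"expected exactly two features: {text!r}")
    return features


print(parse_span("12-15"))
print(parse_combined_signal("(feature 1 + feature 2)"))
```

Returning a single span as a pair with equal start and end lets single and combined spans be handled uniformly.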

References

  • Afantenos, S., Asher, N., Benamara, F., Bras, M., Fabre, C., & Ho-Dac, M., et al. (2012). An empirical resource for discovering cognitive principles of discourse organization: the ANNODIS corpus. In Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.

  • Alonso, L., Castellón, I., Gibert, K., & Padró, L. (2002). An empirical approach to discourse markers by clustering. In M. T. Escrig, F. Toledo, & E. Golobardes (Eds.), Topics in artificial intelligence (Vol. 2504, pp. 173–183). Berlin: Springer.

  • Al-Saif, A., & Markert, K. (2010). The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic. In Paper presented at the 7th International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta.

  • Bateman, J., Kamps, T., Kleinz, J., & Reichenberger, K. (2001). Towards constructive text, diagram, and layout generation for information presentation. Computational Linguistics, 27(3), 409–449.

  • Berzlánovich, I., & Redeker, G. (2012). Genre-dependent interaction of coherence and lexical cohesion in written discourse. Corpus Linguistics and Linguistic Theory, 8(1), 183–208.

  • Blakemore, D. (1987). Semantic constraints on relevance. Oxford: Blackwell.

  • Blakemore, D. (1992). Understanding utterances: An introduction to pragmatics. Oxford: Blackwell.

  • Blakemore, D. (2002). Relevance and linguistic meaning: The semantics and pragmatics of discourse markers. Cambridge: Cambridge University Press.

  • Cain, K., & Nash, H. M. (2011). The influence of connectives on young readers’ processing and comprehension of text. Journal of Educational Psychology, 103(2), 429–441.

  • Carlson, L., & Marcu, D. (2001). Discourse tagging manual. Los Angeles: University of Southern California.

  • Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07.

  • Cevasco, J. (2009). The role of connectives in the comprehension of spontaneous spoken discourse. The Spanish Journal of Psychology, 12(1), 56–65.

  • Corston-Oliver, S. (1998). Beyond string matching and cue phrases: Improving efficiency and coverage in discourse analysis. In Paper presented at the AAAI 1998 spring symposium series, intelligent text summarization, Madison, Wisconsin.

  • da Cunha, I., Juan, E. S., Torres-Moreno, J. M., Cabré, M. T., & Sierra, G. (2012). A symbolic approach for automatic detection of nuclearity and rhetorical relations among intra-sentence discourse segments in Spanish. In Paper presented at the CICLing, New Delhi, India.

  • da Cunha, I., Torres-Moreno, J.-M., & Sierra, G. (2011). On the development of the RST Spanish Treebank. In Paper presented at the 5th linguistic annotation workshop, 49th annual meeting of the association for computational linguistics (ACL), Portland, OR.

  • Dale, R. (1991a). Exploring the role of punctuation in the signalling of discourse structure. In Paper presented at the workshop on text representation and domain modeling: Ideas from linguistics and AI. Technical University of Berlin.

  • Dale, R. (1991b). The role of punctuation in discourse structure. In Paper presented at the AAAI fall symposium on discourse structure in natural language understanding and generation, Asilomar, CA.

  • Dancygier, B., & Sweetser, E. (2005). Mental spaces in grammar: Conditional constructions. Cambridge: Cambridge University Press.

  • Das, D. (2012). Investigating the role of discourse markers in signalling coherence relations: A corpus study. In Paper presented at the Northwest linguistics conference. Seattle: University of Washington.

  • Das, D. (2014). Signalling of coherence relations in discourse. Ph.D. dissertation. Burnaby: Simon Fraser University.

  • Das, D., & Taboada, M. (2013). Explicit and implicit coherence relations: A corpus study. In Paper presented at the Canadian linguistic association (CLA) conference. Victoria: University of Victoria.

  • Das, D., Taboada, M., & McFetridge, P. (2015). RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10.

  • Degand, L., & Sanders, T. (2002). The impact of relational markers on expository text comprehension in L1 and L2. Reading and Writing, 15(7–8), 739–758.

  • Derczynski, L., & Gaizauskas, R. (2013). Temporal signals help label temporal relations. In Paper presented at the annual meeting of the association for computational linguistics, ACL, Sofia, Bulgaria.

  • Dipper, S., Götze, M., & Stede, M. (2004). Simple annotation tools for complex annotation tasks: An evaluation. In Paper presented at the LREC workshop on XML-based richly annotated corpora, Lisbon, Portugal.

  • Duque, E. (2014). Signaling causal coherence relations. Discourse Studies, 16(1), 25–46.

  • Feng, V. W., & Hirst, G. (2012). Text-level discourse parsing with rich linguistic features. In Paper presented at the 50th annual meeting of the association for computational linguistics.

  • Feng, V. W., & Hirst, G. (2014). A linear-time bottom-up discourse parser with constraints and post-editing. In Paper presented at the 52nd annual meeting of the association for computational linguistics (ACL-2014), Baltimore, USA.

  • Fraser, B. (1990). An approach to discourse markers. Journal of Pragmatics, 14, 383–395.

  • Fraser, B. (1999). What are discourse markers? Journal of Pragmatics, 31, 931–953.

  • Fraser, B. (2006). Towards a theory of discourse markers. In K. Fischer (Ed.), Approaches to discourse particles (pp. 189–204). Amsterdam: Elsevier Press.

  • Fraser, B. (2009). An account of discourse markers. International Review of Pragmatics, 1, 293–320.

  • Haberlandt, K. (1982). Reader expectations in text comprehension. In J.-F. Le Ny & W. Kintsch (Eds.), Language and comprehension (pp. 239–249). Amsterdam: North-Holland.

  • Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.

  • Hernault, H., Bollegala, D., & Ishizuka, M. (2011). Semi-supervised discourse relation classification with structural learning. In Paper presented at the 12th international conference on computational linguistics and intelligent text processing (CICLing ‘11), Tokyo, Japan.

  • Hernault, H., Prendinger, H., duVerle, D. A., & Ishizuka, M. (2010). HILDA: A discourse parser using support vector machine classification. Dialogue and Discourse, 1(3), 1–33.

  • Kamalski, J. (2007). Coherence marking, comprehension and persuasion: On the processing and representation of discourse. Utrecht: LOT.

  • Knott, A. (1996). A data-driven methodology for motivating a set of coherence relations. Ph.D. dissertation. Edinburgh: University of Edinburgh.

  • Knott, A., & Dale, R. (1994). Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18(1), 35–62.

  • Knott, A., & Sanders, T. (1998). The classification of coherence relations and their linguistic markers: An exploration of two languages. Journal of Pragmatics, 30, 135–175.

  • Kolachina, S., Prasad, R., Misra Sharma, D., & Joshi, A. (2012). Evaluation of discourse relation annotation in the Hindi Discourse Treebank. In Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.

  • Lapata, M., & Lascarides, A. (2004). Inferring sentence-internal temporal relations. In Paper presented at the North American chapter of the association for computational linguistics.

  • Le Thanh, H. (2007). An approach in automatically generating discourse structure of text. Journal of Computer Science and Cybernetics, 23(3), 212–230.

  • Lin, Z., Kan, M.-Y., & Ng, H. T. (2009). Recognizing implicit discourse relations in the Penn Discourse Treebank. In Paper presented at the 2009 conference on empirical methods in natural language processing, Singapore.

  • Louis, A., Joshi, A., Prasad, R., & Nenkova, A. (2010). Using entity features to classify implicit discourse relations. In Paper presented at the 11th annual meeting of the special interest group on discourse and dialogue, SIGDIAL’10.

  • Mak, W. M., & Sanders, T. J. M. (2013). The role of causality in discourse processing: Effects on expectation and coherence relations. Language and Cognitive Processes, 28(9), 1414–1437.

  • Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

  • Marcu, D. (1999). A decision-based approach to rhetorical parsing. In Paper presented at the 37th annual meeting of the association for computational linguistics on computational linguistics, College Park, Maryland.

  • Marcu, D. (2000). The rhetorical parsing of unrestricted texts: A surface based approach. Computational Linguistics, 26(3), 395–448.

  • Marcu, D., & Echihabi, A. (2002). An unsupervised approach to recognising discourse relations. In Paper presented at the 40th annual meeting of the association for computational linguistics (ACL’02), Philadelphia, PA.

  • Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Martin, J. R. (1992). English text: System and structure. Amsterdam: John Benjamins.

  • Matthiessen, C. M. I. M. (2015). Register in the round: Registerial cartography. Functional Linguistics, 2(9), 1–48.

  • Matthiessen, C. M. I. M., & Teruya, K. (2015). Grammatical realizations of rhetorical relations in different registers. Word, 61(3), 232–281.

  • Maziero, E. G., Pardo, T. A. S., da Cunha, I., Torres-Moreno, J.-M., & SanJuan, E. (2011). DiZer 2.0: An adaptable on-line discourse parser. In Paper presented at the III RST meeting (8th Brazilian symposium in information and human language technology), Cuiaba, MT, Brazil.

  • Meyer, B. J. F. (1975). The organization of prose and its effects on memory. Amsterdam: North-Holland.

  • Meyer, T., & Webber, B. (2013). Implicitation of discourse connectives in (machine) translation. In Paper presented at the 1st DiscoMT workshop at ACL 2013 (51st annual meeting of the association for computational linguistics), Sofia, Bulgaria.

  • Millis, K. K., & Just, M. A. (1994). The influence of connectives on sentence comprehension. Journal of Memory and Language, 33, 128–147.

  • Mithun, S., & Kosseim, L. (2011). Comparing approaches to tag discourse relations. In Paper presented at the 12th international conference on computational linguistics and intelligent text processing (CICLing’11), Tokyo, Japan.

  • Mladová, L., Zikánová, Š., & Hajičová, E. (2008). From sentence to discourse: Building an annotation scheme for discourse based on Prague Dependency Treebank. In Paper presented at the 6th international conference on language resources and evaluation (LREC 2008), Marrakech, Morocco.

  • Mulder, G. (2008). Understanding causal coherence relations. Ph.D. dissertation. Utrecht: Utrecht University.

  • Mulder, G., & Sanders, T. J. M. (2012). Causal coherence relations and levels of discourse representation. Discourse Processes, 49(6), 501–522.

  • Murray, J. D. (1995). Logical connectives and local coherence. In J. R. F. Lorch & E. J. O’Brien (Eds.), Sources of coherence in reading (pp. 107–125). Hillsdale, NJ: Lawrence Erlbaum.

  • O’Donnell, M. (1997). RSTTool. http://www.wagsoft.com/RSTTool/.

  • O’Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. In Paper presented at the XXVI Congreso de AESLA, Almeria, Spain.

  • Pardo, T. A. S., & Nunes, M. D. G. V. (2008). On the development and evaluation of a Brazilian Portuguese discourse parser. Journal of Theoretical and Applied Computing, 15(2), 43–64.

  • Pitler, E., Louis, A., & Nenkova, A. (2009). Automatic sense prediction for implicit discourse relations in text. In Paper presented at the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, Singapore.

  • Polanyi, L., Culy, C., van den Berg, M., Thione, G. L., & Ahn, D. (2004). A rule based approach to discourse parsing. In Paper presented at the 5th SIGdial workshop on discourse and dialogue. Cambridge, MA: ACL.

  • Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., & Webber, B. (2008). The Penn Discourse Treebank 2.0. In Paper presented at the 6th international conference on language resources and evaluation (LREC 2008), Marrakech, Morocco.

  • Prasad, R., Joshi, A., & Webber, B. (2010). Realization of discourse relations by other means: Alternative lexicalizations. In Paper presented at the 23rd international conference on computational linguistics, Beijing.

  • Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., & Webber, B. (2007). The Penn Discourse Treebank 2.0 annotation manual. The PDTB Research Group. University of Pennsylvania.

  • Redeker, G., Berzlánovich, I., van der Vliet, N., Bouma, G., & Egg, M. (2012). Multi-layer discourse annotation of a Dutch text corpus. In Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.

  • Renkema, J. (2004). Introduction to discourse studies. Amsterdam: Benjamins.

  • Renkema, J. (2009). The texture of discourse. Amsterdam: John Benjamins Publishing Company.

  • Roze, C., Danlos, L., & Muller, P. (2012). LEXCONN: A French lexicon of discourse connectives. Discours, 10, 114–125.

  • Sanders, T., Land, J., & Mulder, G. (2007). Linguistic markers of coherence improve text comprehension in functional contexts: On text representation and document design. Information Design Journal, 15(3), 219–235.

  • Sanders, T., & Noordman, L. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60.

  • Sanders, T., & Spooren, W. (2007). Discourse and text structure. In D. Geeraerts & H. Cuyckens (Eds.), Handbook of cognitive linguistics (pp. 916–941). Oxford: Oxford University Press.

  • Sanders, T., & Spooren, W. (2009). The cognition of discourse coherence. In J. Renkema (Ed.), Discourse, of course (pp. 197–212). Amsterdam: Benjamins.

  • Sanders, T., Spooren, W., & Noordman, L. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15, 1–35.

  • Sanders, T., Spooren, W., & Noordman, L. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2), 93–133.

  • Scanlan, C. (2000). Reporting and writing: Basics for the 21st century. Oxford: Oxford University Press.

  • Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.

  • Schiffrin, D. (2001). Discourse markers: Language, meaning and context. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The handbook of discourse analysis (pp. 54–75). Malden, MA: Blackwell.

  • Schilder, F. (2002). Robust discourse parsing via discourse markers, topicality and position. Natural Language Engineering, 8(2/3), 235–255.

  • Scott, D., & de Souza, C. S. (1990). Getting the message across in RST-based text generation. In R. Dale, C. Mellish, & M. Zock (Eds.), Current research in natural language generation (pp. 47–73). London: Academic Press.

  • Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.

  • Sporleder, C., & Lascarides, A. (2005). Exploiting linguistic cues to classify rhetorical relations. In Paper presented at the recent advances in natural language processing (RANLP-05).

  • Sporleder, C., & Lascarides, A. (2008). Using automatically labelled examples to classify rhetorical relations: An assessment. Natural Language Engineering, 14, 369–416.

  • Spyridakis, J. H., & Standal, T. C. (1987). Signals in expository prose: Effects on reading comprehension. Reading Research Quarterly, 22, 285–298.

  • Stede, M., & Umbach, C. (1998). DiMLex: A lexicon of discourse markers for text generation and understanding. In Paper presented at the COLING-ACL ‘98 conference, Montreal.

  • Taboada, M. (2006). Discourse markers as signals (or not) of rhetorical relations. Journal of Pragmatics, 38(4), 567–592.

  • Taboada, M. (2009). Implicit and explicit coherence relations. In J. Renkema (Ed.), Discourse, of course. Amsterdam: John Benjamins.

  • Taboada, M., & Das, D. (2013). Annotation upon annotation: Adding signaling information to a corpus of discourse relations. Dialogue and Discourse, 4(2), 249–281.

  • Taboada, M., & Mann, W. C. (2006a). Applications of rhetorical structure theory. Discourse Studies, 8(4), 567–588.

  • Taboada, M., & Mann, W. C. (2006b). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459.

  • Theijssen, D. (2007). Features for automatic discourse analysis of paragraphs. M.A. dissertation. Nijmegen: Radboud University Nijmegen.

  • Theijssen, D., van Halteren, H., Verberne, S., & Boves, L. (2008). Features for automatic discourse analysis of paragraphs. In Paper presented at the 18th meeting of computational linguistics in the Netherlands (CLIN 2007).

  • Tonelli, S., Riccardi, G., Prasad, R., & Joshi, A. (2010). Annotation of discourse relations for conversational spoken dialogs. In Paper presented at the 7th international conference on language resources and evaluation (LREC 2010), Valletta, Malta.

  • Versley, Y. (2013). Subgraph-based classification of explicit and implicit discourse relations. In Paper presented at the 10th international conference on computational semantics (IWCS 2013), Potsdam, Germany.

  • Versley, Y., & Gastel, A. (2013). Linguistic tests for discourse relations in the TüBa-D/Z corpus of written German. Dialogue and Discourse, 4(2), 142–173.

  • Zeyrek, D., Demirşahin, I., Sevdik-Çalli, A. B., Balaban, H. Ö., Yalçinkaya, I., & Turan, Ü. D. (2010). The annotation scheme of the Turkish Discourse Bank and an evaluation of inconsistent annotation. In Paper presented at the fourth linguistic annotation workshop (LAW-IV).

Acknowledgements

We are greatly indebted to the late Dr. Paul McFetridge for his invaluable contribution to this work. Dr. McFetridge was the Senior Supervisor of Debopam Das’ Ph.D. dissertation (Das 2014). Sadly, he passed away on March 14, 2014, only a few months before the completion of the final version of the RST Signalling Corpus. He was a major driving force and a source of constant support for our work. Funding for this research was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 261104-2008).

Author information

Corresponding author

Correspondence to Debopam Das.

Appendix

See Table 6.

Table 6 Complete taxonomy of signals used to annotate the RST Signalling Corpus

Cite this article

Das, D., Taboada, M. RST Signalling Corpus: a corpus of signals of coherence relations. Lang Resources & Evaluation 52, 149–184 (2018). https://doi.org/10.1007/s10579-017-9383-x

  • DOI: https://doi.org/10.1007/s10579-017-9383-x
