Mining Local Discourse Annotation for Features of Global Discourse Structure

  • Conference paper
Text, Speech, and Dialogue (TSD 2020)

Abstract

Descriptive approaches to discourse (text) structure and coherence typically proceed in either a bottom-up or a top-down analytic way. The former analyze how the smallest discourse units (clauses, sentences) are connected within their closest neighbourhood, locally and linearly. The latter postulate a hierarchical organization of smaller and larger units and sometimes represent the whole text as a tree-like graph. In the present study, we mine a Czech corpus of 50k sentences annotated in the local coherence fashion (Penn Discourse Treebank style) for indices signalling higher discourse structure. We analyze patterns of overlapping discourse relations and look into the hierarchies they form. The types and distributions of the detected patterns correspond to the results for English local annotation, with patterns that do not comply with the tree-like interpretation occurring in very low numbers. We also detect a hierarchical organization of local discourse relations of up to 5 levels in the Czech data.
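
The idea of looking for overlap patterns and hierarchy levels among locally annotated relations can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes, purely for illustration, that each discourse relation is reduced to one inclusive (start, end) token span covering both of its arguments, and it distinguishes only three coarse configurations (disjoint, nested, crossing) instead of the pattern inventory analyzed in the paper. The names Span, classify_pair and max_nesting_depth are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: one inclusive (start, end) token span per relation.
Span = Tuple[int, int]


def classify_pair(a: Span, b: Span) -> str:
    """Classify how two relation spans relate to each other."""
    (a_start, a_end), (b_start, b_end) = a, b
    # No shared token at all.
    if a_end < b_start or b_end < a_start:
        return "disjoint"
    # One span lies entirely inside the other (identical spans also land here).
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"
    # Partial overlap: incompatible with a strict tree-like reading.
    return "crossing"


def max_nesting_depth(spans: List[Span]) -> int:
    """Return the length of the longest chain of spans nested in one another."""
    # Sort by start, wider span first on ties, so every container
    # precedes the spans it contains.
    ordered = sorted(set(spans), key=lambda s: (s[0], -s[1]))
    depth = [1] * len(ordered)
    for i, inner in enumerate(ordered):
        for j in range(i):
            outer = ordered[j]
            if outer[0] <= inner[0] and inner[1] <= outer[1]:
                depth[i] = max(depth[i], depth[j] + 1)
    return max(depth, default=0)


# Toy spans, invented for this illustration (not corpus data).
toy_spans = [(0, 20), (2, 10), (3, 5), (12, 18), (8, 14)]
print(classify_pair((2, 10), (3, 5)))    # nested
print(classify_pair((2, 10), (12, 18)))  # disjoint
print(classify_pair((2, 10), (8, 14)))   # crossing
print(max_nesting_depth(toy_spans))      # 3
```

Under this simplification, a "crossing" pair is the kind of configuration that does not fit a tree-like interpretation, and the nesting depth corresponds to the number of hierarchy levels formed by embedded relations.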

Notes

  1. In this study, we use the term discourse relations, following the Penn Discourse Treebank’s terminology.

  2. Corpora analyzed in both ways are rare, yet they exist, e.g. for English Wall Street Journal texts [2, 13, 18] and for German news commentaries [14]. For German, even a mapping procedure between the two annotation layers was introduced in [15].

  3. Basically a constituency tree, which is by nature projective and does not allow crossing edges, unlike a tree graph in the basic mathematical sense.

  4. Due to the limited scope of this paper, we compare our results to theirs only for discourse relations. The implications for syntax (level of complexity) are not explicitly discussed.

  5. Technically, the annotation is not carried out on raw texts, but on top of the syntactic trees.

  6. We obtained so much data that we can cover only selected aspects in this study. We therefore concentrate on the patterns studied by Lee et al. and on the hierarchical structuring of discourse relations.

  7. Due to space limits, we present only the English translations of the PDT Czech originals here. Relation 1 is highlighted in italics, relation 2 in bold. The connectives are underlined.

  8. In the representation in Example 2, the clause “This means that” is not in italics, as it is not part of any argument of the left relation.

  9. The “also-not” connective is ani in the Czech original, meaning “neither”. Literal translation: “Neither here is_concerned a small portion...”.

  10. This also explains the zeros in Table 2.

  11. All the more so as we do not include implicit and entity-based relations in our study.

References

  1. Hajič, J., et al.: Prague Dependency Treebank 3.5. Data/software. Institute of Formal and Applied Linguistics, Charles University, LINDAT/CLARIN PID (2018). http://hdl.handle.net/11234/1-2621

  2. Carlson, L., Okurowski, M.E., Marcu, D.: RST Discourse Treebank. Linguistic Data Consortium, University of Pennsylvania (2002)

  3. Egg, M., Redeker, G.: How complex is discourse structure? In: Proceedings of LREC 2010, Malta, pp. 1619–1623 (2010)

  4. Feng, V.W., Lin, Z., Hirst, G.: The impact of deep hierarchical discourse structures in the evaluation of text coherence. In: Proceedings of COLING, pp. 940–949 (2014)

  5. Lee, A., Prasad, R., Joshi, A., Dinesh, N.: Complexity of dependencies in discourse: are dependencies in discourse more complex than in syntax? In: Proceedings of the TLT 2006, Prague, Czech Republic, pp. 79–90 (2006)

  6. Lee, A., Prasad, R., Joshi, A., Webber, B.: Departures from tree structures in discourse: shared arguments in the Penn Discourse Treebank. In: Proceedings of the Constraints in Discourse III Workshop, pp. 61–68 (2008)

  7. Lin, Z., Ng, H.T., Kan, M.Y.: Automatically evaluating text coherence using discourse relations. In: Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies-Volume 1, pp. 997–1006 (2011)

  8. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text-Interdiscip. J. Study Discourse 8(3), 243–281 (1988)

  9. Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. MIT Press, Cambridge (2000)

  10. Poláková, L., Mírovský, J., Synková, P.: Signalling implicit relations: a PDTB-RST comparison. Dialogue Discourse 8(2), 225–248 (2017)

  11. Poláková, L., Mírovský, J.: Anaphoric connectives and long-distance discourse relations in Czech. Computación y Sistemas 23(3), 711–717 (2019)

  12. Prasad, R., Dinesh, N., Lee, A., et al.: The Penn discourse treebank 2.0. In: Proceedings of LREC 2008, Morocco, pp. 2961–2968 (2008)

  13. Prasad, R., Joshi, A., Webber, B.: Exploiting scope for shallow discourse parsing. In: Proceedings of LREC 2010, Malta, pp. 2076–2083 (2010)

  14. Stede, M., Neumann, A.: Potsdam commentary corpus 2.0: annotation for discourse research. In: Proceedings of LREC 2014, pp. 925–929 (2014)

  15. Scheffler, T., Stede, M.: Mapping PDTB-style connective annotation to RST-style discourse annotation. In: Proceedings of KONVENS 2016, pp. 242–247 (2016)

  16. Taboada, M., Mann, W.C.: Rhetorical structure theory: looking back and moving ahead. Discourse Stud. 8(3), 423–459 (2006)

  17. Wolf, F., Gibson, E.: Representing discourse coherence: a corpus-based study. Comput. Linguist. 31(2), 249–287 (2005)

  18. Wolf, F., Gibson, E., Fisher, A., Knight, M.: Discourse Graphbank, LDC2005T08 [Corpus]. Linguistic Data Consortium, Philadelphia (2005)

Acknowledgments

The authors gratefully acknowledge support from the Grant Agency of the Czech Republic, project no. 20-09853S. The work described herein used resources provided by the LINDAT/CLARIAH-CZ Research Infrastructure, supported by the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2018101).

Author information

Corresponding author

Correspondence to Lucie Poláková.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Poláková, L., Mírovský, J. (2020). Mining Local Discourse Annotation for Features of Global Discourse Structure. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_5

  • DOI: https://doi.org/10.1007/978-3-030-58323-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58322-4

  • Online ISBN: 978-3-030-58323-1

  • eBook Packages: Computer Science, Computer Science (R0)
