Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Language Resources and Evaluation

Abstract

Speech-to-speech translation technology has difficulty processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consensus on which expressions count as discourse markers, we have to reconsider how to set up a framework for annotating them. To better understand what we gain by introducing a discourse marker category, we also have to analyse their characteristics and functions in discourse. This is especially important for languages such as Slovenian, where little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers, based on the analysis of a corpus of telephone conversations in the tourism domain in the Slovenian language, and to give some additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.

Acknowledgements

We sincerely thank all the tourist companies that helped us record the conversations for the TURDIS corpus: the Sonček, Kompas, Neckermann Reisen and Aritours tourist agencies, the Terme Maribor, especially the Hotel Piramida and the Hotel Habakuk, and the Mariborski zavod za turizem and its tourist office MATIC. We also thank all the tourist agents in these companies whose conversations have been recorded, and all the callers who were ready to use the TURDIS system.

Author information

Corresponding author

Correspondence to Darinka Verdonik.

Appendix

1.1 Transcription of the examples in this paper

The transcription rules for the examples from the TURDIS-1 corpus in this paper are:

  • each caller is identified by the letter K and an index number, e.g., K1

  • each tourist agent is identified by the letter A, two lower-case letters indicating the tourist company the agent works for (e.g., so for the Sonček tourist agency), and an index number (e.g., Aso1)

  • the speaker’s ID occurs at the beginning of each turn, or at the beginning of each utterance when a turn consists of more than one utterance

  • the text of conversations follows a colon sign (:); for overlapping speech the sign [overlap] is used between the speaker’s ID and the colon sign, for example:

      K1 [overlap]: text

      Aso1 [overlap]: text

  • the English translation of each utterance follows a slash sign (/)

  • other signs occurring in the examples are:

Sign                                  Description

...                                   Cut-off utterance
wor()                                 Cut-off word
?                                     Rising intonation
#word#                                Emphasized word
wo[:]rd                               Previous phoneme is prolonged
[.]                                   Short silence
Text [1]                              Utterance continues in the first segment that follows, starting with [2]
[2] text                              Continuation of the last preceding segment, ending in [1]
text [P] text                         Segment includes two utterances, [P] signals the border
GMX                                   Abbreviation is spelled out
@ SI                                  Abbreviation is pronounced as one word
[+SOGOVORNIK_ja] / [+OVERLAP_yes]     Background signal ja (yes, yeah) overlaps with the previous word of the speaker’s turn
[SOGOVORNIK_ja] / [OVERLAP_ja]        Background signal ja (yes, yeah) is pronounced in a pause that the speaker makes in his talk
[+LAUGH]                              The speaker laughing while pronouncing the previous word
[LAUGH]                               The speaker laughing
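
The transcription rules above can be read as a small line format. The following Python sketch is only an illustration of that reading, not the tooling used for the TURDIS-1 corpus. The names `Utterance`, `parse_utterance` and `strip_signs` are hypothetical, the sample line is invented, and the code assumes that the first slash on a line separates the text from its English translation and that every bracketed sign can simply be dropped when plain text is needed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Speaker ID, optional [overlap] sign, colon, then the rest of the line.
LINE_RE = re.compile(
    r"^(?P<speaker>K\d+|A[a-z]{2}\d+)"  # caller (K1) or agent (Aso1)
    r"(?P<overlap>\s*\[overlap\])?"     # overlap sign sits before the colon
    r"\s*:\s*(?P<body>.*)$"
)


@dataclass
class Utterance:
    speaker: str                # full speaker ID, e.g. "K1" or "Aso1"
    role: str                   # "caller" or "agent"
    company: Optional[str]      # two-letter company code, agents only
    overlap: bool               # True if the line is marked with [overlap]
    text: str                   # transcribed text, signs left in place
    translation: Optional[str]  # English translation, if present


def parse_utterance(line: str) -> Utterance:
    """Split one transcribed line into its parts (hypothetical helper)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a transcribed utterance: {line!r}")
    speaker = match.group("speaker")
    is_agent = speaker.startswith("A")
    # Assumption: the first slash separates the text from its translation.
    text, slash, translation = match.group("body").partition("/")
    return Utterance(
        speaker=speaker,
        role="agent" if is_agent else "caller",
        company=speaker[1:3] if is_agent else None,
        overlap=match.group("overlap") is not None,
        text=text.strip(),
        translation=translation.strip() if slash else None,
    )


def strip_signs(text: str) -> str:
    """Remove bracketed signs, emphasis marks and cut-off word marks."""
    text = re.sub(r"\[[^\]]*\]", "", text)          # [.], [:], [P], [LAUGH], ...
    text = text.replace("#", "").replace("()", "")  # #word# and wor()
    return " ".join(text.split())                   # collapse leftover spaces


# Invented sample line, not taken from the TURDIS corpus.
sample = parse_utterance("Aso1 [overlap]: #dober# da[:]n [.] / #good# da[:]y [.]")
print(sample.speaker, sample.role, sample.company, sample.overlap)
print(strip_signs(sample.text), "|", strip_signs(sample.translation or ""))
```

Keeping the signs in `text` and removing them only on request is a deliberate choice in this sketch: the annotation stays available for analysis, while `strip_signs` yields a plain word sequence. Signs without brackets, such as `...` and `@`, are left untouched.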

About this article

Cite this article

Verdonik, D., Rojc, M. & Stabej, M. Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language. Lang Resources & Evaluation 41, 147–180 (2007). https://doi.org/10.1007/s10579-007-9035-7

  • DOI: https://doi.org/10.1007/s10579-007-9035-7
