Abstract
Speech-to-speech translation technology has difficulty processing the spontaneous elements of conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, since there is no agreement on which expressions count as discourse markers, the framework for annotating them has to be reconsidered, and, to better understand what is gained by introducing a discourse marker category, their characteristics and functions in discourse have to be analysed. This is especially important for languages such as Slovenian, for which little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers, based on the analysis of a corpus of telephone conversations in the tourism domain in Slovenian, and to give additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.
Acknowledgements
We sincerely thank all the tourist companies that helped us record the conversations for the TURDIS corpus: the Sonček, Kompas, Neckermann Reisen and Aritours tourist agencies, the Terme Maribor, especially the Hotel Piramida and the Hotel Habakuk, and the Mariborski zavod za turizem and its tourist office MATIC. We also thank all the tourist agents in these companies whose conversations were recorded, and all the callers who were willing to use the TURDIS system.
Appendix
1.1 Transcription of the examples in this paper
The transcription rules for the examples from the TURDIS-1 corpus in this paper are:
- each caller is identified by the letter K and an index number, e.g., K1
- each tourist agent is identified by the letter A, two lower case letters, indicating the tourist company he works for (e.g., so for the Sonček tourist agency), and an index number (e.g., Aso1)
- the speaker’s ID occurs at the beginning of each utterance, or when a turn consists of more than one utterance
- the text of conversations follows a colon sign (:); for overlapping speech the sign [overlap] is used between the speaker’s ID and the colon sign, for example:
  - K1 [overlap]: text
  - Aso1 [overlap]: text
- the English translation of each utterance follows a slash sign (/)
- other signs occurring in the examples are:

Sign | Description |
---|---|
... | Cut-off utterance |
wor() | Cut-off word |
? | Rising intonation |
#word# | Emphasized word |
wo[:]rd | Previous phoneme is prolonged |
[.] | Short silence |
Text [1] | Utterance continues in the first segment that follows, starting with [2] |
[2] text | Continuation of the last preceding segment, ending in [1] |
text [P] text | Segment includes two utterances, [P] signals the border |
∼GMX | Abbreviation is spelled out |
@ SI | Abbreviation is pronounced as one word |
[+SOGOVORNIK_ja] / [+OVERLAP_yes] | Background signal ja (yes, yeah) overlaps with the previous word of the speaker’s turn |
[SOGOVORNIK_ja] / [OVERLAP_ja] | Background signal ja (yes, yeah) is pronounced in a pause that the speaker makes in his talk |
[+LAUGH] | The speaker laughing while pronouncing the previous word |
[LAUGH] | The speaker laughing |
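The line-level conventions above (speaker ID, optional [overlap] sign, colon, text, slash, translation) can be illustrated with a short Python sketch. It is not part of the TURDIS tooling; the names `LINE_RE`, `Utterance` and `parse_line` are hypothetical, and the sketch assumes that the slash separating the translation is surrounded by spaces and that each transcript line holds exactly one utterance. The sample utterances are invented for illustration and are not taken from the corpus.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape: <speaker ID> [overlap]: <text> / <translation>
# Caller IDs look like K1; agent IDs look like Aso1 (A + two letters + index).
LINE_RE = re.compile(
    r"^(?P<speaker>K\d+|A[a-z]{2}\d+)"      # speaker ID
    r"(?:\s*(?P<overlap>\[overlap\]))?"     # optional overlap sign
    r"\s*:\s*(?P<body>.*)$"                 # colon, then the rest of the line
)


@dataclass
class Utterance:
    speaker: str                 # full ID, e.g. "Aso1"
    role: str                    # "caller" or "agent"
    company: Optional[str]       # two-letter company code, agents only
    overlap: bool                # True if the [overlap] sign is present
    text: str                    # original text
    translation: Optional[str]   # English translation, if given


def parse_line(line: str) -> Utterance:
    """Split one transcript line into its parts (hypothetical helper)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a transcript line: {line!r}")

    speaker = match.group("speaker")
    if speaker.startswith("K"):
        role, company = "caller", None
    else:
        role, company = "agent", speaker[1:3]

    # Assumption: the first " / " separates the text from its translation.
    text, sep, translation = match.group("body").partition(" / ")
    return Utterance(
        speaker=speaker,
        role=role,
        company=company,
        overlap=match.group("overlap") is not None,
        text=text.strip(),
        translation=translation.strip() if sep else None,
    )


# Invented sample lines, not corpus data.
print(parse_line("K1: dober dan / good day"))
print(parse_line("Aso1 [overlap]: ja / yes"))
```

The signs listed in the table are left inside the `text` and `translation` fields unchanged; interpreting them would need a separate step that is not sketched here.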
Cite this article
Verdonik, D., Rojc, M. & Stabej, M. Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language. Lang Resources & Evaluation 41, 147–180 (2007). https://doi.org/10.1007/s10579-007-9035-7