Language Resources and Evaluation, Volume 41, Issue 2, pp 147–180

Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Authors

  • D. Verdonik
    • Faculty of Electrical Engineering and Computer Science, University of Maribor
  • Matej Rojc
    • Faculty of Electrical Engineering and Computer Science, University of Maribor
  • Marko Stabej
    • Faculty of Arts, University of Ljubljana
Article

DOI: 10.1007/s10579-007-9035-7

Cite this article as:
Verdonik, D., Rojc, M. & Stabej, M. Lang Resources & Evaluation (2007) 41: 147. doi:10.1007/s10579-007-9035-7

Abstract

Speech-to-speech translation technology has difficulty processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consensus on which expressions count as discourse markers, we have to reconsider how to set up a framework for annotation. To better understand what we gain by introducing a discourse marker category, we also have to analyse the characteristics and functions of discourse markers in discourse. This is especially important for languages such as Slovenian, where little or no research on discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers, based on the analysis of a corpus of Slovenian telephone conversations in the tourism domain, and to give additional arguments, based on the characteristics and functions of discourse markers, that confirm their special status in conversation.

Keywords

Discourse markers, Speech corpora, Annotating, Conversation, Discourse analysis, Speech-to-speech translation, Spontaneous speech, Slovenian language

Copyright information

© Springer Science+Business Media B.V. 2007