Abstract
In this paper we present the results of a tagged-corpus-based study of two kinds of disfluencies, repeats and self-repairs, in a corpus of spontaneous spoken French. We first investigate the linguistic features of both phenomena and then show how, starting from corpus output tagged with TreeTagger, repeats and self-repairs can be handled using a word N-gram model and rule-based pattern matching. Finally, we present results on a test corpus.
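The rule-based pattern-matching step can be illustrated with a minimal sketch. It assumes TreeTagger's three-column, tab-separated output (word form, POS tag, lemma) and tags from its French parameter file (e.g. `DET:ART`, `PRO:PER`, `PRP`). The two rules are hypothetical simplifications, not the rules used in the paper: an immediately repeated word span counts as a repeat, and a function word immediately replaced by a different word with the same tag counts as a self-repair. The word N-gram model is not shown, and the helper names `parse_treetagger` and `find_disfluencies` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: tag prefixes for French function words (determiners, pronouns,
# prepositions) in TreeTagger's French tagset.
FUNCTION_TAG_PREFIXES = ("DET", "PRO", "PRP")


@dataclass
class Token:
    word: str
    tag: str
    lemma: str


def parse_treetagger(output: str) -> List[Token]:
    """Parse assumed TreeTagger output: one 'word<TAB>tag<TAB>lemma' per line."""
    tokens = []
    for line in output.strip().splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        tokens.append(Token(*parts))
    return tokens


def find_disfluencies(tokens: List[Token], max_len: int = 3) -> List[Tuple[str, int, int]]:
    """Return (kind, start, length) for each candidate reparandum (toy rules)."""
    found = []
    i, n = 0, len(tokens)
    while i < n:
        match = None
        # Rule 1 (repeat): a span of k words immediately repeated; longest k first.
        for k in range(max_len, 0, -1):
            if i + 2 * k > n:
                continue
            first = [t.word.lower() for t in tokens[i:i + k]]
            second = [t.word.lower() for t in tokens[i + k:i + 2 * k]]
            if first == second:
                match = ("repeat", i, k)
                break
        # Rule 2 (hypothetical self-repair): a function word immediately replaced
        # by a different word carrying the same POS tag, e.g. "le la voiture".
        if match is None and i + 1 < n:
            a, b = tokens[i], tokens[i + 1]
            if (a.tag == b.tag and a.tag.startswith(FUNCTION_TAG_PREFIXES)
                    and a.word.lower() != b.word.lower()):
                match = ("self-repair", i, 1)
        if match:
            found.append(match)
            i += match[2]  # move past the reparandum
        else:
            i += 1
    return found


# Usage: "je vais je vais prendre le la voiture"
sample = "\n".join([
    "je\tPRO:PER\tje", "vais\tVER:pres\taller",
    "je\tPRO:PER\tje", "vais\tVER:pres\taller",
    "prendre\tVER:infi\tprendre",
    "le\tDET:ART\tle", "la\tDET:ART\tle", "voiture\tNOM\tvoiture",
])
print(find_disfluencies(parse_treetagger(sample)))
# [('repeat', 0, 2), ('self-repair', 5, 1)]
```

Each hit gives the start index and length of the reparandum. Checking the longest span first keeps a multi-word repeat such as `je vais je vais` together as a single hit; a real system would need further constraints (and, per the abstract, an N-gram model) to limit false positives from rule 2.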
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bove, R. (2008). A Tagged Corpus-Based Study for Repeats and Self-repairs Detection in French Transcribed Speech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_35
DOI: https://doi.org/10.1007/978-3-540-87391-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)