Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept

Qader, Raheel; Lecorvé, Gwénolé; Lolive, Damien; Sébillot, Pascale

doi:10.1007/978-3-030-00810-9_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11171)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper presents exploratory work on automatically inserting disfluencies in text-to-speech (TTS) systems. The objective is to make TTS more spontaneous and expressive. To achieve this, we focus on the linguistic level of speech through the insertion of pauses, repetitions and revisions. We formalize the problem as a theoretical process in which transformations are iteratively composed. This is a novel contribution, since most previous work either focuses on detecting or cleaning linguistic disfluencies in speech transcripts, or concentrates solely on acoustic phenomena in TTS, especially pauses. We present a first implementation of the proposed process using conditional random fields and language models. The objective and perceptual evaluations conducted on an English corpus of spontaneous speech show that our proposal is effective at generating disfluencies, and highlight perspectives for future improvements.
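The idea of iteratively composing transformations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`insert_pause`, `insert_repetition`, `insert_revision`, `compose`), the hand-picked positions, the filler word and the order of application are all assumptions made for illustration. In the paper, these decisions are made by conditional random fields and language models rather than fixed by hand.

```python
from functools import reduce
from typing import Callable, List

Tokens = List[str]
Transform = Callable[[Tokens], Tokens]


def insert_pause(position: int, filler: str = "uh") -> Transform:
    """Build a transformation that inserts a filled pause before `position`."""
    def apply(tokens: Tokens) -> Tokens:
        return tokens[:position] + [filler] + tokens[position:]
    return apply


def insert_repetition(start: int, length: int = 1) -> Transform:
    """Build a transformation that repeats tokens[start:start + length] once."""
    def apply(tokens: Tokens) -> Tokens:
        span = tokens[start:start + length]
        return tokens[:start] + span + tokens[start:]
    return apply


def insert_revision(position: int, false_start: str) -> Transform:
    """Build a transformation that inserts an abandoned word before the
    word that replaces it (hypothetical, hand-chosen false start)."""
    def apply(tokens: Tokens) -> Tokens:
        return tokens[:position] + [false_start] + tokens[position:]
    return apply


def compose(transforms: List[Transform]) -> Transform:
    """Chain transformations: each one rewrites the output of the previous."""
    def apply(tokens: Tokens) -> Tokens:
        return reduce(lambda acc, t: t(acc), transforms, list(tokens))
    return apply


# Positions refer to the utterance as it stands when each step is applied.
make_disfluent = compose([
    insert_revision(3, "leave"),   # i want to [leave] go home
    insert_repetition(0),          # [i] i want to leave go home
    insert_pause(5),               # i i want to leave [uh] go home
])

fluent = "i want to go home".split()
print(" ".join(make_disfluent(fluent)))  # i i want to leave uh go home
```

Because each transformation takes a token sequence and returns one, any number of them can be chained, and a learned model could replace the fixed positions without changing `compose`.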

This study has been realized under the ANR (French National Research Agency) project SynPaFlex ANR-15-CE23-0015.

Author information

Authors and Affiliations

Univ Rennes, CNRS, IRISA, 22300, Lannion, France
Raheel Qader, Gwénolé Lecorvé & Damien Lolive
Univ Rennes, Inria, CNRS, IRISA, 35000, Rennes, France
Pascale Sébillot

Corresponding author

Correspondence to Gwénolé Lecorvé.

Editor information

Editors and Affiliations

University of Mons, Mons, Belgium
Thierry Dutoit
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
University of Mons, Mons, Belgium
Gueorgui Pironkov

About this paper

Cite this paper

Qader, R., Lecorvé, G., Lolive, D., Sébillot, P. (2018). Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science, vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_4

DOI: https://doi.org/10.1007/978-3-030-00810-9_4
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00809-3
Online ISBN: 978-3-030-00810-9
eBook Packages: Computer Science, Computer Science (R0)
