Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis
This paper focuses on the understanding of the effects leading to high-quality HMM-based speech synthesis with various degrees of articulation. The adaptation of a neutral speech synthesizer to generate hypo and hyperarticulated speech is first performed. The impact of cepstral adaptation, of prosody, of phonetic transcription as well as the adaptation technique on the perceived degree of articulation is studied. For this, a subjective evaluation is conducted. It is shown that high-quality hypo and hyperarticulated speech synthesis requires the use of an efficient adaptation such as CMLLR. Moreover, in addition to prosody adaptation, the importance of cepstrum adaptation as well as the use of a Natural Language Processor able to generate realistic hypo and hyperarticulated phonetic transcriptions is assessed.
KeywordsSpeech Synthesis Voice Conversion Speech Synthesizer Phonetic Transcription Speaker Adaptation
Unable to display preview. Download preview PDF.
- 2.Beller, G.: Analyse et Modèle Génératif de l’Expressivité - Application à la Parole et à l’Interprétation Musicale, PhD Thesis, Universit Paris VI - Pierre et Marie Curie, IRCAM (2009) (in French)Google Scholar
- 3.Beller, G., Obin, N., Rodet, X.: Articulation Degree as a Prosodic Dimension of Expressive Speech. In: Fourth International Conference on Speech Prosody, Campinas, Brazil (2008)Google Scholar
- 4.Picart, B., Drugman, T., Dutoit, T.: Analysis and Synthesis of Hypo and Hyperarticulated Speech. In: Proc. Speech Synthesis Workshop 7 (SSW7), Kyoto, Japan (2010)Google Scholar
- 5.Picart, B., Drugman, T., Dutoit, T.: Continuous Control of the Degree of Articulation in HMM-based Speech Synthesis. In: Proc. Interspeech, Firenze, Italy (2011)Google Scholar
- 7.Yamagishi, J., Masuko, T., Kobayashi, T.: HMM-based expressive speech synthesis – Towards TTS with arbitrary speaking styles and emotions. In: Proc. of Special Workshop in Maui, SWIM (2004)Google Scholar
- 9.HMM-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp/
- 11.Drugman, T., Wilfart, G., Dutoit, T.: A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis. In: Proc. Interspeech, Brighton, U.K. (2009)Google Scholar
- 14.Ferguson, J.: Variable Duration Models for Speech. In: Proc. Symp. on the Application of Hidden Markov Models to Text and Speech, pp. 143–179 (1980)Google Scholar