Articulatory Speech Re-synthesis: Profiting from Natural Acoustic Speech Data
The quality of static phones (e.g., vowels, fricatives, nasals, laterals) generated by articulatory speech synthesizers has reached a high level in recent years. Our goal is to extend this quality to dynamic speech, i.e., whole syllables, words, and utterances, by re-synthesizing natural acoustic speech data. In re-synthesis, vocal tract action units (articulatory gestures), which describe the succession of speech movements, are adapted spatially and temporally to a natural speech signal produced by a "model speaker" of Standard German. This adaptation is performed with the software tool SAGA (Sound and Articulatory Gesture Alignment), which is currently under development in our lab. The resulting action unit scores are stored in a database and serve as input to our articulatory speech synthesizer. This technique is intended to form the basis of a future unit-selection articulatory speech synthesis.
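To make the temporal half of this adaptation concrete, the sketch below stretches a template action unit score so that chosen landmarks (for example, a closure and a release) coincide with the corresponding times measured in the natural signal. This is an illustrative assumption, not SAGA's actual method: the piecewise-linear warp, the `ActionUnit` class, the `warp_time` and `adapt_score` functions, and the landmark values are all hypothetical, and the spatial adaptation of articulatory targets is not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ActionUnit:
    """Hypothetical action unit: a named gesture with onset/offset in seconds."""
    name: str
    onset: float
    offset: float


def warp_time(t, src_marks, tgt_marks):
    """Piecewise-linear map from template time to natural-speech time.

    src_marks and tgt_marks are strictly increasing landmark lists of equal
    length (assumed to be set by hand or by an aligner). Times outside the
    landmark range are shifted with slope 1.
    """
    if t <= src_marks[0]:
        return tgt_marks[0] + (t - src_marks[0])
    if t >= src_marks[-1]:
        return tgt_marks[-1] + (t - src_marks[-1])
    # Find segment i with src_marks[i] <= t < src_marks[i + 1].
    i = bisect_right(src_marks, t) - 1
    s0, s1 = src_marks[i], src_marks[i + 1]
    g0, g1 = tgt_marks[i], tgt_marks[i + 1]
    return g0 + (t - s0) * (g1 - g0) / (s1 - s0)


def adapt_score(score, src_marks, tgt_marks):
    """Return a new score whose unit boundaries are warped onto the target timing."""
    return [
        ActionUnit(u.name,
                   warp_time(u.onset, src_marks, tgt_marks),
                   warp_time(u.offset, src_marks, tgt_marks))
        for u in score
    ]


if __name__ == "__main__":
    template = [ActionUnit("lip closure", 0.02, 0.10),
                ActionUnit("vowel", 0.10, 0.30)]
    # Hypothetical landmarks: template vs. measured in the natural signal.
    adapted = adapt_score(template, [0.0, 0.10, 0.30], [0.0, 0.08, 0.35])
    for unit in adapted:
        print(f"{unit.name}: {unit.onset:.3f}-{unit.offset:.3f} s")
```

A piecewise-linear warp keeps the order of units intact and lets each interval between landmarks stretch independently, which is one simple way to fit a template score to the segment durations of a particular utterance.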
Keywords: speech, articulatory speech synthesis, articulation, re-synthesis, vocal tract action units