Abstract
The quality of static phones (e.g. vowels, fricatives, nasals, laterals) generated by articulatory speech synthesizers has reached a high level in the last years. Our goal is to expand this high quality to dynamic speech, i.e. whole syllables, words, and utterances by re-synthesizing natural acoustic speech data. Re-synthesis means that vocal tract action units or articulatory gestures, describing the succession of speech movements, are adapted spatio-temporally with respect to a natural speech signal produced by a natural “model speaker” of Standard German. This adaptation is performed using the software tool SAGA (Sound and Articulatory Gesture Alignment) that is currently under development in our lab. The resulting action unit scores are stored in a database and serve as input for our articulatory speech synthesizer. This technique is designed to be the basis for a unit selection articulatory speech synthesis in the future.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Adams, S.G., Weismer, G., Kent, R.D.: Speaking Rate and Speech Movement Velocity Profiles. Journal of Speech and Hearing Research 36, 41–54 (1993)
Badin, P., Bailly, G., Revéret, L., Baciu, M., Segebarth, C., Savariaux, C.: Three-Dimensional Linear Articulatory Modeling of Tongue, Lips and Face, Based on MRI and Video Images. Journal of Phonetics 30, 533–553 (2002)
Birkholz, P.: 3D Artikulatorische Sprachsynthese. Ph.D Thesis, Rostock (2005)
Birkholz, P., Kröger, B.J.: Vocal Tract Model Adaptation Using Magnetic Resonance Imaging. In: Proceedings of the 7th International Seminar on Speech Production, Belo Horizonte, Brazil, pp. 493–500 (2006)
Birkholz, P., Jackel, D., Kröger, B.J.: Simulation of losses due to turbulence in the time-varying vocal system. IEEE Transactions on Audio, Speech, and Language Processing 15, 1218–1225 (2007)
Birkholz, P., Jackèl, D., Kröger, B.J.: Construction and Control of a Three-Dimensional Vocal Tract Model. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, pp. 873–876 (2006)
Birkholz, P., Steiner, I., Breuer, S.: Control Concepts for Articulatory Speech Synthesis. In: Sixth ISCA Workshop on Speech Synthesis, Bonn, Germany, pp. 5–10 (2007)
Dang, J., Honda, K.: Estimation of vocal tract shapes from speech sounds with a physiological articulatory model. Journal of Phonetics 30, 511–532 (2002)
Deterding, D., Nolan, F.: Aspiration and Voicing of Chinese and English Plosives. In: Proceedings of the ICPhS XVI, Saarbrücken, pp. 385–388 (2007)
Draper, M.H., Ladefoged, P., Whiteridge, D.: Respiratory Muscles in Speech. Journal of Speech and Hearing Research 2, 16–27 (1959)
Engwall, O.: Articulatory Synthesis Using Corpus-Based Estimation of Line Spectrum Pairs. In: Proceedings of Interspeech, Lisbon, Portugal (2005)
Horiguchi, S., Bell-Berti, F.: The Velotrace: A Device for Monitoring Velar Position. Cleft Palate Journal 24(2), 104–111 (1987)
Kröger, B.J.: A gestural production model and its application to reduction in German. Phonetica 50, 213–233 (1993)
Kröger, B.J., Birkholz, P.: A Gesture-Based Concept for Speech Movement Control in Articulatory Speech Synthesis. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds.) COST Action 2102. LNCS (LNAI), vol. 4775, pp. 174–189. Springer, Heidelberg (2007)
Kröger, B.J., Schröder, G., Opgen-Rhein, C.: A gesture-based dynamic mo¬del describing articulatory movement data. Journal of the Acoustical Society of America 98, 1878–1889 (1995)
Levelt, W.J.M., Roelofs, A., Meyer, A.S.: A Theory of Lexical Access in Speech Production. Behav. Brain Sci. 22, 1–38 (1999)
Levelt, W.J.M., Wheeldon, L.: Do Speakers Have Access to a Mental Syllabary? Cognition 50, 239–269 (1994)
Löfqvist, A.: Lip Kinematics in Long and Short Stop and Fricative Consonants. J. Acoust. Soc. A. 117(2), 858–878 (2005)
Löfqvist, A., Gracco, V.L.: Lip and Jaw Kinematics in Bilabial Stop Consonant Production. Journal of Speech, Language, and Hearing Research 40, 877–893 (1997)
Löfqvist, A., Yoshioka, H.: Laryngeal Activity in Swedish Obstruent Clusters. J. Acoust. Soc. Am. 68(3), 792–801 (1980)
Moll, K.L., Daniloff, R.G.: Investigation of the Timinig of Velar Movements during Speech. JASA 50(2), 678–684 (1971)
Wrench, A.: An Investigation of Sagittal Velar Movements and its Correlation with Lip, Tongue and Jaw Movement. In: Proceedings of the ICPhS, San Francisco, pp. 435–438 (1999)
Yoshioka, H., Löfqvist, A., Hirose, H.: Laryngeal adjustments in the production of consonant clusters and geminates in American English. J. Acoust. Soc. Am. 70(6), 1615–1623 (1981)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bauer, D., Kannampuzha, J., Kröger, B.J. (2009). Articulatory Speech Re-synthesis: Profiting from Natural Acoustic Speech Data. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science(), vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_32
Download citation
DOI: https://doi.org/10.1007/978-3-642-03320-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03319-3
Online ISBN: 978-3-642-03320-9
eBook Packages: Computer ScienceComputer Science (R0)