Articulatory Speech Re-synthesis: Profiting from Natural Acoustic Speech Data

Bauer, Dominik; Kannampuzha, Jim; Kröger, Bernd J.

doi:10.1007/978-3-642-03320-9_32

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)

Abstract

The quality of static phones (e.g. vowels, fricatives, nasals, laterals) generated by articulatory speech synthesizers has reached a high level in recent years. Our goal is to extend this quality to dynamic speech, i.e. whole syllables, words, and utterances, by re-synthesizing natural acoustic speech data. Re-synthesis means that vocal tract action units, or articulatory gestures, which describe the succession of speech movements, are adapted spatio-temporally to a natural speech signal produced by a “model speaker” of Standard German. This adaptation is performed with the software tool SAGA (Sound and Articulatory Gesture Alignment), which is currently under development in our lab. The resulting action unit scores are stored in a database and serve as input for our articulatory speech synthesizer. The technique is intended as the basis for unit-selection articulatory speech synthesis in the future.

Author information

Authors and Affiliations

Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany
Dominik Bauer, Jim Kannampuzha & Bernd J. Kröger

Editor information

Editors and Affiliations

Department of Psychology, Second University of Naples, and IIASS, Via G. Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, 182 52, Prague 8, Czech Republic
Robert Vích

About this paper

Cite this paper

Bauer, D., Kannampuzha, J., Kröger, B.J. (2009). Articulatory Speech Re-synthesis: Profiting from Natural Acoustic Speech Data. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science, vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_32

DOI: https://doi.org/10.1007/978-3-642-03320-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03319-3
Online ISBN: 978-3-642-03320-9
eBook Packages: Computer Science, Computer Science (R0)