Articulatory Speech Synthesis from Static Context-Aware Articulatory Targets

  • Anastasiia Tsukanova
  • Benjamin Elie
  • Yves Laprie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10733)

Abstract

The aim of this work is to develop an algorithm for controlling the articulators (the jaw, the tongue, the lips, the velum, the larynx and the epiglottis) to produce given speech sounds, syllables and phrases. This control has to take coarticulation into account and be flexible enough to vary strategies for speech production. The data for the algorithm are 97 static MRI images capturing the articulation of French vowels and blocked consonant-vowel syllables. The results of the synthesis are evaluated visually, acoustically and perceptually, and the problems encountered are broken down by their origin: the dataset, its modeling, the algorithm for managing the vocal tract shapes, their translation into area functions, and the acoustic simulation. We conclude that, among our test examples, the articulatory strategies for vowels and stops are the most accurate, followed by those for nasals and fricatives. Improving timing strategies with dynamic data is suggested as an avenue for future work.
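To make the target-plus-coarticulation idea concrete, the sketch below shows one way such a pipeline could be organized in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the seven articulatory parameters, the target values, the two-vowel inventory and the cosine transition are all invented for this example, and the mapping from vocal tract shapes to area functions and the acoustic simulation are omitted.

```python
import numpy as np

# Hypothetical articulatory parameter vector (jaw, tongue body, tongue
# dorsum, tongue tip, lip aperture, lip protrusion, velum). The names
# and all numeric values below are illustrative, not the paper's data.
PARAMS = ["jaw", "tongue_body", "tongue_dorsum", "tongue_tip",
          "lip_aperture", "lip_protrusion", "velum"]

# Context-aware target table: a consonant target is keyed on the vowel
# that follows it, mirroring the blocked CV captures; vowels have a
# single context-free target (key None).
TARGETS = {
    ("a", None): np.array([1.0, -0.5, 0.2, 0.0, 0.8, 0.0, 0.0]),
    ("i", None): np.array([-0.5, 1.0, 0.8, 0.3, 0.6, 0.0, 0.0]),
    ("k", "a"): np.array([0.5, 0.4, 1.2, 0.0, 0.8, 0.0, 0.0]),
    ("k", "i"): np.array([-0.2, 0.9, 1.3, 0.2, 0.6, 0.0, 0.0]),
}

VOWELS = {"a", "i"}


def target_for(phone, next_vowel):
    """Pick the static target matching the vocalic context, falling back
    to the context-free entry when no context-specific capture exists."""
    return TARGETS.get((phone, next_vowel), TARGETS[(phone, None)])


def trajectory(phones, frames_per_transition=20):
    """Cosine interpolation between successive static targets: a crude
    stand-in for a proper coarticulation model."""
    targets = []
    for idx, ph in enumerate(phones):
        nxt = next((p for p in phones[idx + 1:] if p in VOWELS), None)
        targets.append(target_for(ph, nxt))
    frames = []
    for a, b in zip(targets, targets[1:]):
        for t in np.linspace(0.0, 1.0, frames_per_transition):
            w = 0.5 - 0.5 * np.cos(np.pi * t)  # smooth onset and offset
            frames.append((1.0 - w) * a + w * b)
    return np.array(frames)


traj = trajectory(["k", "a", "k", "i"])
print(traj.shape)  # (60, 7): per-frame articulatory parameters
assert traj.shape[1] == len(PARAMS)
```

In the paper itself, the targets come from the 97 static MRI captures, consonant targets depend on the vocalic context (hence the blocked CV syllables), and the resulting vocal tract shapes are converted into area functions before acoustic simulation; the sketch stops at the parameter trajectory.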

Keywords

Articulatory synthesis · Coarticulation · Articulatory gestures

Acknowledgments

The data collection for this work benefited from the support of the ArtSpeech project of the ANR (Agence Nationale de la Recherche), France.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Anastasiia Tsukanova (1)
  • Benjamin Elie (2)
  • Yves Laprie (1)

  1. Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
  2. L2S, CentraleSupélec, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
