Trajectory Mixture Density Networks with Multiple Mixtures for Acoustic-Articulatory Inversion

  • Korin Richmond
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4885)


We have previously proposed a trajectory model based on a mixture density network (MDN) trained with target variables augmented with dynamic features, together with an algorithm for estimating maximum likelihood trajectories which respects the constraints between those features. In this paper, we extend that model to allow diagonal covariance matrices and multiple mixture components in the trajectory MDN's output probability density functions. We evaluate the extended model on an inversion mapping task and find the trajectory model works well, outperforming low-pass filtering of equivalent trajectories. Increasing the number of mixture components in the TMDN improves results further.
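The trajectory estimation the abstract refers to can be illustrated with a minimal sketch: given per-frame Gaussian means and variances over static and delta features, the maximum-likelihood static trajectory is the solution of a weighted least-squares problem that respects the delta constraints. This is a simplified single-mixture, single-dimension NumPy sketch using a central-difference delta; the function name and details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mlpg_single_mixture(means, variances, T):
    """Maximum-likelihood static trajectory from per-frame Gaussians
    over static + delta features (single mixture component, 1-D).

    means, variances: (T, 2) arrays -- column 0 static, column 1 delta.
    Returns the static trajectory c (length T) maximising the likelihood
    subject to delta_t = (c_{t+1} - c_{t-1}) / 2.
    """
    # W stacks the identity (static features) on a central-difference
    # operator (delta features), so that [c; delta(c)] = W c.
    I = np.eye(T)
    D = np.zeros((T, T))
    for t in range(1, T - 1):
        D[t, t - 1], D[t, t + 1] = -0.5, 0.5
    W = np.vstack([I, D])                                   # (2T, T)

    mu = np.concatenate([means[:, 0], means[:, 1]])         # (2T,)
    prec = 1.0 / np.concatenate([variances[:, 0], variances[:, 1]])

    # Normal equations (W' P W) c = W' P mu of the weighted
    # least-squares problem; P is the diagonal precision matrix.
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

With multiple mixture components, as in the paper, one would additionally select or marginalise over components per frame before (or while) solving this system; the sketch above covers only the single-Gaussian core. Note that when the delta variances are very large (uninformative deltas), the solution collapses to the static means, while tight delta variances smooth the trajectory.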


Keywords: Root Mean Square Error, Gaussian Mixture Model, Mixture Component, Automatic Speech Recognition, Trajectory Model





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Korin Richmond
  1. Centre for Speech Technology Research, Edinburgh University, Edinburgh, United Kingdom
