
Trajectory Representations and Acoustic Descriptions for a Segment-Modelling Approach to Automatic Speech Recognition


Part of the book series: NATO ASI Series (NATO ASI F, volume 169)

Summary

This paper discusses some possibilities for modelling speech segment trajectories in a domain that is more directly correlated with the mechanisms of speech production than the usual mel-cepstrum representation. Initial steps are described towards using linear dynamic segmental HMMs [12] to model underlying (unobserved) trajectories of features that closely reflect the nature of articulation. So far, this work has involved calculating segment probabilities with an approach that differs from the one used in earlier studies (e.g. [4]) and is more consistent with treating the trajectory as unobserved. In parallel, experiments have shown that formant features can be useful for HMM-based automatic speech recognition [3].
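One way to make "treating the trajectory as unobserved" concrete is to integrate the trajectory parameters out of the segment probability, instead of scoring only the single most likely trajectory. The sketch below contrasts the two scores. It assumes a one-dimensional feature, a linear trajectory m + c·(t − t̄) with the time axis centred on the segment midpoint, independent Gaussian priors on the midpoint value m and slope c, and independent Gaussian observation noise of variance s2. These modelling choices and all function names are illustrative assumptions, not the chapter's own formulation.

```python
import math


def log_normal(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def centred_times(T):
    """Frame times 1..T shifted so they sum to zero (segment midpoint at 0)."""
    tbar = (T + 1) / 2.0
    return [t - tbar for t in range(1, T + 1)]


def posterior_precisions(T, v_m, v_c, s2):
    """Precisions of (m, c) given T frames; diagonal because centred times sum to zero."""
    S = sum(u * u for u in centred_times(T))
    return 1.0 / v_m + T / s2, 1.0 / v_c + S / s2


def segment_loglik_marginal(y, mu_m, v_m, mu_c, v_c, s2):
    """log p(y) with trajectory parameters (m, c) integrated out (unobserved trajectory)."""
    T = len(y)
    u = centred_times(T)
    M11, M22 = posterior_precisions(T, v_m, v_c, s2)
    r = [yt - (mu_m + mu_c * ut) for yt, ut in zip(y, u)]  # residual from mean trajectory
    a0 = sum(r)
    a1 = sum(ut * rt for ut, rt in zip(u, r))
    # Woodbury identity for r^T C^{-1} r, where C = s2*I + A V A^T
    quad = sum(rt * rt for rt in r) / s2 - (a0 * a0 / M11 + a1 * a1 / M22) / s2 ** 2
    # Matrix determinant lemma for log|C|
    logdet = T * math.log(s2) + math.log(v_m * v_c) + math.log(M11 * M22)
    return -0.5 * (T * math.log(2.0 * math.pi) + logdet + quad)


def segment_loglik_best_trajectory(y, mu_m, v_m, mu_c, v_c, s2):
    """Joint log p(y, m*, c*) at the most likely trajectory (maximise, not integrate)."""
    T = len(y)
    u = centred_times(T)
    M11, M22 = posterior_precisions(T, v_m, v_c, s2)
    m = (mu_m / v_m + sum(y) / s2) / M11
    c = (mu_c / v_c + sum(ut * yt for ut, yt in zip(u, y)) / s2) / M22
    ll = log_normal(m, mu_m, v_m) + log_normal(c, mu_c, v_c)
    ll += sum(log_normal(yt, m + c * ut, s2) for yt, ut in zip(y, u))
    return ll, (m, c)


if __name__ == "__main__":
    y = [1.0, 1.2, 1.5, 1.7, 2.1]  # toy feature values for one segment
    params = dict(mu_m=1.4, v_m=0.5, mu_c=0.2, v_c=0.05, s2=0.01)
    print("marginal:", segment_loglik_marginal(y, **params))
    best, (m, c) = segment_loglik_best_trajectory(y, **params)
    print("best trajectory:", best, "m* =", m, "c* =", c)
```

Under these all-Gaussian assumptions the two scores differ by exactly log 2π − ½ log(M11·M22), a term that depends on the segment length and the model variances but not on the observed values, so the choice between integrating and maximising matters mainly when comparing segments of different durations or models with different trajectory variances.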

References

  1. L. Deng, G. Ramsay, and H. Sameti. From modeling surface phenomena to modeling mechanisms: Towards a faithful model of the speech process aiming at speech recognition. In Proc. IEEE Automatic Speech Recognition Workshop, pages 183–184, Snowbird, 1995.

  2. M. J. Gales and S. J. Young. Segmental hidden Markov models. In EUROSPEECH, pages 1611–1614, Berlin, 1993.

  3. J. N. Holmes, W. J. Holmes, and R. N. Garner. Using formant frequencies in speech recognition. In EUROSPEECH, Rhodes, 1997.

  4. W. J. Holmes and M. J. Russell. Linear dynamic segmental HMMs: Variability representation and training procedure. In ICASSP, pages 1399–1402, Munich, 1997.

  5. A. Hu and E. Barnard. Smoothness analysis for trajectory features. In ICASSP, pages 979–982, Munich, 1997.

  6. R. K. Moore. Signal decomposition using Markov modelling techniques. RSRE Memo 3931, RSRE, Malvern, UK, 1986.

  7. R. K. Moore. Twenty things we still don’t know about speech. In Proceedings CRIM/FORWISS Workshop on Progress and Prospects of Speech Research Technology, 1994.

  8. M. Ostendorf, V. V. Digalakis, and O. A. Kimball. From HMM’s to segment models: A unified view of stochastic modeling for speech recognition. IEEE Trans. Speech and Audio Processing, 4 (5): 360–378, 1996.

  9. G. Ramsay and L. Deng. Maximum-likelihood estimation for articulatory speech recognition using a stochastic target model. In EUROSPEECH, pages 1401–1404, Madrid, 1995.

  10. H. B. Richards, J. S. Bridle, M. J. Hunt, and J. S. Mason. Vocal tract shape trajectory estimation using MLP analysis-by-synthesis. In ICASSP, pages 1287–1290, Munich, 1997.

  11. M. J. Russell. Advances in speech recognition. In Proceedings: Institute of Acoustics, Vol. 18: Part 9, pages 267–274, 1996.

  12. M. J. Russell and W. J. Holmes. Linear trajectory segmental HMM’s. IEEE Signal Processing Letters, 4 (3): 72–74, 1997.

  13. R. Schmid and E. Barnard. Explicit, N-best formant features for vowel classification. In ICASSP, pages 991–994, Munich, 1997.

  14. M. J. Tomlinson, M. J. Russell, R. K. Moore, A. R. Buckland, and M. A. Fawley. Modelling asynchrony in speech using elementary single-signal decomposition. In ICASSP, pages 1247–1250, Munich, 1997.

  15. L. Welling and H. Ney. A model for efficient formant estimation. In ICASSP, pages 797–800, Atlanta, 1996.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Holmes, W.J. (1999). Trajectory Representations and Acoustic Descriptions for a Segment-Modelling Approach to Automatic Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_18

  • DOI: https://doi.org/10.1007/978-3-642-60087-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-64250-0

  • Online ISBN: 978-3-642-60087-6

  • eBook Packages: Springer Book Archive