Summary
This paper discusses some of the possibilities for modelling speech segment trajectories in a domain which is more directly correlated with the mechanisms of speech production than the typical mel-cepstrum representation. Initial developments are described towards using linear dynamic segmental HMMs [12] to model underlying (unobserved) trajectories of features which closely reflect the nature of articulation. So far, this work has involved calculating segment probabilities using an approach which is different from that used in earlier studies (e.g. [4]), and is more consistent with the idea of treating the trajectory as unobserved. In parallel, experiments have demonstrated that formant features can be useful for HMM-based automatic speech recognition [3].
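As a rough illustration of what treating the trajectory as unobserved can mean when computing a segment probability, the sketch below integrates out the trajectory midpoint instead of fixing it at a best-fitting value. It is not taken from the chapter. It assumes a one-dimensional feature, a linear trajectory with a fixed slope, a Gaussian distribution over the segment midpoint (variance `var_extra`), and independent Gaussian frame noise about the trajectory (variance `var_intra`). The function name and all parameter names are hypothetical.

```python
import numpy as np

def segment_log_likelihood(y, mu, slope, var_extra, var_intra):
    """Log p(y) for a 1-D segment with the unobserved trajectory midpoint integrated out.

    Assumed model (illustrative only): y_t = c + slope * t + e_t, with
    c ~ N(mu, var_extra) drawn once per segment and e_t ~ N(0, var_intra) per frame.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    t = np.arange(T) - (T - 1) / 2.0   # frame times centred on the segment midpoint
    r = y - (mu + slope * t)           # residuals about the mean trajectory
    # Marginal covariance is var_intra * I + var_extra * 1 1^T, which has a
    # closed-form inverse and determinant, so no matrix factorisation is needed.
    total = var_intra + T * var_extra
    quad = (r @ r) / var_intra - var_extra * r.sum() ** 2 / (var_intra * total)
    log_det = (T - 1) * np.log(var_intra) + np.log(total)
    return -0.5 * (T * np.log(2 * np.pi) + log_det + quad)

# Example call on a four-frame segment with made-up parameter values.
print(segment_log_likelihood([1.0, 1.4, 2.1, 2.3], mu=1.5, slope=0.4,
                             var_extra=0.5, var_intra=0.3))
```

Under these assumptions the frames of a segment are jointly Gaussian and correlated through the shared midpoint, so the segment probability does not factor into independent per-frame terms.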
References
[1] L. Deng, G. Ramsay, and H. Sameti. From modeling surface phenomena to modeling mechanisms: Towards a faithful model of the speech process aiming at speech recognition. In Proc. IEEE Automatic Speech Recognition Workshop, pages 183–184, Snowbird, 1995.
[2] M. J. Gales and S. J. Young. Segmental hidden Markov models. In EUROSPEECH, pages 1611–1614, Berlin, 1993.
[3] J. N. Holmes, W. J. Holmes, and P. N. Garner. Using formant frequencies in speech recognition. In EUROSPEECH, Rhodes, 1997.
[4] W. J. Holmes and M. J. Russell. Linear dynamic segmental HMMs: Variability representation and training procedure. In ICASSP, pages 1399–1402, Munich, 1997.
[5] A. Hu and E. Barnard. Smoothness analysis for trajectory features. In ICASSP, pages 979–982, Munich, 1997.
[6] R. K. Moore. Signal decomposition using Markov modelling techniques. RSRE Memo 3931, RSRE, Malvern, UK, 1986.
[7] R. K. Moore. Twenty things we still don’t know about speech. In Proceedings CRIM/FORWISS Workshop on Progress and Prospects of Speech Research Technology, 1994.
[8] M. Ostendorf, V. V. Digalakis, and O. A. Kimball. From HMM’s to segment models: A unified view of stochastic modeling for speech recognition. IEEE Trans. Speech and Audio Processing, 4(5):360–378, 1996.
[9] G. Ramsay and L. Deng. Maximum-likelihood estimation for articulatory speech recognition using a stochastic target model. In EUROSPEECH, pages 1401–1404, Madrid, 1995.
[10] H. B. Richards, J. S. Bridle, M. J. Hunt, and J. S. Mason. Vocal tract shape trajectory estimation using MLP analysis-by-synthesis. In ICASSP, pages 1287–1290, Munich, 1997.
[11] M. J. Russell. Advances in speech recognition. In Proceedings: Institute of Acoustics, Vol. 18: Part 9, pages 267–274, 1996.
[12] M. J. Russell and W. J. Holmes. Linear trajectory segmental HMM’s. IEEE Signal Processing Letters, 4(3):72–74, 1997.
[13] P. Schmid and E. Barnard. Explicit, N-best formant features for vowel classification. In ICASSP, pages 991–994, Munich, 1997.
[14] M. J. Tomlinson, M. J. Russell, R. K. Moore, A. P. Buckland, and M. A. Fawley. Modelling asynchrony in speech using elementary single-signal decomposition. In ICASSP, pages 1247–1250, Munich, 1997.
[15] L. Welling and H. Ney. A model for efficient formant estimation. In ICASSP, pages 797–800, Atlanta, 1996.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Holmes, W.J. (1999). Trajectory Representations and Acoustic Descriptions for a Segment-Modelling Approach to Automatic Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_18
DOI: https://doi.org/10.1007/978-3-642-60087-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive