Abstract
This chapter deals with estimating and tracking the movements of the spectral resonances of the human vocal tract, known as formants. Representing or modeling speech in terms of formants is useful in several areas of speech processing — coding, recognition, synthesis, and enhancement — because formants describe essential aspects of speech with a very small set of parameters. However, estimating formants is more difficult than simply searching for peaks in an amplitude spectrum, because the spectral peaks of the vocal-tract output depend in complicated ways on a variety of factors: vocal-tract shape, excitation, and periodicity. We describe the task of formant tracking in detail, explore its successes and difficulties, and explain the motivations behind the various approaches.
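The abstract's claim that formant estimation is harder than spectral peak-picking is commonly illustrated with the classic linear-prediction (LP) approach: fit an all-pole model to a speech frame and read formant candidates off the angles of the LP polynomial roots, discarding broad-bandwidth poles. The sketch below is an illustration of that general technique, not the chapter's own algorithm; the function names, the order-10 model, and the 400 Hz bandwidth threshold are assumptions chosen for the example.

```python
import numpy as np

def lpc(x, order):
    """LP coefficients by the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1 : 0 : -1]) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formant_candidates(frame, fs, order=10, max_bw=400.0):
    """Formant candidates from the roots of the LP polynomial."""
    a = lpc(frame, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.0]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi  # 3 dB bandwidth estimate
    # keep only sharp resonances away from 0 Hz and Nyquist, by frequency
    return sorted(f for f, b in zip(freqs, bws)
                  if b < max_bw and 90.0 < f < fs / 2 - 90.0)

def two_pole_impulse_response(f, bw, fs, n):
    """Impulse response of a digital resonator at f Hz with bandwidth bw."""
    r = np.exp(-np.pi * bw / fs)
    c = 2.0 * r * np.cos(2.0 * np.pi * f / fs)
    y = np.zeros(n)
    y[0] = 1.0
    for i in range(1, n):
        y[i] += c * y[i - 1]
        if i >= 2:
            y[i] -= r * r * y[i - 2]
    return y

# Synthetic vowel-like signal: cascade of resonators at 700 Hz and 1200 Hz.
fs = 8000
sig = np.convolve(two_pole_impulse_response(700.0, 80.0, fs, 400),
                  two_pole_impulse_response(1200.0, 100.0, fs, 400))[:400]
print(formant_candidates(sig, fs))  # candidates near 700 Hz and 1200 Hz
```

On real speech this simple scheme exhibits exactly the difficulties the chapter discusses: formants merge, the excitation harmonics bias the root locations, and the pole-to-formant assignment must be tracked over time rather than decided frame by frame.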
Abbreviations
- ASR: automatic speech recognition
- CZT: chirp z-transform
- DFT: discrete Fourier transform
- DP: dynamic programming
- FFT: fast Fourier transform
- FT: Fourier transform
- LP: linear prediction
- LPC: linear prediction coefficients; linear predictive coding
- MFCC: mel-frequency cepstral coefficient
- MSE: mean-square error
- STFT: short-time Fourier transform
- TTS: text-to-speech
- VT: vocal tract
- VTR: vocal tract resonance
© 2008 Springer-Verlag Berlin Heidelberg
Cite this chapter
OʼShaughnessy, D. (2008). Formant Estimation and Tracking. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9