Abstract
This chapter deals with estimating and tracking the movements of the spectral resonances of the human vocal tract, known as formants. Representing or modeling speech in terms of formants is useful in several areas of speech processing — coding, recognition, synthesis, and enhancement — because formants describe essential aspects of speech with a very small set of parameters. However, estimating formants is more difficult than simply searching for peaks in an amplitude spectrum, because the spectral peaks of the vocal-tract output depend in complicated ways on a variety of factors: vocal-tract shape, excitation, and periodicity. We describe the task of formant tracking in detail, explore its successes and difficulties, and explain the motivations behind the various approaches.
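The abstract's claim that formant estimation is harder than spectral peak-picking is commonly illustrated with the classic linear-prediction (LP) approach: fit an all-pole model to a speech frame and read formant candidates off the angles of the LP polynomial roots, discarding broad-bandwidth poles. The sketch below is an illustration of that general technique, not the chapter's own algorithm; the function names, the order-10 model, and the 400 Hz bandwidth threshold are assumptions chosen for the example.

```python
import numpy as np

def lpc(x, order):
    """LP coefficients by the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1 : 0 : -1]) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formant_candidates(frame, fs, order=10, max_bw=400.0):
    """Formant candidates from the roots of the LP polynomial."""
    a = lpc(frame, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.0]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi  # 3 dB bandwidth estimate
    # keep only sharp resonances away from 0 Hz and Nyquist, by frequency
    return sorted(f for f, b in zip(freqs, bws)
                  if b < max_bw and 90.0 < f < fs / 2 - 90.0)

def two_pole_impulse_response(f, bw, fs, n):
    """Impulse response of a digital resonator at f Hz with bandwidth bw."""
    r = np.exp(-np.pi * bw / fs)
    c = 2.0 * r * np.cos(2.0 * np.pi * f / fs)
    y = np.zeros(n)
    y[0] = 1.0
    for i in range(1, n):
        y[i] += c * y[i - 1]
        if i >= 2:
            y[i] -= r * r * y[i - 2]
    return y

# Synthetic vowel-like signal: cascade of resonators at 700 Hz and 1200 Hz.
fs = 8000
sig = np.convolve(two_pole_impulse_response(700.0, 80.0, fs, 400),
                  two_pole_impulse_response(1200.0, 100.0, fs, 400))[:400]
print(formant_candidates(sig, fs))  # candidates near 700 Hz and 1200 Hz
```

On real speech this simple scheme exhibits exactly the difficulties the chapter discusses: formants merge, the excitation harmonics bias the root locations, and the pole-to-formant assignment must be tracked over time rather than decided frame by frame.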
Abbreviations
- ASR: automatic speech recognition
- CZT: chirp z-transform
- DFT: discrete Fourier transform
- DP: dynamic programming
- FFT: fast Fourier transform
- FT: Fourier transform
- LP: linear prediction
- LPC: linear prediction coefficients; linear predictive coding
- MFCC: mel-frequency cepstral coefficient
- MSE: mean-square error
- STFT: short-time Fourier transform
- TTS: text-to-speech
- VT: vocal tract
- VTR: vocal tract resonance
© 2008 Springer-Verlag Berlin Heidelberg
Cite this chapter
OʼShaughnessy, D. (2008). Formant Estimation and Tracking. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9