Part of the book series: Springer Handbooks ((SHB))

Abstract

This chapter deals with the estimation and tracking of the movements of the spectral resonances of the human vocal tract, known as formants. Representing or modeling speech in terms of formants is useful in several areas of speech processing (coding, recognition, synthesis, and enhancement), because formants efficiently describe essential aspects of speech with a very small set of parameters. However, estimating formants is more difficult than simply searching for peaks in an amplitude spectrum, because the spectral peaks of the vocal-tract output depend in complicated ways on a variety of factors: vocal-tract shape, excitation, and periodicity. We describe the task of formant tracking in detail, explore its successes and difficulties, and explain the motivations behind the various approaches.
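The distinction the abstract draws, between raw spectral peaks and model-based resonance estimates, is commonly illustrated with linear-prediction (LP) analysis: fit an all-pole model to a short frame and take formant candidates from the angles of the complex roots of the LP polynomial. The sketch below is a minimal, self-contained illustration of that standard root-finding approach, not the chapter's own algorithm; the synthetic test signal, model order, and magnitude threshold are assumptions chosen for the demo.

```python
import numpy as np

def levinson(R, order):
    """Levinson-Durbin recursion: autocorrelation R[0..order] -> LP polynomial [1, a1, ..., ap]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = R[0]
    for i in range(1, order + 1):
        # reflection coefficient from the prediction error of the order-(i-1) model
        acc = R[i] + np.dot(a[1:i], R[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[1:i][::-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def lp_formant_candidates(frame, fs, order=10):
    """Formant candidates = angles of high-magnitude complex roots of the LP polynomial."""
    w = frame * np.hamming(len(frame))
    full = np.correlate(w, w, mode="full")
    R = full[len(w) - 1:len(w) + order]           # R[0..order]
    a = levinson(R, order)
    cands = []
    for z in np.roots(a):
        if z.imag > 0 and abs(z) > 0.8:           # one of each conjugate pair, near unit circle
            f = np.angle(z) * fs / (2.0 * np.pi)  # pole angle -> frequency in Hz
            if 90.0 < f < fs / 2.0 - 90.0:        # discard near-DC and near-Nyquist poles
                cands.append(f)
    return sorted(cands)

# Synthetic vowel-like test signal: an impulse train (80 Hz pitch) driving an
# all-pole filter with three resonances.  The formant/bandwidth values below
# are illustrative assumptions, not taken from the chapter.
fs = 8000
true_formants = [(700, 80), (1200, 100), (2500, 150)]
a_true = np.array([1.0])
for f, bw in true_formants:
    r = np.exp(-np.pi * bw / fs)                  # pole radius from bandwidth
    theta = 2.0 * np.pi * f / fs                  # pole angle from frequency
    a_true = np.convolve(a_true, [1.0, -2.0 * r * np.cos(theta), r * r])

n = 2000
e = np.zeros(n)
e[::100] = 1.0                                    # glottal-like impulse train
s = np.zeros(n)
for t in range(n):                                # direct-form all-pole synthesis
    acc = e[t]
    for kk in range(1, len(a_true)):
        if t - kk >= 0:
            acc -= a_true[kk] * s[t - kk]
    s[t] = acc

formants = lp_formant_candidates(s[400:800], fs)  # one 50 ms analysis frame
print([round(f) for f in formants])
```

On real speech, raw per-frame candidates like these are noisy (poles may model harmonics or merge for close formants), which is why trackers impose continuity constraints across frames, e.g., via dynamic programming.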


Abbreviations

ASR: automatic speech recognition

CZT: chirp z-transform

DFT: discrete Fourier transform

DP: dynamic programming

FFT: fast Fourier transform

FT: Fourier transform

LP: linear prediction

LPC: linear predictive coding (also: linear prediction coefficients)

MFCC: mel-filter cepstral coefficient

MSE: mean-square error

STFT: short-time Fourier transform

TTS: text-to-speech

VT: vocal tract

VTR: vocal tract resonance


Author information

Correspondence to Prof. Douglas OʼShaughnessy.


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

OʼShaughnessy, D. (2008). Formant Estimation and Tracking. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_11


  • DOI: https://doi.org/10.1007/978-3-540-49127-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
