Speech Enhancement: Application of the Kalman Filter in the Estimate-Maximize (EM) Framework

Part of the book series: Signals and Communication Technology (SCT)

Abstract

The application of the Kalman filter to the single-microphone speech enhancement task is presented in this chapter. Among the numerous published algorithms, an important sub-group employs the estimate-maximize (EM) procedure to iteratively estimate the spectral parameters of the speech and noise signals. We elaborate on a specific member of this sub-group. In the E-step, the Kalman smoother is applied, and in the M-step, a non-standard Yule-Walker equation set is solved. An approximated EM algorithm is derived by applying the gradient-descent method to the likelihood function, yielding a sequential, computationally efficient algorithm. It is then shown that the sequential parameter estimation can be replaced by a Kalman filter to obtain a dual Kalman filter for the speech and the parameters. A natural generalization of the dual scheme is an estimation scheme in which both speech and parameters are jointly estimated by applying a nonlinear extension of the Kalman filter, namely the unscented Kalman filter. An extensive experimental study, using real speech and noise signals, is provided to compare the proposed methods with alternative speech enhancement algorithms. Kalman filter based algorithms are shown to maintain the natural speech quality. However, their noise reduction ability is limited.
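
To make the state-space idea behind the abstract concrete, the following is a minimal sketch of a forward Kalman filter for an autoregressive (AR) signal observed in additive white noise. It is not the chapter's algorithm: the AR coefficients and both variances are assumed known here, whereas the chapter estimates them with the EM procedure and uses a Kalman smoother in the E-step. The function name `kalman_ar_filter`, the parameter names, and the synthetic AR(2) test signal are all illustrative assumptions.

```python
import numpy as np


def kalman_ar_filter(z, a, g_s, g_v):
    """Filter z = s + v, with s an AR(p) process and v white noise.

    Assumed model (hypothetical names, parameters taken as known):
      s[t] = sum_k a[k] * s[t-1-k] + u[t],  var(u) = g_s
      z[t] = s[t] + v[t],                   var(v) = g_v
    """
    p = len(a)
    # Companion-form transition; state is [s[t], s[t-1], ..., s[t-p+1]].
    Phi = np.zeros((p, p))
    Phi[0, :] = a
    Phi[1:, :-1] = np.eye(p - 1)
    h = np.zeros(p)
    h[0] = 1.0            # the measurement observes the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = g_s         # the innovation drives only the newest sample

    x = np.zeros(p)       # state estimate
    P = np.eye(p)         # state error covariance (arbitrary initial guess)
    s_hat = np.empty(len(z))
    for t, z_t in enumerate(z):
        # Propagation step
        x = Phi @ x
        P = Phi @ P @ Phi.T + Q
        # Update step with the scalar measurement z_t
        k = P @ h / (h @ P @ h + g_v)
        x = x + k * (z_t - h @ x)
        P = P - np.outer(k, h @ P)
        s_hat[t] = x[0]
    return s_hat


# Synthetic example: a stable AR(2) signal in white noise (not real speech).
rng = np.random.default_rng(0)
a = np.array([1.5, -0.7])
n = 2000
s = np.zeros(n)
for t in range(n):
    past = sum(a[k] * s[t - 1 - k] for k in range(len(a)) if t - 1 - k >= 0)
    s[t] = past + rng.normal()
z = s + rng.normal(scale=2.0, size=n)

s_hat = kalman_ar_filter(z, a, g_s=1.0, g_v=4.0)
mse_noisy = np.mean((z - s) ** 2)
mse_filtered = np.mean((s_hat - s) ** 2)
```

With the true parameters supplied, `mse_filtered` should come out below `mse_noisy`. In practice the parameters are unknown, which is the gap the EM-based, dual, and unscented schemes described in the abstract are meant to close.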

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gannot, S. (2005). Speech Enhancement: Application of the Kalman Filter in the Estimate-Maximize (EM) Framework. In: Speech Enhancement. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27489-8_8

  • DOI: https://doi.org/10.1007/3-540-27489-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24039-6

  • Online ISBN: 978-3-540-27489-6

  • eBook Packages: Engineering, Engineering (R0)
