Abstract
This chapter presents the application of the Kalman filter to the single-microphone speech enhancement task. Among the numerous published algorithms, an important sub-group employs the estimate-maximize (EM) procedure to iteratively estimate the spectral parameters of the speech and noise signals. We elaborate on a specific member of this sub-group. In the E-step, the Kalman smoother is applied, and in the M-step, a non-standard Yule-Walker equation set is solved. An approximated EM algorithm is then derived by applying the gradient-descent method to the likelihood function, which yields a sequential, computationally efficient algorithm. It is then shown that the sequential parameter estimation can be replaced by a Kalman filter, giving a dual Kalman filter for the speech and the parameters. A natural generalization of the dual scheme is one in which the speech and the parameters are estimated jointly by applying a nonlinear extension of the Kalman filter, namely the unscented Kalman filter. An extensive experimental study, using real speech and noise signals, compares the proposed methods with alternative speech enhancement algorithms. Kalman filter based algorithms are shown to maintain the natural speech quality; however, their noise reduction ability is limited.
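As a rough illustration of the filtering step only, the sketch below runs a scalar Kalman filter on a synthetic first-order autoregressive (AR) signal in white noise. It is a simplified stand-in under stated assumptions, not the chapter's algorithm: the AR(1) model, the known parameters `a`, `q` and `r`, and the function names `kalman_denoise_ar1`, `simulate` and `mse` are all hypothetical choices made for this example. The EM parameter estimation, the Kalman smoother, and the dual and unscented variants described in the abstract are omitted.

```python
import random


def kalman_denoise_ar1(noisy, a, q, r):
    """Scalar Kalman filter for x[t] = a*x[t-1] + u[t], z[t] = x[t] + v[t].

    a: AR(1) coefficient (assumed known, |a| < 1)
    q: variance of the driving noise u[t]
    r: variance of the additive measurement noise v[t]
    """
    # Start from the stationary prior of the AR(1) process.
    x_est, p_est = 0.0, q / (1.0 - a * a)
    filtered = []
    for z in noisy:
        # Prediction: propagate the estimate and its error variance.
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # Update: blend the prediction with the new noisy sample.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (z - x_pred)
        p_est = (1.0 - gain) * p_pred
        filtered.append(x_est)
    return filtered


def simulate(n, a, q, r, seed=0):
    """Generate a synthetic AR(1) signal and its noisy observation."""
    rng = random.Random(seed)
    clean, noisy = [], []
    x = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        clean.append(x)
        noisy.append(x + rng.gauss(0.0, r ** 0.5))
    return clean, noisy


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)


if __name__ == "__main__":
    clean, noisy = simulate(2000, a=0.95, q=1.0, r=4.0)
    estimate = kalman_denoise_ar1(noisy, a=0.95, q=1.0, r=4.0)
    print("noisy MSE:   ", mse(noisy, clean))
    print("filtered MSE:", mse(estimate, clean))
```

In this toy setting the parameters are given; in the EM framework the abstract describes, they would instead be re-estimated from the noisy signal at every iteration.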
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gannot, S. (2005). Speech Enhancement: Application of the Kalman Filter in the Estimate-Maximize (EM) Framework. In: Speech Enhancement. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27489-8_8
DOI: https://doi.org/10.1007/3-540-27489-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24039-6
Online ISBN: 978-3-540-27489-6
eBook Packages: Engineering, Engineering (R0)