Abstract
The existence of noise is inevitable. In all applications related to voice and speech, from sound recording, telecommunications, and telecollaboration to human-machine interfaces, the signal of interest picked up by a microphone is generally contaminated by noise. As a result, the microphone signal has to be cleaned up with digital signal-processing tools before it is stored, analyzed, transmitted, or played out. The cleaning process, often referred to as either noise reduction or speech enhancement, has attracted a considerable amount of research and engineering attention for several decades. Remarkable advances have already been made, and the area continues to progress, with the aim of creating processors that can extract the desired speech signal as if there were no noise. This chapter presents a methodical overview of the state of the art of noise-reduction algorithms. Based on their theoretical origin, the algorithms are categorized into three fundamental classes: filtering techniques, spectral restoration, and model-based methods. We outline the basic ideas underlying these approaches, discuss their characteristics, explain their intrinsic relationships, and review their advantages and disadvantages.
Abbreviations
- AR: autoregressive
- CE: categorical estimation
- CRLB: Cramér-Rao lower bound
- DFT: discrete Fourier transform
- EM: estimate-maximize
- FFT: fast Fourier transform
- FIR: finite impulse response
- HMM: hidden Markov models
- HNM: harmonic-plus-noise model
- IDFT: inverse DFT
- LP: linear prediction
- LPC: linear predictive coding
- MAP: maximum a posteriori
- ML: maximum-likelihood
- MMSE: minimum mean-square error
- MOS: mean opinion score
- MSE: mean-square error
- PSD: power spectral density
- SNR: signal-to-noise ratio
- STFT: short-time Fourier transform
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chen, J., Benesty, J., Huang, Y., Diethorn, E.J. (2008). Fundamentals of Noise Reduction. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_43
DOI: https://doi.org/10.1007/978-3-540-49127-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)