This chapter describes partial spectral reconstruction methods for enhancing noisy speech signals. The reconstruction is based on speech models for the short-term spectral envelope and for the so-called excitation signal, that is, the signal that would be recorded directly behind the vocal cords.
At low signal-to-noise ratios (SNRs), conventional noise suppression methods achieve only a low output quality, which leaves room for improvement in these situations. The idea of model-based speech enhancement is first to detect those time-frequency areas that appear suitable for reconstruction. A successful reconstruction requires that at least a few time-frequency areas have a sufficiently high SNR. These signal parts are then used to reconstruct the parts with lower SNR. For the reconstruction, several speech signal properties, such as the pitch frequency or the degree of voicing, need to be estimated reliably.
With the reconstruction approach it is possible to generate noise-free signals. In most cases, however, the resulting signals sound somewhat robotic (comparable to low-bit-rate speech coders). For that reason the reconstructed signal is adaptively combined with a conventionally noise-suppressed signal: in time-frequency parts that exhibit a sufficiently high SNR, the output of the conventional noise reduction is used; in the other parts, the reconstructed signal is used.
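The SNR-dependent combination described above can be illustrated with a minimal sketch for a single signal frame. It is not the chapter's algorithm: the function names, the simple SNR estimate, the hard per-bin switch, and the 6 dB threshold are all assumptions chosen for illustration, and an actual system would likely use a smoother, adaptive weighting.

```python
import math

def snr_db(noisy_power, noise_power, floor_db=-30.0):
    """Rough per-bin SNR estimate in dB (hypothetical helper).

    Assumes noisy_power is roughly speech power plus noise power, so the
    speech-to-noise ratio is approximated by noisy_power / noise_power - 1.
    """
    if noise_power <= 0.0:
        return float("inf")
    ratio = max(noisy_power / noise_power - 1.0, 0.0)
    if ratio <= 0.0:
        return floor_db
    return max(10.0 * math.log10(ratio), floor_db)

def combine_frame(conventional, reconstructed, noisy_power, noise_power,
                  threshold_db=6.0):
    """Select, per frequency bin, which enhanced value to output.

    conventional  : output of a conventional noise reduction, one value per bin
    reconstructed : output of the model-based reconstruction, one value per bin
    noisy_power   : estimated power of the noisy input per bin
    noise_power   : estimated noise power per bin
    threshold_db  : assumed SNR threshold; not taken from the chapter
    """
    output = []
    for conv, rec, p_y, p_n in zip(conventional, reconstructed,
                                   noisy_power, noise_power):
        # High SNR: trust the conventional noise reduction.
        # Low SNR: fall back to the reconstructed signal.
        if snr_db(p_y, p_n) >= threshold_db:
            output.append(conv)
        else:
            output.append(rec)
    return output

if __name__ == "__main__":
    frame = combine_frame(
        conventional=[1.0, 2.0, 3.0],
        reconstructed=[10.0, 20.0, 30.0],
        noisy_power=[100.0, 1.5, 2.0],
        noise_power=[1.0, 1.0, 1.0],
    )
    print(frame)
```

In this toy frame only the first bin has a high estimated SNR, so it keeps the conventionally suppressed value, while the two remaining bins take the reconstructed values.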
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Krini, M., Schmidt, G. (2008). Model-Based Speech Enhancement. In: Hänsler, E., Schmidt, G. (eds) Speech and Audio Processing in Adverse Environments. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70602-1_4
DOI: https://doi.org/10.1007/978-3-540-70602-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70601-4
Online ISBN: 978-3-540-70602-1
eBook Packages: Engineering (R0)