Model-Based Speech Enhancement

Krini, Mohamed; Schmidt, Gerhard

doi:10.1007/978-3-540-70602-1_4

Part of the book series: Signals and Communication Technology (SCT)

This chapter describes partial spectral reconstruction methods for improving noisy speech signals. The reconstruction is based on speech models for the short-term spectral envelope and for the so-called excitation signal, that is, the signal that would be recorded directly behind the vocal cords.

At low signal-to-noise ratios (SNRs), conventional noise suppression methods achieve only a low output quality and therefore leave room for improvement in these situations. The idea of model-based speech enhancement is to first detect those time-frequency areas that seem appropriate for reconstruction. A successful reconstruction requires that at least a few time-frequency areas have a sufficiently high SNR. These signal parts are then used to reconstruct the parts with lower SNR. For the reconstruction, several speech signal properties, such as the pitch frequency or the degree of voicing, need to be estimated reliably.
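
As a rough illustration of what estimating the pitch frequency and a degree of voicing can look like, the following sketch uses a plain normalized autocorrelation over one frame. This is a textbook technique chosen here for brevity and is not claimed to be the estimator used in the chapter. The function name, the search range `f_min`/`f_max`, and the use of the autocorrelation peak as a voicing measure are all assumptions made for this example.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (pitch_hz, voicing) for one frame of samples.

    Hypothetical helper: picks the lag with the largest normalized
    autocorrelation inside the assumed pitch range [f_min, f_max].
    The peak value serves as a crude degree-of-voicing measure.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove the DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0, 0.0                    # silent frame: no pitch

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))

    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    if best_lag == 0:
        return 0.0, 0.0                    # no positive correlation found
    return sample_rate / best_lag, best_corr


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (assumed test signal).
tone = [math.sin(2.0 * math.pi * 200.0 * i / 8000.0) for i in range(400)]
pitch_hz, voicing = estimate_pitch_autocorr(tone, 8000)
```

In noisy conditions such a simple estimator is easily misled, which is one reason the text stresses that these properties must be estimated reliably.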

With the reconstruction approach it is possible to generate noise-free signals, but in most cases the resulting signals sound a bit robotic (comparable to low-bit-rate speech coders). For that reason, the reconstructed signal is adaptively combined with a conventionally noise-suppressed signal: in those time-frequency parts that exhibit a sufficiently high SNR, the output signal of a conventional noise reduction is used; in the other parts, the reconstructed signal is used.
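
The SNR-dependent combination can be sketched as a per-bin blend of the two signals within one frame. The abstract only states that high-SNR parts come from the conventional noise reduction and the remaining parts from the reconstruction. The linear crossfade between two thresholds, the threshold values, and the function and parameter names below are assumptions made for illustration, not the chapter's actual rule.

```python
def combine_spectra(conventional, reconstructed, snr_db,
                    snr_low_db=0.0, snr_high_db=10.0):
    """Blend two enhanced spectra bin by bin according to the local SNR.

    Hypothetical helper: at or above snr_high_db the conventionally
    noise-suppressed bin is used, at or below snr_low_db the
    reconstructed bin is used, with a linear crossfade in between.
    Both thresholds are assumed values. Bins may be real or complex.
    """
    if not (len(conventional) == len(reconstructed) == len(snr_db)):
        raise ValueError("all inputs must have the same number of bins")

    combined = []
    for conv, rec, snr in zip(conventional, reconstructed, snr_db):
        # Weight of the conventional branch, clipped to [0, 1].
        weight = (snr - snr_low_db) / (snr_high_db - snr_low_db)
        weight = min(1.0, max(0.0, weight))
        combined.append(weight * conv + (1.0 - weight) * rec)
    return combined


# Usage: three bins with low, medium and high local SNR (made-up values).
mixed = combine_spectra([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-5.0, 5.0, 20.0])
```

Setting `snr_low_db` close to `snr_high_db` approximates the hard switch described in the text, while a wider transition region trades some noise for fewer audible switching artifacts.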

Author information

Authors and Affiliations

Harman/Becker Automotive Systems, Ulm, Germany
Mohamed Krini & Gerhard Schmidt

Editor information

Editors and Affiliations

Technische Universität, Darmstadt, Germany
Eberhard Hänsler
Harman/Becker Automotive Systems, Ulm, Germany
Gerhard Schmidt

About this chapter

Cite this chapter

Krini, M., Schmidt, G. (2008). Model-Based Speech Enhancement. In: Hänsler, E., Schmidt, G. (eds) Speech and Audio Processing in Adverse Environments. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70602-1_4

DOI: https://doi.org/10.1007/978-3-540-70602-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70601-4
Online ISBN: 978-3-540-70602-1
eBook Packages: Engineering, Engineering (R0)
