Part of the book series: Signals and Communication Technology (SCT)

This chapter describes partial spectral reconstruction methods for enhancing noisy speech signals. The reconstruction relies on speech models for the short-term spectral envelope and for the so-called excitation signal, that is, the signal that would be recorded directly behind the vocal cords.

At low signal-to-noise ratios (SNRs), conventional noise suppression methods achieve only a low output quality, which leaves room for improvement in these situations. The idea of model-based speech enhancement is first to detect those time-frequency areas that appear appropriate for reconstruction. A successful reconstruction requires that at least a few time-frequency areas have a sufficiently high SNR. These signal parts are then used to reconstruct the parts with lower SNR. For the reconstruction, several speech signal properties, such as the pitch frequency or the degree of voicing, need to be estimated reliably.
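The detection step can be illustrated with a minimal sketch. It assumes that a short-term power value of the noisy signal and a noise power estimate are already available for every frequency bin of a frame. The function names, the 6 dB threshold and the -30 dB floor are illustrative assumptions and are not taken from the chapter.

```python
import math


def snr_db(noisy_power, noise_power, floor_db=-30.0):
    """Estimate the SNR (in dB) of one time-frequency bin.

    The speech power is approximated by power subtraction, which is an
    assumption made for this sketch. floor_db is a hypothetical lower limit
    that keeps the result finite when no speech power remains.
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")
    speech_power = max(noisy_power - noise_power, 0.0)
    if speech_power == 0.0:
        return floor_db
    return max(10.0 * math.log10(speech_power / noise_power), floor_db)


def reliable_bins(noisy_frame, noise_frame, threshold_db=6.0):
    """Flag the bins of one frame whose estimated SNR reaches threshold_db.

    threshold_db is an assumed value, chosen only for illustration.
    """
    if len(noisy_frame) != len(noise_frame):
        raise ValueError("frames must have the same number of bins")
    return [snr_db(y, n) >= threshold_db
            for y, n in zip(noisy_frame, noise_frame)]


if __name__ == "__main__":
    noisy = [11.0, 1.5, 1.0, 101.0]   # noisy-signal power per bin
    noise = [1.0, 1.0, 1.0, 1.0]      # estimated noise power per bin
    print(reliable_bins(noisy, noise))
```

Bins flagged as reliable would serve as the basis for reconstruction, while the remaining bins are candidates for being reconstructed.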

With the reconstruction approach it is possible to generate noise-free signals. In most cases, however, the resulting signals sound somewhat robotic (comparable to the output of low-bit-rate speech coders). For that reason, the reconstructed signal is adaptively combined with a conventionally noise-suppressed signal: in time-frequency parts that exhibit a sufficiently high SNR, the output of the conventional noise reduction is used; in the other parts, the reconstructed signal is used.
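The adaptive combination can be sketched as a per-bin crossfade controlled by the estimated SNR. This is only one possible realization: the linear transition between low_db and high_db, the default limits and the function names are assumptions made for this sketch, since the abstract states only the selection principle.

```python
def mixing_weight(snr_db, low_db=0.0, high_db=10.0):
    """Weight of the conventional output: 0 below low_db, 1 above high_db.

    The linear transition in between is an assumption of this sketch.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    weight = (snr_db - low_db) / (high_db - low_db)
    return min(max(weight, 0.0), 1.0)


def combine_frame(conventional, reconstructed, snr_db_per_bin,
                  low_db=0.0, high_db=10.0):
    """Combine two spectra of one frame bin by bin.

    conventional   -- output of a conventional noise reduction, per bin
    reconstructed  -- output of the model-based reconstruction, per bin
    snr_db_per_bin -- estimated SNR in dB, per bin
    """
    if not (len(conventional) == len(reconstructed) == len(snr_db_per_bin)):
        raise ValueError("all inputs must have the same number of bins")
    combined = []
    for conv, rec, snr in zip(conventional, reconstructed, snr_db_per_bin):
        weight = mixing_weight(snr, low_db, high_db)
        # High SNR favors the conventional output, low SNR the reconstruction.
        combined.append(weight * conv + (1.0 - weight) * rec)
    return combined


if __name__ == "__main__":
    conventional = [1.0, 1.0, 1.0]
    reconstructed = [0.0, 0.0, 0.0]
    print(combine_frame(conventional, reconstructed, [20.0, 5.0, -5.0]))
```

A soft transition is used here instead of a hard switch so that neighboring bins do not change abruptly between the two signals; a hard decision corresponds to letting low_db and high_db approach each other.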

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Krini, M., Schmidt, G. (2008). Model-Based Speech Enhancement. In: Hänsler, E., Schmidt, G. (eds) Speech and Audio Processing in Adverse Environments. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70602-1_4

  • DOI: https://doi.org/10.1007/978-3-540-70602-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70601-4

  • Online ISBN: 978-3-540-70602-1

  • eBook Packages: Engineering (R0)
