Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

Senapati, Suman

doi:10.1007/s10772-013-9195-3

Published: 12 April 2013

Volume 16, pages 439–459, (2013)

International Journal of Speech Technology

Suman Senapati¹


Abstract

This paper investigates single-microphone speech enhancement when the statistics of the interfering noise and of the speech are not available a priori. It thereby addresses a shortcoming of many current enhancement techniques and works towards a system suitable for real-world use. The paper focuses on a Log Gabor Wavelet (LGW) based long-term squared spectral amplitude estimator using the maximum a posteriori (MAP) criterion. First, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian bivariate model is developed to suppress the background noise. The work also models the inter-scale dependency between the coefficients and their parents with a circularly symmetric probability density function related to the family of spherically invariant random processes (SIRPs). The corresponding joint estimator is derived from MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which gives a closed-form solution. Incorporating a speech presence uncertainty (SPU) estimator is a further contribution. The main contributions of this paper are therefore: (i) the combination of LGW, SIRPs and SPU for background noise reduction; (ii) LGW with long-term cepstral mean subtraction to reduce the effects of both telephone channels and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived from MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed-form solution; and (v) refinement of the magnitude estimates by scaling them by the SPU probability.
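As a rough illustration of contributions (iii) to (v), the sketch below applies a closed-form bivariate MAP shrinkage rule to a coefficient and its parent, then scales the result by a speech presence probability. The rule shown is the bivariate shrinkage form familiar from wavelet denoising under a circularly symmetric prior; whether the paper's estimator takes exactly this form is an assumption, as are the real-valued coefficients and the names `bivariate_map_shrink`, `spu_scaled_estimate`, `noise_var`, `signal_std` and `presence_prob`.

```python
import math


def bivariate_map_shrink(child, parent, noise_var, signal_std):
    """Closed-form MAP estimate of a coefficient given its parent.

    Assumed form (not taken from the paper): bivariate shrinkage under a
    circularly symmetric prior with a constant noise variance.
    """
    radius = math.hypot(child, parent)  # joint magnitude of child and parent
    if radius == 0.0:
        return 0.0
    threshold = math.sqrt(3.0) * noise_var / signal_std
    gain = max(radius - threshold, 0.0) / radius  # soft threshold on the pair
    return gain * child


def spu_scaled_estimate(child, parent, noise_var, signal_std, presence_prob):
    """Scale the MAP estimate by the probability that speech is present."""
    return presence_prob * bivariate_map_shrink(
        child, parent, noise_var, signal_std
    )


if __name__ == "__main__":
    # A strong child/parent pair is kept almost intact; a weak pair is zeroed.
    print(spu_scaled_estimate(3.0, 4.0, 1.0, math.sqrt(3.0), 0.5))
    print(spu_scaled_estimate(0.3, 0.4, 1.0, math.sqrt(3.0), 0.5))
```

Because the gain depends on the joint magnitude of the child and its parent, a small coefficient with a large parent is attenuated less than an isolated small coefficient, which is the point of modelling the inter-scale dependency.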
Extensive comparisons are made between the proposed and existing speech enhancement algorithms on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) than the existing speech enhancement algorithms. On the SSNR measure, the proposed methods show an improvement of 2 dB over existing methods for almost every noise source. On the MOS measure, the proposed methods also improve on existing methods for almost every noise source. The proposed methods therefore aim to improve speech quality and intelligibility at the same time.
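For readers unfamiliar with the SSNR measure, the following is a minimal sketch of a segmental SNR computation under common conventions: non-overlapping frames and per-frame values clamped to the range -10 dB to 35 dB before averaging. The frame length, the clamping limits and the function name `segmental_snr` are assumptions for illustration and are not taken from the paper's evaluation setup.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average frame-wise SNR in dB between a clean and an enhanced signal."""
    eps = 1e-10  # guards against division by zero and log of zero
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for m in range(n_frames):
        start = m * frame_len
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp so silent or badly distorted frames do not dominate the mean.
        total += min(max(snr_db, lo_db), hi_db)
    return total / n_frames


if __name__ == "__main__":
    clean = [1.0] * 8
    enhanced = [0.9] * 8
    print(segmental_snr(clean, enhanced, frame_len=4))
```

An SSNR improvement is then the difference between this value for the enhanced signal and the same value computed for the unprocessed noisy signal.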

Author information

Authors and Affiliations

Indian Institute of Technology, Kharagpur, West Bengal, India
Suman Senapati

Corresponding author

Correspondence to Suman Senapati.

About this article

Cite this article

Senapati, S. Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics. Int J Speech Technol 16, 439–459 (2013). https://doi.org/10.1007/s10772-013-9195-3

Received: 27 January 2013
Accepted: 18 March 2013
Published: 12 April 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10772-013-9195-3
