
Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

International Journal of Speech Technology

Abstract

This paper investigates single-microphone speech enhancement when the statistics of the interfering noise and of the speech are not available a priori. It thereby addresses a pitfall of many current enhancement techniques and looks towards a system that is applicable in the real world. The paper focuses on a Log Gabor Wavelet (LGW) based Long Term Squared Spectral Amplitude estimator using the Maximum a Posteriori (MAP) criterion. First, a long term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian Bivariate Model is developed to suppress the background noise. The work also models the inter-scale dependency between the coefficients and their parents with a circularly symmetric probability density function related to the family of Spherically Invariant Random Processes (SIRPs); the corresponding joint estimator is derived by MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which gives a closed form solution. A speech presence uncertainty (SPU) estimator is a further contribution to the proposed estimator. The main contributions of the paper are therefore: (i) the combination of LGW, SIRPs and SPU for background noise reduction; (ii) LGW with Long Term Cepstral Mean Subtraction to reduce the effects of both telephone channel and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed form solution; and (v) refinement of the magnitude estimates by scaling them by the SPU probability.
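The abstract does not give the estimator itself, so the following is only a minimal sketch of the kind of closed form rule that contributions (iii) to (v) describe. It assumes the well-known bivariate shrinkage rule of Sendur and Selesnick for a circularly symmetric child/parent prior with constant noise variance, followed by a multiplication with a speech presence probability. The function names, the parameters `noise_var`, `signal_std` and `p_presence`, and the way the probability is obtained are illustrative assumptions, not the paper's notation or method.

```python
import math


def bivariate_shrink(child, parent, noise_var, signal_std):
    """Shrink a noisy coefficient using its parent at the next coarser scale.

    Assumed rule (Sendur and Selesnick bivariate shrinkage):
        gain = max(r - sqrt(3) * noise_var / signal_std, 0) / r
    where r is the joint magnitude of the child and its parent.
    """
    magnitude = math.hypot(child, parent)
    if magnitude == 0.0:
        return 0.0
    threshold = math.sqrt(3.0) * noise_var / signal_std
    gain = max(magnitude - threshold, 0.0) / magnitude
    return gain * child


def spu_scaled_estimate(child, parent, noise_var, signal_std, p_presence):
    """Scale the shrunk coefficient by a speech presence probability in [0, 1].

    p_presence is a hypothetical input; how it is estimated is not
    specified in the abstract.
    """
    if not 0.0 <= p_presence <= 1.0:
        raise ValueError("p_presence must lie in [0, 1]")
    return p_presence * bivariate_shrink(child, parent, noise_var, signal_std)
```

In this sketch a coefficient whose joint magnitude with its parent falls below the threshold is set to zero, while a large coefficient is reduced only slightly. Keeping `noise_var` constant across scales is what makes the gain a single closed form expression.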
The proposed and existing speech enhancement algorithms are compared extensively on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) than the existing speech enhancement algorithms. On the SSNR measure, the proposed methods show an improvement of 2 dB over existing methods for almost every noise source. On the MOS measure, the proposed methods also improve on existing methods for almost every noise source. The proposed methods thus aim to enhance speech quality and intelligibility at the same time.
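For readers unfamiliar with the SSNR measure quoted above, the sketch below shows one common way to compute it: the per-frame SNR in dB is clamped to a fixed range and then averaged over frames. The frame length, the use of non-overlapping frames and the clamping limits of -10 dB and 35 dB are assumptions taken from common practice, not settings confirmed by this abstract.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Average the per-frame SNR (in dB) between a clean and an enhanced signal.

    Frames are non-overlapping here for simplicity, and each frame SNR is
    clamped to [lo_db, hi_db] so that silent or perfectly matched frames
    do not dominate the mean.
    """
    n = min(len(clean), len(enhanced))
    eps = 1e-10  # avoids division by zero and log of zero
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        stop = start + frame_len
        signal_energy = sum(clean[i] ** 2 for i in range(start, stop))
        error_energy = sum((clean[i] - enhanced[i]) ** 2 for i in range(start, stop))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

An SSNR improvement is then the difference between this value for the enhanced signal and the same value computed for the unprocessed noisy signal, both measured against the clean reference.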

Author information

Correspondence to Suman Senapati.

About this article

Cite this article

Senapati, S. Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics. Int J Speech Technol 16, 439–459 (2013). https://doi.org/10.1007/s10772-013-9195-3

  • DOI: https://doi.org/10.1007/s10772-013-9195-3
