Abstract
This paper investigates single-microphone speech enhancement when the statistics of the speech and the interfering noise are not available a priori. It thereby addresses a pitfall of many current enhancement techniques and aims at a system that is applicable in the real world. The paper focuses on a Log Gabor Wavelet (LGW) based long-term squared spectral amplitude estimator that uses the Maximum a Posteriori (MAP) criterion. First, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian bivariate model is developed to suppress the background noise. The work also models the inter-scale dependency between the coefficients and their parents with a circularly symmetric probability density function related to the family of Spherically Invariant Random Processes (SIRPs), and derives the corresponding joint estimator using MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which yields a closed-form solution. A further contribution is the incorporation of a speech presence uncertainty (SPU) estimator. The main contributions of this paper are therefore: (i) the combination of LGW, SIRPs and SPU for background noise reduction; (ii) LGW with long-term cepstral mean subtraction to reduce the effects of both the telephone channel and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which yields a closed-form solution; and (v) refinement of the magnitude estimates by scaling them by the SPU probability.
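As an illustration of the kind of closed-form joint estimator described above, the following minimal sketch applies the bivariate shrinkage rule of Sendur and Selesnick, which is the MAP solution for a circularly symmetric prior on a coefficient and its parent under a constant noise variance. It is an assumption that the estimator derived in the paper has this exact form. The function names, the parameter names and the use of a precomputed speech presence probability `p_speech` are hypothetical and serve only to show how SPU scaling could be applied.

```python
import math


def bivariate_shrink(y, y_parent, sigma_n, sigma):
    """Bivariate MAP shrinkage of one noisy coefficient (Sendur-Selesnick rule).

    y        : noisy coefficient at the current scale
    y_parent : noisy coefficient at the next coarser scale (its parent)
    sigma_n  : noise standard deviation, assumed constant across scales
    sigma    : standard deviation of the clean coefficient (assumed known here)
    """
    r = math.hypot(y, y_parent)  # joint magnitude of coefficient and parent
    if r == 0.0:
        return 0.0
    threshold = math.sqrt(3.0) * sigma_n ** 2 / sigma
    gain = max(r - threshold, 0.0) / r  # zero gain inside the dead zone
    return gain * y


def spu_scaled(y, y_parent, sigma_n, sigma, p_speech):
    """Scale the shrunk coefficient by a speech presence probability.

    p_speech is a hypothetical, externally supplied value in [0, 1].
    """
    return p_speech * bivariate_shrink(y, y_parent, sigma_n, sigma)
```

A small coefficient whose parent is also small falls inside the dead zone and is set to zero, while a coefficient with a large parent is attenuated less; this is how the parent carries inter-scale information into the estimate.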
The proposed and existing speech enhancement algorithms are compared extensively on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations that compare the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) than the existing speech enhancement algorithms. In SSNR, the proposed methods show an improvement of 2 dB over existing methods for almost every noise source. In MOS, the proposed methods also improve on existing methods for almost every noise source. The proposed methods therefore aim to enhance speech quality and intelligibility at the same time.
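For readers unfamiliar with the SSNR measure, the sketch below computes a frame-averaged segmental SNR between a clean signal and an enhanced signal. The frame length, the non-overlapping framing and the per-frame clamping range of -10 dB to 35 dB are assumptions that follow a common convention; they are not taken from this paper, and the function name is hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB over non-overlapping frames.

    clean, enhanced : sequences of samples, assumed time-aligned
    frame_len       : samples per frame (assumed value)
    lo_db, hi_db    : per-frame clamping limits (assumed convention)
    """
    n = min(len(clean), len(enhanced))
    eps = 1e-12  # avoids division by zero and log of zero
    frame_values = []
    for start in range(0, n - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_values.append(min(max(snr_db, lo_db), hi_db))
    if not frame_values:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_values) / len(frame_values)
```

Clamping each frame keeps silent or perfectly reconstructed frames from dominating the average, which is why the limits are applied before the mean is taken.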
Cite this article
Senapati, S. Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics. Int J Speech Technol 16, 439–459 (2013). https://doi.org/10.1007/s10772-013-9195-3
DOI: https://doi.org/10.1007/s10772-013-9195-3