Abstract
This paper proposes a novel speech enhancement (SE) approach that combines regularized non-negative matrix factorization (NMF), based on Weibull and Nakagami speech priors, with an adaptive Wiener filter. NMF combined with Wiener filtering has recently been used for speech enhancement. However, Wiener filtering is inadequate for non-stationary noise, so there is still scope to improve enhancement under such conditions. In the proposed regularized NMF with adaptive Wiener filter, prior distributions are placed on the transform-domain magnitudes of the speech and noise spectral components to implement an iterative posterior NMF model. The magnitudes of the speech spectral components are modeled with Weibull and Nakagami distributions, and the noise spectral components with a Gaussian distribution. A method for adaptively estimating the required statistics of these distributions is also proposed, which yields a natural regularization of the NMF criterion. In addition, an adaptive factor α is introduced into the Wiener gain function to adjust the relative weighting of the estimated speech and the noise level according to the signal-to-noise ratio, which further improves speech quality. The adaptive Wiener filter includes an adaptation algorithm that monitors the acoustic environment and adjusts the filter coefficients accordingly, and a genetic algorithm is used to select the value of α that maximizes speech quality as measured by PESQ. The proposed method outperformed the benchmark algorithms in terms of SDR (signal-to-distortion ratio), STOI (short-time objective intelligibility), and PESQ (perceptual evaluation of speech quality).
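The abstract combines NMF-based estimates of the speech and noise spectra with a Wiener gain weighted by an adaptive factor α. The sketch below illustrates that pipeline under stated assumptions: the speech and noise bases `W_speech` and `W_noise` are assumed to be pre-trained, the activations are estimated with plain Lee and Seung multiplicative updates for the Euclidean cost (not the paper's prior-regularized posterior NMF), and the gain form G = Ŝ² / (Ŝ² + α·N̂²) is a hypothetical choice, since the abstract does not give the exact expression. All function names are illustrative.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H for a fixed basis W by minimizing
    ||V - W H||_F^2 with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update: ratios of non-negative terms keep H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def adaptive_wiener_gain(S_hat, N_hat, alpha, eps=1e-10):
    """Hypothetical adaptive Wiener gain: alpha scales the noise-power term.
    alpha = 1 gives the conventional Wiener gain from the NMF estimates."""
    S2, N2 = S_hat ** 2, N_hat ** 2
    return S2 / (S2 + alpha * N2 + eps)

def enhance(V, W_speech, W_noise, alpha=1.0):
    """Enhance a noisy magnitude spectrogram V (frequency bins x frames).
    Returns the enhanced magnitudes and the gain matrix."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V, W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech magnitude estimate
    N_hat = W_noise @ H[k:]    # noise magnitude estimate
    G = adaptive_wiener_gain(S_hat, N_hat, alpha)
    return G * V, G

# Toy usage: random non-negative bases and a synthetic mixture spectrogram.
rng = np.random.default_rng(1)
W_s, W_n = rng.random((64, 8)), rng.random((64, 4))
V = W_s @ rng.random((8, 20)) + W_n @ rng.random((4, 20))
enhanced, gain = enhance(V, W_s, W_n, alpha=1.5)
```

In this form, α > 1 attenuates noise-dominated bins more aggressively and α < 1 preserves more of the mixture; in the proposed method α is tuned rather than fixed.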
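The speech-magnitude priors require shape and scale statistics that the method estimates adaptively. The abstract does not state which estimators are used, so the sketch below assumes standard closed-form ones: log-moment estimates for the Weibull shape k and scale λ (from Var[ln X] = π²/(6k²) and E[ln X] = ln λ - γ/k, where γ is the Euler-Mascheroni constant), and moment estimates for the Nakagami parameters (Ω = E[X²], m = Ω²/Var[X²]). The function `neg_log_prior` is a hypothetical regularization term, the negative log-prior of the current speech-magnitude estimate, that could be added to an NMF cost; how the paper folds the prior into its update rules is not reproduced here.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329

def fit_weibull(x):
    """Log-moment estimates of Weibull shape k and scale lam from positive samples."""
    logx = np.log(x)
    k = math.pi / (math.sqrt(6.0) * logx.std())
    lam = math.exp(logx.mean() + EULER_GAMMA / k)
    return k, lam

def fit_nakagami(x):
    """Moment estimates of Nakagami shape m and spread omega."""
    x2 = np.asarray(x) ** 2
    omega = x2.mean()
    m = omega ** 2 / x2.var()
    return m, omega

def weibull_logpdf(x, k, lam):
    z = x / lam
    return math.log(k / lam) + (k - 1) * np.log(z) - z ** k

def nakagami_logpdf(x, m, omega):
    return (math.log(2.0) + m * math.log(m) - math.lgamma(m) - m * math.log(omega)
            + (2 * m - 1) * np.log(x) - m * x ** 2 / omega)

def neg_log_prior(S_mag, kind="weibull"):
    """Hypothetical regularizer: refit the prior to the current speech magnitudes
    and return the negative log-likelihood under that prior."""
    x = np.maximum(np.ravel(S_mag), 1e-10)   # avoid log(0)
    if kind == "weibull":
        return -weibull_logpdf(x, *fit_weibull(x)).sum()
    return -nakagami_logpdf(x, *fit_nakagami(x)).sum()

# Toy usage: recover the parameters of synthetic Weibull(k=2, lambda=1.5) data.
rng = np.random.default_rng(0)
samples = 1.5 * rng.weibull(2.0, 200_000)
k_hat, lam_hat = fit_weibull(samples)
```

Re-fitting the statistics on each iteration's speech estimate is what makes the prior adaptive in this sketch; the regularizer then penalizes magnitude estimates that are unlikely under the current fit.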
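The abstract selects the adaptive factor α with a genetic algorithm that maximizes PESQ. The sketch below is a minimal real-coded genetic algorithm for one bounded parameter, assuming tournament selection, arithmetic crossover, Gaussian mutation, and elitism; these operators, the population size, and the number of generations are illustrative assumptions, not the paper's settings. The fitness is any callable that maps α to a score; PESQ itself (ITU-T P.862) is not reimplemented here, so the usage line uses a toy objective.

```python
import random

def tournament(pop, fitness, rng):
    """Binary tournament: return the fitter of two randomly chosen candidates."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_search(fitness, lo, hi, pop_size=30, generations=40,
                   n_elite=2, mutation_std=0.1, seed=0):
    """Maximize fitness(alpha) over alpha in [lo, hi] with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:n_elite]              # elitism: best candidates survive unchanged
        while len(next_pop) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2   # arithmetic crossover
            child += rng.gauss(0.0, mutation_std)  # Gaussian mutation
            next_pop.append(min(hi, max(lo, child)))  # clip to the search range
        pop = next_pop
    return max(pop, key=fitness)

# Toy usage: a stand-in objective whose maximum is at alpha = 1.3.
best_alpha = genetic_search(lambda a: -(a - 1.3) ** 2, 0.0, 3.0)
```

In an enhancement setting, the fitness callable would enhance a set of noisy utterances with the candidate α and return their mean quality score. Because each evaluation is expensive, caching scores per candidate would be a natural refinement.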
Data availability
The speech and noise data used in this study are available in the NOIZEUS repository: http://ecs.utdallas.edu/loizou/speech/noizeus.
Ethics declarations
Competing interests
The authors did not receive support from any organization for the submitted work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jannu, C., Vanambathina, S.D. Weibull and Nakagami speech priors based regularized NMF with adaptive wiener filter for speech enhancement. Int J Speech Technol 26, 197–209 (2023). https://doi.org/10.1007/s10772-023-10020-5
DOI: https://doi.org/10.1007/s10772-023-10020-5