Weibull and Nakagami speech priors based regularized NMF with adaptive wiener filter for speech enhancement

Jannu, Chaitanya; Vanambathina, Sunny Dayal

doi:10.1007/s10772-023-10020-5

Published: 02 February 2023

Volume 26, pages 197–209, (2023)

International Journal of Speech Technology

Chaitanya Jannu¹ & Sunny Dayal Vanambathina¹

Abstract

In this paper, a novel regularized non-negative matrix factorization (NMF) approach based on Weibull and Nakagami priors, combined with an adaptive Wiener filter, is proposed for speech enhancement (SE). NMF with Wiener filtering has recently been used for speech enhancement. However, Wiener filtering is inadequate when dealing with non-stationary noise, so there is scope for further improving speech quality under such noise. In the proposed regularized NMF with adaptive Wiener filter method, prior distributions on the transform-domain magnitudes of the speech and noise spectral components are used to implement an iterative posterior NMF model. The magnitudes of the speech spectral components are modeled with Weibull and Nakagami distributions, and the noise spectral components with a Gaussian distribution. A method for adaptively estimating the necessary statistics of these distributions, which yields a natural regularization of the NMF criterion, is also proposed. In addition, an adaptive factor (α) is introduced into the gain function of the Wiener filter to adjust the weighting between the noise level and the estimated speech according to the signal-to-noise level, which helps to further enhance speech quality. The proposed adaptive Wiener filter uses an adaptation algorithm that monitors the environment and varies the filter coefficients accordingly, and a genetic algorithm is used to find a suitable value of the adaptive parameter (α), with enhanced speech quality judged by the PESQ measure. The proposed method outperformed the other benchmark algorithms in terms of signal-to-distortion ratio (SDR), short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ).
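
The overall pipeline outlined in the abstract (NMF decomposition of a noisy magnitude spectrogram followed by a Wiener-type gain with an adaptive factor α) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the regularized update rules or the exact gain function, so plain KL-divergence NMF with the standard multiplicative updates of Lee and Seung stands in for the prior-regularized model, and the gain `S² / (S² + α·N²)` is an assumed, commonly used parametric Wiener form. The function names `nmf_kl` and `adaptive_wiener_gain`, the synthetic data, and the fixed value `alpha=1.5` are all hypothetical; the genetic-algorithm search for α is not shown.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, W=None, seed=0, eps=1e-12):
    """KL-divergence NMF via multiplicative updates (unregularized stand-in).

    If a dictionary W is supplied it is kept fixed and only the
    activations H are estimated (supervised setting); `rank` is then ignored.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    update_W = W is None
    if update_W:
        W = rng.random((n_freq, rank)) + eps
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def adaptive_wiener_gain(S_hat, N_hat, alpha, eps=1e-12):
    """Assumed parametric Wiener gain; alpha = 1 gives the usual Wiener gain."""
    ps = S_hat ** 2  # estimated speech power
    pn = N_hat ** 2  # estimated noise power
    return ps / (ps + alpha * pn + eps)


# Synthetic example: random non-negative "speech" and "noise" dictionaries.
rng = np.random.default_rng(1)
W_s = rng.random((64, 8))  # hypothetical pre-trained speech bases
W_n = rng.random((64, 4))  # hypothetical pre-trained noise bases
V = W_s @ rng.random((8, 50)) + W_n @ rng.random((4, 50))  # noisy magnitudes

# Estimate activations with the concatenated dictionary held fixed.
W = np.hstack([W_s, W_n])
_, H = nmf_kl(V, rank=W.shape[1], W=W)

S_hat = W_s @ H[:8]  # speech magnitude estimate
N_hat = W_n @ H[8:]  # noise magnitude estimate
G = adaptive_wiener_gain(S_hat, N_hat, alpha=1.5)
enhanced = G * V     # enhanced magnitude spectrogram
```

In this assumed form, a larger α weights the noise estimate more heavily and lowers the gain, which is one way a single scalar can trade noise suppression against speech distortion; in the paper, α is selected by a genetic algorithm on the basis of PESQ rather than set by hand.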

Data availability

The speech datasets and noise models used in this study are available in the NOIZEUS repository <http://ecs.utdallas.edu/loizou/speech/noizeus>.

Author information

Authors and Affiliations

School of Electronics Engineering, VIT-AP University, Amaravati, India
Chaitanya Jannu & Sunny Dayal Vanambathina

Corresponding author

Correspondence to Chaitanya Jannu.

Ethics declarations

Competing interests

The authors did not receive support from any organization for the submitted work.

About this article

Cite this article

Jannu, C., Vanambathina, S.D. Weibull and Nakagami speech priors based regularized NMF with adaptive wiener filter for speech enhancement. Int J Speech Technol 26, 197–209 (2023). https://doi.org/10.1007/s10772-023-10020-5

Received: 30 December 2021
Accepted: 08 January 2023
Published: 02 February 2023
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10772-023-10020-5
