Abstract
This paper presents a speech enhancement method that combines the correlation canceling approach with the log minimum mean-square-error (Log-MMSE) estimator. Conventional statistical-model methods, such as the Maximum-Likelihood (ML) estimator, the Maximum A Posteriori (MAP) estimator, the Minimum Mean Square Error (MMSE) estimator and the log-MMSE estimator, rely on a nonlinear estimation of the enhanced speech signal. In the proposed hybrid method (CC/Log-MMSE), the nonlinear estimation is instead transformed into a linear estimation by exploiting the orthogonal projection of the clean signal onto the noisy signal. The enhanced signal thus represents the best “copy,” or estimate, of the clean signal that can be made on the basis of the noisy signal vector. This can also be seen as canceling the component of the noisy vector that resides in the noise subspace, which in turn improves the intelligibility of the enhanced signal. Extensive simulations, carried out using speech files corrupted by different noises available in the NOIZEUS corpus, show that the proposed hybrid method CC/Log-MMSE consistently outperforms the baseline speech enhancement methods at different SNR levels in terms of objective and subjective measures, spectrogram analysis and overall SNR improvement.
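The orthogonal-projection idea can be illustrated with a minimal scalar sketch. It is not the CC/Log-MMSE algorithm of the paper: it assumes zero-mean signals, an additive synthetic Gaussian noise, and a single gain H = R_xy / R_yy estimated from samples with the clean signal available, which is only possible in a simulation. The names `correlation_canceler_gain`, `mse`, `clean`, `noise` and `noisy` are hypothetical and chosen for illustration.

```python
import random


def correlation_canceler_gain(x, y):
    """Estimate the scalar gain H = R_xy / R_yy from paired zero-mean samples."""
    r_xy = sum(a * b for a, b in zip(x, y)) / len(y)
    r_yy = sum(b * b for b in y) / len(y)
    return r_xy / r_yy


def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Synthetic data (assumption): unit-variance "clean" signal plus weaker noise.
rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(5000)]
noise = [rng.gauss(0.0, 0.5) for _ in range(5000)]
noisy = [c + n for c, n in zip(clean, noise)]

# Linear estimate: projection of the clean signal onto the noisy observation.
H = correlation_canceler_gain(clean, noisy)
estimate = [H * v for v in noisy]

# Orthogonality: the estimation error is uncorrelated with the observation.
error = [c - e for c, e in zip(clean, estimate)]
residual_corr = sum(e * v for e, v in zip(error, noisy)) / len(noisy)

print("gain H:", H)
print("error/observation correlation:", residual_corr)
print("MSE of noisy signal:", mse(clean, noisy))
print("MSE of projected estimate:", mse(clean, estimate))
```

Because H is fitted by least squares, the residual correlation vanishes up to rounding, and the projected estimate cannot have a larger sample error than the unprocessed noisy signal. In a real enhancer the clean signal is unknown, so the required statistics would have to be estimated from the noisy input alone.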
Data availability
The noisy speech dataset (NOIZEUS) and the documentation related to this work can be downloaded from https://ecs.utdallas.edu/loizou/speech/noizeus/
About this article
Cite this article
Asbai, N., Zitouni, S., Bounazou, H. et al. Noisy speech enhancement based on correlation canceling/log-MMSE hybrid method. Multimed Tools Appl 82, 5803–5821 (2023). https://doi.org/10.1007/s11042-022-13591-8
DOI: https://doi.org/10.1007/s11042-022-13591-8