Abstract
Deep neural networks have become the prime approach for enhancing speech signals, as they yield better results than traditional methods. This paper describes the transformation of the enhanced speech signal obtained by applying a deep convolutional neural network (Deep CNN), which can model nonlinear relationships, and compares it with Wiener filtering, regarded as the best-performing speech enhancement technique among the traditional methods. Denoising is performed in the frequency domain, and the result is converted back to the time domain to analyze performance metrics for speech quality and speech intelligibility. Speech quality is assessed using the signal-to-noise ratio (SNR) and the perceptual evaluation of speech quality (PESQ); speech intelligibility is assessed using short-time objective intelligibility (STOI). The denoised speech from both methods was evaluated, and the analysis shows that the SNR of the conventional Wiener filtering method is much higher than that of the Deep CNN. However, the PESQ and STOI of the Deep CNN-enhanced speech outperform those of the Wiener filtering method. Overall, the performance metrics indicate that the Deep CNN achieves better results than the conventional technique.
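As a concrete illustration of the frequency-domain pipeline the abstract describes (filter each short-time spectrum, then reconstruct the time-domain signal and score it), here is a minimal NumPy sketch of Wiener-style filtering using a noise power-spectral-density estimate, together with the SNR metric. The frame length, hop size, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def wiener_denoise(noisy, noise_psd, frame=256, hop=128):
    """Frequency-domain Wiener-style filtering via windowed FFT
    and overlap-add reconstruction.

    noise_psd: per-bin noise power estimate, length frame//2 + 1
    (e.g. averaged from a noise-only recording)."""
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        psd = np.abs(spec) ** 2
        # Wiener gain: estimated clean power / noisy power per bin
        gain = np.maximum(psd - noise_psd, 0.0) / np.maximum(psd, 1e-12)
        rec = np.fft.irfft(gain * spec, n=frame)
        out[start:start + frame] += rec * win
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

def snr_db(clean, estimate):
    """SNR of an enhanced signal against the clean reference, in dB."""
    resid = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) /
                         np.maximum(np.sum(resid ** 2), 1e-12))
```

With a sinusoid buried in white noise and a noise PSD averaged from a separate noise-only stretch, the filtered output scores a noticeably higher SNR than the noisy input, which is the quality comparison the paper carries out (alongside PESQ and STOI) for both the Wiener filter and the Deep CNN.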
Funding
No funding.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights statement
No humans or animals were involved in this research work.
Data availability statement
The datasets analyzed during the current study are available from the University of Edinburgh Centre for Speech Technology Research (CSTR): https://datashare.is.ed.ac.uk/handle/10283/2791.
Additional information
Communicated by Joy Iong-Zong Chen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Hepsiba, D., Justin, J. Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN. Soft Comput 26, 13037–13047 (2022). https://doi.org/10.1007/s00500-021-06291-2