
Enhancement of single channel speech quality and intelligibility in multiple noise conditions using Wiener filter and deep CNN

Published in Soft Computing (Focus section).

Abstract

Deep neural networks have become the leading approach to speech enhancement, as they yield better results than traditional methods. This paper enhances noisy speech with a deep convolutional neural network (Deep CNN), which can model the nonlinear relationship between noisy and clean speech, and compares it with Wiener filtering, one of the strongest traditional enhancement techniques. Denoising is performed in the frequency domain, and the result is converted back to the time domain to evaluate speech quality and speech intelligibility. Speech quality is measured by the signal-to-noise ratio (SNR) and the perceptual evaluation of speech quality (PESQ); intelligibility is measured by short-time objective intelligibility (STOI). On the denoised speech, the conventional Wiener filter achieves a higher SNR than the Deep CNN, but the Deep CNN outperforms the Wiener filter on both PESQ and STOI. Taken together, the perceptual metrics indicate that the Deep CNN achieves better enhancement than the conventional technique.
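The Wiener baseline described above operates in the frequency domain: the noisy signal is analyzed frame by frame, a per-bin gain derived from an estimated noise power spectrum is applied, and the result is converted back to the time domain. As an illustration only (not the authors' implementation), a minimal spectral Wiener filter and the SNR metric can be sketched with NumPy, assuming a noise-only reference segment is available for estimating the noise power spectrum:

```python
import numpy as np

def wiener_denoise(noisy, noise_ref, frame=256, hop=128):
    """Frequency-domain Wiener filtering (power-spectral form):
    gain = max(|Y|^2 - N, 0) / |Y|^2 per frequency bin, with
    overlap-add resynthesis. `noise_ref` is an assumed noise-only segment."""
    window = np.hanning(frame)
    # Estimate the noise power spectrum by averaging over noise-only frames.
    n_frames = [noise_ref[i:i + frame] * window
                for i in range(0, len(noise_ref) - frame + 1, hop)]
    noise_psd = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in n_frames], axis=0)

    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame + 1, hop):
        Y = np.fft.rfft(noisy[i:i + frame] * window)       # analysis
        power = np.abs(Y) ** 2
        gain = np.maximum(power - noise_psd, 0.0) / (power + 1e-12)
        out[i:i + frame] += np.fft.irfft(gain * Y) * window  # synthesis
        norm[i:i + frame] += window ** 2                     # overlap-add weight
    return out / np.maximum(norm, 1e-12)

def snr_db(clean, estimate):
    """Signal-to-noise ratio of an enhanced signal against the clean reference."""
    residual = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / (np.sum(residual ** 2) + 1e-12))
```

On a synthetic example (a sinusoid in white noise) this sketch raises the SNR relative to the noisy input; PESQ and STOI, by contrast, require the standardized implementations (ITU-T P.862 for PESQ) and are not reproduced here.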



Funding

No funding was received for this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to D. Hepsiba.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights statement

This research did not involve human participants or animals.

Data availability statement

The dataset analyzed during the current study is available from the Centre for Speech Technology Research (CSTR), University of Edinburgh: https://datashare.is.ed.ac.uk/handle/10283/2791.

Additional information

Communicated by Joy Iong-Zong Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hepsiba, D., Justin, J. Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN. Soft Comput 26, 13037–13047 (2022). https://doi.org/10.1007/s00500-021-06291-2

