
Deep Speech Denoising with Minimal Dependence on Clean Speech Data


Abstract

Most existing deep learning-based speech denoising methods rely heavily on clean speech data. According to the conventional view, a large number of paired noisy and clean speech samples is required to achieve good denoising performance. However, collecting such data is a practical barrier to meeting this requirement, particularly in economically challenged regions and for languages with limited resources. Training deep denoising networks with only noisy speech samples is a viable way to avoid this dependence. In this study, a deep complex U-Net (DCU-Net) was trained with noisy speech samples serving as both its input and its target. Experimental results demonstrate that, compared with conventional speech denoising techniques, the proposed approach avoids not only the strong dependence on clean targets but also the strong dependence on large training data sizes.
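As a rough illustration of the underlying idea, the sketch below trains a small placeholder network with noisy targets only: the network input is a noisy recording corrupted a second time with additional noise, and the training target is the original noisy recording itself, so no clean speech is ever used. The architecture, data, and hyperparameters are illustrative assumptions and do not reproduce the DCU-Net or the training setup reported in the paper.

import torch
import torch.nn as nn

class DenoiserNet(nn.Module):
    """Small 1-D convolutional denoiser, a stand-in for the paper's DCU-Net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )

    def forward(self, x):
        return self.net(x)

model = DenoiserNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(10):
    # Stand-in for a batch of noisy recordings; no clean reference is available.
    noisy = torch.randn(8, 1, 16000)
    # Corrupt the already-noisy target a second time to form the network input.
    doubly_noisy = noisy + 0.1 * torch.randn_like(noisy)

    denoised = model(doubly_noisy)
    loss = loss_fn(denoised, noisy)  # the target is noisy speech, never clean speech

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In practice, the second corruption would typically be drawn from recorded environmental noise rather than Gaussian noise, in the spirit of noisy-to-noisy signal mapping.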


Data Availability

The data that support the findings of this study are available from the corresponding author on request.


Author information

Corresponding author

Correspondence to Venkateswarlu Poluboina.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Poluboina, V., Pulikala, A. & Pitchaimuthu, A.N. Deep Speech Denoising with Minimal Dependence on Clean Speech Data. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02644-y

