
Data Augmentation and Loss Normalization for Deep Noise Suppression

  • Conference paper

Speech and Computer (SPECOM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Abstract

Speech enhancement using neural networks has recently received considerable attention in research and is being integrated into commercial devices and applications. In this work, we investigate data augmentation techniques for supervised deep learning-based speech enhancement. We show that augmenting not only the SNR values to a broader range with a continuous distribution, but also the spectral and dynamic level diversity, helps to regularize training. However, so that level augmentation does not degrade training, we propose a modification to signal-based loss functions that applies sequence-level normalization. We show in experiments that this normalization overcomes the degradation caused by training on sequences with imbalanced signal levels when a level-dependent loss function is used.
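The two ingredients described in the abstract, continuous SNR/level augmentation and a level-normalized loss, can be illustrated in a minimal NumPy sketch. This is not the paper's implementation: the function names, the SNR and level ranges, and the choice of a time-domain MSE with RMS-based sequence normalization are all assumptions made for illustration (the paper's actual loss may operate on spectral representations).

```python
import numpy as np

def augment_mixture(speech, noise, rng,
                    snr_range=(-5.0, 30.0),       # assumed continuous SNR range, dB
                    level_range=(-35.0, -15.0)):  # assumed sequence level range, dBFS
    """Mix speech and noise at a random continuous SNR, then scale the
    whole sequence to a random overall level (level augmentation)."""
    snr_db = rng.uniform(*snr_range)
    speech_rms = np.sqrt(np.mean(speech ** 2)) + 1e-12
    noise_rms = np.sqrt(np.mean(noise ** 2)) + 1e-12
    # Scale the noise so the mixture hits the drawn SNR.
    noisy = speech + noise * (speech_rms / noise_rms) * 10 ** (-snr_db / 20)
    # Draw a random sequence level and apply the same gain to mixture and target.
    gain = 10 ** (rng.uniform(*level_range) / 20) / (np.sqrt(np.mean(noisy ** 2)) + 1e-12)
    return noisy * gain, speech * gain

def level_normalized_mse(estimate, target):
    """Sequence-level normalization: dividing both signals by the target's
    RMS makes the otherwise level-dependent MSE invariant to the absolute
    signal level of each training sequence."""
    norm = np.sqrt(np.mean(target ** 2)) + 1e-12
    return np.mean(((estimate - target) / norm) ** 2)

rng = np.random.default_rng(0)
speech, noise = rng.standard_normal(16000), rng.standard_normal(16000)
noisy, target = augment_mixture(speech, noise, rng)
print(level_normalized_mse(0.9 * target, target))  # same loss at any level scaling
```

Because both the estimate and the target are divided by the same per-sequence statistic, rescaling a training example by any gain leaves the loss unchanged, which is what prevents loud sequences from dominating the gradient when levels are augmented.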

Author information

Correspondence to Sebastian Braun.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Braun, S., Tashev, I. (2020). Data Augmentation and Loss Normalization for Deep Noise Suppression. In: Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_8

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
