
Data Augmentation and Loss Normalization for Deep Noise Suppression

  • Conference paper

Speech and Computer (SPECOM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Abstract

Speech enhancement using neural networks has recently received considerable attention in research and is being integrated into commercial devices and applications. In this work, we investigate data augmentation techniques for supervised deep learning-based speech enhancement. We show that augmenting not only the SNR values to a broader range with a continuous distribution, but also the spectral and dynamic level diversity, helps to regularize training. However, so that level augmentation does not degrade training, we propose a modification to signal-based loss functions that applies sequence-level normalization. We show in experiments that this normalization overcomes the degradation caused by training on sequences with imbalanced signal levels when a level-dependent loss function is used.
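The two ingredients described in the abstract, continuous SNR/level augmentation and a level-normalized loss, can be illustrated in a minimal NumPy sketch. This is not the paper's implementation: the function names, the SNR and level ranges, and the choice of a time-domain MSE with RMS-based sequence normalization are all assumptions made for illustration (the paper's actual loss may operate on spectral representations).

```python
import numpy as np

def augment_mixture(speech, noise, rng,
                    snr_range=(-5.0, 30.0),       # assumed continuous SNR range, dB
                    level_range=(-35.0, -15.0)):  # assumed sequence level range, dBFS
    """Mix speech and noise at a random continuous SNR, then scale the
    whole sequence to a random overall level (level augmentation)."""
    snr_db = rng.uniform(*snr_range)
    speech_rms = np.sqrt(np.mean(speech ** 2)) + 1e-12
    noise_rms = np.sqrt(np.mean(noise ** 2)) + 1e-12
    # Scale the noise so the mixture hits the drawn SNR.
    noisy = speech + noise * (speech_rms / noise_rms) * 10 ** (-snr_db / 20)
    # Draw a random sequence level and apply the same gain to mixture and target.
    gain = 10 ** (rng.uniform(*level_range) / 20) / (np.sqrt(np.mean(noisy ** 2)) + 1e-12)
    return noisy * gain, speech * gain

def level_normalized_mse(estimate, target):
    """Sequence-level normalization: dividing both signals by the target's
    RMS makes the otherwise level-dependent MSE invariant to the absolute
    signal level of each training sequence."""
    norm = np.sqrt(np.mean(target ** 2)) + 1e-12
    return np.mean(((estimate - target) / norm) ** 2)

rng = np.random.default_rng(0)
speech, noise = rng.standard_normal(16000), rng.standard_normal(16000)
noisy, target = augment_mixture(speech, noise, rng)
print(level_normalized_mse(0.9 * target, target))  # same loss at any level scaling
```

Because both the estimate and the target are divided by the same per-sequence statistic, rescaling a training example by any gain leaves the loss unchanged, which is what prevents loud sequences from dominating the gradient when levels are augmented.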

Author information

Correspondence to Sebastian Braun.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Braun, S., Tashev, I. (2020). Data Augmentation and Loss Normalization for Deep Noise Suppression. In: Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_8

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
