Audio Source Separation with Discriminative Scattering Networks

  • Conference paper
Latent Variable Analysis and Signal Separation (LVA/ICA 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9237)

Abstract

Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work, we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely considered in many audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that, in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.
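
As a rough illustration of the kind of NMF baseline the abstract refers to, the sketch below shows generic supervised NMF separation on a single-resolution non-negative representation. It is not the authors' implementation. The Euclidean multiplicative updates, the soft-mask reconstruction, the function names `nmf` and `separate`, and the synthetic random matrices standing in for magnitude spectrograms are all assumptions made for illustration, and the multi-resolution scattering features are not reproduced here.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the updates


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V (freq x time, non-negative) as W @ H.

    Uses Euclidean multiplicative updates. If W is supplied it is kept
    fixed and only the activations H are estimated (rank is then ignored).
    """
    rng = np.random.default_rng(seed)
    fixed = W is not None
    if not fixed:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W1, W2, n_iter=200):
    """Split a mixture using per-source dictionaries W1 and W2."""
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W1.shape[1]
    V1 = W1 @ H[:k]          # model of source 1
    V2 = W2 @ H[k:]          # model of source 2
    mask1 = V1 / (V1 + V2 + EPS)  # soft (Wiener-like) mask in [0, 1]
    return mask1 * V_mix, (1.0 - mask1) * V_mix


# Synthetic stand-ins for magnitude spectrograms (assumed data, not audio).
rng = np.random.default_rng(1)
B1, B2 = rng.random((64, 4)), rng.random((64, 4))
train1 = B1 @ rng.random((4, 200))
train2 = B2 @ rng.random((4, 200))

# Learn one dictionary per source from its own training data.
W1, _ = nmf(train1, rank=4)
W2, _ = nmf(train2, rank=4, seed=1)

# Mix two unseen signals and separate them.
mix = B1 @ rng.random((4, 50)) + B2 @ rng.random((4, 50))
est1, est2 = separate(mix, W1, W2)
```

Because the two masks sum to one, the estimates always add back up to the mixture. In the setting described in the abstract, the same factorization would be applied to scattering coefficients at several temporal resolutions rather than to a single spectrogram.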

Author information

Corresponding author

Correspondence to Pablo Sprechmann.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sprechmann, P., Bruna, J., LeCun, Y. (2015). Audio Source Separation with Discriminative Scattering Networks. In: Vincent, E., Yeredor, A., Koldovský, Z., Tichavský, P. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2015. Lecture Notes in Computer Science, vol 9237. Springer, Cham. https://doi.org/10.1007/978-3-319-22482-4_30

  • DOI: https://doi.org/10.1007/978-3-319-22482-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22481-7

  • Online ISBN: 978-3-319-22482-4

  • eBook Packages: Computer Science, Computer Science (R0)
