Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks

  • Emad M. Grais
  • Gerard Roma
  • Andrew J. R. Simpson
  • Mark D. Plumbley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10169)

Abstract

The sources separated by most single channel audio source separation techniques are usually distorted, and each separated source contains residual signals from the other sources. To tackle this problem, we propose to enhance the separated sources using deep neural networks (DNNs), decreasing both the distortion and the interference between the separated sources. Two different DNNs are used in this work. The first DNN separates the sources from the mixed signal. The second DNN enhances the separated signals. To account for the interactions between the separated sources, we propose to use a single DNN to enhance all the separated sources together. To reduce the residual signals of one source in the other separated sources (interference), we train the enhancement DNN discriminatively to maximize the dissimilarity between the predicted sources. The experimental results show that discriminative enhancement decreases the distortion and interference between the separated sources.
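
The abstract does not state the exact cost function used for discriminative training, so the following is only a minimal sketch of the general idea, assuming a squared-error form that is common in discriminatively trained separation networks. Each predicted source is pulled toward its own reference and pushed away from the references of the other sources. The function name `discriminative_loss`, the weight `lam`, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def discriminative_loss(predicted, targets, lam=0.5):
    """Hypothetical discriminative cost for joint enhancement.

    predicted, targets: arrays of shape (n_sources, n_frames, n_bins),
    e.g. magnitude spectrograms of the enhanced and reference sources.
    lam: assumed trade-off weight for the dissimilarity term.
    """
    predicted = np.asarray(predicted, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_sources = predicted.shape[0]
    loss = 0.0
    for i in range(n_sources):
        # Fit term: prediction i should match its own reference source.
        loss += np.sum((predicted[i] - targets[i]) ** 2)
        for j in range(n_sources):
            if j != i:
                # Dissimilarity term: prediction i should differ from
                # the references of the other sources (less interference).
                loss -= lam * np.sum((predicted[i] - targets[j]) ** 2)
    return loss
```

With `lam=0` this reduces to an ordinary squared-error cost. A larger `lam` penalizes predictions that resemble the wrong source more strongly, which is the intuition behind reducing interference.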

Keywords

Single channel audio source separation · Deep neural networks · Audio enhancement · Discriminative training

Notes

Acknowledgment

This work is supported by grants EP/L027119/1 and EP/L027119/2 from the UK Engineering and Physical Sciences Research Council (EPSRC).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Emad M. Grais (1)
  • Gerard Roma (1)
  • Andrew J. R. Simpson (1)
  • Mark D. Plumbley (1)

  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
