Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6684)

Abstract

Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by FitzGerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model, in contrast with usual BSS assumptions, and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of FitzGerald et al. requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
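
As a rough illustration of what a PARAFAC-structured NTF of a multichannel spectrogram looks like, the sketch below approximates a nonnegative tensor V[f, n, i] (frequency, frame, channel) by sum_k W[f, k] H[n, k] Q[i, k], using standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not the algorithm of the paper, which this page does not reproduce: the choice of divergence, the names Q, W, H, `reconstruct` and `ntf_parafac_kl`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def reconstruct(Q, W, H):
    # PARAFAC model: V_hat[f, n, i] = sum_k W[f, k] * H[n, k] * Q[i, k]
    return np.einsum("fk,nk,ik->fni", W, H, Q)

def kl_divergence(V, V_hat, eps=1e-12):
    # Generalized Kullback-Leibler divergence, summed over all tensor entries.
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

def ntf_parafac_kl(V, K, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical helper: fit K rank-one components to a nonnegative tensor V
    of shape (F, N, I) with multiplicative updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    F, N, I = V.shape
    W = rng.random((F, K)) + eps   # spectral patterns
    H = rng.random((N, K)) + eps   # temporal activations
    Q = rng.random((I, K)) + eps   # per-channel gains (spatial cues)
    costs = []
    for _ in range(n_iter):
        # Each update rescales one factor by the ratio of the negative and
        # positive parts of the KL gradient, which keeps it nonnegative.
        R = V / (reconstruct(Q, W, H) + eps)
        W *= np.einsum("fni,nk,ik->fk", R, H, Q) / (H.sum(0) * Q.sum(0) + eps)
        R = V / (reconstruct(Q, W, H) + eps)
        H *= np.einsum("fni,fk,ik->nk", R, W, Q) / (W.sum(0) * Q.sum(0) + eps)
        R = V / (reconstruct(Q, W, H) + eps)
        Q *= np.einsum("fni,fk,nk->ik", R, W, H) / (W.sum(0) * H.sum(0) + eps)
        costs.append(kl_divergence(V, reconstruct(Q, W, H)))
    return Q, W, H, costs
```

In this sketch each column of Q plays the role of the spatial cue of one component. Grouping the K components into sources, for instance by clustering the normalized columns of Q after the fit, is a separate step that the code above does not perform.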

References

  1. Cao, Y., Eggermont, P.P.B., Terebey, S.: Cross Burg entropy maximization and its application to ringing suppression in image reconstruction. IEEE Transactions on Image Processing 8(2), 286–292 (1999)

  2. Cemgil, A.T.: Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience (Article ID 785152), 17 pages (2009); doi:10.1155/2009/785152

  3. Févotte, C.: Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems, ch. 11. IGI Global Press (August 2010), http://perso.telecom-paristech.fr/~fevotte/Chapters/isnmf.pdf

  4. Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation 21(3), 793–830 (2009), http://www.tsi.enst.fr/~fevotte/Journals/neco09_is-nmf.pdf

  5. FitzGerald, D., Cranitch, M., Coyle, E.: Non-negative tensor factorisation for sound source separation. In: Proc. of the Irish Signals and Systems Conference, Dublin, Ireland (September 2005)

  6. FitzGerald, D., Cranitch, M., Coyle, E.: Extended nonnegative tensor factorisation models for musical sound source separation. Computational Intelligence and Neuroscience (Article ID 872425), 15 pages (2008)

  7. Helén, M., Virtanen, T.: Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In: Proc. 13th European Signal Processing Conference (EUSIPCO 2005) (2005)

  8. Lee, D.D., Seung, H.S.: Learning the parts of objects with nonnegative matrix factorization. Nature 401, 788–791 (1999)

  9. Neeser, F.D., Massey, J.L.: Proper complex random processes with applications to information theory. IEEE Transactions on Information Theory 39(4), 1293–1302 (1993)

  10. Ozerov, A., Févotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing 18(3), 550–563 (2010), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_multinmf.pdf

  11. Parry, R.M., Essa, I.: Estimating the spatial position of spectral components in audio. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 666–673. Springer, Heidelberg (2006)

  12. Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proc. 22nd International Conference on Machine Learning, pp. 792–799. ACM, Bonn (2005)

  13. Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)

  14. Smaragdis, P.: Convolutive speech bases and their application to speech separation. IEEE Transactions on Audio, Speech, and Language Processing 15(1), 1–12 (2007)

  15. Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2003) (October 2003)

  16. Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech and Language Processing 14(4), 1462–1469 (2006), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_bsseval.pdf

  17. Vincent, E., Sawada, H., Bofill, P., Makino, S., Rosca, J.P.: First stereo audio source separation evaluation campaign: Data, algorithms and results. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) ICA 2007. LNCS, vol. 4666, pp. 552–559. Springer, Heidelberg (2007)

  18. Vincent, E., Araki, S., Bofill, P.: Signal Separation Evaluation Campaign. In: (SiSEC 2008) / Under-determined speech and music mixtures task results (2008), http://www.irisa.fr/metiss/SiSEC08/SiSEC_underdetermined/dev2_eval.html

  19. Virtanen, T.: Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech and Language Processing 15(3), 1066–1074 (2007)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Févotte, C., Ozerov, A. (2011). Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Exploring Music Contents. CMMR 2010. Lecture Notes in Computer Science, vol 6684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23126-1_8

  • DOI: https://doi.org/10.1007/978-3-642-23126-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23125-4

  • Online ISBN: 978-3-642-23126-1

  • eBook Packages: Computer Science, Computer Science (R0)
