Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues

Févotte, Cédric; Ozerov, Alexey

doi:10.1007/978-3-642-23126-1_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6684)

Abstract

Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by FitzGerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model, in contrast with usual BSS assumptions, and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of FitzGerald et al. requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
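As a rough illustration of what a PARAFAC-structured NTF of a multichannel spectrogram looks like, the sketch below approximates a nonnegative tensor V of shape (F, N, I) (frequency bins, time frames, channels) by a sum of K rank-one terms, V[f, n, i] ≈ sum_k W[f, k] H[n, k] Q[i, k]. It is a minimal sketch under stated assumptions, not the authors' implementation: the array layout, the function names parafac_ntf_kl and kl_divergence, the random initialization, and the use of the generalized Kullback-Leibler divergence with standard multiplicative updates are all choices made here for illustration.

```python
import numpy as np


def kl_divergence(V, Vhat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between two nonnegative arrays."""
    return float(np.sum(V * np.log((V + eps) / (Vhat + eps)) - V + Vhat))


def parafac_ntf_kl(V, K, n_iter=200, seed=0, eps=1e-12):
    """PARAFAC NTF of a nonnegative tensor V (F x N x I) with K components.

    Hypothetical helper: multiplicative updates for the generalized KL
    divergence. Returns W (F x K), H (N x K) and Q (I x K), all nonnegative.
    """
    rng = np.random.default_rng(seed)
    F, N, I = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((N, K)) + eps
    Q = rng.random((I, K)) + eps

    def model():
        # Sum of K rank-one tensors, one per component.
        return np.einsum('fk,nk,ik->fni', W, H, Q) + eps

    for _ in range(n_iter):
        # Each factor is rescaled by a ratio of nonnegative terms,
        # so nonnegativity is preserved throughout.
        R = V / model()
        W *= np.einsum('fni,nk,ik->fk', R, H, Q) / (H.sum(0) * Q.sum(0) + eps)
        R = V / model()
        H *= np.einsum('fni,fk,ik->nk', R, W, Q) / (W.sum(0) * Q.sum(0) + eps)
        R = V / model()
        Q *= np.einsum('fni,fk,nk->ik', R, W, H) / (W.sum(0) * H.sum(0) + eps)
    return W, H, Q


if __name__ == "__main__":
    # Synthetic nonnegative tensor with an exact rank-3 PARAFAC structure.
    rng = np.random.default_rng(1)
    V = np.einsum('fk,nk,ik->fni',
                  rng.random((6, 3)), rng.random((8, 3)), rng.random((2, 3)))
    W, H, Q = parafac_ntf_kl(V, K=3)
    Vhat = np.einsum('fk,nk,ik->fni', W, H, Q)
    print("KL divergence after fitting:", kl_divergence(V, Vhat))
```

In this layout, column k of Q holds the per-channel gains of component k, that is, its spatial cue. A posterior clustering step of the kind the abstract attributes to FitzGerald et al. would group components with similar columns of Q into sources; that step is not shown here.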

References

Cao, Y., Eggermont, P.P.B., Terebey, S.: Cross Burg entropy maximization and its application to ringing suppression in image reconstruction. IEEE Transactions on Image Processing 8(2), 286–292 (1999)
Cemgil, A.T.: Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience (Article ID 785152), 17 pages (2009); doi:10.1155/2009/785152
Févotte, C.: Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems, ch. 11. IGI Global Press (August 2010), http://perso.telecom-paristech.fr/~fevotte/Chapters/isnmf.pdf
Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation 21(3), 793–830 (2009), http://www.tsi.enst.fr/~fevotte/Journals/neco09_is-nmf.pdf
FitzGerald, D., Cranitch, M., Coyle, E.: Non-negative tensor factorisation for sound source separation. In: Proc. of the Irish Signals and Systems Conference, Dublin, Ireland (September 2005)
FitzGerald, D., Cranitch, M., Coyle, E.: Extended nonnegative tensor factorisation models for musical sound source separation. Computational Intelligence and Neuroscience (Article ID 872425), 15 pages (2008)
Helén, M., Virtanen, T.: Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In: Proc. 13th European Signal Processing Conference (EUSIPCO 2005) (2005)
Lee, D.D., Seung, H.S.: Learning the parts of objects with nonnegative matrix factorization. Nature 401, 788–791 (1999)
Neeser, F.D., Massey, J.L.: Proper complex random processes with applications to information theory. IEEE Transactions on Information Theory 39(4), 1293–1302 (1993)
Ozerov, A., Févotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing 18(3), 550–563 (2010), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_multinmf.pdf
Parry, R.M., Essa, I.: Estimating the spatial position of spectral components in audio. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 666–673. Springer, Heidelberg (2006)
Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proc. 22nd International Conference on Machine Learning, pp. 792–799. ACM, Bonn (2005)
Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)
Smaragdis, P.: Convolutive speech bases and their application to speech separation. IEEE Transactions on Audio, Speech, and Language Processing 15(1), 1–12 (2007)
Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2003) (October 2003)
Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech and Language Processing 14(4), 1462–1469 (2006), http://www.tsi.enst.fr/~fevotte/Journals/ieee_asl_bsseval.pdf
Vincent, E., Sawada, H., Bofill, P., Makino, S., Rosca, J.P.: First stereo audio source separation evaluation campaign: Data, algorithms and results. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) ICA 2007. LNCS, vol. 4666, pp. 552–559. Springer, Heidelberg (2007)
Vincent, E., Araki, S., Bofill, P.: Signal Separation Evaluation Campaign. In: (SiSEC 2008) / Under-determined speech and music mixtures task results (2008), http://www.irisa.fr/metiss/SiSEC08/SiSEC_underdetermined/dev2_eval.html
Virtanen, T.: Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech and Language Processing 15(3), 1066–1074 (2007)

Author information

Authors and Affiliations

CNRS LTCI, Telecom ParisTech, Paris, France
Cédric Févotte
IRISA, INRIA, Rennes, France
Alexey Ozerov

Editor information

Editors and Affiliations

CNRS - LMA, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Sølvi Ystad
CNRS-INCM, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Mitsuko Aramaki
CNRS-LMA, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Richard Kronland-Martinet
Aalborg University Esbjerg, Niels Bohr Vej 8, 6700, Esbjerg, Denmark
Kristoffer Jensen

Cite this paper

Févotte, C., Ozerov, A. (2011). Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Exploring Music Contents. CMMR 2010. Lecture Notes in Computer Science, vol 6684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23126-1_8

DOI: https://doi.org/10.1007/978-3-642-23126-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23125-4
Online ISBN: 978-3-642-23126-1
eBook Packages: Computer Science, Computer Science (R0)