A General Modular Framework for Audio Source Separation

Ozerov, Alexey; Vincent, Emmanuel; Bimbot, Frédéric

doi:10.1007/978-3-642-15995-4_5


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6365)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation


Abstract

Most audio source separation methods are developed for a particular scenario, characterized by the number of sources and channels and by the characteristics of the sources and of the mixing process. In this paper we introduce a general modular audio source separation framework based on a library of flexible source models that enable the incorporation of prior knowledge about the characteristics of each source. First, this framework generalizes several existing audio source separation methods while bringing a common formulation to them. Second, it allows one to conceive and implement new efficient methods that have not yet been reported in the literature. We first introduce the framework by describing the flexible model, explaining its generality, and summarizing our modular implementation using a Generalized Expectation-Maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation scenarios.



Author information

Authors and Affiliations

INRIA, Rennes Bretagne Atlantique,
Alexey Ozerov & Emmanuel Vincent
IRISA, CNRS - UMR 6074, Campus de Beaulieu, 35042, Rennes cedex, France
Frédéric Bimbot


Editor information

Editors and Affiliations

Dept. of Electrical Engineering, Université d’Evry Val d’Essonne, 40 rue du Pelvoux, 91020, Courcouronnes, France
Vincent Vigneron
Laboratoire I3S, Les Algorithmes - Euclide-B, BP 121, Université de Nice-Sophia Antipolis, 2000 Route des Lucioles, 06903, Sophia Antipolis Cedex, France
Vicente Zarzoso
School of Engineering, Dept. of Telecommunications, ISITV, Université de Toulon, Avenue George Pompidou, BP 56, La Valette du Var, Cedex, 83162, France
Eric Moreau
INRIA France, Equipe-projet METISS, Centre de Recherche INRIA Rennes-Bretagne Atlantique, Campus de Beaulieu, 35042, Rennes cedex, France
Rémi Gribonval
INRIA France, Equipe-projet METISS, Centre de Recherche INRIA Rennes-Bretagne Atlantique, Campus de Beaulieu, 35042, Rennes Cedex, France
Emmanuel Vincent


About this paper

Cite this paper

Ozerov, A., Vincent, E., Bimbot, F. (2010). A General Modular Framework for Audio Source Separation. In: Vigneron, V., Zarzoso, V., Moreau, E., Gribonval, R., Vincent, E. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2010. Lecture Notes in Computer Science, vol 6365. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15995-4_5


DOI: https://doi.org/10.1007/978-3-642-15995-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15994-7
Online ISBN: 978-3-642-15995-4
eBook Packages: Computer Science, Computer Science (R0)
