Abstract
This chapter introduces multichannel nonnegative matrix factorization (NMF) methods for audio source separation. All the methods and some of their extensions are introduced within a more general local Gaussian modeling (LGM) framework. These methods are attractive because they allow spatial and spectral cues to be combined in a joint and principled way, and because they are natural extensions and generalizations of many single-channel NMF-based methods to the multichannel case. The chapter introduces the spectral (NMF-based) and spatial models, as well as the way to combine them within the LGM framework. Model estimation criteria and algorithms are also described, some of them in greater detail.
Notes
1. Throughout the chapter we generally refer to all these methods as multichannel NMF, and state explicitly when we mean multichannel NTF.
2. The spatial image of a source is not the source signal itself, but its contribution to the \(I\)-channel mixture.
3. Due to the scale ambiguity between \(\mathbf{R}_{jfn}\) and \(v_{jfn}\) in (4.2), the loudness can be fully attributed to \(v_{jfn}\).
4. When we write \(\overset{\mathrm{c}}{=}\), we mean that the equality holds up to a constant that is independent of the model parameters \(\boldsymbol{\theta}\) and thus has no influence on the optimization over the parameters in (4.23).
5. Note that if the spatial covariances \(\mathbf{R}_{jf}\) are needed, they can always be computed with (4.29).
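The scale ambiguity mentioned in note 3 can be illustrated with a small numerical sketch. Equation (4.2) is not reproduced on this page, so the code assumes the usual LGM form in which the covariance of a source's spatial image is the product \(v_{jfn}\mathbf{R}_{jfn}\). The helper names, the numerical values, and the unit-trace normalization are illustrative choices, not taken from the chapter.

```python
# Assumption: the spatial-image covariance has the form v * R, where v > 0 is
# a scalar variance and R is an I x I Hermitian positive semidefinite matrix.

def image_covariance(v, R):
    """Return v * R for a scalar variance v and a matrix R (list of rows)."""
    return [[v * r for r in row] for row in R]

def rescale(v, R, c):
    """Move a positive factor c from R to v: (v, R) -> (c * v, R / c)."""
    return c * v, [[r / c for r in row] for row in R]

def trace(R):
    """Real part of the trace (the trace of a Hermitian matrix is real)."""
    return sum(R[i][i] for i in range(len(R))).real

# Hypothetical values for a stereo (I = 2) mixture at one time-frequency bin.
v = 0.5
R = [[2.0 + 0j, 0.6 - 0.2j],
     [0.6 + 0.2j, 1.0 + 0j]]

# One possible convention: normalize R to unit trace so v carries the loudness.
v_norm, R_norm = rescale(v, R, trace(R))

before = image_covariance(v, R)
after = image_covariance(v_norm, R_norm)

# Largest entrywise difference between the two covariances.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(before, after)
               for a, b in zip(row_a, row_b))
```

Under this assumption the product, and therefore the model, is unchanged by the rescaling, which is why a normalization of \(\mathbf{R}_{jfn}\) can be imposed freely and the loudness assigned to \(v_{jfn}\).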
Acknowledgements
Cédric Févotte acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 681839 (project FACTORY).
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Ozerov, A., Févotte, C., Vincent, E. (2018). An Introduction to Multichannel NMF for Audio Source Separation. In: Makino, S. (eds) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_4
DOI: https://doi.org/10.1007/978-3-319-73031-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73030-1
Online ISBN: 978-3-319-73031-8
eBook Packages: Engineering, Engineering (R0)