General Formulation of Multichannel Extensions of NMF Variants

Kameoka, Hirokazu; Sawada, Hiroshi; Higuchi, Takuya

doi:10.1007/978-3-319-73031-8_5


Part of the book series: Signals and Communication Technology (SCT)


Abstract

Blind source separation (BSS) is generally a mathematically ill-posed problem that involves separating out individual source signals from microphone array inputs. The frequency domain BSS approach is particularly notable in that it provides the flexibility needed to exploit various models for the time-frequency representations of source signals and/or array responses. Many frequency domain BSS approaches can be categorized according to the way in which the source power spectrograms and/or the mixing process are modeled. For source power spectrogram modeling, the non-negative matrix factorization (NMF) model and its variants have recently proved very powerful. For mixing process modeling, one reasonable way involves introducing a plane wave assumption so that the spatial covariances of each source can be described explicitly using the direction of arrival (DOA). This chapter provides a general formulation of the frequency domain BSS that makes it possible to incorporate the models for the source power spectrogram and the source spatial covariance matrix. Through this formulation, we reveal the relationship between the state-of-the-art BSS approaches. We further show that combining these models allows us to solve the problems of source separation, DOA estimation, dereverberation, and voice activity detection in a unified manner.
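As a rough illustration of the plane wave assumption mentioned in the abstract, the following sketch builds a rank-1 spatial covariance matrix for one source from its DOA at a single frequency. It is not the chapter's exact parameterization: the uniform linear array geometry, the far-field model, the speed of sound, and the names `steering_vector` and `spatial_covariance` are all assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(freq_hz, doa_rad, mic_positions_m):
    """Far-field (plane wave) steering vector for a linear array.

    Each microphone sees the same wavefront with a delay that depends
    only on its position along the array axis and on the DOA.
    """
    delays = mic_positions_m * np.sin(doa_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def spatial_covariance(freq_hz, doa_rad, mic_positions_m):
    """Rank-1 spatial covariance a a^H implied by a single plane wave."""
    a = steering_vector(freq_hz, doa_rad, mic_positions_m)
    return np.outer(a, a.conj())


# Hypothetical 3-microphone array with 5 cm spacing, source at 30 degrees.
mics = np.array([0.0, 0.05, 0.10])
R = spatial_covariance(1000.0, np.deg2rad(30.0), mics)
```

Under these assumptions the matrix is Hermitian with unit diagonal, and the DOA is its only free parameter at each frequency, which is what allows the spatial model to be described explicitly by the source direction.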


Notes

1. The permutation alignment problem is the problem of grouping together the separated components from different frequency bins that originate from the same source in order to construct each separated signal.
2. If we want to maximize \(\mathscr {C}({\varvec{\theta }})\), we use a minorizer \(\mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\) instead, which is defined so that \(\mathscr {C}({\varvec{\theta }}) = \max _{{\varvec{\alpha }}} \mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\).
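
The minorizer idea in note 2 can be sketched on a toy problem that is not taken from the chapter. Here the objective is assumed to be \(\mathscr {C}(\theta ) = \sum _n \log (\theta a_n + (1-\theta ) b_n)\) with positive \(a_n, b_n\) and a scalar \(\theta \in (0,1)\). Jensen's inequality gives a minorizer whose maximum over the auxiliary variables \(\alpha _n\) equals \(\mathscr {C}(\theta )\), and alternating the two maximizations never decreases the objective. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def objective(theta, a, b):
    """Toy objective C(theta) = sum_n log(theta * a_n + (1 - theta) * b_n)."""
    return float(np.sum(np.log(theta * a + (1.0 - theta) * b)))


def mm_maximize(a, b, theta=0.5, n_iter=50):
    """Maximize C(theta) by alternating updates of a Jensen minorizer.

    D(theta, alpha) = sum_n [ alpha_n * log(theta * a_n / alpha_n)
                            + (1 - alpha_n) * log((1 - theta) * b_n / (1 - alpha_n)) ]
    satisfies D <= C, with equality at the alpha computed below.
    """
    history = [objective(theta, a, b)]
    for _ in range(n_iter):
        # Maximize D over alpha for fixed theta: makes the bound tight.
        alpha = theta * a / (theta * a + (1.0 - theta) * b)
        # Maximize D over theta for fixed alpha: closed-form update.
        theta = float(np.mean(alpha))
        history.append(objective(theta, a, b))
    return theta, history


# Synthetic positive data, used only to exercise the updates.
rng = np.random.default_rng(0)
a = rng.uniform(0.1, 1.0, size=200)
b = rng.uniform(0.1, 1.0, size=200)
theta_hat, history = mm_maximize(a, b)
```

The list `history` records the objective after each iteration. Its values should be non-decreasing, which is the property that makes auxiliary-function updates of this kind attractive.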


Author information

Authors and Affiliations

NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198, Japan
Hirokazu Kameoka, Hiroshi Sawada & Takuya Higuchi


Corresponding author

Correspondence to Hirokazu Kameoka.

Editor information

Editors and Affiliations

University of Tsukuba, Ibaraki, Japan
Shoji Makino

About this chapter

Cite this chapter

Kameoka, H., Sawada, H., Higuchi, T. (2018). General Formulation of Multichannel Extensions of NMF Variants. In: Makino, S. (eds) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_5

DOI: https://doi.org/10.1007/978-3-319-73031-8_5
Published: 02 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73030-1
Online ISBN: 978-3-319-73031-8
eBook Packages: Engineering, Engineering (R0)
