Determined Blind Source Separation with Independent Low-Rank Matrix Analysis

  • Daichi Kitamura
  • Nobutaka Ono
  • Hiroshi Sawada
  • Hirokazu Kameoka
  • Hiroshi Saruwatari
Chapter
Part of the Signals and Communication Technology book series (SCT)

Abstract

In this chapter, we address the determined blind source separation problem and introduce a new and effective method that unifies independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that exploits the statistical independence between source vectors. However, because the source model in IVA is based on a spherically symmetric multivariate distribution, IVA cannot exploit source-specific spectral structures, such as those of the various sounds appearing in music signals. To solve this problem, we introduce NMF as the source model in IVA so that these spectral structures can be captured. Because this approach is a natural extension of the source model from a vector to a low-rank matrix represented by NMF, the new method is called independent low-rank matrix analysis (ILRMA). We also reveal the relationship among IVA, ILRMA, and multichannel NMF (MNMF): IVA and ILRMA are each identical to a special case of MNMF that employs a rank-1 spatial model. Experimental results show the efficacy of ILRMA compared with IVA and MNMF in terms of separation accuracy and convergence speed.
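
As a rough illustration of the idea of replacing a vector source model with a low-rank matrix, the sketch below evaluates an objective of the kind usually written for ILRMA: each separated time-frequency component is assumed to be zero-mean complex Gaussian with a variance given by an NMF product, and a log-determinant term accounts for the demixing matrices. The function name `ilrma_cost`, the array names `X`, `W`, `T`, `V`, and their shapes are assumptions made for this example and are not taken from the chapter; additive constants are dropped.

```python
import numpy as np

def ilrma_cost(X, W, T, V):
    """Negative log-likelihood (up to constants) of an ILRMA-style model.

    Assumed, illustrative shapes (determined case, sources == channels):
      X: (I, J, M) complex mixture STFT (frequency bins, frames, channels)
      W: (I, M, M) complex demixing matrix for each frequency bin
      T: (I, K, M) nonnegative NMF bases, one set per source
      V: (K, J, M) nonnegative NMF activations, one set per source
    """
    # Separated signals: Y[i, j, n] = sum_m W[i, n, m] * X[i, j, m]
    Y = np.einsum("inm,ijm->ijn", W, X)
    # Low-rank variance model: R[i, j, n] = sum_k T[i, k, n] * V[k, j, n]
    R = np.einsum("ikn,kjn->ijn", T, V)
    power = np.abs(Y) ** 2
    n_frames = X.shape[1]
    # slogdet returns (sign, log|det|) for each frequency bin
    _, logabsdet = np.linalg.slogdet(W)
    source_term = np.sum(power / R + np.log(R))
    demix_term = 2.0 * n_frames * np.sum(logabsdet)
    return float(source_term - demix_term)
```

In a sketch like this, an IVA-style model would correspond to dropping the frequency-dependent NMF variance `R` in favour of a spherical contrast on each source vector, while the rank `K` controls how much spectral structure each source's variance can represent.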

Notes

Acknowledgements

This work was partially supported by Grant-in-Aid for JSPS Fellows Grant Number 26·10796 and the SECOM Science and Technology Foundation.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Daichi Kitamura (1)
  • Nobutaka Ono (2)
  • Hiroshi Sawada (3)
  • Hirokazu Kameoka (4)
  • Hiroshi Saruwatari (1)

  1. The University of Tokyo, Tokyo, Japan
  2. Tokyo Metropolitan University, Tokyo, Japan
  3. NTT Communication Science Laboratories, Kyoto, Japan
  4. NTT Communication Science Laboratories, Atsugi, Japan
