A GMM Sound Source Model for Blind Speech Separation in Under-determined Conditions
This paper addresses blind speech separation in under-determined conditions, that is, when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive update rules for the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results show that our method outperforms the method based on non-negative matrix factorization (NMF) by 0.7 dB in signal-to-distortion ratio (SDR) and by 1.7 dB in signal-to-interference ratio (SIR). The larger SIR gain indicates that our method effectively removes interference from other talkers.
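To make the two kinds of Gaussians concrete, the following is a minimal sketch of a single-frame spectral model along the frequency axis. It is not the parameterization used in the paper. It assumes that the sharp Gaussians sit at integer multiples of a fundamental frequency and that the smooth Gaussians sit at a few fixed centre frequencies. The names `gaussian` and `gmm_spectrum` and all numerical values are hypothetical, and the auxiliary-function parameter updates are not shown.

```python
import math


def gaussian(x, mean, std):
    """Gaussian density with the given mean and standard deviation, evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def gmm_spectrum(freqs_hz, f0_hz, harmonic_weights, sharp_std_hz,
                 smooth_means_hz, smooth_weights, smooth_std_hz):
    """Weighted sum of sharp (harmonic) and smooth (nonharmonic) Gaussians.

    Assumption for illustration only: the k-th sharp Gaussian is centred
    at k * f0_hz, and all sharp (resp. smooth) Gaussians share one width.
    """
    spectrum = []
    for f in freqs_hz:
        # Narrow components: one per harmonic of the assumed fundamental.
        harmonic = sum(w * gaussian(f, (k + 1) * f0_hz, sharp_std_hz)
                       for k, w in enumerate(harmonic_weights))
        # Wide components: a coarse envelope for the nonharmonic part.
        nonharmonic = sum(w * gaussian(f, m, smooth_std_hz)
                          for m, w in zip(smooth_means_hz, smooth_weights))
        spectrum.append(harmonic + nonharmonic)
    return spectrum


# Hypothetical values: 200 Hz fundamental, 5 harmonics, 1 Hz frequency grid.
freqs = list(range(0, 4001))
spec = gmm_spectrum(freqs, 200.0, [1.0] * 5, 10.0,
                    [500.0, 1500.0, 2500.0, 3500.0], [0.5] * 4, 400.0)
```

With these values the model has narrow peaks at the harmonics (200 Hz, 400 Hz, and so on) on top of a slowly varying floor, which is the qualitative shape the abstract describes.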
Keywords: Blind speech separation; Under-determined condition; GMM sound source model; Auxiliary function method