Improved Convolutive and Underdetermined Blind Audio Source Separation with MRF Smoothing
Abstract
Convolutive and underdetermined blind audio source separation from noisy recordings is a challenging problem, and several computational strategies have been proposed to address it. This study is concerned with several modifications to the expectation-maximization-based algorithm, which iteratively estimates the mixing and source parameters. This strategy assumes that each entry in a source spectrogram is modeled by superimposed Gaussian components, which are mutually and individually independent across frequency and time bins. In our approach, we relax this independence assumption by considering a locally smooth temporal and frequency structure in the source power spectrograms. Local smoothness is enforced by incorporating a Gibbs prior into the complete-data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field. Simulations using audio files from the 2008 Signal Separation Evaluation Campaign (SiSEC 2008) demonstrate the high efficiency of the proposed improvement.
Keywords
Blind source separation · Nonnegative matrix factorization · Expectation-maximization · Markov random field · Simultaneous autoregression
Introduction
Blind source separation (BSS) aims to recover unknown source signals from observed mixtures, given no or only very limited information about the mixing process. BSS problems have been addressed in many previous studies, for example, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], motivated by a range of real-world applications.
In a cocktailparty problem, microphones receive noisy mixtures of acoustic signals that propagate along multiple paths from their sources. In a real scenario, the number of audio sources may be greater than the number of microphones, audio sources may have different timbres and similar pitches, and audio signals may be only locally stationary.
A convolutive and underdetermined mixing model must therefore be adopted for this problem. There are several techniques for solving convolutive unmixing problems [13]. Some of these [14] operate in the time domain by solving an alternative finite impulse response (FIR) inverse model using independent component analysis (ICA) methods [2]. Another approach is to extract meaningful features from the time-frequency (TF) representations of the mixtures. This approach tends to be more efficient than the ICA-based techniques, especially when the number of microphones is lower than the number of sources. Acoustic signals are usually sparse in the TF domain, so the source signals can be separated efficiently even if they partially overlap and the problem is underdetermined. These features can be extracted using several techniques, including TF masking [15, 16], frequency bin-wise clustering with permutation alignment (FBWCPA) [17, 18], subspace projection [19], hidden Markov models (HMM) [20], interaural phase difference (IPD) [21], nonnegative matrix factorization (NMF) [22, 23], and nonnegative tensor factorization (NTF) [24].
Nonnegative matrix factorization [25] is a feature extraction method with many real-world applications [26]. A convolutive NMF-based unmixing model was proposed by Smaragdis [22]. Ozerov and Févotte [23] developed the EM-NMF algorithm, which is suitable for unsupervised convolutive and possibly underdetermined unmixing of audio sources using only stereo observations. Their source model was based on the generalized Wiener filtering model [27, 28, 29], which assumes that each source is locally stationary and can be expressed in terms of superimposed amplitude-modulated Gaussian components. Thus, the power spectrogram of each source can be factorized into lower-rank nonnegative matrices, which facilitates the use of NMF for estimating the frequency and temporal profiles of each latent source component. In the TF representation, the latent components are mutually and individually independent across frequency and time bins. However, this assumption is very weak for adjacent bins because real audio signals have locally smooth frequency and temporal structures.
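As a toy illustration of the factorization step, the sketch below factorizes a nonnegative power spectrogram V into low-rank factors W and H using the standard Euclidean multiplicative updates of Lee and Seung [25]. This is not the EM update of [23]; the function name and all parameter values are illustrative.

```python
import numpy as np

def nmf(V, R, n_iter=500, eps=1e-12, seed=0):
    """Factorize a nonnegative power spectrogram V (F x T) as V ~= W @ H,
    with W (F x R) holding frequency profiles and H (R x T) temporal
    profiles. Uses Euclidean multiplicative updates (Lee & Seung), not the
    EM updates of the EM-NMF algorithm discussed in the text."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.uniform(0.1, 1.0, (F, R))
    H = rng.uniform(0.1, 1.0, (R, T))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update temporal profiles
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update frequency profiles
    return W, H

# Toy check: an exactly rank-2 "spectrogram" is recovered almost exactly.
rng = np.random.default_rng(1)
V = rng.uniform(0, 1, (64, 2)) @ rng.uniform(0, 1, (2, 100))
W, H = nmf(V, R=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

Note that the factors stay nonnegative by construction, since the updates only multiply nonnegative quantities.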
Motivated by several papers on smoothness [26, 28, 30, 31] in BSS models, we attempt to further improve the EM-NMF algorithm by enforcing local smoothness in both the frequency and temporal profiles of the NMF factors. Similar to [28, 30, 32], we introduce a priori knowledge into the NMF-based model using a Bayesian framework, although our approach is based on a Gibbs prior with a Markov random field (MRF) model that describes pairwise interactions among adjacent spectrogram bins. As demonstrated in [33], the MRF model with Green's function, which is well known in many tomographic image reconstruction applications [34], can improve the EM-NMF algorithm. In this paper, we extend the results presented in [33] to other smoothing functions, in particular the more flexible simultaneous autoregressive (SAR) model, which is more convenient in terms of hyperparameter estimation and computational complexity.
The rest of this paper is organized as follows. The next section reviews the underlying separation model. Section 3 is concerned with MRF smoothing. The optimization algorithm is described in Sect. 4. Audio source separation experiments are presented in Sect. 5. Finally, the conclusions are provided in Sect. 6.
Model
The priors \(P(\varvec{W})\) and \(P(\varvec{H})\) in (10) can be determined in many ways. Févotte et al. [28] proposed determining the priors using Markov chains and the inverse-Gamma distribution. In our approach, we propose to model the priors with the Gibbs distribution, which is particularly useful for enforcing local smoothness in images.
MRF Smoothing
Table 1. Potential functions

Author(s)                 | Function ψ(ξ, δ)                                                 | Reference
(Gaussian)                | \((\xi/\delta)^2\)                                               |
Besag (Laplacian)         | \(|\xi/\delta|\)                                                 | [36]
Bouman and Sauer (GGMRF)  | \(|\xi/\delta|^p\)                                               | [37]
Geman and McClure         | \(\frac{16}{3\sqrt{3}} \frac{(\xi/\delta)^2}{1+(\xi/\delta)^2}\) | [38]
Geman and Reynolds        | \(\frac{|\xi/\delta|}{1+|\xi/\delta|}\)                          | [39]
Green                     | \(\delta \ln[\cosh(\xi/\delta)]\)                                | [34]
Hebert and Leahy          | \(\delta \ln[1+(\xi/\delta)^2]\)                                 | [40]
According to Lange [41], a robust potential function in the Gibbs prior should have the following properties: nonnegative, even, equal to 0 at ξ = 0, strictly increasing for ξ > 0, unbounded, and convex with a bounded first derivative. Of the functions listed in Table 1, Green's function satisfies all of these properties, and consequently, it was selected for the tests in [33]. Unfortunately, applying Green's function to both matrices W and H demands the determination of two hyperparameters, δ_W and δ_H, and two penalty parameters, α_W and α_H. Moreover, data-driven hyperparameter estimation usually involves an approximation of the partition functions \(\varvec{Z}_W\) and \(\varvec{Z}_H\), which is not easy in this task.
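Lange's properties are easy to verify numerically for Green's function ψ(ξ, δ) = δ ln[cosh(ξ/δ)], whose first derivative tanh(ξ/δ) is bounded by 1. A small sketch (the grid size and tolerances below are arbitrary choices):

```python
import numpy as np

# Green's potential psi(xi, delta) = delta * ln(cosh(xi/delta)) and a
# numerical check of Lange's robustness properties: even, psi(0) = 0,
# strictly increasing for xi > 0, unbounded growth, convex, and a bounded
# first derivative (analytically, psi'(xi) = tanh(xi/delta) <= 1).
def green(xi, delta=1.0):
    return delta * np.log(np.cosh(xi / delta))

xi = np.linspace(0.0, 50.0, 2001)
p = green(xi)
dp = np.gradient(p, xi)    # numerical first derivative
d2p = np.gradient(dp, xi)  # numerical second derivative

print(np.isclose(p[0], 0.0))          # psi(0) = 0
print(np.allclose(green(-xi), p))     # even function
print(np.all(np.diff(p) > 0))         # strictly increasing for xi > 0
print(dp.max() <= 1.0 + 1e-6)         # first derivative bounded by 1
print(np.all(d2p[1:-1] >= -1e-6))     # convex (nonnegative 2nd derivative)
```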
According to [45, 46], the spatial dependence matrices can be expressed as \(\varvec{S}^{(W)}=\gamma \varvec{Z}^{(W)}\) and \(\varvec{S}^{(H)}=\gamma \varvec{Z}^{(H)}\), where γ is a constant that ensures that the matrices \(\varvec{C}^{(W)}={\bf I}_F - \varvec{S}^{(W)}\) and \(\varvec{C}^{(H)}={\bf I}_T - \varvec{S}^{(H)}\) are positive-definite, while \(\varvec{Z}^{(W)}=[z_{mf}^{(W)}]\) and \(\varvec{Z}^{(H)}=[z_{tn}^{(H)}]\) are binary symmetric band matrices indicating the neighboring entries in \(\varvec{w}_r\) and \(\underline {{\bf h}}_r\), respectively. For first-order interactions, we have \(z_{1,2}^{(W)}=z_{F,F-1}^{(W)}=z_{m,m-1}^{(W)}=z_{m,m+1}^{(W)}=1\) for \(m \in \{2, \ldots, F-1 \}\), \(z_{2,1}^{(H)}=z_{T-1,T}^{(H)}=z_{n-1,n}^{(H)}=z_{n+1,n}^{(H)}=1\) for \(n \in \{2, \ldots, T-1 \}\), and \(z_{mf}^{(W)}=z_{tn}^{(H)}=0\) otherwise. For P-order interactions, each entry \(w_{fr}\) and \(h_{rt}\) has the corresponding sets of neighbors \(\{w_{f-\nu,r}\}\), \(\{w_{f+\nu,r}\}\), \(\{h_{r,t-\nu}\}\), \(\{h_{r,t+\nu}\}\) with \(\nu=1, \ldots, P\). As a consequence, \(\varvec{Z}^{(W)}\) and \(\varvec{Z}^{(H)}\) are symmetric band matrices with P subdiagonals and P superdiagonals whose entries are equal to one, and zero otherwise. The matrices \(\varvec{C}^{(W)}\) and \(\varvec{C}^{(H)}\) are positive-definite if \(\gamma < (2P)^{-1}\) for P-order interactions [45, 46]. We selected \(\gamma=(2P)^{-1} - \tilde \epsilon\), where \(\tilde \epsilon\) is a small constant, for example, \(\tilde \epsilon=10^{-16}\).
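The construction of \(\varvec{C}^{(W)}\) (and, analogously, \(\varvec{C}^{(H)}\)) can be sketched as follows; the function name is illustrative, and positive definiteness with \(\gamma=(2P)^{-1}-\tilde\epsilon\) is checked numerically on a small example.

```python
import numpy as np

def smoothing_matrix(n, P=1, eps=1e-16):
    """Build C = I - gamma * Z, where Z is the binary symmetric band matrix
    with P sub- and P super-diagonals marking the P-order neighbours, and
    gamma = (2P)^{-1} - eps guarantees positive definiteness."""
    Z = np.zeros((n, n))
    for nu in range(1, P + 1):
        idx = np.arange(n - nu)
        Z[idx, idx + nu] = 1.0  # nu-th super-diagonal
        Z[idx + nu, idx] = 1.0  # nu-th sub-diagonal
    gamma = 1.0 / (2 * P) - eps
    return np.eye(n) - gamma * Z

C = smoothing_matrix(8, P=2)
print(np.all(np.linalg.eigvalsh(C) > 0))  # positive definite
```

Since the row sums of Z are at most 2P (with strict inequality at the boundary rows), Gershgorin's theorem gives eigenvalues of C strictly above zero for γ < (2P)^{-1}.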
Algorithm
- Gaussian (SAR model):
$$ \nabla_{w_{fr}} U(\varvec{W})= \left [(\varvec{C}^{(W)})^T \varvec{C}^{(W)} \varvec{W} \right]_{fr}, \qquad (32) $$
$$ \nabla_{h_{rt}} U(\varvec{H})= \left [\varvec{H} \varvec{C}^{(H)} (\varvec{C}^{(H)})^T \right]_{rt}, \qquad (33) $$
- GR function (proposed by Green [34]):
$$ \nabla_{w_{fr}} U(\varvec{W})= \sum_{l \in S_f} \nu_{fl} \tanh \left (\frac{w_{fr} - w_{lr}}{\delta_W} \right), \qquad (34) $$
$$ \nabla_{h_{rt}} U(\varvec{H})= \sum_{l \in S_t} \nu_{tl} \tanh \left (\frac{h_{rt} - h_{rl}}{\delta_H} \right), \qquad (35) $$
- HL function (proposed by Hebert and Leahy [40]):
$$ \nabla_{w_{fr}} U(\varvec{W})= \sum_{l \in S_f} \nu_{fl} \frac{2 \delta_W (w_{fr} - w_{lr})}{\delta^2_W+(w_{fr} - w_{lr})^2}, \qquad (36) $$
$$ \nabla_{h_{rt}} U(\varvec{H})= \sum_{l \in S_t} \nu_{tl} \frac{2 \delta_H (h_{rt} - h_{rl})}{\delta^2_H+(h_{rt} - h_{rl})^2}. \qquad (37) $$
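As a sanity check, the GR and HL gradients can be implemented for a single column with first-order neighbors and unit weights ν = 1, and the GR gradient verified against finite differences of the corresponding pairwise energy. This is a sketch; the vectorized layout and names are illustrative.

```python
import numpy as np

# First-order-neighbour gradients of the GR and HL smoothing energies,
# sketched for a single column w (a frequency profile), with weights nu = 1.
def grad_gr(w, delta):
    """Green (GR) prior gradient: sum of tanh terms over both neighbours."""
    g = np.zeros_like(w)
    g[:-1] += np.tanh((w[:-1] - w[1:]) / delta)  # right neighbour
    g[1:] += np.tanh((w[1:] - w[:-1]) / delta)   # left neighbour
    return g

def grad_hl(w, delta):
    """Hebert-Leahy (HL) prior gradient over first-order neighbours."""
    g = np.zeros_like(w)
    d = w[:-1] - w[1:]
    g[:-1] += 2 * delta * d / (delta**2 + d**2)
    g[1:] -= 2 * delta * d / (delta**2 + d**2)
    return g

def energy_gr(w, delta):
    """Pairwise GR energy whose gradient is grad_gr (each pair counted once)."""
    d = w[:-1] - w[1:]
    return np.sum(delta * np.log(np.cosh(d / delta)))

# Finite-difference check of the GR gradient.
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, 16)
h = 1e-6
num = np.array([(energy_gr(w + h * np.eye(16)[i], 1.0)
                 - energy_gr(w - h * np.eye(16)[i], 1.0)) / (2 * h)
                for i in range(16)])
print(np.allclose(num, grad_gr(w, 1.0), atol=1e-6))
```

A constant profile has zero gradient under both priors, which is the expected behavior of a smoothing penalty.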
Experiments
Benchmarks
Table 2. Benchmark mixtures

Instantaneous      | Convolutive
male3_inst_mix     | male3_synthconv_250ms_1m_mix
female3_inst_mix   | female3_synthconv_250ms_1m_mix
nodrums_inst_mix   | nodrums_synthconv_250ms_1m_mix
wdrums_inst_mix    | wdrums_synthconv_250ms_1m_mix
The spectrograms were obtained by a short-time Fourier transform (STFT) using half-overlapping sine windows. To create the spectrograms and recover the time-domain signals from the STFT coefficients, we used the corresponding stft_multi and istft_multi Matlab functions from the SiSEC2008 webpage^{2} [48]. For instantaneous and convolutive mixtures, the window lengths were set to 1,024 and 2,048 samples, respectively.
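A half-overlapping sine-window STFT admits exact overlap-add reconstruction because squared sine windows at 50% overlap sum to one. The stand-alone sketch below only illustrates this windowing; the actual experiments used the stft_multi/istft_multi Matlab tools, and the function names and framing details here are illustrative.

```python
import numpy as np

# Minimal STFT / inverse STFT with a half-overlapping sine window.
def stft(x, N):
    win = np.sin(np.pi * (np.arange(N) + 0.5) / N)  # sine window
    hop = N // 2                                    # 50% overlap
    frames = [win * x[k:k + N] for k in range(0, len(x) - N + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(X, N, length):
    win = np.sin(np.pi * (np.arange(N) + 0.5) / N)
    hop = N // 2
    x = np.zeros(length)
    for i, spec in enumerate(X):
        # Weighted overlap-add; sin^2 windows at 50% overlap sum to 1,
        # so interior samples are reconstructed exactly.
        x[i * hop:i * hop + N] += win * np.fft.irfft(spec, N)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = istft(stft(x, 1024), 1024, 4096)
# The first/last half-window lack an overlapping partner frame.
print(np.allclose(x[512:-512], y[512:-512]))
```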
The EM-NMF algorithm was taken from Ozerov's homepage^{3}, while the MRF-EM-NMF algorithm was coded and extensively tested by Ochal [49].
The proposed algorithm is based on an alternating optimization scheme applied to an intrinsically nonconvex problem; hence, its initialization plays an important role. An incorrect initialization may result in slow convergence and early stagnation at an unfavorable local minimum of the objective function. As in many NMF algorithms, the factors W and H are initialized with uniformly distributed random numbers, whereas the entries of the matrix A are drawn from a zero-mean complex Gaussian distribution. After W and H have been initialized, the covariance matrices \(\varvec{\Upsigma}_{ft}^{(s)}\) and \(\varvec{\Upsigma}_{ft}^{(c)}\) given by (8) can be computed. A noise covariance matrix \(\varvec{\Upsigma}_n\) is needed for the E-step updates. Ozerov and Févotte [23] tested several techniques for determining this matrix. The E-step in MRF-EM-NMF is identical to that in EM-NMF [23], and hence all of these techniques can be used in this experiment. The initial matrix \(\varvec{\Upsigma}_n\) was determined from the empirical variance of the observed power spectrograms.
The MRF-EM-NMF and EM-NMF algorithms were initialized using the same random values (for a given \(\bar{R}\)) and run for 1,500 iterations.
- Linear thresholding:
$$ \alpha(k)=\alpha \frac{k}{k_{\rm max}}, $$
- Nonlinear thresholding:
$$ \alpha(k)=\frac{\alpha}{2} \left (1+\tanh \left (\frac{k - \nu k_{\rm max}}{\tau k_{\rm max}} \right) \right), $$
- Fixed thresholding:
$$ \alpha(k)=\left \{\begin{array}{ll}\alpha & \hbox{if }\quad k > k_1, \\ 0 & \hbox{otherwise,} \end{array} \right. $$
where k is the current iteration, k_max is the maximum number of iterations, \(\tau \in (0,1)\) is the shape parameter, \(\nu \in (0,1)\) is the shift parameter, k_1 is the threshold, and α can be equal to α_W or α_H. All of the above thresholding strategies aim to relax smoothing during the early iterations, when the descent directions in the updates are sufficiently steep, and to emphasize smoothing when noisy perturbations become significantly detrimental to the overall smoothness. These strategies are motivated by standard regularization rules for ill-posed problems. We tested all of the thresholding strategies using instantaneous and convolutive mixtures, and we obtained the best performance with fixed thresholding using k_1 = k_max/2.
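The three schedules can be sketched directly from the formulas above; the parameter values chosen here (ν, τ, α, k_max) are illustrative, except k_1 = k_max/2, which the text reports as the best-performing setting.

```python
import numpy as np

# The three penalty-parameter schedules alpha(k) from the text.
def alpha_linear(k, k_max, alpha):
    return alpha * k / k_max

def alpha_nonlinear(k, k_max, alpha, nu=0.5, tau=0.1):
    # nu and tau are illustrative values for the shift and shape parameters.
    return 0.5 * alpha * (1 + np.tanh((k - nu * k_max) / (tau * k_max)))

def alpha_fixed(k, k_max, alpha, k1=None):
    k1 = k_max // 2 if k1 is None else k1  # best reported setting: k1 = k_max/2
    return alpha if k > k1 else 0.0

k_max, alpha = 1500, 0.01
print(alpha_fixed(700, k_max, alpha))  # 0.0 (smoothing off before threshold)
print(alpha_fixed(800, k_max, alpha))  # 0.01 (smoothing on after threshold)
```

All three schedules start near zero and approach α as k grows, matching the goal of relaxing smoothing early and emphasizing it late.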
The parameters δ_W and δ_H in the MRF models can be estimated using standard marginalization procedures or by maximizing the Type II ML estimate for (10). However, these techniques have a huge computational cost for the nonlinear potential functions in the MRF models. For practical reasons, they are not very useful for the GR or HL functions.
Table 3. Parameters of the MRF-EM-NMF algorithm for each test case shown in Fig. 1
Benchmark | Smoothing  | Instantaneous mixture                              | Convolutive mixture
          |            | \(\bar{R}\) | α_W   | α_H   | δ_W | δ_H            | \(\bar{R}\) | α_W   | α_H   | δ_W | δ_H
Male      | GR         | 12 | 0.01  | 0.01  | 1  | 1   | 4 | 0.01  | 0.01  | 0.1 | 10
Male      | HL         | 12 | 0.001 | 0.001 | 1  | 10  | 4 | 0.001 | 0.01  | 1   | 1
Male      | 1-Gaussian | 12 | 0.001 | 0.01  | –  | –   | 4 | 0.05  | 0.05  | –   | –
Male      | 2-Gaussian | 12 | 0.001 | 0.01  | –  | –   | 4 | 0.05  | 0.01  | –   | –
Female    | GR         | 12 | 0.01  | 0.01  | 10 | 10  | 4 | 0.01  | 0.01  | 1   | 1
Female    | HL         | 12 | 0.001 | 0.001 | 1  | 10  | 4 | 0.001 | 0.001 | 0.1 | 10
Female    | 1-Gaussian | 12 | 0.001 | 0.001 | –  | –   | 4 | 0.1   | 0.001 | –   | –
Female    | 2-Gaussian | 12 | 0.001 | 0.001 | –  | –   | 4 | 0.05  | 0.005 | –   | –
Nodrums   | GR         | 4  | 0.01  | 0.01  | 10 | 1   | 4 | 0.01  | 0.01  | 10  | 0.1
Nodrums   | HL         | 4  | 0.01  | 0.001 | 1  | 10  | 4 | 0.01  | 0.01  | 0.1 | 0.1
Nodrums   | 1-Gaussian | 4  | 0.001 | 0.01  | –  | –   | 4 | 0.001 | 0.05  | –   | –
Nodrums   | 2-Gaussian | 4  | 0.01  | 0.001 | –  | –   | 4 | 0.005 | 0.01  | –   | –
Wdrums    | GR         | 4  | 0.01  | 0.01  | 1  | 10  | 4 | 0.01  | 0.01  | 1   | 1
Wdrums    | HL         | 4  | 0.01  | 0.001 | 1  | 10  | 4 | 0.001 | 0.01  | 1   | 0.1
Wdrums    | 1-Gaussian | 4  | 0.001 | 0.001 | –  | –   | 4 | 0.001 | 0.1   | –   | –
Wdrums    | 2-Gaussian | 4  | 0.001 | 0.001 | –  | –   | 4 | 0.005 | 0.1   | –   | –
Table 4. Mean SDR (dB) and running time (s) for sources estimated from the mixtures shown in Table 2

Algorithm                | Mixture | Male  | Female | Nodrums | Wdrums | Time
MRF-EM-NMF (HL)          | inst    | 8.06  | 9.95   | 24.07   | 21.72  | 2487
MRF-EM-NMF (GR) [33]     | inst    | 7.69  | 8.86   | 26.65   | 21.28  | 2498
EM-NMF [23]              | inst    | 2.62  | 6.5    | 11.7    | 19.87  | 2456
GGP [51]                 | inst    | 8.4   | 8.57   | 13.9    | 10.3   | 5
SABM+SSDP [52]           | inst    | 4.25  | 3.82   | 5.83    | 9.43   | 2
MRF-EM-NMF (HL)          | conv    | 1.06  | 2.2    | 1.17    | 1.7    | 2760
MRF-EM-NMF (GR) [33]     | conv    | 1.4   | 2.1    | 1.2     | 1.56   | 2762
EM-NMF [23]              | conv    | 0.95  | 1.6    | 0.2     | 0.44   | 2720
IPD [21]                 | conv    | 1.53  | 1.43   | 2.2     | −2.7   | 1200
FBWCPA [17]              | conv    | −0.1  | 4.43   | 0.77    | −2.53  | 40
Generalized FBWCPA [18]  | conv    | 5.95  | 7.45   | 1.2     | −0.69  | 8
ConvNMF [22]             | conv    | −0.7  | −0.47  | 3.85    | 8.13   | 347
The average elapsed time, measured using Matlab 2008a for 1,500 iterations with \(\bar{R}=12\) on a 64-bit Intel Quad Core CPU (3 GHz, 8 GB RAM), was almost the same for the MRF-EM-NMF and EM-NMF algorithms (see Table 4).
The simulations demonstrate that MRF smoothing improved the source separation results in almost all test cases. The results confirm that instantaneous mixtures were considerably easier to separate than convolutive ones. The MRF-EM-NMF algorithm delivered the best mean SDR performance of all the algorithms tested on instantaneous mixtures. The highest SDR values were produced with instantaneously mixed non-percussive music sources, which is explained by the smooth frequency and temporal structures of non-percussive music spectrograms. If the source spectrograms were not very smooth (as with the percussive audio recordings), MRF smoothing gave only a slight improvement (see Figs. 1, 2) for first-order MRF interactions, and even a slight deterioration for higher-order MRF interactions. According to Fig. 1, the HL function delivered the most promising SDR results, which were stable over a wide range of parameters. In each case with the instantaneous mixtures, the best results were produced with the same hyperparameter values, δ_W = 1 and δ_H = 10, and almost the same penalty parameter values, α_W and α_H. The SAR model also improved the results compared with the standard EM-NMF algorithm. Moreover, the SAR model was tuned using only two penalty parameters, and the partition function of the associated Gibbs prior can be derived in closed form, which may be very useful for data-driven hyperparameter estimation.
The source separation results produced with the MRF-EM-NMF algorithm for convolutive and underdetermined mixtures were better than those obtained with the EM-NMF algorithm. Unfortunately, the SDR values show that these results are still far from perfect, even after 1,500 iterations, and thus further research is needed in this field. It is likely that some additional prior information could be imposed, especially on the mixing operator, which might increase the efficiency considerably.
It should be noted that the SDR performance with both mixtures could still be improved by refining the associated parameters, especially in the MRF models, and by using more efficient initializers.
Conclusions
This study demonstrated that imposing MRF smoothing on the power spectrograms of audio sources estimated in underdetermined unmixing problems may considerably improve the quality of the estimated audio signals. This is justified because any type of meaningful prior information improves performance, especially in underdetermined problems. This study addressed the application of MRF smoothing in the EM-NMF algorithm, but this type of smoothing could be applied to many other related BSS algorithms based on feature extraction from power spectrograms. Thus, the theoretical results presented in this paper may have broad practical applications. Clearly, further studies are needed to improve this technique for convolutive mixtures and to integrate regularization parameter estimation techniques into the main algorithm.
Acknowledgments
This work was supported by habilitation grant N N515 603139 (2010–2012) from the Ministry of Science and Higher Education, Poland. The author would like to thank the reviewers for their valuable comments.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
References
1. Cichocki A, Amari SI. Adaptive blind signal and image processing (new revised and improved edition). New York: Wiley; 2003.
2. Hyvärinen A, Karhunen J, Oja E. Independent component analysis. New York: Wiley; 2001.
3. Comon P, Jutten C. Handbook of blind source separation: independent component analysis and applications. 1st ed. Burlington, MA: Academic Press, Elsevier; 2010. ISBN: 9780123747266.
4. Naik GR, Kumar DK. Dimensional reduction using blind source separation for identifying sources. Int J Innov Comput Inf Control (IJICIC). 2011;7(2):989–1000.
5. Popescu TD. A new approach for dam monitoring and surveillance using blind source separation. Int J Innov Comput Inf Control (IJICIC). 2011;7(6):3811–3824.
6. Zhang Z, Miyake T, Imamura T, Enomoto T, Toda H. Blind source separation by combining independent component analysis with the complex discrete wavelet transform. Int J Innov Comput Inf Control (IJICIC). 2010;6(9):4157–4172.
7. Khosravy M, Asharif MR, Yamashita K. A PDF-matched short-term linear predictability approach to blind source separation. Int J Innov Comput Inf Control (IJICIC). 2009;5(11(A)):3677–3690.
8. Yang Z, Zhou G, Ding S, Xie S. Nonnegative blind source separation by iterative volume maximization with fully nonnegativity constraints. ICIC Express Lett. 2010;4(6(B)):2329–2334.
9. Pao TL, Liao WY, Chen YT, Wu TN. Mandarin audio-visual speech recognition with effects to the noise and emotion. Int J Innov Comput Inf Control (IJICIC). 2010;6(2):711–724.
10. Lin SD, Huang CC, Lin JH. A hybrid audio watermarking technique in cepstrum domain. ICIC Express Lett. 2010;4(5(A)):1597–1602.
11. Zin TT, Hama H, Tin P, Toriu T. HOG embedded Markov chain model for pedestrian detection. ICIC Express Lett. 2010;4(6(B)):2463–2468.
12. Virtanen T. Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Trans Audio Speech Lang Process. 2007;15(3):1066–1074.
13. Pedersen MS, Larsen J, Kjems U, Parra LC. Convolutive blind source separation methods. In: Benesty J, Huang Y, Sondhi M, editors. Springer handbook of speech processing. Berlin: Springer; 2008. p. 1065–94. ISBN: 9783540491255.
14. Parra L, Spence C. Convolutive blind separation of non-stationary sources. IEEE Trans Speech Audio Process. 2000;8(3):320–327.
15. Yilmaz O, Rickard S. Blind separation of speech mixtures via time-frequency masking. IEEE Trans Signal Process. 2004;52(7):1830–1847.
16. Reju VG, Koh SN, Soon IY. Underdetermined convolutive blind source separation via time-frequency masking. IEEE Trans Audio Speech Lang Process. 2010;18(1):101–116.
17. Sawada H, Araki S, Makino S. Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS. In: ISCAS; 2007. p. 3247–3250.
18. Sawada H, Araki S, Makino S. Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment. IEEE Trans Audio Speech Lang Process. 2011;19(3):516–527.
19. Aïssa-El-Bey A, Abed-Meraim K, Grenier Y. Blind separation of underdetermined convolutive mixtures using their time-frequency representation. IEEE Trans Audio Speech Lang Process. 2007;15(5):1540–1550.
20. Weiss RJ, Ellis DPW. Speech separation using speaker-adapted eigenvoice speech models. Comput Speech Lang. 2010;24(1):16–29.
21. Mandel MI, Ellis DPW, Jebara T. An EM algorithm for localizing multiple sound sources in reverberant environments. In: Schölkopf B, Platt J, Hoffman T, editors. Advances in neural information processing systems 19. Cambridge: MIT Press; p. 953–960.
22. Smaragdis P. Convolutive speech bases and their application to supervised speech separation. IEEE Trans Audio Speech Lang Process. 2007;15(1):1–12.
23. Ozerov A, Févotte C. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans Audio Speech Lang Process. 2010;18(3):550–563.
24. Ozerov A, Févotte C, Blouet R, Durrieu JL. Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation. In: ICASSP; 2011. p. 257–260.
25. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401:788–791.
26. Cichocki A, Zdunek R, Phan AH, Amari SI. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Chichester, UK: John Wiley and Sons; 2009.
27. Benaroya L, Gribonval R, Bimbot F. Non-negative sparse representation for Wiener based source separation with a single sensor. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP’03), Hong Kong; 2003. p. 613–616.
28. Févotte C, Bertin N, Durrieu JL. Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput. 2009;21(3):793–830.
29. Duong NQK, Vincent E, Gribonval R. Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans Audio Speech Lang Process. 2010;18(7):1830–1840.
30. Zdunek R, Cichocki A. Blind image separation using nonnegative matrix factorization with Gibbs smoothing. In: Ishikawa M, Doya K, Miyamoto H, Yamakawa T, editors. Neural information processing (ICONIP 2007), vol 4985 of Lecture notes in computer science. Berlin: Springer; 2008. p. 519–528.
31. Zdunek R, Cichocki A. Improved M-FOCUSS algorithm with overlapping blocks for locally smooth sparse signals. IEEE Trans Signal Process. 2008;56(10):4752–4761.
32. Ozerov A, Vincent E, Bimbot F. A general flexible framework for the handling of prior information in audio source separation. IEEE Trans Audio Speech Lang Process. 2012;20(4):1118–1133.
33. Zdunek R. Convolutive nonnegative matrix factorization with Markov random field smoothing for blind unmixing of multichannel speech recordings. In: Travieso-Gonzalez CM, Alonso-Hernandez JB, editors. Advances in nonlinear speech processing (NOLISP 2011), vol 7015 of Lecture notes in artificial intelligence (LNAI). Berlin/Heidelberg: Springer; 2011. p. 25–32.
34. Green PJ. Bayesian reconstruction from emission tomography data using a modified EM algorithm. IEEE Trans Med Imaging. 1990;9:84–93.
35. Itakura F, Saito S. An analysis-synthesis telephony based on the maximum likelihood method, vol c55. In: Proceedings of the 6th International Congress on Acoustics, Tokyo, Japan. New York: Elsevier; 1968. p. 17–20.
36. Besag J. Toward Bayesian image analysis. J Appl Stat. 1989;16:395–407.
37. Bouman CA, Sauer K. A generalized Gaussian image model for edge-preserving MAP estimation. IEEE Trans Image Process. 1993;2:296–310.
38. Geman S, McClure D. Statistical methods for tomographic image reconstruction. Bull Int Stat Inst. 1987;LII-4:5–21.
39. Geman S, Reynolds G. Constrained parameters and the recovery of discontinuities. IEEE Trans Pattern Anal Mach Intell. 1992;14:367–383.
40. Hebert T, Leahy R. A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors. IEEE Trans Med Imaging. 1989;8:194–202.
41. Lange K. Convergence of EM image reconstruction algorithms with Gibbs smoothing. IEEE Trans Med Imaging. 1990;9(4):439–446.
42. Whittle P. On stationary processes in the plane. Biometrika. 1954;41(3):434–449.
43. Besag J. Spatial interactions and the statistical analysis of lattice systems. J R Stat Soc Ser B. 1974;36:192–236.
44. Ripley BD. Spatial statistics. New York: Wiley; 1981.
45. Molina R, Katsaggelos A, Mateos J. Bayesian and regularization methods for hyperparameter estimation in image restoration. IEEE Trans Image Process. 1999;8(2):231–246.
46. Galatsanos N, Mesarovic V, Molina R, Katsaggelos A. Hierarchical Bayesian image restoration for partially known blurs. IEEE Trans Image Process. 2000;9(10):1784–1797.
47. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc. 1977;39(1):1–38.
48. Vincent E, Araki S, Theis FJ, Nolte G, Bofill P, Sawada H, Ozerov A, Gowreesunker BV, Lutter D, Duong QKN. The signal separation evaluation campaign (2007–2010): achievements and remaining challenges. Signal Process. 2012;92:1928–1936.
49. Ochal P. Application of convolutive nonnegative matrix factorization for separation of musical instrument sounds from multichannel polyphonic recordings. M.Sc. thesis (supervised by Dr. R. Zdunek), Wroclaw University of Technology, Poland; 2010 (in Polish).
50. Vincent E, Gribonval R, Févotte C. Performance measurement in blind audio source separation. IEEE Trans Audio Speech Lang Process. 2006;14(4):1462–1469.
51. Vincent E. Complex nonconvex lp norm minimization for underdetermined source separation. In: Proceedings of the 7th international conference on independent component analysis and signal separation (ICA’07). Berlin: Springer; 2007. p. 430–437.
52. Xiao M, Xie S, Fu Y. A statistically sparse decomposition principle for underdetermined blind source separation. In: Proceedings of the 2005 international symposium on intelligent signal processing and communication systems (ISPACS 2005); 2005. p. 165–168.