Underdetermined Audio Source Separation Using Laplacian Mixture Modelling

Mitianoudis, Nikolaos

doi:10.1007/978-3-642-55016-4_7

Part of the book series: Signals and Communication Technology (SCT)

Abstract

The problem of underdetermined audio source separation has been explored in the literature for many years. The instantaneous $K$-sensors, $L$-sources mixing scenario (where $K<L$) has been tackled by many different approaches, provided the sources remain quite distinct in the virtual positioning space spanned by the sensors. In this case, the source separation problem can be solved as a directional clustering problem along the source position angles in the mixture. The use of Laplacian Mixture Models in order to cluster and thus separate sparse sources in underdetermined mixtures will be explained in detail in this chapter. The novel Generalised Directional Laplacian Density will be derived in order to address the problem of modelling multidimensional angular data. The developed scheme demonstrates robust separation performance along with low processing time.

Notes

1. A $\pi$-periodicity is valid for the observed phenomenon, since data in $(\pi/2,3\pi/2)$ are symmetrical to those in $(-\pi/2,\pi/2)$ (see Fig. 7.1b). Hence, the use of the atan function instead of the extended atan2 function is justified. For the rest of the analysis, we will assume that $\theta_n$ takes values in $(0,\pi)$ rather than $(-\pi/2,\pi/2)$. This implies that data in the 4th quadrant $(-\pi/2,0)$ are mapped with odd symmetry to the 2nd quadrant $(\pi/2,\pi)$. This is performed in order to facilitate the derivation of the Generalised Directional Laplacian Distribution and does not alter the actual data.
2. Note that for a positive integer $n$, we have $\varGamma(n)=(n-1)!$.
3. MATLAB code for the “GaussSep” algorithm is available from http://www.irisa.fr/metiss/members/evincent/software.
4. MATLAB code for the “DEMIX” algorithm is available from http://infoscience.epfl.ch/record/165878/files/.
5. http://utopia.duth.gr/~nmitiano/mdld.htm
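
The angle mapping described in note 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `direction_angle`, `x1` and `x2` (the two sensor signals) are hypothetical, and `arctan2` followed by a modulo-$\pi$ wrap is used here as an equivalent of atan plus the quadrant remapping, mainly because it avoids a division by zero when `x1` is zero.

```python
import numpy as np

def direction_angle(x1, x2):
    """Map each 2-D mixture point (x1, x2) to an angle in [0, pi).

    Points that differ only by sign lie on the same line through the
    origin, so they receive the same angle (pi-periodicity).
    """
    theta = np.arctan2(x2, x1)      # angle in (-pi, pi]
    return np.mod(theta, np.pi)     # fold onto [0, pi)

# One point per quadrant: opposite quadrants share an angle.
x1 = np.array([1.0, -1.0, -1.0, 1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
print(direction_angle(x1, x2))
```

A point in the 4th quadrant, such as $(1,-1)$ at $-\pi/4$, is assigned $3\pi/4$, which matches the remapping to the 2nd quadrant described in the note.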

References

1. Araki, S., Sawada, H., Mukai, R., Makino, S.: Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors. Signal Process. 87, 1833–1847 (2007)
2. Arberet, S., Gribonval, R., Bimbot, F.: A robust method to count and locate audio sources in a multichannel underdetermined mixture. IEEE Trans. Signal Process. 58(1), 121–133 (2010)
3. Attias, H.: Independent factor analysis. Neural Comput. 11(4), 803–851 (1999)
4. Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)
5. Bentley, J.: Modelling circular data using a mixture of von Mises and uniform distributions. Simon Fraser University, MSc thesis (2006)
6. Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden mixture models. Technical Report, Department of Electrical Engineering and Computer Science, U.C. Berkeley, California (1998)
7. Blumensath, T., Davies, M.: Sparse and shift-invariant representations of music. IEEE Trans. Audio Speech Lang. Process. 14(1), 50–57 (2006)
8. Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal Process. 81(11), 2353–2362 (2001)
9. Cardoso, J.F.: Blind signal separation: statistical principles. Proc. IEEE 86(10), 2009–2025 (1998)
10. Cemgil, A., Févotte, C., Godsill, S.: Variational and stochastic inference for Bayesian source separation. Digit. Signal Process. 17, 891–913 (2007)
11. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing. Wiley, New York (2002)
12. Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications, 856 p. Academic Press, Waltham (2010)
13. Daudet, L., Sandler, M.: MDCT analysis of sinusoids: explicit results and applications to coding artifacts reduction. IEEE Trans. Speech Audio Process. 12(3), 302–312 (2004)
14. Davies, M., Mitianoudis, N.: A simple mixture model for sparse overcomplete ICA. IEE Proc. Vis. Image Signal Process. 151(1), 35–43 (2004)
15. Davies, M., Daudet, L.: Sparse audio representations using the MCLT. Signal Process. 86(3), 358–368 (2006)
16. Dempster, A.P., Laird, N., Rubin, D.: Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
17. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
18. Dhillon, I., Sra, S.: Modeling data using directional distributions. Technical Report TR-03-06, University of Texas at Austin, Austin, TX (2003)
19. Duarte, L., Suyama, R., Rivet, B., Attux, R., Romano, J., Jutten, C.: Blind compensation of nonlinear distortions: application to source separation of post-nonlinear mixtures. IEEE Trans. Signal Process. 60(11), 5832–5844 (2012)
20. Duong, N., Vincent, E., Gribonval, R.: Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans. Audio Speech Lang. Process. 18(7), 1830–1840 (2010)
21. Eriksson, J., Koivunen, V.: Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Process. Lett. 11(7), 601–604 (2004)
22. Févotte, C., Godsill, S.: A Bayesian approach to blind separation of sparse sources. IEEE Trans. Audio Speech Lang. Process. 14(6), 2174–2188 (2006)
23. Févotte, C., Gribonval, R., Vincent, E.: BSS EVAL toolbox user guide. Technical Report, IRISA Technical Report 1706, Rennes, France, April 2005, http://www.irisa.fr/metiss/bsseval/ (2005)
24. Fisher, N.: Statistical Analysis of Circular Data. Cambridge University Press, Cambridge (1993)
25. Girolami, M.: A variational method for learning sparse and overcomplete representations. Neural Comput. 13(11), 2517–2532 (2001)
26. Gribonval, R., Nielsen, M.: Sparse decomposition in unions of bases. IEEE Trans. Inf. Theory 49(12), 3320–3325 (2003)
27. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York, 481+xxii p. http://www.cis.hut.fi/projects/ica/book/ (2001)
28. Hyvärinen, A.: Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing 22, 49–67 (1998)
29. Jammalamadaka, S., Sengupta, A.: Topics in Circular Statistics. World Scientific, Singapore (2001)
30. Jutten, C., Karhunen, J.: Advances in nonlinear blind source separation. In: Proceedings of 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), pp. 245–256. Nara, Japan (2003)
31. Kreyszig, E.: Advanced Engineering Mathematics, 1264 p. Wiley (2010)
32. Lee, T.W., Bell, A.J., Lambert, R.: Blind separation of delayed and convolved sources. In: Advances in Neural Information Processing Systems (NIPS), vol. 9, pp. 758–764 (1997)
33. Lee, T.W., Lewicki, M., Girolami, M., Sejnowski, T.: Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Process. Lett. 4(5), 87–90 (1999)
34. Lewicki, M., Sejnowski, T.: Learning overcomplete representations. Neural Comput. 12, 337–365 (2000)
35. Lewicki, M.: Efficient coding of natural sounds. Nat. Neurosci. 5(4), 356–363 (2002)
36. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. Berkeley, California (1967)
37. Mardia, K.V., Jupp, P.: Directional Statistics. Wiley, Chichester (1999)
38. Mitianoudis, N., Davies, M.: Permutation alignment for frequency domain ICA using subspace beamforming methods. In: Proceedings of the International Workshop on Independent Component Analysis and Source Separation (ICA2004), pp. 127–132. Granada, Spain (2004)
39. Mitianoudis, N., Stathaki, T.: Underdetermined source separation using mixtures of warped Laplacians. In: International Conference on Independent Component Analysis and Source Separation (ICA). London, UK (2007)
40. Mitianoudis, N.: A directional Laplacian density for underdetermined audio source separation. In: 20th International Conference on Artificial Neural Networks (ICANN). Thessaloniki, Greece (2010)
41. Mitianoudis, N., Davies, M.: Audio source separation of convolutive mixtures. IEEE Trans. Speech Audio Process. 11(5), 489–497 (2003)
42. Mitianoudis, N., Stathaki, T.: Batch and online underdetermined source separation using Laplacian mixture models. IEEE Trans. Audio Speech Lang. Process. 15(6), 1818–1832 (2007)
43. Moulines, E., Cardoso, J.F., Gassiat, E.: Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97), pp. 3617–3620. Munich, Germany (1997)
44. O’Grady, P., Pearlmutter, B.: Hard-LOST: modified K-means for oriented lines. In: Proceedings of the Irish Signals and Systems Conference, pp. 247–252. Ireland (2004)
45. O’Grady, P., Pearlmutter, B.: Soft-LOST: EM on a mixture of oriented lines. In: Proceedings of the International Conference on Independent Component Analysis 2004, pp. 428–435. Granada, Spain (2004)
46. Pajunen, P., Hyvärinen, A., Karhunen, J.: Nonlinear blind source separation by self-organizing maps. In: Proceedings of the International Conference on Neural Information Processing, pp. 1207–1210. Hong Kong (1996)
47. Plumbley, M., Abdallah, S., Blumensath, T., Davies, M.: Sparse representations of polyphonic music. Signal Process. 86(3), 417–431 (2006)
48. Rickard, S., Balan, R., Rosca, J.: Real-time time-frequency based blind source separation. In: Proceedings of the ICA2001, pp. 651–656. San Diego, CA (2001)
49. Sawada, H., Araki, S., Makino, S.: A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007), pp. 139–142 (2007)
50. Sawada, H., Araki, S., Makino, S.: Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment. IEEE Trans. Audio Speech Lang. Process. 19(3), 516–527 (2011)
51. Sawada, H., Mukai, R., Araki, S., Makino, S.: A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 12(5), 75–87 (2004)
52. SiSEC 2008: Signal separation evaluation campaign. http://sisec2008.wiki.irisa.fr/tiki-index.php
53. SiSEC 2010: Signal separation evaluation campaign. http://sisec2010.wiki.irisa.fr/tiki-index.php
54. SiSEC 2011: Signal separation evaluation campaign. http://sisec.wiki.irisa.fr/tiki-index.php
55. Smaragdis, P., Boufounos, P.: Position and trajectory learning for microphone arrays. IEEE Trans. Audio Speech Lang. Process. 15(1), 358–368 (2007)
56. Smaragdis, P.: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 22, 21–34 (1998)
57. Torkkola, K.: Blind separation of delayed and convolved sources. In: S. Haykin (ed.) Unsupervised Adaptive Filtering, vol. I, pp. 321–375. Wiley (2000)
58. Vincent, E., Arberet, S., Gribonval, R.: Underdetermined instantaneous audio source separation via local Gaussian modeling. In: 8th International Conference on Independent Component Analysis and Signal Separation (ICA), pp. 775–782. Paraty, Brazil (2009)
59. Vincent, E., Gribonval, R., Févotte, C., Nesbit, A., Plumbley, M., Davies, M., Daudet, L.: BASS-dB: the blind audio source separation evaluation database. http://bass-db.gforge.inria.fr/BASS-dB/
60. Winter, S., Kellermann, W., Sawada, H., Makino, S.: MAP based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization. EURASIP J. Adv. Signal Process. 1, 12 p (2007)
61. Yilmaz, O., Rickard, S.: Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Process. 52(7), 1830–1847 (2004)
62. Zibulevsky, M., Kisilev, P., Zeevi, Y., Pearlmutter, B.: Blind source separation via multinode sparse representation. Adv. Neural Inf. Process. Syst. 14, 1049–1056 (2002)

Author information

Authors and Affiliations

Image Processing and Multimedia Lab, Electrical and Computer Engineering Department, Democritus University of Thrace, 67100 Xanthi, Greece
Nikolaos Mitianoudis

Corresponding author

Correspondence to Nikolaos Mitianoudis.

Editor information

Editors and Affiliations

University of Technology, Sydney, Sydney, Australia
Ganesh R. Naik
University of Surrey, Guildford, United Kingdom
Wenwu Wang

Appendices

Appendix 1

Calculation of the Normalisation Parameter for the Generalised DLD

To estimate the normalisation coefficient $c_p(k)$ of (7.27), we need to solve the following equation:

$$\begin{aligned} \int \limits _{\mathbf {x}\in \mathcal {S}^{p-1}} c_p(k)e^{-k\sqrt{1-(\mathbf {m}^T\mathbf {x})^2}} \mathrm{d}\mathbf {x}=1. \end{aligned}$$

(7.43)

Following Eq. (B.8) and in a similar manner to the analysis in Appendix B.2 in [18], we can rewrite the above equation as follows:

$$\begin{aligned} c_p(k)\int \limits _{0}^{\pi }\mathrm{d}\theta _{p-1}\int \limits _{0}^{\pi } e^{-k\sqrt{1-\cos ^2\theta _1}}\sin ^{p-2}\theta _1 \mathrm{d}\theta _1 \prod _{j=3}^{p-1}\int \limits _{0}^{\pi }\sin ^{p-j}\theta _{j-1} \mathrm{d}\theta _{j-1}=1. \end{aligned}$$

(7.44)

Following a similar methodology to Appendix B.2 in [18], the above yields:

$$\begin{aligned} c_p(k)\pi \int \limits _{0}^{\pi } e^{-k\sin \theta _1}\sin ^{p-2}\theta _1\mathrm{d}\theta _1 \frac{\pi ^{\frac{p-3}{2}}}{\varGamma \left( \frac{p-1}{2}\right) }=1. \end{aligned}$$

(7.45)

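Using the definition $I_p(k)=\frac{1}{\pi }\int _{0}^{\pi }\sin ^p\theta \, e^{-k\sin \theta }\mathrm{d}\theta $, so that the remaining integral equals $\pi I_{p-2}(k)$, we can write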

$$\begin{aligned} c_p(k)I_{p-2}(k)\frac{\pi ^{\frac{p+1}{2}}}{\varGamma \left( \frac{p-1}{2}\right) }=1\Rightarrow c_p(k)=\frac{\varGamma \left( \frac{p-1}{2}\right) }{\pi ^{\frac{p+1}{2}}I_{p-2}(k)}. \end{aligned}$$

(7.46)
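
As a rough numerical sanity check of (7.46), the sketch below evaluates $I_p(k)$ with a simple midpoint rule and builds $c_p(k)$ from it. The function names `I_p` and `c_p`, the grid size `n` and the use of the midpoint rule are illustrative assumptions, not part of the chapter; the definition of $I_p(k)$ is the one implied by (7.45) and (7.46).

```python
import math
import numpy as np

def I_p(p, k, n=20000):
    """I_p(k) = (1/pi) * integral_0^pi sin^p(t) * exp(-k sin t) dt.

    Approximated by the midpoint rule on n equal sub-intervals
    (an assumed numerical scheme, chosen for simplicity).
    """
    theta = (np.arange(n) + 0.5) * np.pi / n
    s = np.sin(theta)
    return float(np.mean(s**p * np.exp(-k * s)))

def c_p(p, k):
    """Normalisation coefficient of (7.46) for dimension p >= 2."""
    return math.gamma((p - 1) / 2) / (math.pi ** ((p + 1) / 2) * I_p(p - 2, k))

# For k = 0 the density is uniform, so c_p(0) should equal one over the
# measure of the support: pi for p = 2 and 2*pi for p = 3.
print(c_p(2, 0.0), 1 / math.pi)
print(c_p(3, 0.0), 1 / (2 * math.pi))
```

For $p=2$ the density reduces to $c_2(k)e^{-k|\sin (\theta -m)|}$ on $(0,\pi )$, which gives an independent way to confirm that the coefficient normalises the density for $k>0$.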

Appendix 2

Gradient Updates for $\mathbf {m}$ and $k$ for the MDDLD

The first-order derivative of the log-likelihood in (7.28) with respect to $\mathbf {m}$ is calculated below:

$$\begin{aligned} \frac{\partial J(\mathbf {X},\mathbf {m},k)}{\partial \mathbf {m}}&= -k\sum _{n=1}^N\frac{-2\mathbf {m}^T\mathbf {x}_n}{2\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}}\mathbf {x}_n\nonumber \\&= k\sum _{n=1}^N\frac{\mathbf {m}^T\mathbf {x}_n}{\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}}\mathbf {x}_n. \end{aligned}$$

(7.47)

Before we estimate $k$ from the log-likelihood (7.28), we derive the following property:

$$\begin{aligned} \frac{\partial }{\partial k}I_0(k)= -\frac{1}{\pi }\int \limits _{0}^{\pi } e^{-k\sin \theta }\sin \theta \mathrm{d}\theta =-I_1(k). \end{aligned}$$

(7.48)

The above property can be generalised as follows:

$$\begin{aligned} \frac{\partial ^p}{\partial k^p}I_0(k)=(-1)^{p}\frac{1}{\pi }\int \limits _{0}^{\pi }\sin ^p\theta e^{-k\sin \theta }\mathrm{d}\theta =(-1)^{p}I_p(k). \end{aligned}$$

(7.49)

The first-order derivative of the log-likelihood in (7.28) for the estimation of $k$ is then calculated below:

$$\begin{aligned} \frac{\partial J(\mathbf {X},\mathbf {m},k)}{\partial k}=N\frac{I_{p-1}(k)}{I_{p-2}(k)}-\sum _{n=1}^{N}\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}. \end{aligned}$$

(7.50)
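
The two gradients (7.47) and (7.50) can be checked against finite differences with a short sketch. Since (7.28) is not reproduced here, the log-likelihood is assumed to be $J=N\log c_p(k)-k\sum _n\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}$, which is what the density in (7.43) implies for $N$ independent samples. All function names, the midpoint-rule quadrature and the small floor `1e-12` under the square root are illustrative assumptions.

```python
import math
import numpy as np

def I_p(p, k, n=20000):
    """I_p(k) = (1/pi) * integral_0^pi sin^p(t) exp(-k sin t) dt (midpoint rule)."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    s = np.sin(theta)
    return float(np.mean(s**p * np.exp(-k * s)))

def residuals(X, m):
    """Return m^T x_n and sqrt(1 - (m^T x_n)^2) for every row x_n of X."""
    proj = X @ m
    # Floor the radicand so points equal to +/- m do not divide by zero.
    return proj, np.sqrt(np.maximum(1.0 - proj**2, 1e-12))

def log_likelihood(X, m, k):
    """Assumed form of (7.28): N log c_p(k) - k * sum_n sqrt(1 - (m^T x_n)^2)."""
    N, p = X.shape
    log_c = (math.lgamma((p - 1) / 2) - 0.5 * (p + 1) * math.log(math.pi)
             - math.log(I_p(p - 2, k)))
    _, r = residuals(X, m)
    return N * log_c - k * r.sum()

def grad_m(X, m, k):
    """Gradient with respect to m, following (7.47)."""
    proj, r = residuals(X, m)
    return k * (X.T @ (proj / r))

def grad_k(X, m, k):
    """Gradient with respect to k, following (7.50)."""
    N, p = X.shape
    _, r = residuals(X, m)
    return N * I_p(p - 1, k) / I_p(p - 2, k) - r.sum()
```

Note that `grad_m` is the unconstrained gradient; an update based on it would still need $\mathbf {m}$ to be renormalised to unit length afterwards, which this sketch leaves out.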

Appendix 3

A Directional K-Means Algorithm

Assume that $K$ is the number of clusters, $\mathcal {C}_i$, $i=1,\dots ,K$, are the clusters, $\mathbf {m}_i$ are the cluster centres and $\mathbf {X}=\{\mathbf {x}_1,\dots ,\mathbf {x}_n,\dots ,\mathbf {x}_N\}$ is a $p$-D angular dataset lying on the half-unit $p$-D sphere. The original $K$-means [36] minimises the following non-directional error function:

$$\begin{aligned} Q=\sum _{i=1}^K\sum _{\mathbf {x}_n\in \mathcal {C}_i}||\mathbf {x}_n-\mathbf {m}_i||^2 \end{aligned}$$

(7.51)

where $||\cdot ||$ represents the Euclidean norm. Instead of using the squared Euclidean distance for the $p$-D Directional $K$-Means, we introduce the following distance function:

$$\begin{aligned} D_l(\mathbf {x}_n,\mathbf {m}_i)=\sqrt{1-(\mathbf {m}_i^T\mathbf {x}_n)^2}. \end{aligned}$$

(7.52)

Like the original distance, the novel function $D_l$ is monotonic, but it places more emphasis on the contribution of points closer to the cluster centre. In addition, $D_l$ is periodic with period $\pi $. The $p$-D Directional $K$-Means can thus be described as follows:

1. Randomly initialise $K$ cluster centres $\mathbf {m}_i$, where $||\mathbf {m}_i||=1$.
2. Calculate the distance of all points $\mathbf {x}_n$ to the cluster centres $\mathbf {m}_i$, using $D_l$.
3. Assign each point to the centre $\mathbf {m}_i$ at minimum distance; the assigned points form the new clusters $\mathcal {C}_i$.
4. The clusters $\mathcal {C}_i$ vote for their new centres $\mathbf {m}_i^+$. To avoid averaging mistakes with directional data, vector averaging is employed to ensure the validity of the addition. The resulting average is normalised to the half-unit $p$-D sphere:
$$\begin{aligned} \mathbf {m}_i^+=\frac{1}{k_i}\sum _{\mathbf {x}_n \in \mathcal {C}_i }\mathbf {x}_n \end{aligned}$$
(7.53)

$$\begin{aligned} \mathbf {m}_i^+\leftarrow \mathbf {m}_i^+/||\mathbf {m}_i^+|| \end{aligned}$$
(7.54)
where $k_i$ is the number of points in cluster $\mathcal {C}_i$.
5. Repeat steps (2)–(4) until the means $\mathbf {m}_i$ have converged.
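
The five steps above can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function names, the optional `init` argument, the `max_iter`, `tol` and `seed` parameters, the choice of drawing initial centres from the data points, and the rule of keeping the old centre when a cluster is empty are all assumptions made for the example.

```python
import numpy as np

def directional_distance(X, M):
    """D_l of (7.52) between every row of X (N x p) and every row of M (K x p)."""
    cos = np.clip(X @ M.T, -1.0, 1.0)
    return np.sqrt(1.0 - cos**2)

def directional_kmeans(X, K, init=None, max_iter=100, tol=1e-10, seed=0):
    """Directional K-means on unit-norm rows of X; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    if init is None:
        # Step 1 (assumed variant): pick K distinct data points as centres.
        M = X[rng.choice(len(X), size=K, replace=False)].copy()
    else:
        M = np.asarray(init, dtype=float)
        M = M / np.linalg.norm(M, axis=1, keepdims=True)
    for _ in range(max_iter):
        # Steps 2-3: assign each point to the centre with the smallest D_l.
        labels = np.argmin(directional_distance(X, M), axis=1)
        M_new = M.copy()
        for i in range(K):
            members = X[labels == i]
            if len(members) == 0:
                continue  # assumed rule: an empty cluster keeps its old centre
            mean = members.mean(axis=0)       # (7.53) vector average
            norm = np.linalg.norm(mean)
            if norm > 0:
                M_new[i] = mean / norm        # (7.54) back to the unit sphere
        # Step 5: stop once every centre direction has stopped moving.
        converged = np.all(1.0 - np.abs(np.sum(M_new * M, axis=1)) < tol)
        M = M_new
        if converged:
            break
    labels = np.argmin(directional_distance(X, M), axis=1)
    return M, labels

# Two tight groups of directions on the upper half circle (p = 2).
angles = np.array([0.1, 0.2, 0.3, 1.5, 1.6, 1.7])
X = np.column_stack([np.cos(angles), np.sin(angles)])
init = np.array([[np.cos(0.5), np.sin(0.5)], [np.cos(1.2), np.sin(1.2)]])
centres, labels = directional_kmeans(X, 2, init=init)
print(np.arctan2(centres[:, 1], centres[:, 0]), labels)
```

The convergence test compares $|\mathbf {m}_i^T\mathbf {m}_i^+|$ with 1 instead of the Euclidean difference, because $D_l$ treats $\mathbf {m}$ and $-\mathbf {m}$ as the same direction.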

About this chapter

Cite this chapter

Mitianoudis, N. (2014). Underdetermined Audio Source Separation Using Laplacian Mixture Modelling. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_7

DOI: https://doi.org/10.1007/978-3-642-55016-4_7
Published: 22 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55015-7
Online ISBN: 978-3-642-55016-4
eBook Packages: Engineering, Engineering (R0)
