Underdetermined Audio Source Separation Using Laplacian Mixture Modelling

Blind Source Separation

Part of the book series: Signals and Communication Technology (SCT)

Abstract

The problem of underdetermined audio source separation has been explored in the literature for many years. The instantaneous \(K\)-sensors, \(L\)-sources mixing scenario (where \(K<L\)) has been tackled by many different approaches, provided the sources remain quite distinct in the virtual positioning space spanned by the sensors. In this case, the source separation problem can be solved as a directional clustering problem along the source position angles in the mixture. The use of Laplacian Mixture Models in order to cluster and thus separate sparse sources in underdetermined mixtures will be explained in detail in this chapter. The novel Generalised Directional Laplacian Density will be derived in order to address the problem of modelling multidimensional angular data. The developed scheme demonstrates robust separation performance along with low processing time.

Notes

  1. A \(\pi \)-periodicity is valid for the observed phenomenon, since data in \((\pi /2,3\pi /2)\) are symmetrical to those in \((-\pi /2,\pi /2)\) (see Fig. 7.1b). Hence, the use of the atan function instead of the extended atan2 function is justified. For the rest of the analysis, we will assume that \(\theta _n\) takes values in \((0,\pi )\) rather than \((-\pi /2,\pi /2)\). This implies that data in the 4th quadrant \((-\pi /2,0)\) are mapped with odd symmetry to the 2nd quadrant \((\pi /2,\pi )\). This is performed in order to facilitate the derivation of the Generalised Directional Laplacian Distribution and does not alter the actual data.

  2. Note that for a positive integer \(n\), we have \(\varGamma (n)=(n-1)!\).

  3. MATLAB code for the “GaussSep” algorithm is available from http://www.irisa.fr/metiss/members/evincent/software.

  4. MATLAB code for the “DEMIX” algorithm is available from http://infoscience.epfl.ch/record/165878/files/.

  5. http://utopia.duth.gr/~nmitiano/mdld.htm
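The angle convention of Note 1 can be illustrated with a short Python sketch. It is not the chapter's code: the helper name `direction_angles` and the use of NumPy are assumptions made here. The sketch evaluates `arctan2` and reduces the result modulo \(\pi \), which yields the same angle as atan followed by a shift of the negative values by \(\pi \), while also tolerating a zero first coordinate.

```python
import numpy as np

def direction_angles(x1, x2):
    """Map two-sensor samples (x1, x2) to direction angles in [0, pi).

    Hypothetical helper: reducing arctan2 modulo pi folds opposite
    directions onto the same angle, so the 4th quadrant lands in the
    2nd quadrant, as described in Note 1.
    """
    theta = np.arctan2(np.asarray(x2, dtype=float), np.asarray(x1, dtype=float))
    return np.mod(theta, np.pi)

# One sample per quadrant, in the order 1st, 4th, 3rd, 2nd.
x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, -1.0, 1.0])
print(direction_angles(x1, x2) / np.pi)  # angles as multiples of pi
```

With these four samples, the 1st and 3rd quadrant points share the angle \(\pi /4\), and the 4th and 2nd quadrant points share \(3\pi /4\), up to rounding.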

References

  1. Araki, S., Sawada, H., Mukai, R., Makino, S.: Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors. Signal Process. 87, 1833–1847 (2007)

  2. Arberet, S., Gribonval, R., Bimbot, F.: A robust method to count and locate audio sources in a multichannel underdetermined mixture. IEEE Trans. Signal Process. 58(1), 121–133 (2010)

  3. Attias, H.: Independent factor analysis. Neural Comput. 11(4), 803–851 (1999)

  4. Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)

  5. Bentley, J.: Modelling circular data using a mixture of von Mises and uniform distributions. Simon Fraser University, MSc thesis (2006)

  6. Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden mixture models. Technical Report, Department of Electrical Engineering and Computer Science, U.C. Berkeley, California (1998)

  7. Blumensath, T., Davies, M.: Sparse and shift-invariant representations of music. IEEE Trans. Audio Speech Lang. Process. 14(1), 50–57 (2006)

  8. Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal Process. 81(11), 2353–2362 (2001)

  9. Cardoso, J.F.: Blind signal separation: statistical principles. Proc. IEEE 86(10), 2009–2025 (1998)

  10. Cemgil, A., Févotte, C., Godsill, S.: Variational and stochastic inference for Bayesian source separation. Digit. Signal Process. 17, 891–913 (2007)

  11. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing. Wiley, New York (2002)

  12. Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications, 856 p. Academic Press, Waltham (2010)

  13. Daudet, L., Sandler, M.: MDCT analysis of sinusoids: explicit results and applications to coding artifacts reduction. IEEE Trans. Speech Audio Process. 12(3), 302–312 (2004)

  14. Davies, M., Mitianoudis, N.: A simple mixture model for sparse overcomplete ICA. IEE Proc. Vis. Image Signal Process. 151(1), 35–43 (2004)

  15. Davies, M., Daudet, L.: Sparse audio representations using the MCLT. Signal Process. 86(3), 358–368 (2006)

  16. Dempster, A.P., Laird, N., Rubin, D.: Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)

  17. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)

  18. Dhillon, I., Sra, S.: Modeling data using directional distributions. Technical Report TR-03-06, University of Texas at Austin, Austin, TX (2003)

  19. Duarte, L., Suyama, R., Rivet, B., Attux, R., Romano, J., Jutten, C.: Blind compensation of nonlinear distortions: application to source separation of post-nonlinear mixtures. IEEE Trans. Signal Process. 60(11), 5832–5844 (2012)

  20. Duong, N., Vincent, E., Gribonval, R.: Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans. Audio Speech Lang. Process. 18(7), 1830–1840 (2010)

  21. Eriksson, J., Koivunen, V.: Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Process. Lett. 11(7), 601–604 (2004)

  22. Févotte, C., Godsill, S.: A Bayesian approach to blind separation of sparse sources. IEEE Trans. Audio Speech Lang. Process. 14(6), 2174–2188 (2006)

  23. Févotte, C., Gribonval, R., Vincent, E.: BSS EVAL toolbox user guide. Technical Report 1706, IRISA, Rennes, France (2005). http://www.irisa.fr/metiss/bsseval/

  24. Fisher, N.: Statistical Analysis of Circular Data. Cambridge University Press, Cambridge (1993)

  25. Girolami, M.: A variational method for learning sparse and overcomplete representations. Neural Comput. 13(11), 2517–2532 (2001)

  26. Gribonval, R., Nielsen, M.: Sparse decomposition in unions of bases. IEEE Trans. Inf. Theory 49(12), 3320–3325 (2003)

  27. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York, 481+xxii p. http://www.cis.hut.fi/projects/ica/book/ (2001)

  28. Hyvärinen, A.: Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing 22, 49–67 (1998)

  29. Jammalamadaka, S., Sengupta, A.: Topics in Circular Statistics. World Scientific, Singapore (2001)

  30. Jutten, C., Karhunen, J.: Advances in nonlinear blind source separation. In: Proceedings of 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), pp. 245–256. Nara, Japan (2003)

  31. Kreyszig, E.: Advanced Engineering Mathematics, 1264 p. Wiley (2010)

  32. Lee, T.W., Bell, A.J., Lambert, R.: Blind separation of delayed and convolved sources. In: Advances in Neural Information Processing Systems (NIPS), vol. 9, pp. 758–764 (1997)

  33. Lee, T.W., Lewicki, M., Girolami, M., Sejnowski, T.: Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Process. Lett. 4(5), 87–90 (1999)

  34. Lewicki, M., Sejnowski, T.: Learning overcomplete representations. Neural Comput. 12, 337–365 (2000)

  35. Lewicki, M.: Efficient coding of natural sounds. Nat. Neurosci. 5(4), 356–363 (2002)

  36. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. Berkeley, California (1967)

  37. Mardia, K.V., Jupp, P.: Directional Statistics. Wiley, Chichester (1999)

  38. Mitianoudis, N., Davies, M.: Permutation alignment for frequency domain ICA using subspace beamforming methods. In: Proceedings of the International Workshop on Independent Component Analysis and Source Separation (ICA2004), pp. 127–132. Granada, Spain (2004)

  39. Mitianoudis, N., Stathaki, T.: Underdetermined source separation using mixtures of warped Laplacians. In: International Conference on Independent Component Analysis and Source Separation (ICA). London, UK (2007)

  40. Mitianoudis, N.: A directional Laplacian density for underdetermined audio source separation. In: 20th International Conference on Artificial Neural Networks (ICANN). Thessaloniki, Greece (2010)

  41. Mitianoudis, N., Davies, M.: Audio source separation of convolutive mixtures. IEEE Trans. Speech Audio Process. 11(5), 489–497 (2003)

  42. Mitianoudis, N., Stathaki, T.: Batch and online underdetermined source separation using Laplacian mixture models. IEEE Trans. Audio Speech Lang. Process. 15(6), 1818–1832 (2007)

  43. Moulines, E., Cardoso, J.F., Gassiat, E.: Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97), pp. 3617–3620. Munich, Germany (1997)

  44. O’Grady, P., Pearlmutter, B.: Hard-LOST: modified K-means for oriented lines. In: Proceedings of the Irish Signals and Systems Conference, pp. 247–252. Ireland (2004)

  45. O’Grady, P., Pearlmutter, B.: Soft-LOST: EM on a mixture of oriented lines. In: Proceedings of the International Conference on Independent Component Analysis 2004, pp. 428–435. Granada, Spain (2004)

  46. Pajunen, P., Hyvärinen, A., Karhunen, J.: Nonlinear blind source separation by self-organizing maps. In: Proceedings of the International Conference on Neural Information Processing, pp. 1207–1210. Hong Kong (1996)

  47. Plumbley, M., Abdallah, S., Blumensath, T., Davies, M.: Sparse representations of polyphonic music. Signal Process. 86(3), 417–431 (2006)

  48. Rickard, S., Balan, R., Rosca, J.: Real-time time-frequency based blind source separation. In: Proceedings of the ICA2001, pp. 651–656. San Diego, CA (2001)

  49. Sawada, H., Araki, S., Makino, S.: A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007), pp. 139–142 (2007)

  50. Sawada, H., Araki, S., Makino, S.: Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment. IEEE Trans. Audio Speech Lang. Process. 19(3), 516–527 (2011)

  51. Sawada, H., Mukai, R., Araki, S., Makino, S.: A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 12(5), 75–87 (2004)

  52. SiSEC 2008: Signal separation evaluation campaign. http://sisec2008.wiki.irisa.fr/tiki-index.php

  53. SiSEC 2010: Signal separation evaluation campaign. http://sisec2010.wiki.irisa.fr/tiki-index.php

  54. SiSEC 2011: Signal separation evaluation campaign. http://sisec.wiki.irisa.fr/tiki-index.php

  55. Smaragdis, P., Boufounos, P.: Position and trajectory learning for microphone arrays. IEEE Trans. Audio Speech Lang. Process. 15(1), 358–368 (2007)

  56. Smaragdis, P.: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 22, 21–34 (1998)

  57. Torkkola, K.: Blind separation of delayed and convolved sources. In: S. Haykin (ed.) Unsupervised Adaptive Filtering, vol. I, pp. 321–375. Wiley (2000)

  58. Vincent, E., Arberet, S., Gribonval, R.: Underdetermined instantaneous audio source separation via local Gaussian modeling. In: 8th International Conference on Independent Component Analysis and Signal Separation (ICA), pp. 775–782. Paraty, Brazil (2009)

  59. Vincent, E., Gribonval, R., Fevotte, C., Nesbit, A., Plumbley, M., Davies, M., Daudet, L.: BASS-dB: the blind audio source separation evaluation database. http://bass-db.gforge.inria.fr/BASS-dB/

  60. Winter, S., Kellermann, W., Sawada, H., Makino, S.: MAP based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization. EURASIP J. Adv. Signal Process. 1, 12 p (2007)

  61. Yilmaz, O., Rickard, S.: Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Process. 52(7), 1830–1847 (2004)

  62. Zibulevsky, M., Kisilev, P., Zeevi, Y., Pearlmutter, B.: Blind source separation via multinode sparse representation. Adv. Neural Inf. Process. Syst. 14, 1049–1056 (2002)

Author information

Corresponding author

Correspondence to Nikolaos Mitianoudis.

Appendices

Appendix 1

Calculation of the Normalisation Parameter for the Generalised DLD

To estimate the normalisation coefficient \(c_p(k)\) of (7.27), we need to solve the following equation:

$$\begin{aligned} \int \limits _{\mathbf {x}\in \fancyscript{S}^{p-1}} c_p(k)e^{-k\sqrt{1-(\mathbf {m}^T\mathbf {x})^2}} \mathrm{d}\mathbf {x}=1. \end{aligned}$$
(7.43)

Following Eq. (B.8) and in a similar manner to the analysis in Appendix B.2 in [18], we can rewrite the above equation as follows:

$$\begin{aligned} c_p(k)\int \limits _{0}^{\pi }\mathrm{d}\theta _{p-1}\int \limits _{0}^{\pi } e^{-k\sqrt{1-\cos ^2\theta _1}}\sin ^{p-2}\theta _1 \mathrm{d}\theta _1 \prod _{j=3}^{p-1}\int \limits _{0}^{\pi }\sin ^{p-j}\theta _{j-1} \mathrm{d}\theta _{j-1}=1. \end{aligned}$$
(7.44)

Following a similar methodology to Appendix B.2 in [18], the above yields:

$$\begin{aligned} c_p(k)\pi \int \limits _{0}^{\pi } e^{-k\sin \theta _1}\sin ^{p-2}\theta _1\mathrm{d}\theta _1 \frac{\pi ^{\frac{p-3}{2}}}{\varGamma \left( \frac{p-1}{2}\right) }=1. \end{aligned}$$
(7.45)

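Using the definition \(I_p(k)=\frac{1}{\pi }\int _{0}^{\pi }\sin ^p\theta \, e^{-k\sin \theta }\mathrm{d}\theta \), the remaining integral equals \(\pi I_{p-2}(k)\), and we can write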

$$\begin{aligned} c_p(k)I_{p-2}(k)\frac{\pi ^{\frac{p+1}{2}}}{\varGamma \left( \frac{p-1}{2}\right) }=1\Rightarrow c_p(k)=\frac{\varGamma \left( \frac{p-1}{2}\right) }{\pi ^{\frac{p+1}{2}}I_{p-2}(k)}. \end{aligned}$$
(7.46)
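As a numerical sanity check of (7.46), the following Python sketch evaluates \(c_p(k)\). It assumes \(I_p(k)=\frac{1}{\pi }\int _{0}^{\pi }\sin ^p\theta \, e^{-k\sin \theta }\mathrm{d}\theta \), the form implied by (7.48) and (7.49). The function names `I` and `c` and the midpoint-rule integration are choices made for this illustration, not part of the chapter.

```python
import math

def I(p, k, steps=20000):
    """Midpoint-rule estimate of I_p(k) = (1/pi) * int_0^pi sin^p(t) exp(-k sin t) dt."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        s = math.sin((j + 0.5) * h)
        total += s ** p * math.exp(-k * s)
    return total * h / math.pi

def c(p, k):
    """Normalisation coefficient c_p(k) of Eq. (7.46), for dimension p >= 2."""
    return math.gamma((p - 1) / 2) / (math.pi ** ((p + 1) / 2) * I(p - 2, k))

# For k = 0 the density is uniform, so c_p(0) should be one over the
# half-circle length (p = 2) or the half-sphere area (p = 3).
print(c(2, 0.0), 1 / math.pi)
print(c(3, 0.0), 1 / (2 * math.pi))
```

The case \(k=0\) is a convenient check because the density becomes uniform: the half-circle has length \(\pi \) and the half-sphere has area \(2\pi \), so the two printed pairs should agree closely.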

Appendix 2

Gradient Updates for \(\mathbf {m}\) and \(k\) for the MDDLD

The first-order derivative of the log-likelihood in (7.28) for the estimation of \(\mathbf {m}\) is calculated below:

$$\begin{aligned} \frac{\partial J(\mathbf {X},\mathbf {m},k)}{\partial \mathbf {m}}&= -k\sum _{n=1}^N\frac{-2\mathbf {m}^T\mathbf {x}_n}{2\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}}\mathbf {x}_n\nonumber \\&= k\sum _{n=1}^N\frac{\mathbf {m}^T\mathbf {x}_n}{\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}}\mathbf {x}_n. \end{aligned}$$
(7.47)
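The gradient in (7.47) can be sketched in a few lines of NumPy. This is an illustration only: the function name `grad_m` and the small guard `eps` against division by zero (needed when a sample coincides with \(\pm \mathbf {m}\)) are assumptions of this sketch, and the samples are assumed to be stored as unit-norm rows of an \(N\times p\) array.

```python
import numpy as np

def grad_m(X, m, k, eps=1e-12):
    """Gradient of the log-likelihood with respect to m, following Eq. (7.47).

    X: (N, p) array of unit-norm samples, m: (p,) unit-norm centre, k > 0.
    eps is an assumption of this sketch; it keeps the denominator positive.
    """
    proj = X @ m                                      # m^T x_n for every sample
    denom = np.sqrt(np.maximum(1.0 - proj ** 2, eps))
    return k * ((proj / denom) @ X)                   # sum_n weight_n * x_n

# Small random example on the unit sphere in 3-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
m = np.array([1.0, 2.0, 3.0]) / np.linalg.norm([1.0, 2.0, 3.0])
print(grad_m(X, m, k=2.0))
```

Since \(\mathbf {m}\) is constrained to unit norm, any ascent step built on this gradient would presumably be followed by a renormalisation of \(\mathbf {m}\); that step is not shown here.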

Before we estimate \(k\) from the log-likelihood (7.28), we derive the following property:

$$\begin{aligned} \frac{\partial }{\partial k}I_0(k)= -\frac{1}{\pi }\int \limits _{0}^{\pi } e^{-k\sin \theta }\sin \theta \mathrm{d}\theta =-I_1(k). \end{aligned}$$
(7.48)

The above property can be generalised as follows:

$$\begin{aligned} \frac{\partial ^p}{\partial k^p}I_0(k)=(-1)^{p}\frac{1}{\pi }\int \limits _{0}^{\pi }\sin ^p\theta e^{-k\sin \theta }\mathrm{d}\theta =(-1)^{p}I_p(k). \end{aligned}$$
(7.49)

The first-order derivative of the log-likelihood in (7.28) for the estimation of \(k\) is then calculated below:

$$\begin{aligned} \frac{\partial J(\mathbf {X},\mathbf {m},k)}{\partial k}=N\frac{I_{p-1}(k)}{I_{p-2}(k)}-\sum _{n=1}^{N}\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}. \end{aligned}$$
(7.50)
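A minimal NumPy sketch of (7.50) follows. It assumes \(I_p(k)=\frac{1}{\pi }\int _{0}^{\pi }\sin ^p\theta \, e^{-k\sin \theta }\mathrm{d}\theta \), as implied by (7.49), and evaluates the integral with a midpoint rule. The names `I` and `grad_k` are hypothetical and the quadrature is a choice made for illustration.

```python
import numpy as np

def I(p, k, steps=20000):
    """Midpoint-rule estimate of I_p(k) = (1/pi) * int_0^pi sin^p(t) exp(-k sin t) dt."""
    theta = (np.arange(steps) + 0.5) * np.pi / steps
    s = np.sin(theta)
    return np.sum(s ** p * np.exp(-k * s)) / steps

def grad_k(X, m, k):
    """Derivative of the log-likelihood with respect to k, following Eq. (7.50).

    X: (N, p) array of unit-norm samples, m: (p,) unit-norm centre.
    """
    N, p = X.shape
    proj = X @ m
    dist = np.sqrt(np.maximum(1.0 - proj ** 2, 0.0))  # sqrt(1 - (m^T x_n)^2)
    return N * I(p - 1, k) / I(p - 2, k) - np.sum(dist)

# Small random example on the unit sphere in 3-D.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
m = np.array([0.0, 0.0, 1.0])
print(grad_k(X, m, k=3.0))
```

Setting this derivative to zero shows that the stationary \(k\) matches the ratio \(I_{p-1}(k)/I_{p-2}(k)\) to the average of \(\sqrt{1-(\mathbf {m}^T\mathbf {x}_n)^2}\) over the data.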

Appendix 3

A Directional K-Means Algorithm

Assume that \(K\) is the number of clusters, \(\fancyscript{C}_i,\ i=1,\dots ,K\), are the clusters, \(\mathbf {m}_i\) are the cluster centres and \(\mathbf {X}=\{\mathbf {x}_1,\dots ,\mathbf {x}_n,\dots ,\mathbf {x}_N\}\) is a \(p\)-D angular dataset lying on the half-unit \(p\)-D sphere. The original \(K\)-means [36] minimises the following non-directional error function:

$$\begin{aligned} Q=\sum _{i=1}^K\sum _{\mathbf {x}_n\in \fancyscript{C}_i}||\mathbf {x}_n-\mathbf {m}_i||^2 \end{aligned}$$
(7.51)

where \(||\cdot ||\) denotes the Euclidean norm, so that each point contributes only its distance to the centre of its own cluster. Instead of using the squared Euclidean distance for the \(p\)-D Directional \(K\)-Means, we introduce the following distance function:

$$\begin{aligned} D_l(\mathbf {x}_n,\mathbf {m}_i)=\sqrt{1-(\mathbf {m}_i^T\mathbf {x}_n)^2}. \end{aligned}$$
(7.52)

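Like the Euclidean distance, the novel function \(D_l\) increases monotonically as \(\mathbf {x}_n\) moves away from the cluster centre (up to an angle of \(\pi /2\)), but it places more emphasis on the contribution of points close to the centre. Since \(\mathbf {x}_n\) and \(\mathbf {m}_i\) have unit norm, \(D_l(\mathbf {x}_n,\mathbf {m}_i)=|\sin \phi |\), where \(\phi \) is the angle between them, so \(D_l\) is also periodic with period \(\pi \). The \(p\)-D Directional \(K\)-Means can thus be described as follows: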

  1. Randomly initialise \(K\) cluster centres \(\mathbf {m}_i\), where \(||\mathbf {m}_i||=1\).

  2. Calculate the distance of all points \(\mathbf {x}_n\) to the cluster centres \(\mathbf {m}_i\), using \(D_l\).

  3. Assign each point to the centre at minimum distance; the points assigned to \(\mathbf {m}_i\) form the new cluster \(\fancyscript{C}_i\).

  4. The clusters \(\fancyscript{C}_i\) vote for their new centres \(\mathbf {m}_i^+\). To avoid averaging mistakes with directional data, vector averaging is employed to ensure the validity of the addition. The resulting average is then normalised back onto the half-unit \(p\)-D sphere:

    $$\begin{aligned} \mathbf {m}_i^+=\frac{1}{k_i}\sum _{\mathbf {x}_n \in \fancyscript{C}_i }\mathbf {x}_n \end{aligned}$$
    (7.53)
    $$\begin{aligned} \mathbf {m}_i^+\leftarrow \mathbf {m}_i^+/||\mathbf {m}_i^+|| \end{aligned}$$
    (7.54)

    where \(k_i\) is the number of points in cluster \(\fancyscript{C}_i\).

  5. Repeat steps (2)–(4) until the means \(\mathbf {m}_i\) have converged.
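The five steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the author's implementation: the function name `directional_kmeans`, the optional `init` argument, the stopping tolerance and the handling of empty clusters (the old centre is kept) are all choices made here. The data are assumed to lie already on the half-unit sphere, so that plain vector averaging is valid.

```python
import numpy as np

def directional_kmeans(X, K, init=None, n_iter=100, tol=1e-8, seed=0):
    """Directional K-means sketch. X: (N, p) unit-norm rows on a half-sphere."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    # Step 1: random unit-norm centres, unless hypothetical `init` is given.
    M = rng.normal(size=(K, p)) if init is None else np.array(init, dtype=float)
    M /= np.linalg.norm(M, axis=1, keepdims=True)

    def assign(M):
        # Steps 2-3: D_l = sqrt(1 - (m_i^T x_n)^2), then nearest centre.
        D = np.sqrt(np.maximum(1.0 - (X @ M.T) ** 2, 0.0))
        return np.argmin(D, axis=1)

    for _ in range(n_iter):
        labels = assign(M)
        M_new = M.copy()
        for i in range(K):
            members = X[labels == i]
            if len(members) == 0:
                continue                      # assumption: keep the old centre
            mean = members.mean(axis=0)       # Eq. (7.53): vector average
            norm = np.linalg.norm(mean)
            if norm > 0:
                M_new[i] = mean / norm        # Eq. (7.54): renormalise
        converged = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if converged:                         # Step 5: stop when centres settle
            break
    return M, assign(M)

# Two synthetic directions on the upper half-circle (p = 2).
rng = np.random.default_rng(0)
ang = np.concatenate([0.5 + 0.05 * rng.normal(size=200),
                      2.0 + 0.05 * rng.normal(size=200)])
X = np.column_stack([np.cos(ang), np.sin(ang)])
M, labels = directional_kmeans(X, K=2)
print(np.mod(np.arctan2(M[:, 1], M[:, 0]), np.pi))  # estimated centre angles
```

As with any \(K\)-means variant, the result depends on the initial centres, so a random start may settle in a poor local solution; passing `init` or restarting from several seeds is one way to reduce this sensitivity.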

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mitianoudis, N. (2014). Underdetermined Audio Source Separation Using Laplacian Mixture Modelling. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_7

  • DOI: https://doi.org/10.1007/978-3-642-55016-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55015-7

  • Online ISBN: 978-3-642-55016-4

  • eBook Packages: Engineering, Engineering (R0)
