Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS

  • Qingju Liu
  • Wenwu Wang
  • Philip Jackson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6365)


Recent studies show that the visual information in visual speech can enhance the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited by statistically characterising the coherence between audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present two new contributions. First, an adapted expectation maximization (AEM) algorithm is proposed for the training process to model the audio-visual coherence from the extracted features. Second, the coherence is exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that the proposed algorithm outperforms traditional audio-only BSS.
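To make the permutation idea concrete, the following is a minimal sketch of how a GMM-based coherence score could be used to order the separated outputs in each frequency bin. It is not the AEM training or the sorting scheme of the paper: the function names `gmm_log_likelihood` and `align_bins`, the diagonal-covariance GMM, the single GMM shared across bins, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Total log-likelihood of the rows of x under a diagonal-covariance GMM.

    x: (T, D), weights: (K,), means: (K, D), variances: (K, D).
    """
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)                 # (T, K)
    m = log_weighted.max(axis=1, keepdims=True)               # log-sum-exp trick
    return float(np.sum(m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))))

def align_bins(audio_feats, visual_feats, weights, means, variances):
    """Order the outputs of every frequency bin by audio-visual coherence.

    audio_feats: (n_bins, n_src, T) per-bin audio feature of each separated output.
    visual_feats: (T, Dv) visual features of the target speaker.
    Returns (n_bins, n_src) indices; column 0 is the output most coherent
    with the video (hypothetical scoring rule, assumed for this sketch).
    """
    n_bins, n_src, _ = audio_feats.shape
    order = np.zeros((n_bins, n_src), dtype=int)
    for k in range(n_bins):
        scores = [
            gmm_log_likelihood(
                np.column_stack([audio_feats[k, j], visual_feats]),
                weights, means, variances)
            for j in range(n_src)
        ]
        order[k] = np.argsort(scores)[::-1]                   # best score first
    return order

# Toy data (assumed): the target audio feature follows the visual feature,
# the interferer is independent of it, and some bins are randomly swapped.
rng = np.random.default_rng(0)
T, n_bins = 200, 8
levels = np.array([-1.0, 0.0, 1.0])
visual = rng.choice(levels, size=T)
target = visual + 0.1 * rng.standard_normal((n_bins, T))
interferer = (rng.choice(levels, size=(n_bins, T))
              + 0.1 * rng.standard_normal((n_bins, T)))
swapped = rng.random(n_bins) < 0.5
audio = np.stack([target, interferer], axis=1)                # (n_bins, 2, T)
audio[swapped] = audio[swapped][:, ::-1]                      # permute those bins

# Hand-set joint audio-visual GMM standing in for a trained coherence model.
weights = np.full(3, 1.0 / 3.0)
means = np.column_stack([levels, levels])                     # (3, 2)
variances = np.full((3, 2), 0.05)

order = align_bins(audio, visual[:, None], weights, means, variances)
```

In this toy setting, `order[:, 0]` gives, for each bin, the index of the output judged to belong to the visible speaker, so reordering each bin by its row of `order` undoes the random swaps. A real system would replace the hand-set mixture with a model trained on audio-visual features.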


Keywords: Discrete Cosine Transform; Gaussian Mixture Model; Blind Source Separation; Visual Speech; Audio Feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Qingju Liu (1)
  • Wenwu Wang (1)
  • Philip Jackson (1)
  1. Centre for Vision, Speech and Signal Processing, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, United Kingdom
