Cepstral Smoothing for Convolutive Blind Speech Separation

  • Ibrahim Missaoui
  • Zied Lachiri
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 250)


This work proposes an approach that combines two source separation techniques, convolutive blind source separation (BSS) exploiting the second-order non-stationarity of the signals and binary time-frequency masking, followed by a cepstral smoothing post-processing stage. This stage smooths, in the cepstral domain, the binary masks estimated from the outputs of the BSS algorithm. The aim of cepstral smoothing of the spectral masks is to improve interference suppression and to reduce the musical noise typically produced by time-frequency masking. Experimental results and evaluation measures demonstrate the performance of the proposed convolutive blind speech separation system.
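
The post-processing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothing constants `beta_low` and `beta_high`, the quefrency threshold `q_low`, the mask floor, and the final clipping are all assumptions chosen for illustration, and the pitch-dependent handling used in some cepstral smoothing schemes is omitted. The sketch assumes the two BSS outputs are already available as complex STFT matrices of shape (frequency bins, frames).

```python
import numpy as np


def binary_masks(Y1, Y2):
    """Estimate binary masks by comparing the magnitudes of two BSS outputs.

    Y1, Y2: complex STFTs of shape (n_bins, n_frames).
    """
    m1 = (np.abs(Y1) > np.abs(Y2)).astype(float)
    return m1, 1.0 - m1


def cepstral_smooth(mask, beta_low=0.2, beta_high=0.8, q_low=4, floor=1e-3):
    """Recursively smooth a spectral mask over time in the cepstral domain.

    mask: real array of shape (n_bins, n_frames), n_bins = n_fft // 2 + 1.
    All default parameter values are illustrative assumptions.
    """
    n_bins, n_frames = mask.shape
    n_fft = 2 * (n_bins - 1)

    # Floor the mask so the logarithm is defined where the mask is zero.
    log_mask = np.log(np.maximum(mask, floor))

    # Cepstrum of the mask: inverse real FFT of the log mask along frequency.
    ceps = np.fft.irfft(log_mask, n=n_fft, axis=0)

    # Symmetric per-quefrency smoothing constant: light smoothing at low
    # quefrencies (coarse spectral shape), stronger smoothing elsewhere.
    idx = np.arange(n_fft)
    quefrency = np.minimum(idx, n_fft - idx)
    beta = np.where(quefrency < q_low, beta_low, beta_high)

    # First-order recursive smoothing along the time (frame) axis.
    smoothed = np.empty_like(ceps)
    smoothed[:, 0] = ceps[:, 0]
    for t in range(1, n_frames):
        smoothed[:, t] = beta * smoothed[:, t - 1] + (1.0 - beta) * ceps[:, t]

    # Back to the spectral domain; clipping to [0, 1] is an added safeguard.
    log_smooth = np.fft.rfft(smoothed, axis=0).real
    return np.clip(np.exp(log_smooth), 0.0, 1.0)
```

Under these assumptions, a smoothed mask would be applied to the corresponding BSS output, for example `cepstral_smooth(m1) * Y1`, before inverting the STFT. Smoothing the cepstrum rather than the mask itself is intended to suppress isolated mask peaks, which are heard as musical noise, while leaving the coarse spectral shape largely unchanged.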


Keywords: Cepstral smoothing; Ideal binary mask; Convolutive mixtures; Blind speech separation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ibrahim Missaoui (1)
  • Zied Lachiri (1, 2)
  1. National Engineering School of Tunis, ENIT, Tunis, Tunisia
  2. National Institute of Applied Science and Technology, INSAT, Tunis, Tunisia
