Model-Driven Speech Enhancement for Multisource Reverberant Environment (Signal Separation Evaluation Campaign (SiSEC) 2011)

  • Pejman Mowlaee
  • Rahim Saeidi
  • Rainer Martin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7191)


We present a low-complexity speech enhancement technique for real-life multi-source environments. Assuming that the speaker identity is known a priori, we propose incorporating a pre-trained speaker model to enhance a target signal corrupted by non-stationary noise in a reverberant scenario. Our experiments show that this improves on the limited performance of noise-tracking-based speech enhancement methods under unpredictable, non-stationary noise conditions. The pre-trained speaker model captures a constrained subspace for the target speech and provides an enhanced speech estimate by rejecting non-stationary noise sources. Experimental results on the Signal Separation Evaluation Campaign (SiSEC) data show that the proposed approach successfully cancels the interference in the noisy input and delivers an enhanced output signal.
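The abstract describes the core idea only at a high level: a pre-trained speaker model constrains the target-speech subspace, and frames of the noisy input are projected onto that subspace to reject non-stationary noise. A minimal sketch of this idea is given below, assuming a simple vector-quantization (codebook) speaker model over magnitude spectra and a Wiener-like gain; the function names, codebook size, and gain rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(spectra, k=4, iters=20):
    """Crude k-means over clean-speech magnitude spectra -> speaker codebook.
    (Illustrative stand-in for a pre-trained speaker model.)"""
    codebook = spectra[rng.choice(len(spectra), size=k, replace=False)].copy()
    for _ in range(iters):
        # distance of every frame to every codevector, shape (frames, k)
        d = ((spectra[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = spectra[assign == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook

def enhance(noisy, codebook, floor=0.1):
    """Per frame: match the nearest speaker codevector, then apply a
    Wiener-like gain built from the match (assumed gain rule)."""
    d = ((noisy[:, None, :] - codebook[None]) ** 2).sum(-1)
    matched = codebook[d.argmin(1)]                      # constrained estimate
    noise_est = np.maximum(noisy - matched, 0.0)         # crude noise estimate
    gain = matched**2 / (matched**2 + noise_est**2 + 1e-12)
    return np.maximum(gain, floor) * noisy               # floored spectral gain

# Toy demo with synthetic "speaker" spectra (assumption: not the paper's data).
proto = np.stack([np.linspace(1.0, 2.0, 16), np.linspace(2.0, 1.0, 16)])
clean = proto[rng.integers(0, 2, 200)] + 0.01 * rng.random((200, 16))
codebook = train_codebook(clean)
noisy = clean + rng.random((200, 16))    # additive non-negative "noise"
enhanced = enhance(noisy, codebook)
```

Because the gain never exceeds one, the enhanced magnitudes are attenuated versions of the noisy input, pulled toward the speaker codebook; this mirrors the abstract's point that the speaker model, rather than a noise tracker, supplies the constraint under non-stationary noise.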


Keywords: Model-driven, Speaker model, SiSEC




References

  1. Ephraim, Y., Malah, D.: Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 32(6), 1109–1121 (1984)
  2. Ephraim, Y., Malah, D.: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 33(2), 443–445 (1985)
  3. Hendriks, R.C., Heusdens, R., Jensen, J.: MMSE based noise PSD tracking with low complexity. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 4266–4269 (2010)
  4. Christensen, H., Barker, J., Ma, N., Green, P.: The CHiME corpus: a resource and a challenge for computational hearing in multisource environments. In: Proc. Interspeech, pp. 1918–1921 (2010)
  5. Mowlaee, P.: New Strategies for Single-channel Speech Separation. Ph.D. thesis, Institut for Elektroniske Systemer, Aalborg Universitet (2010)
  6. Mowlaee, P., Christensen, M., Jensen, S.: New results on single-channel speech separation using sinusoidal modeling. IEEE Transactions on Audio, Speech, and Language Processing 19(5), 1265–1277 (2011)
  7. Rangachari, S., Loizou, P.C.: A noise-estimation algorithm for highly non-stationary environments. Speech Communication 48(2), 220–231 (2006)
  8. Cohen, I., Berdugo, B.: Speech enhancement for non-stationary noise environments. Signal Processing 81(11), 2403–2418 (2001)
  9. Wang, D.: On ideal binary mask as the computational goal of auditory scene analysis. In: Speech Separation by Humans and Machines, pp. 181–197. Kluwer (2005)
  10. Erkelens, J., Hendriks, R., Heusdens, R., Jensen, J.: Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors. IEEE Transactions on Audio, Speech, and Language Processing 15(6), 1741–1752 (2007)
  11. Vincent, E., Gribonval, R., Fevotte, C.: Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing 14(4), 1462–1469 (2006)
  12. The third community-based Signal Separation Evaluation Campaign (SiSEC 2011),
  13. Emiya, V., Vincent, E., Harlander, N., Hohmann, V.: Subjective and objective quality assessment of audio source separation. IEEE Transactions on Audio, Speech, and Language Processing (99), 1 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Pejman Mowlaee (1)
  • Rahim Saeidi (2)
  • Rainer Martin (1)
  1. Institute of Communication Acoustics (IKA), Ruhr-Universität Bochum (RUB), Germany
  2. Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands
