Bayesian Noise Compensation of Time Trajectories of Spectral Coefficients for Robust Speech Recognition

  • Ilyas Potamitis
  • Nikos Fakotakis
  • George Kokkinakis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2166)


We present a novel data-driven compensation technique that modifies, on-line, the incoming spectral representation of degraded speech so that it approximates the features of the high-quality speech used to train a classifier. We apply the Bayesian inference framework to the degraded spectral coefficients, modeling the linear spectrum of clean speech with non-Gaussian distributions chosen so that the maximum a posteriori (MAP) estimate has a closed-form solution. The MAP solution leads to a soft-threshold function that is adapted to the spectral characteristics and noise variance of each spectral band. We evaluate the algorithm extensively against white and coloured Gaussian noise in the context of Automatic Speech Recognition (ASR) and demonstrate its robustness in adverse conditions. The enhancement process adds little to no computational overhead, which allows real-time, on-line operation.
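As an illustration of the kind of soft-threshold rule the abstract describes, the sketch below assumes a zero-mean Laplacian prior on each clean coefficient and additive zero-mean Gaussian noise. Under those assumptions the MAP estimate is sign(y) * max(|y| - sqrt(2) * noise_var / clean_std, 0). The Laplacian prior, the function names, and the per-band parameter layout are assumptions made for this example; the abstract does not state which non-Gaussian density the authors use, so this is not necessarily their exact estimator.

```python
import math


def map_soft_threshold(y, noise_var, clean_std):
    """MAP estimate of a clean coefficient from a noisy observation y.

    Assumes (for illustration only) a zero-mean Laplacian prior with
    standard deviation clean_std and additive Gaussian noise with
    variance noise_var. The resulting rule shrinks y towards zero by a
    threshold that grows with the noise variance.
    """
    threshold = math.sqrt(2.0) * noise_var / clean_std
    shrunk_magnitude = max(abs(y) - threshold, 0.0)
    return math.copysign(shrunk_magnitude, y)


def compensate_trajectories(trajectories, noise_vars, clean_stds):
    """Apply the soft threshold along the time trajectory of each band.

    trajectories: one list of coefficients over time per spectral band.
    noise_vars, clean_stds: one value per band (hypothetical estimates),
    so each band gets its own threshold.
    """
    return [
        [map_soft_threshold(y, nv, cs) for y in band]
        for band, nv, cs in zip(trajectories, noise_vars, clean_stds)
    ]


if __name__ == "__main__":
    # Two bands with the same observations but different noise levels.
    noisy = [[3.0, 0.5], [3.0, 0.5]]
    cleaned = compensate_trajectories(
        noisy,
        noise_vars=[1.0, 0.01],
        clean_stds=[math.sqrt(2.0), math.sqrt(2.0)],
    )
    print(cleaned)
```

In this sketch the noisier band suppresses the small coefficient entirely, while the cleaner band leaves it almost unchanged, which is the per-band adaptation the abstract refers to. The cost is a few arithmetic operations per coefficient, in line with the claim of low overhead.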


Speech Recognition; Speech Signal; Spectral Band; Minimum Mean Square Error; Automatic Speech Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Ilyas Potamitis
    • 1
  • Nikos Fakotakis
    • 1
  • George Kokkinakis
    • 1
  1. Wire Communications Lab., Electrical & Computer Engineering Dept., University of Patras, Patras, Greece
