Bayesian Noise Compensation of Time Trajectories of Spectral Coefficients for Robust Speech Recognition
We present a novel data-driven compensation technique that modifies, on-line, the incoming spectral representation of degraded speech so that it approximates the features of the high-quality speech used to train the classifier. We apply Bayesian inference to the degraded spectral coefficients, modeling the linear spectrum of clean speech with non-Gaussian distributions that admit a closed-form maximum a posteriori (MAP) solution. The MAP solution yields a soft-threshold function that is adapted to the spectral characteristics and noise variance of each spectral band. We evaluate the algorithm extensively against white and coloured Gaussian noise in the context of Automatic Speech Recognition (ASR) and demonstrate its robustness in adverse conditions. The enhancement process adds little to no computational overhead, so it runs on-line in real time.
Keywords: Speech Recognition; Speech Signal; Spectral Band; Minimum Mean Square Error; Automatic Speech Recognition
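The per-band soft-threshold rule described in the abstract can be illustrated with a minimal sketch. The abstract does not state which non-Gaussian distributions are used, so the sketch assumes a zero-mean Laplacian prior on the clean coefficient and additive Gaussian noise, for which the MAP estimate is the well-known shrinkage rule sign(y) · max(|y| − √2·σ²/d, 0). The names `soft_threshold`, `compensate_frame`, `noise_var` (σ²) and `clean_std` (d) are hypothetical and are not taken from the paper.

```python
import math


def soft_threshold(y, noise_var, clean_std):
    """MAP estimate of a clean coefficient from a noisy observation y.

    Assumes (illustration only) a Laplacian prior with standard deviation
    clean_std and additive Gaussian noise with variance noise_var.
    """
    # The threshold grows with the noise variance of the band and shrinks
    # as the spread of the clean coefficients in that band grows.
    threshold = math.sqrt(2.0) * noise_var / clean_std
    # Shrink the magnitude toward zero and keep the sign of the observation.
    return math.copysign(max(abs(y) - threshold, 0.0), y)


def compensate_frame(frame, noise_vars, clean_stds):
    """Apply the band-specific soft threshold to one frame of coefficients."""
    return [
        soft_threshold(y, v, d)
        for y, v, d in zip(frame, noise_vars, clean_stds)
    ]


if __name__ == "__main__":
    # Three bands with made-up noise variances and clean-speech spreads.
    noisy = [2.0, -0.3, -3.0]
    print(compensate_frame(noisy, [0.5, 0.5, 1.0], [1.0, 1.0, 2.0]))
```

Because each band needs only one subtraction and one comparison per coefficient, a rule of this form is consistent with the abstract's claim of negligible overhead; the per-band parameters would have to be estimated separately, which this sketch does not cover.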