Feature Compensation Employing Variational Model Composition for Robust Speech Recognition in In-Vehicle Environment

  • Wooil Kim
  • John H. L. Hansen


This chapter proposes a novel model composition method to improve speech recognition performance in time-varying background noise conditions. The method is motivated by the observation that each order of the cepstral coefficients reflects how rapidly the corresponding component of the log-spectral envelope varies. Variational noise models are therefore generated by selectively applying perturbation factors to a basis model, which yields a collection of different spectral patterns in the log-spectral domain. The basis noise model is obtained from the silent segments of the input speech. The proposed Variational Model Composition (VMC) method is used to generate multiple environmental models for our previously proposed feature compensation method. Experimental results show that the proposed method is considerably more effective in time-varying background noise conditions, with a 20.80% relative improvement in word error rate on the CU-Move real-life in-vehicle corpus compared to an existing single-model method.
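The composition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the perturbation rule, so multiplicative scaling of selected cepstral orders, the factor values, and the function names `basis_noise_mean`, `variational_means`, and `cepstrum_to_log_spectrum` are all assumptions made for illustration. The sketch averages cepstral vectors from frames assumed to be silent, builds one perturbed mean per combination of factors, and maps each mean to the log-spectral domain with an inverse DCT-II.

```python
import math
from itertools import product


def basis_noise_mean(silent_frames):
    """Average the cepstral vectors of frames assumed to contain only noise."""
    count = len(silent_frames)
    dim = len(silent_frames[0])
    return [sum(frame[k] for frame in silent_frames) / count for k in range(dim)]


def variational_means(basis, orders, factors):
    """Return one perturbed copy of `basis` per combination of factors.

    Each selected cepstral order is scaled by one factor; all other orders
    keep the basis value. Scaling is an illustrative assumption, not the
    rule used in the chapter.
    """
    variants = []
    for combo in product(factors, repeat=len(orders)):
        mean = list(basis)
        for order, factor in zip(orders, combo):
            mean[order] = basis[order] * factor
        variants.append(mean)
    return variants


def cepstrum_to_log_spectrum(ceps, num_bands):
    """Map a truncated cepstral vector to log-spectral bands (inverse DCT-II)."""
    log_spec = []
    for j in range(num_bands):
        value = ceps[0] / num_bands
        for k in range(1, len(ceps)):
            angle = math.pi * k * (j + 0.5) / num_bands
            value += 2.0 / num_bands * ceps[k] * math.cos(angle)
        log_spec.append(value)
    return log_spec


# Hypothetical toy data: two "silent" frames with four cepstral coefficients.
basis = basis_noise_mean([[9.0, 2.5, -1.0, 0.5], [11.0, 1.5, -1.0, 0.5]])
variants = variational_means(basis, orders=(1, 2), factors=(0.5, 1.0, 1.5))
envelopes = [cepstrum_to_log_spectrum(v, num_bands=8) for v in variants]
```

In this toy setting, perturbing two orders with three factors each gives nine candidate noise means, one of which is the unperturbed basis. Because only orders 1 and 2 are changed, the variants share the same average log-spectral level and differ in the shape of the envelope.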


Keywords: Feature compensation; In-vehicle environment; Multiple model; Robust speech recognition; Variational model composition (VMC)



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, USA
