Circuits, Systems, and Signal Processing, Volume 35, Issue 5, pp 1593–1609

Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates

  • Vikas Joshi
  • N. Vishnu Prasad
  • S. Umesh


Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance. In this work, we argue that some useful information is lost during normalization, since every utterance is forced to have the same first- and second-order statistics, i.e., zero mean and unit variance. We propose to modify the CMVN methodology to retain this useful information while still compensating for noise. The proposed normalization approach transforms every test utterance to its utterance-specific clean mean (i.e., the mean the utterance would have if noise were absent) and clean variance, instead of zero mean and unit variance. We derive expressions to estimate the clean mean and variance from a noisy utterance. The proposed normalization is effective in recognizing voice commands, which are typically short (single words or short phrases) and for which more advanced methods such as histogram equalization (HEQ) are not effective. Recognition results show a relative improvement (RI) of 21% in word error rate over conventional CMVN on the Aurora-2 database, and RIs of 20% and 11% over CMVN and HEQ, respectively, on short utterances of the Aurora-2 database.
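For reference, conventional per-utterance CMVN — the baseline that the proposed method modifies — can be sketched as follows. This is a minimal illustration of the standard technique only, not the authors' utterance-specific normalization; the function name and the small epsilon guard against silent (zero-variance) dimensions are our own choices.

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Conventional cepstral mean and variance normalization.

    features: array of shape (num_frames, num_ceps) holding the
    cepstral coefficients of one utterance. Each cepstral dimension
    is shifted and scaled so that, over the utterance, it has zero
    mean and unit variance.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against zero variance (e.g., a constant dimension).
    return (features - mean) / np.maximum(std, 1e-8)
```

After this transform, every utterance has identical first- and second-order statistics, which is exactly the property the paper argues discards useful, utterance-specific information.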





Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Electrical Engineering, IIT Madras, Chennai, India
  2. IBM India Research Labs, Bangalore, India
  3. Soliton Technologies, Coimbatore, India
