International Journal of Speech Technology, Volume 21, Issue 4, pp 887–893

An investigation of the impact of MVA normalization on the advanced front-end features

  • Azzedine Touazi
  • Mohamed Debyeche

Abstract

Feature normalization is a key objective in speech-related applications. In this paper, we study the effects of the MVA normalization method, which combines Mean subtraction, Variance normalization, and Autoregressive Moving Average (ARMA) filtering, on the ETSI Advanced Front-End (AFE) features. A series of experiments on the Aurora-2 task was conducted to show the impact of MVA normalization on different subsets of AFE feature components. Compared to the AFE baseline system, recognition results show a performance improvement when only the logarithmic energy coefficient is normalized. However, performance degrades when the remaining AFE coefficients are normalized. To investigate this degradation, further experiments were performed with the blind equalization post-processing block implemented in the AFE removed. The results show that part of this degradation can plausibly be interpreted as the effect of over-normalization caused by applying MVA post-processing to the original AFE features. Furthermore, by analyzing the statistical distributions of the AFE features, we found that the effectiveness of MVA could also be affected by their high intra-frame variability.
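
The three MVA steps named in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not give the filter equation, so the ARMA form used here (an average of the `order` previous outputs and the current plus `order` following inputs), the per-utterance statistics, and the choice to leave boundary frames unfiltered are all assumptions, as are the names `mva_normalize`, `order`, and `eps`.

```python
import numpy as np


def mva_normalize(features, order=2, eps=1e-8):
    """Apply mean subtraction, variance normalization and ARMA smoothing.

    features: array of shape (T, D), one row per frame (hypothetical layout).
    order:    ARMA filter order M (assumed value, not taken from the paper).
    """
    x = np.asarray(features, dtype=float)

    # Mean subtraction and variance normalization, per feature dimension,
    # using statistics computed over the whole utterance (an assumption).
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

    # ARMA filtering along time (assumed form):
    #   y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1)
    # The first and last M frames are left unfiltered in this sketch.
    n_frames = x.shape[0]
    y = x.copy()
    for t in range(order, n_frames - order):
        past_outputs = y[t - order:t].sum(axis=0)
        future_inputs = x[t:t + order + 1].sum(axis=0)
        y[t] = (past_outputs + future_inputs) / (2 * order + 1)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.normal(loc=5.0, scale=3.0, size=(100, 14))  # synthetic data
    normalized = mva_normalize(utterance, order=2)
    print(normalized.shape)
```

To normalize only a subset of components, as in the experiments described above, the function can be applied to selected columns only, for example `feats[:, [k]] = mva_normalize(feats[:, [k]])`, where `k` is a hypothetical index of the logarithmic energy coefficient in the feature vector.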

Keywords

Distributed speech recognition, ETSI-AFE standard, Aurora-2 task, MVA normalization

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Development of Advanced Technologies (CDTA), Algiers, Algeria
  2. University of Science and Technology Houari Boumediene (USTHB), Algiers, Algeria
