Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Abstract

In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation (AM) features for a large vocabulary continuous speech recognition (LVCSR) task. The SNEO estimates the energy required to produce the AM-FM signal, and an energy separation algorithm (ESA) then separates the estimated energy into its amplitude and frequency components. As in the power normalized cepstral coefficients (PNCC) front-end, medium duration power bias subtraction (MDPBS) is used to enhance the AM power spectrum. The proposed feature extractor is evaluated in speech recognition experiments on the AURORA-4 corpus, which represents additive noise and channel mismatch conditions. The ETSI advanced front-end (ETSI-AFE), PNCC, cochlear filterbank cepstral coefficients (CFCC), and conventional MFCC and PLP features are used for comparison. Recognition results on the AURORA-4 task show that the proposed method is robust to both additive noise and microphone channel mismatch.
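The energy estimation and energy separation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the operator's exact form, the smoothing window, or the ESA variant, so everything below is an assumption made for illustration: the function names are hypothetical, the energy operator is the standard discrete Teager-Kaiser form, the smoother is a short unit-sum Bartlett window, and the separation step is the DESA-2 variant. The filterbank and the MDPBS stage are not shown.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]  # end samples are left at zero
    return psi

def smoothed_energy(x, win_len=5):
    """Teager energy smoothed by a unit-sum Bartlett window.

    The window type and length are assumptions, not values from the paper.
    """
    w = np.bartlett(win_len + 2)[1:-1]  # drop the two zero end points
    w = w / w.sum()
    return np.convolve(teager_energy(x), w, mode="same")

def desa2(x, win_len=5, eps=1e-12):
    """DESA-2 style separation into amplitude envelope and frequency.

    Returns (amp, freq), with freq in radians per sample. Values near the
    signal boundaries are unreliable because of edge effects.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    y[1:-1] = x[2:] - x[:-2]  # symmetric difference x[n+1] - x[n-1]
    # Floor both energies so the ratio and the square root stay defined.
    psi_x = np.maximum(smoothed_energy(x, win_len), eps)
    psi_y = np.maximum(smoothed_energy(y, win_len), eps)
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    freq = 0.5 * np.arccos(cos_arg)
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return amp, freq

# Usage: a pure tone with amplitude 0.7 and frequency 0.3 rad/sample.
n = np.arange(400)
tone = 0.7 * np.cos(0.3 * n + 0.1)
amp, freq = desa2(tone)
```

For a stationary tone the unsmoothed energy is already constant, so smoothing changes nothing away from the edges; its purpose in this sketch is to reduce the variance of the energy estimate on noisy or rapidly varying signals.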

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, M.J., Kenny, P., O’Shaughnessy, D. (2013). Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_22

  • DOI: https://doi.org/10.1007/978-3-642-38847-7_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38846-0

  • Online ISBN: 978-3-642-38847-7

  • eBook Packages: Computer Science, Computer Science (R0)
