Temporal Feature Selection for Noisy Speech Recognition

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9091))

Abstract

Automatic speech recognition systems rely on feature extraction techniques to improve their performance. Static features obtained from each frame are usually augmented with dynamic components computed by derivative operations (delta features). However, the derivative is sensitive to noise, which degrades recognition accuracy in noisy environments. We propose an alternative to delta features that selects coefficients from adjacent frames based on frequency. We observed that consecutive samples are highly correlated at low frequencies, so more representative dynamics can be incorporated by looking farther away in time. We evaluated this frequency-based selection strategy on the Aurora 2 continuous-digits and connected-digits tasks using the standard MFCC, PLPCC and LPCC features. Our experiments show an average relative improvement of \(32.10\%\) in accuracy, with most of the gains in very noisy environments where traditional delta features yield low recognition rates.
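
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. `delta_features` uses the widely used regression formula for delta coefficients. `offset_features` illustrates the general idea of appending, for each coefficient, its raw values from frames a chosen distance away. The function names, the edge-padding choice and the per-coefficient `offsets` values are assumptions made for illustration only; the abstract does not specify how the offsets are selected.

```python
import numpy as np


def delta_features(static, window=2):
    """Regression-based delta features for a (T, D) array of static features.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Edge frames are handled by repeating the first and last frame (assumption).
    """
    T = static.shape[0]
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(static, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + T]
        behind = padded[window - n : window - n + T]
        delta += n * (ahead - behind)
    return delta / denom


def offset_features(static, offsets):
    """Append, for each coefficient d, its values at frames t - offsets[d]
    and t + offsets[d] (hypothetical per-coefficient offsets).

    Frame indices outside the utterance are clipped to the first or last frame.
    Returns an array of shape (T, 3 * D): [static, past, future].
    """
    T, D = static.shape
    offsets = np.asarray(offsets)
    t = np.arange(T)[:, None]      # frame index, shape (T, 1)
    cols = np.arange(D)[None, :]   # coefficient index, shape (1, D)
    past = static[np.clip(t - offsets[None, :], 0, T - 1), cols]
    future = static[np.clip(t + offsets[None, :], 0, T - 1), cols]
    return np.concatenate([static, past, future], axis=1)


# Toy input: 6 frames, 3 coefficients, with value 10 * t + d.
frames = 10 * np.arange(6)[:, None] + np.arange(3)[None, :]
deltas = delta_features(frames, window=2)
# Illustrative offsets only: a larger offset for the first coefficient.
stacked = offset_features(frames, offsets=[3, 2, 1])
```

Unlike the delta computation, the offset-based variant involves no differencing, so it does not amplify frame-to-frame noise in the same way; whether it helps recognition depends on how the offsets are chosen, which is what the paper studies.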

Corresponding author

Correspondence to Ludovic Trottier.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Trottier, L., Chaib-draa, B., Giguère, P. (2015). Temporal Feature Selection for Noisy Speech Recognition. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_14

  • DOI: https://doi.org/10.1007/978-3-319-18356-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18355-8

  • Online ISBN: 978-3-319-18356-5

  • eBook Packages: Computer Science, Computer Science (R0)
