Temporal Feature Selection for Noisy Speech Recognition

Trottier, Ludovic; Chaib-draa, Brahim; Giguère, Philippe

doi:10.1007/978-3-319-18356-5_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9091)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

Automatic speech recognition systems rely on feature extraction techniques to improve their performance. Static features obtained from each frame are usually augmented with dynamic components computed by derivative operations (delta features). However, the derivative is sensitive to noise, which degrades recognition accuracy in noisy environments. We propose an alternative to delta features that selects coefficients from adjacent frames based on frequency. We observed that consecutive samples are highly correlated at low frequency, so more representative dynamics can be incorporated by looking farther away in time. We evaluated this frequency-based selection strategy on the Aurora 2 continuous-digits and connected-digits tasks using standard MFCC, PLPCC and LPCC features. Our experiments show an average relative improvement of 32.10% in accuracy, with most gains in very noisy environments where traditional delta features have low recognition rates.
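
The contrast drawn in the abstract can be illustrated with a minimal sketch. The first function computes delta features with the widely used regression formula over a symmetric window. The second is a hypothetical stand-in for the idea of selecting coefficients from adjacent frames: for each coefficient it copies the values found a per-coefficient number of frames in the past and in the future. The function names, the `offsets` argument and the offset values are assumptions made for illustration; the abstract does not specify how the offsets are chosen, so this should not be read as the authors' method.

```python
import numpy as np


def delta_features(static, window=2):
    """Regression-based delta features over a symmetric window.

    static: array of shape (num_frames, num_coeffs).
    Edge frames are handled by repeating the first and last frame.
    """
    num_frames = static.shape[0]
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    delta = np.zeros(static.shape, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k: window + k + num_frames]
        behind = padded[window - k: window - k + num_frames]
        delta += k * (ahead - behind)
    return delta / denom


def offset_context_features(static, offsets):
    """Hypothetical alternative to deltas (illustration only).

    For coefficient j, append the raw values at frames t - offsets[j]
    and t + offsets[j]. Indices are clipped at the utterance boundaries.
    Returns an array of shape (num_frames, 3 * num_coeffs) laid out as
    [past | static | future].
    """
    num_frames, num_coeffs = static.shape
    offsets = np.asarray(offsets, dtype=int)
    t = np.arange(num_frames)[:, None]       # shape (T, 1)
    cols = np.arange(num_coeffs)[None, :]    # shape (1, C)
    past = static[np.clip(t - offsets, 0, num_frames - 1), cols]
    future = static[np.clip(t + offsets, 0, num_frames - 1), cols]
    return np.hstack([past, static, future])


# Toy input: 10 frames, 2 coefficients. The offsets are placeholders.
frames = np.arange(20, dtype=float).reshape(10, 2)
deltas = delta_features(frames, window=2)
context = offset_context_features(frames, offsets=[1, 3])
```

The delta computation subtracts neighbouring frames, so frame-level noise enters the difference directly, whereas the second function only copies existing coefficients. Whether that helps recognition in practice depends on how the offsets are chosen, which this sketch leaves open.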

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Université Laval, Quebec (QC), G1V 0A6, Canada
Ludovic Trottier, Brahim Chaib-draa & Philippe Giguère

Corresponding author

Correspondence to Ludovic Trottier.

Editor information

Editors and Affiliations

University of Alberta, Edmonton, Canada
Denilson Barbosa
Dalhousie University, Halifax, Canada
Evangelos Milios

About this paper

Cite this paper

Trottier, L., Chaib-draa, B., Giguère, P. (2015). Temporal Feature Selection for Noisy Speech Recognition. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_14

DOI: https://doi.org/10.1007/978-3-319-18356-5_14
Published: 29 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science, Computer Science (R0)
