Prosodic Features and Formant Contribution for Speech Recognition System over Mobile Network

Bouchakour, Lallouani; Debyeche, Mohamed

doi:10.1007/978-3-319-01854-6_14

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 239)

Abstract

This paper investigates the contribution of formants and of prosodic features such as pitch and energy to the performance of automatic speech recognition in mobile networks, in particular with the GSM EFR (Global System for Mobile Enhanced Full Rate) codec. The front end of the recognition system combines three kinds of features: features obtained by converting the quantized spectral information of the speech coder, prosodic information, and formant frequencies. The quantized spectral information is represented by the LPC (Linear Predictive Coding) coefficients, the LSF (Line Spectral Frequencies) coefficients, and two approximations of the LPC Cepstral Coefficients (LPCCs) computed from the LSFs: the Pseudo Cepstral Coefficients (PCC) and the Pseudo-Cepstrum (PCEP) coefficients. The resulting speaker-independent recognizer is based on a Continuous Hidden Markov Model (CHMM) classifier. The results show that the combined multivariate feature vectors significantly improve recognition performance in the mobile environment compared with a system that uses the speech coder bit-stream features alone.
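
The spectral conversions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the formulas are the textbook LPC-to-cepstrum recursion and the commonly used LSF-based approximation, assuming A(z) = 1 + sum_k a_k z^-k and LSFs given in radians. The PCC adds a term that does not depend on the speech signal; the PCEP omits it.

```python
import numpy as np


def lpc_to_lpcc(a, n_ceps):
    """Cepstrum c_1..c_n of 1/A(z), with A(z) = 1 + sum_k a[k-1] z^-k."""
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] is unused (the gain term is ignored here)
    for n in range(1, n_ceps + 1):
        # Recursive term: sum of k * c_k * a_{n-k} over k with 1 <= n-k <= p.
        acc = sum(k * c[k] * a[n - k - 1] for k in range(max(1, n - p), n))
        a_n = a[n - 1] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


def lsf_to_pseudo_cepstrum(lsf, n_ceps, signal_independent_term=True):
    """Approximate LPCCs directly from LSFs (radians).

    With signal_independent_term=True this returns the PCC form;
    with False it returns the PCEP form (only the cosine sum).
    """
    lsf = np.asarray(lsf, dtype=float)
    n = np.arange(1, n_ceps + 1)
    # (1/n) * sum_i cos(n * w_i), one value per cepstral index n.
    pcep = np.cos(np.outer(n, lsf)).sum(axis=1) / n
    if signal_independent_term:
        # Term (1 + (-1)^n) / (2n): nonzero only for even n.
        return pcep + (1.0 + (-1.0) ** n) / (2.0 * n)
    return pcep
```

In a front end of the kind the abstract describes, a per-frame vector produced this way would then be concatenated with the pitch, energy, and formant values before being passed to the classifier; how the paper weights or normalizes those streams is not stated in the abstract.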

Author information

Authors and Affiliations

Faculty of Electronics and Computer Sciences (FEI), USTHB, Speech Communication and Signal Processing Laboratory (LCPTS), P.O. Box 32, Bab Ezzouar, Algiers, Algeria
Lallouani Bouchakour & Mohamed Debyeche

Corresponding author

Correspondence to Lallouani Bouchakour.

Editor information

Editors and Affiliations

Department of Civil Engineering, University of Burgos, Burgos, Spain
Álvaro Herrero
Department of Civil Engineering, University of Burgos, Burgos, Spain
Bruno Baruque
German Workforce ADL Partnership Laboratory, Waltershausen, Germany
Fanny Klett
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, Washington, USA
Ajith Abraham
Department of Computer Science Faculty of Ele. Eng. & Computer Science, VŠB-TU Ostrava, Ostrava, Czech Republic
Václav Snášel
Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, Brazil
André C.P.L.F. de Carvalho
DeustoTech Computing, University of Deusto, Bilbao, Spain
Pablo García Bringas
Department of Computer Science Faculty of Elec. Eng. and Comp. Science, VŠB-TU Ostrava, Ostrava, Czech Republic
Ivan Zelinka
University of Salamanca, Salamanca, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Bouchakour, L., Debyeche, M. (2014). Prosodic Features and Formant Contribution for Speech Recognition System over Mobile Network. In: Herrero, Á., et al. International Joint Conference SOCO’13-CISIS’13-ICEUTE’13. Advances in Intelligent Systems and Computing, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-01854-6_14

DOI: https://doi.org/10.1007/978-3-319-01854-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01853-9
Online ISBN: 978-3-319-01854-6
eBook Packages: Engineering, Engineering (R0)
