Prosodic Features and Formant Contribution for Speech Recognition System over Mobile Network

  • Conference paper
International Joint Conference SOCO’13-CISIS’13-ICEUTE’13

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 239))

Abstract

This paper investigates the contribution of formants and prosodic features, such as pitch and energy, to the performance of automatic speech recognition in mobile networks, specifically with the GSM EFR (Global System for Mobile Communications Enhanced Full Rate) codec. The front-end of the speech recognition system combines features extracted by converting the quantized spectral information of the speech coder with prosodic information and formant frequencies. The quantized spectral information is represented by the LPC (Linear Predictive Coding) coefficients, the LSF (Line Spectral Frequencies) coefficients, and two approximations of the LPC Cepstral Coefficients (LPCCs) computed from the LSFs: the Pseudo Cepstral Coefficients (PCC) and the Pseudo-Cepstrum (PCEP) coefficients. The resulting speaker-independent speech recognition system is based on a Continuous Hidden Markov Model (CHMM) classifier. The results show that the combined multivariate feature vectors significantly improve recognition performance in the mobile environment compared with a system that uses the speech coder bit-stream alone.
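The abstract gives no formulas, so the following is only a minimal sketch of how such a front-end might assemble one frame's feature vector. It assumes the commonly used approximation in which the n-th cepstral coefficient is estimated from the LSFs (in radians) as the sum of cos(n * w_i) over all LSFs divided by n, plus a correction term (1 + (-1)^n) / (2n) for the PCC; the PCEP variant is taken here to drop that correction term. The function names, the 12-coefficient default, and the ordering of the concatenated features are illustrative assumptions, not the authors' implementation.

```python
import math


def pseudo_cepstrum(lsf, n_coeffs, include_correction=True):
    """Approximate LPC cepstral coefficients directly from LSFs (radians).

    include_correction=True  -> PCC-style coefficients (assumed form)
    include_correction=False -> PCEP-style coefficients (assumed form)
    """
    coeffs = []
    for n in range(1, n_coeffs + 1):
        # Sum of cosines of the n-th harmonics of the LSFs, scaled by 1/n.
        value = sum(math.cos(n * w) for w in lsf) / n
        if include_correction:
            # Correction term: 1/n for even n, 0 for odd n.
            value += (1 + (-1) ** n) / (2 * n)
        coeffs.append(value)
    return coeffs


def build_feature_vector(lsf, pitch_hz, log_energy, formants_hz, n_coeffs=12):
    """Concatenate spectral, prosodic, and formant features for one frame.

    Hypothetical layout: [PCEP coefficients, pitch, log energy, formants].
    """
    spectral = pseudo_cepstrum(lsf, n_coeffs, include_correction=False)
    return spectral + [pitch_hz, log_energy] + list(formants_hz)


if __name__ == "__main__":
    # Ten made-up LSFs in (0, pi), standing in for decoded codec parameters.
    example_lsf = [0.3 * i for i in range(1, 11)]
    vector = build_feature_vector(example_lsf, 120.0, -2.5, [500.0, 1500.0, 2500.0])
    print(len(vector))
```

In a real system the LSFs would come from the decoded codec bit-stream, and the per-frame vectors would be passed to the HMM classifier, typically together with delta features; those steps are outside this sketch.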

Author information

Corresponding author

Correspondence to Lallouani Bouchakour.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bouchakour, L., Debyeche, M. (2014). Prosodic Features and Formant Contribution for Speech Recognition System over Mobile Network. In: Herrero, Á., et al. International Joint Conference SOCO’13-CISIS’13-ICEUTE’13. Advances in Intelligent Systems and Computing, vol 239. Springer, Cham. https://doi.org/10.1007/978-3-319-01854-6_14

  • DOI: https://doi.org/10.1007/978-3-319-01854-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01853-9

  • Online ISBN: 978-3-319-01854-6

  • eBook Packages: Engineering, Engineering (R0)