Prosodic Features and Formant Contribution for Speech Recognition System over Mobile Network

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 239)

Abstract

This paper investigates the contribution of formants and prosodic features, such as pitch and energy, to the performance of an automatic speech recognition (ASR) system in mobile networks, in particular with the GSMEFR (Global System for Mobile Communications Enhanced Full Rate) codec. The front-end of the speech recognition system combines features extracted by converting the quantized spectral information of the speech coder with prosodic information and formant frequencies. The quantized spectral information is represented by the LPC (Linear Predictive Coding) coefficients, the LSF (Line Spectral Frequency) coefficients, and two approximations of the LPC Cepstral Coefficients (LPCCs) computed from the LSFs: the Pseudo-Cepstral Coefficients (PCC) and the Pseudo-Cepstrum (PCEP) coefficients. The resulting speaker-independent speech recognition system is based on a Continuous Hidden Markov Model (CHMM) classifier. The results show that the multivariate feature vectors significantly improve speech recognition performance in a mobile environment compared with a system that uses the speech coder bit-stream alone.
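As an illustration of the front-end described above, the following is a minimal sketch and not the authors' implementation. It assumes the commonly cited approximations PCEP c_n = (1/n) Σ cos(n ω_i) and PCC c_n = PCEP c_n + (1 + (-1)^n) / (2n), where ω_i are the LSFs in radians; the abstract itself does not state these formulas. The function names, the choice of 12 coefficients, and all sample values are hypothetical.

```python
import math
from typing import List, Sequence


def pseudo_cepstrum(lsf: Sequence[float], n_coeffs: int) -> List[float]:
    """PCEP sketch: c_n = (1/n) * sum_i cos(n * w_i), LSFs w_i in radians.

    Assumed form of the approximation; not taken from the abstract.
    """
    return [
        sum(math.cos(n * w) for w in lsf) / n
        for n in range(1, n_coeffs + 1)
    ]


def pseudo_cepstral_coefficients(lsf: Sequence[float], n_coeffs: int) -> List[float]:
    """PCC sketch: PCEP plus the term (1 + (-1)^n) / (2n) (assumed form)."""
    pcep = pseudo_cepstrum(lsf, n_coeffs)
    return [
        c + (1 + (-1) ** n) / (2 * n)
        for n, c in enumerate(pcep, start=1)
    ]


def build_feature_vector(
    spectral: Sequence[float],
    pitch_hz: float,
    log_energy: float,
    formants_hz: Sequence[float],
) -> List[float]:
    """Concatenate spectral, prosodic, and formant features for one frame."""
    return list(spectral) + [pitch_hz, log_energy] + list(formants_hz)


if __name__ == "__main__":
    # Hypothetical frame: 10 evenly spaced LSFs (10th-order LP analysis),
    # made-up pitch, log-energy, and formant values.
    lsf = [math.pi * (i + 1) / 11 for i in range(10)]
    spectral = pseudo_cepstrum(lsf, n_coeffs=12)
    frame = build_feature_vector(spectral, 120.0, -2.3, [500.0, 1500.0, 2500.0])
    print(len(frame), frame[:3])
```

Simple concatenation is only one way to combine the streams; the abstract does not say how the spectral, prosodic, and formant features are weighted or normalized before they reach the CHMM classifier.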

Keywords

ASR, GSMEFR, CHMM, ARADIGIT, bit-stream, Formant, Pitch

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Faculty of Electronics and Computer Sciences (FEI), USTHB, Speech Communication and Signal Processing Laboratory (LCPTS), Bab Ezzouar, Algeria
