Creating a Mexican Spanish Version of the CMU Sphinx-III Speech Recognition System

  • Armando Varela
  • Heriberto Cuayáhuitl
  • Juan Arturo Nolazco-Flores
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2905)

Abstract

In this paper we present the creation of a Mexican Spanish version of the CMU Sphinx-III speech recognition system. We trained acoustic and N-gram language models with a phonetic set of 23 phonemes. Our speech data for training and testing were collected from an auto-attendant system in telephone environments. We present experiments with different language models. Our best result was an overall error rate of 6.32%. With this version, it is now possible to develop speech applications for Spanish-speaking communities. This version of the CMU Sphinx system is freely available for non-commercial use upon request.

Keywords

Automatic Speech Recognition; Automatic Gain Control; Speech Data; Speech Recognition System; Word Error Rate

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Armando Varela (1)
  • Heriberto Cuayáhuitl (1)
  • Juan Arturo Nolazco-Flores (2)
  1. Department of Engineering and Technology, Intelligent Systems Research Group, Universidad Autónoma de Tlaxcala, Apizaco, Mexico
  2. Instituto Tecnológico de Estudios Superiores de Monterrey, Sucursal de Correos “J”, Monterrey, Mexico
