Language Resources and Evaluation, Volume 44, Issue 4, pp 347–370

The Corpus DIMEx100: transcription and evaluation

  • Luis A. Pineda
  • Hayde Castellanos
  • Javier Cuétara
  • Lucian Galescu
  • Janet Juárez
  • Joaquim Llisterri
  • Patricia Pérez
  • Luis Villaseñor

Abstract

This paper presents the transcription and evaluation of the DIMEx100 corpus for Mexican Spanish. We first describe the corpus and explain the linguistic and computational motivation for its design and collection process. We then present the phonetic antecedents and the alphabet adopted for the transcription task. The corpus has been transcribed at three levels of granularity, each of which is specified in detail, and corpus statistics are reported for each level. A set of phonetic rules describing phonetic contexts observed empirically in spontaneous conversation is also validated against the transcription. The corpus has been used to build acoustic models and a phonetic dictionary for a speech recognition system. Initial performance results suggest that the data can be used to train good-quality acoustic models.

Keywords

Phonetic corpus, Phonetic transcription, Transcription granularity, Mexican Spanish, Acoustic models

Notes

Acknowledgments

The DIMEx100 corpus was developed within the context of the DIME Project at IIMAS, UNAM, in collaboration with the Facultad de Filosofía y Letras, UNAM, and INAOE in Tonantzintla, Puebla. The authors wish to thank all members of the project who were involved in the collection and transcription of the corpus for their enthusiastic participation: Fernanda López, Varinia Estrada, Sergio Coria, Iván Moreno, Ivonne López, Arturo Wong, Laura Pérez, René López, Alejandro Acosta, Alejandro Carrasco, Rafael Torres, Gerardo Mendoza, Ana Ceballos, Alejandra Espinosa and Isabel López. Special thanks go to Alejandro Reyes for technical support at INAOE, and to the 100 speakers who provided their voices for the corpus. We also thank James Allen for his continuous collaboration and encouragement throughout the development of this project. The authors also acknowledge the support of CONACyT’s grant 39380-U and PAPIIT-UNAM grant IN121206.

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Luis A. Pineda (1)
  • Hayde Castellanos (1)
  • Javier Cuétara (2)
  • Lucian Galescu (3)
  • Janet Juárez (1)
  • Joaquim Llisterri (4)
  • Patricia Pérez (1)
  • Luis Villaseñor (5)

  1. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Mexico City, México
  2. Facultad de Filosofía y Letras, UNAM, Mexico City, México
  3. Florida Institute for Human and Machine Cognition, Pensacola, USA
  4. Departament de Filologia Espanyola, Universitat Autònoma de Barcelona, Barcelona, Spain
  5. Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, México
