Experiments on the Construction of a Phonetically Balanced Corpus from the Web

  • Luis Villaseñor-Pineda
  • Manuel Montes-y-Gómez
  • Dominique Vaufreydaz
  • Jean-François Serignat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2945)

Abstract

The construction of a speech recognition system requires a recorded set of phrases from which to compute the acoustic models. This set of phrases must be phonetically rich and balanced for the resulting recognizer to be robust. Traditionally, the set is defined manually, which demands considerable human effort. In this paper we propose an automated method for assembling a phonetically balanced corpus (set of phrases) from the Web. The proposed method was used to construct a phonetically balanced corpus for Mexican Spanish.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Luis Villaseñor-Pineda (1)
  • Manuel Montes-y-Gómez (1)
  • Dominique Vaufreydaz (2)
  • Jean-François Serignat (2)
  1. Laboratorio de Tecnologías del Lenguaje, INAOE, México
  2. Laboratoire CLIPS/IMAG, France