Experiments on the Construction of a Phonetically Balanced Corpus from the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

The construction of a speech recognition system requires a recorded set of phrases from which the acoustic models are computed. This set of phrases must be phonetically rich and balanced in order to obtain a robust recognizer. Traditionally, this set is defined manually, which demands considerable human effort. In this paper we propose an automated method for assembling a phonetically balanced corpus (set of phrases) from the Web. The proposed method was used to construct a phonetically balanced corpus for Mexican Spanish.
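
The abstract does not spell out how phrases are selected, so the following is only a minimal sketch of one way "phonetically balanced" could be made operational, not the authors' method. It assumes a greedy strategy: repeatedly add the candidate phrase that brings the pooled unit distribution closest (in L1 distance) to a target distribution. The function names, the use of letters as a crude stand-in for phonemes, and the target distribution are all hypothetical choices made for illustration; a real system would use a grapheme-to-phoneme converter and measured phoneme frequencies.

```python
from collections import Counter


def unit_counts(phrase):
    """Count alphabetic characters (assumption: letters approximate phonemes)."""
    return Counter(ch for ch in phrase.lower() if ch.isalpha())


def distance(counts, target):
    """L1 distance between the normalized counts and the target distribution."""
    total = sum(counts.values())
    if total == 0:
        return float("inf")  # an empty selection matches nothing
    keys = set(counts) | set(target)
    return sum(abs(counts.get(k, 0) / total - target.get(k, 0.0)) for k in keys)


def select_balanced(candidates, target, size):
    """Greedily pick up to `size` phrases whose pooled distribution tracks `target`."""
    selected = []
    pooled = Counter()
    remaining = list(candidates)
    for _ in range(min(size, len(remaining))):
        # Choose the phrase that minimizes the distance after being added.
        best = min(remaining, key=lambda p: distance(pooled + unit_counts(p), target))
        selected.append(best)
        pooled += unit_counts(best)
        remaining.remove(best)
    return selected
```

Greedy selection is cheap and easy to reason about, but it does not guarantee the best possible subset; it is shown here only to illustrate the idea of steering a phrase set toward a target distribution.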

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Villaseñor-Pineda, L., Montes-y-Gómez, M., Vaufreydaz, D., Serignat, J.F. (2004). Experiments on the Construction of a Phonetically Balanced Corpus from the Web. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_50

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
