Language Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish

  • G. Bordel
  • A. Ezeiza
  • K. Lopez de Ipina
  • J. M. López
  • M. Peñagarikano
  • E. Zulueta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3773)

Abstract

Automatic indexing of broadcast news is a developing research area of great recent interest [1]. This paper describes the development steps for designing an automatic index system of broadcast news for both Basque and Spanish. The application requires appropriate language resources to design all the components of the system. Large, well-defined resources are now available for the most widely used languages, but much work remains to be done for minority languages. Although Spanish has many more resources than Basque, this work pursues parallel efforts for both languages. These two languages were chosen because they are co-official in the Basque Autonomous Community and are used in many of the Community's mass media, including the Basque Public Radio and Television, EITB [2].

Keywords

Basque Country; Minority Language; Textual Sample; Language Resource; Vocabulary Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Vandecatseye, A., Martens, J.P., Neto, J., Meinedo, H., Garcia-Mateo, C., Dieguez, F.J., Mihelic, F., Zibert, J., Nouza, J., David, P., Pleva, M., Cizmar, A., Papageorgiou, H., Alexandris, C.: The COST278 pan-European Broadcast News Database. In: Proceedings of LREC 2004, Lisbon, Portugal (2004)
  2. EITB Basque Public Radio and Television, http://www.eitb.com/
  3.
  4. Alegria, I., Artola, X., Sarasola, K., Urkia, M.: Automatic morphological analysis of Basque. In: Literary & Linguistic Computing, vol. 11(4), pp. 193–203. Oxford Univ. Press, Oxford (1996)
  5. Peñagarikano, M., Bordel, G., Varona, A., de Ipina, L.: Using non-word Lexical Units in Automatic Speech Understanding. In: Proceedings of IEEE, ICASSP 1999, Phoenix, Arizona (1999)
  6. Lopez de Ipiña, K., Graña, M., Ezeiza, N., Hernández, M., Zulueta, E., Ezeiza, A., Tovar, C.: Selection of Lexical Units for Continuous Speech Recognition of Basque. Progress in Pattern Recognition, Speech and Image Analysis, 244–250 (2003)
  7. Lopez de Ipina, K., Ezeiza, N., Bordel, N., Graña, M.: Automatic Morphological Segmentation for Speech Processing in Basque. In: IEEE TTS Workshop, Santa Monica, USA (2002)
  8. Egunkaria, Euskaldunon Egunkaria, the only newspaper in Basque, which has been recently replaced by Berria, online, at http://www.berria.info/
  9. GARA, local Basque Country newspaper in Spanish, online, at http://www.gara.net/
  10. Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech. In: First International Conference on Language Resources and Evaluation, LREC 1998 (1998)
  11. Linguistic Data Consortium, Design Specifications for the Transcription of Spoken Language, available online, at http://www.ldc.upenn.edu/Projects/Corpus_Cookbook

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • G. Bordel (1)
  • A. Ezeiza (2)
  • K. Lopez de Ipina (3)
  • J. M. López (3)
  • M. Peñagarikano (1)
  • E. Zulueta (3)

  1. University of the Basque Country, Leioa
  2. Ixa taldea. Sistemen Ingeniaritza eta Automatika Saila, Donostia
  3. Sistemen Ingeniaritza eta Automatika Saila, Gasteiz
