Valuable Language Resources and Applications Supporting the Use of Basque

  • Iñaki Alegria
  • Maxux Aranzabe
  • Xabier Arregi
  • Xabier Artola
  • Arantza Díaz de Ilarraza
  • Aingeru Mayor
  • Kepa Sarasola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)

Abstract

We present some Language Technology applications and resources that have proven to be valuable tools for promoting the use of Basque, a low-density language. We also present the strategy we have followed for almost twenty years to develop those tools and derived applications as the top layer of an integrated environment of language resources, language tools and other applications. In our opinion, if Basque is now in a fairly good position in Language Technology, it is because those guidelines have been followed.

Keywords

Language resources; Language Technology applications; Strategy for Language Technology development

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Iñaki Alegria (1)
  • Maxux Aranzabe (1)
  • Xabier Arregi (1)
  • Xabier Artola (1)
  • Arantza Díaz de Ilarraza (1)
  • Aingeru Mayor (1)
  • Kepa Sarasola (1)

  1. Ixa Group, University of the Basque Country, Spain
