Language Resources and Evaluation, Volume 45, Issue 2, pp 121–142

Methodology and construction of the Basque WordNet

  • Elisabete Pociello
  • Eneko Agirre
  • Izaskun Aldezabal
Original Paper

Abstract

Semantic interpretation of language requires extensive and rich lexical knowledge bases (LKBs). The Basque WordNet is an LKB based on WordNet and its multilingual counterparts, EuroWordNet and the Multilingual Central Repository. This paper reviews the theoretical and practical aspects of the Basque WordNet lexical knowledge base, as well as the steps and methodology followed in its construction. Our methodology is based on the joint development of wordnets and annotated corpora. The Basque WordNet contains 32,456 synsets and 26,565 lemmas, and is complemented by a hand-tagged corpus comprising 59,968 annotations.

Keywords

  • Lexical semantics
  • Lexical knowledge bases
  • WordNet

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Elisabete Pociello (1)
  • Eneko Agirre (2)
  • Izaskun Aldezabal (2)

  1. Elhuyar R&D, Usurbil, Basque Country
  2. IXA NLP Research Group, Donostia, Basque Country
