
Combining Multiple Resources to Build Reliable Wordnets

  • Darja Fišer
  • Benoît Sagot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5246)

Abstract

This paper compares automatically generated sets of synonyms (synsets) in the French and Slovene wordnets with respect to the resources used to construct them. Polysemous words were disambiguated using a five-language word alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis, and the resulting multilingual lexicon was disambiguated with the existing wordnets for these languages. For monosemous words, by contrast, a bilingual approach was sufficient to acquire equivalents. Bilingual lexicons were extracted from several resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards.
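The two routes described in the abstract can be illustrated with a minimal sketch. It makes four assumptions that do not come from the paper. First, word alignment has already been done. Second, every existing wordnet maps a literal to a set of interlingual synset IDs shared across languages. Third, a polysemous word receives the synsets common to all of its aligned words that some wordnet covers. Fourth, a monosemous source word passes its bilingual translations directly to its single synset. The function names `disambiguate` and `monosemous_entries`, the language codes and the toy data (including the synset IDs `S1` to `S6`) are hypothetical. This is not the authors' implementation.

```python
from typing import Dict, List, Optional, Set

# Assumption: a wordnet is modelled as a mapping from a literal to the set of
# interlingual synset IDs it belongs to, with IDs shared across languages.
Wordnet = Dict[str, Set[str]]


def disambiguate(aligned: Dict[str, str], wordnets: Dict[str, Wordnet]) -> Set[str]:
    """Return the synsets shared by all aligned words that a wordnet covers.

    `aligned` maps a language code to the word aligned in that language.
    Languages without a wordnet (e.g. the target language) and words a
    wordnet does not list are skipped. An empty set means no shared sense.
    """
    shared: Optional[Set[str]] = None
    for lang, word in aligned.items():
        senses = wordnets.get(lang, {}).get(word)
        if not senses:
            continue  # no wordnet for this language, or word not covered
        shared = set(senses) if shared is None else shared & senses
    return shared or set()


def monosemous_entries(bilingual: Dict[str, List[str]], source_wn: Wordnet) -> Dict[str, Set[str]]:
    """Attach translations of monosemous source words to their single synset."""
    synsets: Dict[str, Set[str]] = {}
    for word, translations in bilingual.items():
        senses = source_wn.get(word, set())
        if len(senses) != 1:
            continue  # polysemous or unknown: left to the multilingual route
        (synset_id,) = senses
        synsets.setdefault(synset_id, set()).update(translations)
    return synsets


# Hypothetical toy data: the words are real, the synset IDs are invented.
WORDNETS: Dict[str, Wordnet] = {
    "en": {"bank": {"S1", "S2"}, "shore": {"S2", "S3"}, "river": {"S5"}},
    "cs": {"břeh": {"S2", "S6"}},
}
BILINGUAL_EN_FR: Dict[str, List[str]] = {
    "river": ["rivière", "fleuve"],
    "bank": ["banque", "rive"],
}

if __name__ == "__main__":
    # French "rive" aligned with English "bank" and Czech "břeh".
    print(disambiguate({"fr": "rive", "en": "bank", "cs": "břeh"}, WORDNETS))
    # Only "river" is monosemous in the toy English wordnet.
    print(monosemous_entries(BILINGUAL_EN_FR, WORDNETS["en"]))
```

In this sketch, intersection is a deliberately conservative rule. Any aligned word can narrow the candidate senses, so an empty result leaves the target word unassigned instead of guessing. Precision is favoured over recall. Aligned words missing from a wordnet are ignored rather than treated as a veto.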

Keywords

Word Sense Disambiguation · Multiple Resource · Parallel Corpus · Translation Approach · Word Alignment
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Darja Fišer (1)
  • Benoît Sagot (2)
  1. Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia
  2. Alpage, INRIA / Paris 7, Paris, France
