Combining Multiple Resources to Build Reliable Wordnets

Fišer, Darja; Sagot, Benoît

doi:10.1007/978-3-540-87391-4_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. For monosemous words, by contrast, a bilingual approach sufficed to acquire equivalents. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards.
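
The abstract does not spell out the disambiguation procedure, so the following is only a minimal sketch of one plausible reading: for a polysemous word, intersect the synset IDs that existing wordnets assign to its aligned translations; for a monosemous word, take its single synset directly. The toy wordnets, the language codes (`lang_b`, `lang_c`), the synset IDs and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical toy wordnets: language code -> lemma -> set of synset IDs.
# Real wordnets would be loaded from files; these values are invented.
WORDNETS: Dict[str, Dict[str, Set[str]]] = {
    "en": {"bank": {"s1", "s2"}, "law": {"s3"}},
    "lang_b": {"banka": {"s1"}},
    "lang_c": {"banca": {"s1", "s4"}},
}


def disambiguate(entry: Dict[str, str]) -> Set[str]:
    """Intersect the synset IDs of all aligned lemmas that have a wordnet entry.

    `entry` maps a language code to the lemma aligned in that language.
    Languages without a wordnet (e.g. the target language) are skipped.
    """
    candidate_sets = [
        WORDNETS[lang][lemma]
        for lang, lemma in entry.items()
        if lang in WORDNETS and lemma in WORDNETS[lang]
    ]
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


def monosemous_synset(english_lemma: str) -> Optional[str]:
    """Return the only synset ID of a monosemous English lemma, else None."""
    senses = WORDNETS["en"].get(english_lemma, set())
    return next(iter(senses)) if len(senses) == 1 else None


# One aligned lexicon entry; "target" stands for the language being built.
aligned = {"en": "bank", "lang_b": "banka", "lang_c": "banca", "target": "banka"}
shared = disambiguate(aligned)
```

In this sketch, the target-language lemma would be attached to whichever synset IDs survive the intersection, while a bilingual lexicon entry for a monosemous English word needs no intersection at all.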

Author information

Authors and Affiliations

Fac. of Arts, Univ. of Ljubljana, Aškerčeva 2, 1000, Ljubljana, Slovenia
Darja Fišer
Alpage, INRIA / Paris 7, 30 rue du Ch. des rentiers, 75013, Paris, France
Benoît Sagot

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Fišer, D., Sagot, B. (2008). Combining Multiple Resources to Build Reliable Wordnets. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_10

DOI: https://doi.org/10.1007/978-3-540-87391-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)