Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

Fišer, Darja

doi:10.1007/978-3-642-04235-5_31


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

The paper reports on a series of experiments conducted to test the feasibility of automatically generating synsets for the Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons. These lexicons were then compared to the wordnets in various languages in order to disambiguate the entries and attach appropriate synset ids to the Slovene entries in the lexicon. Slovene lexicon entries sharing the same attached synset id were then organized into a synset. The results obtained with the different settings in the experiment were evaluated against a manually created gold standard and also checked by hand.
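The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that the wordnets of the other languages share interlingual synset ids, and that disambiguation is done by intersecting the ids each translation could have; the abstract does not spell out the exact procedure. The function name build_synsets, the toy lexicon, and the synset ids are all hypothetical.

```python
from collections import defaultdict

def build_synsets(lexicon, wordnets, target="sl"):
    """Group target-language lemmas by the synset ids attached to them.

    lexicon:  list of dicts, language code -> lemma (one aligned entry each)
    wordnets: dict, language code -> {lemma: set of synset ids}
    Assumption: synset ids are shared across the wordnets.
    """
    synsets = defaultdict(set)
    for entry in lexicon:
        candidates = None
        for lang, lemma in entry.items():
            if lang == target or lang not in wordnets:
                continue
            ids = wordnets[lang].get(lemma, set())
            # Keep only the ids that every language seen so far agrees on.
            candidates = set(ids) if candidates is None else candidates & ids
        # Attach each surviving id to the target-language lemma.
        for synset_id in candidates or ():
            synsets[synset_id].add(entry[target])
    return dict(synsets)

# Toy data, invented for illustration only.
lexicon = [
    {"sl": "obraz", "en": "face", "cs": "tvář"},
    {"sl": "lice", "en": "face", "cs": "tvář"},
    {"sl": "ura", "en": "clock", "cs": "hodiny"},
]
wordnets = {
    "en": {"face": {"id-face-body", "id-face-dial"}, "clock": {"id-clock"}},
    "cs": {"tvář": {"id-face-body"}, "hodiny": {"id-clock", "id-hours"}},
}
result = build_synsets(lexicon, wordnets)
```

In this toy run the English entry "face" is ambiguous between two ids, and the Czech translation narrows it to one. The two Slovene lemmas that end up with the same id form a single synset. An entry whose translation is missing from one of the wordnets receives no id under this strict intersection; a more lenient setting could skip such languages instead.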



Author information

Authors and Affiliations

Department of Translation, Faculty of Arts, University of Ljubljana, Aškerčeva 2, 1000, Ljubljana, Slovenia
Darja Fišer


Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, ul. Umultowska 87, P.O. Box, 61614, Poznań, Poland
Zygmunt Vetulani
Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Campus D 3 1, Stuhlsatzenhausweg 3, D-66123, Saarbrücken, Germany
Hans Uszkoreit


About this paper

Cite this paper

Fišer, D. (2009). Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_31


DOI: https://doi.org/10.1007/978-3-642-04235-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)
