Abstract
The paper reports on a series of experiments conducted to test the feasibility of automatically generating synsets for the Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons. These lexicons were then compared to the wordnets in the various languages in order to disambiguate the entries and attach the appropriate synset ids to the Slovene entries in the lexicon. Slovene lexicon entries sharing the same attached synset id were then organized into a synset. The results obtained with the different settings in the experiment were evaluated against a manually created gold standard and also checked by hand.
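The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that disambiguation is done by intersecting the synset ids that the translations of a lexicon entry have in their respective wordnets, which the abstract does not state explicitly. The function names, the toy lexicon, and the ids `id-1` and `id-2` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shared_synset_ids(translations, wordnets):
    """Return the synset ids common to all translations of one lexicon entry.

    translations: dict mapping language code -> lemma in that language
    wordnets:     dict mapping language code -> {lemma: set of synset ids}
    Assumption: an id is kept only if every language's wordnet agrees on it.
    """
    shared = None
    for lang, lemma in translations.items():
        ids = wordnets.get(lang, {}).get(lemma, set())
        shared = set(ids) if shared is None else shared & ids
    return shared or set()


def build_synsets(lexicon, wordnets):
    """Group Slovene lemmas that were assigned the same synset id."""
    synsets = defaultdict(set)
    for slovene_lemma, translations in lexicon:
        for synset_id in shared_synset_ids(translations, wordnets):
            synsets[synset_id].add(slovene_lemma)
    return dict(synsets)


# Toy wordnets (hypothetical ids): English "bank" is ambiguous,
# the Czech translations are not.
wordnets = {
    "en": {"bank": {"id-1", "id-2"}, "shore": {"id-2"}},
    "cs": {"banka": {"id-1"}, "břeh": {"id-2"}},
}

# Toy multilingual lexicon, as word alignment might produce it:
# (Slovene lemma, {language: aligned translation}).
lexicon = [
    ("banka", {"en": "bank", "cs": "banka"}),
    ("breg", {"en": "bank", "cs": "břeh"}),
    ("obala", {"en": "shore", "cs": "břeh"}),
]

synsets = build_synsets(lexicon, wordnets)
```

In this toy example the second language resolves the ambiguity of English "bank", so "banka" ends up alone under one id while "breg" and "obala" share the other and form a two-member synset.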
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fišer, D. (2009). Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_31
DOI: https://doi.org/10.1007/978-3-642-04235-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)