Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

The paper reports on a series of experiments conducted to test the feasibility of automatically generating synsets for the Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons. These lexicons were then compared to the wordnets in the various languages in order to disambiguate the entries and attach appropriate synset ids to the Slovene entries in the lexicon. Slovene lexicon entries sharing the same attached synset id were then organized into a synset. The results obtained with the different experimental settings were evaluated against a manually created gold standard and also checked by hand.
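The last two steps of this pipeline (attaching synset ids, then grouping Slovene entries by id) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say how entries are disambiguated, so the sketch assumes that the wordnets share synset ids across languages and that a sense is kept only when every available translation of an entry agrees on it (set intersection). The names `shared_synsets` and `build_synsets`, the ids `S1` to `S3`, and the toy lexicon are all hypothetical and hand-made.

```python
from collections import defaultdict

# Hypothetical toy wordnets: language -> word -> candidate synset ids.
# Assumption: ids are shared across languages (an interlingual index).
wordnets = {
    "en": {"party": ["S1", "S2"], "celebration": ["S2"]},
    "cs": {"strana": ["S1", "S3"], "oslava": ["S2"]},
}

# Hypothetical word-aligned lexicon entries: one Slovene word plus the
# translations it was aligned with in the other languages of the corpus.
lexicon = [
    {"sl": "stranka", "translations": {"en": "party", "cs": "strana"}},
    {"sl": "zabava", "translations": {"en": "party", "cs": "oslava"}},
    {"sl": "praznovanje", "translations": {"en": "celebration", "cs": "oslava"}},
]


def shared_synsets(entry, wordnets):
    """Return the synset ids on which all translations found in a wordnet agree."""
    candidate_sets = []
    for lang, word in entry["translations"].items():
        ids = wordnets.get(lang, {}).get(word)
        if ids:  # ignore translations missing from that language's wordnet
            candidate_sets.append(set(ids))
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


def build_synsets(lexicon, wordnets):
    """Group Slovene words that were assigned the same synset id."""
    synsets = defaultdict(set)
    for entry in lexicon:
        for synset_id in shared_synsets(entry, wordnets):
            synsets[synset_id].add(entry["sl"])
    return dict(synsets)


if __name__ == "__main__":
    for synset_id, words in sorted(build_synsets(lexicon, wordnets).items()):
        print(synset_id, sorted(words))
```

In this toy setting the ambiguous English "party" is resolved differently for the two Slovene entries because their Czech translations differ, which is the intuition behind using several languages at once; entries for which no id survives the intersection are simply left out.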

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fišer, D. (2009). Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_31

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
