Parallel Corpora for WordNet Construction: Machine Translation vs. Automatic Sense Tagging

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7182)

Abstract

In this paper we present a methodology for WordNet construction that exploits parallel corpora in which the English source text is semantically annotated. We are using this methodology to enlarge the Spanish and Catalan versions of WordNet 3.0, but it can also be applied to other languages. Because large parallel corpora with semantic annotation are rarely available, we explore two strategies to overcome this problem: on the one hand, using monolingual sense-tagged corpora and machine translation; on the other, using parallel corpora and automatic sense tagging of the source text.

With these resources, the problem of acquiring a WordNet from parallel corpora can be seen as a word alignment task. Fortunately, this task is well studied, and several alignment algorithms are freely available.
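To make the alignment view concrete, the following is a minimal sketch and not the authors' implementation. It assumes the English tokens already carry synset identifiers and that a word aligner has already produced index pairs. The synset labels (`SYN_DOG` and so on), the function names, and the `min_count` threshold are hypothetical and chosen only for illustration. For each synset, the sketch counts the target-language lemmas aligned to it and keeps the most frequent one as a candidate variant.

```python
from collections import Counter, defaultdict


def extract_variants(corpus):
    """Count, for every synset, the target lemmas aligned to it.

    corpus: iterable of (source_tokens, target_tokens, alignment), where
      source_tokens is a list of (word, synset_id or None),
      target_tokens is a list of target-language lemmas,
      alignment is a list of (source_index, target_index) pairs.
    """
    counts = defaultdict(Counter)
    for source_tokens, target_tokens, alignment in corpus:
        for src_i, tgt_i in alignment:
            _, synset = source_tokens[src_i]
            if synset is not None:  # skip untagged words such as articles
                counts[synset][target_tokens[tgt_i]] += 1
    return counts


def best_variant(counts, min_count=1):
    """Keep the most frequent aligned lemma per synset (hypothetical threshold)."""
    result = {}
    for synset, counter in counts.items():
        word, freq = counter.most_common(1)[0]
        if freq >= min_count:
            result[synset] = word
    return result


# Toy English-Spanish data with made-up synset labels and alignments.
example_corpus = [
    ([("the", None), ("dog", "SYN_DOG"), ("sleeps", "SYN_SLEEP")],
     ["el", "perro", "dormir"], [(0, 0), (1, 1), (2, 2)]),
    ([("a", None), ("dog", "SYN_DOG"), ("barks", "SYN_BARK")],
     ["un", "perro", "ladrar"], [(0, 0), (1, 1), (2, 2)]),
    ([("the", None), ("dog", "SYN_DOG")],
     ["el", "can"], [(0, 0), (1, 1)]),
]

if __name__ == "__main__":
    counts = extract_variants(example_corpus)
    print(best_variant(counts, min_count=2))
```

Raising `min_count` trades coverage for precision: synsets whose best candidate was seen only once are dropped, which is one simple way to filter alignment noise.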

Keywords

  • lexical resources
  • wordnet
  • parallel corpora
  • machine translation
  • automatic sense tagging

This research has been carried out thanks to the Project MICINN, TIN2009-14715-C04-04 of the Spanish Ministry of Science and Innovation.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oliver, A., Climent, S. (2012). Parallel Corpora for WordNet Construction: Machine Translation vs. Automatic Sense Tagging. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_10

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)