Extending and Updating the Finnish Wordnet

  • Krister Lindén
  • Jyrki Niemi
  • Mirka Hyvärinen

Abstract

This paper presents simple methods for adding new words to a wordnet. We use the Finnish wordnet, FinnWordNet, as an example. We pay particular attention to high- and medium-frequency words thus far missing from FinnWordNet, and arrive at an estimate for the number of culture-specific words among them. We also find that the majority of the high- and medium-frequency words are compounds, which makes them relatively easy to add by using the head word of a compound to locate hypernym synset candidates. Another goal of ours is to add new synonyms to the existing synsets of FinnWordNet. We present a method that finds candidates for new synonyms from a bilingual lexical resource by exploiting the direct word sense translation correspondences between FinnWordNet and the Princeton WordNet. We apply the method to the interlanguage links between articles on the same topic in the Finnish and English Wikipedias on the one hand, and to the translations in the Finnish and English Wiktionaries on the other, and compare the results.
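
The two methods summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The synset IDs, the toy lemma data and all function names (`synonym_candidates`, `compound_head`, `hypernym_candidates`) are hypothetical.

The sketch makes two simplifications:

- Each FinnWordNet synset is keyed by the ID of the Princeton WordNet synset it was translated from, which models the direct one-to-one correspondence.
- A compound's head (its final component in Finnish) is guessed as the longest known lemma that ends the word. A real Finnish morphological analyser would be used instead.

```python
from collections import defaultdict

# Hypothetical toy data: FinnWordNet synsets keyed by the ID of the Princeton
# WordNet (PWN) synset they translate, so the two resources align one-to-one.
FIWN = {
    "tree.n.01": {"puu"},
    "car.n.01": {"auto", "henkilöauto"},
    "apple.n.01": {"omena"},
}
PWN = {
    "tree.n.01": {"tree"},
    "car.n.01": {"car", "auto", "automobile", "motorcar"},
    "apple.n.01": {"apple"},
}


def index_by_lemma(synsets):
    """Map each lemma to the set of synset IDs it belongs to."""
    index = defaultdict(set)
    for sid, lemmas in synsets.items():
        for lemma in lemmas:
            index[lemma].add(sid)
    return index


def synonym_candidates(pairs, fiwn=FIWN, pwn=PWN):
    """Propose (Finnish word, synset ID) pairs from bilingual (fi, en) pairs.

    If the English word is a member of a PWN synset, the Finnish word becomes
    a candidate synonym for the corresponding FinnWordNet synset, unless it
    is already a member of that synset.
    """
    en_index = index_by_lemma(pwn)
    candidates = set()
    for fi, en in pairs:
        for sid in en_index.get(en, ()):
            if fi not in fiwn.get(sid, set()):
                candidates.add((fi, sid))
    return candidates


def compound_head(word, fi_index, min_modifier=2):
    """Naive head guess: the longest known lemma that ends the word.

    The modifier part must be at least `min_modifier` characters long, so the
    word itself is never returned as its own head.
    """
    for start in range(min_modifier, len(word)):
        if word[start:] in fi_index:
            return word[start:]
    return None


def hypernym_candidates(word, fiwn=FIWN):
    """Synsets containing the compound's head word are hypernym candidates."""
    fi_index = index_by_lemma(fiwn)
    head = compound_head(word, fi_index)
    return sorted(fi_index[head]) if head else []
```

For example, `synonym_candidates([("automobiili", "automobile"), ("auto", "car")])` proposes only `("automobiili", "car.n.01")`, because `auto` is already a member of that synset. Similarly, `hypernym_candidates("omenapuu")` ("apple tree") returns `["tree.n.01"]` via the head `puu`. With real PWN data, an English word often belongs to several synsets, so this naive version would propose the Finnish word for each of them.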

Acknowledgements

We are grateful to our two reviewers, Prof Lars Borin and Dr Antti Arppe, for their many insightful comments, which helped us clarify a number of our claims. All remaining errors are of course our own.

We also wish to acknowledge the FIN-CLARIN and META-NORD funding for making this work possible. The META-NORD project has received funding from the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 270899.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krister Lindén (1)
  • Jyrki Niemi (1)
  • Mirka Hyvärinen (1)

  1. Department of Modern Languages, University of Helsinki, Helsinki, Finland