Extending and Updating the Finnish Wordnet
This paper presents simple methods for adding new words to a wordnet. We use the Finnish wordnet, FinnWordNet, as an example. We pay particular attention to high- and medium-frequency words thus far missing from FinnWordNet, and arrive at an estimate for the number of culture-specific words among them. We also find that the majority of the high- and medium-frequency words are compounds, which makes them relatively easy to add by using the head word of a compound to locate hypernym synset candidates. Another goal of ours is to add new synonyms to the existing synsets of FinnWordNet. We present a method that finds candidates for new synonyms from a bilingual lexical resource by exploiting the direct word sense translation correspondences between FinnWordNet and the Princeton WordNet. We apply the method to the interlanguage links between articles on the same topic in the Finnish and English Wikipedias on the one hand, and to the translations in the Finnish and English Wiktionaries on the other, and compare the results.
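The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the synset ids, and the function names `word_index`, `hypernym_candidates` and `synonym_candidates` are all hypothetical. It assumes that the head of a Finnish compound is its final component, and that FinnWordNet and the Princeton WordNet share synset ids, so that a translation pair can be mapped directly onto a Finnish synset.

```python
from collections import defaultdict

# Toy data invented for this sketch. The synset ids are hypothetical and are
# assumed to be shared by the Finnish and the English wordnet.
FI_WORDNET = {"syn-dog": {"koira"}, "syn-house": {"talo"}}
EN_WORDNET = {"syn-dog": {"dog"}, "syn-house": {"house"}}


def word_index(wordnet):
    """Invert a synset-id -> words mapping into word -> set of synset ids."""
    index = defaultdict(set)
    for synset_id, words in wordnet.items():
        for word in words:
            index[word.lower()].add(synset_id)
    return index


def hypernym_candidates(compound, wordnet):
    """Return (head, synset ids) for the longest proper suffix of `compound`
    that is a known word; the head's synsets are hypernym candidates."""
    index = word_index(wordnet)
    word = compound.lower()
    # Increasing start positions yield suffixes from longest to shortest.
    for start in range(1, len(word)):
        head = word[start:]
        if head in index:
            return head, sorted(index[head])
    return None, []


def synonym_candidates(pairs, en_wordnet, fi_wordnet):
    """For each (Finnish, English) translation pair, propose the Finnish word
    for every synset containing the English word, unless the Finnish synset
    with the same id already lists it."""
    en_index = word_index(en_wordnet)
    candidates = defaultdict(set)
    for fi_word, en_word in pairs:
        for synset_id in en_index.get(en_word.lower(), ()):
            if fi_word.lower() not in fi_wordnet.get(synset_id, set()):
                candidates[synset_id].add(fi_word.lower())
    return dict(candidates)


if __name__ == "__main__":
    # "poliisikoira" (police dog) ends in the known word "koira" (dog).
    print(hypernym_candidates("poliisikoira", FI_WORDNET))
    # Pairs as they might come from interlanguage links or Wiktionary entries.
    print(synonym_candidates([("hauva", "dog"), ("koira", "dog")],
                             EN_WORDNET, FI_WORDNET))
```

In practice the suffix match would overgenerate (short words match many endings), so the output is best read as a list of candidates for manual checking rather than as final additions.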
Keywords: Noun Phrase, English Word, Common Word, Word Sense, Proper Noun
We are grateful to our two reviewers, Prof Lars Borin and Dr Antti Arppe, for their many insightful comments that helped us clarify many of our claims and statements. All remaining errors are of course our own.
We also wish to acknowledge the FIN-CLARIN and META-NORD funding for making this work possible. The META-NORD project has received funding from the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 270899.