Abstract
We discuss the development of a multilingual lexicon linked to a formal ontology, the Suggested Upper Merged Ontology (SUMO). Both the ontology and the lexicon have been expressed in the Web Ontology Language (OWL), in addition to their original formats, for use on the semantic web and in linked data. We describe the Open Multilingual Wordnet (OMW), a multilingual wordnet covering 22 languages with a rich structure of semantic relations. It is built by exploiting links from various monolingual wordnets to the English Wordnet. Currently, it contains 118,337 concepts expressed in 1,643,260 senses. It is available as simple tab-separated files, in Wordnet-Lexical Markup Framework (LMF), or in lemon, and has been used by many projects, including BabelNet and Google Translate. We discuss some issues in extending the wordnets, in improving the multilingual representation to cover concepts not lexicalized in English, and in how concepts are stated in the formal ontology.
Notes
- 6.
http://sigma-01.cim3.net:8080/sigma/OWL.jsp?kb=SUMO also provides a “live” generation of OWL one term at a time, where “&term=name” can be appended to the URL and the desired term name substituted for “name.”
- 9.
Definition from the Open Knowledge Foundation: http://opendefinition.org/.
- 12.
With the extensions that were added with the Japanese translation by Masato Hagiwara (Bird et al. 2010).
- 13.
Thanks to John P. McCrae for help in adding this.
- 15.
We are delighted to see that an Open Dutch Wordnet will be released soon (Vossen and Postma 2014) and will integrate it as soon as the data is available.
- 16.
Note that according to psycholinguistic studies from Ahrens et al. (1998), there are two types of active complexity in natural language. The first is “triggered complexity” initiated by the speaker that involves puns; the second is “latent complexity” in which no pun or vagueness is intended. The Chinese Wordnet’s model focuses only on latent complexity.
- 17.
It would be possible to link ontologies other than SUMO. There are other ontologies with at least partial links to wordnet, including DOLCE (Gangemi et al. 2003) and the Kyoto Ontology (Laparra et al. 2012). We only discuss SUMO here, as it is both the largest ontology and the most fully integrated with the OMW.
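Note 6 describes retrieving OWL for a single term by appending “&term=name” to the Sigma URL. A minimal sketch of building such a URL follows. It assumes the endpoint accepts the query parameters exactly as the note describes; the helper name `owl_term_url` and the example term `Animal` are illustrative and do not come from the chapter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Base URL as given in note 6; whether the server is still reachable is not checked here.
BASE_URL = "http://sigma-01.cim3.net:8080/sigma/OWL.jsp?kb=SUMO"


def owl_term_url(term: str, base_url: str = BASE_URL) -> str:
    """Return base_url with a `term` query parameter appended (hypothetical helper)."""
    parts = urlsplit(base_url)
    # Keep the existing parameters (kb=SUMO) and add the requested term after them.
    query = parse_qsl(parts.query)
    query.append(("term", term))
    return urlunsplit(parts._replace(query=urlencode(query)))


# "Animal" is used only as an example term name.
print(owl_term_url("Animal"))
```

Using `urlencode` rather than plain string concatenation means a term name containing reserved characters is escaped before it is placed in the URL.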
References
Ahrens, K., Chang, L. L., Chen, K. J., & Huang, C.-R. (1998). Meaning representation and meaning instantiation for Chinese nominals. International Journal of Computational Linguistics and Chinese Language Processing, 3, 45–60.
Apresjan, J. (1973). Regular polysemy. Linguistics, 142(5), 5–32.
Benzmüller, C., & Pease, A. (2010). Progress in automating higher-order ontology reasoning. In B. Konev, R. Schmidt, & S. Schulz (Eds.), Workshop on Practical Aspects of Automated Reasoning (PAAR-2010). Edinburgh, UK: CEUR Workshop Proceedings.
Berners-Lee, T. (2009). Linked data-the story so far. International Journal on Semantic Web and Information Systems, 5(3), 1–22.
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. O’Reilly. www.nltk.org/book.
Bird, S., Klein, E., & Loper, E. (2010). Nyumon Shizen Gengo Shori [Introduction to natural language processing] (Hagiwara, Nakamura, & Mizuno, Trans.). Sebastopol: O’Reilly, Beijing, China.
Black, W., Elkateb, S., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A., et al. (2006). Introducing the Arabic WordNet project. In P. Sojka, K.-S. Choi, C. Fellbaum, & P. Vossen (Eds.), Proceedings of the Third International WordNet Conference, Jeju, Korea, 295–299.
Bond, F., & Foster, R. (2013). Linking and extending an open multilingual wordnet. In 51st Annual Meeting of the Association for Computational Linguistics: ACL-2013, Sofia (pp. 1352–1362). http://aclweb.org/anthology/P13-1133
Bond, F., Isahara, H., Fujita, S., Uchimoto, K., Kuribayashi, T., & Kanzaki, K. (2009). Enhancing the Japanese WordNet. In The 7th Workshop on Asian Language Resources (pp. 1–8). Singapore: ACL-IJCNLP 2009.
Bond, F., & Paik, K. (2012). A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue (pp. 64–71).
Borra, A., Pease, A., Roxas, R., & Dita, S. (2010). Introducing Filipino WordNet. In P. Bhattacharyya, C. Fellbaum, & P. Vossen (Eds.), Principles of Construction and Application of Multilingual Wordnets: Proceedings of the 5th Global WordNet Conference (pp. 306–310). Mumbai, India: Narosa Pub.
Boyd-Graber, J., Fellbaum, C., Osherson, D., & Schapire, R. (2006). Adding dense, weighted connections to WordNet. In Proceedings of the Third Global WordNet Meeting, Jeju.
Burnard, L. (2000). The British national corpus users reference guide. Oxford: Oxford University Computing Services.
Daude, J., Padro, L., & Rigau, G. (2003). Validation and tuning of Wordnet mapping techniques. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’03), Borovets, Bulgaria.
de Melo, G., Suchanek, F., & Pease, A. (2008). Integrating YAGO into the suggested upper merged ontology. In Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence.
de Paiva, V., & Rademaker, A. (2012). Revisiting a Brazilian wordnet. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge: MIT Press.
Fellbaum, C., & Vossen, P. (2012). Challenges for a multilingual wordnet. Language Resources and Evaluation, 46(2), 313–326. doi:10.1007/s10579-012-9186-z.
Gangemi, A., Guarino, N., Masolo, C., & Oltramari, A. (2003). Sweetening WordNet with DOLCE. AI Magazine, 24(3), 13–24.
Genesereth, M. (1991). Knowledge interchange format. In J. Allen, R. Fikes, & E. Sandewall (Eds.), Proceedings of the Second International Conference on the Principles of Knowledge Representation and Reasoning (pp. 238–249). Los Altos: Morgan Kaufmann.
Gonzalez-Agirre, A., Laparra, E., & Rigau, G. (2012). Multilingual central repository version 3.0: Upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue.
Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., et al. (2010). Chinese wordnet: Design and implementation of a cross-lingual knowledge processing infrastructure. Journal of Chinese Information Processing, 24(2), 14–23 (in Chinese).
Isahara, H., Bond, F., Uchimoto, K., Utiyama, M., & Kanzaki, K. (2008). Development of the Japanese WordNet. In Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech.
Koide, S., Morita, T., Yamaguchi, T., Muljadi, H., & Takeda, H. (2006). OWL expressions on WordNet and EDR. In AI Society Semantic Web Ontology SIG 13, SIG-SWO-A601-03 (in Japanese). http://www.jaist.ac.jp/ks/labs/kbs-lab/sig-swo/fpapers.htm
Kunze, C., & Lemnitzer, L. (2002). Germanet — Representation, visualization, application. In LREC (pp. 1485–1491).
Laparra, E., Rigau, G., & Vossen, P. (2012). Mapping wordnet to the KYOTO ontology. In N. Calzolari, K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012) (pp. 2584–2589). Luxembourg: Publ. European Language Resources Association (ELRA).
Lindén, K., & Carlson, L. (2010). Finnwordnet – wordnet på finska via översättning. LexicoNordica – Nordic Journal of Lexicography, 17, 119–140. In Swedish with an English abstract.
McCrae, J., Spohr, D., & Cimiano, P. (2011). Linking lexical resources and ontologies on the semantic web with lemon. In The Semantic Web: Research and applications (pp. 245–259). Berlin, Heidelberg: Springer.
Montazery, M., & Faili, H. (2010). Automatic Persian wordnet construction. In 23rd International Conference on Computational Linguistics (pp. 846–850).
Niles, I., & Pease, A. (2001). Toward a standard upper ontology. In C. Welty & B. Smith (Eds.), Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001) (pp. 2–9).
Niles, I., & Pease, A. (2003). Linking lexicons and ontologies: Mapping WordNet to the suggested upper merged ontology. In Proceedings of the IEEE International Conference on Information and Knowledge Engineering (pp. 412–416).
Nurril Hirfana Mohamed Noor, Sapuan, S., & Bond, F. (2011). Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), Singapore (pp. 258–267).
Ordan, N., & Wintner, S. (2007). Hebrew wordnet: A test case of aligning lexical databases across languages. International Journal of Translation, 19(1), 39–58.
Pease, A. (2006). Formal representation of concepts: The Suggested Upper Merged Ontology and its use in linguistics. In Ontolinguistics: How ontological status shapes the linguistic coding of concepts. New York: Mouton de Gruyter.
Pease, A. (2011). Ontology: A practical guide. Angwin, CA: Articulate Software Press.
Pease, A., & Benzmüller, C. (2013). Sigma: An integrated development environment for logical theories. AI Communications, 26, 9–97.
Pease, A., Fellbaum, C., & Vossen, P. (2008). Building the global WordNet grid. In Proceedings of the CIL-18 Workshop on Linguistic Studies of Ontology, Seoul, South Korea.
Pease, A., Sutcliffe, G., Siegel, N., & Trac, S. (2010). Large theory reasoning with SUMO at CASC. AI Communications, Special Issue on Practical Aspects of Automated Reasoning, 23(2–3), 137–144.
Pedersen, B.S., Nimb, S., Asmussen, J., Sørensen, N.H., Trap-Jensen, L., & Lorentzen, H. (2009). DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Language Resources and Evaluation, 43(3), 269–299.
Peters, W., Vossen, P., Díez-Orzas, P., & Adriens, G. (1998). Cross-linguistic alignment of wordnets with an inter-lingual-index. In P. Vossen (Ed.), Euro WordNet (pp. 149–251). Dordrecht: Kluwer.
Pianta, E., Bentivogli, L., & Girardi, C. (2002). Multiwordnet: Developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet, Mysore, India (pp. 293–302).
Piasecki, M., Szpakowicz, S., & Broda, B. (2009). A Wordnet from the Ground Up. Wroclaw University of Technology Press. ISBN 978-83-7493-476-3. http://www.plwordnet.pwr.wroc.pl/main/content/files/publications/A_Wordnet_from_the_Ground_Up.pdf
Pociello, E., Agirre, E., & Aldezabal, I. (2011). Methodology and construction of the Basque wordnet. Language Resources and Evaluation, 45(2), 121–142.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale, NJ, USA: Lawrence Erlbaum Associates. Reprinted in A. Collins & E. E. Smith (Eds.), Readings in Cognitive Science: A Perspective from Psychology and Artificial Intelligence. Los Altos, CA, USA: Morgan Kaufmann Publishers, 1991.
Ruci, E. (2008). On the current state of Albanet and related applications. Tech. Rep., University of Vlora. http://fjalnet.com/technicalreportalbanet.pdf.
Sagot, B., & Fišer, D. (2008). Building a free French wordnet from multilingual resources. In ELRA (Ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco.
Savas, B., Hayashi, Y., Monachini, M., Soria, C., & Calzolari, N. (2010). An LMF-based web service for accessing wordnet-type semantic lexicons. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta: European Language Resources Association (ELRA).
Thoongsup, S., Charoenporn, T., Robkop, K., Sinthurahat, T., Mokarat, C., Sornlertlamvanich, V., et al. (2009). Thai wordnet construction. In Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP), Suntec, Singapore.
van Assem, M., Gangemi, A., & Schreiber, G. (2006). Conversion of wordnet to a standard RDF/OWL representation. In Proceedings of The Fifth International Conference on Language Resources and Evaluation (LREC 2006).
Vincze, V., & Almási, A. (2014). Non-lexicalized concepts in wordnets: A case study of English and Hungarian. In Proceedings of the 7th Global WordNet Conference (GWC 2014), Tartu (pp. 118–126).
Vossen, P. (Ed.). (1998). Euro WordNet. Dordrecht: Kluwer.
Vossen, P., Maks, I., Segers, R., & Van der Vliet, H. (2008). Integrating lexical units, synsets and ontology in the Cornetto database. In LREC 2008. Marrakech, Morocco: European Language Resources Association (ELRA).
Vossen, P., Peters, W., & Gonzalo, J. (1999). Towards a universal index of meaning. In Proceedings of ACL-99 Workshop, Siglex-99, Standardizing Lexical Resources, Maryland (pp. 81–90).
Vossen, P., & Postma, M. (2014). Open Dutch wordnet. In Proceedings of the 7th Global WordNet Conference (GWC 2014), Tartu (presentation only).
Vossen, P., & Rigau, G. (2010). Division of semantic labor in the global wordnet grid. In P. Bhattacharyya, C. Fellbaum, & P. Vossen (Eds.), 5th Global Wordnet Conference: GWC-2010. Mumbai: Narosa Pub.
Vossen, P., Soria, C., & Monachini, M. (2013). LMF - Lexical markup framework. In G. Francopoulo (Ed.), LMF - Lexical markup framework, Chap. 4. New York: ISTE Ltd + Wiley.
Wang, S., & Bond, F. (2013). Building a Chinese wordnet: Starting from core synsets. In Proceedings of the 11th Workshop on Asian Language Resources, Nagoya.
Yoon, A., Hwang, S., Lee, E., & Kwon, H.-C. (2009). Construction of Korean wordnet KorLex 1.5. Journal of KIISE: Software and Applications, 36(1), 92–108.
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bond, F., Fellbaum, C., Hsieh, SK., Huang, CR., Pease, A., Vossen, P. (2014). A Multilingual Lexico-Semantic Database and Ontology. In: Buitelaar, P., Cimiano, P. (eds) Towards the Multilingual Semantic Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43585-4_15
DOI: https://doi.org/10.1007/978-3-662-43585-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43584-7
Online ISBN: 978-3-662-43585-4
eBook Packages: Computer Science, Computer Science (R0)