A Multilingual Lexico-Semantic Database and Ontology

Towards the Multilingual Semantic Web

Abstract

We discuss the development of a multilingual lexicon linked to a formal ontology, the Suggested Upper Merged Ontology (SUMO). Both the ontology and the lexicon have been expressed in the Web Ontology Language (OWL), in addition to their original formats, for use on the semantic web and in linked data. We describe the Open Multilingual Wordnet (OMW), a multilingual wordnet covering 22 languages with a rich structure of semantic relations. It is built by exploiting links from various monolingual wordnets to the English Wordnet. It currently contains 118,337 concepts expressed in 1,643,260 senses across those 22 languages. It is available as simple tab-separated files, in Wordnet-Lexical Markup Framework (LMF) or in lemon, and has been used by many projects, including BabelNet and Google Translate. We also discuss issues in extending the wordnets and improving the multilingual representation so that it covers concepts not lexicalized in English, and in how concepts are stated in the formal ontology.
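The linking idea in the abstract can be illustrated with a minimal Python sketch. It assumes that each monolingual wordnet is reduced to (synset ID, lemma) pairs whose IDs are shared with the English Wordnet; the function names and the toy data are hypothetical and are not taken from the OMW distribution or its file format.

```python
from collections import defaultdict


def build_multilingual_index(wordnets):
    """Merge per-language (synset_id, lemma) pairs into one index.

    `wordnets` maps a language code to a list of (synset_id, lemma) pairs.
    The shared synset ID is what links the languages together.
    """
    index = defaultdict(lambda: defaultdict(list))
    for lang, pairs in wordnets.items():
        for synset_id, lemma in pairs:
            index[synset_id][lang].append(lemma)
    return index


def count_concepts_and_senses(index):
    """Count concepts (distinct synsets) and senses (synset-lemma pairs)."""
    concepts = len(index)
    senses = sum(
        len(lemmas) for langs in index.values() for lemmas in langs.values()
    )
    return concepts, senses


# Illustrative toy data only (hypothetical, not real OMW content).
toy_wordnets = {
    "eng": [("02084071-n", "dog"), ("02121620-n", "cat")],
    "jpn": [("02084071-n", "犬")],
    "fra": [("02084071-n", "chien"), ("02121620-n", "chat")],
}

index = build_multilingual_index(toy_wordnets)
concepts, senses = count_concepts_and_senses(index)
print(concepts, senses)
```

In this sketch a concept with no lemma in a given language simply has no entry for that language, which mirrors the situation of concepts that are not lexicalized everywhere.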

Notes

  1. http://www.globalwordnet.org/gwa/gwa_grid.html.

  2. www.ontologyportal.org.

  3. http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/sigma/suo-kif.pdf.

  4. http://www.ontologyportal.org/SUMO.owl.

  5. http://www.ontologyportal.org/WordNet.owl.

  6. http://sigma-01.cim3.net:8080/sigma/OWL.jsp?kb=SUMO also provides a “live” generation of OWL one term at a time, where “&term=name” can be appended to the URL and the desired term name substituted for “name.”

  7. http://globalwordnet.org/.

  8. http://compling.ntu.edu.sg/omw.

  9. Definition from the Open Knowledge Foundation: http://opendefinition.org/.

  10. http://babelnet.org/about.jsp.

  11. http://translate.google.com/about/intl/en_ALL/.

  12. With the extensions that were added with the Japanese translation by Masato Hagiwara (Bird et al. 2010).

  13. Thanks to John P. McCrae for help in adding this.

  14. http://nlpwww.nict.go.jp/wn-ja/index.en.html.

  15. We are delighted to see that an Open Dutch Wordnet will be released soon (Vossen and Postma 2014) and will integrate it as soon as the data is available.

  16. Note that according to psycholinguistic studies from Ahrens et al. (1998), there are two types of active complexity in natural language. The first is “triggered complexity” initiated by the speaker that involves puns; the second is “latent complexity” in which no pun or vagueness is intended. The Chinese Wordnet’s model focuses only on latent complexity.

  17. It would be possible to link ontologies other than SUMO. There are other ontologies with at least partial links to wordnet, including DOLCE (Gangemi et al. 2003) and the Kyoto Ontology (Laparra et al. 2012). We only discuss SUMO here, as it is both the largest ontology and the most fully integrated with the OMW.

References

  • Ahrens, K., Chang, L. L., Chen, K. J., & Huang, C.-R. (1998). Meaning representation and meaning instantiation for Chinese nominals. International Journal of Computational Linguistics and Chinese Language Processing, 3, 45–60.

  • Apresjan, J. (1973). Regular polysemy. Linguistics, 142(5), 5–32.

  • Benzmüller, C., & Pease, A. (2010). Progress in automating higher-order ontology reasoning. In B. Konev, R. Schmidt, & S. Schulz (Eds.), Workshop on Practical Aspects of Automated Reasoning (PAAR-2010). Edinburgh, UK: CEUR Workshop Proceedings.

  • Berners-Lee, T. (2009). Linked data-the story so far. International Journal on Semantic Web and Information Systems, 5(3), 1–22.

  • Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. O’Reilly. www.nltk.org/book.

  • Bird, S., Klein, E., & Loper, E. (2010). Nyumon Shizen Gengo Shori [Introduction to natural language processing] (Hagiwara, Nakamura, & Mizuno, Trans.). Sebastopol: O’Reilly, Beijing, China.

  • Black, W., Elkateb, S., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A., et al. (2006). Introducing the Arabic WordNet project. In P. Sojka, K.-S. Choi, C. Fellbaum, & P. Vossen (Eds.), Proceedings of the Third International WordNet Conference, Jeju, Korea, 295–299.

  • Bond, F., & Foster, R. (2013). Linking and extending an open multilingual wordnet. In 51st Annual Meeting of the Association for Computational Linguistics: ACL-2013, Sofia (pp. 1352–1362). http://aclweb.org/anthology/P13-1133

  • Bond, F., Isahara, H., Fujita, S., Uchimoto, K., Kuribayashi, T., & Kanzaki, K. (2009). Enhancing the Japanese WordNet. In The 7th Workshop on Asian Language Resources (pp. 1–8). Singapore: ACL-IJCNLP 2009.

  • Bond, F., & Paik, K. (2012). A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue (pp. 64–71).

  • Borra, A., Pease, A., Roxas, R., & Dita, S. (2010). Introducing Filipino WordNet. In P. Bhattacharyya, C. Fellbaum, & P. Vossen (Eds.), Principles of Construction and Application of Multilingual Wordnets: Proceedings of the 5th Global WordNet Conference (pp. 306–310). Mumbai, India: Narosa Pub.

  • Boyd-Graber, J., Fellbaum, C., Osherson, D., & Schapire, R. (2006). Adding dense, weighted connections to WordNet. In Proceedings of the Third Global WordNet Meeting, Jeju.

  • Burnard, L. (2000). The British national corpus users reference guide. Oxford: Oxford University Computing Services.

  • Daude, J., Padro, L., & Rigau, G. (2003). Validation and tuning of Wordnet mapping techniques. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’03), Borovets, Bulgaria.

  • de Melo, G., Suchanek, F., & Pease, A. (2008). Integrating YAGO into the suggested upper merged ontology. In Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence.

  • de Paiva, V., & Rademaker, A. (2012). Revisiting a Brazilian wordnet. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue.

  • Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge: MIT Press.

  • Fellbaum, C., & Vossen, P. (2012). Challenges for a multilingual wordnet. Language Resources and Evaluation, 46(2), 313–326. doi:10.1007/s10579-012-9186-z.

  • Gangemi, A., Guarino, N., Masolo, C., & Oltramari, A. (2003). Sweetening WordNet with DOLCE. AI Magazine, 24(3), 13–24.

  • Genesereth, M. (1991). Knowledge interchange format. In J. Allen, R. Fikes, & E. Sandewall (Eds.), Proceedings of the Second International Conference on the Principles of Knowledge Representation and Reasoning (pp. 238–249). Los Altos: Morgan Kaufmann.

  • Gonzalez-Agirre, A., Laparra, E., & Rigau, G. (2012). Multilingual central repository version 3.0: Upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue.

  • Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., et al. (2010). Chinese wordnet: Design and implementation of a cross-lingual knowledge processing infrastructure. Journal of Chinese Information Processing, 24(2), 14–23 (in Chinese).

  • Isahara, H., Bond, F., Uchimoto, K., Utiyama, M., & Kanzaki, K. (2008). Development of the Japanese WordNet. In Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech.

  • Koide, S., Morita, T., Yamaguchi, T., Muljadi, H., & Takeda, H. (2006). OWL expressions on WordNet and EDR. In AI Society Semantic Web Ontology SIG 13, SIG-SWO-A601-03 (in Japanese). http://www.jaist.ac.jp/ks/labs/kbs-lab/sig-swo/fpapers.htm

  • Kunze, C., & Lemnitzer, L. (2002). Germanet — Representation, visualization, application. In LREC (pp. 1485–1491).

  • Laparra, E., Rigau, G., & Vossen, P. (2012). Mapping wordnet to the KYOTO ontology. In N. Calzolari, K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC2012) (pp. 2584–2589). Luxembourg: Publ. European Language Resources Association (ELRA).

  • Lindén, K., & Carlson, L. (2010). Finnwordnet – wordnet på finska via översättning. LexicoNordica – Nordic Journal of Lexicography, 17, 119–140. In Swedish with an English abstract.

  • McCrae, J., Spohr, D., & Cimiano, P. (2011). Linking lexical resources and ontologies on the semantic web with lemon. In The Semantic Web: Research and applications (pp. 245–259). Berlin, Heidelberg: Springer.

  • Montazery, M., & Faili, H. (2010). Automatic Persian wordnet construction. In 23rd International Conference on Computational Linguistics (pp. 846–850).

  • Niles, I., & Pease, A. (2001). Toward a standard upper ontology. In C. Welty & B. Smith (Eds.), Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001) (pp. 2–9).

  • Niles, I., & Pease, A. (2003). Linking lexicons and ontologies: Mapping WordNet to the suggested upper merged ontology. In Proceedings of the IEEE International Conference on Information and Knowledge Engineering (pp. 412–416).

  • Nurril Hirfana Mohamed Noor, Sapuan, S., & Bond, F. (2011). Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), Singapore (pp. 258–267).

  • Ordan, N., & Wintner, S. (2007). Hebrew wordnet: A test case of aligning lexical databases across languages. International Journal of Translation, 19(1), 39–58.

  • Pease, A. (2006). Formal representation of concepts: The Suggested Upper Merged Ontology and its use in linguistics. In Ontolinguistics: How ontological status shapes the linguistic coding of concepts. New York: Mouton de Gruyter.

  • Pease, A. (2011). Ontology: A practical guide. Angwin, CA: Articulate Software Press.

  • Pease, A., & Benzmüller, C. (2013). Sigma: An integrated development environment for logical theories. AI Communications, 26, 9–97.

  • Pease, A., Fellbaum, C., & Vossen, P. (2008). Building the global WordNet grid. In Proceedings of the CIL-18 Workshop on Linguistic Studies of Ontology, Seoul, South Korea.

  • Pease, A., Sutcliffe, G., Siegel, N., & Trac, S. (2010). Large theory reasoning with SUMO at CASC. AI Communications, Special Issue on Practical Aspects of Automated Reasoning, 23(2–3), 137–144.

  • Pedersen, B.S., Nimb, S., Asmussen, J., Sørensen, N.H., Trap-Jensen, L., & Lorentzen, H. (2009). DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Language Resources and Evaluation, 43(3), 269–299.

  • Peters, W., Vossen, P., Díez-Orzas, P., & Adriaens, G. (1998). Cross-linguistic alignment of wordnets with an inter-lingual-index. In P. Vossen (Ed.), EuroWordNet (pp. 149–251). Dordrecht: Kluwer.

  • Pianta, E., Bentivogli, L., & Girardi, C. (2002). Multiwordnet: Developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet, Mysore, India (pp. 293–302).

  • Piasecki, M., Szpakowicz, S., & Broda, B. (2009). A Wordnet from the Ground Up. Wroclaw University of Technology Press. ISBN 978-83-7493-476-3. http://www.plwordnet.pwr.wroc.pl/main/content/files/publications/A_Wordnet_from_the_Ground_Up.pdf

  • Pociello, E., Agirre, E., & Aldezabal, I. (2011). Methodology and construction of the Basque wordnet. Language Resources and Evaluation, 45(2), 121–142.

  • Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.

  • Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization, (pp. 27–48). Hillsdale, NJ, USA: Lawrence Erlbaum Associates. Reprinted in Readings in Cognitive Science. A Perspective from Psychology and Artificial Intelligence, A. Collins and E.E. Smith, editors, Morgan Kaufmann Publishers, Los Altos (CA), USA, 1991.

  • Ruci, E. (2008). On the current state of Albanet and related applications. Tech. Rep., University of Vlora. http://fjalnet.com/technicalreportalbanet.pdf.

  • Sagot, B., & Fišer, D. (2008). Building a free French wordnet from multilingual resources. In ELRA (Ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco.

  • Savas, B., Hayashi, Y., Monachini, M., Soria, C., & Calzolari, N. (2010). An LMF-based web service for accessing wordnet-type semantic lexicons. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, & D. Tapias (Eds.), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta: European Language Resources Association (ELRA).

  • Thoongsup, S., Charoenporn, T., Robkop, K., Sinthurahat, T., Mokarat, C., Sornlertlamvanich, V., et al. (2009). Thai wordnet construction. In Proceedings of The 7th Workshop on Asian Language Resources (ALR7), Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP), Suntec, Singapore.

  • van Assem, M., Gangemi, A., & Schreiber, G. (2006). Conversion of wordnet to a standard RDF/OWL representation. In Proceedings of The Fifth International Conference on Language Resources and Evaluation (LREC 2006).

  • Vincze, V., & Almási, A. (2014). Non-lexicalized concepts in wordnets: A case study of English and Hungarian. In Proceedings of the 7th Global WordNet Conference (GWC 2014), Tartu (pp. 118–126).

  • Vossen, P. (Ed.). (1998). EuroWordNet. Dordrecht: Kluwer.

  • Vossen, P., Maks, I., Segers, R., & Van der Vliet, H. (2008). Integrating lexical units, synsets and ontology in the Cornetto database. In LREC 2008. Marrakech, Morocco: European Language Resources Association (ELRA).

  • Vossen, P., Peters, W., & Gonzalo, J. (1999). Towards a universal index of meaning. In Proceedings of ACL-99 Workshop, Siglex-99, Standardizing Lexical Resources, Maryland (pp. 81–90).

  • Vossen, P., & Postma, M. (2014). Open Dutch wordnet. In Proceedings of the 7th Global WordNet Conference (GWC 2014), Tartu (presentation only).

  • Vossen, P., & Rigau, G. (2010). Division of semantic labor in the global wordnet grid. In P. Bhattacharyya, C. Fellbaum, & P. Vossen (Eds.), 5th Global Wordnet Conference: GWC-2010. Mumbai: Narosa Pub.

  • Vossen, P., Soria, C., & Monachini, M. (2013). LMF - Lexical markup framework. In G. Francopoulo (Ed.), LMF - Lexical markup framework, Chap. 4. New York: ISTE Ltd + Wiley.

  • Wang, S., & Bond, F. (2013). Building a Chinese wordnet: Starting from core synsets. In Proceedings of the 11th Workshop on Asian Language Resources, Nagoya.

  • Yoon, A., Hwang, S., Lee, E., & Kwon, H.-C. (2009). Construction of Korean wordnet KorLex 1.5. Journal of KIISE: Software and Applications, 36(1), 92–108.

Author information

Corresponding author

Correspondence to Francis Bond.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bond, F., Fellbaum, C., Hsieh, SK., Huang, CR., Pease, A., Vossen, P. (2014). A Multilingual Lexico-Semantic Database and Ontology. In: Buitelaar, P., Cimiano, P. (eds) Towards the Multilingual Semantic Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43585-4_15

  • DOI: https://doi.org/10.1007/978-3-662-43585-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43584-7

  • Online ISBN: 978-3-662-43585-4

  • eBook Packages: Computer Science, Computer Science (R0)
