Applying Linked Data Principles to Linking Multilingual Wordnets

  • Philipp Cimiano
  • Christian Chiarcos
  • John P. McCrae
  • Jorge Gracia
Chapter

Abstract

Wordnets are the most widely used lexical resources in natural language processing (NLP). Wordnets now exist for more than 40 languages, and all of them are connected to the original Princeton WordNet. The origins of linguistic linked data (LD) can thus, in some sense, be traced back to the WordNet project. However, the linking has not been implemented with stable identifiers, which has led to technical problems of reference whenever a new version of a wordnet is released. This chapter describes how linked data principles have been applied in the development of the Global WordNet Grid (GWG), an attempt to build a catalogue of interlingual concepts that extends beyond the Anglo-Saxon roots of the Princeton WordNet. In particular, we describe how LD technologies have been used to realize a Collaborative Interlingual Index (CILI) that builds on standard LD vocabularies and the Resource Description Framework (RDF) data model. Finally, we describe a method for linking wordnets to external resources such as DBpedia/Wikipedia.
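The following minimal Python sketch illustrates the idea of linking synsets through a shared, stable interlingual identifier instead of through version-specific synset IDs, and serialises the links as RDF N-Triples. Everything in it is hypothetical: the `example.org` URIs, the `ili` property, the wordnet names and the synset records are invented for illustration and are not the actual CILI namespace or the vocabulary used by the Global WordNet Grid.

```python
from collections import defaultdict

# Hypothetical namespaces; NOT the real CILI or wordnet URIs.
ILI = "http://example.org/ili/"
EX = "http://example.org/vocab#"

# Hypothetical synset records: (wordnet, local synset id, ILI concept id).
# "en-v1" and "en-v2" stand for two releases of the same wordnet in which
# the local id of one synset changed; the ILI id is assumed to stay fixed.
SYNSETS = [
    ("en-v1", "en-0001-n", "i1"),
    ("en-v2", "en-0007-n", "i1"),
    ("de", "de-0001-n", "i1"),
    ("fr", "fr-0002-n", "i2"),
]


def to_ntriples(records):
    """Serialise each synset-to-ILI link as one N-Triples line."""
    lines = []
    for wordnet, local_id, ili_id in records:
        synset = f"<http://example.org/{wordnet}/{local_id}>"
        lines.append(f"{synset} <{EX}ili> <{ILI}{ili_id}> .")
    return lines


def group_by_ili(records):
    """Collect the synsets of all wordnets that point at the same ILI concept."""
    groups = defaultdict(list)
    for wordnet, local_id, ili_id in records:
        groups[ili_id].append((wordnet, local_id))
    return dict(groups)


for line in to_ntriples(SYNSETS):
    print(line)
print(group_by_ili(SYNSETS)["i1"])
```

Because each wordnet links to the concept identifier rather than to another wordnet's synset ID, a release that renumbers its synsets only has to update its own links; the German synset still resolves to the same concept as both English releases.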

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Semantic Computing Group, Bielefeld University, Bielefeld, Germany
  2. Applied Computational Linguistics (Angewandte Computerlinguistik), Goethe University, Frankfurt am Main, Germany
  3. Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
  4. Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain

Personalised recommendations