Diversicon: Pluggable Lexical Domain Knowledge

  • Original Article
Journal on Data Semantics

Abstract

Natural language understanding is a key task in a wide range of applications targeting data interoperability or analytics. The analysis of domain-specific data requires specialised knowledge resources (terminologies, grammars, word vector models, lexical databases). The heterogeneity of such resources is, however, a major obstacle to their efficient use, especially in combination. This paper presents the open-source Diversicon Framework, which helps application developers find, integrate, and access lexical domain knowledge, both symbolic and statistical, in a unified manner. The major components of the framework are: (1) an API and domain knowledge model that allow applications to retrieve domain knowledge through a common interface from a diversity of resource types, (2) implementations of the API for some of the most commonly used symbolic and statistical knowledge sources, (3) a domain-aware knowledge base that helps integrate static lexico-semantic resources, and (4) an online catalogue that either hosts or links to existing resources from multiple domains. Support for Diversicon is already integrated into two of the most popular ontology matcher applications, a fact that we exploit to validate the framework and to demonstrate its use in an example study that evaluates the effect of several common-sense and domain knowledge resources on a medical ontology matching task.
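To illustrate what retrieving domain knowledge "through a common interface from a diversity of resource types" could look like, the following is a minimal Python sketch. It is not the Diversicon API: the interface `KnowledgeSource`, the classes `SynonymLexicon` and `VectorModel`, the function `max_relatedness`, and all data are hypothetical and chosen only for illustration. The sketch assumes that a symbolic resource answers with exact synonymy and a statistical resource with cosine similarity between word vectors.

```python
import math
from typing import Protocol


class KnowledgeSource(Protocol):
    """Hypothetical common interface; NOT the actual Diversicon API."""

    def relatedness(self, a: str, b: str, domain: str) -> float: ...


class SynonymLexicon:
    """Symbolic source: per-domain synonym sets (toy data)."""

    def __init__(self, synsets: dict[str, list[set[str]]]) -> None:
        self._synsets = synsets

    def relatedness(self, a: str, b: str, domain: str) -> float:
        # 1.0 if both terms share a synonym set in the requested domain.
        for synset in self._synsets.get(domain, []):
            if a in synset and b in synset:
                return 1.0
        return 0.0


class VectorModel:
    """Statistical source: cosine similarity over toy word vectors."""

    def __init__(self, domain: str, vectors: dict[str, list[float]]) -> None:
        self._domain = domain
        self._vectors = vectors

    def relatedness(self, a: str, b: str, domain: str) -> float:
        # Unknown domains or out-of-vocabulary terms yield no evidence.
        if domain != self._domain or a not in self._vectors or b not in self._vectors:
            return 0.0
        u, v = self._vectors[a], self._vectors[b]
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0


def max_relatedness(sources: list[KnowledgeSource], a: str, b: str, domain: str) -> float:
    """Query every source through the shared interface and keep the best score."""
    return max((s.relatedness(a, b, domain) for s in sources), default=0.0)


# Illustrative, hand-made resources for a "medicine" domain.
sources: list[KnowledgeSource] = [
    SynonymLexicon({"medicine": [{"heart attack", "myocardial infarction"}]}),
    VectorModel("medicine", {"fever": [1.0, 0.0], "pyrexia": [1.0, 0.0], "fracture": [0.0, 1.0]}),
]
```

An application written against such an interface, for instance an ontology matcher, would call `max_relatedness(sources, "fever", "pyrexia", "medicine")` without knowing whether the answer comes from a lexicon or from a vector model. Taking the maximum over sources is only one possible combination strategy, assumed here for brevity.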

Notes

  1. For example, the SNOMED ontology of medical terms contains over 500,000 concepts and 1,500,000 labels in English only.

  2. http://oaei.ontologymatching.org/.

  3. http://www.udcc.org.

  4. The vectors of antonyms, such as hot and cold, are typically considered as very closely related by these models.

  5. https://deeplearning4j.org/.

  6. https://github.com/thomasjungblut/glove.

  7. http://github.com/diversicon-kb/.

  8. http://github.com/diversicon-kb/divercli.

  9. http://github.com/diversicon-kb/divmaker.

  10. http://www.diversicon-kb.eu.

  11. http://getdkan.org.

  12. https://specialist.nlm.nih.gov/lexicon/.

  13. https://www.nlm.nih.gov/research/umls/.

  14. https://nlp.stanford.edu/projects/glove/.

  15. https://www.ncbi.nlm.nih.gov/pubmed/.

  16. The Diversicon-based SMATCH extensions are downloadable from https://github.com/s-match/.

  17. The Diversicon-equipped version of LogMap is downloadable from https://github.com/diversicon-kb/logmap-matcher.

  18. https://github.com/diversicon-kb.

Author information

Corresponding author

Correspondence to Gábor Bella.

About this article

Cite this article

Bella, G., McNeill, F., Leoni, D. et al. Diversicon: Pluggable Lexical Domain Knowledge. J Data Semant 8, 219–234 (2019). https://doi.org/10.1007/s13740-019-00107-1

  • DOI: https://doi.org/10.1007/s13740-019-00107-1
