Journal on Data Semantics

Volume 8, Issue 4, pp 219–234

Diversicon: Pluggable Lexical Domain Knowledge

  • Gábor Bella
  • Fiona McNeill
  • David Leoni
  • Francisco José Quesada Real
  • Fausto Giunchiglia
Original Article


Natural language understanding is a key task in a wide range of applications targeting data interoperability or analytics. Analysing domain-specific data requires specialised knowledge resources (terminologies, grammars, word vector models, lexical databases). The heterogeneity of such resources is, however, a major obstacle to their efficient use, especially in combination. This paper presents the open-source Diversicon Framework, which helps application developers find, integrate, and access lexical domain knowledge, both symbolic and statistical, in a unified manner. The major components of the framework are: (1) an API and domain knowledge model that allow applications to retrieve domain knowledge through a common interface from a diversity of resource types, (2) implementations of the API for some of the most commonly used symbolic and statistical knowledge sources, (3) a domain-aware knowledge base that helps integrate static lexico-semantic resources, and (4) an online catalogue that either hosts or links to existing resources from multiple domains. Support for Diversicon is already integrated into two of the most popular ontology matcher applications. We exploit this to validate the framework and to demonstrate its use in an example study that evaluates the effect of several common-sense and domain knowledge resources on a medical ontology matching task.


Keywords: Domain knowledge · Lexical knowledge · Word vector models · Natural language understanding · Knowledge framework




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
  2. School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
  3. School of Informatics, University of Edinburgh, Edinburgh, UK
