Abstract
Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new Bio-ML track at OAEI 2022.
Resource type: Ontology Matching Dataset
License: CC BY 4.0 International
DOI: https://doi.org/10.5281/zenodo.6510086
Documentation: https://krr-oxford.github.io/DeepOnto/#/om_resources
OAEI track: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/
Keywords
- Ontology Alignment
- Equivalence matching
- Subsumption matching
- Evaluation resource
- Biomedical ontology
- OAEI
Notes
- 1.
- 2.
- 3.
- 4.
Mondo was working on official versioning; the information on the current mappings is based on the preliminary release at: https://github.com/monarch-initiative/mondo/tree/master/src/ontology/mappings.
- 5.
We exclude mappings involving missing class ids.
- 6.
Compact IRI of a class in the form of ontology_prefix:class_ID.
- 7.
The UMLS license is global and also covers access to SNOMED CT. We obtained SNOMED CT (and UMLS) after signing up for a UTS account and license, following the SNOMED CT and UMLS licensing terms at https://www.nlm.nih.gov/healthit/snomedct/snomed_licensing.html.
- 8.
Labels are extracted from annotation properties concerning synonyms of the class name, e.g., rdfs:label, fma:synonym, skos:prefLabel, etc.
- 9.
EditSim and BERTMap code: https://github.com/KRR-Oxford/DeepOnto.
- 10.
- 11.
- 12.
BERTSubs code: https://gitlab.com/chen00217/bert_subsumption; the Word2Vec (or OWL2Vec*) + RF code is in the folder Inter_Ontology/baselines/ of this repository.
Acknowledgments
This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889), eBay, Samsung Research UK, Siemens AG, and the EPSRC projects OASIS (EP/S032347/1), UK FIRES (EP/S019111/1) and ConCur (EP/V050869/1). We would like to thank the Mondo team, especially Nicolas Matentzoglu and Joe Flake, for their great help in creating the Mondo datasets.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
He, Y., Chen, J., Dong, H., Jiménez-Ruiz, E., Hadian, A., Horrocks, I. (2022). Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching. In: , et al. The Semantic Web – ISWC 2022. ISWC 2022. Lecture Notes in Computer Science, vol 13489. Springer, Cham. https://doi.org/10.1007/978-3-031-19433-7_33
DOI: https://doi.org/10.1007/978-3-031-19433-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19432-0
Online ISBN: 978-3-031-19433-7
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://swsa.semanticweb.org/