Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain

Abstract

Knowledge graphs have been widely adopted in recent years, and many of them are now publicly available on the Web. As a result, solutions to the interoperability problems among them are likely to be in high demand. Unifying these knowledge graphs could benefit a wide range of industrial and academic disciplines, for example by enabling queries that were not possible until now. This is especially relevant in the biomedical domain, where semantic interoperability problems are significant. To date, several effective methods have been put forward to address heterogeneity in this knowledge ecosystem. However, it is not possible to establish which of them is superior in each of the scenarios they face. Therefore, we explore several penalized regression techniques that can mitigate the risk of incurring severe errors in real settings while preserving the interpretability of the solution. The result is a proposal for entity meta-alignment that yields promising results in the biomedical domain.
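To make the idea of combining existing aligners through penalized regression more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the lasso (L1 penalty) as one representative penalized regression, fits it by coordinate descent in plain Python, and uses made-up scores from three hypothetical base matchers (`string_matcher`, `path_matcher`, `noise_matcher`) on eight candidate entity pairs. The reference scores are constructed, purely for illustration, as 0.6 times the first matcher plus 0.4 times the second.

```python
# Toy sketch: lasso (L1-penalized least squares) fitted by coordinate descent.
# All data and matcher names below are hypothetical and only for illustration.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, targets, alpha, sweeps=100):
    """Minimise (1/2n) * sum((y - b - x.w)^2) + alpha * sum(|w|)."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(targets) / n
    # Centre features and targets so the intercept can be recovered afterwards.
    xc = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    yc = [y - y_mean for y in targets]
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlate feature j with the residual that leaves feature j out.
            rho = 0.0
            for i in range(n):
                partial = sum(weights[k] * xc[i][k] for k in range(p) if k != j)
                rho += xc[i][j] * (yc[i] - partial)
            rho /= n
            scale = sum(xc[i][j] ** 2 for i in range(n)) / n
            weights[j] = soft_threshold(rho, alpha) / scale if scale > 0 else 0.0
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return weights, intercept


MATCHERS = ["string_matcher", "path_matcher", "noise_matcher"]  # hypothetical
HI, LO = 0.9, 0.1
# One row per candidate entity pair, one column per base matcher (made-up scores).
SCORES = [[a, b, c] for a in (HI, LO) for b in (HI, LO) for c in (HI, LO)]
# Made-up reference scores that depend only on the first two matchers.
REFERENCE = [0.6 * a + 0.4 * b for a, b, _ in SCORES]

weights, intercept = fit_lasso(SCORES, REFERENCE, alpha=0.016)
for name, w in zip(MATCHERS, weights):
    print(f"{name}: {w:.3f}")
```

On this toy data the penalty shrinks the two informative weights (to roughly 0.5 and 0.3) and sets the weight of the uninformative matcher to zero. The fitted model therefore stays readable as a short weighted sum of base aligners, which is the kind of interpretability the abstract refers to. In practice a library implementation would normally replace the hand-written solver.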

Notes

  1. The term meta-alignment reflects the idea that we do not build an alignment method from scratch, but we aim to build an ensemble of existing ones.

  2. https://wordnet.princeton.edu/

  3. https://gsi-upm.github.io/sematch/

  4. https://scikit-learn.org/

Acknowledgements

We would like to thank the anonymous reviewers for their help in improving the manuscript. This work has been funded by project FR06/2020 of International Cooperation & Mobility (ICM) of the Austrian Agency for International Cooperation in Education and Research (OeAD-GmbH). It is also supported in part by the French Ministries of Europe and Foreign Affairs (MEFA) and of Higher Education, Research and Innovation (MHERI) through the Amadeus Program 2020 (French-Austrian Hubert Curien Partnership, PHC), Grant Number 44086TD.

Author information

Corresponding author

Correspondence to Jorge Martinez-Gil.

Cite this article

Martinez-Gil, J., Mokadem, R., Morvan, F. et al. Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain. Prog Artif Intell (2021). https://doi.org/10.1007/s13748-021-00263-1

Keywords

  • Knowledge graphs
  • Knowledge engineering
  • Knowledge-based technology