Abstract
Open Science aims to share results and data widely across research domains. Interoperability is one of the keys to exchanging and combining data between research communities. In this paper, we assess the state of interoperability of Open Science datasets from various communities. The diversity of the metadata schemata used by these datasets prevents native interoperability, which highlights the need for matching tools. The question is whether current metadata schema matching tools are efficient enough to achieve interoperability between existing datasets. In our study, we first define our vision of interoperability by jointly considering the technical and semantic aspects that arise when dealing with metadata schemata from various domains. We then evaluate the interoperability of several datasets from the medical domain and the Earth system study domain using acknowledged matching tools. We evaluate the performance of the tools, then investigate the correlation between various metrics characterizing the schemata and the performance of their mapping. This paper identifies complementary ways to improve dataset interoperability: (1) adapting mapping algorithms to the issues raised by metadata schema matching; (2) adapting metadata schemata, for instance by sharing a core vocabulary and/or reusing existing standards; (3) combining various trends in a more complex interoperability approach that would also make the (RDA) crosswalks between schemata available and operational, and that would promote good practices in metadata labeling and documentation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dang, VN., Aussenac-Gilles, N., Megdiche, I., Ravat, F. (2023). Interoperability of Open Science Metadata: What About the Reality?. In: Nurcan, S., Opdahl, A.L., Mouratidis, H., Tsohou, A. (eds) Research Challenges in Information Science: Information Science and the Connected World. RCIS 2023. Lecture Notes in Business Information Processing, vol 476. Springer, Cham. https://doi.org/10.1007/978-3-031-33080-3_28
DOI: https://doi.org/10.1007/978-3-031-33080-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33079-7
Online ISBN: 978-3-031-33080-3
eBook Packages: Computer Science, Computer Science (R0)