Abstract
The German initiative “National Research Data Infrastructure for Personal Health Data” (NFDI4Health) focuses on research data management in health research. It aims to foster and develop harmonized informatics standards for public health, epidemiological studies, and clinical trials, facilitating access to relevant data and metadata standards. This publication lists syntactic and semantic data standards of potential use for NFDI4Health and beyond, based on interdisciplinary meetings and workshops, mappings of study questionnaires and the NFDI4Health metadata schema, and a literature search. Included are 7 syntactic, 32 semantic and 9 combined syntactic and semantic standards. In addition, 101 ISO standards from ISO/TC 215 Health Informatics and ISO/TC 276 Biotechnology were identified as potentially relevant. The work emphasizes the use of standards for epidemiological and health research data to ensure interoperability as well as compatibility with NFDI4Health, its use cases, and (inter-)national efforts within these sectors. The goal is to foster collaborative and inter-sectoral work in health research and to initiate a debate around the potential of using common standards.
Introduction
The amount of health data has been growing rapidly over the past years. To search, find, (re-)use, analyze and exchange these huge amounts of data, the FAIR guiding principles – findable, accessible, interoperable and reusable – were established1. However, in healthcare as well as in health and epidemiological research, data often does not comply with any of these principles: it is frequently unstructured and stored in different, decentralized silos. This work focuses on problems arising from a lack of interoperability. Without interoperability, data cannot be exchanged in a structured and meaningful manner across different institutions and software systems without substantial additional effort.
International standards are needed to enable interoperability. Standards developing organizations (SDO) focus on the development, maintenance and promotion of standards for a specific group of users or for industry needs. The work of SDOs is mainly performed by volunteers collaborating over many years in small working groups. Proposals of each working group are usually presented to a much larger audience to achieve consensus. The number of created standards differs from one SDO to another, depending on the focus of each organization2. Semantic standards involve the use of structured vocabularies, terminologies and classification systems to represent healthcare concepts3. These standards ensure that health information is accurately and consistently represented across different systems, facilitating clear and precise communication within the healthcare sector. Syntactic standards define the structure or format of data exchange, ensuring that the meaning of data is preserved during transmission4. Further definitions of terms used in this manuscript are provided in additional file 1 in our GitHub Repository (https://github.com/nfdi4health/IdentifiedStandards.git).
This work was performed within the NFDI4Health initiative5, a German research initiative developing a national research data infrastructure for personal health data. NFDI4Health represents an interdisciplinary research community which develops harmonized informatics standards for public health, epidemiological studies, and clinical trials to improve their FAIRness. We focus on the standardization of health research data to foster collaboration within these three domains. This work includes a comprehensive yet non-exhaustive list of standardization projects and initiatives at both global and national levels, along with syntactic and semantic standards. These can be utilized by the research community to describe metadata, data types, and formats from clinical, epidemiological, and public health research in a structured manner. Further standards, ontologies and terminologies might be applicable. We present an initial overview of the collaborative standardization efforts and current use of standards within a national infrastructure project for epidemiological, public health and clinical studies. Through the dissemination of these insights, we aim to empower the research community to leverage standardized practices, thereby advancing the pursuit of breakthroughs in health and medical sciences.
Results
Standardization efforts in health research
The International Organization for Standardization (ISO) is an independent, non-governmental organization focusing on the development and publication of international standards. To date, 171 national standards bodies are members, facilitating the exchange of expert knowledge to tackle global challenges and foster innovation by developing relevant consensus-based, voluntary standards6. The Research Data Alliance (RDA) collects, develops and refines standards and information to enable interoperability between research data repositories7. One example is the RDA COVID-19 Recommendations and Guidelines on Data Sharing8, which can also serve as a model for data sharing guidelines for other research studies in the health sector. In the US and Canada, the Accredited Standards Committee (ASC) is the prevailing SDO. At the European level, three SDOs are responsible for defining and developing voluntary standards: the Comité Européen de Normalisation (short: CEN; for various kinds of services, processes, products and materials), the Comité Européen de Normalisation Electrotechnique (short: CENELEC; for electrotechnical standardization)9 and the European Telecommunications Standards Institute (short: ETSI; for information and communication technologies)10.
In the domain of healthcare, nine global initiatives have worked together since 2007 within the Joint Initiative Council (JIC) on solving real-world problems: Clinical Data Interchange Standards Consortium (CDISC), Digital Imaging and Communications in Medicine (DICOM), CEN/TC 251, GS1 Healthcare, Health Level 7 (HL7) International, Integrating the Healthcare Enterprise (IHE) International, ISO/Technical Committee 215, Logical Observation Identifiers Names and Codes (LOINC) and Systematized Nomenclature of Medicine (SNOMED) International. They enable real-time information exchange in healthcare by using standards based on full interoperability of information and processes11. The Global Alliance for Genomics & Health (GA4GH)12 brings together a growing number of public and private institutions from healthcare delivery and (health) research, companies, societies, funders, agencies and NGOs with the overarching goal of enabling responsible sharing of genomic data while respecting human rights. GA4GH frames policies and develops and/or refines technical standards13. The Global Digital Health Partnership (GDHP), an international collaboration on digital health, was established in 2018 by several governments, government agencies, territories, multinational organizations and the World Health Organization (WHO). The partnership currently comprises 36 members and advocates for the best use of evidence-backed digital technologies to improve well-being and health14. GDHP regularly publishes white papers on interoperability, clinical and consumer engagement, cybersecurity, policy environments, and evidence and evaluation15,16. Further collaborations include the Personal Connected Health Alliance (PCHA)17 and the collaborations between the US Office of the National Coordinator for Health Information Technology (ONC)18 and the European Union19 or the United Kingdom20. ONC also serves as the lead US representative to the GDHP21.
The ISO committee for standards in biotechnology (ISO/TC 276)22 and its working group ISO/TC 276/WG 5 “Data Processing and Integration” are working on standards for data in life sciences that can and should be considered for health data (Table 3). Initial releases include guideline standards for data publication (ISO/TR 3985)23 and requirements for data formatting and description in life sciences (ISO 20691)24. Additionally, a series of standards for provenance information models for biological material and data (ISO 23494) is currently under development in ISO/TC 276/WG 5 and will be published progressively in the coming years. Moreover, in ISO/TC 215 as well as in ISO/TC 276/WG 5, several standard drafts are currently being developed for data and metadata in personalized medicine.
Identified standards
We identified 7 syntactic, 32 semantic and 9 combined syntactic and semantic standards that are potentially relevant to NFDI4Health (Fig. 1). In addition, we identified a further 101 ISO standards (Table 3) from ISO/TC 215 Health Informatics and ISO/TC 276 Biotechnology, which are presented in additional file 2. Features of syntactic and semantic standards are presented in Table 1 and Table 2, respectively.
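The reported counts can be cross-checked with simple arithmetic. The following Python sketch is an illustration only: the variable and function names are our own, and it assumes that the three reported categories are disjoint (that is, the 9 combined standards are not also counted among the 7 syntactic or 32 semantic ones), as a Venn diagram with two overlapping sets would suggest.

```python
# Counts as reported in the text; names are illustrative, not part of the study.
SYNTACTIC_ONLY = 7
SEMANTIC_ONLY = 32
BOTH = 9


def summarize(syntactic_only: int, semantic_only: int, both: int) -> dict:
    """Derive the totals implied by three disjoint categories (assumption)."""
    return {
        # every standard falls into exactly one of the three regions
        "total": syntactic_only + semantic_only + both,
        # standards with a syntactic component = syntactic-only + combined
        "with_syntactic_part": syntactic_only + both,
        # standards with a semantic component = semantic-only + combined
        "with_semantic_part": semantic_only + both,
    }


summary = summarize(SYNTACTIC_ONLY, SEMANTIC_ONLY, BOTH)
print(summary)
```

Under this assumption, the list comprises 48 standards in total, excluding the additionally identified ISO standards.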
Current standards in NFDI4Health
Within NFDI4Health, a tailored metadata schema (MDS) was created to collect information on German clinical, epidemiological and public health studies and the study resources they comprise (e.g., study documents, instruments, data collections, etc.)25,26. To ensure the syntactic and semantic interoperability of the register based on the MDS, the MDS elements were mapped to FHIR and the feasibility of this mapping was analyzed27. In addition, the metadata included in the re3data28 schema and clinicaltrials.gov, as well as the metadata from ECRIN29 and DDI30, were compared to the NFDI4Health MDS. SNOMED CT, HL7 Terminology, NCIt, MeSH, ISO and ICD were used for value sets in the NFDI4Health MDS. The suitability of SNOMED CT for annotating variables from questionnaires originating from clinical as well as epidemiological and public health studies was evaluated by performing mappings to SNOMED CT. The results of the annotation were implemented on a test basis in OPAL/MICA31,32. OPAL/MICA are open software solutions built for managing and harmonizing epidemiological data33. With these mapping activities, we evaluated the suitability of different standards for our NFDI4Health use cases.
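To illustrate what annotating a questionnaire variable with a terminology code can look like, the following Python sketch builds a FHIR-style CodeableConcept from a small lookup table and reports how many variables could be annotated. It is a minimal sketch under stated assumptions and not the NFDI4Health implementation: the variable names, the annotation table and both helper functions are hypothetical, and the two example codes should be verified against the current SNOMED CT and LOINC releases before any reuse.

```python
from typing import Optional

# Code system URIs used in HL7 FHIR for SNOMED CT and LOINC.
SNOMED_CT = "http://snomed.info/sct"
LOINC = "http://loinc.org"

# Hypothetical annotation table: variable name -> (system, code, display).
ANNOTATIONS = {
    "diabetes_diagnosis": (SNOMED_CT, "73211009", "Diabetes mellitus"),
    "heart_rate": (LOINC, "8867-4", "Heart rate"),
}


def to_codeable_concept(variable: str) -> Optional[dict]:
    """Return a FHIR-style CodeableConcept for a variable, or None if unmapped."""
    entry = ANNOTATIONS.get(variable)
    if entry is None:
        return None
    system, code, display = entry
    return {
        "coding": [{"system": system, "code": code, "display": display}],
        "text": variable,
    }


def coverage(variables: list) -> float:
    """Fraction of the given variables that have an annotation."""
    if not variables:
        return 0.0
    mapped = sum(1 for v in variables if v in ANNOTATIONS)
    return mapped / len(variables)


questionnaire = ["diabetes_diagnosis", "heart_rate", "favourite_colour", "shoe_size"]
print(to_codeable_concept("heart_rate"))
print(f"annotated: {coverage(questionnaire):.0%}")
```

A coverage figure of this kind is one simple way to express how suitable a terminology is for a given set of questionnaire variables; the evaluations cited above may have used different or additional criteria.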
Discussion
In addition to several SDOs responsible for developing standards in health research, we found a total of 7 syntactic, 32 semantic and 9 combined syntactic and semantic standards that may be pertinent to epidemiological, public health and clinical research. Furthermore, we identified an additional 101 ISO standards from ISO/TC 215 Health Informatics and ISO/TC 276 Biotechnology (Table 3).
While there is literature and guidance on the use of standards in health records34, our literature review revealed a notable lack of comprehensive overviews to guide the selection of standards for health research studies.
In our use case for NFDI4Health, we specifically focused on metadata describing studies and study questionnaires. Our ultimate goal is to make these metadata elements exchangeable and comparable across clinical, epidemiological, and public health studies.
The World Health Organization’s “International Standards for Clinical Trial Registries” does not specifically recommend any semantic or syntactic standards. It merely advises that “in addition to free text, controlled vocabularies may be used,” citing SNOMED, ICD, and MeSH as examples and recommending controlled vocabularies that can be mapped to the Unified Medical Language System (UMLS) Metathesaurus, as used by the ICTRP Search Portal35. In our current MDS, we implemented SNOMED and MeSH alongside NCIt, LOINC and ISO and provided concept maps to UMLS36,37.
The “Second Joint Action Towards the European Health Data Space – TEHDAS2” project provided a list of relevant standards for harmonizing semantic and syntactic interoperability in the European Health Data Space38. Our list includes all the semantic and syntactic standards identified by this working group. However, they also provided a list of metadata standards, which we did not explore in this manuscript. When developing the MDS, we performed mappings to several of these metadata standards, such as ECRIN, on which we will report in the future.
Harmonization of retrospective data is only one goal of NFDI4Health. There is a need to increase global awareness of the importance of standards and to incentivize the prospective use of widely recognized international standards in studies, starting with the planning phase. As NFDI4Health targets health data, it is crucial to apply standards used in the healthcare system such as SNOMED CT, ICD and LOINC. By doing so, the entire community may benefit from improved data exchange possibilities.
Due to the heterogeneity of studies, multiple standards might need to be combined based on the specific needs and variables assessed. Evaluating the mappings between these standards is also essential. However, it is important to avoid creating further data silos by adopting an excessive number of standards, which could hinder interoperability. Relying on existing concepts and aligning with other projects can benefit the entire community by improving data exchange possibilities. The vast array of health standards highlights the need for interoperability at an organizational level to implement standards on a consensus basis. Therefore, it is essential to consider already existing guidelines and established standards at both national and international levels. For NFDI4Health, this means considering already established data models and standards in the German healthcare system as well as in other projects, such as the medical informatics initiative39. The transport and content standard HL7 FHIR is part of several new requirements in the European healthcare system to rely on one common standard40. FHIR is easy to use, adaptable and relies on existing web technologies that can be used in web and mobile applications; we therefore decided to use it as the exchange standard for our MDS36. Other standards should not be overlooked and will be identified according to the requirements of the use cases. This work serves as a basis for future (meta-)data repositories, for establishing the services necessary to harmonize and standardize (meta-)data, for enabling analysis of and access to those (meta-)data, and for introducing relevant guidelines for the entire NFDI4Health consortium and beyond.
Methods
Identification of standards
To identify relevant SDOs and standards for health(care) data within NFDI4Health, we conducted a literature search and searches in community-driven portals such as FAIRsharing, BioPortal, and the EMBL-EBI Ontology Lookup Service (short: OLS Ontology Search) as well as on the ISO website. In addition, over a period of three years, we gathered information from use cases and domain experts in the field of health research and interoperability. To this end, interviews were conducted with each of the five use cases at least once and up to three times. We held a workshop on metadata standards with the entire community to discuss community needs and identify relevant standards. We mapped study instruments and the developed metadata schema to international standards, analysed the standards for their suitability, and finally developed value sets to be used in the NFDI4Health MDS27,32. Standards such as terminologies, ontologies and vocabularies were considered semantic standards. We included requirements from the user community and their use cases, feedback and experiences, existing guidelines and recommended (inter-)national standards. Each activity was reviewed in interdisciplinary biweekly meetings and commented on by the general assembly of NFDI4Health. All authors have significant expertise and/or practical experience in the field of interoperability and/or health research.
Categorization of standards
In this study, we categorized standards used in health research into three categories: semantic, syntactic, or both. The categorization process was based on specific criteria related to the nature and application of each standard. Standards were classified as semantic if they primarily focused on meaning and interpretation of data. This included terminologies, vocabularies, and ontologies. Examples of such standards include SNOMED CT, LOINC, and ICD. These standards provide a structured way to describe the data and ensure consistent interpretation across different systems and contexts. Standards were classified as syntactic if they focused on the structure and format of data exchange. These standards define how data is formatted, encoded, and transmitted between systems. Examples include HL7 CDA. These standards ensure that the data can be correctly parsed and understood at a structural level by receiving systems. Some standards encompass both semantic and syntactic elements. These standards not only define the structure and format of data but also include value sets or terminologies for ensuring consistent meaning. An example of such a standard is HL7 FHIR, which includes both a syntactic framework for data exchange and its own value sets for semantic consistency.
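The categorization rule described above can be expressed as a small decision function. The following Python sketch is an illustration only, not the procedure used in the study: the `Standard` class, its two boolean attributes and the `categorize` function are hypothetical names, and reducing each standard to two yes/no properties is a simplifying assumption. The example standards and their categories are those named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    name: str
    defines_meaning: bool    # terminology, vocabulary, ontology or value sets
    defines_structure: bool  # structure, format or encoding for data exchange


def categorize(standard: Standard) -> str:
    """Assign one of the three categories used in the text."""
    if standard.defines_meaning and standard.defines_structure:
        return "both"
    if standard.defines_meaning:
        return "semantic"
    if standard.defines_structure:
        return "syntactic"
    return "uncategorized"  # neither criterion applies


# Examples taken from the text above.
EXAMPLES = [
    Standard("SNOMED CT", defines_meaning=True, defines_structure=False),
    Standard("LOINC", defines_meaning=True, defines_structure=False),
    Standard("ICD", defines_meaning=True, defines_structure=False),
    Standard("HL7 CDA", defines_meaning=False, defines_structure=True),
    Standard("HL7 FHIR", defines_meaning=True, defines_structure=True),
]

categories = {s.name: categorize(s) for s in EXAMPLES}
for name, category in categories.items():
    print(f"{name}: {category}")
```

In practice, the boundary is less sharp than two booleans suggest, which is why a combined category is needed at all.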
Analysis
Results are presented in tables. Fig. 1 was created using R statistical software (version 2024.04.1; R Foundation for Statistical Computing)41 and the VennDiagram package.
Data availability
All identified standards can be found in our GitHub repository (https://github.com/nfdi4health/IdentifiedStandards.git).
Code availability
The code for generating Fig. 1 is provided in our GitHub repository (https://github.com/nfdi4health/IdentifiedStandards.git).
References
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, (2016).
Chapter 2 - SDO Education Playbook. https://www.healthit.gov/playbook/sdo-education/chapter-2/.
Terminology Standards | HIMSS. https://www.himss.org/terminology-standards.
Umberfield, E. E. et al. Syntactic interoperability and the role of syntactic standards in health information exchange. Health Information Exchange: Navigating and Managing a Network of Health Information Systems 217–236 https://doi.org/10.1016/B978-0-323-90802-3.00004-6 (2023).
Home - NFDI4Health. https://www.nfdi4health.de/.
ISO - ISO/IEC Guide 2:2004 - Standardization and related activities — General vocabulary. https://www.iso.org/standard/39976.html.
RDA | Research Data Sharing without barriers. https://www.rd-alliance.org/.
RDA COVID-19 Working Group. RDA COVID-19 Recommendations and Guidelines on Data Sharing, https://doi.org/10.15497/RDA00052 (2020).
CEN-CENELEC - CEN-CENELEC. https://www.cencenelec.eu/.
ETSI - Welcome to the World of Standards! https://www.etsi.org/.
Welcome to Joint Initiative Council. http://www.jointinitiativecouncil.org/.
GA4GH. https://www.ga4gh.org/.
Global Alliance for Genomics and Health (GA4GH).
Global Digital Health Partnership. https://gdhp.health/.
Personal Connected Health Alliance. https://www.pchalliance.org/.
About ONC | HealthIT.gov. https://www.healthit.gov/topic/about-onc.
Collaboration with the European Union | HealthIT.gov. https://www.healthit.gov/topic/collaboration-european-union.
Collaboration with the United Kingdom | HealthIT.gov. https://www.healthit.gov/topic/collaboration-united-kingdom.
The Global Digital Health Partnership | HealthIT.gov. https://www.healthit.gov/topic/global-digital-health-partnership.
ISO - ISO/TC 276 - Biotechnology. https://www.iso.org/committee/4514241.html.
ISO - ISO/TR 3985:2021 - Biotechnology — Data publication — Preliminary considerations and concepts. https://www.iso.org/standard/79690.html.
ISO - ISO 20691:2022 - Biotechnology — Requirements for data formatting and description in the life sciences. https://www.iso.org/standard/68848.html.
Schmidt, C. O. et al. [Making COVID-19 research data more accessible-building a nationwide information infrastructure]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 64, 1084–1092 (2021).
NFDI4Health Task Force COVID-19 Metadata Schema. https://fairdomhub.org/data_files/3972?version=1.
Klopfenstein, S. A. I. et al. Fast Healthcare Interoperability Resources (FHIR) in a FAIR Metadata Registry for COVID-19 Research. Stud Health Technol Inform 287, 73–77 (2021).
Home | re3data.org. https://www.re3data.org/.
Ecrin | Facilitating European Clinical Research. https://ecrin.org/.
Welcome to the Data Documentation Initiative | Data Documentation Initiative. https://ddialliance.org/.
Vorisek, C. N. et al. Evaluating Suitability of SNOMED CT in Structured Searches for COVID-19 Studies. Stud Health Technol Inform 281, 88–92 (2021).
Vorisek, C. N. et al. Implementing SNOMED CT in Open Software Solutions to Enhance the Findability of COVID-19 Questionnaires. Stud Health Technol Inform 294, 649–653 (2022).
Doiron, D., Marcon, Y., Fortier, I., Burton, P. & Ferretti, V. Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination. Int J Epidemiol 46, 1372–1378 (2017).
de Mello, B. H. et al. Semantic interoperability in health records standards: a systematic literature review. Health Technol (Berl) 12, 255 (2022).
International Standards for Clinical Trial Registries. (2018).
NFDI4Health Metadata Schema - SIMPLIFIER.NET. https://simplifier.net/NFDI4Health-Metadata-Schema/~introduction.
ART-DECOR®. https://art-decor.org/ad/#/nfdhtfcov19-/project/overview.
Bernal-Delgado, E., Estupiñán-Romero, F. & Launa-Garces, R. Identification of relevant standards and data models for semantic harmonization. (2021).
Medical Informatics Initiative | Medical Informatics Initiative. https://www.medizininformatik-initiative.de/en/start.
Gesundheitsdaten: FHIR wird europaweiter Standard. https://www.aerzteblatt.de/nachrichten/142159/Gesundheitsdaten-FHIR-wird-europaweiter-Standard.
R: The R Project for Statistical Computing. https://www.r-project.org.
ISO/TR 12300:2014 - Health informatics — Principles of mapping between terminological systems. https://www.iso.org/standard/51344.html.
ISO 13119:2022 - Health informatics — Clinical knowledge resources — Metadata. https://www.iso.org/standard/78392.html.
ISO 13940:2015 - Health informatics — System of concepts to support continuity of care. https://www.iso.org/standard/58102.html.
ISO 13972:2022 - Health informatics — Clinical information models — Characteristics, structures and requirements. https://www.iso.org/standard/79498.html.
ISO 14199:2015 - Health informatics — Information models — Biomedical Research Integrated Domain Group (BRIDG) Model. https://www.iso.org/standard/66767.html.
ISO 16278:2016 - Health informatics — Categorial structure for terminological systems of human anatomy. https://www.iso.org/standard/56047.html.
ISO/TS 21526:2019 - Health informatics — Metadata repository requirements (MetaRep). https://www.iso.org/standard/71041.html.
ISO/TS 21564:2019 - Health Informatics — Terminology resource map quality measures (MapQual). https://www.iso.org/standard/71088.html.
ISO/HL7 21731:2014 - Health informatics — HL7 version 3 — Reference information model — Release 4. https://www.iso.org/standard/61454.html.
ISO 27269:2021 - Health informatics — International patient summary. https://www.iso.org/standard/79491.html.
ISO/HL7 27931:2009 - Data Exchange Standards — Health Level Seven Version 2.5 — An application protocol for electronic data exchange in healthcare environments. https://www.iso.org/standard/44428.html.
ISO/HL7 27932:2009 - Data Exchange Standards — HL7 Clinical Document Architecture, Release 2. https://www.iso.org/standard/44429.html.
ISO/HL7 27951:2009 - Health informatics — Common terminology services, release 1. https://www.iso.org/standard/44437.html.
ISO/TS 23494-1:2023 - Biotechnology — Provenance information model for biological material and data — Part 1: Design concepts and general requirements. https://www.iso.org/standard/80715.html.
ISO/TR 3985:2021 - Biotechnology — Data publication — Preliminary considerations and concepts. https://www.iso.org/standard/79690.html.
ISO/CD TS 6201 - Health Informatics — Personalized Digital Health -Framework. https://www.iso.org/standard/82107.html.
ISO/PWI TS 6203. https://iss.rs/en/project/show/iso:proj:82109.
ISO/TR 11147:2023 - Health informatics — Personalized digital health — Digital therapeutics health software systems. https://www.iso.org/standard/83767.html.
ISO/CD 9472-10000 - Health informatics — Personalized health navigation — Part 10000: Architecture. https://www.iso.org/standard/83497.html.
ISO/AWI TR 24305 - Health informatics - Guidelines for implementation of HL7/FHIR based on ISO 13940 and ISO 13606. https://www.iso.org/standard/78390.html.
ISO 29585 - 2023-06 - Beuth.de. https://www.beuth.de/de/norm/iso-29585/370015864.
ISO 10781:2023 - Health informatics — HL7 Electronic Health Record-System Functional Model, Release 2.1 (EHR FM). https://www.iso.org/standard/84722.html.
ISO 4454:2022 - Genomics informatics — Phenopackets: A format for phenotypic data exchange. https://www.iso.org/standard/79991.html.
Acknowledgements
This work was done as part of the NFDI4Health Consortium (www.nfdi4health.de). We gratefully acknowledge the financial support of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 442326535.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
S.A.I.K. and C.N.V. performed the literature search and the analysis of possible standards, participated in and conducted workshops, interviews and meetings, interpreted the data, and wrote and revised the manuscript. S.T. contributed to the literature search and analysis and revised the manuscript. P.J.M. supported the interpretation of the data and the writing of the manuscript. C.O.S. and M.G. collaborated on the development of the MDS and initial standards comparisons. All authors provided feedback on the manuscript.
Ethics declarations
Competing interests
S.T. is chair of HL7 Deutschland e.V. M.G. is convenor of ISO/TC 276/WG 5. The remaining authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Vorisek, C.N., Klopfenstein, S.A.I., Löbe, M. et al. Towards an Interoperability Landscape for a National Research Data Infrastructure for Personal Health Data. Sci Data 11, 772 (2024). https://doi.org/10.1038/s41597-024-03615-3
DOI: https://doi.org/10.1038/s41597-024-03615-3
- Springer Nature Limited