Abstract

Data harmonisation is an essential step in federated research, which often involves heterogeneous data sources. Standardising the structure and terminology of the source data allows a standardised study protocol and analysis code to be applied. A Common Data Model (CDM), accompanied by standardised software, supports standardised federated analytics. In this chapter we demonstrate the benefits of Common Data Models, and of the OMOP CDM in particular. We also introduce a general pipeline for an Extract, Transform, Load (ETL) process that transforms health data to the OMOP CDM, and provide an overview of the supporting tooling that ensures a high-quality conversion. Finally, we discuss potential challenges of the harmonisation process and how to address them.

Notes

  1. The Pareto principle, also known as the 80/20 principle. Applied to the mapping problem, it says that mapping the 20% most frequently used source terms will correctly map about 80% of all records. Conversely, mapping the last 20% of source terms takes approximately 80% of the time.
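
The coverage claim in this note can be checked on any source vocabulary by ranking terms by frequency and summing the records they account for. The following is a minimal sketch in Python, not taken from the chapter: the function name `coverage_by_top_terms` and the term counts are hypothetical, and the counts are invented so that the 80/20 split comes out exactly.

```python
from collections import Counter


def coverage_by_top_terms(term_counts, fraction_of_terms):
    """Share of records covered when only the most frequent
    `fraction_of_terms` of distinct source terms are mapped."""
    counts = sorted(term_counts.values(), reverse=True)
    if not counts:
        raise ValueError("term_counts must not be empty")
    # Map at least one term, even for very small fractions.
    n_mapped = max(1, round(len(counts) * fraction_of_terms))
    return sum(counts[:n_mapped]) / sum(counts)


# Hypothetical record counts per source term (illustrative only).
term_counts = Counter({
    "hypertension": 500,
    "type 2 diabetes": 300,
    "asthma": 60,
    "migraine": 40,
    "gout": 30,
    "psoriasis": 25,
    "sciatica": 20,
    "vertigo": 15,
    "tinnitus": 6,
    "chilblains": 4,
})

share = coverage_by_top_terms(term_counts, 0.2)
print(f"Mapping the top 20% of terms covers {share:.0%} of records")
```

Real source data will rarely split exactly 80/20, so a calculation like this is best used to decide where to stop manual mapping rather than to confirm the rule.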

Author information

Corresponding author

Correspondence to Vaclav Papez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Moinat, M., Papez, V., Denaxas, S. (2023). Data Integration and Harmonisation. In: Asselbergs, F.W., Denaxas, S., Oberski, D.L., Moore, J.H. (eds) Clinical Applications of Artificial Intelligence in Real-World Data. Springer, Cham. https://doi.org/10.1007/978-3-031-36678-9_4

  • DOI: https://doi.org/10.1007/978-3-031-36678-9_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36677-2

  • Online ISBN: 978-3-031-36678-9

  • eBook Packages: Medicine, Medicine (R0)
