Ontology-Based Data Preparation in Healthcare: The Case of the AMD-STITCH Project

  • Original Research
SN Computer Science

Abstract

In healthcare, an AI solution is generally developed for a specific analysis task on a relevant dataset, with little attention to the reusability and generalizability of its data preparation step. This paper focuses on a different scenario, which can be called context-oriented: a set of clinical data sources relevant to a specific context (e.g., a particular disease) is available and can be used for a variety of data analytics tasks, often carried out by different research groups. The aim of this research is to present a systematic method that exploits the Ontology-based Data Management paradigm to enhance data preparation in such a scenario. The methodology has been applied to a big data project on the treatment of diabetes and its complications. The peculiarity and challenge of this project lie in the fact that it deals with real-world data, extracted from Electronic Medical Records over a 13-year timeframe and thus not collected for research purposes. The paper focuses on two main steps of data preparation, namely data modeling and data cleaning, and shows how this approach provides effective techniques for setting up a unified and shared database that serves as an asset in the subsequent data analytics phases.

Data availability

The data that support the findings of this study are property of the AMD Foundation. Restrictions apply to the availability of these data.

Notes

  1. We clarify that not all the 320 centres constituting the AMD network included their data in the latest AMD annals.

Acknowledgements

This work has been partially supported by MUR under the PRIN 2017 project “HOPE” (prot. 2017MMJJRE), by the EU under the H2020-EU.2.1.1 project TAILOR, grant id. 952215, and by the projects FAIR (PE0000013) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU. The authors would like to thank the Associazione Medici Diabetologi (AMD), Fondazione AMD and all the scientists involved in the STITCH-AMD initiative for supporting this work. This work would not have been possible without the precious efforts of Dr. Sebastiano Filetti, the expertise of Dr. Antonio Nicolucci and Dr. Giuseppe Lucisano (CORESEARCH S.r.l.), and all the patients who have been cared for over the years in the AMD centers.

Funding

This article is funded by Ministero dell'Università e della Ricerca (2017MMJJRE, PE0000013, PE00000014), H2020 Industrial Leadership (952215).

Author information

Corresponding author

Correspondence to Federico Croce.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Digital Healthcare and Wellbeing” guest edited by Achilleas Achilleos, George A. Papadopoulos, Edwige Pissaloux and Ramiro Velazquez.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Croce, F., Valentini, R., Maranghi, M. et al. Ontology-Based Data Preparation in Healthcare: The Case of the AMD-STITCH Project. SN COMPUT. SCI. 5, 437 (2024). https://doi.org/10.1007/s42979-024-02757-w

  • DOI: https://doi.org/10.1007/s42979-024-02757-w
