AIM in Medical Informatics

Abstract

Providing accurate diagnoses of diseases and maximizing the effectiveness of treatments generally require complex analyses of large amounts of clinical, omics, and pathological data. Making fruitful use of such data is not straightforward, as they are usually stored in electronic health records (EHRs) that must be properly handled and processed before they can support medical diagnosis. In recent years, machine learning and deep learning techniques have emerged as powerful tools for detecting and classifying specific diseases from EHR data, thus providing significant clinical decision support. However, approaches based on such techniques lack proper means for interpreting the choices made by the models, especially in the case of deep learning. In this chapter, we describe clinical and omics data along with the common processing operations performed to improve medical analyses. We then present the most commonly used algorithms in automatic medical diagnosis and the advances in the explainability of machine learning-based systems that help validate healthcare decision-making.

Keywords

  • Medical diagnosis
  • Health care
  • Clinical data
  • Electronic health records
  • Omics data
  • Machine learning
  • Deep learning
  • Explainability

Author information

Correspondence to Pierangela Bruno.

Copyright information

© 2022 Springer Nature Switzerland AG

Cite this entry

Bruno, P., Calimeri, F., Greco, G. (2022). AIM in Medical Informatics. In: Lidströmer, N., Ashrafian, H. (eds) Artificial Intelligence in Medicine. Springer, Cham. https://doi.org/10.1007/978-3-030-64573-1_32

  • DOI: https://doi.org/10.1007/978-3-030-64573-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64572-4

  • Online ISBN: 978-3-030-64573-1

  • eBook Packages: Medicine, Reference Module Medicine