Machine Learning for Structured Clinical Data

  • Chapter
Advances in Biomedical Informatics

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 137)

Abstract

Research is a tertiary priority in the electronic health record (EHR), where the primary priorities are patient care and billing. Because of this, the data are not standardized or formatted in a manner easily adapted to machine learning approaches. Data may be missing for a wide variety of reasons, ranging from individual input styles to differences in clinical decision making, for example, which lab tests to order. Few patients are annotated at research quality, which limits sample size and presents a moving gold standard. Patient progression over time is key to understanding many diseases, but many machine learning algorithms require a snapshot at a single time point to create a usable vector form. Furthermore, algorithms that produce black-box results do not provide the interpretability required for clinical adoption. This chapter discusses these challenges and others in applying machine learning techniques to the structured EHR (i.e., patient demographics, family history, medication information, vital signs, laboratory tests, and genetic testing). It does not cover feature extraction from additional sources such as imaging data or free-text patient notes, but the approaches discussed can include features extracted from these sources.
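
To make the snapshot and missing-data points concrete, the following is a minimal sketch (not the chapter's method) of collapsing longitudinal lab results into a fixed-length vector at a chosen time point. The record layout, the feature names, the `snapshot` function, and the choice of "latest value plus a missingness flag" are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical longitudinal records: (patient_id, test_name, date, value).
records = [
    ("p1", "hba1c", date(2015, 1, 10), 6.1),
    ("p1", "hba1c", date(2016, 3, 2), 7.4),
    ("p1", "ldl", date(2015, 6, 5), 130.0),
    ("p2", "hba1c", date(2016, 2, 1), 5.6),
]
FEATURES = ["hba1c", "ldl"]  # assumed feature order for the output vector


def snapshot(records, features, as_of):
    """Return {patient_id: vector} using only records dated on or before as_of.

    Each feature contributes two entries: the latest observed value and a
    missingness flag (1.0 if the test was never recorded by as_of).
    """
    latest = defaultdict(dict)
    for pid, test, when, value in records:
        if when > as_of:
            continue  # ignore anything after the snapshot date
        prev = latest[pid].get(test)
        if prev is None or when > prev[0]:
            latest[pid][test] = (when, value)

    vectors = {}
    for pid, tests in latest.items():
        vec = []
        for name in features:
            if name in tests:
                vec += [tests[name][1], 0.0]
            else:
                vec += [0.0, 1.0]  # placeholder value plus "missing" flag
        vectors[pid] = vec
    return vectors


vectors = snapshot(records, FEATURES, as_of=date(2016, 12, 31))
```

Keeping an explicit missingness flag, rather than only imputing a value, preserves the fact that a test was not ordered, which can itself reflect a clinical decision. The sketch discards the trajectory before the snapshot date, which is exactly the loss of temporal information the abstract describes.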

Author information

Correspondence to Brett Beaulieu-Jones.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Beaulieu-Jones, B. (2018). Machine Learning for Structured Clinical Data. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_3

  • DOI: https://doi.org/10.1007/978-3-319-67513-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67512-1

  • Online ISBN: 978-3-319-67513-8

  • eBook Packages: Engineering, Engineering (R0)
