Machine Learning for Structured Clinical Data

Beaulieu-Jones, Brett

doi:10.1007/978-3-319-67513-8_3

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 137)

Abstract

Research is a tertiary priority in the electronic health record (EHR), where the primary priorities are patient care and billing. As a result, the data are not standardized or formatted in a way that adapts easily to machine learning approaches. Data may be missing for many reasons, ranging from individual input styles to differences in clinical decision making, such as which lab tests to order. Few patients are annotated at research quality, which limits sample size and presents a moving gold standard. Patient progression over time is key to understanding many diseases, but many machine learning algorithms require a snapshot at a single time point to create a usable vector form. Furthermore, algorithms that produce black-box results do not provide the interpretability required for clinical adoption. This chapter discusses these and other challenges in applying machine learning techniques to the structured EHR (i.e., patient demographics, family history, medication information, vital signs, laboratory tests, and genetic testing). It does not cover feature extraction from additional sources such as imaging data or free-text patient notes, but the approaches discussed can include features extracted from these sources.
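To make the snapshot and missing-data points concrete, the following is a minimal sketch, not the chapter's method. It assumes a hypothetical record layout of (patient, measurement, date, value) tuples and collapses each patient's history up to a chosen date into a fixed-length vector, pairing every feature with an explicit missingness flag. The names `RECORDS`, `FEATURES`, and `snapshot_vector`, the example values, and the placeholder of 0.0 for unobserved features are all illustrative assumptions.

```python
from datetime import date

# Hypothetical longitudinal records: (patient_id, measurement, date, value).
RECORDS = [
    ("p1", "hba1c", date(2015, 1, 10), 6.1),
    ("p1", "hba1c", date(2016, 3, 2), 7.4),
    ("p1", "sbp", date(2016, 2, 1), 142.0),
    ("p2", "sbp", date(2015, 6, 5), 118.0),
    ("p2", "sbp", date(2017, 1, 9), 121.0),  # recorded after the snapshot date
]

# Assumed fixed feature order shared by every patient vector.
FEATURES = ["hba1c", "sbp"]


def snapshot_vector(records, patient_id, as_of, features=FEATURES):
    """Collapse one patient's history up to `as_of` into a flat vector.

    Each feature contributes two entries: the most recent observed value
    and a flag that is 1.0 when the feature was never observed. The 0.0
    placeholder for unobserved values is an arbitrary assumption.
    """
    vector = []
    for name in features:
        history = [
            (when, value)
            for pid, measurement, when, value in records
            if pid == patient_id and measurement == name and when <= as_of
        ]
        if history:
            # Keep only the latest observation; earlier ones are discarded.
            _, latest_value = max(history, key=lambda item: item[0])
            vector.extend([latest_value, 0.0])
        else:
            vector.extend([0.0, 1.0])
    return vector


if __name__ == "__main__":
    cutoff = date(2016, 12, 31)
    for pid in ("p1", "p2"):
        print(pid, snapshot_vector(RECORDS, pid, cutoff))
```

Keeping only the latest value is the simplest possible choice, and it discards exactly the progression over time that the abstract describes as key to understanding many diseases. The missingness flag is kept separate from the value because the absence of a lab test can itself reflect a clinical decision.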

Author information

Authors and Affiliations

Institute of Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, D200 Richards Hall, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
Brett Beaulieu-Jones

Corresponding author

Correspondence to Brett Beaulieu-Jones.

Editor information

Editors and Affiliations

Dept. of Statistics & Applied Probability, University of California Santa Barbara, Santa Barbara, California, USA
Dawn E. Holmes
KES International, Adelaide, South Australia, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Beaulieu-Jones, B. (2018). Machine Learning for Structured Clinical Data. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_3

DOI: https://doi.org/10.1007/978-3-319-67513-8_3
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67512-1
Online ISBN: 978-3-319-67513-8
eBook Packages: Engineering, Engineering (R0)