Abstract
Research is a tertiary priority in the EHR, behind patient care and billing. As a result, EHR data are not standardized or formatted in a way that is easily adapted to machine learning approaches. Data may be missing for many reasons, ranging from individual data-entry styles to differences in clinical decision making, for example, which laboratory tests a clinician orders. Few patients are annotated at research quality, which limits sample size and creates a moving gold standard. Patient progression over time is key to understanding many diseases, yet many machine learning algorithms require a snapshot at a single time point to construct a usable feature vector. Furthermore, algorithms that produce black-box results do not provide the interpretability required for clinical adoption. This chapter discusses these and other challenges in applying machine learning techniques to the structured EHR (e.g., patient demographics, family history, medication information, vital signs, laboratory tests, and genetic testing). It does not cover feature extraction from additional sources such as imaging data or free-text patient notes, but the approaches discussed can incorporate features extracted from these sources.
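The tension between longitudinal records, missing data, and algorithms that expect one fixed-length vector per patient can be illustrated with a minimal sketch. It assumes each patient's structured record is a list of (feature, time, value) tuples; the function name `snapshot_vector`, the last-observation-carried-forward rule, and the value-plus-missingness-flag encoding are hypothetical illustrative choices, not the chapter's method.

```python
from typing import Dict, List, Sequence, Tuple

# One measurement: (feature name, time in days from a reference point, value).
# This tuple layout is an assumption made for illustration only.
Measurement = Tuple[str, float, float]


def snapshot_vector(
    measurements: Sequence[Measurement],
    features: Sequence[str],
    index_time: float,
    fill_value: float = 0.0,
) -> List[float]:
    """Collapse one patient's longitudinal record into a fixed-length vector.

    For each requested feature, keep the most recent value observed at or
    before index_time (last observation carried forward). Each feature
    contributes two entries: the value (or fill_value if it was never
    observed) and a 0/1 flag marking whether it was missing, so that the
    fact of missingness remains visible to a downstream model.
    """
    wanted = set(features)
    latest: Dict[str, Tuple[float, float]] = {}  # feature -> (time, value)
    for name, time, value in measurements:
        if time > index_time or name not in wanted:
            continue  # skip measurements after the snapshot and unrequested features
        if name not in latest or time >= latest[name][0]:
            latest[name] = (time, value)

    vector: List[float] = []
    for name in features:
        if name in latest:
            vector.extend([latest[name][1], 0.0])  # observed value, not missing
        else:
            vector.extend([fill_value, 1.0])  # placeholder value, flagged missing
    return vector


records = [
    ("hba1c", 10, 7.2),
    ("hba1c", 200, 8.1),
    ("glucose", 150, 130.0),
    ("ldl", 400, 110.0),
]
print(snapshot_vector(records, ["hba1c", "glucose", "ldl", "bmi"], index_time=365))
# → [8.1, 0.0, 130.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

Calling the function at two different index times for the same patient produces two different vectors; the earlier HbA1c value and the ordering of events are discarded, which is the loss of progression information that a single snapshot entails.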
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Beaulieu-Jones, B. (2018). Machine Learning for Structured Clinical Data. In: Holmes, D., Jain, L. (eds) Advances in Biomedical Informatics. Intelligent Systems Reference Library, vol 137. Springer, Cham. https://doi.org/10.1007/978-3-319-67513-8_3
DOI: https://doi.org/10.1007/978-3-319-67513-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67512-1
Online ISBN: 978-3-319-67513-8
eBook Packages: Engineering, Engineering (R0)