A Giant with Feet of Clay: On the Validity of the Data that Feed Machine Learning in Medicine

  • Conference paper
Organizing for the Digital World

Abstract

This paper considers the use of machine learning in medicine by focusing on the main problem that it has been aimed at solving, or at least minimizing: uncertainty. However, we point out that uncertainty is so ingrained in medicine that it also biases the representation of clinical phenomena, that is, the very input of this class of computational models, thus undermining the clinical significance of their output. Recognizing this can motivate researchers to pursue different ways to assess the value of these decision aids, as well as alternative techniques that do not “sweep uncertainty under the rug” within an objectivist fiction (which doctors can end up trusting).

An extended version of this paper can be found on the arXiv platform at the following address: https://arxiv.org/abs/1706.06838.

Notes

  1. This is a vague term: here we mean data quality mainly in terms of accuracy and validity.

  2. In what follows we introduce the concept of ML predictive model with reference to supervised discriminative (or classification) models, by far the most frequently used in medicine.

  3. While biases are, strictly speaking, mental prejudices, idiosyncratic perceptions and cognitive behaviors producing either an impairing or a distorting effect, here we rather intend the effect (by metonymy), that is, the “error” in the data recorded and the decisions taken that is caused by the bias itself.

  4. Moreover, Burnum traced this lie of the land back to “standards of care and a reimbursement system [that is] blind to biologic diversity”.

References

  1. Ahmad, F.S., Chan, C., Rosenman, M.B., Post, W.S., Fort, D.G., Greenland, P., Liu, K.J., Kho, A., Allen, N.B.: Validity of cardiovascular data from electronic sources: the multi-ethnic study of atherosclerosis and HealthLNK. Circulation 117 (2017)

  2. Althubaiti, A.: Information bias in health research: definition, pitfalls, and adjustment methods. J. Multidiscip. Healthc. 9, 211 (2016)

  3. Andrews, J.E., Richesson, R.L., Krischer, J.: Variation of SNOMED CT coding of clinical research concepts among coding experts. J. Am. Med. Inf. Assoc. 14(4), 497–506 (2007)

  4. Bachmann, L.M., Jüni, P., Reichenbach, S., Ziswiler, H.R., Kessels, A.G., Vögelin, E.: Consequences of different diagnostic gold standards in test accuracy research: Carpal tunnel syndrome as an example. Int. J. Epidemiol. 34(4), 953–955 (2005)

  5. Bello, R., Falcon, R.: Rough Sets in Machine Learning: a review, pp. 87–118. Springer International Publishing, Cham (2017)

  6. Bowker, G.C., Star, S.L.: Sorting Things Out: classification and its consequences. MIT press (2000)

  7. Braun, R., Gutkowicz-Krusin, D., Rabinovitz, H., Cognetta, A., Hofmann-Wellenhof, R., Ahlgrimm-Siess, V., Polsky, D., Oliviero, M., Kolm, I., Googe, P., et al.: Agreement of dermatopathologists in the evaluation of clinically difficult melanocytic lesions: how golden is the gold standard? Dermatology 224(1), 51–58 (2012)

  8. Burnum, J.F.: The misinformation era: the fall of the medical record. Ann. Int. Med. 110(6), 482–484 (1989)

  9. Cabitza, F., Batini, C.: Information quality in healthcare. In: Data and Information Quality, Chap. 13, pp. 421–438. Springer (2016)

  10. Cabitza, F., Ciucci, D., Locoro, A.: Exploiting collective knowledge with three-way decision theory: cases from the questionnaire-based research. Int. J. Approx. Reason. 83, 356–370 (2017)

  11. Cabitza, F., Rasoini, R., Gensini, G.F.: Unintended consequences of machine learning in medicine. JAMA 318(6), 517–518 (2017)

  12. Cappelletti, P.: Appropriateness of diagnostics tests. Int. J. Lab. Hematol. 38(S1), 91–99 (2016)

  13. Carey, I., Nightingale, C., DeWilde, S., Harris, T., Whincup, P., Cook, D.: Blood pressure recording bias during a period when the quality and outcomes framework was introduced. J. Hum. Hypertens. 23(11), 764 (2009)

  14. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., Elhadad, N.: Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1721–1730. ACM (2015)

  15. Denœux, T., Kanjanatarakul, O.: Evidential Clustering: a review, pp. 24–35 (2016)

  16. Dharmarajan, K., Strait, K.M., Tinetti, M.E., Lagu, T., Lindenauer, P.K., Lynn, J., Krukas, M.R., Ernst, F.R., Li, S.X., Krumholz, H.M.: Treatment for multiple acute cardiopulmonary conditions in older adults hospitalized with pneumonia, chronic obstructive pulmonary disease, or heart failure. J. Am. Geriatr. Soc. 64(8), 1574–1582 (2016)

  17. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)

  18. Elliott, J.H., Grimshaw, J., Altman, R., Bero, L., Goodman, S.N., Henry, D., Macleod, M., Tovey, D., Tugwell, P., White, H., et al.: Informatics: make sense of health data. Nature 527, 31–32 (2015)

  19. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (2017)

  20. Fox, R.C.: Medical uncertainty revisited. Handb. Soc. Stud. Health Med. 409–425 (2000)

  21. Graham, B.: The diagnosis and treatment of carpal tunnel syndrome: surgery, whether open or closed, works, but only if the diagnosis is right. BMJ. Br. Med. J. 332(7556), 1463 (2006)

  22. Grzymala-Busse, J.W., Grzymala-Busse, W.J.: Handling missing attribute values. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 33–51. Springer, US, Boston, MA (2010)

  23. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)

  24. Gwet, K.: Handbook of inter-rater reliability. STATAXIS Publishing Company (2001)

  25. Haouari, B., Amor, N.B., Elouedi, Z., Mellouli, K.: Naïve possibilistic network classifiers. Fuzzy Sets Syst. 160(22), 3224–3238 (2009)

  26. Hathaway, R.J., Bezdek, J.C.: Fuzzy c-means clustering of incomplete data. IEEE Trans. Syst. Man Cybernet. 31(5), 735–744 (2001)

  27. Hayes, S.: Terminal digit preference occurs in pathology reporting irrespective of patient management implication. J. Clin. Pathol. 61(9), 1071–1072 (2008)

  28. Hemkens, L.G., Contopoulos-Ioannidis, D.G., Ioannidis, J.P.: Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ 352, i493 (2016)

  29. Hüllermeier, E.: Possibilistic instance-based learning. Artif. Intell. 148(1–2), 335–383 (2003)

  30. Hüllermeier, E.: Fuzzy sets in machine learning and data mining. Appl. Soft Comput. 11(2), 1493–1505 (2011)

  31. Hüllermeier, E.: Does machine learning need fuzzy logic? Fuzzy Sets Syst. 281, 292–299 (2015)

  32. Jha, S., Topol, E.J.: Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 316(22), 2353–2354 (2016)

  33. Katz, J.: The silent world of doctor and patient. JHU Press (2002)

  34. Krippendorff, K.: Content analysis: an introduction to its methodology. Sage (2012)

  35. Lodwick, W.A.: Fundamentals of interval analysis and linkages to fuzzy set theory, pp. 55–79. Wiley (2008)

  36. Maravalle, M., Ricca, F., Simeone, B., Spinelli, V.: Carpal tunnel syndrome automatic classification: electromyography vs. ultrasound imaging. TOP 23(1), 100–123 (2015)

  37. Mitchell, T.M.: Machine learning. Burr Ridge, IL: McGraw Hill 45(37), 870–877 (1997)

  38. Obermeyer, Z., Emanuel, E.J.: Predicting the future: big data, machine learning, and clinical medicine. New Engl. J. Med. 375(13), 1216 (2016)

  39. Parasuraman, R., Manzey, D.H.: Complacency and bias in human use of automation: an attentional integration. Hum. Factors J. Hum. Factors Ergon. Soc. 52(3), 381–410 (2010)

  40. Parsons, S.: Qualitative Approaches for Reasoning Under Uncertainty. The MIT Press, Cambridge, Massachusetts (2001)

  41. Paxton, C., Niculescu-Mizil, A., Saria, S.: Developing predictive models using electronic medical records: challenges and pitfalls. In: AMIA Annual Symposium Proceedings. vol. 2013, p. 1109. American Medical Informatics Association (2013)

  42. Pivert, O., Prade, H.: A certainty-based model for uncertain databases. IEEE Trans. Fuzzy Syst. 23(4), 1181–1196 (2015)

  43. Centers for Disease Control and Prevention, et al.: ICD-9-CM official guidelines for coding and reporting. Technical Report, Centers for Medicare & Medicaid Services, Atlanta, GA, USA (2011)

  44. Reiser, S.J., Anbar, M.: The Machine at the Bedside: strategies for using technology in patient care. Cambridge University Press (1984)

  45. Reiser, S.J.: The clinical record in medicine Part 2: Reforming content and purpose. Ann. Intern. Med. 114(11), 980–985 (1991)

  46. Ruamviboonsuk, P., Teerasuwanajak, K., Tiensuwan, M., Yuttitham, K., for Diabetic Retinopathy Study Group, T.S., et al.: Interobserver agreement in the interpretation of single-field digital fundus images for diabetic retinopathy screening. Ophthalmology 113(5), 826–832 (2006)

  47. Shafiq, A., Arnold, S.V., Gosch, K., Kureshi, F., Breeding, T., Jones, P.G., Beltrame, J., Spertus, J.A.: Patient and physician discordance in reporting symptoms of angina among stable coronary artery disease patients: Insights from the angina prevalence and provider evaluation of angina relief (appear) study. Am. Heart J. 175, 94–100 (2016)

  48. Shortliffe, E.H., Buchanan, B.G.: A model of inexact reasoning in medicine. Math. Biosci. 23(3–4), 351–379 (1975)

  49. Simpkin, A.L., Schwartzstein, R.M.: Tolerating uncertainty: the next medical revolution? New Engl. J. Med. 375(18), 1713–1715 (2016)

  50. Spodick, D.H., Bishop, R.L.: Computer treason: intraobserver variability of an electrocardiographic computer system. Am. J. Cardiol. 80(1), 102–103 (1997)

  51. Svensson, C.M., Hubler, R., Figge, M.T.: Automated classification of circulating tumor cells and the impact of interobsever variability on classifier training and performance. J. Immunol. Res. 2015 (2015)

  52. Timmermans, S., Berg, M.: The Gold Standard: the challenge of evidence-based medicine and standardization in health care. Temple University Press (2010)

  53. Tsumoto, S.: Medical diagnosis: rough set view. In: Thriving Rough Sets, pp. 139–156. Springer (2017)

  54. van der Lei, J., et al.: Use and abuse of computer-stored medical records. Methods Archive 30, 79–80 (1991)

  55. Van Driest, S.L., Wells, Q.S., Stallings, S., Bush, W.S., Gordon, A., Nickerson, D.A., Kim, J.H., Crosslin, D.R., Jarvik, G.P., Carrell, D.S., et al.: Association of arrhythmia-related genetic variants with phenotypes documented in electronic medical records. JAMA 315(1), 47–57 (2016)

  56. Veress, B., Gadaleanu, V., Nennesmo, I., Wikström, B.: The reliability of autopsy diagnostics: inter-observer variation between pathologists, a preliminary report. Int. J. Qual Health Care 5(4), 333–337 (1993)

  57. Vetterlein, T., Mandl, H., Adlassnig, K.P.: Fuzzy arden syntax: a fuzzy programming language for medicine. Artif. Intell. Med. 49(1), 1–10 (2010)

  58. Wang, Y.T., Tadarati, M., Wolfson, Y., Bressler, S.B., Bressler, N.M.: Comparison of prevalence of diabetic macular edema based on monocular fundus photography vs optical coherence tomography. JAMA Ophthalmol. 134(2), 222–228 (2016)

  59. Wong, T.Y., Bressler, N.M.: Artificial intelligence with deep learning technology looks into diabetic retinopathy screening. JAMA 316(22), 2366–2367 (2016)

Author information

Corresponding author

Correspondence to Davide Ciucci.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Cabitza, F., Ciucci, D., Rasoini, R. (2019). A Giant with Feet of Clay: On the Validity of the Data that Feed Machine Learning in Medicine. In: Cabitza, F., Batini, C., Magni, M. (eds) Organizing for the Digital World. Lecture Notes in Information Systems and Organisation, vol 28. Springer, Cham. https://doi.org/10.1007/978-3-319-90503-7_10
