Abstract
The value and future potential of AI in healthcare are becoming self-evident, supported by a growing body of evidence. However, adoption into clinical practice is still significantly hindered by a lack of transparency, stemming from inadequate focus on human-comprehensible information from AI, i.e. interpretable AI. AI interpretations of uncertainty, significance, and causality translate to fairer, safer, and more reliable AI. This is especially pertinent to safer clinical decision making and to minimising risk to the patient. In this chapter we aim to elucidate what interpretability means and why most machine learning (i.e. AI) models fail to satisfy these definitions. We lay this phenomenon out through what we believe to be its canonical components: predictions, uncertainty, significance, and causality, explaining how these different types of interpretation can support various explanations and how overcoming this barrier can permit the adoption of AI into healthcare.
Notes
- 1. We abuse terminology here as the details are esoteric and a little beside the point; please refer to MacDonald [24] for a more accessible account of uncertainty in deep learning.
- 2. See Davis et al. [14] for a practical prescriptive guide on how to estimate uncertainties in deep learning.
- 3. We refer the interested reader to Peters et al. [27] for a comprehensive review of causal inference, including methods for estimating causal relationships.
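The notes above point to ensemble-based uncertainty estimation for deep learning (see also Lakshminarayanan et al. [31]). As a minimal illustrative sketch (not from the chapter), the core idea can be mimicked with a bootstrap ensemble of simple regressors: each member is fit on a resampled dataset, and the spread of the members' predictions serves as a proxy for epistemic uncertainty, which should grow far from the training data. The toy data and helper names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: y = 2x + noise, with x in [-1, 1].
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

def fit_linear(X, y):
    """Least-squares fit with a bias term; returns weights [slope, intercept]."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Ensemble: refit on bootstrap resamples; member disagreement approximates
# epistemic uncertainty (the deep-ensemble idea, with linear fits standing in
# for neural networks).
M = 20
weights = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))
    weights.append(fit_linear(X[idx], y[idx]))

def predict(x):
    """Ensemble mean prediction and standard deviation at a scalar input x."""
    A = np.array([[x, 1.0]])
    preds = np.array([A @ w for w in weights]).ravel()
    return preds.mean(), preds.std()

mean_in, std_in = predict(0.0)     # inside the training range
mean_out, std_out = predict(10.0)  # far outside the training range
```

Because the members disagree more where data is absent, `std_out` exceeds `std_in`: the ensemble is (appropriately) less certain when extrapolating, which is the property a clinical decision-support system would use to flag inputs it should defer on.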
References
Alaa AM, van der Schaar M (2017) Bayesian inference of individualized treatment effects using multi-task Gaussian processes. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Beaulieu-Jones BK, Finlayson SG, Yuan W, Altman RB, Kohane IS, Prasad V, Yu K-H (2020) Examining the use of real-world evidence in the regulatory process. Clin Pharmacol Ther 107(4):843–852. https://doi.org/10.1002/cpt.1658
Begoli E, Bhattacharya T, Kusnezov D (2019) The need for uncertainty quantification in machine-assisted medical decision making. Nat Mach Intell 1(1):20–23. https://doi.org/10.1038/s42256-018-0004-1
Bica I, Jordon J, van der Schaar M (2020) Estimating the effects of continuous-valued interventions using generative adversarial networks. ArXiv:2002.12326 [Cs, Stat]. http://arxiv.org/abs/2002.12326
Bica I, Alaa AM, Lambert C, van der Schaar M (2021) From real-world patient data to individualized treatment effects using machine learning: current and future methods to address underlying challenges. Clin Pharmacol Ther 109(1):87–100. https://doi.org/10.1002/cpt.1907
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C et al (2020) Language models are few-shot learners. ArXiv:2005.14165 [Cs]. http://arxiv.org/abs/2005.14165
Ovadia Y, Fertig E, Ren J, Nado Z, Sculley D, Nowozin S, Dillon JV, Lakshminarayanan B, Snoek J (2019) Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
Chen J, Song L, Wainwright MJ, Jordan MI (2018) Learning to explain: an information-theoretic perspective on model interpretation. arXiv. https://arxiv.org/abs/1802.07814v2
Chen P, Dong W, Lu X, Kaymak U, He K, Huang Z (2019) Deep representation learning for individualized treatment effect estimation using electronic health records. J Biomed Inform 100:103303. https://doi.org/10.1016/j.jbi.2019.103303
Couzin-Frankel J (2019) Medicine contends with how to use artificial intelligence. Science 364(6446):1119–1120. https://doi.org/10.1126/science.2019.6446.364_1119
Fort S, Hu H, Lakshminarayanan B (2019) Deep ensembles: a loss landscape perspective. arXiv. https://arxiv.org/abs/1912.02757v2
Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. arXiv. https://arxiv.org/abs/1706.04599v2
Healthdirect Australia (2021) Cancer immunotherapy. September 15. https://www.healthdirect.gov.au/cancer-immunotherapy
Davis J, MacDonald S, Zhu J, Oldfather J, Trzaskowski M (2020) Quantifying uncertainty in deep learning systems. AWS Prescriptive Guidance. https://docs.aws.amazon.com/prescriptive-guidance/latest/ml-quantifying-uncertainty/welcome.html
Kallus N, Puli AM, Shalit U (2018) Removing hidden confounding by experimental grounding. Adv Neural Inf Proces Syst:31. https://papers.nips.cc/paper/2018/hash/566f0ea4f6c2e947f36795c8f58ba901-Abstract.html
Khan S, Hayat M, Zamir SW, Shen J, Shao L (2019) Striking the right balance with uncertainty. In: Proceedings – 2019 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2019, pp 103–112. https://doi.org/10.1109/CVPR.2019.00019
Kristiadi A, Hein M, Hennig P (2020) Being Bayesian, even just a bit, fixes overconfidence in ReLU networks. https://arxiv.org/abs/2002.10118v2
Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30. Curran Associates, Inc, pp 6402–6413. http://papers.nips.cc/paper/7219-simple-and-scalable-predictive-uncertainty-estimation-using-deep-ensembles.pdf
Ledesma P (2020) How much does a clinical trial cost? Sofpromed, January 2. https://www.sofpromed.com/how-much-does-a-clinical-trial-cost
Lee H-S, Shen C, Zame W, Lee J-W, van der Schaar M (2021) SDF-Bayes: cautious optimism in safe dose-finding clinical trials with drug combinations and heterogeneous patient groups. ArXiv:2101.10998 [Cs, Stat]. http://arxiv.org/abs/2101.10998
Louizos C, Shalit U, Mooij J, Sontag D, Zemel R, Welling M (2017) Causal effect inference with deep latent-variable models. In: Proceedings of the 31st international conference on neural information processing systems, pp 6449–6459
Lundberg S, Lee S-I (2017) A unified approach to interpreting model predictions. ArXiv:1705.07874 [Cs, Stat]. http://arxiv.org/abs/1705.07874
MacDonald S (2019) Interpretations in Bayesian deep learning. University of Queensland. Master of Data Science Capstone Thesis Project
MacDonald S (2020) Interpretations of learning. Medium, March 3. https://towardsdatascience.com/interpretations-in-learning-part-1-4342c5741a71
Obermeyer Z, Emanuel EJ (2016) Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med 375(13):1216–1219. https://doi.org/10.1056/NEJMp1606181
Oberst M, Johansson FD, Wei D, Gao T, Brat G, Sontag D, Varshney KR (2020) Characterization of overlap in observational studies. ArXiv:1907.04138 [Cs, Stat]. http://arxiv.org/abs/1907.04138
Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference: foundations and learning algorithms. MIT Press
Rasmussen CE (2004) Gaussian processes in machine learning. In: Bousquet O, von Luxburg U, Rätsch G (eds) Advanced lectures on machine learning: ML summer schools 2003, Canberra, Australia, February 2–14, 2003, Tübingen, Germany, August 4–16, 2003, Revised lectures. Springer, pp 63–71. https://doi.org/10.1007/978-3-540-28650-9_4
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144. https://doi.org/10.1145/2939672.2939778
Richens JG, Lee CM, Johri S (2020) Improving the accuracy of medical diagnosis with causal machine learning. Nat Commun 11(1):3923. https://doi.org/10.1038/s41467-020-17419-7
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140. https://doi.org/10.1093/bioinformatics/btp616
Lee H, Zhang Y, Zame WR, Shen C, Lee J, van der Schaar M (2020) Robust recursive partitioning for heterogeneous treatment effects with uncertainty quantification. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55. https://doi.org/10.1093/biomet/70.1.41
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215. https://doi.org/10.1038/s42256-019-0048-x
Schwab P, Linhardt L, Bauer S, Buhmann JM, Karlen W (2020) Learning counterfactual representations for estimating individual dose-response curves. Proc AAAI Conf Artif Intell 34(04):5612–5619. https://doi.org/10.1609/aaai.v34i04.6014
Shalit U, Johansson FD, Sontag D (2017) Estimating individual treatment effect: generalization bounds and algorithms. In: Proceedings of the 34th international conference on machine learning, pp 3076–3085. https://proceedings.mlr.press/v70/shalit17a.html
Smilkov D, Thorat N, Kim B, Viégas FB, Wattenberg M (2017) SmoothGrad: removing noise by adding noise. CoRR:abs/1706.03825. http://arxiv.org/abs/1706.03825
Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: Proceedings of the 34th international conference on machine learning – volume 70, pp 3319–3328
van Amersfoort J, Smith L, Teh YW, Gal Y (2020) Uncertainty estimation using a single deep deterministic neural network. In: Proceedings of the 37th international conference on machine learning, PMLR 119, Vienna, Austria
van Amersfoort J, Smith L, Jesson A, Key O, Gal Y (2022) On feature collapse and deep kernel learning for single forward pass uncertainty. https://arxiv.org/abs/2102.11409
Wang Y, Blei DM (2019) The blessings of multiple causes. J Am Stat Assoc 114(528):1574–1596. https://doi.org/10.1080/01621459.2019.1686987
Yap M, Johnston RL, Foley H, MacDonald S, Kondrashova O, Tran KA, Nones K, Koufariotis LT, Bean C, Pearson JV, Trzaskowski M, Waddell N (2021) Verifying explainability of a deep learning tissue classifier trained on RNA-seq data. Sci Rep 11(1):2641. https://doi.org/10.1038/s41598-021-81773-9
Zhang L, Wang Y, Ostropolets A, Mulgrave JJ, Blei DM, Hripcsak G (2019) The medical Deconfounder: assessing treatment effects with electronic health records. In: Proceedings of the 4th machine learning for healthcare conference, pp 490–512. https://proceedings.mlr.press/v106/zhang19a.html
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
MacDonald, S., Steven, K., Trzaskowski, M. (2022). Interpretable AI in Healthcare: Enhancing Fairness, Safety, and Trust. In: Raz, M., Nguyen, T.C., Loh, E. (eds) Artificial Intelligence in Medicine. Springer, Singapore. https://doi.org/10.1007/978-981-19-1223-8_11
DOI: https://doi.org/10.1007/978-981-19-1223-8_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1222-1
Online ISBN: 978-981-19-1223-8
eBook Packages: Medicine (R0)