
Risk of Training Diagnostic Algorithms on Data with Demographic Bias

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12446)

Abstract

One of the critical challenges in machine learning applications is making fair predictions. Numerous recent examples across domains show convincingly that algorithms trained on biased datasets can easily lead to erroneous or discriminatory conclusions. This is even more crucial in clinical applications, where predictive algorithms are designed mainly on the basis of a given set of medical images, and demographic variables such as age, sex, and race are not taken into account. In this work, we conduct a survey of the MICCAI 2018 proceedings to investigate common practice in medical image analysis applications. Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used, and that the diagnosis is based purely on images. To highlight the importance of considering demographics in diagnosis tasks, we used a publicly available dataset of skin lesions. We then demonstrate that a classifier with an overall area under the curve (AUC) of 0.83 has variable performance, between 0.76 and 0.91, on subgroups based on age and sex, even though the training set was relatively balanced. Moreover, we show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup. Finally, we discuss the implications of these results and provide recommendations for further research.
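The per-subgroup evaluation the abstract describes (an overall AUC that hides very different AUCs for demographic subgroups) can be sketched as follows. This is an illustrative example on synthetic data, not the paper's dataset or model: the labels, scores, and the "sex" attribute are all fabricated so that one subgroup is harder to classify than the other.

```python
# Sketch: computing overall vs. per-subgroup AUC, as the paper does for
# age/sex subgroups on skin-lesion data. All data here is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
y_true = rng.integers(0, 2, size=n)             # hypothetical diagnosis labels
group = rng.choice(["male", "female"], size=n)  # hypothetical sex attribute
# Simulated classifier scores: noisier (harder) for one subgroup
noise = np.where(group == "male", 0.9, 0.5)
y_score = y_true + noise * rng.normal(size=n)

print("overall AUC:", round(roc_auc_score(y_true, y_score), 3))
for g in ("male", "female"):
    mask = group == g
    print(g, "AUC:", round(roc_auc_score(y_true[mask], y_score[mask]), 3))
```

A single overall AUC can look acceptable while one subgroup's AUC is substantially lower, which is exactly why the paper argues for reporting metrics per demographic subgroup.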

Keywords

  • Computer-aided diagnosis
  • Demographic bias
  • Classification parity
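The adversarial training setup mentioned in the abstract (learning features that support diagnosis while carrying no information about demographic attributes) is commonly implemented with a gradient-reversal layer. The PyTorch sketch below is illustrative only: the architecture, layer sizes, loss weighting, and data are all hypothetical, not the authors' exact setup.

```python
# Sketch of adversarial debiasing: a shared feature extractor feeds a
# diagnosis head and an adversary that predicts the protected attribute.
# Gradient reversal makes the features *worse* for the adversary while
# the diagnosis head is trained normally. Sizes/data are illustrative.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the features
        return -ctx.lamb * grad_output, None

features = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
diagnosis = nn.Linear(8, 1)   # main task head
adversary = nn.Linear(8, 1)   # predicts protected attribute (e.g. sex)
opt = torch.optim.SGD(
    [*features.parameters(), *diagnosis.parameters(), *adversary.parameters()],
    lr=0.1,
)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(64, 16)                      # hypothetical image features
y = torch.randint(0, 2, (64, 1)).float()     # hypothetical diagnosis labels
s = torch.randint(0, 2, (64, 1)).float()     # hypothetical sex attribute

for _ in range(50):
    z = features(x)
    loss = bce(diagnosis(z), y) + bce(adversary(GradReverse.apply(z, 1.0)), s)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final joint loss:", float(loss))
```

The intended effect is the one the abstract reports: features that remain predictive for diagnosis but yield more balanced scores across demographic subgroups.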




Author information


Corresponding author

Correspondence to Veronika Cheplygina.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Abbasi-Sureshjani, S., Raumanns, R., Michels, B.E.J., Schouten, G., Cheplygina, V. (2020). Risk of Training Diagnostic Algorithms on Data with Demographic Bias. In: Interpretable and Annotation-Efficient Learning for Medical Image Computing. IMIMIC 2020, MIL3ID 2020, LABELS 2020. Lecture Notes in Computer Science, vol 12446. Springer, Cham. https://doi.org/10.1007/978-3-030-61166-8_20


  • DOI: https://doi.org/10.1007/978-3-030-61166-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61165-1

  • Online ISBN: 978-3-030-61166-8

  • eBook Packages: Computer Science (R0)