A Systematic Review of ‘Fair’ AI Model Development for Image Classification and Prediction

  • Review Article
Journal of Medical and Biological Engineering

Abstract

Purpose

The new challenge in Artificial Intelligence (AI) is to understand the limitations of models in order to reduce potential harm. In particular, unknown disparities based on demographic factors could encode existing inequalities and worsen patient care for some groups.

Methods

Following PRISMA guidelines, we present a systematic review of ‘fair’ deep learning modeling techniques for natural and medical image applications published between 2011 and 2021. We managed the review in Covidence review management software, retrieved articles from the PubMed, IEEE, and ACM search engines, and had three reviewers independently review the manuscripts.
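Agreement among independent reviewers is usually summarized with a chance-corrected statistic. The abstract does not name the statistic behind its reported agreement, and the figure Covidence itself reports may be computed differently, so the sketch below simply assumes Fleiss' kappa, a common choice when more than two raters label the same items. The function name `fleiss_kappa` and the vote counts are hypothetical and do not come from the review.

```python
import numpy as np


def fleiss_kappa(counts):
    """Fleiss' kappa for several raters assigning each item to one category.

    counts[i, j] = number of raters who put item i in category j;
    every row must sum to the same number of raters m (e.g. 3 reviewers).
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    m = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), m):
        raise ValueError("each item must be rated by the same number of raters")
    p_j = counts.sum(axis=0) / (n_items * m)  # overall share of each category
    P_i = (np.sum(counts ** 2, axis=1) - m) / (m * (m - 1))  # agreement on item i
    P_bar = P_i.mean()  # observed agreement
    P_e = np.sum(p_j ** 2)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical include/exclude votes from 3 reviewers on 5 screened articles.
votes = [[3, 0], [0, 3], [3, 0], [2, 1], [0, 3]]
kappa = fleiss_kappa(votes)
print(round(kappa, 3))  # 0.732
```

Four of the five articles receive unanimous votes, yet kappa is well below 1 because the include/exclude split is close to even, so much of the raw agreement could have arisen by chance.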

Results

Inter-rater agreement was 0.89, and conflicts were resolved by consensus among the three reviewers. Our search initially retrieved 692 studies; after careful screening, the review included 22 manuscripts spanning four prevailing themes: ‘fair’ training dataset generation (4/22), representation learning (10/22), model disparity across institutions (5/22), and model fairness with respect to patient demographics (3/22). We benchmark the current literature on fairness in AI-based image analysis and highlight the existing challenges. We observe that discussions of fairness are often limited to analyzing existing bias, without establishing methodologies to overcome model disparities.
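Analyzing model fairness with respect to patient demographics typically means comparing a performance metric across subgroups. The abstract does not fix a metric; the sketch below assumes one widely used choice, the gap in true-positive rate (sensitivity) between subgroups, where a lower rate means more missed diagnoses in that group. The helper names `tpr_by_group` and `tpr_gap` and the toy data are hypothetical.

```python
import numpy as np


def tpr_by_group(y_true, y_pred, groups):
    """True-positive rate (sensitivity) of binary predictions within each subgroup."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        # Fraction of truly positive cases in subgroup g that the model flagged.
        rates[g.item()] = float(y_pred[positives].mean()) if positives.any() else float("nan")
    return rates


def tpr_gap(y_true, y_pred, groups):
    """Largest TPR difference between any two subgroups (groups without positives are ignored)."""
    rates = [r for r in tpr_by_group(y_true, y_pred, groups).values() if not np.isnan(r)]
    return max(rates) - min(rates)


# Toy labels, predictions, and subgroup membership (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(tpr_by_group(y_true, y_pred, groups))  # {'A': 0.75, 'B': 0.5}
print(tpr_gap(y_true, y_pred, groups))  # 0.25
```

The same per-group loop works for institutions or cameras by passing a site or device label as `groups`.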

Conclusion

Based on current research trends, adversarial learning for demographic-, camera-, and institution-agnostic models is an important direction for narrowing disparity gaps in imaging. Privacy-preserving approaches also show encouraging performance in both the natural and medical image domains.
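Adversarial learning for attribute-agnostic models pits an encoder against an adversary that tries to recover a protected attribute from the learned representation. The abstract does not specify an architecture, so the following is only a minimal sketch under stated assumptions: a common gradient-reversal setup shrunk to a linear encoder with two logistic heads in NumPy. The encoder descends the task loss while ascending the adversary's loss on the protected attribute `s` (for example a demographic group, camera, or institution). The names `train_adversarial`, `lam`, and `k` are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def train_adversarial(X, y, s, k=4, lam=1.0, lr=0.1, epochs=200, seed=0):
    """Hypothetical sketch of adversarial (gradient-reversal) debiasing.

    X: (n, d) features; y: (n,) binary task labels;
    s: (n,) binary protected attribute (e.g. demographic group, camera, or site).
    Returns the encoder W (k, d), task head a (k,), and adversary head b (k,).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(k, d))  # linear encoder: z = W x
    a = np.zeros(k)  # task head
    b = np.zeros(k)  # adversary head
    for _ in range(epochs):
        Z = X @ W.T  # (n, k) learned representations
        # Gradient of mean binary cross-entropy w.r.t. each logit is (p - label) / n.
        r_y = (sigmoid(Z @ a) - y) / n
        r_s = (sigmoid(Z @ b) - s) / n
        grad_a = Z.T @ r_y
        grad_b = Z.T @ r_s
        grad_W_task = np.outer(a, X.T @ r_y)  # d(task loss)/dW
        grad_W_adv = np.outer(b, X.T @ r_s)  # d(adversary loss)/dW
        # Both heads minimize their own loss.
        a -= lr * grad_a
        b -= lr * grad_b
        # Encoder minimizes the task loss and maximizes the adversary loss
        # (the gradient-reversal step), weighted by lam.
        W -= lr * (grad_W_task - lam * grad_W_adv)
    return W, a, b


# Usage on synthetic data where the protected attribute leaks into the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
s = (X[:, 0] > 0).astype(float)
y = (X[:, 1] + 0.5 * X[:, 0] > 0).astype(float)
W, a, b = train_adversarial(X, y, s, lam=1.0)
task_acc = np.mean((sigmoid(X @ W.T @ a) > 0.5) == y)
```

In practice the encoder and adversary are deep networks trained with an autodiff framework; this linear version only makes the sign flip explicit. Adversarial training of this kind can oscillate, so `lam` and `lr` usually need tuning, and setting `lam=0` recovers ordinary training for comparison.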

Fig. 1
Fig. 2

Data Availability

The authors declare that all data supporting the findings of this study are available within the paper. Additional review data managed in Covidence can be shared upon request.

References

  1. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., Lungren, M. P., & Ng, A. Y. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning.

  2. Ting, D. S. W., Cheung, C. Y.-L., Lim, G., Tan, G. S. W., Quang, N. D., Gan, A., Hamzah, H., Garcia-Franco, R., San Yeo, I. Y., Lee, S. Y., Wong, E. Y. M., Sabanayagam, C., Baskaran, M., Ibrahim, F., Tan, N. C., Finkelstein, E. A., Lamoureux, E. L., Wong, I. Y., … Wong, T. Y. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA, 318(22), 2211–2223.

  3. Becker, A. S., Marcon, M., Ghafoor, S., Wurnig, M. C., Frauenfelder, T., & Boss, A. (2017). Deep learning in mammography: Diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Investigative Radiology, 52(7), 434–440.

  4. Lee, H., Lee, E.-J., Ham, S., Lee, H.-B., Lee, J. S., Kwon, S. U., Kim, J. S., Kim, N., & Kang, D.-W. (2020). Machine learning approach to identify stroke within 4.5 hours. Stroke, 51(3), 860–866.

  5. Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y., & Ghassemi, M. (2021). Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine, 27(12), 2176–2182.

  6. Parikh, R. B., Teeple, S., & Navathe, A. S. (2019). Addressing bias in artificial intelligence in health care. JAMA, 322(24), 2377.

  7. Whittaker, M., Alper, M., College, O., Kaziunas, L., & Morris, M. R. (2019). Disability, bias, and AI (p. 32).

  8. Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.

  9. Benjamin, R. (2019). Assessing risk, automating racism. Science, 366(6464), 421–422.

  10. Zhang, H., Lu, A. X., Abdalla, M., McDermott, M., & Ghassemi, M. (2020). Hurtful words: Quantifying biases in clinical contextual word embeddings. In Proceedings of the ACM conference on health, inference, and learning, CHIL ’20, (New York, NY, USA) (pp. 110–120). Association for Computing Machinery.

  11. Adamson, A. S., & Smith, A. (2018). Machine learning and health care disparities in dermatology. JAMA Dermatology, 154, 1247–1248.

  12. Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification (p. 15).

  13. Banerjee, I., Bhimireddy, A. R., Burns, J. L., Celi, L. A., Chen, L.-C., Correa, R., Dullerud, N., Ghassemi, M., Huang, S.-C., Kuo, P.-C., Lungren, M. P., Palmer, L., Price, B. J., Purkayastha, S., Pyrros, A., Oakden-Rayner, L., Okechukwu, C., Seyyed-Kalantari, L., Trivedi, H., … Gichoya, J. W. (2021). Reading race: AI recognises patient’s racial identity in medical images.

  14. Wallis, C. J., Jerath, A., Coburn, N., Klaassen, Z., Luckenbaugh, A. N., Magee, D. E., Hird, A. E., Armstrong, K., Ravi, B., Esnaola, N. F., et al. (2022). Association of surgeon-patient sex concordance with postoperative outcomes. JAMA Surgery, 157(2), 146–156.

  15. Kaushal, A., Altman, R., & Langlotz, C. (2020). Geographic distribution of US cohorts used to train deep learning algorithms. JAMA, 324(12), 1212–1213.

  16. Davis, S. E., Greevy, R. A., Jr., Lasko, T. A., Walsh, C. G., & Matheny, M. E. (2020). Detection of calibration drift in clinical prediction models to inform model updating. Journal of Biomedical Informatics, 112, 103611.

  17. Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ, 339.

  18. Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In CVPR 2011 (pp. 1521–1528). IEEE.

  19. Kou, Z., Zhang, Y., Shang, L., & Wang, D. (2021). FairCrowd: Fair human face dataset sampling via batch-level crowdsourcing bias inference. In 2021 IEEE/ACM 29th international symposium on quality of service (IWQOS) (pp. 1–10). IEEE.

  20. Clapes, A., Bilici, O., Temirova, D., Avots, E., Anbarjafari, G., & Escalera, S. (2018). From apparent to real age: Gender, age, ethnic, makeup, and expression bias analysis in real age estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 2373–2382).

  21. Howard, A., Zhang, C., & Horvitz, E. (2017). Addressing bias in machine learning algorithms: A pilot study on emotion recognition for intelligent systems. In 2017 IEEE workshop on advanced robotics and its social impacts (ARSO) (pp. 1–7). IEEE.

  22. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

  23. Morales, A., Fierrez, J., Vera-Rodriguez, R., & Tolosana, R. (2020). SensitiveNets: Learning agnostic representations with application to face images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(6), 2158–2164.

  24. Zhang, H., Cao, H., Yang, X., Deng, C., & Tao, D. (2021). Self-training with progressive representation enhancement for unsupervised cross-domain person re-identification. IEEE Transactions on Image Processing.

  25. Alsulaimawi, Z. (2020). Variational bound of mutual information for fairness in classification. In 2020 IEEE 22nd international workshop on multimedia signal processing (MMSP) (pp. 1–6). IEEE.

  26. Quadrianto, N., Sharmanska, V., & Thomas, O. (2019). Discovering fair representations in the data domain. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8227–8236).

  27. Jiang, L., Zhang, J., & Deng, B. (2019). Robust RGB-D face recognition using attribute-aware loss. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10), 2552–2566.

  28. Adeli, E., Zhao, Q., Pfefferbaum, A., Sullivan, E. V., Fei-Fei, L., Niebles, J. C., & Pohl, K. M. (2021). Representation learning with statistical independence to mitigate bias. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 2513–2523).

  29. Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance measures and a data set for multi-target, multi-camera tracking. In European conference on computer vision (pp. 17–35). Springer.

  30. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In Proceedings of the IEEE international conference on computer vision (pp. 1116–1124).

  31. Yu, H.-X., Wu, A., & Zheng, W.-S. (2018). Unsupervised person re-identification by deep asymmetric metric embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 956–973.

  32. Gray, D., Brennan, S., & Tao, H. (2007). Evaluating appearance models for recognition, reacquisition, and tracking. In Proc. IEEE international workshop on performance evaluation for tracking and surveillance (PETS) (Vol. 3, pp. 1–7). Citeseer.

  33. Yan, L., Zhu, R., Mo, N., & Liu, Y. (2019). Cross-domain distance metric learning framework with limited target samples for scene classification of aerial images. IEEE Transactions on Geoscience and Remote Sensing, 57(6), 3840–3857.

  34. Tonioni, A., Poggi, M., Mattoccia, S., & Di Stefano, L. (2019). Unsupervised domain adaptation for depth prediction from images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10), 2396–2409.

  35. Li, D., Yang, Y., Song, Y.-Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In Proceedings of the IEEE international conference on computer vision (pp. 5542–5550).

  36. Dinsdale, N. K., Jenkinson, M., & Namburete, A. I. (2021). Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal. NeuroImage, 228, 117689.

  37. Das, D., Santosh, K. C., & Pal, U. Cross-population train/test deep learning model: Abnormality screening in chest x-rays. In 2020 IEEE 33rd international symposium on computer-based medical systems (CBMS) (pp. 514–519).

  38. Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Medicine, 15(11), e1002683.

  39. Hägele, M., Seegerer, P., Lapuschkin, S., Bockmayr, M., Samek, W., Klauschen, F., Müller, K.-R., & Binder, A. (2020). Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Scientific Reports, 10(1), 6423.

  40. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7), e0130140.

  41. Sweeney, L. (2002). Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10, 571–588.

  42. Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y., & Ghassemi, M. CheXclusion: Fairness gaps in deep chest x-ray classifiers.

  43. Guenther, F., Brandl, C., Winkler, T. W., Wanner, V., Stark, K., Kuechenhoff, H., & Heid, I. M. (2020). Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration. Genetic Epidemiology, 44(7), 759–777.

  44. Suriyakumar, V. M., Papernot, N., Goldenberg, A., & Ghassemi, M. Chasing your long tails: Differentially private prediction in health care settings. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, FAccT ’21, (pp. 723–734). Association for Computing Machinery. Virtual Event, Canada.

  45. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (pp. 308–318).

  46. Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H., & Ferrante, E. (2020). Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences of the United States of America, 117(23), 12592–12594.

  47. Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H., & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences of the United States of America, 117(23), 12592–12594.

Author information

Contributions

Concept and design: RC, JG, and IB. Study selection: RC, MS, and IB. Data extraction: RC, MS, and IB. Drafting of the manuscript: RC, MS, JG, BP, and IB. Critical revision of the manuscript for important intellectual content: HT, BP, and LC. Supervision: IB.

Corresponding author

Correspondence to Imon Banerjee.

Ethics declarations

Competing Interests

The authors declare no conflicts of interest.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Correa, R., Shaan, M., Trivedi, H. et al. A Systematic Review of ‘Fair’ AI Model Development for Image Classification and Prediction. J. Med. Biol. Eng. 42, 816–827 (2022). https://doi.org/10.1007/s40846-022-00754-z

  • DOI: https://doi.org/10.1007/s40846-022-00754-z
