Physician–machine partnerships boost diagnostic accuracy, but bias persists

  • Research Briefing

From Nature Medicine

In a large-scale digital experiment on dermatology diagnosis, we found that specialists and generalists achieved diagnostic accuracy of 38% and 19%, respectively. With decision support from a fair deep learning system, physicians' diagnostic accuracy improved by more than 33%, but the gap in generalists' accuracy across skin tones widened.

Fig. 1: Experimental design and key results.

References

  1. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. AI in health and medicine. Nat. Med. 28, 31–38 (2022). This review article covers advances in medical image analysis, problem formulation in human–AI collaboration, and common challenges, such as data scarcity and racial bias.

  2. Liu, Y. et al. A deep learning system for differential diagnosis of skin disease. Nat. Med. 26, 900–908 (2020). This paper demonstrates the potential of AI assistance in supporting general practitioners and nurse practitioners in diagnosing common skin diseases.

  3. Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse curated clinical image set. Sci. Adv. 8, eabq6147 (2022). This paper reports that state-of-the-art dermatology AI models are less accurate on dark skin tones than on light skin tones.

  4. Almaatouq, A. et al. Beyond playing 20 questions with nature: Integrative experiment design in the social and behavioral sciences. Behav. Brain Sci. https://doi.org/10.1017/S0140525X22002874 (2022). This paper proposes an integrative experimental design in which researchers map the design space of possible experiments and test those experiments together to promote commensurability in behavioral science.

  5. Groh, M. et al. Evaluating deep neural networks trained on clinical images in dermatology with the Fitzpatrick 17k dataset. Proc. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 1820–1828 (2021). This paper presents a large dataset of clinical images annotated with the Fitzpatrick skin type scale and demonstrates that deep learning classifiers are most accurate on skin tones similar to those they were trained on.

This is a summary of: Groh, M. et al. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat. Med. https://doi.org/10.1038/s41591-023-02728-3 (2024).

Cite this article

Physician–machine partnerships boost diagnostic accuracy, but bias persists. Nat Med 30, 356–357 (2024). https://doi.org/10.1038/s41591-023-02733-6

  • DOI: https://doi.org/10.1038/s41591-023-02733-6

  • Springer Nature America, Inc.