
Trustworthy Augmented Intelligence in Health Care

Abstract

Augmented Intelligence (AI) systems have the power to transform health care and bring us closer to the quadruple aim: enhancing patient experience, improving population health, reducing costs, and improving the work life of health care providers. Earning physicians' trust is critical for accelerating adoption of AI into patient care. As technology evolves, the medical community will need to develop standards for these innovative technologies and revisit the current regulatory systems that physicians and patients rely on to ensure that health care AI is responsible, evidence-based, free from bias, and designed and deployed to promote equity. To develop actionable guidance for trustworthy AI in health care, the AMA reviewed literature on the challenges health care AI poses and reflected on existing guidance as a starting point for addressing those challenges (including models for regulating the introduction of innovative technologies into clinical care).


Augmented Intelligence (AI) [1] systems have the power to transform health care by harnessing the promise of artificial intelligence to support clinicians and patients and bringing us closer to achieving the quadruple aim: enhancing patient experience, improving population health, reducing costs, and improving the work life of health care professionals [2]. Earning physicians’ trust is critical for accelerating adoption of AI into patient care. As technology evolves, the medical community will need to develop standards for evaluating, integrating, using and monitoring these innovative technologies. The regulatory systems and operational practices that have been the bedrock upon which physician and patient confidence in medical technology depends are now charged with ensuring that health care AI is evidence-based, free from bias, and promotes equity. As a leading voice in medical ethics and health policy, representing some 270,000 physicians and over 120 national medical specialty and other societies, the American Medical Association (AMA) is uniquely positioned to guide physicians, patients, and the broader health care community in the development and use of trustworthy AI.

Defining trustworthy

Trustworthy means dependable and worthy of confidence [3]. In health care, this requires systematically building an evidence base using rigorous, standardized processes for design, validation, implementation, and monitoring grounded in ethics and equity. The dangers of adopting AI without these guardrails were made abundantly clear in the recent example of an algorithm that used historical health care spending as a proxy for illness severity to predict an individual’s future health needs and establish their eligibility for additional services. This method excluded many Black patients from disease management programs, effectively expanding long-standing racial inequities in access to care [4].
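The spending-as-proxy failure described above can be illustrated with a minimal synthetic sketch (all numbers are hypothetical and chosen only for illustration): when one group accrues less spending per unit of illness because of barriers to accessing care, a model trained on cost scores that group as lower risk despite identical underlying need.

```python
# Hypothetical sketch: two groups with identical true illness burden, but
# group B historically accrues less spending per unit of illness because
# of barriers to accessing care.
true_need = {"A": 10.0, "B": 10.0}      # identical underlying illness severity
spend_per_need = {"A": 1.0, "B": 0.6}   # group B spends less per unit of need

# An algorithm trained on cost predicts "risk" proportional to past spending,
# so the spending gap is silently translated into a risk gap.
predicted_risk = {g: true_need[g] * spend_per_need[g] for g in true_need}

# A disease management program enrolls patients above a fixed risk threshold.
threshold = 8.0
enrolled = {g: predicted_risk[g] >= threshold for g in predicted_risk}

print(predicted_risk)  # {'A': 10.0, 'B': 6.0}
print(enrolled)        # {'A': True, 'B': False}
```

Equally ill patients in group B fall below the enrollment threshold, reproducing the access inequity the program was meant to address.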

To develop actionable guidance for trustworthy AI in health care, the AMA reviewed literature on the challenges health care AI poses and examined existing guidance as its starting point for addressing those challenges (including models for regulating the introduction of innovative technologies into clinical care). The literature and guidance confirm that AI must promote the ethical values of the medical profession, uphold exacting standards of scientific inquiry and evidence, and advance equity in health care.

Ethics, evidence, and equity in health care

To merit the trust of patients and physicians, AI in health care must focus on matters of ethics, evidence, and equity.

Ethics

Ethical AI must uphold the fundamental values of medicine as a profession and as a moral activity grounded in relationships between “someone who is ill, on the one hand, and someone who professes to heal on the other” [5]. While incorporating new technologies is expected in health care, AI-enabled technologies possess characteristics that set them apart from other innovations in ways that can impinge on a therapeutic patient-physician relationship. Notably, AI algorithms are trained on datasets of varying quality and completeness and are implemented across multiple environments and thus carry the risk of driving inequities in outcomes across patient populations. Further, the most powerful, and useful, AI systems are adaptive, able to learn and evolve over time outside of human observation and independent of human control [6], while accountability is diffused among the multiple stakeholders who are involved in design, development, deployment, and oversight and who have differing forms of expertise, understandings of professionalism, and goals [7].

Despite these new challenges, existing frameworks lay a foundation for the ethical design and deployment of AI in health care and can help guide our understanding of the current state of AI principles.

For example, guidance in the AMA Code of Medical Ethics on ethically sound innovation in medical practice (Opinion 1.2.11) provides that any innovation intended to directly affect patient care be scientifically well grounded and developed in coordination with individuals who have appropriate clinical expertise; that the risks an innovation poses to individual patients should be minimized, and the likelihood that the innovation can be applied to and benefit populations of patients be maximized [8]. Opinion 1.2.11 further requires that meaningful oversight be ensured—not only in the development of an innovation, but in how it is integrated into the delivery of care.

The Code further addresses issues in the deployment of AI in Opinion 11.2.1, “Professionalism in Health Care Systems,” which emphasizes the ethical need to continuously monitor tools and practices deployed to organize the delivery of care to identify and address adverse consequences and to disseminate outcomes, positive and negative [9]. Opinion 11.2.1 explicitly requires that mechanisms designed to influence the provision of care not disadvantage identifiable populations of patients or exacerbate existing health care disparities and that they be implemented in conjunction with the resources and infrastructure needed to support high value care and professionalism. Institutional oversight should be sensitive to the possibility that even well-intended use of well-designed tools can lead to unintended consequences outside the clinical realm—in the specific context of AI, for example, when the use of clinical prediction models identifies individuals at risk for medical conditions that are stigmatizing or associated with discrimination against individuals or communities.

Ethics Guidelines for Trustworthy AI published by the European Commission’s High-Level Expert Group on Artificial Intelligence in 2019 highlights the essential role of trust in the development and adoption of AI and proposes a framework for achieving it [10].

The report states that trustworthy AI should be lawful, ethical, and robust. It should be based on human-centered design and adhere to ethical principles throughout its life cycle: respect for human autonomy, prevention of harm, fairness, and explicability. This report cautions that AI systems may pose risks that can be difficult to predict or observe and raises awareness about potential impacts on vulnerable populations. The report maintains that trustworthy AI requires a holistic approach involving all parties and processes, both technical and societal.

The European Parliamentary Research Service recently published a study, Artificial Intelligence: From Ethics to Policy, that conceptualizes AI as a “real-world experiment” full of both risks and potential benefits [11]. In this framing, AI systems must meet the conditions for ethically responsible research: they must protect humans, assess predicted benefits, and appropriately balance these benefits against the risks AI systems pose to individuals and society. As in the Ethics Guidelines for Trustworthy AI, AI is viewed as a socio-technical system that should be evaluated within the context of the society in which it is created. Recognizing that technology not only reinforces the way the world works today but can dictate the way it will work in the future, the report stresses the importance of incorporating ethics as an explicit consideration throughout the design, development, and implementation of AI.

Evidence

To date, the evidence base for health care AI has focused primarily on the validation of AI algorithms, and a review of the literature reveals a lack of consistency in terminology and approach [12]. To strengthen the evidence base and earn the trust of patients and physicians, AI must systematically show that it meets the highest standards for scientific inquiry in design and development and must provide clinically relevant evidence of safety and effectiveness.

Existing frameworks for designing, conducting, and evaluating clinical research, such as the development process for drugs and devices approved by the U.S. Food & Drug Administration [13,14,15], offer a model on which to ground a standardized approach to meeting this responsibility. At a minimum, an AI system intended for use in clinical care must demonstrate, first, that it is the product of a design protocol that addresses clearly defined, clinically relevant questions and objectives, and of a well-documented, scientifically rigorous, and consistent validation process that demonstrates safety and efficacy. Second, it must demonstrate that the system has been reviewed by a diverse team of well-qualified subject matter experts and transparently reported in keeping with standards for scientific publication, as discussed below.

Given the unique nature of AI, we must be prepared to revisit and refine these core requirements as technology evolves. A review of the literature shows that there are multiple approaches to evaluating the quality and level of evidence needed in health care applications. GRADE (Grading of Recommendations, Assessment, Development and Evaluation) is a method of rating the quality of evidence and the strength of clinical practice recommendations [16]. The International Medical Device Regulators Forum (IMDRF) has developed a risk categorization framework for Software as a Medical Device (SaMD) that assigns an impact level (categories I–IV) to SaMDs based on two major factors: the significance of the information the tool provides to the health care decision and the state of the health care situation or condition [17, 18]. These types of evidence and risk frameworks can inform the levels of validation and evidence required for AI systems and address many of the ethical considerations that have been raised in the literature, including socio-technical environment considerations. The IMDRF framework also stresses the importance of post-market surveillance through a continuous learning process driven by real-world evidence. Recognizing that the use of AI in health care can range from administrative tasks to algorithms that inform diagnosis or treatment, it is critical that the level of evidence required be proportional to the degree of risk an AI system may pose to patients.
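The IMDRF categorization described above can be sketched as a simple lookup over its two factors; the matrix below follows the published IMDRF SaMD risk framework, while the function name and string labels are illustrative choices of this sketch, not part of the framework itself.

```python
# Sketch of the IMDRF SaMD risk matrix: the impact category (IV = highest)
# follows from (state of the health care situation, significance of the
# information the software provides to the care decision).
RISK_MATRIX = {
    ("critical", "treat_or_diagnose"): "IV",
    ("critical", "drive_management"): "III",
    ("critical", "inform_management"): "II",
    ("serious", "treat_or_diagnose"): "III",
    ("serious", "drive_management"): "II",
    ("serious", "inform_management"): "I",
    ("non-serious", "treat_or_diagnose"): "II",
    ("non-serious", "drive_management"): "I",
    ("non-serious", "inform_management"): "I",
}

def samd_category(situation: str, significance: str) -> str:
    """Look up the SaMD impact category for a tool's intended use."""
    return RISK_MATRIX[(situation, significance)]

# A tool that diagnoses a critical condition sits in the top category and
# would warrant the most rigorous clinical validation; an administrative
# aid that merely informs management of a non-serious condition does not.
print(samd_category("critical", "treat_or_diagnose"))     # IV
print(samd_category("non-serious", "inform_management"))  # I
```

Tying required evidence to this category operationalizes the principle that validation burden should be proportional to patient risk.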

Bias

Given its centrality to concerns about AI in health care, it is appropriate to draw attention briefly to the potential for bias in the design, operation, or deployment of adaptive systems in clinical settings [19, 20]. Algorithms trained on electronic health records (EHRs), as most currently are, risk building into the model itself whatever flaws exist in the record [21]: EHRs capture information only from individuals who have access to care and whose data are captured electronically; data are not uniformly structured across EHRs; and the majority of data in EHRs reflect information captured “downstream” of human judgments, with the risk that the model will replicate human cognitive errors [21, 22]. Moreover, well-intended efforts to correct for possible bias in training data can have unintended consequences, as is the case when “race-corrected” algorithms direct resources away from patients from minoritized populations rather than provide equitable personalized care [23].

Efforts to build fair adaptive models must meet challenges of mathematically defining “fairness” in the first place [24, 25] and of determining just what trade-offs between fairness and model performance are acceptable [25]. Beyond these challenges, even algorithms that are, hypothetically, fair out of the box may become biased over time when they are deployed in contexts different from those in which they were created, or when they “learn from pervasive, ongoing, and uncorrected biases in the broader health care system” [19]. Models may be followed uncritically, or be implemented only in certain settings such that they disproportionately benefit individuals “who are already experiencing privilege of one sort or another.” Finally, they may preferentially select or encourage outcomes that “do not reflect the interests of individual patients or the community” [19].
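The difficulty of defining “fairness” mathematically can be made concrete with a small sketch on entirely synthetic data: two widely used group-fairness criteria can disagree about the very same classifier. The function names and toy data below are illustrative, not drawn from any cited framework.

```python
# Two common group-fairness criteria, computed on synthetic predictions.

def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between groups 'A' and 'B'."""
    def rate(g):
        preds = [p for p, gr in zip(y_pred, group) if gr == g]
        return sum(preds) / len(preds)
    return rate("A") - rate("B")

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between groups 'A' and 'B'."""
    def tpr(g):
        pos = [p for t, p, gr in zip(y_true, y_pred, group) if gr == g and t == 1]
        return sum(pos) / len(pos)
    return tpr("A") - tpr("B")

group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [ 1,   1,   0,   0,   1,   0,   0,   0 ]
y_pred = [ 1,   1,   0,   0,   1,   0,   0,   0 ]  # a perfectly accurate classifier

# The classifier satisfies equal opportunity (TPR = 1.0 in both groups) yet
# violates demographic parity, because base rates differ between the groups.
print(demographic_parity_diff(y_pred, group))        # 0.25
print(equal_opportunity_diff(y_true, y_pred, group)) # 0.0
```

Even a perfectly accurate model cannot satisfy both criteria here, which is why choosing a fairness definition, and the trade-offs it implies, is itself a value-laden design decision.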

Equity

The AMA’s vision for health equity is a nation where all people live in thriving communities where resources work well, systems are equitable and create no harm, everyone has the power to achieve optimal health, and all physicians are equipped with the consciousness, tools, and resources to confront inequities as well as embed and advance equity within and across all aspects of the health care system. While great opportunity exists for technological innovations to advance health equity, current models of resource allocation, evidence development, solution design, and market selection fail to incorporate an equity lens – risking the automation, scaling, and exacerbation of health disparities rooted in historical and contemporary racial and social injustices.

Equity issues arise when the data set used to train an algorithm excludes or underrepresents historically marginalized and minoritized patient populations, failing to account for significant differences in experience or outcomes associated with patient identity. The design of the algorithm itself might exacerbate inequities if proxies or assumptions are based in historical discrimination and injustices, as illustrated by the disease management algorithm cited above.

So too, while algorithms are often exalted as more objective than humans, they are developed by humans who are inherently biased [26]. Solution design and development in venture-backed startups, large technology companies, and academic medical centers often lack representation of marginalized communities – with Black, Latinx, LGBTQ+, people with disabilities, and other populations excluded from resourced innovation teams and from user testing efforts.

The 2018 report on AI in health care by AMA’s Board of Trustees recognized that one of the most significant implications for end users of AI systems is that these systems can, invisibly and unintentionally, “reproduce and normalize” the biases of their training data sets [1]. Sociologist and Princeton University professor Ruha Benjamin, PhD, in her book Race After Technology, presents several powerful examples of how “coded inequities…hide, speed up, and even deepen discrimination, while appearing to be neutral or benevolent when compared to the racism of a previous era.” She also discusses lack of intentionality as an inadequate excuse for the perpetuation of biases and discrimination [27].

The implications for those developing and evaluating health care AI solutions are that an equity lens must be applied intentionally from the very beginning – in populating the design and testing team, the framing of the problem to be solved, the training data set selected, and the design and evaluation of the algorithm itself. This challenge to developers and evaluators aligns with the European Commission’s Statement on Artificial Intelligence, Robotics and ‘Autonomous’ Systems that “Discriminatory biases in data sets used to train and run AI systems should be prevented or detected, reported and neutralized at the earliest stage possible” [28]. It is also critical that we recognize AI as a downstream lever connected to larger upstream issues of inequity in our health system. Even if AI solutions are designed with a more intentional equity lens, we must understand that their deployment is within a system that distributes resources and allocates opportunities for optimal health and wellbeing to some communities at the expense of others. As powerful advocates for patients, physicians have an opportunity to look upstream and ask not just about the design of the algorithm itself but what it will mean for the health and care of patients in the environment within which it is implemented.

Current state of AI guidelines and regulations

A recent publication from Harvard University’s Berkman Klein Center for Internet & Society is a survey of AI principles documents that have been published around the globe in recent years [29], including the OECD Principles on Artificial Intelligence that the United States and 41 other nations adopted in 2019 [30]. Table 1 summarizes common themes in these guidelines and regulations.

Table 1 Common themes from AI guidelines and regulations

The report offers a comprehensive picture of the key principles that underlie each theme and can serve as a valuable resource for the development of standards that apply to AI systems intended for use by physicians, patients, and health systems.

The European Parliamentary Research Service study, Artificial Intelligence: From Ethics to Policy, proposes concrete steps that can be taken to address ethics concerns [11]. These include requiring developers to hold an organization-level data hygiene certificate that ensures data quality without requiring the disclosure of proprietary algorithms or data sets; requiring institutions deploying AI to conduct an ethical technology assessment prior to deployment to ensure that ethical issues have been considered; and completing an accountability report post-deployment to document how they have mitigated or corrected the concerns raised in the assessment.

The European Commission’s Ethics Guidelines for Trustworthy AI proposes seven requirements that AI systems should meet and provides a list of assessments that can help organizations operationalize these requirements [10].

In the context of health care, the guidance entitled Software as a Medical Device (SaMD): Clinical Evaluation issued by the International Medical Device Regulators Forum (IMDRF), in which the FDA Center for Devices and Radiological Health is an active participant, is particularly valuable [17]. Clinical evaluation includes the gathering and assessment of scientific validity, analytical validity, and clinical performance (real-world patient data). This guidance provides examples of relevant clinical evaluation methods and processes that can be used for SaMD. It also describes the level of evidence that should be required for different patient risk categories and identifies circumstances when independent review is important. For example, it suggests that SaMD categorized as negligible risk may only require scientific and analytical validity whereas SaMD that is categorized as high-risk would require clinical performance data in addition to scientific and analytical validity. Independent review recommendations are similarly tiered based on risk categorization.

Standard-setting and regulatory bodies will need to balance competing demands for protecting patient safety and advancing innovation because unsafe innovation could lead, fairly or unfairly, to lack of trust in all AI products and loss of the benefit to patients of trustworthy AI products. The FDA’s Digital Health Innovation Action Plan [31] outlines steps the regulatory agency is taking towards achieving this balance. FDA is modernizing its policies [32], increasing its digital health staff, and has launched a Digital Health Software Precertification Pilot Program or “Pre-Cert” designed to test a more efficient, streamlined pathway with a shortened approval timeline for entities that demonstrate “organizational excellence.” To support these efforts, the FDA established a Digital Health Center of Excellence where developers, regulators and the public can access digital health resources and expertise [33]. The Agency has leveraged IMDRF guidance to propose a new total product lifecycle or TPLC regulatory framework that would better position the FDA to regulate adaptive AI and Machine Learning (ML) technologies [34]. It is worth noting that the FDA’s regulatory authority only applies to AI and ML tools that meet the definition of a medical device [35].

A recent systematic review of studies evaluating the performance of diagnostic deep learning algorithms for medical imaging points to the need for greater transparency and standardization in reporting [36]. Most studies reviewed were based on non-randomized clinical trials that were at elevated risk of bias and did not follow reporting standards, making it challenging to evaluate the conclusions made. Several initiatives are underway to address this. The Consolidated Standards for Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) provide minimum reporting guidelines for randomized trials and trial protocols. A working group recently published CONSORT-AI and SPIRIT-AI guidelines that extend the original statements to address challenges and issues specific to AI [12]. These international, consensus-based guidelines are based on the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network methodology for developing guidelines. Acceptance of these standards hinges on adoption by scientific journals, many of which have required authors to comply with CONSORT and SPIRIT standards in the past. Other ongoing efforts include a machine learning-focused version of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement (TRIPOD-ML) [37] and Minimum Information for Medical AI Reporting (MINIMAR) [38]. These efforts to set minimum requirements and standards for reporting are a major step toward promoting transparency and reproducibility. The recent publication of a new American National Standards Institute (ANSI)-accredited standard from the Consumer Technology Association on AI in health care further supports transparency by providing a framework with common definitions that stakeholders can use to improve understanding [39].

The emerging consensus around core issues suggests that responsible use of AI in medicine entails commitment to designing and deploying AI systems that address clinically meaningful goals, upholding the profession-defining values of medicine, promoting health equity, supporting meaningful oversight and monitoring of system performance, and establishing clear expectations for accountability and mechanisms for holding stakeholders accountable. Education and training efforts are also needed to increase the number and diversity of physicians with AI knowledge and expertise.

Artificial intelligence is not synonymous with augmented intelligence. Artificial intelligence “constitutes a host of computational methods that produce systems that perform tasks normally requiring human intelligence. These computational methods include, but are not limited to, machine image recognition, natural language processing, and machine learning. However, in health care a more appropriate term is ‘augmented intelligence,’ reflecting the enhanced capabilities of human clinical decision making when coupled with these computational methods and systems” [1]. “Artificial intelligence” is a tool that produces an output; “augmented intelligence” combines human intelligence and machine-derived outputs to improve health.

As with many of the tools used in patient care, physicians often serve as trusted intermediaries and are expected to understand and communicate the benefits, risks, indications, appropriateness, and alternatives of use. This process of understanding and communicating is fulfilled at the individual patient level in the exam room and at the organizational level when new products are reviewed by institutional purchasing committees, analogous to existing pharmacy and therapeutics committees. Health technology assessment organizations and health plans focus their analyses less on individuals and more on populations, with a greater emphasis on economic and cost–benefit considerations than might be seen in the clinical realm. Due diligence is expected, and indeed required, of all who are empowered to make acquisition, implementation, and coverage decisions; it is assumed, perhaps implicitly, by patients.

For practicing physicians, lifelong learning includes understanding for whom, when and how new technologies such as AI will improve health and health care. A clinician’s qualifications to practice in their specialty are verified by hospital credentialing committees, health plans, certification bodies, state licensing boards and others. Therefore, in order to serve their patients and to be qualified and credentialed to practice in an environment in which AI tools are used, physicians must understand enough, albeit not everything, about new tools and devices in their practice. If the “box is too black,” such that an artificial intelligence product is not or cannot be explained, it will be difficult for physicians responsible for evaluating, selecting, and implementing such products to recommend its use, even if that means foregoing the potential benefits to patient health that might otherwise be achieved.

Translating principles into practice: Framework for building AI that physicians can trust

Clearly defining roles and responsibilities among those who develop clinical AI systems, the health care organizations and leaders who deploy those systems in clinical settings, and the physicians who integrate AI into care for individual patients is central to putting the ethics-evidence-equity framework into practice. In the first instance, stakeholders must jointly ensure that a diverse community of patients and physicians is engaged throughout the process, that all parties align on best practices, oversight, and accountability, and that physicians and the public are educated to be informed and empowered consumers of health care AI. Table 2 delineates further the cross-cutting responsibilities of developers, deployers, and end users in fulfilling commitments to ethics, evidence, and equity.

Table 2 Crosscutting responsibilities of developers, deployers, and end users in fulfilling commitments to ethics, evidence, and equity

Successfully integrating AI into health care requires collaboration, and engaging stakeholders early to address these issues is critical.

Several efforts exist to support patient engagement in AI solution design, including but not limited to the Algorithmic Justice League, Data 4 Black Lives, #MoreThanCode, The Just Data Lab, and Auditing Algorithms.

To promote physician engagement, the AMA has developed the Physician Innovation Network. This online platform connects health care solution developers and physicians to ensure that physician input is integrated into health innovation solution design across the industry [40]. Engaging physicians at the early development stage can help ensure that AI systems are designed and implemented in a manner that upholds the ethical values of medicine and promotes the quadruple aim (Table 3).

Table 3 Trustworthy augmented intelligence in the context of the quadruple aim

Practicing physicians should use the following framework to evaluate whether an AI innovation meets these conditions: does it work, does it work for my patients, and does it improve health outcomes? The comments under each question supply guidance to address key issues found in the interviews (Appendix I). This framework can serve as a mental checklist for physicians and can help developers and deployers understand what is required to meet these expectations.

Does it work?

The AI system meets expectations for ethics, evidence, and equity. It can be trusted as safe and effective.

The AI system was

  • developed in response to a clearly defined clinical need identified by physicians and it addresses this need;

  • designed, validated, and implemented with the physician’s perspective in mind;

  • validated through a process commensurate with its risk [18].

    • It has been validated analytically and scientifically. An AI system that diagnoses or treats (i.e., poses considerable risk) has been prospectively clinically validated in an appropriate care setting [4].

    • It has been tested for usability by participants who are demographically representative of end users.

    • The data and validation processes used to develop the AI system are known (i.e., publicly available).

    • It has received FDA approval or clearance (if applicable).

The developer

  • has demonstrated that a predictive model predicts events early enough to meaningfully influence care decisions and outcomes,

  • has an established commitment to data quality and security,

  • has identified and addressed ethical considerations (e.g., an ethical technology assessment) [14],

  • has robust data privacy and security processes in place for any patient data collected directly or from practice settings (i.e., for research or monitoring purposes),

  • has identified and taken steps to address bias and avoided introducing or exacerbating health care disparities when testing or deploying the AI system, particularly among vulnerable populations,

  • has ensured that the characteristics of the training dataset are known, and that the dataset reflects the diversity of the intended patient population, including demographic and geographic characteristics,

  • has a transparent revalidation process in place for evaluating updates throughout the AI system’s lifecycle.

Does it work for my patients?

The AI system has been shown to improve care for a patient population like mine, and I have the resources and infrastructure to implement it in an ethical and equitable manner.

  • The AI system has been validated in a population and health care setting that reflects my practice.

  • Continuous performance monitoring is in place in my practice to identify and communicate changes in performance to the developer.

  • It can be integrated smoothly into my current practice, will improve care, and will enhance my relationship with patients [5].

  • The AI system has been beta tested in different populations prior to implementation to identify hidden bias.

Does it improve health outcomes?

The AI system has been demonstrated to improve outcomes.

  • Clinical performance and patient experience data demonstrate its positive impact on health outcomes, including quality of life measures, through qualitative and quantitative research methods.

  • The AI system maximizes benefits and minimizes harm to patients, with particular attention to potential impacts on historically marginalized communities.

  • The AI system improves patient well-being and experience, as defined by a diverse patient population.

  • The AI system adds value to the physician–patient relationship, enabling patient-centered care.

  • If the AI system only improves patient outcomes for specific populations, this limitation is transparent.

  • Barriers to access are found and addressed to improve outcomes for all patients who can benefit.

All parties are responsible for ensuring that stakeholders are held accountable for meeting these expectations.

Conclusion

While the number of AI systems used in health care has increased exponentially in recent years and numerous frameworks for ethical use and development of AI have been proposed, there is still no consensus on guiding principles for development and deployment of AI in health care. To harness the benefits that innovative technologies like AI can bring to health care, all stakeholders must work together to build the evidence, oversight, and infrastructure necessary to foster trust.

The guidance presented above provides a framework for development and use of AI through the lens of the patient-physician encounter. This framework promotes an evidence-based, ethical approach that advances health equity in support of the Quadruple Aim and reinforces the core values of medicine.

Physicians have an ethical responsibility to place patient welfare above their own self-interest or obligations to others, to use sound medical judgment on patients’ behalf, and to advocate for patients’ welfare. Innovations in health care should sustain this fundamental responsibility of fidelity to patients. Those who design and deploy new interventions or technologies, particularly interventions or technologies intended to directly interface with decisions about patient care, have a responsibility to ensure that their work serves these goals. The framework outlined here provides the profession’s perspective on the conditions necessary to create a trustworthy environment for adopting AI in health care with a primary focus on patient safety and outcomes of care.


  1. American Medical Association. Artificial Intelligence in Medicine. Accessed 6 Aug 2020.

  2. Bodenheimer T, Sinsky C. From triple to quadruple aim: Care of the patient requires care of the provider. Ann Fam Med. 2014 Nov;12(6):573–576.

  3. Merriam-Webster. Trustworthy. Accessed 6 Aug 2020.

  4. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447-453.

  5. Pellegrino ED. Toward a reconstruction of medical morality. J Med Humanities. 1987;8(1):7-18.

  6. Burrell J. How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc. 2016;January–June:1–12.

  7. Braun M, Hummel P, Beck S, Dabrock P. Primer on an ethics of AI-based decision support systems in the clinic. J Med Ethics. 2020;0:1–8.

  8. American Medical Association. Code of Medical Ethics. Opinion 1.2.1, Ethically sound innovation in medical practice. Accessed 19 Feb 2021.

  9. American Medical Association. Code of Medical Ethics. Opinion 11.2.1, Professionalism in health care systems. Accessed 19 Feb 2021.

  10. Independent High-Level Expert Group on Artificial Intelligence. Ethics Guidelines for Trustworthy AI. Brussels, European Union: European Commission; 2019.

  11. Wynsberghe A. Artificial intelligence: From ethics to policy. Brussels, European Union: European Parliamentary Research Service; 2020.

  12. Liu X, Rivera SC, Moher D, et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Lancet. Published online 9 Sep 2020.

  13. U.S. Food & Drug Administration. Good clinical practice 101: An introduction. Accessed 6 Aug 2020.

  14. U.S. Food & Drug Administration. Search for FDA Guidance Documents. Accessed 6 Aug 2020.

  15. U.S. Food & Drug Administration. Step 3: Clinical research. Accessed 6 Aug 2020.

  16. McMaster University. About GRADE. Accessed 18 Aug 2020.

  17. International Medical Device Regulators Forum. Software as a Medical Device: Possible Framework for Risk Categorization and Corresponding Considerations. Accessed 6 Aug 2020.

  18. International Medical Device Regulators Forum. Software as a Medical Device (SaMD): Clinical Evaluation. Accessed 6 Aug 2020.

  19. DeCamp M, Lindvall C. Latent bias and the implementation of artificial intelligence in medicine. J Am Med Inform Assoc. 2020;27(12):2020–2023.

  20. Ntoutsi E, Fafalios P, Gadiraju U, et al. Bias in data-driven artificial intelligence systems—An introductory survey. WIREs Data Mining Knowl Discov. 2020;10:e1356.

  21. Parikh RB, Teeple A, Navathe AS. Addressing bias in artificial intelligence in health care. JAMA. 2019;322(24):2377-2378.

  22. Char DS, Shah NH, Magnus D. Implementing machine learning in health care — addressing ethical challenges. N Engl J Med. 2018;378:981-983.

  23. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight — reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020;383:874-882.

  24. Gradient Institute. Practical Challenges for Ethical AI. Accessed 12 Apr 2021.

  25. Pfohl SR, Foryciarz A, Shah NH. An empirical characterization of fair machine learning for clinical risk prediction. J Biomed Inform. 2021;113. Accessed 12 Apr 2021.

  26. Livingston M. Policy memo: Preventing racial bias in Federal AI. J Science Policy & Governance. 2020;16(2).

  27. Benjamin R. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge: Polity; 2019.

  28. Publications Office of the European Union. Statement on artificial intelligence, robotics and 'autonomous' systems. Accessed 22 Sep 2020.

  29. Fjeld J, Achten N, Hilligoss H, Nagy AC, Srikumar M. Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-based Approaches to Principles for AI. The Berkman Klein Center for Internet & Society Research Publication Series. 2020–1.

  30. Organisation for Economic Co-operation and Development. Recommendation of the Council on Artificial Intelligence. Accessed 8 Sep 2020.

  31. U.S. Food & Drug Administration. Digital Health Innovation Action Plan. Accessed 6 Aug 2020.

  32. U.S. Food & Drug Administration. Guidances with Digital Health Content. Accessed 6 Aug 2020.

  33. U.S. Food & Drug Administration. Digital Health Center of Excellence. Accessed 9 Oct 2020.

  34. U.S. Food & Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. Accessed 6 Aug 2020.

  35. U.S. Food & Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). Accessed 6 Aug 2020.

  36. Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368.

  37. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet. 2019;393(10181):1577-1579.

  38. Hernandez-Boussard T, Bozkurt S, Ioannidis J, Shah NH. MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. J Am Med Inform Assoc. 2020 Jun 28;ocaa088.

  39. HIT Consultant. CTA Launches First-Ever ANSI-accredited Standard for AI in Healthcare. Accessed 17 Aug 2020.

  40. American Medical Association. Physician Innovation Network. Accessed 21 Aug 2020.

Author information

Authors and Affiliations



All authors contributed to the literature search and analysis. The initial draft was written jointly by Dr. Crigger, Ms. Reinbold, and Ms. Hanson. All authors commented on prior versions and read and approved the final manuscript.

Corresponding author

Correspondence to Elliott Crigger.

Ethics declarations


Except as noted in text, the views expressed are those of the authors and do not represent adopted AMA policy.

Conflict of interest

The authors have no financial or other conflicts of interest to declare relevant to the content of this manuscript.

Research involving human participants and/or animals

This manuscript does not involve research with human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Health Policy


Appendix: AMA policies on augmented intelligence

Augmented intelligence in health care H-480.940

As a leader in American medicine, our AMA has a unique opportunity to ensure that the evolution of augmented intelligence (AI) in medicine benefits patients, physicians, and the health care community.

To that end our AMA will seek to:

  1. Leverage its ongoing engagement in digital health and other priority areas for improving patient outcomes and physicians’ professional satisfaction to help set priorities for health care AI.

  2. Identify opportunities to integrate the perspective of practicing physicians into the development, design, validation, and implementation of health care AI.

  3. Promote development of thoughtfully designed, high-quality, clinically validated health care AI that:

     (a) is designed and evaluated in keeping with best practices in user-centered design, particularly for physicians and other members of the health care team;

     (b) is transparent;

     (c) conforms to leading standards for reproducibility;

     (d) identifies and takes steps to address bias and avoids introducing or exacerbating health care disparities, including when testing or deploying new AI tools on vulnerable populations; and

     (e) safeguards patients’ and other individuals’ privacy interests and preserves the security and integrity of personal information.

  4. Encourage education for patients, physicians, medical students, other health care professionals, and health administrators to promote greater understanding of the promise and limitations of health care AI.

  5. Explore the legal implications of health care AI, such as issues of liability or intellectual property, and advocate for appropriate professional and governmental oversight for safe, effective, and equitable use of and access to health care AI.

Augmented intelligence in health care H-480.939

Our AMA supports the use and payment of augmented intelligence (AI) systems that advance the quadruple aim. AI systems should enhance the patient experience of care and outcomes, improve population health, reduce overall costs for the health care system while increasing value, and support the professional satisfaction of physicians and the health care team. To that end our AMA will advocate that:

  1. Oversight and regulation of health care AI systems must be based on risk of harm and benefit, accounting for a host of factors, including but not limited to: intended and reasonably expected use(s); evidence of safety, efficacy, and equity, including addressing bias; AI system methods; level of automation; transparency; and conditions of deployment.

  2. Payment and coverage for all health care AI systems must be conditioned on complying with all appropriate federal and state laws and regulations, including, but not limited to, those governing patient safety, efficacy, equity, truthful claims, privacy, and security, as well as state medical practice and licensure laws.

  3. Payment and coverage for health care AI systems intended for clinical care must be conditioned on (a) clinical validation; (b) alignment with clinical decision-making that is familiar to physicians; and (c) high-quality clinical evidence.

  4. Payment and coverage for health care AI systems must (a) be informed by real-world workflow and human-centered design principles; (b) enable physicians to prepare for and transition to new care delivery models; (c) support effective communication and engagement between patients, physicians, and the health care team; (d) seamlessly integrate clinical, administrative, and population health management functions into workflow; and (e) seek end-user feedback to support iterative product improvement.

  5. Payment and coverage policies must advance affordability and access to AI systems that are designed for small physician practices and patients and not limited to large practices and institutions. Government-conferred exclusivities and intellectual property laws are meant to foster innovation, but they constitute interventions into the free market and therefore should be appropriately balanced with the need for competition, access, and affordability.

  6. Physicians should not be penalized if they do not use AI systems while regulatory oversight, standards, clinical validation, clinical usefulness, and standards of care are in flux. Furthermore, our AMA opposes:

     (a) policies by payers, hospitals, health systems, or governmental entities that mandate use of health care AI systems as a condition of licensure, participation, payment, or coverage; and

     (b) the imposition of costs associated with acquisition, implementation, and maintenance of health care AI systems on physicians without sufficient payment.

  7. Liability and incentives should be aligned so that the individual(s) or entity(ies) best positioned to know the AI system risks and best positioned to avert or mitigate harm do so through design, development, validation, and implementation. Our AMA will further advocate that:

     (a) where a mandated use of AI systems prevents mitigation of risk and harm, the individual or entity issuing the mandate must be assigned all applicable liability;

     (b) developers of autonomous AI systems with clinical applications (screening, diagnosis, treatment) are in the best position to manage issues of liability arising directly from system failure or misdiagnosis and must accept this liability, with measures such as maintaining appropriate medical liability insurance and in their agreements with users; and

     (c) health care AI systems that are subject to non-disclosure agreements concerning flaws, malfunctions, or patient harm (referred to as gag clauses) must not be covered or paid for, and the party initiating or enforcing the gag clause assumes liability for any harm.

  8. Our AMA, national medical specialty societies, and state medical associations should:

     (a) identify areas of medical practice where AI systems would advance the quadruple aim;

     (b) leverage existing expertise to ensure clinical validation and clinical assessment of clinical applications of AI systems by medical experts;

     (c) outline new professional roles and capacities required to aid and guide health care AI systems; and

     (d) develop practice guidelines for clinical applications of AI systems.

  9. There should be federal and state interagency collaboration, with participation of the physician community and other stakeholders, to advance the broader infrastructural capabilities and requirements necessary for AI solutions in health care to be sufficiently inclusive to benefit all patients, physicians, and other health care stakeholders.

  10. AI is designed to enhance human intelligence and the patient-physician relationship, not to replace them.

Augmented intelligence in medical education H-295.857

Our AMA encourages:

  1. accrediting and licensing bodies to study how AI should be most appropriately addressed in accrediting and licensing standards.

  2. medical specialty societies and boards to consider production of specialty-specific educational modules related to AI.

  3. research regarding the effectiveness of AI instruction in medical education on learning and clinical outcomes.

  4. institutions and programs to be deliberative in determining when AI-assisted technologies should be taught, including consideration of established evidence-based treatments and of what other curricula may need to be eliminated to accommodate new training modules.

  5. stakeholders to provide educational materials to help learners guard against inadvertent dissemination of bias that may be inherent in AI systems.

  6. the study of how differences in institutional access to AI may impact disparities in education for students at schools with fewer resources and less access to AI technologies.

  7. enhanced training across the continuum of medical education regarding assessment, understanding, and application of data in the care of patients.

  8. the study of how disparities in AI educational resources may impact health care disparities for patients in communities with fewer resources and less access to AI technologies.

  9. institutional leaders and academic deans to proactively accelerate the inclusion of nonclinicians, such as data scientists and engineers, onto their faculty rosters to assist learners in their understanding and use of AI; and

  10. close collaboration with and oversight by practicing physicians in the development of AI applications.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Crigger, E., Reinbold, K., Hanson, C. et al. Trustworthy Augmented Intelligence in Health Care. J Med Syst 46, 12 (2022).


  • Accountability
  • Augmented intelligence/artificial intelligence
  • Equity/access to care
  • Ethics
  • Health care innovation
  • Standards