The ever more integrated use of conventional and integrative DSS in many different contexts has generated prolific scholarly description and analysis of the interactions between humans and these machines. Some describe these interactions, from an instrumental perspective, as essentially advanced tool-use (Köhler 2020), in which the behavior of the machine is merely a highly complex process that can be adjusted to respond to complex human actions. Others have taken the autonomous processes of these machines to amount to something more action-like, rendering the description of these interactions more akin to cooperation, collaboration, or partnership (e.g., Patel et al. (2019) for diagnostic imaging; see Nyholm (2018) for a general analysis).
Without a coherent terminology for these interactions and the specifics of the relation between human and DSS, it will be difficult to provide an accurate description of the frictions that may arise within them. Additionally, the structure of these frictions will depend on features of the DSS, requiring a technology-sensitive approach to assess the conflicts occurring between humans and a specific system. The wide variety of potential applications, as well as the particularly high stakes of conflicts in using DSS in medical contexts, calls for clear conceptual distinctions to enable an accurate analysis. We propose some terminological choices that disentangle conceptual and normative confusions, and we suggest some norms to help integrate DSS further into clinical processes.
For this, we first suggest understanding any friction between the expected and the actual output in interactions between human physicians and DSS as a conflict. This means that the diagnostic output of a DSS and the diagnostic prediction (or expectation) of a physician are incongruent with each other.
Importantly, we propose to understand conflicts not so much as problems to be overcome but as indications of opportunities to be seized. This positive reframing is based on the idea that certain conflicts, disagreements, and ambiguities are indicative of differences that need to be acknowledged and that offer an opportunity for improvement. On this account, (the right kind of) conflict is the place where actual progress can be made, progress in epistemological, practical, or other terms.
The idea to positively reframe conflicts is indebted to the tradition of American pragmatism, notably the work of John Dewey, whom we quote here at length (Dewey 1922, 301):
What is to be done with […] facts of disharmony and conflict? After we have discovered the place and consequences of conflict in nature, we have still to discover its place and working in human need and thought. What is its office, its function, its possibility, or use? In general, the answer is simple. Conflict is the gadfly of thought. It stirs us to observation and memory. It instigates to invention. It shocks us out of sheep-like passivity, and sets us at noting and contriving. Not that it always effects this result; but that conflict is a sine qua non of reflection and ingenuity. When this possibility of making use of conflict has once been noted, it is possible to utilize it systematically to substitute the arbitration of mind for that of brutal attack and brute collapse.
Thus, Dewey proposes to see conflicts in human need and thought as an extension of conflict in nature. Both instigate innovative change and progress (see footnote 2). Here, we offer a further extension of this idea to cover conflicts between humans and machines, and we will show in the following how an analysis of conflicts elucidates opportunities for improvements in clinical processes.
Mistakes and malfunctions
Conflicts between physicians and DSS can be caused by different forms of friction or incongruence. Diagnostics is known to operate with wide margins of uncertainty, as our knowledge both about diseases in general and about their formation in individual patients is still limited. Thus, divergent diagnoses can have three different causes: either the physician is wrong in their assessment, the DSS is wrong in its assessment, or both are within the range of probable results.
To clarify how we should deal with these different causes of conflict, we first analyze the two instances in which one side commits an error. A conflict caused by such an error can and should be resolved by siding with the error-free party.
These two types concern incorrect operation by one of the interacting parties, the physician or the DSS. If a physician makes a mistake in operating the system or in their own deliberations, the DSS’s recommendations may conflict with the anticipated results. The same holds for a physician who misunderstands or misapplies diagnostic evidence or theories and thereby ends up with erroneous beliefs about the diagnosis, while the DSS may be of faultless assistance. On the other hand, a system may be operated correctly and nonetheless produce false results because it is itself operating incorrectly. In the former case we speak of mistakes (where the error lies with the physician), in the latter of malfunctions (where the error lies with the system). One could also speak of mistakes as “operator errors” and malfunctions as “operating errors”.
However, due to the complexity and autonomy of the decision-making procedures within these systems, descriptions of human–machine interactions as more than mere mistakes or malfunctions ought to be accounted for in the reconstruction of conflicts. With DSS that, as some philosophers have argued, may be capable of replicating the human decision-making process (Lin 2015), especially in cooperation with humans (Nyholm 2018), the frictions potentially occurring here can gain a new quality.
This leads us to the third option, in which a conflict is caused either by both sides being wrong or by both sides having sufficient error-free evidence to support their incompatible claims. Physicians may, for example, begin perceiving some DSS as quasi-agents due to their complex problem-solving skills and the need for physicians to rely on their correct operation. Insofar as it may be impossible for physicians to fully reduce the process that led a DSS to generate a recommendation to its deductive elements, they may begin to perceive a conflict between a DSS and their own opinion as one between equally weighty beliefs. A physician can no longer simply reject a DSS suggestion, even if they have come to a different result. Conflicts about beliefs or actions with this kind of similar standing, however, are usually considered a different kind of conflict, i.e., disagreements.
Yet, the perception or presumption of a disagreement is, per se, not sufficient for establishing the presence of actual disagreement. In human–human disagreements, one side (or both) may be mistaken in their argument, and thereby not be in disagreement with each other but in a mistake-based conflict (malfunction-based conflicts do not occur between humans).
As a diagnostic AI system only detects patterns, it does not “know” what these patterns represent, nor is it designed to provide interpretation. In consequence, disagreeing with the conclusion of a machine may be entirely based on the machine’s arbitrary misinterpretation of certain features of an image. To call such a conflict between a physician and an AI a proper disagreement seems implausible and undesirable: implausible, because it suggests we can disagree with (quasi-)agents who use unintelligible processes, if any at all. We equally do not disagree with a parrot that merely repeats what it was trained to say without representing the meaning of the sentence. And it is undesirable, because it shifts the burden of proof partially towards the physician, who has to justify their decision in a conflict that may rest solely on a machine’s lack of representations and “common sense.” We therefore suggest counting these types of conflicts among malfunctions (Flach 2019).
However, this reconstruction does not capture the full scope of what I-AI is capable of, as not all of these conflicts can be satisfyingly understood as caused by malfunctions or mistakes. Reconstructing a semi-autonomous diagnostic recommendation system as a simple tool denies the far-reaching role and influence it can have in the diagnostic process, in which the AI may actually perform most of the cognitive work. The system’s degree of autonomy, even if not at the level of full human autonomy, and its ability to assess certain features on its own to guide the procedure clearly surpass what is usually expected from and contributed by “tools”. The relevance of this cognitive work, the ability of some DSS to access non-knowledge-based evidence, and their altogether insufficient explainability (Mittelstadt et al. 2019) lend credence to the perception of those conflicts as disagreements.
Perceived disagreements thus are often irresolvable in the way disagreements among physicians are usually resolved: discursive methods of exchanging reasons to change an opinion are unachievable under current C-AI and I-AI methods due to the fundamental difference in the “reasoning” of deep neural networks and other machine-learning paradigms (Pelaccia et al. 2019).
An example case may help to clarify the proposed terminological set-up: imagine a physician utilizing a diagnostic machine as a second opinion in breast-cancer detection, one of the more common uses of DSS in clinical contexts. For their first patient, the physician copies incorrect information about previous cancer treatments from the patient’s file into their own assessment. After the patient goes through the DSS-assisted analysis, the DSS correctly refers to previous treatment methods, while the physician’s diagnosis plans first-time treatments. This conflict is caused by a mistake, as the physician failed in their duty to carefully evaluate the patient. For their second patient, the physician proposes a tentative diagnosis and awaits the machine’s response. As it turns out, the DSS claims the patient to be in the very late stages of cancer with very little chance of recovery. As the patient does not report any kind of issue, it is soon discovered that the machine misread the medical images due to a malfunction. For the third patient, physician and machine arrive at different diagnoses: the physician concludes that the patient does not have breast cancer, while the DSS positively diagnoses an early stage of the disease. While the physician has drawn a different conclusion, they can appreciate how the DSS reached its result (and are sufficiently uncertain not to reject that result outright). While in all three situations physician and DSS disagree, it requires further analysis how they are in conflict and why the respective resolutions raise different ethical problems.
From the perspective of a physician in a clinic relying on a largely independently operating AI, perceiving the interaction with the machine as a cooperation will, in case of conflict, often lead to perceived disagreements. This constitutes a morally distinctive and sensitive situation, for it is not inconceivable, then, that physicians will feel the burden of proof shifting towards them, with their expertise questioned, when disagreeing with the diagnostic recommendation of a DSS. Instead of improving the physician’s diagnosis, such perceived conflict puts their expertise under pressure of justification. A certain “shadow expertise” is established through the DSS, in which physicians may find themselves in potential competition with a machine that cannot take responsibility for its operational errors, while, in fact, physicians must take this responsibility.
From meaningless to meaningful disagreements
With an idea in place about the potential for some sort of disagreement in human–machine interactions in medical diagnostics, the analysis can turn towards the normative problem of how to integrate these DSS into medical, and in particular diagnostic, processes without compromising their quality. As these issues concern the integration of DSS into physicians’ work environments, we focus on two problematic consequences of integrating non-human expertise into clinical processes. This section will describe them and show how they motivate the need for something we propose to call “meaningful disagreement”. Such a notion can alleviate these consequences and stands in contrast to “meaningless disagreements”, which are, in our terminology, merely perceived disagreements that fall back into the categories of mistakes or malfunctions.
A first problematic consequence of integrating DSS into diagnostic processes can be an increased reluctance of doctors to make their own, independent diagnoses (Grote and Berens 2020), as the correctness of these diagnoses can be challenged by AI. Some have even suggested (Kompa et al. 2021; Mozannar and Sontag 2020) that the recommendations of an AI may be tailor-made to fit the success scores of individual physicians. On this proposal, the benchmark for when an AI proposes a diagnosis to a physician should be set relative to that physician’s individual performance. In these cases, an AI only proposes a diagnosis if its confidence score for a certain disease is, on average, significantly higher than the success score of the physician to whom it is recommending the diagnosis. This shifts the burden of proof substantially towards the physician.
As such a score will negatively affect the willingness to propose more “risky” diagnoses, i.e., those of comparatively rare diseases, it will potentially decrease the overall diagnostic precision of physicians who use DSS (Grote and Berens 2020, 208). This turns the very idea of assisting physicians in making more precise diagnoses on its head. The risk and the associated moral costs of being wrong will increase for the physician if there is an I-AI proposing an alternative that the physician cannot classically “reason” with (ibid.).
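To make the shape of such a physician-relative benchmark concrete, consider the following minimal sketch in Python. All names here (PhysicianProfile, should_propose), the fallback baseline, and the significance margin are our own illustrative assumptions, not the actual procedures of Kompa et al. (2021) or Mozannar and Sontag (2020):

```python
# Illustrative sketch of a physician-relative proposal threshold.
# All names and values are hypothetical, chosen only to make the
# logic of the proposal discussed above visible.

from dataclasses import dataclass


@dataclass
class PhysicianProfile:
    """Historical diagnostic success rates per disease,
    e.g. {"breast_cancer": 0.82}."""
    success_rates: dict[str, float]


def should_propose(
    ai_confidence: float,
    physician: PhysicianProfile,
    disease: str,
    margin: float = 0.05,  # hypothetical "significantly higher" threshold
) -> bool:
    """Return True only if the AI's confidence for this disease exceeds
    the individual physician's success rate by the chosen margin."""
    baseline = physician.success_rates.get(disease, 0.5)  # fallback if no history
    return ai_confidence > baseline + margin


# Usage: the AI stays silent unless it clearly outperforms this physician.
physician = PhysicianProfile(success_rates={"breast_cancer": 0.82})
print(should_propose(0.91, physician, "breast_cancer"))  # True: 0.91 > 0.87
print(should_propose(0.84, physician, "breast_cancer"))  # False: within margin
```

The sketch makes the normative point tangible: whether the machine speaks at all is conditioned on how it scores the individual physician, which is precisely the pressure on physicians’ expertise described above.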
A second problematic consequence of integrating DSS into diagnostic processes comes from the complementary perspective of conflict avoidance: rather than reluctance to propose one’s own genuine diagnosis, we can expect misuse of these DSS to generate diagnoses a physician can then agree to without ever having developed and proposed their own diagnosis. Notably, DSS are conceived as support systems, i.e., they are not certified to make their own decisions. However, they are capable of reducing human control to a binary choice of “confirming” or “denying” the suggestion made by a DSS. This conceivably exerts a pressure to “let the machine go first” (McDougall (2019) stresses this point from a patient perspective) and to adjust one’s own diagnosis accordingly, so that no disagreement occurs. In doing so, the machine’s prowess de facto replaces human expertise.
Unfortunately, Braun et al., despite discussing the need for meaningful human control, reject the idea that such perceived disagreements may be a potential area of conflict (Braun et al. 2020, 7). We, however, argue that they should be accounted for. We can encounter I-AI DSS that are so complex that those interacting with them have at least pragmatic reasons to take a DSS’s diagnosis as something they can disagree with. Braun and colleagues consider conflicting diagnoses a “misnomer” because for most decisional situations there is not one precise way to move forward (ibid.), and because most diagnoses are judgments of relative (un)certainty, relative to the given evidence and the available resources and theories. However, as analyses in other areas of human–machine interaction (HMI) have suggested, humans’ perceptions in these interactions are shaped by their expectations of the norms regulating them (Bankins and Formosa 2019). We take these perceived disagreements to be not only a psychological fact but a challenge to be incorporated into the norms of a clinic.