What is Explicability For?
Before getting to what explicability is and who it is for, we must understand the purpose of a principle of explicability for AI; answering that question will go some way towards answering the other two. I argue that a principle of explicability is primarily for maintaining meaningful human control over algorithms. The idea is that an explanation of an algorithm’s output allows a human being to exercise meaningful control over the algorithm—enabling the ascription of moral responsibility to that human being (or set of human beings). With an explanation of the algorithm’s decision, it is possible for human beings to accept, disregard, challenge, or overrule that decision. The Center for a New American Security (CNAS), for example, writes that it is necessary that “human operators are making informed, conscious decisions about the use of weapons” and that “human operators have sufficient information to ensure the lawfulness of the action they are taking…”.Footnote 10
There are, however, other features of meaningful human control that would not be captured by explicability. Meaningful human control over autonomous driving systems, for example, may not require that human beings have any say over a particular decision, because human drivers are psychologically limited in their ability to gain the cognitive awareness needed to act in time (Heikoop et al. 2019). Santoni de Sio and van den Hoven (2018) argue that meaningful human control occurs when algorithms meet ‘track’ and ‘trace’ conditions: the decisions of algorithms must track human values, and we must be able to trace responsibility for the outcomes of algorithms back to human beings. While I use a specific conception of meaningful human control (i.e. giving humans the ability to accept, disregard, challenge, or overrule an AI algorithm’s decision), I am not arguing that this conception is the best one. Rather, it is the conception that I argue is implicit when one requires that AI be explicable.
We must keep in mind that an explicability principle for AI is ethical in nature. The starting point for these lists of principles is that there are ethical problems associated with algorithms. If the design and development of algorithms follow a particular set of principles, then, it is believed, the resulting algorithm will be ‘good’, ‘trustworthy’, or ‘responsible’. So, a principle of explicability is an attempt to overcome some of the ethical issues unique to algorithms.
This ethical value is to be contrasted with the epistemic value explicable AI might provide. Explicable AI may be extremely valuable to researchers and others who could use explanations to better understand their domain. Garry Kasparov, for example, may find an explanation of a particular chess move made by an algorithm beneficial for his own ability to play chess.Footnote 11 A doctor may find an explanation useful for better understanding how to diagnose a particular disease. This epistemic value of explicability for AI is not under dispute. In these cases, we are not harmed by the opacity of the algorithm’s decision-making process. A principle of explicability, in contrast, is ethical in that it is about preventing harm (broadly construed) that could occur due to the opacity of the algorithm.
What is the ethical issue that gives rise to this principle? One candidate is the issue of understanding what went wrong if something harmful happens as a consequence of the algorithm. For example, if a self-driving car swerves into a barrier, killing its passenger(s), then it would be helpful to have an explanation of what caused this in order to prevent it from happening in the future. While a principle of explicability would help with this, it does not capture the full range of ethical issues that explicability aims to overcome. For example, if someone is incorrectly denied a loan by an algorithm, how will we know that something harmful has happened so that we can demand an explanation?
This points to the ethical issue of ensuring that the outputs of algorithms are not based upon ethically problematic or irrelevant considerations. We expect, for example, a rejection for a loan not to be based on the color of the applicant’s skin (or a proxy thereof). An explanation of the algorithm’s decision allows someone to accept, disregard, challenge, or overrule the rejection. This gives meaningful control of the decision to human beings, and it goes above and beyond the stipulation that some particular human is responsible for the algorithm’s decisions: it provides a human with the information they need in order to exercise that control.
Explicability, therefore, is an attempt to maintain meaningful human control over algorithms. Only human beings can be held morally accountable, so it should be human beings who are in control of these decisions (see e.g. Johnson 2006). If a human being has an explanation of the algorithm’s decision, then it is possible for that human being to accept, disregard, challenge, or overrule that decision.
Who is Explicability For?
How the requirement that AI be explicable is understood depends upon who will receive the explanation. A medical diagnosis algorithm that classifies someone as having a brain tumor might, for example, provide a heat map of which parts of the brain scan most contributed to the diagnosis. This ‘explanation’ would probably be useless to a patient—or to anyone else without very specific medical training. However, if the goal is that the algorithm be under ‘meaningful human control’, then we are not concerned with the patient’s understanding of the explanation.
Just as with any diagnosis, we trust that our physician is making a justified decision in line with current medical practice. The physician should be ultimately responsible for the brain tumor diagnosis and therefore it is the physician who should be able to evaluate the explanation. Remember that the purpose of the explanation is to overcome an ethical problem; namely, to establish meaningful human control over that decision by allowing one to confirm that the reasons for a decision are in line with domain-specific norms and best practices.
To illustrate, let us say that an algorithm rejects a loan application. This algorithm is able to provide an explanation in the form of the considerations that factored into its rejection. One of those considerations was the applicant’s high debt-to-income ratio. To the applicant, this is interesting to know, but it would be quite unclear whether their debt-to-income ratio was at a level that justified its factoring into a decision to reject their loan application. Only those with relevant domain-specific knowledge would be able to evaluate whether this particular debt-to-income ratio should factor into a decision to reject the loan. This only gets more complicated as more considerations factor into algorithmic decisions.
To achieve the ethical goal of a principle of explicability, the explanation provided by an algorithm should enable a human being to have meaningful control over the decisions the algorithm makes. This means that the explanation should be directed towards the person using the algorithm—not the person subject to the algorithm’s decision (although those two roles may be filled by the same person). While the person subject to the algorithm’s outputs may be interested to know the explanation (and in some cases should be provided with it in order to achieve other ethical goals),Footnote 12 providing it to them does not establish meaningful human control over the algorithm’s output.
Artificial Intelligence
‘Artificial Intelligence’ is an overused phrase that signifies many things. Explanation, too, has many uses depending on the context. We have had artificially intelligent systems for decades that did not result in any calls for explanation. This is mainly because what is known as good old-fashioned AI (GOFAI) is simply a set of explicitly coded rules, often in the form of a decision tree, that allows for the automation of processes. For example, if you wanted to automate the decision about which move to make in chess, it might look like this:
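A minimal sketch of what such a hand-coded rule set might look like, written here in Python with the python-chess library (the rules themselves are invented purely for illustration):

import random
import chess  # the python-chess library

def choose_move(board: chess.Board) -> chess.Move:
    # A deliberately naive, hand-coded rule set. Every rule is written out
    # explicitly by a human, so the reason for any move can be read off the code.
    legal_moves = list(board.legal_moves)
    # Rule 1: if any move captures the opponent's queen, play it.
    for move in legal_moves:
        captured = board.piece_at(move.to_square)
        if captured is not None and captured.piece_type == chess.QUEEN:
            return move
    # Rule 2: otherwise, if any move gives check, play it.
    for move in legal_moves:
        if board.gives_check(move):
            return move
    # Rule 3: otherwise, play a random legal move.
    return random.choice(legal_moves)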
This is clearly a terrible algorithm for deciding your next chess move—a much more sophisticated algorithm could be designed using GOFAI. However, this kind of automation is inherently explicable because the code makes the reasons for a resulting decision explicit. Opacity with regard to this type of automation would only occur if the institutions doing the automating did not want people to know how the decisions are being made (see e.g. Pasquale 2015).
GOFAI contrasts with AI that falls under the umbrella of machine learning (ML). The GOFAI approach is limited by what the designers of the algorithm could think of; novel situations may result in terrible decisions by the AI. ML is one approach to overcoming such limitations. In a nutshell, ML uses statistical methods to allow an algorithm to ‘learn’ every time it ‘tries’ to achieve its specified goal. Each attempt, whether it fails or succeeds, results in the algorithm updating the statistical probabilities it associates with features of the input.Footnote 13
An ML algorithm could be trained to play chess by playing many times without explicit rules given by humans. The ML algorithm may play at random the first time—losing very easily. At the end of the game, we would tell the AI that it lost. The next game the AI would play slightly differently. Over hundreds, thousands, or even millions of games the AI would be very well trained to play the game of chess. The resulting trained ML algorithm would be opaque with regard to its reasoning for any given move.
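A toy sketch of this learn-from-outcomes loop, again in Python, is given below. The ‘game’, the actions, and the probabilities are invented and far simpler than chess, but the structure is the same: the only feedback is a win or a loss, and the algorithm merely updates statistics rather than following rules a human wrote down.

import random

ACTIONS = ["a", "b", "c"]
wins = {action: 1.0 for action in ACTIONS}   # pseudo-counts to avoid dividing by zero
plays = {action: 2.0 for action in ACTIONS}

def choose_action() -> str:
    # Mostly pick the action with the best estimated win rate; sometimes explore.
    if random.random() < 0.1:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: wins[action] / plays[action])

def play_game(action: str) -> bool:
    # Stand-in environment: action "b" secretly wins more often than the others.
    return random.random() < {"a": 0.3, "b": 0.6, "c": 0.4}[action]

for _ in range(10000):
    action = choose_action()
    won = play_game(action)              # the only feedback is win or lose
    plays[action] += 1
    wins[action] += 1.0 if won else 0.0  # update statistics; nothing records *why* "b" is good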
Is it acceptable that the algorithm makes decisions that are not explicable? If you share my intuition that there is no problem here, it may stem from the fact that the outcomes of these ‘chess move’ decisions cannot result in harm. A terrible chess move may result in the loss of the chess game, but life, limb, reputation, and property are not at stake. An AI making decisions in other contexts, such as medical diagnosis and judicial sentencing, could cause real harm.
The point here is to show that the principle of explicability is important due to the rise of algorithms using ML or other methods that are opaque with regard to how the algorithm reaches a particular decision. If we are simply using automated processes (e.g. GOFAI), then explicability is only a problem if the developer intentionally obfuscates the explanation. In these cases, an explanation is readily available to developers and companies; however, they do not see it as in their interest to reveal that explanation to the public. While not addressed here, this problem is very important (see Pasquale 2015).
Explicability
So if one is using an ML algorithm for decisions that could result in harm and wants to adhere responsibly to a set of principles that includes a principle of explicability, what is one to do? First, one would need to know what is being demanded by a principle of explicability. That is, what kind of explanation would satisfy the principle?
One possibility is that we are demanding a causal explanation for a particular outcome/action/decision. For example, when Google’s image classification algorithm classified two young black people as gorillas, there was an outcry and much embarrassment for Google (Kasperkevic 2015). If Google were to explain the algorithm’s classification by saying that “features of the image input correlated highly with training images classified as gorillas”, I doubt that anyone would be satisfied. We are not concerned with how the algorithm classifies images in general. Rather, we want to know why the label ‘gorillas’ was applied to a specific image by the algorithm. In other words, we demand to know the specific features of the image that contributed to the labeling.
Scientific explanations also give us answers to how things happened. However, we do not want to know the how; rather, we want to know the why. I do not want to know how my daughter hit her brother (“I raised my right arm and moved it forward at high velocity”), but why (“he took my favorite stuffed animal from me”). The latter why explanation provides the reason(s) that a particular action was taken. This reason or reasons may or may not morally justify the action, and these reasons are precisely what we want to evaluate. In the case of ML, we could get an explanation like the following excerpt, used to describe how DeepMind’s AlphaGo chooses its next move:
At the end of the simulation, the action values and visit counts of all traversed edges are updated. Each edge accumulates the visit count and mean evaluation of all simulations passing through that edge, $N(s, a) = \sum_{i=1}^{n} 1(s, a, i)$ and $Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V(s_L^i)$, where $s_L^i$ is the leaf node from the $i$th simulation, and $1(s, a, i)$ indicates whether an edge $(s, a)$ was traversed during the $i$th simulation. Once the search is complete, the algorithm chooses the most visited move from the root position (Silver et al. 2016)
This, if you are a person with the requisite knowledge to understand it, is an explanation of the how for a particular move in the game of Go made by the algorithm-driven process. It says nothing about the particular features of that move which contributed to the decision to make it. One could attempt to provide a justification for a particular move made by the algorithm by referencing the effectiveness of the algorithm itself: “the move chosen by the algorithm is a good move because the algorithm has proven to be very good at the game of Go”. We can see that this is an unsatisfying explanation when we apply it to a different context. If the best heart surgeon in the world were to leave a sponge in a patient, a nurse were to ask “why did she leave the sponge in the patient?”, and someone were to respond “it was good to leave the sponge there because the decision was made by the best surgeon in the world”, we would rightly reject the answer. What we really want from an explanation are all (and only) the considerations that contributed to a particular decision—considerations that a human could use to determine whether a particular algorithmic decision was justified.
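To make the contrast concrete, the selection rule quoted above boils down to something like the following sketch (a simplification, not DeepMind’s actual code). Note that nothing in it refers to features of the position itself.

# Simplified sketch of "choose the most visited move from the root position".
# visit_counts maps candidate moves at the root to how often the search
# simulations traversed them.
def select_move(visit_counts: dict) -> str:
    return max(visit_counts, key=visit_counts.get)

# For example, select_move({"D4": 1200, "Q16": 4500, "C3": 800}) returns "Q16":
# a description of how the move was chosen, offering no why a human could evaluate.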
We could give a general explanation of sorts for opaque algorithms in any context. Why did the ML algorithm decide to label a convicted criminal as high-risk? Because the input data correlated with features of the training data that were tagged as high-risk. While this is an explanation, it clearly falls short of what is desired by the principles highlighted above. What is really desired is an explanation that would provide a human with information that could be used to determine whether the result of the algorithm was justified.
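To picture the kind of explanation that is desired, consider a deliberately simple (and hypothetical) linear risk score, for which the considerations and their contributions can be listed directly. The feature names and weights below are invented purely to show the shape of such an explanation.

# Hypothetical, hand-picked weights for a toy linear risk score.
WEIGHTS = {"prior_convictions": 0.9, "age": -0.4, "zip_code_risk_index": 0.7}

def explain(features: dict) -> list:
    # List each consideration with its contribution to the score, largest first,
    # so a human can judge whether it should count at all.
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)

# explain({"prior_convictions": 3, "age": 1.9, "zip_code_risk_index": 2.5}) lists
# prior_convictions (~2.7) first, then zip_code_risk_index (~1.75), then age (~-0.76).
# A reviewer can now ask whether "zip_code_risk_index" is a legitimate consideration
# or a proxy for something ethically problematic.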
An explanation may justify a particular decision or it may not, and a decision may be justified by reasons that do not feature in an explanation of that decision (see e.g. Dancy 2004, ch. 5; Darwall 2003). If, for example, I were to make a move in chess because I thought that it would make the board more balanced (in terms of aesthetics), we would have an explanation for the move that failed to justify it. However, that move may also have been the best move I could have made—making the move justified. While it was a great chess move, I doubt anyone would take my advice on a future move—nor should we trust an algorithm if we knew that it was using board balance as a consideration in favor of a particular move. This shows that we cannot simply look to the decision itself and ask whether that decision was justified or not. An algorithm may flag someone as a dangerous criminal who in fact happens to be a dangerous criminal—making the classification correct. However, if the consideration leading to that classification was the person’s race, then we have an explanation that fails to justify the decision, whether or not the decision was correct.
In short, what is desired is an explanation providing the considerations that contributed to the result in question. This gives a human being the information needed in order to accept, disregard, challenge, or overrule the decision. Just as a judge would ask a police officer who claims in court that a particular criminal is high-risk for the considerations used to justify such a label, we want the algorithm to justify its outputs by reference to the considerations used.
A justification for this ‘high-risk’ label given by the police officer might be that, while in custody, the criminal threatened to do much more harm once she was free. The judge may accept this as a good justification and sentence the criminal to the maximum allowable prison sentence. If, on the other hand, the police officer justified this label by saying that the criminal was really dark-skinned and menacing looking, then the judge (hopefully) would reject the police officer’s label of ‘high-risk’. If an algorithm were delegated the task of labeling criminals as ‘high-risk’ and did so on the basis of race, then we would want the judge to know that so that she could reject the algorithm’s decision. A technical, causal, or scientific explanation does not allow the judge to have meaningful human control over the algorithm.