Introduction

Dan Brown’s 2017 novel Origin centers on the mysterious assassination of a tech billionaire, Edmund Kirsch, streamed live online. Although the assassination itself was carried out by a human religious fundamentalist, the plot was orchestrated and put into action by Kirsch’s own AI assistant, “Winston”. Winston determines that the public assassination is the most efficient and effective way to ensure that Kirsch’s major announcement goes viral and has the desired public impact. Winston, lacking ordinary human capacities like empathy, cannot comprehend the moral implications of this radical promotion strategy.

While no artificial intelligence (hereafter AI) as sophisticated as Winston yet exists, as AI becomes ubiquitous, AI systems are increasingly involved in and create novel, morally charged situations. Self-driving cars provide a concrete and straightforward example of how these questions play out. Fully autonomous cars may shift moral responsibility away from the human occupants of vehicles; if vehicles are not responsible, it is unclear to whom responsibility should be attributed. This problem becomes particularly pressing in cases where the human creators of the machine cannot predict or explain its actions (and there is no clear case of negligence). Without an account of how we can ascribe moral responsibility to AI, it is difficult to determine who or what is responsible for the death of a pedestrian or any other moral harm caused by a fully autonomous car, or whether there is some sort of gap in our system for ascribing moral responsibility. Thus, an area of concern for artificial intelligence and machine ethics is understanding what it means for a machine to be morally responsible.

As we determine how to assess moral responsibility for AI, we must remember that when AI systems become sufficiently sophisticated, they will join a pre-existing moral community.Footnote 1 Humans will not, and should not, try to design an ideal moral community from scratch simply because there may be a point at which we will induct new members to that moral community. For a moral community to function, ascription of moral responsibility must be mutually intelligible to all moral agents and must be intuitive to the human members of the moral community. Any morally responsible AI will join a moral community that already exists; the rules and norms for ascribing moral responsibility to AIs must cohere with the rules or norms already in place for the human moral community in question. Although the procedures for moral address, for example, might be different for AI systems than for humans, we will live in a shared moral world and our ascription of moral responsibility must reflect that.Footnote 2 It isn’t clear why appending an account of moral responsibility for AIs onto an unrelated account of moral responsibility for humans would be a satisfying explanation of moral responsibility, or whether we could even successfully maintain two wholly different approaches to moral responsibility within one moral community.

We offer a Strawsonian account of morally responsible AI. On a Strawsonian account of moral responsibility, individuals are morally responsible when they are appropriately subject to reactive attitudes such as blame or resentment. Strawsonian accounts are standard accounts of moral responsibility that begin with the social and emotional foundations that undergird our everyday, interpersonal ascriptions of moral responsibility. Since there is no consensus view on how to ascribe moral responsibility to AI, a variety of perspectives have been offered.Footnote 3 We argue that a Strawsonian account provides a clear set of standards that AI would have to meet in order for it to gain moral agency; it provides a way to incorporate AI into the preexisting moral world.Footnote 4

After outlining the Strawsonian framework, we evaluate the conditions under which it would be appropriate to subject machines to the same reactive attitudes to which people are appropriately subject. We argue that employing a standard account of moral responsibility is more intuitive and intelligible to humans than starting with an abstract vision of an ideal moral machine. We also note that Strawsonian accounts have been applied to other non-human entities, such as corporations (q.v. Tollefsen, 2003), with useful results. In the case of corporations and other groups, just as with individuals, entities can only be held morally responsible when they have the necessary interpersonal capacities. Thus, these applications of Strawsonian accounts of moral responsibility may license the inclusion of certain AI into the moral community.

Given Strawson’s prominent place in more general discussions of free will and moral responsibility, we are surprised that little work on morally responsible AI employs an explicitly Strawsonian perspective.Footnote 5 We hope to contribute to the understanding of morally responsible AI by employing a Strawsonian account of moral responsibility to determine when AI systems themselves—as opposed to creators or programmers—can be held morally responsible for their actions. In this discussion, we distinguish between strong AI and weak AI. A strong AI is one that behaves indistinguishably from normal agents and has an inner life. Strong AI stands in contrast to a weak AI, an artificial intelligence that mimics human behavior successfully, but has no will.Footnote 6 We agree with Matthias (2004) that “black box” weak AI falls into a “responsibility gap”, and we provide a further argument that strong AI and only strong AI can be morally responsible, according to a Strawsonian account.

This paper primarily aims to demonstrate that a Strawsonian account of moral responsibility provides a useful framework for deciding how to ascribe moral responsibility to AI. We realize that any AI that meets the criterion we suggest is necessary for this ascription of moral responsibility—namely Strong AI—is quite distant. Even fully autonomous cars, at least as we currently conceive of them, will not have the necessary capacities to be morally responsible for their actions; they will fall into a responsibility gap. However, it is important to consider when we can ascribe moral responsibility before anyone is anywhere near developing such technology. It is also important to delimit the responsibility gap.

Why Strawson?

Although AI systems are novel entities in the moral landscape, we must incorporate them into the already existing moral world. Whatever system of moral responsibility we use to ascribe responsibility to AI must cohere with the system that we already have. No matter how metaphysically different we feel AI systems are from human agents, we must come up with a way to ascribe moral responsibility to them that is intelligible and intuitive in the moral world that currently exists. We argue that the appropriate approach is to consider how AIs might fare on a standard account of human moral responsibility, rather than marrying fundamentally different ideas of moral responsibility (i.e. one for AI, one for humans) into a cohesive account. While no account of moral responsibility commands universal assent, Strawsonian accounts carry wide appeal and explanatory power while side-stepping thorny metaphysical issues. Thus, determining how AIs fare in a Strawsonian framework merits investigation.

We must take seriously the Strawsonian injunction that reactive attitudes have deep roots in human life. If a broadly Strawsonian account of moral responsibility is basically correct, then an account of morally responsible AI that lacked dependence on reactive attitudes might be impossible to construct. It is both hasty and question-begging to assume that understanding issues regarding morally responsible AI demands the development of an account of moral responsibility that does not rely on attitudes.Footnote 7 We need only develop such an account if we begin by assuming both that AI can in fact be morally responsible and that no AIs will ever have attitudinal capacities. Yet, the question at stake is whether or not AIs can be morally responsible. Hence, such an approach begs the question. To avoid this, one must consider the merits of different views of moral responsibility and the ways in which AIs fit in these views, rather than creating a new version of moral responsibility to fit our assumptions about AI. We make no claim that our Strawsonian approach is either the only one worthy of consideration or the obviously correct approach.Footnote 8 However, given that Strawsonian approaches hold great sway in debate about human moral responsibility, we should examine their possible application to AI. In fact, the lack of consensus indicates that we must consider a Strawsonian framework if we hope to resolve the debate about the moral responsibility of AI.

Exploring the application of a Strawsonian account of responsibility to AI is reasonable in light of the possibility of progress in AI research. AIs may someday acquire the emotional capacities a Strawsonian account of moral responsibility requires. Although there are currently no strong AIs, their future existence is an open possibility, and their existence would put substantial pressure on us to understand the nature of moral responsibility for artificial agents. At least some strong AIs may have whatever emotional capacities are required for a reactive-attitude-centered account of moral responsibility. Artificial General Intelligence (AGI) is a likely prerequisite step to strong AI (see Sun, 2001). Müller and Bostrom surveyed experts, who overwhelmingly predicted that AGI—in other words, AI that isn’t limited to a specific set of tasks—will arrive by the end of this century.

Attitudes and ordinary capacities

In brief, the account of moral responsibility in Strawson’s “Freedom and Resentment” is that an agent is morally responsible when that agent is an appropriate object of the interpersonal “reactive attitudes” such as blame, resentment, and indignation. These attitudes characterize ordinary social interactions; they respond to the perceived quality of another’s will. We typically act as though those around us are appropriate objects of reactive attitudes (Strawson, 2008, p 10). The interpersonal reactive attitudes reflect, and may be partially constitutive of or identical to, a demand for goodwill from others (Strawson, 2008, p 15; Hieronymi, 2020, p 14). We are rightfully indignant or resentful of someone who expresses an ill (or insufficiently good) will. Being genuinely indignant or resentful is a way of saying to another: “Why did you do that to me? I am entitled to better and I know that you, a member of my moral community, are able to provide this better treatment.”

Strawson notes two ways in which we step outside normal interpersonal interactions. First, we excuse normally responsible agents when we learn something about the circumstances or quality of their will that changes our view of a particular situation. Perhaps they were pushed down the stairs before they knocked into us and didn’t mean to hurt us, or perhaps they were coerced into hurting us when someone threatened their family. In cases like these, we do not respond with interpersonal reactive attitudes because we see that the harm done to us does not reflect the quality of their will. There is no moral defectFootnote 9 in a person who hurts us simply because they knocked into us after being pushed.Footnote 10 (Strawson, 2008, pp 7–8).

Second, we may treat someone as an inappropriate object of any interpersonal reactive attitudes because they are exempt. Entities are exempt when they fail to meet the criteria for membership in the moral community. Common examples of exempt entities might include small children, animals, and inanimate objects. Exempt entities are not “like us” in that they lack the capacities required to interact in a way that expresses a will. Whereas excused beings have a will and are excused when they cause harm if their actions do not express an ill will (like the person pushed down the stairs), exempt entities lack the kind of will which could be of good or bad quality in the right way. (Strawson, 2008, pp 8–9).Footnote 11

We do not respond to exempt beings with the standard set of interpersonal reactive attitudes. Instead, we employ muted attitudes, characteristic of the “objective” stance (Strawson, 2008, pp 9–10). The circumstances in which we take on the objective stance are outliers; the entities toward which we take it lack the necessary capacities for membership in the moral community.

Exactly which capacities are required for membership in the moral community is a matter of some debate. Clearly, any Strawsonian account of moral responsibility requires that full members of the moral community possess the capacity for the interpersonal reactive attitudes, the capacity to see the interpersonal reactive attitudes as demands for a certain kind of treatment or regard, and the capacity to respond to those attitudes. Call these capacities the core capacities. Creatures that lack any of those capacities cannot be morally responsible and are thus exempt. We use Hieronymi’s extension of the Strawsonian framework to determine what these core capacities should be and how to determine which entities are exempt. Hieronymi argues that the capacities in question are the actual capacities possessed by members of the actual moral community. Hieronymi calls these statistically ordinary capacities. If our emotional constitution or our capacities were significantly different, presumably we would have correspondingly different expectations of one another and would live under a different system for attributing moral responsibility (29). Statistically ordinary capacities, then, provide an explanation for the grounds of exemption and full membership in the moral community.

The statistically ordinary capacities of the members of the actual moral community establish the basic workings of our interpersonal reactive attitudes and the details of the interactions a community takes to display the quality of a will.Footnote 12 The appeal to statistical ordinariness means that there may be some individuals connected with the moral community, e.g. children, who lack those capacities; such individuals are exempt (Hieronymi, 2020, chapter 2). As Hieronymi (31) describes it: “‘We normally have to deal with people of normal capacities; so we shall not feel, towards persons of abnormal capacities, as we would feel towards those of normal capacities.’ Those who lack the capacities required to fit into the usual system tolerably well, are, for that reason, exempted from it.”

If we direct interpersonal reactive attitudes at an exempt entity, we make a mistake. Something is wrong with a person who genuinely blames a table when they bang their knee, or genuinely blames a baby who throws food. We might be irritated or angered by these things, but blame is misplaced. A person who genuinely blames the table or baby is at best deeply confused about what is actually required for moral responsibility, and possibly deluded about the reliability and accuracy of their powers of perception. Psychopaths present a subtler case: they are often able to blend into society, follow social rules, and masquerade as possessing the core capacities required for full membership in the moral community. Empathy, which psychopaths lack, is a statistically ordinary capacity. While we might be angry at—and would likely punish—a psychopath for causing harm, once we recognize them as psychopaths we react with an objective stance toward their transgressions because they lack this core capacity.Footnote 13 Psychopaths illustrate that being perceived as having the core capacities is not identical with actually possessing them.

The plausibility of applying a Strawsonian framework to entities other than individual humans

While the moral agents Strawson considered were individual human beings, it seems appropriate to expand our Hieronymi/Strawsonian account of moral responsibility to any sort of entity that has the statistically ordinary capacities of the human moral community. Just as some humans (e.g. psychopaths, children) are excluded for lacking those capacities, it seems reasonable that we should consider non-humans as Strawsonian moral agents so long as they have the required capacities. Here, we consider Strawsonian accounts of corporate moral responsibility, and show that they do not license the inclusion of current AIs in the moral community.

Tollefsen argues that groups, social institutions, and corporations can serve as agents in a Strawsonian framework precisely because they have the required capacities of moral agents. She does not mean merely that the individual members of a collective are moral agents (and have the necessary capacities), but rather that the collective qua collective has these capacities and bears moral responsibility (Tollefsen, 2003, p 219). She terms this “shared moral responsibility” (Tollefsen, 2003).

Tollefsen establishes the following intuitive claim: humans have reactive attitudes towards collectives. These reactive attitudes are not directed toward a particular person, but at a company—or even an industry—as a whole.Footnote 14 Tollefsen uses the example of tobacco companies that, despite the data, routinely denied that cigarettes were harmful. While individuals might speak on behalf of the company, those who blame tobacco companies do not (merely) blame an individual CEO or even the CEOs of the large companies, but the companies as a whole. Similarly, many blame BP, and not merely its executives, for the oil spill. There are also cases where we morally praise collectives, for example Doctors without Borders, for their work across the globe.

We can see that collectives, as opposed to the individual CEOs or leaders of those collectives, are the objects of praise and blame, because there are distinct cases where we blame or praise a CEO instead of the collective. For example, we may blame a particular caucus in Congress for the way it voted, or we may blame a particular individual representative for a particular vote. There may be cases where we blame both the collective (which engaged in a collective deliberative process before mobilizing members to vote) and a specific individual, either for their particular vote or for the role that representative played in the deliberative process by which the caucus made its decision. Tollefsen argues that group deliberation is a form of distributed cognition: while individuals may gather information or voice opinions, the group itself collects more information than any individual, and the group makes the decision (Tollefsen, 2003, p 228).

To determine whether the reactive attitudes that we have towards collectives are appropriate, Tollefsen suggests we consider two questions: “does the behavior exhibit ill will?” and “is the entity capable of moral address?” (Tollefsen, 2003). There is substantial scholarly support for collective intentionality, and some argue for collective intentional states (e.g. Gilbert, 2014, pp 94–128, 131–180, 229–256; Tollefsen, 2002; Tuomela, 2013). Margaret Gilbert, among others, contends that collectives do, in fact, have reactive attitudes (e.g. Gilbert, 2014, pp 229–256) that may be different from the reactive attitudes of individual members. One may, for example, feel remorse as a member of a collective for actions one did not undertake oneself and would not have undertaken as an individual. A collective or group is also capable of moral address. Capacity for moral address requires normative competence, “a complex capacity enabling the possessor to appreciate normative considerations, ascertain information relevant to particular normative judgement, and engage in effective deliberation” (Doris, 2002, p 136). Tollefsen argues that collectives have this capacity because they have a process that allows them to engage in effective deliberation as a group (Tollefsen, 2003, p 228).

If Tollefsen’s account of corporate moral responsibility is to cover AIs, they must be able to exhibit ill will and be capable of moral address. At this point in the development of AI, there is no reason to think that either is the case. The actions of AIs do not exhibit a will at all, and by Tollefsen’s standards, they do not have the capacity for moral address because they do not have normative competence. While extant AIs may be able to gather, retain, and sort information (and perhaps even deliberate, in some sense of the word), it is not clear that they are able to do so as a way of respecting normative considerations. In the next two sections, we outline the conditions AI would have to meet in order to satisfy something like Tollefsen’s conditions.

Responsible AI?

According to the account of moral responsibility sketched above, one part of what it means to have morally responsible AI is that the rest of the moral community is disposed to respond to it with the interpersonal reactive attitudes. If nobody is disposed to respond to AI with interpersonal reactive attitudes, it is hard to imagine how it could be genuinely morally responsible. Additionally, there must be an account of what it means to rightly respond to AIs with interpersonal reactive attitudes. After all, a psychologically abnormal person might direct reactive attitudes at a wide variety of things which are not apt for them. We could imagine someone who genuinely resents their coffee table when they hit their knee. This person cannot make their coffee table morally responsible merely by their attitudes toward it; rather, the person is deploying reactive attitudes inappropriately. A table cannot have the core capacities to make it a member of the moral community. Thus, for us to rightly respond to AI with reactive attitudes, it must have, and not just seem to have, the statistically ordinary capacities of that moral community. We call this AI Statistically Responsible Artificial Intelligence (SRAI). An artificial intelligence’s moral responsibility depends on two things:

1. Whether the moral community is disposed to respond to the AI with interpersonal reactive attitudes

2. Whether it in fact has the statistically ordinary core capacities in the way that the rest of the members of the moral community in question generally do.

Our view of SRAI has distinct advantages over some other approaches that try to bring AI into our moral world. In particular, Coeckelbergh (2014) proposes that we ought to stop thinking about whether AIs have morally relevant properties (be they mental or otherwise). Instead, he urges us to consider the ways in which we relate to particular AIs embedded in a social context and the circumstances in which we attribute moral responsibility to them. We take much of Coeckelbergh’s view to be congenial to our account. The response-dependent aspect of a Strawsonian view emphasizes the importance of considering the ways in which AIs are embedded in a particular moral community and in which members of that community perceive those AIs. Moreover, we take the process of gathering the evidence needed to answer questions about the appropriateness of ascribing properties to AIs to be a process of living with and interacting with them, which is in line with the general thrust of Coeckelbergh’s position.

We diverge from Coeckelbergh, and other entirely response-dependent views of moral responsibility, in requiring the actual presence of certain properties. The Hieronymi/Strawson account we propose provides tools by which we can determine what properties are required for responsibility. Part of Coeckelbergh’s critique is that it is unclear which properties are relevant to moral responsibility. In our view, the relevant properties are picked out by reference to the moral community in question. Statistically ordinary core capacities determine which properties are relevant; we can identify the relevant properties by investigating the actual moral community. We leave tackling the question of how to determine whether an entity has these properties until later.Footnote 15 Here, we simply note that we do not think it necessary to prove beyond the shadow of a doubt that a will exists in other humans in order to ascribe moral responsibility to them; similarly, we see no need to conclusively prove that AIs have a will to ascribe responsibility to them.

Strong AI, and only strong AI

SRAI must be strong AI. We are likely to respond to SRAI with interpersonal reactive attitudes; after all, it behaves indistinguishably from us and thus acts as though it has the capacities to merit being an object of interpersonal reactive attitudes. Additionally, because an SRAI is a strong AI, it has the core capacities. In other words, SRAI genuinely manifests a will, rather than mimicking one. Therefore, SRAI can be rightly subject to demands that it display a certain quality of will. A small but important conclusion follows: strong AI is, or at least can be, morally responsible by virtue of the fact that we appropriately react to it with interpersonal reactive attitudes. An AI like Winston, described in the opening, would not count as strong AI in our sense of the term because it lacks the requisite interpersonal attitudes.

Weak AI meets the standards for exemption on the Strawsonian picture. Exemption is appropriate when an entity fails to possess the capacities required to engage in moral life, where these capacities are determined by the capacities possessed by the actual members of the moral community. These capacities include not only ones related to storing and processing information and following instructions, but also capacities to demand a certain kind of goodwill from others through expression of certain attitudes (as opposed to merely seeming to have a certain kind of attitude) and to express good and ill will in one’s own actions and attitudes (as opposed to merely seeming to do so). In particular, these capacities involve reacting with what humans understand as emotions, since emotions ground our reactive attitudes. Weak AI, by definition, lacks these core capacities and meets the standards for exemption.

Weak AI may act indistinguishably from humans, but it does so with no inner life or will. Its actions, no matter how consequentially damaging or harmful, cannot express an ill will because a weak AI has no will to express. Therefore, it would be wrong to resent, blame, or express interpersonal reactive attitudes towards a weak AI, even if it responds convincingly to an expression of resentment, rage, or other emotional demands for certain treatment.Footnote 16 We may be frustrated with the consequences of its existence or frustrated that we must now resolve a new problem, but we cannot blame or resent weak AI. Praising weak AI for expressions of goodwill would be similarly inappropriate. We may be pleased at the consequences of its actions, or find that it makes life more enjoyable, but it is wrong to treat this as an appreciation of its expressions of goodwill.

An example illustrates why a good mimic—like a weak AI with convincing human-like behavior—meets the standards for exemption. Consider again accidentally banging your knee on a coffee table. When you hit your knee, you’d reasonably feel some sort of negative emotion about the experience. That makes sense; the whole state of affairs is unfortunate and you’d rather it not have happened. Blaming your coffee table for hurting you in the way you’d blame a mugger in an alley for hitting your knee would be nonsensical. The mugger’s activity is naturally interpreted as an expression of ill will and the mugger possesses an ill will that can be expressed. A table has no will and thus cannot express an ill will; it is not apt for blame, even if you’re frustrated. Weak AI is like the table: there is, by definition, “nothing going on inside” a weak AI. To be clear, “what is going on inside” must include emotions if an entity is to be morally responsible on a Strawsonian account, because emotions are partially constitutive of reactive attitudes.Footnote 17 Without these emotions, the attitudes in question just are not the interpersonal reactive attitudes (Strawson, 2008, 10). No weak AI is capable of these attitudes, and so no weak AI has the relevant core capacities.

Our conclusion is that AIs are morally responsible if and only if they are a particular sort of strong AI, SRAI. Intuitively, any strong AI with capacities for reactive attitudes is morally responsible on our broadly Strawsonian picture. After all, SRAI would differ from us only in that it happens to be artificial and mechanical—a bit of code—and these do not seem to be morally salient differences between humans and machines.Footnote 18

Objections

In this section, we’ll consider two objections to our position that only strong AI can be morally responsible. The first objection argues that the proliferation of weak AI might alter the moral landscape and shift what counts as statistically ordinary. The second objection argues that it is practically impossible to draw a bright line between strong AI (which is morally responsible) and a weak AI which can perfectly mimic human moral decision-making. Both objections, in short, attempt to posit circumstances under which we might plausibly conceive of weak AIs as altering the ways in which we ascribe moral responsibility. We respond to each of these objections. In our response to the second objection, we provide some loose guidelines as a heuristic for determining when we have created AI with the relevant capacities.

Statistically ordinary weak artificial intelligence

Objection

Our interpretation of the grounds for exemption from the reactive attitudes turns on the idea of statistical ordinariness; the capacities that one must have if one is not to be exempt are those that are normal for members of a given moral community. An objection to this view might be that weak AIs are soon to be ubiquitous. Weak AIs already aid in making decisions with moral valence (e.g. AI systems often make the initial cut when screening large pools of job applications). When cars become fully autonomous, AI will be making life and death decisions constantly. As weak AIs become more integrated into our lives, they will participate—in some way or another—in most moral decisions. If this were to happen, moral decision making would be suffused with the capacities possessed by weak AI. Indeed, weak AIs would be so prevalent in moral decision making that they would affect what we count as statistically normal capacities in this decision making.

Response

While a world with a plethora of weak AIs would likely require the development of new social norms and policies, this does not force us to accept weak AIs as members of the moral community. Consider the number of beings with whom we have social interactions but who are not members of the moral community: there are already many children, adults of diminished or altered mental and social capacities, and so on. Though we interact with children and the intellectually or socially impaired frequently, and have norms for interacting with them, we don’t take them to be members of the moral community. There being a greater number of exempt beings does not require us to treat them as full members of the moral community. For example, if the proportion of children in the population suddenly skyrocketed, we would not respond by including them as full members of the moral community simply because of their number.Footnote 19 Instead, we would recognize that the number of beings towards whom it is inappropriate to direct the full suite of interpersonal reactive attitudes had increased.

Similarly, no matter how many weak AIs participate in moral decisions, they still meet the standards for exemption; they fail to have the capacities for the attitudes that constitute the demand for a certain kind of treatment. Weak AI possesses merely a convincing facsimile of the social and emotional capacities that characterize moral life. No matter how many of them there are, weak AIs can only seem to make demands for certain kinds of behavior or for the expression of a certain kind of will, and can only seem to express ill will towards anything, because they have no will to speak of.

A massive spike in the number of weak AIs would certainly lead to social changes. Even exempt entities are still objects of policy; it is plausible that new norms will be needed for a society in which a large proportion of beings are weak AIs. However, their being an object of policy does not change the fact that weak AIs lack the core capacities to be members of the moral community.

A practical problem

Objection

Our method for ascribing moral responsibility to AI requires that AIs which are morally responsible have a will and the emotional capacities that allow them to be both subjects and recipients of reactive attitudes. However, it is impossible—in a practical sense—to determine whether AIs do in fact have the required will and emotional capacities. Indeed, any AI that seems to be a strong AI may be merely an exquisite mimic. This seems to render the Strawsonian approach to ascribing moral responsibility to AI nothing more than a theoretical exercise, since weak and strong AIs are indistinguishable.

Response

It is true that we cannot look “inside” an AI to determine whether it is strong or weak, but neither can we look inside our neighbors. In other words, people whom we consider to be members of our moral community might turn out to be philosophical zombies (p-zombies).Footnote 20 We cannot look “inside” each other to determine that the beings we interact with aren’t p-zombies. Nevertheless, we ascribe moral responsibility to others regularly; such ascription is the only way that we can function in a moral world.

Just as any usable policy, law, or social norm governing human interaction will make reference to observable features of the humans in question, any usable policy, law, or social norm governing the interactions between humans and AI must do the same. When non-artificial intelligences interact with artificial ones, we'll inevitably direct attitudes at them, and we must evaluate the aptness of those attitudes at least partly on the basis of observable features of those artificial intelligences. It seems no more implausible to do this in the case of AI than in the human case.

Despite the limits of our observational powers, we believe it is important to understand at a theoretical level when AIs are and are not morally responsible because this theoretical framework will color the way in which we observe and interact with AIs. If we enter into these interactions with nothing more than a naive interpretive toolkit—one according to which things that simply seem morally responsible are morally responsible—then there is a real danger of treating far too many beings as morally responsible when they in fact are not. This is not to say that we should assume no AIs are morally responsible; that would be to go too far in the opposite direction. After all, the Strawsonian approach we’ve described here has the result that at least some AIs (the strong ones) are candidates for being morally responsible. Instead, we think that the proper approach is to have a healthy skepticism about whether any particular AI is morally responsible, and having a good theoretical account of what it takes for an AI to be morally responsible is an important part of developing and maintaining that skepticism.

It is likely that we will be unable to use the same observational standards to attribute will and emotional capacities to AI that we would employ with other humans. Instead, we argue that we should determine whether an AI has the necessary capacities to be part of the moral community using a standard based on those used to ascribe particular cognitive traits to non-human animals. In particular, we argue that we should use the same practices that are used by cephalopod researchers, who follow the principle that we should ascribe human-like capacities to octopuses (and other cephalopods) only when their behavior can be explained in no other way.

Cephalopods, and particularly octopuses, provide a useful analogy to AI because, like AI, they can interact with data in highly complex ways, but they are startlingly different from humans. The last common ancestor of humans and octopuses lived about 600 million years ago (Godfrey-Smith, 2016, p 8) and lacked many important attributes that are common to both humans and octopuses. Octopuses are often regarded as particularly intelligent (Godfrey-Smith, 2016), but the way that they perceive and interact with the world is so alien to us that it is difficult for us to even test their intelligence (Godfrey-Smith, 2016, pp 43–76). For example, our best explanation of octopus color change is that an octopus can see with its skin (Godfrey-Smith, 2016, pp 120–121), a concept that is challenging for us. AI poses some similar problems, given that its perception of the world is completely different from our own. AI that interacts with anything visual (e.g. a photograph, the world) does not see in the way that we do but receives the information pixel by pixel. This difference in ways of gaining information about the world will make it challenging for us to determine when AI, in fact, has the capacities in question, but no more challenging than it would be to sort out whether an octopus or intelligent extraterrestrial life form had the necessary capacities. Given that we already use such standards to determine what kinds of protections animals deserve, it seems reasonable to extend these standards to artificial entities as well.
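To make the pixel-by-pixel point concrete, the following minimal sketch in Python (our own illustration, not drawn from any particular AI system; the file name street_scene.jpg is hypothetical, and the Pillow and NumPy libraries are assumed) shows what a photograph is from the perspective of a typical image-processing model: an array of numeric intensities rather than a visually experienced scene.

```python
# A minimal illustrative sketch: to an image-processing model, a photograph
# is just a grid of numbers, not a "seen" scene.
# Assumes the Pillow and NumPy libraries; "street_scene.jpg" is a hypothetical file.
import numpy as np
from PIL import Image

image = Image.open("street_scene.jpg")   # hypothetical input photograph
pixels = np.asarray(image)               # e.g. shape (height, width, 3) for an RGB image

print(pixels.shape)   # the model's entire "view": a grid of intensity values
print(pixels[0, 0])   # the top-left pixel, three values in the range 0-255

# A typical classifier consumes a normalized numeric version of this array;
# nothing resembling human visual experience is involved.
features = pixels.astype(np.float32) / 255.0
```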

Using a set of standards that have already been developed for assessing the intelligence, emotions, and intentional states of animals is preferable to alternatives developed specifically to assess AI. Sullins (2006), for example, attempts to create a way of ascribing moral responsibility specifically to machines while avoiding the question of whether those machines have a will. Sullins proposes a three-pronged test for morally responsible machines, asking evaluators to consider (1) the autonomy of the machine, (2) intentional behavior, and (3) position of responsibility. The second prong asks “is the robot’s behavior intentional?”, and Sullins contends that behavior counts as intentional “as long as the behavior is complex enough that one is forced to rely on standard folk psychological notions of predisposition or ‘intention’ to do good or harm” (Sullins, 2006, p 28). It isn’t clear what Sullins means by “forced”, nor what could possibly compel us to see a machine this way as long as competing explanations are available. If what Sullins means is that ascription of intentional mental states is the most natural explanation for machine behavior (if not the only logically consistent one), then such an explanation seems to resurrect the mental aspects he takes pains to avoid, because this is the same way that we ascribe intentionality both to animals and to other people. In fact, it seems similar to the way that we ascribe intentional states to cephalopods, as mentioned above. The position-of-responsibility criterion seems to apply only to a niche set of cases, even among AI systems. On a Strawsonian approach, a machine or AI need not occupy a specific position of professional responsibility, such as that of a nurse, to be a moral agent. Any AI that appropriately displays and receives reactive attitudes thereby shows that it understands the moral community and manifests a will.

Conclusion

In this paper, we’ve argued that a Strawsonian perspective on moral responsibility leads to the conclusion that only strong AI can be morally responsible. Although strong and weak AIs may be externally or observationally indistinguishable, only a strong AI has the attitudes necessary for genuine moral responsibility. We’ve defended this view against two objections: that the Strawsonian perspective requires admitting weak AI as morally responsible, and that there is no real point in making the moral responsibility of AI dependent on unobservable features, such as attitudes.

We opened this article by noting that human/AI interaction is becoming increasingly common. Only through continued human/AI interaction can we determine whether AIs can be moral agents, full or marginal. Without continuing to interact with AIs in moral situations—especially novel moral situations—we have no way of gaining a deep understanding of the moral responsibility of AIs. Perhaps, then, we are best described as pessimists about any kind of armchair solution to the question of whether or not there are genuine artificial moral agents. If Strawson is right that the reactive attitudes are a deeply rooted part of human existence, then we will only answer this question by interacting with AIs in circumstances with genuine moral stakes and making a careful study of both our own attitudes and the best explanations of the AI actions. The other lesson here is that the discussion of morally responsible AI is perhaps poorly named. It may be that what is at stake is “reliably predictable and consequently beneficial AI,” as opposed to anything moral.