1 Introduction

Many optimists about technology think that artificial agents can one day achieve moral human-equivalence (Allen et al. 2005; Anderson and Anderson 2011; Coeckelbergh 2009; Gunkel 2012). Optimists typically also think that artificial agents can achieve cognitive human-equivalence – but my focus is on the moral dimension. Discussions of the moral possibilities of artificial agents are divided into two areas: artificial moral agency and artificial moral patiency. I focus here on artificial moral agency. This paper rejects the challenge that artificial systems cannot be moral agents because they cannot competently respond to moral reasons. But it presents another: that artificial systems cannot respond to moral reasons both competently and authentically. I hope to raise awareness of this challenge as an underexplored route to questioning the possibility of artificial moral agency, and to propose mitigating or overcoming it as an aim for future work.

To be clear, with ‘moral agency’ I am referring to responsible moral agency. Moral agency standardly includes a moral responsibility condition, but some, such as Luciano Floridi and J.W. Sanders (2004) and Tigard (2021), argue that moral agency and moral responsibility come apart. I do not entertain that sort of proposal here. In this paper I understand moral agents to be agents who satisfy two conditions: they are generally responsible for their actions, and they are generally able to make moral decisions (and act upon them) competently. I take it for granted that most humans are moral agents, which leads to the ‘generally’ qualifier. Most humans are not responsible for all their actions, nor do they always make competent moral decisions. But they are nonetheless moral agents.

I use ‘reasons-responsiveness’ to describe these conditions more precisely. Reasons-responsiveness is a concept drawn from the literature on moral responsibility (Fischer and Ravizza 1998) and group agency (List and Pettit 2011) that can be used to describe a variety of agents without making deep assumptions about their ontology (List 2021; Santoni de Sio and van den Hoven 2018). Using this terminology, making decisions competently can be described as an agent’s responding to the reasons they have for acting in a situation. For example, if it is raining, you have a reason to take an umbrella with you. Making moral decisions competently involves responding to moral reasons – if you have an opportunity to do something morally right (according to your preferred moral theory), you have a moral reason to do that thing (Footnote 1). Making competent moral decisions involves responding to both moral and non-moral reasons (Footnote 2). As I will discuss, some claim that artificial agents will necessarily lack the ability to respond to moral reasons. If so, artificial systems can never be moral agents.

The structure of the paper is as follows. In Sect. 2, I briefly recap some arguments that artificial agents cannot respond to moral reasons. Then, I present a counterexample to these arguments: the Moral Decision Machine. I suggest that the Moral Decision Machine can respond to moral reasons. Section 3 defends this claim against several objections and Sect. 4 defends the very possibility of the Moral Decision Machine. If these defences are successful, then the challenge to artificial moral agency based on moral reasons-responsiveness fails. I discuss the potential ramifications of this in Sect. 5. Unfortunately, the remainder of the paper cannot be dedicated to celebration. Section 6 lays the groundwork for a new challenge to artificial moral agency by outlining the popular position in moral epistemology that moral deference is incompatible with moral agency. Section 7 then argues that the Moral Decision Machine is exclusively dependent on moral deference and therefore cannot be a moral agent. The moral reasons-responsiveness problem for artificial moral agency is thus different from how it was initially cast: to be moral agents, artificial agents must be able to respond to moral reasons without exclusively relying on moral deference.

2 The moral decision machine

There are two central viewpoints that tend to deny that artificial agents can competently respond to moral reasons: those who subscribe to ‘anti-codifiability’ worries about moral reasons (chiefly virtue ethicists and moral particularists), and those who think that competently responding to moral reasons involves possessing metaphysical qualities that artificial agents routinely (or necessarily) lack.

Duncan Purves, Ryan Jenkins & Bradley J. Strawser (2015) argue that artificial agents cannot competently respond to moral reasons with the ‘anti-codifiability’ argument. The anti-codifiability argument is based on the following anti-codifiability thesis: “The codifiability thesis is the claim that the true moral theory could be captured in universal rules that the morally uneducated person could competently apply in any situation. The anti-codifiability thesis is simply the denial of this claim, which entails that some moral judgment on the part of the agent is necessary.” (Purves et al. 2015). According to Purves et al., the anti-codifiability thesis is a common view entailed by many moral theories, especially moral particularism. They admit, though, that if the anti-codifiability thesis is false then the anti-codifiability argument will fail. For the sake of argument, let us accept that moral judgement (or some equivalent process) is necessary for moral reasons-responsiveness. Purves et al. outline several possible understandings of moral judgement and argue that all are barred from artificial agents, which must, per Dreyfus (1992, p. 199), be “either arbitrary or strictly rulelike”. Without the ability to make moral judgements, they claim, artificial agents are unable to respond to moral reasons.

Mihaela Constantinescu & Roger Crisp (2022) make a variation of the same claim. They argue from an assumption resembling the anti-codifiability thesis to the claim that artificial systems cannot be virtuous (Footnote 3). It is unclear whether Constantinescu & Crisp are committed, as Purves et al. are, to the claim that artificial agents cannot respond to moral reasons at all – but they certainly claim that artificial agents cannot respond to moral reasons in the right circumstances. Doing so, Constantinescu and Crisp (2022, p. 1552) say, is dependent on having a “particular conception of how one should live” and “of the nature of the good life”, and on the personal life history (see also Sparrow 2021; Stenseke 2023) that “is needed for agents to embed context sensitivity in their deliberations”. These qualities, they suggest, are “an inner dimension that can obviously not be fulfilled by robotic AI systems”. In terms of reasons-responsiveness, Constantinescu & Crisp’s claim is that artificial agents cannot competently respond to moral reasons – even though they may be able to respond to moral reasons without being sensitive to the circumstances.

Other arguments against artificial moral reasons-responsiveness do not depend on anti-codifiability worries but instead target artificial agents’ lack of some seemingly important metaphysical quality. Variously, it has been argued that artificial agents cannot respond to moral reasons because they lack consciousness (Himma 2009; Sparrow 2007; Torrance 2008, 2014), emotions (Brożek and Janik 2019; Véliz 2021) or free will (Johnson 2006; Chakraborty and Bhuyan 2023). These are of secondary concern here, but I believe that the coming counterexample applies to these arguments just as well.

I intend to deal with both sorts of arguments with a single counterexample showing that artificial agents can have the required qualities for moral reasons-responsiveness. The counterexample is the ‘Moral Decision Machine’. According to the arguments just mentioned, the Moral Decision Machine, as an artificial agent, should be unable to competently respond to moral reasons. But I think the Moral Decision Machine does just that. Here is the description of it:

The moral decision machine: A team of (highly funded) computer scientists aim to create a machine that competently responds to moral reasons. The machine’s designers collate a database of human moral decisions. This database, drawn from hundreds of years of human history, contains a series of ‘snapshots’ of human moral decisions, and there is a snapshot for almost any circumstance. The snapshots consist of empirical information about all (or close to all) the physical facts surrounding the moral decision, including the context, relationships, and role of the actor. Then the team of scientists program a computational artificial system, the Moral Decision Machine. The Moral Decision Machine follows this decision-making procedure: First, it gathers the empirical information from a live situation in the same way the snapshots were gathered, with an array of sensors, internet searches and memory banks. Then, the Moral Decision Machine trawls the snapshot database aiming to find a functionally identical empirical situation. It then acts in exactly the way the human in the historical snapshot acted (regardless of whether that human was right to do so).

Some may assume that developing the Moral Decision Machine will need to involve ‘supervision’, i.e., being trained by a human approving (or rejecting) its outputs. I do not want to precisely specify the technical details, but any supervisory practices would be intended to improve the machine’s ability to identify the best matching empirical situations, not its ability to identify ‘right’ or ‘wrong’ actions. Supervision is not the only option here either (Alloghani 2020); pattern-matching, search, and reinforcement learning algorithms can be extremely effective without it (e.g., Grauman and Darwall 2006; Leordeanu et al. 2012). As I will discuss in Sect. 4, the challenging technical aspect of the Moral Decision Machine is satisfactorily defining functional identity (or equivalence) between situations and achieving the degree of similarity between the snapshot and the current situation required for the Moral Decision Machine to be effective.
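To make the thought experiment’s procedure a little more concrete, the following is a minimal, purely illustrative sketch in Python. The Snapshot structure, the toy similarity function, and the fixed threshold parameter are my own placeholders for details the thought experiment deliberately leaves open; the sketch is not a proposal for how such a machine would actually be built.

```python
# Purely illustrative sketch of the Moral Decision Machine's decision procedure.
# The feature encoding, similarity measure, and threshold are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Snapshot:
    features: dict  # empirical facts recorded about the historical situation
    action: str     # what the snapshotted human actually did

def similarity(current: dict, past: dict) -> float:
    """Toy measure: the fraction of recorded empirical facts on which the two
    situations agree. A real system would need a far richer notion of
    functional equivalence."""
    keys = set(current) | set(past)
    if not keys:
        return 0.0
    return sum(current.get(k) == past.get(k) for k in keys) / len(keys)

def decide(current_situation: dict, database: list, threshold: float = 0.9):
    """Find the most functionally similar snapshot and repeat its action,
    regardless of whether that action was right; return None if no snapshot
    meets the similarity threshold."""
    best = max(database, key=lambda s: similarity(current_situation, s.features),
               default=None)
    if best is None or similarity(current_situation, best.features) < threshold:
        return None  # no sufficiently similar precedent in the database
    return best.action
```

Note that nothing in the sketch evaluates whether the snapshotted action was right; the machine only matches situations and repeats what the human did.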

I will defend the claim that the Moral Decision Machine makes moral decisions that (competently) respond to moral reasons. Furthermore, I will argue that it does so regardless of whether it has metaphysical qualities like free will, consciousness, and emotions, and even if the anti-codifiability thesis is assumed to be true. The next sections defend this claim against two sorts of objections: objections that the Moral Decision Machine does not competently respond to moral reasons and objections that the Moral Decision Machine is impossible.

3 Objections to the moral decision machine: moral reasons-responsiveness

In this section, I answer a few objections against the claim that the Moral Decision Machine can respond to moral reasons.

3.1 Objection 1

Artificial agents cannot respond to reasons at all.

Purves et al. (2015) present this as part of their discussion of the anti-codifiability argument. If artificial agents cannot respond to any reasons at all, then the Moral Decision Machine example will not even get off the ground. Addressing this objection requires clarification about what it means to respond to reasons. Purves et al. think that responding to a reason means to have a propositional attitude such that one can be motivated to act in accordance with a reason. They suggest that artificial systems cannot have propositional attitudes.

However, it is far from settled whether this is true – there are two popular alternatives: first, that artificial systems do not need propositional attitudes to respond to reasons (Coeckelbergh 2010; Danaher 2020; Floridi and Sanders 2004; Laukyte 2017), and second, that artificial systems do routinely have propositional attitudes (Dennett 1981, 1991; List 2021; List and Pettit 2011; Powers 2013). The first alternative is typically motivated by an ‘epistemic argument’ to the effect that ‘as-if’ intentional behaviour is sufficient to warrant an ascription of reasons-responsiveness. The most persuasive arguments for the second approach draw an analogy between artificial and collective systems, arguing that if collectives can have propositional attitudes, then artificial systems can too. Hess (2014), for example, argues that collective agents ought to be attributed propositional attitudes under every popular contemporary theory on the topic. List (2021) and Laukyte (2017) draw the analogy from collectives to artificial systems directly, and several others find no issue with attributing propositional attitudes to artificial systems (even Constantinescu and Crisp 2022, p. 1550). One popular theory of propositional attitudes is interpretivism, which clearly accommodates attributing propositional attitudes to artificial systems.

A further point in favour of attributing reasons-responsiveness to artificial systems is that an agent can respond to a reason without having a conscious or representational internal state that contains that reason, and without being able to explain themselves or recall the reason afterward. What’s required for reasons-responsiveness is a kind of causal co-variance such that an agent reliably acts in a way that promotes its goal in the face of a reason. I see that it’s raining outside and grab an umbrella, or I see a drowning child and leap in to save them, or I desire a cup of tea and take a series of actions to make one. None of these actions requires me to be able to explain why I am motivated to do them or why I am motivated to do them in a certain way. Perhaps I have no ability to identify my reasons or explain why I ought to stay dry, why the child should be saved, or why I ought to add the milk after the teabag – but so long as there is a robust causal link (i.e., one not easily diverted or undermined) between my reasons and actions, I am rightly said to respond to reasons. My understanding here centrally follows Fischer and Ravizza’s (1998) account of reasons-responsiveness but reflects several further discussions of this concept (Haji 1998; McKenna 2013), including some on artificial agency (Kasirzadeh and McGeer, forthcoming; Moth-Lund Christensen 2022). Since reasons-responsiveness turns on relatively easy-to-model (mostly functional) causal relationships, saying that artificial agents respond to reasons is less controversial than saying that artificial systems exhibit intentionality and have propositional attitudes.

Defending this in more detail is beyond the scope of this paper, but the prevailing mood of the literature is to be optimistic about attributing propositional attitudes to artificial systems, and the ball seems to be in the court of those who deny artificial systems’ reasons-responsiveness.

3.2 Objection 2

The Moral Decision Machine does not respond to moral reasons.

Assuming, then, that the Moral Decision Machine does respond to some reasons, you might wonder whether it responds to moral reasons. Now, the moral reasons it responds to must be relatively precise – they must be reasons that cannot be captured by universal rules. After all, according to the anti-codifiability argument, moral judgement is needed because moral behaviour cannot be captured by rules such as those, and responding to moral reasons must involve exceptions and depend on the individual situation.

However, the Moral Decision Machine is designed precisely to address these types of concerns. Humans have moral judgement and competently respond to moral reasons, and the Moral Decision Machine is designed to respond to the same reasons that a human did, including the moral ones. Theoretically speaking, this depends on ‘ethical supervenience’ – which is well-defended in metaethics (see McPherson 2022) – being true. Ethical supervenience states that moral reasons supervene on a situation’s empirical facts: any change in the moral reasons in a situation must have a corresponding change in the situation’s empirical facts. As a result, empirically identical situations must contain the same moral reasons. In the historical snapshot, a human responded to the moral reasons in an empirical situation. The Moral Decision Machine responds to a functionally identical (Footnote 4) empirical situation with the same actions and therefore responds to the same reasons the human did, including the moral ones. It does this despite using a different type of process, as I will discuss, but the crucial point is that it responds to the reasons in its own situation – since reasons-responsiveness is, as described above, a matter of a robust causal link between reason and action, even if that causal link is more convoluted than usual (as it is for the Moral Decision Machine) (Footnote 5). Moreover, it is rightly said to respond to those reasons even if it is unable to explain or identify them after the fact.

An example can help demonstrate. Imagine the Moral Decision Machine is deciding what to do in a hostage negotiation. There are many complex moral reasons involved: reasons about the welfare of the hostages, the police besiegers, and the hostage-taker themselves; reasons about collateral damage and responsibility attribution; reasons about the morally desirable outcomes, and so on. The Moral Decision Machine first assesses the empirical situation through various sensors and mechanisms. It uses cameras and microphones to overhear conversations and scope out the location; it uses internet databases to find the blueprints of the building and the profiles of the participants, and to access live media coverage. Then, it trawls through its own database and finds a functionally equivalent historical situation – this is the ‘snapshot’. In the snapshot, a human hostage negotiator (who may or may not have been competent) responded to the hostage situation, including the moral reasons involved, by performing a series of actions. Suppose they prioritised the hostages’ welfare by telling the police to charge in – doing so because they judged the hostage-taker and the police to hold greater responsibility than the hostages. Assuming ethical supervenience is true, the Moral Decision Machine faces the same moral reasons and, by making the call to charge in, responds to the moral reasons just as the human did.

3.3 Objection 3

The Moral Decision Machine does not competently respond to moral reasons because it is inconsistent.

Another line of objection concerns whether the Moral Decision Machine competently responds to moral reasons. It could be suggested that the Moral Decision Machine will have variance in its moral reasons-responsiveness and therefore be somehow incompetent. That is, unlike a competently moral reasons-responsive human, it may first respond strongly to one moral reason, then respond strongly to a different one.

To see the objection, consider the hostage negotiation again. Suppose that the initial situation involved a discussion with the hostage-taker over a megaphone – in the snapshot, the hostage-taker surrendered in the face of careful and cautious tactics from the hostage negotiator. But, faced with the same tactics from the Moral Decision Machine, the present hostage-taker refuses to surrender. At this point, the Moral Decision Machine searches for a new snapshot because its present situation is no longer functionally identical to the original snapshot. Suppose the Moral Decision Machine’s new snapshot centres on a particularly aggressive and gung-ho hostage negotiator who immediately plans to storm the building. The Moral Decision Machine therefore does the same.

This behaviour is unusual. Humans tend to consistently emphasise the same moral reasons over time. Perhaps they are utilitarian and want to maximise welfare or pleasure; perhaps they are Kantian and tend to respond to reasons that are universalizable and treat other people as ends. But they do not tend to be utilitarian one minute and Kantian the next. The Moral Decision Machine, however, may be just like this.

But does this mean that the Moral Decision Machine does not competently respond to moral reasons? Note that in most cases the Moral Decision Machine will respond to moral reasons consistently. There is unlikely to be high variance in different humans’ responses in similar snapshots. Most hostage negotiators will respond to the same types of moral reasons. Even in the rare cases like above, the Moral Decision Machine may be accused of being erratic, but not, I think, of failing to competently respond to moral reasons.

Perhaps the accusation of a capricious moral character is grounds for denying the Moral Decision Machine moral agency – but if so, this is not on the grounds that it fails to competently respond to moral reasons; it is on the grounds that it fails to consistently respond to the same moral reasons over time. The threshold here is unclear: some level of moral capriciousness must be acceptable – I am still a responsible moral agent despite flip-flopping on whether eating meat is wrong, for example. In any case, though, the objection diverges from the question of whether artificial agents can competently respond to moral reasons.

3.4 Objection 4

The Moral Decision Machine would occasionally fail to respond to moral reasons, so it does not competently respond to moral reasons.

This objection is worth raising but does not pose a serious concern. The Moral Decision Machine might fail to respond to moral reasons if the human its snapshot centres on failed to do so. It is possible that the snapshotted human made an error or acted wrongfully, and the Moral Decision Machine would repeat that erroneous or wrongful action. However, occasional failure is consistent with the Moral Decision Machine’s competent moral reasons-responsiveness. After all, occasional failure does not lead humans to fail to be competently morally reasons-responsive. If the Moral Decision Machine is unexpectedly incompetent, it is because the average human is less morally reasons-responsive than expected. If this level of incompetence undermines moral agency, then the chances are that most humans are not moral agents, which seems to spell trouble for the concept and should be avoided. I think it safe to assume that humans are the paradigmatic moral agents – however incompetent they may be.

3.5 Objection 5

The Moral Decision Machine is not using moral judgment; therefore, it cannot respond to moral reasons.

Taking a more direct line from the anti-codifiability argument, could it be suggested that the Moral Decision Machine cannot use moral judgement and thus cannot respond to moral reasons?

One reply is that the Moral Decision Machine does use moral judgement. To make moral decisions, the Moral Decision Machine utilises two distinct processes. One process analyses the situation and searches the database, and the other reproduces the decision of the snapshotted human. Together, these two processes enable the Moral Decision Machine to competently respond to moral reasons. The snapshotted human’s decision is a direct consequence of their moral judgement, so the Moral Decision Machine does use moral judgements – it just does not use moral judgements alone, which seems unproblematic.

A second reply is that even if the Moral Decision Machine does not use moral judgements, per se, its means of responding to moral reasons ought to alleviate anti-codifiability worries. It does not oversimplify moral reasons with arbitrary rules or random decisions. The anti-codifiability worry is that rules and random decisions are insufficient to respond to moral reasons; it need not entail the stronger claim that the only way of responding to moral reasons is through moral judgement. If the Moral Decision Machine does not use moral judgement, that suggests that moral judgement is unnecessary for moral reasons-responsiveness. That said, I think the first reply is more convincing: my view is that the Moral Decision Machine does use moral judgement – just not its own moral judgement.

3.6 Objection 6

The Moral Decision Machine is not using its own moral judgement and is unable to do so; therefore, it is not genuinely morally reasons-responsive.

There is clearly something to this objection. Intuitively, I expect many to take the view that the Moral Decision Machine is problematic precisely because of its strange relationship with the snapshotted humans’ moral judgements. This also highlights a difference from humans, who, even if they do not always use their own moral judgement, at least have the capacity to do so.

There are two kinds of responses to this depending on whether the snapshotted humans’ moral judgement is considered internal or external to the Moral Decision Machine. If it is external, then the Moral Decision Machine uses the snapshots in a way analogous to the way humans use tools. There is nothing wrong in principle with depending on tools to respond to reasons, even moral ones. My dependence on a calculator does not render me less reasons-responsive, so long as I routinely have a calculator to hand. If the Moral Decision Machine uses human moral judgements in the same way as a human uses a calculator, it is probably still morally reasons-responsive. If it is internal, then the Moral Decision Machine does seem to be using its own moral judgement – it is just a strange kind of agent that can contain snapshots of other agents. The Moral Decision Machine’s database is an internal, essential, and central part of its functioning. The Moral Decision Machine owns its database as much as humans own their brains. Personally, I favour this latter understanding – saying that the Moral Decision Machine does not use its own moral judgement seems to me as strange as accusing me of not using my own mind because I am remembering another person’s actions.

In summary, I see no convincing reasons to withhold competent moral reasons-responsiveness from the Moral Decision Machine. I have tried to shift the burden of proof to those who argue that artificial agents cannot competently respond to moral reasons. Their current theories seem unable to explain why the Moral Decision Machine cannot competently respond to moral reasons. If the Moral Decision Machine is morally reasons-responsive, then either the allegedly necessary qualities (emotion, consciousness, moral judgement, etc.) can be possessed by artificial agents, or they are unnecessary for competent moral reasons-responsiveness.

4 Objections to the moral decision machine: possibility

My opponents may concede that the Moral Decision Machine competently responds to moral reasons but argue that this is unthreatening because the Moral Decision Machine is impossible. I turn to these kinds of objections now. As we will see, the focus naturally gravitates to whether it is possible to identify functionally equivalent snapshots, and whether those snapshots are sufficiently similar to the Moral Decision Machine’s situation to contain the same moral reasons.

4.1 Objection 7

Past situations cannot be properly compared to present situations.

This objection can take two forms. First, there will be broad changes in the global context over time. For example, the global temperature average will increase in the coming years, and thus every future event will have a different global temperature context. Second, new discoveries, inventions, or developments may lead to unprecedented situations for which the Moral Decision Machine cannot find a functional equivalent.

Broad changes in the global context should not provide an insurmountable challenge for the Moral Decision Machine. If the database updates over time and the changes happen slowly, the Moral Decision Machine may be able to closely control for relevant contextual facts. But, even if not, we might expect there to be functionally equivalent situations in the database despite differences in global contexts. The hostage negotiation situations, for example, will probably remain sufficiently functionally equivalent even if the global temperature is a couple of degrees higher.

The same strategy might be applied to unprecedented developments. Perhaps some unprecedented situations will be functionally equivalent to past situations after all. Viewed at the right level of abstraction, for example, two interpersonal arguments about property may be functionally identical even if one occurs over social media and the other occurs via physical letter. The same decisions and actions may be morally reasons-responsive in both cases (e.g., to rebuke, reconcile, or seek support). If not, being unable to comprehend unprecedented situations is hardly a flaw that would undermine the Moral Decision Machine’s generally competent moral reasons-responsiveness, as it is a flaw shared by many humans (as humanity’s struggle with climate change demonstrates).

The real meat of the matter is this: how much functional equivalence is enough to ensure that the moral reasons are the same? Ideally, the situations would be nearly empirically identical, and certainly functionally identical. But realistically, there is probably some threshold of similarity necessary for the same moral reasons to supervene. If that threshold is high and many contextual facts must correspond for the same moral reasons to be likely to supervene, then the Moral Decision Machine will be less accurate, and less morally reasons-responsive.

However, based on human moral reasons-responsiveness, we have reason to think that the threshold is not all that restrictive. Humans make moral judgements without knowledge of every background fact. In fact, they tend to be fast and loose with their moral judgements – and despite that, humans are considered morally reasons-responsive. Making the threshold for similarity high would imply that most humans are not morally reasons-responsive, because most humans confidently plough on regardless of small contextual differences.

4.2 Objection 8

The probability of functionally identical situations occurring is vanishingly small, and therefore the database of moral decisions must be impossibly large for the Moral Decision Machine to work.

In reply, I want to emphasise that if there is a necessary threshold of similarity, it need not be assumed that the Moral Decision Machine must identify a strictly identical situation. The Moral Decision Machine’s threshold of similarity will need to be calibrated and refined. It should be low enough that the database can be of a manageable size, but high enough that most people agree that the same moral reasons supervene. This seems to me challenging but possible.
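As a purely illustrative aside, here is one way such a calibration could be operationalised, assuming (hypothetically) that we had a dataset of situation-pairs scored for similarity and judged by human annotators as sharing, or not sharing, the same moral reasons. The function name, the candidate thresholds, and the target agreement rate are my own assumptions rather than anything the thought experiment specifies.

```python
# Hypothetical calibration sketch: 'paired_judgements' would be a dataset of
# (similarity_score, annotators_agree_same_moral_reasons) pairs.

def calibrate_threshold(paired_judgements, target_agreement=0.95,
                        candidates=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return the lowest candidate threshold at which pairs scoring above it are
    judged, at the target rate, to share the same moral reasons; a lower
    threshold means more usable matches from a database of a given size."""
    for t in sorted(candidates):
        above = [agree for score, agree in paired_judgements if score >= t]
        if above and sum(above) / len(above) >= target_agreement:
            return t  # most permissive threshold meeting the agreement target
    return None  # no candidate suffices; a stricter threshold (or more data) may be needed
```

The balance the sketch aims at is exactly the one described above: permissive enough that matches are plentiful, strict enough that most people would agree the same moral reasons supervene.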

4.3 Objection 9

It is impossible to create detailed enough snapshots of the empirical situations for the database to function.

The objection turns on the idea that the Moral Decision Machine must be able to trawl the database in a reasonable time frame. Doing so, even assuming extremely high computational power, will require compression and, as a result, the loss of empirical information, which may, the objection goes, distort the supervening moral reasons.

The issue here is twofold. The first worry is that moral reasons track minute changes in empirical situations that would be distorted by compression. The reply is that the moral reasons are probably the same so long as the two situations meet the threshold of similarity, and compression is unlikely to lose enough information to make meeting that threshold impossible. The second worry is that the database needs to be small enough for the Moral Decision Machine to find the snapshot in a reasonable time. I don’t want to get too far into the technical details here, given that the Moral Decision Machine is a thought experiment intended only to show that it is possible for an artificial agent to respond to moral reasons. However, I have two considerations to offer. First, if the threshold of similarity is liberal, finding a matching snapshot should not be overwhelmingly difficult. Second, artificial systems typically excel at trawling through large databases to find accurate matches in good time (Kasula 2018).
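For what it is worth, the retrieval step is the part of the design that maps most directly onto well-understood techniques. The sketch below uses scikit-learn’s NearestNeighbors index as one standard, off-the-shelf way to avoid scanning every snapshot on each query; it assumes – and this is itself a substantial assumption – that each snapshot has already been compressed into a fixed-length numeric feature vector. The vector dimensions, the distance cut-off, and the placeholder action labels are mine, not part of the thought experiment.

```python
# Illustrative retrieval sketch only: assumes snapshots have been encoded as
# fixed-length numeric vectors, which the thought experiment does not specify.

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
snapshot_vectors = rng.random((100_000, 32))                # stand-in for encoded snapshots
snapshot_actions = [f"action_{i}" for i in range(100_000)]  # placeholder action labels

# Build a tree-based index once; each query then avoids a full scan of the database.
index = NearestNeighbors(n_neighbors=1, algorithm="ball_tree").fit(snapshot_vectors)

def retrieve(current_vector, max_distance=1.0):
    """Return the action of the nearest snapshot, or None if even the closest
    precedent fails the (here arbitrary) similarity cut-off."""
    distances, indices = index.kneighbors(current_vector.reshape(1, -1))
    if distances[0, 0] > max_distance:
        return None
    return snapshot_actions[indices[0, 0]]
```

Off-the-shelf indexing of this kind is part of what makes the ‘reasonable time frame’ worry look like an engineering problem rather than an in-principle one.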

In summary, I have defended the claim that the Moral Decision Machine is possible. Central to this has been the idea that the snapshots do not need to be completely identical to the Moral Decision Machine’s situation, but only to meet a similarity threshold. If this threshold is not excessively high (and we have reason to think it is not; Footnote 6), then the Moral Decision Machine seems a possible computational system.

5 Answering challenges to artificial moral agency

If the Moral Decision Machine is, as I have argued, both possible and able to competently respond to moral reasons, then the challenges to artificial moral agency based on the anti-codifiability thesis – and, more broadly, those based on metaphysical properties considered to be both necessary for competent moral reasons-responsiveness and barred from artificial systems – fail. If so, we have fewer reasons to suppose that artificial systems will be unable to be competent moral actors. This may be a point of significant optimism for machine ethicists and others who hope to design artificial moral agents. Centrally, it should be noted that artificial systems can in principle overcome issues related to anti-codifiability – such as those advanced by Purves et al. and Constantinescu & Crisp. Though I have developed this angle slightly less, the Moral Decision Machine may also be a convincing counterexample to conceptual claims that metaphysical properties like consciousness, free will, or the capacity for moral emotion are necessary for competent moral reasons-responsiveness. This conclusion may carry further consequences for the best demarcation of agency, moral agency, and moral status – an issue that I do not explore here. Though my counterexample is hypothetical and leaves out many technical details, something like the Moral Decision Machine could help inspire more practically minded attempts at generating an artificial system that competently responds to reasons. The central challenge in doing so would be to overcome issues related to identifying functionally equivalent situations, gathering enough data to generate a sufficiently large database, and establishing a threshold of similarity for moral supervenience. Indeed, work clarifying the threshold of similarity necessary for supervenience should be welcomed by anyone who would look to establish the viability of the Moral Decision Machine in greater detail. The development of any machine that could function using principles similar to those of the Moral Decision Machine (perhaps scaled down as a proof of concept) would represent a significant step forward in machine ethics, as it would enable artificial systems to make moral decisions based on (on average) good moral reasons. Optimistically, this may lead to progress in the moral abilities of our current artificial systems, reducing harm, and offering a route to satisfying concerns about ‘value alignment’.

There may yet be further objections to the Moral Decision Machine, particularly by those who support a different understanding of what it means to respond to moral reasons, or who may think that moral judgement is a process that cannot be described well by the reasons-responsiveness terminology. There is also an open question, even if you believe that the Moral Decision Machine does competently respond to moral reasons, of whether the Moral Decision Machine possesses metaphysical properties such as consciousness or free will. One potential route to accepting the Moral Decision Machine’s possibility while retaining a focus on metaphysical properties would be to suggest that the Moral Decision Machine has those metaphysical properties – perhaps the functioning of the Moral Decision Machine is sufficient to justify an attribution of consciousness or free will. At any rate, a supporter of the metaphysical properties challenge ought to either argue that the Moral Decision Machine cannot competently respond to moral reasons, or that it has the necessary metaphysical properties for doing so.

6 The new challenge: moral deference

While the Moral Decision Machine represents a step forward in outlining the possibility of artificial moral agency, I do not think it can carry us all the way to justifying the belief that artificial systems can be moral agents. This is because the Moral Decision Machine highlights another, more persuasive, challenge to artificial moral agency. The rest of the paper presents this challenge. I will argue that the Moral Decision Machine is not a responsible moral agent because it is dependent on moral deference. I then argue that this generalises to many artificial agents.

In moral epistemology, many think that moral deference is problematic. They think moral deference signifies either a lack of moral understanding or unjustified reliance on moral authorities. The argument for this stems from a simple intuition that some cases of moral testimony are undesirable. Here’s Alison Hills’ example of where moral testimony feels wrong:

Eleanor has always enjoyed eating meat but has recently realized that it raises some moral issues. Rather than thinking further about these, however, she talks to a friend, who tells her that eating meat is wrong. Eleanor knows that her friend is normally trustworthy and reliable, so she believes her and accepts that eating meat is wrong.

Many people believe that there are strong reasons not to form moral beliefs on the say-so of others, as Eleanor does. I will call these people “pessimists” about moral testimony. (Hills 2009, p.94)

Pessimism is relatively popular. Andreas Mogensen writes: “[P]essimists have argued convincingly that this is the case. The key issue isn’t whether our intuitions accord with pessimism, but why.” (2017, p.262).

To clarify, pessimists do not take issue with all cases of testimony. They target some cases of moral testimony (Footnote 7). Testimony about empirical facts, children forming moral beliefs based on testimony, or taking on moral advice (provided you think about it yourself too) all typically satisfy pessimists. Likewise, taking good examples of moral reasoning and action as a basis for one’s own moral judgements and actions can be accepted as good practice, as Rossian intuitionists or virtue ethicists may believe it ought to be. The cases they take issue with, like the Eleanor case, are those in which testimony is taken to be sufficient and definitive evidence for a moral belief or action. This type of reliance on testimony is ‘moral deference’.

The reason why moral deference is problematic is debated. Hills (2009) offers the standard explanation: moral deference is problematic because morally deferential agents do not demonstrate the necessary level of moral understanding. “Moral understanding involves a grasp of moral reasons, or more precisely, a grasp of the connections between moral reasons and moral conclusions” (Hills 2020, p. 408). An agent may be unable to competently respond to a moral reason they came to believe through deference if they fail to understand that moral reason. For example, Eleanor’s friend can come to a principled belief about whether ‘eating lab-grown meat is wrong’ is true by exploring whether they understand eating meat to be wrong because of animal suffering, complicity in harmful global supply chains, or the wrongness of eating biological tissue; Eleanor, in contrast, may well believe that ‘eating lab-grown meat is wrong’ is true with unwarranted certainty – failing to competently respond to the moral reasons involved.

But moral understanding explanations are not the only game in town. A second explanation turns on authenticity and character. Howell (2014) suggests that understanding-based accounts are flawed, and that the problem with moral deference is that the moral beliefs formed are not consistent with the character and identity of the agent. (This resembles the ‘personal life history’ condition mentioned earlier – but there is a distinct difference: supporters of ‘personal life history’ think it is valuable because it enables an agent to properly identify the moral reasons particular to their situation, something that the Moral Decision Machine is able to do (deferentially, at least); authenticity concerns the correct relationships between an agent’s attitudes over time – something the Moral Decision Machine seems to lack.) Mogensen (2017) agrees with Howell’s criticisms of understanding-based accounts but suggests that the problem is that moral deference undermines the authenticity of the agent. He says, “To be authentic, the beliefs which guide us through life must give expression to the true self. This seems to require that we should decide moral questions on our own terms, so far as we can, so that our own moral sensibility is manifest in the values and ideals by which we live. By contrast, relying on moral testimony puts us in a condition of inauthenticity, since the moral beliefs that guide us fail to give expression to the traits that make us who we are, deep down.” (Mogensen 2017, p. 277). Mogensen’s concept of authenticity seems to reflect what accounts of responsibility call ‘autonomy’. According to both mainstream kinds of account of responsibility – ‘true self’ accounts (see Wolf 1990; Frankfurt 1988; Watson 2004) and ‘historical’ accounts (see Mele 1995; Fischer and Ravizza 1998; McKenna 2016) – you can only be responsible when you are authentic (Footnote 8). The upshot is that if reliance on deference does entail inauthenticity, then purely morally deferential agents cannot be responsible.

Others think that there are epistemic problems with deference. That is, there is nothing wrong with deference per se, but agents are not typically well-informed enough to morally defer appropriately. McGrath (2009) offers the explanation that moral deference is problematic because there are formidable epistemic difficulties in identifying a person with superior moral judgement. Sliwa (2012) similarly suggests that it is practically impossible to identify a moral authority. On these accounts, moral deference is rarely justified, but it is otherwise no more problematic as a source of belief than other forms of deference.

None of these explanations need, it seems to me, to compete with one another. It may be that moral deference implies a lack of moral understanding, authenticity, and epistemic warrant. The important explanation for my purposes is that moral deference is authenticity-undermining. Moral understanding and epistemic explanations are consistent with moral agents performing moral deference (in the right conditions). However, the authenticity explanation leads to the conclusion that morally deferring agents cannot be responsible moral agents.

7 Artificial moral deference

The Moral Decision Machine responds to moral reasons based on the judgement of the human in its snapshot. I am suggesting that this is an instance of, or at least highly akin to, moral deference. Let us narrow in on the exact conditions of deference. Alison Hills says that:

The strongest type of trust, deference, is to believe that p because the speaker has said that p, whatever your other reasons for or against believing that p - and so even if you have a lot of other evidence against it, even if p seems completely crazy to you. You take yourself to have sufficient reason to believe that p, whatever other evidence you have. (Hills 2020, 402)

There are two significant differences between the Moral Decision Machine’s decision making and Hills’ conditions for deference. First, the Moral Decision Machine does not defer to anything a person has said. The human in the snapshot may speak their final decision aloud (e.g., “I will save that drowning child!”), but in many cases they may simply act. But this does not seem a significant difference to me. By my lights, replacing utterance with performance does not make deference any more acceptable. If Eleanor believes that eating meat is wrong purely on the basis of a trusted friend’s performance of vegetarianism, she is making the same mistake as when she defers to an utterance. Likewise, I think, for the Moral Decision Machine.

Second, I remain non-committal on whether the Moral Decision Machine has beliefs at all (although recall that on an interpretivist account it does have beliefs if its actions are best explained by belief-terms). However, the problematic aspects of moral deference are maintained even if it does not involve belief formation. Any case in which morally reasons-responsive mechanisms are based entirely on the decision-making processes of others strikes the same tone as moral deference. In humans, this is often via beliefs, but it need not be. One further worry about calling the Moral Decision Machine’s decision-making process deferential is that the snapshotted human does not intend to testify. But, again, I do not think this is of much concern. Deference is something performed by the deferring agent towards the target agent, and not something that requires the target agent’s consent or intention.

To drive the point home, consider the following example:

Francesca: Francesca observes that her reliable and trusted friend has formed a strong and reliable disposition to avoid eating meat. While she does not form a belief one way or another, she takes her friend’s actions to be a stellar example and follows suit – avoiding eating meat in all circumstances.

Is this an example of moral deference? I struggle to see the relevant difference between Francesca and Eleanor such that Eleanor makes an error while Francesca does not. To the extent that Eleanor demonstrates a lack of moral understanding, behaves less authentically, or makes decisions in an epistemically unjustified way, Francesca also seems to. If so, the absence of utterance, belief, and target agent consent/intention are irrelevant to deference, and, therefore, the Moral Decision Machine, despite (perhaps) lacking these features, performs moral deference (or, at least, something equivalent).

So, given that the Moral Decision Machine morally defers, is this deference problematic? The moral understanding explanation for why moral deference is problematic is that it interferes with moral reasons-responsiveness. But under this explanation, the deferential nature of the Moral Decision Machine is unproblematic. Despite being morally deferential, it does competently respond to moral reasons, and therefore does exhibit moral understanding; so its moral deference is compatible with its moral understanding, and this is a case in which moral understanding accounts would take moral deference to be good practice.

How about the epistemic explanation? Is it bad practice, on epistemic grounds, for the Moral Decision Machine to defer? It seems not. The humans being deferred to are moral experts compared to the Moral Decision Machine. The Moral Decision Machine is ‘morally blind’: it has no capacity to respond to moral reasons other than moral deference. So, it seems justified on epistemic grounds for the Moral Decision Machine to rely on moral deference, as it is for a blind person to rely on directions.

The final explanation for why moral deference is problematic is that it is authenticity-undermining. In this case, the Moral Decision Machine’s moral deference does seem problematically authenticity-undermining. The Moral Decision Machine’s actions are not based on its own capacities or values. There remains a question mark hanging over morally deferring agents’ authenticity and therefore their moral agency. At least, if moral deference does signal inauthenticity, then it seems right to think that the Moral Decision Machine is not and cannot be responsible. Thus, while the possibility of the Moral Decision Machine might be good news for a would-be designer of artificial moral agents, it also poses a new challenge: can an artificial agent competently respond to moral reasons without deferring? That is, can artificial agents respond to moral reasons competently and authentically?

Two further points are worth discussing here. The first is that moral epistemologists focus on single cases of moral deference. Humans sometimes perform moral deference, and when they do, they are doing something wrong because they are being inauthentic. But humans are moral agents despite occasionally committing moral deference. One plausible reason why they remain generally responsible for their actions is that they are generally capable of avoiding moral deference. In contrast, if a human is incapable of authentically responding to moral reasons, they may well fail to be a moral agent (such as in ‘mind-control’ or ‘brainwashing’ cases). The Moral Decision Machine differs from ordinary humans in that it can only perform moral deference. As a result, its prospects for moral agency are significantly weaker. It has no other means of responding to moral reasons whatsoever (Footnote 9). The second is to recall that pessimists think there is something particularly problematic about moral deference, rather than about every instance of deference. If deference undermines authenticity, is moral deference especially authenticity-undermining compared to non-moral deference? Mogensen (2017) suggests it may be something to do with the centrality of moral beliefs to an agent’s character but leaves it as an undecided issue (Footnote 10).

The Moral Decision Machine is dependent on moral deference and, if the inauthenticity explanation of moral deference is right, is unable to be responsible. The same seems to be true for many other types of artificial system. Artificial systems that are directly controlled by others, learn under ‘supervision’, or motivate their actions by analysing human-generated data might all be said to defer. Artificial systems that are directly controlled by others are a clear case and can be held to defer even more straightforwardly than the Moral Decision Machine. Artificial systems that learn through ‘supervision’ – that is, artificial systems that are calibrated and trained by human supervisors actively offering positive and negative feedback – also defer: they respond to moral reasons that they take to be true purely on the basis of the supervisor’s feedback. Finally, artificial systems that act based on human-generated data can also be said to defer, although this is potentially a borderline case. Artificial systems that work from databases of text (like large language models) or statistical data about human behaviour might be said to defer because they respond to moral reasons purely on the basis of others’ actions. One final and more fundamental way in which all artificial systems might be said to defer is through being designed: they take the moral reasons that they were designed to respond to as definitively true simply because they were designed that way. There are many missing details here, but plausibly many artificial systems systematically and exclusively defer to others. If so, then this deference would be generally epistemically unproblematic (i.e., for non-moral issues), but if the authenticity explanation for moral deference is right, artificial systems that are only capable of responding to moral reasons via moral deference cannot be moral agents.

Concerns about artificial systems’ authenticity (and its conceptual cousin ‘autonomy’) have been presented in the context of artificial moral agency before – most relevantly in discussions of so-called ‘responsibility gaps’, in which artificial agents are often assumed to be incapable of full responsibility for their actions (Gerdes 2018; Hellström 2013; Matthias 2004) – but not in the context of moral epistemology’s discussion of moral deference. Couched in this way, the challenge to artificial moral agency can be seen clearly – most artificial systems cannot be moral agents because they are bound to defer on moral issues. There may be some ways of overcoming this issue and designing non-deferential artificial agents, but those avenues are currently underdeveloped. The deference-based challenge to artificial moral agency, and any potential reply to it, is conceptual territory ripe for exploration, as this paper has aimed to demonstrate.

8 Conclusion

Some argue that artificial systems cannot competently respond to moral reasons because they lack moral judgement; others argue to the same end by appealing to artificial systems’ lack of emotions, consciousness, evolutionary history, or some other metaphysical property. I argued against these positions with a counterexample: the Moral Decision Machine. I replied to potential concerns about whether the Moral Decision Machine truly responds to moral reasons. Then I discussed concerns about whether the Moral Decision Machine is possible, arguing that it is. If I am right, then both of the above arguments fail – their proponents must either admit that the Moral Decision Machine competently responds to moral reasons or strengthen their objections to this sort of case.

I then turned to another reason for denying the Moral Decision Machine moral agency: it exclusively performs moral deference. I discussed various explanations for why moral deference is problematic, including the claim that moral deference is authenticity-undermining. Then, I argued that the Moral Decision Machine does commit moral deference. Since authenticity is held on many accounts to be necessary for responsibility, if moral deference is authenticity-undermining then the Moral Decision Machine cannot be a responsible moral agent. I suggested that many artificial systems may be said to defer and therefore be incapable of moral agency.

There is yet to be a significant account of artificial moral epistemology and of artificial epistemology in general, but, if the considerations I’ve presented here offer some value, there could be genuine benefits to developing one. I’ve suggested that various types of artificial systems may be deferential, but arguing for this in more detail would require more research on the nature of artificial systems’ propositional attitudes and corresponding accounts of artificial systems’ epistemic mechanisms. Among those who are sympathetic to the idea that artificial systems have propositional attitudes, a better understanding of the differences between human and artificial epistemology would be worth a great deal. Artificial systems may, for example, be more prone to deference, but they are also likely to be less prone to inconsistency – and we currently lack an understanding of the specific advantages and disadvantages of artificial epistemic mechanisms.

From a technical perspective, there are clearly many gaps that remain to be filled, and developing solutions to them, even in a limited form, could offer further progress on developing artificial systems that behave ethically – even if they are not moral agents proper. Even if artificial systems cannot respond to moral reasons competently and authentically, developing a lesser version of the Moral Decision Machine that responds to some moral reasons competently, or responds to moral reasons at a subhuman level of competency, could be a valuable project for machine ethics, even if that machine relies on deference to do so. The Moral Decision Machine is hypothetical, but it is intended to play to the strengths of artificial systems, which excel at searching through databases and matching patterns – furthermore, we already have a vast (though far from complete) database of descriptions of decision-making procedures in the form of written reports on the internet. One feasible route to putting the design of the Moral Decision Machine into practice may be to cut back on the demand for empirical information and design a large language model that defers to online reports of moral judgements.