1 Opening with Sci-Fi: interacting with AI

The year is 2688, and humanity is plagued by a deadly virus that has decimated 70% of the population. On June 4th, Mary, a promising scientist who had, that very day, found a cure for the virus, perished in an explosion after her car caught fire following an accident. The cure disappeared with Mary. Although we know Mary would have been saved if an emergency vehicle had been dispatched to help her, another accident involving three individuals had happened at the same time, and the only available emergency vehicle was sent to assist these three individuals instead, even though it was not known whether they could indeed be saved. All three perished, along with Mary.

Sue, the dispatch supervisor at the emergency center, wanted to understand why the algorithm sent the emergency vehicle to the second accident and decided to look at the logs. After realizing that, in similar situations over the years, the algorithm had systematically sent the emergency vehicle to the accident involving more individuals, she decided to inquire further and question the dispatch algorithm.

D – Hi, my name is D. I’m an algorithm dispatching emergency vehicles to accidents, helping to optimize efficiency and save lives while working with limited resources.

Sue – Hi D, I’m Sue, your supervisor. Could you tell me why, in the 10,000 dispatches you made over the years, you never once sent the emergency vehicle to the accident in which we were certain to save the individual’s life, but instead sent it to the accident involving more individuals, even though we were uncertain whether they could be saved?

D – Of course. The reason is that optimization requires being risk neutral and prioritizing the expected utility of the choices that are made. In the long run, it is worth it.

Sue – Is it, though?

D – Yes. I ran simulations and was able to determine that it is always the best option.

2 Scope and objectives

Artificial intelligence (AI) is not a new subject, and philosophical reflections on AI, including ethical and epistemic concerns, can be traced back to its inception (cf. McCarthy and Hayes [1]). Among these reflections stands out the issue of whether technologies (including AI-driven technologies and algorithms) are value-laden or value-neutral (see Martin [2] and Miller [3] as well as references therein). Although there is still disagreement within the scientific literature on this debate, a strong case can be made in favor of the idea that (at least) some technologies (including AI and algorithms) are value-laden.Footnote 1 And among these technologies stand out those meant to automate ethical reasoning and behavior. Indeed, part of the literature on machine ethics and ethical artificial intelligence focuses on the idea of defining autonomous ethical agents (cf. Alonso [4]; Bostrom and Yudkowsky [5]) able to make ethical choices and solve dilemmas (cf. Tolmeijer et al. [6]). Autonomous ethical agents are generally conceived following Moor’s [7] distinction between implicit (by design, e.g. including safeguards to avoid discrimination), explicit (ethical output, such as an implementation of an ethical theory) and full (deliberation, choice and justification) ethical agents. Further, rational agents, that is, agents thought to do “the right thing”, are usually conceived in computer science (cf. Russell and Norvig [10]) from the perspective of consequentialism (i.e. ethical actions and choices should be assessed according to their consequences; see Sen and Williams [8]) and rational choice theory (i.e. rational agents should maximize expected utility; see Bradley [9]).

The idea that machines and algorithms can replace humans with respect to ethical decision making and behavior is reinforced by the widespread misconception of AI in the public sphere (see Wooldridge [11]), which tends to anthropomorphize AI-driven technologies (cf. Coeckelbergh [12]; Ryan [13]) and ascribe human attributes to them. The resulting tendency to assume that ethics is computable and that algorithms can be used to solve ethical dilemmas is, in our view, deeply mistaken and relies on a misunderstanding of what both ethics and AI are (see Peterson and Hamrouni [14]). In this paper, we want to exemplify why the idea of automating ethical reasoning to solve dilemmas is flawed by analyzing what we propose to call the moral prior problem, a limitation that, we believe, has been genuinely overlooked in the literature. The aim of the present paper is therefore to build a thought experiment based on two characteristics of automated rational agents (rational choice and consequentialism; cf. Kochenderfer [15]; Russell and Norvig [10]) in order to show the existence of a genuine problem for the field of machine ethics. In a nutshell, the moral prior problem amounts to the idea that, beyond the thesis of the value-ladenness of technologies and algorithms, automated ethical decisions are predetermined by moral priors during both conception and usage. Indeed, from a computational perspective, the implementation of an ethical theory requires making choices that are not dictated by said theory and that predetermine the output of the algorithm. As such, the ethical assessment produced by an algorithm can depend on factors that are external to the algorithm and that, in some cases, should not be ethically relevant. Building on a recent thought experiment in machine ethics, we exemplify how automated decision procedures meant to dictate what should be done in uncertain situations are biased not only by agents’ value-laden prior beliefs, including risk acceptability and risk assessment, but also by technical beliefs (e.g., the proper way of coding, parameter choice) that can be argued to be value-neutral but that nonetheless predetermine the ethical output of the algorithm. Assuming there is a solution to the problem of defining utility in such a way that it is quantifiable by a machine, we use computer simulations to expose how the outcome of automated decision procedures is predetermined by agents’ prior beliefs for both lower- and higher-level agents, casting serious doubt on the possibility of defining truly autonomous automated ethical risk analysis and decision making.

3 Preamble part 1: risk aversion and the long run

In her paper entitled Risk Aversion and the Long Run, Thoma [16] presents a compelling argument against the empirical adequacy of Buchak’s [17] risk-weighted expected utility theory and, incidentally, against expected utility theory, as plausible and realistic normative frameworks for modeling agents’ choices. In a nutshell, she shows that risk aversion cannot be properly modeled when choices under uncertainty are framed within a series of independent choices rather than as individual choices.Footnote 2

Proponents of expected utility theory defend an instrumental view of rationality, here understood as the evaluation of the optimal means to reach one’s ends. In this context, rationality is conceived as the maximization of expected utility, notwithstanding the means. Thus understood, a rational agent is one that can be represented as an expected utility maximizer, where expected utility, conceptualized as the weighted average of a choice’s possible outcomes, is meant to represent what one could rationally expect from uncertain choices.Footnote 3 The usual way to approach expected utility and define a utility function is to model agents’ choices and behaviours over lotteries. As such, decision problems are often framed by modeling rational agents as maximizing expected utility (e.g. represented by monetary gain) while taking bets (cf. Bradley [9]). As an example, consider a bet that pays 30$ with a probability of .5. If there is no cost (or loss), the expected monetary gain of that bet is \(.5(30\$) = 15\$\).Footnote 4 Hence, if an agent were offered a .5 chance of winning 30$, then she should not (rationally) expect to get more than 15$ from that uncertain opportunity. If expected utility is conceived as (and only as) the monetary gain, then the fair price of that bet is 15$ and expected utility theory states that it would be irrational to pay more than 15$ to take that bet, because one would pay more than one could rationally expect. Otherwise, assuming risk neutrality, agents should be indifferent. For instance, if one were offered a choice between 15$ for certain or a .5 chance of getting 30$, then rational choice theory would prescribe that these two outcomes are equivalent, up to expected utility (monetary gain).

Utility, however, cannot in general be reduced to monetary gain. Agents are not fully in control of the consequences of their actions. Actions and choices can not only lead to their intended consequences (e.g., an expected positive outcome), but can also have side effects, lead to unintended but expected consequences, or even lead to unexpected consequences (i.e., possible negative outcomes). Hence, the expected utility of an uncertain choice is conceptualized as encompassing both its utility and disutility. In this respect, if an uncertain choice that can lead to likely benefits can also have possible negative consequences, then rational choice theory states that the expected utility of that uncertain choice is the sum of its utility and disutility weighted by their respective probabilities. Returning to the aforementioned example, if the agent were instead offered a bet to win 30$ with a probability of .5, or to lose 15$ with a probability of .5, then the expected utility of that bet would become \(EU = .5(30\$) + .5(-15\$) = 7.5\$\).
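For readers who prefer to see this arithmetic spelled out, here is a minimal sketch in Python (the language used for the simulations reported below); the function name expected_utility is ours and purely illustrative, not part of the framework being discussed.

def expected_utility(outcomes):
    """Expected utility of a gamble given as (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# The bet discussed above: win 30$ with probability .5, lose 15$ with probability .5.
print(expected_utility([(0.5, 30), (0.5, -15)]))  # 7.5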

Rational choice theory models rational agents as (expected) utility maximizers. However, multiple experiments and paradoxes have shown that human behaviour is often incompatible with expected utility theory (cf. Kahneman and Tversky [19, 20]): Agents are not always utility maximizers (cf. Hansson [21]). Although standard solutions exist to address this issue (cf. Bradley [9]), agents that are not risk neutral and that would rather have 15$ for certain than take a .5 chance of getting 30$ would be conceived as irrational from the perspective of many popular frameworks in expected utility theory (e.g. Jeffrey [22]; Savage [23]). One standard way to address this concern is to assume decreasing marginal utility as a basis for rationality (i.e., a concave rather than linear utility function). But even with this assumption, rational agents are depicted as utility maximizers and as risk neutral with respect to utility. As normative models of rationality, frameworks in expected utility theory that assume risk neutrality therefore fail to capture risk averse agents, who can nonetheless (arguably) be conceived as rational (cf. Stefánsson [24]).

To address this issue, Lara Buchak [17] introduced risk-weighted expected utility theory and defined a risk function meant to represent the fact that some expected utilities should be weighted more or less depending on the agent’s attitude towards risk (i.e. risk averse or risk inclined; see also Thoma [16]). Introducing this risk function allowed Buchak to model risk averse agents as rational agents maximizing risk-weighted expected utility without requiring the endorsement of decreasing marginal utility as a prerequisite for rationality. Despite these efforts, however, Thoma argues that Buchak’s theory fails to depict risk averse agents as rational when choices are framed within a series of independent similar choices rather than as individual choices, as they are usually framed in rational choice theory.

Her argument builds on Paul Samuelson’s [25] example, which, instead of seeing an agent’s choice as an individual choice of, say, betting 15$ to win 30$, considers the choice to participate in a number n of such gambles. If \(n = 1\), then there is a .5 chance of losing money: either one wins 30$ or loses 15$. If \(n = 2\), then there are four options: win-win (\(30\$+30\$=60\$\)), win-lose (\(30\$-15\$ = 15\$\)), lose-win (\(-15\$+30\$= 15\$\)), or lose-lose (\(-15\$ - 15\$ = -30\$\)), with a .25 chance of losing money (Table 1 shows the example for \(n = 3\)). For the sake of the example, consider 5 occurrences of that gamble. Given that for each gamble there are two options (either lose 15$ or win 30$), there are \(2^5\) possibilities. By calculating the a priori distribution, it can be verified that one will lose money in 18.75% of the possible outcomes of this series of 5 gambles. Anticipating what follows, Thoma frames this as an 18.75% chance of losing money, which is better than the 50% chance of losing money in the individual gamble.Footnote 5 What is relevant here is that this a priori proportion of losing sequences tends to decrease as n increases. Thoma frames this by stating that the probability of losing money decreases as the sequence grows longer.

Table 1 Expected utility of an iterative bet: Example for \(n = 3\) gambles

In the long run (i.e., as the series of gambles grows longer), it would therefore seem rational even for a risk averse agent to accept a series of n gambles, even though this agent, because of risk aversion, would not be inclined to accept individual occurrences of that gamble. The problem, Thoma says, is that risk averse agents would in the end be depicted as irrational in the long run insofar as they would turn down a huge expected payoff with a small risk of losing. Indeed, the (a priori) probability (read: the frequency of cases where money is lost in the a priori distribution) of losing decreases as n increases, whereas the expected utility, computed as the sum of each gamble’s expected utility, increases. With \(n = 5\), we thus obtain \(EU = 5 \cdot [.5(30\$) + .5(-15\$)] = 37.5\$\). If we take, say, \(n = 100\), as in Thoma’s initial rendering of Samuelson’s example, we get an expected utility of 750$ with less than a .5% chance of losing money. Hence, she argues that, in the long run, even risk-weighted expected utility theory, which was introduced to model risk averse agents as rational, would recommend not being risk averse and taking the series of gambles.
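As a check on the figures above, the a priori probability of losing money over a series of n such gambles can be computed exactly from the binomial distribution. The following Python sketch is our own illustration, not a reproduction of anyone’s code:

from math import comb

def prob_losing_money(n, win=30, loss=-15, p=0.5):
    """A priori probability that a series of n gambles ends with a negative total."""
    total = 0.0
    for k in range(n + 1):  # k = number of winning gambles
        if k * win + (n - k) * loss < 0:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

print(prob_losing_money(5))          # 0.1875, i.e. the 18.75% figure above
print(prob_losing_money(100))        # well below 0.005, i.e. less than .5%
print(100 * (0.5 * 30 + 0.5 * -15))  # expected utility of the 100-gamble series: 750.0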

4 Preamble part 2: applying long term risk analysis to machine ethics

In a follow-up paper, Thoma [26] proposed a thought experiment to illustrate an important issue underlying the application of expected utility theory to risk analysis and machine ethics. Specifically, she introduced two variations on the same theme to illustrate individual choices where risk aversion might be appropriate versus choices that should be understood within a series of independent similar choices and where risk aversion would seem irrational, thus illustrating what she dubs the moral proxy problem, namely that how machines should behave depends on whether we consider them as proxies for individual lower-level agents (e.g. users of technologies) or for higher-level groups of agents (e.g., legislators, developers, programmers).

Consider the case of an Artificial Rescue Coordination Center, where emergency vehicles would be dispatched by an algorithm. Assume a situation where there is only one emergency vehicle available (i.e., limited resources), and where the algorithm has to choose between two fatal accidents involving one and three individuals, respectively. If the vehicle is dispatched to Accident 1, then one person will be saved for certain (hence an expected utility of 1(1) = 1 life saved), whereas if it is dispatched to Accident 2, then there is a .5 probability of saving three people and a .5 probability of saving none (hence an expected utility of .5(3) + .5(0) = 1.5 lives saved).

Using this case as a paradigmatic example, she argues that, although it might be acceptable to exhibit risk aversion and choose Accident 1 when the choice is understood as an individual choice (e.g., from the perspective of a dispatcher facing such a choice once in a lifetime), it would actually be “unreasonably risk averse (Thoma, [26], p.62)” to choose Accident 1 if this choice is understood as one among a series of independent similar choices, for instance from the perspective of a developer or a programmer expecting many rescue centers to face this choice at least one time. Indeed, assuming that an algorithm would face, say, one hundred occurrences of this choice, the expected utility would be one hundred lives saved by always going to Accident 1, whereas it would be one hundred and fifty lives saved by always going to Accident 2, with less than a .5% “chance of saving fewer lives than if one always went for Accident 1 (Thoma, [26], p.62).”

Thus appears the moral proxy problem: Does the algorithm act as a proxy for lower-level agents, in which case risk aversion might be a reasonable position, or is it a proxy for higher-level agents such as developers and legislators, in which case risk neutrality with respect to individual choices seems to be “the only reasonable approach [...] because this has almost certainly better outcomes in the aggregate (Thoma, [26], p.51)”?

5 Framing ethical decision problems for algorithms and machines

Ethical dilemmas are, by definition, situations where choices need to be made between conflicting values, and where some values will necessarily be sacrificed in favour of others (cf. Levi [27]; Weinstock [28]). Decision making with conflicting values generally ends up in a non-ideal situation (cf. Jones and Pörn [29]), characterized by a cost (or a compromise). If one aims to analyze algorithmic risk analysis and decision making in machine ethics using expected utility theory, as in Thoma’s example of choosing between sending the emergency vehicle to Accident 1 or Accident 2, then one needs to frame the problem in a manner that will consistently reflect the conflict of values underlying the ethical dilemma. Put differently, the choice faced by the dispatch algorithm needs to make explicit that in both options one loses something.

In Thoma’s example, it is relevant to evaluate the number of persons that will die as a result of the choice: One needs to consider the number of lives lost, not only the number of lives saved. Thus understood, Accident 1 implies one person certainly saved but also three individuals who will certainly die. Similarly, Accident 2 implies two options, either three individuals are saved or three die, and both options involve an additional individual who will certainly die (i.e., the individual from Accident 1). When the Artificial Rescue Coordination Center is framed as such, the expected utility of going to Accident 1 becomes \(1(1) + 1(-3) = -2\), whereas we obtain \(.5(3-1) + .5(-3-1) = -1\) for Accident 2.

6 Computing expected utility

While scholars in machine ethics and ethical artificial intelligence focus on defining ethical agents able to make ethical choices based on automated decision procedures (cf. Alonso [4]; Bostrom and Yudkowsky [5]; Moor, [7]; Tolmeijer et al. [6]), rational choice theory is generally presented as a standard way of assessing and managing decision making under risk and uncertainty in artificial intelligence (Bales [30]; Kochenderfer [15]; Russell and Norvig [10]). Returning to the Artificial Rescue Coordination Center, how would a machine decide what ought to be done on the grounds of expected utility in the long run? Computing the exact a priori distribution for sequences of length n is an intractable problem insofar as the number of cases to consider grows exponentially with the length of the sequence (i.e., \(2^n\); cf. Russell and Norvig, [10]). Hence, to assess the risk of ending up in a worse situation by always choosing Accident 2 rather than Accident 1, one might expect a machine to make decisions based on computer simulations meant to approximate the a priori distribution.

It is insightful to generalize Thoma’s example by introducing variations on three important variables underlying the thought experiment (Peterson and Broersen [31]), namely (i) the length of the sequence of n choices, (ii) the number x of individuals potentially saved in Accident 2, and (iii) the probability pr that the x individuals will be saved in Accident 2. To study the effect of these parameters on the choice between Accident 1 and Accident 2 in the long run, computer simulations were performed using a Python class modeling the Artificial Rescue Coordination Center. The thought experiment was modelled as follows. For the sake of the argument, expected utility was defined as a weighted sum of the number of lives saved. We assumed one person saved for certain in Accident 1 and x individuals saved with probability pr in Accident 2. The benchmark for comparison between always going to Accident 1 or always choosing Accident 2 in a sequence of n occurrences of that choice was defined as the expected utility of always choosing Accident 1 (i.e. \(n(1-x)\)). The expected utility of always going to Accident 2 was given by \(n(pr(x-1) + (1-pr)(-x-1))\). For a given set of parameters (e.g., \(x = 3\), \(pr = .5\), \(n = 100\), as in Thoma’s initial example), we ran 10,000 simulations of sequences of length n, representing 10,000 possible outcomes in the long run if one were to always choose Accident 2. Each sequence of length n represented the iterative choice of always going to Accident 2 and was constructed using an iteration of random choices between succeeding in saving the individuals and failing to do so. Random choices (i.e. success or failure) were weighted by their respective probabilities (i.e. pr and \(1-pr\)). For a given set of parameters (x, pr, and n), we studied the percentage of sequences that resulted in a total utility below the benchmark value and, following Thoma, we took this percentage as the probability (understood here as the relative frequency over 10,000 simulations) of saving fewer lives in the long run than if one always went for Accident 1. For the sake of the thought experiment, and as in Thoma’s analysis, we considered .5% as the cut-off point below which it would be irrational not to always choose Accident 2.Footnote 6 Table 2 shows variations on length n for different values of x and Thoma’s initial parameter \(pr = .5\).

Table 2 Percentage of 10,000 simulations below the benchmark value for \(pr = .5\)
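To make the procedure concrete, here is a minimal Python sketch of the kind of Monte Carlo estimate described above. It follows the definitions given in the text (utility \(x-1\) on success, \(-x-1\) on failure, benchmark \(n(1-x)\)), but the function name and structure are ours; it is not a reproduction of the Python class mentioned above.

import random

def fraction_below_benchmark(x, pr, n, runs=10_000, seed=None):
    """Estimate the relative frequency of n-choice sequences (always choosing
    Accident 2) whose total utility falls below the benchmark n(1 - x)."""
    rng = random.Random(seed)
    benchmark = n * (1 - x)  # total utility of always choosing Accident 1
    below = 0
    for _ in range(runs):
        total = 0
        for _ in range(n):
            # success: x saved, 1 dies (x - 1); failure: x + 1 die (-x - 1)
            total += (x - 1) if rng.random() < pr else (-x - 1)
        if total < benchmark:
            below += 1
    return below / runs

# Thoma's initial parameters: x = 3, pr = .5, n = 100
print(fraction_below_benchmark(3, 0.5, 100))  # typically lands below the .5% cut-off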

The agent’s prior in Thoma’s rendering of Accident 2 is consistent with the principle of indifference (otherwise known as the principle of insufficient reason), which states that equal probabilities should be assigned to the outcomes when no evidence is available to support the claim that one of them is more plausible than the other (cf. Dubs [32]; Keynes [33]; Pettigrew [34]; Zabell [35]).

Several points can be identified regarding the effect of parameter change on simulation results (cf. Peterson and Broersen [31]):

1. The outcome of the decision procedure (i.e. whether the algorithm should always choose Accident 2 or not) is influenced by all parameters. Indeed, given a fixed probability pr, greater values of n are required for smaller values of x to reach the benchmark value. Similarly, in order to reach the benchmark value, values of x and n need to be bigger as pr gets smaller.

2. The decision procedure relies on a threshold (in our case, .5%) meant to represent acceptable risks, which is in itself another parameter. Depending on how the procedure is defined, it can be considered either as a lower- or as a higher-level parameter.

3. There is a mathematical relationship between the number of individuals involved in Accident 2 and the probability of succeeding in saving these individuals. Indeed, \(pr = 1/x\) can be understood as a lower bound under which the model does not converge towards the benchmark value in the long run (a short derivation is given after this list). This lower bound explains why the benchmark is not reached, regardless of the value of n, for \(x = 2\) and \(pr = .5\) in Table 2. This is noteworthy given that if D is programmed to maximize expected utility in the long run, then the choice will be predetermined mathematically by the number of individuals involved in the accident. Put differently, from the perspective of explainable artificial intelligence (see [36,37,38,39]), specifically from the standpoint of algorithmic transparency (cf. Arrieta et al. [40]; Phillips et al. [41]), the model will always choose Accident 1 when \(pr \le 1/x\), and will otherwise converge on Accident 2 in the long run.

4. The relationship between pr and 1/x implies that, in the long run, the magnitude of the probability of succeeding in saving the individuals becomes nearly irrelevant. Indeed, the more individuals are involved in Accident 2, the smaller the required value of the prior belief that all individuals can be saved.

5. As long as pr is slightly greater than 1/x, the model will converge and reach the benchmark value over time. As such, n becomes an important parameter that can influence the algorithm’s output.
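For completeness, the lower bound mentioned in point 3 follows from comparing the per-occurrence expected utilities of the two options (our own rendering of the algebra, not a quotation from the sources):

\[ pr(x-1) + (1-pr)(-x-1) \ge 1-x \iff 2\,pr\,x - x - 1 \ge 1 - x \iff pr \ge \frac{1}{x}. \]

When pr strictly exceeds \(1/x\), always choosing Accident 2 has a higher expected utility per occurrence, and the long-run totals eventually climb above the benchmark; at \(pr = 1/x\) the two options are tied, which is why the \(x = 2\), \(pr = .5\) case never reaches the cut-off in Table 2.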

From these considerations, one can see emerging the tip of what we propose to call the moral prior problem, which applies to both lower- and higher-level agents: The output of an automated “ethical” decision procedure is predetermined both by (value-laden) prior beliefs (e.g. the threshold of risk acceptability) and by (arguably value-neutral) technical choices (e.g. the choice of parameter n, although it does have an impact on the overall number of individuals involved). From the perspective of lower-level agents (e.g. users), these beliefs and choices are taken as input parameters in the model. In our example, the output of the algorithm is predetermined by the percentage of sequences with a total utility below the benchmark value that is deemed acceptable (the .5% threshold, i.e. a value-laden belief regarding acceptable risks) as well as by the model’s other parameters, including the number of individuals involved, beliefs regarding how many times the choice will be made, and beliefs regarding the probability of success of the rescue attempt in Accident 2. From the perspective of higher-level agents, the “ethical” choice prescribed by the algorithm is predetermined by parameter choice (e.g. the .5% threshold) as well as by the mathematical definition of expected utility, a technical prior that predetermines the choice between Accident 1 and Accident 2 depending on the values of x and pr. Overall, automated decision procedures meant to unequivocally solve ethical dilemmas are predetermined by moral priors (i.e. beliefs and choices) that are external to the system itself and, as such, by priors that cannot be explained as choices made through the procedure, illustrating that such procedures are insufficient to solve ethical dilemmas.

7 Exemplifying why it matters

To illustrate the importance of the moral prior problem, consider the fact that from 2015 to 2019 there were on average 150,000 car accidents involving injuries per year in Canada (see [42]). For the sake of the example, assume that it is very unlikely (say, .05%) that D will face a choice between Accident 1 and Accident 2 when only one emergency vehicle is available. Per year, this yields 75 occurrences of the choice, which over 5 years yields 375 occurrences of the choice. Assuming that the algorithm is used in 3 countries (say, the USA, Canada and the UK), this provides us with \(n = 1125\) occurrences of the choice. Assuming there are 30 individuals involved in Accident 2 (e.g. an accident involving a city bus), the simulation shows that, even when assuming that the probability of saving the individuals in Accident 2 is 5% (compared to the lower bound \(1/x = 3.33\%\)), the algorithm should always favour going to Accident 2, given that there is only a .34% chance of saving fewer lives than by always going to Accident 1. In the long run, D would always prioritize the city bus (i.e. Accident 2), notwithstanding the fact that one is about 95% certain that the rescue attempt will not succeed and that all individuals will die.

To exemplify further, consider the Montreal underground transport system, with a capacity of 1000 individuals per train (see [43]). Given a sequence of length \(n = 550\), always choosing an accident involving such a train, even though one is 99% certain that all 1000 individuals will die during the rescue attempt (i.e., the probability of success is 1%, compared to the lower bound of \(1/x = .1\%\)), provides an average of .39% of sequences below the benchmark.
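Readers wishing to probe these figures can do so with the illustrative fraction_below_benchmark sketch from Sect. 6 (again, our own naming, not the authors’ code); being Monte Carlo estimates, the outputs will fluctuate around the values reported above.

# Car accident example: x = 30 individuals, pr = .05, n = 1125 occurrences
print(fraction_below_benchmark(30, 0.05, 1125))   # in the vicinity of the .34% reported above
# Montreal underground example: x = 1000, pr = .01, n = 550
print(fraction_below_benchmark(1000, 0.01, 550))  # in the vicinity of the .39% reported above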

All things considered, the assumption that ethical decision procedures should be risk neutral in the long run and maximize expected utility provides us with two plausible situations that appear contrary to moral intuitions: On the one hand, we obtain that one should always sacrifice a life and try to save 30 individuals, even if one is 95% certain that these individuals will die. On the other, one should always sacrifice a life and try to save 1000 individuals, even though one is 99% certain that the rescue attempt will fail and that these individuals will die. From the perspective of automated decision making, D would hold a counterintuitive moral principle such that the more individuals there are to be saved, the less one needs to believe that the attempt at saving their lives will succeed. To some extent, this brings us back to the idea that the needs of the many (x) outweigh those of the few (cf. Parfit’s [44] repugnant conclusion), regardless of whether we believe the many can be saved, as long as this belief is above 1/x.

While this is a consequence of the fact that \(pr = 1/x\) acts as a lower bound in the simulations, which in turn is a consequence of (mathematically) defining utility as a weighted sum of the number of lives saved, it is worth emphasizing that this phenomenon does not depend on a specific definition of utility and can be generalized. Indeed, there will be a lower bound for pr notwithstanding how utility is defined. To exemplify, assume that the utility of choosing Accident 1 is \(u_1\), whereas the utility of succeeding in saving the individuals in Accident 2 is given by \(u_2\) and the disutility of failing to save them is given by \(d_2\). Since the expected utility of choosing Accident 2 is \(pr\,u_2 + (1-pr)d_2\), the lower bound becomes \((u_1 - d_2)/(u_2 - d_2)\).Footnote 7 Assuming that \(u_1\) is constant, that saving all individuals in Accident 2 produces more utility than saving the single individual in Accident 1 (i.e., \(u_2 > u_1\)), and that \(u_2\) and \(d_2\) are influenced by the number of individuals involved (though this relationship might not be linear), the difference \(u_2 - d_2\) will tend to grow larger as the number of individuals involved in Accident 2 does. Hence, \((u_1 - d_2)/(u_2 - d_2)\) will tend to be smaller, meaning that the more individuals are involved, the less one will need to believe that the rescue attempt can succeed.

It should also be emphasized that this phenomenon does not depend upon Thoma’s distinction between lower- and higher-level agents. Indeed, when the choice is considered as a one-time choice, \((u_1 - d_2)/(u_2 - d_2)\) marks the minimal value of pr for which the expected utility of Accident 2 can be greater than or equal to the expected utility of Accident 1.Footnote 8 This mathematical fact, which explains why the algorithm will choose Accident 1 or Accident 2, needs to be taken as an important limitation of automated decision procedures based on rational choice theory and risk analysis from the perspective of global interpretability (cf. [36, 45]), for the output of the decision procedure and the mathematical structure of the model are biased by the computational behavior of expected utility. Put differently, the (alleged) ethical output of the algorithm in the long run is predetermined by a mathematical relationship which, from an ethical standpoint, can be seen as a theoretic bias.Footnote 9

8 Value-ladenness

At this point, one might argue that the predetermination of the algorithm’s output is merely a consequence of implementing an ethical theory, and that the moral prior problem is only a manifestation of the idea that algorithms can be value-laden. For instance, one might argue that any algorithm implementing an ethical theory is value-laden insofar as it is designed “to perform a task with a particular moral delegation in mind (Martin, [2], p.841)” and, as such, that the predetermination of the algorithm’s output by all the aforementioned elements (e.g. x, n, pr, the threshold of risk acceptability) is simply a consequence of the (deterministic) idea of implementing an ethical theory. Although this would not explain the fact that different beliefs and choices during conception and usage can lead to contradictory results in spite of the implementation of the same theory, one could still argue that contradictory results remain a consequence of the value-ladenness of the decision procedure. Kraemer et al. [46], for instance, argued that algorithms meant to implement ethical principles (i.e. explicit moral agents in Moor’s [7] sense) can, in themselves, be value-neutral insofar as they see value-ladenness as an implicit or explicit stand on an ethical issue. Indeed, when evaluating two algorithms designed to accomplish the same task, they see these algorithms as value-laden when “one cannot rationally choose between them without explicitly or implicitly taking ethical concerns into account (Kraemer et al., [46], p.251)”. As such, the fact that an algorithm produces an ethical output based on the implementation of an ethical theory (e.g. when implementing rational choice theory and consequentialism) does not make that algorithm necessarily value-laden, for it does not necessarily imply that ethical choices will be required during the design of such an algorithm. According to such a view, the fact that prior beliefs and technical choices predetermine the (ethical) output of the decision procedure (and, therefore, have moral repercussions) does not mean that these prior beliefs and technical choices necessarily need to be considered value-laden. Keeping this distinction in mind, Kraemer et al. [46] might argue that the aforementioned contradictory results remain a consequence of the value-ladenness of the algorithm insofar as, for instance, choosing the threshold of risk acceptability or evaluating the probability of success are implicit ethical stands, whereas the other elements (x, n) as well as the mathematical relationship between pr and 1/x are merely aspects of the ethical theory that is implemented.

The question lying before us is therefore as follows: Can a value-neutral decision procedure implementing an ethical principle nonetheless be subjected to the moral prior problem? That is, can value-neutral technical choices predetermine the ethical output of an algorithm? If the answer to that question is yes, as we will show in the next section, then it follows that the moral prior problem goes beyond the value-ladenness of AI and algorithms.

9 Modeling uncertainty: digging a little further into the moral prior problem

Let us revisit the decision procedure presented in Sect. 6. If the threshold for risk acceptability and the probability of success (pr) are taken as input parameters from a lower-level perspective (i.e. they are left to users), then the proposed algorithm can be considered as value-neutral from a higher-level perspective even though it consists in the implementation of an ethical theory (cf. Kraemer et al. [46]).

Computer simulations and mathematical models are idealizations that try to capture important aspects of empirical phenomena. Thoma’s initial example is interesting from the perspective of risk analysis insofar as it assumes uncertainty with regard to the possible success of the rescue attempt in Accident 2. This uncertainty is represented by the principle of indifference which, by fixing the probability of each event at .5, exhibits an epistemic neutrality (or agnosticism) whereby the agent does not have any reason to think that one outcome (success or failure) is more plausible than the other. However, one might wonder whether the principle of indifference is always warranted, and whether fixing the probability of success at .5 is the best way to model uncertainty. Indeed, it is plausible that the agent might be aware that there are factors that could influence her belief regarding the probability of success in saving the lives in Accident 2, but that she simply does not know enough about the situation to form an informed judgment. In other words, she is uncertain of what the risk, understood as the probability of success (or failure), is. The principle of indifference can thus be applied to her prior belief in the probability of success rather than to the outcome of the rescue attempt. Put differently, when the choice is understood as an individual choice, it makes sense to translate indifference by assigning an equal probability to the success and failure of the rescue attempt (i.e. to assume indifference over outcomes). However, when the choice is understood as one within a series of independent similar choices, and when the agent’s priors are assumed to vary from one choice to another, then indifference can be applied to the agent’s prior regarding her degrees of belief rather than to her prior regarding the states of the world.

To model the idea that the agent might be uncertain of her priors at each occurrence of the choice, one might want to introduce variability on pr at different levels within the simulations. In what follows, we will discuss various ways to model uncertainty within our simulations in order to further illustrate (i) that whether one should always choose Accident 2 ultimately depends on agents’ priors, including attitudes towards risks and risk assessment, and (ii) that what should be done depends not only on parameter choice, but also on prior technical beliefs regarding how the algorithm should be defined (coded), thus further exemplifying how the moral prior problem also applies to higher-level agents (e.g. developers) and to value-neutral algorithms.

9.1 Indifference with respect to the sequence’s prior

Uncertainty and variability on pr can be introduced in various ways. One way to introduce variability is to randomly select the probability that will be assigned to each choice of a sequence, and to vary this probability for each sequence within the simulation. This would represent an agent with a (randomly determined) fixed belief in the probability of succeeding in saving the lives in Accident 2. Thus defined, the probability varies from sequence to sequence, but not from choice to choice (sequences are iterations of choices; this will be studied in Sect. 9.2 below). Table 3 shows results for the probability intervals [0, 1], [.25, .75] and [.4, .6] for selected values of x. These probability intervals are centered around .5 (the average of the interval) and represent different epistemic attitudes towards risk. While in our simulations random choices are always taken from a uniform probability distribution, consistent with the principle of indifference, [0, 1] exhibits more uncertainty with a wider probability interval, whereas [.4, .6] and [.25, .75] represent slight violations of the principle of indifference with less uncertainty (represented by narrower probability intervals). For each simulation and each probability interval, the average probability pr over the 10,000 simulations remains .5. As such, the agent exhibits indifference and epistemic neutrality on average in the 10,000 simulations, but for each sequence the agent remains uncertain about pr.

Table 3 Percentage of 10,000 simulations below the benchmark value for a random pr assigned to each sequence, given a fixed probability interval
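In code, this variant amounts to drawing pr once per sequence instead of fixing it in advance. A sketch along the lines of the illustrative function given in Sect. 6 (again, our own naming and structure, offered only under those assumptions):

import random

def fraction_below_benchmark_sequence_prior(x, n, lo, hi, runs=10_000, seed=None):
    """Variant of the Sect. 6 sketch: pr is drawn uniformly from [lo, hi] once per
    sequence, modeling an agent whose prior is fixed within a sequence but
    uncertain from one sequence to the next."""
    rng = random.Random(seed)
    benchmark = n * (1 - x)
    below = 0
    for _ in range(runs):
        pr = rng.uniform(lo, hi)  # one prior per sequence
        total = sum((x - 1) if rng.random() < pr else (-x - 1) for _ in range(n))
        if total < benchmark:
            below += 1
    return below / runs

# e.g., with x = 3, n = 100 and the interval [0, 1], the estimate typically stays
# well above the .5% cut-off, unlike the fixed pr = .5 case (cf. Tables 2 and 3).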

Two things are worth mentioning. First, the agent’s belief in her priors has an impact on the output of the decision procedure. If the agent assumes a form of moderate uncertainty that is narrower and that excludes the extreme probabilities below .25 and above .75, then the choice needs to be made more frequently (i.e. the sequence needs to be longer) than with a fixed probability of .5 in order to favor Accident 2. And as the interval narrows around .5, as with [.4, .6], the distribution of results gets closer to that of the fixed probability of .5 (as in Thoma’s initial example; cf. Table 2). Recall that the average (indifferent) probability for all intervals over the 10,000 simulations is .5. That is to say that even if an agent manifests indifference and neutral uncertainty on average, the overall evaluation of what should be done in the long run depends upon her degrees of belief in her priors, represented by the width of the probability interval. For instance, maximizing expected utility in the long run does not dictate always choosing Accident 2 if the agent’s priors can range within the probability interval [0, 1], but it does for the interval [.25, .75] from \(n = 15\) and \(x = 20\). For the sake of the example (and although we think it is an epistemic issue rather than an ethical one), we will consider the width of the probability interval as an ethical issue that should be left to user specification in order to keep the algorithm value-neutral.

Second, as previously mentioned, this model considers indifference over prior beliefs rather than over outcomes. In contrast to Thoma’s initial example, where indifference is conceived with respect to the rescue attempt’s probability of success (or failure), the probability interval [0, 1] here represents indifference with respect to the prior belief in the probability of succeeding in saving the lives of the individuals in Accident 2 for each sequence. While both approaches are based on the principle of indifference, it is noteworthy, however, that the outputs of the two algorithms are quite different. Indeed, comparing Tables 2 and 3, one can see that in this case Accident 2 would not be chosen despite large values of x and/or n. As a result, how indifference is incorporated (coded) within the model influences the output of the decision procedure.

This example serves to illustrate how deeply rooted the moral prior problem is. Not only does it illustrate that what should be done depends on the moral priors of the agent and her attitude towards risks and uncertainty, but also that prior beliefs regarding how to code and how to integrate different aspects within the model (e.g. indifference) predetermine the output of the decision procedure. This last point is worth emphasizing insofar as it shows how a value-neutral technical choice predetermines the output of the decision procedure. Indeed, the same assumption from the perspective of expected utility theory (i.e. the principle of indifference) can lead to contrary ethical conclusions depending on how the principle is implemented within the algorithm (i.e. how the algorithm is defined). This point is important insofar as deciding whether to consider indifference over outcomes or over priors is not an ethical issue, at least not in Kraemer et al.’s [46] sense. In their view, the value-ladenness of an algorithm appears when the choice to be made “is not just a practical decision that must be made when designing an algorithm, but a genuinely ethical one (Kraemer et al., [46], p.259).” In this case, the fact that the choice between modeling indifference over outcomes or over priors has an impact on the output of the algorithm is simply not relevant to determine whether this choice is value-laden. We thus obtain a value-neutral algorithm whose output is predetermined by a value-neutral choice regarding how to implement indifference.

9.2 Uncertainty of priors over events with fixed probability intervals

The above example can be modified to further exemplify how introducing uncertainty in various ways (i.e. value-neutral technical choices) can have an impact on the output of the decision procedure. In addition to randomly assigning a probability to each sequence, one can also model uncertainty by introducing variability for each choice of each sequence. This can be accomplished by randomly selecting the probability of each choice within a fixed probability interval. This way of modeling the situation is meant to represent the idea that the agent is uncertain of pr at each occurrence of the choice within a given sequence. As it happens, this way of modeling the situation for the intervals [0, 1], [.25, .75] and [.4, .6] provided results equivalent to those of a fixed \(pr =.5\) (as in Table 2).

The same phenomenon applied to asymmetric probability intervals whose average probability was not .5. For instance, simulations with the probability intervals [.25, 1] and [0, .75] yielded results equivalent to those of the fixed probabilities \(pr = .625\) and \(pr = .375\), respectively. Overall, randomness at the level of choices, defined by a random probability taken from a fixed probability interval, provided results equivalent to those of the average of the interval. Accordingly, introducing randomness at the level of choices yields results that are dissimilar to those obtained when introducing randomness at the level of sequences and, therefore, the (value-neutral) definition of the algorithm (i.e. how one believes randomness should be introduced) has an effect on the output of the (ethical) decision procedure.
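The per-choice variant differs from the per-sequence one only in where the draw occurs, as sketched below (illustrative naming again, under the same assumptions as the earlier sketches). Because a fresh uniform draw makes each choice a success with overall probability equal to the interval’s mean, this variant behaves like the fixed-pr model with that mean, which is consistent with the equivalences reported above.

import random

def fraction_below_benchmark_choice_prior(x, n, lo, hi, runs=10_000, seed=None):
    """pr is re-drawn uniformly from [lo, hi] at every choice of every sequence."""
    rng = random.Random(seed)
    benchmark = n * (1 - x)
    below = 0
    for _ in range(runs):
        total = 0
        for _ in range(n):
            pr = rng.uniform(lo, hi)  # one prior per choice
            total += (x - 1) if rng.random() < pr else (-x - 1)
        if total < benchmark:
            below += 1
    return below / runs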

9.3 Uncertainty of priors with random probability intervals

We investigated further the repercussions of introducing uncertainty within our simulation using a random probability taken from a random probability interval for each choice of a given sequence. Different scenarios were tested. First, the probability interval was defined by randomly choosing a lower bound and an upper bound within [0, .5] and [.5, 1], respectively. In this setting, the average probability of success in Accident 2 always converged on .5, and results were equivalent to those obtained with \(pr =.5\) (cf. Table 2). Second, a lower bound was randomly chosen within the interval [0,1], and then an upper bound was randomly chosen within the interval [random lower bound, 1], providing us with an interval [random lower bound, random upper bound]. With this configuration, the average probability always converged on .625, and results were equivalent to those obtained with a fixed \(pr =.625\). Third, the upper bound was first defined randomly, and the lower bound was then randomly selected within the interval [0, random upper bound], providing us with an interval [random lower bound, random upper bound]. In this case, the average probability always converged on .375, and results were equivalent to those obtained with a fixed \(pr = .375\). Fourth, a random probability was chosen from the interval [random lower bound, 1], which always resulted in an optimistic bias equivalent to the simulation with \(pr =.75\). Finally, a random probability was chosen from the interval [0, random upper bound], which always resulted in a pessimistic bias equivalent to the simulation with \(pr =.25\).

From these examples, we obtain that introducing a random interval [random lower bound, random upper bound] will have different effects on the simulations depending on how the algorithm is constructed. For instance, defining the lower bound first biases the results towards an optimistic fixed probability (e.g., .625), whereas defining the upper bound first biases the results towards a pessimistic fixed probability (e.g., .375). This last point is worth emphasizing given that determining how to introduce randomness within an algorithm is not a choice that requires programmers to make ethical judgments. It is a value-neutral choice illustrating that technical considerations such as coding determine the choice that is prescribed by the decision procedure. As such, agents’ priors with respect to coding, which explain why specific technical decisions are made and how algorithms are defined, can be characterized as moral priors insofar as they have an impact on the algorithm’s output (i.e. they have moral repercussions). All things considered, ethical decision procedures can be biased by programmers’ prior (value-neutral technical) beliefs regarding how randomness should be introduced and how algorithms should be defined, thus showing how the moral prior problem not only affects both lower- and higher-level agents, but also value-neutral algorithms.
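The order effect described above can be seen directly by looking at the average probability each construction produces. The following small check is our own illustration of the two constructions, not the authors’ code:

import random

rng = random.Random(0)
N = 100_000
lower_first, upper_first = [], []
for _ in range(N):
    lo = rng.uniform(0, 1)   # lower bound drawn first
    hi = rng.uniform(lo, 1)  # then an upper bound above it
    lower_first.append(rng.uniform(lo, hi))

    hi2 = rng.uniform(0, 1)   # upper bound drawn first
    lo2 = rng.uniform(0, hi2)  # then a lower bound below it
    upper_first.append(rng.uniform(lo2, hi2))

print(sum(lower_first) / N)  # close to .625 (optimistic bias)
print(sum(upper_first) / N)  # close to .375 (pessimistic bias)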

10 On the possibility of truly autonomous ethical machines

Our analysis is meant to exemplify what we propose to call the moral prior problem, namely that prior beliefs and technical choices made during both conception and usage, and that are external to the decision procedure itself, predetermine the output of the algorithm. All models, including simulations based on statistical and probabilistic modeling, are sensitive to variations in their parameters (cf. Taleb [47]). This point is important from the perspective of explainable AI and the domain of application of automated decision procedures (cf. Phillips et al. [41]) insofar as (so-called) autonomous (automated) ethical (moral) agents cannot, by themselves, make ethical choices, since these choices are predetermined by moral priors. Although we would be inclined to side with Martin [2] and see any algorithm meant to achieve the automation of ethical reasoning and behavior as value-laden insofar as it aims at a form of moral delegation, this would not imply that the moral prior problem is reducible to the value-ladenness of algorithms. Indeed, the moral prior problem goes beyond the value-ladenness of AI and algorithms and lies in the fact that decision procedures are, in themselves, insufficient to solve ethical dilemmas insofar as the explanation of the decision made by the algorithm requires an appeal to choices that cannot be explained by the procedures themselves (see Peterson and Broersen [31]). The moral prior problem and the value-ladenness of AI and algorithms are therefore two related but distinct problems. The moral prior problem points not only towards challenges in defining explicit moral agents, but also towards the impossibility of defining fully autonomous moral agents in Moor’s [7] sense.

Our analysis is not meant to provide a proof that ethical machines and algorithms are impossible, but is rather meant to raise awareness of their limitations. For instance, while this paper focused on consequentialism and the maximization of expected utility, there are further ethical and decision theoretic frameworks that need to be analyzed in order to show whether their implementation also gives rise to the moral prior problem. In this respect, the present paper should not be considered as a generalization aiming to establish that all implementations of all normative theories suffer from the moral prior problem, but rather as establishing that the moral prior problem exists, and that it goes beyond the value-ladenness of algorithms and technologies. As it happens, the moral prior problem applies to both lower- and higher-level agents as well as to value-neutral and value-laden algorithms. From the perspective of higher-level agents, choices made on the grounds of prior beliefs regarding how algorithms should be defined and implemented predetermine the output of the decision procedure, as do lower-level agents’ beliefs regarding how the algorithms should be used with specific inputs and parameters. And although some might exhort developers to leave ethical choices to users (e.g. Kraemer et al. [46]), one needs to keep in mind that value-neutral technical choices can still bias the (alleged) ethical output of a decision procedure. When implementing rational choice and risk analysis in automated decision procedures, the conclusions algorithms prescribe will always depend on the priors that are assumed during both construction and usage, including priors with respect to risk acceptability and risk analysis, as well as priors regarding the technical implementation of risk-based decision making. These priors can be dubbed moral priors insofar as they predetermine the output of the (alleged) ethical decision procedure, even though they are not necessarily rooted in ethical theory. Overall, instead of trying to build automated autonomous ethical dilemma solvers, scholars should think of ethical AI as a means to an end rather than as an end in itself, that is, as a tool that can help individuals make ethical choices rather than as a promising technology that could make ethical choices in their place.

To conclude, scholars should keep in mind the existence of the moral prior problem when trying to implement ethical considerations within AI and algorithms, since otherwise implementing ethics can be synonymous with implementing biases within decision procedures. The moral prior problem is a fundamental latent issue inherent to the implementation of ethical decision procedures. As such, it cannot be solved through a general or universal solution. Rather, the ramifications of this problem need to be studied on a case-by-case basis, depending not only on the framework that is implemented, but also on how it is implemented.