1 Introduction

The use of decision-making algorithms promises societal benefits in a wide variety of applications. For many such applications, decisions have moral implications. Consider, for example, cases of lending and policing through algorithms. These tasks can be understood as a centralized agent’s allocation of scarce resources (i.e., loans and police officers) among several groups with the goal of maximizing objectives (i.e., repayment and security). Fairness considerations based on the principle of equal opportunity require that equally creditworthy individuals or equally criminal individuals have the same chances of receiving a loan or of being arrested (Elzayn et al., 2019). However, it is especially in these morally sensitive domains that the use of algorithms faces societal resistance. Understanding algorithm aversion is therefore of utmost importance because it allows us to determine whether the resulting resistance to this technology can be addressed and, if so, at what level (Khasawneh, 2018). Necessary responses could be located at the level of governance, where certain laws (e.g., liability law) would have to be adjusted, or at the educational level, where certain fears would have to be addressed through a demystification of algorithms and their actual functioning.

A prominent ethical objection to the use of algorithms concerns the opaqueness of machine learning (see, for instance, Mittelstadt et al., 2016; Lepri et al., 2018). Transparency is considered important to be able to contest the algorithm’s implicit values on both epistemic and normative grounds (Binns, 2018). Epistemic arguments comprise questions regarding the performance of the algorithm, such as whether a model is generalizable or over-fitted, while normative arguments comprise, for instance, the inclusion of discrimination detection or fairness constraints (Binns, 2018). On normative grounds, opaqueness may be less of a problem if the algorithm’s goal is to increase a clearly defined performance measure such as accuracy or speed in the absence of fairness concerns. If an algorithm-managed fund consistently outperforms the market, its opaqueness might be of less concern. But if an algorithm’s morally relevant case-by-case decisions cannot be accounted for by its programmers, how can we be sure that the machine follows an ethical rationale? How can we know that the algorithm does not base its decision to grant a loan or arrest someone on—say—racial characteristics? The problem becomes particularly apparent in the case of deep learning algorithms that follow allocation rules that they adjust endogenously based on incoming data.

Ethicists of information technology urge the explainability of algorithmic decisions for good reasons (e.g., Mittelstadt, 2016; Wachter et al., 2017). Assuring this explainability might imply restricting the algorithm to follow predetermined and exogenous rules that it cannot adjust or accidentally change. However, it is less evident whether opaqueness is the sole reason for laypeople’s algorithm aversion. It is well documented that people’s attitude toward transparency is ambivalent at best. We all have a tendency to ignore relevant available information because processing that information is individually costly (Bettman et al., 1990) and might fundamentally challenge our self-image (Grossman & Van Der Weele, 2017). People are especially likely to avoid potentially negative feedback regarding qualities that they care about, such as intelligence and beauty (Eil & Rao, 2011) or work performance (Moss et al., 2009). A decision to one’s disadvantage can thus be attributed to the decision-maker’s supposed (or actual) bias instead of one’s own mediocrity, which may be a comforting conviction. The inclination to maintain a favorable self-image might therefore feed into the resistance to algorithms, as they might also reveal unpleasant facts.

In this paper, we examine whether there is more to people’s aversion to algorithms than the fear of opaque and biased decisions. This seems justified in light of evidence that suggests that people might not doubt the quality of algorithmic decisions, but still reject them. A representative large-scale survey shows that 73% of Germans do not want algorithms to make decisions without a human check.Footnote 1 A European survey yielded similar results: 64% of respondents chose the statement “Algorithms might be objective, but I feel uneasy if computers make decisions about me. I prefer humans make those decisions” over the statement “I prefer that algorithms judge me instead of humans. They make more objective decisions that are the same for everyone.”Footnote 2 The surveys also show that respondents provide many reasons for their reticent attitude toward algorithms. Yet these reasons seem less to reflect a considered rationale than to serve an underlying, emotionally driven conviction. Thus, the negative attitude is a feeling of unease—following a hardwired heuristic—that is subsequently justified by numerous plausible arguments (Haidt, 2001; Sunstein, 2005). Put differently, the reticent attitude seems to be an initial emotional response that is only then followed by a conscious post hoc rationalization of the emotion.

It can be assumed that this skeptical attitude is as pronounced when decisions are explicitly ethical in nature. Experimental research shows that people perceive the same decision as less ethical and authentic when it is made by an algorithm instead of a human (Jago, 2019). The skepticism is also exemplified in the field of health care by the titles of recent books reviewed in Nature: Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (Topol, 2019) and How Algorithms Could Bring Empathy Back to Medicine (Insel, 2019). Topol (2019) claimed that as machines get smarter and take over more tasks, people must become even more human to compensate. From an analytical perspective, this raises the question of what exactly people miss in non-human decision-making.

The skepticism against algorithms may also be fueled by the fact that algorithms might perpetuate human prejudice. There is an extensive body of literature that elaborates on systematic and subconscious distortions in human decision-making such as in-group favoritism (e.g., Tajfel & Turner, 1986), self-serving bias (e.g., Babcock & Loewenstein, 1997), anchoring effects (e.g., Furnham & Boo, 2011), or overconfidence (e.g., Kahneman & Tversky, 1977). If, for example, algorithms allocate fewer loans to certain groups or more police officers to certain areas due to racial characteristics, it is because their training data are derived from prejudiced human decisions. Indeed, recent research shows that algorithms are bound to perpetuate human biases because they are based on data generated by humans (Chander, 2016). A case in point is the Microsoft bot that was programmed to learn from Twitter users and mimic their conversation styles—and thus became a racist bully.Footnote 3

To better understand the roots of algorithm aversion beyond opacity, we conducted two experimental studies. In Study 1, following our conjecture, we tested empirically whether people prefer humans, who are less restricted in their freedom to take morally charged decisions, over clearly rule-bound algorithms, even when veiled discrimination can be ruled out. Support for our conjecture would enable us to test whether people favor human decision-makers for principled or more instrumental reasons. Therefore, in Study 2, we tested whether people appreciate the mere presence of a human being in the decision-making process or the specifically human quality of moral autonomy (i.e., the ability to transcend rules in light of specific cases).

To investigate our research questions, we employed an economic experiment that used monetary incentives. This means that the choices that we elicited in our studies are more likely to express participants’ true preferences and less likely to be biased, for instance, by expressions of social desirability (Grimm, 2010). Using an incentivized experiment is particularly advantageous in the context of algorithm aversion, as, in a hypothetical setting, people might merely choose humans over algorithms because they believe that this is what they are expected to choose as social beings. In our experiment, however, expressing a clear preference for one or the other decision-making entity (human or algorithm) comes with actual monetary costs and potential benefits. Furthermore, choosing a decision-making entity that one does not actually prefer might lead to an undesired monetary outcome. In line with the standards of economic research laboratories, no deception was applied to any of the participants: The course of the experiment was made transparent to all participants, and they knew about the no-deception policy beforehand. This is especially important in experiments with ethically relevant research questions, because the anticipation of being deceived might distort participants’ decisions (e.g., Hertwig & Ortmann, 2001).

2 Experimental Setting

The experimental setting comprises a distribution conflict in which algorithms may apply human-created fairness principles to concrete cases. In the experiment, participants acted as “deciders” or “workers.”Footnote 4 The core unit of the experiment consisted of a real-effort task in which participants in the role of workers generated a joint team budget in teams of two; this was later distributed between the two workers.

In the real-effort task, each worker was individually confronted with a list of disarranged sliders on the computer screen (Gill & Prowse, 2012; see Fig. 1). Each slider was initially placed at a position between “0” and “100,” excluding the middle position “50.” Within a given time frame of 8 min, each worker had to drag and drop as many sliders as possible exactly to position 50 by clicking on them with the mouse. For each correctly positioned slider, a worker earned one Experimental Currency Unit (1 ECU = 10 Eurocent). After time was up, the joint team budget was determined as follows: One worker from the team was selected randomly, and this worker’s earnings were doubled. The sum of the selected worker’s doubled individual earnings and the (not doubled) individual earnings of the other (not selected) worker constituted the joint team budget of each worker team. For example, if the randomly selected worker correctly positioned 20 sliders and the other worker positioned five sliders correctly, the joint team budget was 20 ECU ∗ 2 + 5 ECU = 45 ECU. Because payoffs depended on the workers’ individual earnings (and the work efforts behind them) as well as on the random duplication of one worker’s earnings, we expected a distribution conflict to arise that would trigger various individual moral attitudes on how to distribute the joint team budget.
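The budget arithmetic can be summarized in a few lines of Python; this is a sketch for illustration only, not the original experimental software.

```python
# Minimal sketch of the team-budget arithmetic described above.
ECU_IN_EUR = 0.10  # 1 ECU = 10 Eurocent

def team_budget(selected_earnings: int, other_earnings: int) -> int:
    """Joint team budget in ECU: the randomly selected worker's earnings count twice."""
    return 2 * selected_earnings + other_earnings

# Worked example from the text: the selected worker solved 20 sliders, the partner 5.
budget = team_budget(20, 5)                          # 20 * 2 + 5 = 45 ECU
print(budget, "ECU =", budget * ECU_IN_EUR, "EUR")   # 45 ECU = 4.5 EUR
```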

Fig. 1 Real-effort slider task

The described real-effort task represented one stage of the experiment. In total, the experiment comprised four successive stages:

1. Determination of the distribution rule,
2. Regime choice,
3. Real-effort task (as described above), and
4. Implementation of the distribution rule.

The course of the experiment is described in detail below.

2.1 Stage 1: Distribution rule determination

As mentioned above, participants were randomly assigned either the role of a worker or the role of a decider. The experiment started with the deciders, who determined a distribution rule according to which the joint team budget of each worker team (generated later during Stage 3) was to be distributed. For this, each decider was presented with the same set of six distribution rules. The distribution rules accounted for the individual earnings of each worker in a team, as well as for the duplication of the individual earnings of one randomly selected worker. The rules included “merit-based,” “equal shares,” and “winner-takes-all” principles (Messick, 1993), or a combination thereof.Footnote 5 For example, one rule posited that each worker in the team shall receive exactly the amount the worker earned, and the additional amount resulting from the duplication of the amount earned by one worker shall be divided equally between the two workers. Deciders had to choose one out of the six presented distribution rules. That way, we ensured that the distribution rule was ultimately determined in a process which engaged humans.
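As an illustration, the example rule just described can be written as a short function. The function name and signature are ours, chosen for illustration; the experiment presented six such rules to deciders in plain language.

```python
# Sketch of the example rule quoted above: each worker keeps their own earnings, and
# the surplus created by the duplication is split equally between the two workers.
def own_earnings_plus_equal_split_of_surplus(selected_earnings: int,
                                             other_earnings: int) -> tuple[float, float]:
    surplus = selected_earnings                  # the extra amount created by doubling
    return (selected_earnings + surplus / 2,     # payoff of the randomly selected worker
            other_earnings + surplus / 2)        # payoff of the other worker

# With the earlier example (20 and 5 sliders, team budget 45 ECU):
print(own_earnings_plus_equal_split_of_surplus(20, 5))   # (30.0, 15.0), summing to 45 ECU
```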

Each rule chosen by the deciders was placed in a virtual ballot box from which one of those rules was subsequently drawn by the computer. If several deciders opted for the same rule, the chances of this rule being drawn increased accordingly.Footnote 6 Later on, this rule could determine the distribution of the joint team budget among the workers on each team. The deciders were immediately informed about which rule from the virtual ballot box had been chosen. The workers were informed about how the distribution rule was determined. However, they learned neither the set nor the content of the six distribution rules. They learned which rule had been selected only at the end of the experiment. Note that we intentionally did not disclose the content of the rule to the workers. Workers could only choose the way in which the rule was applied, which is described in the subsequent section.
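A minimal sketch of this lottery mechanism in Python (rule labels are illustrative; this is not the original experimental software):

```python
# Virtual ballot box: each decider's chosen rule is one ticket, and the computer draws
# one ticket uniformly at random, so a rule's probability of being selected is
# proportional to the number of deciders who chose it.
import random

def draw_rule(ballot_box: list[str], rng: random.Random) -> str:
    return rng.choice(ballot_box)

ballot_box = ["equal shares", "merit-based", "merit-based", "winner-takes-all"]
selected_rule = draw_rule(ballot_box, random.Random())
print(selected_rule)   # "merit-based" is drawn with probability 1/2, the others with 1/4 each
```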

2.2 Stage 2: Regime choice

To investigate whether people disliked the algorithm’s lack of moral discretion, we varied experimentally how rigidly the previously determined distribution rule was implemented (changeable vs. not changeable) and by whom (human vs. computer). Consequently, at Stage 2, the workers could choose among three regimes, which determined how rigidly and by whom the previously determined distribution rule was implemented after the workers generated the joint team budget (see Table 1 for an overview).

Table 1 Overview of the regimes that workers could choose

Regime               | Rule implemented by | Rule changeable? | Cost of changing the rule
Discrete human       | Human decider       | Yes              | None (decider earns 100 ECU in any case)
Rule-bound human     | Human decider       | Yes              | Decider forfeits the 100 ECU payment
Rule-bound algorithm | Computer            | No               | Not applicable

1) In the discrete human regime,Footnote 7 human deciders were assigned to a worker team and could—after learning the individual earnings of the workers on the assigned team and whose individual earnings were duplicated by chance—freely change the previously determined rule. The decider could either stick to the previously determined distribution rule or select one of the other five. Hence, the workers knew that in the discrete human regime, the deciders were not bound to the previously determined distribution rule. (Note, however, that the workers did not know the content of the determined rule.) The workers also knew that the deciders in the discrete human regime earned a fixed amount of 100 ECU whether they implemented the previously determined distribution rule or selected another.

2) Under the rule-bound human regime, deciders were also assigned to worker teams. As in the discrete human regime, they were not bound to the previously determined distribution rule and could change it after learning the individual earnings of the workers on their team and whose individual earnings were duplicated by chance. However, under the rule-bound human regime, the deciders’ payment depended on whether they implemented the previously determined distribution rule or decided to change it. If they implemented the previously determined distribution rule, they earned 100 ECU. If they picked another rule, they received no payment (i.e., 0 ECU). Workers were informed that changing the previously determined distribution rule was costly to the deciders under the rule-bound human regime.

3) Under the rule-bound algorithm regime, no deciders were assigned to the worker teams. The previously determined distribution rule was applied automatically to the worker teams by the computer without human interference.
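To make the deciders’ incentives across the three regimes concrete, the following minimal Python sketch encodes the payments as described above; it is an illustration, not part of the experimental software.

```python
# Decider payment under the three regimes (workers in the rule-bound algorithm regime
# have no decider assigned to them).
def decider_payoff_ecu(regime: str, changed_rule: bool) -> int:
    if regime == "discrete human":
        return 100                         # fixed payment, regardless of a rule change
    if regime == "rule-bound human":
        return 0 if changed_rule else 100  # changing the rule forfeits the entire payment
    raise ValueError("rule-bound algorithm: no decider is assigned to the team")

print(decider_payoff_ecu("discrete human", changed_rule=True))    # 100
print(decider_payoff_ecu("rule-bound human", changed_rule=True))  # 0
```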

Generally, workers were not forced to choose any of these regimes. They could refrain from choosing a specific regime and express their indifference. If they chose a specific regime, 1 ECU was deducted from their experimental income. Again, please note that when choosing a regime, workers did not know which rule had been determined during Stage 1. This is a crucial design feature: Knowing that the rule was, say, merit-based could open up strategic reasoning involving one’s own estimated performance as well as the chance that a human decider would switch to a more equity-based rule. We needed to rule out this kind of reasoning by design to measure our main dependent variable without additional interfering variables. Therefore, workers only knew the features of each regime: who would finally implement the distribution rule (a human or a computer), whether the implementer could change the previously determined distribution rule (yes or no), and what the monetary consequences were if the rule was changed (costly or not).

After workers chose their preferred regime during Stage 2, teams of two workers who chose the same regime were formed. Workers who did not choose a regime during Stage 2 were either assigned to a regime in which a spare worker still lacked a team partner or, once such vacancies were filled, randomly assigned to a regime in pairs of two.Footnote 8

Next, deciders were assigned to worker teams consisting of workers who had chosen either the discrete human or the rule-bound human regime. Workers who chose the rule-bound algorithm regime were not assigned a decider because, in this regime, the previously determined distribution rule was implemented automatically by the computer without the possibility of human interference. If workers chose the rule-bound algorithm regime, or expressed no preference and were therefore randomly assigned to it, the corresponding surplus deciders were not assigned to a worker team. They remained inactive for the rest of the experiment and received fixed earnings of 100 ECU.

2.3 Stage 3: Real-effort task and generation of joint team budget

At this stage, the workers generated the joint team budget in teams of two by positioning the sliders on their computer screen within the given period of 8 min. After the workers finished, the computer selected one worker randomly from each team, duplicated the worker’s earnings, and calculated the generated joint budget for each team. (See above for a detailed description of this stage.)

2.4 Stage 4: Change and implementation of selected distribution rule

After the joint team budgets were generated by the worker teams, the deciders in the discrete human and rule-bound human regimes were informed of the size of the team budget for their assigned worker team, the individual earnings of each worker on the team (corresponding to the number of correctly solved slider tasks), and whose earnings were randomly duplicated. The deciders in the discrete human and rule-bound human regimes could then implement a distribution rule. This rule could be the rule that was originally determined in the random draw from the virtual ballot box during Stage 1 (and displayed on their computer screens) or another rule from the set of distribution rules. If the deciders chose to change the previously determined rule, they incurred no cost under the discrete human regime and a cost of 100 ECU under the rule-bound human regime. Because there was no decider assigned to the worker teams under the rule-bound algorithm regime, the previously selected distribution rule was implemented automatically without human interference.

After the distribution rules were implemented, individual payoffs were calculated and paid out at the end of the experiment. Before the experiment ended, participants completed a set of post-experimental questions and were informed about their final earnings. Figure 2 provides an overview of our experimental setup.

Fig. 2 Overview of experimental setting

3 Study 1: Testing for Algorithm Aversion Beyond Opaqueness

In our first study, we tested whether workers expressed algorithm aversion by preferring a regime involving human deciders who have the discretion to discard a fairness principle at will in light of an individual case. This implied that workers rejected a regime in which the distribution rule was unchangeable and automatically applied to an individual case with no possibility of human interference. If people’s skepticism toward algorithms outweighs potential reservations about human imponderables when veiled discrimination is ruled out by design, people will prefer human deciders who are free to decide idiosyncratically over rule-bound algorithms. We explicitly addressed this question to investigate whether the often-identified algorithm aversion might rest on more fundamental grounds than the fear of opaque or potentially biased decisions.

3.1 Participants and Procedure

Study 1 was conducted in an economic laboratory of a large university with students majoring in various disciplines. Data were collected in January 2020, and participants were contacted through university mailing lists using the online recruitment system ORSEE (Greiner, 2015). We recruited 90 participants (50% female) who took part in three experimental sessions. Upon arrival at the laboratory, participants drew an individual code number and were seated individually in opaque cubicles. Participants then received written instructions for the experiment.Footnote 9 The instructions explicated the entire course of the experiment and were read aloud by the experimenters. Subsequently, participants answered a set of comprehension questions. Only after participants had successfully answered all items did the computerized experiment (programmed with z-Tree; Fischbacher, 2007) begin.Footnote 10 At the outset, one-third of the participants were randomly assigned the role of deciders, and two-thirds of the participants the role of workers.Footnote 11 The experiment lasted about 1 h. Subsequently, participants were compensated with a fixed show-up fee of 4€, along with the amount they earned during the experiment.

3.2 Measures

In Study 1, workers were instructed about the three regimes described in Sect. 2 and could then choose between the discrete human and rule-bound algorithm regimes. Workers could also express indifference and choose neither regime. Under the discrete human regime, deciders could change the previously determined distribution rule without cost (i.e., they had full discretion over the implementation of the distribution). Under the rule-bound algorithm regime, the previously determined distribution rule was unchangeable and was implemented automatically by the computer without human interference. In total, 60 participants acted as workers, and our main dependent variable was the workers’ choice of regime. We also assessed the number of correctly solved tasks depending on workers’ regime preferences. All decisions were incentivized monetarily.

3.3 Results

Regime choice

Descriptive results can be inferred from the left panel in Fig. 3. In total, 73.33% (CI = 0.603, 0.839) of the workers chose one of the two available regimes. Only 26.67% (CI = 0.161, 0.397) expressed indifference. The fraction of workers who had a specific preference was significantly larger than the fraction who had no specific preference (p < 0.001, binomial probability test,Footnote 12 two-sided). Most workers who expressed a preference preferred the human decider with decision discretion (55%, CI = 0.416, 0.67) over the rule-bound algorithm (18.33%, CI = 0.095, 0.304). The number of workers who preferred the human decider was significantly larger relative to the number of workers who preferred the algorithm (p = 0.001, BPT).
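For readers who want to retrace these tests, the following sketch reproduces the two-sided binomial probability tests from the counts implied by the reported percentages (60 workers in total: 44 with a preference, of whom 33 chose the human decider and 11 the algorithm). It requires scipy and is not the authors’ original analysis script.

```python
from scipy.stats import binomtest

# Preference vs. indifference: 44 of 60 workers chose one of the two regimes.
preference_vs_indifference = binomtest(44, n=60, p=0.5, alternative="two-sided")
# Human decider vs. algorithm among workers with a preference: 33 of 44.
human_vs_algorithm = binomtest(33, n=44, p=0.5, alternative="two-sided")

print(round(preference_vs_indifference.pvalue, 4))  # < 0.001
print(round(human_vs_algorithm.pvalue, 4))          # approx. 0.001
```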

Fig. 3 Preferences for regime variants

Performance

We found no evidence that the number of correctly solved tasks differed significantly depending on whether the participants preferred a regime (M = 21.41, CI = 16.746, 26.072) or not (M = 24.19, CI = 17.258, 31.117, d = 0.188; p = 0.522, Fisher–Pitman permutation test for two independent samplesFootnote 13). Performance also did not differ significantly between those participants who chose a human decider (who was not bound to the previously determined distribution rule) (M = 19.67, CI = 14.953, 24.381) or the algorithm (which implemented the rule automatically) (M = 26.64, CI = 13.102, 40.171) and those who had no preference for a specific regime variant (M = 24.19, CI = 17.258, 31.117; for all pairwise comparisons, d < 0.458 and p > 0.201, FPT).

When we contrasted the performance of participants who were eventually confronted with the human decider—regardless of whether they had initially preferred the human decider or been indifferent (M = 19.23, CI = 15.213, 23.238)—with the performance of those who were eventually confronted with the algorithm—again regardless of whether they had initially preferred the algorithm or been indifferent (M = 28, CI = 19.982, 36.018)—we found that the former group performed significantly worse than the latter (d = 0.617; p = 0.031, FPT).
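The Fisher–Pitman permutation test used throughout compares group means by randomly re-assigning group labels. A minimal sketch with hypothetical data follows; the study’s raw performance data are not reproduced here.

```python
import numpy as np

def fisher_pitman_two_sided(x, y, n_perm: int = 10_000, seed: int = 0) -> float:
    """Approximate two-sided permutation p-value for the difference in group means."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random re-assignment of group labels
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)             # add-one correction for the observed value

# Hypothetical slider counts for two groups of workers:
group_a = [19, 22, 25, 17, 21, 28, 15, 23]
group_b = [26, 30, 24, 29, 31, 22, 27, 33]
print(fisher_pitman_two_sided(group_a, group_b))
```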

3.4 Discussion

Study 1 identified an aversion to the algorithm that implemented a human-created exogenous rule. The workers clearly preferred a human decision-maker with discretion. Did this occur because people value the involvement of flesh-and-blood beings in ethical decision-making or because they value moral autonomy in view of individual cases? Put differently, is it the human nature of the decision-making entity itself or the human capacity to transcend rules and apply one’s own ethical standards that causes participants to favor humans over algorithms? We wanted to answer this question in Study 2 to achieve a deeper understanding of the observed algorithm aversion.

The first explanation implies that people have an intrinsic aversion to the use of algorithms in taking morally charged decisions. Ethicists usually define intrinsic value as a thing’s inherent value (Zimmerman, 2019). In this vein, previous studies indicate an aversion to algorithms that seems based on intuitive or emotional discomfort rather than on specific rational motives. This is corroborated by the fact that in surveys, respondents typically provide numerous justifications for their reticent attitude toward algorithms.Footnote 14 Bigman and Gray (2018) have documented in a series of studies that people consider moral decisions to belong to the domain of humans, not algorithms. They show that people are averse to machines making ethically relevant legal, medical, and military decisions and that this aversion is mediated by the perception that machines can neither fully think nor feel. Gogoll and Uhl (2018) ruled out the possibility that algorithm aversion was based on a misperception of machine errors or a distrust in the technology. They concluded that people might exhibit an aversion to algorithms making moral decisions per se. One potential explanation for preferring that humans be involved in decision-making merely because they are human is that this may allow people to blame someone if they are dissatisfied with the outcome of the decision-making process (Danaher, 2016) and thus to lodge a (formal) complaint against the individual decider.

The second explanation implies that people ascribe an instrumental value to humans—as opposed to machines—making moral decisions. Instrumental value is defined as the value that something has as a means to an end.Footnote 15 A fundamental instrumental difference between humans and algorithms is that the former usually have discretion in moral decision-making. Empathy sometimes lets us spontaneously transcend known rules in ethical decision-making in light of specific circumstances. The positive interpretation of human discretion is reflected in the saying “Let mercy take precedence over justice.” An algorithm is incapable of feeling empathy and showing mercy in the traditional human sense.Footnote 16 However, it is also incapable of denying someone their rights due to personal dislike. Rules are reliable, whereas discretion creates room for interpretation that opens the door to biases such as unintended discrimination. Although people tend to believe that their judgment is reliable, they often display unintentional unethical behavior (e.g., Kim et al., 2015; Sezer et al., 2015) based on their perceived expertise or their personal relationships. Hence, the human mind remains a black box (Chander, 2016) and is, therefore, also incomprehensible and potentially susceptible to distortions that might have significant social implications. If the aversion to algorithms were driven mainly by the fear of inexplicability, we might observe a similar skepticism toward human discretion and a preference for strictly rule-based behavior in the context of morally charged decisions. There is evidence that people prefer humans over algorithms to take decisions that affect them if the task at hand is perceived to be highly subjective (Castelo et al., 2019) or if they consider themselves to be very special people (Longoni et al., 2019). It is, however, not clear whether this phenomenon also extends to decisions that are morally charged.

4 Study 2: Testing Intrinsic and Instrumental Explanations of Algorithm Aversion

In our second study, we tested whether people prefer the mere presence of a human being in the decision-making process to a rule-bound algorithm. We conjectured that if people appreciated the mere presence of human deciders, they should prefer a regime with a human decider to a regime with algorithm-based decisions, even in situations where deciders must pay a high price if they deviate from the previously determined distribution rule (which makes this action extremely unlikely). In contrast, if people appreciate human deciders’ ability to transcend rules in light of individual work performances (and do not prefer human deciders per se), they should be indifferent toward the choice between a decider who, de facto, is bound to the previously determined distribution rule and an algorithm with the same property. We explicitly tested these two potential explanations to gain a deeper understanding of the causes of the algorithm aversion identified in Study 1.

4.1 Participants and Procedure

We recruited 90 participants (55% female) who took part in four experimental sessions. Data were collected in February and March 2020. Study 2’s procedures were similar to those of Study 1.

4.2 Measures

In Study 2, the workers could again choose between two of the three regimes described in Sect. 2, namely between the rule-bound human and rule-bound algorithm regimes. Workers could also express indifference and choose neither regime. Under the rule-bound human regime, deciders could change the previously determined distribution rule, incurring a cost (i.e., they had discretion over the implementation of the distribution rule but faced the loss of their payoff if they switched to another rule). Under the rule-bound algorithm regime, the previously determined distribution rule was unchangeable and was implemented automatically by the computer without human interference. In total, 58 participants acted as workers. Once again, our main dependent variable was the workers’ regime choice. We also assessed the impact of participants’ regime choice on the number of correctly solved tasks. All decisions were incentivized monetarily.

4.3 Results

Regime choice

Descriptive results can be inferred from the right panel of Fig. 3. Most workers (67.24%, CI = 0.537, 0.790) chose neither regime. Only about one-third of the workers (32.76%, CI = 0.21, 0.463) either preferred the human rule-bound decider (13.79%, CI = 0.061, 0.254) or the algorithm (18.97%, CI = 0.099, 0.314). The proportion of workers who expressed no preference was significantly larger than the proportion of workers who had a specific preference (p = 0.012, BPT); it was also significantly larger than the proportion of workers who had no specific preference in Study 1 (p < 0.001, Chi-square test). In addition, in contrast to Study 1, for workers who had a regime preference, there was no evidence that the preferences for the human rule-bound decider and the algorithm differed significantly from an equal distribution (p = 0.647, BPT).
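As before, the reported tests can be retraced from the counts implied by the percentages (Study 2: 39 of 58 workers indifferent, 8 preferring the rule-bound human decider and 11 the algorithm; Study 1: 16 of 60 workers indifferent). The following scipy-based sketch is illustrative and not the authors’ original analysis script.

```python
from scipy.stats import binomtest, chi2_contingency

# Indifference vs. preference within Study 2
print(binomtest(39, n=58, p=0.5, alternative="two-sided").pvalue)   # approx. 0.012

# Indifference in Study 2 vs. Study 1 (rows: study; columns: indifferent, preference)
chi2, p, dof, expected = chi2_contingency([[39, 19], [16, 44]])
print(p)                                                            # < 0.001

# Rule-bound human decider vs. algorithm among workers with a preference
print(binomtest(11, n=19, p=0.5, alternative="two-sided").pvalue)   # approx. 0.647
```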

Performance

We found no evidence that the number of correctly solved tasks differed significantly depending on whether the participants preferred a regime (M = 25, CI = 17.538, 32.462) or not (M = 24.15, CI = 20.352, 27.956, d = 0.065; p = 0.825, FPT). Performance also did not differ significantly between those participants who chose a human rule-bound decider (M = 23.88, CI = 8.181, 39.569) or the rule-bound algorithm (M = 25.82, CI = 16.731, 34.905) and those who did not prefer a specific regime (M = 24.15, CI = 20.352, 27.956; for all pairwise comparisons, d < 0.137 and p > 0.704, FPT). Similarly, when comparing the performance of participants who were eventually confronted with the human rule-bound decider independently of whether they initially had this preference (M = 26.032, CI = 21.001, 31.064) with the performance of those participants who were eventually confronted with the algorithm independently of whether they initially had a preference for the algorithm (M = 22.59, CI = 17.852, 27.334), we found no significant performance difference (d = 0.266; p = 0.321, FPT).Footnote 17

4.4 Discussion

Study 2 revealed that participants’ aversion to algorithms as identified in Study 1 was not based on a resistance to the artificial nature of the decision-maker per se. The results instead indicate that people ascribe an instrumental value to humans’ making moral decisions as opposed to machines’ doing so. Decision-makers in the discrete human regime in Study 1 had full discretion concerning which distribution rule to implement. Once this possibility was no longer available, the workers expressed no clear preference for the human decider. Workers’ regime choices were not driven by their desire to have a human being apply the fairness principle to an individual case (e.g., to attribute responsibility and project blame onto this person). If humans are perceived as bureaucratic executors of predetermined fairness principles, they may be easily replaced by algorithms. In this sense, the aversion to algorithms is not intrinsic.

Although deciders are not the focus of our analysis, it is informative to also report how they actually behaved in our experiment. This holds particularly for the discrete human regime of Study 1, where deciders were not bound to the previously determined distribution rule and could change it without cost after learning workers’ earnings and whose earnings were duplicated by chance. As argued above, it is plausible to assume that decider behavior is associated with workers’ expectations and that deciders reacted to the information on workers’ individual efforts and duplication luck by spontaneously rewarding or punishing single workers through a rule change. Two points stand out in this investigation. First, only five out of 35 deciders (four out of 20 in the discrete human regime of Study 1 and one out of 15 in the rule-bound human regime of Study 2) made use of their right to change the previously determined distribution rule. Second, all of these deciders reinstated the rule they had initially chosen during Stage 1 but that had not been drawn from the ballot box. This was the case even though the content of the initially chosen rules differed among these deciders. Because some deciders’ initially proposed rules were actually drawn at random, leaving them nothing to switch back to, the number of deciders who would have wanted to return to their originally proposed rule might be even higher; our figures therefore represent only a lower bound for this behavioral pattern.

5 General Discussion

In conjunction with the findings from Study 1, the results from Study 2 indicate that algorithm aversion is, at least partly, driven by instrumental deliberations: People do not dislike algorithms intrinsically, but cherish the discretionary scope of human deciders. The distinction between the two kinds of algorithm aversion may seem rather academic at first, but it has important implications. If the rejection of algorithms in the context of morally charged decisions were based on an intrinsic aversion to the very artificiality of the decision-making entity, it might prove difficult to replace human decision-makers with algorithms, as this artificiality is an essential feature of the algorithm. If, however, the rejection of algorithms is based on an instrumental aversion rather than on the artificial appearance of the algorithm, features of the decision-making process need to be changed to reconcile the algorithm’s functioning with people’s moral attitudes.

Krishnan (2020) expresses concern that, to the extent that researchers working on interpretability emphasize its indispensability, they may fuel the public mistrust of algorithms. The results of our studies suggest that the reasons for the widely observed aversion to algorithms in the context of morally charged decisions may indeed be more multifaceted than implied by some parts of the ethical literature. Although many ethicists discuss the important problem of opaque algorithms, the traceability of moral decisions through transparency might not be the public’s only concern. On the contrary, participants in our studies seemed to appreciate an element of discretion in moral decision-making, which is peculiar to human beings. This discretion implies the ability to override, in light of a specific case, a fairness principle that was ex ante deemed appropriate across various cases. People’s taste for “instantaneous fairness” might imply that opaque algorithms with humans involved in the decision-making process are more readily accepted than transparent and rule-bound ones that do not involve humans, simply because the latter have no capacity to discard fairness principles based on spontaneous inspirations. Testing this conjecture might be an interesting path for future research.

It is noteworthy that the regimes under which workers in our experiments performed and their actual performances were interrelated. When people worked under the regime of the human decider with a discretionary scope, their performance was lower. This might have been caused by a lower ability to perform a given task, by a lower motivation to do so, or both. In any case, low performers seemed to expect the discrete human decider to let mercy take precedence over justice. Their behavior indicates that they expected deciders to deviate from the previously determined fairness principle in favor of equality of outcome through a spontaneous act of pity for the low performer in light of their poor performance. This result points to the fact that the degree of rule conformity implied by the decision-making regime should not be discussed irrespective of its consequences. The chosen regime interacts with the actions people take under the regime. People will likely adapt to the incentives they face. If borrowers rely on the spontaneous leniency of human lenders, this could have implications for their fiscal discipline. If the convicted hope for the spontaneous leniency of human judges, this could influence their parole behavior. This indirect effect of discrete or rule-bound regimes of moral decision-making, which our findings indicate, warrants the attention of ethicists and future studies.

With regard to the regime’s direct influence on performance, our study is a starting point for further investigations into the implications of algorithm-supported decision-making in the context of morally charged decisions. Here, new experimental studies are needed to shed light on the question of whether the decision regime (human or algorithm-based) influences people’s propensity to engage in undesirable behavior such as laziness, free-riding, or even cheating. This research will help to address the issues that arise when algorithms are implemented in decision-making processes and ultimately ensure that the positive effects of algorithms, such as efficiency and incorruptibility, are realized while negative counter-effects are avoided. Numerous voices are calling for caution in the application of algorithms to domains with ethical implications such as lending, policing, or medicine, so research must accompany technological development with social and ethical analyses. Our studies deliver on this demand.

Our studies are subject to several limitations, three of which we want to mention here. First, our results are based on a sample of students from a technical university. Assuming that individuals from this sample are technophiles who are more open-minded with respect to the use of algorithms in decision-making, some caution is required in generalizing our effects. It is conceivable that a more technophobic sample would express a preference for human decision-makers even if these are perceived as rule-bound bureaucrats without moral discretion. Second, we opted for the lottery voting mechanism to determine the distribution rule. Yet, it is an open question whether other mechanisms, e.g., a majority vote, would have been perceived as more or less fair and whether this would have changed the preference for the algorithm. Empirical research should address this question to gain further insights into the circumstances under which algorithms are accepted or not. Third, our findings are derived in the context of income distribution in light of work performance, where the information that is available to the human decision-maker and to the algorithm is limited. Other ethically relevant situations or multidimensional performance measures that include indications of efficiency or creativity might produce different results. For the purpose of identifying a reason for algorithm aversion beyond opaqueness, it was crucial to apply a straightforward and rule-bound algorithm that was not based on machine learning. It should be emphasized, however, that the fact that participants preferred the human decider with moral discretion over the rule-bound algorithm does not imply that people do not appreciate transparency in algorithms. It might well be that while transparency was not considered decisive for human decision-making in the context of morally charged decisions in our experiment, it could still be perceived as an essential feature of algorithmic decision-making. In this case, we might very well expect algorithm aversion to grow even stronger once we equip the algorithm with machine learning skills. Future research can, therefore, also confront participants with more sophisticated—and thus potentially less understandable and more error-prone—algorithms and assess whether this conjecture holds true.