1 Introduction

Our everyday thinking, in dealing with the world around us, mostly relies on evolved cognitive classifications and categorizations (see Atran [5], Boyer [23]). Due to these evolved capacities, we are able to predict changes in our environment and update these predictions rapidly [118]. In this paper, we first describe the cognitive process of categorizing and then show why the concept of artificial intelligence (AI) does not easily fit into our intuitive everyday categories. This lack of fit means that AI can be viewed as a moderately counterintuitive concept [116]. The fallacies and biases encountered in discussions of AI ethics among the general public are, we argue here, partially explained by the fact that AI is a counterintuitive concept [104, 105].

AI commonly refers to technology that has the capacity for making (mechanical) decisions either autonomously or through enhancing decisions made by humans. Thus, the concept of AI covers a wide range of technical programming and statistical algorithmic solutions to typically local and narrowly defined problems, rather than referring just to a deep learning neural network [127]. It is hard to define AI precisely because it is hard to define intelligence precisely [127, 179]. For the purpose of this paper, we define intelligence as goal-oriented and purposeful action taken in an at least partially predictable environment [111]. We will refer to AI in its narrow sense as an algorithm that functions purposefully in an at least partially predictable environment [179].Footnote 1

2 Anthropomorphizing across categories

Folk theories reflect an innate human tendency to predict and explain various natural phenomena without relying on scientific education (see Guglielmo et al. [65]), and they rely heavily on our capacity to intuitively categorize our perceptions. Scientific theorizing and experimental work depend on our capacity for reflective thinking, and do not fully replace our intuitive thinking [156]. This is most apparent in circumstances where we desire knowledge but have little science to rely on [156]. More specifically: folk psychology aims to predict the behaviour, emotions, and cognitions of other humans (or other animals); folk biology, in turn, aims to predict the developmental history and the reproductive mechanisms of various organisms; while folk physics aims to predict and explain the forms and motions of non-living objects and substances. Our capacity to form intuitive explanations has emerged through selection pressures in our evolutionary history. Our ancestors developed everyday, non-scientific explanations of various natural phenomena, such as earthquakes, the seasons, and the motions of celestial bodies [24]. For folk theories to function, the human mind must be capable of classifying phenomena into various categories. Each category invites the application of different, mostly automatic cognitive inference rules. Such categories include tools [147], predators [24], pets [4], and plants [6]. In shorter time spans, humans can create new categories through cultural processes; however, these cultural categories do not necessarily function equivalently to our more basic categories [104, 105].

One of the key categorical distinctions is between agents (such as humans and some animals) and non-agents (plants, artifacts, tools, etc.; Barrett [12]). The capacity to make this distinction has been crucial for the survival of humankind. Being able to discriminate between friendly and unfriendly agents from only sparse input cues in the environment could provide a survival advantage for any organism [109]. Agents are typically conscious biological beings whose behavior can be explained by folk psychology, while non-agents require folk physics. Recent research in developmental psychology has shown that children can distinguish agents from human-made artifacts at a very early age [87]. Similarly, at an early age children comprehend the functions of different kinds of tools and other artifacts [91].

We argue that AI poses a significant problem to our categorization capacities. We have no evolutionarily developed innate capacity for recognizing complex information-processing systems, such as AIs and robots, which simultaneously appear to be both artifacts and artificial agents. Artificial agents are entities which resemble natural agents but lack selfhood and, as a background assumption in this article, likely also consciousness. On an evolutionary scale, there simply has been no time for human cognition to develop a natural capacity to comprehend autonomous technology and predict its behavior. Indeed, some scholars in robotics and developmental psychology now use the neologism new ontological category (NOC), in reference to AIs and robots, to explain the origins of at least some of these problems [88].Footnote 2 There is no precedent in the history of our planet for an object made from lifeless material that starts moving, behaving, and acting as if it were a living creature.

Of course, there have long been machines designed to imitate human or animal behavior, but these automata have always been extremely primitive compared to contemporary and developing technological systems [17]. Currently, however, many AIs make independent ethical decisions which directly or indirectly affect human wellbeing [177], even if the decisions are mechanical and made without consciousness (see [42]). Our cognitive capacity for folk theories developed in the Pleistocene environment (circa 2 million–200,000 years ago, Tooby and Cosmides [171]). Because, at that time, there were no robots, computers, formal algorithms or cybernetic systems to interact with, we lack the natural capacity to categorize them and predict their (non-conscious, logical and probabilistic) behavior based on evolved folk theories alone. Consequently, we need to resolve the lack of innate cognitive categories by developing sufficiently precise cultural categories [105].

Robots and AIs challenge our stone age minds in deeper ways, too, and our folk theories struggle to make sense of them. Our evolved automatic cognitive systems can be trained to process novel domains (for example, our face-recognizing system can also specialize in recognizing birds if we become ornithologists), but there is always a loss of fluency compared to use within the original domain [24]. We categorize robots and AIs, depending on their surface appearance, inconsistently as animals, tools, toys, or children, while they are none of these [25, 35]. For instance, witnessing a robot dog being kicked may make us frown and feel some form of compassion, as we would towards a real dog [123, 124]. Similarly, robots designed for social interaction, such as the Paro seal used in elderly care, activate our positive social emotions [122]. These reactions are partially explained by the phenomenon of anthropomorphizing: the tendency to think and talk of non-human and non-living objects as if they had feelings, desires, and personalities. As developmental psychology shows, the same phenomenon is apparent in children when they project humanlike mental capacities onto soft toys and other objects [60].

Here, we approach anthropomorphizing as a cognitive and biological phenomenon universally shared by all humans without inspecting its cultural variability. Rather than judging the potential value of anthropomorphizing, we describe some possible folk ethical challenges stemming from it.

Anthropomorphizing and the related oversensitivity of agency-detector systems have notably been studied within the cognitive science of religion. These systems are hypothesized to be at least partially responsible for our tendency to sometimes perceive agency where there is none, and also seem to partially explain the popularity of beliefs in supernatural agents (such as gods, ancestors, and spirits) [13]. Religions, like other cultural phenomena from folklore to militaries, are seen as by-products of different cognitive systems (theory of mind, contamination avoidance, kinship recognition, linguistic competence, etc.). Whether anthropomorphism [109], action representation [66], or episodic and autobiographical memory [108], these capacities now operate beyond their original domains and functions. Robots and AI systems commonly activate these same mind perception mechanisms [168], but our anthropomorphizing tendencies lead us astray [178]. Problems arise when robots and other AIs are specifically projected to have humanlike or partially humanlike minds, and are regarded as sentient, intelligent, and feeling beings. Indeed, the actions of robots that look like humans are judged differently from those that do not [101]. However, such superficial assessments are unwarranted, since robots and AIs function according to different principles from those of the human mind. Their cognitive functioning is based on algorithms and probabilistic computation, both of which are particularly challenging for our folk cognition to grasp [38, 68, 131, 151].

We humans are generally not very good at logical and axiomatic reasoning. We have trouble comprehending conditional statements and syllogisms [47, 94]. Without extensive training, perceiving and estimating mathematical probabilities is counterintuitive and hard [151], a fact well recognized in evolutionary psychology. A classic example is the gambler’s fallacy: we tend to intuitively expect the probability of one event to be related to the outcome of another, even when the events are completely independent (e.g., after a coin is flipped twice and lands on heads both times, the next flip is expected to be more likely to land on tails; Rabin and Vayanos [148]). Moreover, computer programming is generally very difficult and time consuming for people to learn [136]. Programming and understanding software rely on algorithms, which are complex concatenations of conditional statements. Current machine learning algorithms are, in turn, based on the concept of probability. Thus, understanding both conditional statements and probability is crucial for understanding how AI functions, overall and specifically in ethical contexts. We propose that because people are not capable of comprehending algorithms and probabilities without extensive education, they cannot be expected to accurately comprehend AI technologies or the ethical problems related to them.
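The independence at stake in the gambler’s fallacy can be made concrete with a small simulation (a sketch of our own; the sample size and seed are arbitrary, not taken from the cited studies):

```python
import random

# A minimal simulation of the coin-flip example above: after two heads in a
# row, a fair coin is NOT more likely to land on tails; the conditional
# frequency stays near 0.5.
random.seed(0)
flips = [random.choice("HT") for _ in range(1_000_000)]

next_after_two_heads = [
    flips[i + 2]
    for i in range(len(flips) - 2)
    if flips[i] == "H" and flips[i + 1] == "H"
]
tails_share = next_after_two_heads.count("T") / len(next_after_two_heads)
print(f"P(tails | two heads just occurred) ≈ {tails_share:.3f}")  # prints ≈ 0.500
```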

Given enough time, we might culturally develop a category that adequately guides us in relation to the new ontological category (which contains robots and AIs) and that resembles it. By doing so we could avoid many of the problems raised here. This category would build on features from the existing cognitive categories, such as animals, artifacts, and humans [125]. This cognitive development could be achieved through widespread and systematic education, accommodated by a cultural shift. One starting point would be to increase such education in elementary and high school curricula. If this project were successful, people would learn to adequately understand the functional principles and behavior of AIs, and to regulate their own intuitive reactions towards them.

3 Is it OK to abuse a cute robot?

The problems of anthropomorphizing and miscategorization become readily apparent when we examine social robotics. Social robots have some level of non-conscious comprehension of the dynamics and norms of human social interaction, and they are capable of behaving and communicating with humans in various situations. Social robots, such as the previously mentioned Paro seal, are usually purposefully designed to appeal to human emotions and evoke empathy. For instance, cute and large-eyed care robots remind us of children and baby animals [97]: they appear innocent, helpless, and difficult to ignore. In such cases, robot designers purposefully amplify our anthropomorphizing tendencies and appeal to our universal propensities for compassion towards the innocent and vulnerable.

It has been a matter of debate in the ethics of AI to what extent it is appropriate to evoke emotions, such as compassion, or to promote anthropomorphizing attitudes [119, 158]. Is it deceitful to nudge a person to feel compassion towards a being which is incapable of reciprocating, let alone experientially understanding, compassion?

Sometimes, relationships resembling authentic social bonds are formed with care robots. People tell their care robots stories, reminisce on the past, stroke and pet them, and cry in their presence [157]. Deceit is a significant moral risk especially when those deceived are people whose cognitive capacities have been diminished, for example, seniors with dementia [157, 176], because the machines are specifically designed in a way that elicits pro-social reactions. The use of care robots in elderly care risks leading to the infantilization of the elderly: treating them like little children playing with their toys rather than as adults with autonomy in many matters [157]. This is how seemingly well-meant manipulation of human social cognition is revealed to be morally problematic and even dangerous. There is evidence that experiencing positive emotions and sharing negative emotions can promote wellbeing [122], but we do not have a clear picture of the long-term consequences of experiencing emotions based on “deceit”. It remains worth considering how we could bring about positive emotions in elderly care without relying on illusion and deceit.

Another type of social interaction in which we encounter miscategorization is that of callous treatment. How should we relate to robots being damaged or “abused” [178]? Due to our anthropomorphizing tendencies, some actions towards a robot are interpreted as abuse and thus immoral, even when the action causes no harm, that is, no experienced suffering or thwarted desires (ibid.). Boston Dynamics, a US robotics company (www.bostondynamics.com), produces robots that move in ways resembling the movement of humans and other animals. When such robots are pushed or kicked, many people readily interpret these actions through the lens of folk psychology as “bullying”, which in turn causes empathy towards the robots and negative emotions towards the “bully” [123, 166, 183].

Researchers studying robot “abuse” have suggested that rather than judging the action itself, people judge the “abuser” for being callous and lacking compassion [29, 30]. Conversely, banging a computer keyboard or tossing a mobile phone in anger is usually not regarded as a callous act [30, 149]. Such cases reveal how a robot is intuitively categorized as some kind of quasi-human agent. Comparably, the killing of virtual characters in a computer game usually does not evoke negative emotions similar to those evoked by the “abuse” of robots. Ultimately, in virtue of how they process information, mobile phones, virtual characters in computer games, and human-like or cute robots are all equal.

Even if it were in some way irrational to worry about the wellbeing of a robot toy, we think it is worthwhile to consider the impact on the people who interact with the robots [36]. Indeed, some philosophers (following Kant) have proposed an analogous case concerning how the treatment of animals affects us. In their view, even if animals lack self-awareness and are incapable of suffering, treating them in a cruel way would be harmful to us humans [4, 36, 86]. An explicitly indifferent attitude towards the suffering of “senseless animals” could also contribute to our disposition to treat other humans callously. This claim could be defended by invoking the need for moral education. According to this view, the capacities needed for moral action, such as moral imagination, empathy, and compassion, may be to some extent innate. However, the argument goes, they nevertheless require practice and training to function well [114]. Analogously, the way we treat robots could affect us. This view becomes plausible when we consider how our intuitive categorization and our explicit views on whether a being is a moral patient (i.e., worthy of moral consideration) do not necessarily coincide. Empirically, this view finds some support in the finding that the routine killing of animals predicts lower levels of well-being for slaughterhouse workers compared to workers in other “dirty” jobs, such as cleaning and elderly care [10]. One hypothesis for this effect is our tendency to feel empathy towards other species in pain (ibid.), even when consciously adhering to the belief that the animals are non-conscious. Another mechanism might be the toll the work takes on the workers’ empathetic capacity. Analogously, because we anthropomorphize robots, our intuitive moral capacities might be engaged, and thus implicitly affected, in our interactions with them, even while we reflectively acknowledge that the robots are not moral patients. This mechanism would function regardless of the robot’s actual moral status [114].

The same phenomenon can become a moral psychological problem in the era of AIs and robots [54]. When our everyday reality is populated by various intelligent systems which lack the status of moral patiency, people might become accustomed to cruelty and indifference. Because we sometimes think of robots as if they were alive and conscious, we may implicitly adopt patterns of behavior that could negatively affect our relationships with other people. In a society which welcomes social robots, such as care robots and sex dolls (see [95]), a person’s moral development might be influenced by how they themselves and others around them treat those robots [166]. For these moral psychological reasons, we find merit in educating people to approach robots as moral patients, even while acknowledging that the robots are unlikely to be actual moral patients. We would even go so far as to suggest that robots should be treated with respect [142].

4 Anthropomorphism and the ideal of rationality of artificial intelligences

Artificial intelligence may in some circumstances muddle our sense of responsibility. Ethical blindness refers to a lowered sense, or complete unawareness, of immediately present ethical problems, and it can also occur in social and digi-social contexts [141]. More specifically, an AI may be anthropomorphized as an authority and perceived as responsible for decisions made by people [1, 37]. An example of ethical blindness is the case of United Airlines in 2017, when a passenger was forcefully removed from an overbooked flight.Footnote 3 There were no volunteers to give up their seat, so the decision on who was going to be removed was made by a software algorithm. During the event, no staff member questioned the decision or was willing to deviate from it, and the situation ended with the passenger being violently removed from the aircraft; the passenger suffered a concussion and lost a tooth. After the incident, the market value of United Airlines fell by hundreds of millions of dollars, and the airline eventually ended up paying the passenger considerable compensation.Footnote 4 In hindsight, and as a general rule, the recommendation made by an algorithm should be open to being questioned, challenged, and carefully reflected upon.

In the situation described above, the “emotionally cold” system was perceived as a reasonable authority responsible for making the decision. Moreover, the system might have been anthropomorphized in a very specific way: either explicitly or intuitively, the system was attributed with unbiased rationality. The staff found it implicitly acceptable to regard the system’s recommendation as reasonable and binding. However, in reality, the system’s decision was based merely on the rules according to which it had been programmed or trained to act. These rules did not take into account crucial considerations in the context of social interaction, such as the possibility of emotional escalation and the need for diplomacy and creative ex tempore solutions. It remains a topic for further research to establish to what extent AI systems, rather than inflexible corporate hierarchies or other causes unrelated to AI, affect the decisions made in such cases. This is a general problem associated with rigid organizations [50], but we suspect that the use of AI technology will likely amplify it.

The key issue in relying on the assumed ideal rationality and infallibility of AI systems is that the functional principles of these systems are, for the most part, opaque [32]. Even now, some AI systems, and especially machine learning algorithms, are so complex that our cognitive apparatuses are incapable of comprehending their functionality and logic [11, 127, 179]. Within ethics, this is known as the black box problem [81]. A machine learning algorithm can be trained to perform some task well, but it is practically impossible for a human to follow the actual decision process behind its performance. There is ongoing ethical discussion on this problem, especially concerning the transparency of algorithmic decisions affecting human lives [107], but we are not going to address it here.

Of interest in this context is the psychological side of the black box problem. We have the tendency to view the decisions made by a black box as more reliable than those made by humans, since algorithms are often perceived as coolly rational or even perfectly rational [110]. At this stage of AI development, it is extremely unlikely that AI systems could process all the sources of tacit information on which human behavior constantly relies [179]. For example, if a member of the flight staff had tried to persuade another passenger to give up their booking for several thousand dollars (the original offer was $800), the airline might have avoided the subsequent PR catastrophe and the loss of millions of dollars. The system was incapable of considering this possibility, and the staff were unwilling, or did not dare, to question the decision of the airline's representative, the AI.

5 The “they only do what they have been programmed to do” fallacy

There are further problems caused by the lack of fit between our basic cognitive categories and the principles underlying machine learning and other types of AI. We stated above that AI does not neatly fall into categories such as “animal”, “artifact” or “agent”. Yet these categories guide our intuitive thinking, so we will look at them more closely. We want to draw attention to what happens if AI is viewed solely through the categories of “artifact” and “tool”. It is common to think that AI, or more broadly, any computer executing programs, simply does what it has been programmed to do. This assumption is misleading and affects the discussion about AI ethics considerably [110]. Explicitly programming an AI to perform a task is different from programming it to autonomously learn to perform that task. In both cases, the AI indeed only does what it has been programmed to do, but especially in the latter case it is hard for humans to intuitively follow and predict the complex, data-driven and probabilistic decision making; hence the aforementioned black box problem and the surprise about the way a learning system will perform a given task [106]. “They only do what they have been programmed to do” is true, but it falsely implies that we can always tell what they will do.

For example, many reinforcement learning algorithms can learn by creating their own subgoals [167, 184]. In other words, they are not simply executing subgoals determined by their programmers, but rather discover them independently [139, 167]. In 2013, researchers at DeepMind Technologies developed an AI, based on a reinforcement learning algorithm, which learnt to play several Atari video games and even outcompeted the best human players in some of them [128, 167]. What is remarkable is that the only input the algorithm received was what a human player would see on the screen: a set of changing pixels. Based on this data, the algorithm worked to maximize its score over many playthroughs and, through trial and error, learned to play each game. The AI learned to do exactly what contributed to a maximum score in each individual game. It made no difference whether the game was about flying an airplane or killing virtual characters.
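The division of labor between what is programmed and what is learned can be illustrated with a minimal sketch (a toy example of our own, not DeepMind’s system): in the tabular Q-learning below, the programmer writes only a reward signal and a generic update rule, while the policy itself emerges from trial and error.

```python
import random

# Toy "game" with five positions (0..4); reaching position 4 ends an episode.
# Nothing in the code says "move right": only the reward and the update rule
# are programmed, and the right-moving policy is learned.
N_STATES, ACTIONS = 5, (-1, +1)            # actions: step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3      # learning rate, discount, exploration rate

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0   # reward only at the goal
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(s_next, act)] for act in ACTIONS) - Q[(s, a)])
        s = s_next

print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})
# typically prints {0: 1, 1: 1, 2: 1, 3: 1}: "always move right" was never written by anyone
```

In essence, the Atari result scales the same idea up, replacing the five-position table with a deep neural network reading raw pixels.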

AIs based on reinforcement learning are capable of finding unexpected solutions to well-defined problems. They are, ultimately, (simple) cognitive artificial agents learning complex behavioral patterns in different environments. For example, there is no need to explicitly program an AI to kill virtual characters if killing counts towards maximizing the score. The system needs only to be programmed to maximize its score, after which it can quickly become extremely skilled in any specific task. An AI created by the OpenAI project learned to play the popular computer game Dota 2. It managed to develop complex predatory ambushes and feint strategies to maximize its score.Footnote 5 Such AIs are also capable of unpredictable behavior, as they can learn patterns or strategies unknown to humans. An example of this is the AlphaGo algorithm, which used completely novel strategies in the game Go [167]. Another example is AlphaStar in the popular real-time strategy game StarCraft 2. One professional player called AlphaStar’s playing style “unimaginably unusual…[and that it] makes you question how much of StarCraft’s diverse possibilities pro players have explored” [164]. These kinds of AI agents could theoretically be trained to kill people in extremely realistic war simulations. Autonomous vacuum cleaners and unmanned aerial vehicles are already tested and trained in virtual environments (e.g., [44]), and when the test results are satisfactory, the code is easily transferred to an actual physical machine.

Recent history already offers examples of how difficult it can be for humans to predict algorithms’ behavior. To illustrate the issue, let us consider stock trading algorithms. They are usually reliable in trading stocks in a complex environment with many human agents [140]. However, when many algorithms are brought to the stock market and these algorithms have not been tested against both humans and other algorithms, unexpected feedback loops might arise. These can cause temporary stock crashes, and indeed, at least three such cases have already been reported.Footnote 6 Because we have a tendency to overlook such problems and lack intuition about what it actually means for an AI to “only do what it has been programmed to do”, we argue that it would be good to pay institutional and legal attention to algorithm testing and evaluation in different environments. Unfortunately, even that is no guarantee that an AI will work safely and as expected if novel elements are introduced into its environment [78].Footnote 7
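Such feedback loops can be reproduced in miniature. In the following sketch (hypothetical numbers of our own, not a model of any reported crash), each of two momentum-following sell bots behaves tamely on its own, yet together they amplify a small shock into an accelerating decline:

```python
# Two momentum-following sell bots. Either bot alone produces a bounded, steady
# drift after a small outside shock, but together their reactions feed on each
# other: a flash-crash-like feedback loop neither bot's developers tested for.
def simulate(bot_gains, steps=15, shock=0.5):
    price, prev, prices = 100.0, 100.0, [100.0]
    for _ in range(steps):
        falling = max(0.0, prev - price)               # how fast the price just fell
        selling = sum(g * falling for g in bot_gains)  # each bot sells in proportion
        prev, price = price, price - shock - selling
        prices.append(round(price, 1))
    return prices

print("bot A alone:", simulate([0.6]))       # settles into a steady drift downwards
print("bot B alone:", simulate([0.7]))       # settles into a steady drift downwards
print("both bots:  ", simulate([0.6, 0.7]))  # the same shock now snowballs into a crash
```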

The issue can be presented in the following way: reinforcement-learning-based algorithms are programmed to learn (for example, to behave in a way which maximizes a game score). In this sense, they do exactly what they are programmed to do, but what they learn is also determined by the separate and unprogrammed environment in which the algorithm is used. Even tiny unplanned variations in the environment can potentially lead to the AI learning different things, resulting in behavior that has not been explicitly programmed by anyone. One of the more striking examples of this is the phenomenon known as the adversarial attack. In adversarial attacks, image recognition algorithms (or recognition algorithms for other kinds of input data) are completely fooled when carefully crafted noise with specific parameters (in frequency or shape), invisible to human eyes, is injected into photographs [127]. This results in the algorithms categorizing various objects as something entirely different. For example, an algorithm might classify pictures as showing ostriches, even if, for us, the pictures clearly and unambiguously show dogs, buildings, etc. [134, 154].
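The basic recipe is simple, which is part of what makes the phenomenon unsettling. The sketch below illustrates the fast gradient sign idea used in many adversarial attacks; the gradient function here is a hypothetical placeholder standing in for a real classifier, not an attack on any particular system:

```python
import numpy as np

# Hypothetical placeholder: a real attack would backpropagate through an
# actual image classifier at this point to get the loss gradient w.r.t. pixels.
def model_loss_gradient(image: np.ndarray, true_label: int) -> np.ndarray:
    rng = np.random.default_rng(true_label)
    return rng.standard_normal(image.shape)

def adversarial_example(image: np.ndarray, true_label: int, epsilon: float = 2 / 255) -> np.ndarray:
    """Nudge every pixel by at most epsilon in the direction that increases the model's loss."""
    grad = model_loss_gradient(image, true_label)
    perturbed = image + epsilon * np.sign(grad)
    return np.clip(perturbed, 0.0, 1.0)            # keep pixel values valid

image = np.random.rand(224, 224, 3)                # stand-in for a photo of, say, a dog
adv = adversarial_example(image, true_label=207)
print(np.max(np.abs(adv - image)))                 # ≈ 0.008: far below what the eye notices,
                                                   # yet often enough to flip the predicted label
```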

The reason why learning algorithms are useful is the same as the reason why they are problematic from an ethical perspective: they produce novel and unexpected solutions. Their learning capacity is what makes it so difficult to predict the actions of an AI [78]. Even carefully tested algorithms can act in unpredicted ways in novel environments beyond those in which they were tested. It is thus difficult to assess in advance in which environments an AI would function as planned and in which its actions would be unexpected and undesired. AIs lack the natural, unspecific moral checks and restraints which, at best, prevent humans from making morally catastrophic choices. Thus, it makes no difference to an AI-guided robot whether it kills virtual people in a computer game or real humans in a war. It is not necessary for an AI to “desire” anyone’s death for killing to be an efficient action towards its set goal.Footnote 8

6 Artificial intelligence: morally relevant, even if non-conscious

We described above what kinds of problems stem from categorizing AI primarily as an “artifact” or “tool”. Next, we want to examine problems which could arise if AI were categorized like a non-artificial “moral agent”. Whether or not we count a person as morally accountable affects the way we expect to be treated by them and how we treat them in return; for the judicial institution to function, we must assess legal accountability, which in turn is partially grounded in moral accountability. We seem to have an inclination to regard AIs as moral agents [7, 177], although people do not consider them appropriate agents for moral decisions (see [115, 139]). Usually, when encountering intelligent and goal-directed behavior, the human mind attributes to the target the core features of agency, that is, the presence of a self and consciousness [181]. In other words, we assume a conscious being to be behind intelligent action; or, at the very least, we assume intelligent action to be caused by some kind of intentional (volitional and feeling) goal-oriented self. Such inferences are a useful cognitive strategy in an environment populated mostly by other humans, as has been the case for Homo sapiens for thousands of years. When we encounter goal-oriented behavior, our folk psychology systems become activated and we start making inferences about the agent’s desires, thoughts, feelings, and beliefs.

However, folk psychology becomes a hindrance when we apply it to AI. AI systems might behave intelligently in the sense that they are capable of goal-oriented action in some specific environments. Despite this, they are generally thought to lack consciousness and an experiential sense of themselves or the world: nothing “feels” or “appears” as something to them [79]. We do not want to delve too deeply into the wide philosophical discussion about consciousness and the possibility of artificial consciousness (see [16]). We are content to remark that at least current systems have no selfhood or agency similar to non-artificial agency, let alone anything resembling consciousness as we know it. However, these absences in no way eliminate the possibility of intelligence. A system can be intelligent without being conscious, and similarly, a being can be conscious without being particularly intelligent (e.g., a human baby). In these respects, AI resembles animals. Even if an animal were regarded as incapable of consciousness, it could still be regarded as intelligent according to the cognitive science definition. Cognitive science has not set a limit on what kind of entity can be intelligent; even plant intelligence and learning have been studied [58]. The key message is that intelligence is not just a property of the human brain, and that intelligent action is not required to be conscious.

To summarize briefly, what makes the situation complicated is that there is no philosophical or scientific consensus on the nature, origin, or function of consciousness. The debate has been going on since the 1970s [16]. Some philosophers, such as Daniel Dennett [41], have proposed that, in principle, there is no difference between human and artificial consciousness. If consciousness is information processing in the brain, an artificial being can be capable of it as well. It is also often thought that selfhood and consciousness are required for full moral agency [142, 177], but according to this view, there is no selfhood separate from the cognitive system and the brain. However, the line of thought represented by Dennett has its critics. For example, mathematician Roger Penrose [144], neuroscientist Giulio Tononi [170], and philosopher David Pearce [143] have proposed that some of the computational properties of consciousness might not be implementable on a silicon-based microchip or on a Turing machine architecture. If they are right, technologies based on classical computation and microchip architecture could not become conscious, let alone have moral agency or a self. This applies to all currently designed AIs.

There is more to the question of whether AI has true moral agency than simply whether it can be conscious. Moral agency is better viewed as a spectrum. Even if AIs were not conscious but still behaved similarly enough to humans, they could perhaps be counted as moral agents of some kind [142, 177]. This functional equivalence could then form the basis for at least some kind of moral responsibility, and AIs might thus be placed on the moral agency spectrum. For instance, there has been discussion on the foundations of moral responsibility [71]. However, philosophers have been very skeptical about placing AI systems high, or even in the middle, on the moral agency spectrum [9, 28].

Robots and AIs can clearly be intelligent, that is, they can perform tasks and act in goal-oriented ways in an at least partially predictable environment. At the same time, they are not capable of extensive moral agency [177], so they can hardly be held morally or legally accountable any time in the near future. However, it does not follow that they would be morally neutral or irrelevant, since their (intelligent) actions can have an impact on people's well-being. Even a piece of simple technology, such as a hammer or a baseball bat, is not morally neutral. Instead, it reflects the values and objectives of its designers and its chain of production [55, 150]. This is even more evident with AI, since it blends into our social and moral lives, and is more interactive, autonomous and social than hammers or baseball bats.

7 The doctrine of double effect and the problems of folk consequentialism

We now turn away from challenges related to categorization and move to more general limitations of human moral cognition. Various simplifications and biases steer the conversation regarding AI ethics towards over-generalization and dead-ends [104, 105, 116, 127, 177]. As philosophers have noted during the millennia-long debate, it is challenging to establish uniform and consistent stances on ethical problems and the basic issues of ethics (see [53]). Many debates in philosophical ethics take place between incommensurable normative theories, such as consequentialism, deontological ethics and virtue ethics. Understanding these debates often requires delving deep into the conversation or even getting an education in philosophy. Our everyday moral thinking is often simpler and grounded on our social and moral emotions [68, 69]. Folk ethics typically simplifies complex moral problems and hides their nuances. This human tendency results in many difficulties in the context of democratizing AI ethics.

One of the central theories in normative ethics claims that the moral acceptability of actions depends only on their consequences [40]. Consequentialism comes in several flavours, which differ in how they evaluate the costs and benefits associated with the consequences of different actions [160]. Consequentialist arguments have been especially strongly represented in the discourse about new technologies and their introduction into our society [62, 63, 153]. For example, many arguments for autonomous traffic are based on the claim that it ultimately saves human lives. It is, therefore, justified, the argument goes, to test autonomous traffic technology in public space and risk a few deaths.

This kind of one-sided folk consequentialist reasoning is riddled with evident difficulties. Automatized traffic with self-driving cars has to be planned and designed in advance of implementation, which means that the designers are forced to make choices about who is expendable in potential accidents [7, 19]. This reveals one of the best-known counterarguments against consequentialism: the estimation of utility is extremely difficult (who receives the benefit, at what price, with what externalities, etc.). We can add that consequentialism clashes with some of our other moral intuitions if the maximization of utility is applied straightforwardly to the moral evaluation of actions. Our moral intuitions estimate the moral value of actions themselves (e.g., stealing is wrong) not only by their beneficial results but also (among other considerations) by the moral norms and values involved. For example, according to some forms of consequentialism, if someone’s internal organs could save ten other people, harvesting and reusing this person’s internal organs would be regarded as morally right regardless of the person’s will (for a discussion of this classical argument, see [56], pp 19–32). Studying consequentialist thinking empirically has turned out to be difficult due to challenges in operationalizing such views [98].

Even if we manage to avoid the challenges of a priori utility estimation (and of defining utility), other problems can arise with respect to AI ethics. Problems arise, for example, when consequentialism based on short-term benefits is indiscriminately applied (without considering other values, such as human dignity, responsibilities, and rights). Consider again the case of automatized traffic. Autonomous cars could be designed to protect their passengers at the expense of pedestrians in the event of an accident (the alternative would likely not sell). In this case, the fact that someone has the money to buy this particular vehicle becomes the factor deciding between life and death, raising the passenger’s life to a higher value than that of the pedestrian. This banal example shows how equal human rights can be overridden by vulgar consequentialism, while technology escalates inequalities across the board [83]. For these reasons, it is crucial that the discussion about AI is not restricted to a narrow ethical perspective.

The successful application of consequentialism within AI ethics is further hindered by cognitive biases recognized in moral psychology. One of these is the doctrine of double effect (DDE) [121, 165].Footnote 9 People consider killing someone to save another person more morally acceptable when it is done by pushing a button or pulling the rope of a guillotine (that is, indirectly or instrumentally) than when it is done by directly strangling someone with one’s “own hands”. The consequences of the actions are identical (i.e., someone dies). We are susceptible to the doctrine of double effect when we consider self-driving cars in the folk ethics context. Research findings on the double effect suggest that we do not realize the gravity of the ethical issue when a device or a gadget (e.g., a car) is part of the chain of causation, because it dilutes our perception of personal agency within the events in question [100, 126]. In a sense, the general human tendency towards the DDE is a form of perceptual bias in the area of moral perception.

When a person drives a car and ends up in a fatal accident, figuring out who is responsible is a comparatively easy process. If the driver caused the accident, they can be punished within some boundary conditions. In such a case, third parties find the punishment natural and consistent with the general moral views prevalent in our society. On the other hand, if the car is autonomous, the matter of allocating responsibility becomes more complicated. It seems that responsibility would be distributed over a network of actors, whose status and contribution to the event is somewhat unclear. This could result in a “responsibility gap”. Does the responsibility lie with the car itself, the programmer of the car’s driving algorithms, the manufacturer of the car, or society more generally? Who should be punished in such a situation? Probably the upper management of the company that produced the car would not be held morally responsible. Quite likely the company that developed the car would be held at most partially responsible and let off the hook with a fine for negligence. This, in turn, might lead to human lives being valued increasingly in economic terms, to the less privileged facing a greater risk of losing their lives, and to the family and friends of the deceased being left with a lasting sense of injustice and a lack of closure [48].

With the spread of autonomous vehicles, publicly funded roads might become a kind of test laboratory for private companies, in which the consequences of accidents can simply be bought off. In the worst case, vulgar consequentialism might lead to a situation in which, when we enter a public space, we involuntarily become product testers for new technology, not unlike crowd-sourced crash test dummies. New technology might be developed in the name of consequentialism and escorted by the promise of increased safety, while actually being motivated by profit or control (the latter most obvious in the case of China). This would in turn lead to rationalizations of why compromising human rights is justified. No university ethical review board would approve such a large-scale pseudoscientific experiment, but the general atmosphere appears to be more permissive with respect to corporate-based R&D.

Recent research suggests that the above worries are not ungrounded. Studies have shown that people prefer self-driving cars to function according to simple consequentialist principles, that is, in a way which maximizes the number of human lives saved, even if this means sacrificing the passengers [7, 19]. This appears to be an encouragingly altruistic result. However, some of the same studies further show that people’s preferences are egoistic [19]: they prefer others to use cars following consequentialist principles, while preferring not to be passengers in such cars themselves. In other words, people endorse consequentialism until they are required to sacrifice themselves for the benefit of others, at which point even vulgar folk consequentialism becomes too high a bar.

To manage the risks associated with vulgar folk consequentialism, egoism and the doctrine of double effect, we need a wide consensus on, and adherence to, human dignity, democratically determined agreements, rights and responsibilities. There is a general need to regulate the development of technology collectively to ensure that it serves the common good and supports the rule of law. Otherwise, the risk grows that technology will become an uncontrolled profit-maximizing power that forcefully instrumentalizes humans. Fortunately, the Western tradition of philosophy, theology, and political thought provides us with abundant resources to counter the effects of vulgar consequentialism. Governance by the rule of law, in which an individual has absolute value that cannot be transformed into money or utility, is a key institution for the development of ethical technology.

8 Values beyond safety

Safety is a major theme of AI ethics. However, aiming towards safety can also predispose us to biases stemming from our everyday thinking. Safety is one moral principle among others, and its prioritization might lead to deeply problematic outcomes. It seems that when AI is marketed with themes of increased safety and fear, the perceived threat often stems from other people; we will elaborate on this below.

The 2002 film Minority Report portrays an information-processing system which can predict possible murders and other crimes. The technology is used to prevent all crimes in advance, and the world is, superficially, perfectly safe. A world where no wrong is ever done appears highly desirable and good. The film explores the problem of false positives: many people are preemptively jailed even though they never actually committed a single crime. In principle, each of us is a potential criminal. Likewise, no one would commit crimes if, as a precaution, everyone were put in permanent solitary confinement. However, we have no way of knowing whether a crime would necessarily have been committed until it is actually committed. We can only estimate the probability of some event taking place. There are many activities which superficially resemble the preparation of a crime while actually being absolutely harmless (e.g., growing vegetables under a heat lampFootnote 10).
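The arithmetic of false positives is counterintuitive in just the way discussed above. With made-up but illustrative numbers of our own, even a highly accurate predictor of a rare crime flags mostly innocent people:

```python
# Illustrative numbers only: a rare crime plus an accurate predictor still
# yields mostly false positives, which is the problem the film dramatizes.
population = 1_000_000
base_rate = 0.001              # 0.1% of people would actually commit the crime
sensitivity = 0.99             # the system flags 99% of true future offenders
false_positive_rate = 0.01     # ...and wrongly flags 1% of the innocent

offenders = population * base_rate
innocents = population - offenders
flagged_offenders = offenders * sensitivity
flagged_innocents = innocents * false_positive_rate

innocent_share = flagged_innocents / (flagged_offenders + flagged_innocents)
print(f"share of flagged people who are innocent: {innocent_share:.0%}")   # ≈ 91%
```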

In light of research in personality and moral psychology, people can be characterized along a continuum (broadly speaking) as either open-minded and adventurous or cautious and conscientious [3, 8, 21, 31, 43, 49, 57, 80, 113, 120, 133, 135, 155, 187]. Open-minded and adventurous people form a minority in our society, but they produce the majority of our new ideas, inventions and innovations. Cautious and conscientious people, on the other hand, are responsible for the functioning and maintenance of society and institutions. These personality clusters are also aligned with different views on everyday moral decisions [34]. Open-minded and adventurous people tend to view moral acts and societies primarily through the lenses of (i) fairness, and (ii) whether such actions or political decisions cause harm to someone. Cautious and conscientious people, in turn, tend to view some other considerations as also relevant for moral judgement: they consider moral actions and political decisions through the lenses of respecting (iii) the local societal norms, (iv) the local societal authorities, or (v) values locally regarded as sacred (whatever such values happen to be; [64, 70]). Cautious and conscientious people are usually more sensitive to feelings of disgust and fear than open-minded and adventurous people [172]; in turn, people with lower sexual disgust sensitivity rely more on consequentialist thinking [99, 102].

Probably due to this predominance of the cautious personality style, many AI systems and emerging technologies are marketed with overtones of fear and safety [15, 26, 59, 76, 129, 186]. These technologies include face recognition algorithms for automated surveillance and profiling [140]. Profiling data in itself was deemed unfair evidence in court, since the defendant would be held responsible for the crimes of others and not their own; nevertheless, such technology does make certain people more likely targets of law enforcement.Footnote 11 Another such technology, already implemented in the US, predicts in which neighborhoods a crime is most likely to occur.Footnote 12 Police forces are channeled into the neighborhood, leading to self-fulfilling prophecies not unlike those in Minority Report: the police strive to apprehend anyone who might have cannabis in their pocket, because such crimes are found to be more frequent in that area, precisely because the police successfully apprehend people with cannabis in their pockets there.
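The self-reinforcing dynamic can be stated very simply. In the following sketch (made-up numbers of our own), two districts have identical true crime rates, but patrols follow recorded crime and records follow patrols, so an early chance difference in the records never washes out:

```python
# Two districts with identical true crime rates. District 0 happens to start
# with a few more *recorded* incidents; because patrols track records and
# records track patrols, the early fluke is locked in and the statistics keep
# "confirming" that district 0 is the high-crime district.
TRUE_RATE = 0.05                 # same underlying crime rate in both districts
TOTAL_PATROLS = 100
recorded = [6.0, 4.0]            # chance difference in early records, not in actual crime

for week in range(52):
    patrols = [TOTAL_PATROLS * r / sum(recorded) for r in recorded]
    recorded = [r + p * TRUE_RATE for r, p in zip(recorded, patrols)]

share = recorded[0] / sum(recorded)
print(f"district 0's share of recorded crime after a year: {share:.0%}")   # stays at 60%
```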

Without education, people will have a hard time understanding AI ethics and the relevant sociological issues. Populations unfamiliar with the properties of AI systems can easily be persuaded, “for their own safety”, to implement technologies which might actually erode the very prerequisites of constitutional democracy [45].

9 Our crimes are not really crimes, but theirs are

The evolution of our moral and social mind took place in the context of relatively small and competitively antagonistic social groups [22, 162]. This is considered to be at least part of the explanation for the unique and thoroughly social nature of the human mind [169]. One of the mental tendencies stemming from this social origin is the human tendency to split people into in-groups and out-groups [52]. Individuals categorized in the in-group are perceived as more valuable than those categorized as out-group members. We could simply call this the us-and-them bias, and it too distorts the discussions taking place around AI policy. Algorithmic profiling is one application where this bias becomes readily apparent. AI makes possible novel ways of detecting crime, such as through information tracking. Certain kinds of crimes are associated with certain kinds of social groups. People are likely to use stronger measures to deter and punish crimes associated with an out-group, and conversely to be more cautious and lenient when considering measures to track crimes perceived to be more common within their in-group [112, 117].

The profiling system for preventing drug-related crimes, introduced above, is less capable of preventing financial crime, which, on the whole, leads to significantly higher costs and damage to society. Financial crime is not intuitively perceived as a societal security risk, even though it erodes the financial foundation of society and in the long term causes higher rates of alcoholism, family violence, and suicides [45]. There is at least one recent example of applying AI algorithms to counter tax evasion and other financial crimes within the EU [173, 174]. When the algorithm was ready, it was run on the tax records of the state. The goal was to extract thousands of names for observation and investigation, but the Euro-group and troikaFootnote 13 obstructed the execution of this program [172, 173]. It is unlikely that similar EU governance level objections would be raised if big data were used to counter crime such as local drug use. To raise a potential example, in theory it would be possible to create a database which combines individuals’ musical preferences with their travel data and the chemical analyses of the wastewater in their residential area to predict drug use.

One of the common stereotypes of a drug user is that of a poor person who is at risk of becoming a pariah [61, 185]. However, international research shows that most people who have used drugs are never caught, and their drug use has no identifiable negative effects on the rest of society [132]. Illegal drugs are used by well-off members of society, such as physicians, lawyers, clinical psychologists, university professors, IT entrepreneurs, physicists, primary school teachers, and social workers [132]. However, different demographics tend to use different drugs or to commit specific drug-related crimes in particular [137]. If it suited the intentions of the government, the lives of thousands of citizens associated with drug-related crime could be made more difficult through the analysis of the above-mentioned sensitive information.

These attempts to apply AI to crime prevention clearly show the us-and-them bias: the in-group and its proclivities are held in higher regard and deemed more worthy than those of the out-group (see [175]). This bias becomes more pronounced when AI is used to fight crime committed by people who do not belong to the same in-group as the policymakers, most obviously the poor and immigrants.Footnote 14 Most of us tend to assess the justifications of moral actions based on whether such actions adhere to the values and norms of our own native culture, and we might be ready to support the implementation of profiling algorithms in the fight against crime. People consistently find it difficult to imagine that they themselves could be part of some out-group and would thus be the target of surveillance technology. Implementing new “security providing” technologies is seen as a great idea, until the technology is turned against oneself.

A surveillance system capable of ethnic or other profiling can easily be calibrated to target new groups. Does it lead to a better world if human action can be surveyed and controlled so precisely? It might appear obvious that more efficient law enforcement is beneficial in a democratic society, but the downside is that beneficial social functions which interact with illegal activities would have to adapt to new circumstances. As law enforcement becomes more efficient, so too must the laws become more nuanced. The moral and ethical advances of society are often marked by innovations occurring in the “moral gray area” [27]. Two drug-assisted treatments have recently been granted the Breakthrough Therapy designation by the US Food and Drug Administration (FDA): the FDA assessed two Schedule I substances as having significant potential in treating several severe mental health disorders.Footnote 15 These treatments would likely not have been developed without the dedicated advocacy of activists belonging to socially marginalized groups [187]. To conclude this section: while we have examined profiling algorithms solely from an us-them perspective, this is not to say that such technologies are free of further ethical problems related to privacy and other human rights.

10 Egocentric teleology bias

Here we introduce a concept that has only recently surfaced in the cognitive bias literature [146]. In essence, egocentric teleology is an amalgamation of several folk theories that neatly gathers a number of themes under the same umbrella. It has two parts, which we will cover in order.

Egocentricity refers to our tendency to view the world from our own perspective. Everyone has an intuitive sense of themselves as quite a complex and in some way unique agent; furthermore, we tend to feel that we personally are somehow special and exceptional compared to other humans [146]. In feeling special, we also feel as if the surrounding world were somehow there personally for us and adhered to our personal wishes and needs.

Teleology (from Greek telos, “end”, and logos, “reason”) is traditionally described as explanation by reference to purpose, end, goal or function. Humans often reflect teleologically on the behaviour of things in nature, and also see themselves as pursuing ends and goals. Some experimental studies show that children seem to project function and design onto the natural world [46, 90]. In these studies, preschool-age children attribute, for example, goal-directed actions to tigers, icebergs and rocks. Even PhD-level physical scientists and their work are not immune to intuitive and teleological explanations of natural phenomena [92, 93].

This egocentric view is also reflected in the teleological perspective according to which objects, whether human-made or natural, are designed for their appropriate human goals; through this design the objects serve a valuable function, which in turn makes them good. This view is suggested to be a generalization from the intuition that some objects or phenomena appear to serve a particular function and are too complex to exist by pure chance [146]. Without scientific knowledge, the human eye seems to be finely suited for seeing and too complex to have come about through any process but design. By extrapolation, chairs are designed for sitting, as are large flat rocks; and spears for hunting, as are small sharp rocks. Everything in the world appears to have a purpose. Even events and circumstances are what they are because they happened for our benefit and lead to our wellbeing, having been designed for us by some higher power [146]. Similarly, more complex technologies appear to have come about, to have been designed, for their own designated purposes (“otherwise, how could something so unlikely have come about?”). When it comes to technologies used for apparently benign purposes, the implicit assumption is that, due to their design and purpose, they cannot be the tools ultimately causing adversity.

The egocentric teleology bias has had a part in shaping human history: for example, other animal species have been viewed as existing for the sole purpose of human exploitation, either as food or as auxiliary labour [146]. The egocentricity bias can be part of a complete worldview, in which the whole of reality (and especially other animals and plants) is perceived to exist for humans and to be at human disposal. Closer to home, the egocentric bias can be observed in the general human tendency of people to regard themselves as goal-oriented, complex, and good. By proxy of “knowing” their own goodness, all groups to which a person belongs are thus also, in some sense, good. A good person cannot belong to a bad group, “and even if others can end up in bad company, I surely cannot”. This egocentric tendency has time and time again led to a phenomenon called dehumanization [74, 75, 180]: slaves are not human, and Tutsis are cockroaches (sic; [152]).

In part due to the teleology bias, we tend to see all complicated and superficially goal-oriented technologies as ethically less problematic than they might actually be. Smartphones, cars, televisions, nuclear power stations, profiling algorithms, and care robots, for example, might all be intuitively perceived as morally neutral, or even as progressive and morally good. We usually do not stop to reflect on cars and smartphones in themselves as morally relevant objects, even if their use and influence on the world might be analyzed from a moral standpoint. In actuality, technological tools cannot be disentangled from moral perceptions: they are not produced in a moral vacuum. As many geologists have pointed out, we have now entered the Anthropocene [138], a new geological era in which Earth is actively reshaped according to short-term human needs. The whole surface of the planet is being changed by technological developments, and large portions of these changes are deleterious to the long-term survival of humans. This means that such technological developments are hostile to life and thus morally questionable, if not, given the results, outright evil.

The egocentric teleology bias is one of many factors that blind us morally to the challenges of novel technologies. We recognize robots as somewhat autonomous, goal-oriented agents, while at the same time they appear to function as if by magic. Even with their fairly primitive neural networks, robots are already too complicated for us to understand how they function [127]. We have an intuitive sense that complex goal-oriented technologies, including robots, are morally good, or at least neutral, by virtue of having been developed by us, when it is just as likely that they are not. We suggest that this moral optimism stems in part from how we perceive ourselves: as goal-oriented, complex, and good. We perceive AI technologies as extensions of ourselves that share our traits, and we are hesitant to doubt their moral value. Deliberate and careful philosophical consideration and inclusive ethical discussion are crucial to the proper maturation of complex novel technologies.

However, the egocentric teleological bias is not the only cause leading us to view technological systems as good. They are perceived as morally good also because the theoretical models on which such systems rely are perceived as “rational” and as accurately corresponding to the structure of reality, and are thus further perceived as a viable solution to the threat posed by “The Others” (i.e., out-group members, see above). For example, some game theoretical models have been used in designing the automatization of certain functions of social institutions [45]. It is crucial to realize that such models are limited to simulating human social behavior without taking into account all relevant non-social factors.
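To illustrate the kind of abstraction involved, consider a textbook two-player Prisoner's Dilemma (our own illustration, not the specific models of [45]), with payoffs written as (row player, column player):

\[
\begin{array}{c|cc}
 & \text{Cooperate} & \text{Defect} \\
\hline
\text{Cooperate} & (3,3) & (0,5) \\
\text{Defect} & (5,0) & (1,1)
\end{array}
\]

The model captures only the strategic interaction between two idealized agents; institutional context, ecological constraints, and the physical consequences of the choices are all compressed into a few fixed numbers. This is precisely the sense in which such models can leave out relevant non-social factors.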

11 Biases of wish fulfillment in risk estimation

We have examined many predicted and actual cases in which AI threatens our values. Despite these examples, AI, as a technology, promises unprecedented wish fulfillment. We now briefly examine some of our biases in assessing the moral worth of a technology of such magnitude. The argument developed here is not specific to AI but applies to other similarly vaguely defined, socially influential scenarios as well. The scope of problems that could be solved, or dreams realized, with the right kind of AI technology is vast (unprecedented even), not least due to the potential of AI to act as a “meta-technology” accelerating the development of other revolutionary technologies [2]. In the best case, AI could transform our society in positive ways surpassing anything we could achieve otherwise [89]. Thinking analogous to the Christian promise of Heaven, eternal delight surpassing anything conceivable during one’s mortal existence, has been applied to AI, and in this sense AI most resembles the category of god (already with its first ChurchFootnote 16). As with Pascal’s Wager, from a strict utility maximization perspective, pursuing the scenario with infinite expected utility is the rational choice regardless of its probability [20, 82]. Similarly, any action which promotes the development of utopian AI would appear rational regardless of the probability of achieving the desired outcome, whatever the sacrifices.
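To make the structure of this wager explicit, consider a minimal sketch in our own notation (not drawn from [20, 82]): let $p > 0$ be the estimated probability that utopian AI is achieved, assign that outcome infinite utility, and let $c > 0$ be the finite cost of pursuing it. Then

\[
\mathbb{E}[U_{\text{pursue}}] = p \cdot \infty + (1-p)(-c) = \infty > 0 = \mathbb{E}[U_{\text{abstain}}],
\]

so pursuit dominates for every positive $p$, however small. The conclusion is driven entirely by the infinite payoff, not by any realistic probability estimate, which is exactly why the reasoning feels compelling and why it should be treated with caution.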

Such thinking might rightly leave professional and folk economists unconvinced [130]. However, our intuitive cognitions can be hypothesized to react with credulity to near-infinite rewards and highly desirable scenarios even when their estimated likelihood is extremely slim and they border on the fantastic (i.e., people wish that magic were true with respect to AI technology). Attitudes towards objects, people, or scenarios develop in part through association rather than analytical reasoning (see [18]). Not only the truth matters but also the story. What desires could AI fulfill? Personalized and engaging education or technical training in any skillFootnote 17; a crime-free society; dream fantasy worlds of adventure, excitement, sex, and fame [73]; a post-scarcity economy [39]; world peaceFootnote 18; near-unlimited energy [89]. Encountering these and countless other plausible and less plausible ideas can add up to a general sense that AI is awesome.

Would not all the examples of dystopian consequences of AI balance out the utopian examples? Not necessarily. We tend to view technologies as either all-good or all-bad. If we encounter information which suggests higher than expected benefits of a technology, we consequently downplay the risks, and vice versa if we gain information relating to the risks. This is, of course, contrary to how benefits and risks are generally actually coupled: the larger the benefit, the larger the risk [51, 161]. Even scientifically minded people remain susceptible to these biases after receiving further content or context knowledge [85]. Additionally, research suggests that we find our preferred scenarios more likely and update our beliefs asymmetrically, giving more weight to evidence in line with our wishes [84]. Plausibly, some people who are cautious towards many or most AI applications are nevertheless highly optimistic about one particular application that is especially salient to them personally (e.g., from the perspective of safety), and thereby end up overoptimistic and insufficiently cautious about AI development in general. Science fiction hobbyism predicts more positive attitudes towards robots regardless of whether the sci-fi portrays robots in a positive (Star Trek) or a negative (Terminator) light [95, 96, 100, 102]. The mere presence of extreme wish fulfillment scenarios may bias us to approach a scenario less cautiously.

The promise of high rewards encourages significant risk taking. Some of the most admired people dropped out of college, and college students dream of founding successful start-ups rather than completing their studies. However, the tendency to overestimate the chances of success based on the salience of success stories is a classic case of survivorship bias.Footnote 19 This is exacerbated by our tendency to make riskier choices when we feel positive [33] (see also [67]). It might be socially beneficial that many of us attempt to achieve something in which only a tiny fraction succeeds, but no similar strategy serves us when we consider how to guide society itself.

Risk taking motivated by high rewards can also take the form of omission. We might be tempted to refrain from acting in the belief that exponentially developing technology will solve our problem shortly. The idea of rapid medical advances might decrease the motivation to develop an exercise habit; hopes of an automated society could decrease the felt need to learn marketable skills and build a reputation. Similarly, on a larger scale, emerging technologies are relied upon to solve our increasing energy demands [182] or the risks of climate change [77]. While they are surely part of the solution, their presence on the horizon, however unlikely, can lead us to misjudge our priorities. The challenge is plotting a course to daring, beneficial AI applications without relying on miracles.

12 Conclusion

Whether we want it or not, humanity has evidently entered a new era in which we face novel moral challenges. Such challenges are unprecedented in both their deep and surface structures [72]. For the first time in human history, we find ourselves in an environment in which designed, lifeless, and unconscious matter makes decisions that deeply affect human wellbeing. Our cognitive limitations and biases might steer the development of such technology in a direction whose consequences are undesirable, not only in a totalitarian society but also in a free democratic one.

Through the shifts we are undergoing, technology increasingly functions as a mirror of our moral cognition and political systems. We can consciously guide the development of technology in a direction where it supports our democratic systems and promotes the principles of equality and justice, or we can allow the end result to be determined by market forces and innate human biases. Our moral cognition is shaped both by our emotions and by our primitive herd instincts. If we want to avoid a world shaped by the instinctual, animalistic needs of our nature, we need to take into consideration the motivational forces and possible Stone Age biases that could inadvertently guide the development of AI, or prevent wider discussion about it.

In this text, we have described (depending on how they are counted) around 14 different features of human cognition which in part explain why discussing AI technologies and their risk factors is so difficult. The set of intertwined problems described above stems from the fact that humans evolved in an environment where there were no AI agents, and thus we do not have innate concepts for probability and conditionality. Humans are social animals that use their intuitive and automatic cognitions specifically to understand the social world [163]. To function in our social environment, we tend to egocentrically project our humanness and conscious agency onto our surroundings, and especially onto AIs. AI technology is not something that we grasp intuitively through our categorical cognition. We hope that our text gives food for thought to those working in relevant fields, encourages discussion on the social risks of AI, especially when addressing the general public, and reveals potential new directions. As humans, we face a choice: to have enlightened and cautious discussions through which we prepare consciously for the risks of AI and guide humanity towards a more democratic and egalitarian society, such as in Star Trek; or to forgo such civilized discussions and end up in a situation more closely resembling the dystopias of Blade Runner or The Matrix.