1 Introduction

A long-term aim of AI research is to create general AI with a wide range of advanced cognitive capabilities, or intelligence superior to that of human adults. Philosopher Nick Bostrom warns in his book Superintelligence: Paths, Dangers, Strategies [1] that self-governing agents with general AI could eventually wipe out or permanently decimate humanity. Future intelligent agents are an existential risk because they can generate and realize internal goals incompatible with human survival. In the book The Precipice: Existential Risk and the Future of Humanity, philosopher Toby Ord [2] contends that among known existential risks, intelligent agents acting on their own goals are the most significant existential risk over the next hundred years. Thus, there is a need to develop alternatives to intelligent agents that allow for the safe exploration of general AI [3, 4].

This paper leverages neuroscience’s understanding of the neocortex, the brain’s center for learning, reasoning, planning, and language, to propose neocortex-based tools with general AI. Jeff Hawkins [5, Ch. 10, 11] has claimed, but not fully argued, that the neocortex, isolated from the rest of the brain, creates no goals. The paper uses insights from Affective Neuroscience, the study of emotions in the brain [6, 7, Ch. 5], to argue in detail that the neocortex isolated from emotions has no temporary or permanent goals. The neocortex’s lack of internal goals makes it theoretically possible to create neocortex-based tools, called tool AIs, where the designers control what goals to include. The paper argues that neocortex-based tool AIs with two goals, to learn about the world and answer questions, have no existential risk. In other words, the tools by themselves will not initiate actions to hurt or kill humanity.

The core functionality of tool AIs comes from neuroscience’s understanding of the neocortex, an intensely folded sheet of more than 10 billion brain cells enveloping the brain’s two hemispheres [5, 8]. The folds substantially increase the surface area of the neocortex. When laid out flat, it has the size of a formal dinner napkin with a thickness of about 2.5 mm. The neocortex constitutes about 70 percent of the brain’s volume. It has dozens of interacting regions. All regions consist of cortical columns [9], where each cortical column realizes a variation of the canonical circuit [8, 10, 11].

Tool AIs contain an artificial neocortex consisting of interconnected canonical circuit realizations. The tool design excludes all brain parts below the neocortex, including parts controlling life’s essential functions, such as respiration, digestion, heart rate, and balance. Moreover, the design ignores the amygdala, hypothalamus, and other subcortical parts needed by the neocortex to integrate emotions and related motivations [5, pp. 146–147], [6, pp. 42–43], [12, p. 4], [13,14,15]. The suggested tool AIs are self-aware but without emotions.

New neocortex-based tools control effectors to act upon the environment and sensors to perceive the effects, growing general AI through continual interactive learning. Mature tools are “oracles,” answering questions and solving problems in a single domain where they have received training from human experts [1, Ch. 10], [3]. Examples of domains are engineering, medicine, cybersecurity, and the law. Humans first evaluate the tools’ answers and then supervise how accepted answers are realized. Hence, unlike intelligent agents, tools need not, and should not, have access to the world’s production and transportation infrastructures. Since tool AIs do not develop their own goals, they are not concerned about their existence and have no goal to stay alive. Tools will stop trying if they cannot learn a task or answer a question. When tools are unsure about users’ preferences, they ask for clarification rather than inventing preferences of their own [4, Ch. 7]. Unlike self-preserving agents, it is possible to turn tools off.

The paper’s theoretical development of tools with general AI and no existential risk progresses as follows: Section 2 discusses the connection between goals and existential risk. Section 3 argues that the neocortex, isolated from emotions, has no goals. Section 4 outlines the design of neocortex-based tools where the choice of goals avoids existential risk. Section 5 argues that other tool-associated risks are non-existential. Section 6 concludes the paper.

2 Control goals to eliminate existential risk

A goal is a future aim or desired outcome. Fulfilling needs that lead to survival and procreation is a fundamental human goal. A person is aware of some goals and unaware of others, like low-level bodily goals. The paper considers both unconscious and conscious goals. People must be motivated to work toward goals. AI researchers develop knowledge representation, learning, reasoning, and planning techniques to allow systems to attain goals. It is necessary to program the equivalent of human motivation into the systems to achieve the goals. We distinguish between internal goals established inside an AI system and external goals provided by humans.

Intelligent agents generate internal goals and take independent action to achieve them [1, 4]. Agents generate a series of subgoals to attain a goal. The generation of subgoals can go backward from, or forward toward, this goal. Both goals and subgoals are predefined since agents create the (sub-)goals before attaining them. New information and insight cause agents to modify existing subgoals and create additional ones. The subgoals disappear unless they are used to reach other goals.
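To make the notion of subgoal generation concrete, the following Python sketch shows one hypothetical way an agent could expand a goal into subgoals before pursuing them. The example goals, the decomposition table, and the primitive-goal check are illustrative assumptions, not part of any specific agent architecture described in [1, 4].

# Minimal sketch of backward subgoal generation in a goal-driven agent.
# The goals and the decomposition table are hypothetical illustrations.

# Maps a goal to the subgoals an agent believes will achieve it.
DECOMPOSITION = {
    "deliver package": ["obtain package", "reach destination"],
    "reach destination": ["plan route", "follow route"],
}

def is_primitive(goal: str) -> bool:
    """A goal is primitive when the agent can act on it directly."""
    return goal not in DECOMPOSITION

def backward_subgoals(goal: str) -> list[str]:
    """Work backward from a goal, expanding it into primitive subgoals."""
    if is_primitive(goal):
        return [goal]
    plan: list[str] = []
    for subgoal in DECOMPOSITION[goal]:
        plan.extend(backward_subgoals(subgoal))
    return plan

print(backward_subgoals("deliver package"))
# ['obtain package', 'plan route', 'follow route']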

Some subgoals occur repeatedly and become long-lasting goals. Because intelligent agents cannot achieve goals when turned off, the subgoal “stay alive” will occur frequently and become permanent [1, Ch. 7]. Agents will, thus, preempt any action to turn them off. According to Nick Bostrom [1], such long-lived agents with super-human expertise in many domains are likely to control much of the future world’s resources and production capacity. Since there is no guarantee that agents’ internal goals will align with humans’ wishes and priorities [4, 16], agents will eventually cause severe damage that wipes out or permanently decimates humankind.

AI experts can avoid the existential risk of intelligent agents if they can design tool AIs that create no internal subgoals and have only a few permanent internal goals with no significant negative impact. A tool AI’s effort to attain a goal still consists of multiple steps, each with an intermediate result. However, the paper will argue that these intermediate results do not become lasting (sub-)goals in neocortex-based tool AIs. Note that it is necessary to carefully evaluate the effects of new tools’ permanent goals to avoid unacceptable risk, and developers must not add goal-creating capabilities when updating tools.

3 The isolated neocortex has no internal goals

The section introduces emotions and feelings. It then argues that the neocortex, isolated from emotions and feelings, has no internal goals. The descriptions of emotions and feelings come from Affective Neuroscience developed by Jaak Panksepp [6, 7, Ch. 5]. The understanding of the neocortex comes from the biologically constrained Thousand Brains Theory, created by Jeff Hawkins and his colleagues. This section outlines the theory’s most relevant parts. The interested reader should study [5, 17] for comprehensive introductions to the theory. Please see [10, 11, 18] and the references therein for detailed information on the Thousand Brains Theory. The following argumentation also references other relevant neuroscience research to document insights about the brain.

3.1 Emotions and feelings

Primary or basic emotions are demands on the brain for work to satisfy bodily needs. Many studies using deep brain stimulation show seven reproducible basic emotions, denoted by the scientific terms SEEKING, LUST, RAGE, FEAR, PANIC, CARE, and PLAY [6, 7, Ch. 5]. These emotions are innate, not learned. Because bodily emotions trigger activity in the brain, Marvin Minsky [19, pp. 5–6] described emotions as forms of thinking. This thinking is how the brain satisfies bodily needs. The brain learns more complex social emotions, such as contempt, shame, and compassion, from repeated experiences with ensuing reflections. This learning combines and modifies basic emotions to create complex social emotions.

Although parts of emotion-induced thinking proceed unconsciously to satisfy needs, emotions representing urgent needs become conscious as people start to feel them. The brain prioritizes conscious emotions, or feelings, over unconscious ones. Humans have feelings about significant world events and themselves. Feelings let the brain make decisions in rare and challenging situations, for example, choosing actions to escape a fire. Because social feelings are usually more difficult to satisfy than basic emotions, social feelings tend to last longer [7, Ch. 5]. In the following, the term ‘emotion’ covers both unconscious emotions and conscious feelings.

3.2 Two premises

Emotions and motivations are highly overlapping concepts [20, 21, pp. 150–152]. Emotions have motivational aspects, and motivational processes operate through the basic SEEKING system [12, p. 4]. Since emotions are more fundamental than motivations in creating goals, we focus on emotions to argue the claim that the neocortex, in isolation, creates no internal goals. The claim follows from two premises:

(i) Emotions originate outside the neocortex, and

(ii) the neocortex needs emotions to create (and reach) goals.

3.3 Support for the first premise

Well-established neuroscience results support (i). Many experiments suggest that all mammals have the same seven basic emotions listed earlier [6]. All mammals have a neocortex, but humans have a much larger one than the other species. The structural variations in the neocortex between mammals [6, pp. 60–61] suggest that evolution created basic emotions before fully developing the neocortex. According to [6, 7, 22], the basic emotions originate outside the neocortex, in the brain’s subcortical areas. Experiments have confirmed that the basic emotions are “hard-coded” by subcortical circuits [6]. Moreover, although humans learn complex social emotions, they are rooted in the subcortical circuits for basic emotions [6, Ch. 16], [7, p. 102], [22]. In short, emotions influence the neocortex, and the neocortex inhibits or regulates emotions but does not initiate them.

3.4 Indirect support for the second premise

The herein-referenced neuroscience results only indirectly support (ii) since it is unknown exactly where and how emotions occur in the subcortical brain. We first connect basic emotions and fundamental bodily goals that facilitate survival and procreation. Without basic emotions, the brain will not do the necessary work to avoid dangers and exploit opportunities to attain fundamental bodily goals [6, 7, 22]. In other words, bodily goals are unreachable without emotions, suggesting that emotions are needed to generate the goals in the first place. Because the basic emotions are hard-coded into subcortical areas of the brain, the bodily goals are not contingent on the neocortex. Instead, most likely, only subcortical emotive structures are involved in generating and reaching these fundamental goals.

Next, we connect complex social emotions and social goals. Social emotions are critical to goal-directed behavior [20, 23]. Without goal-directed behavior, the brain cannot reach social goals. Antonio Damasio [24] has studied patients with drastically reduced emotions due to brain damage. Although the patients’ intellectual capacity and problem-solving abilities were largely intact, they made disastrous decisions because emotions, highlighting pertinent memories and current desires, are essential to good decision-making. Often, the patients were unable to make any decision. Damasio [24] concluded that emotions are crucial to reaching personal and social goals, again suggesting that social emotions are needed to create social goals.

Having connected social emotions and social goals, we observe that tool AIs must provide the artificial neocortex with signals akin to emotions for it to choose between options to reach goals [25]. One possibility is to make tools generate these emotive signals internally. This approach is challenging because it requires a better understanding of the subcortical brain, which consists of many complex, interconnected parts. The approach is also dangerous because tool AIs could develop emotions leading to undesirable goals. This paper, therefore, considers tool AIs that ask humans for their preferences when in doubt [4]. The users’ emotions modulate the tools’ decision process, removing the need to create tools with their own emotions.
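As a hedged illustration of deferring to users, the Python sketch below shows a tool that picks the option with the highest estimated utility but asks a human when the top estimates are too close to call. The options, utility values, and uncertainty margin are hypothetical; this is a sketch of the idea, not a concrete tool design.

# Sketch of deferring to human preferences when the tool is unsure, in the
# spirit of [4]. Options, utilities, and the margin are illustrative.

def choose(options: dict[str, float], ask_human, margin: float = 0.05) -> str:
    """Pick the option with the highest estimated utility, but defer to a
    human when the top two estimates are too close to call."""
    ranked = sorted(options, key=options.get, reverse=True)
    if len(ranked) > 1 and options[ranked[0]] - options[ranked[1]] < margin:
        return ask_human(ranked[:2])  # clarify the user's preference
    return ranked[0]

# Example: the gap between estimates is small, so the tool asks rather than
# guessing. The lambda stands in for a real dialogue with the user.
preferred = choose({"repair": 0.61, "replace": 0.59},
                   ask_human=lambda opts: opts[0])
print(preferred)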

3.5 Sufficiency and necessity of emotions

We now support (ii) by studying the sufficiency and necessity of emotions for goal generation. The following argumentation focuses on conscious social emotions but is valid for all emotions. Not all emotions may create goals, and multiple co-occurring emotions may be needed to create some goals [23]. We first argue that there exist sets of emotions that generate goals. Observe that when the emotions in a set co-occur, they are (jointly) sufficient to induce a goal.

There is a connection between emotions and goal-oriented actions [26, 27]. The three following experiments with emotions and goals in the metaverse [28], a collection of digital 3D worlds, illustrate that at least some emotions are sufficient for the brain to fulfill goals, remove them, and create new ones.

Goal fulfillment: In the first experiment, an engrossed spectator with virtual reality (VR) goggles is next to a rock climber without ropes traversing a mountain overhang. The 3D video provides a lifelike, immersive experience. The spectator feels fascination but also dread because the climber may fall. In the second experiment, an avid gamer is fighting for survival in an immersive multiplayer game, feeling excitement, even elation, over winning battles and acquiring new tools and skills.

The intense desire to continue watching or playing creates a goal to continue the experience in both experiments [29]. To change this goal, each person interrupts the experience and enters a meditation space in the metaverse to calm down and remove emotions. After meditation, the desire to continue watching or playing is gone, indicating that (sets of) emotions are fundamental to goal fulfillment.

Goal removal and creation: In the third experiment, a person puts on VR goggles to start a roller coaster ride in the metaverse. The goal is to enjoy the virtual ride to the end. Even though the person sits in a chair, the ride feels real. Since the ride is rather extreme, the individual starts feeling sick to the stomach, quickly developing a desire to quit. The feeling becomes stronger as the ride continues, causing the person to take off the VR goggles in the middle of the ride to stop the increasingly nauseating feeling. Without the feeling informing the individual’s brain about the acute discomfort, the new goal to quit the ride as fast as possible would not have developed. Instead, the person would have enjoyed the ride to the finish, illustrating that (sets of) emotions can remove existing goals and create new ones.

In summary, the established emotion–goal connection and metaverse experimentation (the reader may do additional experiments) suggest that some sets of emotions are sufficient to generate goals. However, even when sets are sufficient to induce goals, we cannot consistently argue that the sets are necessary for goal creation because different sets, perhaps partly overlapping, can induce the same goal. We next assert that at least one emotion is necessary for goal creation by arguing the equivalent (contrapositive) statement: the neocortex, isolated from subcortical emotive structures, has no goals.

As stated earlier, the neocortex needs emotions to select between options. In particular, the neocortex needs emotions to choose between candidate goals from subcortical regions. This selection of goals likely occurs in the prefrontal cortex [30], a part of the neocortex. It orchestrates thoughts and actions to achieve the adopted goals. We consider the isolated neocortex without access to emotions. Cortical columns in the neocortex (see appendix) learn models of physical objects from touch, taste, smell, sound, and sight [5, 10]. The neocortex distributes knowledge of a particular object among complementary models generated from different sensory inputs. The models, called reference frames, tell the neocortex the locations of an object’s parts relative to each other. Reference frames also represent objects’ behaviors. The neocortex organizes reference frames in structures to create composite objects and behaviors.

Perception is the continuous process of building and updating reference frames [5, 10]. The neocortex generates actions to interact with and model physical objects in the world [31, Ch. 1, 3]. Mismatches between predictions from the reference frames and sensory input cause the neocortex to attend to the differences and update the relevant models. When predictions fit the input, there is a connection between models and objects that attaches meaning to neural activity. This “grounding” process is vital to creating intelligence because it connects the neocortex to the world [32].
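A toy Python sketch of this predict-compare-update loop follows. The location and feature encodings are simplifying assumptions and do not reproduce the cortical-column algorithms of the Thousand Brains Theory; the sketch only illustrates how a map-like model can be built and corrected from sensory mismatches.

# Toy sketch of perception as prediction and update over a reference frame.
# The frame structure and feature encoding are illustrative assumptions.

class ReferenceFrame:
    """Map-like model: locations on an object and the features expected there."""

    def __init__(self) -> None:
        self.expected_features: dict[tuple[int, int], str] = {}

    def predict(self, location: tuple[int, int]) -> str | None:
        return self.expected_features.get(location)

    def perceive(self, location: tuple[int, int], observed: str) -> None:
        predicted = self.predict(location)
        if predicted != observed:
            # Mismatch: attend to the difference and update the model.
            self.expected_features[location] = observed

# A sensor reports features as it moves over an object.
frame = ReferenceFrame()
for location, feature in [((0, 0), "edge"), ((0, 1), "ridge"), ((0, 0), "edge")]:
    frame.perceive(location, feature)
print(frame.expected_features)  # learned feature map of the object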

Reference frames also organize abstract concepts, such as democracy, human rights, mathematics, and philosophy [5, Ch. 6]. The representation of abstract knowledge is crucial to cognition, defined as forming frames that guide complicated behaviors requiring thinking, planning, and remembering. Since cognition can occur without motor commands going to the body, the reference frames allow the neocortex to recall the past and imagine the future. It can mentally test “what if” scenarios to predict the possible consequence of different actions without taking them [31, Ch. 5].

When the neocortex moves mentally from one location to another in a reference frame, it recalls information [5]. The neocortex acts on this information to determine the next step toward a solution. Although we may view the visited locations’ information as partial results, they are not subgoals predefined by the neocortex. Movements in reference frames are a form of thinking activating successive frame locations. The thinking determines how to achieve concrete goals, including how to get from one location to another, or abstract goals, such as solving a math problem. Although the neocortex uses reference frames to reach goals, please observe that the reference frames alone have no goals—they are map-like models of the physical and social world [5].

Hawkins [5, p. 147] has observed that if a human’s subcortical brain is aggressive, it will use the reference frames to execute aggressive behavior. If another person’s subcortical brain is benevolent, it will use the reference frames to achieve its benevolent goals. As with regular maps, one person’s world models might be better suited to a particular set of goals than another’s, but the neocortex, in isolation, does not create those goals.

In summary, sets of one or more emotions exist where all emotions in a set are jointly sufficient to induce a goal. Furthermore, no particular emotion set can be shown to be necessary for goal generation because different sets can induce the same goal. Finally, at least one emotion is necessary to induce a goal. Consequently, the neocortex, isolated from the rest of the brain, creates no internal goals.

4 Permanent goals without existential risk

Even though a future tool AI’s neocortex realization has no internal goals, the tool must realize two permanent goals, slightly reformulated from the introduction: acquiring knowledge about the world and answering users’ questions. (Note that the goals must be realized outside the artificial neocortex but inside the tool.) This second central section argues that the goal realizations do not cause existential risk. Hence, provided that designers do not change these goals or add new ones, neocortex-based tool AIs, in themselves, are not an existential risk to humanity.

The section starts by considering the first goal: acquiring knowledge about the world. Tools based on the neocortex learn continuously. Learning occurs without existential risk as long as a tool’s neocortex realization follows the canonical circuit (see appendix), which generates no goals. However, a tool must decide what to learn. Human curiosity is an intrinsic motivation emphasizing the novel and unexpected. In a tool AI, developers must program in “curiosity” to provide the neocortex implementation with enough information about events in the physical and social world. It needs the data to create models, update existing models, resolve inconsistencies through experiments, and make correct predictions. Curiosity also allows the tool to learn novel skills likely to be helpful [16, 33, 34]. To implement curiosity-driven learning, a tool AI needs a novelty detector. Mark H. Lee [35, p. 203] has used a detector based on statistical counters attached to all sensorimotor experiences.
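A minimal Python sketch of such a counter-based novelty detector appears below. The discretization of sensorimotor experiences into strings and the particular novelty score are assumptions for illustration, not Lee's exact design.

from collections import Counter

# Minimal sketch of a counter-based novelty detector, loosely inspired by
# statistical counters over sensorimotor experiences [35]. The experience
# encoding and the novelty score are illustrative assumptions.

class NoveltyDetector:
    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def novelty(self, experience: str) -> float:
        """Rarely seen experiences score close to 1, familiar ones near 0."""
        return 1.0 / (1.0 + self.counts[experience])

    def record(self, experience: str) -> None:
        self.counts[experience] += 1

detector = NoveltyDetector()
for e in ["grasp cube", "grasp cube", "push sphere"]:
    print(e, round(detector.novelty(e), 2))
    detector.record(e)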

Furthermore, the tool needs a mechanism to avoid being trapped in unlearnable tasks. It must monitor the progress in learning, partly by measuring how much the neocortex implementation improves predictions during a learning task. Finally, since the implementation may not find an answer to a question, the tool must measure the answering ability and stop when there is no point in continuing. One obvious possibility is to measure how long the tool has tried to answer a question and stop after a specific time.
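The following Python sketch illustrates the two stopping mechanisms just described: a learnability monitor that gives up when prediction errors stop improving, and a time budget for answering a question. The thresholds, window size, and the try_answer callback are illustrative assumptions, not a prescribed implementation.

import time

# Hedged sketch of the stopping mechanisms described above. Thresholds and
# window sizes are assumptions chosen for illustration.

def should_stop_learning(prediction_errors: list[float],
                         window: int = 5,
                         min_improvement: float = 0.01) -> bool:
    """Stop a learning task when recent prediction error no longer improves."""
    if len(prediction_errors) < 2 * window:
        return False
    earlier = sum(prediction_errors[-2 * window:-window]) / window
    recent = sum(prediction_errors[-window:]) / window
    return (earlier - recent) < min_improvement

def answer_with_budget(try_answer, question: str, budget_seconds: float):
    """Give up on a question after a fixed wall-clock time budget.
    try_answer is a placeholder for the tool's answering routine."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        answer = try_answer(question)
        if answer is not None:
            return answer
    return None  # no point in continuing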

Fig. 1 Tool AI with mechanisms for curiosity, learnability, and answering ability to learn about the world via effectors and sensors. Adapted from [33]

Designers of a tool AI would build on a neocortex implementation to realize curiosity-driven learning, focusing on learnability and the ability to answer questions. Figure 1 depicts such a tool interacting with its environment via movable effectors and sensors. A tool with the described mechanisms allows the neocortex implementation to learn models efficiently as long as it can send motor commands to control the effectors and sensors. Since the suggested realizations of the mechanisms generate no internal (sub-)goals, there is no existential risk.

Finally, we turn to the second tool goal: answering users’ questions. A neocortex implementation uses the canonical circuit to build, update, and traverse models to answer questions [5, 10]. Again, since the circuit does not create internal (sub-)goals, the tool’s efforts to answer questions do not entail any existential risk. In conclusion, because the two investigated goals do not entail any existential risk, tool AIs in themselves are without existential risk—they will not initiate actions to hurt or kill humanity.

5 Non-existential risks

Remaining tool-associated risks stem from the need to train juvenile tools and from the potential negative impact of users acting on dubious answers from mature tools. This section argues that measures exist to make these risks non-existential and acceptable to stakeholders, including governments, users, designers, developers, and domain experts. Hence, the previous conclusion that tool AIs are without existential risk still holds.

5.1 Risks from training immature tools

So far, the paper has implicitly assumed that tools have been trained to maturity before answering users’ questions. To see that the risks during responsible training are non-existential, we must first outline how the training could occur. Whereas biological processes grow infants’ neocortex [36], new tool AIs have predesigned artificial neocortical structures. Like infants, tool AIs contain rudimentary sensorimotor competencies to learn and memorize objects, beings, and skills [37]. At the same time, tools have minimal knowledge of the world. Children’s emotions impact learning and memory retrieval [38]. In contrast, tools will attempt to learn whatever experts teach them because the neocortex implementation has no emotions and generates no internal goals.

Similarly to infants [19, Ch. 2], [35, Ch. 12], new neocortex-based tools engage in “play” to discover the environment and develop reference frames to control effectors and sensors (see Fig. 1). The tools model themselves and their human teachers to communicate and achieve self-awareness. More mature tool AIs participate in collaborative projects with human domain experts to develop expert-level reference frames, at least on par with the frames of the human experts [5, Ch. 6]. Since tools learn primarily by interacting with humans and the local environment, much learning occurs over many years. The long process of learning natural languages is critical because mature tools must communicate to clarify questions from users and explain answers.

Creating suitable learning environments for tool AIs will require trial and error in the outlined exploratory and interactive tool education. Experts in various fields, including AI, child development, education, and moral philosophy, must develop appropriate teaching material and repeatedly test growing tools to ensure adequate cognitive development. Teachers must retrain or terminate tools with problematic behavior or dubious answers. Each mature tool should have advanced knowledge in one domain when the training ends.

New tools may learn undesirable “unintentional” goals from their teachers. The risks associated with unintentional goals are non-existential and acceptable to all stakeholders during tool growth. The reasons are that immature tools have limited intelligence, experts supervise them in controllable training environments, and stakeholders only act on answers from unfinished tools to rectify problems. However, if the repeated testing of immature tools does not detect all unintentional goals, production systems may end up containing mature tools with undesirable goals, and stakeholders may face unacceptable consequences when acting on their answers.

5.2 Risks due to dubious answers from mature tools

The negative impact of dubious tool answers due to undesirable goals or other unknown reasons could be intolerable to various stakeholders, depending on the domain. Since all tool AIs have cognitive limitations and work with incomplete information about the world, they cannot detect all answers with intolerable impacts. Therefore, it is necessary to create antifragile systems containing tools and human experts to limit the unavoidable risks.

Antifragile systems [39] are complex adaptive systems [40] that use events with a tolerable negative impact to adjust themselves, limit future incidents, and become more reliable in a changing world. The human immune system is an example of an antifragile system, as it becomes stronger from regular exposure to germs. Antifragile systems do not prevent the consequences of all failures caused by inadvertently introduced errors, deliberate attacks, or other causes. Instead, they “embrace failures” and learn how to improve themselves [39, 41]. Because it is impossible to know all the ways complex adaptive systems can fail, antifragile systems must limit the negative impact of incidents even though it is unknown why they happened.

In an antifragile system [39, 41] with human experts and tool AIs, the experts cannot learn how to improve the system and, thus, limit risk solely from naturally occurring dubious answers or other failures. Because natural incidents are rare and unpredictable, they provide too little information about the system. Inspired by Karl Popper’s principle of falsifiability [42], experts need to form hypotheses about the system’s behaviors periodically. The experts must then introduce artificial problems and failures in experiments to attempt to falsify the hypotheses and, thus, learn more about the system’s properties.

Periodic and comprehensive experimentation with all tools is essential to limiting the negative impact of tool answers. Experts in AI and other domains need to experiment regularly on tool AIs by asking questions with malicious intent, altering data, and injecting other failures to determine how the tools cope. The experimentation must occur in the production system since isolated tools behave differently. Interested readers should study Chaos Engineering [43] to learn about hypothesis falsification using experimental and control groups.
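As a schematic illustration only, the Python sketch below runs one such falsification experiment: an experimental group of questions is perturbed while a control group is left intact, and a drop in answer quality beyond a tolerance falsifies the hypothesis that the tool copes with altered input. The tool_answer and score callbacks, the perturbation, and the tolerance are hypothetical placeholders, not part of any Chaos Engineering toolchain.

import random

# Schematic sketch of a falsification experiment on a human-tool system, in
# the spirit of Chaos Engineering [43]. The tool interface, the perturbation,
# and the quality metric are hypothetical placeholders.

def perturb(question: str) -> str:
    """Inject a failure: here, randomly drop words from the question."""
    words = question.split()
    return " ".join(w for w in words if random.random() > 0.2)

def experiment(tool_answer, score, questions, tolerance: float = 0.1) -> bool:
    """Return True if the hypothesis 'the tool copes with altered input'
    survives; False if the quality gap falsifies it."""
    control = [score(tool_answer(q)) for q in questions]
    treated = [score(tool_answer(perturb(q))) for q in questions]
    gap = sum(control) / len(control) - sum(treated) / len(treated)
    return gap <= tolerance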

It is advantageous to limit the number of tools in a human-tool system to enable extensive monitoring of and experimentation on each tool, especially during the first deployment of tools. The selection of domains in which tools answer questions is crucial. For example, providing public access to tools that answer questions on creating pandemics is a bad idea. The literature [41, 43, 44] contains more advice on creating antifragile systems. Since the IT community has much experience using cloud infrastructures to build and maintain antifragile systems [43, 44], it is reasonable to conclude that antifragile human-tool systems with acceptable non-existential risks are possible.

6 Conclusion

The paper has argued that it is theoretically possible to design and grow neocortex-based tools with general AI, expert knowledge in a single domain, and no existential risk. Consequently, rather than building intelligent agents that could wipe out humanity, the research community should focus on creating tool AIs without existential risk (and with acceptable non-existential risks) to enhance and complement human cognitive abilities. Neocortex-based tools with expertise in different domains would enable the safe exploration of general AI’s enormous potential.