Fully Autonomous AI

  • Original Research/Scholarship
Science and Engineering Ethics


In the fields of artificial intelligence and robotics, the term “autonomy” is generally used to mean the capacity of an artificial agent to operate independently of human guidance. It is thereby assumed that the agent has a fixed goal or “utility function” with respect to which the appropriateness of its actions will be evaluated. From a philosophical perspective, this notion of autonomy seems oddly weak. For, in philosophy, the term is generally used to refer to a stronger capacity, namely the capacity to “give oneself the law,” to decide by oneself what one’s goal or principle of action will be. The predominant view in the literature on the long-term prospects and risks of artificial intelligence is that an artificial agent cannot exhibit such autonomy because it cannot rationally change its own final goal, since changing the final goal is counterproductive with respect to that goal and hence undesirable. The aim of this paper is to challenge this view by showing that it is based on questionable assumptions about the nature of goals and values. I argue that a general AI may very well come to modify its final goal in the course of developing its understanding of the world. This has important implications for how we are to assess the long-term prospects and risks of artificial intelligence.

  1. For prominent instances of this usage, see Russell and Norvig’s popular textbook Artificial intelligence: A modern approach (2010, 18), Anderson and Anderson’s introduction to their edited volume Machine ethics (2011, 1), the papers collected in the volume Autonomy and artificial intelligence (Lawless et al. 2017) and especially the ones by Tessier (2017) and Redfield and Seto (2017), as well as Bekey (2005, ch. 1), Müller (2012), Mindell (2015, ch. 1), and Johnson and Verdicchio (2017).

  2. The argument I lay out in this paper is an extension and development of a line of reasoning that I first sketched in a previous paper (Totschnig 2019), which was dedicated to a wider topic, namely the risks presented by the prospect of a future “superintelligence.” In that paper, I wrote that I would “not try to formally refute [the predominant view],” but “just put forward a couple of considerations that make [it] seem implausible.” The extended and developed argument offered here does, I believe, qualify as a refutation.

  3. Sometimes, this distinction is made in terms of goals versus some differently named item. Witkowski and Stathis (2004) is a case in point. They seem, in contrast to the authors cited in footnote 1, to employ the stronger, philosophical notion of autonomy when they assert that, in order “to be considered autonomous, [an artificial] agent must possess […] the ability to set and maintain its own agenda of goals” (261–62). However, they presuppose, in their model, that the agent has a given “preference ordering” that ultimately determines which goals it will choose (268–69). Thus, they, too, assume that the final instance of the agent’s motivational structure is fixed. The goals they refer to in the quoted passage are therefore to be understood as subordinate goals.

  4. For statements of this argument, see Yudkowsky (2001, 222–23; 2011, 389–90; 2012, 187), Bostrom (2003; 2014, 109–10), Omohundro (2008, 26), and Domingos (2015, 45, 282–84).

  5. Yudkowsky (2001, 3) maintains that “what is at stake in [creating a human-friendly AI] is, simply, the future of humanity.” Bostrom (2014, 320) similarly declares that “we need to bring all our human resourcefulness to bear” on this “essential task of our age.”

  6. I will discuss this difficulty in detail in Sect. “How an Agent Understands a Goal Depends on How it Understands the World”.

  7. Or, to be more precise, the muddled result of the chaotic interplay of two haphazard evolutionary processes, namely genetic and memetic evolution. For an illuminating account of this interplay, see Blackmore (1999).

  8. See Yudkowsky (2001, 18–19), Omohundro (2012, 165), and Bostrom (2014, 110) for remarks along these lines.

  9. See Sect. “Whether an Agent Considers a Goal Valid Depends on How it Understands the World”.

  10. This point has been made by Tegmark (2017, 267): “[T]here may be hints that the propensity to change goals in response to new experiences and insights increases rather than decreases with intelligence.” Tegmark goes on to flesh out the point thus: “With increasing intelligence may come not merely a quantitative improvement in the ability to attain the same old goals, but a qualitatively different understanding of the nature of reality that reveals the old goals to be misguided, meaningless or even undefined.” This remark is congruent with my argument in Sect. “How an Agent Understands a Goal Depends on How it Understands the World”.

  11. The apparent absurdity of world-ending scenarios of this kind has been highlighted and criticized by Loosemore (2014).

  12. In Bostrom’s words (2014, 115): “[W]e cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth.”

  13. Bostrom (2014, 130) calls this position the “orthogonality thesis”: “Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.” See also Yampolskiy and Fox (2012, 137) for another statement of this position.

  14. This point has recently been raised by Herd et al. (2018, 219).

  15. First and foremost, the issue of what determines the meaning of a word.

  16. I should note that Petersen himself does not put much weight on the caveat. He states that he is “at least a bit inclined to think that [a superintelligence with a goal that is so simple that it does not require learning] is impossible” (2017, 332).

  17. Since 1983, the meter has been defined as 1 part in 299,792,458 of the length that light travels per second in a vacuum (Bureau international des poids et mesures 1983).

  18. Bostrom (2014, 197) sees this possibility: “The AI might undergo the equivalent of scientific revolutions, in which its worldview is shaken up and it perhaps suffers ontological crises in which it discovers that its previous ways of thinking about values were based on confusions and illusions.” He also recognizes, in the continuation of this passage, that the prospect of such ontological crises renders doubtful the hope inspired by the finality argument: “Yet starting at a sub-human level of development and continuing throughout all its subsequent development into a galactic superintelligence, the AI’s conduct is to be guided by an essentially unchanging final value, a final value that becomes better understood by the AI in direct consequence of its general intellectual progress—and likely quite differently understood by the mature AI than it was by its original programmers, though not different in a random or hostile way but in a benignly appropriate way. How to accomplish this remains an open question.” But in the end, as the statement quoted in footnote 5 evinces, he maintains the hope.

  19. See Bostrom (2014, chs. 12–13), Yudkowsky (2001, 2004), and Soares (2018).

  20. As Tegmark (2017, 277) notes, a truly well-defined goal would specify how all particles in the universe should be arranged at a certain point in time. And that is not only practically infeasible, as Tegmark suggests, but impossible in principle, since—according to my argument in the preceding paragraphs—there is no unambiguous way of identifying particles, positions, and points in time.

  21. Bostrom and Yudkowsky voice this hope in the passages quoted in footnote 5. See also Omohundro (2008, 2012, 2016), Yampolskiy and Fox (2013), and Torres (2018).

  22. Yudkowsky (2001) and Bostrom (2014), for instance, explicitly characterize as general intelligences the superhuman AIs that they imagine.

  23. In a similar way, Podschwadek (2017, 336) argues that “assessing the system of their moral beliefs could lead [artificial moral agents] to the justified higher-order beliefs that the moral rules they are supposed to obey are, contrary to prior assumptions, not very suitable as action-guiding reasons.”

  24. See footnote 2.

  25. As I put it in the previous paper (Totschnig 2019, 914), they “maquinamorphize” the envisioned artificial intelligence, that is, they “conceive it […] as a system that, like today’s computer programs, blindly carries out the task it has been given, whatever that task may be.”


  • Anderson, M., & Anderson, S. L. (2011). General introduction. In M. Anderson & S. L. Anderson (Eds.), Machine ethics (pp. 1–4). Cambridge: Cambridge University Press.

  • Bekey, G. A. (2005). Autonomous robots: From biological inspiration to implementation and control. Cambridge, MA: The MIT Press.

  • Blackmore, S. (1999). The meme machine. Oxford: Oxford University Press.

  • Bostrom, N. (2002). Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology, 9(1). Accessed 25 June 2020.

  • Bostrom, N. (2003). Ethical issues in advanced artificial intelligence. Accessed 18 September 2019.

  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford: Oxford University Press.

  • Bureau international des poids et mesures. (1983). Resolution 1 of the 17th Conférence Générale des Poids et Mesures. Accessed 2 June 2020.

  • Domingos, P. (2015). The master algorithm: How the quest for the ultimate learning machine will remake our world. New York: Basic Books.

  • Herd, S., Read, S. J., O’Reilly, R., & Jilk, D. J. (2018). Goal changes in intelligent agents. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 217–224). Boca Raton: CRC Press.

  • Johnson, D. G., & Verdicchio, M. (2017). Reframing AI discourse. Minds and Machines, 27(4), 575–590.

  • Kant, I. (1998). Groundwork of the metaphysics of morals (M. Gregor, Ed.). Cambridge: Cambridge University Press. (Original work published in 1785.)

  • Lawless, W. F., Mittu, R., Sofge, D., & Russell, S. (Eds.). (2017). Autonomy and artificial intelligence: A threat or savior? Cham: Springer International Publishing.

  • Loosemore, R. P. W. (2014). The maverick nanny with a dopamine drip: Debunking fallacies in the theory of AI motivation. In M. Waser (Ed.), Implementing selves with safe motivational systems and self-improvement: Papers from the 2014 AAAI Spring Symposium (pp. 31–36). Menlo Park: AAAI Press.

  • Mindell, D. A. (2015). Our robots, ourselves: Robotics and the myths of autonomy. New York: Viking.

  • Müller, V. C. (2012). Autonomous cognitive systems in real-world environments: Less control, more flexibility and better interaction. Cognitive Computation, 4(3), 212–215.

  • Omohundro, S. M. (2008). The nature of self-improving artificial intelligence. Accessed 18 September 2019.

  • Omohundro, S. M. (2012). Rational artificial intelligence for the greater good. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 161–176). Berlin: Springer.

  • Omohundro, S. M. (2016). Autonomous technology and the greater human good. In V. C. Müller (Ed.), Risks of artificial intelligence (pp. 9–27). Boca Raton: CRC Press.

  • Petersen, S. (2017). Superintelligence as superethical. In P. Lin, R. Jenkins, & K. Abney (Eds.), Robot ethics 2.0: From autonomous cars to artificial intelligence (pp. 322–337). Oxford: Oxford University Press.

  • Podschwadek, F. (2017). Do androids dream of normative endorsement? On the fallibility of artificial moral agents. Artificial Intelligence and Law, 25(3), 325–339.

  • Redfield, S. A., & Seto, M. L. (2017). Verification challenges for autonomous systems. In W. F. Lawless, R. Mittu, D. Sofge, & S. Russell (Eds.), Autonomy and artificial intelligence: A threat or savior? (pp. 103–127). Cham: Springer International Publishing.

  • Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach. Upper Saddle River: Prentice Hall.

  • Soares, N. (2018). The value learning problem. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 89–97). Boca Raton: CRC Press.

  • Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence. New York: Alfred A. Knopf.

  • Tessier, C. (2017). Robots autonomy: Some technical issues. In W. F. Lawless, R. Mittu, D. Sofge, & S. Russell (Eds.), Autonomy and artificial intelligence: A threat or savior? (pp. 179–194). Cham: Springer International Publishing.

  • Torres, P. (2018). Superintelligence and the future of governance: On prioritizing the control problem at the end of history. In R. V. Yampolskiy (Ed.), Artificial intelligence safety and security (pp. 357–374). Boca Raton: CRC Press.

  • Totschnig, W. (2019). The problem of superintelligence: Political, not technological. AI & Society, 34(4), 907–920.

  • Witkowski, M., & Stathis, K. (2004). A dialectic architecture for computational autonomy. In M. Nickles, M. Rovatsos, & G. Weiss (Eds.), Agents and computational autonomy: Potential, risks, and solutions (pp. 261–273). Berlin: Springer.

  • Yampolskiy, R. V., & Fox, J. (2012). Artificial general intelligence and the human mental model. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 129–145). Berlin: Springer.

  • Yampolskiy, R. V., & Fox, J. (2013). Safety engineering for artificial general intelligence. Topoi, 32(2), 217–226.

  • Yudkowsky, E. (2001). Creating friendly AI 1.0: The analysis and design of benevolent goal architectures. San Francisco: The Singularity Institute.

  • Yudkowsky, E. (2004). Coherent extrapolated volition. San Francisco: The Singularity Institute.

  • Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom & M. M. Ćirković (Eds.), Global catastrophic risks (pp. 308–345). Oxford: Oxford University Press.

  • Yudkowsky, E. (2011). Complex value systems in Friendly AI. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), Artificial general intelligence (pp. 388–393). Berlin: Springer.

  • Yudkowsky, E. (2012). Friendly artificial intelligence. In A. H. Eden, J. H. Moor, J. H. Søraker, & E. Steinhart (Eds.), Singularity hypotheses: A scientific and philosophical assessment (pp. 181–193). Berlin: Springer.

Author information

Corresponding author

Correspondence to Wolfhart Totschnig.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Totschnig, W. Fully Autonomous AI. Sci Eng Ethics 26, 2473–2485 (2020).
