The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents

Abstract

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.

Notes

  1. This is of course not to deny that differences that appear small visually can be functionally profound.

  2. For some recent attempts to defend the Humean theory of motivation, see Smith (1987), Lewis (1988), and Sinhababu (2009).

  3. See also Parfit (2011).

  4. The orthogonality thesis implies that most any combination of final goal and intelligence level is logically possible; it does not imply that it would be practically easy to endow a superintelligent agent with some arbitrary or human-respecting final goal—even if we knew how to construct the intelligence part. For some preliminary notes on the value-loading problem, see, e.g., Dewey (2011) and Yudkowsky (2011).

  5. See Sandberg and Bostrom (2008).

  6. Stephen Omohundro has written two pioneering papers on this topic (Omohundro 2008a, b). Omohundro argues that all advanced AI systems are likely to exhibit a number of “basic drives”, by which he means “tendencies which will be present unless explicitly counteracted.” The term “AI drive” has the advantage of being short and evocative, but it has the disadvantage of suggesting that the instrumental goals to which it refers influence the AI’s decision-making in the same way as psychological drives influence human decision-making, i.e., via a kind of phenomenological tug on our ego which our willpower may occasionally succeed in resisting. That connotation is unhelpful. One would not normally say that a typical human being has a “drive” to fill out their tax return, even though filing taxes may be a fairly convergent instrumental goal for humans in contemporary societies (a goal whose realization averts trouble that would prevent us from realizing many of our final goals). Our treatment here also differs from that of Omohundro in some other, more substantial ways, although the underlying idea is the same. (See also Chalmers (2010) and Omohundro (2012).)

  7. See Chislenko (1997).

  8. See also Shulman (2010).

  9. An agent might also change its goal representation if it changes its ontology, in order to transpose its old representation into the new ontology. Cf. de Blanc (2011).

  10. Another type of factor that might make an evidential decision theorist undertake various actions, including changing its final goals, is the evidential import of deciding to do so. For example, an agent that follows evidential decision theory might believe that there exist other agents like it in the universe, and that its own actions will provide some evidence about how those other agents will act. The agent might therefore choose to adopt a final goal that is altruistic towards those other evidentially-linked agents, on the grounds that this will give the agent evidence that those other agents will have chosen to act in like manner. An equivalent outcome might be obtained, however, without changing one’s final goals, by choosing at each instant to act as if one had those final goals.
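
      A minimal numerical sketch of this evidential reasoning, in Python (the payoffs and the 0.9 correlation figure are hypothetical, chosen only for illustration): the agent ranks each action by the expected utility conditional on its choosing that action, where its own choice is treated as evidence about what the evidentially-linked agents choose.

        # Toy model of an evidential decision theorist (hypothetical numbers).
        payoff = {  # utility to this agent given (its action, the other agent's action)
            ("altruistic", "altruistic"): 8,
            ("altruistic", "selfish"): 0,
            ("selfish", "altruistic"): 10,
            ("selfish", "selfish"): 2,
        }

        def conditional_expected_utility(my_action, correlation=0.9):
            """EU given that the linked agent mirrors my_action with probability `correlation`."""
            flip = "selfish" if my_action == "altruistic" else "altruistic"
            return (correlation * payoff[(my_action, my_action)]
                    + (1 - correlation) * payoff[(my_action, flip)])

        best = max(("altruistic", "selfish"), key=conditional_expected_utility)
        print(best)  # "altruistic": 0.9*8 + 0.1*0 = 7.2 beats 0.9*2 + 0.1*10 = 2.8

      Note that “selfish” is the causally dominant action under these payoffs; the altruistic policy wins only through the evidential link, which is also why, as the note observes, the same outcome can be had by acting as if one had the altruistic goal rather than by actually changing one’s final goal.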

  11. An extensive psychological literature explores adaptive preference formation. See, e.g., Forgas et al. (2009).

  12. In formal models, the value of information is quantified as the difference between the expected value realized by optimal decisions made with that information and the expected value realized by optimal decisions made without it. (See, e.g., Russell and Norvig 2010.) It follows that the value of information is never negative. It also follows that any information you know will never affect any decision you will ever make has zero value for you. However, this kind of model assumes several idealizations which are often invalid in the real world—such as that knowledge has no final value (meaning that knowledge has only instrumental value and is not valuable for its own sake), and that agents are not transparent to other agents.
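
      In the notation of that formal model (a sketch following the standard treatment of the value of perfect information in Russell and Norvig 2010), the definition is

      \[
      \mathrm{VPI}(E_j) \;=\; \sum_{k} P(E_j = e_k)\,\mathrm{EU}\bigl(\alpha_{e_k} \mid E_j = e_k\bigr) \;-\; \mathrm{EU}(\alpha),
      \]

      where \(\alpha\) is the action that is optimal on the agent’s current information and \(\alpha_{e_k}\) the action that is optimal after additionally observing \(E_j = e_k\). Since the agent remains free to choose \(\alpha\) after the observation, the first term is at least \(\sum_k P(E_j = e_k)\,\mathrm{EU}(\alpha \mid E_j = e_k) = \mathrm{EU}(\alpha)\), so \(\mathrm{VPI}(E_j) \ge 0\); and it equals zero exactly when the same action is optimal under every possible observation, which is the sense in which information that will never affect any decision has zero value.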

  13. This strategy is exemplified by the sea squirt larva, which swims about until it finds a suitable rock, to which it then permanently affixes itself. Cemented in place, the larva has less need for complex information processing, whence it proceeds to digest part of its own brain (its cerebral ganglion). Academics can sometimes observe a similar phenomenon in colleagues who are granted tenure.

  14. Cf. Bostrom (2012).

  15. Cf. Bostrom (2006).

  16. One could reverse the question and look instead at possible reasons for a superintelligent singleton not to develop some technological capabilities. These include: (a) The singleton foreseeing that it will have no use for some technological capability; (b) The development cost being too large relative to its anticipated utility. This would be the case if, for instance, the technology would never be suitable for achieving any of the singleton’s ends, or if the singleton had a very high discount rate that strongly discouraged investment; (c) The singleton having some final value that requires abstention from particular avenues of technology development; (d) If the singleton is not certain it will remain stable, it might prefer to refrain from developing technologies that could threaten its internal stability or that would make the consequences of dissolution worse (e.g., a world government may not wish to develop technologies that would facilitate rebellion, even if they had some good uses, nor develop technologies for the easy production of weapons of mass destruction which could wreak havoc if the world government were to dissolve); (e) Similarly, the singleton might have made some kind of binding strategic commitment not to develop some technology, a commitment that remains operative even if it would now be convenient to develop it. (Note, however, that some current reasons for technology development would not apply to a singleton: e.g., reasons arising from unwanted arms races.)

  17. Suppose that an agent discounts resources obtained in the future at an exponential rate, and that because of the light speed limitation the agent can only increase its resource endowment at a polynomial rate. Would this mean that there would be some time after which the agent would not find it worthwhile to continue acquisitive expansion? No, because although the present value of the resources obtained at future times would asymptote to zero the further into the future we look, so would the present cost of obtaining them. The present cost of sending out one more von Neumann probe 100 million years from now (possibly using some resource acquired a short time earlier) would be diminished by the same discount factor that would diminish the present value of the future resources the extra probe would acquire (modulo a constant factor).
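
      To make the cancellation explicit (a sketch, writing the exponential discount factor as \(e^{-\delta t}\) for some discount rate \(\delta > 0\)): suppose that at time \(T\) the agent can pay a then-current cost \(c\) to launch one extra probe, which secures resources of then-current value \(r\) at time \(T + \Delta\). Then

      \[
      \frac{\text{present value of the benefit}}{\text{present value of the cost}}
      \;=\; \frac{e^{-\delta (T+\Delta)}\, r}{e^{-\delta T}\, c}
      \;=\; e^{-\delta \Delta}\,\frac{r}{c},
      \]

      which is independent of \(T\). Both quantities shrink toward zero as \(T\) grows, but their ratio stays fixed (the “modulo a constant factor” above), so an expansion step that is worthwhile now remains worthwhile arbitrarily far into the future.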

  18. Even an agent that has an apparently very limited final goal, such as “to make 32 paperclips”, could pursue unlimited resource acquisition if there were no relevant cost to the agent of doing so. For example, even after an expected-utility-maximizing agent had built 32 paperclips, it could use some extra resources to verify that it had indeed successfully built 32 paperclips meeting all the specifications (and, if necessary, to take corrective action). After it had done so, it could run another batch of tests to make doubly sure that no mistake had been made. And then it could run another test, and another. The benefits of subsequent tests would be subject to steeply diminishing returns; however, so long as there were no alternative action with a higher expected utility, the agent would keep testing and re-testing (and keep acquiring more resources to enable these tests).
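
      A toy sketch of this decision loop, in Python (the goal value, error probability, test reliability, and test cost are all hypothetical): the agent runs one more verification pass whenever the marginal expected gain from doing so exceeds the expected utility of the best alternative action.

        # Hypothetical numbers; what matters is the shape of the decision, not the values.
        GOAL_VALUE = 1000.0       # utility if the 32 paperclips really do meet the specifications
        TEST_RELIABILITY = 0.5    # each pass catches (and lets the agent correct) half of the remaining error mass
        TEST_COST = 1e-9          # utility cost of the resources one pass consumes
        BEST_ALTERNATIVE = 0.0    # expected utility of the best non-testing action

        p_error = 0.01            # current credence that something is still wrong
        passes = 0
        while p_error * TEST_RELIABILITY * GOAL_VALUE - TEST_COST > BEST_ALTERNATIVE:
            p_error *= 1.0 - TEST_RELIABILITY   # steeply diminishing returns on further testing
            passes += 1

        print(passes)  # 33 passes with these numbers; with TEST_COST = 0 and no better
                       # alternative, exact arithmetic would have the agent test forever.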

  19. While the volume reached by colonization probes at a given time might be roughly spherical and expanding with a rate proportional to the square of time elapsed since the first probe was launched (~t²), the amount of resources contained within this volume will follow a less regular growth pattern, since the distribution of resources is inhomogeneous and varies over several scales. Initially, the growth rate might be ~t² as the home planet is colonized; then the growth rate might become spiky as nearby planets and solar systems are colonized; then, as the roughly disc-shaped volume of the Milky Way gets filled out, the growth rate might even out, to be approximately proportional to t; then the growth rate might again become spiky as nearby galaxies are colonized; then the growth rate might again approximate ~t² as expansion proceeds on a scale over which the distribution of galaxies is roughly homogeneous; then another period of spiky growth followed by smooth ~t² growth as galactic superclusters are colonized; until ultimately the growth rate starts a final decline, eventually reaching zero as the expansion speed of the universe accelerates to such an extent as to make further colonization impossible.
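
      The baseline \(\sim t^2\) figure comes from the geometry of the expanding front (a sketch, with \(v\) the probes’ expansion speed and \(\bar\rho(r)\) the mean resource density over the spherical shell at radius \(r\)):

      \[
      V(t) \;=\; \tfrac{4}{3}\pi (v t)^3, \qquad \frac{dV}{dt} \;=\; 4\pi v^3 t^2 \;\propto\; t^2, \qquad \frac{dM}{dt} \;=\; 4\pi (v t)^2\, v\,\bar\rho(v t),
      \]

      where \(M(t)\) is the total resources contained within the colonized ball. The acquisition rate therefore tracks \(t^2\) only on scales over which \(\bar\rho\) is roughly constant; the spiky and flattened regimes described above correspond to scales on which \(\bar\rho(v t)\) varies strongly or falls off (isolated planets and stars, the thin disc of the Milky Way, the voids between galaxies).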

  20. The simulation argument may be of particular importance in this context. A superintelligent agent may assign a significant probability to hypotheses according to which it lives in a computer simulation and its percept sequence is generated by another superintelligence, and this might generate various convergent instrumental reasons depending on the agent’s guesses about what types of simulations it is most likely to be in. Cf. Bostrom (2003).

  21. Human beings might constitute potential threats; they certainly constitute physical resources.

  22. For comments and discussion I am grateful to Stuart Armstrong, Grant Bartley, Owain Evans, Lisa Makros, Luke Muehlhauser, Toby Ord, Brian Rabkin, Rebecca Roache, Anders Sandberg, and three anonymous referees.

References

  • Bostrom, N. (2003). Are you living in a computer simulation? Philosophical Quarterly, 53(211), 243–255.

  • Bostrom, N. (2006). What is a singleton? Linguistic and Philosophical Investigations, 5(2), 48–54.

  • Bostrom, N. (2012). Information hazards: A typology of potential harms from knowledge. Review of Contemporary Philosophy, 10, 44–79. (www.nickbostrom.com/information-hazards.pdf).

  • Chalmers, D. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17, 7–65.

  • Chislenko, A. (1997). Technology as extension of human functional architecture. Extropy Online. (project.cyberpunk.ru/idb/technology_as_extension.html).

  • de Blanc, P. (2011). Ontological crises in artificial agents’ value systems. Manuscript. The Singularity Institute for Artificial Intelligence. (arxiv.org/pdf/1105.3821v1.pdf).

  • Dewey, D. (2011). Learning what to value. In J. Schmidhuber, K. R. Thorisson, & M. Looks (Eds.), Proceedings of the 4th conference on artificial general intelligence, AGI 2011 (pp. 309–314). Heidelberg: Springer.

  • Forgas, J., et al. (Eds.). (2009). The psychology of attitudes and attitude change. London: Psychology Press.

  • Lewis, D. (1988). Desire as belief. Mind, 97(387), 323–332.

  • Omohundro, S. (2008a). The basic AI drives. In P. Wang, B. Goertzel, & S. Franklin (Eds.), Proceedings of the first AGI conference. Frontiers in artificial intelligence and applications (Vol. 171). Amsterdam: IOS Press.

  • Omohundro, S. (2008b). The nature of self-improving artificial intelligence. Manuscript. (selfawaresystems.files.wordpress.com/2008/01/nature_of_self_improving_ai.pdf).

  • Omohundro, S. (2012). Rationally-shaped artificial intelligence. In A. Eden, et al. (Eds.), The singularity hypothesis: A scientific and philosophical assessment. Springer (forthcoming).

  • Parfit, D. (1984). Reasons and persons (pp. 123–124; reprinted and corrected edition, 1987). Oxford: Oxford University Press.

  • Parfit, D. (2011). On what matters. Oxford: Oxford University Press.

  • Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). New Jersey: Prentice Hall.

  • Sandberg, A., & Bostrom, N. (2008). Whole brain emulation: A roadmap. Technical Report 2008–3. Oxford: Future of Humanity Institute, Oxford University. (www.fhi.ox.ac.uk/Reports/2008-3.pdf).

  • Shulman, C. (2010). Omohundro’s “basic AI drives” and catastrophic risks. Manuscript. (singinst.org/upload/ai-resource-drives.pdf).

  • Sinhababu, N. (2009). The Humean theory of motivation reformulated and defended. Philosophical Review, 118(4), 465–500.

  • Smith, M. (1987). The Humean theory of motivation. Mind, 96(381), 36–61.

  • Weizenbaum, J. (1976). Computer power and human reason: From judgment to calculation. San Francisco: W. H. Freeman.

  • Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom, & M. Cirkovic (Eds.), Global catastrophic risks. (pp. 308–345; quote from p. 310). Oxford: Oxford University Press.

  • Yudkowsky, E. (2011). Complex value systems are required to realize valuable futures. In J. Schmidhuber, K. R. Thorisson, & M. Looks (Eds.), Proceedings of the 4th conference on artificial general intelligence, AGI 2011 (pp. 388–393). Heidelberg: Springer.

Author information

Corresponding author

Correspondence to Nick Bostrom.

Cite this article

Bostrom, N. The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. Minds & Machines 22, 71–85 (2012). https://doi.org/10.1007/s11023-012-9281-3
