The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents
First Online: 13 June 2012
Received: 31 January 2012
Accepted: 18 May 2012
Cite this article as: Bostrom, N. Minds & Machines (2012) 22: 71. doi:10.1007/s11023-012-9281-3
Abstract
This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary: more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.
Keywords: Superintelligence, Artificial intelligence, AI, Goal, Instrumental reason, Intelligent agent
© Springer Science+Business Media B.V. 2012