The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents
This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.
Keywords: Superintelligence, Artificial intelligence, AI, Goal, Instrumental reason, Intelligent agent