There is no strong reason to believe that human-level intelligence represents an upper limit of the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed motivation system. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges. In this paper, we analyse and critique various methods of controlling the AI. In general an Oracle AI might be safer than unrestricted AI, but still remains potentially dangerous.
It is generally argued that intelligence and motivation are orthogonal, that a high intelligence is no guarantor of safe motives (Bostrom 2012b).
Humans have many preferences—survival, autonomy, hedonistic pleasure, overcoming challenges, satisfactory interactions, and countless others—and we want them all satisfied, to some extent. But a randomly chosen motivation would completely disregard half of these preferences (actually it would disregard much more, as these preferences are highly complex to define—we wouldn’t want any of the possible ‘partial survival’ motivations, for instance).
Friendliness should not be interpreted here as social or emotional friendliness, but simply as shorthand for whatever behavioural or motivational constraints keep a superintelligent system from deliberately or accidentally harming humans.
Another common term is “AI-in-a-box”.
A term coined by Fanya Montalvo (Mallery 1988) by analogy to the mathematical concept of NP-completeness: a problem is AI-complete if an AI capable of solving it would reasonably also be able to solve all major outstanding problems in AI.
Or whatever counterpart might apply to an AI.
Though there have been some attempts to formalise ontology changes, such as (de Blanc 2011).
Overfitting in this way is a common worry in supervised learning methods.
Anderson, M., & Anderson, S. L. (2011). Machine ethics. Cambridge: Cambridge University Press.
Armstrong, S. (2010). Utility indifference. Technical Report. Future of Humanity Institute, Oxford University, no. 2010-1.
Asimov, I. (1942). Runaround. Astounding Science Fiction. New York: Street & Smith.
Bostrom, N. (2003a). Are you living in a computer simulation? Philosophical Quarterly, 53(211), 243–255.
Bostrom, N. (2003b). Ethical issues in advanced artificial intelligence. Cognitive, Emotive and Ethical Aspects of Decision Making in Humans 2, 12–17.
Bostrom, N. (2001). Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology 9.
Bostrom, N. (2011). Information hazards: A typology of potential harms from knowledge. Review of Contemporary Philosophy, 10, 44–79.
Bostrom, N. (2000). Predictions from philosophy? Coloquia Manilana (PDCIS) 7.
Bostrom, N. (2004). The future of human evolution. In C. Tandy (Ed.), Death and anti-death: Two hundred years after Kant, fifty years after Turing (pp. 339–371). Palo Alto, California: Ria University Press.
Bostrom, N. (2012a). Superintelligence: An analysis of the coming machine intelligence revolution (in preparation).
Bostrom, N. (2012b). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines (Forthcoming).
Caplan, B. (2008). The totalitarian threat. In N. Bostrom & M. Cirkovic (Eds.), Global catastrophic risks (pp. 504–519). Oxford: Oxford University Press.
Chalmers, D. J. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17, 7–65.
de Blanc, P. (2011). Ontological Crises in Artificial Agents’ Value Systems. arXiv:1105.3821v1 [cs.AI].
Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6, 31–83.
Hall, J. S. (2007). Beyond AI: Creating the conscience of the machine. Amherst, NY: Prometheus Books.
Hanson, R. (2001). Economic growth given machine intelligence. Journal of Artificial Intelligence Research.
Idel, M. (1990). Golem: Jewish magical and mystical traditions on the artificial anthropoid. Albany, New York: State University of New York Press.
Kaas, S., Rayhawk, S., Salamon, A., & Salamon, P. (2010). Economic implications of software minds. http://www.sci.sdsu.edu/~salamon/SoftwareMinds.pdf.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgement under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Kurzweil, R. (2005). The singularity is near. New York, NY: Penguin Group.
Mallery, J. C. (1988). Thinking about foreign policy: Finding an appropriate role for artificial intelligence computers. Cambridge, MA: MIT Political Science Department.
McCarthy, J., Minsky M., Rochester N., & Shannon, C. (1956). Dartmouth Summer Research Conference on Artificial Intelligence.
Omohundro, S. (2008). The basic AI drives. In P. Wang, B. Goertzel, & S. Franklin (Eds.), Proceedings of the First AGI Conference (Frontiers in Artificial Intelligence and Applications, Vol. 171). Amsterdam: IOS Press.
Ord, T., Hillerbrand, R., & Sandberg, A. (2010). Probing the improbable: Methodological challenges for risks with low probabilities and high stakes. Journal of Risk Research, 13, 191–205.
Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Sandberg, A. (2001). Friendly superintelligence. Presentation at Extro 5 conference, 15–17 June 2001. http://www.nada.kth.se/~asa/Extro5/Friendly%20Superintelligence.htm.
Schelling, T. (1960). The strategy of conflict. Massachusetts: Harvard University Press.
Shulman, C. (2010). Omohundro’s “basic AI drives” and catastrophic risks. http://singinst.org/upload/ai-resource-drives.pdf.
Simon, H. A. (1965). The shape of automation for men and management. New York: Harper & Row.
Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom & M. Cirkovic (Eds.), Global catastrophic risks (pp. 308–345). Oxford: Oxford University Press.
Yudkowsky, E. (2001a). Creating friendly AI. Singularity Institute. http://singinst.org/CFAI/.
Yudkowsky, E. (2001b). Friendly AI 0.9. Singularity Institute. http://singinst.org/CaTAI/friendly/contents.html.
Yudkowsky, E. (2001c). General intelligence and seed AI 2.3. http://singinst.org/ourresearch/publications/GISAI/.
Yudkowsky, E. (2009). Paperclip maximiser. Less Wrong. http://wiki.lesswrong.com/wiki/Paperclip_maximizer.
Yudkowsky, E. (2002). The AI-box experiment. Singularity Institute. http://yudkowsky.net/singularity/aibox.
We would like to thank Owen Cotton-Barratt, Will Crouch, Katja Grace, Robin Hanson, Lisa Makros, Moshe Looks, Eric Mandelbaum, Toby Ord, Jake Nebel, Owain Evans, Carl Shulman, Anna Salamon, and Eliezer Yudkowsky for their help.
Armstrong, S., Sandberg, A. & Bostrom, N. Thinking Inside the Box: Controlling and Using an Oracle AI. Minds & Machines 22, 299–324 (2012). https://doi.org/10.1007/s11023-012-9282-2
Keywords: Artificial intelligence · Motivational control · Capability control