Thinking Inside the Box: Controlling and Using an Oracle AI

Abstract

There is no strong reason to believe that human-level intelligence represents an upper limit on the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed motivation system. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges. In this paper, we analyse and critique various methods of controlling the AI. In general, an Oracle AI might be safer than an unrestricted AI, but it remains potentially dangerous.

Notes

  1. It is generally argued that intelligence and motivation are orthogonal: high intelligence is no guarantee of safe motives (Bostrom 2012b).

  2. Humans have many preferences (survival, autonomy, hedonistic pleasure, overcoming challenges, satisfactory interactions, and countless others), and we want them all satisfied to some extent. But a randomly chosen motivation would completely disregard half of these preferences (in fact it would disregard far more, since these preferences are highly complex to define: we wouldn’t want any of the possible ‘partial survival’ motivations, for instance).

  3. Friendliness should not be interpreted here as social or emotional friendliness, but simply as shorthand for whatever behavioural or motivational constraints keep a superintelligent system from deliberately or accidentally harming humans.

  4. Another common term is “AI-in-a-box”.

  5. A term coined by Fanya Montalvo (Mallery 1988) by analogy to the mathematical concept of NP-completeness: a problem is AI-complete if an AI capable of solving it would reasonably also be able to solve all major outstanding problems in AI.

  6. Or whatever counterpart might apply to an AI.

  7. Though there have been some attempts to formalise ontology changes, such as de Blanc (2011).

  8. Overfitting in this way is a common worry in supervised learning methods.

References

  • Anderson, M., & Anderson, S. L. (2011). Machine ethics. Cambridge: Cambridge University Press.

  • Armstrong, S. (2010). Utility indifference. Technical Report. Future of Humanity Institute, Oxford University, no. 2010-1.

  • Asimov, I. (1942). Runaround. In Street & Smith (Eds.). Astounding Science Fiction.

  • Bostrom, N. (2003a). Are you living in a computer simulation? Philosophical Quarterly, 53(211), 243–255.

  • Bostrom, N. (2003b). Ethical issues in advanced artificial intelligence. Cognitive, Emotive and Ethical Aspects of Decision Making in Humans, 2, 12–17.

  • Bostrom, N. (2001). Existential risks: Analyzing human extinction scenarios and related hazards. Journal of Evolution and Technology, 9.

  • Bostrom, N. (2011). Information hazards: A typology of potential harms from knowledge. Review of Contemporary Philosophy, 10, 44–79.

  • Bostrom, N. (2000). Predictions from philosophy? Coloquia Manilana (PDCIS), 7.

  • Bostrom, N. (2004). The future of human evolution. In C. Tandy (Ed.), Death and anti-death: Two hundred years after Kant, fifty years after Turing (pp. 339–371). Palo Alto, California: Ria University Press.

  • Bostrom, N. (2012a). Superintelligence: An analysis of the coming machine intelligence revolution (in preparation).

  • Bostrom, N. (2012b). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines (Forthcoming).

  • Caplan, B. (2008). The totalitarian threat. In N. Bostrom & M. Cirkovic (Eds.), Global catastrophic risks (pp. 504–519). Oxford: Oxford University Press.

  • Chalmers, D. J. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17, 7–65.

  • de Blanc, P. (2011). Ontological crises in artificial agents’ value systems. arXiv:1105.3821v1 [cs.AI].

  • Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6, 31–83.

  • Hall, J. S. (2007). Beyond AI: Creating the conscience of the machine. Amherst, NY: Prometheus Books.

  • Hanson, R. (2001). Economic growth given machine intelligence. Journal of Artificial Intelligence Research.

  • Idel, M. (1990). Golem: Jewish magical and mystical traditions on the artificial anthropoid. Albany, New York: State University of New York Press.

  • Kaas, S., Rayhawk, S., Salamon, A., & Salamon, P. (2010). Economic implications of software minds.

  • Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgement under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

  • Kurzweil, R. (2005). The singularity is near. New York, NY: Penguin Group.

  • Mallery, J. C. (1988). Thinking about foreign policy: Finding an appropriate role for artificial intelligence computers. Cambridge, MA: MIT Political Science Department.

  • McCarthy, J., Minsky, M., Rochester, N., & Shannon, C. (1956). Dartmouth Summer Research Conference on Artificial Intelligence.

  • Omohundro, S. (2008). The basic AI drives. In P. Wang, B. Goertzel, & S. Franklin (Eds.), Proceedings of the First AGI Conference. Frontiers in Artificial Intelligence and Applications (Vol. 171). IOS Press.

  • Ord, T., Hillerbrand, R., & Sandberg, A. (2010). Probing the improbable: Methodological challenges for risks with low probabilities and high stakes. Journal of Risk Research, 13, 191–205.

  • Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

  • Sandberg, A. (2001). Friendly superintelligence. Presentation at Extro 5 conference, 15–17 June 2001.

  • Schelling, T. (1960). The strategy of conflict. Cambridge, MA: Harvard University Press.

  • Shulman, C. (2010). Omohundro’s “basic AI drives” and catastrophic risks.

  • Simon, H. A. (1965). The shape of automation for men and management. New York: Harper & Row.

  • Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

  • von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.

  • Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom & M. Cirkovic (Eds.), Global catastrophic risks (pp. 308–345). Oxford: Oxford University Press.

  • Yudkowsky, E. (2001a). Creating friendly AI. Singularity Institute.

  • Yudkowsky, E. (2001b). Friendly AI 0.9. Singularity Institute.

  • Yudkowsky, E. (2001c). General intelligence and seed AI 2.3.

  • Yudkowsky, E. (2009). Paperclip maximiser. Less Wrong.

  • Yudkowsky, E. (2002). The AI-box experiment. Singularity Institute.

Acknowledgements

We would like to thank and acknowledge the help of Owen Cotton-Barratt, Will Crouch, Katja Grace, Robin Hanson, Lisa Makros, Moshe Looks, Eric Mandelbaum, Toby Ord, Jake Nebel, Owain Evans, Carl Shulman, Anna Salamon, and Eliezer Yudkowsky.

Author information

Corresponding author

Correspondence to Stuart Armstrong.

About this article

Cite this article

Armstrong, S., Sandberg, A. & Bostrom, N. Thinking Inside the Box: Controlling and Using an Oracle AI. Minds & Machines 22, 299–324 (2012).

Keywords

  • Artificial intelligence
  • Superintelligence
  • Security
  • Risks
  • Motivational control
  • Capability control