Minds and Machines, Volume 22, Issue 4, pp 299–324

Thinking Inside the Box: Controlling and Using an Oracle AI

  • Stuart Armstrong
  • Anders Sandberg
  • Nick Bostrom


There is no strong reason to believe that human-level intelligence represents an upper limit of the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed motivation system. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges. In this paper, we analyse and critique various methods of controlling the AI. In general an Oracle AI might be safer than unrestricted AI, but still remains potentially dangerous.


Artificial intelligence, Superintelligence, Security, Risks, Motivational control, Capability control



We would like to thank Owen Cotton-Barratt, Will Crouch, Katja Grace, Robin Hanson, Lisa Makros, Moshe Looks, Eric Mandelbaum, Toby Ord, Jake Nebel, Owain Evans, Carl Shulman, Anna Salamon, and Eliezer Yudkowsky for their help.



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Stuart Armstrong (1)
  • Anders Sandberg (1)
  • Nick Bostrom (1)
  1. Future of Humanity Institute, Faculty of Philosophy, University of Oxford, Oxford, UK
