Thinking Inside the Box: Controlling and Using an Oracle AI
There is no strong reason to believe that human-level intelligence represents an upper limit on the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed motivation system. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges. In this paper, we analyse and critique various methods of controlling the AI. In general, an Oracle AI might be safer than an unrestricted AI, but it remains potentially dangerous.
Keywords: Artificial intelligence, Superintelligence, Security, Risks, Motivational control, Capability control
We would like to thank Owen Cotton-Barratt, Will Crouch, Katja Grace, Robin Hanson, Lisa Makros, Moshe Looks, Eric Mandelbaum, Toby Ord, Jake Nebel, Owain Evans, Carl Shulman, Anna Salamon, and Eliezer Yudkowsky for their help.