Abstract
Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.
Notes
The term AGI can also refer more narrowly to engineered AI, in contrast to those derived from the human model, such as emulated or uploaded brains (Goertzel and Pennachin 2007). In this article, unless specified otherwise, we use AI and AGI to refer to artificial general intelligences in the broader sense.
References
Allen C, Varner G, Zinser J (2000) Prolegomena to any future artificial moral agent. J Exp Theor Artif Intell 12:251–261
Allen C, Smit I, Wallach W (2005) Artificial morality: top-down, bottom-up, and hybrid approaches. Ethics Inf Technol 7(3):149–155
Allen C, Wallach W, Smit I (2006) Why machine ethics? IEEE Intell Syst 21(4):12–17
Anderson M, Anderson SL (2007) Machine ethics: creating an ethical intelligent agent. AI Mag 28(4):15–26
Arneson RJ (1999) What, if anything, renders all humans morally equal? In: Jamieson D (ed) Peter singer and his critics. Blackwell, Oxford
Asimov I (1942) Runaround. In: Astounding science fiction, March, pp 94–103
Berg P, Baltimore D, Brenner S, Roblin RO, Singer MF (1975) Summary statement of the Asilomar conference on recombinant DNA molecules. Proc Natl Acad Sci USA 72(6):1981–1984
Bishop M (2009) Why computers can’t feel pain. Mind Mach 19(4):507–516
Bostrom N (2002) Existential risks: analyzing human extinction scenarios and related hazards. J Evol Technol 9(1)
Bostrom N (2006) How long before superintelligence? Linguist Philos Investig 5(1):11–30
Butler S (1863) Darwin among the machines, letter to the Editor. The Press, Christchurch, New Zealand, 13 June 1863
Butler S (1970/1872) Erewhon: or, over the range. Penguin, London
Chalmers DJ (2010) The singularity: a philosophical analysis. J Conscious Stud 17:7–65
Churchland PS (2011) Brain trust. Princeton University Press, Princeton
Clarke R (1993) Asimov’s laws of robotics: implications for information technology, part 1. IEEE Comput 26(12):53–61
Clarke R (1994) Asimov’s laws of robotics: implications for information technology, part 2. IEEE Comput 27(1):57–66
de Garis H (2005) The artilect war: Cosmists versus Terrans. ETC Publications, Palm Springs
Dennett DC (1978) Why you can’t make a computer that feels pain. Synthese 38(3):415–456
Drescher G (2006) Good and real: demystifying paradoxes from physics to ethics. MIT Press, Cambridge
Drexler E (1986) Engines of creation. Anchor Press, New York
Fox J (2011) Morality and super-optimizers. Paper presented at the Future of Humanity Conference, 24 Oct 2011, Van Leer Institute, Jerusalem
Fox J, Shulman C (2010) Superintelligence does not imply benevolence. In: Mainzer K (ed) Proceedings of the VIII European conference on computing and philosophy. Verlag Dr. Hut, Munich
Gauthier D (1986) Morals by agreement. Oxford University Press, Oxford
Gavrilova M, Yampolskiy R (2011) Applying biometric principles to avatar recognition. Trans Comput Sci XII:140–158
Goertzel B (2011) Does humanity need an AI nanny? H+ Magazine, 17 Aug 2011
Goertzel B, Pennachin C (eds) (2007) Essentials of general intelligence: the direct path to artificial general intelligence. Springer, Berlin
Good IJ (1965) Speculations concerning the first ultraintelligent machine. Adv Comput 6:31–88
Gordon DF (1998) Well-behaved Borgs, bolos, and berserkers. Paper presented at the 15th International Conference on Machine Learning (ICML98), San Francisco, CA
Gordon-Spears DF (2003) Asimov’s laws: current progress. Lect Notes Comput Sci 2699:257–259
Gordon-Spears DF (2005) Assuring the behavior of adaptive agents. In: Hinchey M, Rash J, Truszkowski W, Gordon-Spears DF, Rouff C (eds) Agent technology from a formal perspective. Kluwer, Amsterdam, pp 227–259
Grau C (2006) There is no “I” in “Robot”: robots and utilitarianism. IEEE Intell Syst 21(4):52–55
Guo S, Zhang G (2009) Robot rights. Science 323(5916):876
Hall JS (2007a) Beyond AI: creating the conscience of the machine. Prometheus, Amherst
Hall JS (2007b) Self-improving AI: an analysis. Mind Mach 17(3):249–259
Hanson R (2010) Prefer law to values. Overcoming Bias, 10 Oct 2010. Retrieved 15 Jan 2012, from http://www.overcomingbias.com/2009/10/prefer-law-to-values.html
Hobbes T (1998/1651) Leviathan. Oxford University Press, Oxford
Hutter M (2005) Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Berlin
Joy B (2000) Why the future doesn’t need us. Wired Magazine 8(4), April 2000
Kaczynski T (1995) Industrial society and its future. The New York Times, 19 Sep 1995
Kurzweil R (2006) The singularity is near: when humans transcend biology. Penguin, New York
LaChat MR (1986) Artificial intelligence and ethics: an exercise in the moral imagination. AI Mag 7(2):70–79
Legg S (2006) Unprovability of Friendly AI. Vetta Project, 15 Sep 2006. Retrieved 15 Jan 2012, from http://www.vetta.org/2006/09/unprovability-of-friendly-ai/
Legg S, Hutter M (2007) Universal intelligence: a definition of machine intelligence. Mind Mach 17(4):391–444
Lin P, Abney K, Bekey G (2011) Robot ethics: mapping the issues for a mechanized world. Artif Intell 175(5–6):942–949
McCauley L (2007) AI Armageddon and the three laws of robotics. Ethics Inf Technol 9(2):153–164
McDermott D (2008) Why ethics is a high hurdle for AI. Paper presented at the North American Conference on Computers and Philosophy, Bloomington, IN
Moor JH (2006) The nature, importance, and difficulty of machine ethics. IEEE Intell Syst 21(4):18–21
Omohundro SM (2008) The basic AI drives. In: Wang P, Goertzel B, Franklin S (eds) The proceedings of the first AGI conference. IOS Press, Amsterdam, pp 483–492
Pierce MA, Henry JW (1996) Computer ethics: the role of personal, informal, and formal codes. J Bus Ethics 14(4):425–437
Powers TM (2006) Prospects for a Kantian machine. IEEE Intell Syst 21(4):46–51
Pynadath DV, Tambe M (2001) Revisiting Asimov’s first law: a response to the call to arms. Paper presented at the Intelligent Agents VIII. International Workshop on Agents, Theories, Architectures and Languages (ATAL’01)
Rappaport ZH (2006) Robotics and artificial intelligence: Jewish ethical perspectives. Acta Neurochir Suppl 98:9–12
Roth D (2009) Do humanlike machines deserve human rights? Wired 17, 19 Jan 2009
Ruvinsky AI (2007) Computational ethics. In: Quigley M (ed) Encyclopedia of information ethics and security. IGI Global, Hershey, p 76
Salamon A, Rayhawk S, Kramár J (2010) How intelligible is intelligence? In: Mainzer K (ed) Proceedings of the VIII European conference on computing and philosophy. Verlag Dr. Hut, Munich
Sawyer RJ (2007) Robot ethics. Science 318(5853):1037
Sharkey N (2008) The ethical frontiers of robotics. Science 322(5909):1800–1801
Sotala K (2010) From mostly harmless to civilization-threatening: pathways to dangerous artificial general intelligences. In: Mainzer K (ed) Proceedings of the VIII European conference on computing and philosophy. Verlag Dr. Hut, Munich
Sotala K (2012) Advantages of artificial intelligences, uploads, and digital minds. Int J Mach Conscious 4:275–291
Sparrow R (2007) Killer robots. J Appl Philos 24(1):62–77
Tonkens R (2009) A challenge for machine ethics. Mind Mach 19(3):421–438
Tooby J, Cosmides L (1992) The psychological foundations of culture. In: Barkow J, Tooby J, Cosmides L (eds) The adapted mind: evolutionary psychology and the generation of culture. Oxford University Press, Oxford, pp 19–136
Vassar M (2005) AI boxing (dogs and helicopters), 2 Aug 2005. Retrieved 18 Jan 2012, from http://sl4.org/archive/0508/11817.html
Veruggio G (2010) Roboethics. IEEE Robot Autom Mag 17(2):105–109
von Ahn L, Blum M, Hopper N, Langford J (2003) CAPTCHA: using hard AI problems for security. In: Biham E (ed) Advances in cryptology—EUROCRYPT 2003. Lecture notes in computer science, vol 2656. Springer, Berlin, pp 293–311
Wallach W, Allen C (2006) EthicALife: a new field of inquiry. Paper presented at the AnAlifeX workshop, USA
Wallach W, Allen C (2008) Moral machines: teaching robots right from wrong. Oxford University Press, Oxford
Warwick K (2003) Cyborg morals, cyborg values, cyborg ethics. Ethics Inf Technol 5:131–137
Weld DS, Etzioni O (1994) The first law of robotics (a call to arms). Paper presented at the Twelfth National Conference on Artificial Intelligence (AAAI)
Wright R (2001) Nonzero: the logic of human destiny. Vintage, New York
Yampolskiy RV (2011a) AI-complete CAPTCHAs as zero knowledge proofs of access to an artificially intelligent system. ISRN Artificial Intelligence, 271878
Yampolskiy RV (2011b) Artificial intelligence safety engineering: why machine ethics is a wrong approach. Philosophy and Theory of Artificial Intelligence, 3–4 Oct, Thessaloniki, Greece
Yampolskiy RV (2011c) What to do with the singularity paradox? Paper presented at the Philosophy and Theory of Artificial Intelligence (PT-AI2011), 3–4 Oct, Thessaloniki, Greece
Yampolskiy RV (2012a) Leakproofing singularity: the artificial intelligence confinement problem. J Conscious Stud 19(1–2):194–214
Yampolskiy RV (2012b) Turing test as a defining feature of AI-completeness. In: Yang X-S (ed) Artificial intelligence, evolutionary computation and metaheuristics—in the footsteps of Alan Turing. Springer, Berlin
Yampolskiy RV, Fox J (2012) Artificial intelligence and the human mental model. In: Eden A, Moor J, Soraker J, Steinhart E (eds) The singularity hypothesis: a scientific and philosophical assessment. Springer, Berlin (in press)
Yampolskiy R, Gavrilova M (2012) Artimetrics: biometrics for artificial entities. IEEE Robot Autom Mag (in press)
Yampolskiy RV, Govindaraju V (2008) Behavioral biometrics for verification and recognition of malicious software agents. In: Sensors, and command, control, communications, and intelligence (C3I) technologies for homeland security and homeland defense VII. SPIE Defense and Security Symposium, Orlando, FL, 16–20 Mar 2008
Yudkowsky E (2002) The AI-box experiment. Retrieved 15 Jan 2012, from http://yudkowsky.net/singularity/aibox
Yudkowsky E (2007) The logical fallacy of generalization from fictional evidence. Less Wrong. Retrieved 20 Feb 2012, from http://lesswrong.com/lw/k9/the_logical_fallacy_of_generalization_from/
Yudkowsky E (2008) Artificial intelligence as a positive and negative factor in global risk. In: Bostrom N, Ćirković MM (eds) Global catastrophic risks. Oxford University Press, Oxford, pp 308–345
Yudkowsky E (2010) Timeless decision theory. Retrieved 15 Jan 2012, from http://singinst.org/upload/TDT-v01o.pdf
Yudkowsky E (2011a) Complex value systems are required to realize valuable futures. In: Schmidhuber J, Thórisson KR, Looks M (eds) Artificial general intelligence: 4th international conference, AGI 2011, mountain view, CA, USA, August 3–6, 2011, proceedings. Springer, Berlin, pp 388–393
Yudkowsky E (2011b) Open problems in friendly artificial intelligence. Paper presented at the Singularity Summit, New York
Yudkowsky E, Bostrom N (2011) The ethics of artificial intelligence. In: Ramsey W, Frankish K (eds) Cambridge handbook of artificial intelligence. Cambridge University Press, Cambridge
Acknowledgments
This article is an expanded version of the conference paper “Artificial intelligence safety engineering: why machine ethics is a wrong approach” (Yampolskiy 2011b). We would like to thank Brian Rabkin and Michael Anissimov for their comments.
Cite this article
Yampolskiy, R., Fox, J. Safety Engineering for Artificial General Intelligence. Topoi 32, 217–226 (2013). https://doi.org/10.1007/s11245-012-9128-9