Abstract
By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it”. Nonetheless the problem seems to be unusually acute in Artificial Intelligence.
Notes
- 1.
This story, though famous and oft-cited as fact, may be apocryphal; I could not find a first-hand report. For unreferenced reports see e.g. Crochat and Franklin (2000) or http://neil.fraser.name/writing/tank/. However, failures of the type described are a major real-world consideration when building and testing neural networks.
- 2.
Bill Hibbard, after viewing a draft of this paper, wrote a response arguing that the analogy to the “tank classifier” problem does not apply to reinforcement learning in general. His critique may be found at http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html. My response may be found at http://yudkowsky.net/AIRisk_Hibbard.html. Hibbard also notes that the proposal of Hibbard (2001) is superseded by Hibbard (2004). The latter recommends a two-layer system in which expressions of agreement from humans reinforce recognition of happiness, and recognized happiness reinforces action strategies.
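The failure mode described in note 1, where a network keys on an incidental regularity of the training photographs rather than on the tanks themselves, can be reproduced in miniature. The sketch below is purely illustrative and not from the chapter: all feature names and numbers are invented. A logistic classifier is trained on data in which a nearly noise-free background confound (standing in for weather or brightness) happens to track the label; it scores well on similarly constructed test data, then fails badly once that correlation is broken.

```python
import numpy as np

# Illustrative sketch of the "tank classifier" failure mode: a model that
# latches onto a spurious background feature correlated with the label in
# the training set. All names and numbers here are invented for illustration.

rng = np.random.default_rng(0)

def make_photos(n, confound_tracks_label):
    """Each 'photo' is two features: a weak true cue and a strong confound."""
    y = rng.integers(0, 2, n)                       # 1 = tank present
    true_cue = y + rng.normal(0.0, 2.0, n)          # noisy genuine signal
    base = y if confound_tracks_label else 1 - y    # e.g. sunny vs. cloudy
    confound = base + rng.normal(0.0, 0.1, n)       # nearly noise-free
    return np.column_stack([true_cue, confound]), y

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # sigmoid
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0) == y).mean())

X_train, y_train = make_photos(500, confound_tracks_label=True)
w, b = train_logistic(X_train, y_train)

# In-distribution test: the confound still tracks the label, so it looks fine.
acc_in = accuracy(w, b, *make_photos(500, confound_tracks_label=True))
# Deployment: the correlation breaks and the classifier fails badly.
acc_out = accuracy(w, b, *make_photos(500, confound_tracks_label=False))
print(f"in-distribution accuracy: {acc_in:.2f}")
print(f"correlation-broken accuracy: {acc_out:.2f}")
```

The in-distribution accuracy is high while the correlation-broken accuracy falls below chance, which is exactly why testing only on data drawn like the training set can hide this failure.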
References
Barrett, J. L., & Keil, F. (1996). Conceptualizing a non-natural entity: anthropomorphism in god concepts. Cognitive Psychology, 31, 219–247.
Bostrom, N. (2001). Existential risks: analyzing human extinction scenarios. Journal of Evolution and Technology, 9.
Brown, D. E. (1991). Human universals. New York: McGraw-Hill.
Crochat, P., & Franklin, D. (2000). Back-propagation neural network tutorial. http://ieee.uow.edu.au/~daniel/software/libneural/.
Ekman, P., & Keltner, D. (1997). Universal facial expressions of emotion: An old controversy and new findings. In U. Segerstrale & P. Molnar (Eds.), Nonverbal communication: Where nature meets culture. Mahwah: Lawrence Erlbaum Associates.
Hibbard, B. (2001). Super-intelligent machines. ACM SIGGRAPH Computer Graphics, 35(1).
Hibbard, B. (2004). Reinforcement learning as a context for integrating AI research. Presented at the 2004 AAAI Fall Symposium on Achieving Human-Level Intelligence through Integrated Systems and Research.
Jaynes, E. T., & Bretthorst, G. L. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.
Monod, J. L. (1974). On the molecular theory of evolution. New York: Oxford University Press.
Raymond, E. S. (Ed.). (2003). DWIM. The on-line hacker Jargon File, version 4.4.7, 29 Dec 2003.
Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74, 358–366.
Schmidhuber, J. (2003). Goedel machines: self-referential universal problem solvers making provably optimal self-improvements. In B. Goertzel & C. Pennachin (Eds.), Artificial general intelligence. Forthcoming. New York: Springer.
Sober, E. (1984). The nature of selection. Cambridge: MIT Press.
Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture. New York: Oxford University Press.
Wachowski, A., & Wachowski, L. (1999). The Matrix, USA, Warner Bros, 135 min.
Colin Allen on Yudkowsky’s “Friendly Artificial Intelligence”
Friendly Advice?
Yudkowsky begins with a warning to his readers that “By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it”. He ends by reminding us that software written to “Do-What-I-Mean is a major, nontrivial technical challenge of Friendly AI”. Yudkowsky suggests a history of over-exuberant claims about AI, commenting that early proponents of the idea that artificial neural networks would be intelligent were engaged in “wishful thinking probably more analogous to alchemy than civil engineering”. He indicates that anyone who predicts strongly utopian or dystopian outcomes from superhuman AI is committing the “Giant Cheesecake Fallacy”—the mistake of thinking that just because a powerful intelligence could do something it will do that thing. His message seems to be that we should be neither terrified of superhuman AI nor naive about the challenge of building superhuman AI that will be “nice”.
So, what is to be the approach to designing Friendly AI? Yudkowsky characterizes the challenge as one of choosing a powerful enough optimization process with an appropriate target. Engineers, he asserts, use a rigorous theory to select a design and then build structures implementing the calculated designs. But, he cautions, we must beware of two kinds of errors: “philosophical failure”, i.e. choosing the wrong target, and “technical failure”, i.e. wrongly assuming that a system will work optimally in contexts other than those in which it has been tested.
As heuristics, these can hardly be faulted. But like the classic stockbroker’s platitude, “buy low, sell high”, they give no practical advice. Yudkowsky’s repetition of an apocryphal story about the failure of a neural network program at classifying photographs of tanks—a story that I remember hearing over 25 years ago—hardly enlightens. (If the advice is “Don’t rely on backprop!” this is hardly news.) Likewise, to be told that to “build an AI that discovers the orbits of the planets, the programmers need know only the math of Bayesian probability theory” is facile.
Yudkowsky correctly points out that engineering, like evolution, explores a tiny fraction of design space, but the rest of his story is shallow. Both processes are historically-bound. They work by modification of designs that are received from the past. Engineers do not start only with a target specification, but with a choice of platforms from which to try to reach that target. Inspired engineering sometimes involves taking something that was designed for one context and applying it in another, but always it involves cycles of testing and refinement, and it is far from guaranteeing optimization. Where should those who want to program “Friendly AI” begin?
Yudkowsky cites nothing more recent than 2004, but in the interim many new books and articles have been published, some proposing quite specific architectures or discussing particular programming paradigms for well-behaved autonomous systems. It would have been nice to know whether Yudkowsky thinks any of this work is on the right track, and if not, why not. If Bayesian theory can discover the orbits of planets, is it suitable for discovering “nice” AI? If not, why not? In describing a developmental neural network approach to AI, Yudkowsky displays a tendency that is all too common among writers on this topic when he asks us to “[f]lash forward to a time when the AI is superhumanly intelligent”. We jump straight to science fiction without being given any clue how that flash occurs.
I hoped for more in the context of the present volume, with its stated goal to “reformulate the singularity hypothesis as a coherent and falsifiable conjecture and to investigate its most likely consequences, in particular those associated with existential risks”. For our assessment of the existential risks, some knowledge of the current engineering pathways is crucial. If the path to Friendly AI with superhuman intelligence goes through explicit, top-down reasoning, the existential risks may be rather different than if it goes through implicit, bottom-up processes. Different kinds of philosophical and technical failures are likely to accompany the different approaches. Similarly, if the route to superhuman AI runs through our self-driving automobiles, the risks may be rather different than if it runs through our battle-ready military robots. What is clear is that our current understanding of how to build intelligent machines is low, but we have only the vaguest ideas about how to make it high.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yudkowsky, E. (2012). Friendly Artificial Intelligence. In: Eden, A., Moor, J., Søraker, J., Steinhart, E. (eds) Singularity Hypotheses. The Frontiers Collection. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32560-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32559-5
Online ISBN: 978-3-642-32560-1
eBook Packages: Engineering (R0)