Abstract
By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it”. Nonetheless the problem seems to be unusually acute in Artificial Intelligence.
Notes
- 1.
This story, though famous and oft-cited as fact, may be apocryphal; I could not find a first-hand report. For unreferenced reports see e.g. Crochat and Franklin (2000) or http://neil.fraser.name/writing/tank/. However, failures of the type described are a major real-world consideration when building and testing neural networks.
- 2.
Bill Hibbard, after viewing a draft of this paper, wrote a response arguing that the analogy to the “tank classifier” problem does not apply to reinforcement learning in general. His critique may be found at http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html. My response may be found at http://yudkowsky.net/AIRisk_Hibbard.html. Hibbard also notes that the proposal of Hibbard (2001) is superseded by Hibbard (2004). The latter recommends a two-layer system in which expressions of agreement from humans reinforce recognition of happiness, and recognized happiness reinforces action strategies.
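The failure mode described in note 1, where a network keys on an incidental regularity of the training photographs rather than on the tanks themselves, can be reproduced in miniature. The sketch below is purely illustrative and not from the chapter: all feature names and numbers are invented. A logistic classifier is trained on data in which a nearly noise-free background confound (standing in for weather or brightness) happens to track the label; it scores well on similarly constructed test data, then fails badly once that correlation is broken.

```python
import numpy as np

# Illustrative sketch of the "tank classifier" failure mode: a model that
# latches onto a spurious background feature correlated with the label in
# the training set. All names and numbers here are invented for illustration.

rng = np.random.default_rng(0)

def make_photos(n, confound_tracks_label):
    """Each 'photo' is two features: a weak true cue and a strong confound."""
    y = rng.integers(0, 2, n)                       # 1 = tank present
    true_cue = y + rng.normal(0.0, 2.0, n)          # noisy genuine signal
    base = y if confound_tracks_label else 1 - y    # e.g. sunny vs. cloudy
    confound = base + rng.normal(0.0, 0.1, n)       # nearly noise-free
    return np.column_stack([true_cue, confound]), y

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # sigmoid
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0) == y).mean())

X_train, y_train = make_photos(500, confound_tracks_label=True)
w, b = train_logistic(X_train, y_train)

# In-distribution test: the confound still tracks the label, so it looks fine.
acc_in = accuracy(w, b, *make_photos(500, confound_tracks_label=True))
# Deployment: the correlation breaks and the classifier fails badly.
acc_out = accuracy(w, b, *make_photos(500, confound_tracks_label=False))
print(f"in-distribution accuracy: {acc_in:.2f}")
print(f"correlation-broken accuracy: {acc_out:.2f}")
```

The in-distribution accuracy is high while the correlation-broken accuracy falls below chance, which is exactly why testing only on data drawn like the training set can hide this failure.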
References
Barrett, J. L., & Keil, F. (1996). Conceptualizing a non-natural entity: anthropomorphism in god concepts. Cognitive Psychology, 31, 219–247.
Bostrom, N. (2001). Existential risks: analyzing human extinction scenarios. Journal of Evolution and Technology, 9.
Brown, D. E. (1991). Human universals. New York: McGraw-Hill.
Crochat, P., & Franklin, D. (2000). Back-propagation neural network tutorial. http://ieee.uow.edu.au/~daniel/software/libneural/.
Ekman, P., & Keltner, D. (1997). Universal facial expressions of emotion: An old controversy and new findings. In U. Segerstrale & P. Molnar (Eds.), Nonverbal communication: Where nature meets culture. Mahwah: Lawrence Erlbaum Associates.
Hibbard, B. (2001). Super-intelligent machines. ACM SIGGRAPH Computer Graphics, 35(1).
Hibbard, B. (2004). Reinforcement learning as a context for integrating AI research. Presented at the 2004 AAAI Fall Symposium on Achieving Human-Level Intelligence through Integrated Systems and Research.
Jaynes, E. T., & Bretthorst, G. L. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.
Monod, J. L. (1974). On the molecular theory of evolution. New York: Oxford University Press.
Raymond, E. S. (Ed.). (2003). DWIM. The on-line hacker Jargon File, version 4.4.7, 29 Dec 2003.
Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74, 358–366.
Schmidhuber, J. (2003). Goedel machines: self-referential universal problem solvers making provably optimal self-improvements. In B. Goertzel & C. Pennachin (Eds.), Artificial general intelligence. Forthcoming. New York: Springer.
Sober, E. (1984). The nature of selection. Cambridge: MIT Press.
Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture. New York: Oxford University Press.
Wachowski, A., & Wachowski, L. (1999). The Matrix, USA, Warner Bros, 135 min.
Colin Allen on Yudkowsky’s “Friendly Artificial Intelligence”
Friendly Advice?
Yudkowsky begins with a warning to his readers that “By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it”. He ends by reminding us that software written to “Do-What-I-Mean is a major, nontrivial technical challenge of Friendly AI”. Yudkowsky suggests a history of over-exuberant claims about AI, commenting that early proponents of the idea that artificial neural networks would be intelligent were engaged in “wishful thinking probably more analogous to alchemy than civil engineering”. He indicates that anyone who predicts strongly utopian or dystopian outcomes from superhuman AI is committing the “Giant Cheesecake Fallacy”—the mistake of thinking that just because a powerful intelligence could do something it will do that thing. His message seems to be that we should be neither terrified of superhuman AI nor naive about the challenge of building superhuman AI that will be “nice”.
So, what is to be the approach to designing Friendly AI? Yudkowsky characterizes the challenge as one of choosing a powerful enough optimization process with an appropriate target. Engineers, he asserts, use a rigorous theory to select a design and then build structures implementing the calculated designs. But, he cautions, we must beware of two kinds of errors: “philosophical failure”, i.e. choosing the wrong target, and “technical failure”, i.e. wrongly assuming that a system will work optimally in contexts other than those in which it has been tested.
As heuristics, these can hardly be faulted. But like the classic stockbroker’s platitude, “buy low, sell high”, they give no practical advice. Yudkowsky’s repetition of an apocryphal story about the failure of a neural network program at classifying photographs of tanks—a story that I remember hearing over 25 years ago—hardly enlightens. (If the advice is “Don’t rely on backprop!” this is hardly news.) Likewise, to be told that to “build an AI that discovers the orbits of the planets, the programmers need know only the math of Bayesian probability theory” is facile.
Yudkowsky correctly points out that engineering, like evolution, explores a tiny fraction of design space, but the rest of his story is shallow. Both processes are historically-bound. They work by modification of designs that are received from the past. Engineers do not start only with a target specification, but with a choice of platforms from which to try to reach that target. Inspired engineering sometimes involves taking something that was designed for one context and applying it in another, but always it involves cycles of testing and refinement, and it is far from guaranteeing optimization. Where should those who want to program “Friendly AI” begin?
Yudkowsky cites nothing more recent than 2004, but in the interim many new books and articles have been published, some proposing quite specific architectures or discussing particular programming paradigms for well-behaved autonomous systems. It would have been nice to know whether Yudkowsky thinks any of this work is on the right track, and if not, why not. If Bayesian theory can discover the orbits of planets, is it suitable for discovering “nice” AI? If not, why not? In describing a developmental neural network approach to AI, Yudkowsky displays a tendency that is all too common among writers on this topic when he asks us to “[f]lash forward to a time when the AI is superhumanly intelligent”. We jump straight to science fiction without being given any clue how that flash occurs.
I hoped for more in the context of the present volume, with its stated goal to “reformulate the singularity hypothesis as a coherent and falsifiable conjecture and to investigate its most likely consequences, in particular those associated with existential risks”. For our assessment of the existential risks, some knowledge of the current engineering pathways is crucial. If the path to Friendly AI with superhuman intelligence goes through explicit, top-down reasoning, the existential risks may be rather different than if it goes through implicit, bottom-up processes. Different kinds of philosophical and technical failures are likely to accompany the different approaches. Similarly, if the route to superhuman AI runs through our self-driving automobiles, the risks may be rather different than if it runs through our battle-ready military robots. What is clear is that our current understanding of how to build intelligent machines is low, but we have only the vaguest ideas about how to make it high.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yudkowsky, E. (2012). Friendly Artificial Intelligence. In: Eden, A., Moor, J., Søraker, J., Steinhart, E. (eds) Singularity Hypotheses. The Frontiers Collection. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32560-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32559-5
Online ISBN: 978-3-642-32560-1
eBook Packages: Engineering (R0)