
Friendly Artificial Intelligence

Chapter in Singularity Hypotheses, part of the book series The Frontiers Collection (FRONTCOLL).

Abstract

By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it”. Nonetheless the problem seems to be unusually acute in Artificial Intelligence.


Notes

  1. This story, though famous and oft-cited as fact, may be apocryphal; I could not find a first-hand report. For unreferenced reports see e.g. Crochat and Franklin (2000) or http://neil.fraser.name/writing/tank/. However, failures of the type described are a major real-world consideration when building and testing neural networks; a synthetic sketch of this failure mode follows these notes.

  2. Bill Hibbard, after viewing a draft of this paper, wrote a response arguing that the analogy to the “tank classifier” problem does not apply to reinforcement learning in general. His critique may be found at http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html. My response may be found at http://yudkowsky.net/AIRisk_Hibbard.html. Hibbard also notes that the proposal of Hibbard (2001) is superseded by Hibbard (2004). The latter recommends a two-layer system in which expressions of agreement from humans reinforce recognition of happiness, and recognized happiness reinforces action strategies; a simplified sketch of this arrangement also follows these notes.
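
The failure mode Note 1 describes, a network keying on an incidental regularity of its training set rather than on the concept of interest, is easy to reproduce synthetically. The sketch below is illustrative only and is not a reconstruction of the original experiment: the training labels are confounded with overall image brightness (as cloudy-day versus sunny-day photographs would be), so a simple classifier looks excellent on held-out data drawn the same way and then degrades sharply once the confound is removed. All features, parameters, and numbers are invented for this example.

```python
# Synthetic illustration of the "tank classifier" failure mode: a model
# trained on confounded data learns the confound, not the concept.
# Everything here (features, noise levels, sizes) is invented for the sketch.
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, confounded):
    """Return (features, labels). Each 'image' is summarized by two features:
    a weak cue that genuinely tracks the label, and overall brightness.
    With confounded=True, brightness is almost perfectly aligned with the
    label (tanks photographed on cloudy days, empty forest on sunny days)."""
    labels = rng.integers(0, 2, size=n)               # 1 = tank, 0 = no tank
    true_cue = labels + rng.normal(0.0, 2.0, size=n)  # weak, noisy signal
    if confounded:
        brightness = 1 - labels + rng.normal(0.0, 0.1, size=n)
    else:
        brightness = rng.normal(0.5, 0.5, size=n)     # unrelated to the label
    return np.column_stack([true_cue, brightness]), labels

def train_logistic(X, y, lr=0.1, steps=2000):
    """Plain logistic regression fitted by gradient descent."""
    Xb = np.column_stack([X, np.ones(len(X))])        # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.mean((Xb @ w > 0) == y)

X_train, y_train = make_images(2000, confounded=True)
X_same,  y_same  = make_images(2000, confounded=True)    # same distribution
X_shift, y_shift = make_images(2000, confounded=False)   # confound removed

w = train_logistic(X_train, y_train)
print("held-out accuracy, same distribution: ", accuracy(w, X_same, y_same))
print("accuracy once the confound is removed:", accuracy(w, X_shift, y_shift))
```

The held-out test set drawn from the confounded distribution gives no warning; only testing outside the conditions of data collection exposes what was actually learned.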
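
As a reading aid for the two-layer proposal attributed to Hibbard (2004) in Note 2, the following is a deliberately simplified paraphrase in code, not Hibbard's implementation: one learner (a happiness recognizer) is updated from human expressions of agreement, and a second learner (an action policy) is reinforced by the recognizer's output rather than by a hand-coded reward. Every class name, update rule, and number below is invented for illustration.

```python
# Hypothetical sketch of a two-layer arrangement: human agreement trains a
# happiness recognizer; the recognizer's estimate is the reward signal that
# trains an action policy. All names and update rules are invented here.
import random

class HappinessRecognizer:
    """Layer 1: maps observed features to a happiness estimate and is nudged
    by human feedback (agreement or disagreement) about its estimates."""
    def __init__(self, n_features, lr=0.05):
        self.weights = [0.0] * n_features
        self.lr = lr

    def estimate(self, features):
        return sum(w * f for w, f in zip(self.weights, features))

    def reinforce(self, features, human_agreement):
        # human_agreement in [-1, 1]: endorsement of the current estimate.
        for i, f in enumerate(features):
            self.weights[i] += self.lr * human_agreement * f

class Policy:
    """Layer 2: chooses actions; action values are reinforced by the
    recognizer's happiness estimate."""
    def __init__(self, actions, lr=0.1, epsilon=0.1):
        self.values = {a: 0.0 for a in actions}
        self.lr = lr
        self.epsilon = epsilon

    def choose(self):
        if random.random() < self.epsilon:           # occasional exploration
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def reinforce(self, action, recognized_happiness):
        self.values[action] += self.lr * (recognized_happiness - self.values[action])

# One illustrative interaction step wiring the layers together.
recognizer = HappinessRecognizer(n_features=3)
policy = Policy(actions=["help", "wait", "interrupt"])

action = policy.choose()
features_after_action = [0.8, 0.1, -0.2]   # hypothetical observation of a person
recognizer.reinforce(features_after_action, human_agreement=+1.0)
policy.reinforce(action, recognizer.estimate(features_after_action))
```

The point of the sketch is only the direction of the arrows: human feedback shapes the recognizer, and the recognizer, not a hand-written objective, shapes behavior.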

References

  • Barrett, J. L., & Keil, F. (1996). Conceptualizing a non-natural entity: Anthropomorphism in God concepts. Cognitive Psychology, 31, 219–247.

  • Bostrom, N. (2001). Existential risks: Analyzing human extinction scenarios. Journal of Evolution and Technology, 9.

  • Brown, D. E. (1991). Human universals. New York: McGraw-Hill.

  • Crochat, P., & Franklin, D. (2000). Back-propagation neural network tutorial. http://ieee.uow.edu.au/~daniel/software/libneural/.

  • Ekman, P., & Keltner, D. (1997). Universal facial expressions of emotion: An old controversy and new findings. In U. Segerstrale & P. Molnar (Eds.), Nonverbal communication: Where nature meets culture. Mahwah: Lawrence Erlbaum Associates.

  • Hibbard, B. (2001). Super-intelligent machines. ACM SIGGRAPH Computer Graphics, 35(1).

  • Hibbard, B. (2004). Reinforcement learning as a context for integrating AI research. Presented at the 2004 AAAI Fall Symposium on Achieving Human-Level Intelligence through Integrated Systems and Research.

  • Jaynes, E. T., & Bretthorst, G. L. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.

  • Monod, J. L. (1974). On the molecular theory of evolution. New York: Oxford University Press.

  • Raymond, E. S. (Ed.). (2003). DWIM. The on-line hacker Jargon File, version 4.4.7, 29 Dec 2003.

  • Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74, 358–366.

  • Schmidhuber, J. (2003). Gödel machines: Self-referential universal problem solvers making provably optimal self-improvements. In B. Goertzel & C. Pennachin (Eds.), Artificial general intelligence (forthcoming). New York: Springer.

  • Sober, E. (1984). The nature of selection. Cambridge: MIT Press.

  • Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture. New York: Oxford University Press.

  • Wachowski, A., & Wachowski, L. (1999). The Matrix. USA: Warner Bros. 135 min.


Author information

Correspondence to Eliezer Yudkowsky.


Colin Allen on Yudkowsky’s “Friendly Artificial Intelligence”

Friendly Advice?

Yudkowsky begins with a warning to his readers that “By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it”. He ends by reminding us that software written to “Do-What-I-Mean” is “a major, nontrivial technical challenge of Friendly AI”. Yudkowsky suggests a history of over-exuberant claims about AI, commenting that early proponents of the idea that artificial neural networks would be intelligent were engaged in “wishful thinking probably more analogous to alchemy than civil engineering”. He indicates that anyone who predicts strongly utopian or dystopian outcomes from superhuman AI is committing the “Giant Cheesecake Fallacy”—the mistake of thinking that just because a powerful intelligence could do something it will do that thing. His message seems to be that we should be neither terrified of superhuman AI nor naive about the challenge of building superhuman AI that will be “nice”.

So, what is to be the approach to designing Friendly AI? Yudkowsky characterizes the challenge as one of choosing a powerful enough optimization process with an appropriate target. Engineers, he asserts, use a rigorous theory to select a design and then build structures implementing the calculated designs. But, he cautions, we must beware of two kinds of errors: “philosophical failure”, i.e., choosing the wrong target, and “technical failure”, i.e., wrongly assuming that a system will work optimally in contexts other than those in which it has been tested.

As heuristics, these can hardly be faulted. But like the classic stockbroker’s platitude, “buy low, sell high”, they give no practical advice. Yudkowsky’s repetition of an apocryphal story about the failure of a neural network program at classifying photographs of tanks—a story that I remember hearing over 25 years ago—hardly enlightens. (If the advice is “Don’t rely on backprop!” this is hardly news.) Likewise, to be told that to “build an AI that discovers the orbits of the planets, the programmers need know only the math of Bayesian probability theory” is facile.
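
For readers who want the quoted appeal to “the math of Bayesian probability theory” made concrete, here is a minimal, hypothetical illustration of the kind of calculation it refers to: recovering the period of a circular orbit from noisy angle measurements by multiplying a flat prior over candidate periods by a Gaussian likelihood (a grid-based application of Bayes’ rule). The observation times, noise level, and grid are all invented for the example; it is a sketch, not anything from the chapter.

```python
# Grid-based Bayesian inference of an orbital period from noisy observations.
# All data below are synthetic and the setup is deliberately simplified
# (circular orbit, unwrapped phase, known Gaussian noise).
import numpy as np

rng = np.random.default_rng(1)

true_period = 365.25                      # days (hypothetical planet)
times = np.linspace(0.0, 500.0, 40)       # observation times in days
sigma = 0.05                              # assumed noise in radians
observed = 2 * np.pi * times / true_period + rng.normal(0.0, sigma, times.size)

# Flat prior over a grid of candidate periods.
periods = np.linspace(100.0, 700.0, 2001)

# Gaussian log-likelihood of the observations under each candidate period.
predicted = 2 * np.pi * times[None, :] / periods[:, None]
log_like = -0.5 * np.sum(((observed[None, :] - predicted) / sigma) ** 2, axis=1)

# Posterior is prior times likelihood, normalized over the grid (Bayes' rule).
posterior = np.exp(log_like - log_like.max())
posterior /= posterior.sum()

print("posterior-mean period:", np.sum(periods * posterior))
print("MAP period:           ", periods[np.argmax(posterior)])
```

Whether this amounts to “discovering the orbits of the planets” or merely to curve fitting with priors is, of course, part of Allen’s point.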

Yudkowsky correctly points out that engineering, like evolution, explores a tiny fraction of design space, but the rest of his story is shallow. Both processes are historically bound. They work by modification of designs that are received from the past. Engineers do not start only with a target specification, but with a choice of platforms from which to try to reach that target. Inspired engineering sometimes involves taking something that was designed for one context and applying it in another, but it always involves cycles of testing and refinement, and it is far from guaranteeing optimization. Where should those who want to program “Friendly AI” begin?

Yudkowsky cites nothing more recent than 2004, but in the interim many new books and articles have been published, some proposing quite specific architectures or discussing particular programming paradigms for well-behaved autonomous systems. It would have been nice to know whether Yudkowsky thinks any of this work is on the right track, and if not, why not. If Bayesian theory can discover the orbits of planets, is it suitable for discovering “nice” AI? If not, why not? In describing a developmental neural network approach to AI, Yudkowsky shows a tendency, all too common among writers on this topic, when he asks us to “[f]lash forward to a time when the AI is superhumanly intelligent”. We jump straight to sci-fi without being given any clue how that flash occurs.

I hoped for more in the context of the present volume, with its stated goal to “reformulate the singularity hypothesis as a coherent and falsifiable conjecture and to investigate its most likely consequences, in particular those associated with existential risks”. For our assessment of the existential risks, some knowledge of the current engineering pathways is crucial. If the path to Friendly AI with superhuman intelligence goes through explicit, top-down reasoning, the existential risks may be rather different than if it goes through implicit, bottom-up processes. Different kinds of philosophical and technical failures are likely to accompany the different approaches. Similarly, if the route to superhuman AI runs through our self-driving automobiles, the risks may be rather different than if it runs through our battle-ready military robots. What is clear is that our current understanding of how to build intelligent machines is low, but we have only the vaguest ideas about how to make it high.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yudkowsky, E. (2012). Friendly Artificial Intelligence. In: Eden, A., Moor, J., Søraker, J., Steinhart, E. (eds) Singularity Hypotheses. The Frontiers Collection. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32560-1_10

  • DOI: https://doi.org/10.1007/978-3-642-32560-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32559-5

  • Online ISBN: 978-3-642-32560-1
