Avoiding Unintended AI Behaviors

  • Conference paper
Artificial General Intelligence (AGI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7716)

Abstract

Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hibbard, B. (2012). Avoiding Unintended AI Behaviors. In: Bach, J., Goertzel, B., Iklé, M. (eds) Artificial General Intelligence. AGI 2012. Lecture Notes in Computer Science, vol 7716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35506-6_12

  • DOI: https://doi.org/10.1007/978-3-642-35506-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35505-9

  • Online ISBN: 978-3-642-35506-6

  • eBook Packages: Computer Science, Computer Science (R0)
