Avoiding Unintended AI Behaviors

Hibbard, Bill

doi:10.1007/978-3-642-35506-6_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7716)

Included in the following conference series:

International Conference on Artificial General Intelligence

Abstract

Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
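
As a rough illustration of why choosing a most probable model for a finite history can be a finite computation, the sketch below scores each member of a small, finite candidate set by a length-based prior times the likelihood of the observed history and returns the best one. This is not the paper's construction: the candidate models (simple biased coins), their `length_bits` values, the `2 ** -length_bits` prior, and all names are assumptions made only for this example.

```python
from typing import NamedTuple, Sequence


class Model(NamedTuple):
    """A toy stochastic model of a binary observation stream (hypothetical)."""
    name: str
    length_bits: int  # assumed description length of the model, in bits
    p_one: float      # probability the model assigns to observing a 1


def likelihood(model: Model, history: Sequence[int]) -> float:
    """Probability that `model` generates exactly `history`."""
    p = 1.0
    for bit in history:
        p *= model.p_one if bit == 1 else 1.0 - model.p_one
    return p


def most_probable_model(models: Sequence[Model], history: Sequence[int]) -> Model:
    """Return the candidate maximizing prior * likelihood.

    The prior 2 ** -length_bits favors shorter models. Because both the
    candidate set and the history are finite, this is a finite search.
    """
    return max(models, key=lambda m: 2.0 ** -m.length_bits * likelihood(m, history))


# Hypothetical candidate set: a short "fair coin" model and two longer biased ones.
CANDIDATES = [
    Model("fair", 2, 0.5),
    Model("mostly_ones", 4, 0.9),
    Model("mostly_zeros", 4, 0.1),
]

history = [1, 1, 1, 1, 1, 1, 1, 0]
best = most_probable_model(CANDIDATES, history)
print(best.name)
```

With a short history such as `[1, 0]`, the shorter "fair" model wins on its prior; with the longer, lopsided history above, the likelihood term outweighs the length penalty and a biased model is selected.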

Author information

Authors and Affiliations

SSEC, University of Wisconsin, Madison, WI, 53706, USA
Bill Hibbard

Editor information

Editors and Affiliations

Humboldt Universität Berlin, Raumerstr. 11, 10437, Berlin, Germany
Joscha Bach
Aidyia Ltd., Unit 612, 6/F, Lu Plaza, 2 Wing Yip Street, Kwun Tong, Hong Kong
Ben Goertzel
Adams State University, Suite 3060, 81101, Alamosa, CO, USA
Matthew Iklé

Cite this paper

Hibbard, B. (2012). Avoiding Unintended AI Behaviors. In: Bach, J., Goertzel, B., Iklé, M. (eds) Artificial General Intelligence. AGI 2012. Lecture Notes in Computer Science, vol 7716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35506-6_12

DOI: https://doi.org/10.1007/978-3-642-35506-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35505-9
Online ISBN: 978-3-642-35506-6
eBook Packages: Computer Science, Computer Science (R0)
