
Self-Modification of Policy and Utility Function in Rational Agents

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9782)

Abstract

Any agent that is part of the environment it interacts with, and that has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify – for example, by changing its own source code. As we continue to create more and more intelligent agents, the chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the possibility of self-modification is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.
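
In symbols (a schematic rendering in our own notation, loosely following the accompanying technical report Everitt et al. (2016); not a verbatim definition from the paper): write \(u_t\) for the utility function the agent holds at time \(t\), \(e_k\) for percepts, and \(\gamma\) for the discount factor. A ‘realistic’ value function evaluates all futures with the current \(u_t\), even when an action later replaces it; a ‘hedonistic’ value function evaluates each future step with whatever utility function the agent will hold at that step:

\[ V^{\mathrm{re}}_t(\pi) = \mathbb{E}_\pi\Big[\sum_{k \ge t} \gamma^{k-t}\, u_t(e_k)\Big] \qquad \text{versus} \qquad V^{\mathrm{he}}_t(\pi) = \mathbb{E}_\pi\Big[\sum_{k \ge t} \gamma^{k-t}\, u_k(e_k)\Big]. \]

Under \(V^{\mathrm{he}}\), switching to a trivially maximised utility function raises the agent's value, so self-modification is attractive; under \(V^{\mathrm{re}}\) it is not.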

Keywords

  • Current Utility Function
  • General Reinforcement Learning (GRL)
  • Preservation Goals
  • Action-percept Pair
  • Orseau



Notes

  1. To fit the knowledge-seeking agent into our framework, our definition deviates slightly from Orseau (2014).

  2. In this paper, we only consider the possibility of the agent changing its utility function itself, not the possibility of someone else (such as the creator of the agent) changing it back. See Orseau and Ring (2012) for a model where the environment can change the agent.

  3. Note that a policy argument to \(Q^{\mathrm{re}}\) would be superfluous, as the action \(a_k\) determines the next-step policy \(\pi_{k+1}\); see the sketch after these notes.

  4. Computer viruses are very simple forms of survival agents that can be hard to stop. More intelligent versions could turn out to be very problematic.

  5. Note, however, that our result says nothing about the agent modifying the chessboard program to give high reward even when the agent is not winning. Our result only shows that the agent does not change its utility function \(u_1 \leadsto u_t\); it does not show that the agent refrains from changing the percept \(e_t\) that is the input to the utility function. Ring and Orseau (2011) develop a model of the latter possibility.
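
The following is a minimal sketch (our own illustration, not code from the paper) of how a realistic Q-value can be computed in a toy self-modification setting, expanding on notes 3 and 5 above. The environment interface (env.transition, env.default_policy) and all names are hypothetical. Two structural points are captured: the action itself carries any self-modification, so it determines the next-step policy and Q needs no separate policy argument; and every future percept is scored with the unmodified, current utility function \(u_t\).

def q_realistic(u_t, history, action, env, gamma, horizon):
    """Expected discounted utility of `action`, judged by the agent's
    *current* utility function u_t (a "realistic" value, up to `horizon`).

    `action` is a pair (env_action, self_mod), where self_mod is None or a
    replacement policy -- so, as in note 3, the action itself determines
    the next-step policy.
    """
    if horizon == 0:
        return 0.0
    env_action, self_mod = action
    total = 0.0
    # Hypothetical API: env.transition yields (percept, probability) pairs.
    for percept, prob in env.transition(history, env_action):
        next_history = history + [(env_action, percept)]
        # A self-modification takes effect here: it changes who chooses
        # the next action ...
        policy = self_mod if self_mod is not None else env.default_policy
        next_action = policy(next_history)
        # ... but not the yardstick: the recursion keeps scoring future
        # percepts with the unmodified u_t.
        total += prob * (u_t(percept)
                         + gamma * q_realistic(u_t, next_history, next_action,
                                               env, gamma, horizon - 1))
    return total

A hedonistic variant would instead thread a possibly modified utility function through the recursive call; under that variant, adopting a trivially maximised utility function looks attractive, which is exactly the failure mode the paper rules out for realistic agents.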

References

  • Bird, J., Layzell, P.: The evolved radio and its implications for modelling the evolution of novel sensors. In: CEC-02, pp. 1836–1841 (2002)

  • Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)

  • Dewey, D.: Learning what to value. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 309–314. Springer, Heidelberg (2011)

  • Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. Technical report (2016). arXiv:1605.03142

  • Everitt, T., Hutter, M.: Avoiding wireheading with value reinforcement learning. In: Steunebrink, B., et al. (eds.) AGI 2016. LNAI, vol. 9782, pp. 12–22 (2016)

  • Hibbard, B.: Model-based utility functions. J. Artif. Gen. Intell. Res. 3(1), 1–24 (2012)

  • Hutter, M.: Universal Artificial Intelligence. Springer, Heidelberg (2005)

  • Hutter, M.: Extreme state aggregation beyond MDPs. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds.) ALT 2014. LNCS, vol. 8776, pp. 185–199. Springer, Heidelberg (2014)

  • Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)

  • Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence. Mind. Mach. 17(4), 391–444 (2007)

  • Leike, J., Lattimore, T., Orseau, L., Hutter, M.: Thompson sampling is asymptotically optimal in general environments. In: UAI-16 (2016)

  • Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  • Omohundro, S.M.: The basic AI drives. In: AGI-08, pp. 483–493. IOS Press (2008)

  • Orseau, L.: Universal knowledge-seeking agents. TCS 519, 127–139 (2014)

  • Orseau, L., Ring, M.: Self-modification and mortality in artificial agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 1–10. Springer, Heidelberg (2011)

  • Orseau, L., Ring, M.: Space-time embedded intelligence. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 209–218. Springer, Heidelberg (2012)

  • Ring, M., Orseau, L.: Delusion, survival, and intelligent agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 11–20. Springer, Heidelberg (2011)

  • Schmidhuber, J.: Gödel machines: fully self-referential optimal universal self-improvers. In: Goertzel, B., Pennachin, C. (eds.) AGI-07, pp. 199–226. Springer, Heidelberg (2007)

  • Silver, D., Huang, A., Maddison, C.J., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  • Soares, N.: The value learning problem. Technical report, MIRI (2015)

  • Soares, N., Fallenstein, B., Yudkowsky, E., Armstrong, S.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)

  • Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  • Yampolskiy, R.V.: Artificial Super Intelligence: A Futuristic Approach. Chapman and Hall/CRC, Boca Raton (2015)


Acknowledgements

This work grew out of a MIRIx workshop. We thank the (non-author) participants David Johnston and Samuel Rathmanner. We also thank John Aslanides, Jan Leike, and Laurent Orseau for reading drafts and providing valuable suggestions.

Author information

Correspondence to Tom Everitt.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Everitt, T., Filan, D., Daswani, M., Hutter, M. (2016). Self-Modification of Policy and Utility Function in Rational Agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science, vol. 9782. Springer, Cham. https://doi.org/10.1007/978-3-319-41649-6_1


  • DOI: https://doi.org/10.1007/978-3-319-41649-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41648-9

  • Online ISBN: 978-3-319-41649-6

  • eBook Packages: Computer Science; Computer Science (R0)