Self-Modification of Policy and Utility Function in Rational Agents

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9782)

Abstract

Any agent that is part of the environment it interacts with, and that has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.
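
As an illustration of the last condition, the following is a schematic sketch (notation suggested by the notes below, with discounting assumed; not the paper's exact definitions) of a value function that anticipates self-modification while evaluating the future with the current utility function:

\[
V^{u_t}(\pi_t \mid h_{<t}) \;=\; \mathbb{E}_{e_t}\!\Big[\, u_t(e_t) \;+\; \gamma\, V^{u_t}(\pi_{t+1} \mid h_{<t} a_t e_t) \,\Big], \qquad a_t = \pi_t(h_{<t}),
\]

where \(h_{<t}\) denotes the action-percept history and the action \(a_t\) may be a self-modifying action that installs a new policy \(\pi_{t+1}\) and a new stored utility function \(u_{t+1}\). The modification is anticipated, since \(\pi_{t+1}\) appears on the right-hand side, yet every future percept is still scored by the unmodified \(u_t\). Per the result stated above, a value function that instead substituted \(u_{t+1}\) at this point would not render the self-modification possibility harmless.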

Notes

  1.

    To fit the knowledge-seeking agent into our framework, our definition deviates slightly from Orseau (2014).

  2.

    In this paper, we only consider the possibility of the agent changing its utility function itself, not the possibility of someone else (like the creator of the agent) changing it back. See Orseau and Ring (2012) for a model where the environment can change the agent.

  3.

    Note that a policy argument to \(Q^{\mathrm{re}}\) would be superfluous, as the action \(a_k\) determines the next-step policy \(\pi_{k+1}\); see the sketch following these notes.

  4.

    Computer viruses are very simple forms of survival agents that can be hard to stop. More intelligent versions could turn out to be very problematic.

  5.

    Note, however, that our result says nothing about the agent modifying the chessboard program to give high reward even when the agent is not winning. Our result only shows that the agent does not change its utility function \(u_1 \leadsto u_t\); it does not show that the agent refrains from changing the percept \(e_t\) that is the input to the utility function. Ring and Orseau (2011) develop a model of the latter possibility.
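
As an illustration of note 3 (again a schematic sketch in assumed notation rather than the paper's definitions): if the self-modifying action \(a_k\) itself specifies the policy \(\pi_{k+1}\) that acts from the next step onward, an action-value recursion can call itself with that policy directly,

\[
Q^{\mathrm{re}}(h_{<k}, a_k) \;=\; \mathbb{E}_{e_k}\!\Big[\, u(e_k) \;+\; \gamma\, Q^{\mathrm{re}}\big(h_{<k} a_k e_k,\; \pi_{k+1}(h_{<k} a_k e_k)\big) \Big],
\]

where \(u\) is the utility function used for evaluation and \(h_{<k}\) the action-percept history. Since \(\pi_{k+1}\) is read off from \(a_k\), an explicit policy argument to \(Q^{\mathrm{re}}\) would only duplicate information already carried by the action.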

References

  • Bird, J., Layzell, P.: The evolved radio and its implications for modelling the evolution of novel sensors. In: CEC-02, pp. 1836–1841 (2002)
  • Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)
  • Dewey, D.: Learning what to value. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 309–314. Springer, Heidelberg (2011)
  • Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. Technical report (2016). arXiv:1605.03142
  • Everitt, T., Hutter, M.: Avoiding wireheading with value reinforcement learning. In: Steunebrink, B., et al. (eds.) AGI 2016. LNAI, vol. 9782, pp. 12–22 (2016)
  • Hibbard, B.: Model-based utility functions. J. Artif. Gen. Intell. Res. 3(1), 1–24 (2012)
  • Hutter, M.: Universal Artificial Intelligence. Springer, Heidelberg (2005)
  • Hutter, M.: Extreme state aggregation beyond MDPs. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds.) ALT 2014. LNCS, vol. 8776, pp. 185–199. Springer, Heidelberg (2014)
  • Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
  • Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence. Mind. Mach. 17(4), 391–444 (2007)
  • Leike, J., Lattimore, T., Orseau, L., Hutter, M.: Thompson sampling is asymptotically optimal in general environments. In: UAI-16 (2016)
  • Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
  • Omohundro, S.M.: The basic AI drives. In: AGI-08, pp. 483–493. IOS Press (2008)
  • Orseau, L.: Universal knowledge-seeking agents. TCS 519, 127–139 (2014)
  • Orseau, L., Ring, M.: Self-modification and mortality in artificial agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 1–10. Springer, Heidelberg (2011)
  • Orseau, L., Ring, M.: Space-time embedded intelligence. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 209–218. Springer, Heidelberg (2012)
  • Ring, M., Orseau, L.: Delusion, survival, and intelligent agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 11–20. Springer, Heidelberg (2011)
  • Schmidhuber, J.: Gödel machines: fully self-referential optimal universal self-improvers. In: Goertzel, B., Pennachin, C. (eds.) AGI-07, pp. 199–226. Springer, Heidelberg (2007)
  • Silver, D., Huang, A., Maddison, C.J., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
  • Soares, N.: The value learning problem. Technical report, MIRI (2015)
  • Soares, N., Fallenstein, B., Yudkowsky, E., Armstrong, S.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)
  • Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  • Yampolskiy, R.V.: Artificial Super Intelligence: A Futuristic Approach. Chapman and Hall/CRC, Boca Raton (2015)

Acknowledgements

This work grew out of a MIRIx workshop. We thank the (non-author) participants David Johnston and Samuel Rathmanner. We also thank John Aslanides, Jan Leike, and Laurent Orseau for reading drafts and providing valuable suggestions.

Author information

Correspondence to Tom Everitt.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Everitt, T., Filan, D., Daswani, M., Hutter, M. (2016). Self-Modification of Policy and Utility Function in Rational Agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds) Artificial General Intelligence. AGI 2016. Lecture Notes in Computer Science (LNAI), vol. 9782. Springer, Cham. https://doi.org/10.1007/978-3-319-41649-6_1

  • DOI: https://doi.org/10.1007/978-3-319-41649-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41648-9

  • Online ISBN: 978-3-319-41649-6
