A Game-Theoretic Analysis of the Off-Switch Game

  • Tobias Wängberg
  • Mikael Böörs
  • Elliot Catt
  • Tom Everitt
  • Marcus Hutter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10414)

Abstract

The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic, as the human is modelled as an irrational player, and the robot’s best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we can easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.
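The robot's decision can be illustrated with a small sketch. It assumes the basic off-switch setup of Hadfield-Menell et al. (2016b): the robot can act directly (payoff U_a), switch itself off (payoff 0), or wait and let the human decide. A perfectly rational human who knows U_a lets the action through when U_a is non-negative and presses the off switch otherwise, so waiting is worth E[max(U_a, 0)], which is never less than max(E[U_a], 0). This is not the authors' model: the discrete belief format, the function names, and the `error_prob` parameter (a human who picks the wrong option with a fixed probability) are hypothetical stand-ins for "arbitrary belief and irrationality assumptions".

```python
# Hypothetical belief format: the robot's uncertainty about the human's
# utility U_a for the proposed action, as (utility, probability) pairs.
Belief = list[tuple[float, float]]


def expected_utilities(belief: Belief, error_prob: float = 0.0) -> dict[str, float]:
    """Robot's expected utility for each of its three moves.

    act  : execute the action directly and receive U_a.
    off  : switch itself off and receive 0.
    wait : defer to the human, who allows the action iff U_a >= 0, but
           (hypothetical irrationality model) chooses the wrong option
           with probability error_prob.
    """
    if abs(sum(p for _, p in belief) - 1.0) > 1e-9:
        raise ValueError("belief probabilities must sum to 1")
    act = sum(p * u for u, p in belief)
    # Correct choice: U_a if non-negative, else 0 (switched off).
    # Wrong choice: 0 if U_a is non-negative (wrongly switched off), else U_a.
    wait = sum(
        p * ((1 - error_prob) * max(u, 0.0) + error_prob * min(u, 0.0))
        for u, p in belief
    )
    return {"act": act, "off": 0.0, "wait": wait}


def best_action(belief: Belief, error_prob: float = 0.0) -> str:
    """Move with the highest expected utility (first listed wins ties)."""
    eus = expected_utilities(belief, error_prob)
    return max(eus, key=eus.get)


if __name__ == "__main__":
    belief = [(-1.0, 0.5), (2.0, 0.5)]  # hypothetical 50/50 belief over U_a
    print(best_action(belief))                  # wait (1.0 vs 0.5 for act)
    print(best_action(belief, error_prob=0.4))  # act (wait drops to about 0.4)
```

With a perfectly rational human, deferring is optimal here; once the assumed error rate reaches 40%, the value of waiting falls below that of acting directly, which is the kind of trade-off the abstract refers to.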

References

  1. Allais, M.: Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine. Econometrica 21(4), 503–546 (1953). doi:10.2307/1907921
  2. Armstrong, S.: Motivated value selection for artificial agents. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 12–20 (2015)
  3. Armstrong, S.: Utility indifference. Technical report. Oxford University, pp. 1–5 (2010)
  4. Armstrong, S., Leike, J.: Towards interactive inverse reinforcement learning. In: NIPS Workshop (2016)
  5. Dewey, D.: Learning what to value. In: Artificial General Intelligence, vol. 6830, pp. 309–314 (2011). ISBN 978-3-642-22886-5. doi:10.1007/978-3-642-22887-2. arXiv: 1402.5379
  6. Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 1–11. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_1
  7. Hadfield-Menell, D., et al.: Cooperative inverse reinforcement learning (2016a). arXiv: 1606.03137
  8. Hadfield-Menell, D., et al.: The off-switch game 2008, pp. 1–11 (2016b). arXiv: 1611.08219
  9. Martin, J., Everitt, T., Hutter, M.: Death and suicide in universal artificial intelligence. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 23–32. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_3. arXiv: 1606.00652
  10. Omohundro, S.M.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Artificial General Intelligence, vol. 171, pp. 483–493. IOS Press (2008)
  11. Orseau, L., Armstrong, S.: Safely interruptible agents. In: 32nd Conference on Uncertainty in Artificial Intelligence (2016)
  12. Rasmusen, E.: Games and Information, 2nd edn. Blackwell, Oxford (1994)
  13. Soares, N., Fallenstein, B.: A technical research agenda. Technical report. Machine Intelligence Research Institute (MIRI), pp. 1–14
  14. Soares, N., et al.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)
  15. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton Classic Editions. Princeton University Press, Princeton (1947). ISBN 0691003629. doi:10.1177/1468795X06065810. Lambert, S., Deuber, O. (eds.)
  16. Wiener, N.: Some moral and technical consequences of automation. Science 131(3410), 1355–1358 (1960). ISSN 0036-8075. doi:10.1126/science.132.3429.741

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Tobias Wängberg (2)
  • Mikael Böörs (2)
  • Elliot Catt (1)
  • Tom Everitt (1)
  • Marcus Hutter (1)
  1. Australian National University, Acton, Australia
  2. Linköping University, Linköping, Sweden
