Optimistic AIXI

  • Peter Sunehag
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7716)

Abstract

We consider extending the AIXI agent by using multiple (or even a compact class of) priors. This has the benefit of weakening the conditions on the true environment that we need to prove asymptotic optimality. Furthermore, it decreases the arbitrariness of picking the prior or reference machine. We connect this to removing symmetry between accepting and rejecting bets in the rationality axiomatization of AIXI and replacing it with optimism. Optimism is often used to encourage exploration in the more restrictive Markov Decision Process setting and it alleviates the problem that AIXI (with geometric discounting) stops exploring prematurely.
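As a rough illustration of the optimism idea, the sketch below assumes a finite class of priors and a one-step decision, which is far simpler than the general reinforcement learning setting the abstract addresses. The names `optimistic_action`, `single_prior_action`, and the toy reward tables are hypothetical and do not come from the paper. Each action is scored by its best-case expected reward across the class, so an action that looks poor under one prior can still be tried if another prior rates it highly.

```python
from typing import Dict, List

# Hypothetical toy setup: each "prior" maps an action to its expected reward.
Prior = Dict[str, float]


def optimistic_action(priors: List[Prior], actions: List[str]) -> str:
    """Pick the action whose best-case expected reward over all priors is largest."""
    def best_case(action: str) -> float:
        return max(prior[action] for prior in priors)
    return max(actions, key=best_case)


def single_prior_action(prior: Prior, actions: List[str]) -> str:
    """Baseline for comparison: maximise expected reward under one fixed prior."""
    return max(actions, key=lambda action: prior[action])


# Two illustrative priors that disagree about the value of exploring.
priors = [
    {"stay": 0.5, "explore": 0.2},
    {"stay": 0.5, "explore": 0.9},
]
actions = ["stay", "explore"]

chosen_optimistic = optimistic_action(priors, actions)
chosen_single = single_prior_action(priors[0], actions)
```

In this toy case the single-prior agent using the first table never explores, while the optimistic agent does because the second prior considers exploration worthwhile.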

Keywords

AIXI, Reinforcement Learning, Optimism, Optimality

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Sunehag (1)
  • Marcus Hutter (1)
  1. Research School of Computer Science, Australian National University, Canberra, Australia
