Asymptotically Optimal Agents

  • Tor Lattimore
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


Artificial general intelligence aims to create agents capable of learning to solve arbitrary interesting problems. We define two versions of asymptotic optimality and prove that no agent can satisfy the strong version, while in some cases, depending on the discounting, there does exist a non-computable weak asymptotically optimal agent.
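
For orientation, the two notions can be sketched roughly as follows. The notation is an assumption made here and does not appear in the abstract: V^π_μ(h_{<t}) stands for the expected discounted future value of policy π in environment μ after history h_{<t}, V^*_μ for the corresponding optimal value, and M for the class of environments considered. The paper's exact definitions may differ in detail.

```latex
% Strong asymptotic optimality (sketch): the value gap itself vanishes.
\lim_{n \to \infty} \left[ V^{*}_{\mu}(h_{<n}) - V^{\pi}_{\mu}(h_{<n}) \right] = 0
\quad \text{with } \mu\text{-probability } 1, \text{ for all } \mu \in \mathcal{M}

% Weak asymptotic optimality (sketch): only the time-averaged gap vanishes.
\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n}
  \left[ V^{*}_{\mu}(h_{<t}) - V^{\pi}_{\mu}(h_{<t}) \right] = 0
\quad \text{with } \mu\text{-probability } 1, \text{ for all } \mu \in \mathcal{M}
```

Under this reading the strong version implies the weak one, since a sequence that converges to zero also has running averages that converge to zero.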


Rational agents, sequential decision theory, artificial general intelligence, reinforcement learning, asymptotic optimality, general discounting





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tor Lattimore (1)
  • Marcus Hutter (1, 2)
  1. Research School of Computer Science, Australian National University, Australia
  2. ETH Zürich, Switzerland
