Asymptotically Optimal Agents

  • Tor Lattimore
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


Artificial general intelligence aims to create agents capable of learning to solve arbitrary interesting problems. We define two versions of asymptotic optimality, strong and weak, and prove that no agent can satisfy the strong version, while in some cases, depending on the discounting, there does exist a non-computable weak asymptotically optimal agent.


Rational agents; sequential decision theory; artificial general intelligence; reinforcement learning; asymptotic optimality; general discounting





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tor Lattimore (1)
  • Marcus Hutter (1, 2)
  1. Research School of Computer Science, Australian National University, Australia
  2. ETH Zürich, Switzerland
