Using Localization and Factorization to Reduce the Complexity of Reinforcement Learning

  • Peter Sunehag
  • Marcus Hutter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9205)


General reinforcement learning is a powerful framework for artificial intelligence that has seen much theoretical progress since it was introduced fifteen years ago. We have previously provided guarantees for cases with finitely many possible environments. Though these results are the best possible in general, their linear dependence on the size of the hypothesis class renders them impractical. However, we dramatically improved on them by introducing the concept of environments generated by combining laws. The bounds are then linear in the number of laws needed to generate the environment class, and this number is identified as a natural complexity measure for classes of environments. An individual law might predict only some features (factorization) and only in some contexts (localization). Here we extend the previous deterministic results to the important stochastic setting.
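
To make the gap between the number of laws and the number of environments concrete, the following is a minimal toy sketch in Python. It is not the construction used in the paper: the `Law` class, the binary features, the keep/flip rules, and the use of the last action as the context are all assumptions chosen for illustration, and the sketch covers only the deterministic case. Each law predicts a single feature (factorization) and applies only after one particular action (localization); an environment is obtained by picking one law for every (feature, action) slot.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

State = Tuple[int, ...]  # a state is a tuple of binary features (toy assumption)


@dataclass(frozen=True)
class Law:
    """Hypothetical deterministic law for a single binary feature."""
    feature: int  # factorization: the one feature this law predicts
    action: int   # localization: the law applies only after this action
    flip: bool    # the rule itself: flip the feature, or keep it unchanged

    def applies(self, action: int) -> bool:
        return action == self.action

    def predict(self, state: State) -> int:
        return state[self.feature] ^ int(self.flip)


def candidate_laws(n_features: int, n_actions: int = 2) -> List[Law]:
    """Two candidate laws (keep, flip) for every (feature, action) slot."""
    return [Law(f, a, flip)
            for f in range(n_features)
            for a in range(n_actions)
            for flip in (False, True)]


def environments(laws: List[Law], n_features: int, n_actions: int = 2):
    """Every way of choosing exactly one law per (feature, action) slot."""
    slots = [[law for law in laws if law.feature == f and law.action == a]
             for f in range(n_features)
             for a in range(n_actions)]
    return [tuple(choice) for choice in product(*slots)]


def step(env: Tuple[Law, ...], state: State, action: int) -> State:
    """Next state: each feature is set by the law that applies in this context."""
    nxt = list(state)
    for law in env:
        if law.applies(action):
            nxt[law.feature] = law.predict(state)
    return tuple(nxt)


laws = candidate_laws(n_features=2)
envs = environments(laws, n_features=2)
print(len(laws), len(envs))
```

In this toy setting, n features give 4n laws but 2^(2n) environments, so a bound that is linear in the number of laws grows far more slowly than one that is linear in the size of the environment class.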


Keywords: Reinforcement learning, Laws, Optimism, Bounds





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Research School of Computer Science, Australian National University, Canberra, Australia
  2. Google - Deep Mind, London, UK
