Goal-Directed Online Learning of Predictive Models

  • Sylvie C. W. Ong
  • Yuri Grinberg
  • Joelle Pineau
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)


We present an algorithmic approach for integrated learning and planning in predictive representations. The approach extends earlier work on predictive state representations to the case of online exploration, allowing exploration of the domain to proceed in a goal-directed fashion and thus more efficiently. Our algorithm interleaves online learning of the models with estimation of the value function. The framework is applicable to a variety of important learning problems, including apprenticeship learning, model customization, and decision-making in non-stationary domains.


Keywords: predictive state representation · online learning · model-based reinforcement learning
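The interleaving of online model learning, value estimation, and goal-directed exploration described in the abstract can be sketched roughly as follows. This is a hypothetical toy example, not the paper's method: a 5-state chain domain and tabular transition counts stand in for the paper's predictive state representation updates, and optimistic value initialization is used here as a simple device to make greedy action selection exploratory.

```python
import random

# Toy chain domain: states 0..4, goal at state 4 yields reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)

def clip(s):
    return max(0, min(N_STATES - 1, s))

def env_step(s, a):
    s2 = clip(s + a)
    return s2, (1.0 if s2 == GOAL else 0.0)

def backup(s, counts, V, gamma):
    """One Bellman backup at s, using only the *learned* model (counts)."""
    best = 0.0
    for a in ACTIONS:
        tot = sum(c for (s0, a0, _), c in counts.items() if (s0, a0) == (s, a))
        if tot == 0:
            continue  # never tried this action here; no model estimate yet
        best = max(best, sum(
            c / tot * ((1.0 if s2 == GOAL else 0.0) + gamma * V[s2])
            for (s0, a0, s2), c in counts.items() if (s0, a0) == (s, a)))
    return best

def run(episodes=300, horizon=20, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    counts = {}                 # learned model: (s, a, s') visit counts
    V = [1.0] * N_STATES        # optimistic init drives greedy exploration
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Goal-directed exploration: act greedily w.r.t. the current
            # value estimate, with occasional random actions.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: V[clip(s + a_)])
            s2, _ = env_step(s, a)
            counts[(s, a, s2)] = counts.get((s, a, s2), 0) + 1  # model update
            V[s] = backup(s, counts, V, gamma)  # interleaved value estimation
            s = s2
            if s == GOAL:
                break
    return V

V = run()
```

In this sketch the learned value function steers exploration toward the goal (states nearer the goal end up with higher values), which is the sense in which exploration is goal-directed rather than uniformly random.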





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sylvie C. W. Ong (1)
  • Yuri Grinberg (1)
  • Joelle Pineau (1)

  1. School of Computer Science, McGill University, Montreal, Canada
