Predictively Defined Representations of State

  • David Wingate
Part of the Adaptation, Learning, and Optimization book series (ALO, volume 12)

Abstract

The concept of state is central to dynamical systems. In any time-series problem, such as filtering, planning, or forecasting, models and algorithms summarize important information from the past into some sort of state variable. In this chapter, we start with a broad examination of the concept of state, with emphasis on the fact that there are many possible representations of state for a given dynamical system, each with different theoretical and computational properties. We then focus on models with predictively defined representations of state, which represent state as a set of statistics about the short-term future, as opposed to the classic approach of treating state as a latent, unobservable quantity. In other words, the past is summarized into predictions about the actions and observations in the short-term future, which can in turn be used to make predictions about the infinite future. While this representational idea applies to any dynamical system problem, it is particularly useful in a model-based RL context, where an agent must learn a representation of state and a model of system dynamics online: because the representation (and hence all of the model's parameters) is defined using only statistics of observable quantities, the associated learning algorithms are often straightforward and have attractive theoretical properties. Here, we survey the basic concepts of predictively defined representations of state, important auxiliary constructs (such as the system-dynamics matrix), and theoretical results on their representational power and learnability.
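To make the representational idea concrete, the following is a minimal sketch (in Python) of how a linear predictive state representation maintains such a state: a vector of predictions for a small set of core tests, updated after each action-observation pair. The update matrices M_ao, the prediction vectors m_ao, and the toy numbers in the example are hypothetical placeholders rather than the chapter's model; in practice these parameters would be estimated from data, for example via the system-dynamics matrix.

import numpy as np

class LinearPSR:
    """Minimal linear PSR sketch: state b holds predictions for n core tests."""

    def __init__(self, M_ao, m_ao, b0):
        self.M_ao = M_ao  # (action, obs) -> (n x n) update matrix
        self.m_ao = m_ao  # (action, obs) -> length-n prediction vector
        self.b = b0       # current predictive state (core-test predictions)

    def predict(self, action, obs):
        # Probability of seeing `obs` after taking `action`, given the history so far.
        return float(self.m_ao[(action, obs)] @ self.b)

    def update(self, action, obs):
        # Condition the core-test predictions on the executed action and observed outcome.
        p_ao = self.predict(action, obs)
        self.b = (self.M_ao[(action, obs)] @ self.b) / p_ao
        return self.b

# Toy usage with two core tests and a single (action, observation) pair.
M = {('a', 'o'): np.array([[0.6, 0.1], [0.2, 0.5]])}
m = {('a', 'o'): np.array([0.7, 0.3])}
psr = LinearPSR(M, m, b0=np.array([0.5, 0.5]))
print(psr.predict('a', 'o'))  # one-step prediction from the current state
print(psr.update('a', 'o'))   # new predictive state after observing ('a', 'o')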

Keywords

Belief State, Core Test, Neural Information Processing Systems, Policy Gradient, Finite State Controller



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Massachusetts Institute of Technology, Cambridge, USA
