Abstract
This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but requires large numbers of samples. As such sample complexity is not feasible in robotics, we review two approaches to learning intermediate state representations from previous experiences: deep auto-encoders and slow feature analysis. We analyze theoretical properties of the representations and point to potential improvements.
Notes
Perhaps with the exception of TD-Gammon [53], which relied heavily on a well-chosen input representation.
Sampling from trajectories with changing policies leads to non-stationary training distributions and prevents convergence in online gradient descent algorithms.
See [5] for a comparison of SFA/PVF subspace-invariance.
In the limit of infinite training samples, the optimization problem can be analyzed by function analysis in \(L^2({\mathcal {Z}}, \xi )\).
It is not entirely clear why empirical PVF fail here. One can observe that ideal PVF features have higher frequencies than SFA’s, which may be harder to estimate empirically.
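The notes above compare the temporal frequencies of SFA and PVF features. The slowness objective behind SFA can be sketched in a few lines: whiten the observations, then find the projections whose outputs change least between consecutive time steps. This is a minimal illustration only, not the regularized kernel variant of [14]; all function and variable names are our own.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Minimal linear slow feature analysis (SFA) sketch.

    X: (T, d) array of consecutive observations from one trajectory.
    Returns n_features projections with (approximately) unit variance
    whose outputs vary as slowly as possible over time.
    """
    # Center and whiten the data so the constraints (unit variance,
    # decorrelation) are satisfied by any rotation of the result.
    X = X - X.mean(axis=0)
    cov = X.T @ X / len(X)
    eigval, eigvec = np.linalg.eigh(cov)
    W_whiten = eigvec / np.sqrt(eigval)  # scale columns to unit variance
    Z = X @ W_whiten
    # Slowness: minimize the variance of the temporal differences.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)
    dval, dvec = np.linalg.eigh(dcov)
    # eigh returns eigenvalues in ascending order, so the first columns
    # of dvec are the slowest directions in the whitened space.
    return Z @ dvec[:, :n_features]
```

Applied to a mixture of a slow and a fast sinusoid, the first returned feature recovers the slow component; the returned features are ordered by increasing temporal frequency, which is the property the notes contrast with empirical PVF.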
References
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Bellman RE (1957) Dynamic programming. Princeton University Press
Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems
Böhmer W, Grünewälder S, Nickisch H, Obermayer K (2012) Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis. Mach Learn 89(1–2):67–86
Böhmer W, Grünewälder S, Shen Y, Musial M, Obermayer K (2013) Construction of approximation spaces for reinforcement learning. J Mach Learn Res 14:2067–2118
Böhmer W, Obermayer K (2013) Towards structural generalization: Factored approximate planning. ICRA Workshop on Autonomous Learning. http://autonomous-learning.org/wp-content/uploads/13-ALW/paper_1.pdf
Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11:1–94
Boyan JA, Moore AW (1995) Generalization in reinforcement learning: safely approximating the value function. In: Advances in Neural Information Processing Systems, pp 369–376
Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1/2/3):33–57
Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43:7–52
Ferguson K, Mahadevan S (2006) Proto-transfer learning in Markov decision processes using spectral methods. In: ICML Workshop on Transfer Learning
Ferrante E, Lazaric A, Restelli M (2008) Transfer of task representation in reinforcement learning using policy-based proto-value functions. In: International Joint Conference on Autonomous Agents and Multiagent Systems
Franzius M, Sprekeler H, Wiskott L (2007) Slowness and sparseness leads to place, head-direction, and spatial-view cells. PLoS Comput Biol 3(8):e166
Hafner R, Riedmiller M (2011) Reinforcement learning in feedback control: challenges and benchmarks from technical process control. Mach Learn 84(1–2):137–169
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Jonschkowski R, Brock O (2013) Learning task-specific state representations by maximizing slowness and predictability. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-13-ERLARS-final.pdf
Jonschkowski R, Brock O (2014) State representation learning in robotics: Using prior knowledge about physical interaction. In: Proceedings of Robotics, Science and Systems
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134
Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285
Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: ICLR
Kober J, Bagnell D, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274
Konidaris GD, Osentoski S, Thomas P (2011) Value function approximation in reinforcement learning using the Fourier basis. In: Proceedings of the Twenty-Fifth Conference on Artificial Intelligence
Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149
Lang T, Toussaint M (2010) Planning with noisy probabilistic relational rules. J Artif Intell Res 39:1–49
Lange S, Riedmiller M, Voigtlaender A (2012) Autonomous reinforcement learning on raw visual input data in a real world application. In: International Joint Conference on Neural Networks, Brisbane, Australia
Legenstein R, Wilbert N, Wiskott L (2010) Reinforcement learning on slow features of high-dimensional input streams. PLoS Comput Biol 6(8):e1000894
Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems
Lin LJ (1992) Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA
Littman ML, Sutton RS, Singh S (2001) Predictive representations of state. In: Advances in Neural Information Processing Systems, vol 14
Luciw M, Schmidhuber J (2012) Low complexity proto-value function learning from sensory observations with incremental slow feature analysis. In: International Conference on Artificial Neural Networks and Machine Learning, vol III. Springer, pp 279–287
Maass W, Natschlaeger T, Markram H (2002) Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput 14(11):2531–2560
Maddison CJ, Huang A, Sutskever I, Silver D (2014) Move evaluation in Go using deep convolutional neural networks. arXiv preprint arXiv:1412.6564
Mahadevan S, Liu B (2010) Basis construction from power series expansions of value functions. In: Advances in Neural Information Processing Systems, pp 1540–1548
Mahadevan S, Maggioni M (2007) Proto-value functions: a Laplacian framework for learning representations and control in Markov decision processes. J Mach Learn Res 8:2169–2231
Mattner J, Lange S, Riedmiller M (2012) Learn to swing up and balance a real pole based on raw visual input data. In: Proceedings of the 19th International Conference on Neural Information Processing (5) (ICONIP 2012). Doha, Qatar, pp 126–133
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Advances in Neural Information Processing Systems
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop
Mordatch I, Todorov E (2014) Combining the benefits of function approximation and trajectory optimization. In: Proceedings of Robotics: Science and Systems (RSS)
Parr R, Li L, Taylor G, Painter-Wakefield C, Littman ML (2008) An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: International Conference on Machine Learning
Parr R, Painter-Wakefield C, Li L, Littman M (2007) Analyzing feature generation for value-function approximation. In: International Conference on Machine Learning
Petrik M (2007) An analysis of Laplacian methods for value function approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp 2574–2579
Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. In: ICML
Riedmiller M (2005) Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In: 16th European Conference on Machine Learning. Springer, pp 317–328
Riedmiller M, Gabel T, Hafner R, Lange S (2009) Reinforcement learning for robot soccer. Auton Robot 27(1):55–74
Sallans B, Hinton GE (2004) Reinforcement learning with factored states and actions. J Mach Learn Res 5:1063–1088
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: The 31st International Conference on Machine Learning (ICML 2014)
Snel M, Whiteson S (2011) Multi-task reinforcement learning: Shaping and feature selection. In: European Workshop on Reinforcement Learning, pp 237–248
Sprekeler H (2011) On the relationship of slow feature analysis and Laplacian eigenmaps. Neural Comput 23(12):3287–3302
Sutton RS, Barto AG (1998) Reinforcement Learning: an introduction. MIT Press
Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633–1685
Tesauro G (1995) Temporal difference learning and TD-Gammon. Commun ACM 38(3):58–68
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res (JMLR) 11:3371–3408
Wingate D, Singh SP (2007) On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In: International Joint Conference on Autonomous Agents and Multiagent Systems, pp 1128–1135
Wiskott L (2003) Slow feature analysis: a theoretical analysis of optimal free responses. Neural Comput 15(9):2147–2177
Wiskott L, Sejnowski T (2002) Slow feature analysis: unsupervised learning of invariances. Neural Comput 14(4):715–770
Acknowledgments
We would like to thank Sebastian Höfer and Rico Jonschkowski for many fruitful discussions.
Additional information
This work was partially funded by the German Research Foundation (DFG) within the priority program SPP 1527.
Cite this article
Böhmer, W., Springenberg, J.T., Boedecker, J. et al. Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations. Künstl Intell 29, 353–362 (2015). https://doi.org/10.1007/s13218-015-0356-1