Abstract
This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but requires large numbers of samples. As such sample complexity is not feasible in robotics, we review two approaches to learning intermediate state representations from previous experiences: deep auto-encoders and slow feature analysis. We analyze theoretical properties of the representations and point to potential improvements.
Notes
Perhaps with the exception of TD-Gammon [53], which relied heavily on a well-chosen input representation.
Sampling from trajectories with changing policies leads to non-stationary training distributions and prevents convergence in online gradient descent algorithms.
See [5] for a comparison of SFA/PVF subspace-invariance.
In the limit of infinite training samples, the optimization problem can be analyzed by function analysis in \(L^2({\mathcal {Z}}, \xi )\).
It is not entirely clear why empirical PVF fail here. One can observe that ideal PVF features have higher frequencies than SFA’s, which may be harder to estimate empirically.
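The notes above compare the temporal frequencies of SFA and PVF features. The slowness objective behind SFA can be sketched in a few lines: whiten the observations, then find the projections whose outputs change least between consecutive time steps. This is a minimal illustration only, not the regularized kernel variant of [14]; all function and variable names are our own.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Minimal linear slow feature analysis (SFA) sketch.

    X: (T, d) array of consecutive observations from one trajectory.
    Returns n_features projections with (approximately) unit variance
    whose outputs vary as slowly as possible over time.
    """
    # Center and whiten the data so the constraints (unit variance,
    # decorrelation) are satisfied by any rotation of the result.
    X = X - X.mean(axis=0)
    cov = X.T @ X / len(X)
    eigval, eigvec = np.linalg.eigh(cov)
    W_whiten = eigvec / np.sqrt(eigval)  # scale columns to unit variance
    Z = X @ W_whiten
    # Slowness: minimize the variance of the temporal differences.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)
    dval, dvec = np.linalg.eigh(dcov)
    # eigh returns eigenvalues in ascending order, so the first columns
    # of dvec are the slowest directions in the whitened space.
    return Z @ dvec[:, :n_features]
```

Applied to a mixture of a slow and a fast sinusoid, the first returned feature recovers the slow component; the returned features are ordered by increasing temporal frequency, which is the property the notes contrast with empirical PVF.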
References
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Bellman RE (1957) Dynamic programming. Princeton University Press
Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems
Böhmer W, Grünewälder S, Nickisch H, Obermayer K (2012) Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis. Mach Learn 89(1–2):67–86
Böhmer W, Grünewälder S, Shen Y, Musial M, Obermayer K (2013) Construction of approximation spaces for reinforcement learning. J Mach Learn Res 14:2067–2118
Böhmer W, Obermayer K (2013) Towards structural generalization: Factored approximate planning. ICRA Workshop on Autonomous Learning. http://autonomous-learning.org/wp-content/uploads/13-ALW/paper_1.pdf
Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11:1–94
Boyan JA, Moore AW (1995) Generalization in reinforcement learning: safely approximating the value function. In: Advances in Neural Information Processing Systems, pp 369–376
Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1/2/3):33–57
Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43:7–52
Ferguson K, Mahadevan S (2006) Proto-transfer learning in Markov decision processes using spectral methods. In: ICML Workshop on Transfer Learning
Ferrante E, Lazaric A, Restelli M (2008) Transfer of task representation in reinforcement learning using policy-based proto-value functions. In: International Joint Conference on Autonomous Agents and Multiagent Systems
Franzius M, Sprekeler H, Wiskott L (2007) Slowness and sparseness leads to place, head-direction, and spatial-view cells. PLoS Comput Biol 3(8):e166
Hafner R, Riedmiller M (2011) Reinforcement learning in feedback control: challenges and benchmarks from technical process control. Mach Learn 84(1–2):137–169
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Jonschkowski R, Brock O (2013) Learning task-specific state representations by maximizing slowness and predictability. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-13-ERLARS-final.pdf
Jonschkowski R, Brock O (2014) State representation learning in robotics: Using prior knowledge about physical interaction. In: Proceedings of Robotics, Science and Systems
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134
Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285
Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: ICLR
Kober J, Bagnell D, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274
Konidaris GD, Osentoski S, Thomas P (2011) Value function approximation in reinforcement learning using the Fourier basis. In: Proceedings of the Twenty-Fifth Conference on Artificial Intelligence
Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149
Lang T, Toussaint M (2010) Planning with noisy probabilistic relational rules. J Artif Intell Res 39:1–49
Lange S, Riedmiller M, Voigtlaender A (2012) Autonomous reinforcement learning on raw visual input data in a real world application. In: International Joint Conference on Neural Networks, Brisbane, Australia
Legenstein R, Wilbert N, Wiskott L (2010) Reinforcement learning on slow features of high-dimensional input streams. PLoS Comput Biol 6(8):e1000894
Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems
Lin LJ (1992) Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA
Littman ML, Sutton RS, Singh S (2001) Predictive representations of state. In: Advances in Neural Information Processing Systems, vol 14
Luciw M, Schmidhuber J (2012) Low complexity proto-value function learning from sensory observations with incremental slow feature analysis. In: International Conference on Artificial Neural Networks and Machine Learning, vol III. Springer, pp 279–287
Maass W, Natschlaeger T, Markram H (2002) Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput 14(11):2531–2560
Maddison CJ, Huang A, Sutskever I, Silver D (2014) Move evaluation in Go using deep convolutional neural networks. arXiv preprint arXiv:1412.6564
Mahadevan S, Liu B (2010) Basis construction from power series expansions of value functions. In: Advances in Neural Information Processing Systems, pp 1540–1548
Mahadevan S, Maggioni M (2007) Proto-value functions: a Laplacian framework for learning representations and control in Markov decision processes. J Mach Learn Res 8:2169–2231
Mattner J, Lange S, Riedmiller M (2012) Learn to swing up and balance a real pole based on raw visual input data. In: Proceedings of the 19th International Conference on Neural Information Processing (5) (ICONIP 2012). Doha, Qatar, pp 126–133
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Advances in Neural Information Processing Systems
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop
Mordatch I, Todorov E (2014) Combining the benefits of function approximation and trajectory optimization. In: Proceedings of Robotics: Science and Systems (RSS)
Parr R, Li L, Taylor G, Painter-Wakefield C, Littman ML (2008) An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: International Conference on Machine Learning
Parr R, Painter-Wakefield C, Li L, Littman M (2007) Analyzing feature generation for value-function approximation. In: International Conference on Machine Learning
Petrik M (2007) An analysis of Laplacian methods for value function approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp 2574–2579
Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. In: ICML
Riedmiller M (2005) Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In: 16th European Conference on Machine Learning. Springer, pp 317–328
Riedmiller M, Gabel T, Hafner R, Lange S (2009) Reinforcement learning for robot soccer. Auton Robot 27(1):55–74
Sallans B, Hinton GE (2004) Reinforcement learning with factored states and actions. J Mach Learn Res 5:1063–1088
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: The 31st International Conference on Machine Learning (ICML 2014)
Snel M, Whiteson S (2011) Multi-task reinforcement learning: Shaping and feature selection. In: European Workshop on Reinforcement Learning, pp 237–248
Sprekeler H (2011) On the relationship of slow feature analysis and Laplacian eigenmaps. Neural Comput 23(12):3287–3302
Sutton RS, Barto AG (1998) Reinforcement Learning: an introduction. MIT Press
Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633–1685
Tesauro G (1995) Temporal difference learning and TD-Gammon. Commun ACM 38(3):58–68
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res (JMLR) 11:3371–3408
Wingate D, Singh SP (2007) On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In: International Joint Conference on Autonomous Agents and Multiagent Systems, pp 1128–1135
Wiskott L (2003) Slow feature analysis: a theoretical analysis of optimal free responses. Neural Comput 15(9):2147–2177
Wiskott L, Sejnowski T (2002) Slow feature analysis: unsupervised learning of invariances. Neural Comput 14(4):715–770
Acknowledgments
We would like to thank Sebastian Höfer and Rico Jonschkowski for many fruitful discussions.
Additional information
This work was partially funded by the German Research Foundation (DFG) within the priority program SPP 1527.
Cite this article
Böhmer, W., Springenberg, J.T., Boedecker, J. et al. Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations. Künstl Intell 29, 353–362 (2015). https://doi.org/10.1007/s13218-015-0356-1