Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations

Technical Contribution · KI - Künstliche Intelligenz

Abstract

This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but relies on large numbers of samples, which is not feasible in robotics. We therefore review two approaches that learn intermediate state representations from previous experience: deep auto-encoders and slow feature analysis. We analyze theoretical properties of these representations and point to potential improvements.
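As a concrete illustration of the first approach, the following is a minimal sketch of a deep auto-encoder for state representation learning. This is our own illustrative code, assuming PyTorch; the layer sizes are arbitrary, and works such as [25, 35] train deep auto-encoders on raw camera images. The encoder compresses a high-dimensional observation into a low-dimensional state, and training minimizes reconstruction error only.

```python
# Minimal sketch (illustrative assumption, not the article's code): a deep
# auto-encoder whose bottleneck serves as a state representation for RL.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, obs_dim=64 * 64, state_dim=2):
        super().__init__()
        # Encoder: raw observation -> low-dimensional state representation.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, state_dim),
        )
        # Decoder: reconstruct the observation from the state.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, obs_dim),
        )

    def forward(self, obs):
        state = self.encoder(obs)
        return self.decoder(state), state

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
observations = torch.rand(128, 64 * 64)  # stand-in for sensor data

for _ in range(100):  # training uses the reconstruction error only
    reconstruction, state = model(observations)
    loss = nn.functional.mse_loss(reconstruction, observations)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# `model.encoder` now maps observations to states for a downstream RL agent.
```

The key design choice is that the representation is learned without any reward signal; the RL algorithm is then run on the encoder's output instead of the raw observations.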

Notes

  1. Perhaps with the exception of TD-Gammon [53], which relied heavily on a well-chosen representation as input.

  2. Sampling from trajectories under changing policies yields non-stationary training distributions, which can prevent convergence of online gradient-descent algorithms; experience replay [28] is a common remedy (see the sketch after these notes).

  3. The back-propagated Bellman error could in principle also be used to fine-tune the representation, but both [25, 35] chose not to adapt the representation to the task.

  4. See [5] for a comparison of SFA/PVF subspace-invariance.

  5. In the limit of infinite training samples, the optimization problem can be analyzed with functional analysis in \(L^2({\mathcal {Z}}, \xi )\); the SFA objective is restated after these notes.

  6. It is not entirely clear why empirical PVF fail here. One can observe that ideal PVF features have higher frequencies than SFA's, and higher-frequency features may be harder to estimate empirically (a sketch of empirical SFA estimation follows these notes).
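To make footnote 2 concrete, here is a minimal sketch of experience replay. This is an illustrative assumption on our part, not code from the article: transitions are stored in a buffer and mini-batches are drawn uniformly, which breaks the temporal correlation of on-policy trajectories and keeps the training distribution approximately stationary [28, 37].

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest are discarded."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive time steps, so gradient
        # updates see a roughly stationary distribution of transitions.
        return random.sample(self.buffer, batch_size)

# Usage: fill the buffer while acting, then train on random mini-batches.
buffer = ReplayBuffer()
for t in range(100):
    buffer.add(obs=t, action=0, reward=0.0, next_obs=t + 1)  # dummy data
batch = buffer.sample(32)
```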
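For reference, footnote 5 concerns the SFA optimization problem as formulated in [56, 57]: find features \(y_i = \phi_i(z)\) that vary as slowly as possible over time while remaining informative,

\[ \min_{\phi_i} \; \big\langle \dot{y}_i^2 \big\rangle_t \quad \text{s.t.} \quad \langle y_i \rangle_t = 0, \quad \langle y_i^2 \rangle_t = 1, \quad \langle y_i y_j \rangle_t = 0 \;\; \forall j < i, \]

where \(\langle \cdot \rangle_t\) denotes the temporal average and \(\dot{y}_i\) the temporal derivative (in discrete time, the difference between consecutive steps). The zero-mean and unit-variance constraints rule out trivially constant solutions, and the decorrelation constraint forces each feature to carry new information.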
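Finally, footnote 6 concerns empirically estimated features. As a concrete illustration, the following sketches linear SFA on an observation sequence (an assumption made for brevity; the cited constructions [4, 5] use kernelized SFA): whiten the data, then pick the directions in which the whitened signal changes most slowly.

```python
import numpy as np

def linear_sfa(Z, k, eps=1e-8):
    """Return the k slowest linear features of a sequence Z of shape (T, d)."""
    Z = Z - Z.mean(axis=0)                 # zero mean
    cov = Z.T @ Z / len(Z)
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec / np.sqrt(eigval + eps)     # whitening: unit variance, decorrelated
    Zw = Z @ W
    dZ = np.diff(Zw, axis=0)               # discrete temporal derivative
    dcov = dZ.T @ dZ / len(dZ)
    dval, dvec = np.linalg.eigh(dcov)      # ascending: small eigenvalue = slow
    return Zw @ dvec[:, :k]                # project onto the k slowest directions

# Toy usage: a random walk is a slowly varying high-dimensional signal.
Z = np.cumsum(np.random.randn(500, 10), axis=0)
features = linear_sfa(Z, k=3)              # shape (500, 3)
```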

References

  1. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  2. Bellman RE (1957) Dynamic programming. Princeton University Press

  3. Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems

  4. Böhmer W, Grünewälder S, Nickisch H, Obermayer K (2012) Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis. Mach Learn 89(1–2):67–86

  5. Böhmer W, Grünewälder S, Shen Y, Musial M, Obermayer K (2013) Construction of approximation spaces for reinforcement learning. J Mach Learn Res 14:2067–2118

  6. Böhmer W, Obermayer K (2013) Towards structural generalization: Factored approximate planning. ICRA Workshop on Autonomous Learning. http://autonomous-learning.org/wp-content/uploads/13-ALW/paper_1.pdf

  7. Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11:1–94

  8. Boyan JA, Moore AW (1995) Generalization in reinforcement learning: safely approximating the value function. In: Advances in Neural Information Processing Systems, pp 369–376

  9. Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22(1–3):33–57

  10. Džeroski S, De Raedt L, Driessens K (2001) Relational reinforcement learning. Mach Learn 43:7–52

  11. Ferguson K, Mahadevan S (2006) Proto-transfer learning in Markov decision processes using spectral methods. In: ICML Workshop on Transfer Learning

  12. Ferrante E, Lazaric A, Restelli M (2008) Transfer of task representation in reinforcement learning using policy-based proto-value functions. In: International Joint Conference on Autonomous Agents and Multiagent Systems

  13. Franzius M, Sprekeler H, Wiskott L (2007) Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Comput Biol 3(8):e166

  14. Hafner R, Riedmiller M (2011) Reinforcement learning in feedback control: challenges and benchmarks from technical process control. Mach Learn 84(1–2):137–169

  15. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

  16. Jonschkowski R, Brock O (2013) Learning task-specific state representations by maximizing slowness and predictability. http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/Jonschkowski-13-ERLARS-final.pdf

  17. Jonschkowski R, Brock O (2014) State representation learning in robotics: Using prior knowledge about physical interaction. In: Proceedings of Robotics: Science and Systems

  18. Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134

  19. Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

  20. Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: ICLR

  21. Kober J, Bagnell D, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274

  22. Konidaris GD, Osentoski S, Thomas P (2011) Value function approximation in reinforcement learning using the Fourier basis. In: Proceedings of the Twenty-Fifth Conference on Artificial Intelligence

  23. Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149

  24. Lang T, Toussaint M (2010) Planning with noisy probabilistic relational rules. J Artif Intell Res 39:1–49

  25. Lange S, Riedmiller M, Voigtlaender A (2012) Autonomous reinforcement learning on raw visual input data in a real world application. In: International Joint Conference on Neural Networks, Brisbane, Australia

  26. Legenstein R, Wilbert N, Wiskott L (2010) Reinforcement learning on slow features of high-dimensional input streams. PLoS Comput Biol 6(8):e1000894

  27. Levine S, Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems

  28. Lin LJ (1992) Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA

  29. Littman ML, Sutton RS, Singh S (2001) Predictive representations of state. In: Advances in Neural Information Processing Systems, vol 14

  30. Luciw M, Schmidhuber J (2012) Low complexity proto-value function learning from sensory observations with incremental slow feature analysis. In: International Conference on Artificial Neural Networks and Machine Learning, vol III. Springer, pp 279–287

  31. Maass W, Natschlaeger T, Markram H (2002) Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput 14(11):2531–2560

  32. Maddison CJ, Huang A, Sutskever I, Silver D (2014) Move evaluation in Go using deep convolutional neural networks. arXiv preprint arXiv:1412.6564

  33. Mahadevan S, Liu B (2010) Basis construction from power series expansions of value functions. In: Advances in Neural Information Processing Systems, pp 1540–1548

  34. Mahadevan S, Maggioni M (2007) Proto-value functions: a Laplacian framework for learning representations and control in Markov decision processes. J Mach Learn Res 8:2169–2231

  35. Mattner J, Lange S, Riedmiller M (2012) Learn to swing up and balance a real pole based on raw visual input data. In: Proceedings of the 19th International Conference on Neural Information Processing (5) (ICONIP 2012). Doha, Qatar, pp 126–133

  36. Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Advances in Neural Information Processing Systems

  37. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop

  38. Mordatch I, Todorov E (2014) Combining the benefits of function approximation and trajectory optimization. In: Proceedings of Robotics: Science and Systems (RSS)

  39. Parr R, Li L, Taylor G, Painter-Wakefield C, Littman ML (2008) An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: International Conference on Machine Learning

  40. Parr R, Painter-Wakefield C, Li L, Littman M (2007) Analyzing feature generation for value-function approximation. In: International Conference on Machine Learning

  41. Petrik M (2007) An analysis of Laplacian methods for value function approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp 2574–2579

  42. Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. In: ICML

  43. Riedmiller M (2005) Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In: 16th European Conference on Machine Learning. Springer, pp 317–328

  44. Riedmiller M, Gabel T, Hafner R, Lange S (2009) Reinforcement learning for robot soccer. Auton Robot 27(1):55–74

  45. Sallans B, Hinton GE (2004) Reinforcement learning with factored states and actions. J Mach Learn Res 5:1063–1088

  46. Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319

  47. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  48. Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: The 31st International Conference on Machine Learning (ICML 2014)

  49. Snel M, Whiteson S (2011) Multi-task reinforcement learning: Shaping and feature selection. In: European Workshop on Reinforcement Learning, pp 237–248

  50. Sprekeler H (2011) On the relationship of slow feature analysis and Laplacian eigenmaps. Neural Comput 23(12):3287–3302

  51. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press

  52. Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633–1685

  53. Tesauro G (1995) Temporal difference learning and TD-Gammon. Commun ACM 38(3):58–68

  54. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res (JMLR) 11:3371–3408

  55. Wingate D, Singh SP (2007) On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In: International Joint Conference on Autonomous Agents and Multiagent Systems, pp 1128–1135

  56. Wiskott L (2003) Slow feature analysis: a theoretical analysis of optimal free responses. Neural Comput 15(9):2147–2177

  57. Wiskott L, Sejnowski T (2002) Slow feature analysis: unsupervised learning of invariances. Neural Comput 14(4):715–770

Acknowledgments

We would like to thank Sebastian Höfer and Rico Jonschkowski for many fruitful discussions.

Author information

Correspondence to Wendelin Böhmer.

Additional information

This work was partially funded by the German Research Foundation (DFG) within the priority program SPP 1527.

Cite this article

Böhmer, W., Springenberg, J.T., Boedecker, J. et al. Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations. Künstl Intell 29, 353–362 (2015). https://doi.org/10.1007/s13218-015-0356-1
