KI - Künstliche Intelligenz, Volume 29, Issue 4, pp 353–362

Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations

  • Wendelin Böhmer
  • Jost Tobias Springenberg
  • Joschka Boedecker
  • Martin Riedmiller
  • Klaus Obermayer
Technical Contribution


Abstract

This article reviews an emerging field that aims for autonomous reinforcement learning (RL) directly on sensor observations. Straightforward end-to-end RL has recently shown remarkable success, but it relies on very large numbers of training samples. As collecting this much experience is infeasible on real robots, we review two approaches that learn intermediate state representations from previous experience: deep auto-encoders and slow feature analysis. We analyze theoretical properties of these representations and point to potential improvements.
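To make the two representation-learning approaches concrete, below are two minimal numpy sketches. They are our own illustrations under simplifying assumptions, not code from the reviewed systems: linear slow feature analysis stands in for the kernelized and hierarchical variants discussed in the literature, and a single-hidden-layer auto-encoder stands in for the deep (convolutional) auto-encoders used in practice; the function names and toy signals are hypothetical.

The SFA sketch finds linear projections of an observation time series that vary as slowly as possible over time, subject to unit-variance and decorrelation constraints:

```python
import numpy as np

def linear_sfa(X, n_features):
    """Linear slow feature analysis on a time series X of shape (T, d).
    Returns the n_features slowest output signals and the projection."""
    X = X - X.mean(axis=0)                 # center the observations
    cov = X.T @ X / (len(X) - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec / np.sqrt(eigval)           # whitening matrix: cov(X @ W) = I
    Z = X @ W
    dZ = np.diff(Z, axis=0)                # temporal derivative (finite differences)
    dcov = dZ.T @ dZ / (len(dZ) - 1)
    dval, dvec = np.linalg.eigh(dcov)      # eigh sorts eigenvalues ascending
    P = W @ dvec[:, :n_features]           # smallest eigenvalues = slowest features
    return X @ P, P

# Toy usage: a slow latent signal mixed into fast nuisance observations.
t = np.linspace(0, 2 * np.pi, 1000)
slow, fast = np.sin(t), np.sin(40 * t)
X = np.stack([slow + 0.1 * fast, fast + 0.1 * slow], axis=1)
features, P = linear_sfa(X, n_features=1)  # recovers `slow` up to sign and scale
```

The auto-encoder sketch instead compresses each observation through a low-dimensional bottleneck and is trained to reconstruct its input; the bottleneck activations then serve as the state representation:

```python
def train_autoencoder(X, n_hidden, lr=0.1, epochs=2000, seed=0):
    """One-hidden-layer auto-encoder trained by full-batch gradient descent
    on half the mean squared reconstruction error. Returns an encoder that
    maps observations to bottleneck codes."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder: bottleneck code
        err = (H @ W2 + b2) - X            # decoder output minus target
        gW2 = H.T @ err / len(X)           # backpropagate the loss ...
        dH = (err @ W2.T) * (1.0 - H**2)   # ... through the tanh layer
        gW1 = X.T @ dH / len(X)
        W1 -= lr * gW1; b1 -= lr * dH.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * err.mean(axis=0)
    return lambda X: np.tanh(X @ W1 + b1)

encode = train_autoencoder(X, n_hidden=1)  # reuse the toy data from above
codes = encode(X)                          # 1-d state representation
```

Note the difference in objectives: SFA exploits the temporal coherence of the sensor stream, while the auto-encoder only demands that observations be reconstructable from the code. Both yield low-dimensional representations on which standard RL methods can then operate.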


Keywords: End-to-end reinforcement learning · Representation learning · Deep auto-encoder networks · Slow feature analysis · Autonomous robotics



Acknowledgements

We would like to thank Sebastian Höfer and Rico Jonschkowski for many fruitful discussions.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Wendelin Böhmer (1), corresponding author
  • Jost Tobias Springenberg (2)
  • Joschka Boedecker (2)
  • Martin Riedmiller (2)
  • Klaus Obermayer (1)

  1. Neural Information Processing Group, Technische Universität Berlin, Berlin, Germany
  2. Machine Learning Lab, Universität Freiburg, Freiburg, Germany
