Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks

  • Siegmund Duell
  • Steffen Udluft
  • Volkmar Sterzing
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)

Abstract

The aim of this chapter is to provide a series of tricks and recipes for neural state estimation, particularly for real-world applications of reinforcement learning. We use various topologies of recurrent neural networks because they make it possible to identify the continuous-valued, possibly high-dimensional state space of complex dynamical systems. Recurrent neural networks explicitly account for time and memory; in principle, they are able to model any type of dynamical system. Because of these capabilities, recurrent neural networks are a suitable tool for approximating a Markovian state space of dynamical systems. In a second step, reinforcement learning methods can be applied to solve a defined control problem. Besides the trick of using a recurrent neural network for state estimation, various issues regarding real-world problems, such as large sets of observables and long-term dependencies, are addressed.
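The two-step idea in the abstract (first estimate a state with a recurrent network, then hand that state to a reinforcement learning method) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the update rule s_{t+1} = tanh(A s_t + B [o_t, a_t]), the linear readout C, the function name `estimate_states`, the state dimension, and the random, untrained weights. In practice the weights would be fitted, for example by minimising the error of the predicted next observation.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def estimate_states(observations, actions, state_dim=4, seed=0):
    """Unfold a simple recurrent network over one trajectory.

    Assumed (hypothetical) model:
      s_{t+1}     = tanh(A s_t + B [o_t, a_t])   internal state update
      o_hat_{t+1} = C s_{t+1}                    next-observation prediction
    Returns the internal states and the predictions, one per time step.
    """
    rng = random.Random(seed)
    obs_dim = len(observations[0])
    in_dim = obs_dim + len(actions[0])
    A = make_matrix(state_dim, state_dim, rng)
    B = make_matrix(state_dim, in_dim, rng)
    C = make_matrix(obs_dim, state_dim, rng)

    state = [0.0] * state_dim  # initial internal state
    states, predictions = [], []
    for obs, act in zip(observations, actions):
        # Combine the previous state with the current observation and action.
        pre = [a + b for a, b in zip(matvec(A, state), matvec(B, obs + act))]
        state = [math.tanh(p) for p in pre]
        states.append(state)
        predictions.append(matvec(C, state))
    return states, predictions


# Toy trajectory: one observable, one action, five time steps.
observations = [[0.1], [0.3], [0.2], [0.5], [0.4]]
actions = [[1.0], [0.0], [1.0], [1.0], [0.0]]
states, predictions = estimate_states(observations, actions)
```

Under these assumptions, the vectors in `states` play the role of the approximately Markovian state: in the second step they would replace the raw observations as input to whichever reinforcement learning method is chosen.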

Keywords

Reinforcement Learning, Recurrent Neural Network, Reward Function, Markovian State, Partially Observable Markov Decision Process

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Siegmund Duell (1, 2)
  • Steffen Udluft (1)
  • Volkmar Sterzing (1)

  1. Intelligent Systems and Control, Siemens AG, Corporate Technology, Germany
  2. Intelligent Systems and Control, Berlin University of Technology, Machine Learning, Germany
