Neural Networks: Tricks of the Trade, pp. 709–733
Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks
Abstract
The aim of this chapter is to provide a series of tricks and recipes for neural state estimation, particularly for real-world applications of reinforcement learning. We use various topologies of recurrent neural networks because they can identify the continuous-valued, possibly high-dimensional state space of complex dynamical systems. Recurrent neural networks explicitly offer the means to account for time and memory; in principle, they can model any type of dynamical system. These capabilities make recurrent neural networks a suitable tool for approximating a Markovian state space of a dynamical system. In a second step, reinforcement learning methods can be applied to solve a defined control problem. Besides the core trick of using a recurrent neural network for state estimation, we address various issues that arise in real-world problems, such as large sets of observables and long-term dependencies.
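To make the two-step recipe concrete, the following is a minimal sketch in Python/NumPy. All names, dimensions, and the linear Q-function are illustrative assumptions, not the chapter's actual architecture: a small recurrent network folds the observation-action history into a state estimate, which is then treated as a Markovian state for a temporal-difference update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem sizes (hypothetical, for illustration only).
obs_dim, act_dim, state_dim = 4, 2, 8

# Randomly initialized recurrent state-estimation weights; in practice these
# would be trained, e.g. by backpropagation through time on a prediction task.
W_in = rng.normal(scale=0.1, size=(state_dim, obs_dim + act_dim))
W_rec = rng.normal(scale=0.1, size=(state_dim, state_dim))

def estimate_state(observations, actions):
    """Fold a sequence of (observation, action) pairs into one state vector."""
    s = np.zeros(state_dim)
    for o, a in zip(observations, actions):
        x = np.concatenate([o, a])
        s = np.tanh(W_in @ x + W_rec @ s)  # recurrent update carries memory
    return s

# Second step: a simple linear Q-function on top of the estimated state,
# one weight vector per discrete action (an assumption for this sketch).
Q_w = np.zeros((act_dim, state_dim))
gamma, alpha = 0.95, 0.01

def q_update(s, a_idx, reward, s_next):
    """One temporal-difference step, treating the RNN output as the state."""
    td_target = reward + gamma * np.max(Q_w @ s_next)
    td_error = td_target - Q_w[a_idx] @ s
    Q_w[a_idx] += alpha * td_error * s

# Toy usage on random data (one-hot actions are an illustrative choice).
obs_seq = [rng.normal(size=obs_dim) for _ in range(5)]
act_seq = [np.eye(act_dim)[rng.integers(act_dim)] for _ in range(5)]
s = estimate_state(obs_seq, act_seq)
q_update(s, a_idx=0, reward=1.0, s_next=s)
```

The separation mirrors the chapter's framing: the recurrent network supplies an approximately Markovian state from a partially observable history, after which any standard reinforcement learning method can operate on that state.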
Keywords
Reinforcement Learning · Recurrent Neural Network · Reward Function · Markovian State · Partially Observable Markov Decision Process