Abstract
Traditional Reinforcement Learning (RL) methods are insufficient for AGIs, which must be able to learn to deal with Partially Observable Markov Decision Processes (POMDPs). We investigate a novel method for dealing with this problem: standard RL techniques using as input the hidden-layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory, trained through standard back-propagation. Results illustrate the feasibility of this approach: the system learns to deal with high-dimensional visual observations (up to 640 pixels) in partially observable environments with long time lags (up to 12 steps) between relevant sensory information and necessary action.
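As a rough illustration of the idea described in the abstract, the sketch below implements a sequential Recurrent Auto-Associative Memory in PyTorch: at each step an encoder folds the current observation into a constant-size code, and a decoder is trained by back-propagation to reconstruct both the observation and the previous code, so the code comes to summarize the observation history and can be handed to a standard RL method as a surrogate state. All sizes, the random stand-in data, and the training loop are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a sequential RAAM compressor (SCSC). Dimensions, data,
# and optimizer settings are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class SequentialRAAM(nn.Module):
    """Folds an observation sequence into a constant-size code.

    encoder: (previous code, observation) -> new code
    decoder: new code -> reconstruction of (previous code, observation)
    Minimizing reconstruction error forces the code to retain history.
    """
    def __init__(self, obs_dim: int, code_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(code_dim + obs_dim, code_dim), nn.Tanh())
        self.decoder = nn.Linear(code_dim, code_dim + obs_dim)

    def step(self, code, obs):
        x = torch.cat([code, obs], dim=-1)
        new_code = self.encoder(x)
        recon = self.decoder(new_code)
        return new_code, ((recon - x) ** 2).mean()  # new code, reconstruction loss

obs_dim, code_dim = 640, 32                 # e.g. 640-pixel observations
raam = SequentialRAAM(obs_dim, code_dim)
opt = torch.optim.SGD(raam.parameters(), lr=0.01)

# Random stand-in sequences (8 episodes, 12 steps each) in place of real
# environment observations.
sequences = [torch.rand(12, obs_dim) for _ in range(8)]

for epoch in range(5):
    for seq in sequences:
        code = torch.zeros(code_dim)
        loss = torch.zeros(())
        for obs in seq:                     # unroll through the episode
            code, step_loss = raam.step(code, obs)
            loss = loss + step_loss
        opt.zero_grad()
        loss.backward()                     # back-propagation through the unrolled RAAM
        opt.step()

# At decision time, code.detach() would serve as the constant-size state
# fed to a standard RL algorithm in place of the raw partial observation.
```

In this reading, the compressed code approximately restores the Markov property for the downstream RL method, which is why standard techniques can then be applied despite the partial observability.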
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gisslén, L., Luciw, M., Graziano, V., Schmidhuber, J. (2011). Sequential Constant Size Compressors for Reinforcement Learning. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds) Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science, vol 6830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22887-2_4
DOI: https://doi.org/10.1007/978-3-642-22887-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22886-5
Online ISBN: 978-3-642-22887-2
eBook Packages: Computer Science, Computer Science (R0)