Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning

  • Jan Koutník
  • Jürgen Schmidhuber
  • Faustino Gomez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8575)


Dealing with high-dimensional input spaces, such as visual input, is a challenging task for reinforcement learning (RL). Neuroevolution (NE), used for continuous RL problems, has to reduce the problem dimensionality either by (1) compressing the representation of the neural network controllers or by (2) employing a pre-processor (compressor) that transforms the high-dimensional raw inputs into low-dimensional features. In this paper we extend the approach in [16]. The Max-Pooling Convolutional Neural Network (MPCNN) compressor is evolved online, maximizing the distances between normalized feature vectors computed from the images collected by the recurrent neural network (RNN) controllers during their evaluation in the environment. These two interleaved evolutionary searches are used to find MPCNN compressors and RNN controllers that drive a race car in the TORCS racing simulator using only visual input.
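As a rough illustration (not the authors' implementation), the compressor fitness described above, which rewards a candidate MPCNN for spreading out the normalized feature vectors it produces from the collected frames, can be sketched as follows. The function name and the choice of Euclidean distance are assumptions for the sake of the sketch:

```python
import numpy as np

def compressor_fitness(features):
    """Fitness of a candidate compressor (illustrative sketch):
    mean pairwise Euclidean distance between its normalized
    feature vectors. A larger spread means the compressor
    separates the collected frames better."""
    f = np.asarray(features, dtype=float)
    # Normalize each feature vector to unit length (guard against zeros).
    norms = np.linalg.norm(f, axis=1, keepdims=True)
    f = f / np.maximum(norms, 1e-12)
    n = len(f)
    if n < 2:
        return 0.0
    # Average the distances over all unordered pairs of vectors.
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.linalg.norm(f[i] - f[j])
    return total / (n * (n - 1) / 2)
```

A compressor whose features collapse to the same direction scores zero, while one that maps distinct frames to well-separated directions scores high, which is the selection pressure the interleaved compressor search applies.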


Keywords: deep learning, neuroevolution, vision-based, TORCS, reinforcement learning, computer games



References

  1. Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep big simple neural nets for handwritten digit recognition. Neural Computation 22(12), 3207–3220 (2010)
  2. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1237–1242 (2011)
  3. Cuccu, G., Luciw, M., Schmidhuber, J., Gomez, F.: Intrinsically motivated evolutionary search for vision-based reinforcement learning. In: Proceedings of the IEEE Conference on Development and Learning, and Epigenetic Robotics (2011)
  4. D’Ambrosio, D.B., Stanley, K.O.: A novel generative encoding for exploiting neural network sensor and output geometry. In: Proceedings of the 9th Conference on Genetic and Evolutionary Computation (GECCO), pp. 974–981. ACM, New York (2007)
  5. Fernández, F., Borrajo, D.: Two steps reinforcement learning. International Journal of Intelligent Systems 23(2), 213–245 (2008)
  6. Fukushima, K.: Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36(4), 193–202 (1980)
  7. Gauci, J., Stanley, K.: Generating large-scale neural networks through discovering geometric regularities. In: Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO), pp. 997–1004. ACM (2007)
  8. Gisslén, L., Luciw, M., Graziano, V., Schmidhuber, J.: Sequential constant size compressors for reinforcement learning. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 31–40. Springer, Heidelberg (2011)
  9. Gomez, F.J., Schmidhuber, J., Miikkulainen, R.: Accelerated neural evolution through cooperatively coevolved synapses. Journal of Machine Learning Research 9, 937–965 (2008)
  10. Gruau, F.: Cellular encoding of genetic neural networks. Technical Report RR-92-21, Ecole Normale Supérieure de Lyon, Institut IMAG, Lyon, France (1992)
  11. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  12. Jodogne, S.R., Piater, J.H.: Closed-loop learning of visual control policies. Journal of Artificial Intelligence Research 28, 349–391 (2007)
  13. Kitano, H.: Designing neural networks using genetic algorithms with graph generation system. Complex Systems 4, 461–476 (1990)
  14. Koutník, J., Cuccu, G., Schmidhuber, J., Gomez, F.: Evolving large-scale neural networks for vision-based reinforcement learning. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Amsterdam (2013)
  15. Koutník, J., Gomez, F., Schmidhuber, J.: Evolving neural networks in compressed weight space. In: Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO) (2010)
  16. Koutník, J., Schmidhuber, J., Gomez, F.: Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In: Proceedings of the 2014 Genetic and Evolutionary Computation Conference (GECCO). ACM Press (2014)
  17. Lange, S., Riedmiller, M.: Deep auto-encoder neural networks in reinforcement learning. In: International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain (2010)
  18. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  19. Legenstein, R., Wilbert, N., Wiskott, L.: Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology 6(8) (2010)
  20. Pierce, D., Kuipers, B.: Map learning with uninterpreted sensors and effectors. Artificial Intelligence 92, 169–229 (1997)
  21. Riedmiller, M., Lange, S., Voigtlaender, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, pp. 1–8 (2012)
  22. Scherer, D., Müller, A., Behnke, S.: Evaluation of pooling operations in convolutional architectures for object recognition. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds.) ICANN 2010, Part III. LNCS, vol. 6354, pp. 92–101. Springer, Heidelberg (2010)
  23. Schmidhuber, J.: Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks 10(5), 857–873 (1997)
  24. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12 (NIPS), pp. 1057–1063 (1999)
  25. Tesauro, G.: Practical issues in temporal difference learning. In: Lippman, D.S., Moody, J.E., Touretzky, D.S. (eds.) Advances in Neural Information Processing Systems 4 (NIPS), pp. 259–266. Morgan Kaufmann (1992)
  26. Yao, X.: Evolving artificial neural networks. Proceedings of the IEEE 87(9), 1423–1447 (1999)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jan Koutník (1)
  • Jürgen Schmidhuber (1)
  • Faustino Gomez (1)

  1. USI-SUPSI, IDSIA, Manno-Lugano, Switzerland
