Boosting Reinforcement Learning with Unsupervised Feature Extraction

  • Simon Hakenes
  • Tobias Glasmachers
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11727)


Learning to process visual input for Deep Reinforcement Learning is challenging: training a neural network with nothing but a sparse and delayed reward signal seems ill-suited to the task. In this work, Deep Q-Networks are augmented with several unsupervised machine learning methods that provide an additional training signal to the feature extraction stage, helping it find a well-suited representation of the input data. We investigate convolutional filters pretrained on a supervised classification task, a Convolutional Autoencoder, and Slow Feature Analysis in an end-to-end architecture. Experiments are performed on five ViZDoom environments. We find that the unsupervised methods boost Deep Q-Networks significantly, depending on the underlying task the agent has to fulfill: pretrained filters improve object-detection tasks, while Convolutional Autoencoders help on navigation and orientation tasks. Combining these two approaches yields an agent that performs well on all tested environments.


Deep Reinforcement Learning · Unsupervised Learning
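Of the three unsupervised methods mentioned in the abstract, Slow Feature Analysis is the least standard. As a rough illustration only (the paper itself uses a gradient-based, convolutional variant of SFA, not this code), the classical linear SFA step can be sketched in NumPy: whiten the signal, then find the directions along which the whitened signal changes most slowly. All names and the toy data below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_sfa(x, n_features):
    """Minimal linear Slow Feature Analysis (after Wiskott & Sejnowski, 2002).

    x: array of shape (T, D), a time series of D-dimensional observations.
    Returns a projection matrix W of shape (D, n_features) whose outputs
    have unit variance and vary as slowly as possible over time.
    """
    x = x - x.mean(axis=0)                     # center the signal
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    S = evecs / np.sqrt(evals + 1e-8)          # whitening matrix
    z = x @ S                                  # whitened signal, cov(z) = I
    dz = np.diff(z, axis=0)                    # finite-difference time derivative
    dcov = np.cov(dz, rowvar=False)
    devals, devecs = np.linalg.eigh(dcov)
    # Eigenvectors with the SMALLEST derivative variance are the slowest features.
    return S @ devecs[:, :n_features]

# Toy demo: one slow sinusoid linearly mixed with three fast noise channels.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 2000)
slow = np.sin(t)
fast = rng.normal(size=(2000, 3))
mix = rng.normal(size=(4, 4))
x = np.column_stack([slow, fast]) @ mix

W = linear_sfa(x, n_features=1)
y = (x - x.mean(axis=0)) @ W
# The slowest extracted feature should align with the hidden slow source.
corr = abs(np.corrcoef(y[:, 0], slow)[0, 1])
```

In an RL pipeline like the one the abstract describes, such slow features would replace or supplement the raw convolutional representation fed to the Q-network, on the intuition that behaviorally relevant quantities (position, orientation) change slowly relative to pixels.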


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany