Augmented Curiosity: Depth and Optical Flow Prediction for Efficient Exploration

  • Juan Carvajal
  • Thomas Molnar
  • Lukasz Burzawa
  • Eugenio Culurciello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11844)


Exploring novel environments in search of a specific target raises the question of how to provide an artificial agent with adequate positive external rewards. When external rewards are sparse, a reinforcement learning algorithm often fails to develop a successful policy function to govern the agent’s behavior. Intrinsic rewards, however, can provide feedback on the agent’s actions and enable updates towards a proper policy function even in sparse scenarios. Our approaches, the Optical Flow-Augmented Curiosity Module (OF-ACM) and the Depth-Augmented Curiosity Module (D-ACM), extend the Intrinsic Curiosity Model (ICM) by Pathak et al. The ICM forms an intrinsic reward signal from the error between a prediction of the next state and its ground truth. Our modules leverage additional sources of information, such as depth images and optical flow, to generate superior embeddings that serve as inputs for next-state prediction. In experiments on visually rich and sparse-feature scenarios in ViZDoom, our predictive modules exhibit improved exploration capabilities and learning of an ideal policy function. With D-ACM, we show a 63.3% average improvement in a policy’s time to convergence over ICM in the “My Way Home” scenarios.
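The prediction-error reward described above can be illustrated with a minimal sketch. It assumes the reward is a scaled squared error between the predicted and the actual next-state embedding, as in Pathak et al.; the function name, the scaling factor `eta`, and the three-dimensional embeddings are hypothetical and chosen for illustration only. It is not the implementation used for OF-ACM or D-ACM.

```python
import numpy as np


def intrinsic_reward(predicted_next, actual_next, eta=0.01):
    """Curiosity-style reward: eta/2 times the squared L2 prediction error.

    `eta` is an assumed scaling factor, not a value taken from the paper.
    """
    predicted_next = np.asarray(predicted_next, dtype=np.float64)
    actual_next = np.asarray(actual_next, dtype=np.float64)
    if predicted_next.shape != actual_next.shape:
        raise ValueError("predicted and actual embeddings must have the same shape")
    squared_error = np.sum((predicted_next - actual_next) ** 2)
    return 0.5 * eta * float(squared_error)


# Hypothetical embedding of the state the agent actually reached.
phi_next = np.array([1.0, 0.0, 2.0])

# A well-predicted (familiar) transition yields no curiosity reward.
familiar = intrinsic_reward(np.array([1.0, 0.0, 2.0]), phi_next)

# A poorly predicted (novel) transition yields a larger reward.
novel = intrinsic_reward(np.array([0.0, 1.0, 0.0]), phi_next)

print(familiar, novel)
```

Under this sketch, transitions the forward model already predicts well earn little reward, so the agent is nudged towards states it cannot yet predict, which is what makes the signal useful when external rewards are sparse.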


Reinforcement learning, Exploration, Curiosity, Self-supervision


  1. Brockman, G., et al.: OpenAI gym. CoRR abs/1606.01540 (2016)
  2. Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003)
  3. Grześ, M.: Reward shaping in episodic reinforcement learning. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, pp. 565–573. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2017)
  4. He, Y., Chen, S.: Advances in sensing and processing methods for three-dimensional robot vision. Int. J. Adv. Robot. Syst. 15(2) (2018)
  5. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaskowski, W.: ViZDoom: a doom-based AI research platform for visual reinforcement learning. CoRR abs/1605.02097 (2016)
  6. Lu, F., Milios, E.: Globally consistent range scan alignment for environment mapping. Auton. Robots 4(4), 333–349 (1997)
  7. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. CoRR abs/1602.01783 (2016)
  8. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: ICML (2017)
  9. Tai, L., Liu, M.: Towards cognitive exploration through deep reinforcement learning for mobile robots. CoRR abs/1610.01733 (2016)
  10. Wu, Y., Mansimov, E., Liao, S., Grosse, R.B., Ba, J.: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. CoRR abs/1708.05144 (2017)
  11. Zhang, M., Levine, S., McCarthy, Z., Finn, C., Abbeel, P.: Policy learning with continuous memory states for partially observed robotic control. CoRR abs/1507.01273 (2015)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Juan Carvajal (1)
  • Thomas Molnar (1)
  • Lukasz Burzawa (1)
  • Eugenio Culurciello (1)

  1. Purdue University, West Lafayette, USA
