Interactive Learning with Corrective Feedback for Policies Based on Deep Neural Networks

  • Rodrigo Pérez-Dattari
  • Carlos Celemin
  • Javier Ruiz-del-Solar
  • Jens Kober
Conference paper
Part of the Springer Proceedings in Advanced Robotics book series (SPAR, volume 11)


Deep Reinforcement Learning (DRL) has become a powerful strategy for solving complex decision-making problems with Deep Neural Networks (DNNs). However, it is highly data demanding, which makes it infeasible on physical systems for most applications. In this work, we pursue an alternative Interactive Machine Learning (IML) strategy for training DNN policies based on human corrective feedback, with a method called Deep COACH (D-COACH). This approach exploits both the knowledge and insight of human teachers and the representational power of DNNs, and it requires no reward function (which in some settings would demand external perception just to compute rewards). We combine Deep Learning with the COrrective Advice Communicated by Humans (COACH) framework, in which non-expert humans shape policies by correcting the agent's actions during execution. The D-COACH framework has the potential to solve complex problems without requiring large amounts of data or time. Experimental results validate the efficiency of the framework on three problems (two simulated, one with a real robot) with both low- and high-dimensional state spaces, showing that it learns policies for continuous action spaces, such as in the Car Racing and Cart-Pole problems, faster than DRL.
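The corrective-feedback idea behind COACH can be illustrated with a minimal sketch: the human only signals the direction in which an executed action should change, and the policy takes a supervised step toward the corrected action used as a training label. The snippet below is a toy under stated assumptions, not the paper's D-COACH implementation: a linear policy stands in for the DNN, the teacher is simulated, and the error magnitude `e` is a made-up hyperparameter.

```python
import numpy as np

# Illustrative sketch of a COACH-style corrective-feedback update, NOT the
# authors' exact D-COACH implementation. A linear policy stands in for the
# DNN, and the error magnitude `e` is an assumed hyperparameter.

class LinearPolicy:
    def __init__(self, state_dim, action_dim, lr=0.05):
        self.W = np.zeros((action_dim, state_dim))
        self.lr = lr

    def act(self, state):
        return self.W @ state

    def update(self, state, target):
        # Supervised step toward the human-corrected action label.
        error = target - self.act(state)
        self.W += self.lr * np.outer(error, state)

def corrective_step(policy, state, h, e=0.5):
    """h: human feedback per action dimension, in {-1, 0, +1}."""
    action = policy.act(state)
    target = action + e * h  # corrected action used as a training label
    policy.update(state, target)
    return action

# Toy loop with a simulated teacher that wants action 1.0 on a fixed state.
policy = LinearPolicy(state_dim=3, action_dim=1)
s = np.array([1.0, 0.5, -0.2])
for _ in range(200):
    h = np.sign(1.0 - policy.act(s))  # teacher: "increase" or "decrease"
    corrective_step(policy, s, h)
print(float(policy.act(s)[0]))  # oscillates close to the desired action 1.0
```

Note the design choice this captures: the teacher never provides the correct action itself, only a binary correction per action dimension, which is what makes the scheme usable by non-expert humans.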


Reinforcement Learning · Deep Learning · Interactive Machine Learning · Learning from Demonstration



This work was partially funded by FONDECYT Project 1161500 and CONICYT/PIA Project AFB180004. A portion of it took place at the University of Chile Duckietown headquarters, FabLab U. de Chile. Special thanks to Matias Mattamala, who provided the necessary tools for the tests with the Duckiebots.

Supplementary material

Supplementary material 1: 489953_1_En_31_MOESM1_ESM.mp4 (video, 22.4 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Rodrigo Pérez-Dattari (1)
  • Carlos Celemin (2)
  • Javier Ruiz-del-Solar (3)
  • Jens Kober (2)

  1. Universidad de Chile, Santiago, Chile
  2. Delft University of Technology, Delft, Netherlands
  3. AMTC Center, Universidad de Chile, Santiago, Chile
