CIRL: Controllable Imitative Reinforcement Learning for Vision-Based Self-driving

  • Xiaodan LiangEmail author
  • Tairui Wang
  • Luona Yang
  • Eric Xing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11211)


Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline heavily relies on hand-designed rules and the pre-processing perception system while the supervised learning-based models are limited by the accessibility of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach which successfully makes the driving agent achieve higher success rates based on only vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency for large continuous action space that often prohibits the use of classical RL on challenging real tasks, our CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose to specialize adaptive policies and steering-angle reward designs for different control signals (i.e. follow, straight, turn right, turn left) based on the shared representations to improve the model capability in tackling with diverse cases. Extensive experiments on CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first successful case of the learned driving policy by reinforcement learning in the high-fidelity simulator, which performs better than supervised imitation learning.


Imitative reinforcement learning Autonomous driving 


  1. 1.
    Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in Neural Information Processing Systems, pp. 1–8 (2007)Google Scholar
  2. 2.
    Bojarski, M., et al.: End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016)
  3. 3.
    Cao, Q., Lin, L., Shi, Y., Liang, X., Li, G.: Attention-aware face hallucination via deep reinforcement learning. arXiv preprint arXiv:1708.03132 (2017)
  4. 4.
    Codevilla, F., Müller, M., Dosovitskiy, A., López, A., Koltun, V.: End-to-end driving via conditional imitation learning. arXiv preprint arXiv:1710.02410 (2017)
  5. 5.
    Dosovitskiy, A., Koltun, V.: Learning to act by predicting the future. arXiv preprint arXiv:1611.01779 (2016)
  6. 6.
    Dosovitskiy, A., Ros, G., Codevilla, F., López, A., Koltun, V.: CARLA: an open urban driving simulator. arXiv preprint arXiv:1711.03938 (2017)
  7. 7.
    Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., Cheng, G.: Learning CPG-based biped locomotion with a policy gradient method: application to a humanoid robot. Int. J. Robot. Res. 27(2), 213–228 (2008)CrossRefGoogle Scholar
  8. 8.
    Franke, U.: Autonomous driving. In: Computer Vision in Vehicle Technology (2017)CrossRefGoogle Scholar
  9. 9.
    Han, J., Yang, L., Zhang, D., Chang, X., Liang, X.: Reinforcement cutting-agent learning for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9080–9089 (2018)Google Scholar
  10. 10.
    Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., Meger, D.: Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560 (2017)
  11. 11.
    Hester, T., et al.: Learning from demonstrations for real world reinforcement learning. arXiv preprint arXiv:1704.03732 (2017)
  12. 12.
    Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems, pp. 4565–4573 (2016)Google Scholar
  13. 13.
    Hou, Y., Hornauer, S., Zipser, K.: Fast recurrent fully convolutional networks for direct perception in autonomous driving. arXiv preprint arXiv:1711.06459 (2017)
  14. 14.
    Jie, Z., Liang, X., Feng, J., Jin, X., Lu, W., Yan, S.: Tree-structured reinforcement learning for sequential object localization. In: Advances in Neural Information Processing Systems, pp. 127–135 (2016)Google Scholar
  15. 15.
    Kim, J., Canny, J.: Interpretable learning for self-driving cars by visualizing causal attention. In: ICCV (2017)Google Scholar
  16. 16.
    Latzke, T., Behnke, S., Bennewitz, M.: Imitative reinforcement learning for soccer playing robots. In: Lakemeyer, G., Sklar, E., Sorrenti, D.G., Takahashi, T. (eds.) RoboCup 2006. LNCS (LNAI), vol. 4434, pp. 47–58. Springer, Heidelberg (2007). Scholar
  17. 17.
    Li, Y., Song, J., Ermon, S.: InfoGail: interpretable imitation learning from visual demonstrations. In: Advances in Neural Information Processing Systems, pp. 3815–3825 (2017)Google Scholar
  18. 18.
    Liang, X., Hu, Z., Zhang, H., Gan, C., Xing, E.P.: Recurrent topic-transition GAN for visual paragraph generation. In: ICCV (2017)Google Scholar
  19. 19.
    Liang, X., Lee, L., Xing, E.P.: Deep variation-structured reinforcement learning for visual relationship and attribute detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4408–4417. IEEE (2017)Google Scholar
  20. 20.
    Liang, X., Zhou, H., Xing, E.: Dynamic-structured semantic propagation network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 752–761 (2018)Google Scholar
  21. 21.
    Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: ICLR (2016)Google Scholar
  22. 22.
    Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)Google Scholar
  23. 23.
    Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)CrossRefGoogle Scholar
  24. 24.
    Muller, U., Ben, J., Cosatto, E., Flepp, B., Cun, Y.L.: Off-road obstacle avoidance through end-to-end learning. In: Advances in Neural Information Processing Systems, pp. 739–746 (2006)Google Scholar
  25. 25.
    Paden, B., Čáp, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)CrossRefGoogle Scholar
  26. 26.
    Plappert, M., et al.: Parameter space noise for exploration. arXiv preprint arXiv:1706.01905 (2017)
  27. 27.
    Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. In: Advances in Neural Information Processing Systems, pp. 305–313 (1989)Google Scholar
  28. 28.
    Sallab, A.E., Abdou, M., Perot, E., Yogamani, S.: Deep reinforcement learning framework for autonomous driving. Electron. Imaging 2017(19), 70–76 (2017)CrossRefGoogle Scholar
  29. 29.
    Santana, E., Hotz, G.: Learning a driving simulator. arXiv preprint arXiv:1608.01230 (2016)
  30. 30.
    Shalev-Shwartz, S., Shammah, S., Shashua, A.: Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295 (2016)
  31. 31.
    Silver, D., Bagnell, J.A., Stentz, A.: Learning from demonstration for autonomous navigation in complex unstructured terrain. Int. J. Robot. Res. 29(12), 1565–1592 (2010)CrossRefGoogle Scholar
  32. 32.
    Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning, vol. 135. MIT Press, Cambridge (1998)Google Scholar
  33. 33.
    Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)Google Scholar
  34. 34.
    Večerík, M., et al.: Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817 (2017)
  35. 35.
    Xu, H., Gao, Y., Yu, F., Darrell, T.: End-to-end learning of driving models from large-scale video datasets. In: CVPR (2017)Google Scholar
  36. 36.
    Yang, L., Liang, X., Xing, E.: Unsupervised real-to-virtual domain unification for end-to-end highway driving. arXiv preprint arXiv:1801.03458 (2018)
  37. 37.
    You, Y., Pan, X., Wang, Z., Lu, C.: Virtual to real reinforcement learning for autonomous driving. arXiv preprint arXiv:1704.03952 (2017)
  38. 38.
    Zhang, J., Cho, K.: Query-efficient imitation learning for end-to-end simulated driving. In: AAAI, pp. 2891–2897 (2017)Google Scholar
  39. 39.
    Zhang, L., Lin, L., Liang, X., He, K.: Is faster R-CNN doing well for pedestrian detection? In: European Conference on Computer Vision, pp. 443–457 (2016)CrossRefGoogle Scholar
  40. 40.
    Zhu, Y., et al.: Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3357–3364 (2017)Google Scholar
  41. 41.
    Ziebart, B.D., Maas, A.L., Dey, A.K., Bagnell, J.A.: Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior. In: Proceedings of the 10th International Conference on Ubiquitous Computing, pp. 322–331 (2008)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaodan Liang
    • 1
    • 2
    Email author
  • Tairui Wang
    • 1
  • Luona Yang
    • 2
  • Eric Xing
    • 1
    • 2
  1. 1.Petuum Inc.PittsburghUSA
  2. 2.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations