Automating Vehicles by Deep Reinforcement Learning Using Task Separation with Hill Climbing

  • Mogens Graf Plessen
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 70)


Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: first, a controller is learned during offline training based on an arbitrarily complicated mathematical system model; second, the trained controller is evaluated online by a fast feedforward pass. The contribution of this paper is a simple gradient-free and model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, the paper advocates (i) simultaneous training on separate deterministic tasks with the purpose of encoding many motion primitives in a neural network, and (ii) the employment of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity.
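The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of gradient-free hill climbing over several separate deterministic tasks, not the TSHC algorithm as specified in the paper. Everything concrete here is an assumption made for illustration: the 1-D point-mass model, the task list, the network size, the tolerances, the acceptance rule, and all names (`TASKS`, `controller`, `rollout`, `evaluate`, `hill_climb`). The reward is sparse in the sense that each step before reaching the setpoint costs -1, and the velocity check near the setpoint is a loose stand-in for the virtual velocity constraints named above.

```python
import math
import random

# Hypothetical toy setup: a 1-D point mass must reach several setpoints.
TASKS = [-2.0, -1.0, 1.0, 2.0]   # one deterministic task per target position
DT, T_MAX = 0.1, 100             # sampling time and episode length (assumed)
POS_TOL, VEL_TOL = 0.1, 0.2      # setpoint proximity and assumed velocity bound
HIDDEN = 4                       # hidden units of the small controller network
N_PARAMS = 2 * HIDDEN + HIDDEN + HIDDEN + 1  # W1, b1, W2, b2


def controller(theta, err, vel):
    """Tiny feedforward net: (position error, velocity) -> acceleration in [-1, 1]."""
    w1 = theta[:2 * HIDDEN]
    b1 = theta[2 * HIDDEN:3 * HIDDEN]
    w2 = theta[3 * HIDDEN:4 * HIDDEN]
    b2 = theta[4 * HIDDEN]
    hidden = [math.tanh(w1[2 * i] * err + w1[2 * i + 1] * vel + b1[i])
              for i in range(HIDDEN)]
    return math.tanh(sum(w * h for w, h in zip(w2, hidden)) + b2)


def rollout(theta, target):
    """Simulate one task; return (solved, reward) with a sparse -1 per step."""
    pos, vel = 0.0, 0.0
    for step in range(T_MAX):
        # Success requires being near the setpoint AND slow enough there.
        if abs(target - pos) < POS_TOL and abs(vel) < VEL_TOL:
            return True, -step
        acc = controller(theta, target - pos, vel)
        vel += DT * acc
        pos += DT * vel
    return False, -T_MAX


def evaluate(theta):
    """Score over all tasks: (number of solved tasks, accumulated reward)."""
    results = [rollout(theta, target) for target in TASKS]
    return sum(s for s, _ in results), sum(r for _, r in results)


def hill_climb(iterations=300, sigma=0.5, seed=0):
    """Perturb the parameters with Gaussian noise; keep a candidate if it is no worse."""
    rng = random.Random(seed)
    best = [0.0] * N_PARAMS
    best_score = evaluate(best)
    initial = best_score
    for _ in range(iterations):
        cand = [p + sigma * rng.gauss(0.0, 1.0) for p in best]
        score = evaluate(cand)
        if score >= best_score:  # lexicographic: solved tasks first, then reward
            best, best_score = cand, score
    return best, best_score, initial


if __name__ == "__main__":
    _, final_score, start_score = hill_climb()
    print("initial score:", start_score, "final score:", final_score)
```

Because every task is deterministic, a single rollout per task suffices to score a candidate, and the accept-if-no-worse rule means the score never decreases. Whether all tasks end up solved depends on the assumed noise level and iteration budget.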


Motion planning · Encoding motion primitives in neural networks · Sparse rewards · Hill climbing · Virtual velocity constraints



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. IMT, Lucca, Italy
