Adaptive cruise control via adaptive dynamic programming with experience replay
The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay is proposed to design the ACC controller. Experience replay increases data efficiency by recording available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that, when the parameters of the critic network and the actor network are estimated with gradient rules, the gradients of historical data and of current data are used concurrently to update the parameters. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. Learning experiments with the ACC controller implemented by this ADP algorithm clearly demonstrate that experience replay significantly increases data efficiency, and the approximate optimality and adaptability of the learned control policy are tested in typical driving scenarios.
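The concurrent weight update described above can be illustrated with a minimal sketch. All names, dimensions, and learning rates here are hypothetical, not from the paper: a linear-in-features critic is updated with the semi-gradient of the TD error for the current sample plus the averaged gradients of samples stored in a replay buffer.

```python
import numpy as np

class ReplayADP:
    """Sketch of a critic weight update that mixes the gradient of the
    current sample with gradients of stored (replayed) samples.
    Illustrative only; parameters are assumptions, not the paper's."""

    def __init__(self, dim, lr=0.05, buffer_size=50, gamma=0.9):
        self.w = np.zeros(dim)          # critic weights
        self.lr = lr                    # learning rate
        self.gamma = gamma              # discount factor
        self.buffer = []                # experience replay buffer
        self.buffer_size = buffer_size

    def td_gradient(self, phi, phi_next, reward):
        # Semi-gradient of the squared TD error w.r.t. the critic weights.
        delta = reward + self.gamma * self.w @ phi_next - self.w @ phi
        return -delta * phi

    def update(self, phi, phi_next, reward):
        # Record the current transition; evict the oldest when full.
        self.buffer.append((phi, phi_next, reward))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)
        # Concurrent update: current gradient plus averaged replayed gradients.
        grad = self.td_gradient(phi, phi_next, reward)
        for p, pn, r in self.buffer:
            grad += self.td_gradient(p, pn, r) / len(self.buffer)
        self.w -= self.lr * grad
```

Replaying stored gradients alongside the current one is what relaxes the persistency-of-excitation requirement in concurrent-learning schemes: the buffer keeps past data informative even when the current trajectory is not exciting.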
Keywords: Adaptive cruise control · Adaptive dynamic programming · Experience replay · Reinforcement learning · Neural networks
This work was supported in part by the National Natural Science Foundation of China (Nos. 61603150, 61273136, 61573353 and 61533017), the National Key Research and Development Plan (No. 2016YFB0101000), and the Doctoral Foundation of the University of Jinan (No. XBS1605).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.