1 Introduction

Owing to the current limitations of fully autonomous driving technology, keeping human drivers in the vehicle control loop remains the mainstream practice [1]. Against this background, a new technical architecture called human-machine collaborative driving has emerged. Through close collaboration in vehicle motion planning and control, the integrated system can benefit from the hybrid intelligence of the human and the machine [2].

To achieve such coordination, traditional methods for driving authority allocation include constructed functions, model predictive control, fuzzy systems, and game theory [3]. Recently, with the development of machine learning and neural networks, several researchers have attempted to apply such techniques to the co-driving problem. For example, [4] proposed a shared steering control framework based on various RL methods to achieve flexible and efficient path following. In [5], a lane-change decision-making strategy was developed with deep Q-learning, in which the driving risk was evaluated beforehand by probabilistic models. Nevertheless, few of the existing works pay attention to the driver's control freedom or the realization of individual driver preferences, which limits the contribution of human intelligence to the hybrid system.

A human-centered collaborative driving paradigm adheres to the minimal intervention principle [6], which means the machine partner intervenes only when necessary. Otherwise, the human driver is allowed to act freely, for example by choosing the desired path and speed, under the premise of safety. This improves driving flexibility in the face of ambiguous environments and facilitates user acceptance of the assistance system. Several human-centered shared control schemes have been elaborated in the robotics domain, such as [7] and [8], but work targeting ground vehicles remains rare. To this end, the main contributions of this paper are summarized as follows:

  • A novel human-centered collaborative driving scheme is proposed, which is the first effort to achieve human-machine coordination in the integrated decision-making and control stages with reinforcement learning (RL).

  • Two extended RL agents adapted for the collaborative driving task are devised and validated in a challenging obstacle avoidance scenario, which provides directions for structural optimization and training acceleration.

2 Driver-Vehicle System Modeling

Vehicle Modeling. The lateral dynamic characteristics of a vehicle can be represented by a 2-DOF bicycle model:

$$\begin{aligned} \dot{\beta }=\frac{2}{Mu}\left[ C_f\delta -(C_f+C_r)\beta +\frac{-aC_f+bC_r}{u}r\right] -r \end{aligned}$$
(1a)
$$\begin{aligned} \dot{r}=\frac{2}{I_z}\left[ aC_f\delta -(aC_f-bC_r)\beta -\frac{a^2C_f+b^2C_r}{u}r\right] \end{aligned}$$
(1b)

where M is the vehicle mass; \(I_z\) is the yaw inertia; a and b are the distances from the center of gravity to the front and rear axles; \(C_f\) and \(C_r\) are the cornering stiffnesses of the front and rear tires; \(\beta \) is the sideslip angle; r is the yaw rate; u is the longitudinal speed; and the control input is the front steering angle \(\delta \). Note that the vehicle model is only used for environmental simulation during training and validation, since the RL-based method is model-free. Therefore, model uncertainty will not degrade the performance of the controller.
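As a concrete illustration, the following Python sketch integrates Eq. (1) with a forward-Euler step; the parameter values are placeholders for simulation only, not the ones used in this paper.

```python
import numpy as np

# Illustrative vehicle parameters (assumed values, not from the paper)
M, Iz = 1500.0, 2500.0        # mass [kg], yaw inertia [kg*m^2]
a, b = 1.2, 1.4               # CG-to-front/rear-axle distances [m]
Cf, Cr = 60000.0, 55000.0     # front/rear cornering stiffness [N/rad]

def lateral_dynamics(beta, r, u, delta):
    """Right-hand side of Eq. (1a)-(1b)."""
    beta_dot = (2.0 / (M * u)) * (Cf * delta - (Cf + Cr) * beta
                                  + (-a * Cf + b * Cr) / u * r) - r
    r_dot = (2.0 / Iz) * (a * Cf * delta - (a * Cf - b * Cr) * beta
                          - (a ** 2 * Cf + b ** 2 * Cr) / u * r)
    return beta_dot, r_dot

def euler_step(beta, r, u, delta, dt=0.01):
    """Advance the 2-DOF lateral states by one simulation step."""
    beta_dot, r_dot = lateral_dynamics(beta, r, u, delta)
    return beta + beta_dot * dt, r + r_dot * dt
```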

Driver Modeling. The optimal preview model is utilized to describe the driver steering control behavior during the path-following process. At an instant \(t_0\), the optimal steering angle can be obtained by:

$$\begin{aligned} \delta _d^*=\frac{2(a+b)}{d^2}\left[ f(t_0+\frac{d}{u})-y(t_0)-\frac{\dot{y}(t_0)d}{u}\right] \end{aligned}$$
(2)

where d is the preview distance; f is the reference path; y and \(\dot{y}\) are the lateral displacement and lateral velocity, respectively. Likewise, the driver model is not required as a priori knowledge for the controller design; it only plays a role in the interaction with the RL agent.
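A minimal sketch of the preview law in Eq. (2) is given below; the reference path f is assumed to be a callable that returns the desired lateral position at a given time, and the axle distances are reused from the illustrative vehicle parameters above.

```python
def preview_steering(f, t0, y, y_dot, u, d, a=1.2, b=1.4):
    """Driver's optimal steering angle at time t0 according to Eq. (2).
    f: reference path (callable), y/y_dot: lateral displacement/velocity,
    u: longitudinal speed, d: preview distance, a + b: wheelbase."""
    lateral_error = f(t0 + d / u) - y - y_dot * d / u
    return 2.0 * (a + b) / d ** 2 * lateral_error
```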

3 Reinforcement Learning Approach

The collaborative driving control can be considered as a Markov decision process (MDP), denoted by a tuple \((\mathcal {S},\mathcal {A},P,R,\gamma )\) composed of states \(\mathcal {S}\), actions \(\mathcal {A}\), transitions P, rewards R and discount factor \(\gamma \in [0,1]\). An optimal policy \(\pi ^*\) that maximizes the expected discounted return can be found through a training process of interaction with the external environment. The policy is usually represented by a parametric neural network.

Observation. The agent observation in this paper consists of the driver action \(a_H=[\delta _d, \dot{\delta }_d]\) and the environment states \(o_E\). The environment states contain the positions and states of both the ego vehicle and its surroundings, including information on lanes, road boundaries and obstacles.
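For illustration, one plausible way to assemble this observation vector is sketched below; the exact ordering and contents of \(o_E\) are assumptions based on the description above.

```python
import numpy as np

def build_observation(delta_d, delta_d_dot, ego_state, lane_info, obstacle_info):
    """Stack the driver action a_H and the environment states o_E
    into a single flat observation vector for the agent."""
    a_H = np.array([delta_d, delta_d_dot])
    o_E = np.concatenate([ego_state, lane_info, obstacle_info])
    return np.concatenate([a_H, o_E])
```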

Action. Two alternative action spaces are considered here. Given the end-to-end capability of neural networks, the agent output can be defined as either the steering angular velocity or the target lateral displacement of the ego vehicle. For the latter, a low-level Stanley controller [9] is employed to compute the executable front-wheel steering angle.
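A Stanley-style tracker in the spirit of [9] can be sketched as follows; the gain and saturation limit are illustrative assumptions, and the target lateral displacement is assumed to be encoded as a cross-track error with respect to the target line.

```python
import numpy as np

def stanley_steering(heading_error, crosstrack_error, u, k=1.0, delta_max=0.6):
    """Convert the tracking errors w.r.t. the target lateral displacement
    into an executable front-wheel steering angle."""
    delta = heading_error + np.arctan2(k * crosstrack_error, u)
    return float(np.clip(delta, -delta_max, delta_max))
```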

Reward. The step reward is comprised of human reward \(r_H\) and environmental reward \(r_E\), which is given by:

$$\begin{aligned} r_H=k_1e^{-\sigma _1(\delta -\delta _d)^2} \end{aligned}$$
(3a)
$$\begin{aligned} r_E=k_2e^{-\sigma _2{d_c}^2}-k_3e^{-\sigma _3{d_o}^2} \end{aligned}$$
(3b)

where \(d_c\) is the offset of the ego vehicle from the lane centerline; \(d_o\) is the distance to the nearest obstacle; \(k_1\), \(k_2\), \(k_3\) are weighting coefficients; and \(\sigma _1\), \(\sigma _2\), \(\sigma _3\) are adjustable exponential shaping coefficients.
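The step reward of Eq. (3) can then be computed as in the sketch below; the weighting and shaping coefficients are placeholders rather than the tuned values of this paper.

```python
import numpy as np

K1, K2, K3 = 1.0, 1.0, 1.0               # weighting coefficients (assumed)
SIGMA1, SIGMA2, SIGMA3 = 10.0, 0.5, 0.5  # shaping coefficients (assumed)

def step_reward(delta, delta_d, d_c, d_o):
    """Human reward r_H (Eq. 3a) plus environmental reward r_E (Eq. 3b)."""
    r_H = K1 * np.exp(-SIGMA1 * (delta - delta_d) ** 2)
    r_E = K2 * np.exp(-SIGMA2 * d_c ** 2) - K3 * np.exp(-SIGMA3 * d_o ** 2)
    return r_H + r_E
```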

Policy Gradient. To obtain the optimal policy, twin delayed deep deterministic policy gradient (TD3) is adopted in this paper. TD3 establishes two Q-function networks \(Q_{\theta _1}\), \(Q_{\theta _2}\) as the critic and a deterministic policy network \(\pi _{\theta }\) as the actor, which is updated by the policy gradient:

$$\begin{aligned} \nabla _{\theta }J(\pi )=\frac{1}{N}\sum \nabla _a Q_{\theta _1}(s,a)\mid _{a=\pi _{\theta }(s)}\nabla _{\theta }\pi _{\theta }(s) \end{aligned}$$
(4)

where J is the return function and N is the number of transitions in a mini-batch. Figure 1 shows the overall framework of the proposed collaborative driving scheme.

Fig. 1. An overview of the collaborative driving control loop using RL.
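For reference, a hedged PyTorch sketch of the actor update corresponding to Eq. (4) is given below; the `actor`, `critic1`, and `replay_buffer` objects are assumed to follow a standard TD3 implementation, and autograd evaluates the chain rule in Eq. (4) implicitly.

```python
import torch

def update_actor(actor, critic1, actor_optimizer, replay_buffer, batch_size=256):
    """One deterministic policy gradient step (Eq. 4): ascend Q_theta1
    by minimizing its negative mean over a sampled mini-batch."""
    states, _, _, _, _ = replay_buffer.sample(batch_size)  # assumed buffer API
    actor_loss = -critic1(states, actor(states)).mean()
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
    return actor_loss.item()
```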

Besides vanilla TD3 (where the agent output is the direct steering action), two extended versions are also developed and investigated in this paper: TD3-SC and TD3-SF. Both use the target lateral displacement as the action space, followed by a low-level Stanley tracker. The difference is that the TD3-SC agent is trained with variable episode lengths, meaning an episode ends when a collision occurs, while the TD3-SF agent is trained with fixed episode lengths (a collision does not abort the episode). The performance of the three agents is compared and discussed in the next section.
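The only training-loop difference between the two extended agents can be outlined as follows, assuming a simplified environment API that reports collisions; this is an illustrative sketch rather than the actual implementation.

```python
def run_episode(env, agent, max_steps, fixed_length):
    """fixed_length=False -> TD3-SC (collision aborts the episode);
    fixed_length=True  -> TD3-SF (episode always runs max_steps)."""
    obs = env.reset()
    for t in range(max_steps):
        action = agent.act(obs)
        next_obs, reward, collided = env.step(action)  # assumed env interface
        done = (t == max_steps - 1) or (collided and not fixed_length)
        agent.store(obs, action, reward, next_obs, done)
        obs = next_obs
        if done:
            break
```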

4 Validation

Fig. 2. The testing environment of the collaborative driving scheme.

The collaborative driving agents are trained and validated in the scenario shown in Fig. 2. As illustrated by the diagram, there are two feasible paths for the vehicle to bypass the obstacles; these serve as reference trajectories for human drivers. In addition, a straight-line path, representing the case in which the driver takes no action against the oncoming obstacle, is also included. The simulated driver randomly chooses one of the three reference paths at the start of each episode. It is expected that, with the assistance of the collaborative control, the vehicle travels along the desired path consistent with the driver's intention when there is no risk of collision, and actively steers to avoid the obstacle in case of danger (Fig. 4).
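The per-episode path selection described above can be expressed with a one-line sampler; the path identifiers are hypothetical labels for the two bypass paths and the straight-line path.

```python
import random

REFERENCE_PATHS = ["bypass_left", "bypass_right", "straight"]  # hypothetical labels

def sample_reference_path():
    """Randomly pick the driver's reference path at the start of an episode."""
    return random.choice(REFERENCE_PATHS)
```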

Table 1. Comparison of the training time for the three agents.
Fig. 3. Trajectory plots for the driver's various reference choices.

Fig. 4. Action plots for the driver's various reference choices.

As depicted in Fig. 3, for paths ① and ②, all three agents can follow the reference and make correct steering decisions that align with the driver's intention. However, the trajectories of vanilla TD3 deviate more than the others, owing to arbitrary variations and jitter in its steering angle output, which is plotted in Fig. 4. For path ③, all agents successfully bypass the obstacle, but some choose to turn left while others turn right. This is interpretable: because of the symmetry of the field, the agents stochastically develop their own avoidance strategies. In addition, Table 1 lists the total training time for each of the three agents to converge. TD3-SF clearly achieves the highest training efficiency as well as the best path-following accuracy among the three agents.

5 Conclusion

In this paper, a novel human-centered collaborative steering strategy based on RL is proposed and validated in an obstacle-avoidance driving scenario. The results show that the RL-based controller can effectively decode the driver's intention from the steering behavior and correct risky actions to enhance driving safety. In the comparison of the three agents, changing the agent output to a target lateral displacement improves the stability of the steering control, while the fixed-step training discipline greatly increases the convergence speed. Future work includes exploring more advanced RL algorithms such as soft actor-critic (SAC) and conducting additional driver-in-the-loop experiments with real human drivers.