
1 Introduction and Related Work

The ability to control a vehicle in extreme driving conditions, such as the powerslide, where large vehicle sideslip angles, large traction forces and large negative steering angles occur, is of great interest to the automotive industry. Since the powerslide is an unstable motion, [4] addresses the observability and controllability characteristics for both rear-wheel drive (RWD) and all-wheel drive (AWD) vehicles. Depending on the drive concept, the powerslide can be stabilised in different ways. In recent years, Reinforcement Learning (RL), a data-driven control approach, has become increasingly popular and is employed in this work to stabilise the powerslide. In [1] and [9], RL is used to control the steering wheel angle and the drive train in simulation, while in [2], RL-based controllers are successfully tested on radio-controlled (RC) model cars in the real world. The development of battery electric vehicles (BEVs) and the possibility of using individual electric motors on each axle open up new control strategies. While previous works considered autonomous drifting by controlling both the vehicle’s drive train and the steering system, [3] proposes a linear controller that controls the front and rear axle torques in the presence of a human driver. This controller shows great performance in simulation; however, when tested in the real world, it requires additional application effort to handle specific road conditions.

In this work, a novel RL-based controller is proposed for an AWD BEV that controls the individually driven front and rear axles with a human driver in the loop. Moreover, the robustness of the proposed controller with respect to steering disturbances from the driver and sudden changes in road friction is analysed. Further, the controller is integrated into a test vehicle and its control performance is demonstrated in a real-world test case.

The remainder of this paper is structured as follows. In Sect. 2, the general problem setup, the vehicle model and the RL problem are introduced, while in Sect. 3, the test configuration and the results in simulation and in the real world are presented. Section 4 gives a brief summary and an outlook.

2 Problem Formulation

This section introduces the vehicle and driver model used in the simulation environment, and the general methodology of RL.

Vehicle and Driver Model. The nonlinear two-wheel vehicle model in Fig. 1 at time step \(t \in \mathbb {N}_{0}\)

$$\begin{aligned} \boldsymbol{x}_{t+1} = {\textbf {f}}_{\text {car}}\left( \boldsymbol{x}_t, \boldsymbol{u}_t \right) , \end{aligned}$$
(1)

is considered, with vehicle state \(\boldsymbol{x} \in \mathcal {X}\) and input \(\boldsymbol{u} = \left[ \delta , T_{\text {front}} ,T_{\text {rear}} \right] ^T\in \mathcal {U}\), where \(T_{\text {front}}\) and \(T_{\text {rear}}\) denote the front and rear axle torques, respectively. In the presence of a human driver, the steering angle \(\delta \) is defined by the nonlinear driver model [6]

$$\begin{aligned} \delta = {\textbf {f}}_{\text {driver}}\left( \beta , v, e, \varDelta \psi \right) \end{aligned}$$
(2)

based on vehicle sideslip angle \(\beta \), velocity v, lateral deviation e and orientation \(\varDelta \psi \) relative to the desired path.

Fig. 1. Two-wheel vehicle model with independently driven front and rear axles.
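To make the structure of (1) concrete, the following is a minimal sketch of a discrete-time nonlinear single-track model with a reduced state \([\beta , v, \dot{\psi }]\), a simple friction-circle tire model and forward-Euler integration. All parameters and the tire model are illustrative assumptions, not values from the paper, which additionally accounts for the wheel speeds.

```python
import numpy as np

# Illustrative parameters (not from the paper)
M, IZ = 1800.0, 2500.0          # mass in kg, yaw inertia in kg m^2
LF, LR = 1.4, 1.4               # distances CoG to front / rear axle in m
R_WHEEL = 0.35                  # effective wheel radius in m
MU = 0.35                       # road friction coefficient
FZF = FZR = 0.5 * M * 9.81      # static axle loads in N
C_ALPHA = 80000.0               # cornering stiffness in N/rad
DT = 0.01                       # integration step in s

def lateral_force(alpha, fx, fz):
    """Saturating lateral tire force, limited by the friction circle."""
    fy_max = np.sqrt(max((MU * fz) ** 2 - fx ** 2, 1.0))
    return -fy_max * np.tanh(C_ALPHA * alpha / fy_max)

def f_car(x, u):
    """One Euler step of the model: x = [beta, v, yaw_rate], u = [delta, T_front, T_rear]."""
    beta, v, r = x
    delta, t_front, t_rear = u
    fxf, fxr = t_front / R_WHEEL, t_rear / R_WHEEL      # axle drive forces
    alpha_f = np.arctan2(v * np.sin(beta) + LF * r, v * np.cos(beta)) - delta
    alpha_r = np.arctan2(v * np.sin(beta) - LR * r, v * np.cos(beta))
    fyf, fyr = lateral_force(alpha_f, fxf, FZF), lateral_force(alpha_r, fxr, FZR)
    # Forces in the vehicle body frame
    fx_body = fxf * np.cos(delta) - fyf * np.sin(delta) + fxr
    fy_body = fxf * np.sin(delta) + fyf * np.cos(delta) + fyr
    # Dynamics of velocity, sideslip angle and yaw rate
    v_dot = (fx_body * np.cos(beta) + fy_body * np.sin(beta)) / M
    beta_dot = (fy_body * np.cos(beta) - fx_body * np.sin(beta)) / (M * v) - r
    r_dot = (LF * (fxf * np.sin(delta) + fyf * np.cos(delta)) - LR * fyr) / IZ
    return np.array([beta + DT * beta_dot, v + DT * v_dot, r + DT * r_dot])
```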

Control Goal. While the driver (model) in (2) focuses only on path-tracking by applying \(\delta \), the RL-based controller’s task is to initiate and stabilise the powerslide by applying \(T_{\text {front}}\) and \(T_{\text {rear}}\). Although the control tasks are separated, they are expected to interfere with each other.

Reinforcement Learning. In RL, an agent learns a policy based on the interaction with its environment. The agent acts on the environment using control \(\boldsymbol{u}_t \in \mathcal {\hat{U}} = \mathcal {U} \backslash \lbrace \delta \rbrace \) sampled from policy \(\boldsymbol{\pi }_{\theta }(\boldsymbol{u} \vert \boldsymbol{x})\) with policy parameters \(\theta \) based on the current environment state \(\boldsymbol{x}_t\). The agent observes the next environment state \(\boldsymbol{x}_{t+1}\) and receives a reward \(r_{t+1}\) defined by the reward function R associated with the tuple \([\boldsymbol{x}_t, \boldsymbol{u}_t, \boldsymbol{x}_{t+1}]\). A Markov Decision Process (MDP) described by \(\langle \mathcal {X},\mathcal {\hat{U}}, R, {\textbf {f}}_{\text {car}}, \mathcal {X}_0 \rangle \) is assumed, where \(\mathcal {X}_0 \subseteq \mathcal {X}\) denotes the initial state distribution. Starting from an initial state \(\boldsymbol{x}_0\in \mathcal {X}_0\), the MDP forms a trajectory \(\tau \) of states, actions and rewards. The central objective is to find an optimal control policy \(\pi ^*\) that maximises the expected sum of discounted rewards

$$\begin{aligned} \pi ^* = \arg \max _{\pi _{\theta }} \underset{\tau \sim \pi }{\mathbb {E}}\left[ \sum \nolimits _{t=0}^{\infty }\gamma ^tr_t\right] \end{aligned}$$
(3)

with discount factor \(\gamma \in [0,1]\) balancing the present impact of future rewards. To find policy parameters \(\theta \), optimisation problem (3) can be solved using policy gradient methods, e.g. Proximal Policy Optimization (PPO) [7, 8].
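For illustration, the clipped surrogate objective that PPO optimises in place of (3) can be written as a short function; this is a generic sketch of the PPO loss, not code from the paper.

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (to be minimised)."""
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_theta(u|x) / pi_theta_old(u|x)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```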

Observation Space and Action Space. At each time step, only a subset of the entire environment state is visible to the agent. The observation vector \(\textbf{o} = [\beta \,\, \dot{\psi } \,\, v \,\, \omega _{\text {front}} \,\, \omega _{\text {rear}} \,\, \delta \,\, \beta _{\text {target}} \,\, \beta _{\text {ref}}]^T\) comprises the vehicle sideslip angle \(\beta \), the yaw rate \(\dot{\psi }\), the velocity v, the angular speeds of the front and rear axles, \(\omega _{\text {front}}\) and \(\omega _{\text {rear}}\), and the steering angle \(\delta \). Moreover, the agent receives information about the target steady-state vehicle sideslip angle \(\beta _{\text {target}}\) and the predefined vehicle sideslip angle reference trajectory \(\beta _{\text {ref}}\), which specifies how to reach \(\beta _{\text {target}}\). The vehicle sideslip angle reference is a ramp function with a slope of \(-9^{\circ }/\text {s}\), derived from expert knowledge, converging to \(\beta _{\text {target}}\). While the first six entries in \(\textbf{o}\) reflect sensor information available in the car, the last two entries are required to fulfil the control task. To stabilise the powerslide, the agent can individually control both the front axle torque \(T_{\text {front}}\) and the rear axle torque \(T_{\text {rear}}\); however, only positive torque values are feasible, which excludes braking.
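A minimal sketch of how these spaces could be declared, assuming Gymnasium-style Box spaces; the bounds and the maximum axle torque \(T_{\max }\) are placeholders, since the paper does not report them.

```python
import numpy as np
from gymnasium import spaces

# o = [beta, yaw_rate, v, omega_front, omega_rear, delta, beta_target, beta_ref]
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)

# The policy outputs two normalised actions for [T_front, T_rear].
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

T_MAX = 1500.0  # maximum axle torque in Nm, assumed for illustration

def scale_action(a):
    """Map the clipped policy output in [-1, 1] to non-negative axle torques."""
    a = np.clip(a, -1.0, 1.0)
    return 0.5 * (a + 1.0) * T_MAX  # [T_front, T_rear] >= 0, no braking
```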

Reward Function. During training, the agent tries to learn a policy that maximises the reward function. To simultaneously encourage the agent to stabilise the powerslide and to stay on the circular path,

$$\begin{aligned} R\left( \beta ,e\right) = \sum \nolimits _{i=1}^{2}w_i R_i = w_{\text {slip}}R_{\text {slip}}\left( \beta \right) + w_{\text {path}}R_{\text {path}}\left( e\right) , w_i \in [0,1] \end{aligned}$$
(4)

is chosen as a weighted sum of the vehicle sideslip angle reference tracking reward \(R_{\text {slip}}\) and the path-following reward \(R_{\text {path}}\). The reward terms \(R_i = \exp \left( - c_i \varDelta _i^2\right) \), with the deviation from the respective control target \(\varDelta _i\) and shaping parameter \(c_i\), ensure a positive learning signal.
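A minimal sketch of (4), assuming illustrative values for the weights \(w_i\) and the shaping parameters \(c_i\), which are not reported in the paper:

```python
import numpy as np

W_SLIP, W_PATH = 0.7, 0.3   # assumed weights w_i in [0, 1]
C_SLIP, C_PATH = 5.0, 0.5   # assumed shaping parameters c_i

def reward(beta, beta_ref, e):
    """R(beta, e) = w_slip * R_slip(beta) + w_path * R_path(e), each term in (0, 1]."""
    r_slip = np.exp(-C_SLIP * (beta - beta_ref) ** 2)  # sideslip reference tracking
    r_path = np.exp(-C_PATH * e ** 2)                  # lateral deviation from the path
    return W_SLIP * r_slip + W_PATH * r_path
```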

3 Experiments

The agent is trained in simulation and evaluated both in simulation and in the real world.

Training and Network Architecture. During training, two multilayer perceptron (MLP) networks are trained, one for the policy and one for the value function. Both networks share the same architecture, namely three hidden layers, and use Exponential Linear Unit (ELU) activation functions. Before the observations are passed into the policy network, they are normalised to the range [–1, 1]. The output of the policy network is clipped to the range [–1, 1] and scaled to the admissible torques. To improve exploration during training, generalised state-dependent exploration (gSDE) is used. The learning rate is set to \(2\times 10^{-4}\) and the discount factor to 0.9999. Adam is used to optimise the networks [5]. To accelerate training, 40 environments are run in parallel. A training episode is terminated after 40 s; it is terminated prematurely if the vehicle leaves the track or if the deviation between the current vehicle sideslip angle and the reference exceeds a certain threshold. A rollout buffer size of 409600 and a batch size of 5120 are used.
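This setup could be reproduced approximately with Stable-Baselines3, which provides PPO with gSDE; the paper does not name its implementation, and the hidden layer widths, the training budget and the environment class PowerslideEnv are assumptions for illustration.

```python
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize

def make_env():
    return PowerslideEnv()  # hypothetical environment implementing the MDP of Sect. 2

env = VecNormalize(SubprocVecEnv([make_env] * 40),    # 40 parallel environments
                   norm_obs=True, norm_reward=False)  # observation normalisation

model = PPO(
    "MlpPolicy", env,
    learning_rate=2e-4,
    gamma=0.9999,
    n_steps=10240,       # 40 envs x 10240 steps = rollout buffer of 409600
    batch_size=5120,
    use_sde=True,        # generalised state-dependent exploration (gSDE)
    policy_kwargs=dict(activation_fn=torch.nn.ELU,
                       net_arch=dict(pi=[256, 256, 256], vf=[256, 256, 256])),
)
model.learn(total_timesteps=40_000_000)  # training budget assumed
```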

3.1 Testing

The trained controller is tested in both simulation and the real world.

Fig. 2. Simulation of the powerslide with a target vehicle sideslip angle of \(-30^{\circ }\) on a circular path with radius \(R=60\,\text {m}\) and friction coefficient \(\mu = 0.21\).

Simulation. The controller is trained on different configurations of the environment, where each configuration is randomly initialised. In the first example, the controller is evaluated with a target vehicle sideslip angle of \(-30^{\circ }\) on a circular path with radius \(R = 60\,\text {m}\) and friction coefficient \(\mu = 0.21\), see Fig. 2. The vehicle starts with an initial velocity of \(20\,\text {km/h}\) and transitions into a stable powerslide motion following the vehicle sideslip angle reference \(\beta _{\text {ref}}\). Figure 2 shows that the controller successfully learnt to transition the vehicle from regular steady-state cornering into the powerslide and to stabilise the powerslide motion. To initiate the powerslide and to increase the vehicle sideslip angle, a high rear axle torque compared to the front axle torque is applied. Once the target vehicle sideslip angle is reached, the front and rear axle torques converge to a fixed drive torque distribution of \(\gamma = T_{\text {rear}}/(T_{\text {rear}}+T_{\text {front}})=0.84\).

In the second example, the focus is on the controller’s robustness to steering and road friction disturbances. In the first scenario, a disturbance of the steering angle \(\delta \) is considered, while in the second scenario, the road friction \(\mu \) is instantaneously increased (\(\mu _{\uparrow }\)) and decreased (\(\mu _{\downarrow }\)). The steering angle disturbance is represented by a shifted cosine function over a single period of \(0.5\,\text {s}\) with an amplitude of \(1.5^{\circ }\), applied towards the inside (\(\delta _{\uparrow }\)) and the outside (\(\delta _{\downarrow }\)) of the turn. For these evaluations, the environment configuration is adapted to radius \(R=18.5\,\text {m}\) and friction coefficient \(\mu = 0.35\), corresponding to the real-world setting. Figure 3 shows the vehicle sideslip angle trajectories resulting from the steering angle disturbance and the change of the friction value. In both scenarios, the vehicle motion is successfully stabilised by the controller.

Fig. 3. Simulation of the powerslide with a target vehicle sideslip angle of \(-30^{\circ }\) on a circular path with radius \(R=18.5\,\text {m}\) and friction coefficient \(\mu = 0.35\). Robustness is analysed by applying a disturbance to the steering angle (upper plots) and a change of the road friction (lower plots) at time \(t=20\,\text {s}\).
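One plausible reading of the steering disturbance is a raised-cosine pulse that starts and ends at zero and peaks at \(1.5^{\circ }\); the exact parameterisation is not given in the paper, so the following is an illustrative assumption.

```python
import numpy as np

T_PULSE = 0.5                 # single period of the pulse in s
AMPLITUDE = np.deg2rad(1.5)   # peak steering disturbance in rad
T_START = 20.0                # disturbance onset in s

def steering_disturbance(t, sign=1.0):
    """Additive steering disturbance; sign selects inside (+) or outside (-) of the turn."""
    if T_START <= t <= T_START + T_PULSE:
        return sign * 0.5 * AMPLITUDE * (1.0 - np.cos(2.0 * np.pi * (t - T_START) / T_PULSE))
    return 0.0
```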

Real World. The controller is deployed on a conventional, consumer-grade computer, which is connected to the vehicle’s embedded hardware. The test vehicle is a series production electric sports car. Vehicle sensor data and computed controls are exchanged between the computer and the vehicle via the XCP protocol using prototype hardware. The controller runs cyclically with a sampling frequency of 100 Hz. For the control task, only built-in sensor signals of the vehicle are used, except for the vehicle sideslip angle, which is provided by an additional inertial measurement unit (IMU) mounted in the car. The measurements are collected on a watered circuit with \(18.5\,\text {m}\) radius and an estimated friction coefficient of 0.35. In the experiment, the vehicle starts in regular steady-state cornering with an initial speed of \(20\,\text {km/h}\) and, after time \(t=5\,\text {s}\), transitions to the powerslide with a target vehicle sideslip angle of \(-30^{\circ }\), see Fig. 4. The controller stabilises the powerslide motion; however, oscillations of the vehicle sideslip angle are present. These could be due to driver influence, latencies or sensor noise in the control loop.

Fig. 4. Measurements of the powerslide with a target vehicle sideslip angle of \(-30^{\circ }\) on a circular path with \(18.5\,\text {m}\) radius and an estimated friction coefficient of 0.35. Vehicle sideslip angles \(\beta _1\) and \(\beta _2\) were recorded with different drivers.
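A hedged sketch of the 100 Hz cyclic execution on the host computer; read_vehicle_signals and send_axle_torques are hypothetical placeholders for the XCP-based signal exchange, and the policy is assumed to expose a deterministic predict method.

```python
import time

CYCLE_TIME = 0.01  # 100 Hz sampling frequency

def control_loop(policy, normalise_obs, scale_action):
    next_deadline = time.perf_counter()
    while True:
        o = read_vehicle_signals()          # hypothetical: beta (IMU), yaw rate, v, wheel speeds, delta
        a, _ = policy.predict(normalise_obs(o), deterministic=True)
        t_front, t_rear = scale_action(a)   # map [-1, 1] outputs to axle torques
        send_axle_torques(t_front, t_rear)  # hypothetical XCP write to the embedded hardware
        next_deadline += CYCLE_TIME
        time.sleep(max(0.0, next_deadline - time.perf_counter()))
```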

4 Summary and Outlook

In this paper, an RL-based controller is developed to stabilise the powerslide of a vehicle with a human driver in charge of steering only. This is achieved by controlling the front and rear axle torques of an AWD BEV. The control performance of the powerslide controller on a circular path is demonstrated both in simulation and in the real world. For the latter case, the controller is tested in a series production electric sports car. The experiments clearly show that the proposed controller reacts appropriately to steering disturbances and instantaneous changes in the friction coefficient, revealing its robustness. Moreover, the tests prove that the controller, which was trained exclusively in simulation, is also capable of stabilising the powerslide motion in a real-world application. This indicates the capability of RL controllers to bridge the simulation-to-reality gap, since the controller has to deal with unmodelled real-world phenomena.

Future work should investigate the performance of RL-based controllers across a broader range of drivers, friction coefficients, and vehicle platforms.