
1 Introduction

Traction controllers (TCs) adjust the driver’s traction torque request to limit wheel slip, and in doing so, maximise tyre force generation and maintain vehicle stability [1]. The functionality of a TC is particularly beneficial in very low tyre-road friction conditions, such as icy roads. In these conditions, relatively small torque requests can induce very fast wheel dynamics (beyond the driver's control), yielding significant wheel slip and potentially causing the vehicle to spin. Literature surveys on TC solutions, e.g., [2,3,4], show that most of the available TC strategies use model-based and deterministic control approaches, such as Nonlinear Model Predictive Control (NMPC) based solutions [2]. In this work, a novel artificial-intelligence-based traction controller, using deep reinforcement learning (DRL), is designed and its effectiveness is explored for low tyre-road friction conditions. The proposed DRL TC design uses feedforward neural networks (FFNNs). The choice of neural network (NN) architecture is based on the current vehicle dynamics control literature [5,6,7]; the analysis of other NN architectures, such as recurrent neural networks (RNNs), for vehicle dynamics control is beyond the scope of this paper. To evaluate the DRL controller, its performance is compared against an NMPC strategy, serving as a state-of-the-art benchmark traction controller. The novel contributions of this paper are: (i) the design of a DRL-based TC trained on an experimentally validated vehicle model; and (ii) a simulation-based analysis of the performance advantages of the DRL control strategy compared to a state-of-the-art NMPC TC.

The paper is organised as follows: Sect. 2 presents the case study electric vehicle (EV), vehicle model validation and the simulated control framework. Section 3 introduces the proposed TC strategies. Section 4 describes the simulation setup and the simulation results, which are quantitatively assessed. The conclusions and future developments are reported in Sect. 5.

2 Vehicle Validation and Traction Controller Framework

2.1 Case Study Vehicle and Validation

The case study EV is a single-motor, rear-wheel-drive vehicle available at the University of Surrey. Table 1 reports the main vehicle parameters. Straight-line acceleration tests were carried out with the vehicle driven on polished ice, see Fig. 1. The experimental results were used to validate the simulation model created in AVL VSM for the same acceleration manoeuvre, see Fig. 2. The tyres are simulated with the Pacejka Magic Formula (MF) 5.2. The electric motor dynamics are modelled with (i) a rate limiter of 226 %/s, and (ii) a communication time delay (\(t_d = 82 \,{\text{ms}}\)) between the torque request \(T_{r_{exp}}\) and the motor torque feedback \(T_{fb_{exp}}\) (see Fig. 2a). The longitudinal acceleration \(a_x\) of the vehicle was measured with an on-board accelerometer, and the slip ratio was computed from the four wheel speed measurements.
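As an illustration of how the modelled actuator dynamics can be reproduced in simulation, the following minimal Python sketch applies the rate limit and communication delay described above to a torque request signal. The sample time, the peak torque value and the signal names are assumptions made for illustration, not part of the validated AVL VSM model.

```python
import numpy as np

def motor_torque_response(T_request, dt=0.001,
                          rate_limit_pct_s=226.0, t_delay=0.082,
                          T_max=135.0):
    """Illustrative actuator model: rate limiter plus pure communication delay.

    rate_limit_pct_s is expressed in % of peak torque per second, as in the
    paper; T_max (peak motor torque) is an assumed placeholder value.
    """
    n = len(T_request)
    delay_steps = int(round(t_delay / dt))
    max_step = rate_limit_pct_s / 100.0 * T_max * dt  # max torque change per sample [Nm]
    T_fb = np.zeros(n)
    for k in range(1, n):
        # delayed torque request seen by the motor
        T_cmd = T_request[max(k - delay_steps, 0)]
        # rate-limited tracking of the delayed command
        step = np.clip(T_cmd - T_fb[k - 1], -max_step, max_step)
        T_fb[k] = T_fb[k - 1] + step
    return T_fb
```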

Fig. 1. Case study EV tested on ice.

Table 1. Main vehicle parameters

Fig. 2. Experimental (subscript \(exp\)) and simulated (subscript \(sim\)) straight-line acceleration manoeuvre: a) torque request (\(T_r\)) and feedback motor torque (\(T_{fb}\)); b) longitudinal acceleration, \(a_x\); c) practical slip ratio, \(\kappa\).

2.2 Traction Controller Structure

The TC architecture is shown in Fig. 3. The driver presses the accelerator pedal, and the pedal’s position (\(app\)) is fed into a drivability map to calculate a driver reference torque (\(T_{ref}\)). The TC block receives \(T_{ref}\) and the vehicle signals, and computes a torque correction \(T_{corr}\) that is subtracted from the driver torque request. The final torque request (\(T_{app}\)) is then applied to the vehicle. The controlled variable is the slip velocity \(s_w = \omega_{wh} r - v_x\), where \(\omega_{wh}\) is the angular speed of the driven wheels, \(r\) is the rolling radius of the tyre and \(v_x\) is the vehicle’s longitudinal velocity. The practical slip ratio is defined as \(\kappa = \frac{{\omega_{wh} r - v_x }}{v_x }\).
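As a minimal illustration of this signal flow, the following Python sketch computes the controlled variables and the final torque request from the definitions above; the function and signal names are assumptions made for clarity, not the paper's implementation.

```python
def traction_control_step(app, v_x, omega_wh, r, T_corr, drivability_map):
    """One illustrative pass through the TC architecture of Fig. 3.

    drivability_map is assumed to be a callable mapping the accelerator pedal
    position to the driver reference torque T_ref.
    """
    T_ref = drivability_map(app)              # driver reference torque
    s_w = omega_wh * r - v_x                  # slip velocity (controlled variable)
    kappa = s_w / v_x if v_x > 0 else 0.0     # practical slip ratio
    T_app = T_ref - T_corr                    # final torque request applied to the vehicle
    return T_app, s_w, kappa
```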

Fig. 3. Simplified schematic of the traction control architecture.

3 Control Strategies

3.1 NMPC Framework

The NMPC TC is a simplified version of the one presented in [1]. The internal model of the NMPC is based on the following equations, written for each driven wheel:

$$\frac{d}{dt}s_{w,L/R} = \left( - \frac{r^2}{J_{wh}} - \frac{1}{m_q} \right) D \mu_{L/R} \sin\left( C \arctan\left( \frac{B s_{w,L/R}}{\omega_{wh,L/R}\, r} \right) \right) F_z + \frac{G_r T_{fb}}{2 J_{wh}} r$$
(1)
$$\frac{d}{dt}\omega_{wh,L/R} = \frac{1}{J_{wh}} \left( \frac{G_r T_{fb}}{2} - D \mu_{L/R} \sin\left( C \arctan\left( \frac{B s_{w,L/R}}{\omega_{wh,L/R}\, r} \right) \right) F_z\, r \right)$$
(2)
$$\frac{d}{dt}T_{fb} = \frac{T_{app} - T_{fb}}{\tau_m}$$
(3)

where the subscript “\(L/R\)” denotes the left or right wheel, \(m_q\) is the mass of the quarter car model, \(J_{wh}\) is the rear wheel inertia, \(B\), \(C\) and \(D\) are the simplified MF parameters [8], \(G_r\) is the drivetrain gear ratio, \(\mu_{L/R}\) is the tyre-road friction coefficient, \(F_z\) is the vertical tyre load, assumed constant in the formulation of the NMPC internal model, and \(\tau_m\) is the motor time constant. The NMPC state vector is \(x = [s_{w_L},\omega_{wh_L},s_{w_R},\omega_{wh_R},T_{fb}]\), where \(T_{fb}\) is the feedback motor torque. The control action is \(u = T_{corr}\), whose maximum value is equal to the torque requested by the driver. The cost function is:

$$\min J = W_{\Delta s_w} \left( \Delta s_{w,L}^2(N) + \Delta s_{w,R}^2(N) \right) + \sum_{k=0}^{N-1} \left[ W_{\Delta s_w} \left( \Delta s_{w,L}^2(k) + \Delta s_{w,R}^2(k) \right) + W_u\, T_{corr}^2(k) \right]$$
(4)

where the slip velocity error is \(\Delta s_{w,L/R} = s_{w,L/R} - \sigma_{x_{ref}} \omega_{wh,L/R} r\), with the theoretical slip reference \(\sigma_{x_{ref}} = 5\%\). \(W_{\Delta s_w}\) and \(W_u\) are weighting factors for slip tracking and control effort, respectively. Two versions of the same NMPC are proposed. The first, denoted as “expert NMPC”, has \(N = 50\) steps in the prediction horizon and a sampling time of \(T_s = 10 \,{\text{ms}}\), and is also used to guide the DRL TC training. The second, denoted as “real-time NMPC”, has \(N = 10\) steps in the prediction horizon and a sampling time of \(T_s = 10 \,{\text{ms}}\). To assess the real-time capability of the proposed controllers, both NMPC configurations were run in real time on a dSPACE MicroAutoBox II system (900 MHz, 16 MB flash memory). The real-time NMPC has a peak turnaround time of \(5.8 \,{\text{ms}}\), guaranteeing real-time capability, while the expert NMPC has a peak turnaround time of \(60.5 \,{\text{ms}}\) and a subsequent turnaround time of \(30 \,{\text{ms}}\), both exceeding \(T_s\).
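As an illustration, the sketch below encodes the internal model of Eqs. (1)–(3) as a state-derivative function in Python. The parameter dictionary, the function name and the single friction value shared by both wheels are assumptions for illustration; the sketch does not reproduce the NMPC solver itself.

```python
import numpy as np

def nmpc_internal_model(x, T_app, p):
    """State derivatives of the NMPC internal model, Eqs. (1)-(3).

    x = [s_wL, omega_whL, s_wR, omega_whR, T_fb]; p is a dict of model
    parameters (B, C, D, mu, Fz, r, J_wh, m_q, G_r, tau_m), whose values
    are illustrative assumptions.
    """
    s_wL, w_L, s_wR, w_R, T_fb = x
    dx = np.zeros(5)
    for i, (s_w, w) in enumerate(((s_wL, w_L), (s_wR, w_R))):
        # simplified Magic Formula longitudinal force for one rear wheel
        # (mu is taken equal for both wheels here; the model is singular at w = 0,
        # consistently with Eqs. (1)-(2))
        Fx = p["D"] * p["mu"] * np.sin(
            p["C"] * np.arctan(p["B"] * s_w / (w * p["r"]))) * p["Fz"]
        # Eq. (1): slip velocity dynamics
        dx[2 * i] = (-p["r"]**2 / p["J_wh"] - 1.0 / p["m_q"]) * Fx \
                    + p["G_r"] * T_fb / (2.0 * p["J_wh"]) * p["r"]
        # Eq. (2): wheel speed dynamics
        dx[2 * i + 1] = (p["G_r"] * T_fb / 2.0 - Fx * p["r"]) / p["J_wh"]
    # Eq. (3): first-order motor torque dynamics
    dx[4] = (T_app - T_fb) / p["tau_m"]
    return dx
```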

3.2 Reinforcement Learning Framework

DRL controllers, also known as agents, decide actions (control inputs) based on observations (plant states) from the environment, which includes the nominal plant and relevant disturbances. The policy (control strategy) mapping observations to actions is learned through interactions with the environment by maximising a cumulative reward over multiple simulations, called episodes.
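The following minimal Python sketch illustrates this episodic training workflow in generic terms; the env and agent interfaces (reset, step, act, learn) are assumed names used only to clarify the loop structure, not the toolchain used in this work.

```python
def train_agent(env, agent, n_episodes=1000, max_steps=1000):
    """Generic episodic DRL training loop (interfaces are illustrative assumptions)."""
    for episode in range(n_episodes):
        obs = env.reset()                              # start a new simulation (episode)
        cumulative_reward = 0.0
        for _ in range(max_steps):
            action = agent.act(obs)                    # policy maps observation to action
            next_obs, reward, done = env.step(action)  # environment returns new plant states
            agent.learn(obs, action, reward, next_obs, done)
            cumulative_reward += reward                # quantity the agent seeks to maximise
            obs = next_obs
            if done:
                break
    return agent
```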

For the proposed DRL TC formulation, the control input (action) \(a\) is the torque correction, i.e., \(a = [T_{corr}]\), while the observation vector is defined as:

$$s = \left[ a_x,\; T_{app},\; \Delta s_{w,av},\; \int \Delta s_{w,av}\, dt,\; app \right]$$
(5)

where \(\Delta s_{w,av}\) is the average slip velocity error between left and right wheels. The reward function is:

$$R = - W_{\Delta s_{w,av} } \left| {\Delta s_{w,av} } \right| + W_{v_x } v_x - W_{IL} \left| {T_{corr} - T_{corr,exp} } \right|.$$
(6)

The first two reward terms guide the agent to track the reference slip while keeping the vehicle accelerating. The third term encourages the agent to imitate the expert NMPC control action \(T_{corr,exp}\), which speeds up the training process, while the first two terms allow the agent to keep improving the TC performance beyond the expert. The weights \([W_{\Delta s_{w,av}},\) \(W_{v_x},\) \(W_{IL}]\) have been chosen to give the same priority to the first two reward terms, while the third one has been tuned empirically. The DRL algorithm used is the state-of-the-art actor-critic DDPG algorithm [9]. This algorithm has the advantage of (i) handling complex plants with continuous control actions; (ii) learning policies that achieve higher rewards in the same environment than value-based and policy-based algorithms; and (iii) providing a good trade-off between sample efficiency and computational expense [10].

The DDPG agent consists of one actor FFNN and one critic FFNN. The actor network, i.e., the one computing the control action, has two hidden layers with 40 neurons each, a ReLU activation function after each hidden layer, and a one-neuron tanh output layer, whose output is scaled by the peak rear motor torque. The critic network, which evaluates the actor's performance, processes the state through two hidden layers with 40 neurons each and a ReLU activation function in between. Its action path has two hidden layers, with 10 and 40 neurons respectively, and a ReLU activation function between them. A common path combines the two paths and applies a ReLU activation function before the single-neuron output.
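A minimal sketch of these actor and critic architectures is given below in PyTorch; the paper does not state its implementation framework, and both the peak-torque value and the choice of combining the critic paths by addition are assumptions.

```python
import torch
import torch.nn as nn

T_PEAK = 135.0  # assumed peak rear motor torque used to scale the actor output [Nm]

class Actor(nn.Module):
    """Maps the 5-element observation vector of Eq. (5) to the torque correction."""
    def __init__(self, n_obs=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, 40), nn.ReLU(),
            nn.Linear(40, 40), nn.ReLU(),
            nn.Linear(40, 1), nn.Tanh(),   # tanh output scaled by peak torque
        )

    def forward(self, obs):
        return T_PEAK * self.net(obs)

class Critic(nn.Module):
    """Estimates the action value Q(s, a) for the actor's torque correction."""
    def __init__(self, n_obs=5, n_act=1):
        super().__init__()
        # state path: two hidden layers of 40 neurons with a ReLU in between
        self.state_path = nn.Sequential(nn.Linear(n_obs, 40), nn.ReLU(), nn.Linear(40, 40))
        # action path: hidden layers of 10 and 40 neurons with a ReLU in between
        self.action_path = nn.Sequential(nn.Linear(n_act, 10), nn.ReLU(), nn.Linear(10, 40))
        # common path: ReLU before the single-neuron output
        # (combining the two paths by addition is an assumption)
        self.common = nn.Sequential(nn.ReLU(), nn.Linear(40, 1))

    def forward(self, obs, act):
        return self.common(self.state_path(obs) + self.action_path(act))
```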

4 Simulation Setup and Results

4.1 Test Scenario

The simulation analysis was carried out with the validated vehicle model presented in Sect. 2.1 using the AVL VSM software. The DRL TC agent was trained for 1000 episodes and tested on a straight-line tip-in manoeuvre on a surface with a friction coefficient of \(\mu_x = 0.085\). The manoeuvre consists of a step torque request \(T_{ref}\) with an initial value of \(7.5 \,{\text{Nm}}\) and a final value of \(54 \,{\text{Nm}}\), with the step applied at \(2.5 \,{\text{s}}\).
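As a compact reference, the tip-in torque request can be expressed as the following step function in Python (values taken from the manoeuvre description above; the function name is illustrative):

```python
def tip_in_torque_request(t, t_step=2.5, T_initial=7.5, T_final=54.0):
    """Step torque request T_ref [Nm] of the tip-in test manoeuvre (Sect. 4.1)."""
    return T_initial if t < t_step else T_final
```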

4.2 Simulation Results

To evaluate the effectiveness of the DRL TC solution, the NMPC TC strategies presented in Sect. 3.1, i.e., the expert NMPC and the real-time NMPC, have been adopted as benchmark solutions. The sampling time of the DRL and NMPC controllers is set to \(T_s = 10 \,{\text{ms}}\) for a fair comparison; reducing the sampling time improves the tracking of the reference slip regardless of the selected control algorithm [11]. In addition, the controllers activate only when the slip exceeds 5%. Figure 4 shows the simulation results for the tip-in manoeuvre with an initial velocity of 2.5 km/h. All control strategies manage to reduce the slip during the manoeuvre despite the presence of a pure time delay in the system. However, the DRL control solution outperforms the NMPC strategies in terms of residual steady-state error and convergence time of the closed-loop slip response (see Fig. 4a). The inset in Fig. 4c shows that the DRL agent provides a quicker adjustment of the torque request than the NMPC controllers, resulting in a smaller overshoot in the slip response depicted in Fig. 4a. In addition, the smoother control action provided by the DRL agent also reduces the wear of the drivetrain components. In the interval between 1 and 2 s, the DRL agent starts to reduce the torque correction, leading to an increase in the final torque request. This allows the vehicle to reach higher accelerations than the NMPCs, by up to \(0.5\, {\text{m}}/{\text{s}}^2\) (see Fig. 4b).

Fig. 4. Simulation results: a) slip; b) longitudinal acceleration; c) final torque request, with an inset in the interval [0.5, 1] s highlighting the intervention time and initial response of the TCs.

5 Conclusions

This paper proposes a novel DRL traction controller benchmarked against a state-of-the-art NMPC with two parametrisations: a real-time capable NMPC with a 10-step prediction horizon, and an expert NMPC with a 50-step prediction horizon. The expert NMPC provides better closed-loop tracking performance than the real-time capable NMPC, but is not real-time capable. The proposed DRL controller is trained for 1000 episodes using the control action of the expert NMPC as a term in the reward function, which allows the agent to learn faster. The proposed DRL TC strategy successfully reduces the peaks of the wheel slip and the oscillations in the control action. In addition, the DRL agent reaches the reference slip faster while maintaining a higher longitudinal acceleration throughout the manoeuvre.

Future work will focus on extending the formulation of the DRL TC to different friction coefficients and more challenging manoeuvres to further improve closed-loop tracking performance. In addition, different NN structures will be explored and analysed.