1 Introduction

The rapid increase in the number of cars on the roads has increased the risks associated with safety and traffic congestion. Autonomous vehicles are increasingly considered a potential solution to these problems, as they can achieve more robust operation by providing more efficient driving and automated control [1]. The core functions of an autonomous vehicle fall into three main categories: perception, planning, and control (see Fig. 1). Environment perception provides the vehicle with the required information about the surrounding driving environment, including the vehicle’s location, the drivable areas, the velocity, etc. Different sensors and tools can be used for the perception task, such as ultrasonic sensors, cameras, and LiDARs (Light Detection And Ranging), or a combination of these (sensor fusion) to decrease the uncertainty of the data [2]. Based on the collected data, the planning module selects the best scenario and determines the required control actions to drive the vehicle efficiently to the desired location. In the control function, commands are sent to the actuators to put the control strategy into action [3]. Model predictive control (MPC) is one of the most commonly used control strategies due to its ability to solve an online optimization problem while handling soft and hard constraints. However, for complex and highly nonlinear systems, implementing MPC is a great challenge: it is often infeasible due to its high computational demands, especially on resource-limited embedded platforms [4]. Consequently, research in this field has concentrated on developing control strategies that minimize the computational load and the execution time.
The significant performance of machine learning methods across a variety of applications in different fields has drawn attention to their potential in automated driving systems. The promising results of deep neural networks in environment perception and motion planning sparked interest in developing deep neural network-based control strategies [5, 6]. The deep neural network is considered a self-optimizing method due to its ability to optimize its behavior based on the provided information, which makes DNNs suitable for complex dynamic systems [7].

Fig. 1
figure 1

The interaction of the autonomous vehicle within the surrounding environment

Additionally, deep neural networks offer many other benefits with regard to reducing the execution time and the computational load, which makes implementations on limited-resource HW (hardware) more efficient [8].

The main contribution of this work is the presentation of a novel FPGA implementation method for deep neural networks, the “IP Generator Tool”, where an MPC-based DNN model for an automated driving system is the application case used to demonstrate the robustness of the new tool. Some papers propose implementations of model predictive controllers on FPGAs using different implementation methods, such as high-level synthesis [9], the Xilinx System Generator [10], or HDL (Hardware Description Language) [11]. In this paper, the suggested solution is to develop a deep neural network model based on the behavior of the traditional MPC controller so that the DNN model can replace the MPC controller in highly complex driving environments. Additionally, a new automatic IP generator tool is developed to optimize the deployment of the DNN on FPGAs, meaning that a low-end FPGA can be used. The paper is structured as follows: The second section provides background and discusses the state of the art concerning the most commonly used control strategies and deployment methods on different embedded platforms. In the third section, the design process of the vehicle model and the traditional MPC controller is presented. In section four, the design of the DNN model is described. The procedures for the auto-generation of the DNN IP and the implementation process of the entire DNN solution on a hardware accelerator are presented in section five. Results and discussion are found in section six, while the conclusions and the future direction of this research are presented in the final section.

2 Background

Different control strategies can be used to perform the automated steering task, such as classical feedback control (e.g., the PID controller), dynamic control, and model-based control. In this context, model predictive control is one of the most commonly used control strategies for the steering task, known for its efficiency due to its ability to solve an online optimization problem with multiple inputs and outputs while taking the interactions between these variables into consideration. The prediction strategy of the MPC controller is performed over the prediction horizon, which represents the next P time steps that the controller looks into the future [12]. The MPC controller simulates several future scenarios, and the optimizer chooses the best one based on the cost function, which represents the error between the reference target and the predicted outputs. The optimal scenario corresponds to the minimum cost function. Figure 2 shows the main structure of the MPC controller. The MPC controller is computationally expensive since it solves the online optimization problem at every time step, which requires high computing capabilities and large memory. The computational load and the high resource consumption make implementing MPC on limited resources a big challenge [13]. Despite the increasing popularity of Python, MATLAB is still a powerful environment used in various fields for a huge range of applications. A MATLAB toolbox for the creation of DNN IPs for FPGAs was released in September 2020 [14], but this tool forces the developer to use a high-end FPGA, even for small DNN structures. Some works propose different methods or tools to implement DNNs on FPGAs [15,16,17,18,19,20,21]. As mentioned earlier, and besides developing the DNN model, one of the motivations behind the work performed in this article is to propose an alternative tool to implement deep neural networks on low-end FPGAs.
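The receding-horizon strategy described above can be illustrated with a minimal linear example. The sketch below is not the controller designed in this paper: it assumes a generic discrete-time linear model, builds the horizon-P prediction matrices, minimizes an unconstrained quadratic cost, and applies only the first computed input, re-solving at every step.

```python
import numpy as np

def lifted_matrices(A, B, P):
    """Build prediction matrices so that the stacked future states X = F x0 + G U."""
    n, m = B.shape
    F = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(P)])
    G = np.zeros((P * n, P * m))
    for i in range(P):
        for j in range(i + 1):
            G[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return F, G

def mpc_step(A, B, x0, P, Q, R):
    """One receding-horizon step: minimize sum of x'Qx + u'Ru over P steps,
    then return only the first control action (applied before re-solving)."""
    n, m = B.shape
    F, G = lifted_matrices(A, B, P)
    Qbar = np.kron(np.eye(P), Q)   # stage cost repeated over the horizon
    Rbar = np.kron(np.eye(P), R)
    H = G.T @ Qbar @ G + Rbar      # quadratic term of the cost in U
    f = G.T @ Qbar @ F @ x0        # linear term induced by the initial state
    U = np.linalg.solve(H, -f)     # unconstrained minimizer
    return U[:m]                   # first move only (receding horizon)
```

In a closed loop, `mpc_step` is called at every sample instant with the measured state, mirroring the "solve online, apply first action" cycle shown in Fig. 2.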
The work includes creating the back-end of an automated tool to generate deep neural network Intellectual Properties (IPs) for further FPGA implementation.

Fig. 2
figure 2

Traditional MPC block diagram

3 Design of the MPC controller

The MPC drives the vehicle to the target point along the desired trajectory by controlling the lateral deviation d and the relative yaw angle \(\theta \) of the vehicle. Maintaining these variables at zero, or as close to zero as possible, is the online optimization problem that the MPC controller must handle in real time. Since MPC is a model-based controller, the design process has two main steps: designing the plant model (the vehicle) first, and then designing the MPC controller. The design process includes tuning the parameters of the controller and formulating the operating conditions imposed by the system in the form of soft and hard constraints.

3.1 The vehicle model

The dynamic model is represented by Eqs. 1, 2 and 3. Figure 3 shows the global position of the vehicle, where \(v_x\), \(v_y\) are the longitudinal and lateral velocities respectively, d is the lateral deviation, m is the total mass of the vehicle, \(l_r\) is the distance between the rear tire and the center of gravity, \(l_f\) is the distance between the front tire and the center of gravity, \(I_z\) is the yaw moment of inertia, \(c_f\), \(c_r\) are the cornering stiffnesses of the front and rear tires respectively, \(\delta \) is the steering angle, \(\theta \) is the yaw angle, \(\rho \) is the curvature, and \(\omega \) is the yaw rate. The lateral and yaw motions are determined by the fundamental laws of motion, meaning that they are determined by the forces applied on the front and rear tires.

3.2 The MPC model

The first step in designing the MPC model is to determine the input–output signals of the vehicle model; the second step is to set the parameters and determine the constraints. The manipulated variable (steering angle \(\delta \)) and the disturbance (\(\rho \hspace{1pt} v_x\)) are determined as inputs, while the lateral velocity \(v_y\), lateral deviation d, yaw angle \(\theta \), and yaw rate \(\omega \) are determined as outputs. The design parameters of the MPC controller were tuned during the design process, and several standard recommendations were taken into consideration to determine their values. The sample time (\(T_s\)) determines the rate at which the MPC controller executes the control algorithm. If \(T_s\) is too long, the controller will not be able to respond in time to disturbances. On the other hand, if \(T_s\) is too small, the controller will respond faster, but the computational load will increase. The prediction horizon (P) is chosen so that it covers the dynamic changes of the environment; the recommendation is to choose P to be 20 to 30 samples. Taking into consideration that the first two control actions have the highest impact on the response behavior, a large control horizon (M) increases the computational load, while a small M increases the stability. The parameters of the MPC model are determined as follows: the sample time \(T_s = 0.1\) s, the prediction horizon \(P = 2\) s, and the control horizon \(M = 2\) s. The constraints are determined as follows: the steering angle is in the range [\(-\)1.04, 1.04] rad and the yaw angle rate is in the range [\(-\)0.26, 0.26] rad/s. The parameters were maintained during the design process until satisfactory behavior was obtained. The overall design of the MPC and plant model is shown in Fig. 4.
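As a quick sanity check of these choices: the 2 s prediction horizon at \(T_s = 0.1\) s corresponds to 20 samples, which is inside the recommended 20–30 sample window, and the hard steering constraint can be enforced by simple saturation. The `saturate` helper below is illustrative, not part of the MPC toolchain.

```python
Ts = 0.1                    # sample time [s]
P_steps = round(2.0 / Ts)   # prediction horizon: 2 s -> 20 samples

# The chosen horizon falls inside the recommended 20-30 sample window
assert 20 <= P_steps <= 30

# Hard constraint on the manipulated variable (steering angle, radians)
DELTA_MIN, DELTA_MAX = -1.04, 1.04

def saturate(delta):
    """Clamp a steering command to its hard constraint range."""
    return max(DELTA_MIN, min(DELTA_MAX, delta))
```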

Fig. 3
figure 3

Global position of the vehicle model

$$\begin{aligned} \frac{d}{dt} \begin{bmatrix} {v_y}\\ {\omega }\\ {d}\\ \theta \end{bmatrix}= \begin{bmatrix} {A} \end{bmatrix} \begin{bmatrix} {v_y}\\ {\omega }\\ {d}\\ \theta \end{bmatrix}+\begin{bmatrix} {B} \end{bmatrix} \begin{bmatrix} {\delta }\\ {\rho } {v_x} \end{bmatrix} \end{aligned}$$
(1)
$$\begin{aligned} A = \begin{bmatrix} -\frac{2 c_f + 2 c_r}{m v_x} &{} - v_x -\frac{2 c_f l_f - 2 c_r l_r}{m v_x} &{} 0 &{} 0 \\ -\frac{2 c_f l_f - 2 c_r l_r}{I_z v_x} &{} -\frac{2 c_f l_f^2 + 2 c_r l_r^2}{I_z v_x} &{} 0 &{} 0\\ 1 &{} 0 &{} 0 &{} v_x\\ 0 &{} 1 &{} 0 &{} 0 \end{bmatrix} \end{aligned}$$
(2)
$$\begin{aligned} B = \begin{bmatrix} \frac{2c_f}{m} &{} 0 \\ \frac{2c_fl_f}{I_z} &{} 0 \\ 0 &{} 0 \\ 0 &{} 1 \end{bmatrix} \end{aligned}$$
(3)
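Equations 1–3 can be assembled numerically before designing the controller. The sketch below uses illustrative parameter values (the mass, yaw inertia, cornering stiffnesses, and longitudinal speed are assumptions, not values taken from this paper) and a simple forward-Euler discretization at the sample time used later, \(T_s = 0.1\) s.

```python
import numpy as np

# Illustrative vehicle parameters (assumed values for demonstration only)
m, I_z = 1575.0, 2875.0        # mass [kg], yaw moment of inertia [kg m^2]
l_f, l_r = 1.2, 1.6            # front/rear axle to center-of-gravity distances [m]
c_f, c_r = 19000.0, 33000.0    # front/rear cornering stiffnesses [N/rad]
v_x = 15.0                     # longitudinal speed [m/s]

# State: [v_y, omega, d, theta]; inputs: [delta, rho * v_x]  (Eqs. 1-3)
A = np.array([
    [-(2*c_f + 2*c_r) / (m*v_x), -v_x - (2*c_f*l_f - 2*c_r*l_r) / (m*v_x), 0, 0],
    [-(2*c_f*l_f - 2*c_r*l_r) / (I_z*v_x), -(2*c_f*l_f**2 + 2*c_r*l_r**2) / (I_z*v_x), 0, 0],
    [1, 0, 0, v_x],
    [0, 1, 0, 0],
])
B = np.array([
    [2*c_f / m, 0],
    [2*c_f*l_f / I_z, 0],
    [0, 0],
    [0, 1],
])

# Forward-Euler discretization at the controller sample time Ts = 0.1 s
Ts = 0.1
Ad = np.eye(4) + Ts * A
Bd = Ts * B
```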
Fig. 4
figure 4

General MPC and plant model diagram

4 Design of the DNN model

The model architecture, the data preparation, and the training, validation and testing processes were carried out taking into consideration the nature of the task the controller is dedicated to. The architecture of the suggested model consists of an input layer with 6 observations (yaw angle \(\theta \), lateral deviation d, lateral velocity \(v_y\), yaw rate \(\omega \), disturbance \(\rho \hspace{1pt} v_x\), and the previous control action \(\delta ^*\)), an output layer (regression layer) with one output (the steering angle), and fully connected hidden layers. The output layer uses the mean squared error as a loss function (Fig. 5). After designing the MPC controller, its efficiency in solving the determined optimization problem (driving the vehicle along the desired trajectory) is verified in order to generate the data set that is used to train the DNN model.
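A forward pass of such a regression network can be sketched with plain NumPy. The hidden-layer widths and the ReLU activation used below are assumptions for illustration; the text above specifies only the 6-input, fully connected, single-output structure with an MSE loss.

```python
import numpy as np

def relu(x):
    """Assumed hidden-layer activation (the paper does not name one here)."""
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Fully connected forward pass: ReLU hidden layers, linear regression output."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    W, b = weights[-1], biases[-1]
    return W @ a + b        # predicted steering angle (no output activation)

def mse(y_pred, y_true):
    """Mean squared error, the loss held by the regression output layer."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))
```

Here `x` is the 6-element observation vector \((\theta, d, v_y, \omega, \rho v_x, \delta^*)\) and the output is the scalar steering command that imitates the MPC action.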

4.1 Data preparation and training process

The data set is generated by running the MPC controller against a massive number of scenarios that cover as many of the possible environment states as possible, and then recording the resulting control actions. The generated data set has size (120,000 \(\times \) 6) with double data type, where 6 refers to the number of state-space variables. The generated data set is divided into three sets: a training set used to train the model, a validation set used to validate the model during training, and a testing set used to test the model after training. After designing the deep neural network and defining the training options, the training process is performed using the training and validation data sets. The training stops after the final iteration. The details showed that 9680 iterations were needed to perform the training: 40 epochs with 242 iterations each (40 \(\times \) 242 = 9680). The validation process was performed every 50 iterations. The validation loss (root mean square error, RMSE) was almost the same for each mini batch (RMSE = 0.010799), which means that the trained DNN does not overfit. Afterwards, the trained neural network was tested using the testing data set. The performance of the DNN model is evaluated by comparison to the MPC controller, where the RMSE between the outputs of the two controllers is calculated. The root mean square error obtained by the end of the testing process was RMSE = 0.011228 rad, which is very small compared to the range of the steering angle [\(-\)1.04, 1.04] rad. This small value indicates that the DNN model successfully imitated the behavior of the MPC controller. The training options used to train the DNN model are presented in Table 1.
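The three-way split described above might look as follows in a minimal sketch. The 70/15/15 ratios and the `rmse` helper are illustrative assumptions, since the exact split proportions are not stated here.

```python
import numpy as np

def split_dataset(data, train=0.7, val=0.15, seed=0):
    """Shuffle and split into training/validation/test sets (ratios are assumed)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_tr = int(train * len(data))
    n_va = int(val * len(data))
    return data[idx[:n_tr]], data[idx[n_tr:n_tr + n_va]], data[idx[n_tr + n_va:]]

def rmse(a, b):
    """Root mean square error between two output sequences (e.g., DNN vs. MPC)."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))
```

After training, `rmse(dnn_outputs, mpc_outputs)` over the test set is the quantity compared against the steering-angle range to judge how well the DNN imitates the MPC.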

Fig. 5
figure 5

The DNN model architecture representation

Table 1 Training options of the DNN model

5 Auto generation of DNN IP procedures

Custom IP generation is a tricky but crucial step for FPGA implementations. Various tools are provided for designers to perform this step, but all seem to be time-consuming, especially for applications that consume huge computational resources. Deep neural networks are becoming widely used in all fields; hence, their deployment must be simplified for developers, engineers, and scientists [22]. However, DNNs are time and resource consuming if they are implemented on sequential computational systems such as \(\mu P\)/\(\mu C\) (microprocessors/microcontrollers) or digital signal processors (DSPs) [23]. That is why DNNs are more likely to be implemented on parallel computing systems such as FPGAs and graphics processing units (GPUs). It is evident that the most economical solution for these applications is to adopt a dedicated application-specific integrated circuit (ASIC), conditional upon mass production. GPUs are known for their ability to execute several processes in parallel, which makes their application to image processing favorable [24]. Since neural networks can be computed in a parallel way, GPUs can be dedicated to such applications [9, 25,26,27]. However, they are known to be power consuming, which makes their use inadequate for embedded applications. Re-configurable computing is an efficient alternative solution for parallel processes, where all the computations can be executed at the same time. The most common re-configurable technologies are FPGAs [10, 28]; in addition, field-programmable analog arrays (FPAAs) have recently become a hot research topic [26, 27]. However, because of their lack of hardware resources, the use of FPAAs is limited to scientific research applications [29]. Many studies have proven that FPGA-based neural network implementations provide much better results in terms of power consumption and timing performance [10, 22, 30]. However, the lack of FPGA hardware resources restricts their use to limited DNN sizes.
In this context, the provided tool is developed to implement deep neural networks on low-end FPGAs, where the user is given options to optimize the model (parameters, data type, etc.) in a way that achieves a balance between the available resources and the desired performance. The targeted FPGA in this work is the Xilinx Kintex-7 KC705 chip, which is known for its hardware resource limitations. Nevertheless, the new tool has the ability to deploy the generated IP on larger FPGAs/SoCs. The tool is based on the Xilinx System Generator (XSG), where blocks are automatically invoked, parameterized, and linked from the script. The procedure of auto-generating the neural network’s IP has two main steps, as shown in Fig. 6.
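To illustrate the data-type option mentioned above, the sketch below quantizes values to a signed 16-bit fixed-point format with 12 fractional bits. The word and fraction lengths are assumptions for illustration, not necessarily the format the tool generates; the point is that fixed-point storage trades a bounded rounding error for much cheaper FPGA arithmetic.

```python
import numpy as np

def to_fixed(x, frac_bits=12, word_bits=16):
    """Quantize to signed fixed point (word_bits total, frac_bits fractional),
    returning the quantized values back in floating point for comparison."""
    scale = 2 ** frac_bits
    lo = -2 ** (word_bits - 1)          # most negative representable integer
    hi = 2 ** (word_bits - 1) - 1       # most positive representable integer
    q = np.clip(np.round(np.asarray(x) * scale), lo, hi)   # round and saturate
    return q / scale
```

For in-range values the rounding error is at most half an LSB (here \(2^{-13}\)), which is why quantizing trained weights and activations can preserve accuracy while freeing LUTs and DSPs, as the results in section six show.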


Fig. 6
figure 6

Flow of deep neural network IP automated generating

First, the user is asked to define the parameters concerning the DNN (the structure, the data type, and the activation function) and the targeted computational HW; the values of the weights and biases are imported from the pre-trained DNN. Then comes the step of setting the input/output interfacing mode, where four modes are available: UART, AXI, constrained parallel, and no interface. If the UART interfacing mode is selected, two additional pre-designed IPs are invoked that are responsible for receiving and transmitting UART data from/to the DNN IP. In this mode, the user is asked to specify the ports to be used for Tx and Rx. If the AXI mode is selected, the DNN IP can communicate with the processing system (PS) or a soft microprocessor core. If the UART or AXI mode is set, additional IPs are invoked (UART_Tx IP, UART_Rx IP, AXI-interface IP, and Processor System Reset IP); hence, some additional hardware resources and power are required. The other disadvantage of the AXI interfacing mode is its power consumption, since the PS consumes 1.53 W. In this paper, only the AXI interfacing mode is utilized (the UART mode will be treated in future work). The third interfacing mode is the constrained parallel I/O, where the user is asked to specify the ports to be used. This mode allows parallel communication from/to the DNN, so there is no latency caused by data transmission and reception; however, it consumes a lot of I/O resources and is therefore not practical for the majority of DNN applications. If the no-interface mode is selected, the connection is unconstrained, which permits IP-to-IP interconnection.
Table 2 summarizes the interfacing possibilities of the presented tool, where the AXI interface remains the appropriate one for the application studied in this paper. After setting up the DNN-IP preferences, the XSG automation part begins, which consists of invoking the elementary computational components needed for each neuron, linking the components and the neurons, setting the weights and biases accordingly, implementing the I/O, and then generating the IP. Figure 7 shows the auto-generated DNN circuit on the Xilinx System Generator to be implemented on a low-end FPGA.

Table 2 DNN IP generator tool interfaces
Fig. 7
figure 7

The auto-generated deep neural network structure on Xilinx System Generator to be implemented on a low-end FPGA

The implementation of the FPGA design is not a straightforward process due to the lack of a direct connection between the algorithm design and the hardware, in addition to the deviations that can be caused by the difference between fixed-point and floating-point implementations of the algorithm’s specifications. Also, hand-written code is error-prone and can be hard to debug. In order to address these problems and provide an integrated workflow with a unified environment for algorithm design, simulation, validation, and implementation, the suggested solution was realized using MATLAB, Simulink, and the Xilinx System Generator-based tool “Automatic DNN IP Generator”. Figure 8 shows the detailed steps of implementation.

Fig. 8
figure 8

The implementation steps of the solution

6 Results and discussion

The implementations of the traditional MPC, the DNN model, and the deployment of the DNN on FPGA are discussed in terms of performance, taking the response of the traditional MPC as the reference behavior. In addition to performance, the deployments using floating-point and fixed-point data types are compared in terms of resource consumption. The performance of the controllers was evaluated based on the settling time \(T_s\), the overshoot \(M_p\), and the final (steady-state) value of the yaw angle and the lateral deviation of the vehicle. The overshoot shows the amount by which the lateral deviation/yaw angle exceeds its final value, while the settling time shows the time required to settle and reach the final value within a certain percentage. The settling time was determined as the time at which the signal stays within 5% (a commonly used threshold) of its final value. Figure 9 clearly shows that the DNN model and the traditional MPC have a very similar response, meaning that the DNN model successfully imitated the behavior of the traditional MPC, while the DNN-on-FPGA variant has a slightly different response. In order to evaluate these behaviors, Figs. 10 and 11 and Table 3 present the performance indicators, which show that the traditional MPC, the DNN model, and the DNN-on-FPGA version all successfully drive the lateral deviation and yaw angle to zero or very close to zero as the desired steady state. The detailed results show that the settling time for both indicators is almost the same for the traditional MPC and the DNN model, while differences of approximately 0.342 s for the lateral deviation and 0.3013 s for the yaw angle were noticed in the behavior of the DNN-on-FPGA version.
On the other hand, the behavior was very similar with regard to the overshoot, where differences of only 0.0492 m for the lateral deviation and 0.0339 rad for the yaw angle were noticed in the response of the DNN on FPGA compared to the response of the traditional MPC. These results demonstrate that the trained DNN model provided satisfactory performance and the vehicle was driven smoothly to the desired destination. Despite the slightly higher overshoot and settling time noticed in the behavior of the DNN after being deployed on hardware (FPGA) compared to the simulation, the DNN on FPGA provided satisfactory performance and met the safety requirements determined in the design process. In addition to performance, in order to evaluate the efficiency of deploying the DNN on FPGA in terms of resource consumption, the main estimated resource utilizations of the deployments using fixed-point and floating-point data types were compared, as presented in Table 4. The results show that the DNN on FPGA using fixed-point data consumes fewer resources than the floating-point version, saving \(86.29 \% \) of the LUTs and \(51.54 \%\) of the DSPs relative to the overall resource availability of the FPGA board.
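The two performance indicators used here can be computed directly from a recorded response. The sketch below is illustrative (the exact definitions behind Table 3 are not given): overshoot is taken as the peak excursion past the final value along the approach direction, and settling time as the first instant after which the signal stays inside a tolerance band around the final value.

```python
import numpy as np

def overshoot(y, y_init, y_final):
    """Peak excursion past the final value, measured along the approach direction."""
    d = float(np.sign(y_final - y_init)) or 1.0   # direction of approach
    return max(0.0, float(np.max(d * (np.asarray(y) - y_final))))

def settling_time(t, y, y_final, tol):
    """First time after which |y - y_final| stays within the tolerance band."""
    outside = np.abs(np.asarray(y) - y_final) > tol
    if not outside.any():
        return float(t[0])
    last = int(np.where(outside)[0][-1])          # last sample outside the band
    return float(t[min(last + 1, len(t) - 1)])
```

For the lateral deviation and yaw angle, `y_final` is zero and `tol` would be the 5% band used in the evaluation.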

Fig. 9
figure 9

Estimated steering angle computed on FPGA compared to simulation results of the traditional MPC and the DNN model

Fig. 10
figure 10

Vehicle response in terms of lateral deviation: comparison of FPGA implementation and simulation results

Fig. 11
figure 11

Vehicle response in terms of yaw angle: comparison of FPGA implementation and simulation results

Table 3 Performance indicators for controller behavior
Table 4 Resource utilization of DNN on FPGA for fixed point versus floating point data

7 Conclusions

In this paper, a deep neural network was designed and trained based on the behavior of the traditional MPC controller. A new tool based on the Xilinx System Generator was developed to perform and optimize the deployment of the DNN model on FPGAs. Results showed that the trained model successfully imitated the behavior of the MPC controller with a very small root mean square error (\(RMSE = 0.011228\) rad). The trained DNN model was efficiently deployed on a low-end FPGA (Xilinx Kintex-7 KC705) using a fixed-point data type, achieving satisfactory performance and meeting the design constraints.