1 Introduction

Predicting the future trajectories of traffic participants is essential to the safety of autonomous driving. However, trajectory prediction is a complex task because traffic situations are complicated, especially during rush hours, and because different traffic participants interact with each other [1]. In traditional prediction models, environmental information, such as lanes and the positions of other vehicles, is detected by the perception system and then fed into a neural network. Current methods primarily use graph neural networks to extract key interaction information [2, 3]. Waymo’s VectorNet model, for instance, encodes the environmental information as vectors and tensors, which improves effectiveness and accuracy [4].

Recently, large language models (LLMs) have been applied effectively in various fields. For trajectory prediction, the prior knowledge and understanding ability of LLMs are used to extract scene information and interaction information. On this basis, models such as Traj-LLM [5] and the transportation-context map [6] have improved motion prediction performance. Unlike existing methods, which improve prediction performance by feeding enriched scene information into prediction networks, this paper utilizes the high-level information generated by an LLM to adaptively modify the output trajectory of the prediction network. Whereas the inner workings of a network are hard to interpret, our method handles the output of the LLM explicitly, so the algorithm can be improved according to the modification results. In addition, we use a finetuned image-text large language model, trained on a specialized dataset, which greatly improves the LLM's understanding of specific traffic scenes. The contributions of this paper are summarized as follows:

  • 1) A finetuned large language model is used to adaptively modify predicted trajectories with inferred driving intentions, which improves prediction accuracy.

  • 2) Both the driving intentions and the predicted trajectories are considered in the boundary-based drivable area model, so that its safety margin can be used to maximize safety (Fig. 1).

2 Method

Fig. 1. The main process of this paper

Firstly, the dataset is processed and input into a prediction network to generate the corresponding trajectory outputs. Simultaneously, the dataset is visualized in a specific form. Through carefully designed prompts and supervised learning on a custom dataset, namely the finetuning process, the large language model's ability to recognize and understand the visualized dataset is significantly enhanced. The large language model then outputs assessments of the driving intentions based on this recognition.

Secondly, the driving intention assessment is used to modify the trajectory output by the prediction network, thereby improving prediction accuracy.

Finally, the modified trajectories are input into a boundary-based drivable area model, which comprehensively considers both driving intentions and predicted trajectories to achieve safe decision-making.

2.1 Finetuning of the LLM

Firstly, the public dataset needs to be visualized as a picture in a fixed form. The picture should contain all the relevant traffic elements, including the current position \(P_{cur}\), the current velocity \(V_{cur}\), the current yaw \(\theta\), the trajectory of the past 2 s \(\zeta_{2\,seconds}\), and the centerline of the road \(M_{lane}\). Each element is drawn as a specific geometric shape.

$$ \Theta_{per\;frame} = \left( X\left\{ P_{cur}, V_{cur}, \theta, \zeta_{2\,seconds} \right\}, M_{lane} \right) $$
(1)

The prompt should first explain the traffic elements above and then describe the task of judging the driving intention. The custom dataset is annotated according to the ground truth of the public dataset, and each label is stored in the corresponding JSON file.

Driving behavior is divided into lateral intention and longitudinal intention. Laterally, the intention is classified by amplitude as left turn, left lane change, right turn, right lane change, going straight, or U-turn; longitudinally, it is classified as acceleration, deceleration, braking, or uniform speed.
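The intention classes listed above can be written down as a small label vocabulary. The following is a minimal sketch only: the class names, the JSON field names (`image`, `lateral`, `longitudinal`) and the file name `frame_0001.png` are hypothetical, since the paper does not specify the annotation schema.

```python
import json
from enum import Enum


class LateralIntention(str, Enum):
    """Lateral intention classes named in the text."""
    LEFT_TURN = "left turn"
    LEFT_LANE_CHANGE = "left lane change"
    RIGHT_TURN = "right turn"
    RIGHT_LANE_CHANGE = "right lane change"
    GOING_STRAIGHT = "going straight"
    U_TURN = "U-turn"


class LongitudinalIntention(str, Enum):
    """Longitudinal intention classes named in the text."""
    ACCELERATION = "acceleration"
    DECELERATION = "deceleration"
    BRAKING = "braking"
    UNIFORM_SPEED = "uniform speed"


def make_label(image_name, lateral, longitudinal):
    """Build one annotation record (hypothetical schema, not the authors')."""
    return {
        "image": image_name,
        "lateral": lateral.value,
        "longitudinal": longitudinal.value,
    }


def parse_label(text):
    """Parse a JSON record back into the two intention enums."""
    record = json.loads(text)
    return (
        LateralIntention(record["lateral"]),
        LongitudinalIntention(record["longitudinal"]),
    )


# Example: one visualized frame labeled as a decelerating left turn.
label = make_label("frame_0001.png",
                   LateralIntention.LEFT_TURN,
                   LongitudinalIntention.DECELERATION)
encoded = json.dumps(label)
```

Restricting the labels to a closed vocabulary means that any free-form LLM answer outside the listed classes raises a `ValueError` when parsed, rather than silently entering the modification step.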

2.2 Merge of the Driving Intention and the Predicted Trajectory

To merge the driving intention with the predicted trajectory, the prior information is the accuracy rate of the driving intention and the accuracy rate of the predicted trajectory, both obtained from statistics on the validation dataset. In addition, in most cases the vehicle follows the centerline of the road, which is therefore regarded as an important reference.

The final modified trajectory depends on the predicted trajectory \(\zeta_{predicted}\), the driving intention \(I\), the accuracy rate of the LLM \(\tau_{LLM}\), the accuracy rate of the predicted trajectory \(\tau_{NN}\), and the neighboring centerline of the road \(M_{lane}\), as Eq. (2) shows:

$$ \zeta_{modified} = f\left( \zeta_{predicted}, I, \tau_{LLM}, \tau_{NN}, M_{lane} \right) $$
(2)

The detailed modification process is divided into four cases:

Case 1: When the driving intention is more consistent with the neighboring centerline of the road, the modified trajectory should follow the direction of the centerline.

Case 2: When the predicted trajectory is more consistent with the neighboring centerline of the road, the modified trajectory should follow a direction between the centerline and the predicted trajectory.

Case 3: When neither the driving intention nor the predicted trajectory is consistent with the neighboring centerline, the modified trajectory should follow the weighted average of the driving intention and the predicted trajectory, where the weights are given by the prior information. Eq. (3) shows the modification process for Case 3.

$$ \zeta_{modified,3} = \frac{\tau_{NN} \cdot \zeta_{predicted} + \tau_{LLM} \cdot I}{\tau_{LLM} + \tau_{NN}} $$
(3)

Case 4: When the driving intention is consistent with the predicted trajectory, the predicted trajectory will remain unchanged.
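The four cases can be sketched as a single dispatch function. This is an illustration under stated assumptions, not the authors' implementation: trajectories are lists of equally sampled (x, y) points, the intention \(I\) is assumed to be already converted into a template trajectory, "consistent" is assumed to mean that the overall headings differ by less than a tolerance `tol`, Case 4 is checked first, and the "direction between" of Case 2 is taken as the midpoint. Only the weighted average of Case 3 follows Eq. (3) directly.

```python
import math


def heading(path):
    """Overall heading (rad) from the first to the last point of a polyline."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return math.atan2(y1 - y0, x1 - x0)


def angle_gap(a, b):
    """Absolute heading difference of two polylines, wrapped to [0, pi]."""
    diff = heading(a) - heading(b)
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


def blend(a, b, w_a, w_b):
    """Point-wise weighted average of two equally sampled polylines."""
    total = w_a + w_b
    return [((w_a * ax + w_b * bx) / total, (w_a * ay + w_b * by) / total)
            for (ax, ay), (bx, by) in zip(a, b)]


def modify_trajectory(predicted, intention, tau_llm, tau_nn, centerline,
                      tol=math.radians(15)):
    """Return the modified trajectory; tol is a hypothetical threshold."""
    gap_pi = angle_gap(predicted, intention)
    gap_ic = angle_gap(intention, centerline)
    gap_pc = angle_gap(predicted, centerline)
    if gap_pi <= tol:
        # Case 4: intention agrees with the prediction, keep it unchanged.
        return list(predicted)
    if gap_ic <= tol and gap_ic <= gap_pc:
        # Case 1: intention matches the centerline better, follow the centerline.
        return list(centerline)
    if gap_pc <= tol:
        # Case 2: prediction matches the centerline, go between the two
        # (equal weights are an assumption).
        return blend(predicted, centerline, 1.0, 1.0)
    # Case 3: neither matches the centerline, apply Eq. (3).
    return blend(predicted, intention, tau_nn, tau_llm)


# Example: the network predicts straight, the LLM infers a left turn,
# and the neighboring centerline also bends left (Case 1).
predicted = [(0.0, 0.0), (10.0, 0.0)]
left_turn = [(0.0, 0.0), (5.0, 8.0)]
lane = [(0.0, 0.0), (4.0, 9.0)]
modified = modify_trajectory(predicted, left_turn, 0.9, 0.6, lane)
```

The accuracy rates 0.9 and 0.6 in the example are placeholder values; in the method described above they would come from statistics on the validation dataset.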

2.3 Decision-Making Based on Boundary-Based Drivable Area Model

The boundary-based drivable area model is an environmental model that considers the safety margin. With the modified trajectory, the future drivable area boundary is drawn, and safe decisions are generated according to these future boundaries.

The boundary of the drivable area is shown in Fig. 2. The bounding boxes represent the perception results, and the map limit represents the information from the HD map. After point merging, the final unified state-extended environment boundary is generated, and the different colors of the boundary represent different attributes [7].

Fig. 2. The fusion process of the boundary-based drivable area model [7]

By combining the driving intention with the modified predicted trajectory, the model can ensure the safety of decision-making to the greatest extent.

3 Experiment

3.1 Finetuning of the Vision-Text LLM

The public motion forecasting dataset Argoverse is used in this paper, and the LLM MiniGPT-4 is selected for judging the driving intention.

Because the official finetuning process uses only 3000 pictures for the whole scene, we use 500 annotated frames from Argoverse for the single scene. The hardware information and the training parameters are listed in Table 1.

Table 1. Configuration of the finetune.

3.2 Modification of the Predicted Trajectory

The typical prediction model VectorNet is selected as the baseline for the trajectory output, and the network is trained for 25 epochs on the Argoverse dataset. A typical modified trajectory is shown in Fig. 3 below:

The red rectangle represents the target vehicle, and the gray line represents the centerline of the road. The trajectory predicted by VectorNet is in yellow, the modified trajectory is in green, and the ground truth is in red. The picture shows that the network wrongly predicts that the vehicle will go straight, whereas after modification with the driving intention the green line is much closer to the ground truth, which indicates that the modification is effective.

Fig. 3. The modified trajectory in the left-turn scenario

Using the modification method in Sect. 2.2, which fuses the driving intention from the LLM, the total average prediction distance error decreases by 31.9%, from 4.67 to 3.18.
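The reported relative improvement can be checked with a few lines of arithmetic. The sketch below also includes an average distance error function, assuming the metric is the mean point-wise Euclidean distance between a predicted trajectory and the ground truth; the paper does not spell out the exact definition, and the function names are illustrative.

```python
import math


def average_distance_error(predicted, ground_truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_reduction(before, after):
    """Fraction by which an error decreased, relative to its original value."""
    return (before - after) / before


# Error values reported in the text: 4.67 before, 3.18 after modification.
reduction = relative_reduction(4.67, 3.18)
print(f"{reduction:.1%}")  # prints 31.9%
```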

3.3 Safe Decision-Making Based on Boundary-Based Drivable Area Model

The driving intention and the modified trajectory are considered in the boundary-based drivable area model, and the corresponding drivable space is generated. The planner for this part is the Werling planner, which is based on the Frenet frame. The picture below illustrates the safety and reliability of this method (Fig. 4):

Fig. 4. The decision-making comparison between original prediction and modified prediction

Subpicture 1 represents the real scene of the dataset. The green car is the ego vehicle and chose to turn right in this scene, while the red car is the predicted vehicle and actually went straight. Subpicture 2 then shows the drivable space at the current moment. Subpicture 3 shows the drivable space at future moments and gives the planned trajectory of the ego vehicle going straight according to the prediction information of VectorNet. Subpicture 4 comprehensively considers the driving intention output by the LLM and the predicted trajectory to draw the drivable space and the planned trajectory of the ego vehicle at future moments, turning right.

In subpicture 3, the VectorNet model wrongly predicts that the target vehicle (annotated with the red circle) will turn left, whereas subpicture 4 comprehensively considers the LLM information to compensate the drivable boundary, which reduces the drivable area. In terms of decision-making, the ego vehicle in subpicture 4 also avoids the target vehicle by turning right earlier, thereby avoiding the risk of driving straight into the target vehicle. This demonstrates that the method is safe and reliable.
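The paper does not describe the planner configuration, so the following only illustrates the kind of motion primitive a Frenet-frame planner of this type typically samples: a quintic polynomial for the lateral offset d(t) that connects a start state to a target state in time T. The function names and the numbers in the example (a 3.5 m lateral shift in 4 s) are hypothetical.

```python
def quintic_coefficients(d0, v0, a0, d1, v1, a1, T):
    """Coefficients c0..c5 of d(t) = sum(c_k * t**k).

    The polynomial matches position, velocity and acceleration
    (d0, v0, a0) at t = 0 and (d1, v1, a1) at t = T.
    """
    c0, c1, c2 = d0, v0, a0 / 2.0
    c3 = (20 * (d1 - d0) - (8 * v1 + 12 * v0) * T
          - (3 * a0 - a1) * T**2) / (2 * T**3)
    c4 = (30 * (d0 - d1) + (14 * v1 + 16 * v0) * T
          + (3 * a0 - 2 * a1) * T**2) / (2 * T**4)
    c5 = (12 * (d1 - d0) - 6 * (v1 + v0) * T
          - (a0 - a1) * T**2) / (2 * T**5)
    return [c0, c1, c2, c3, c4, c5]


def evaluate(coeffs, t):
    """Evaluate the polynomial at time t."""
    return sum(c * t**k for k, c in enumerate(coeffs))


def derivative(coeffs):
    """Coefficients of the first derivative of the polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


# Example: start on the lane center at rest laterally, end 3.5 m to one side
# after 4 s with zero lateral velocity and acceleration.
coeffs = quintic_coefficients(0.0, 0.0, 0.0, -3.5, 0.0, 0.0, 4.0)
lateral_offsets = [evaluate(coeffs, 0.5 * i) for i in range(9)]
```

A planner of this family would generate many such candidates for different target offsets and durations, discard those that leave the drivable area, and pick the cheapest remaining one. That selection step is omitted here.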

4 Conclusion

This paper proposes an LLM-enhanced trajectory prediction method. We finetune the large language model so that it has better prior knowledge of traffic scenes in general situations and can therefore infer the driving intention of vehicles in the scene. In addition, this paper designs an adaptive trajectory modification method that uses the driving intention to modify the predicted trajectory, which improves accuracy. Finally, in combination with the boundary-based drivable area model, the driving intention is also taken into account when determining the drivable area, which leads to safer decision-making.