Advancing Autonomous Driving Safety Through LLM Enhanced Trajectory Prediction

Cheng, Qian; Jiao, Xinyu; Yang, Mengmeng; Yang, Mingliang; Jiang, Kun; Yang, Diange

doi:10.1007/978-3-031-70392-8_71

Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Included in the following conference series:

Advanced Vehicle Control Symposium

Abstract

In recent years, autonomous driving technology has made remarkable progress. To improve the safety of autonomous driving comprehensively, accurate predictions for all traffic agents are crucial. Graph neural networks are widely employed for trajectory prediction. To improve prediction accuracy, this paper uses a finetuned vision-language large model to extract driving intentions. With a well-designed prompt and supervision from a task-specific dataset, the LLM (large language model) can analyze the current traffic condition and give the corresponding driving intention. This paper also combines the result of the LLM with the output of the traditional prediction model: the future trajectory is modified according to the driving intention, which improves the final prediction accuracy. Finally, in the decision-making part, both the driving intention from the LLM and the trajectory from the traditional prediction model are considered in the boundary-based drivable area, and a safe planning path is then generated. Validation on a public motion forecasting dataset shows that this method improves the prediction accuracy (the average distance error drops by 31.9%) and the safety of route planning.

1 Introduction

To ensure the safety of autonomous driving, predicting the future trajectory of traffic participants is essential. However, trajectory prediction is a complex task because traffic situations are complicated, especially during rush hours, and because different traffic participants interact with each other [1]. In the traditional prediction model, environmental information such as the lanes and the positions of other vehicles is detected by the perception system and then input into a neural network. Current methods primarily use graph neural networks to extract key interaction information [2, 3]. Waymo’s VectorNet model, for instance, encodes the environmental information as vectors and tensors, enhancing effectiveness and accuracy [4].

Recently, large language models (LLMs) have been applied effectively in various fields. For trajectory prediction, the prior knowledge and understanding ability of LLMs are used to extract scene information and interaction information. Based on this, models like Traj-LLM [5] and the transportation-context map [6] have enhanced motion prediction performance. Unlike existing methods, which enhance prediction performance by feeding enriched scene information into prediction networks, this paper utilizes the high-level information generated by the LLM to adaptively modify the output trajectory of the prediction network. Whereas a prediction network offers limited interpretability, our method handles the output of the LLM explicitly, so the algorithm can be improved according to the modification results. In addition, we use a finetuned image-text large language model, trained on a specialized dataset, which greatly enhances the LLM's understanding of the specific traffic scene. The contributions of this paper are summarized as follows:

1) Innovatively utilizing a fine-tuned large language model to adaptively modify predicted trajectories with inferred driving intentions, enhancing prediction accuracy.
2) Considering both the driving intentions and the predicted trajectories in the boundary-based drivable area model, and using the safety margin to maximize safety (Fig. 1).

2 Method

Firstly, the dataset is processed and input into a prediction network to generate the corresponding trajectory outputs. Simultaneously, the dataset is visualized in a specific form. Through a reasonable prompt design and supervised learning on a custom dataset, namely the finetuning process, the large language model's ability to recognize and understand the visualized dataset is significantly enhanced. The large language model then outputs an assessment of the driving intentions based on this recognition.

Secondly, the driving intention assessment is then utilized to modify the trajectory output by the prediction network, thereby improving prediction accuracy.

Finally, the modified trajectories are input into a boundary-based drivable area model, which comprehensively considers both driving intentions and predicted trajectories to achieve safe decision-making.

2.1 The Finetune of the LLM

Firstly, the public dataset needs to be visualized in a fixed form as a picture. The picture should contain all the related traffic elements: the current position $P_{cur}$, the current velocity $V_{cur}$, the current yaw $\theta$, the trajectory of the past 2 s $\zeta_{2 seconds}$, and the centerline of the road $M_{lane}$. All the elements are drawn as specific geometric shapes.

$$ \Theta_{per\;frame} = \left( X\left\{ P_{cur}, V_{cur}, \theta, \zeta_{2\;seconds} \right\}, M_{lane} \right) $$

(1)

The prompt should first explain the traffic elements above and then describe the driving intention judgement task. The custom dataset is annotated according to the ground truth of the public dataset, and each label is stored in the corresponding JSON file.

Following actual driving behavior, the driving intention is divided into a lateral intention and a longitudinal intention. According to the amplitude, the lateral intention is one of left turn, left lane change, right turn, right lane change, going straight, or U-turn, and the longitudinal intention is one of acceleration, deceleration, braking, or uniform speed.
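The intention categories listed above can be written down as two small enumerations, together with a JSON label record. This is only a sketch: the class names and the JSON field names (`frame_id`, `lateral`, `longitudinal`) are hypothetical and are not taken from the paper's dataset.

```python
import json
from dataclasses import dataclass
from enum import Enum


class LateralIntention(str, Enum):
    """Lateral intention classes named in Sect. 2.1."""
    LEFT_TURN = "left turn"
    LEFT_LANE_CHANGE = "left lane change"
    RIGHT_TURN = "right turn"
    RIGHT_LANE_CHANGE = "right lane change"
    GOING_STRAIGHT = "going straight"
    U_TURN = "U-turn"


class LongitudinalIntention(str, Enum):
    """Longitudinal intention classes named in Sect. 2.1."""
    ACCELERATION = "acceleration"
    DECELERATION = "deceleration"
    BRAKING = "braking"
    UNIFORM_SPEED = "uniform speed"


@dataclass(frozen=True)
class IntentionLabel:
    """One annotated frame; the field names are assumptions for illustration."""
    frame_id: str
    lateral: LateralIntention
    longitudinal: LongitudinalIntention

    def to_json(self) -> str:
        # Store the human-readable class names, not the Python member names.
        return json.dumps({
            "frame_id": self.frame_id,
            "lateral": self.lateral.value,
            "longitudinal": self.longitudinal.value,
        })

    @staticmethod
    def from_json(text: str) -> "IntentionLabel":
        raw = json.loads(text)
        # Enum lookup by value raises ValueError for an unknown class name.
        return IntentionLabel(
            raw["frame_id"],
            LateralIntention(raw["lateral"]),
            LongitudinalIntention(raw["longitudinal"]),
        )
```

Parsing through the enumerations rejects any intention string outside the ten classes, which is one simple way to validate free-form LLM output before it is used downstream.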

2.2 Merge of the Driving Intention and the Predicted Trajectory

For merging the driving intention and the predicted trajectory, the prior information consists of the accuracy rate of the driving intention and the accuracy rate of the predicted trajectory, both obtained by statistics on the validation dataset. In addition, in most cases the vehicle follows the centerline of the road, which is therefore regarded as an important reference.

The final modified trajectory depends on the predicted trajectory $\zeta_{predicted}$, the driving intention $I$, the accuracy rate of the LLM $\tau_{LLM}$, the accuracy rate of the predicted trajectory $\tau_{NN}$, and the neighbor centerline of the road $M_{lane}$, as Eq. (2) shows:

$$ \zeta_{modified} = f\left( \zeta_{predicted}, I, \tau_{LLM}, \tau_{NN}, M_{lane} \right) $$

(2)

The detailed modification process is divided into four cases:

Case 1: When the driving intention is more consistent with the neighbor centerline of the road, the modified trajectory should follow the direction of the centerline.

Case 2: When the predicted trajectory is more consistent with the neighbor centerline of the road, the modified trajectory should follow the direction between the centerline and the predicted trajectory.

Case 3: When neither the driving intention nor the predicted trajectory is consistent with the neighbor centerline, the modified trajectory should follow the weighted average of the driving intention and the predicted trajectory, where the weights are related to the prior information. Eq. (3) shows the modification process for Case 3.

$$ \zeta_{modified,3} = \frac{\tau_{NN} \cdot \zeta_{predicted} + \tau_{LLM} \cdot I}{\tau_{LLM} + \tau_{NN}} $$

(3)

Case 4: When the driving intention is consistent with the predicted trajectory, the predicted trajectory will remain unchanged.
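The four cases can be sketched as a single dispatch function. Several choices here are assumptions, since the text does not fix them: the intention $I$ is represented as a reference path with the same shape as the trajectory (so that Eq. (3) is well defined), "consistent" is taken to mean a mean point-wise gap below a hypothetical tolerance `tol`, Case 2 is implemented as the midpoint between the centerline and the prediction, and Case 4 is checked first.

```python
import numpy as np


def mean_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean Euclidean distance between two (T, 2) paths sampled at the same steps."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))


def weighted_fusion(pred, intent_path, tau_nn, tau_llm):
    """Eq. (3): accuracy-weighted average of prediction and intention path."""
    return (tau_nn * pred + tau_llm * intent_path) / (tau_llm + tau_nn)


def modify_trajectory(pred, intent_path, centerline, tau_nn, tau_llm, tol=1.0):
    """Case-based modification of a predicted trajectory (illustrative sketch).

    pred, intent_path, centerline: arrays of shape (T, 2).
    tau_nn, tau_llm: accuracy rates from the validation statistics.
    tol: assumed consistency threshold in the same units as the paths.
    """
    # Case 4: intention and prediction agree, keep the prediction unchanged.
    if mean_gap(pred, intent_path) <= tol:
        return pred.copy()

    gap_intent = mean_gap(intent_path, centerline)
    gap_pred = mean_gap(pred, centerline)

    # Case 3: neither is consistent with the centerline, apply Eq. (3).
    if gap_intent > tol and gap_pred > tol:
        return weighted_fusion(pred, intent_path, tau_nn, tau_llm)

    # Case 1: the intention is the more consistent one, follow the centerline.
    if gap_intent <= gap_pred:
        return centerline.copy()

    # Case 2: the prediction is the more consistent one, go between
    # the centerline and the prediction (midpoint is an assumption).
    return 0.5 * (pred + centerline)
```

With `tau_nn = 1` and `tau_llm = 3`, a prediction 4 units to one side of the centerline and an intention path 4 units to the other side fall into Case 3, and Eq. (3) places the modified trajectory 2 units on the intention side, reflecting the higher weight of the LLM.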

2.3 Decision-Making Based on Boundary-Based Drivable Area Model

The boundary-based drivable area model is an environmental model that considers the safety margin. From the modified trajectory, the future drivable area boundary is drawn, and safe decisions are generated according to these future boundaries.

The boundary of the drivable area is shown in Fig. 2. The bounding boxes represent the perception result and the map limit represents the information of the HD map. After point merging, the final unified state-extended environment boundary is generated, and the different colors of the boundary represent different attributes [7].

Combining the driving intention with the modified predicted trajectory ensures the safety of decision-making to the greatest extent.

3 Experiment

3.1 Finetune on the Vision-Text LLM

The public motion forecasting dataset Argoverse is used in this paper, and the LLM MiniGPT-4 is selected for the driving intention judgement.

Because the official finetuning process uses only 3000 pictures for the whole scene, we use 500 annotated Argoverse frames for the single scene. The hardware and training parameters are listed in Table 1.

Table 1. Configuration of the finetune.

3.2 Modification of the Predicted Trajectory

The typical prediction model VectorNet is selected as the baseline for the trajectory output, and the network is trained for 25 epochs on the Argoverse dataset. A typical modified trajectory is shown in Fig. 3.

The red rectangle represents the target vehicle and the gray line represents the centerline of the road. The trajectory predicted by VectorNet is in yellow, the modified trajectory is in green, and the ground truth is in red. The picture shows that the network predicts the wrong direction (straight) and that, after modification with the driving intention, the green line is much closer to the ground truth, which indicates that the modification is effective.

With the modification method of Sect. 2.2, which fuses in the driving intention from the LLM, the overall average prediction distance error decreases by 31.9%, from 4.67 to 3.18.
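The reported reduction can be checked with a few lines of arithmetic. The sketch below also shows one common way to compute an average distance error, as the mean Euclidean distance between predicted and ground-truth points; that this is exactly the metric used in the experiment is an assumption, and the function names are illustrative.

```python
import numpy as np


def average_distance_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth points.

    pred, gt: arrays of shape (..., T, 2) with matching shapes.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error value, in percent."""
    return (before - after) / before * 100.0


# Values reported in Sect. 3.2: baseline error 4.67, modified error 3.18.
reduction = relative_reduction(4.67, 3.18)
print(f"{reduction:.1f}%")  # prints "31.9%"
```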

3.3 Safe Decision-Making Based on Boundary-Based Drivable Area Model

The driving intention and the modified trajectory are considered in the boundary-based drivable area model, and the corresponding drivable space is generated. The planner for this part is the Werling planner, which works in the Frenet frame. Fig. 4 illustrates the safety and reliability of this method:

Subpicture 1 represents the real scene of the dataset. The green car is the ego vehicle and chose to turn right in this scene, while the red car is the predicted vehicle and actually went straight. Subpicture 2 then shows the drivable space at the current moment. Subpicture 3 shows the drivable space at future moments and gives the planned trajectory of the ego vehicle going straight according to the prediction information of VectorNet. Subpicture 4 comprehensively considers the driving intention output by the LLM and the predicted trajectory to draw the drivable space and the planned trajectory of the ego vehicle at future moments, turning right.

In subpicture 3, the VectorNet model wrongly predicts that the target vehicle (annotated with the red circle) will turn left, whereas subpicture 4 takes the LLM information into account to compensate the drivable boundary, which reduces the drivable area. In terms of decision-making, the ego vehicle in subpicture 4 avoids the target vehicle by an earlier right turn, thereby avoiding the risk of driving straight into the target vehicle. This demonstrates that the method is safe and reliable.
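For readers unfamiliar with the Werling planner named in this section: as commonly implemented, it generates lateral motion candidates in the Frenet frame as quintic polynomials of the lateral offset $d(t)$ that match position, velocity, and acceleration at both ends. The sketch below builds one such candidate. It is a generic illustration, not the configuration used in the experiment, and the function names and numeric values are assumptions.

```python
import numpy as np


def quintic_coefficients(d0, v0, a0, d1, v1, a1, T):
    """Coefficients c0..c5 of d(t) = sum(c_i * t**i) meeting both boundary states.

    (d0, v0, a0): lateral offset, velocity, acceleration at t = 0.
    (d1, v1, a1): the same quantities at t = T.
    """
    # The first three coefficients follow directly from the start state.
    c0, c1, c2 = d0, v0, 0.5 * a0
    # The remaining three solve a 3x3 linear system from the end state.
    A = np.array([
        [T**3, T**4, T**5],
        [3 * T**2, 4 * T**3, 5 * T**4],
        [6 * T, 12 * T**2, 20 * T**3],
    ])
    b = np.array([
        d1 - c0 - c1 * T - c2 * T**2,
        v1 - c1 - 2 * c2 * T,
        a1 - 2 * c2,
    ])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])


def evaluate(coeffs, t):
    """Evaluate the polynomial at time t (scalar or array)."""
    return sum(c * t**i for i, c in enumerate(coeffs))


# Example: move 3.5 units to the right (negative offset) within 4 s,
# starting and ending with zero lateral velocity and acceleration.
coeffs = quintic_coefficients(0.0, 0.0, 0.0, -3.5, 0.0, 0.0, 4.0)
lateral_profile = evaluate(coeffs, np.linspace(0.0, 4.0, 9))
```

A full planner samples many end states and durations, scores each candidate, and discards those that leave the drivable area; only the candidate generation step is shown here.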

4 Conclusion

This paper proposes an LLM-enhanced trajectory prediction method. We finetune the large language model so that it has better prior knowledge of traffic scenes in general situations and can therefore give the driving intention of vehicles in the scene. At the same time, this paper designs an adaptive trajectory modification method, which uses the driving intention to modify the predicted trajectory and thereby improves the accuracy. Finally, combined with the boundary-based drivable area model, this paper also adds the driving intention into the judgement of the drivable area, which leads to safer decision-making.

References

1. Mozaffari, S., Al-Jarrah, O.Y., Dianati, M., Jennings, P., Mouzakitis, A.: Deep learning-based vehicle behaviour prediction for autonomous driving applications: a review. IEEE Trans. Intell. Transport. Syst. 23(1), 33–47 (2020)
2. Brito, B., Agarwal, A., Alonso-Mora, J.: Learning interaction-aware guidance policies for motion planning in dense traffic scenarios. arXiv (2021)
3. Li, X., Ying, X., Chuah, M.C.: GRIP: graph-based interaction-aware trajectory prediction. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 3960–3966. IEEE, Auckland, New Zealand (2019)
4. Gao, J., et al.: VectorNet: encoding HD maps and agent dynamics from vectorized representation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11522–11530. IEEE (2020)
5. Lan, Z., et al.: Traj-LLM: a new exploration for empowering trajectory prediction with pre-trained large language models. arXiv (2024)
6. Zheng, X., et al.: Large language models powered context-aware motion prediction. arXiv (2024). Accessed: 29 May 2024
7. Jiao, X., et al.: Reliable autonomous driving environment model with unified state-extended boundary. IEEE Trans. Intell. Transport. Syst. 24(1), 516–527 (2022)

Acknowledgements

This work was supported in part by National Natural Science Foundation of China (U22A20104, 52102464), Beijing Natural Science Foundation (L231008), and Young Elite Scientist Sponsorship Program By BAST (BYESS2022153) and Shuimu Tsinghua Scholar Program.

Author information

Authors and Affiliations

School of Vehicle and Mobility, Tsinghua University, Beijing, China
Qian Cheng, Xinyu Jiao, Mengmeng Yang, Mingliang Yang, Kun Jiang & Diange Yang

Corresponding authors

Correspondence to Xinyu Jiao, Kun Jiang or Diange Yang.

Editor information

Editors and Affiliations

Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Giampiero Mastinu
Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Francesco Braghin
Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Federico Cheli
Department of Electronics, Information Technology and Bioengineering, Politecnico di Milano, Milano, Italy
Matteo Corno
Department of Electronics, Information Technology and Bioengineering, Politecnico di Milano, Milano, Italy
Sergio M. Savaresi

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Cheng, Q., Jiao, X., Yang, M., Yang, M., Jiang, K., Yang, D. (2024). Advancing Autonomous Driving Safety Through LLM Enhanced Trajectory Prediction. In: Mastinu, G., Braghin, F., Cheli, F., Corno, M., Savaresi, S.M. (eds) 16th International Symposium on Advanced Vehicle Control. AVEC 2024. Lecture Notes in Mechanical Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-70392-8_71

DOI: https://doi.org/10.1007/978-3-031-70392-8_71
Published: 04 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70391-1
Online ISBN: 978-3-031-70392-8
eBook Packages: Engineering, Engineering (R0)
