STS-GAN: Spatial-Temporal Attention Guided Social GAN for Vehicle Trajectory Prediction

Chen, Yanbo; Yu, Huilong; Xi, Junqiang

doi:10.1007/978-3-031-70392-8_24


Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Included in the following conference series:

Advanced Vehicle Control Symposium

Abstract

Accurately predicting the trajectories of other vehicles is crucial for autonomous driving to ensure driving safety and efficiency. Recently, deep learning techniques have been extensively employed for trajectory prediction, resulting in significant advancements in predictive accuracy. However, existing studies often fail to explicitly distinguish the impact of historical inputs at different time steps and the influence of surrounding vehicles at distinct locations. Moreover, deep learning-based approaches generally lack interpretability. To overcome these issues, we propose the Spatial-Temporal Attention Guided Social GAN (STS-GAN). In the generator, we propose a spatial-temporal attention mechanism to guide the utilization of trajectory features and the interaction of the target vehicle with its surrounding vehicles. The spatial attention mechanism evaluates the importance of surrounding vehicles for predictions of the target vehicle, while the temporal attention mechanism learns the significance of historical trajectory information at different historical time steps, thereby enhancing model interpretability. A convolutional social pooling module is employed to capture interaction features from surrounding vehicles, which are subsequently fused with the attributes of the target vehicle. Experimental results demonstrate that our model achieves competitive performance compared with state-of-the-art methods on publicly available datasets.


1 Introduction

Recently, vehicle trajectory prediction has garnered significant attention due to its critical applications in autonomous driving [1, 2]. However, predicting the trajectories of social vehicles is not trivial due to the inherent uncertainty and variability in the motion patterns of objects [3].

Benefiting from the power of deep learning, pioneering work in vehicle trajectory prediction has addressed some of the above challenges. Variational Autoencoders (VAE) [4], Generative Adversarial Networks (GAN) [5], and Graph Neural Networks (GNN) [6] have been utilized to learn trajectory representations and generate multiple possible trajectory samples, effectively capturing multimodal features. These techniques model the complex relationships between vehicles and capture social interactions, leading to more accurate trajectory predictions.

Despite significant progress, existing vehicle trajectory prediction methods struggle with interpretability, especially concerning long-term historical data and nearby vehicle information. Questions about which parts of historical trajectories or nearby vehicle positions influence future motion and how to quantify this influence remain unanswered. To address this, we introduce a spatial-temporal attention mechanism in our STS-GAN model. This approach matches the prediction accuracy of state-of-the-art techniques and enhances interpretability by highlighting the influence of historical trajectories and nearby vehicles through attention weights. Our main contributions are:

1) Proposing a spatial-temporal attention-guided social GAN model for vehicle trajectory prediction;
2) Developing a temporal attention mechanism to identify the importance of historical trajectories at different times for predicting future behavior;
3) Designing a spatial attention mechanism to quantify the influence of nearby vehicles on the trajectory prediction of the target vehicle.

2 Methods

The overall network architecture is shown in Fig. 1. To capture the importance of different vehicle locations for prediction, we follow [7] and define a $3 \times 13$ spatial grid around the predicted vehicle (Fig. 1).

LSTM Encoder. We first use a single-layer fully connected (FC) network to embed the position of each vehicle $x^i_t$, obtaining the vector $e^{e,i}_{t}$. Then, the LSTM encoder processes these embedding vectors for each vehicle i over time steps $t = 1, ..., h$.

Temporal Attention. The hidden states of vehicle v in the LSTM encoder are denoted as $H_t^{e,v}=\{h^{e,v}_{t-h},...,h_j^{e,v},...,h_t^{e,v}\}$. Subsequently, the temporal attention weights are computed as follows:

$$\begin{aligned} A_t^v=softmax(tanh(W_\alpha H_t^{e,v})). \end{aligned}$$

(1)

Next, the hidden states $H_t^{e,v}$ are combined with the temporal attention weights $A_t^v$, whose entries are $\alpha_j^v$, through a weighted sum:

$$\begin{aligned} \mathcal {H}_t^v=H_t^{e,v}(A_t^v)^{\top }=\sum _{j=t-h}^{t} {\alpha _j^v h_j^{e,v}}. \end{aligned}$$

(2)
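As an illustration of Eqs. (1) and (2), the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes that $W_\alpha$ has shape $1 \times d$, so that each historical step receives a single scalar score; the source does not state this shape. The function name `temporal_attention` and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, w_alpha):
    """Weight encoder hidden states over time.

    hidden_states: list of d-dimensional vectors, oldest step first.
    w_alpha: d-dimensional weight vector (assumed 1 x d shape of W_alpha).
    Returns the attention weights and the weighted sum of hidden states.
    """
    # Eq. (1): one tanh-squashed score per time step, then softmax over time.
    scores = [
        math.tanh(sum(w * h for w, h in zip(w_alpha, h_j)))
        for h_j in hidden_states
    ]
    alphas = softmax(scores)
    # Eq. (2): attention-weighted sum of the hidden states.
    d = len(hidden_states[0])
    context = [
        sum(a * h_j[k] for a, h_j in zip(alphas, hidden_states))
        for k in range(d)
    ]
    return alphas, context


# Toy example: three historical steps with d = 2 (values are made up).
hidden_states = [[0.1, -0.2], [0.4, 0.3], [0.9, 0.5]]
w_alpha = [1.0, 0.5]
alphas, context = temporal_attention(hidden_states, w_alpha)
```

Because the weights sum to one, the result is a convex combination of the hidden states, and each weight can be read directly as the relative importance of the corresponding time step.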

Spatial Attention. The grid cells are denoted as $G_t = \{G_t^1, ..., G_t^N\}$, where N is the total number of grid cells. Each cell $G_t^n$ is defined as follows:

$$\begin{aligned} G_t^n = \left\{ \begin{aligned} \mathcal {H}_t^v, \quad & \textrm{if} \; \textrm{any} \; \textrm{vehicle} \; v \; \textrm{locates} \; \textrm{at} \; \textrm{grid} \; \textrm{cell} \; n\\ \textbf{0} \in \mathbb {R}^{d \times 1}, \quad & \textrm{otherwise} \end{aligned} \right. \end{aligned}$$

(3)

The spatial attention weights for all vehicles at time step t, denoted as $B_t=\{\beta _t^1,...,\beta _t^n,...,\beta _t^N\}$, are calculated as follows:

$$\begin{aligned} B_t=softmax(tanh(W_\beta G_t)), \end{aligned}$$

(4)

where $W_\beta $ is a learnable weight matrix. Finally, we combine the historical information of the target vehicle and its surrounding vehicles as follows:

$$\begin{aligned} {V}_t= G_t (B_t)^{\top }= \sum _{n=1}^{N} {\beta ^n_t G_t^n}. \end{aligned}$$

(5)
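Eqs. (3) to (5) can be sketched in the same way. This is a minimal illustration, not the authors' code: it assumes a $1 \times d$ weight vector for $W_\beta$, a row-major, zero-based indexing of the $3 \times 13$ grid, and no bias term. These choices, the helper names `build_grid` and `spatial_attention`, and the toy feature values are all hypothetical.

```python
import math

ROWS, COLS = 3, 13          # grid size from the text
N_CELLS = ROWS * COLS       # N = 39


def build_grid(occupied, d):
    """Eq. (3): occupied cells hold a feature vector, empty cells hold zeros.

    occupied: dict mapping a zero-based cell index to a d-dimensional vector.
    """
    return [list(occupied.get(n, [0.0] * d)) for n in range(N_CELLS)]


def spatial_attention(grid, w_beta):
    """Eqs. (4) and (5): softmax(tanh(W_beta G_t)) and the weighted sum V_t."""
    scores = [
        math.tanh(sum(w * g for w, g in zip(w_beta, cell))) for cell in grid
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    d = len(w_beta)
    v = [sum(b * cell[k] for b, cell in zip(betas, grid)) for k in range(d)]
    return betas, v


# Toy example with d = 2: the centre cell (index 19 under the assumed
# row-major layout) and two neighbouring cells are occupied.
occupied = {19: [0.8, 0.1], 20: [0.5, -0.3], 7: [0.2, 0.6]}
grid = build_grid(occupied, d=2)
betas, v_t = spatial_attention(grid, w_beta=[1.0, 1.0])
```

Under these assumptions an empty cell has score $\tanh(0) = 0$, so it still receives a nonzero softmax weight, but it contributes nothing to $V_t$ because its feature vector is zero.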

LSTM Decoder. After concatenating the nearby vehicles’ spatial-temporal feature vectors and their social context vectors, we use an LSTM layer followed by an FC layer to predict the future trajectory.

Discriminator. The discriminator scores the predicted and actual trajectories as follows:

$$\begin{aligned} h^{D,i}_{t+1}=LSTM(h^{D,i}_{t},x^{D,i}_{t};W_{D,encoder}), \end{aligned}$$

(6)

$$\begin{aligned} s^{D,i}_{t+1}=Sigmoid(FC(\boldsymbol{h}^{D,i}_{t+1};W_{D})). \end{aligned}$$

(7)

3 Datasets and Experiments Setup

STS-GAN is trained and evaluated on the Next Generation Simulation (NGSIM) [8] US-101 and I-80 datasets. Each dataset contains 45 min of vehicle trajectories, giving six 15-min segments in total. These segments are further divided into training, validation, and test sets in a 0.7 : 0.1 : 0.2 ratio, resulting in 5,922,867 training entries, 859,769 validation entries, and 1,505,756 test entries.
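As a quick arithmetic check, the realized share of each subset can be computed from the entry counts reported above. The snippet below is only a sanity check on those numbers; the variable names are illustrative.

```python
# Entry counts as reported in the text.
counts = {"train": 5_922_867, "validation": 859_769, "test": 1_505_756}

total = sum(counts.values())
fractions = {name: count / total for name, count in counts.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```

The resulting shares are close to, but not exactly, the nominal 0.7 : 0.1 : 0.2 ratio at the level of individual entries.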

The Average Displacement Error (ADE) and the Final Displacement Error (FDE) are employed as the performance metrics to evaluate the prediction accuracy, defined as:

$$\begin{aligned} \begin{aligned} \text {ADE} & = \frac{\sum _{i=1}^{n} \sum _{T=t+1}^{t+p} ||x_T^i-\hat{x}_T^i||}{np}, \\ \text {FDE} & = \frac{\sum _{i=1}^{n} ||x_{t+p}^i-\hat{x}_{t+p}^i|| }{n}, \end{aligned} \end{aligned}$$

(8)

where n is the number of predicted samples, p is the number of predicted time steps, and $\hat{x}^i$ and $x^i$ are the predicted and true trajectories of the i-th sample, respectively. The batch size is set to 128, the optimiser is Adam with a learning rate of 0.001, and the model is trained for 10 epochs.
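The two metrics can be computed as in the following minimal sketch. It assumes two-dimensional positions, trajectories of equal length, and that the final displacement is measured at the last predicted step. The function name `ade_fde` and the toy trajectories are illustrative only.

```python
import math


def ade_fde(true_trajs, pred_trajs):
    """Average and final displacement errors.

    true_trajs, pred_trajs: lists of n trajectories, each a list of
    p (x, y) points covering the prediction horizon.
    """
    n = len(true_trajs)
    p = len(true_trajs[0])
    sum_all = 0.0    # summed displacement over all samples and steps
    sum_final = 0.0  # summed displacement at the last predicted step
    for truth, pred in zip(true_trajs, pred_trajs):
        dists = [
            math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(truth, pred)
        ]
        sum_all += sum(dists)
        sum_final += dists[-1]
    return sum_all / (n * p), sum_final / n


# Toy example: n = 2 samples, p = 3 predicted steps.
true_trajs = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 1), (0, 2)]]
pred_trajs = [[(0, 0), (1, 0), (2, 3)], [(0, 4), (0, 1), (0, 2)]]
ade, fde = ade_fde(true_trajs, pred_trajs)
```

In this example the per-step displacements are (0, 0, 3) and (4, 0, 0), so the ADE is 7/6 and the FDE is 1.5.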

Table 1. Performance Metrics (ADE/FDE) Comparison with Other Methods


To verify the effectiveness of STS-GAN in vehicle trajectory prediction, we compare it with several state-of-the-art methods. Additionally, to validate the effectiveness of the network structure and the proposed spatial-temporal attention mechanism, we design ablation experiments. Specifically, we evaluate: 1) CS-LSTM [7], an LSTM encoder-decoder model using a convolutional social pooling layer; 2) STA-LSTM [9], a trajectory prediction model that incorporates spatial-temporal attention mechanisms in LSTM networks; 3) ST-GAN, a GAN-based network with spatial-temporal attention mechanisms, but without convolutional social pooling; 4) SS-GAN, a GAN-based network that incorporates convolutional social pooling and spatial attention, but without temporal attention; and 5) TS-GAN, a GAN-based network that incorporates convolutional social pooling and temporal attention, but without spatial attention.

4 Results and Analysis

Table 1 compares the ADE/FDE of different models over prediction horizons from 1 to 5 s. STS-GAN outperforms the other models across both short-term and long-term predictions. Specifically, CS-LSTM performs the worst due to the absence of attention mechanisms, resulting in higher errors. STA-LSTM, despite incorporating spatial-temporal attention mechanisms, lacks social pooling and generative adversarial mechanisms, leading to lower predictive accuracy. ST-GAN, the ablation variant without social pooling, exhibits decreased accuracy compared to STS-GAN, emphasizing the importance of considering social interactions. SS-GAN, the ablation variant without temporal attention, shows slightly lower accuracy than STS-GAN, suggesting limited improvement from the temporal attention mechanism. TS-GAN, the ablation variant without spatial attention, also shows slightly lower accuracy than STS-GAN but still outperforms CS-LSTM, which has no attention mechanism.

We calculate the average temporal attention weights for the last 15 time steps (from $t - 14$ to t) within each interval. Figure 2 displays these weights only from time $t-5$ to t, because the weights before $t-5$ are small. The results reveal that the weight is highest at the current time step t, indicating that the future trajectory of the target vehicle is primarily influenced by its recent trajectory and those of nearby vehicles. This finding aligns with human cognition.

We further analyze the spatial attention mechanism and observe that, within the grid, the spatial attention weight is highest for the cell of the predicted vehicle itself. Combined with the earlier analysis of temporal attention, this suggests that the future trajectory of the predicted vehicle is largely influenced by its own driving state. To illustrate the distribution of attention weights of nearby vehicles, we select two typical scenarios. We then normalize and plot the remaining attention weights on a $3 \times 13$ grid, excluding those of the predicted vehicle. Figure 3(a) depicts a common driving scenario where the predicted vehicle primarily focuses on the vehicles ahead in the same lane, with relatively high weights (e.g., $28.4\%$, $16.3\%$, $17.9\%$), while the weights in other grid cells are relatively low. Notably, the weight of the grid cell directly in front of the predicted vehicle is low, possibly because the typically large following distance kept for driving safety leaves that cell often unoccupied. Figure 3(b) illustrates the spatial weight distribution in a left lane-changing scenario. Unlike the common driving scenario, the predicted vehicle does not focus as much on the vehicle ahead in the same lane but instead pays more attention to vehicles in the target lane, both in front and behind. This observation aligns with human driving experience, where drivers assess lane change opportunities by observing the behavior of vehicles in the target lane.

5 Conclusions

This paper presents STS-GAN, a spatial-temporal attention guided social GAN model for vehicle trajectory prediction. The temporal attention mechanism highlights significant time points in historical trajectories, while the spatial attention mechanism measures the influence of nearby vehicles. Key findings from ablation experiments and comparisons with state-of-the-art models include: 1) STS-GAN achieves state-of-the-art prediction accuracy, 2) recent historical trajectory segments are sufficient for accurate predictions, and 3) although the accuracy of STS-GAN is similar to that of ST-GAN and SS-GAN, it offers better interpretability through its spatial-temporal attention weights.

References

1. Carvalho, A., Gao, Y., Lefevre, S., Borrelli, F.: Stochastic predictive control of autonomous vehicles in uncertain environments. In: 12th International Symposium on Advanced Vehicle Control, vol. 9 (2014)
2. Kapania, N.R., Gerdes, J.C.: An autonomous lanekeeping system for vehicle path tracking and stability at the limits of handling. In: Proceedings of the 12th International Symposium on Advanced Vehicle Control (AVEC), pp. 720–725 (2014)
3. Benrachou, D.E., Glaser, S., Elhenawy, M., Rakotonirainy, A.: Use of social interaction and intention to improve motion prediction within automated vehicle framework: a review. IEEE Trans. Intell. Transp. Syst. 23(12), 22807–22837 (2022)
4. Zhu, W., Lü, C., Chen, X.: A crash occurrence risk prediction model based on variational autoencoder and generative adversarial network. Transportmetrica B Transp. Dyn. 12(1), 2358211 (2024)
5. Guo, L., Ge, P., Shi, Z.: Multi-object trajectory prediction based on lane information and generative adversarial network. Sensors 24(4), 1280 (2024)
6. Yin, Y.H., Lü, X., Li, S.K., Yang, L.X., Gao, Z.Y.: Graph representation learning in the ITS: Car-following informed spatiotemporal network for vehicle trajectory predictions. In: IEEE Transactions on Intelligent Vehicles (2024)
7. Deo, N., Trivedi, M.M.: Convolutional social pooling for vehicle trajectory prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1468–1476 (2018)
8. Next Generation SIMulation (NGSIM): US Highway 101 Dataset (2007)
9. Lin, L., Li, W., Bi, H., Qin, L.: Vehicle trajectory prediction using LSTMs with spatial-temporal attention mechanisms. IEEE Intell. Transp. Syst. Mag. 14(2), 197–208 (2021)


Author information

Authors and Affiliations

School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
Yanbo Chen, Huilong Yu & Junqiang Xi


Corresponding author

Correspondence to Huilong Yu.

Editor information

Editors and Affiliations

Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Giampiero Mastinu
Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Francesco Braghin
Department of Mechanical Engineering, Politecnico di Milano, Milano, Italy
Federico Cheli
Department of Electronics, Information Technology and Bioengineering, Politecnico di Milano, Milano, Italy
Matteo Corno
Department of Electronics, Information Technology and Bioengineering, Politecnico di Milano, Milano, Italy
Sergio M. Savaresi

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this paper

Cite this paper

Chen, Y., Yu, H., Xi, J. (2024). STS-GAN: Spatial-Temporal Attention Guided Social GAN for Vehicle Trajectory Prediction. In: Mastinu, G., Braghin, F., Cheli, F., Corno, M., Savaresi, S.M. (eds) 16th International Symposium on Advanced Vehicle Control. AVEC 2024. Lecture Notes in Mechanical Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-70392-8_24


DOI: https://doi.org/10.1007/978-3-031-70392-8_24
Published: 04 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70391-1
Online ISBN: 978-3-031-70392-8
eBook Packages: Engineering, Engineering (R0)
