Action-Aware Encoder-Decoder Network for Pedestrian Trajectory Prediction

Fu, Jiawei; Zhao, Xu

doi:10.1007/s12204-023-2565-3

Original Paper
Published: 07 February 2023

Volume 28, pages 20–27 (2023)

Journal of Shanghai Jiaotong University (Science)

Jiawei Fu (傅家威)¹ &
Xu Zhao (赵旭)¹

Abstract

Accurate pedestrian trajectory prediction is critical for self-driving systems, because it underpins the response and decision-making of the ego vehicle. In this study, we focus on predicting the future trajectories of pedestrians from a first-person perspective. Most existing first-person-view trajectory prediction methods directly adopt approaches designed for the bird’s-eye view, neglecting the differences between the two views. To address this, we clarify these differences and highlight the importance of action-aware trajectory prediction in the first-person view. We propose a new action-aware network based on an encoder-decoder framework, with an action prediction branch and a goal estimation branch at the end of the encoder. In the decoder, bidirectional long short-term memory (Bi-LSTM) blocks generate the final prediction of pedestrians’ future trajectories. Evaluated on a public dataset, our method achieves performance competitive with other approaches. An ablation study demonstrates the effectiveness of the action prediction branch.
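The abstract names the building blocks but not their sizes, inputs, or training losses, so the following is only a minimal, untrained, pure-Python sketch of the data flow it describes, not the authors' implementation. Everything concrete here is an assumption for illustration: the class names (`ActionAwareSketch`, `LSTMCell`, `Linear`), 4-D bounding boxes as per-frame inputs, a single-layer LSTM encoder, a softmax over a hypothetical `n_actions` classes for the action branch, a goal branch that conditions on the encoder state concatenated with the action probabilities, and a decoder input sequence built as a straight line from the last observed box to the estimated goal, which a forward and a backward LSTM then process before their per-step states are concatenated into a residual correction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class Linear:
    """Dense layer y = Wx + b with small random (untrained) weights."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]


class LSTMCell:
    """Standard LSTM cell with input, forget, candidate and output gates."""

    def __init__(self, n_in, n_hid, rng):
        self.n = n_hid
        self.gates = Linear(n_in + n_hid, 4 * n_hid, rng)

    def __call__(self, x, state):
        h, c = state
        z = self.gates(x + h)  # list concatenation [x; h]
        n = self.n
        i = [sigmoid(v) for v in z[:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:]]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


class ActionAwareSketch:
    """Hypothetical encoder -> (action, goal) branches -> Bi-LSTM decoder."""

    def __init__(self, box_dim=4, hid=16, n_actions=3, horizon=5, seed=0):
        rng = random.Random(seed)
        self.hid, self.horizon = hid, horizon
        self.encoder = LSTMCell(box_dim, hid, rng)
        self.action_head = Linear(hid, n_actions, rng)
        self.goal_head = Linear(hid + n_actions, box_dim, rng)
        self.fwd = LSTMCell(box_dim, hid, rng)
        self.bwd = LSTMCell(box_dim, hid, rng)
        self.out = Linear(2 * hid, box_dim, rng)

    def __call__(self, observed):
        state = ([0.0] * self.hid, [0.0] * self.hid)
        for box in observed:  # encode the observed boxes frame by frame
            state = self.encoder(box, state)
        h_enc = state[0]
        # Action branch: class probabilities from the final encoder state.
        action_probs = softmax(self.action_head(h_enc))
        # Goal branch: endpoint conditioned on encoder state and action probs.
        goal = self.goal_head(h_enc + action_probs)
        # Assumed decoder input: straight line from last observed box to goal.
        last, T = observed[-1], self.horizon
        seq = [[p + (q - p) * (t + 1) / T for p, q in zip(last, goal)]
               for t in range(T)]
        fwd_h, s = [], state
        for x in seq:  # forward direction over the future steps
            s = self.fwd(x, s)
            fwd_h.append(s[0])
        bwd_h, s = [], state
        for x in reversed(seq):  # backward direction, starting at the goal
            s = self.bwd(x, s)
            bwd_h.append(s[0])
        bwd_h.reverse()  # realign backward states with forward time order
        # Each step: interpolated box plus a residual from [h_fwd; h_bwd].
        preds = [[xi + di for xi, di in zip(x, self.out(hf + hb))]
                 for x, hf, hb in zip(seq, fwd_h, bwd_h)]
        return action_probs, goal, preds


model = ActionAwareSketch(box_dim=4, hid=8, n_actions=3, horizon=5, seed=0)
observed = [[0.1 * t, 0.2, 0.3, 0.4] for t in range(8)]
probs, goal, preds = model(observed)
print(len(probs), len(goal), len(preds), len(preds[0]))  # 3 4 5 4
```

The sketch is dependency-free so that the shapes and the order of operations are easy to follow; a practical version would use a deep-learning framework, batch the computation, and train the action, goal, and trajectory outputs jointly, none of which is shown here.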


Author information

Authors and Affiliations

Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Jiawei Fu (傅家威) & Xu Zhao (赵旭)

Corresponding author

Correspondence to Jiawei Fu (傅家威).

About this article

Cite this article

Fu, J., Zhao, X. Action-Aware Encoder-Decoder Network for Pedestrian Trajectory Prediction. J. Shanghai Jiaotong Univ. (Sci.) 28, 20–27 (2023). https://doi.org/10.1007/s12204-023-2565-3

Received: 28 February 2022
Accepted: 19 April 2022
Published: 07 February 2023
Issue Date: February 2023
DOI: https://doi.org/10.1007/s12204-023-2565-3

CLC number: TP 391.4
Document code: A
