
Action-Aware Encoder-Decoder Network for Pedestrian Trajectory Prediction

Jiawei Fu, Xu Zhao

  • Original Paper
  • Journal of Shanghai Jiaotong University (Science)

Abstract

Accurate pedestrian trajectory prediction is critical in self-driving systems, as it underpins the response- and decision-making of the ego vehicle. In this study, we focus on predicting the future trajectories of pedestrians from a first-person perspective. Most existing first-person-view trajectory prediction methods directly adopt techniques designed for the bird's-eye view, neglecting the differences between the two. To this end, we clarify those differences and highlight the importance of action-aware trajectory prediction in the first-person view. We propose a new action-aware network based on an encoder-decoder framework, with an action-prediction branch and a goal-estimation branch at the end of the encoder. In the decoder, bidirectional long short-term memory (Bi-LSTM) blocks generate the final prediction of pedestrians' future trajectories. Evaluated on a public dataset, our method achieves competitive performance compared with other approaches, and an ablation study demonstrates the effectiveness of the action-prediction branch.

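As a concrete illustration of the pipeline described in the abstract — an encoder whose final state feeds an action-prediction branch and a goal-estimation branch, followed by a Bi-LSTM decoder that produces the trajectory — a minimal PyTorch sketch might look as follows. This is not the authors' implementation: the class name `ActionAwareNet`, all layer sizes, the number of action categories, the prediction horizon, and the way the branch outputs are fused into the decoder input are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class ActionAwareNet(nn.Module):
    """Hypothetical sketch of an action-aware encoder-decoder trajectory
    predictor. Dimensions and fusion scheme are illustrative assumptions."""

    def __init__(self, in_dim=4, hid=64, n_actions=3, horizon=15):
        super().__init__()
        self.horizon = horizon
        # Encoder: summarizes the observed trajectory/feature sequence.
        self.encoder = nn.LSTM(in_dim, hid, batch_first=True)
        # Two branches at the end of the encoder.
        self.action_head = nn.Linear(hid, n_actions)  # action-prediction branch
        self.goal_head = nn.Linear(hid, 2)            # goal-estimation branch (x, y)
        # Bi-LSTM decoder generates the final trajectory.
        self.decoder = nn.LSTM(hid + 2 + n_actions, hid,
                               batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid, 2)              # (x, y) per future step

    def forward(self, obs):                           # obs: (B, T_obs, in_dim)
        _, (h, _) = self.encoder(obs)
        h = h[-1]                                     # final hidden state, (B, hid)
        action_logits = self.action_head(h)           # (B, n_actions)
        goal = self.goal_head(h)                      # (B, 2)
        # Tile the encoder state, estimated goal, and action logits over the horizon.
        ctx = torch.cat([h, goal, action_logits], dim=-1)
        ctx = ctx.unsqueeze(1).repeat(1, self.horizon, 1)
        dec, _ = self.decoder(ctx)                    # (B, horizon, 2*hid)
        traj = self.out(dec)                          # (B, horizon, 2)
        return traj, action_logits, goal

net = ActionAwareNet()
traj, act, goal = net(torch.randn(2, 10, 4))          # batch of 2, 10 observed steps
print(traj.shape, act.shape, goal.shape)
```

Conditioning the decoder on both branch outputs is one plausible fusion choice; the paper itself may combine the goal and action information differently (e.g., via the decoder's initial state).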



Author information

Corresponding author: Jiawei Fu (傅家威).


About this article


Cite this article

Fu, J., Zhao, X. Action-Aware Encoder-Decoder Network for Pedestrian Trajectory Prediction. J. Shanghai Jiaotong Univ. (Sci.) 28, 20–27 (2023). https://doi.org/10.1007/s12204-023-2565-3

