
Optimizing pedestrian simulation based on expert trajectory guidance and deep reinforcement learning


Abstract

Most traditional pedestrian simulation methods suffer from short-sightedness: they choose the best action for the current moment without considering potential congestion in the future. To address this issue, we propose a hierarchical model that combines Deep Reinforcement Learning (DRL) with the Optimal Reciprocal Collision Avoidance (ORCA) algorithm to optimize the decision process of pedestrian simulation. For complex scenarios prone to local optima, we incorporate an expert-trajectory imitation degree into the reward function, aiming to improve pedestrians' exploration efficiency by designing simple expert trajectory guidance lines rather than constructing databases of expert examples or collecting prior datasets. The experimental results show that the proposed method exhibits strong stability and generalizability, evidenced by its ability to adjust the behavioral strategy earlier in anticipation of upcoming congestion. The overall simulation time for each scenario is reduced by approximately 8–44% compared with traditional methods. After adding the expert trajectory guidance, the convergence speed of the model improves greatly: the simulation time from the first exploration to reaching the global maximum cumulative reward is reduced by 56–64%. The expert trajectory establishes macro-level rules while preserving room for free exploration, avoiding local dilemmas and improving training efficiency.
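The abstract describes the reward design only at a high level, so the exact formulation is not reproduced here. The following is a minimal Python sketch of how an expert-trajectory imitation term could be combined with goal-progress and collision terms in a per-step reward, with the guidance line modeled as a plain polyline of waypoints. All function names, weights, and the distance-based imitation measure are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a per-step reward with an expert-trajectory imitation term.
# The guidance line is a polyline of waypoints; staying close to it yields a bonus.
# Names, weights, and the distance-based imitation measure are illustrative only.
import numpy as np

def point_to_segment_distance(p, a, b):
    """Distance from point p to segment ab (all 2-D numpy arrays)."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def imitation_degree(position, guidance_line, max_dist=2.0):
    """Map the distance to the guidance polyline into [0, 1] (1 = on the line)."""
    d = min(point_to_segment_distance(position, guidance_line[i], guidance_line[i + 1])
            for i in range(len(guidance_line) - 1))
    return max(0.0, 1.0 - d / max_dist)

def step_reward(position, prev_position, goal, guidance_line, collided,
                w_goal=1.0, w_imit=0.5, w_coll=2.0, w_time=0.01):
    """Goal progress plus imitation bonus, minus collision and per-step time penalties."""
    progress = np.linalg.norm(prev_position - goal) - np.linalg.norm(position - goal)
    reward = w_goal * progress + w_imit * imitation_degree(position, guidance_line)
    if collided:
        reward -= w_coll
    return reward - w_time

# Example: a straight guidance line toward an exit at (10, 0).
waypoints = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
r = step_reward(np.array([1.0, 0.3]), np.array([0.5, 0.4]),
                np.array([10.0, 0.0]), waypoints, collided=False)
```

In a DRL training loop of the kind the abstract describes, this scalar would be what the environment returns at each step while ORCA handles low-level collision avoidance; the imitation weight could be reduced once the policy reliably reaches the exits, so the guidance constrains early exploration without fixing the final behavior.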




Data availability

All material is owned by the authors and/or no permissions are required.


Acknowledgements

We thank the editors and reviewers for their valuable comments. This work was partially supported by projects funded by the Chongqing Natural Science Foundation (Grant No. CSTB2022NSCQ-MSX2069) and the Ministry of Education of China (Grant No. 19JZD023).

Author information


Contributions

S.M. and X.L. conceived and designed the experiments. S.M., X.H., M.W., D.Z., and D.X. performed the experiments and analyzed the results. S.M., X.H., and M.W. wrote the manuscript. X.L. gave comments and suggestions on the manuscript and proofread the document. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xiang Li.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mu, S., Huang, X., Wang, M. et al. Optimizing pedestrian simulation based on expert trajectory guidance and deep reinforcement learning. GeoInformatica 27, 709–736 (2023). https://doi.org/10.1007/s10707-023-00486-5

