Abstract
This paper addresses the search-and-rescue task of a mobile robot with multiple targets of interest in an unknown dynamic environment. The problem is challenging because the robot must search for multiple targets while simultaneously avoiding obstacles. To ensure that the robot avoids obstacles properly, we propose a mixed-strategy Nash equilibrium based Dyna-Q (MNDQ) algorithm. First, a multi-objective layered structure is introduced to simplify the representation of multiple objectives and reduce computational complexity; it divides the overall task into subtasks, including searching for targets and avoiding obstacles. Second, a risk-monitoring mechanism based on the relative positions of dynamic risks is proposed, which helps the robot avoid potential collisions and unnecessary detours. Then, to improve sampling efficiency, MNDQ is presented, combining Dyna-Q with a mixed-strategy Nash equilibrium: the agent makes decisions in the form of probabilities, maximizing the expected reward and improving the overall performance of the Dyna-Q algorithm. Finally, a series of simulations verifies the effectiveness of the proposed method. The results show that MNDQ performs well and is robust, providing a competitive solution for future autonomous robot navigation tasks.
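To make the two ingredients named above concrete, the following Python sketch combines Dyna-Q's model-based planning loop with probabilistic (mixed-strategy) action selection. It is a minimal illustration, not the authors' implementation: the grid size, reward interface, planning budget (n_planning), and temperature (tau) are illustrative assumptions, and a softmax distribution stands in for the mixed-strategy Nash equilibrium that the paper computes over the subtask game.

```python
import numpy as np

n_states, n_actions = 100, 4     # toy 10x10 grid world with 4 moves (assumed)
alpha, gamma, tau = 0.1, 0.95, 0.5
n_planning = 10                  # simulated model-based updates per real step

Q = np.zeros((n_states, n_actions))
model = {}                       # (state, action) -> (reward, next_state), learned online

def mixed_strategy(q_row):
    """Softmax over action values: the agent decides in the form of
    probabilities rather than acting greedily. This is a stand-in for the
    mixed-strategy Nash equilibrium solver described in the paper."""
    z = np.exp((q_row - q_row.max()) / tau)   # shift for numerical stability
    return z / z.sum()

def dyna_q_step(s, env_step, rng=np.random.default_rng()):
    # 1. Act: sample an action from the mixed strategy.
    a = rng.choice(n_actions, p=mixed_strategy(Q[s]))
    r, s_next = env_step(s, a)                # real experience from the environment
    # 2. Direct RL update on the real transition.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # 3. Model learning: remember the observed transition.
    model[(s, a)] = (r, s_next)
    # 4. Planning: replay simulated transitions drawn from the learned model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = list(model.items())[rng.integers(len(model))]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
    return s_next
```

A full MNDQ implementation would replace the softmax shortcut with the equilibrium probabilities of the subtask game (e.g., obtained by linear programming) and would layer separate objectives for target search and obstacle avoidance, as the abstract describes.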
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
Yang CHEN designed the research and drafted the paper. Yang CHEN and Dianxi SHI processed the data. Yang CHEN and Huanhuan YANG designed the simulations. Tongyue LI and Zhen WANG helped organize the paper. Yang CHEN and Dianxi SHI revised and finalized the paper.
Ethics declarations
All the authors declare that they have no conflict of interest.
Additional information
Project supported by the National Natural Science Foundation of China (No. 91948303)
Cite this article
Chen, Y., Shi, D., Yang, H. et al. An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments. Front Inform Technol Electron Eng 25, 569–584 (2024). https://doi.org/10.1631/FITEE.2300151