Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments

Abstract

Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). Using model-checking techniques, the problem is redefined as finding an optimal policy on the product of the PL-POMDP with the LDGBA that satisfies the complex task. We implement deep Q-learning with long short-term memory (LSTM) to process the observation history and perform task recognition. Our contributions include the proposed method, the utilization of LTL and LDGBA, and the LSTM-enhanced deep Q-learning. We demonstrate the applicability of the proposed method through simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results show that the proposed method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications such as the control of unmanned aerial vehicles.
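
The sketch below illustrates, at a high level, the idea described in the abstract: an LSTM summarizes the sequence of observations, and the current LDGBA state is appended to the network input so that Q-values are learned over the product space. This is not the authors' released implementation (see Code availability below); it is a minimal, hypothetical PyTorch example, and all names, dimensions, and hyperparameters (RecurrentQNetwork, hidden_dim=64, the toy sizes in the usage snippet) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentQNetwork(nn.Module):
    """Q-network over the product space: an LSTM summarizes the observation
    history, and the current LDGBA state is one-hot encoded into the input."""

    def __init__(self, obs_dim, num_automaton_states, num_actions, hidden_dim=64):
        super().__init__()
        self.num_automaton_states = num_automaton_states
        self.lstm = nn.LSTM(obs_dim + num_automaton_states, hidden_dim,
                            batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, automaton_seq, hidden=None):
        # obs_seq: (batch, T, obs_dim) float observations
        # automaton_seq: (batch, T) integer LDGBA state indices
        aut = F.one_hot(automaton_seq, self.num_automaton_states).float()
        x = torch.cat([obs_seq, aut], dim=-1)          # product-space input
        out, hidden = self.lstm(x, hidden)             # out: (batch, T, hidden_dim)
        q_values = self.q_head(out[:, -1])             # Q-values given the full history
        return q_values, hidden


if __name__ == "__main__":
    # Toy usage with made-up sizes: 2 trajectories, 5 steps, 8-dim observations,
    # 4 automaton states, 5 actions.
    net = RecurrentQNetwork(obs_dim=8, num_automaton_states=4, num_actions=5)
    obs = torch.randn(2, 5, 8)
    aut_states = torch.randint(0, 4, (2, 5))
    q, _ = net(obs, aut_states)
    print(q.shape)            # torch.Size([2, 5])
    print(q.argmax(dim=-1))   # greedy action per trajectory
```

In a deep Q-learning loop of this kind, the network would be queried once per step with the history so far (or with a carried hidden state), and an epsilon-greedy rule over the returned Q-values would select the agent's action.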

Code availability

Code files are available at https://github.com/JunchaoLi001/Model-free_DRL_LSTM_on_POMDP_with_LDGBA.

Acknowledgements

Li and Xiao would like to thank the US Department of Education (ED#P116S210005) and the NSF (#2226936) for supporting this research.

Funding

This research was funded by the US Department of Education (ED#P116S210005) and the NSF (#2226936).

Author information

Corresponding author

Correspondence to Junchao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Cai, M., Kan, Z. et al. Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments. Auton Agent Multi-Agent Syst 38, 14 (2024). https://doi.org/10.1007/s10458-024-09641-0
