Better value estimation in Q-learning-based multi-agent reinforcement learning

Ding, Ling; Du, Wei; Zhang, Jian; Guo, Lili; Zhang, Chenglong; Jin, Di; Ding, Shifei

doi:10.1007/s00500-023-09365-5

Application of soft computing
Published: 02 November 2023

Soft Computing, Volume 28, pages 5625–5638 (2024)

Ling Ding¹,
Wei Du²,
Jian Zhang²,
Lili Guo²,
Chenglong Zhang²,
Di Jin¹ &
Shifei Ding²

Abstract

In many real-life scenarios, multiple agents must cooperate to accomplish tasks. Benefiting from the success of deep learning, many single-agent deep reinforcement learning algorithms have been extended to multi-agent scenarios. Overestimation in the value estimates of Q-learning is a significant issue that has been studied comprehensively in single-agent domains, but rarely in multi-agent reinforcement learning (MARL). In this paper, we first demonstrate that Q-learning-based MARL methods generally suffer from serious overestimation, which current methods cannot alleviate. To tackle this problem, we introduce a double critic network structure and delayed policy updates to Q-learning-based MARL methods, which reduce overestimation and improve the quality of policy updates. To demonstrate the versatility of the proposed method, we select several Q-learning-based MARL methods and evaluate them on several tasks in the multi-agent particle environment and SMAC. Experimental results demonstrate that the proposed method can avoid the overestimation problem and significantly improve performance. In addition, an application to traffic signal control verifies the feasibility of applying the proposed method in real-world scenarios.
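The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea, assuming a clipped double-critic target of the kind used in the single-agent literature. It is not the authors' implementation. The function names (`max_q_bias`, `td_target`, `should_update_policy`) and the parameter `policy_delay` are hypothetical and chosen for illustration. The first function shows why taking a maximum over noisy value estimates overestimates, and how taking the smaller of two critics' estimates counteracts it.

```python
import random


def max_q_bias(n_actions, noise_std, n_trials, seed=0):
    """Compare single-critic and double-critic estimates of max_a Q(a).

    Assumption for illustration: every true Q(a) is 0, so the true
    maximum is 0 and any positive average is pure overestimation.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        # Two independent noisy critics estimating the same true values.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Single critic: the same estimates select and evaluate the action.
        single_total += max(q1)
        # Double critics: critic 1 selects the action, and the smaller of
        # the two critics' values for that action is used.
        a_star = max(range(n_actions), key=lambda a: q1[a])
        double_total += min(q1[a_star], q2[a_star])
    return single_total / n_trials, double_total / n_trials


def td_target(reward, gamma, next_q1, next_q2, done):
    """One-step target using the minimum of two critics (hypothetical helper)."""
    a_star = max(range(len(next_q1)), key=lambda a: next_q1[a])
    bootstrap = min(next_q1[a_star], next_q2[a_star])
    return reward + (0.0 if done else gamma * bootstrap)


def should_update_policy(step, policy_delay=2):
    """Delayed update: refresh the policy only every `policy_delay` critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    single, double = max_q_bias(n_actions=5, noise_std=1.0, n_trials=5000)
    print(f"single-critic average of max: {single:.3f}")
    print(f"double-critic average:        {double:.3f}")
```

In this toy setting the single-critic average is clearly positive even though the true maximum is zero, while the double-critic estimate is lower. Taking the minimum can bias the estimate downward instead, which is generally considered less harmful than overestimation because underestimated values are not reinforced by the greedy action choice.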

Data availability

Data and materials can be made available.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62272340, No. 62276265 and No. 61976216).

Funding

Possible.

Author information

Authors and Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Ling Ding & Di Jin
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
Wei Du, Jian Zhang, Lili Guo, Chenglong Zhang & Shifei Ding

Corresponding author

Correspondence to Shifei Ding.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, L., Du, W., Zhang, J. et al. Better value estimation in Q-learning-based multi-agent reinforcement learning. Soft Comput 28, 5625–5638 (2024). https://doi.org/10.1007/s00500-023-09365-5

Accepted: 01 October 2023
Published: 02 November 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00500-023-09365-5
