Abstract
In many real-life scenarios, multiple agents must cooperate to accomplish tasks. Building on the success of deep learning, many single-agent deep reinforcement learning algorithms have been extended to multi-agent scenarios. Overestimation in the value estimates of Q-learning is a significant issue that has been studied comprehensively in single-agent domains, but rarely in multi-agent reinforcement learning (MARL). In this paper, we first demonstrate that Q-learning-based MARL methods generally suffer from serious overestimation, which current methods cannot alleviate. To tackle this problem, we introduce a double critic network structure and delayed policy updates to Q-learning-based MARL methods, which reduce overestimation and improve the quality of policy updates. To demonstrate the versatility of the proposed method, we select several Q-learning-based MARL methods and evaluate them on several tasks in the multi-agent particle environment and SMAC. Experimental results demonstrate that the proposed method can avoid the overestimation problem and significantly improve performance. In addition, an application to traffic signal control verifies the feasibility of applying the proposed method in real-world scenarios.
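The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar inputs, and the default `policy_delay=2` are assumptions made for illustration, and the form shown is the generic clipped double-critic target and delayed update schedule familiar from the single-agent TD3 algorithm.

```python
def clipped_double_q_target(reward, gamma, q1_next, q2_next, done):
    """TD target built from the smaller of two critics' next-state estimates.

    Taking the minimum counteracts the upward bias that a single
    max-based estimate tends to accumulate.
    """
    next_value = min(q1_next, q2_next)
    # No bootstrapping from the next state once the episode has ended.
    return reward + (0.0 if done else gamma * next_value)


def should_update_policy(step, policy_delay=2):
    """Delayed update: refresh the policy once every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Hypothetical numbers: the two critics disagree, so the lower estimate is used.
    print(clipped_double_q_target(reward=1.0, gamma=0.5,
                                  q1_next=5.0, q2_next=3.0, done=False))
    # Steps on which the policy would be updated with a delay of 2.
    print([step for step in range(6) if should_update_policy(step)])
```

In a multi-agent value-factorization setting, the two estimates would presumably be joint action values produced by two separately trained critics or mixing networks; that mapping is likewise an assumption here, not a detail stated in the abstract.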
Data availability
Availability of data and material possible.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62272340, No. 62276265 and No. 61976216).
Funding
Possible.
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
About this article
Cite this article
Ding, L., Du, W., Zhang, J. et al. Better value estimation in Q-learning-based multi-agent reinforcement learning. Soft Comput 28, 5625–5638 (2024). https://doi.org/10.1007/s00500-023-09365-5
DOI: https://doi.org/10.1007/s00500-023-09365-5