
Better value estimation in Q-learning-based multi-agent reinforcement learning

  • Application of soft computing
  • Published in Soft Computing

Abstract

In many real-life scenarios, multiple agents must cooperate to accomplish a task. Benefiting from the remarkable success of deep learning, many single-agent deep reinforcement learning algorithms have been extended to multi-agent settings. Overestimation in the value estimates of Q-learning is a significant issue that has been studied comprehensively in the single-agent domain, but rarely in multi-agent reinforcement learning (MARL). In this paper, we first demonstrate that Q-learning-based MARL methods generally suffer from notably severe overestimation, which current methods cannot alleviate. To tackle this problem, we introduce a double critic network structure and delayed policy updates into Q-learning-based MARL methods, which reduce overestimation and improve the quality of policy updates. To demonstrate the versatility of the proposed method, we apply it to several Q-learning-based MARL methods and evaluate them on multi-agent tasks in the multi-agent particle environment and SMAC. Experimental results show that the proposed method avoids the overestimation problem and significantly improves performance. In addition, an application to traffic signal control verifies the feasibility of applying the proposed method in real-world scenarios.
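The two mechanisms the abstract names were popularized by TD3 in the single-agent setting: a pair of critic networks whose elementwise minimum forms the bootstrapping target (clipped double Q-learning), and an actor that is updated less frequently than the critics. The sketch below is a minimal single-agent illustration of those two ideas, not the authors' implementation; it assumes PyTorch, and all class names, network sizes, and hyperparameters (policy_delay, tau, gamma) are hypothetical placeholders. The paper applies the same ideas within Q-learning-based MARL methods.

```python
# Minimal sketch (assumptions as noted above) of:
#   1. double critic networks: bootstrap from the minimum of two target
#      critics' Q-values to damp overestimation;
#   2. delayed policy update: update the actor and target networks only
#      every `policy_delay` critic updates.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Q(s, a) approximator; sizes are illustrative."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class Actor(nn.Module):
    """Deterministic policy pi(s); sizes are illustrative."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
    def forward(self, obs):
        return self.net(obs)

def update(step, batch, actor, critic1, critic2,
           actor_t, critic1_t, critic2_t, actor_opt, critic_opt,
           gamma=0.99, tau=0.005, policy_delay=2):
    obs, act, rew, next_obs, done = batch  # `done` is a float mask in {0, 1}
    # Double critic target: min over the two target critics.
    with torch.no_grad():
        next_act = actor_t(next_obs)
        target_q = torch.min(critic1_t(next_obs, next_act),
                             critic2_t(next_obs, next_act))
        y = rew + gamma * (1.0 - done) * target_q
    critic_loss = (F.mse_loss(critic1(obs, act), y) +
                   F.mse_loss(critic2(obs, act), y))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Delayed policy update: actor and targets move every `policy_delay` steps.
    if step % policy_delay == 0:
        actor_loss = -critic1(obs, actor(obs)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        # Polyak averaging of target networks.
        for net, tgt in [(actor, actor_t), (critic1, critic1_t),
                         (critic2, critic2_t)]:
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)

# Usage sketch: target networks start as deep copies of the online nets.
obs_dim, act_dim = 8, 2  # hypothetical dimensions
actor = Actor(obs_dim, act_dim)
critic1, critic2 = Critic(obs_dim, act_dim), Critic(obs_dim, act_dim)
actor_t = copy.deepcopy(actor)
critic1_t, critic2_t = copy.deepcopy(critic1), copy.deepcopy(critic2)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(
    list(critic1.parameters()) + list(critic2.parameters()), lr=1e-3)
```

Taking the minimum of two independently initialized critics counteracts the upward bias that the max operator in the Q-learning target otherwise accumulates, while delaying actor and target-network updates lets the value estimates settle before they drive policy changes.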


Data availability

Data and materials are available.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62272340, 62276265, and 61976216).

Funding

Funded by the National Natural Science Foundation of China (Nos. 62272340, 62276265, and 61976216).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shifei Ding.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ding, L., Du, W., Zhang, J. et al. Better value estimation in Q-learning-based multi-agent reinforcement learning. Soft Comput 28, 5625–5638 (2024). https://doi.org/10.1007/s00500-023-09365-5

