Abstract
Overestimation, the bias between an estimated value and the true value, is a fundamental problem in reinforcement learning: it leads to incorrect action decisions and therefore a lower total reward. To reduce its impact, we propose a double Actor-Critic learning system embedding an improved Monte Carlo tree search (DAC-IMCTS). The proposed learning system consists of a reference module, a simulation module, and an outcome module. The reference module and the simulation module compute the upper and lower bounds, respectively, of the agent's expected reward, while the outcome module learns the agent's control policies. The reference module, built on the Actor-Critic framework, provides an upper confidence bound of the expected reward. Unlike the classic Actor-Critic learning system, our system introduces a simulation module that estimates the lower confidence bound of the expected reward; within this module, we propose an improved MCTS that samples the policy distribution more efficiently. Based on the lower and upper confidence bounds, we propose a confidence interval weighted estimation (CIWE) algorithm in the outcome module for generating the target expected reward. We then prove that the target expected reward generated by our method has zero expectation bias, which reduces the overestimation present in the classic Actor-Critic learning system. We evaluate our learning system on OpenAI Gym tasks, and the experimental results show that the proposed model and algorithm outperform state-of-the-art learning systems.
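The core idea of combining an upper bound (from the Actor-Critic reference module) and a lower bound (from the MCTS simulation module) into a single target can be sketched as follows. This is a minimal illustration, not the paper's exact CIWE formula: the inverse-variance weighting rule and the function name `ciwe_target` are assumptions introduced here to show one standard way a weighted interior estimate of two bounds is formed.

```python
def ciwe_target(upper: float, lower: float,
                sigma_u: float, sigma_l: float) -> float:
    """Confidence-interval weighted estimate (illustrative sketch).

    upper   : upper confidence bound of the expected reward
              (e.g. from the Actor-Critic reference module)
    lower   : lower confidence bound of the expected reward
              (e.g. from the MCTS simulation module)
    sigma_u : estimated noise (std. dev.) of the upper bound
    sigma_l : estimated noise (std. dev.) of the lower bound

    Weights each bound by the inverse of its variance, so the
    less noisy bound contributes more to the target. The result
    always lies between `lower` and `upper`.
    """
    w = sigma_l ** 2 / (sigma_u ** 2 + sigma_l ** 2)
    return w * upper + (1.0 - w) * lower


# Equally noisy bounds are averaged; a noisier upper bound
# pulls the target toward the lower bound.
target = ciwe_target(upper=10.0, lower=6.0, sigma_u=1.0, sigma_l=1.0)
```

With equal noise the weight is 0.5 and the target is the midpoint (8.0 above); as one bound's variance grows, the estimate smoothly shifts toward the other bound, which is the intuition behind using the interval's two endpoints to cancel the one-sided bias of either estimator alone.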
Availability of data and materials
Data will be made available on reasonable request.
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for their constructive comments and suggestions. The work was supported by the National Natural Science Foundation of China [grant number 71771096].
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhu, H., Xie, Y. & Zheng, S. A double Actor-Critic learning system embedding improved Monte Carlo tree search. Neural Comput & Applic 36, 8485–8500 (2024). https://doi.org/10.1007/s00521-024-09513-4