Addressing the locality and uncertainty of observations in large-scale multi-agent application scenarios, this work considers the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model and proposes a novel multi-agent reinforcement learning algorithm based on local communication. In a distributed learning environment, the elements of reinforcement learning are difficult to describe effectively under local observation, and each agent's learning behaviour is influenced by its teammates. Local communication with a consensus protocol is used to reach agreement on the globally observed environment, so that some of the strategies generated by repeated observations are eliminated and the agent team gradually converges to a uniform opinion on the state of the event or object being observed; the agents can thus approach a unique belief space regardless of whether each individual agent performs a complete or only a partial observation. Simulation results show that the learning strategy space is reduced and the learning speed is improved.
Reinforcement learning · Multi-agent · Local communication · Consensus
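The consensus-based agreement described in the abstract can be illustrated with a minimal sketch of a discrete-time average-consensus protocol over a fixed undirected communication graph. This is an illustrative assumption, not the paper's exact protocol: each agent repeatedly averages its local observation with those of its neighbours, so the team converges to a shared estimate of the observed state. The function names, the step size `eps`, and the ring topology below are all hypothetical choices for the example.

```python
# Illustrative discrete-time average consensus (a standard protocol,
# assumed here as a stand-in for the paper's consensus mechanism).
# Each agent i holds a scalar observation x_i and repeatedly applies
#   x_i <- x_i + eps * sum_{j in N(i)} (x_j - x_i),
# which drives all agents toward the team-wide average.

def consensus_step(values, neighbours, eps=0.2):
    """One synchronous update round for all agents."""
    return [
        x + eps * sum(values[j] - x for j in neighbours[i])
        for i, x in enumerate(values)
    ]

def run_consensus(values, neighbours, steps=100):
    """Iterate the update until the estimates (approximately) agree."""
    for _ in range(steps):
        values = consensus_step(values, neighbours)
    return values

# Four agents on a ring topology, each with a different local observation.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
obs = [1.0, 2.0, 3.0, 4.0]
final = run_consensus(obs, ring)
# All agents approach the global average of the initial observations (2.5),
# mirroring how the team reaches a uniform opinion on the observed state.
```

With `eps` below the inverse of the maximum node degree, the update is a convex combination of neighbouring values and converges to the initial average; in the learning setting, this shared estimate is what lets the agents operate over a common belief space.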