Multiagent Planning with Trembling-Hand Perfect Equilibrium in Multiagent POMDPs
Multiagent Partially Observable Markov Decision Processes are a popular model of multiagent systems with uncertainty. Since the computational cost of finding an optimal joint policy is prohibitive, Joint Equilibrium-based Search for Policies with Nash Equilibrium (JESP-NE) has been proposed; it finds a locally optimal joint policy in which each policy is a best response to the other policies, i.e., the joint policy is a Nash equilibrium.
One limitation of JESP-NE is that the quality of the obtained joint policy depends on a predefined default policy. More specifically, when finding a best response, JESP-NE falls back on this default policy for any observation that has zero probability. If the default policy is poor, JESP-NE tends to converge to a sub-optimal joint policy.
In this paper, we propose a method, JESP-TPE, that finds a locally optimal joint policy based on a concept called Trembling-hand Perfect Equilibrium (TPE). In finding a TPE, we assume that an agent might make a mistake in selecting its action with small probability. Thus, an observation that has zero probability in JESP-NE has non-zero probability, and the default policy is no longer needed. As a result, JESP-TPE can converge to a better joint policy than JESP-NE, which we confirm by experimental evaluations.
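The effect of a trembling hand on observation probabilities can be illustrated with a minimal sketch. It is not the paper's algorithm: the two-action, two-observation model, the uniform tremble, and the names `trembling_policy`, `observation_probs`, and `OBS_MODEL` are all assumptions made for illustration.

```python
def trembling_policy(action, n_actions, epsilon):
    """Mixed policy that intends `action` but, with probability epsilon,
    picks an action uniformly at random (assumed form of the tremble)."""
    probs = [epsilon / n_actions] * n_actions
    probs[action] += 1.0 - epsilon
    return probs


def observation_probs(policy, obs_model):
    """Probability of each observation for the other agent, given one
    agent's mixed policy and obs_model[a][o] = P(o | a)."""
    n_obs = len(obs_model[0])
    return [
        sum(policy[a] * obs_model[a][o] for a in range(len(policy)))
        for o in range(n_obs)
    ]


# Hypothetical toy model: the observation simply reveals the action taken.
OBS_MODEL = [
    [1.0, 0.0],  # action 0 always yields observation 0
    [0.0, 1.0],  # action 1 always yields observation 1
]

# Without trembles, observation 1 can never occur, so a best response
# would need some default behaviour for it.
no_tremble = observation_probs(trembling_policy(0, 2, 0.0), OBS_MODEL)

# With a small tremble, every observation has positive probability,
# so a best response is defined for all of them.
with_tremble = observation_probs(trembling_policy(0, 2, 0.1), OBS_MODEL)

print(no_tremble)
print(with_tremble)
```

In this toy setting, observation 1 has probability zero when the other agent never errs and a small positive probability once a tremble is allowed, which is why no default policy is required.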
Keywords: Multiagent systems; Partially Observable Markov Decision Process; Nash equilibrium; Trembling-hand perfect equilibrium