Rationality of Reward Sharing in Multi-agent Reinforcement Learning

  • Kazuteru Miyazaki
  • Shigenobu Kobayashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1733)


In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing [5] and analyze how a reward should be shared among profit sharing agents. When an agent receives a direct reward R (R > 0), an indirect reward µR (µ ≥ 0) is given to the other agents. We derive the following necessary and sufficient condition for preserving rationality:
$$ \mu < \frac{M - 1}{M^{W}\left(1 - \left(\tfrac{1}{M}\right)^{W_0}\right)\left(n - 1\right)L}, $$
where M and L are the maximum numbers of conflicting rules and of rational rules for the same sensory input, W and W_0 are the maximum episode lengths of a direct-reward agent and an indirect-reward agent, and n is the number of agents. The theorem is derived by avoiding the least desirable situation, in which the expected reward per action is zero. Therefore, by applying this theorem, we can exploit several efficient aspects of reward sharing. Through numerical examples, we confirm the effectiveness of this theorem.
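As a quick numerical check, the bound above can be evaluated directly. The sketch below follows the symbols defined in the abstract (M, L, W, W_0, n); the parameter values in the example are illustrative and not taken from the paper.

```python
def mu_bound(M: int, L: int, W: int, W0: int, n: int) -> float:
    """Upper bound on the indirect-reward ratio mu from the rationality
    condition: (M - 1) / (M^W * (1 - (1/M)^W0) * (n - 1) * L)."""
    return (M - 1) / (M ** W * (1 - (1.0 / M) ** W0) * (n - 1) * L)

# Illustrative example: M = 2 conflicting rules, L = 1 rational rule,
# episode lengths W = 3 and W0 = 3, and n = 2 agents.
bound = mu_bound(M=2, L=1, W=3, W0=3, n=2)
print(bound)  # any mu strictly below this value preserves rationality
```

Note that the bound shrinks rapidly as W grows (the M^W factor dominates), so longer direct-reward episodes force a smaller indirect reward.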


Keywords: Sensory Input · Rational Rule · Rule Sequence · Negative Reward · Sharing Agent
(These keywords were added by machine, not by the authors.)




References

  1. Arai, S., Miyazaki, K., and Kobayashi, S.: Generating Cooperative Behavior by Multi-Agent Reinforcement Learning. Proc. of the 6th European Workshop on Learning Robots, pp. 143–157 (1997)
  2. Arai, S., Miyazaki, K., and Kobayashi, S.: Cranes Control Using Multi-agent Reinforcement Learning. International Conference on Intelligent Autonomous Systems 5, pp. 335–342 (1998)
  3. Grefenstette, J. J.: Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms. Machine Learning, Vol. 3, pp. 225–245 (1988)
  4. Holland, J. H.: Escaping Brittleness: The Possibilities of General-Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. In: Michalski, R. S., et al. (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 2, pp. 593–623. Morgan Kaufmann (1986)
  5. Miyazaki, K., Yamamura, M., and Kobayashi, S.: On the Rationality of Profit Sharing in Reinforcement Learning. Proc. of the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing, Iizuka, Japan, pp. 285–288 (1994)
  6. Miyazaki, K., and Kobayashi, S.: Learning Deterministic Policies in Partially Observable Markov Decision Processes. International Conference on Intelligent Autonomous Systems 5, pp. 250–257 (1998)
  7. Ono, N., Ikeda, O., and Rahmani, A. T.: Synthesis of Herding and Specialized Behavior by Modular Q-learning Animats. Proc. of the ALIFE V Poster Presentations, pp. 26–30 (1996)
  8. Sen, S., and Sekaran, M.: Multiagent Coordination with Learning Classifier Systems. In: Weiss, G., and Sen, S. (eds.), Adaption and Learning in Multi-agent Systems, pp. 218–233. Springer-Verlag, Berlin, Heidelberg (1995)
  9. Tan, M.: Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. Proc. of the 10th International Conference on Machine Learning, pp. 330–337 (1993)
  10. Watkins, C. J. C. H., and Dayan, P.: Technical Note: Q-learning. Machine Learning, Vol. 8, pp. 55–68 (1992)
  11. Weiss, G.: Learning to Coordinate Actions in Multi-Agent Systems. Proc. of the 13th International Joint Conference on Artificial Intelligence, pp. 311–316 (1993)
  12. Whitehead, S. D., and Ballard, D. H.: Active Perception and Reinforcement Learning. Proc. of the 7th International Conference on Machine Learning, pp. 162–169 (1990)

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Kazuteru Miyazaki (1)
  • Shigenobu Kobayashi (1)

  1. Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Midori-ku, Japan
