A Note on Working Memory in Agent Learning

  • Fang Zhong
Part of the International Handbooks on Information Systems book series (INFOSYS)


Working memory is an important but under-studied dimension of system and mechanism design, and the existing literature reports mixed findings on how the amount of working memory affects system efficiency. In this note we investigate this relationship with a computational approach. We design an intelligent agent system in which three agents, one buyer and two bidders, repeatedly play an Exchange Game. The buyer agent decides whether to list a request for proposal (RFP), while the bidders bid for it independently; only one bidder can win on a given round of play. Once the buyer chooses the winning bidder, a transaction takes place, at which point each party to the trade can either cooperate or defect. These decisions are made simultaneously, and the payoffs essentially follow the Prisoner's Dilemma game. We find that the relationship between working memory and the efficiency of the system has an inverted U-shape, i.e., there appears to be an optimal memory size. When we mix agents with different memory sizes, populations in which agents share the same amount of working memory generate the most efficient outcome in terms of total payoffs.
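The round structure described above can be sketched in code. This is a minimal illustration only: the abstract does not specify the payoff numbers, the bidding mechanics, or how the buyer selects the winner, so the values below assume the standard Prisoner's Dilemma ordering (temptation > reward > punishment > sucker) and a lowest-bid-wins rule.

```python
# Hedged sketch of one round of the Exchange Game.
# Payoff values and the winner-selection rule are illustrative
# assumptions; the paper's abstract does not specify them.
PD_PAYOFFS = {  # (buyer_move, winner_move) -> (buyer_payoff, winner_payoff)
    ("C", "C"): (3, 3),  # mutual cooperation (reward R)
    ("C", "D"): (0, 5),  # buyer exploited (sucker S vs. temptation T)
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection (punishment P)
}

def play_round(buyer, bidders):
    """Buyer may list an RFP; bidders bid; one winner trades with the buyer."""
    if not buyer["list_rfp"]():
        return (0, 0), None  # no RFP listed: no trade, no payoffs
    bids = [b["bid"]() for b in bidders]
    winner = min(range(len(bids)), key=bids.__getitem__)  # assumed: lowest bid wins
    # Cooperate/defect decisions are made simultaneously, as in the paper.
    payoffs = PD_PAYOFFS[(buyer["move"](), bidders[winner]["move"]())]
    return payoffs, winner
```

In the paper's actual setup, each agent would choose these actions with a reinforcement learner conditioned on a window of past outcomes (its working memory); the fixed decision functions here merely stand in for learned policies.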


Keywords: Reinforcement Learning · Multiagent System · Memory Size · Total Surplus · Agent Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Fang Zhong, Georgia Institute of Technology, Atlanta, USA
