
Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning

Journal of Control Theory and Applications

Abstract

A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and with the number of agents. Yet there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems both from brain circuits that hierarchically represent state and action spaces and learned policies, and from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analysis results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge that reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and without social knowledge.
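To make the idea concrete, the following Python sketch illustrates the kind of Q-learning update the abstract describes; it is an illustration only, not the authors' implementation. The abstraction function `abstract_state`, the social filter `social_actions`, the follow-the-leader pruning rule, and the grid example are all hypothetical stand-ins: the point is that the Q-table is indexed by abstract states and a socially constrained action subset, so its size scales with the reduced spaces rather than the full task space.

```python
# Minimal sketch (assumed names and rules, not the paper's algorithm):
# tabular Q-learning over abstract states and a socially pruned action set.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def abstract_state(raw_state):
    """Hypothetical state abstraction: bucket a fine-grained (x, y) position
    into a coarse region, collapsing many raw states into one abstract state."""
    x, y = raw_state
    return (x // 5, y // 5)

def social_actions(all_actions, peer_action):
    """Hypothetical social constraint (follow-the-leader style): only act like
    an observed peer or wait; fall back to the full set if nothing matches."""
    allowed = [a for a in all_actions if a == peer_action or a == "wait"]
    return allowed or list(all_actions)

Q = defaultdict(float)  # keyed by (abstract_state, action)

def select_action(raw_state, all_actions, peer_action):
    """Epsilon-greedy choice restricted to the socially allowed actions."""
    s = abstract_state(raw_state)
    allowed = social_actions(all_actions, peer_action)
    if random.random() < EPSILON:
        return random.choice(allowed)
    return max(allowed, key=lambda a: Q[(s, a)])

def update(raw_state, action, reward, next_raw_state, all_actions, peer_action):
    """Standard Q-learning backup, bootstrapping only over allowed actions."""
    s, s_next = abstract_state(raw_state), abstract_state(next_raw_state)
    best_next = max(Q[(s_next, a)] for a in social_actions(all_actions, peer_action))
    Q[(s, action)] += ALPHA * (reward + GAMMA * best_next - Q[(s, action)])
```

For example, in an assumed 20x20 grid with actions {north, south, east, west, wait}, the bucketing above collapses 400 raw states to 16 abstract ones, and the social filter prunes the per-step action set, shrinking the effective Q-table in the way the complexity analysis quantifies.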

Author information

Corresponding author

Correspondence to Xueqing Sun.

Additional information

This work was supported by the Office of Naval Research under the Multi-University Research Initiative (MURI) (No. N00014-08-1-0693).

Xueqing SUN is a Ph.D. candidate at the Thayer School of Engineering at Dartmouth College. She received her M.S. degree from Rice University in 1997 and her B.S. degree from Tsinghua University, Beijing, China, in 1992. Between 1992 and 1995, she did research in distributed system design at the National Research Center for Computer Integrated Manufacturing Systems in Beijing. From 1997 to 2008, she was a senior data analyst in semiconductor fabrication at Micron Technology, Inc. Her research interests focus on multiagent systems, mobile robot coordination, and reinforcement learning.

Tao MAO joined the Thayer School of Engineering at Dartmouth College in 2008 and is currently pursuing a Ph.D. degree. He received his Bachelor's degree in Electrical Engineering with first-class honors from Zhejiang University in 2008. His research interests include multiagent intelligent systems, machine learning, and reinforcement learning.

Laura RAY is a professor of engineering sciences at the Thayer School of Engineering, Dartmouth College. She received her B.S. degree with highest honors and Ph.D. in Mechanical and Aerospace Engineering from Princeton University and her M.S. degree in Mechanical Engineering from Stanford University. Her current research interests include control of multiagent systems, robot mobility and vehicle-terrain interaction, and field robotics.

Dongqing SHI is a research associate at Dartmouth College in the Department of Psychological and Brain Sciences. His research interests are primarily in the areas of social intelligence, artificial intelligence, and mobile robotics. He received his Ph.D. degree from Florida State University in 2006 and his M.S. degree in Mechanical Engineering from Zhejiang University, China, in 2002.

Jerald KRALIK is an assistant professor in the Department of Psychological and Brain Sciences at Dartmouth College. He received his B.S. degree in Zoology from Michigan State University and A.M. and Ph.D. degrees in Psychology from Harvard University. He also completed post-doctoral positions in behavioral neuroscience at the Duke University Medical Center and the National Institute of Mental Health. His research interests include animal cognition and behavior, cognitive neuroscience, and brain engineering.

About this article

Cite this article

Sun, X., Mao, T., Ray, L. et al. Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning. J. Control Theory Appl. 9, 440–450 (2011). https://doi.org/10.1007/s11768-011-1047-6
