Abstract
A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning to improve learning efficiency. Analytical results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, reducing task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and without social knowledge.
Additional information
This work was supported by the Office of Naval Research under the Multi-University Research Initiative (MURI) (No. N00014-08-1-0693).
Xueqing SUN is a Ph.D. candidate at the Thayer School of Engineering at Dartmouth College. She received her M.S. degree from Rice University in 1997 and her B.S. degree from Tsinghua University, Beijing, China, in 1992. Between 1992 and 1995, she did research in distributed system design at the National Research Center for Computer Integrated Manufacturing Systems in Beijing. From 1997 to 2008, she was a senior data analyst in semiconductor fabrication at Micron Technology, Inc. Her research interests focus on multiagent systems, mobile robot coordination, and reinforcement learning.
Tao MAO joined the Thayer School of Engineering at Dartmouth College in 2008 and is currently pursuing a Ph.D. degree. He received his Bachelor's degree in Electrical Engineering with first-class honors from Zhejiang University in 2008. His research interests include multiagent intelligent systems, machine learning, and reinforcement learning.
Laura RAY is a professor of engineering sciences at the Thayer School of Engineering, Dartmouth College. She received her B.S. degree with highest honors and Ph.D. in Mechanical and Aerospace Engineering from Princeton University and her M.S. degree in Mechanical Engineering from Stanford University. Her current research interests include control of multiagent systems, robot mobility and vehicle-terrain interaction, and field robotics.
Dongqing SHI is a research associate at Dartmouth College in the Department of Psychological and Brain Sciences. His research interests are primarily in the areas of social intelligence, artificial intelligence, and mobile robotics. He received his Ph.D. degree from Florida State University in 2006 and his M.S. degree in Mechanical Engineering from Zhejiang University, China, in 2002.
Jerald KRALIK is an assistant professor in the Department of Psychological and Brain Sciences at Dartmouth College. He received his B.S. degree in Zoology from Michigan State University and A.M. and Ph.D. degrees in Psychology from Harvard University. He also completed post-doctoral positions in behavioral neuroscience at the Duke University Medical Center and the National Institute of Mental Health. His research interests include animal cognition and behavior, cognitive neuroscience, and brain engineering.
Cite this article
Sun, X., Mao, T., Ray, L. et al. Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning. J. Control Theory Appl. 9, 440–450 (2011). https://doi.org/10.1007/s11768-011-1047-6