Abstract
A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning to improve learning efficiency. Analytical results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, reducing task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and without social knowledge.
Additional information
This work was supported by the Office of Naval Research under the Multi-University Research Initiative (MURI) (No. N00014-08-1-0693).
Xueqing SUN is a Ph.D. candidate at the Thayer School of Engineering at Dartmouth College. She received her M.S. degree from Rice University in 1997 and her B.S. degree from Tsinghua University, Beijing, China, in 1992. Between 1992 and 1995, she did research in distributed system design at the National Research Center for Computer Integrated Manufacturing Systems in Beijing. From 1997 to 2008, she was a senior data analyst in semiconductor fabrication at Micron Technology, Inc. Her research interests focus on multiagent systems, mobile robot coordination, and reinforcement learning.
Tao MAO joined the Thayer School of Engineering at Dartmouth College in 2008 and is currently pursuing a Ph.D. degree. He received his Bachelor's degree in Electrical Engineering with first-class honors from Zhejiang University in 2008. His research interests include multiagent intelligent systems, machine learning, and reinforcement learning.
Laura RAY is a professor of engineering sciences at the Thayer School of Engineering, Dartmouth College. She received her B.S. degree with highest honors and Ph.D. in Mechanical and Aerospace Engineering from Princeton University and her M.S. degree in Mechanical Engineering from Stanford University. Her current research interests include control of multiagent systems, robot mobility and vehicle-terrain interaction, and field robotics.
Dongqing SHI is a research associate at Dartmouth College in the Department of Psychological and Brain Sciences. His research interests are primarily in the areas of social intelligence, artificial intelligence, and mobile robotics. He received his Ph.D. degree from Florida State University in 2006 and his M.S. degree in Mechanical Engineering from Zhejiang University, China, in 2002.
Jerald KRALIK is an assistant professor in the Department of Psychological and Brain Sciences at Dartmouth College. He received his B.S. degree in Zoology from Michigan State University and A.M. and Ph.D. degrees in Psychology from Harvard University. He also completed post-doctoral positions in behavioral neuroscience at the Duke University Medical Center and the National Institute of Mental Health. His research interests include animal cognition and behavior, cognitive neuroscience, and brain engineering.
Cite this article
Sun, X., Mao, T., Ray, L. et al. Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning. J. Control Theory Appl. 9, 440–450 (2011). https://doi.org/10.1007/s11768-011-1047-6