Abstract
Many reinforcement learning models with hierarchical or modular structures have recently been proposed. They decompose a task into simpler subtasks and solve these with multiple agents. However, these models impose restrictions, for example on the topological relations among agents. Relaxing these restrictions, we propose networked reinforcement learning, in which each agent in a network acts autonomously and regards the other agents as part of its environment. Although convergence to an optimal policy is no longer assured, numerical simulations show that the model functions appropriately, at least in certain simple situations.
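To make the idea of agents that regard one another as part of the environment concrete, the following is a minimal sketch and not the model evaluated in the article. It assumes a hypothetical two-agent chain: a `sender` observes a random binary state and emits a message, a `receiver` observes only that message and chooses an action, and both receive a reward of 1 when the action matches the state. The class `QAgent`, the function `run_network`, the task itself, and all parameter values are illustrative assumptions. Each step is treated as a one-step episode, so the update has no bootstrap term.

```python
import random


class QAgent:
    """Tabular learner (hypothetical helper, not from the article).

    The agent sees only its own observation and reward, so any other
    agent in the network is, from its point of view, part of the environment.
    """

    def __init__(self, n_obs, n_actions, alpha=0.1, epsilon=0.1, rng=None):
        self.q = [[0.0] * n_actions for _ in range(n_obs)]
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random(0)

    def act(self, obs):
        # Epsilon-greedy action selection; ties resolve to the lowest index.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[obs]))
        row = self.q[obs]
        return row.index(max(row))

    def update(self, obs, action, reward):
        # One-step episode: move the estimate toward the observed reward.
        self.q[obs][action] += self.alpha * (reward - self.q[obs][action])


def run_network(n_steps=5000, seed=0):
    """Run the assumed sender -> receiver chain and return (success rate, agents)."""
    rng = random.Random(seed)
    sender = QAgent(n_obs=2, n_actions=2, rng=rng)
    receiver = QAgent(n_obs=2, n_actions=2, rng=rng)
    hits = 0
    for _ in range(n_steps):
        state = rng.randrange(2)        # environment state, seen by the sender only
        message = sender.act(state)     # the sender's action is the receiver's observation
        action = receiver.act(message)  # the receiver acts on the message alone
        reward = 1.0 if action == state else 0.0
        hits += int(reward)
        # Each agent learns independently from the shared reward.
        sender.update(state, message, reward)
        receiver.update(message, action, reward)
    return hits / n_steps, (sender, receiver)


if __name__ == "__main__":
    rate, _ = run_network()
    print(f"fraction of rewarded steps: {rate:.2f}")
```

Because each learner changes while the other is learning, the environment each one faces is non-stationary. Nothing in this sketch guarantees that the pair settles on a coordinated policy, which is in line with the caveat above that convergence to an optimal policy is no longer assured.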
Additional information
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008
About this article
Cite this article
Oku, M., Aihara, K. Networked reinforcement learning. Artif Life Robotics 13, 112–115 (2008). https://doi.org/10.1007/s10015-008-0565-x
DOI: https://doi.org/10.1007/s10015-008-0565-x