Networked reinforcement learning

Original Article · Artificial Life and Robotics

Abstract

Many models of reinforcement learning with hierarchical or modular structures have been proposed recently. They decompose a task into simpler subtasks and solve them with multiple agents. However, these models impose restrictions on, for example, the topological relations among the agents. Relaxing these restrictions, we propose networked reinforcement learning, in which each agent in a network acts autonomously, regarding the other agents as part of its environment. Although convergence to an optimal policy is no longer guaranteed, we show by numerical simulation that our model functions appropriately, at least in certain simple situations.
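The paper gives no code here, but as a rough illustration of the idea in the abstract, the following is a minimal sketch of a tabular Q-learning agent that folds its neighbours' most recent actions into its own observed state, so that the other agents are treated simply as part of the environment. This is not the authors' implementation; all class names, parameters, and the usage shown in comments are hypothetical.

```python
import random
from collections import defaultdict


class NetworkedQAgent:
    """Tabular Q-learning agent that treats its neighbours' last
    actions as part of its own observed state (hypothetical sketch,
    not the method from the paper)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # Q[(state, action)] -> value, default 0.0
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def observe(self, env_state, neighbour_actions):
        # The other agents are folded into the environment: the agent's
        # effective state is the environment state plus whatever its
        # network neighbours just did.
        return (env_state, tuple(neighbour_actions))

    def act(self, state):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Hypothetical two-agent usage (environment `env` is assumed, not given here):
#   a1, a2 = NetworkedQAgent(4), NetworkedQAgent(4)
#   s1 = a1.observe(env_state, [last_action_of_a2])
#   action1 = a1.act(s1)
#   ... step env, then a1.update(s1, action1, reward1, next_s1)
```

Because each agent learns against the moving target of the others' policies, the joint process is generally non-stationary, which is why convergence to an optimal policy is not assured.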



Author information

Correspondence to Makito Oku.

Additional information

This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

Cite this article

Oku, M., Aihara, K. Networked reinforcement learning. Artif Life Robotics 13, 112–115 (2008). https://doi.org/10.1007/s10015-008-0565-x
