Abstract
The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite in both time and space, and moreover they are only partially observable. In this article we propose a systematic design method motivated by the desire to transform the task to be solved into a finite-state, discrete-time, “approximately” Markovian task that is also completely observable. The key idea is to break the problem into subtasks and to design a controller for each subtask. Operating conditions are then attached to the controllers (a controller together with its operating condition is called a module), and additional features may be designed to facilitate observability. A new discrete time counter is introduced at the “module level” that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy appeared to exploit certain properties of the environment that could not have been foreseen, suggesting the promising possibility that a learnt controller may eventually outperform a handcrafted switching strategy.
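To make the module-level construction concrete, the following is a minimal, self-contained Python sketch. The toy environment, the two binary features, and the two modules are invented for illustration and are not taken from the paper; likewise, plain Q-learning is used at the module level for brevity, whereas the paper reports that a model-based method worked best. What the sketch demonstrates is the module-level clock: a switching decision is made, and a value update performed, only when a feature value changes.

```python
# A minimal sketch of module-based RL. The environment, features, and
# modules below are hypothetical stand-ins, not the paper's robot task.

import random
from collections import defaultdict

class ToyEnv:
    """Continuous-state stand-in for the real robot."""
    def reset(self):
        self.pos = 0.0
        return self.pos
    def step(self, velocity):
        # Noisy continuous dynamics; reward on reaching the goal at 1.0.
        self.pos += velocity + random.gauss(0.0, 0.02)
        reward = 1.0 if self.pos >= 1.0 else 0.0
        done = self.pos >= 1.0 or self.pos <= -1.0
        return self.pos, reward, done

def extract_features(pos):
    # Hand-designed binary features make the task finite-state and
    # "approximately" Markovian at the module level.
    return (pos >= 0.0, pos >= 0.5)

# A module is a controller paired with its operating condition:
# (name, operating condition over features, control law).
MODULES = [
    ("advance", lambda f: True, lambda: 0.1),   # always applicable
    ("retreat", lambda f: f[0], lambda: -0.1),  # only right of the origin
]

def applicable(features):
    return [i for i, (_, cond, _) in enumerate(MODULES) if cond(features)]

def learn(episodes=200, eps=0.1, alpha=0.2, gamma=0.95):
    """Module-level Q-learning: the clock ticks only on feature changes."""
    Q = defaultdict(float)
    env = ToyEnv()
    for _ in range(episodes):
        f = extract_features(env.reset())
        done = False
        while not done:
            acts = applicable(f)
            a = (random.choice(acts) if random.random() < eps
                 else max(acts, key=lambda i: Q[(f, i)]))
            # Run the chosen controller until a feature value changes;
            # only then does the module-level time counter advance.
            r_total, f2 = 0.0, f
            while f2 == f and not done:
                pos, r, done = env.step(MODULES[a][2]())
                r_total += r
                f2 = extract_features(pos)
            target = r_total + (0.0 if done else
                     gamma * max(Q[(f2, i)] for i in applicable(f2)))
            Q[(f, a)] += alpha * (target - Q[(f, a)])
            f = f2
    return Q
```

Running `learn()` returns a module-level Q-table; the greedy policy over it, restricted at each feature state to the applicable modules, is the learnt switching strategy.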