Abstract
Reinforcement learning allows an agent to be both reactive and adaptive, but it requires a simple yet consistent representation of the task environment. In robotics this representation is the product of perception. Perception is a powerful simplifying mechanism because it ignores much of the complexity of the world by mapping multiple world states to each of a few representational states. The constraint of consistency conflicts with simplicity, however. A consistent representation distinguishes world states that have distinct utilities, but perception systems with sufficient acuity to do this tend to also make many unnecessary distinctions.
In this paper we discuss reinforcement learning and the problem of appropriate perception. We then investigate a method for dealing with the problem, called the Lion algorithm [1], and show that it can be used to reduce complexity by decomposing perception. The Lion algorithm does not allow iterative rules to be learned, and we describe modifications that overcome this limitation. We present experimental results that demonstrate their effectiveness in further reducing complexity. Finally, we mention some related research, and conclude with suggestions for further work.
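The reinforcement learning setting the abstract refers to is standard temporal-difference learning of state-action utilities [20, 21, 23]. As a point of reference, the following is a minimal tabular Q-learning sketch, not the Lion algorithm itself; the corridor world, reward values, and learning parameters are illustrative assumptions. It shows the role the perceptual representation plays: here each world state maps to its own representational state, so the learned utilities are consistent.

```python
import random

# Toy corridor world: positions 0..4, reward only on reaching the goal.
# Perception is trivially consistent here (one representational state per
# world position), so Watkins-style Q-learning converges to a correct policy.
N_STATES = 5
ACTIONS = [-1, 1]                    # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3
GOAL = N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """World dynamics: move within the corridor; reward only at the goal."""
    s2 = max(0, min(GOAL, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for _ in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # one-step Q-learning update
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy policy after learning: move right (+1) at every non-goal position.
policy = {s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(GOAL)}
print(policy)
```

If perception instead collapsed distinct corridor positions into one representational state (the inconsistency the abstract describes), the single Q-value for that merged state could not reflect both positions' distinct utilities, and the learned policy would degrade accordingly.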
References
Stephen D. Whitehead and Dana H. Ballard, “Learning to perceive and act by trial and error,” Machine Learning, 7(1):45–83, July 1991.
David Chapman, “Planning for conjunctive goals,” Artificial Intelligence, 32:333–377, 1987.
Lonnie Chrisman and Reid Simmons, “Sensible planning: Focusing perceptual attention,” in Proc. 9th Nat. (USA) Conf. on AI, pages 756–761, Menlo Park, CA, July 1991. MIT Press.
R. James Firby, “An investigation into reactive planning in complex domains,” in Proc. 6th Nat. Conf. on Artificial Intelligence, pages 202–206, San Mateo, CA, July 1987. Morgan Kaufmann.
Michael P. Georgeff and Amy L. Lansky, “Reactive reasoning and planning,” in Proc. 6th Nat. Conf. on Artificial Intelligence, pages 677–682, San Mateo, CA, July 1987. Morgan Kaufmann.
Philip E. Agre and David Chapman, “Pengi: An implementation of a theory of action,” in Proc. 6th Nat. (USA) Conf. on AI, pages 268–272, San Mateo, CA, July 1987. Morgan Kaufmann.
Rodney A. Brooks, “A robust layered control system for a mobile robot,” IEEE Journal of Robotics and Automation, RA-2(1):14–23, March 1986.
Stanley J. Rosenschein, “Formal theories of knowledge in AI and robotics,” New Generation Computing, 3(4):345–357, 1985.
Rodney A. Brooks, “Intelligence without reason,” in Proc. 12th Int. Joint Conf. on AI, pages 569–595, San Mateo, CA, 1991. Morgan Kaufmann.
Rodney A. Brooks, “Elephants don't play chess,” in Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 3–15. MIT Press, Cambridge, MA, 1990.
Rodney A. Brooks, “Challenges for complete creature architectures,” in Jean-Arcady Meyer and Stewart W. Wilson, editors, From Animals to Animats: Proc. 1st Int. Conf. on Simulation of Adaptive Behaviour, pages 434–443, Cambridge, MA, 1991. MIT Press.
Sridhar Mahadevan and Jonathan Connell, “Automatic programming of behaviour-based robots using reinforcement learning,” Artificial Intelligence, 55:311–365, 1992.
Jon Doyle, “Rationality and its role in reasoning,” in Proc. 8th Nat. (USA) Conf. on AI, AAAI-90, pages 1093–1100, Cambridge, MA, 1990. MIT Press.
Pattie Maes and Rodney A. Brooks, “Learning to coordinate behaviours,” in Proc. 8th Nat. (USA) Conf. on AI, pages 796–802, Menlo Park, CA, July 1990. MIT Press.
Richard S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, 3:9–43, 1988.
Christopher J. C. H. Watkins, “Learning from Delayed Rewards,” PhD thesis, Department of Computer Science, University of Cambridge, Cambridge, U.K., 1989.
Richard Bellman and Stuart Dreyfus, “Applied Dynamic Programming,” Princeton University Press, Princeton, NJ, 1962.
Christopher J. C. H. Watkins and Peter Dayan, “Q-learning,” Machine Learning, 8(3/4):279–292, 1992.
Douglas B. Lenat and John Seely Brown, “Why AM and EURISKO appear to work,” Artificial Intelligence, 23:269–294, 1984.
Matthew T. Mason, “Kicking the sensing habit,” AI Magazine, 14(1):58–59, 1993.
David P. Miller, “A twelve step program to more efficient robots,” AI Magazine, 14(1):60–63, 1993.
Long-Ji Lin, “Self-improving reactive agents based on reinforcement learning, planning and teaching,” Machine Learning, 8(3/4):293–321, 1992.
Ming Tan, “Cost-sensitive reinforcement learning for adaptive classification and control,” in Proc. 9th Nat. Conf. on AI, pages 774–780, Menlo Park, CA, July 1991. MIT Press.
Piggott, P., Sattar, A. Reinforcement learning of iterative behaviour with multiple sensors. Appl Intell 4, 351–365 (1994). https://doi.org/10.1007/BF00872474