Reinforcement learning of iterative behaviour with multiple sensors

Abstract

Reinforcement learning allows an agent to be both reactive and adaptive, but it requires a simple yet consistent representation of the task environment. In robotics this representation is the product of perception. Perception is a powerful simplifying mechanism because it ignores much of the complexity of the world by mapping multiple world states to each of a few representational states. The constraint of consistency conflicts with simplicity, however. A consistent representation distinguishes world states that have distinct utilities, but perception systems with sufficient acuity to do this tend to also make many unnecessary distinctions.
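
To make the aliasing problem concrete, the following sketch (all names and values hypothetical, not taken from the paper) shows a perception function that maps two world states with different utilities onto a single representational state; any single value assigned to that percept is inconsistent with at least one of the underlying states:

    # Hypothetical example of perceptual aliasing: two distinct world states
    # with different utilities collapse to the same representational state.
    WORLD_UTILITY = {
        "box_ahead_empty": +1.0,   # grasping here is rewarded
        "box_ahead_full":  -1.0,   # grasping here is penalised
    }

    def perceive(world_state):
        """A coarse sensor that reports only 'box_ahead', discarding contents."""
        return "box_ahead" if world_state.startswith("box_ahead") else "clear"

    # Both world states produce the same percept, so one representational
    # state must stand in for two states with distinct utilities.
    print({perceive(s) for s in WORLD_UTILITY})   # {'box_ahead'}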

In this paper we discuss reinforcement learning and the problem of appropriate perception. We then investigate a method for dealing with the problem, called the Lion algorithm [1], and show that it can be used to reduce complexity by decomposing perception. The Lion algorithm does not allow iterative rules to be learned, and we describe modifications that overcome this limitation. We present experimental results that demonstrate their effectiveness in further reducing complexity. Finally, we mention some related research, and conclude with suggestions for further work.
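
As background for the learning framework discussed here, a minimal tabular one-step Q-learning loop in the style of Watkins [16, 18] is sketched below; the environment interface (reset/step/actions) and all parameter values are illustrative assumptions, not the paper's experimental setup:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Tabular one-step Q-learning with an epsilon-greedy policy.

        `env` is assumed to expose reset() -> state, step(action) ->
        (next_state, reward, done), and a discrete list `env.actions`.
        """
        Q = defaultdict(float)                  # Q[(state, action)] -> estimated return
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Explore with probability epsilon, otherwise act greedily.
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # One-step temporal-difference update toward the bootstrapped target.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

The aliasing problem described above shows up here directly: if `state` is a percept that conflates world states with different returns, the single entry Q[(state, action)] cannot converge to a value that is correct for both.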

References

  1. Stephen D. Whitehead and Dana H. Ballard, "Learning to perceive and act by trial and error," Machine Learning, 7(1):45–83, July 1991.

  2. David Chapman, "Planning for conjunctive goals," Artificial Intelligence, 32:333–377, 1987.

  3. Lonnie Chrisman and Reid Simmons, "Sensible planning: Focusing perceptual attention," in Proc. 9th Nat. (USA) Conf. on AI, pages 756–761, Menlo Park, CA, July 1991. MIT Press.

  4. R. James Firby, "An investigation into reactive planning in complex domains," in Proc. 6th Nat. Conf. on Artificial Intelligence, pages 202–206, San Mateo, CA, July 1987. Morgan Kaufmann.

  5. Michael P. Georgeff and Amy L. Lansky, "Reactive reasoning and planning," in Proc. 6th Nat. Conf. on Artificial Intelligence, pages 677–682, San Mateo, CA, July 1987. Morgan Kaufmann.

  6. Philip E. Agre and David Chapman, "Pengi: An implementation of a theory of action," in Proc. 6th Nat. (USA) Conf. on AI, pages 268–272, San Mateo, CA, July 1987. Morgan Kaufmann.

  7. Rodney A. Brooks, "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation, RA-2(1):14–23, March 1986.

  8. Stanley J. Rosenschein, "Formal theories of knowledge in AI and robotics," New Generation Computing, 3(4):345–357, 1985.

  9. Rodney A. Brooks, "Intelligence without reason," in Proc. 12th Int. Joint Conf. on AI, pages 569–595, San Mateo, CA, 1991. Morgan Kaufmann.

  10. Rodney A. Brooks, "Elephants don't play chess," in Pattie Maes, editor, Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, pages 3–15. MIT Press, Cambridge, MA, 1990.

  11. Rodney A. Brooks, "Challenges for complete creature architectures," in Jean-Arcady Meyer and Stewart W. Wilson, editors, From Animals to Animats: Proc. 1st Int. Conf. on Simulation of Adaptive Behaviour, pages 434–443, Cambridge, MA, 1991. MIT Press.

  12. Sridhar Mahadevan and Jonathan Connell, "Automatic programming of behaviour-based robots using reinforcement learning," Artificial Intelligence, 55:311–365, 1992.

  13. Jon Doyle, "Rationality and its role in reasoning," in Proc. 8th Nat. (USA) Conf. on AI, AAAI-90, pages 1093–1100, Cambridge, MA, 1990. MIT Press.

  14. Pattie Maes and Rodney A. Brooks, "Learning to coordinate behaviours," in Proc. 8th Nat. (USA) Conf. on AI, pages 796–802, Menlo Park, CA, July 1990. MIT Press.

  15. Richard S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, 3:9–43, 1988.

  16. Christopher J. C. H. Watkins, “Learning from Delayed Rewards,” PhD thesis, Department of Computer Science, University of Cambridge, Cambridge, U.K., 1989.

  17. Richard Bellman and Stuart Dreyfus, “Applied Dynamic Programming,” Princeton University Press, Princeton, NJ, 1962.

  18. Christopher J. C. H. Watkins and Peter Dayan, "Q-learning," Machine Learning, 8(3/4):279–292, 1991.

  19. Douglas B. Lenat and John Seely Brown, "Why AM and EURISKO appear to work," Artificial Intelligence, 23:269–294, 1984.

  20. Matthew T. Mason, "Kicking the sensing habit," AI Magazine, 14(1):58–59, 1993.

  21. David P. Miller, "A twelve step program to more efficient robots," AI Magazine, 14(1):60–63, 1993.

  22. Long-Ji Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, 8(3/4):293–321, 1991.

  23. Ming Tan, "Cost-sensitive reinforcement learning for adaptive classification and control," in Proc. 9th Nat. Conf. on AI, pages 774–780, Menlo Park, CA, July 1991. MIT Press.

Cite this article

Piggott, P., Sattar, A. Reinforcement learning of iterative behaviour with multiple sensors. Appl Intell 4, 351–365 (1994). https://doi.org/10.1007/BF00872474
