Multistrategy Learning for Robot Behaviours

  • Claude Sammut
  • Tak Fai Yik
Part of the Studies in Computational Intelligence book series (SCI, volume 262)

Abstract

Pure reinforcement learning does not scale well to domains with many degrees of freedom, particularly continuous domains. In this paper, we introduce a hybrid method in which a symbolic planner constructs an approximate solution to a control problem. A numerical optimisation algorithm then refines the qualitative plan into an operational policy. The method is demonstrated on the problem of learning a stable walking gait for a bipedal robot, and we use this example to illustrate the benefits of a multistrategy approach to robot learning.
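
To make the pipeline in the abstract concrete, the sketch below shows one way a plan-then-refine system can be structured: a symbolic planner supplies a qualitative plan (an ordered set of gait phases, each constraining a control parameter to an interval), and a numerical optimiser searches within those intervals for an operational policy. This is a minimal illustrative sketch, not the authors' implementation; the plan structure, the simulate_gait stand-in for a physics simulator, and the (1+1) hill climber are all assumptions introduced here.

```python
# A minimal, hypothetical sketch of plan-then-refine: not the authors'
# implementation. The plan structure, simulate_gait stand-in, and the
# (1+1) hill climber are illustrative assumptions.
import random

# Step 1: a symbolic planner produces a qualitative plan -- an ordered list
# of gait phases, each constraining one control parameter to an interval
# (in radians) instead of fixing an exact value.
qualitative_plan = [
    {"phase": "shift_weight", "bounds": (0.05, 0.20)},   # hip roll
    {"phase": "lift_leg",     "bounds": (0.30, 0.60)},   # knee pitch
    {"phase": "swing_leg",    "bounds": (0.10, 0.40)},   # hip pitch
    {"phase": "plant_foot",   "bounds": (-0.15, 0.05)},  # ankle pitch
]

def simulate_gait(params):
    """Stand-in for a physics simulator: returns a stability score.
    Here it simply rewards parameters near the centre of each interval."""
    score = 0.0
    for step, p in zip(qualitative_plan, params):
        lo, hi = step["bounds"]
        score -= (p - (lo + hi) / 2.0) ** 2
    return score

def refine(plan, iterations=1000, sigma=0.02):
    """Step 2: numerical optimisation. A (1+1) hill climber perturbs the
    parameters, clamps them to the planner's intervals, and keeps any
    change that improves the simulated gait."""
    bounds = [step["bounds"] for step in plan]
    params = [random.uniform(lo, hi) for lo, hi in bounds]
    best = simulate_gait(params)
    for _ in range(iterations):
        candidate = [min(hi, max(lo, p + random.gauss(0.0, sigma)))
                     for p, (lo, hi) in zip(params, bounds)]
        score = simulate_gait(candidate)
        if score > best:
            params, best = candidate, score
    return params

policy = refine(qualitative_plan)
print("refined gait parameters:", [round(p, 3) for p in policy])
```

The design point this illustrates is the one the abstract argues for: the planner shrinks the search to a small box of parameters, so the numerical stage faces a far easier problem than unconstrained reinforcement learning over the robot's full joint space.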


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Claude Sammut (1)
  • Tak Fai Yik (1)

  1. ARC Centre of Excellence for Autonomous Systems, School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
