Machine Learning, Volume 21, Issue 3, pp 199–233

The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces

  • Andrew W. Moore
  • Christopher G. Atkeson

Abstract

Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that neither planning nor exploration occurs uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in fewer than ten trials and a few minutes.
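The abstract describes the core mechanism only in prose. The following is a minimal illustrative sketch, in Python, of the kind of kd-tree-style variable-resolution partitioning it refers to: axis-aligned cells of a continuous state-space are split in half along their longest axis, and only cells flagged as critical are refined further. The Cell and refine names, the criticality predicate, and the toy "wall" example are assumptions made for illustration; they are not taken from the paper.

# Illustrative sketch of variable-resolution partitioning (not the authors' code).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Cell:
    lo: List[float]   # lower corner of the axis-aligned cell
    hi: List[float]   # upper corner of the axis-aligned cell

    def split(self) -> "tuple[Cell, Cell]":
        """Split the cell in half along its longest axis (kd-tree style)."""
        axis = max(range(len(self.lo)), key=lambda d: self.hi[d] - self.lo[d])
        mid = 0.5 * (self.lo[axis] + self.hi[axis])
        left_hi = list(self.hi); left_hi[axis] = mid
        right_lo = list(self.lo); right_lo[axis] = mid
        return Cell(list(self.lo), left_hi), Cell(right_lo, list(self.hi))

def refine(cells: List[Cell],
           is_critical: Callable[[Cell], bool],
           max_cells: int = 64) -> List[Cell]:
    """Repeatedly split only critical cells, concentrating resolution there."""
    while len(cells) < max_cells:
        critical = [c for c in cells if is_critical(c)]
        if not critical:
            break
        c = critical[0]
        cells.remove(c)
        cells.extend(c.split())
    return cells

if __name__ == "__main__":
    # Toy criterion: a cell is "critical" if it straddles a wall at x = 0.3.
    # In parti-game the criterion would come from the planner, not a fixed predicate.
    straddles_wall = lambda c: c.lo[0] < 0.3 < c.hi[0]
    cells = refine([Cell([0.0, 0.0], [1.0, 1.0])], straddles_wall, max_cells=16)
    print(len(cells), "cells; smallest x-width =",
          min(c.hi[0] - c.lo[0] for c in cells))

In parti-game itself, the decision of which cells to split comes from a game-theoretic analysis of whether the agent can still reach the goal from a cell under worst-case outcomes; the sketch above substitutes a hand-written predicate only to keep the example self-contained.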

Keywords

Reinforcement Learning, Curse of Dimensionality, Learning Control, Robotics, kd-trees


Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Andrew W. Moore (1)
  • Christopher G. Atkeson (2)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh
  2. College of Computing, Georgia Institute of Technology, Atlanta
