Artificial Intelligence Review, Volume 11, Issue 1–5, pp 75–113

Locally Weighted Learning for Control

  • Christopher G. Atkeson
  • Andrew W. Moore
  • Stefan Schaal

Abstract

Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys the ways in which we have applied locally weighted learning, a form of lazy learning, to control tasks. We describe the various forms that control tasks can take and how each affects the choice of learning paradigm. The discussion section explores the interesting consequences that explicitly remembering all previous experiences has for the problem of learning to control.
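The core technique surveyed here, locally weighted regression, can be sketched briefly: for each query point, training examples are weighted by a kernel of their distance to the query, and a linear model is fit by weighted least squares and evaluated at the query. The sketch below is illustrative only; the function name, Gaussian kernel choice, and bandwidth value are assumptions for this example, not the authors' implementation.

```python
# Minimal sketch of locally weighted (linear) regression.
# Names and the bandwidth value are illustrative assumptions.
import numpy as np

def lwr_predict(X, y, query, bandwidth=0.3):
    """Predict y at `query` with a locally weighted linear model.

    X : (n, d) training inputs, y : (n,) training targets.
    Each training point gets a Gaussian kernel weight based on its
    distance to the query; a local linear model is then fit by
    weighted least squares and evaluated at the query point.
    """
    # Gaussian kernel weights centered on the query
    d2 = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Augment inputs with a bias column for the local linear model
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    # Weighted least squares via row-scaling by sqrt(w)
    beta, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A,
                               np.sqrt(w) * y, rcond=None)
    return np.append(query, 1.0) @ beta

# Toy usage: learn y = sin(x) from noisy-free samples, query at x = 1.0
rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(200, 1))
y = np.sin(X[:, 0])
print(lwr_predict(X, y, np.array([1.0])))  # close to sin(1.0) ≈ 0.841
```

Because the fit is deferred until a query arrives and all training data are retained, this is a "lazy" or memory-based learner in the paper's sense; the bandwidth controls how local the linear model is.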

Keywords: locally weighted regression · LOESS · LWR · lazy learning · memory-based learning · least commitment learning · forward models · inverse models · linear quadratic regulation (LQR) · shifting setpoint algorithm · dynamic programming



Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Christopher G. Atkeson (1, 2)
  • Andrew W. Moore (3)
  • Stefan Schaal (1, 2)
  1. College of Computing, Georgia Institute of Technology, Atlanta
  2. ATR Human Information Processing Research Laboratories, Kyoto, Japan
  3. Carnegie Mellon University, Pittsburgh
