Machine Learning, Volume 28, Issue 1, pp 77–104

CHILD: A First Step Towards Continual Learning

  • Mark B. Ring

Abstract

Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
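
The abstract describes the continual-learning setting (learn a task, then transfer what was learned to a harder task and master it faster) rather than CHILD's internal mechanism, which the paper develops in full. Purely as a minimal sketch of that setting, and not of CHILD itself, the Python fragment below warm-starts a plain tabular Q-learning agent on a longer corridor task using the value table learned on a shorter one; the corridor environment, the train helper, and all hyperparameters are hypothetical choices made only for this illustration.

    # Illustrative sketch only: NOT the CHILD algorithm. It shows the
    # continual-learning protocol from the abstract: train on an easy task,
    # then reuse what was learned to speed up a harder, similar task.
    import random

    def train(length, Q=None, episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
        # Corridor of `length` cells; actions: 0 = left, 1 = right; reward 1 at the right end.
        if Q is None:
            Q = {}
        steps_per_episode = []
        for _ in range(episodes):
            s, steps = 0, 0
            while s < length - 1:
                q = Q.setdefault(s, [0.0, 0.0])
                greedy = [i for i, v in enumerate(q) if v == max(q)]
                a = random.randrange(2) if random.random() < eps else random.choice(greedy)
                s2 = max(0, s - 1) if a == 0 else s + 1
                r = 1.0 if s2 == length - 1 else 0.0
                q2 = Q.setdefault(s2, [0.0, 0.0])
                q[a] += alpha * (r + gamma * max(q2) - q[a])  # one-step Q-learning update
                s, steps = s2, steps + 1
            steps_per_episode.append(steps)
        return Q, steps_per_episode

    random.seed(0)
    _, scratch = train(length=12)               # long corridor, learned from scratch
    Q_short, _ = train(length=6)                # short corridor, learned first
    _, transfer = train(length=12, Q=Q_short)   # long corridor, warm-started
    print("steps in first 10 episodes, from scratch :", sum(scratch[:10]))
    print("steps in first 10 episodes, with transfer:", sum(transfer[:10]))

With the warm start, the agent already prefers moving toward the goal in the states it has seen before, so its early episodes on the harder task are typically much shorter than when learning from scratch; this is the kind of speed-up the abstract attributes to CHILD, obtained here with a deliberately simple stand-in learner.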

Keywords: Continual learning, transfer, reinforcement learning, sequence learning, hierarchical neural networks

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Mark B. Ring
  1. Adaptive Systems Research Group, GMD–German National Research Center for Information Technology, Schloß Birlinghoven, Sankt Augustin, Germany
