Transfer of learning by composing solutions of elemental sequential tasks

Abstract

Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focused on single tasks. In this paper I consider a class of sequential decision tasks (SDTs), called composite sequential decision tasks, formed by temporally concatenating a number of elemental sequential decision tasks. Elemental SDTs cannot be decomposed into simpler SDTs. I consider a learning agent that has to learn to solve a set of elemental and composite SDTs. I assume that the structure of the composite tasks is unknown to the learning agent. The straightforward application of reinforcement learning to multiple tasks requires learning the tasks separately, which can waste computational resources, both memory and time. I present a new learning algorithm and a modular architecture that learns the decomposition of composite SDTs, and achieves transfer of learning by sharing the solutions of elemental SDTs across multiple composite SDTs. The solution of a composite SDT is constructed by computationally inexpensive modifications of the solutions of its constituent elemental SDTs. I provide a proof of one aspect of the learning algorithm.
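The compositional idea in the abstract lends itself to a toy illustration. Below is a minimal Python sketch, under loud assumptions, of just one ingredient: tabular Q-learning modules for elemental "reach subgoal g" tasks in a one-dimensional corridor, whose stored solutions are reused in sequence to solve a composite task with no further learning. The corridor environment, the subgoal reward of 1, and every name (QModule, train, step) are illustrative inventions, not the paper's architecture; in particular, the paper's modular mechanism that learns the decomposition of a composite task is replaced here by a hand-supplied subgoal sequence.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1
N = 8                                    # corridor states 0 .. N-1 (assumed toy world)
ACTIONS = (-1, +1)                       # step left / step right

def step(s, a):
    """Deterministic corridor dynamics, clipped at the walls."""
    return min(max(s + a, 0), N - 1)

class QModule:
    """Tabular Q-learning module for one elemental task: 'reach state g'."""
    def __init__(self, g):
        self.g = g
        self.q = defaultdict(float)      # (state, action) -> value

    def act(self, s, explore=True):
        if explore and random.random() < EPS:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, s, a, s2):
        """One Q-learning backup; reward of 1 only on reaching the subgoal."""
        done = (s2 == self.g)
        target = 1.0 if done else GAMMA * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += ALPHA * (target - self.q[(s, a)])
        return done

def train(module, episodes=300):
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(4 * N):           # cap episode length
            a = module.act(s)
            s2 = step(s, a)
            if module.update(s, a, s2):
                break
            s = s2

# Learn the elemental tasks once: reach state 2, reach state 6.
modules = {g: QModule(g) for g in (2, 6)}
for m in modules.values():
    train(m)

# Composite task 'visit 2, then 6': composed from the stored elemental
# solutions by switching modules at each subgoal, with no new learning.
s = 0
for g in (2, 6):
    for _ in range(4 * N):               # safety cap; the greedy path is direct
        if s == g:
            break
        s = step(s, modules[g].act(s, explore=False))
print("composite task final state:", s)  # expect 6
```

The point of the sketch is that the composite task costs nothing beyond executing the elemental solutions in order, which mirrors the transfer claim in the abstract; the paper's actual contribution additionally covers learning which elemental module applies at each stage when the composite task's structure is unknown.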



About this article

Cite this article

Singh, S.P. Transfer of learning by composing solutions of elemental sequential tasks. Mach Learn 8, 323–339 (1992). https://doi.org/10.1007/BF00992700


Keywords

  • Reinforcement learning
  • Compositional learning
  • Modular architecture
  • Transfer of learning