Machine Learning, Volume 8, Issue 3–4, pp 323–339

Transfer of learning by composing solutions of elemental sequential tasks

  • Satinder Pal Singh


Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focused on single tasks. In this paper I consider a class of sequential decision tasks (SDTs), called composite sequential decision tasks, formed by temporally concatenating a number of elemental sequential decision tasks. Elemental SDTs cannot be decomposed into simpler SDTs. I consider a learning agent that has to learn to solve a set of elemental and composite SDTs. I assume that the structure of the composite tasks is unknown to the learning agent. The straightforward application of reinforcement learning to multiple tasks requires learning the tasks separately, which can waste computational resources, both memory and time. I present a new learning algorithm and a modular architecture that learns the decomposition of composite SDTs, and achieves transfer of learning by sharing the solutions of elemental SDTs across multiple composite SDTs. The solution of a composite SDT is constructed by computationally inexpensive modifications of the solutions of its constituent elemental SDTs. I provide a proof of one aspect of the learning algorithm.
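The compositional idea described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual architecture: it uses plain tabular Q-learning on a one-dimensional corridor, trains one Q-table per elemental task (reach a fixed subgoal), and solves a composite task by executing the elemental greedy policies in sequence. Unlike the algorithm in the paper, the decomposition of the composite task is given here rather than learned; all names (`q_learn`, `run_policy`, the subgoal states) are illustrative.

```python
import random

N = 8                 # corridor states 0 .. 7
ACTIONS = [-1, +1]    # move left, move right

def step(s, a):
    """Deterministic corridor dynamics, clipped at the walls."""
    return min(max(s + a, 0), N - 1)

def q_learn(goal, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning for one elemental task: reach `goal`."""
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(50):
            if s == goal:
                break
            if random.random() < eps:
                ai = random.randrange(2)            # explore
            else:
                ai = max((0, 1), key=lambda i: Q[s][i])  # exploit
            s2 = step(s, ACTIONS[ai])
            r = 1.0 if s2 == goal else -0.1
            Q[s][ai] += alpha * (r + gamma * max(Q[s2]) - Q[s][ai])
            s = s2
    return Q

def run_policy(Q, s, goal, limit=50):
    """Follow the greedy policy of one elemental module to its subgoal."""
    traj = [s]
    while s != goal and len(traj) < limit:
        ai = max((0, 1), key=lambda i: Q[s][i])
        s = step(s, ACTIONS[ai])
        traj.append(s)
    return traj

random.seed(0)
Q_a = q_learn(goal=3)   # elemental task A: reach state 3
Q_b = q_learn(goal=7)   # elemental task B: reach state 7

# Composite task "A then B": reuse the elemental Q-tables by switching
# modules when each subgoal is reached -- no relearning of the composite.
traj = run_policy(Q_a, 0, 3)
traj += run_policy(Q_b, traj[-1], 7)[1:]
print(traj)
```

The point of the sketch is the last four lines: the composite solution is assembled from the stored elemental solutions at negligible cost, which is the transfer the abstract refers to.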


Keywords: reinforcement learning; compositional learning; modular architecture; transfer of learning



Copyright information

© Kluwer Academic Publishers 1992

Authors and Affiliations

  • Satinder Pal Singh
    1. Department of Computer Science, University of Massachusetts, Amherst
