Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Credit Assignment

  • Claude Sammut
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_185



When a learning system employs a complex decision process, it must assign credit or blame for the outcomes to each of its decisions. Where it is not possible to attribute an individual outcome directly to each decision, credit and blame must be apportioned among the combinations of decisions that contributed to the outcome. We distinguish two cases in the credit assignment problem. Temporal credit assignment refers to the assignment of credit for outcomes to actions. Structural credit assignment refers to the assignment of credit for actions to internal decisions. The first subproblem involves determining when the actions that deserve credit were taken, and the second involves assigning credit to the internal structure of actions (Sutton, 1984).
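The standard mechanism for temporal credit assignment is temporal-difference learning (Sutton & Barto, 1998), of which Q-learning (Watkins & Dayan, 1992) is a well-known instance. The sketch below is illustrative only: the four-state chain task, its reward scheme, and the hyperparameters are hypothetical choices, not taken from this entry. Reward arrives only on the final transition, yet the update rule propagates that credit back, episode by episode, to the earlier actions that led there.

```python
import random

# Temporal credit assignment via one-step Q-learning (Watkins & Dayan, 1992).
# Hypothetical 4-state chain: reward arrives only on entering the final
# state, yet the TD update propagates credit back to earlier actions.

N, GOAL = 4, 3                      # states 0..3; state 3 is terminal
ACTIONS = (0, 1)                    # 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3   # illustrative hyperparameters

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """Deterministic chain: only the transition into the goal is rewarded."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL

random.seed(1)
for _ in range(200):                       # episodes
    s = 0
    for _ in range(5000):                  # safety cap on episode length
        if random.random() < EPS:
            a = random.choice(ACTIONS)     # explore
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])  # exploit
        s2, r, done = step(s, a)
        # TD update: each update moves credit one transition backwards,
        # so actions far from the reward are eventually credited too.
        best_next = 0.0 if done else max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# Every non-terminal state ends up preferring 'right' (action 1), even
# though only the final transition ever yielded an immediate reward.
policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(GOAL)]
print(policy)
```

The learned action values decay geometrically with distance from the reward (here roughly 1.0, 0.9, 0.81 under the 0.9 discount), which is precisely how temporal credit is apportioned to earlier decisions.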


Consider the problem of learning to balance a pole that is hinged on a cart (Michie & Chambers, 1968; Anderson & Miller, 1991). The...
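For the structural subproblem, error back-propagation (Hinton, Rumelhart, & Williams, 1985) is the canonical mechanism: the chain rule divides the blame for an output error among the internal weights that produced it. The sketch below is a minimal, hypothetical illustration; the 2-2-1 network, the XOR task, and the learning rate are choices made here, not part of this entry.

```python
import math
import random

# Structural credit assignment via back-propagation (Hinton, Rumelhart,
# & Williams, 1985). Hypothetical 2-2-1 sigmoid network trained on XOR;
# the blame for the output error is divided among the hidden weights.

random.seed(0)
sig = lambda z: 1.0 / (1.0 + math.exp(-z))

W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # in->hidden (+bias)
W2 = [random.uniform(-1, 1) for _ in range(3)]                      # hidden->out (+bias)
LR = 0.5
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x1, x2):
    h = [sig(w[0] * x1 + w[1] * x2 + w[2]) for w in W1]
    return h, sig(W2[0] * h[0] + W2[1] * h[1] + W2[2])

def mse():
    return sum((forward(x1, x2)[1] - t) ** 2 for (x1, x2), t in DATA) / len(DATA)

mse_before = mse()
for _ in range(10000):
    for (x1, x2), t in DATA:
        h, y = forward(x1, x2)
        # Blame at the output unit...
        d_out = (y - t) * y * (1 - y)
        # ...is apportioned to each hidden unit in proportion to its
        # outgoing weight and its own sensitivity: structural credit.
        d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            W2[j] -= LR * d_out * h[j]
            W1[j][0] -= LR * d_hid[j] * x1
            W1[j][1] -= LR * d_hid[j] * x2
            W1[j][2] -= LR * d_hid[j]
        W2[2] -= LR * d_out
mse_after = mse()
print(round(mse_before, 3), "->", round(mse_after, 3))
```

The error on XOR falls as gradient descent distributes credit to the hidden layer; no single weight is told directly whether an output was right, only its share of the blame.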


Recommended Reading

  1. Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement and Control, Transactions ASME, 97(3), 220–227.
  2. Anderson, C. W., & Miller, W. T. (1991). A set of challenging control problems. In W. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control. Cambridge: MIT Press.
  3. Atkeson, C., Schaal, S., & Moore, A. (1997). Locally weighted learning. AI Review, 11, 11–73.
  4. Banerjee, B., Liu, Y., & Youngblood, G. M. (Eds.). (2006). Proceedings of the ICML workshop on "Structural knowledge transfer for machine learning." Pittsburgh, PA.
  5. Barto, A., Sutton, R., & Anderson, C. (1983). Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834–846.
  6. Benson, S., & Nilsson, N. J. (1995). Reacting, planning and learning in an autonomous agent. In K. Furukawa, D. Michie, & S. Muggleton (Eds.), Machine intelligence 14. Oxford: Oxford University Press.
  7. Bertsekas, D. P., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Nashua, NH: Athena Scientific.
  8. Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
  9. DeJong, G., & Mooney, R. (1986). Explanation-based learning: An alternative view. Machine Learning, 1, 145–176.
  10. Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Boston: Addison-Wesley Longman Publishing.
  11. Grefenstette, J. J. (1988). Credit assignment in rule discovery systems based on genetic algorithms. Machine Learning, 3(2–3), 225–245.
  12. Hinton, G., Rumelhart, D., & Williams, R. (1985). Learning internal representations by back-propagating errors. In D. Rumelhart, J. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318–362). Cambridge: MIT Press.
  13. Holland, J. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos: Morgan Kaufmann.
  14. Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence, 33(1), 1–64.
  15. Mahadevan, S. (2009). Learning representation and control in Markov decision processes: New frontiers. Foundations and Trends in Machine Learning, 1(4), 403–565.
  16. Michie, D., & Chambers, R. (1968). Boxes: An experiment in adaptive control. In E. Dale & D. Michie (Eds.), Machine intelligence 2. Edinburgh: Oliver and Boyd.
  17. Minsky, M. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49, 8–30.
  18. Mitchell, T. M., Keller, R. M., & Kedar-Cabelli, S. T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47–80.
  19. Mitchell, T. M., Utgoff, P. E., & Banerji, R. B. (1983). Learning by experimentation: Acquiring and refining problem-solving heuristics. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach. Palo Alto: Tioga.
  20. Moore, A. W. (1990). Efficient memory-based learning for robot control. Ph.D. thesis, UCAM-CL-TR-209, Computer Laboratory, University of Cambridge, Cambridge.
  21. Niculescu-Mizil, A., & Caruana, R. (2007). Inductive transfer for Bayesian network structure learning. In Proceedings of the 11th International Conference on AI and Statistics (AISTATS 2007). San Juan, Puerto Rico.
  22. Reid, M. D. (2004). Improving rule evaluation using multitask learning. In Proceedings of the 14th International Conference on Inductive Logic Programming (pp. 252–269). Porto, Portugal.
  23. Reid, M. D. (2007). DEFT guessing: Using inductive transfer to improve rule evaluation from limited data. Ph.D. thesis, School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.
  24. Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
  25. Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.
  26. Silver, D., Bakir, G., Bennett, K., Caruana, R., Pontil, M., Russell, S., et al. (2005). NIPS workshop on "Inductive transfer: 10 years later." Whistler, Canada.
  27. Sutton, R. (1984). Temporal credit assignment in reinforcement learning. Ph.D. thesis, Department of Computer and Information Science, University of Massachusetts, Amherst, MA.
  28. Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
  29. Taylor, M. E., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10, 1633–1685.
  30. Wang, X., Simon, H. A., Lehman, J. F., & Fisher, D. H. (1996). Learning planning operators by observation and practice. In Proceedings of the Second International Conference on AI Planning Systems (AIPS-94) (pp. 335–340). Chicago, IL.
  31. Watkins, C. (1989). Learning from delayed rewards. Ph.D. thesis, Psychology Department, University of Cambridge, Cambridge.
  32. Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

Copyright information

© Springer Science+Business Media, LLC 2011
