Intrinsic Motivation and Reinforcement Learning

Chapter

Abstract

Psychologists distinguish between extrinsically motivated behavior, which is behavior undertaken to achieve some externally supplied reward, such as a prize, a high grade, or a high-paying job, and intrinsically motivated behavior, which is behavior done for its own sake. Is an analogous distinction meaningful for machine learning systems? Can we say of a machine learning system that it is motivated to learn, and if so, is it possible to provide it with an analog of intrinsic motivation? Despite the fact that a formal distinction between extrinsic and intrinsic motivation is elusive, this chapter argues that the answer to both questions is assuredly “yes” and that the machine learning framework of reinforcement learning is particularly appropriate for bringing learning together with what in animals one would call motivation. Despite the common perception that a reinforcement learning agent’s reward has to be extrinsic because the agent has a distinct input channel for reward signals, reinforcement learning provides a natural framework for incorporating principles of intrinsic motivation.

References

  1. .
    Ackley, D.H., Littman, M.: Interactions between learning and evolution. In: Langton, C., Taylor, C., Farmer, C., Rasmussen, S. (eds.) Artificial Life II (Proceedings Volume X in the Santa Fe Institute Studies in the Sciences of Complexity, pp. 487–509. Addison-Wesley, Reading (1991)Google Scholar
  2. .
    Andry, P., Gaussier, P., Nadel, J., Hirsbrunner, B.: Learning invariant sensorimotor behaviors: A developmental approach to imitation mechanisms. Adap. Behav. 12, 117–140 (2004)CrossRefGoogle Scholar
  3. .
    Arkes, H.R., Garske, J.P.: Psychological Theories of Motivation. Brooks/Cole, Monterey (1982)Google Scholar
  4. .
    Baranes, A., Oudeyer, P.-Y.: Intrinsically motivated goal exploration for active motor learning in robots: A case study. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), Taipei, Taiwan 2010Google Scholar
  5. .
    Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discr. Event Dynam. Syst. Theory Appl. 13, 341–379 (2003)CrossRefMathSciNetGoogle Scholar
  6. .
    Barto, A.G., Singh, S., Chentanez, N.: Intrinsically motivated learning of hierarchical collections of skills. In: Proceedings of the International Conference on Developmental Learning (ICDL), La Jolla, CA 2004Google Scholar
  7. .
    Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike elements that can solve difficult learningcontrol problems. 13, 835–846 (1983). IEEE Trans. Sys. Man, Cybern. Reprinted in J.A. Anderson and E. Rosenfeld (eds.), Neurocomputing: Foundations of Research, pp. 535–549, MIT, Cambridge (1988)Google Scholar
  8. .
    Beck, R.C.: Motivation. Theories and Principles, 2nd edn. Prentice-Hall, Englewood Cliffs (1983)Google Scholar
  9. .
    Berlyne, D.E.: A theory of human curiosity. Br. J. Psychol. 45, 180–191 (1954)Google Scholar
  10. .
    Berlyne, D.E.: Conflict, Arousal., Curiosity. McGraw-Hill, New York (1960)Google Scholar
  11. .
    Berlyne, D.E.: Curiosity and exploration. Science 143, 25–33 (1966)Google Scholar
  12. .
    Berlyne, D.E.: Aesthetics and Psychobiology. Appleton-Century-Crofts, New York (1971)Google Scholar
  13. .
    Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)MATHGoogle Scholar
  14. .
    Bindra, D.: How adaptive behavior is produced: A perceptual-motivational alternative to response reinforcement. Behav. Brain Sci. 1, 41–91 (1978)CrossRefGoogle Scholar
  15. .
    Breazeal, C., Brooks, A., Gray, J., Hoffman, G., Lieberman, J., Lee, H., Lockerd, A., Mulanda, D.: Tutelage and collaboration for humanoid robots. Int. J. Human. Robot. 1 (2004)Google Scholar
  16. .
    Bush, V.: Science the endless frontier: Areport to the president. Technical report (1945)Google Scholar
  17. .
    Busoniu, L., Babuska, R., Schutter, B.D.: A comprehensive survey of multi-agent reinforcement learning. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 38(2), 156–172 (2008)CrossRefGoogle Scholar
  18. .
    Cannon, W.B.: The Wisdom of the Body. W.W. Norton, New York (1932)Google Scholar
  19. .
    Clark, W.A., Farley, B.G.: Generalization of pattern recognition in a self-organizing system. In: AFIPS’ 55 (Western) Proceedings of the March 1–3, 1955, Western Joint Computer Conference, Los Angeles, CA, pp. 86–91, ACM, New York (1955)Google Scholar
  20. .
    Cofer, C.N., Appley, M.H.: Motivation: Theory and Research. Wiley, New York (1964)Google Scholar
  21. .
    Damoulas, T., Cos-Aguilera, I., Hayes, G.M., Taylor, T.: Valency for adaptive homeostatic agents: Relating evolution and learning. In: Capcarrere, M.S., Freitas, A.A., Bentley, P.J., Johnson, C.G., Timmis, J. (eds.) Advances in Artificial Life: 8th European Conference, ECAL 2005. Canterbury, UK LNAI vol. 3630, pp. 936–945. Springer, Berlin (2005)CrossRefGoogle Scholar
  22. .
    Daw, N.D., Shohamy, D.: The cognitive neuroscience of motivation and learning. Soc. Cogn. 26(5), 593–620 (2008)CrossRefGoogle Scholar
  23. .
    Dayan, P.: Motivated reinforcement learning. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Conference, pp. 11–18. MIT, Cambridge (2001)Google Scholar
  24. .
    Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-Determination in Human Behavior. Plenum, New York (1985)CrossRefGoogle Scholar
  25. .
    Dember, W.N., Earl, R.W.: Analysis of exploratory, manipulatory, and curiosity behaviors. Psychol. Rev. 64, 91–96 (1957)CrossRefGoogle Scholar
  26. .
    Dember, W.N., Earl, R.W., Paradise, N.: Response by rats to differential stimulus complexity. J. Comp. Physiol. Psychol. 50, 514–518 (1957)CrossRefGoogle Scholar
  27. .
    Dickinson, A., Balleine, B.: The role of leaning in the operation of motivational systems. In: Gallistel, R. (ed.) Handbook of Experimental Psychology, 3rd edn. Learning, Motivation, and Emotion, pp. 497–533. Wiley, New York (2002)Google Scholar
  28. .
    Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adap. Behav. 16, 400–412 (2008)CrossRefGoogle Scholar
  29. .
    Epstein, A.: Instinct and motivation as explanations of complex behavior. In: Pfaff, D.W. (ed.) The Physiological Mechanisms of Motivation. Springer, New York (1982)Google Scholar
  30. .
    Friston, K.J., Daunizeau, J., Kilner, J., Kiebel, S.J.: Action and behavior: A free-energy formulation. Biol. Cybern. (2010). Pubished online February 11, 2020Google Scholar
  31. .
    Groos, K.: The Play of Man. D. Appleton, New York (1901)CrossRefGoogle Scholar
  32. .
    Harlow, H.F.: Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys. J. Comp. Physiol. Psychol. 43, 289–294 (1950)CrossRefGoogle Scholar
  33. .
    Harlow, H.F., Harlow, M.K., Meyer, D.R.: Learning motivated by a manipulation drive. J. Exp. Psychol. 40, 228–234 (1950)CrossRefGoogle Scholar
  34. .
    Hart, S., Grupen, R.: Intrinsically motivated affordance discovery and modeling. In: Baldassarre, G., Mirolli, M. (eds.) Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin (2012, this volume)Google Scholar
  35. .
    Hebb, D.O.: The Organization of Behavior. Wiley, New York (1949)Google Scholar
  36. .
    Hendrick, I.: Instinct and ego during infancy. Psychoanal. Quart. 11, 33–58 (1942)Google Scholar
  37. .
    Hesse, F., Der, R., Herrmann, M., Michael, J.: Modulated exploratory dynamics can shape self-organized behavior. Adv. Complex Syst. 12(2), 273–292 (2009)CrossRefMathSciNetMATHGoogle Scholar
  38. .
    Hull, C.L.: Principles of Behavior. D. Appleton-Century, New York (1943)Google Scholar
  39. .
    Hull, C.L.: Essentials of Behavior. Yale University Press, New Haven (1951)Google Scholar
  40. .
    Hull, C.L.: A Behavior System: An Introduction to Behavior Theory Concerning the Individual Organism. Yale University Press, New Haven (1952)Google Scholar
  41. .
    Kimble, G.A.: Hilgard and Marquis’ Conditioning and Learning. Appleton-Century-Crofts, Inc., New York (1961)Google Scholar
  42. .
    Klein, S.B.: Motivation. Biosocial Approaches. McGraw-Hill, New York (1982)Google Scholar
  43. .
    Klopf, A.H.: Brain function and adaptive systems—A heterostatic theory. Technical report AFCRL-72-0164, Air Force Cambridge Research Laboratories, Bedford. A summary appears in Proceedings of the International Conference on Systems, Man, and Cybernetics, 1974, IEEE Systems, Man, and Cybernetics Society, Dallas (1972)Google Scholar
  44. .
    Klopf, A.H.: The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence. Hemisphere, Washington (1982)Google Scholar
  45. .
    Lenat, D.B.: AM: An artificial intelligence approach to discovery in mathematics. Ph.D. Thesis, Stanford University (1976)Google Scholar
  46. .
    Linden, D.J.: The Compass of Pleasure: How Our Brains Make Fatty Foods, Orgasm, Exercise, Marijuana, Generosity, Vodka, Learning, and Gambling Feel So Good. Viking, New York (2011)Google Scholar
  47. .
    Littman, M.L., Ackley, D.H.: Adaptation in constant utility nonstationary environments. In: Proceedings of the Fourth International Conference on Genetic Algorithms, San Diego, CA pp. 136–142 (1991)Google Scholar
  48. .
    Lungarella, M., Metta, G., Pfeiffer, R., Sandini, G.: Developmental robotics: A survey. Connect. Sci. 15, 151–190 (2003)CrossRefGoogle Scholar
  49. .
    Mackintosh, N.J.: Conditioning and Associative Learning. Oxford University Press, New York (1983)Google Scholar
  50. .
    McFarland, D., Bösser, T.: Intelligent Behavior in Animals and Robots. MIT, Cambridge (1993)Google Scholar
  51. .
    Mendel, J.M., Fu, K.S. (eds.): Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications. Academic, New York (1970)MATHGoogle Scholar
  52. .
    Mendel, J.M., McLaren, R.W.: Reinforcement learning control and pattern recognition systems. In: Mendel, J.M., Fu, K.S. (eds.) Adaptive, Learning and Pattern Recognition Systems:Theory and Applications, pp. 287–318. Academic, New York (1970)CrossRefGoogle Scholar
  53. .
    Michie, D., Chambers, R.A.: BOXES: An experiment in adaptive control. In: Dale, E., Michie, D. (eds.) Machine Intelligence 2, pp. 137–152. Oliver and Boyd, Edinburgh (1968)Google Scholar
  54. .
    Minsky, M.L.: Theory of neural-analog reinforcement systems and its application to the brain-model problem. Ph.D. Thesis, Princeton University (1954)Google Scholar
  55. .
    Minsky, M.L.: Steps toward artificial intelligence. Proc. Inst. Radio Eng. 49, 8–30 (1961). Reprinted in E.A. Feigenbaum and J. Feldman (eds.) Computers and Thought, pp. 406–450. McGraw-Hill, New York (1963)Google Scholar
  56. .
    Mollenauer, S.O.: Shifts in deprivations level: Different effects depending on the amount of preshift training. Learn. Motiv. 2, 58–66 (1971)CrossRefGoogle Scholar
  57. .
    Narendra, K., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice Hall, Englewood Cliffs (1989)Google Scholar
  58. .
    Olds, J., Milner, P.: Positive reinforcement produced by electrical stimulation of septal areas and other regions of rat brain. J. Comp. Physiol. Psychol. 47, 419–427 (1954)CrossRefGoogle Scholar
  59. .
    Oudeyer, P.-Y., Kaplan, F.: What is intrinsic motivation? A typology of computational approaches. Front. Neurorobot. 1:6, doi: 10.3389/neuro.12.006.2007 (2007)Google Scholar
  60. .
    Oudeyer, P.-Y., Kaplan, F., Hafner, V.: Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11, 265–286 (2007)CrossRefGoogle Scholar
  61. .
    Petri, H.L.: Motivation: Theory and Research. Wadsworth Publishing Company, Belmont (1981)Google Scholar
  62. .
    Piaget, J.: The Origins of Intelligence in Children. Norton, New York (1952)CrossRefGoogle Scholar
  63. .
    Picard, R.W.: Affective Computing. MIT, Cambridge (1997)CrossRefGoogle Scholar
  64. .
    Prince, C.G., Demiris, Y., Marom, Y., Kozima, H., Balkenius, C. (eds.): Proceedings of the Second International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, vol. 94. Lund University, Lund (2001)Google Scholar
  65. .
    Rescorla, R.A., Wagner, A.R.: A theory of Pavlovian conditioning: Variationsin the effectiveness of reinforcement and nonreinforcement. In: Black, A.H., Prokasy, W.F. (eds.) Classical Conditioning, vol. II, pp. 64–99. Appleton-Century-Crofts, New York (1972)Google Scholar
  66. .
    Rosenblatt, F.: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington (1962)MATHGoogle Scholar
  67. .
    Rumelhart, D., Hintont, G., Williams, R.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)CrossRefGoogle Scholar
  68. .
    Ryan, R.M., Deci, E.L.: Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25, 54–67 (2000)CrossRefGoogle Scholar
  69. .
    Samuelson, L.: Introduction to the evolution of preferences. J. Econ. Theory 97, 225–230 (2001)CrossRefMathSciNetMATHGoogle Scholar
  70. .
    Samuelson, L., Swinkels, J.: Information, evolution, and utility. Theor. Econ. 1, 119–142 (2006)Google Scholar
  71. .
    Savage, T.: Artificial motives: A review of motivation in artificial creatures. Connect. Sci. 12, 211–277 (2000)CrossRefGoogle Scholar
  72. .
    Schembri, M., Mirolli, M., Baldassarre, G.: Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In: Proceedings of the 6th International Conference on Development and Learning (ICDL2007), Imperial College, London 2007Google Scholar
  73. .
    Schmidhuber, J.: Adaptive confidence and adaptive curiosity. Technical report FKI-149-91, Institut für Informatik, Technische Universität München (1991a)Google Scholar
  74. .
    Schmidhuber, J.: A possibility for implementing curiosity and boredom in model-building neural controllers. In: From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pp. 222–227. MIT, Cambridge (1991b)Google Scholar
  75. .
    Schmidhuber, J.: What’s interesting? Technical report TR-35-97. IDSIA, Lugano (1997)Google Scholar
  76. .
    Schmidhuber, J.: Artificial curiosity based on discovering novel algorithmic predictability through coevolution. In: Proceedings of the Congress on Evolutionary Computation, vol. 3, pp. 1612–1618. IEEE (1999)Google Scholar
  77. .
    Schmidhuber, J.: Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In: Pezzulo, G., Butz, M.V., Sigaud, O., Baldassarre, G. (eds.) Anticipatory Behavior in Adaptive Learning Systems. From Psychological Theories to Artificial Cognitive Systems, pp. 48–76. Springer, Berlin (2009)CrossRefGoogle Scholar
  78. .
    Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80(1), 1–27 (1998)Google Scholar
  79. .
    Schultz, W.: Reward. Scholarpedia 2(3), 1652 (2007a)CrossRefGoogle Scholar
  80. .
    Schultz, W.: Reward signals. Scholarpedia 2(6), 2184 (2007b)CrossRefGoogle Scholar
  81. .
    Scott, P.D., Markovitch, S.: Learning novel domains through curiosity and conjecture. In: Sridharan, N.S. (ed.) Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, MI pp. 669–674. Morgan Kaufmann, San Francisco (1989)Google Scholar
  82. .
    Settles, B.: Active learning literature survey. Technical Report 1648, Computer Sciences, University of Wisconsin-Madison, Madison (2009)Google Scholar
  83. .
    Singh, S., Barto, A.G., Chentanez, N.: Intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference. MIT, Cambridge (2005)Google Scholar
  84. .
    Singh, S., Lewis, R.L., Barto, A.G.: Where do rewards come from? In: Taatgen, N., van Rijn, H. (eds.) Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam pp. 2601–2606. Cognitive Science Society (2009)Google Scholar
  85. .
    Singh, S., Lewis, R.L., Barto, A.G., Sorg, J.: Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. Auton. Mental Dev. 2(2), 70–82 (2010). Special issue on Active Learning and Intrinsically Motivated Exploration in Robots: Advances and ChallengesGoogle Scholar
  86. .
    Snel, M., Hayes, G.M.: Evolution of valence systems in an unstable environment. In: Proceedings of the 10th International Conference on Simulation of Adaptive Behavior: From Animals to Animats, Osaka, M. Asada, J.C. Hallam, J.-A. Meyer (Eds.) pp. 12–21 (2008)Google Scholar
  87. .
    Sorg, J., Singh, S., Lewis, R.L.: Internal rewards mitigate agent boundedness. In: Fürnkranz, J., Joachims, T. (eds.) Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, Omnipress pp. 1007–1014 (2010)Google Scholar
  88. .
    Sutton, R.S.: Reinforcement learning architectures for animats. In: From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, J.-A. Meyer, S.W.Wilson (Eds.) pp. 288–296. MIT, Cambridge (1991)Google Scholar
  89. .
    Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)Google Scholar
  90. .
    Sutton, R.S., Precup, D., Singh, S.: Between mdps and semi-mdps: A framework for temporal abstraction inreinforcement learning. Artif. Intell. 112, 181–211 (1999)CrossRefMathSciNetMATHGoogle Scholar
  91. .
    Tesauro, G.J.: TD—gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6(2), 215–219 (1994)CrossRefGoogle Scholar
  92. .
    Thomaz, A.L., Breazeal, C.: Transparency and socially guided machine learning. In: Proceedings of the 5th International Conference on Developmental Learning (ICDL) Bloomington, IN (2006)Google Scholar
  93. .
    Thomaz, A.L., Hoffman, G., Breazeal, C.: Experiments in socially guided machine learning: Understanding how humans teach. In: Proceedings of the 1st Annual conference on Human-Robot Interaction (HRI) Salt Lake City, UT (2006)Google Scholar
  94. .
    Thorndike, E.L.: Animal Intelligence. Hafner, Darien (1911)Google Scholar
  95. .
    Toates, F.M. (1911): Motivational Systems. Cambridge University Press, Cambridge (1911)Google Scholar
  96. .
    Tolman, E.C.: Purposive Behavior in Animals and Men. Naiburg, New York (1932)Google Scholar
  97. .
    Trappl, R., Petta, P., Payr, S. (eds.): Emotions in Humans and Artifacts. MIT, Cambridge (1997)Google Scholar
  98. .
    Uchibe, E., Doya, K.: Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. Neural Netw. 21(10), 1447–1455 (2008)CrossRefMATHGoogle Scholar
  99. .
    Waltz, M.D., Fu, K.S.: A heuristic approach to reinforcement learning control systems. IEEE Transactions on Automatic Control 10, 390–398 (1965)CrossRefGoogle Scholar
  100. .
    Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., Thelen, E.: Autonomous mental development by robots and animals. Science 291, 599–600 (2001)CrossRefGoogle Scholar
  101. .
    Werbos, P.J.: Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Trans. Sys. Man Cybern. 17, 7–20 (1987)CrossRefGoogle Scholar
  102. .
    White, R.W.: Motivation reconsidered: The concept of competence. Psychol. Rev. 66, 297–333 (1959)CrossRefGoogle Scholar
  103. .
    Widrow, B., Gupta, N.K., Maitra, S.: Punish/reward: Learning with a critic in adaptive thresholdsystems. IEEE Trans. Sys. Man Cybern. 3, 455–465 (1973)CrossRefMathSciNetMATHGoogle Scholar
  104. .
    Widrow, B., Hoff, M.E.: Adaptive switching circuits. In: 1960 WESCON Convention Record Part IV, pp. 96–104. Institute of Radio Engineers, New York (1960). Reprinted in J.A. Anderson and E. Rosenfeld, Neurocomputing: Foundations of Research, pp. 126–134. MIT, Cambridge (1988)Google Scholar
  105. .
    Young, P.T.: Hedonic organization and regulation of behavior. Psychol. Rev. 73, 59–86 (1966)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. 1.University of MassachussettsAmherstUSA
  2. 2.Institute of Cognitive Sciences and Technologies, CNRRomaItaly

Personalised recommendations