An Anti-Hebbian Learning Rule to Represent Drive Motivations for Reinforcement Learning

  • Varun Raj Kompella
  • Sohrob Kazerounian
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8575)


We present a motivational system for a reinforcement learning (RL) agent that enables it to balance multiple drives, each satiated by a different type of stimulus. Inspired by drive reduction theory, the system uses Minor Component Analysis (MCA) to model the agent's internal drive state, and modulates incoming stimuli according to how strongly each stimulus satiates the currently active drive. The agent's policy changes continually through least-squares temporal-difference updates: it automatically seeks out stimuli that satiate its most active internal drive first, then the next most active drive, and so on. We prove that our algorithm is stable under certain conditions, and experimental results illustrate its behavior.
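The MCA model of the drive state rests on an anti-Hebbian weight update that converges to the minor component of the input covariance, i.e., the eigenvector with the smallest eigenvalue. The following is a minimal sketch, assuming a normalized single-unit variant of Oja's minor-component rule on toy data; the paper's exact rule and its stability conditions are given in the full text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D inputs whose covariance has a clearly smallest eigenvalue
# along axis 1 (std 0.5) versus axis 0 (std 2.0).
X = rng.normal(size=(5000, 2)) * np.array([2.0, 0.5])

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.01  # learning rate

for x in X:
    y = w @ x                   # scalar unit output
    w -= eta * y * (x - y * w)  # anti-Hebbian (minor-component) update
    w /= np.linalg.norm(w)      # renormalize to keep the plain rule stable

# w now aligns (up to sign) with the minor eigenvector, here (0, 1)
print(np.abs(w))
```

The minus sign is what makes the rule anti-Hebbian: each step pushes the weight vector away from high-variance input directions, the opposite of Oja's principal-component rule. Per-step renormalization is one common way of stabilizing the update; stabilized variants are analyzed in the MCA literature.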


Keywords: Motivational Drives · Reinforcement Learning · MCA · Animats





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Varun Raj Kompella (1)
  • Sohrob Kazerounian (1)
  • Jürgen Schmidhuber (1)

  1. IDSIA, Manno-Lugano, Switzerland
