Learning & Behavior, Volume 38, Issue 1, pp 50–67

Two-factor theory, the actor-critic model, and conditioned avoidance


Abstract

Two-factor theory (Mowrer, 1947, 1951, 1956) remains one of the most influential theories of avoidance, but it is at odds with empirical findings that demonstrate sustained avoidance responding in situations in which the theory predicts that the response should extinguish. This article shows that the well-known actor-critic model seamlessly addresses the problems with two-factor theory, while simultaneously being consistent with the core ideas that underlie that theory. More specifically, the article shows that (1) the actor-critic model bears striking similarities to two-factor theory and explains all of the empirical phenomena that two-factor theory explains, in much the same way, and (2) there are subtle but important differences between the actor-critic model and two-factor theory, which result in the actor-critic model predicting the persistence of avoidance responses that is found empirically.
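The persistence claim in point (2) can be illustrated with a toy simulation. What follows is a minimal sketch, not the article's own simulation: the single warning-signal state, the two actions, the shock magnitude of -1, the learning rates `alpha` and `beta`, the trial counts, and all function names are assumptions chosen for illustration. A critic learns the value of the warning signal, an actor learns response preferences, and both are updated from the same prediction error. In this sketch, once avoidance is reliable the prediction error stays near zero, so the preferences barely change when the shock is removed.

```python
import math
import random


def p_avoid(prefs):
    """Softmax probability of the avoidance response (index 0)."""
    return 1.0 / (1.0 + math.exp(prefs[1] - prefs[0]))


def run_phase(value, prefs, n_trials, shock_on, rng, alpha=0.1, beta=1.0):
    """Run trials in one warning-signal state; return the updated critic value."""
    for _ in range(n_trials):
        # 0 = avoidance response, 1 = no response
        action = 0 if rng.random() < p_avoid(prefs) else 1
        # Shock (reinforcement -1) only if it is scheduled and the response is withheld.
        reinforcement = -1.0 if (shock_on and action == 1) else 0.0
        # The trial ends after the outcome, so the successor state has value 0.
        delta = reinforcement - value   # prediction error
        value += alpha * delta          # critic update
        prefs[action] += beta * delta   # actor update (modifies prefs in place)
    return value


rng = random.Random(0)  # fixed seed so the run is repeatable
prefs = [0.0, 0.0]      # initial preferences: [avoid, no response]

# Acquisition: failing to respond is followed by shock.
value = run_phase(0.0, prefs, 500, True, rng)
after_acquisition = p_avoid(prefs)

# Extinction: shock is never delivered, whatever the response.
value = run_phase(value, prefs, 500, False, rng)
after_extinction = p_avoid(prefs)

print(after_acquisition, after_extinction)
```

Comparing the two printed probabilities shows whether the avoidance response survives the extinction phase under these assumed parameters.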

References

  1. Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77–98.
  2. Baird, L. C. (1993). Advantage updating (Tech. Rep. No. WL-TR-93-1146). Dayton, OH: Wright-Patterson Air Force Base.
  3. Barto, A. G. (1995). Adaptive critics and the basal ganglia. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 215–232). Cambridge, MA: MIT Press.
  4. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, & Cybernetics, 13, 835–846.
  5. Beninger, R. J., Mason, S. T., Phillips, A. G., & Fibiger, H. C. (1980). The use of conditioned suppression to evaluate the nature of neuroleptic-induced avoidance deficits. Journal of Pharmacology & Experimental Therapeutics, 213, 623–627.
  6. Bolles, R. C. (1969). Avoidance and escape learning: Simultaneous acquisition of different responses. Journal of Comparative & Physiological Psychology, 68, 355–358.
  7. Bolles, R. C. (1970). Species-specific defense reactions and avoidance learning. Psychological Review, 77, 32–48.
  8. Bolles, R. C. (1972a). The avoidance learning problem. In G. H. Bower & K. W. Spence (Eds.), The psychology of learning and motivation (Vol. 6, pp. 97–145). New York: Academic Press.
  9. Bolles, R. C. (1972b). Reinforcement, expectancy, and learning. Psychological Review, 79, 394–409.
  10. Bolles, R. C. (1978). The role of stimulus learning in defensive behavior. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior (pp. 89–108). Hillsdale, NJ: Erlbaum.
  11. Bolles, R. C., & Grossen, N. E. (1969). Effects of an informational stimulus on the acquisition of avoidance behavior in rats. Journal of Comparative & Physiological Psychology, 68, 90–99.
  12. Bolles, R. C., Stokes, L. W., & Younger, M. S. (1966). Does CS termination reinforce avoidance behavior? Journal of Comparative & Physiological Psychology, 62, 201–207.
  13. Brady, J. V. (1965). Experimental studies of psychophysiological responses to stressful situations. In Symposium on Medical Aspects of Stress in the Military Climate (pp. 271–289). Washington, DC: Walter Reed Army Institute of Research.
  14. Brady, J. V., & Harris, A. (1977). The experimental production of altered physiological states. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 595–618). Englewood Cliffs, NJ: Prentice Hall.
  15. Bridle, J. S. (1990). Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimates of parameters. In D. S. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 211–217). San Mateo, CA: Morgan Kaufmann.
  16. Chorazyna, H. (1962). Some properties of conditioned inhibition. Acta Biologiae Experimentalis, 22, 5–13.
  17. Cicala, G. A., & Owen, J. W. (1976). Warning signal termination and a feedback signal may not serve the same function. Learning & Motivation, 7, 356–367.
  18. Cook, M., Mineka, S., & Trumble, D. (1987). The role of response-produced and exteroceptive feedback in the attenuation of fear over the course of avoidance learning. Journal of Experimental Psychology: Animal Behavior Processes, 13, 239–249.
  19. Coover, G. D., Ursin, H., & Levine, S. (1973). Plasma-corticosterone levels during active-avoidance learning in rats. Journal of Comparative & Physiological Psychology, 82, 170–174.
  20. Crawford, M., & Masterson, F. A. (1978). Components of the flight response can reinforce bar-press avoidance learning. Journal of Experimental Psychology: Animal Behavior Processes, 4, 144–151.
  21. Crawford, M., & Masterson, F. A. (1982). Species-specific defense reactions and avoidance learning. An evaluative review. Pavlovian Journal of Biological Science, 17, 204–214.
  22. Crespi, L. P. (1942). Quantitative variation of incentive and performance in the white rat. American Journal of Psychology, 55, 467–517.
  23. Daw, N. D. (2003). Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh.
  24. Daw, N. D., Courville, A. C., & Touretzky, D. S. (2006). Representation and timing in theories of the dopamine system. Neural Computation, 18, 1637–1677.
  25. Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.
  26. Daw, N. D., Niv, Y., & Dayan, P. (2006). Actions, policies, values, and the basal ganglia. In E. Bezard (Ed.), Recent breakthroughs in basal ganglia research (pp. 111–130). New York: Nova Science.
  27. Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Computation, 14, 2567–2583.
  28. Dayan, P., & Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36, 285–298.
  29. Dayan, P., Kakade, S., & Montague, P. R. (2000). Learning and selective attention. Nature Neuroscience, 3 (Suppl.), 1218–1223.
  30. Dickinson, A. (1985). Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society B, 308, 67–78.
  31. Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 45–79). San Diego: Academic Press.
  32. Dinsmoor, J. A. (2001). Stimuli inevitably generated by behavior that avoids electric shock are inherently reinforcing. Journal of the Experimental Analysis of Behavior, 75, 311–333.
  33. Dinsmoor, J. A., & Sears, G. W. (1973). Control of avoidance by a response-produced stimulus. Learning & Motivation, 4, 284–293.
  34. Domjan, M. (2003). The principles of learning and behavior (5th ed.). Belmont, CA: Thomson/Wadsworth.
  35. Estes, W. K., & Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology, 29, 390–400.
  36. Grossberg, S. (1972). A neural theory of punishment and avoidance. I: Qualitative theory. Mathematical Biosciences, 15, 39–67.
  37. Grossen, N. E., & Kelley, M. J. (1972). Species-specific behavior and acquisition of avoidance behavior in rats. Journal of Comparative & Physiological Psychology, 81, 307–310.
  38. Herrnstein, R. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49–69.
  39. Hodgson, R., & Rachman, S. (1974). II. Desynchrony in measures of fear. Behaviour Research & Therapy, 12, 319–326.
  40. Houk, J. C., Adams, J. L., & Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 249–270). Cambridge, MA: MIT Press.
  41. Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory. New York: Appleton-Century.
  42. Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15, 535–547.
  43. Johnson, J. D., Li, W., Li, J., & Klopf, A. H. (2002). A computational model of learned avoidance behavior in a one-way avoidance experiment. Adaptive Behavior, 9, 91–104.
  44. Kamin, L. J. (1956). The effects of termination of the CS and avoidance of the US on avoidance learning. Journal of Comparative & Physiological Psychology, 49, 420–424.
  45. Kamin, L. J., Brimer, C. J., & Black, A. H. (1963). Conditioned suppression as a monitor of fear of the CS in the course of avoidance training. Journal of Comparative & Physiological Psychology, 56, 497–501.
  46. Klopf, A. H., Morgan, J. S., & Weaver, S. E. (1993). A hierarchical network of control systems that learn: Modeling nervous system function during classical and instrumental conditioning. Adaptive Behavior, 1, 263–319.
  47. Knapp, R. K. (1965). Acquisition and extinction of avoidance with similar and different shock and escape situations. Journal of Comparative & Physiological Psychology, 60, 272–273.
  48. Levis, D. J. (1966). Effects of serial CS presentation and other characteristics of the CS on the conditioned avoidance response. Psychological Reports, 18, 755–766.
  49. Levis, D. J., Bouska, S. A., Eron, J. B., & McIlhon, M. D. (1970). Serial CS presentation and one-way avoidance conditioning: A noticeable lack of delay in responding. Psychonomic Science, 20, 147–149.
  50. Levis, D. J., & Boyd, T. L. (1979). Symptom maintenance: An infrahuman analysis and extension of the conservation of anxiety principle. Journal of Abnormal Psychology, 88, 107–120.
  51. Levis, D. J., & Brewer, K. E. (2001). The neurotic paradox: Attempts by two-factor fear theory and alternative avoidance models to resolve the issues associated with sustained avoidance responding in extinction. In R. R. Mowrer & S. B. Klein (Eds.), Handbook of contemporary learning theories (pp. 561–597). Mahwah, NJ: Erlbaum.
  52. Logan, F. A. (1951). A comparison of avoidance and nonavoidance eyelid conditioning. Journal of Experimental Psychology, 42, 390–393.
  53. Mackintosh, N. J. (1974). The psychology of animal learning. New York: Academic Press.
  54. Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.
  55. Maia, T. V. (2007). A reinforcement learning theory of avoidance. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh.
  56. Maia, T. V. (2009). Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience, 9, 343–364.
  57. Malloy, P., & Levis, D. J. (1988). A laboratory demonstration of persistent human avoidance. Behavior Therapy, 19, 229–241.
  58. Masterson, F. A. (1970). Is termination of a warning signal an effective reward for the rat? Journal of Comparative & Physiological Psychology, 72, 471–475.
  59. McAllister, W. R., & McAllister, D. E. (1995). Two-factor fear theory: Implications for understanding anxiety-based clinical phenomena. In W. O’Donohue & L. Krasner (Eds.), Theories of behavior therapy (pp. 145–171). Washington, DC: American Psychological Association.
  60. McAllister, W. R., McAllister, D. E., Scoles, M. T., & Hampton, S. R. (1986). Persistence of fear-reducing behavior: Relevance for the conditioning theory of neurosis. Journal of Abnormal Psychology, 95, 365–372.
  61. Mineka, S. (1979). The role of fear in theories of avoidance learning, flooding, and extinction. Psychological Bulletin, 86, 985–1010.
  62. Mineka, S., & Gino, A. (1980). Dissociation between conditioned emotional response and extended avoidance performance. Learning & Motivation, 11, 476–502.
  63. Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16, 1936–1947.
  64. Morris, R. G. (1974). Pavlovian conditioned inhibition of fear during shuttlebox avoidance behavior. Learning & Motivation, 5, 424–447.
  65. Morris, R. G. (1975). Preconditioning of reinforcing properties to an exteroceptive feedback stimulus. Learning & Motivation, 6, 289–298.
  66. Moutoussis, M., Bentall, R. P., Williams, J., & Dayan, P. (2008). A temporal difference account of avoidance learning. Network, 19, 137–160.
  67. Mowrer, O. H. (1947). On the dual nature of learning—a reinterpretation of conditioning and problem solving. Harvard Educational Review, 17, 102–148.
  68. Mowrer, O. H. (1951). Two-factor learning theory: Summary and comment. Psychological Review, 58, 350–354.
  69. Mowrer, O. H. (1956). Two-factor learning theory reconsidered, with special reference to secondary reinforcement and the concept of habit. Psychological Review, 63, 114–128.
  70. Mowrer, O. H. (1960). Learning theory and behavior. New York: Wiley.
  71. Neuenschwander, N., Fabrigoule, C., & Mackintosh, N. J. (1987). Fear of the warning signal during overtraining of avoidance. Quarterly Journal of Experimental Psychology, 39B, 23–33.
  72. Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and TD learning. Behavioral & Brain Functions, 1, 6.
  73. O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304, 452–454.
  74. Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.
  75. Rachman, S. (1976). The passing of the two-stage theory of fear and avoidance: Fresh possibilities. Behaviour Research & Therapy, 14, 125–131.
  76. Rachman, S., & Hodgson, R. (1974). I. Synchrony and desynchrony in fear and avoidance. Behaviour Research & Therapy, 12, 311–318.
  77. Rescorla, R. A. (1968). Pavlovian conditioned fear in Sidman avoidance learning. Journal of Comparative & Physiological Psychology, 65, 55–60.
  78. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
  79. Riccio, D. C., & Silvestri, R. (1973). Extinction of avoidance behavior and the problem of residual fear. Behaviour Research & Therapy, 11, 1–9.
  80. Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation: A neural network approach. Adaptive Behavior, 6, 63–129.
  81. Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.
  82. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.
  83. Seligman, M. E. P., & Campbell, B. A. (1965). Effect of intensity and duration of punishment on extinction of an avoidance response. Journal of Comparative & Physiological Psychology, 59, 295–297.
  84. Seligman, M. E. P., & Johnston, J. C. (1973). A cognitive theory of avoidance learning. In F. J. McGuigan & D. B. Lumsden (Eds.), Contemporary approaches to conditioning and learning (pp. 69–110). Washington, DC: Winston.
  85. Servatius, R. J., Jiao, X., Beck, K. D., Pang, K. C., & Minor, T. R. (2008). Rapid avoidance acquisition in Wistar-Kyoto rats. Behavioural Brain Research, 192, 191–197.
  86. Sheffield, F. D., & Temmer, H. W. (1950). Relative resistance to extinction of escape training and avoidance training. Journal of Experimental Psychology, 40, 287–298.
  87. Smith, A. J., Becker, S., & Kapur, S. (2005). A computational model of the functional role of the ventral-striatal D2 receptor in the expression of previously acquired behaviors. Neural Computation, 17, 361–395.
  88. Smith, A. [J.], Li, M., Becker, S., & Kapur, S. (2004). A model of antipsychotic action in conditioned avoidance: A computational approach. Neuropsychopharmacology, 29, 1040–1049.
  89. Solomon, R. L., Kamin, L. J., & Wynne, L. C. (1953). Traumatic avoidance learning: The outcomes of several extinction procedures with dogs. Journal of Abnormal Psychology, 48, 291–302.
  90. Solomon, R. L., & Wynne, L. C. (1953). Traumatic avoidance learning: Acquisition in normal dogs. Psychological Monographs, 67(Whole No. 354).
  91. Solomon, R. L., & Wynne, L. C. (1954). Traumatic avoidance learning: The principles of anxiety conservation and partial irreversibility. Psychological Review, 61, 353–385.
  92. Starr, M. D., & Mineka, S. (1977). Determinants of fear over the course of avoidance learning. Learning & Motivation, 8, 332–350.
  93. Stebbins, W. C. (1962). Response latency as a function of amount of reinforcement. Journal of the Experimental Analysis of Behavior, 5, 305–307.
  94. Strub, H. (1963). Instrumental escape conditioning in a water alley: Shifts in magnitude of reinforcement under constant drive conditions. Unpublished master’s thesis, Hollins University, Roanoke, VA.
  95. Suri, R. E., Bargas, J., & Arbib, M. A. (2001). Modeling functions of striatal dopamine modulation in learning and planning. Neuroscience, 103, 65–85.
  96. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
  97. Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. R. Gabriel & J. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks (pp. 497–537). Cambridge, MA: MIT Press.
  98. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
  99. Takahashi, Y., Schoenbaum, G., & Niv, Y. (2008). Silencing the critics: Understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model. Frontiers in Neuroscience, 2, 86–99.
  100. Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New Brunswick, NJ: Transaction.
  101. Wahlsten, D. L., & Cole, M. (1972). Classical and avoidance training of leg flexion in the dog. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 379–408). New York: Appleton-Century-Crofts.
  102. Weisman, R. G., & Litner, J. S. (1972). The role of Pavlovian events in avoidance training. In R. A. Boakes & M. S. Halliday (Eds.), Inhibition and learning. New York: Academic Press.
  103. Williams, B. A. (2001). Two-factor theory has strong empirical evidence of validity. Journal of the Experimental Analysis of Behavior, 75, 362–378.
  104. Williams, R. W., & Levis, D. J. (1991). A demonstration of persistent human avoidance in extinction. Bulletin of the Psychonomic Society, 29, 125–127.
  105. Williams, Z. M., & Eskandar, E. N. (2006). Selective enhancement of associative learning by microstimulation of the anterior caudate. Nature Neuroscience, 9, 562–568.
  106. Woods, P. J. (1967). Performance changes in escape conditioning following shifts in the magnitude of reinforcement. Journal of Experimental Psychology, 75, 487–491.
  107. Zeaman, D. (1949). Response latency as a function of the amount of reinforcement. Journal of Experimental Psychology, 39, 466–483.
  108. Zerbolio, D. J., Jr. (1968). Escape and approach responses in avoidance learning. Canadian Journal of Psychology, 22, 60–71.

Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh
