Introduction

Learning to stop responding is one of the most fundamental processes in instrumental learning. The suppression of behavior is commonly studied with extinction, in which a previously conditioned response no longer delivers the associated outcome, such that responding declines and eventually ceases. It is now well known, however, that the absence of responding does not indicate an erasure of the original conditioning. Extinguished responding can recover under a variety of conditions including renewal, spontaneous recovery, and reacquisition. In renewal, responding recovers when tested in a different physical context from the one in which extinction occurred (e.g., Bouton et al., 2011; Nakajima et al., 2000). In spontaneous recovery, responding recovers in a different temporal context, i.e., once a significant interval of time has elapsed since extinction (see Bouton et al., 2021; Jansen et al., 2016; Todd et al., 2012). In reacquisition, extinguished responding can recover rapidly when the response is reinforced again (e.g., Bullock & Smith, 1953; Willcocks & McNally, 2011, 2014; Woods & Bouton, 2007).

Extinction is considered a form of retroactive interference in which initial conditioning is inhibited by subsequent extinction learning (e.g., Bouton, 2019; Bouton et al., 2021). Interference is a general mechanism that is probably involved in behavioral change in many instrumental learning domains, including discrimination reversal learning (e.g., Thomas et al., 1985; Vila et al., 2002) and the switch between goal-direction and habit (Bouton, 2021; Steinfeld & Bouton, 2021). And extinction, of course, is not the only instance of behavior suppression via retroactive interference.

Evidence suggests that punishment – in which a reinforced instrumental response is suppressed when it earns an additional aversive stimulus such as footshock – may also be governed by a similar retroactive interference mechanism. Specifically, relapse effects that follow extinction have also been observed following punishment (e.g., Bouton & Schepers, 2015; Estes, 1944; Krasnova et al., 2014; Marchant et al., 2013; Panlilio et al., 2003). For example, Bouton and Schepers (2015) demonstrated renewal following instrumental punishment: Rats learned to lever press for a food reinforcer in Context A before receiving punishment in Context B (see also Marchant et al., 2013). There, lever presses occasionally delivered a brief footshock in addition to the reinforcer, which quickly reduced responding. When tested in each context, however, the punished response renewed in Context A (ABA renewal) as seen in instrumental extinction (e.g., Bouton et al., 2011). In a subsequent experiment, the authors also observed renewal when tested in a third, neutral context (ABC renewal).

Extinction-like mechanisms appear to be involved in punishment across a variety of designs and reinforcer types. For example, recovery after punishment has often been examined following training with drug reinforcers (e.g., Krasnova et al., 2014; Marchant et al., 2013; Panlilio et al., 2003; Pelloux et al., 2013). Renewal has also been observed following negative punishment (i.e., omission or differential reinforcement of other behavior), where learned responding is suppressed when it now prevents a reinforcer that is otherwise scheduled to occur (e.g., Nakajima et al., 2002; Rey et al., 2020). Furthermore, omission and extinction can allow equivalent renewal effects when the two procedures are compared directly (Rey et al., 2020; cf. Kearns & Weiss, 2007). In other words, negative punishment appears to instill new, inhibitory, context-dependent learning consistent with the same retroactive interference mechanism at work in extinction (see Marchant et al., 2019, for a review).

The purpose of the present experiments was to compare the vulnerability of positive punishment and extinction learning to three basic recovery effects – renewal, spontaneous recovery, and reacquisition – while keeping the general parameters of response training, elimination, and testing as comparable as possible. We chose to contrast simple extinction with the punishment procedure used by Bouton and Schepers (2015) because they appear to produce reasonably similar patterns of response suppression with similar amounts of training. To our knowledge, recovery after punishment and extinction has not been examined side-by-side within individual experiments and with similar amounts of training, though the need for such a comparison has been noted (e.g., Jean-Richard-dit-Bressel et al., 2018; Marchant et al., 2019). Based on previous research, we might see both renewal and spontaneous recovery following both punishment and extinction, so the relative magnitude of each effect in each group was of particular interest. However, we expected differences in reacquisition because punished rats would have learned to inhibit responding with the reinforcer available, encouraging transfer to the reacquisition phase. In contrast, extinguished rats would have learned to inhibit responding without the reinforcer being available, allowing for rapid recovery (i.e., renewal) when the reinforcer is reintroduced (e.g., Woods & Bouton, 2007).

Experiment 1a

Experiment 1a examined ABA renewal of instrumental responding following either punishment or extinction. Based on previous findings (e.g., Bouton & Schepers, 2015; Bouton et al., 2011) we expected to see renewal in both punished and extinguished rats. We were specifically interested in the magnitude of this renewal effect, i.e., whether punishment was more or less sensitive to a context change than extinction. If both punishment and extinction are governed by a similar retroactive interference mechanism, we expected to see similar degrees of renewal upon removal from the punishment/extinction context.

We used the punishment procedure described by Bouton and Schepers (2015) in which responding was punished by footshocks delivered on a variable interval (VI) 90-s schedule. We likewise included a yoked control group in which each rat received a noncontingent shock whenever a corresponding punished rat earned one. This yoking procedure allowed us to determine whether the suppression seen in the punished group was truly due to the contingency between response and shock as opposed, for example, to a mere Pavlovian association between Context B and shock, which on its own could produce both behavioral suppression in B and removal of that suppression in A.

Method

Subjects

The subjects were 32 naïve male Wistar rats (Charles River, Raleigh, NC, USA) that were approximately 75–90 days old at the start of the experiment and individually housed in a room maintained on a 12:12-h light:dark cycle. Males were used in part to test the generality of our prior renewal-after-punishment findings, which had been observed in females (Bouton & Schepers, 2015; Rey et al., 2020). Experiments took place during the light period of the cycle. Upon arrival, rats were allowed to acclimate to the colony for 8 days before being food restricted to 80% of their baseline body weights. Rats were maintained at this weight throughout the duration of the experiment.

Apparatus

Two sets of four conditioning chambers housed in separate rooms of the laboratory served as two distinct contexts. Each chamber was housed in its own sound-attenuation chamber and was of the same design (Med Associates model ENV-007-VP, St. Albans, VT, USA) measuring 29.53 cm × 23.5 cm × 27.31 cm (l × w × h). A recessed 5.1 cm × 5.1 cm food cup was centered in the front wall approximately 2.5 cm above the level of the floor. A retractable lever (ENV-112CM) positioned to the left of the food cup protruded 1.9 cm into the chamber. Each chamber was illuminated by one 7.5-W incandescent bulb mounted to the ceiling of the sound attenuation chamber approximately 43.6 cm above the grid floor. Ventilation fans provided background noise of 65 dBA.

Each set of conditioning chambers possessed unique characteristics to create separate contexts, which were counterbalanced to serve as Contexts A and B. In one set of boxes, side walls and ceilings were made of clear acrylic plastic while the front and rear walls were made of brushed aluminum. In each box, one side wall and the ceiling were decorated with black diagonal stripes 3.8 cm wide and 3.8 cm apart. The floor was made of stainless-steel grids (0.40 cm diameter) spaced 1.6 cm apart (center-to-center). A dish containing 5 ml of a 2% anise solution (McCormick & Co., Hunt Valley, MD, USA) diluted in tap water was placed outside each chamber near the front wall.

The second set of boxes was similar to the first except for the following features: In each box, the ceiling and one side wall were decorated with a 7 × 9 array of opaque blue circles, each 2 cm in diameter, spaced 3 cm apart (center-to-center). The floor was made of alternating 0.4 cm and 0.9 cm stainless-steel grids spaced 1.6 cm apart. A dish containing 5 ml of a 4% coconut solution (McCormick & Co., Hunt Valley, MD) diluted in tap water was placed outside each chamber near the front wall.

The reinforcer was a 45 mg grain food pellet (MLab Rodent Tablets, 5TUM; TestDiet, Richmond, IN, USA). Aversive stimuli consisted of 0.5 mA, 0.5-s footshocks delivered to each chamber by aversive stimulator/scrambler modules (ENV-414). The apparatus was controlled by computer equipment located in an adjacent room.

Procedure

The procedure generally followed previous experiments examining renewal after instrumental punishment and extinction in this laboratory (Bouton & Schepers, 2015; Bouton et al., 2011; Eddy et al., 2016). In these procedures, a response was trained in one context (A), eliminated in another (B), and then tested in both in a counterbalanced order. The design is summarized in Table 1. All experimental sessions were 30 min in duration unless otherwise noted, and context exposures followed a double-alternating enclosed pattern (e.g., ABBABAABABBA).

  • Magazine training. On the first day of the experiment, all rats received magazine training in both contexts. During these sessions, rats were placed into conditioning chambers with levers retracted, and food pellets were delivered on a random time (RT) 30-s schedule that arranged a 1/30 probability of pellet delivery each second. The order of the sessions was counterbalanced; half the rats received magazine training in Context A followed by Context B, while the other half received training in B followed by A. Sessions in each context were separated by approximately 2 h.

  • Response training. For the next 6 days, rats received instrumental response training in Context A. Following a 2-min delay at the start of each session, the lever was inserted into the chamber and lever presses were reinforced on a random-interval (RI) 30-s schedule. No hand shaping was necessary. On each day of training in Context A, rats received time-equivalent exposure to Context B with no levers available and no reinforcers delivered. Order of context exposure was counterbalanced as described above.

  • Punishment/Extinction. Following instrumental training, rats were divided into either Punished (n = 12), Yoked (n = 12), or Extinguished (n = 8) groups. Over the next 4 days, Group Punished underwent instrumental punishment in Context B. In each session, after a 2-min delay, levers were inserted and responses were reinforced on the same RI 30-s schedule as during training. However, lever presses also delivered a brief footshock (0.5 mA, 0.5 s) according to a VI 90-s schedule. The shock schedule featured random selection without replacement from a list of five intervals: 60 s, 75 s, 90 s, 105 s, and 120 s (see Bouton & Schepers, 2015). For Group Yoked, lever presses were similarly reinforced by food pellets, and shocks were received by individual rats at the same point in time that a master rat in Group Punished received them. Thus, Groups Punished and Yoked received the same number and distribution of shocks but differed in whether the shock was response-contingent. For Group Extinguished, lever presses merely earned no pellets and no shocks were delivered. On each day of training in Context B, each group also received time-equivalent exposure to Context A with no levers available and no reinforcers delivered.

  • Renewal test. On the final day of the experiment, rats underwent a 10-min test of instrumental responding in each context. Levers were inserted following a 2-min delay, but presses were not reinforced. Testing order in each context was counterbalanced between groups, and sessions in each context were separated by approximately 1.5 h.
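The pellet and shock schedules described in the list above can be sketched in a few lines of Python. This is only an illustration, not the software that controlled the chambers: the function names are hypothetical, the rule for refilling the interval list once it is exhausted is an assumption, and each VI interval is timed from the previous arming time rather than from the delivered shock, which ignores the rat's latency to respond.

```python
import random

# Interval list given in the text; its mean is 90 s.
VI_INTERVALS_S = [60, 75, 90, 105, 120]


def rt_pellet_times(session_s, p_per_s=1 / 30, rng=None):
    """Random-time schedule: in each second, deliver a pellet with probability p_per_s."""
    rng = rng or random.Random()
    return [t for t in range(session_s) if rng.random() < p_per_s]


def vi_shock_arming_times(session_s, intervals=VI_INTERVALS_S, rng=None):
    """Times (s) at which the next lever press would produce a shock.

    Intervals are drawn without replacement. Reshuffling the full list once
    it is empty is an assumption; the text does not state the refill rule.
    """
    rng = rng or random.Random()
    times, t, pool = [], 0, []
    while True:
        if not pool:
            pool = intervals[:]
            rng.shuffle(pool)
        t += pool.pop()
        if t >= session_s:
            return times
        times.append(t)


# Example: 28 min of lever availability (30-min session minus the 2-min delay).
shock_times = vi_shock_arming_times(28 * 60, rng=random.Random(0))
```

Under this sketch, a yoked rat would simply receive shocks at the times its master rat actually received them, regardless of its own responding.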

Table 1 Designs of the Experiments

Data analyses

Response rates were evaluated with repeated-measures analysis of variance (ANOVA) with rejection criterion set to p < .05. To minimize Type I error rate, we report only those comparisons (using the error term from the overall ANOVA) that are orthogonal.
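For readers checking the reported F ratios, the degrees of freedom for a Group (between-subjects) by Session (within-subjects) design follow from simple arithmetic. The helper below is a hypothetical sketch that assumes no sphericity correction is applied. For example, 32 rats in three groups tested over six sessions give 5 and 145 degrees of freedom for the Session effect.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom for a Group (between) x Session (within) ANOVA.

    Assumes a balanced-style partition with no sphericity correction.
    Returns (numerator df, denominator df) for each effect.
    """
    df_error_between = n_subjects - n_groups
    df_within = n_levels - 1
    df_error_within = df_within * df_error_between
    return {
        "group": (n_groups - 1, df_error_between),
        "within": (df_within, df_error_within),
        "interaction": ((n_groups - 1) * df_within, df_error_within),
    }


# 32 rats, 3 groups, 6 training sessions.
print(mixed_anova_df(32, 3, 6))
```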

Results

Response training

The results of all phases of the experiment are summarized in Fig. 1. The rats acquired the lever press response without incident (Fig. 1a, left). There were no differences between groups, confirmed by a Group (Punished, Yoked, Extinguished) by Session (6) ANOVA which found a significant effect of Session, F(5, 145) = 165.66, MSE = 6.92, p < .001, but no other effects or interactions, Fs < 1.

Fig. 1 Results of Experiment 1a. (a) Lever-press data for each stage of training and test. (b) Responding during the last and first 1-min bins during the four sessions of response elimination. Spontaneous recovery is suggested by an increase from last to first. Error bars depict the standard error of the mean and are only appropriate for between-group comparisons.

Punishment/Extinction

All groups reduced their response rates upon introduction of punishment or extinction contingencies in Context B (Fig. 1a, middle). However, responding in Group Yoked increased across the punishment/extinction phase and, by the final day before test, was at a rate not significantly different from the final day of response training, t(11) < 1. A Group (3) by Session (4) ANOVA on data from the response elimination phase indicated main effects of Group, F(2, 29) = 49.50, MSE = 95.41, p < .001, and Session, F(3, 87) = 4.34, MSE = 11.79, p = .007, as well as a Group by Session interaction, F(6, 87) = 16.16, MSE = 11.79, p < .001. A separate Group (2) by Session (4) ANOVA isolating the Punished and Extinguished rats indicated a main effect of Session, F(3, 54) = 26.60, MSE = 6.66, p < .001, as well as a Group by Session interaction, F(3, 54) = 3.39, MSE = 6.66, p < .025. The interaction suggests that punishment and extinction progressed at different rates over the sessions, with extinction causing a quicker initial drop in responding, but punishment producing a deeper effect later.

To investigate spontaneous recovery that occurred over the 24 h between the successive sessions of response elimination, Fig. 1b plots response rates for Groups Punished and Extinguished in Context B over the last 1-min bin of each punishment/extinction session and the first 1-min bin of the next session 24 h later. A Group (Punished, Extinguished) by Bin (last, first) by Session Transition (Sessions 1–2, 2–3, and 3–4) ANOVA revealed a significant main effect of Bin, F(1, 18) = 41.90, MSE = 19.35, p < .001, a significant interaction between Bin and Session Transition, F(2, 36) = 5.61, MSE = 6.59, p = .008, and a significant three-way interaction between Bin, Session Transition, and Group, F(2, 36) = 8.89, MSE = 6.59, p < .001. The three-way interaction suggests that the groups showed different amounts of recovery over sessions. Consistent with this suggestion, a separate Group (Punished, Extinguished) by Bin (3) ANOVA for the first 1-min bin of each session revealed a significant effect of Bin, F(2, 36) = 4.04, MSE = 33.40, p = .026, a significant Bin by Group interaction, F(2, 36) = 4.10, MSE = 33.40, p = .025, and no other effects or interactions (F = .027). An identical ANOVA for the last 1-min bin of each session revealed no significant effects or interactions (largest F = 1.24).

Renewal test

Results of the renewal test are shown in Fig. 1a (right). Groups Punished and Extinguished both responded at a higher rate in Context A than Context B, indicating renewal. Group Yoked response rates did not differ between contexts, t(11) < 1. A Group (Punished, Yoked, Extinguished) by Context (A, B) ANOVA indicated main effects of Context, F(1, 29) = 21.81, MSE = 10.19, p < .001, and Group, F(2, 29) = 32.34, MSE = 23.45, p < .001, as well as a Context by Group interaction, F(2, 29) = 8.63, MSE = 10.19, p = .001. A separate Group (2) by Context ANOVA conducted to examine differences in renewal between Groups Punished and Extinguished revealed a main effect of Context, F(1, 18) = 28.33, MSE = 12.56, p < .001, and no other effects or interactions (Fs < 1), indicating similar renewal in both groups.

Discussion

Both punishment and extinction significantly reduced responding in Context B. The greater response suppression in Group Extinguished on Day 1 of punishment/extinction may be due in part to a context switch effect (e.g., Bouton et al., 2011): Although certain elements of the training context transferred to punishment (i.e., lever presses could still earn pellets), the same elements did not transfer to extinction, where lever presses in Context B could not earn any pellets. Thus, Group Extinguished may have experienced a greater context switch effect than Group Punished on the first day of response elimination training. Across subsequent sessions, however, the addition of footshock appeared to promote deeper response suppression than simple extinction. Group Yoked showed far less suppression over sessions, suggesting that the behavior of Group Punished was influenced by the actual response-shock contingency (e.g., Bouton & Schepers, 2015; Bolles et al., 1980; Jean-Richard-dit-Bressel & McNally, 2015) rather than merely a developing Pavlovian association between Context B and shock. Moreover, at test, the Punished and Extinguished groups showed equivalent ABA renewal, responding significantly more in Context A than in Context B, but not differing from each other in either context. In contrast, yoked animals responded at equal rates in each context. Such results with male subjects replicate and extend our prior findings with females (e.g., Bouton & Schepers, 2015; Rey et al., 2020).

The results are consistent with previous demonstrations of renewal following both positive and negative punishment (e.g., Bouton & Schepers, 2015; Nakajima et al., 2002; Rey et al., 2020) as well as extinction (e.g., Bouton et al., 2011; Eddy et al., 2016). The fact that punishment and extinction allowed essentially equal ABA renewal further suggests that they are controlled by a similar mechanism. With the current methods, punishment and extinction did not differ in their sensitivity to a context switch effect after response elimination.

A closer look at responding in the last and first 1-min bins of consecutive response elimination sessions also revealed spontaneous recovery of responding between sessions in both the Extinguished and Punished groups. However, the pattern of recoveries differed between groups: Punished rats showed especially strong recovery initially, but this effect decreased and was weaker than the recoveries observed in extinction in later sessions. It is worth noting that the conditions of spontaneous recovery “testing” differed between the groups: Punished rats could still receive response-contingent food pellets and shocks during the “tests,” whereas extinguished rats could receive neither. The fact that Group Punished was still being reinforced for responding could account for its greater recovery seen early in punishment training. This consideration led us to look at spontaneous recovery in punished and extinguished groups under common testing conditions in Experiment 1b.

Experiment 1b

The main goal of Experiment 1b was to examine spontaneous recovery following punishment and extinction with a common test procedure. Here we examined spontaneous recovery in a single test session under extinction conditions 7 days after the completion of punishment/extinction training. In addition to arranging a common test, the experiment provided a look at spontaneous recovery after a longer retention interval.

Method

Subjects and apparatus

Subjects were 16 naïve male Wistar rats housed and maintained under the same conditions as Experiment 1a. The apparatus was the same as in Experiment 1a.

Procedure

Magazine training, response training, and punishment/extinction proceeded as they had for the Punished and Extinguished groups in Experiment 1a, except that all phases occurred in a single context (Table 1). There were no sessions conducted in a second context.

Spontaneous recovery test

For 7 days following the completion of the punishment/extinction phase, the rats were maintained in their home cages, where they received no training or handling besides daily weighing and feeding. After this retention interval, rats were returned to conditioning chambers for a 30-min spontaneous recovery test during which levers were available but responding was not reinforced.

Results

Response training

Lever-press acquisition proceeded without incident (Fig. 2a, left). There were no differences between groups, confirmed by a Group (2) by Session (6) ANOVA, which found a significant effect of Session, F(5, 65) = 42.70, MSE = 10.69, p < .001, but no other effects or interactions (Fs < 1).

Fig. 2 Results of Experiment 1b. (a) Lever press rates throughout response training and punishment/extinction. (b) Response rates during the last minute of punishment/extinction and the first minute of the spontaneous recovery test. Error bars depict the standard error of the mean.

Punishment/Extinction

One rat was excluded due to unusually high responding throughout the punishment phase (z = 2.471). All other rats reduced responding during the response elimination phase, although extinction here appeared to progress more slowly than punishment (Fig. 2a, right). A Group (2) by Session (4) ANOVA revealed significant main effects of Group, F(1, 13) = 39.42, MSE = 2.40, p < .001, Session, F(3, 39) = 105.39, MSE = .93, p < .001, and a Session by Group interaction, F(3, 39) = 12.98, MSE = .93, p < .001.

Spontaneous recovery test

As in Experiment 1a, response rates from the final minute of the last day of punishment/extinction were compared to those from the first minute of the spontaneous recovery test (Fig. 2b). A Group (2) by Session (2) ANOVA revealed a significant main effect of Session, F(1, 13) = 6.01, MSE = 11.65, p = .029, indicating spontaneous recovery, but neither the effect of group nor the group by session interaction approached significance (Fs < 1). Thus, spontaneous recovery appeared similar after extinction and punishment when the test conditions were the same for each group.

Discussion

Both the punished and extinguished groups exhibited spontaneous recovery when tested after a seven-day retention interval, a pattern that did not differ significantly between groups. This confirmed the observation of spontaneous recovery in both groups throughout the punishment/extinction phase in Experiment 1a. In Experiment 1a, however, that recovery was observed under different test conditions of punishment and extinction. Here, spontaneous recovery was studied under a common extinction test condition where no shocks or food pellets were delivered.

The result is consistent with previous demonstrations of spontaneous recovery following both punishment (e.g., Estes, 1944) and extinction (e.g., Rescorla, 1996), and further emphasizes the similar degree of response recovery following each manipulation. Considering spontaneous recovery as a temporal form of renewal (e.g., Bouton, 1988, 2002, 2004; Bouton et al., 2021), the results of Experiments 1a and 1b suggest that instrumental responding eliminated by extinction or punishment remains similarly sensitive to changes in both a physical and a temporal context. It is worth remembering that although Experiment 1b could equate the groups on the conditions at testing, there was no way to control for the change in conditions between response elimination and testing. There will always be underlying differences between extinction and punishment procedures.

Experiment 2

Rapid reacquisition when response-reinforcer pairings are reintroduced is another recovery effect known to occur after instrumental extinction. Woods and Bouton (2007; see also Bouton et al., 2004; Ricker & Bouton, 1996) have argued that rapid reacquisition is another kind of context effect. In free operant procedures, where rats earn repeated reinforcers while responding freely, the animal initially learns to respond in the “context” of recent reinforcement. When the response is extinguished, it undergoes extinction in the absence of such reinforcement. The presence and absence of reinforcement can thus be viewed as distinct contexts (e.g., Bouton et al., 2012; Woods & Bouton, 2007). Following extinction, a reacquisition test – where responding is again reinforced – essentially constitutes an ABA renewal test. Correspondingly, reacquisition is typically rapid following extinction. Consistent with this view, the effect can be attenuated by either occasional response-reinforcer pairings or noncontingent reinforcers delivered during the extinction phase (e.g., Woods & Bouton, 2007). Either of these manipulations would theoretically promote more generalization between the extinction and reacquisition phases.

Based on the results of Experiments 1a and 1b – where instrumental responding recovered to similar degrees following punishment and extinction – one might expect similar rapid reacquisition following both punishment and extinction. However, punishment learning initially occurs in the presence of reinforcement, so here again the response-elimination and reacquisition phases occur in similar “recent reinforcer” contexts. Thus, the behavioral inhibition generated by punishment might transfer more effectively to a reacquisition test, potentially impairing reacquisition relative to extinction. This prediction was tested in Experiment 2.

Method

Subjects and apparatus

Subjects were 16 male Wistar rats maintained under the same conditions as above. The apparatus was the same as in Experiments 1a and 1b.

Procedure

Rats received magazine training, response training, and punishment/extinction in parallel with the rats of Experiment 1b (Table 1). Again, half the rats received punishment while the other half received extinction.

  • Reacquisition test. Following completion of punishment/extinction, all rats received three daily reacquisition sessions. Each session was identical to original response training, such that lever-pressing was reinforced on an RI 30-s schedule for 30 min and no shocks were delivered.

Results

Response training

Lever-press acquisition proceeded without incident (Fig. 3a, left). There were no differences between groups, confirmed by a Group (2) by Session (6) ANOVA, which revealed a significant effect of Session, F(5, 65) = 78.49, MSE = 6.79, p < .001, but no group effect or interaction (largest F = 2.01).

Fig. 3 Results of Experiment 2. (a) Lever press rates throughout response training and punishment/extinction. Error bars depict the standard error of the mean. (b) Comparison of reacquisition and initial response acquisition, broken into 5-min bins. Error bars depict the standard error of the mean recommended for within-group comparisons (Cousineau & O’Brien, 2014).
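As an illustration of how within-group error bars of this kind are commonly computed, the sketch below implements the Cousineau-Morey normalization. It is an assumption that this is the specific procedure behind the error bars in Fig. 3b, and the function name and data layout are hypothetical. Each rat's scores are centered on the grand mean to remove between-subject variability, and the resulting standard errors are scaled by a correction factor that depends on the number of within-subject conditions.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_sem(data):
    """Within-subject SEM per condition (Cousineau-Morey normalization).

    data: list of per-subject lists, one value per within-subject condition.
    """
    n_subjects = len(data)
    n_cond = len(data[0])
    grand = mean(v for row in data for v in row)
    # Remove between-subject variability: subtract each subject's own mean
    # and add back the grand mean.
    normed = [[v - mean(row) + grand for v in row] for row in data]
    # Correction factor for the bias introduced by the normalization.
    correction = sqrt(n_cond / (n_cond - 1))
    return [
        correction * stdev(row[j] for row in normed) / sqrt(n_subjects)
        for j in range(n_cond)
    ]


# Hypothetical response rates for three rats across three bins.
example = [[4.0, 9.0, 12.0], [10.0, 14.0, 19.0], [7.0, 13.0, 15.0]]
print(within_subject_sem(example))
```

Because subject-level offsets are removed before the standard error is computed, rats that differ only in overall response rate contribute no variability to these error bars.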

Punishment/Extinction

One rat was excluded due to unusually high responding throughout the punishment phase (z = 2.474). All other rats reduced responding throughout punishment/extinction (Fig. 3a, right). A Group (2) by Session (4) ANOVA revealed significant main effects of Session, F(3, 39) = 146.72, MSE = 1.21, p < .001, and Group, F(1, 13) = 104.38, MSE = 3.09, p < .001, and a significant interaction between Session and Group, F(3, 39) = 27.44, MSE = 1.21, p < .001.

Reacquisition test

Figure 3b shows a comparison of the first 3 days of response training (acquisition) and the 3 days of reacquisition training, broken into 5-min bins. A Group (Punished, Extinguished) by Phase (Acquisition, Reacquisition) by Bin (18) ANOVA revealed differences in the effects of reacquisition after extinction and punishment. There were significant main effects of Group, F(1, 13) = 27.17, MSE = 366.03, p < .001, Phase, F(1, 13) = 8.65, MSE = 264.23, p = .011, and Bin, F(17, 221) = 35.58, MSE = 20.32, p < .001. More importantly, there were significant Phase by Group, F(1, 13) = 22.51, MSE = 264.23, p < .001, Bin by Group, F(17, 221) = 3.78, MSE = 20.32, p < .001, and Phase by Bin by Group, F(17, 221) = 4.56, MSE = 17.17, p < .001, interactions. The interactions suggest, as depicted in Fig. 3b, that while reacquisition was rapid after extinction, it was not so fast after punishment learning. Consistent with this characterization, a separate Phase (Acquisition, Reacquisition) by Bin (18) ANOVA for Group Extinguished revealed significant main effects of Phase, F(1, 119) = 65.00, MSE = 10.76, p < .001, and Bin, F(17, 119) = 28.23, MSE = 10.76, p < .001, and a significant Phase by Bin interaction, F(17, 119) = 76.96, MSE = 10.76, p < .001. In contrast, an identical ANOVA for Group Punished revealed only a main effect of Bin, F(17, 102) = 15.38, MSE = 24.65, p < .001, and no other effects or interactions (largest F = 1.19). After extinction, reacquisition was fast, but after punishment, reacquisition was neither fast nor slow relative to the initial conditioning.

Discussion

Consistent with previous research (e.g., Bouton et al., 2012; Bullock & Smith, 1953; Willcocks & McNally, 2011, 2014; Woods & Bouton, 2007), extinguished rats showed reacquisition at a rate that was significantly faster than initial response training. Punished rats, however, reacquired responding at a much slower rate that was not significantly different from the first 3 days of response training. In other words, rapid reacquisition of instrumental responding occurred following extinction, but not punishment.

Slower reacquisition following punishment than extinction might seem consistent with the intuition that punishment causes a deeper or stronger level of response suppression. However, that intuition was strongly challenged by the similarity of renewal and spontaneous recovery following the present punishment and extinction procedures in Experiments 1a and 1b. Instead, the results may highlight another difference between punishment and extinction: The two response-suppression paradigms entail learning to inhibit responding in the presence versus absence of the reinforcer, respectively. Although the difference in reacquisition rate between punished and extinguished animals appears to contrast with the similar renewal and spontaneous recovery seen in Experiments 1a and 1b, all three results illustrate the same fundamental point: Extinction and punishment both promote a form of context-dependent inhibitory learning (see General discussion). However, this inhibitory learning is acquired in different contexts during punishment and extinction: the presence and absence of the reinforcer, respectively. Thus, a reacquisition test following extinction involves a greater context change than a reacquisition test following punishment, promoting rapid reacquisition following extinction but not punishment.

General discussion

The present experiments examined three recovery effects – renewal, spontaneous recovery, and reacquisition – of an instrumental response eliminated by either punishment or extinction. Collectively, the results are consistent with the idea that punishment and extinction are similar examples of context-dependent learning, although this conclusion should not be taken to imply that there are no interesting differences between them.

In Experiment 1a, punished and extinguished groups exhibited nearly identical degrees of ABA renewal. Although this renewal effect has previously been demonstrated separately in punishment (e.g., Bouton & Schepers, 2015; Marchant & Kaganovsky, 2015) and instrumental extinction (e.g., Bernal-Gamboa et al., 2017; Bouton et al., 2011; Todd, 2013), this is – to our knowledge – the first direct comparison within a single experiment. The results with punishment may depend on the parameters used here, but they extend prior research suggesting that punished and extinguished responses are similarly sensitive to changes in physical context. They also parallel previous results suggesting that extinction and omission learning (negative punishment) can create response suppression that is equally sensitive to ABA renewal (Rey et al., 2020).

Experiment 1a also produced results suggesting that punished and extinguished responding spontaneously recovered between sessions, even though, as we noted, the conditions of testing differed between the groups. Experiment 1b then tested spontaneous recovery under the same test conditions (extinction) and demonstrated similar spontaneous recovery following punishment and extinction. The results are generally consistent with prior evidence of spontaneous recovery following both punishment (e.g., Estes, 1944) and extinction (e.g., Rescorla, 1996), and again extend these findings to suggest a similar degree of recovery in a direct comparison within a single experiment. The findings also extend the results of Experiment 1a to suggest that punished and extinguished responses are similarly sensitive to changes in temporal – in addition to physical – context.

Experiment 2 showed rapid reacquisition following extinction but not punishment. The result after extinction is well known (e.g., Bouton et al., 2012; Bullock & Smith, 1953; Willcocks & McNally, 2011, 2014; Woods & Bouton, 2007). The punishment result, however, is new. Although it suggests that the effects of punishment and extinction are not perfectly equivalent, it is nonetheless consistent with a contextual analysis of reacquisition, as a reacquisition test following punishment features less context change between the punishment and reacquisition phases (that is, the reinforcer was available and even earned early in punishment training). It is also reminiscent of Rey et al.’s (2020) observation that presenting reinstating food pellets at test had a greater effect on responding following extinction than following omission. Thus, the difference in reacquisition rates is likely due to extinction and punishment being learned in different contexts.

The present results suggest that both punishment and extinction generate a form of context-specific inhibitory learning. In Experiments 1a and 1b, this was evident in a context-switch effect at test; in Experiment 2 it was evident in a context-switch effect between response elimination and retraining. It is important to note that these similarities exist despite real differences between punishment and extinction learning. Such differences were apparent in the response elimination phase, where punishment and extinction progressed at significantly different rates in all experiments. Over sessions, extinction appeared to progress more slowly than punishment, which ultimately produced more complete response suppression (see especially Experiments 1b and 2). Thus, the arrangement of instrumental contingencies in extinction and punishment does indeed guide behavior in distinct ways (Marchant et al., 2019), but these differences do not necessarily affect the context sensitivity of punishment and extinction learning.

The context-sensitivity of punishment and extinction raises the question of how context operates in punishment. A classical perspective from Pavlovian extinction suggests that the context might function as a negative occasion-setter by signaling the presence of an inhibitory R-O association (e.g., Holland, 1992). However, recent data suggest that the role of context in instrumental extinction may be to directly inhibit the response, perhaps in the form of an S-R association (e.g., Bouton et al., 2016; Todd, 2013; see Trask et al., 2017). Bouton and Schepers (2015) demonstrated response-specific punishment effects in a design that equated punishment history in two contexts: When R1 was trained in A and punished in B while R2 was trained in B and punished in A, only R1 subsequently renewed in A and only R2 renewed in B. The pattern was essentially the same as that observed in prior studies of extinction (Todd, 2013). The results are arguably inconsistent with an occasion-setting account, which predicts transfer of negative occasion-setting between similarly trained responses (i.e., context would act as a negative occasion-setter for all responses performed there). Instead, they suggest the involvement of response-specific inhibition in the context of punishment or extinction, furthering the parallel between punishment and extinction. The present results contribute to this parallel by demonstrating an equally powerful role of context in modulating behavior following punishment and extinction.

Finally, the results may have clinical implications. Punishment has generated interest as a potential alternative to extinction in clinical settings, and as a more ecologically valid model of human behavior change in general (Marchant et al., 2019). For example, abstinence from undesirable behaviors (e.g., drug-taking, smoking, gambling, and overeating) is typically motivated by avoidance of the adverse (and thus punishing) consequences of those behaviors (e.g., Downey et al., 2001) rather than extinction. Nonreinforcement of the response is the defining feature of extinction, yet such behaviors are rarely emitted without reinforcement, and abstinence rarely occurs in the total absence of reinforcement. Rather, patients must learn to inhibit responding when the reinforcer is available (as in punishment). As seen in Experiment 2, subtle differences between punishment and extinction can have meaningful effects when the subject is later re-exposed to the response-reinforcer contingency. However, Experiments 1a and 1b confirm that, in either case, a change in context can bring about relapse, and punishment may not be any better than extinction in preventing this.