Introduction

Anxiety disorders are the most common type of mental disorder in Western societies (see Grillon et al., 2019), with a prevalence of up to 25% in the adult population (Baxter et al., 2013; Remes et al., 2016). These disorders not only constitute a major health problem for patients, but also come at enormous economic and societal cost. The burden is even greater considering that anxiety disorders are comorbid with other health problems and increase the risk for different mental disorders, such as substance addiction or depression (Grillon et al., 2019). Fortunately, exposure-based therapies, a form of cognitive-behavioural therapy (CBT), have empirically demonstrated their effectiveness for most patients suffering from anxiety disorders (Craske & Mystkowski, 2006). Exposure-based therapies are a set of techniques in which the patient is repeatedly confronted with an anxiogenic stimulus in the absence of aversive consequences. The aim of this exposure is to reduce the fear response associated with the anxiogenic stimulus and to alleviate clinical anxiety.

Experimental extinction in the laboratory has been widely used as a model for exposure therapies of anxiety disorders (Graham & Milad, 2011; Urcelay, 2012) and to understand the origin of different forms of relapse (Vervliet et al., 2013). According to this model, fear extinction depends on the development of inhibitory learning (Bouton, 2004; Craske et al., 2014). Experimental laboratory models are therefore a relevant tool for testing new techniques that may potentiate inhibitory learning and eventually translate into the improvement of exposure-based therapies (Craske et al., 2014; Sewart & Craske, 2020; Vervliet et al., 2013). In fact, there is ample consensus in the literature that relapse prevention depends crucially on the optimization of inhibitory learning (Craske et al., 2008, 2014; Jacoby & Abramowitz, 2016; McGuire & Storch, 2019; Weisman & Rodebaugh, 2018; see Footnote 1). Nevertheless, although exposure-based therapies are successful in reducing anxiety in the short term, they do not always maintain their effects in the long term, with relapse estimates ranging from 19 to 62% (Craske & Mystkowski, 2006).

Several factors have been shown to be implicated in the recovery of the initial problematic response in the laboratory, both in non-human and human conditioning studies. For example, the mere passage of time after the extinction phase may lead to a relapse of the initial anxious response (i.e., the spontaneous recovery effect; Pavlov, 1927). Another factor that has been related to relapse is the experience of a stressful situation after extinction, even if that situation is unrelated to the initially anxiogenic stimulus (i.e., the reinstatement effect; Rescorla & Heth, 1975). A change in the context in which extinction was initially provided has also been shown to promote the renewal of the initial anxious response (i.e., the renewal effect; Bouton & Bolles, 1979; see Vervliet et al., 2013 for a review of renewal research). Furthermore, after extinction, later encounters with the initial anxiogenic experience lead to a very rapid relearning, faster than the original learning experience (i.e., the rapid reacquisition effect; Bouton, 2002). Importantly, all these different forms of recovery have been observed not only in conditioning studies, but also in the clinical setting after exposure therapy (Boschen et al., 2009; Craske et al., 2012). Thus, the main current challenge for exposure-based therapies is not so much to achieve anxiety reduction but to prevent the relapse of the pathological anxiety response. In other words, there is ample room for improvement in the efficacy of this successful evidence-based therapy, with relapse prevention becoming a top priority in this regard (Dunsmoor et al., 2015; Vervliet et al., 2013).

It is important to note that, although here we refer to the conditioning of fear, these ideas can also be applied to appetitive contexts, in which a neutral cue becomes associated with an event of appetitive or positive significance (for instance, food or drugs). Additionally, although this type of conditioning is less studied in animals and, especially, in humans, it is highly important to advance in the study of certain pathologies, such as substance addiction, gambling, or obesity (Andreatta & Pauli, 2015; Quintero et al., 2020; Ramnerö et al., 2019; Schyns et al., 2020). Moreover, equivalent relapse phenomena have been described in appetitive conditioning and, in fact, the relapse prevention strategy studied in this review was first proposed to prevent the relapse of appetitive responses (see Bouton et al., 2004).

In recent years, several techniques have been developed from laboratory extinction studies aimed to potentiate inhibitory learning (see Craske et al., 2014, 2018, 2022; Tolin, 2019; Vervliet et al., 2013). One of these techniques is the occasional inclusion of reinforced trials during extinction (i.e., occasional reinforced extinction; ORE hereafter). This effect was initially described by Bouton et al. (2004) and Woods and Bouton (2007). In a series of animal classical and instrumental conditioning experiments, they found that including some pairings between the conditioned stimulus (CS) and the unconditioned stimulus (US) as part of the extinction procedure could slow down the rate of reacquisition of a previously extinguished response. Following these initial studies, several authors, like Craske et al. (2014, 2018), argued that ORE may be a viable and general strategy to enhance inhibitory learning and its retrieval, with a potentially translational value in the clinical domain. Apparently, experiencing the US during extinction may provide some form of resilience to the individual which can be therapeutically beneficial (Krompinger et al., 2019).

Different explanations have been proposed for the potential effectiveness of ORE on the mitigation of relapse. According to Bouton et al.’s account (2004), the initial excitatory association acquired will stay unchanged after extinction. A new inhibitory association between the original CS and the US will be created during extinction, and this second association will be context dependent. This means that it will be engaged only when features of the extinction context are present. Therefore, at a later test phase, the conditioned response will be reduced to the extent that the test context resembles the extinction context, because the inhibitory memory will be retrieved and reduce the expression of the original association. Following Bouton et al.’s studies, the preventive effects of ORE should be specific to rapid reacquisition and should not extend to other forms of relapse (Bouton et al., 2004; Woods & Bouton, 2007). This should occur because reinforced trials in ORE, unlike standard extinction, become part of both the acquisition and the extinction contexts. Reacquisition will be slowed down as reinforced trials introduced after extinction will be able to promote the retrieval of the inhibitory learning developed during extinction. In the case of other recovery phenomena (e.g., spontaneous recovery), the test phase is conducted including only extinction trials. As has been explained, after ORE, the extinction memory will only be retrieved if both reinforced and non-reinforced trials are presented during test.

Gershman et al. (2013) offer a different explanation for the preventive effects of ORE based on the concept of prediction error. According to their account, relapse prevention depends crucially on the specific distribution of reinforced trials during extinction, so that only a gradual decrease of reinforced trials after acquisition will have a preventive effect. This account assumes that the onset of a standard extinction training produces large prediction errors. The CS strongly predicts the presentation of the US after the initial training, but suddenly this is not the case anymore. They propose that these persistently large prediction errors may serve as a segmentation signal (i.e., a novel state in the environment), demanding the formation of a new inhibitory memory and thus, the original memory remains mostly unmodified. The newly formed inhibitory memory becomes context dependent (see also Bouton, 1993, 2002). However, should these prediction errors be small or infrequent, but still large enough to drive learning, as in ORE, no segmentation would occur, and the original acquisition memory will be weakened. Thus, according to Gershman et al. (2013), a gradual ORE should have a general preventive effect to all forms of relapse (see Culver et al., 2018, for other theoretical accounts of ORE general preventive effects).

In recent years, evidence has started to be gathered regarding ORE effects, including experiments with non-human and human participants, using appetitive as well as aversive procedures, and evaluating its effectiveness on different relapse phenomena such as spontaneous recovery, reinstatement, renewal, or rapid reacquisition. Given the rapid accumulation of evidence and the different, even contradicting, pattern of results obtained so far, it is necessary to make a comprehensive and critical review of this evidence. Our objective was to conduct a systematic review of ORE studies to answer the following questions: Is there consistent evidence showing that ORE is effective in reducing the relapse of the conditioned response? Is this relapse prevention effect homogeneous across the different relapse phenomena tested? Under what specific circumstances have these effects been studied (for instance, type of sample, outcome measure, etc.)? What methodological criteria should be taken into account when testing the effectiveness of ORE (for example, distribution of reinforced trials, critical prerequisites to test the effectiveness of ORE, etc.)?

Method

The literature search was conducted in July 2022 following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines (Page et al., 2021).

Eligibility criteria

For this review, we considered both laboratory and clinical studies. Given our interest in investigating the potential benefits of an ORE intervention, we included studies conducted with non-human animals and human participants, both in appetitive and aversive conditioning. Given the great heterogeneity of variables, stimuli, and procedures, we decided to include laboratory or clinical studies only if they met the following inclusion criteria:

  a) included an ORE procedure as defined by Bouton et al. (2004). That is, an extinction phase in which some reinforced trials are presented during extinction (Footnote 2).

  b) was conducted using an appetitive or an aversive preparation.

  c) used non-human animal or human samples.

Studies were excluded if they were (i) off topic, (ii) dissertation manuscripts, (iii) study proposals without data or results, or (iv) theoretical articles.

We decided to leave out dissertation manuscripts because they either (a) consisted of computational modelling work, (b) briefly mentioned ORE, or (c) were later published and included in this review (Footnote 3). Additionally, although theoretical works were excluded, we tracked the number of them identified during the search process.

Information Sources and Search Strategy

The literature search was performed in two steps. First, MJQ conducted the main search using three databases: Web of Science, Scopus, and PsycInfo. No restrictions in language or publication date were applied. Secondly, MJQ also carried out a citation search to identify studies that were referenced or that cited the studies that were eligible for full-text review in the first step.

The search syntax was developed by taking into consideration the wide variety of terms that have been used to describe the manipulation of interest (ORE). This was achieved by including the different names used in previous papers already familiar to the authors (because of their previous work with this experimental strategy; see Morís et al., 2017, or Quintero et al., 2022). Additionally, extra terms were included (such as “intermittent reinforced extinction”) based on a preliminary literature search conducted in late 2020. We also included the terms related to the field in which we were interested (conditioning and extinction). The final search syntax was as follows:

(“occasional* reinforced extinction”) OR (“partial* extinction”) OR (“partial* reinforced extinction”) OR (“gradual extinction”) OR (“intermittent reinforced extinction”) AND (conditioning) AND (extinction)
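As an illustration only, the search syntax above can be sketched as a small Python filter over titles or abstracts. This sketch rests on two assumptions that the text does not state: that the OR-ed phrases are meant to be grouped together before the two AND terms are applied, and that the wildcard `*` stands for any word ending. The names `PHRASES`, `REQUIRED`, and `matches_query` are hypothetical, and each database applies its own parsing rules, which this sketch does not reproduce.

```python
import re

# Phrase patterns taken from the search syntax.
# Assumption: "*" is treated as "any word ending" (e.g., occasional, occasionally).
PHRASES = [
    r"occasional\w* reinforced extinction",
    r"partial\w* extinction",
    r"partial\w* reinforced extinction",
    r"gradual extinction",
    r"intermittent reinforced extinction",
]

# Field terms that must both be present.
# Assumption: the OR-ed phrases are grouped before these AND terms apply.
REQUIRED = ["conditioning", "extinction"]


def matches_query(text: str) -> bool:
    """Return True if the text contains any phrase and all required terms."""
    lowered = text.lower()
    any_phrase = any(re.search(pattern, lowered) for pattern in PHRASES)
    all_required = all(
        re.search(r"\b" + term + r"\b", lowered) for term in REQUIRED
    )
    return any_phrase and all_required
```

For example, a title such as "Occasionally reinforced extinction after fear conditioning" would pass this filter, whereas "Fear conditioning and extinction in rats" would not, because it contains none of the ORE-related phrases.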

Selection Process

After importing the results from the databases in Microsoft Excel (2018) format, MJQ merged the three different files and reviewed the resulting references to check for duplicates. After removing them, MJQ and FJL reviewed titles and abstracts of the total results and decided whether to screen the full text. Next, MJQ screened full-text articles for inclusion. After this, MJQ performed an additional search on Google Scholar looking for papers citing the selected reports in the previous search or cited by them. These reports were assessed for inclusion whenever they could be retrieved. If necessary, a second researcher (FJL) was consulted to make the final decision.

Data Collection Process

After the authors reached consensus on which variables to include, MJQ designed a data coding sheet and used it to extract data from eligible studies. This information was reviewed by FJL, JM, and MAV.

We collected data on:

  1) The report: authors, year, and type of publication (laboratory or clinical study).

  2) The study (when applicable): theoretical background, objectives, and hypothesis.

  3) The method (when applicable): sample characteristics (sample, number of participants), setting, stimuli, procedure, dependent variables, or primary outcome measures (cognitive, behavioural, or psychophysiological outcomes, as well as questionnaires), response recovery phenomena assessed or type of test, additional treatments (in the case of clinical studies), and results.

  4) Other relevant information: pre-registrations, availability of data and/or scripts in open repositories, whether sample size was based on a power analysis, and the type of contrast used to test the ORE effect.

Results

Study Selection

We originally found 350 results from the three databases used: 243 from Web of Science, 35 from Scopus, and 72 from PsycInfo. After excluding duplicated articles and other reports for different reasons (namely, translations of the same article), 275 reports were screened. Two hundred and fifty-three of them were excluded based on their titles and abstracts. Twenty-two full-text articles were then assessed for eligibility. However, one of them could not be retrieved since it was a conference abstract. Therefore, full-text review was performed on 21 reports. Five of them did not include an ORE procedure as defined in Bouton et al.’s studies and were excluded. Six of the reports were theoretical and, therefore, excluded. Later, we conducted both reference and citation searching on Google Scholar based on the initially included reports (n = 10). We identified 40 reports and could retrieve 39 of them to assess their eligibility. A total of 34 reports were excluded for not using an occasional reinforced intervention as defined by Bouton et al. (2004) (n = 15), being thesis dissertations (n = 4), not being peer-reviewed works (n = 2), being a study proposal (n = 1), or being theoretical articles (n = 12). As already mentioned, it should be noted that (a) one of the papers from the original selection of 10 reports (see Thompson et al., 2018) included the results from one of the excluded thesis dissertations, and that (b) although a study proposal was excluded, we included a later study that conducted the proposed intervention and reported the obtained results (see Schyns et al., 2020). The final sample consisted of 15 reports. See Fig. 1 for the flow diagram of the study search and selection.

Fig. 1 PRISMA 2020 flow diagram of the selection process
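The selection flow described in the Study Selection paragraph can be retraced with simple arithmetic. The following Python sketch uses only the counts reported in the text, including the itemized exclusion reasons for the citation search; the variable names are illustrative and not part of the original protocol.

```python
# Counts reported in the Study Selection paragraph.
database_hits = {"Web of Science": 243, "Scopus": 35, "PsycInfo": 72}
identified = sum(database_hits.values())             # results from the three databases

screened = 275                                       # after removing duplicates and translations
excluded_title_abstract = 253
sought_full_text = screened - excluded_title_abstract
not_retrieved = 1                                    # conference abstract
assessed_full_text = sought_full_text - not_retrieved

excluded_full_text = {"no ORE procedure": 5, "theoretical": 6}
included_from_databases = assessed_full_text - sum(excluded_full_text.values())

# Reference and citation searching (itemized exclusion reasons).
citation_retrieved = 39
excluded_citation = {
    "no ORE procedure": 15,
    "thesis dissertation": 4,
    "not peer reviewed": 2,
    "study proposal": 1,
    "theoretical": 12,
}
included_from_citations = citation_retrieved - sum(excluded_citation.values())

final_sample = included_from_databases + included_from_citations

print(identified, sought_full_text, assessed_full_text)
print(included_from_databases, included_from_citations, final_sample)
```

Under these counts, the database search contributes 10 reports and the citation search 5, which together give the final sample of 15 reports.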

Study Characteristics

The fifteen reports included in this review were published between 2004 and 2022. Twelve of them reported the results of laboratory experiments (one of them being a corrigendum for the original publication; see Gershman et al., 2013, 2021), while the remaining three were conducted in clinical contexts (see Jessup & Olatunji, 2022; Krompinger et al., 2019; Schyns et al., 2020). Information about the main characteristics of the laboratory and clinical studies can be found in Tables 1 and 2.

Table 1 Main characteristics of the laboratory studies
Table 2 Main characteristics of the clinical studies

Effectiveness of ORE: Results from the Laboratory

Different recovery phenomena have been studied when assessing the potential benefit of ORE in the laboratory. However, as can be seen in Fig. 2, some relapse phenomena (e.g., reacquisition) have been more extensively studied than others (e.g., renewal) and results have not always been consistent. In the following sections, we summarize the main results of this literature.

Fig. 2 Effects of an ORE treatment on the different response recovery phenomena. Green circle: response recovery was effectively reduced by ORE. Red circle: response recovery was not reduced by ORE. Grey circle: the response recovery phenomenon was not assessed. Yellow circle: inconclusive results (see the main text for further details)

Animal Studies

Several studies have found that ORE can effectively slow down the rate of reacquisition of the conditioned response. In animals, this effect has been observed with measures such as number of magazine entries (Bouton et al., 2004) or number of lever presses (Woods & Bouton, 2007). Regarding spontaneous recovery and reinstatement, Gershman et al. (2013) found that a gradual decrease in the frequency of CS-US pairings during extinction could reduce the return of fear measured as freezing in rats.

Human Studies

van den Akker et al. (2015), Morís et al. (2017), and Quintero et al. (2022) found a slower reacquisition of US expectancy ratings in the occasional reinforced group in comparison to standard extinction. Culver et al. (2018) found this benefit only on the SCR measure. In contrast, Lipp et al. (2021) and Thompson et al. (2018) did not find evidence of such an effect in any of the included measures.

Regarding spontaneous recovery, Culver et al. (2018) pointed to a reduced recovery of SCR and expectancy ratings. However, extinction was not asymptotic in that study (expectancy ratings and SCR levels showed some remaining conditioning at the end of the extinction phase, greater in the ORE than in the standard extinction condition, probably due to the differential training), so these results should be interpreted with caution. Thompson et al. (2018) found that ORE eliminated spontaneous recovery, in comparison to a standard extinction condition, but only as measured with SCR. Finally, Quintero et al. (2022) failed to find a prevention or reduction of the spontaneous recovery of expectancy ratings in two experiments.

Lipp et al. (2021) is, to this date, the only study investigating the effects of occasional reinforcement during extinction on the renewal of the conditioned response. However, they did not find significant effects supporting a beneficial role of this manipulation when compared to the standard extinction group.

Finally, a reduction in the reinstatement of the startle response was found by Shiban et al. (2015), but these results should be considered carefully, as the sample size for this measure was small (see the section on Statistical and methodological considerations of the ORE literature). Neither Quintero et al. (2022) nor Thompson et al. (2018) could find a preventive effect of ORE on the reinstatement of US expectancy ratings.

Summary of the Results

The results of an occasional reinforced intervention on the reacquisition of the conditioned response tend to be the most consistent, with six out of eight experiments finding a significant effect, both in animals and in humans, and using either appetitive or aversive procedures. Results regarding other relapse phenomena tend to be less reliable, with some tests showing beneficial effects of ORE on the prevention or reduction of spontaneous recovery (Gershman et al., 2013; Thompson et al., 2018) or reinstatement (Gershman et al., 2013; Shiban et al., 2015), while others did not (see Quintero et al., 2022, and Thompson et al., 2018). Finally, only one article has studied renewal after ORE, finding no significant results (Lipp et al., 2021).

Considering the implications these results may have for clinical practice, it is fundamental to understand which factors could be responsible for the differences observed across studies. Could it be that different procedural aspects (e.g., recovery phenomena, type of measures, specific procedure) explain the sometimes-contradictory results? In the following sections, these aspects will be described to further discuss the heterogeneity observed in the ORE literature.

Primary Outcome Measures

As can be seen in Table 1, a wide variety of outcome measures have been reported within the included literature, ranging from number of magazine entries during CS presentations or the level of freezing in non-human animal studies, to expectancy ratings or skin conductance response (SCR) in human studies. This heterogeneity may hamper the direct comparison between the results of the different studies and the interpretation of the effect of ORE. In fact, although different studies have found significant results on the same relapse phenomenon, they differ in the outcome measure in which those significant results were found. For instance, although both Culver et al. (2018) and Thompson et al. (2018) found evidence of a reduced spontaneous recovery, the former did so on SCR and expectancy ratings, whereas the latter only found a benefit on SCR (Footnote 4). Regarding reacquisition, Culver et al. (2018) could only find an effect on the SCR measure, at odds with other studies (see Morís et al., 2017; Quintero et al., 2022; van den Akker et al., 2015), where the rate of reacquisition was slowed down when measured as expectancy ratings.

Note that the different ORE studies compared the levels of response recovery between groups. However, Dunsmoor et al. (2018), following a category conditioning procedure, performed a slightly different comparison: they compared the number of elements from the conditioning and extinction phases that could be recognised at test. Additionally, while Gershman et al. (2013) also included a long-term memory test, Culver et al. (2018) conducted a re-test at the end of the task (see Table 1 for more information on the type of test performed in the different studies and the results).

Additionally, some of the included human studies also assessed a series of psychological traits (for example, State-Trait Anxiety Inventory, STAI; Fear of Spiders Questionnaire, FAS; Intolerance of Uncertainty Scale, short version, IUS-12) to control for potential group differences unrelated to the experimental manipulation, rather than to study the effect of individual differences. Of note, all these studies reported that participants did not significantly differ in the traits measured. See Table 1 for additional information on the reported outcomes within the different empirical articles.

Types of Stimuli

The CSs used in the laboratory studies were also highly diverse. The animal studies included within this review used tones or followed an instrumental procedure (measuring lever presses), whereas human experiments used either physical objects (i.e., a children’s jewellery box) or images (such as neutral faces, animals, tools, or geometrical figures). Regarding the USs, the studies used food (i.e., pellets in animal studies; a spoon of chocolate mousse in one of the human studies), electric shocks, an air blast, or an aversive sound. The diversity of stimuli could represent another source of heterogeneity and should be considered, as the type of stimulation may have an influence on the kind of learning that takes place. For instance, whereas some stimuli would promote a stronger emotional response, others would generate weaker responses. This could impact the effectiveness of ORE, obscuring the real effect this procedure may have.

Heterogeneity of the Procedures

Animal studies. Bouton et al. (2004) reported two experiments. In Experiment 1, animals underwent eight 24-trial sessions during conditioning acquisition, twelve 24-trial sessions during extinction, and one 24-trial session for the reacquisition test. During extinction, the rate of reinforcement was either 1:8 or 2:8, meaning that 1 or 2 out of every 8 trials were reinforced. In their Experiment 2, conditioning consisted of ten 8-trial sessions, while extinction lasted for eleven (Experiment 2—Replication 1) or eighteen (Experiment 2—Replication 2) sessions, with 24 trials each. During these trials, the ratio of CS-US pairings gradually decreased from 1:8 to 1:12, until 1:24. Lastly, the test phase consisted of two 17-trial sessions that evaluated the reacquisition of the conditioned response.

Woods and Bouton (2007) reported the results from three experiments. All of them used the same general design, with five 30 min sessions during conditioning, eight 60 min sessions for extinction, and a reacquisition test that took place on two 60 min sessions. In their first two experiments, there were two ORE groups that underwent a gradual decrease in the number of reinforced instrumental responses (from a variable interval with an average schedule of 4 min, to a final variable interval with an average schedule of 32 min). These two groups differed on the variable interval schedule they experienced at test. In Experiment 3, the authors decided to keep just one occasional reinforced group, which underwent a training identical to that of the other two experiments for this group.

Gershman et al.’s (2013) Experiment 1 included three conditioning trials, 24 extinction trials, four long-term memory test trials 24 h after extinction, and four spontaneous recovery test trials 30 days after the previous test. In their Experiment 2, the design was similar, but the reinstatement test took place 24 h after extinction and was followed by the memory test 24 h later. In both experiments, the occasional reinforced group was identical and included a 3:8 reinforcement ratio that was reduced to 2:8 and, finally, to 0:8. Additionally, these experiments included a control condition, namely reverse ORE, where the same number of CS-US pairings was presented but following a gradually increasing fashion.
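To make the gradual and reverse schedules described for Gershman et al. (2013) more concrete, the following Python sketch builds a 24-trial extinction phase from three 8-trial blocks with 3, 2, and 0 CS-US pairings, and its reversed counterpart. It is only an illustration: the text does not specify where the reinforced trials fall within each block, so their positions are randomized here as an assumption, and the function name `ore_schedule` and its parameters are hypothetical.

```python
import random


def ore_schedule(pairings_per_block, block_size=8, seed=0):
    """Build an extinction schedule as a list of booleans.

    True marks a reinforced (CS-US) trial and False a non-reinforced
    (CS-alone) trial. Positions within each block are shuffled, which is
    an assumption: the source only gives the ratio per block.
    """
    rng = random.Random(seed)
    schedule = []
    for n_reinforced in pairings_per_block:
        block = [True] * n_reinforced + [False] * (block_size - n_reinforced)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


# Gradual ORE: 3:8, then 2:8, then 0:8 (24 extinction trials in total).
gradual = ore_schedule([3, 2, 0])

# Reverse ORE control, assumed here to mirror the gradual blocks:
# same number of pairings, in increasing order.
reverse = ore_schedule([0, 2, 3])

print(len(gradual), sum(gradual), sum(reverse))
```

Both schedules contain the same number of CS-US pairings (five) over 24 trials and differ only in whether the pairings become less or more frequent as extinction proceeds.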

Human studies. We also observed great differences in the procedures used with human samples. For instance, the number of acquisition trials varied, with studies including five (van den Akker et al., 2015), six (Thompson et al., 2018), eight (Culver et al., 2018; Lipp et al., 2021; Morís et al., 2017, Experiments 2 and 3; Quintero et al., 2022), eighteen (Morís et al., 2017, Experiment 1; Shiban et al., 2015), or even forty trials (Dunsmoor et al., 2018) per stimulus (CS+ or CS−). Note, however, that the procedure used by Dunsmoor et al. (2018) is slightly different from the other studies included in this review, as it follows a category conditioning preparation (i.e., they use CS categories, such as tools or animals, rather than specific stimuli) that requires the presentation of a large number of trials from each category.

In relation to the extinction design, again, we found substantial variability in the number of trials presented during this phase, as can be seen in Fig. 3. Additionally, differences were found in the reinforcement schedule used. For instance, although some studies presented a gradually decreasing number of CS-US trials during extinction (Dunsmoor et al., 2018; Lipp et al., 2021; Morís et al., 2017, Experiment 3; Quintero et al., 2022; Shiban et al., 2015; van den Akker et al., 2015), some others still included CS-US presentations during the final block of the extinction phase (Culver et al., 2018; Morís et al., 2017, Experiments 1 and 2; Thompson et al., 2018). Figure 3 also displays the reinforcement ratio throughout extinction for the different laboratory studies (Footnote 5).

Fig. 3 Summary of the number of extinction trials and CS-US pairings for the ORE group. Each rectangle represents a block of an equal number of trials, except for Morís et al.’s (2017) Experiments 1 and 2. For instance, van den Akker et al. (2015) included 22 extinction trials, that is, two blocks of 11 trials each. The number within the rectangle specifies how many CS-US pairings were included per block. Numbers in bold represent the total number of extinction trials. In their Experiment 1, Morís et al. (2017) included 3 non-reinforced trials at the beginning and at the end of the extinction phase, respectively. In their Experiment 2, the number of non-reinforced trials was 2 and 1, respectively

Clinical Studies: Effects of ORE in Therapy

A total of three studies have applied an occasional reinforced intervention in a clinical setting: one was a case study with OCD patients (Krompinger et al., 2019), and the other two were clinical studies with snake-fearful adults (Jessup & Olatunji, 2022) and overweight women (Schyns et al., 2020). As can be seen in Table 2, a great variety of measures were assessed, from symptom-related questionnaires (e.g., Yale-Brown Obsessive Compulsive Scale, Y-BOCS, or Eating Disorder Examination Questionnaire, EDE-Q) to expectancy ratings or behavioural tests (i.e., behavioural approach task, BAT).

In these studies, the intervention consisted of exposure experiences where the participants had to occasionally encounter the relevant stimulus or situation. For Krompinger et al. (2019) this meant that two OCD patients underwent a treatment where they had to confront evidence “confirming their fears” (for instance, one of the patients, who was fearful of causing harm while driving, had to complete a driving exposure where she accidentally knocked over some warning signs that were on the road), taking this experience as an opportunity to learn and recover more easily (i.e., by realizing they can manage the situation despite the unpleasant occurrence). The patients engaged in daily sessions of extinction with response prevention treatment for several weeks, besides attending CBT therapy groups. Symptom progression was assessed weekly and showed a reduction in OCD symptomatology (see Table 2 for more details).

Schyns et al. (2020) were interested in the effects of cue exposure therapy aimed at strengthening inhibitory learning by violating the CS-US (i.e., food → eating) expectancy. They conducted eight exposure sessions in which participants were exposed to palatable food and instructed to eat a small amount of it once per session and at a variable point. Participants’ expectancies were then measured throughout the session and the researchers evaluated how those exposures affected different relapse-related measures. They compared their results to those from a control condition in which participants received eight sessions (four in person and four via telephone) of psychoeducation on body image, mindfulness, and lifestyle advice. Participants in both groups were evaluated before and after the intervention, as well as three months later. The authors found that the exposure intervention was more effective than the control condition in reducing snacking and binge eating behaviours (see Table 2 for more information).

Finally, Jessup and Olatunji (2022) exposed participants to four videos of snakes that could be presented alone for 5 min or followed by another video of a snake biting a person (for 20 s) before returning to the initial video. Measuring expectancy ratings and behavioural approach tendencies before and after the intervention, as well as one week later, they found that occasionally reinforced trials during exposure diminished both measures in comparison to the standard exposure group.

As can be seen, the idea underlying these different procedures was also to promote the violation of CS-US expectancies in order to strengthen inhibitory learning in these subjects. Results from these three studies support a beneficial effect of ORE, with a significant reduction in the problematic symptomatology displayed by the individuals, even in the long term (for instance, Krompinger et al., 2019, report results from a 6-month follow-up in which reduced symptom levels are maintained).

Statistical and Methodological Considerations

Although we have focused our review on discussing procedural differences within the ORE literature that may explain the sometimes-contradictory results, statistical and methodological aspects should also be taken into account, as they could explain part of the variability observed when investigating the effectiveness of ORE.

The laboratory work discussed in this review varies substantially concerning the number of participants included in each study (see Table 1). In human studies, sample sizes tended to be relatively small, with some experiments including 17 participants (see Dunsmoor et al., 2018). However, larger sample sizes have also been used, as in Thompson et al. (2018), Lipp et al. (2021), or Quintero et al. (2022), with some of them including up to 157 participants after applying exclusion criteria. Importantly, all of them used a between-subjects design, which has reduced statistical power and limited precision compared to within-participants designs.Footnote 6

Another aspect worth mentioning is the statistical power of the experiments. On average, these studies included 25 subjects per experimental condition. With this sample size, studies are adequately powered only to detect large effects (e.g., approximately 80% power to detect a Cohen’s d = 0.8). Moreover, out of the eleven studies conducted in the laboratory, only two reported power analyses: whereas Quintero et al. (2022) performed a post-hoc power calculation, Lipp et al. (2021) used an a priori power analysis to establish the appropriate sample size.
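The relationship between group size and detectable effect size can be illustrated with a minimal sketch. It assumes a two-sided, two-group comparison with equal group sizes and α = .05, and it uses the normal approximation rather than the exact t distribution (which yields slightly lower power). The function names are hypothetical, and the sketch does not reproduce the power analyses reported by any of the reviewed studies.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis
    shift = d * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the target power (assumed design)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: power with n = 25 per group = {approx_power(d, 25):.2f}, "
              f"n per group for 80% power = {approx_n_per_group(d)}")
```

Under these assumptions, 25 participants per group give roughly 80% power for d = 0.8, but less than 50% power for a medium-sized effect of d = 0.5.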

Additionally, in some of those studies, especially the ones measuring physiological variables (namely SCR and startle), the data from some participants were not included in certain analyses (see Fig. 4, in Shiban et al., 2015). For example, for the reinstatement test in Shiban et al. (2015), the contingency and SCR data from 13 participants from the ORE group and 15 from the standard extinction condition were considered, whereas the startle analysis included the data from only 11 participants in the occasionally reinforced group and 12 in the standard group. These authors point out that their small sample size should be taken as a limitation. This is especially important considering that small samples can lead to more variable results.

As for the type of contrasts used in the included studies, we found a wide variety of tests. For instance, while some studies calculated recovery as the difference between response levels at the end of extinction vs. at the beginning of the test, others compared acquisition and test response levels or solely compared the performance of the different groups at test. These differences in the way the ORE effect is calculated, in combination with the large variety of outcome measures (from expectancy ratings to SCR), hamper any formal comparison across studies and the synthesis of the results in a meta-analysis.
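To make the three kinds of contrast concrete, the following sketch computes each of them from per-trial mean responses. The function name, the two-trial averaging window, and all numbers are hypothetical and chosen only for illustration; they are not data from any of the reviewed studies.

```python
from statistics import mean


def recovery_indices(acquisition, extinction, test, n_trials=2):
    """Return three recovery contrasts from per-trial mean responses of one group."""
    end_acquisition = mean(acquisition[-n_trials:])
    end_extinction = mean(extinction[-n_trials:])
    start_test = mean(test[:n_trials])
    return {
        # Contrast 1: start of test minus end of extinction
        "test_minus_extinction": start_test - end_extinction,
        # Contrast 2: start of test minus end of acquisition
        "test_minus_acquisition": start_test - end_acquisition,
        # Contrast 3: raw response level at test, to be compared between groups
        "test_level": start_test,
    }


# Made-up response levels (e.g., expectancy ratings rescaled to 0-1).
# The hypothetical ORE group does not reach the same extinction level
# as the hypothetical standard extinction group.
ore = recovery_indices([0.2, 0.6, 0.8, 0.9], [0.7, 0.5, 0.4, 0.3], [0.4, 0.4, 0.3])
standard = recovery_indices([0.2, 0.6, 0.8, 0.9], [0.6, 0.3, 0.1, 0.1], [0.4, 0.4, 0.3])

if __name__ == "__main__":
    for name, indices in (("ORE", ore), ("Standard", standard)):
        print(name, {key: round(value, 2) for key, value in indices.items()})
```

In this invented example both groups respond at the same level at test, yet the difference score relative to the end of extinction is smaller for the ORE group simply because its extinction was less complete. The choice of contrast can therefore change the apparent size of the effect.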

Out of the twelve laboratory studies, only Morís et al. (2017) and Quintero et al. (2022) made the data and scripts publicly available. However, none of the protocols was pre-registered. Out of the three clinical works, only Schyns et al.’s (2020) study proposal had been previously published (see van den Akker et al., 2016, for a detailed description of the protocol as well as a brief section including the proposed statistical analyses).

Discussion

Extinction has been proposed as the experimental model of exposure therapy, allowing researchers to investigate potential ways to improve the latter with results derived from the laboratory. In fact, several studies have found a correlation between laboratory and clinical outcomes (Ball et al., 2017; Forcadell et al., 2017; Hahn et al., 2015; Waters & Pine, 2016; see Scheveneels et al., 2021, for a review on this topic). In recent years, the number of studies investigating potential ways to improve extinction has increased substantially, with the aim of maintaining low levels of the anxiety response in the long term (Craske et al., 2014; Vervliet et al., 2013).

The sparse inclusion of reinforced trials during extinction has been suggested as an effective strategy to achieve relapse prevention via the enhancement of inhibitory learning (Craske et al., 2014). Initially described by Bouton et al. (2004), ORE has been explored in several laboratory and clinical studies. In this review, we aimed to collect and critically analyse the divergent existing literature on this extinction intervention, trying to answer various questions regarding the effect of ORE and the potential conditions that may account for its effectiveness. In the following sections, we will try to answer each of our research questions considering the results of our review.

Is there Consistent Evidence Showing that ORE is Effective in Reducing the Relapse of the Conditioned Response? Is this Relapse Prevention Effect Homogeneous Across the Different Relapse Phenomena Tested?

After conducting the systematic search and applying the inclusion and exclusion criteria, we selected a total of 15 reports, including 12 laboratory and three clinical studies published between 2004 and 2022. By and large, the effects of ORE in the laboratory (see Fig. 2) are not homogeneous across the different response recovery phenomena tested in the reviewed studies. The most consistent result seems to be the slowing down of the rate of reacquisition, both in animal and human experiments, although there are some negative results as well. Evidence tends to be less clear when it comes to other less studied recovery phenomena, yielding mixed results regarding the preventive effects of ORE (see Fig. 2). Therefore, it is still difficult to draw a clear conclusion on whether ORE is effective in reducing recovery.

Under What Specific Circumstances Have These Effects Been Studied?

The benefits of ORE are not homogeneous across all relapse phenomena tested. So, which characteristics of these studies may help understand the conditions under which those effects can be obtained?

First, not only are the results contrasting, but the type of outcome measures assessed in the different studies also tends to differ. It should be noted that, although the inclusion of different measures is not uncommon and can even be advisable in the field (see Lonsdorf et al., 2017), the ORE literature offers a picture that is difficult to interpret. Not only is ORE not consistently effective in reducing specific recovery phenomena, but the response systems it has an effect on tend to vary across studies (see the Primary outcome measures section for more details). In general, the evidence is mixed, and ORE has not been shown to be systematically effective at tackling specific response systems. Unfortunately, it cannot be confirmed whether these differences are telling us something about the dimension of the fear response that ORE could be modifying or whether the different results could be solely explained by procedural or methodological aspects.

It should be noted that it is not unusual to find divergences among the different components of the conditioned response (i.e., verbal, physiological, and behavioural indices). However, even if some components of the response are positively affected by ORE (i.e., preventive effects are observed), the fact that ORE does not influence all response components may eventually cause a more generalised response recovery (Boddez et al., 2013). Moreover, it is not clear why ORE should affect certain response systems and not others. A more detailed examination of these discrepancies needs to be performed in the future if the field aims to generate solid and guiding evidence that could be applied to therapy.

Regarding the type of sample, as can be seen in Fig. 2, animal studies offer consistent evidence supporting the benefit of ORE. However, human studies provide less consistent results (i.e., six experiments with positive ORE effects, six experiments with negative effects, and one experiment with inconclusive results), making it difficult to judge whether ORE is really effective in reducing response recovery.

A wide variety of stimuli has also been used in the different empirical studies, which may have an impact on the learning processes and on the potential comparison among studies. For instance, whereas some stimuli may promote a stronger conditioned response, others may not be adequate to elicit such an intense emotional response (either negative or positive). In this case, learning could be hindered, as well as the interpretation of the results, obscuring any benefit of ORE.

What Methodological Criteria Should be Taken into Account When Testing the Effectiveness of ORE?

The heterogeneity among the studies can also be observed in methodological and statistical features. We found considerable heterogeneity in the procedures used across different studies, especially in terms of the number of trials per phase and, more importantly, the type of occasional reinforcement rate applied during extinction. Again, these differences complicate the comparison between studies and might have important implications considering that one of the theoretical explanations of ORE suggests that the original acquisition memory can only be modified by the gradual reduction of the CS-US pairings. Based on associative learning theories (e.g., Rescorla & Wagner, 1972), we may expect that the longer the conditioning, the stronger the association between stimuli (other things being equal). Hence, this may lead to differential effects, as conducting extinction on memories established after acquisition phases of different durations would not be equivalent. For instance, we would expect the acquisition memory to be stronger and more difficult to modify after twelve than after merely three acquisition trials, and this could potentially explain part of the discrepancies observed in the ORE literature.
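The expectation that longer acquisition yields a stronger association can be illustrated with a minimal sketch of the Rescorla-Wagner learning rule for a single CS, in which associative strength V changes on each trial by a fraction of the prediction error (λ − V). The learning rate of 0.3 (standing in for the product of the CS and US salience parameters), the asymptote λ = 1 on reinforced trials, and the function name are assumptions made for illustration only.

```python
def rescorla_wagner(trials, learning_rate=0.3, v0=0.0):
    """Simulate associative strength V for a single CS across trials.

    trials: iterable of booleans, True when the CS is followed by the US.
    learning_rate: assumed combined salience parameter (alpha * beta).
    Returns the value of V after each trial.
    """
    v = v0
    history = []
    for reinforced in trials:
        lam = 1.0 if reinforced else 0.0  # asymptote supported by the trial outcome
        v += learning_rate * (lam - v)    # update proportional to prediction error
        history.append(v)
    return history


# Associative strength after three vs. twelve acquisition (CS-US) trials
v_after_3 = rescorla_wagner([True] * 3)[-1]
v_after_12 = rescorla_wagner([True] * 12)[-1]

if __name__ == "__main__":
    print(f"V after 3 trials:  {v_after_3:.2f}")
    print(f"V after 12 trials: {v_after_12:.2f}")
```

With these assumed parameters, associative strength is about 0.66 after three trials and about 0.99 after twelve, so the same extinction schedule would start from clearly different levels of learning. The same function accepts mixed sequences of reinforced and non-reinforced trials, which is one simple way to represent an occasional reinforcement schedule.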

Importantly, some of the positive results observed in the ORE literature should be taken with caution. A low number of extinction trials, some of which were CS-US presentations (see Fig. 3 for a summary of the different reinforcement schedules that have been applied), could potentially hinder asymptotic extinction, especially when reinforced trials were still presented on the last trials of this phase. In fact, Culver et al. (2018) and Shiban et al. (2015) found a difference between the conditioned response to CS+ and CS– even at the end of extinction training. This was noted by Morís et al. (2017) in their first two experiments, and they opted for a more gradual decrease in their Experiment 3, in which complete extinction was observed in both ORE and standard extinction groups. Conceptually, an important prerequisite is that extinction must be effectively established before assessing any form of response recovery, especially in order to rule out differences in conditioned response levels between groups before test that are not due to the experimental manipulation.

Although some studies evaluated various recovery phenomena in different experiments (for example, Gershman et al., 2013, or Quintero et al., 2022), others did not test them independently. We found that those studies evaluated the different phenomena sequentially, that is, one test after the other, which might have obscured any preventive benefit of ORE (Culver et al., 2018; Lipp et al., 2021; Thompson et al., 2018) due to a potential carryover effect. For instance, evaluating spontaneous recovery could affect a later evaluation of reacquisition, and this latter test would not be a sensitive measure of the preventive effects of ORE. In this regard, the experiments that failed to offer support for the slower reacquisition effect after occasionally reinforced training evaluated different response recovery phenomena sequentially. So, even though this manipulation could have been able to slow down the rate of reacquisition, the cumulative effect of previous tests might have undermined the sensitivity to detect it.

Taken together, the differences in various relevant methodological and statistical aspects involved in the study of ORE might have a cumulative detrimental impact on the field, obscuring the potential effect this intervention could have. Moreover, some studies did not ensure minimal critical prerequisites to assess the effectiveness of extinction (i.e., asymptotic response levels prior to the test) or conducted experiments and/or analyses with small sample sizes (see the section on Statistical and methodological considerations of the ORE literature). These issues hinder a proper comparison across studies, making it more difficult to ascertain the effect ORE could have. Because of this, it can be concluded that, at this time, there is a dearth of clear and systematic laboratory evidence supporting the effectiveness of ORE treatment in reducing the recovery of the conditioned response.

Recommendations for Future Studies

Although we excluded theoretical articles from the final sample, it should be noted that throughout the literature search we found 18 reports of this kind, that is, articles that mention ORE as a potential strategy to enhance extinction learning and prevent or reduce relapse in the laboratory or within clinical settings (Bautista & Teng, 2022; Craske et al., 2014, 2018, 2022; Dunsmoor et al., 2015; Elsey & Kindt, 2017; Jansen et al., 2016; Keller et al., 2020; Kummar et al., 2019; Lipp et al., 2020; McGuire et al., 2016; McGuire & Storch, 2019; Monfils & Holmes, 2018; Pittig et al., 2016; Sewart & Craske, 2020; Tolin, 2019; van den Akker et al., 2018; Weisman & Rodebaugh, 2018). Their discussion of the ORE effects varied, ranging from a simple description of promising results to recommendations on how to apply it in a clinical setting. From the detailed numbers, it can be concluded that there are more articles highlighting the potential effectiveness of an ORE intervention than actual empirical tests providing evidence of the suggested benefits. This is remarkable considering that the field lacks a standard protocol that could be widely implemented in laboratory or clinical settings and that this type of intervention has already been applied to clinical cases (see Jessup & Olatunji, 2022; Krompinger et al., 2019; Schyns et al., 2020). It is even more remarkable when the actual effectiveness of ORE in reducing relapse is closely examined and the lack of clear and consistent evidence becomes apparent.

Comparing the results from laboratory and clinical studies, one important factor that could be neglected in the lab would be the suitability of the procedures (for instance, the type of stimuli, the strength of the learning…). It is possible that conditioning and extinction, as studied in laboratory settings, do not really embody the experience that takes place within the clinical context, making it difficult to find strong and clear evidence. In contrast, clinical studies might facilitate the expression of any ORE benefit by conducting research in a more adequate and meaningful environment for the individuals. It should also be noted that the clinical application of ORE may entail several changes from the laboratory procedure, such as including additional intervention components (e.g., psychoeducation, expectancy violation intervention, multiple contexts exposure, etc.) or a different procedure from the one used in the lab (e.g., including only one reinforced presentation per exposure session, as in Schyns et al., 2020, or only reinforced trials, as in Jessup & Olatunji, 2022). Those additional intervention components cast doubt on the idea that ORE is the key element in those positive results, therefore hindering a clear interpretation of the effectiveness of this treatment. Moreover, if we consider the translational framework timeline (see Vervliet et al., 2013), more systematic evidence is desirable at early stages before advancing to the implementation of ORE with clinical samples. Additionally, individual differences are not being considered when evaluating the potential impact of ORE in the lab, despite their importance in anxiety and fear (see Lonsdorf & Merz, 2017), as well as in addictive behaviours (see Brunault & Ballon, 2021).

The already mentioned lack of clear and systematic evidence, as well as the great heterogeneity within the ORE literature, calls for the development of unified protocols (e.g., equivalent number of acquisition and extinction trials, similar reinforcement schedules, ensuring asymptotic extinction, independent study of different relapse phenomena, etc.), consideration of statistical aspects (for instance, including larger samples, an a priori calculation of statistical power, or establishing common tests for the effectiveness of ORE to allow comparison across studies), as well as for the adoption of Open Science practices (for instance, pre-registrations or registered reports, making data available, etc.), so that replication is facilitated in the future.

Some limitations of our review should also be noted. First, although we followed the PRISMA 2020 guidelines (Page et al., 2021), we did not pre-register the systematic review (for instance, using the OSF or PROSPERO’s servers), nor did we perform several of the recommended practices (the PRISMA checklist may be found at https://osf.io/6nvta/). Additionally, although a second researcher was consulted when necessary, data search and entry were performed by one researcher. The small sample of the laboratory studies included, the different indices used in the studies to calculate response recovery, as well as the great variety of protocols did not allow us to conduct a meta-analysis, which could have provided additional information about the effects of ORE. Lastly, we only included published articles (see Eligibility criteria), but there may be laboratory and clinical studies that have not been published yet due to publication bias (Dwan et al., 2013; Franco et al., 2014), and, therefore, were not included in this review. Future studies could try to tackle the necessary statistical approach to conduct a meta-analysis, searching for non-published results.

Conclusions

To sum up, in this review we identified and analysed the existing literature on the effect of ORE. It can be concluded that there has been substantial variability regarding experimental procedures, not only concerning the phenomena assessed or the measures that were considered, but also regarding the number of trials (especially for the extinction phase) and the reinforcement schedule, a key feature of this strategy. Despite all these divergent results and protocols, the picture that emerges is that ORE has not been shown to be systematically superior to standard extinction beyond some beneficial effect in retarding the rapid reacquisition of conditioning. Moreover, the limitations observed within the ORE literature call into question how general the potential benefits of its use in clinical settings would be and stress the need to generate high-quality, replicable, and transparent literature. To this day, we do not know the extent of the potential benefit of this strategy, or which factors would determine it (i.e., boundary conditions) and, therefore, further research is needed. Moreover, we should be more cautious when applying ORE to clinical situations considering the lack of consistent laboratory evidence and of a standardised protocol. To the best of our knowledge, this is the first time a systematic review has been conducted on the ORE literature and the results call for the unification of the research protocols and the integration of Open Science practices. To do this, we have suggested that certain methodological and statistical aspects need to be considered to facilitate the replication of ORE studies (see Lonsdorf et al., 2017), allowing a better understanding of its value and scope for relapse reduction.