For several treatments, it has been shown that an extinguished response recovers, indicating that extinction did not completely erase the initially learned information (e.g., Bouton & Bolles, 1979; Rescorla, 2004; Rescorla & Heth, 1975; Winterbauer, Lucke, & Bouton, 2013). One of these postextinction phenomena is renewal, first documented by Bouton and Bolles, which refers to a recovery of acquisition performance after changing the contextual cues that were present during extinction. In a typical renewal experiment, the animal learns an association between a Pavlovian conditioned stimulus (CS) and an unconditioned stimulus (US) in context A. In a second phase, conducted in context B, the presentations of the CS are no longer followed by a reinforcer. After the conditioned response has been extinguished, and if the animal is tested again in context A, the originally learned behavior renews (ABA renewal). Furthermore, the renewal effect can occur in situations in which acquisition, extinction, and testing all take place in different contexts (ABC renewal; e.g., Bouton & Bolles, 1979) or in which the context changes only between extinction and testing (AAB renewal; e.g., Bouton & Ricker, 1994). Renewal can be found in many different preparations, including autoshaping with pigeons (Swartzentruber, 1993), human predictive learning (Üngör & Lachnit, 2006, 2008), and operant conditioning in rats (Bouton, Todd, Vurbic, & Winterbauer, 2011).

While renewal illustrates the context dependency of extinction performance, several findings have indicated that acquisition performance might be less dependent on contextual cues. One line of evidence for this proposal comes from the observation that renewal occurs even though a context switch following initial conditioning did not affect conditioned responding (Bouton & King, 1983; Bouton & Peck, 1989; Paredes-Olay & Rosas, 1999; Thomas, Larsen, & Ayres, 2003; Vansteenwegen et al., 2005). To explain the observation that acquisition and extinction performances are differentially susceptible to contextual manipulations, Darby and Pearce (1995) suggested that during initial conditioning, organisms only pay a negligible amount of attention to contextual stimuli, because of the physical attributes and irrelevance of those stimuli for solving the task. Therefore, contextual stimuli are not processed until the surprise engendered by the omission of the US at the outset of extinction encourages the organisms to pay attention to the actual context. Because of this, the entire pattern of stimulation provoked by the CS and the context of extinction results in one unitary representation developing an inhibitory association with the US during extinction treatment, whereas during initial conditioning, only the CS is represented and associated with the US.

The basic idea that extinction may enhance an organism’s processing of contextual stimuli was also incorporated by Bouton (1997, 2004) into his theory of memory retrieval (Bouton, 1993, 1994) assuming that contextual stimuli regulate the retrieval of different memories related to the same CS. Bouton (1997) proposed that a CS acquires ambiguity through changing its significance from acquisition to extinction. The organism begins to pay attention to the context, in order to solve this arising ambiguity. As a consequence, the inhibitory CS–US association learned during extinction needs the context for retrieval, while the excitatory CS–US association established during acquisition is coded independently of the context.

Extending this approach, Rosas, Callejas-Aguilera, Ramos-Álvarez, and Abad (2006) pointed out that context specificity might depend on characteristic features of the situation that induce animals to pay attention to the context. Once this has happened, all information will be processed in a way that makes it context-specific, regardless of whether acquisition learning or extinction learning is concerned. One of the factors proposed to modulate the amount of attention paid to contextual stimuli was the informational value of contexts (for a formal account, see Mackintosh, 1975). An example supporting the idea suggested by Rosas et al. (2006) is provided by an experiment of Preston, Dickinson, and Mackintosh (1986, Exp. 2). Two groups of rats were trained to press a lever during a tone (S1+) or a light (S3+) and not to react during a clicker (S2–) in context A. In context B, the groups experienced different treatments. For animals in Group Cond, the contingencies for S1 and S2 were changed (S1–, S2+), whereas in Group Disc, the contingencies remained unchanged (S1+, S2–). Hence, contextual stimuli were relevant to the solution of the task in Group Cond, but not in Group Disc. The light (S3+) was not presented in context B during training, but was extinguished for all animals in a subsequent phase. For half of each group, this extinction took place in context A, and for the other half, in context B. The authors observed that extinction of S3 proceeded faster in context B than in context A for Group Cond, where the contexts were relevant for solving the discrimination between S1 and S2 during the training phase. However, the rate of extinction was independent of the contexts in Group Disc, where the context stimuli were irrelevant.

The generality of the findings reported by Preston et al. (1986) was demonstrated in a series of human experiments by León and colleagues (León, Abad, & Rosas, 2008, 2010; León, Gámez, & Rosas, 2012) using different learning scenarios. In one of these tasks, an instrumental learning scenario (León et al., 2010; León et al., 2012), participants played a computer game in which they had to defend different beaches (contexts) against attackers by clicking on them with the mouse. On every trial, discriminative stimuli indicated to the participants which attacker could be destroyed. In a predictive-learning task (León et al., 2008), participants were instructed to assume the role of an expert to identify foods that would lead to illness. Within this scenario, participants had to predict diarrhea after a person ate a special food in a special restaurant. The results from each of these experiments replicated the basic findings from Preston et al. (1986). Moreover, the authors demonstrated, for example, that the context change effects were not simply a consequence of the participants applying general rules (e.g., León et al., 2012).

The aim of the present experiments was to replicate and extend the previous findings reported by Preston et al. (1986) and León et al. (2008, 2010; León et al., 2012). In each of two experiments, we investigated the role of the informational value of contexts for the formation of context-dependent behavior using a predictive-learning task in which participants were required to predict the occurrence of specific outcomes on the basis of different cues.

In addition, we examined whether differences in the rates of extinction caused by a manipulation of the informational value of contexts are accompanied by differences in the strength of renewal. In previous studies, differences in extinction caused by specific treatments were not necessarily associated with differences in renewal. For instance, Bouton, García-Gutiérrez, Zilski, and Moody (2006) observed that extinction proceeded more slowly when it was conducted in multiple contexts than when it was given in a single context. However, in a final renewal test, response recovery was similar in strength across the two conditions. An additional example was provided by Bouton, Vurbic, and Woods (2008). The authors reported that the administration of D-Cycloserine facilitated extinction learning as compared to a saline control condition, but renewal was equivalent in both conditions. Given these results, it remains unclear whether differences in extinction caused by a manipulation of the informational value of contexts are accompanied by differences in renewal. Therefore, each of our experiments included a final test phase to investigate the context-specific reoccurrence of initially learned behavior. Besides its theoretical importance, this question might also be relevant from a practical (clinical) perspective, as treatments shown to manipulate the strength of renewal might be used for the development of therapeutic strategies for the prevention of relapse after exposure-based therapy.

Furthermore, our study also aimed at evaluating the theoretical analyses of context dependency and attention proposed by Rosas et al. (2006). In the previous experiments by Preston et al. (1986) and León et al. (2008, 2010; León et al., 2012), differences in attention to contexts established during an initial phase were inferred from behavioral effects observed during a subsequent phase. An alternative approach to investigating attentional processes in associative learning can be found in studies in which eye-gaze behavior during a learning task was monitored. For instance, Le Pelley, Beesley, and Griffiths (2011) and Mitchell, Griffiths, Seetoo, and Lovibond (2012) trained participants with a simple discrimination of the form AX+, AY+, BX–, BY–. They found that participants spent more time looking at the relevant stimuli A and B than at the irrelevant stimuli X and Y, and that this difference in overt attention persisted into a subsequent phase in which all of the stimuli were equally relevant for a novel task. Following the approach taken by Le Pelley et al. (2011) and Mitchell et al. (2012), Experiment 2 of the present study included recording of eye movements to further assess the role of attentional processes for the formation of context-dependent behavior. In contrast to the studies by Le Pelley et al. and Mitchell et al., in Experiment 2 we investigated overt attention to relevant and irrelevant stimuli using an experimental approach in which the relevant and irrelevant stimuli were equally uncorrelated with the outcome (e.g., Uengoer & Lachnit, 2012). Our procedure ensured that differences in overt attention could not be ascribed to differences in stimulus–outcome correlations, as is the case in studies using only simple discriminations (e.g., Le Pelley et al., 2011; Mitchell et al., 2012).

Experiment 1

Table 1 illustrates the design for the two groups of Experiment 1. During Phase 1, all participants received X+ and Y– trials in context A. Half of the participants (Group Relevant) experienced reversed contingencies for cues X and Y in context B (X–, Y+), whereas the other half (Group Irrelevant) were trained with the same contingencies (X+, Y–) in context B. Thus, Group Relevant received a conditional discrimination, in which the contexts were relevant to solve the discrimination between X and Y. In comparison, Group Irrelevant received a simple discrimination, in which the contexts were irrelevant for the task. Additionally, our experimental design included a within-subjects ABA renewal procedure: All participants were trained with Z+ trials in context A during Phase 1, and in Phase 2, both groups received Z– trials in context B. Then, all participants were tested with Z trials in each context.

Table 1 Design of Experiment 1

If, consistent with the results of Preston et al. (1986) and León et al. (2008, 2010; León et al., 2012), the informational value of contexts affects the strength of context-dependent learning, extinction of Z in Phase 2 should proceed faster in Group Relevant than in Group Irrelevant. This difference in extinction might also be accompanied by differences in the strength of renewal during the final test.

To investigate within each group whether learning of Z is context-dependent or context-independent, we trained a further stimulus H. During Phase 1, all participants were trained with H+ trials in context B, while in Phase 2 both groups received H– trials in context B. Hence, we presented two stimuli in our experiment, which were trained in a first phase and then extinguished during a second phase. However, the extinction of cue Z was conducted outside of its acquisition context, whereas cue H was shown during extinction in the same context in which it was initially trained. If learning is context-independent, no differences in responding to stimuli Z and H should occur in extinction. However, if learning is context-dependent, extinction of Z should be faster than extinction of H, because only Z was subjected to a context change between acquisition and extinction.

Moreover, our experimental design included training of filler cues (F1–F8) during Phases 1 and 2. The purpose of these filler cues was to equate experiences across the contexts. They ensured that participants had experiences with each context in each of the two phases, and also ensured that context A and context B were equated for both the number of presentations and the number of associated cues by the termination of Phase 2. Finally, due to these filler cues, each context was associated with a specific outcome on 50 % of the trials.

Method

Participants

A group of 64 students from the Philipps-Universität Marburg, Marburg, Germany (43 women, 21 men; M age = 22.4 years, age range 18–34 years) participated in the experiment and received course credit, chocolate, or payment (EUR €1.50 [USD $1.88]). Participants were equally and randomly allocated to the different experimental groups as they arrived in the experimental room. They were tested individually and required on average approximately 13 min to complete the experiment. The data of 16 additional participants were excluded from the analyses because their predictions were incorrect on more than 30 % of the trials during the last two blocks in Phase 1 and/or during the last two blocks in Phase 2.
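The exclusion rule stated above can be sketched in a few lines. This is only an illustration: the data layout (one list of correct/incorrect flags per block) and the function names are assumptions, not the authors' analysis code.

```python
def error_rate(correct_flags):
    """Proportion of incorrect predictions in a list of booleans (True = correct)."""
    return 1 - sum(correct_flags) / len(correct_flags)


def is_excluded(phase1_blocks, phase2_blocks, threshold=0.30, n_last=2):
    """Apply the stated criterion to one participant.

    Each argument is a list of blocks (hypothetical layout); each block is a
    list of booleans. A participant is excluded if the error rate across the
    last `n_last` blocks of Phase 1 and/or Phase 2 exceeds `threshold`.
    """
    for blocks in (phase1_blocks, phase2_blocks):
        last_trials = [flag for block in blocks[-n_last:] for flag in block]
        if error_rate(last_trials) > threshold:
            return True
    return False
```

With 16 trials per block, a participant with 10 errors across the last two blocks of a phase (31.25 %) would be excluded, whereas one with 9 errors (about 28 %) would be retained.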

Apparatus and stimuli

The following food types were used as cues H, X to Z, and F1 to F8: avocado, banana, broccoli, grapes, lemon, orange, pear, pepper, pineapple, strawberry, tomato, and zucchini. The names of two fictitious restaurants were used as contexts A and B. The restaurants were labeled (translated from German) The Mug and The Dome, written in turquoise and pink font, respectively. Both the assignments of the different food types to cues H, X to Z, and F1 to F8 and the assignments of the restaurant names to contexts A and B were implemented randomly for each participant. The two different outcomes were the occurrence (+) or nonoccurrence (–) of stomach trouble. The stimuli, instructions, and all other necessary information were presented on a computer screen. Participants interacted with the computer by using the mouse.

Procedure

Each participant was initially asked to read the following instruction (in German) on the computer screen:

Our study is concerned with the questions of how people learn about relationships between different events. Imagine that you are a medical doctor and that one of your patients often suffers from stomach trouble after meals. Your task is to discover what causes this stomach trouble that your patient is suffering from.

Your patient likes to go out for meals. The Mug and The Dome restaurants are your patient’s favorite places. You will be told which restaurant your patient has visited each day and which foods your patient has eaten there. Please look carefully at the foods and the respective restaurant. Thereafter, you will be asked to predict whether the patient suffers from stomach trouble. For this prediction, please click on the appropriate response button. After you have made your prediction, you will be informed whether your patient actually suffers from stomach trouble.

Use this feedback to find out what causes the stomach trouble that your patient is suffering from. Obviously, at first you will have to guess because you do not know anything about your patient. But eventually you will learn which causes lead to stomach trouble in this patient and you will be able to make correct predictions.

For all of your answers, accuracy rather than speed is essential. Please do not take any notes during the experiment. If you have any more questions, please ask them now. If you do not have any questions, please start the experiment by clicking on the Next button.

When a participant asked a question, it was answered by the experimenter. After the participant clicked on the Next button, the learning phases started.

In Phase 1 (see Table 1), Group Relevant received a conditional discrimination with ten trials each of X+ and Y– in context A and ten trials each of X– and Y+ in context B, whereas Group Irrelevant experienced a simple discrimination with ten trials each of X+ and Y– in context A and ten trials each of X+ and Y– in context B. In addition, participants were given ten trials each of Z+ and F1– in context A, together with ten trials each of H+ and F2– in context B. In Phase 2, both groups were trained with ten trials each of F3+, F4+, F5–, and F6– in context A, together with ten trials each of Z–, H–, F7+, and F8+ in context B. Phase 2 followed Phase 1 without a break (so that the transition was not signaled to the participants).
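The trial schedule just described can be written out as data in order to check two of the balance properties attributed to the filler cues: equal numbers of presentations per context and a 50 % outcome rate per context in each phase. The sketch below is a minimal illustration; the dictionary layout and function names are assumptions introduced here, and only the trial types and counts are taken from the text.

```python
# Trial types per context: cue -> outcome ("+" = stomach trouble, "-" = none).
PHASE1 = {
    "Relevant": {
        "A": {"X": "+", "Y": "-", "Z": "+", "F1": "-"},
        "B": {"X": "-", "Y": "+", "H": "+", "F2": "-"},
    },
    "Irrelevant": {
        "A": {"X": "+", "Y": "-", "Z": "+", "F1": "-"},
        "B": {"X": "+", "Y": "-", "H": "+", "F2": "-"},
    },
}
# Phase 2 was identical for both groups.
PHASE2 = {
    "A": {"F3": "+", "F4": "+", "F5": "-", "F6": "-"},
    "B": {"Z": "-", "H": "-", "F7": "+", "F8": "+"},
}
TRIALS_PER_TYPE = 10  # ten trials of each trial type per phase


def outcome_rate(trial_types):
    """Proportion of trial types in one context that are followed by the outcome."""
    outcomes = list(trial_types.values())
    return outcomes.count("+") / len(outcomes)


def summarize(group):
    """Presentations and outcome rates per context for one group."""
    summary = {}
    for ctx in ("A", "B"):
        p1, p2 = PHASE1[group][ctx], PHASE2[ctx]
        summary[ctx] = {
            "presentations": TRIALS_PER_TYPE * (len(p1) + len(p2)),
            "outcome_rate_phase1": outcome_rate(p1),
            "outcome_rate_phase2": outcome_rate(p2),
        }
    return summary
```

Because every trial type was presented equally often, the proportion of reinforced trial types equals the proportion of reinforced trials, so `summarize("Relevant")` and `summarize("Irrelevant")` both give 80 presentations and an outcome rate of .5 for each context in each phase.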

After Phase 2, all participants received a series of test trials. This test was introduced by the following instructions: “Now the feedback of whether your patient actually suffers from stomach trouble will be omitted. Nevertheless, please exert yourself to predict the occurrence or nonoccurrence of stomach trouble as accurately as possible.” The structure of the test trials was identical to that of the learning trials, with the exception that the feedback window was omitted. Participants in both groups received four presentations of Z trials in each context.

For both groups, each learning phase was divided into five blocks and the test phase into two blocks. Within each block, each trial type was presented on two occasions. The order of presentation of the trials within each block was determined randomly for each block and each participant.
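A blocked randomization of this kind could be generated as follows. This is a sketch under stated assumptions: the function name, the representation of a trial type as a (context, cue, outcome) tuple, and the use of Python's `random` module are illustrative choices, not a description of the software actually used.

```python
import random


def build_phase(trial_types, n_blocks=5, reps_per_block=2, rng=None):
    """Return a list of blocks; each block holds every trial type
    `reps_per_block` times in an independently shuffled order."""
    if rng is None:
        rng = random.Random()
    schedule = []
    for _ in range(n_blocks):
        block = list(trial_types) * reps_per_block
        rng.shuffle(block)  # new random order for every block
        schedule.append(block)
    return schedule


# Phase 1 trial types for Group Relevant, as (context, cue, outcome) tuples.
phase1_relevant = [
    ("A", "X", "+"), ("A", "Y", "-"), ("A", "Z", "+"), ("A", "F1", "-"),
    ("B", "X", "-"), ("B", "Y", "+"), ("B", "H", "+"), ("B", "F2", "-"),
]
schedule = build_phase(phase1_relevant, rng=random.Random(1))
```

Calling the function once per participant with a fresh random generator yields a different order for each block and each participant, while every trial type still occurs ten times per learning phase. The test phase would correspond to `n_blocks=2`.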

Dependent variable and statistical analysis

For each of the trial types Z and H, we calculated for each participant the mean percentages of stomach trouble predictions in each block of Phases 1 and 2 and across all blocks of the test phase. For the simple discriminations in Group Irrelevant and the conditional discriminations in Group Relevant, we calculated for each participant the mean percentages of correct predictions across the four trial types (context A: X, Y; and context B: X, Y) in each block of Phase 1. The means were analyzed with repeated measures analyses of variance (ANOVAs). For all reported experiments, the .05 level of significance was used in all statistical tests, and the degrees of freedom were corrected with the Box (1954) method where appropriate. We used partial eta-squared as the measure of effect size.
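For readers who wish to check reported effect sizes, partial eta-squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is a minimal sketch of that identity; the function name is an assumption introduced here.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F ratio and its degrees of freedom.

    Follows from eta_p^2 = SS_effect / (SS_effect + SS_error) together with
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of block in Phase 1 of Experiment 1: F(4, 248) = 65.6
print(round(partial_eta_squared(65.6, 4, 248), 2))  # 0.51
```

A correction that scales both degrees of freedom by the same factor leaves the result unchanged, because only the ratio of the two values enters the formula.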

Results and discussion

Discrimination training in Phase 1

The left-hand panel of Fig. 1 presents the mean percentages of participants who made stomach trouble predictions for Z+ in context A and H+ in context B across the five blocks of Phase 1 for each group. Squares represent the data from Group Relevant, and circles, the data from Group Irrelevant; black symbols record responses to Z+ trials, and white symbols, responses to H+ trials.

Fig. 1

The left-hand panel shows the mean percentages of participants who predicted stomach trouble in response to Z in context A and H in context B, across the five blocks in Phase 1 of Experiment 1, separately for Group Relevant (square symbols) and Group Irrelevant (circular symbols). The right-hand panel shows the mean percentages of participants predicting stomach trouble in response to either Z or H in context B across the five blocks in Phase 2 of Experiment 1, separately for Groups Relevant and Irrelevant

As can be seen, no differences in responding to H and Z occurred in either group. This was confirmed by a 2 × 5 × 2 (Stimulus [Z, H] × Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA, including the between-subjects factor Group and the within-subjects factors Stimulus and Block. We found a main effect of block, F(4, 248) = 65.6, p < .001, ηp² = .51, indicating an increase of stomach trouble predictions to both stimuli over the course of acquisition training, as well as a main effect of group, F(1, 62) = 6.9, p = .011, ηp² = .10, showing that Group Irrelevant reacted more strongly to Z+ and H+ on average than did Group Relevant. No Block × Group interaction was apparent, F < 1. Most importantly, the main effect of stimulus and all interactions including this factor were not significant, Fs < 1, showing that responding to Z did not differ from responding to H.

Table 2 depicts the mean percentages of correct predictions across trials including X and Y for each block of Phase 1. The analysis of correct predictions showed that the simple discrimination in Group Irrelevant was acquired faster than the conditional discrimination in Group Relevant. A 5 × 2 (Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA revealed a main effect of block, F(4, 248) = 66.20, p < .001, ηp² = .52, showing that the mean percentage of correct predictions increased in the course of training. The main effect of group was also significant, F(1, 62) = 74.27, p < .001, ηp² = .55, as well as the Block × Group interaction, F(4, 248) = 3.76, p = .008, ηp² = .06, indicating that performance was more accurate in Group Irrelevant than in Group Relevant.

Table 2 Mean correct predictions across trials including X and Y (standard errors within parentheses) for each of the five blocks of Phase 1 in Experiment 1

Extinction in Phase 2

The right-hand panel of Fig. 1 presents the mean percentages of participants making a stomach trouble prediction for Z– and H– trials in context B across the five blocks of Phase 2 for each group. The percentages of stomach trouble predictions for H– and Z– decreased in the course of Phase 2 in each group. However, responding to H was stronger than to Z in Group Relevant, while in Group Irrelevant, there were no differences in responding to these stimuli. A 2 × 5 × 2 (Stimulus [Z, H] × Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA supported these findings, showing a significant main effect of block, F(4, 248) = 87.95, p < .001, ηp² = .59, indicating decrements in responding to H and Z over the course of training. A significant main effect of stimulus also emerged, F(1, 62) = 18.40, p < .001, ηp² = .23, indicating fewer responses to Z– than to H–, and a significant Stimulus × Block interaction, F(4, 248) = 6.53, p < .001, ηp² = .10, showing that the difference in responding to H and Z changed across the blocks. The main effect of group was not significant, F < 1, but we did find a significant Block × Group interaction, F(4, 248) = 3.48, p = .018, ηp² = .05, reflecting that the decrements in responding to H and Z over the course of training differed between the groups.

Most importantly, the analysis revealed a significant Stimulus × Group interaction, F(1, 62) = 4.37, p = .041, ηp² = .07, reflecting that the differences in responding to H and Z varied across the groups. To further decompose the Stimulus × Group interaction, we calculated simple main effects of stimulus at each level of the factor Group. The analysis revealed a significant simple main effect of stimulus only in Group Relevant, F(1, 62) = 20.34, p < .001, ηp² = .25, but not in Group Irrelevant, F(1, 62) = 2.42, p = .125, ηp² = .04, showing that the difference in responding between Z and H was more pronounced in Group Relevant than in Group Irrelevant. The Stimulus × Block × Group interaction was not significant, F(4, 248) = 1.75, p = .152.

Contextual control during test

Figure 2 depicts responding to Z trials in contexts A and B during the test phase in terms of the mean percentages of participants making stomach trouble predictions. The left-hand bars present the predictions for Group Relevant, and the right-hand bars show the predictions for Group Irrelevant. Within each group, the black bar depicts the predictions in context A, and the white bar, the predictions in context B.

Fig. 2

Mean percentages of participants predicting stomach trouble in response to Z in context A and in context B during the test phase of Experiment 1, collapsed across the four presentations of each trial type separately for Groups Relevant and Irrelevant. Error bars denote standard errors of the means

As the figure demonstrates, both groups responded to Z more readily in context A than in context B. However, the difference in responding between the two contexts was more pronounced in Group Relevant. Contextual control over Z was assessed by means of a 2 × 2 (Context [A, B] × Group [relevant, irrelevant]) ANOVA. The analysis revealed a significant main effect of context, F(1, 62) = 37.88, p < .001, ηp² = .38, indicating that responding to Z was stronger in context A than in context B, and a significant Context × Group interaction, F(1, 62) = 7.13, p = .010, ηp² = .10, showing stronger context dependency in Group Relevant than in Group Irrelevant. To further analyze the Context × Group interaction, we calculated the simple main effect of context at each level of the factor Group. For both groups, the simple main effect of context was significant, F(1, 62) = 38.94, p < .001, ηp² = .39, for Group Relevant, and F(1, 62) = 6.07, p = .017, ηp² = .09, for Group Irrelevant, indicating that all participants responded more to Z trials in context A than in context B. Furthermore, we conducted a simple main effect analysis of group at each level of the factor Context, revealing a simple main effect of group for context A, F(1, 62) = 11.36, p = .001, ηp² = .16, but not for context B, F < 1, showing stronger responding for Group Relevant only in context A.

Overall, extinction of Z was affected by a context change between the acquisition and extinction treatments if initial acquisition training of this stimulus was conducted in a context that was relevant for the solution of a discrimination between X and Y. However, extinction was unaffected by the contextual manipulation if initial acquisition was conducted in a context that had been trained as being irrelevant. These results indicate that the informational value of contexts affects context-specific learning, as previously documented by Preston et al. (1986) and León et al. (2008, 2010; León et al., 2012). In addition, we found that after extinction, responding to Z recovered in each group when the stimulus was tested in the context of initial acquisition. However, this ABA renewal effect was stronger in Group Relevant than in Group Irrelevant. To our knowledge, the present experiment is the first demonstrating this effect (for a theoretical discussion of this finding, see the General Discussion).

The results of Experiment 1 support the idea of Rosas et al. (2006) that contexts with a higher informational value receive more attention, consequently leading to stronger context-specific encoding of the information acquired in these contexts. In the present experiment, differences in the amounts of attention paid to contextual stimuli during Phase 1 were inferred from differences in the strengths of context-dependent learning observed in Phase 2 (see also León et al., 2008, 2010; León et al., 2012; Preston et al., 1986). To further support the interpretation of our results in terms of context dependency and attention (Rosas et al., 2006), we conducted a second experiment in which we used a design similar to the one in the previous experiment, but included recording of eye movements to assess overt attention to the contexts during the different phases of the experiment.

Experiment 2

Experiment 2 was designed to yield direct support for the hypothesis that the informational value of the context modulates the amount of attention allocated to that context. We recorded the gaze position of participants who were engaged in a predictive-learning task and analyzed fixational dwell times on context stimuli as a measure of this attentional allocation. The trained discrimination problems were similar to those used in Experiment 1 (Table 3). Thus, in Phase 1, Group Relevant received X+, Y– in context A and X–, Y+ in context B, whereas Group Irrelevant experienced X+, Y– in context A as well as in context B. In addition, both groups received an ABA renewal procedure. All participants were trained with Z+ in context A during Phase 1, experienced Z– in context B during Phase 2, and were tested with Z in both contexts.

Table 3 Design of Experiment 2

However, for the measurement of eye movements during the predictive-learning task, a number of changes in our procedure were necessary. In contrast to Experiment 1, we used a sequential stimulus presentation in which the context preceded the food cues. This sequential presentation ensured that visual orienting toward the context was not disturbed by attention to the food cues. As a reference with which dwell time to the context could be compared, each presentation of a context was accompanied by the presentation of a control stimulus C. In order to avoid biases due to differences in physical intensity across context A, context B, and control stimulus C, we used stimuli that were similar in color, complexity, and in their spatial extent. For half of the participants in each group, the contexts and the control stimulus were yellow patterns, and for the other half they were yellow block capitals (see Fig. 3). In addition, the instructions to the participants differed from the medical doctor scenario that had been applied in Experiment 1, as in Experiment 2 the contexts were no longer represented by restaurant names. Instead, participants were required to predict the correct position of an arrowhead after a shown food cue.

Fig. 3

Screenshot of the context and control stimuli of Experiment 2. The upper part of the figure shows the first set of stimuli, which were randomly assigned to context stimuli or the control stimulus. The lower part of the figure depicts the second set of stimuli. Letters A and B were randomly allocated to the context stimuli, and letter X always acted as the control stimulus

If, consistent with the conclusions drawn from Experiment 1, the informational value of context stimuli affects the amount of attention paid to these stimuli, participants should spend more time fixating the contexts in Group Relevant than in Group Irrelevant.

Method

Participants

A group of 28 students from Philipps-Universität Marburg (21 women, 7 men; M age = 22.9 years, age range 16–30 years) participated in the experiment and received either course credit or payment (EUR €8.50 [USD $10.54]). The predictive-learning task took about 30 min to complete on average. All participants had normal or corrected-to-normal vision and were randomly allocated to Group Relevant or Group Irrelevant. Each participant made correct predictions in at least 70 % of all trials in the last two blocks of Phases 1 and 2, and hence all participants reached the learning criterion. In some trials, recording of gaze position was distorted by signal noise or blink artifacts, so that we excluded 2.6 % of the trials for each participant, on average.

Apparatus

An infrared video-based eyetracker (EyeLink 2000, SR Research, Inc.) recorded monocular eye movements and sampled the positions of pupil center and corneal reflection at 1,000 Hz. The recording side (left vs. right eye) was counterbalanced across participants. The eyetracking column was table-mounted in front of a 22-in. CRT monitor (Iiyama, Vision Master Pro514), and restrained the participant’s head via chin and forehead rests. Computer-controlled stimuli were presented at an eye-to-screen distance of 78 cm. A rectangular, funnel-shaped aperture framed the screen, in order to prevent environmental distraction from the experimental chamber.

Procedure and stimuli

Participants gave written consent to the requirements to try to sit still and to avoid blinking during sampling intervals, as well as to the anonymous storage of their data for the analysis. Prior to the experiment, participants read written instructions in German that exemplified the events and task demands that occurred within a trial. Ten practice trials were delivered to assure that individuals had understood the instructions. The eyetracker was then calibrated using a 13-point grid of calibration targets. The calibration procedure was repeated until the subsequent validation process confirmed an average calibration error of <0.5°.
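To relate the calibration criterion to distances on the screen, degrees of visual angle can be converted to centimeters at the viewing distance of 78 cm. The sketch below uses the usual tangent relation for an offset measured from the line of sight; the function names are assumptions introduced for illustration.

```python
import math

EYE_TO_SCREEN_CM = 78.0  # viewing distance reported for Experiment 2


def deg_to_cm(angle_deg, distance_cm=EYE_TO_SCREEN_CM):
    """On-screen offset (cm) subtended by a visual angle measured from the line of sight."""
    return distance_cm * math.tan(math.radians(angle_deg))


def cm_to_deg(offset_cm, distance_cm=EYE_TO_SCREEN_CM):
    """Visual angle (degrees) of an on-screen offset measured from the line of sight."""
    return math.degrees(math.atan(offset_cm / distance_cm))
```

Under these assumptions, `deg_to_cm(0.5)` is roughly 0.68 cm, so the calibration criterion corresponds to an average positional error of less than about 7 mm on the screen.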

Each trial started with a fixation cross in the center of the screen, which instructed participants to fixate, stop blinking, and pay attention. After the fixation cross, two stimuli, one acting as the nominal context stimulus (A or B) and the other as a competing control stimulus (C), were presented for 2 s in gray squares measuring 4 cm² that were positioned directly above and directly below fixation at a distance of 2.68 cm. Hence, on context presentations, participants could decide whether to attend to the context or the control stimulus, and during this time, eye movements were recorded. Two sets of stimuli, which acted as the contexts or control stimuli, are shown in Fig. 3. In the next step, a food cue was shown in the center of the screen for 2 s. Within these 2 s, participants were asked to predict the correct outcome using the computer mouse. The two possible outcomes were the occurrence of a white arrowhead pointing toward either the right or the left side of the screen. To predict a right-pointing arrowhead, participants clicked the right mouse button; to predict a left-pointing arrowhead, they clicked the left button. The outcomes that functioned as + and – for each individual (as described in Table 3) were randomly assigned. Then, the arrowhead appeared for 2 s, and a feedback sound occurred. The feedback about the correctness of the prediction was provided by a high sound (correct prediction) or a low sound (incorrect prediction). Every trial ended with an empty display for 4 s on average. Following Phase 2, an on-screen instruction informed participants that in the next phase the trials would be the same, except that neither the arrow nor the feedback sound would occur. However, they were required to continue their predictions about the direction of the arrow on the basis of their previous experience.

Table 3 depicts the design of Experiment 2. All participants received preliminary training (not included in the table) of X and Y alone, leaving out any of the stimuli that would also be trained in Phase 1 (Z and F1–F3). Group Relevant received pretraining of six trials each of X+ and Y– in context A and six trials each of X– and Y+ in context B, whereas Group Irrelevant received six trials each of X+ and Y– in both contexts. In Phase 1, discrimination training between X and Y was continued with each of the four trial types (Context A: X, Y; and Context B: X, Y) presented ten times. Additionally, both groups received training with ten trials each of Z+ and F1– in context A, and ten trials each of F2+ and F3– in context B. During Phase 2, all participants experienced ten trials each of F4– and F5+ in context A, as well as ten trials each of Z– and F6+ in context B. In the test phase, groups were tested with four Z trials in each context.

The preliminary phase was divided into three blocks. Each learning phase was divided into five blocks, and the test phase was divided into two blocks. Within each block, each trial type was presented on two occasions. The order of presentation of the trials within each block was determined randomly for each block and each participant.
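The blocking scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated constraints (each trial type twice per block, order randomized per block); the function name `build_schedule`, the `(context, cue)` tuple coding, and the fixed seed are assumptions and do not reflect the software actually used in the experiment.

```python
import random

def build_schedule(trial_types, n_blocks, reps_per_block=2, seed=None):
    """Return a list of blocks; every block contains each trial type
    `reps_per_block` times in an independently shuffled order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(trial_types) * reps_per_block
        rng.shuffle(block)  # new random order for every block
        schedule.append(block)
    return schedule

# Phase 1 trial types, coded as hypothetical (context, cue) pairs.
PHASE1_TYPES = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"),
                ("A", "Z"), ("A", "F1"), ("B", "F2"), ("B", "F3")]

# Five blocks with two presentations each give ten trials per trial type.
phase1 = build_schedule(PHASE1_TYPES, n_blocks=5, seed=1)
```

Passing a different seed for each participant would give each participant an individual random order.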

Dependent variable and statistical analysis

For trial type Z, we calculated for each participant the mean percentages of correct predictions in each block of Phases 1 and 2 and across all blocks of the test phase. Predictions for Z were considered as being correct if they were in accordance with the contingency trained in Phase 1. The four trial types including X and Y were analyzed as in Experiment 1. Eye position data were analyzed from the onset of the context stimulus presentation until the food cue appeared. We excluded trials from the analyses if the eyetracker lost the signal (e.g., because of head movements or blinks) for more than 20 % in the relevant time window. Dwell time on the context stimulus A or B minus dwell time on the control stimulus C was the subject of the analyses reported below. Please recall that the control stimulus carried no information in both groups and that the contexts had informational value for Group Relevant only.
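The dwell-time measure and the exclusion rule can be illustrated with a simplified sketch. It assumes that every 1-ms gaze sample in the 2-s context window has already been labeled as falling on the context stimulus, on the control stimulus, elsewhere, or as lost signal; the label names and the function `dwell_difference` are hypothetical, and counting raw samples is a simplification of fixation-based dwell time.

```python
def dwell_difference(samples, sample_ms=1.0, max_loss=0.20):
    """Dwell time on the context minus dwell time on the control (in ms).

    samples: sequence of "context", "control", "other", or None (signal lost).
    Returns None when more than `max_loss` of the window is lost,
    i.e., when the trial would be excluded from the analysis.
    """
    n = len(samples)
    if n == 0:
        return None
    lost = sum(1 for s in samples if s is None)
    if lost / n > max_loss:
        return None
    on_context = sum(1 for s in samples if s == "context")
    on_control = sum(1 for s in samples if s == "control")
    return (on_context - on_control) * sample_ms

# Hypothetical 2-s windows sampled at 1,000 Hz.
usable_trial = ["context"] * 1200 + ["control"] * 500 + [None] * 300
noisy_trial = [None] * 500 + ["context"] * 1500

usable_score = dwell_difference(usable_trial)  # 15 % loss: trial is kept
noisy_score = dwell_difference(noisy_trial)    # 25 % loss: trial is excluded
```

Positive scores indicate a preference for the context stimulus, and negative scores a preference for the control stimulus.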

Results and discussion

Predictive-learning data

Discrimination training in Phase 1

The left-hand panel of Fig. 4 depicts the mean percentage of participants making the correct prediction of arrow position for Z+ trials in context A across the five blocks of Phase 1 for each group. Black squares represent the data from Group Relevant, and white squares, the data from Group Irrelevant.

Fig. 4

The left-hand panel shows the mean percentages of participants predicting the correct position of the arrow for Z in context A across the five blocks of Phase 1 of Experiment 2, separately for Group Relevant (black squares) and Group Irrelevant (white squares). The right-hand panel shows the mean percentages of participants predicting the correct position of the arrow for Z in context B across the five blocks of Phase 2 of Experiment 2, separately for Groups Relevant and Irrelevant. (Predictions were considered to be correct if they were in accordance with the contingency trained in Phase 1)

As can be seen, no differences are apparent between the groups in the increase of correct position predictions to Z+ in the course of acquisition. This was confirmed by a 5 × 2 (Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA, revealing a significant main effect of block, F(4, 104) = 8.02, p < .001, ηₚ² = .24, that showed an increase of correct predictions in the course of training. We found no main effect of group and no Block × Group interaction, Fs < 1, reflecting that responding to Z did not differ across the groups.
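For readers who wish to cross-check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom. The sketch below assumes the standard definition, SS_effect / (SS_effect + SS_error); the function name is hypothetical, and the example values are taken from the main effect of block reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom.

    Uses the identity SS_effect / (SS_effect + SS_error)
    = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of block in Phase 1: F(4, 104) = 8.02
block_effect = round(partial_eta_squared(8.02, 4, 104), 2)
print(block_effect)  # 0.24
```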

Table 4 shows the means for correct predictions across trials including X and Y for each block of Phase 1. As in Experiment 1, the simple discrimination in Group Irrelevant was acquired faster than the conditional discrimination in Group Relevant. In a 5 × 2 (Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA, we found a significant main effect of block, F(4, 104) = 3.04, p = .030, ηₚ² = .11, as well as a significant main effect of group, F(1, 26) = 15.08, p = .001, ηₚ² = .37. The analysis also revealed a significant Block × Group interaction, F(4, 104) = 3.92, p = .010, ηₚ² = .13, showing differences between the groups over the course of learning.

Table 4 Mean correct predictions across trials including X and Y (standard errors within parentheses) for each of the five blocks of Phase 1 in Experiment 2

Extinction training in Phase 2

The right-hand panel of Fig. 4 presents the mean percentages of participants making correct predictions of the arrow directions for Z– trials in context B across the five blocks of Phase 2 for each group (predictions were considered to be correct if they were in accordance with the contingency trained in Phase 1). As can be seen, Group Relevant responded less strongly to Z– than did Group Irrelevant at the beginning of the phase. A 5 × 2 (Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA confirmed this finding, yielding a significant main effect of block, F(4, 104) = 21.64, p < .001, ηₚ² = .45, showing a decrease in correct predictions in the course of training. We found no main effect of group, F(1, 26) = 1.33, p = .259, ηₚ² = .05, but a significant Block × Group interaction, F(4, 104) = 5.38, p = .001, ηₚ² = .17, showed that the decrements of correct predictions in the course of training proceeded differently across groups. Further decomposing the Block × Group interaction revealed a simple main effect of group only in Block 1, F(1, 26) = 9.03, p = .006, ηₚ² = .26 (in later blocks of Phase 2, all Fs < 1.12, all ps > .297), indicating that the groups differed at the beginning of extinction, but then both showed equal extinction of Z.

Contextual control during test

Figure 5 depicts responding to Z trials in contexts A and B during the test phase in terms of the mean percentages of participants making correct predictions of the arrow direction. The left-hand bars present the predictions for Group Relevant, and the right-hand bars show the predictions for Group Irrelevant. Within each group, the black bar depicts the predictions in context A, and the white bar, the predictions in context B.

Fig. 5

Mean percentages of participants predicting the correct position of the arrow for Z in contexts A and B during the test phase of Experiment 2, collapsed across the four presentations of each trial type separately for Groups Relevant and Irrelevant. (Predictions were considered to be correct if they were in accordance with the contingency trained in Phase 1.) Error bars denote standard errors of the means

As the figure demonstrates, participants in Group Relevant responded to Z more readily in context A than in context B. However, there was no difference in responding to Z between the contexts in Group Irrelevant. This was confirmed by a 2 × 2 (Context [A, B] × Group [relevant, irrelevant]) ANOVA. The analysis revealed a significant main effect of context, F(1, 26) = 24.92, p < .001, ηₚ² = .49, indicating different responding to Z depending on the context, and a significant Context × Group interaction, F(1, 26) = 30.02, p < .001, ηₚ² = .54, showing that context dependency was stronger in Group Relevant than in Group Irrelevant. To further analyze the Context × Group interaction, we calculated the simple main effects of context at each level of the factor Group. The simple main effect of context reached significance for Group Relevant, F(1, 26) = 54.82, p < .001, ηₚ² = .68, but not for Group Irrelevant, F < 1, indicating that only participants in Group Relevant responded more to Z trials in context A than in context B. Furthermore, we conducted a simple main effect analysis of the factor Group at each level of the factor Context, revealing a simple main effect of group for context A, F(1, 26) = 33.34, p < .001, ηₚ² = .56, and no effect of group for context B, F(1, 26) = 2.46, p = .129, ηₚ² = .09, showing stronger responding of Group Relevant only in context A.

Eye-gaze data

Figure 6 depicts how overt attention to the contexts developed during the three phases of our predictive-learning task. Inspection of Phase 1 in Fig. 6 reveals that attention to the contexts gradually increased in Group Relevant, while participants in Group Irrelevant did not show any preference for fixation of the contexts as compared to the control stimulus. Accordingly, a 5 × 2 (Block [1, 2, 3, 4, 5] × Group [relevant, irrelevant]) ANOVA for Phase 1 revealed a significant main effect of group, F(1, 26) = 6.51, p = .017, ηₚ² = .20, that was modulated by a significant Block × Group interaction, F(4, 104) = 4.52, p = .016, ηₚ² = .15, reflecting longer dwell times to contexts in Group Relevant than in Group Irrelevant. The main effect of block failed to reach significance, F(4, 104) = 1.95, p = .155, ηₚ² = .07. Figure 6 also shows that the group difference in attention to the contexts that was acquired during Phase 1 persisted during Phase 2. The ANOVA revealed a significant main effect of group, F(1, 26) = 5.48, p = .03, ηₚ² = .17, while neither the block effect nor the two-way interaction was significant, all Fs < 1. In the test phase, the 2 × 2 (Block [1, 2] × Group [relevant, irrelevant]) ANOVA showed at least a tendency for group differences, F(1, 26) = 3.79, p = .063, ηₚ² = .13. All other effects were not significant, all Fs < 1.

Fig. 6

Fixational dwell time on contexts A and B, averaged across the training trials of each block of Experiment 2. Dwell times were computed by subtracting the times that participants spent fixating the control stimulus C from the dwell time on the context stimuli. Data are shown separately for Group Relevant (black squares) and Group Irrelevant (white squares)

Consistent with the results of Experiment 1, we found that extinction of a stimulus outside of its acquisition context proceeded faster when initial acquisition training of this stimulus was conducted in a relevant context than when initial training was given in an irrelevant context. And again, we observed that response recovery after extinction was stronger in Group Relevant than in Group Irrelevant. However, in contrast to Experiment 1, we found no evidence for renewal in Group Irrelevant. Nevertheless, we were able to replicate the basic findings from Experiment 1 in a situation in which the contexts and cues were presented in a sequential order.

In addition, we observed longer dwell times on contexts that were relevant for the conditional discrimination in Phase 1 than on those that were irrelevant for the simple discrimination. This difference in overt attention between relevant and irrelevant contexts was not only evident in Phase 1, but also persisted into the subsequent phases (see also Le Pelley et al., 2011; Mitchell et al., 2012). This is remarkable, as after Phase 1 there was no actual necessity to process the context stimuli for correct responding in either group. Nevertheless, more attention was paid to contexts that had previously been trained as relevant rather than irrelevant. Thus, our results from eye-gaze behavior, together with the results from the predictive-learning task, support the ideas of Rosas et al. (2006) that once participants paid attention to the context, all further information was processed in a way that rendered it context-specific.

General discussion

In two experiments, we investigated the role of the informational value of contexts for the formation of context-dependent behavior. In each of our experiments, we found that extinction of a stimulus outside of its acquisition context proceeded faster when during initial learning about this stimulus, contexts were trained as being relevant for a discrimination between two other stimuli than when the contexts were irrelevant for the discrimination. This finding indicates that the informational value of a context affects the strength of context dependency of acquisition performance, which had been documented in previous studies by Preston et al. (1986) and León et al. (2008, 2010; León et al., 2012). Furthermore, we were able to find the effect with simultaneous as well as with sequential stimulus presentation.

In addition, in each of our experiments we observed that response recovery following extinction was stronger in the group in which contexts were trained as being relevant for a discrimination than in the one in which contexts were irrelevant. To our knowledge, the present experiments are the first demonstrating this effect (for a theoretical discussion of this finding, see below).

Moreover, in Experiment 2, in which the predictive-learning task was accompanied by recording of eye-gaze behavior, we found longer dwell times on context stimuli that were relevant during training of a conditional discrimination than on ones that were irrelevant for a simple discrimination. We also observed that this difference in overt attention persisted into a subsequent phase in which the contexts were no longer needed in either condition for correct responding (see also Le Pelley et al., 2011; Mitchell et al., 2012).

Before turning to a discussion of our results in terms of theories that assume that animals pay more attention to relevant than to irrelevant stimuli, we explore the implications of our finding that differences in rates of extinction caused by a manipulation of the informational value of the context were accompanied by differences in the strength of renewal.

In both experiments, we found a stronger renewal effect in Group Relevant than in Group Irrelevant. This difference in response recovery might have occurred for at least two reasons. One reason might be that the extinction training established a stronger inhibitory cue Z–outcome association in Group Irrelevant than in Group Relevant. This explanation receives support from our observation that acquisition performance was more strongly disrupted by a context change in Group Relevant than in Group Irrelevant. A second reason might be that extinction learning was more context-dependent in Group Relevant than in Group Irrelevant. This second explanation is supported by our finding from Experiment 2 that participants in Group Relevant fixated the context stimuli longer during the extinction phase than did those in Group Irrelevant. Therefore, it is reasonable to assume that the difference in response recovery between Groups Relevant and Irrelevant was caused by both a difference in the inhibitory associations established during extinction and a difference in the processing of the extinction context.

Although the effect was less powerful than in Group Relevant, we found evidence of a renewal effect in Group Irrelevant of Experiment 1, indicating that contextual stimuli were not completely ignored. According to current accounts of renewal (Bouton, 1997; Darby & Pearce, 1995; Rosas et al., 2006), the unexpected omission of the outcome at the outset of extinction should increase the amount of attention paid to contextual stimuli. This attentional mechanism might have contributed to the development of context-specific extinction performance. However, we found no renewal effect in Group Irrelevant in Experiment 2. One procedural difference that might be responsible for the diverging results is that, in Experiment 2, we implemented sequential presentation of the contexts and cues. This sequential presentation might have reduced the ease with which joint representations of contexts and cues could be developed.

The results of our experiments are consistent with the account of Rosas et al. (2006) that the strength of context-specific learning depends on the amount of attention paid to context stimuli, which is thought to be modulated by specific factors, including the informational value of contexts. Formal versions of the idea that animals pay more attention to relevant than to irrelevant stimuli can be found in several theories of learning and attention (e.g., Kruschke, 1992, 2001, 2006; Mackintosh, 1975; Pearce, George, & Redhead, 1998). For instance, the Mackintosh model supposes that attention to a stimulus increases if an outcome is more accurately predicted on the basis of this stimulus than on the basis of all other simultaneously present stimuli. Conversely, attention to a stimulus will decrease if the outcome is predicted more accurately by other, concurrently present stimuli. However, the theory of Mackintosh adopts an elemental stimulus representation, as do the theories proposed by Kruschke (2001, 2006), assuming that each element of a stimulus compound acquires its own direct excitatory or inhibitory association with the outcome. Performance on a trial is then controlled by the algebraic sum of the associative strengths of the stimuli present. Hence, these models cannot account for the acquisition of a conditional discrimination, as we observed in each of our experiments. However, it is worth mentioning that elemental theories extended by the assumption of a unique cue (e.g., Rescorla, 1973) might also have the power to explain the acquisition of a conditional discrimination. According to this hypothesis, any combination of two or more stimuli creates a unique element that can gain associative strength in the same way as conventional stimuli.
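The summation argument can be illustrated with a small simulation. The sketch below is not an implementation of the Mackintosh or Kruschke models: it assumes a simple delta rule with a summed prediction and a fixed learning rate (attention is held constant), and all names and parameter values are hypothetical. It trains the conditional discrimination of Group Relevant (X+ and Y– in context A, X– and Y+ in context B) once with elemental coding only and once with an added unique cue for each context–cue compound.

```python
import random

# Conditional discrimination: (context, cue) compounds and their outcomes.
TRIALS = [(("A", "X"), 1.0), (("A", "Y"), 0.0),
          (("B", "X"), 0.0), (("B", "Y"), 1.0)]

def elements(stimuli, unique_cue):
    """Elements active on a trial; optionally add one unique configural cue."""
    active = list(stimuli)
    if unique_cue:
        active.append("".join(stimuli))  # e.g., "AX"
    return active

def predict(weights, stimuli, unique_cue):
    """Algebraic sum of the associative strengths of all active elements."""
    return sum(weights.get(e, 0.0) for e in elements(stimuli, unique_cue))

def train(unique_cue, epochs=3000, rate=0.01, seed=0):
    """Delta-rule training with a summed error term (assumed learning rule)."""
    rng = random.Random(seed)
    weights = {}
    for _ in range(epochs):
        order = TRIALS[:]
        rng.shuffle(order)
        for stimuli, outcome in order:
            error = outcome - predict(weights, stimuli, unique_cue)
            for e in elements(stimuli, unique_cue):
                weights[e] = weights.get(e, 0.0) + rate * error
    return weights

elemental = train(unique_cue=False)
configural = train(unique_cue=True)
for stimuli, outcome in TRIALS:
    print(stimuli, outcome,
          round(predict(elemental, stimuli, False), 2),
          round(predict(configural, stimuli, True), 2))
```

Under these assumptions, the purely elemental predictions stay close to the mean outcome for all four compounds, because no set of additive weights can separate them, whereas the predictions with unique cues approach the trained outcomes.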

Another framework for the discussion of our experiments is provided by a second class of models that assumes a configural stimulus representation (e.g., Kruschke, 1992; Pearce, George, & Redhead, 1998). According to this view, the entire pattern of stimulation provoked by a specific stimulus compound results in one unitary representation developing a connection to the outcome. The response-eliciting property of a stimulus configuration is then determined by its direct association to the outcome, as well as by the generalized associative strengths of other configurations, whereby the amount of generalization is based on similarity. For instance, Kruschke (1992) provided a connectionist model of category learning named ALCOVE. This exemplar-based, three-layer network proposes that stimuli are represented as points in a multidimensional psychological space. The input layer is characterized as a network of nodes, and each node belongs to one psychological dimension. The activation of a particular node represents the value of the stimulus on this dimension. The output layer consists of other nodes, each of which represents a response category. The two layers are connected via an intermediary, hidden layer. Each node of this layer corresponds to one training exemplar, and the activation of one particular node depends on the similarity between the exemplar that it represents and the external stimulus.

In calculating the similarity between an exemplar and a stimulus, attention increases or decreases the importance of each dimension. Every input node has a dimensional attention strength αᵢ, which is learned. At the beginning of learning, all dimensions have equal attention strengths. In the course of training, ALCOVE increases the attention strength with regard to the relevant dimensions, whereas it decreases attention strength related to irrelevant dimensions.
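How dimensional attention shapes generalization in ALCOVE can be sketched with the activation function of a hidden (exemplar) node. The sketch assumes the common special case of Kruschke's (1992) rule with a city-block metric and an exponential similarity gradient, that is, activation = exp(–c Σ αᵢ |hᵢ – xᵢ|); the binary coding of cue and context as two dimensions, the parameter values, and the function name are assumptions made for illustration, and the learning of attention strengths is not modeled here.

```python
import math

def exemplar_activation(stimulus, exemplar, attention, specificity=1.0):
    """Activation of one exemplar node for a given stimulus.

    Distance is the attention-weighted city-block distance between the
    stimulus and the stored exemplar; similarity decays exponentially.
    """
    distance = sum(a * abs(h - x)
                   for a, h, x in zip(attention, exemplar, stimulus))
    return math.exp(-specificity * distance)

# Hypothetical coding: dimension 0 = cue identity, dimension 1 = context.
trained_in_a = (0.0, 0.0)  # cue Z stored in context A
tested_in_b = (0.0, 1.0)   # same cue Z presented in context B

# Context ignored (attention 0): activation is unaffected by the context change.
ignored = exemplar_activation(tested_in_b, trained_in_a, attention=(1.0, 0.0))

# Context attended (attention 1): activation drops after the context change.
attended = exemplar_activation(tested_in_b, trained_in_a, attention=(1.0, 1.0))
```

In this sketch, a larger attention strength on the context dimension reduces generalization from the training context to a new context, which corresponds to the stronger context specificity discussed for Group Relevant.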

Overall, the present results provide strong evidence that the informational value of contexts affects the strength of context-dependent learning. Our experiments support the idea that relevant contexts receive more attention, leading to stronger context-specific encoding of the information acquired in these contexts. Besides the evidence from the predictive-learning procedure, our conclusions were also supported by measurements of eye-gaze behavior recorded as a second, independent indicator of attentional changes. Furthermore, we extended the pattern of results by adding a test phase, showing that differences in rates of extinction caused by a manipulation of the informational value of the contexts are accompanied by differences in the strength of renewal.