A conditioned reinforcer is an initially neutral stimulus that acquires the capacity to function as a reinforcer for an instrumental response (Skinner, 1938). Pavlovian conditioning (Pavlov, 1927; Pearce & Bouton, 2001; Rescorla & Wagner, 1972) is commonly thought to provide the mechanism by which a neutral stimulus acquires such conditioned reinforcing effects (see Williams, 1994). However, existing conceptions of conditioned reinforcement have been little informed by modern advances in the study of Pavlovian conditioning (see Shahan, 2010, for a discussion).

One such advance in the understanding of Pavlovian conditioning is that learning of temporal intervals between stimuli appears to play a critical role in associative learning (for reviews, see Gallistel & Gibbon, 2000; Miller & Matzel, 1988; Savastano & Miller, 1998). Indeed, some have suggested that rather than being a conduit through which associative value is transferred between a conditioned stimulus (CS) and an unconditioned stimulus (US), temporal intervals are themselves the content of what is learned in conditioning preparations (e.g., Balsam, Drew, & Gallistel, 2010; Gallistel & Gibbon, 2000). Furthermore, there is increasing evidence that once such temporal information has been learned, it can be integrated across separate experiences to guide behavior in ways that are difficult to accommodate within traditional transfer-of-value-based accounts of associative learning (e.g., Cole, Barnett, & Miller, 1995; Leising, Sawa, & Blaisdell, 2007; Taylor, Joseph, Zhao, & Balsam, 2013).

To illustrate the role of temporal integration in Pavlovian conditioning, consider an experiment by Cole et al. (1995) using a conditioned lick suppression procedure. Two groups of rats were exposed to either delay- or trace-conditioning protocols in which a CS predicted shock delivery (CS1→US). Following acquisition, groups were further split into a group that was immediately tested for suppression to the training CS1 and a group that received an additional session of pairings of CS1 with a novel CS2 (CS1→CS2) prior to testing for suppression with CS2 (i.e., backward second-order conditioning). Groups immediately tested for suppression with CS1 demonstrated the standard trace-conditioning deficit. However, the effect was reversed in groups tested for suppression with CS2 following backward second-order conditioning. A second experiment replicated the effect when the order of conditioned suppression training and backward second-order conditioning was reversed (i.e., sensory preconditioning). These results raise challenges for a simple transfer-of-strength-based account of associative learning. Such a view can readily accommodate decreases in the strength of conditioned responding to CS1 as a function of the delay between CS1 and the US in delay and trace conditioning. However, such a view would suggest that at the end of training, the associative strength of CS1 would be relatively weaker following trace than following delay conditioning. Thus, in the following second-order conditioning phase, CS1 should have had less associative strength to condition to CS2, resulting in less responding engendered by CS2, not more. A traditional approach is further challenged by the fact that CS2 for the trace-conditioning group would have needed to acquire its associative strength in the second-order conditioning phase via backward conditioning (CS1→CS2). 
If such backward conditioning was responsible for the effects of CS2 following training, then because the second-order conditioning phase was the same for both groups, the delay-conditioning group should have shown greater responding to CS2 via the presumed greater associative strength of CS1 for that group. Alternatively, as was noted by Cole et al., if it is assumed that the rats learned and integrated the temporal relations between events across phases, the effect can be understood as resulting from CS2 being a better predictor of impending shock for the trace-conditioning group than for the delay-conditioning group (i.e., the temporal-coding hypothesis; Miller & Matzel, 1988; see Gallistel & Gibbon, 2000, for a related view).

Other studies have demonstrated related temporal integration effects with appetitive conditioning. For example, Leising et al. (2007) used an appetitive sensory preconditioning procedure with rats to investigate integration of temporal information in a timing task. Two groups of rats received preconditioning trials with a compound CS consisting of a 60-s auditory stimulus (CS A; noise or tone) and a 10-s visual stimulus (CS X; flashing light). In one group, CS X was presented 5 s following the onset of CS A (Group Early). In the other group, CS X was presented 45 s following the onset of CS A (Group Late). All rats then received simultaneous presentations of CS X and sucrose prior to probe tests in which only CS A was presented. If rats based their expectation of food delivery in CS A on the relative durations of CS A and CS X, their pattern of magazine entries during CS A would be expected to differ in the two groups. Results confirmed this prediction, showing that magazine entries during CS A were higher in the early portion of CS A in Group Early but higher in the later portion for Group Late.

Taylor et al. (2013) also investigated temporal integration in appetitive conditioning following sensory preconditioning. A series of experiments used procedures known to produce temporal integration in fear conditioning (Arcediano, Escobar, & Miller, 2003). Rats received either backward (CS1–CS2) or forward (CS2–CS1) sensory preconditioning before first-order forward (CS1→US) or backward (US→CS1) conditioning of CS1. Rats that received forward sensory preconditioning prior to backward first-order conditioning showed greater approach responding to CS2, as compared with rats that received forward sensory preconditioning and forward first-order conditioning. On the other hand, rats that received backward sensory preconditioning prior to forward first-order conditioning showed greater responding to CS2, as compared with rats that received backward sensory preconditioning and backward first-order conditioning. Again, these results suggest that animals can integrate temporal information from independent experiences across phases of appetitive conditioning.

It is well-known that, once established, an appetitive CS can be used as a conditioned reinforcer for an instrumental response (e.g., Hyde, 1976; Parkinson, Roberts, Everitt, & Di Ciano, 2005). Although temporal learning and integration have been demonstrated in a variety of Pavlovian paradigms, the possible role of temporal integration in the generation of an instrumental conditioned reinforcer has not been examined. Thus, the present experiment used an appetitive conditioning procedure modeled on the fear conditioning procedure of Cole et al. (1995) to examine such an effect. Groups of rats received either delay or trace appetitive conditioning in which a neutral stimulus (CS1) predicted response-independent food deliveries (CS1→US). A control group received random presentations of CS1 and the food pellet US. Following appetitive conditioning, half of the animals in the delay- and trace-conditioned groups were tested immediately for the ability of CS1 to function as a conditioned reinforcer for instrumental leverpressing. For the remaining animals, including the random control group, training and testing were separated by a single session of backward second-order conditioning of CS1 and a novel CS2 (CS1→CS2). We then assessed the ability of CS2 to function as a conditioned reinforcer for a new instrumental response (Table 1). In the backward second-order conditioning and acquisition of a new instrumental response phases, the random control group allowed for comparison of responding in delay- and trace-conditioned groups to a level of responding that would be expected if CS1 were not a predictor of the US at the time of pairings with CS2. Thus, any excitatory or inhibitory effects of pairing CS1 with CS2 could be assessed in comparison with a group in which CS1 would be associatively neutral (Rescorla, 1967, 1972).

Table 1 Experimental design

Method

Subjects

A total of 50 male Long Evans rats, 80 days of age, participated in the study. Rats were housed in Plexiglas home cages in a colony room with a 12:12-h light:dark cycle and were allowed ad lib access to water. Rats were food deprived to 80% of their ad libitum weights prior to the beginning of the experiment. All housing and experimental procedures were conducted in accordance with the guidelines put forward by the Utah State University Institutional Animal Care and Use Committee.

Apparatus

Four Med Associates modular operant chambers (30 × 24 × 21 cm) housed in sound-attenuating cubicles were used. Operant chambers consisted of two Plexiglas and two aluminum walls on opposite sides. Each chamber contained a response panel with two retractable levers positioned equidistant on either side of an aperture (5 × 5 cm) into which food pellets could be delivered (45-mg dustless precision food pellets; Bio-Serv, Frenchtown, NJ), with an interior light and a photobeam to record head entries. A houselight and a Sonalert (2900 ± 500 Hz, 75–85 dB) were located above the food aperture, and a clicker (75–85 dB) was located above one of the levers (counterbalanced across chambers). Three light-emitting diodes were located above each lever on the response panel. The opposite side of the chamber contained five apertures evenly spaced horizontally across the bottom of the panel. Med Associates interfacing and software were used for control of experimental events and recording of responses.

Procedure

With the exception of the acquisition of a new response test phase, all sessions were conducted in chambers with no illumination and levers retracted. Stimuli consisted of a tone (Sonalert) and a click-train (0.5 s on/off). Stimuli were counterbalanced within groups such that an equal number of rats experienced each stimulus in each group. For second-order conditioning groups, the tone and click were counterbalanced as CS1 and CS2. Table 1 presents order of training and testing for all groups. Sessions were conducted 7 days per week at approximately the same time each day (0800 h).

Rats were trained to eat from the pellet receptacle in a 40-min session consisting of the delivery of 20 food pellets according to a variable-time 120-s schedule. Following magazine training, rats received one 40-min session in which they were placed in the experimental chamber and no stimuli or food pellets were presented, in order to extinguish any conditioning to the experimental context (Lattal, 1999).

Figure 1 shows the procedures for delay- and trace-conditioning and backward second-order conditioning groups. Following initial magazine training and context extinction, rats were assigned to delay conditioning only (delay-only; n = 10), trace conditioning only (trace-only; n = 10), delay conditioning prior to second-order conditioning (delay-SO; n = 10), trace conditioning prior to second-order conditioning (trace-SO; n = 10), or random presentations prior to second-order conditioning (random-SO; n = 10) groups. Each group received eight conditioning sessions consisting of 20 presentations of CS1 and the food pellet US.

Fig. 1

Diagram of conditioning procedures. Four groups received sessions of either delay or trace conditioning in which CS1 (tone or click, counterbalanced) was paired with delivery of a food pellet. A fifth group (not portrayed) received random deliveries of CS1 and food pellets. For the delay-, trace-, and random-SO groups, CS1 was then paired with a novel CS2 (click or tone) in one session of backward second-order conditioning

Delay conditioning consisted of trials on which a 10-s CS1 was presented and co-terminated with the delivery of a food pellet US. Food presentations were separated by a variable 120-s inter-US interval, with the constraint that two USs could not be presented less than 20 s apart. One rat in the delay-only group failed to show any evidence of magazine approach and was, thus, eliminated from the study. Trace conditioning consisted of trials on which the presentation of a 10-s CS1 was followed by the delivery of the food pellet US after a 10-s trace interval. Food presentations were separated by a 120-s inter-US interval on average, with the constraint that two US presentations could not occur less than 30 s apart. Random conditioning sessions consisted of 20 CS1 presentations and 20 US presentations delivered via independent, concurrently operating schedules with a mean interval of 120 s.
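
The delay-conditioning schedule described above can be sketched in a few lines of Python. This is an illustration only and not the Med Associates program used in the experiment: the text gives only the mean inter-US interval and the minimum spacing, so the shifted exponential distribution is an assumption, and the names `inter_us_intervals` and `delay_cs1_onsets` are hypothetical.

```python
import random


def inter_us_intervals(n, mean_s=120.0, min_s=20.0, seed=None):
    """Draw n inter-US intervals (in seconds) with a given mean and lower bound.

    Assumption: a shifted exponential distribution. The source states only
    the mean (120 s) and the minimum spacing, not the distribution.
    """
    rng = random.Random(seed)
    # expovariate takes a rate (1 / mean). Adding min_s enforces the minimum
    # spacing while keeping the long-run mean at mean_s.
    return [min_s + rng.expovariate(1.0 / (mean_s - min_s)) for _ in range(n)]


def delay_cs1_onsets(us_times, cs1_duration_s=10.0):
    """CS1 onset times for delay conditioning: CS1 co-terminates with the US."""
    return [t - cs1_duration_s for t in us_times]


# One hypothetical session of 20 delay-conditioning trials.
intervals = inter_us_intervals(20, seed=1)
us_times = []
elapsed = 0.0
for gap in intervals:
    elapsed += gap
    us_times.append(elapsed)
cs1_onsets = delay_cs1_onsets(us_times)
```

For trace conditioning, the same sketch would apply with `min_s=30.0` and CS1 onset placed 20 s before each US (10-s CS1 plus 10-s trace interval).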

Backward second-order conditioning consisted of 20 presentations of CS1 immediately followed by a 10-s presentation of CS2. Presentations of CS2 were separated by a 120-s inter-CS2 interval on average, with the constraint that two CS2s could not be presented less than 20 s apart.
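
To make the temporal-coding prediction concrete, the following Python sketch superimposes the CS1→CS2 relation on the CS1→US relation from training, using the durations reported above. It is a minimal illustration of the arithmetic under the assumption that the two intervals are simply added on a common timeline; it is not a model fitted to the data, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstOrderTraining:
    """Timing of CS1 -> US training, in seconds."""
    cs1_duration: float
    trace_interval: float  # 0 for delay conditioning


def expected_us_relative_to_cs2(training, cs2_duration=10.0):
    """Place CS2 on the CS1 -> US timeline and locate the expected US.

    CS2 begins at CS1 offset (backward second-order conditioning).
    Returns a tuple: (seconds from CS2 onset to the expected US,
    True if the expected US falls exactly at CS2 offset).
    """
    us_time = training.cs1_duration + training.trace_interval  # from CS1 onset
    cs2_onset = training.cs1_duration
    onset_to_us = us_time - cs2_onset
    return onset_to_us, onset_to_us == cs2_duration


delay = FirstOrderTraining(cs1_duration=10.0, trace_interval=0.0)
trace = FirstOrderTraining(cs1_duration=10.0, trace_interval=10.0)

delay_map = expected_us_relative_to_cs2(delay)  # US expected at CS2 onset
trace_map = expected_us_relative_to_cs2(trace)  # US expected at CS2 offset
```

Under these assumptions, the expected US coincides with CS2 onset after delay training but with CS2 offset after trace training, so CS2 precedes the expected food only in the trace case.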

Following training, all rats were tested for the ability of a CS to serve as a reinforcer for leverpressing in the absence of food pellet deliveries in four 60-min sessions. In delay- and trace-only groups (n = 9 and 10, respectively), testing sessions began in the session immediately following appetitive conditioning. Testing sessions began with the insertion of a retractable lever into the chamber, and each leverpress produced 3-s presentations of CS1 (Parkinson et al., 2005). Leverpresses during a CS were recorded separately but had no scheduled consequences. In delay-, trace-, and random-SO groups (n = 10), in which testing and acquisition were separated by one session of backward second-order conditioning, testing sessions began in an identical manner, and leverpresses produced 3-s presentations of CS2.

Results

Figure 2 shows acquisition of conditioned approach in the delay- and trace-only groups, measured as food aperture photobeam breaks and expressed as elevation scores. Elevation scores were calculated by subtracting the number of beam breaks occurring during the 10 s preceding CS1 (pre-CS1) from the number of beam breaks occurring during the 10-s CS1 period. A group (2) × session (8) repeated measures ANOVA found significantly greater responding in the delay-only group than in the trace-only group, F(1, 17) = 7.59, MSE = 27,147.39, p = .01. Both groups increased responding over sessions, F(7, 119) = 6.51, MSE = 1,768.01, p < .001, and training group interacted with session, F(7, 119) = 3.08, p = .005. Thus, the delay-only group showed a greater amount of conditioned approach to the CS1 than did the trace-only group. In addition, responding in the pre-CS1 period did not differ between groups across acquisition sessions, F(1, 17) = 4.26, MSE = 3,485.27, p = .06. Responding in the pre-CS1 period increased across acquisition sessions, F(7, 119) = 7.57, MSE = 519.21, p < .001. However, the increase in pre-CS1 responding did not interact with training group, F(7, 119) = 1.29, p = .26 (Table 2).
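
The elevation-score measure can be illustrated with a short Python sketch. The trial counts below are made up for illustration and are not data from the experiment. Summing the per-trial differences over a session is an assumption, since the text does not state how trials were aggregated. The function names are hypothetical.

```python
def elevation_scores(cs_breaks, pre_cs_breaks):
    """Per-trial elevation score: CS1-period beam breaks minus pre-CS1 beam breaks."""
    if len(cs_breaks) != len(pre_cs_breaks):
        raise ValueError("each trial needs both a CS1 count and a pre-CS1 count")
    return [cs - pre for cs, pre in zip(cs_breaks, pre_cs_breaks)]


def session_elevation(cs_breaks, pre_cs_breaks):
    """Session-level score, assumed here to be the sum over trials."""
    return sum(elevation_scores(cs_breaks, pre_cs_breaks))


# Hypothetical beam-break counts for four trials (10-s CS1 and 10-s pre-CS1 windows).
cs_counts = [3, 5, 0, 4]
pre_counts = [1, 2, 1, 0]

per_trial = elevation_scores(cs_counts, pre_counts)
session_total = session_elevation(cs_counts, pre_counts)
```

A positive score indicates more magazine entries during CS1 than in the equivalent period before it. A score near zero indicates that CS1 did not raise approach above baseline.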

Fig. 2

Magazine approach in delay- and trace-only groups. Mean magazine entries over sessions of acquisition. Error bars represent SEMs

Table 2 Average magazine entries during the pre-CS1 period in each acquisition session

Figure 3 shows average leverpresses from the two sessions of the acquisition of a new response testing (CS1) condition for delay- and trace-only groups. A group (2) × session (2) repeated measures ANOVA revealed higher responding in the delay-only group, F(1, 17) = 6.43, MSE = 394.72, p = .02, and no effect of session, F(1, 17) = 0.75, MSE = 187.37, p = .40, or group × session interaction, F(1, 17) = 3.19, p = .09.

Fig. 3

Leverpressing in delay- and trace-only groups. Mean leverpresses over sessions of acquisition of a new response testing with CS1. Error bars represent SEMs

Figure 4 shows acquisition of magazine approach in the delay-SO, trace-SO, and random-SO groups during initial CS1→US training as the difference between CS1 and pre-CS1 photobeam breaks. A group (3) × session (8) repeated measures ANOVA found significant differences in magazine entries between groups, F(2, 27) = 11.19, MSE = 5,052.94, p < .001, a reliable increase in responding across sessions, F(7, 189) = 8.26, MSE = 764.66, p < .001, and a group × session interaction, F(14, 189) = 5.55, p < .001. A similar analysis was applied to magazine entries in the 10-s pre-CS1 period (Table 2). A difference in responding in the pre-CS1 period was observed between groups, F(2, 27) = 4.16, MSE = 3,336.86, p = .03. However, pre-CS1 responding did not increase across acquisition sessions, F(7, 189) = 1.67, MSE = 409.15, p = .12, and group did not interact with session, F(14, 189) = 1.01, p = .44. The between-group difference was due to lower pre-CS1 responding in the delay-SO group. Removing random-SO and comparing delay- and trace-SO groups in a group (2) × session (8) repeated measures ANOVA confirmed no between-group difference in pre-CS1 responding, F(1, 18) = 3.88, MSE = 3,687.10, p = .06, no significant increase in pre-CS1 responding over sessions, F(7, 126) = 0.69, MSE = 471.08, p = .68, and no interaction, F(7, 126) = 1.83, p = .32.

Fig. 4

Magazine approach in second-order (SO) conditioning groups. Mean magazine entries per session for delay-, trace-, and random-SO groups in each session of acquisition. Error bars represent SEMs

Magazine responding during each CS in the backward second-order conditioning session (Table 3) was compared in a one-way ANOVA. Elevation scores for CS1 differed across groups, F(2, 27) = 7.51, MSE = 749.54, p = .003. Tukey pairwise comparisons found that delay-SO (M = 61.10, SD = 40.66) differed from trace-SO (M = 20.40, SD = 18.88) and random-SO (M = 19.60, SD = 15.45; p = .007 and .006, respectively), and trace-SO and random-SO groups did not differ from one another (p = .99). A one-way ANOVA comparing magazine entries during CS2 found no differences between groups, F(2, 27) = 0.21, MSE = 1,791.71, p = .82. In addition, a one-way ANOVA comparing the difference in magazine entries in CS2 and CS1 did not find a difference between groups, F(2, 27) = 2.81, MSE = 2,301.53, p = .08.

Table 3 Average magazine entries during backward second-order conditioning

Figure 5 shows average leverpresses from the two sessions of the acquisition of a new response testing (CS2) condition for delay-SO, trace-SO, and random-SO groups. A group (3) × session (2) repeated measures ANOVA found significant between-group differences in leverpressing, F(2, 27) = 3.71, MSE = 29.19, p = .04, and a significant decrease in leverpressing across testing sessions, F(1, 27) = 37.93, MSE = 9.37, p < .001. Importantly, there was a significant group × session interaction, F(2, 27) = 4.85, p = .02, suggesting that the between-group difference changed from session 1 to session 2 of testing. Simple effects were compared at each level of session: A one-way ANOVA comparing leverpressing in session 1 was significant, F(2, 27) = 4.78, MSE = 25.77, p = .02; the same comparison for session 2 was not, F(2, 27) = 2.38, MSE = 12.79, p = .11. For session 1, Tukey post hoc comparisons found that leverpressing in the trace-SO group (M = 13.20, SD = 5.0) differed significantly from that for the delay-SO (M = 6.80, SD = 5.75; p = .02) and random-SO (M = 7.50, SD = 4.38; p = .04) groups and that delay-SO and random-SO did not differ (p = .95). Thus, after backward second-order conditioning, CS2 served as a more effective conditioned reinforcer for an instrumental response in the first test session for rats that had previously experienced trace conditioning with CS1 than for rats that had initially experienced delay or random conditioning.

Fig. 5

Leverpressing in second-order conditioned groups. Mean leverpresses per session for delay-, trace-, and random-SO groups in each session of acquisition of a new response testing with CS2. Error bars represent SEMs

Discussion

In the present study, appetitive responding as measured by both approach and acquisition of instrumental responding (i.e., conditioned reinforcement) was greater with delay conditioning than with trace conditioning. This effect is not surprising and is consistent with long-standing demonstrations of reductions in conditioned responding with trace- versus delay-conditioning preparations (e.g., Pavlov, 1927). However, when training and testing phases were separated by a single session of second-order conditioning with backward pairings of the training CS1 and a novel CS2 in the absence of the food US, we observed greater approach and leverpressing for CS2 in trace-conditioned groups. In addition, rats that had been exposed to trace conditioning prior to backward second-order conditioning also showed greater approach and leverpressing than rats previously exposed to randomly presented CS1 and food deliveries. Thus, the reliable temporal relation between CS1 and food arranged for the trace-conditioning group was necessary for the development of the appetitive effects of the CS2 stimulus following backward second-order conditioning. These findings extend recent studies of temporal mapping in associative learning (e.g., Cole et al., 1995; Leising et al., 2007; Taylor et al., 2013) to the generation of an instrumental conditioned reinforcer.

There are alternative accounts of the present results based on associative learning perspectives. If animals learn to discriminate the onset, offset, and trace components of a CS, the present finding of excitatory responding to a CS2 inserted into the trace interval is not necessarily surprising and is potentially explained by traditional associative accounts (Pearce & Bouton, 2001). Pairing with the trace interval could have caused CS2 to become excitatory for trace-SO animals, and pairing with the portion of the ITI following food delivery could have caused CS2 to become inhibitory for delay-SO animals through the affective representation of the US activated by CS1 or some other inferential process across phases (Holland, 1990; Mackintosh & Dickinson, 1979; Wagner, 1981). However, such associative accounts have difficulty reconciling two important aspects of the present results.

First, the present study found no difference between groups in the number of CS2 magazine entries during backward second-order conditioning; however, when tested, leverpressing for CS2 was greater in trace-SO, in comparison with delay- and random-SO. Thus, there was no evidence of differential excitation or inhibition to CS2 during backward second-order conditioning, but learning about CS2 as a conditioned reinforcer was expressed as greater leverpressing in animals that had CS2 backward paired with a trace-conditioned CS1. This is consistent with related studies in which temporal integration effects have been shown to be expressed in tests of performance, and not incrementally across experiences (Molet, Jozefowiez, & Miller, 2010; Taylor et al., 2013).

Second, an associative explanation might suggest that CS2 could have acquired conditioned inhibitory properties in the backward second-order conditioning phase for both delay-SO and trace-SO groups (Rescorla & Wagner, 1972). Subsequent differences in leverpressing may then be due to relatively greater conditioned inhibition in the delay-SO group. However, this explanation cannot account for the elevation of trace-SO group leverpressing above that of the random-SO group, which should not have acquired any inhibition. Additionally, if CS2 had become an inhibitor for the delay-SO group, suppression of CS2 responding would be expected in the backward second-order conditioning and acquisition of a new response phases in comparison with the random-SO group; this was not the case. Random- and delay-SO groups did not differ in the number of CS2 magazine entries during backward second-order conditioning or in the acquisition of instrumental leverpressing. This suggests that weak support of leverpressing by CS2 in delay-SO was not due to CS2 acquiring associative interference or conditioned inhibitory functions. If these potential functions played a role in weakening the ability of CS2 to maintain leverpressing in delay-SO, their effects were indistinguishable from responding in the random-SO group, for which CS1 would be expected to be associatively neutral (Rescorla, 1967).

The present study replicated earlier findings in which information was integrated across training phases in conditioned lick suppression (Cole et al., 1995) and extended them to an appetitive situation in which knowledge controlling magazine approach had to be integrated across phases and transferred to instrumental leverpressing. By extending the phenomenon of temporal integration to instrumental conditioned reinforcement, the present experiment contributes to a growing body of evidence challenging a traditional account of associative learning based on the simple transfer of associative strength between stimuli. An account of these data based on Cole et al. and the temporal-coding hypothesis (Miller & Matzel, 1988) is that rats integrated the temporal relations from training and backward second-order conditioning phases to form a temporal map, thus allowing CS2 to function as a better predictor of food. In addition, these findings are consistent with other timing-based models of conditioning in which learning, remembering, and comparing temporal intervals serve as the foundation of associative learning (Balsam et al., 2010; Gallistel & Gibbon, 2000).

Finally, the present findings also pose a challenge for contemporary models of conditioned reinforcement within the general tradition of the matching law (Baum, 1974; Herrnstein, 1961). Such models (e.g., delay-reduction theory [Squires & Fantino, 1971]; contextual choice model [Grace, 1994]; hyperbolic value-added model [Mazur, 2001]) largely have been developed to account for the impact of conditioned reinforcers on choice responding, but these models are typically assumed to provide a general framework for understanding conditioned reinforcement value (e.g., Fantino, 1977). All of the models do a good job accounting for how the value of a conditioned reinforcer varies with manipulations of schedule and parameters of primary reinforcement. Although the models differ in the details of how they calculate the value of a conditioned reinforcer, all assume that conditioned reinforcers acquire their value via Pavlovian conditioning and include a critical role for the delay between the onset of a putative conditioned reinforcer and delivery of the primary reinforcer (i.e., US). As a result, the present finding of greater conditioned reinforcing effectiveness with trace than with delay conditioning following backward second-order conditioning poses the same challenges for such models as for traditional transfer-of-value-based approaches to associative learning noted above. Like traditional associative models, operant theories of conditioned reinforcement are based on the notion that value is transferred from primary reinforcers to conditioned reinforcers in a delay-dependent fashion. Thus, it is not clear how such models can accommodate the greater conditioned reinforcing effects of CS2 when backward second-order conditioning follows trace, as compared with delay, conditioning.
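
The delay dependence at issue can be illustrated with a generic hyperbolic delay function, V = A / (1 + KD), of the kind used in this family of models. The Python sketch below is not the full hyperbolic value-added model or any of the other cited models. The parameter values `amount=1.0` and `k=0.2` are arbitrary assumptions chosen for illustration, and the function name is hypothetical. The delays are the CS1-onset-to-food intervals from the present procedure (10 s for delay conditioning; 20 s for trace conditioning).

```python
def hyperbolic_value(delay_s, amount=1.0, k=0.2):
    """Value of a stimulus whose onset precedes the reinforcer by delay_s seconds.

    Generic hyperbolic form V = A / (1 + K * D). The values of A (amount)
    and K (k) are illustrative assumptions, not fitted parameters.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1.0 + k * delay_s)


# CS1 onset-to-food delays in the present procedure.
v_cs1_delay = hyperbolic_value(10.0)  # 10-s CS1 co-terminating with food
v_cs1_trace = hyperbolic_value(20.0)  # 10-s CS1 plus 10-s trace interval
```

Any value function that decreases monotonically with delay gives CS1 more value after delay than after trace conditioning. If CS2 acquired its value from CS1, the same ordering would be expected for CS2, which is the opposite of the ordering observed in the first test session.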

That being said, it is important to note that matching-law-based models of conditioned reinforcement also include a critical role for the temporal context (i.e., overall average time to primary reinforcement) in which a conditioned reinforcer occurs. Temporal context is accommodated in such models by assuming that conditioned reinforcement value increases (i.e., delay-reduction theory, hyperbolic value-added model) or sensitivity to differences in value changes with the temporal context (contextual choice model). Thus, it is possible that these models could be modified to incorporate the sort of learning, remembering, and integrating of temporal information hypothesized by temporal coding/mapping and related approaches. Such a modified approach to operant-choice theories of conditioned reinforcement might suggest that the process of temporal integration can generate a stimulus of value that can then be used in the models to determine behavioral allocation in a fashion consistent with the basic matching-law-based architecture of the theories.

Regardless, the present experiment suggests that the generation of the conditioned reinforcing function of a stimulus might involve organisms forming and using a much richer representation of temporal events than is currently portrayed by contemporary models. As such, it appears that continued investigation of how temporal integration and other modern developments in Pavlovian conditioning are involved in conditioned reinforcement could lead to a more accurate portrayal of how reinforcing function might be acquired by previously neutral stimuli.