Intertrial unconditioned stimuli differentially impact trace conditioning
Three experiments assessed how appetitive conditioning in rats changes over the duration of a trace conditioned stimulus (CS) when unsignaled unconditioned stimuli (USs) are introduced into the intertrial interval. In Experiment 1, a target US occurred at a fixed time shortly before (embedded), shortly after (trace), or at the same time as (delay) the offset of a 120-s CS. During the CS, responding was most suppressed by intertrial USs in the trace group, less so in the delay group, and least in the embedded group. Unreinforced probe trials revealed a bell-shaped curve centered on the normal US arrival time during the trace interval, suggesting that temporally specific learning occurred both with and without intertrial USs. Experiments 2a and 2b confirmed that the bulk of the trace CS became inhibitory when intertrial USs were scheduled, as measured by summation and retardation tests, even though CS offset evoked a temporally precise conditioned response. Thus, an inhibitory CS may give rise to new stimuli, specifically linked to its termination, that are themselves excitatory. A modification to the microstimulus temporal difference model is offered to account for the data.
Keywords: Temporal conditioning; Trace conditioning; Conditioned inhibition; Timing; Temporal-difference models
It has long been known that whether an association is acquired depends on how closely two distinguishable events occur in time. Research in this area owes a debt to Pavlov (1927, pp. 39–41), who reported that the interval between the conditioned stimulus (CS) and the unconditioned stimulus (US) determined whether dogs salivated in anticipation of an acid US following the offset of a tactile CS. Short gaps of a few seconds between CS offset and US delivery produced a salivary conditioned response (CR) that began during the CS itself and extended into the CS–US gap, which he labeled the short-trace reflex. When the gap was lengthened, a CR occurred during the CS–US gap but not during the CS itself. This suggested to Pavlov that central nervous system activity tied to the recent removal of the CS, and continuing for a time afterward, could support a CR if the gap were not too long. He called this phenomenon the long-trace reflex.
Subsequent research revealed that the CS–US interval also affects when a CR is initiated and when it reaches its maximum level (Bitterman, 1964). Although response latencies often migrate over trials toward the onset of the effective cue (Schneiderman, 1966), the peak of responding typically builds to asymptotic levels at the experimentally designed CS–US interval (Smith, 1968), including during trace conditioning (Kehoe, Ludvig, & Sutton, 2009, 2014). Operant trace conditioning experiments have extended the work of Pavlov (1927): on unreinforced test trials, they revealed Gaussian-like response functions centered on the time of reinforcement, with minimal responding during the stimulus (Buhusi & Meck, 2000). Echoing the conclusions of an earlier review by Mackintosh (1974, pp. 61–62), Balsam, Drew, and Gallistel (2010) cited additional evidence that CRs early in training are frequently first initiated near the arrival time of the US. For example, they noted that when the CS–US interval is shifted before a CR is acquired, the original and shifted CS–US intervals may both support peaks in responding (Ohyama & Mauk, 2001).
These familiar temporal conditioning phenomena are often used as a platform for testing real-time computational models. One such theory is a relatively new application (Ludvig, Sutton, & Kehoe, 2008, 2012) of the temporal difference (TD) algorithm (Sutton & Barto, 1987, 1990). This family of algorithms takes its name from the idea that organisms predict future states from information available in the current moment and bring their predictions as close to reality as possible by correcting them with the difference between successive predictions, an error-correction approach (e.g., Rescorla & Wagner, 1972). Learning occurs when the organism is surprised by an event it did not expect, such as US occurrence. TD models attracted widespread scientific interest in the 1990s with the discovery that dopamine neurons in the monkey brain respond to surprising reward-predicting events (Schultz, Dayan, & Montague, 1997).
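The error-correction idea can be made concrete with a small sketch. The following Python example is a minimal tabular TD(0) learner, assuming a complete serial compound (one weight per CS time step), a US of magnitude 1.0 delivered on the transition out of the last CS step, and illustrative values for the learning rate `alpha` and discount factor `gamma`; it is not the parameterization used by Ludvig et al. (2008, 2012). The TD error `delta` plays the role of surprise: a weight changes only when the current prediction differs from the discounted next prediction plus any US.

```python
def train_td_csc(n_steps=10, n_trials=2000, alpha=0.1, gamma=0.9, us=1.0):
    """Tabular TD(0) over a complete serial compound: one weight per CS time step.

    Illustrative sketch; n_steps, alpha, gamma, and us are hypothetical values.
    """
    w = [0.0] * n_steps  # associative strength (prediction) of each time step
    for _ in range(n_trials):
        for t in range(n_steps):
            if t + 1 < n_steps:
                reward, v_next = 0.0, w[t + 1]  # no US yet; bootstrap from next step
            else:
                reward, v_next = us, 0.0  # US arrives as the CS ends; trial is over
            delta = reward + gamma * v_next - w[t]  # TD error ("surprise")
            w[t] += alpha * delta
    return w


after_one_trial = train_td_csc(n_trials=1)
asymptote = train_td_csc()
print([round(v, 3) for v in after_one_trial])
print([round(v, 3) for v in asymptote])
```

After a single trial only the final time step has gained strength; with continued training, strength propagates backward toward CS onset and settles near `gamma ** (n_steps - 1 - t)`, so each step approaches but never equals the strength of the step after it.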
To encode whether the US occurred, the TD approach assumes that the temporally discounted value of the US serves as corrective feedback at each moment in time, such that proportionately greater associative strength is supported closer to the anticipated time of US arrival. This discounting mechanism permits responding to propagate backward over trials from the US delivery time toward the onset of the CS. Excitation is passed on by temporal contiguity: each preceding time step of the CS gradually acquires a level of associative strength that approaches, but never fully equals, that of the subsequent time step. To encode when the US occurred, the CS is conceived of as a set of temporally defined stimuli, each with the potential to enter into association with the US. Early implementations of the TD approach assumed that the temporally defined stimuli had equal potential across the duration of the CS (Moore, Choi, & Brunzell, 1998; Sutton & Barto, 1987, 1990). This so-called complete serial compound (CSC) representation is limited in that it assumes perfect timing and perfect temporal differentiation (see Ludvig et al., 2012). By contrast, in the microstimulus TD model, the CS is thought to trigger smeared representations of temporally defined stimuli generated at distinct times within the duration of the CS (Ludvig et al., 2008, 2012). Each successive microstimulus is less intense and less temporally specific than the one before it as the CS is traversed from beginning to end. This permits the time-locked peak CR to diminish in magnitude with increasing interstimulus intervals and allows for increasing generalization across nearby time points.
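The microstimulus representation can be sketched as follows. Based on our reading of Ludvig et al. (2008), each microstimulus is a Gaussian basis function evaluated on an exponentially decaying memory trace that starts at CS onset, scaled by the current trace height; the number of microstimuli, the decay per 1-s step, and the width `sigma` below are hypothetical values chosen for illustration. The sketch reports when each microstimulus peaks and how high it gets.

```python
import math


def microstimuli(n_steps, n_micro=6, decay=0.97, sigma=0.08):
    """Return one row per time step with the activation of each microstimulus.

    Hypothetical parameters: decay of the memory trace per 1-s step and the
    width sigma of each Gaussian basis function.
    """
    centers = [(n_micro - i) / n_micro for i in range(n_micro)]  # 1.0 first
    rows = []
    for t in range(n_steps):
        y = decay ** t  # memory trace: 1.0 at CS onset, decaying thereafter
        rows.append([
            y * math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi)
            for mu in centers
        ])
    return rows


def peak_profile(rows):
    """(time step of peak, peak height) for each microstimulus."""
    profile = []
    for i in range(len(rows[0])):
        column = [row[i] for row in rows]
        height = max(column)
        profile.append((column.index(height), height))
    return profile


for step, height in peak_profile(microstimuli(120)):
    print(f"peak at t = {step:3d} s, height = {height:.3f}")
```

Later microstimuli peak later and reach lower maxima, and because the trace decays exponentially, each one spreads over more time steps than the one before it, which is the smearing described above.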
According to the TD models, the momentary ability of the CS to support a CR depends on the summed associative strength of the microstimuli active at that moment. As in the Rescorla–Wagner model, when memory traces from different CSs are simultaneously present, they compete for associative strength, resulting in cue-competition effects such as blocking (Barnet, Grahame, & Miller, 1993) and conditioned inhibition (Williams, Johns, & Brindas, 2008).
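Cue competition follows from the shared error term. The sketch below uses the Rescorla–Wagner rule, in which every cue present on a trial is updated by the same prediction error (λ minus the summed strength of the cues present); the learning rate, λ, and trial counts are illustrative assumptions. It compares a blocking group (A+ trials, then AB+ trials) with a control group that receives only AB+ trials.

```python
def rescorla_wagner(phases, alpha=0.2, lam=1.0):
    """Rescorla-Wagner updates; each phase is (cues_present, n_reinforced_trials)."""
    v = {}  # associative strength per cue
    for cues, n_trials in phases:
        for _ in range(n_trials):
            prediction = sum(v.get(c, 0.0) for c in cues)  # summed strength of present cues
            delta = lam - prediction  # one prediction error shared by all present cues
            for c in cues:
                v[c] = v.get(c, 0.0) + alpha * delta
    return v


blocking = rescorla_wagner([(("A",), 100), (("A", "B"), 100)])  # A+ then AB+
control = rescorla_wagner([(("A", "B"), 100)])  # AB+ only
print(f"B after blocking: {blocking['B']:.3f}; B in control: {control['B']:.3f}")
```

Because A already predicts the US when B is added, the shared error is near zero and B gains almost nothing; in the control group, A and B split the available strength.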
It might not be immediately obvious how a model so dependent on contiguity could ever explain trace conditioning. Following Pavlov’s lead, TD models as a class (Ludvig et al., 2008, 2012; Sutton & Barto, 1987, 1990) assume that the offset of a trace CS activates new stimuli that can be differentiated from those of the intertrial interval (ITI). These new stimuli may then persist into the gap and function like a second CS, becoming a direct associate of the US. As the gap is lengthened, the asymptotic level of conditioning at CS offset should diminish, and there should be less associative strength available to spread from CS offset toward CS onset. Furthermore, as a one-time event, CS offset is not expected to continuously generate stimuli during the trace interval and therefore should not “bridge the gap” as effectively as a nominal CS does (Gibbs, Kehoe, & Gormezano, 1991; Kaplan & Hearst, 1982). In summary, the model treats trace conditioning as if it were a variant of the serial conditioning procedure (Kehoe, 1979). The exteroceptive trace stimulus plays the role of the initial CS and the transient stimulus produced at its termination is the terminal CS, which is then reinforced with the US.
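Under this serial-conditioning reading, the effect of gap length can be worked through with an idealized CSC representation. Assume, for illustration only, T one-step CS features; g one-step features triggered by CS offset that fill the gap (k = 0 is the first step after offset); no overlap between the two sets; a US of magnitude λ on the transition out of the last gap step; and a discount factor γ < 1. Setting the TD error δ_t = r_{t+1} + γV_{t+1} − V_t to zero at every step gives the asymptotic values:

```latex
\begin{aligned}
V_{\mathrm{gap}}(k) &= \gamma^{\,g-1-k}\,\lambda, && k = 0,\dots,g-1,\\
V_{\mathrm{offset}} &\equiv V_{\mathrm{gap}}(0) = \gamma^{\,g-1}\,\lambda,\\
V_{\mathrm{CS}}(t) &= \gamma^{\,T-1-t}\,\gamma\,V_{\mathrm{offset}} = \gamma^{\,T+g-1-t}\,\lambda, && t = 0,\dots,T-1.
\end{aligned}
```

Each additional gap step multiplies both the value at CS offset and the value available to spread back toward CS onset by γ. The idealization ignores smeared microstimuli and the one-time nature of the offset event, so it speaks only to the gap-length prediction.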
The combination of moment-to-moment learning assumed by the TD approach and cue competition leads to some interesting predictions for trace conditioning. For example, according to the CSC and microstimulus TD models (see Ludvig et al., 2009, for a full derivation of this prediction), the initial part of a trace CS should be either excitatory or neutral, and generally not inhibitory, unless the ambient stimuli of the experimental context are reasonably excitatory. Increasing levels of contextual excitation are expected to make the early part of the trace CS increasingly inhibitory. This leads to an interesting and potentially falsifiable prediction: without modification, TD models do not permit a trace CS to be inhibitory across its whole duration while also evoking a temporally specific CR in the gap. Pavlov’s (1927) observation of a long-trace reflex with minimal responding during the CS suggests the model might be wrong. However, he did not assess the trace CS for inhibition, and the CS might have been neutral right up until it terminated.
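The role of contextual excitation can be illustrated with a deliberately simplified sketch: a per-second error-correction rule with no temporal discounting (equivalent to TD with `gamma = 0`), one context feature that is always present, and a single CS feature present only during the CS. Intertrial USs are replaced by their expected value per second (`p_iti_us`, for example 1/30), the CS period contains no US, and the learning rate and trial counts are hypothetical. Because this simplification ignores the discounted strength that would propagate back from the US after CS offset, it speaks only to the early, US-free part of a trace CS.

```python
def context_vs_cs(p_iti_us, n_trials=200, iti_s=340, cs_s=120, alpha=0.01):
    """Return (context weight, CS weight) after training; gamma = 0 simplification."""
    w_ctx, w_cs = 0.0, 0.0
    for _ in range(n_trials):
        for _ in range(iti_s):  # ITI: context alone, expected US value per second
            w_ctx += alpha * (p_iti_us - w_ctx)
        for _ in range(cs_s):  # early trace CS: context + CS, no US
            delta = 0.0 - (w_ctx + w_cs)  # shared error for both features
            w_ctx += alpha * delta
            w_cs += alpha * delta
    return w_ctx, w_cs


print(context_vs_cs(0.0))     # no intertrial USs
print(context_vs_cs(1 / 30))  # one intertrial US every 30 s on average
```

Without intertrial USs both weights stay at zero, so the CS remains neutral; with intertrial USs the context becomes excitatory and the CS weight is driven toward the negative of the context weight, that is, the CS becomes inhibitory.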
The aim of these experiments was to evaluate the foregoing possibility by testing a trace CS (trained with or without intertrial USs) for conditioned inhibition and for its ability to trigger a CR in the gap. We chose intertrial USs presented at random times because they are perhaps the most straightforward and effective means of driving up contextual excitation relative to a baseline without ITI USs (Williams, Lawson, Cook, Mather, & Johns, 2008). All three experiments used an appetitive conditioning procedure with food-restricted rats. Head-entry times are a sensitive measure of temporally defined responding and are normally maximal at the time of US delivery (Williams, Chubala, Mather, & Johns, 2009). Experiment 1 characterized temporally graded responding in trace conditioning in the presence or absence of intertrial USs. Using summation and retardation tests, respectively, Experiments 2a and 2b assessed when conditioned inhibition developed within a trace conditioning trial in the presence of intertrial USs. As derived previously, the microstimulus TD model predicts that inhibition early in the CS should give way to excitation late in the CS, followed by a further increment at CS offset. On the other hand, the trace CS might become inhibitory as a whole: a temporally specific CR might be observed at the arrival time of the US in the gap, while the bulk of the CS is inhibitory.
Experiment 1 examined the effects of intertrial USs on second-by-second levels of responding in trace conditioning. Previous experiments from our laboratory have used a standard delay conditioning procedure (the US coincides with CS termination; Williams, MacKenzie, & Johns, 2010) and a procedure in which the CS extends past the occurrence of the US (the US is embedded within the CS; Williams et al., 2008). These delay and embedded procedures were included in Experiment 1 as important comparison points. Temporally specific responding develops during both embedded and delay CSs in the presence of intertrial USs. Williams et al. (2009) reported that head-entry times in acquisition peaked above the contextual baseline at the scheduled CS–US interval when pellet delivery occurred 10, 30, 60, or 90 s after the onset of the 120-s CS. Extinction tests also revealed spike-like changes (lasting a single second) in head-entry times when CS onset and offset had been associated with pellet delivery at 0 and 120 s, respectively, which the authors attributed to momentary changes in associative strength. Under an embedded CS–US relationship, the number of sessions needed for acquisition also increases as the rate of intertrial USs is increased (Williams & Lussier, 2011).
Forty-eight experimentally naïve male Sprague-Dawley rats (Rattus norvegicus) served as subjects. They were obtained from Charles River Canada, St. Constant, QC, Canada, and were approximately 90 days old and weighed about 250 g upon arrival. The rats were housed in pairs in solid-bottom plastic cages in a colony room with artificial lighting from 0700 to 1900 hr. They had continuous access to food and water during a 2-week acclimatization period and were then food restricted to 80 % of their free-feeding weights. Conditioning sessions occurred during the light portion of the light–dark cycle.
Subjects were trained in one of eight identical chambers (MED Associates, Georgia, VT) measuring 30 cm (length) × 25 cm (width) × 32 cm (height). The chambers were housed in separate ventilated cabinets (Grason-Stadler, West Concord, MA), which minimized outside light and sound. The front and back panels of the chambers were made of aluminum, and the side panels and ceiling were made of transparent acrylic. The floor consisted of 19 stainless steel rods, 0.5 cm in diameter, running parallel to the front panel. The CS was white noise (86 dB, Scale A) delivered from a speaker anchored 3 cm above the center point of the ceiling. The rats could gain access to the food pellet US (Formula 21, Bio-Serv, Frenchtown, NJ) through a 5.0 × 5.0-cm opening in the middle of the front panel, located 2.0 cm above the grid floor. Pellets were delivered by a sound-attenuated dispenser (ENV-203, MED Associates, Georgia, VT), which produced a 54-dB mechanical sound when operated in the absence of ventilation-fan noise (64 to 68 dB). A computer equipped with MED-PC IV software (MED Associates, Georgia, VT) controlled the presentation of the CS and the US and monitored head-entry behavior. When the rat’s head entered the opening to the recessed food trough, it interrupted an infrared photobeam. The dependent variable was the amount of time the photobeam was interrupted in each second. On this measure, a rat whose head remained in the food trough for the entire second received the maximum score of 1.0 s, whereas a score of 0.0 s was recorded if the beam was never broken. The number of separate beam interruptions within a second did not affect the score.
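The dependent measure can be sketched as a function that converts photobeam interruptions into per-second scores. This assumes, hypothetically, that interruptions are available as (start, end) times in seconds from trial onset; how MED-PC IV actually records them is not described here, and the function name is ours.

```python
import math


def seconds_in_trough(interruptions, n_seconds):
    """Per-second photobeam-interruption time, 0.0 to 1.0 s (hypothetical helper).

    interruptions: iterable of (start, end) times in seconds, start < end.
    Only total time counts; the number of separate breaks within a second does not.
    """
    scores = [0.0] * n_seconds
    for start, end in interruptions:
        first = max(0, math.floor(start))
        last = min(n_seconds, math.ceil(end))
        for s in range(first, last):
            overlap = min(end, s + 1) - max(start, s)  # portion of second s covered
            if overlap > 0:
                scores[s] += overlap
    return [min(score, 1.0) for score in scores]  # cap at a full second


print(seconds_in_trough([(0.5, 1.25), (1.5, 2.0), (3.0, 5.0)], 6))
```

Two 0.25-s interruptions within the same second score the same 0.5 s as a single 0.5-s interruption, matching the rule that the number of breaks does not matter.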
Before the experiment began, the rats were allowed to consume about 20 food pellet USs from a dish in their home cages. Next, they were randomly assigned to one of six groups (n = 8) according to a 2 (intertrial US: no-ITI USs or ITI USs) × 3 (relationship: embedded, delay, or trace) factorial design. Group labels were no-ITI USs/Em (embedded), no-ITI USs/Del (delay), no-ITI USs/Tr (trace), ITI USs/Em (embedded with ITI pellets), ITI USs/Del (delay with ITI pellets), and ITI USs/Tr (trace with ITI pellets). Groups were counterbalanced across specific chambers and running squads. The target US occurred 110 s (Em), 120 s (Del), or 130 s (Tr) after the onset of the 120-s CS; in the trace groups, it therefore arrived 10 s after CS offset. The first 10 s after the CS were free of USs in all groups (this period served as the gap in the Tr groups). For all groups, the ITI began 10 s after CS termination and averaged 340 s (nonuniform distribution with a 120-s minimum and a 460-s maximum). Thus, the interval between successive CS presentations was, on average, the same for all groups. In the ITI USs condition, intertrial USs occurred at a long-run average rate of one every 30 s. They were scheduled by programming US deliveries with a probability of 0.033 during each second of the ITI. Although the number of intertrial pellets received in a given ITI could vary, the long-run average was very close to the desired average interval for all subjects. The no-ITI USs groups received a pellet-free ITI. There were eight trials in each of the 36 conditioning sessions. The conditioning parameters were chosen to be as similar as possible to those used by Williams et al. (2009, 2010). In each of the last eight sessions, a single probe trial was introduced at a random point as a ninth trial.
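The per-second scheduling rule for intertrial USs can be sketched as independent Bernoulli draws. This illustrates the rule described above and is not the MED-PC program used in the experiments; the seed, the use of a fixed 340-s ITI, and the function name are assumptions. With p = 0.033, the expected number of pellets in a 340-s ITI is 340 × 0.033 ≈ 11.2, and the expected interval between pellets is 1/0.033 ≈ 30.3 s.

```python
import random

P_US_PER_SECOND = 0.033  # per-second US probability during the ITI (from the text)


def schedule_iti_us(iti_seconds, rng, p=P_US_PER_SECOND):
    """Return the ITI seconds (0-based) on which an intertrial pellet is delivered."""
    return [s for s in range(iti_seconds) if rng.random() < p]


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
itis = [340] * 2000  # hypothetical: every ITI set to the 340-s mean
total_us = sum(len(schedule_iti_us(iti, rng)) for iti in itis)
observed_rate = total_us / sum(itis)
print(f"expected pellets per 340-s ITI: {340 * P_US_PER_SECOND:.2f}")
print(f"mean interval between pellets: {1 / observed_rate:.1f} s")
```

The count in any single ITI varies from draw to draw, but the long-run rate converges on 0.033 per second, which is why the average interval was close to 30 s for every subject.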
This probe trial was identical to the other trials except that the target US was omitted and any intertrial USs normally scheduled in the 30-s post-CS period (triple the 10-s gap used in the Tr groups) were withheld. Intertrial USs were permitted throughout the ITI up to CS onset. The sessions, normally 4,100 s long, were lengthened by 590 s (440-s ITI, 120-s CS, 30-s post-CS period) to accommodate the extra probe trial.
Results and discussion
Two of the three predictions made at the outset received support. In acquisition, the presence of intertrial USs markedly decreased anticipatory head-entry behavior during the CS relative to the ITI, and especially so in the group trained with a trace relationship (Prediction 1). Responding remained very low for the duration of the trace CS and did not increase greatly across the CS, suggesting the entire CS rather than just the initial part might be inhibitory (contra Prediction 2). On unreinforced probe trials, responding spiked at the offset of the CS in the delay group, and there was a bell-shaped distribution of responding in the CS–US gap in the trace groups (Prediction 3).
The primary finding was that anticipatory head entries during the trace CS, and less so during the delay CS, were reduced in the presence of intertrial USs compared to their absence. To assess temporal conditioning, we examined the level of head-entry behavior in the 101- to 110-s interval, which spanned the period immediately before delivery of the target US in the embedded groups. A 2 (intertrial US) × 3 (relationship) × 2 (bin) ANOVA revealed main effects of intertrial US, F(1, 42) = 13.55, MSE = 0.07, p < .0001, ηp² = 0.24, 95 % CI [.05, .43], and relationship, F(2, 42) = 7.66, MSE = 0.07, p < .001, ηp² = 0.27, 95 % CI [.05, .44], as well as an intertrial US × relationship interaction, F(2, 42) = 5.04, MSE = 0.07, p < .05, ηp² = 0.19, 95 % CI [.01, .37]. To examine the source of the interaction, we tested for the simple effect of relationship within each level of the ITI factor. A simple effect was found only in the ITI USs condition, F(2, 21) = 12.78, MSE = 0.07, p < .001, ηp² = 0.55, 95 % CI [.19, .70]. Here, responding in the ITI USs/Em group was greater than in the ITI USs/Del group, F(1, 14) = 5.47, MSE = 0.086, p < .05, ηp² = 0.28, 95 % CI [0, .55], which in turn was greater than in the ITI USs/Tr group, F(1, 14) = 5.62, MSE = 0.07, p < .05, ηp² = 0.29, 95 % CI [0, .56]. Thus, whereas the groups trained without intertrial pellets did not differ significantly during this period, responding in the ITI USs/Tr group was depressed relative to the ITI USs/Em and ITI USs/Del groups.
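Each reported partial eta squared can be recovered from its F ratio and degrees of freedom, because ηp² = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). The helper below is a convenience for checking reported values such as those above; it does not compute the confidence intervals, which require the noncentral F distribution.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


reported = [  # (F, df_effect, df_error, reported partial eta squared)
    (13.55, 1, 42, 0.24),
    (7.66, 2, 42, 0.27),
    (5.04, 2, 42, 0.19),
    (12.78, 2, 21, 0.55),
]
for f, df1, df2, eta in reported:
    print(f"F({df1}, {df2}) = {f}: {partial_eta_squared(f, df1, df2):.2f} (reported {eta})")
```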
We assessed the temporal precision of the CR by finding the maximum response (y-axis) for each subject and recording the time of that maximum on the x-axis (where 0 = US arrival). The maxima are plotted in the insets of Fig. 3. We also considered alternative methods, including those designed for the operant peak procedure (Church, Meck, & Gibbon, 1994). However, these procedures typically assume a break followed by a reasonably long run of responding, which was true only of the Tr groups. Identical maxima were obtained whether the search window was ±20 s, as used in Fig. 3, or the entire trial window from CS onset to 30 s after CS termination. In all cases, the average x-axis maxima were close to the scheduled arrival time of the US (no-ITI USs/Em = -0.12 s, SD = 7.57; no-ITI USs/Del = 0.75 s, SD = 1.17; no-ITI USs/Tr = 0.88 s, SD = 1.81; ITI USs/Em = 2.75 s, SD = 5.85; ITI USs/Del = 1.88 s, SD = 1.13; ITI USs/Tr = 0.63 s, SD = 3.07). The x-axis maxima did not differ as a function of the intertrial US and relationship variables. This suggests that responding was temporally defined and peaked near the US arrival time.
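The peak-time measure can be sketched as follows, assuming per-second scores for one subject in which index 0 holds the first second after CS onset. The function name, the tie-breaking rule, and the example data are hypothetical; the search window defaults to the ±20 s used for Fig. 3. The second call uses a whole-trial window to show why the window can matter in principle, although in Experiment 1 the two windows gave identical maxima.

```python
from statistics import mean


def peak_time(responses, us_time, half_window=20):
    """Second of maximal responding relative to US arrival (0 = US arrival).

    responses[i] is the score for second i + 1 after CS onset (assumed layout).
    Only seconds within +/- half_window of us_time are searched; ties resolve
    to the earliest second (an assumption).
    """
    window = [(second, score) for second, score in enumerate(responses, start=1)
              if abs(second - us_time) <= half_window]
    best = max(score for _, score in window)
    first_best = min(second for second, score in window if score == best)
    return first_best - us_time


# Hypothetical trace-group subject: US scheduled at 130 s (10 s after a 120-s CS).
subject = [0.0] * 150
subject[59] = 1.0   # second 60: a large response far from the US time
subject[129] = 0.5  # second 130
subject[130] = 0.8  # second 131
print(peak_time(subject, us_time=130))                   # +/-20-s window
print(peak_time(subject, us_time=130, half_window=150))  # whole trial searched

peaks = [1, -1, 2, 0]  # hypothetical per-subject peak times for one group
group_mean = mean(peaks)
print(f"group mean peak time = {group_mean:.2f} s")
```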
To confirm that the spikes in responding at CS termination were restricted to the Del groups, we further compared head-entry times during the last second of the CS and the first second immediately following CS termination. A 2 (intertrial US) × 3 (relationship) × 2 (second) ANOVA revealed main effects of intertrial US, F(1, 42) = 34.93, MSE = 0.033, p < .0001, ηp² = 0.45, 95 % CI [.22, .61], relationship, F(2, 42) = 8.22, MSE = 0.033, p < .0001, ηp² = 0.28, 95 % CI [.06, .45], and second, F(1, 42) = 50.38, MSE = 0.008, p < .0001, ηp² = 0.55, 95 % CI [.32, .67], and a relationship × second interaction, F(2, 42) = 54.48, MSE = 0.008, p < .0001, ηp² = 0.72, 95 % CI [.55, .80]. The interaction was caused by an effect of second in the Del groups, F(1, 23) = 25.08, MSE = .035, p < .0001, ηp² = 0.52, 95 % CI [.24, .69], but not in the Em or Tr groups. It is especially important to note that both delay groups showed a spike. Intertrial USs selectively disrupted responding from CS onset in the ITI USs/Del group, but did not disrupt temporal control at CS offset.
In summary, Experiment 1 found that the introduction of intertrial USs attenuated anticipatory responding during the presentation of both the delay and trace CSs. Responding was particularly low across the duration of the trace CS, which is suggestive of conditioned inhibition. The lack of responding to the trace CS in the presence of intertrial USs was strikingly similar to Pavlov’s (1927) description of the long-trace reflex mentioned in the introduction. In both cases, the stimuli arising at CS termination were strongly associated with the US, although the CS itself evoked little responding. The results of Experiment 1, however, extend those of Pavlov (1927) in an important way: the temporal conditioning that occurred after the termination of the trace CS survived the introduction of intertrial USs.
Experiments 2a and 2b
As shown in Fig. 4, the summation test involved separate training of a 150-s transfer excitor, followed by unreinforced probe trials on which the transfer excitor was presented alone or in compound with the 120-s trace CS (with simultaneous onsets). Lower head-entry times to the compound than to the transfer excitor alone would suggest conditioned inhibition, whereas the opposite pattern would suggest conditioned excitation. In the ITI USs/Tr condition, a switch from conditioned inhibition to excitation was expected either during the trace CS or during the gap. During the gap, the compound consisted of the internal stimuli created by the offset of the trace CS together with the physically present excitor. To corroborate the findings of the summation test, Experiment 2b examined whether acquisition would be retarded if the target US were relocated to 10 s after the onset of the trace CS. If acquisition with a relocated US were also slow, it would be hard to claim that the trace CS simply directed attention away from the transfer excitor and thereby caused suppression during the summation test.
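The logic of the summation comparison can be sketched bin by bin: the sign of compound minus excitor-alone responding indicates inhibition (negative) or excitation (positive). This is a descriptive illustration only, with hypothetical numbers and an arbitrary `tolerance`; the inferences in these experiments rest on the ANOVAs reported in the Results.

```python
def summation_sign(compound, excitor_alone, tolerance=0.0):
    """Label each bin by the sign of (compound - excitor alone) responding."""
    labels = []
    for c, e in zip(compound, excitor_alone):
        diff = c - e
        if diff < -tolerance:
            labels.append("inhibitory")  # compound suppressed below the excitor
        elif diff > tolerance:
            labels.append("excitatory")  # compound enhanced above the excitor
        else:
            labels.append("neutral")
    return labels


# Hypothetical bin means: suppression during the CS, enhancement after its offset.
print(summation_sign([0.10, 0.30, 0.50], [0.30, 0.30, 0.20]))
```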
Subjects and apparatus
Each experiment included 48 experimentally naïve, food-restricted rats, cared for in the same manner as in Experiment 1. The experiments were carried out in a set of six conditioning chambers similar to those used in Experiment 1, except that they were 22.0 cm wide and 27.5 cm high. A 2.8-W jeweled light, located 3 cm above the food aperture on the front panel, served as the visual CS. White noise (86 dB, Scale A) and a 2,900-Hz tone (82 dB) served as the trace or control CS in a counterbalanced fashion. Two contexts were created by configuring the same physical chambers differently. One context consisted of a dimly illuminated chamber with alternating 3.7-cm black-and-white-striped sidewalls. Illumination in this context was provided by a 2.8-W shielded houselight located 3 cm beneath the ceiling in the center of the back wall. The floor consisted of 0.25-cm steel rods spaced 1.50 cm apart and staggered vertically by 0.75 cm. The second context was unlit and did not have striped walls. Its floor was constructed of 0.50-cm steel rods, spaced 1.10 cm apart, running evenly from one sidewall to the other. To further increase the distinctiveness of the second context, a 500-ml aluminum bottle filled with frozen water was placed alongside the wall opposite the chamber door, providing tactile, thermal, and visual cues to help the rats discriminate the contexts.
All rats first received 36 sessions of trace conditioning. As in Experiment 1, half of the rats received intertrial USs at an average rate of one every 30 s, and the other half received none. The trace conditioning sessions were identical to those described in Experiment 1, with two exceptions. First, the white noise and tone were counterbalanced as the trace (Tr) and control (Con) CSs. Second, the two configurations of the conditioning chamber were counterbalanced as the conditioning and test contexts. After the 36 sessions of trace conditioning, the rats were assigned to an experiment (Experiment 2a or 2b) and a tested stimulus (Tr or Con) so as to equate performance as much as possible while maintaining the counterbalancing of stimuli and contexts. After assignment, they were exposed to the test context for three sessions in preparation for summation and retardation testing. The purpose of this unreinforced context exposure was to reduce the chance that differences in the associative strength of the conditioning context would generalize to the test context. During context exposure, the rats were simply placed in the test context for 1 hr, with no programmed events.
The procedures used in Experiments 2a and 2b differed from this point on. In Experiment 2a, the light CS was then conditioned to make it uniformly excitatory. There were eight reinforced presentations of a 150-s light CS in each 1-hr session. During the light CS, the US was delivered at random times at a rate of 0.067 per second (on average, one every 15 s). The ITI was US-free. Note that the trace CS was 120 s in duration, 30 s shorter than the 150-s light CS. This difference allowed us to assess how removing the trace CS affected ongoing responding to the light CS during the summation test. The summation test was conducted once the light CS evoked a moderate and consistent level of responding across its duration, which required seven sessions. The test session began with two reinforced presentations of the light CS during a 15-min “warm-up” period. This was followed by four unreinforced test trials with the light CS (+) and four unreinforced compound trials (-) on which the light CS began together with either the trace CS (for half the subjects) or the novel control CS (for the other half). The auditory CS terminated after 120 s, whereas the visual CS terminated after 150 s. This manipulation completed the experimental design, resulting in four groups labeled no-ITI USs/Tr, no-ITI USs/Con, ITI USs/Tr, and ITI USs/Con. The order of the test trials was randomized.
In Experiment 2b, the retardation test began the day after the unreinforced exposure sessions in the test context. Half of the rats were conditioned with the trace CS, and the other half with the control CS. Each session included reinforced and probe trials. On reinforced trials, the target US was delivered 10 s after the onset of the 120-s CS, and no other USs occurred. On probe trials, the target US was omitted and the CS was unreinforced. There were six sessions of retardation testing, each lasting 1 hr and containing four reinforced and four probe trials in random order (a 50 % reinforcement schedule). The same group labels were used as in Experiment 2a.
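The retardation-test schedule can be sketched as a shuffled list of four reinforced and four probe trials per session for six sessions. The unconstrained shuffle, the seed, and the function name are assumptions; the text does not say whether the random order was constrained (for example, by limiting runs of the same trial type).

```python
import random

US_TIME_S = 10  # target US 10 s after onset of the 120-s CS on reinforced trials


def retardation_session(rng, n_reinforced=4, n_probe=4):
    """One session's trial list in random order (unconstrained shuffle: an assumption)."""
    trials = [("reinforced", US_TIME_S)] * n_reinforced + [("probe", None)] * n_probe
    rng.shuffle(trials)
    return trials


rng = random.Random(7)
sessions = [retardation_session(rng) for _ in range(6)]
for number, session in enumerate(sessions, start=1):
    print(number, [kind for kind, _ in session])
```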
Results and discussion
An initial 2 (intertrial US: present vs. absent) × 2 (bin: 5-s bins) ANOVA confirmed that pre-CS responding was greater in the presence of intertrial USs than in their absence, F(1, 94) = 72.83, MSE = 0.096, p < .0001, ηp² = 0.44, 95 % CI [.29, .55]. There was no main effect of bin and no interaction involving bin. As shown in Fig. 5, responding in the ITI USs condition dropped shortly after the onset of the trace CS and increased less over subsequent bins than in the no-ITI USs condition. This observation was supported by a 2 (intertrial US) × 24 (bin) ANOVA, which produced main effects of intertrial US, F(1, 94) = 13.91, MSE = 1.36, p < .001, ηp² = 0.13, 95 % CI [.03, .26], and bin, F(23, 2162) = 19.10, MSE = 0.29, p < .0001, ηp² = 0.17, 95 % CI [.13, .19], and an intertrial US × bin interaction, F(23, 2162) = 10.49, MSE = 0.29, p < .0001, ηp² = 0.10, 95 % CI [.07, .12]. Both the no-ITI USs and ITI USs conditions showed an abrupt increase in responding at CS offset, which peaked near the trained US arrival time before declining. A 2 (intertrial US) × 6 (bin) ANOVA applied to the data from the post-CS period found main effects of intertrial US, F(1, 94) = 6.55, MSE = 0.469, p < .05, ηp² = 0.07, 95 % CI [.00, .18], and bin, F(5, 470) = 76.61, MSE = 0.041, p < .0001, ηp² = 0.45, 95 % CI [.38, .50], and an intertrial US × bin interaction, F(5, 470) = 6.92, MSE = 0.041, p < .0001, ηp² = 0.07, 95 % CI [.02, .11]. Mean responding in the 126- to 130-s interval (the 130-s mark in Fig. 5, just before the trained US arrival time) did not differ between the two conditions, F(1, 94) = 1.53, ns. Thus, as in Experiment 1, intertrial USs reduced responding during the trace CS, which was followed by a robust CR at CS offset.
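The 5-s binning behind these analyses can be sketched as follows: 24 bins span the 120-s CS and six bins span the 30-s post-CS window, with each bin labelled by its final second (so the 130-s mark is the 126- to 130-s interval). The data layout and function name are assumptions for illustration.

```python
def bin_means(per_second, bin_width=5):
    """Average per-second scores into consecutive bins labelled by their last second.

    per_second[i] holds second i + 1 after CS onset (an assumed layout), so the
    bin labelled 130 covers seconds 126-130.
    """
    bins = {}
    for start in range(0, len(per_second) - bin_width + 1, bin_width):
        chunk = per_second[start:start + bin_width]
        bins[start + bin_width] = sum(chunk) / bin_width
    return bins


trial = [0.0] * 150  # 120-s CS followed by a 30-s post-CS window
for i in range(125, 130):  # seconds 126-130, just before the trained US time
    trial[i] = 1.0
bins = bin_means(trial)
cs_bins = [label for label in bins if label <= 120]
post_bins = [label for label in bins if label > 120]
print(len(cs_bins), len(post_bins), bins[130])
```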
The data of most importance are from the CS period. The main result was that responding in the ITI USs/Tr group (bottom right panel of Fig. 6), but not in the ITI USs/Con group (bottom left panel of Fig. 6), was strongly attenuated on compound trials. A stimulus × trial type × bin ANOVA on the data shown in the bottom panels (ITI USs/Con vs. ITI USs/Tr) revealed a stimulus × trial type interaction, F(1, 22) = 5.45, MSE = .011, p < .05, ηp² = 0.20, 95 % CI [.00, .45]. In the ITI USs/Tr group, the trace CS attenuated responding to the excitor throughout, relative to the excitor on its own, F(1, 11) = 31.06, MSE = .007, p < .001, ηp² = 0.74, 95 % CI [.32, .85]. No difference between these trial types was found in the ITI USs/Con group, F < 1.0. Responding on compound trials, but not on excitor-alone trials, was also lower in the group tested with the trace CS than in the group tested with the control CS, F(1, 22) = 17.25, MSE = 0.008, p < .001, ηp² = .44, 95 % CI [.12, .63]. A stimulus × trial type × bin ANOVA on the data shown in the top panels (no-ITI USs/Con vs. no-ITI USs/Tr) revealed a stimulus × trial type × bin interaction, F(23, 506) = 2.10, MSE = 0.031, p < .01, ηp² = 0.09, 95 % CI [.03, .09]. In the no-ITI USs condition, the two-way stimulus × trial type interaction was not reliable, F(1, 22) = 1.07, ns, suggesting that negative summation was minimal. Stimulus × trial type ANOVAs on individual bins found only hints of negative summation at the 10-s, 15-s, and 20-s marks of the trace CS, where responding to the compound was lower than to the excitor alone, smallest F(1, 11) = 7.28, MSE = 0.073, p < .05, ηp² = 0.40, 95 % CI [.01, .65]. Perhaps because of external inhibition (Pavlov, 1927), responding to the compound was also lower than to the excitor in the no-ITI USs/Con group during a number of bins after the 60-s mark, smallest F(1, 11) = 4.90, MSE = 0.052, p < .05, ηp² = 0.31, 95 % CI [.00, .59].
Statistical analyses confirmed that responding in the post-CS period (last six bins after the 120-s mark) increased rapidly on compound trials in groups tested with the trace CS relative to controls (top panels: no-ITI USs/Tr vs. no-ITI USs/Con; bottom panels: ITI USs/Tr vs. ITI USs/Con). Again, stimulus × trial type × bin ANOVAs were conducted separately for the top and bottom panels. Both analyses revealed three-way stimulus × trial type × bin interactions, smallest F(5, 110) = 6.02, MSE = 0.010, p < .0001, ηp² = 0.22, 95 % CI [.07, .31]. To investigate the source of these interactions, we conducted trial type × bin ANOVAs within each level of stimulus. These analyses revealed no changes after the removal of the control stimulus (see top left and bottom left panels). By contrast, after the removal of the trace CS (top right and bottom right panels), there were main effects of bin for both ITI conditions, minimum F(5, 55) = 4.98, MSE = 0.020, p < .001, ηp² = 0.31, 95 % CI [.07, .43], and trial type × bin interactions for both ITI conditions, minimum F(5, 55) = 9.10, MSE = 0.013, p < .0001, ηp² = 0.45, 95 % CI [.21, .56]. The interactions were caused by more post-trace-CS responding to the compound than to the excitor. This can be seen 10 and 15 s after trace CS termination (130- and 135-s marks in Fig. 6) in the no-ITI USs group, smallest F(1, 11) = 6.06, MSE = 0.081, p < .05, ηp² = 0.36, 95 % CI [.00, .62], and 15 s after CS offset in the ITI USs group, F(1, 11) = 8.54, MSE = 0.020, p < .05, ηp² = 0.44, 95 % CI [.02, .67]. Thus, the sign of the summation effect changed from negative to positive at CS termination in the ITI USs/Tr group, and from neutral to positive in the no-ITI USs/Tr group.
An examination of pre-CS responding with an intertrial US × stimulus × block × bin ANOVA found main effects for intertrial US, F(1, 44) = 18.33, MSE = 0.033, p < .0001, ηp² = 0.29, 95% CI [.09, .47], and stimulus, F(1, 44) = 5.04, MSE = 0.033, p < .05, ηp² = 0.10, 95% CI [.00, .28], and an interaction of intertrial US × stimulus × bin, F(1, 44) = 6.07, MSE = 0.001, p < .05, ηp² = .12, 95% CI [.00, .30]. Although numerical differences in head-entry times were again unimpressive due to the low baseline levels, the pattern was similar to that found in the summation test, with more responding in the no-intertrial USs condition.
Rates of acquisition can be discerned from how quickly a bell-shaped response distribution, centered near the 10-s mark of the CS (the US delivery time), developed over two-session blocks. Separate stimulus (2) × block (3) × bin (8) ANOVAs on the data shown in Fig. 7 identified three-way interactions for both the top panels, F(14, 308) = 3.17, MSE = 0.005, p < .0001, ηp² = 0.13, 95% CI [.03, .16], and the bottom panels, F(14, 308) = 3.22, MSE = 0.003, p < .0001, ηp² = 0.13, 95% CI [.03, .16]. To better understand the pattern of differences contributing to the interactions, we examined responding in a single bin just before the arrival time of the US (i.e., the 6- to 10-s bin, dashed line). These comparisons revealed that responding in the no-ITI USs/Tr group was lower than in the no-ITI USs/Con group in Blocks 1 and 2, smallest F(1, 22) = 8.40, MSE = 0.032, p < .01, ηp² = 0.28, 95% CI [.02, .51]. Thus, the early part of the CS in the no-ITI USs/Tr group appeared somewhat inhibitory, which is consistent with the results of summation testing. By contrast, responding immediately prior to US arrival was consistently lower in the ITI USs/Tr group than in the ITI USs/Con group, regardless of block, smallest F(1, 22) = 9.33, MSE = 0.023, p < .01, ηp² = 0.30, 95% CI [.03, .53]. Other comparisons confirmed less responding in the ITI USs/Tr group than in the no-ITI USs/Tr group during all three blocks, smallest F(1, 22) = 9.33, MSE = 0.023, p < .01, ηp² = 0.30, 95% CI [.03, .53]. This difference suggests that the early portion of the trace CS in the ITI USs/Tr group was strongly inhibitory, and much more so than in the no-ITI USs/Tr group.
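The degrees of freedom reported for these mixed-design ANOVAs follow directly from the design. The Python sketch below assumes 12 rats per group (inferred from the within-group F(1, 11) tests), no sphericity correction (the reported df are uncorrected), and a between-subjects stimulus factor; the helper name is hypothetical. An effect's df is the product of (levels − 1) over its factors, and its error df is (subjects − groups) times the product of (levels − 1) over its within-subject factors.

```python
from math import prod


def mixed_anova_df(between_levels, within_levels, n_subjects, n_groups):
    """Effect and error df for an effect in a mixed (split-plot) ANOVA.

    between_levels: levels of each between-subjects factor in the effect
    within_levels:  levels of each within-subjects factor in the effect
    n_subjects:     total number of subjects entering the analysis
    n_groups:       number of independent groups in the analysis
    Assumes uncorrected df; an effect with no within-subjects factor is
    tested against the between-subjects error term (n_subjects - n_groups).
    """
    effect_df = prod(k - 1 for k in list(between_levels) + list(within_levels))
    error_df = (n_subjects - n_groups) * prod(k - 1 for k in within_levels)
    return effect_df, error_df


# Stimulus (2, between) x block (3) x bin (8); two groups of 12 rats
print(mixed_anova_df([2], [3, 8], n_subjects=24, n_groups=2))
```

This yields (14, 308), matching F(14, 308). The same rule reproduces F(23, 506) for the 24-bin summation analysis and F(5, 55) for the within-group trial type × bin tests over six post-CS bins.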
In summary, both tests led to the same conclusion: The trace CS became strongly inhibitory when trained in the presence of ITI pellets. The results from the summation test also demonstrated that trace CS termination evoked a temporally defined CR that summated positively with the transfer excitor. In the ITI USs/Tr group, the termination of the CS thus caused a shift from negative to positive summation. The added stimulus in the gap on compound trials was presumably the internal trace left by a recently presented inhibitor.
These experiments provide an interesting new set of findings about how intertrial USs influence time-based patterns of responding under various CS–US relationships. Experiment 1 found that intertrial USs caused increasing attenuation of responding during the CS itself as the target US was moved in 10-s steps from preceding (embedded relationship) to following (trace relationship) the termination of the 120-s CS. Although intertrial USs attenuated responding most during the trace CS, a robust CR was still observed in the gap between CS offset and US delivery. This post-CS responding was not simply the resumption of an expectation of randomly distributed intertrial USs, because the CR peaked exactly 10 s after the trace CS terminated. Our trace conditioning data are in some ways reminiscent of Pavlov's (1927): Responding during the CS was minimal in his experiments and lower than in the pre-CS period in ours (ITI USs condition), and in both cases it was followed by a CR in the gap. A summation test in Experiment 2a subsequently revealed that a trace CS trained in the presence of intertrial USs was strongly inhibitory across its whole duration. Experiment 2b found retardation of acquisition to a US relocated within the early part of the trace CS, providing further evidence of conditioned inhibition. Conditioned inhibition (negative summation) gave way to conditioned excitation (positive summation) only during the gap. Taken together, these data suggest that excitatory temporal conditioning triggered by the offset of the trace CS survived the introduction of unsignaled USs, although the CS itself became inhibitory.
Unfortunately, the corresponding predictions for the ITI USs condition assuming the same high gamma are less impressive (lower right panel of Fig. 8). Unlike the data, the formal simulations predict a clear temporal pattern, with an increase in associative strength from the initial postonset depression over the course of the CS, irrespective of the CS–US relationship. Decreases in associative strength due to conditioned inhibition during the initial part of the CS are predicted to be followed by an increase to above the contextual baseline for the Em and Del groups. Increasing responding should be seen first in the Em group, then in the Del group, and last in the Tr group. The increase in the Tr group follows because the microstimuli arising later in the CS are paired with the beginning of the trace interval (CS offset), which is excitatory. The expected temporal pattern is consistent with observed responding in the ITI USs/Em group in Experiment 1, but it is somewhat less consistent with responding in the ITI USs/Del group. However, there is a clear mismatch in the ITI USs/Tr group. In Experiment 1, responding did not increase from near-floor levels until very near the termination of the CS, and even then it remained well below pre-CS levels. Likewise, Experiment 2a found negative summation on compound trials throughout the trace CS in the ITI USs/Tr group.
Although experiments from our laboratory have not always found a deficit in delay conditioning as severe as that depicted in Fig. 3 (Williams et al., 2010), the microstimulus TD model clearly underpredicts the attenuation of conditioning that can occur during the late portion of a trace CS. Substantially better fits are not obtained when the temporal resolution of the CS representation is reduced (e.g., when the number of microstimuli per CS is reduced from 60 to 6; see Ludvig et al., 2012) or when the simulations assume preasymptotic data (e.g., 200 vs. 2,000 trials). One solution we have explored assumes that long-duration trace or delay CSs might lose their eligibility to acquire strength through pairings with the secondary reward value conditioned to an upcoming stimulus, the gap. A loss of eligibility, combined with the usual assumption that CS offset is a new event, might explain our trace conditioning data.
Consistent with this line of thinking, Ludvig, Sutton, Verbeek, and Kehoe (2009) have recently suggested a related modification to the TD model. Recognizing the need for a qualitative change in representation over time, they argue that long-latency temporal elements of the CS might require activation of a collateral brain structure, the hippocampus (Shors, 2004). Hippocampally dependent learning is thought to occur whenever long-latency temporal elements of the CS enter into association with the discounted reward value of the US. These elements are thought to ramp up slowly to a low asymptote during the continued presence of a long-duration CS and then diminish after reaching their preferred temporal bias point. Short-latency elements, on the other hand, are thought to be evoked without hippocampal involvement and to have narrower, more sharply peaked activation patterns. Thus, one could suppose that serial conditioning is weaker with long-latency microstimuli than with short-latency microstimuli because the hippocampal system is less efficient at secondary learning. This distinction between primary and secondary learning also echoes the primary value and learned value (PVLV) model, a competitor to TD learning for explaining appetitive conditioning in the brain (see O’Reilly, Frank, Hazy, & Watz, 2007).
A core principle of TD learning, however, is that primary and secondary learning are identical, making this suggested revision to the model a significant deviation in need of further empirical validation. In addition, the effect of reduced eligibility with trace conditioning should occur both with and without ITI USs. Yet, the animals displayed what would seem to be intact eligibility without ITI USs. Finally, this solution does not address fundamental assumptions about TD learning in this situation but rather the particular way stimuli are encoded in the microstimulus TD model.
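The stimulus encoding at issue can be illustrated with a minimal Python sketch of a microstimulus representation in the style of Ludvig et al. (2012). The functional form (an exponentially decaying memory trace read out through Gaussian basis functions of trace height, each scaled by the trace) reflects our understanding of that model. The parameter values (decay = 0.985 per time step, sigma = 0.08) and helper names are illustrative assumptions, not the values behind Fig. 8; m is the number of microstimuli per CS (60 or 6 in the simulations discussed above).

```python
import math


def microstimuli(steps_since_event, m=60, decay=0.985, sigma=0.08):
    """Return the m microstimulus activations for one stimulus.

    Illustrative assumptions: the event launches a memory trace y = decay**t
    that falls from 1 toward 0, and microstimulus i (i = 1..m) is a Gaussian
    of trace height centred at i/m, scaled by y, so that long-latency
    microstimuli are weaker and more spread out in time.
    """
    y = decay ** steps_since_event
    return [
        y * math.exp(-((y - i / m) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi)
        for i in range(1, m + 1)
    ]


def most_active(features):
    """1-based index of the most strongly activated microstimulus."""
    return max(range(len(features)), key=features.__getitem__) + 1


if __name__ == "__main__":
    for m in (60, 6):
        for t in (20, 200):
            x = microstimuli(t, m=m)
            print(f"m={m:2d}, t={t:3d}: most active {most_active(x)}, peak {max(x):.3f}")
```

As the trace decays, progressively lower-numbered microstimuli become the most active and their peak activation shrinks, so long-latency elements are weak and temporally imprecise; with m = 6 the same interval is covered by only six broad features.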
Figure 8 might provide a clue to a more promising solution. It is possible that the extra USs affected the discount factor, gamma. In particular, the introduction of intertrial USs might lead to faster discounting of future reinforcement because pellets are more common in the conditioning session as a whole. Thus, the data of the ITI USs condition might be more appropriately modeled with a lower gamma (gamma = 0.90; lower left panel of Fig. 8), and those of the no-ITI USs condition with a higher gamma (gamma = 0.97; upper right panel of Fig. 8). With a lower gamma, responding still drops after CS onset (as observed) but remains depressed more broadly across the total duration of the CS (as observed). Background levels of responding also diminish somewhat in the ITI USs condition from an otherwise higher level if gamma is reduced. It is interesting that the baseline differences caused by the introduction of intertrial USs were not all that large, especially in Experiment 1. This could be taken as further evidence for greater discounting of future reinforcement with frequent intertrial USs. Any model attempting to learn the discounted value of future reward could be modified in this way, including but not restricted to the microstimulus TD model.
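The effect of lowering gamma can be made concrete with the standard TD value definition, V_t = Σ_k γ^k r_{t+k+1}. With a single US and no other reward, the asymptotic prediction of a US k steps away reduces to γ^k times its magnitude. The Python sketch below compares the two gamma values mentioned above; the time step is left abstract, and the helper names are hypothetical, so the numbers are not meant to reproduce the simulations in Fig. 8.

```python
def discounted_value(steps_ahead, gamma, magnitude=1.0):
    """Discounted value of a single US that is `steps_ahead` time steps away.

    At the TD fixed point V_t = sum_k gamma**k * r_{t+k+1}; with one US and no
    other reward, this reduces to magnitude * gamma**steps_ahead.
    """
    return magnitude * gamma ** steps_ahead


def effective_horizon(gamma):
    """Rule-of-thumb number of steps over which future reward still matters."""
    return 1.0 / (1.0 - gamma)


for gamma in (0.97, 0.90):
    values = [round(discounted_value(k, gamma), 3) for k in (0, 10, 20, 40)]
    print(f"gamma={gamma}: values at k = 0, 10, 20, 40 -> {values}; "
          f"horizon ~ {effective_horizon(gamma):.1f} steps")
```

A US 20 steps away retains roughly 54% of its value with gamma = 0.97 but only about 12% with gamma = 0.90. Under the lower gamma, the value signal therefore rises only close to US delivery, which is the sense in which intertrial USs might produce "faster discounting".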
We have chosen to highlight the predictions of the microstimulus TD model because it makes highly constrained predictions, in addition to addressing the larger goal of reconciling error-prediction learning with the temporal properties of the CR (Kirkpatrick, 2014). However, our data should also be of interest to those studying temporal learning from other real-time perspectives. Most real-time models, such as the componential sometimes-opponent-process (SOP) model (e.g., Vogel, Brandon, & Wagner, 2003; Wagner, 1981), also correctly predict that trace procedures are more likely to produce conditioned inhibition when the context is excitatory. These models, however, also face the problem of specifying how an inhibitory CS might trigger a CR in the post-CS period. One possibility is to suppose a reduced level of second-order conditioning in the presence of intertrial USs, following the line of thinking mentioned previously for the PVLV model. This suggestion is tempered by an acknowledgement that the mechanism underlying the effects of intertrial USs remains to be determined. The no-ITI USs versus ITI USs manipulation was employed to increase context excitation, which it did for the most part, but it could also have affected discounting, levels of second-order conditioning, or even the current motivational value of the reinforcer (satiation).
Farther afield are timing models. These models assume that intervals of time are the main content of what is learned (Church et al., 1994; Guilhardi, Yi, & Church, 2007). From this perspective, the ability of the rats to time the arrival of a US after CS termination is at issue. These models make the clear prediction that any event, whether stimulus onset or offset, may serve to mark the beginning of a fixed interval before US arrival (Buhusi & Meck, 2000). Although these theories may be able to accommodate the trace conditioning data of Experiment 1, it is much less obvious how such an account could handle negative summation (Experiment 2a) or retardation of conditioning (Experiment 2b). Timing models simply do not include the concept of conditioned inhibition (Williams, Johns, et al., 2008) and do not specify when timing will be inhibited. That said, it makes sense that the most proximal stimulus (Fairhurst, Gallistel, & Gibbon, 2003), namely CS offset (Buhusi & Meck, 2000), might have been used by the rats to time the arrival of the target US. Such timing might be argued to be a separate and distinct process from the “associative properties” of the CS, perhaps involving different brain mechanisms (Meck, 2006). Given this, Experiments 2a and 2b could be interpreted as providing a striking new dissociation: A CS with inhibitory associative properties concurrently acted as a time marker. Although such a dual-systems approach is certainly a possibility worth acknowledging, the data of these experiments can be explained more parsimoniously through the continuing evolution of a model integrating temporal representations into error-prediction mechanisms.
This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to D. A. Williams.
- Gibbs, C. M., Kehoe, J. K., & Gormezano, I. (1991). Conditioning of the rabbit’s nictitating membrane response to a CSA-CSB-US serial compound: Manipulations of CSB’s associative character. Journal of Experimental Psychology: Animal Behavior Processes, 17, 423–432. doi: 10.1037/0097-7403.17.4.423
- Kaplan, P. S., & Hearst, E. (1982). Bridging temporal gaps between CS and US in autoshaping: Insertion of other stimuli before, during, and after CS. Journal of Experimental Psychology: Animal Behavior Processes, 8, 187–203. doi: 10.1037/0097-7403.8.2.187
- Kehoe, E. J., Ludvig, E. A., & Sutton, R. S. (2009). Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Behavioral Neuroscience, 123, 1095–1101. doi: 10.1037/a0017112
- Ludvig, E. A., Sutton, R. S., Verbeek, E. L., & Kehoe, E. J. (2009). A computational model of hippocampal function in trace conditioning. Advances in Neural Information Processing Systems (NIPS-08), 21, 993–1000.
- Mackintosh, N. J. (1974). The psychology of animal learning. Oxford: Academic Press.
- Moore, J., Choi, J., & Brunzell, D. (1998). Predictive timing under temporal uncertainty: The TD model of the conditioned response. In D. Rosenbaum & A. Collyer (Eds.), Timing of behavior: Neural, computational, and psychological perspectives (pp. 3–34). Cambridge: MIT Press.
- Pavlov, I. P. (1927). Conditioned reflexes. Oxford: Oxford University Press.
- Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. R. Prokasy (Eds.), Classical conditioning II (pp. 64–99). New York: Appleton-Century-Crofts.
- Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378).
- Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. R. Gabriel & J. W. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks (pp. 497–537). Cambridge: Bradford/MIT Press.
- Vogel, E. H., Brandon, S. E., & Wagner, A. R. (2003). Stimulus representation in SOP II: An application to inhibition of delay. Behavioural Processes, 62, 27–48. doi: 10.1016/S0376-6357(03)00050-0
- Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale: Erlbaum.