Solving Pavlov's puzzle: Attentional, associative, and flexible configural mechanisms in classical conditioning
This article introduces a new “real-time” model of classical conditioning that combines attentional, associative, and “flexible” configural mechanisms. In the model, attention to both conditioned (CS) and configural (CN) stimuli is modulated by the novelty detected in the environment. Novelty increases with the unpredicted presence or absence of any CS, unconditioned stimulus (US), or context. Attention regulates the magnitude of the associations that CSs and CNs form with other CSs and the US. We incorporate a flexible configural mechanism in which attention to the CN stimuli increases only after the model has unsuccessfully attempted to learn input–output combinations with CS–US associations. That is, CSs become associated with the US and with other CSs in fewer trials than CNs do. Because the CSs activate the CNs through unmodifiable connections, a CS can become associated with the US or other CSs both directly and indirectly (through the CN). To simulate timing processes, we simply assume that a CS is formed by a temporal spectrum of short-duration CSs that are activated by the nominal CS trace. The model accurately describes 94 % of the basic properties of classical conditioning, using fixed model parameters and simulation values in all simulations.
Keywords: Associative learning, Attention, Configural learning
Although apparently simple, classical conditioning is a collection of diverse phenomena that cannot be reduced to one simple equation. Instead, we have suggested that conditioning can be described only by a complex combination of different, sometimes redundant mechanisms, each with its own rules and parameters (Schmajuk, 2010).
Table 1. Summary of the simulated results (for details of the simulations, see the online Supplemental Materials)
r(8) = .87, p < .05
Suppl. Fig. 2
1.2 Partial reinforcement
r(8) = .87, p < .05
Suppl. Fig. 2
1.3 US- and CS-specific CR
r(2) = .97, p < .05
Suppl. Fig. 3
1.4 Conditioned diminution of the UR
r(4) = .80, p < .05
Suppl. Fig. 4
1.5 Concurrent excitation and inhibition with few trials
r(16) = .80, p < .05 Stages 1 & 2
1.6 Conditioning proceeds faster with imperfect predictors
r(6) = .87, p < .05 Stage 3
r(1) = .86, p < .05
Suppl. Fig. 54
2.2 Partial reinforcement extinction effect (PREE)
r(8) = .73, p < .05
2.3 Renewal in a context of extinction
See Renewal in 10.7
Suppl. Fig. 55
r(3) = .98, p < .05
Suppl. Fig. 7
3.2 External inhibition
r(1) = .998, p < .05
Suppl. Fig. 8
3.3 Asymmetrical generalization decrement
r(7) = .86, p < .05
4.1 Discrimination between CS + and CS−
r(18) = .92, p < .05
Suppl. Fig. 10
4.2 Positive patterning
r(70) = .90, p < .05
Suppl. Fig. 11
4.3 Negative Patterning
r(70) = .90, p < .05
4.4 Positive patterning is easier than negative patterning.
r(48) = .94, p < .05
Suppl. Fig. 13
r(19) = .93, p < .05 Group Simple
4.5 Adding a common cue to NP decreases discrimination
r(19) = .88, p < .05 Group Common
4.6 Patterning discrimination with three CSs is learnable
r(28) = .86, p < .05
4.7 Biconditional discrimination
r(28) = .75, p < .05
4.8 Biconditional discrimination is harder than NP
4.9 Biconditional discrimination is harder than component discrimination
r(28) = .75, p < .05
r(34) = .97, p < .05 (Exp.1)
4.10 Following A+/B+ and X−/Y− training, the AY+/AX− discrimination is faster than the AY+/BY− discrimination
r(16) = .91, p < .05 (Exp. 3)
4.11 Following AW+ and BX+ and nonreinforced CW− and DX− training, the AW+/AX− discrimination is slower than the AW+/BX− discrimination
r(18) = .98, p < .05
Suppl. Fig. 19
4.12 Simultaneous feature-positive discrimination
r(34) = .63, p < .05
4.13 Serial feature-positive discrimination
r(34) = .63, p < .05
4.14 Simultaneous feature-negative discrimination
r(28) = .67, p < .05
4.15 Serial feature-negative discrimination
r(28) = .67, p < .05
4.16 Feature positive discrimination is easier than feature negative
See 4.12 and 4.14
4.17 Common feature in both a serial feature negative and serial feature positive discrimination
r(14) = .84, p < .05
Suppl. Fig. 22
5. Inhibitory conditioning
5.1 Conditioned Inhibition
r(1) = .998, p < .05
Suppl. Fig. 8
r(1) = .99, p < .05
Suppl. Fig. 23
5.3 Extinction of conditioned inhibition. Inhibitory conditioning is extinguished by CS1–CS2–US presentations
r(2) = .90, p < .05
Suppl. Fig. 24
5.4 Following conditioned inhibition, reinforced and nonreinforced presentations of excitor CS1 might modify the power of CS2 in summation and retardation tests
r(2) = .92, p < .05 (CX = 0.3; odors)
Suppl. Fig. 25
5.5 Following conditioned inhibition, nonreinforced presentations of inhibitor CS2 do not modify the power of CS2 in summation and retardation tests
r(2) = .99, p < .05 (summation), r(16) = .87, p < .05 (Exp. 3)
Suppl. Fig. 26
5.6 Differential conditioning
r(2) = .94, p < .05 (summation test)
rD: 4.80, rS: 14 (retardation test)
Suppl. Fig. 27
6. Combination of separately trained CSs
6.1 When two CSs independently trained with the same US are tested in combination, there is more likely to be a summative CR when the CSs are in different rather than in the same modality
r(28) = .63, p < .05
Suppl. Fig. 28
6.2 CSs that are trained with aversive USs may acquire a broad tendency to potentiate defensive CRs and suppress appetitive CRs
6.3 CSs that are trained with appetitive USs may acquire a broad tendency to potentiate appetitive CRs and suppress defensive CRs
7. Stimulus competition/potentiation in training
7.1 Relative validity
rD: 3.80, rS: 4.2
Suppl. Fig. 29
r(1) = .99, p < .05
Suppl. Fig. 30
7.3 Unblocking by increasing the US
r(2) = .90, p < .05
7.4 Unblocking by decreasing the US
r(2) = .90, p < .05
r(1) = .99, p < .05
Suppl. Fig. 30
rD: 2.26; rS: 2.06
Suppl. Fig. 32
7.7 Backward blocking
rD: 1.40; rS: 4.90
Suppl. Fig. 33
r(3) = .93, p < .05
Suppl. Fig. 34
r(1) = .95, p < .05
Suppl. Fig. 35
7.10 Unequal learning about CSs in compound
rD: 1.82; rS: 1.70
Suppl. Fig. 36
7.11 Temporal primacy overrides prior training
r(18) = .68, p < .05
8. CS/US preexposure effects
8.1 Latent inhibition
r(2) = .91, p < .05
Suppl. Fig. 38
8.2 A change of context disrupts latent inhibition
r(2) = .91, p < .05
Suppl. Fig. 38
8.3 Presentation of a different CS before conditioning disrupts latent inhibition
r(1) = .99, p < .05
Suppl. Fig. 39
8.4 Preexposure to a context facilitates fear conditioning
r(2) = .96, p < .05
8.5 US-preexposure effect
rD: 2, rS: 2
Suppl. Fig. 41
8.6 Learned irrelevance
r(10) = .059, p < .05
8.7 Perceptual learning
r(1) = .97, p < .05
Suppl. Fig. 43
8.8 Hall–Pearce effect
r(16) = .79, p < .05
Suppl. Fig. 44
8.9 Presentation of the US before a CS–US pairing impairs conditioning
r(6) = .87, p < .05
Suppl. Fig. 45
8.10 Presentation of the CS before a CS–US pairing impairs conditioning
r(4) = .85, p < .05
Suppl. Fig. 46
8.11 Super-latent inhibition
r(2) = .99, p < .05 (CX = 0.3; odors)
Suppl. Fig. 47
9.1 Extinction (see 2.1)
r(1) = .86, p < .05
Suppl. Fig. 54
r(16) = .72, p < .05 (Exp. 1A)
r(24) = .80, p < .05 (Exp. 1B)
Suppl. Fig. 48
9.3 Counterconditioning (see 5.3: Inhibitory conditioning is extinguished by CS1–CS2–US presentations)
9.4 Transfer along a continuum.
r(14) = .64, p < .05
Suppl. Fig. 49
10.1 Recovery from latent inhibition
r(2) = .91, p < .05
10.2 Recovery from overshadowing
r(1) = .99, p < .05
10.3 Recovery from forward blocking
r(2) = .99, p < .05
10.4 Recovery from backward blocking
10.5 External disinhibition
r(1) = .96, p < .05
Suppl. Fig. 53
10.6 Spontaneous recovery
r(18) = .93, p < .05
rD: 2.11; rS: 2.55 (CX = 0.3; odors)
Suppl. Fig. 55
r(4) = .96, p < .05
Suppl. Fig. 56
11. Higher order conditioning
11.1 Sensory preconditioning
rD: 19.5, rS: 42.20
Suppl. Fig. 57
11.2 Second-order conditioning
r(4) = .99, p < .05
11.3 Second-order conditioning vs. conditioned inhibition
r(4) = .99, p < .05
11.4 Inhibitory sensory preconditioning is possible
11.5 Mediated acquisition
11.6 Mediated extinction
rD: 1.26, rS: 2.35
Suppl. Fig. 59
12. Temporal Properties
12.1 Interstimulus Interval (ISI) effects
r(10) = .85, p < .05
Suppl. Fig. 60
12.2 Intertrial Interval effects
r(4) = .88, p < .05
Suppl. Fig. 61
12.3 Trial spacing effects
r(15) = .72, p < .05
12.4 Timing of the CR
r(8) = .64, p < .05
12.5 Timed responding from the onset of conditioning
r(8) = .95, p < .05
Suppl. Fig. 63
12.6 Scalar invariance in response timing
r(8) = .64, p < .05
12.7 Temporal specificity of blocking
r(1) = .99, p < .05
Suppl. Fig. 65
12.8 Temporal specificity of occasion setting
r(8) = .83, p < .05
Suppl. Fig. 66
12.9 Inhibition of delay occurs with long but not with short ISIs
r(26) = .86, p < .05 (Short Group)
r(26) = .81, p < .05 (Long Group)
Suppl. Fig. 67
The SLGK model
The attention-modulated representation of the CS, XCS, is proportional to attention zCS.
Configurations (combinations of different CSs into a new stimulus) are important for solving nonlinear problems such as negative patterning and occasion setting. As is shown in Fig. 1a and 1b, configural units (CNs) are activated by all CSs through random, nonmodifiable connections. In contrast to the unique element model (see Saavedra, 1975), which assumes that a configural cue is unique to a specific compound, each CN codes for multiple combinations of inputs (e.g., AB, AC, BD, . . . ABC), thereby avoiding a combinatorial explosion. Also, because the connections are fixed, the system is less prone to catastrophic interference (McCloskey & Cohen, 1989).
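A minimal sketch of the configural layer described above, assuming (our choices, not the model's published equations) uniformly distributed random CS-to-CN weights and a logistic activation function. The weights are drawn once and never updated, and every CN responds, with graded activation, to many different compounds rather than to a single one. All names are hypothetical.

```python
import numpy as np

def make_configural_weights(n_cs, n_cn, seed=0):
    """Draw fixed random CS-to-CN weights (hypothetical range [-1, 1]); they are never trained."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(n_cn, n_cs))

def cn_activation(weights, cs_input):
    """Logistic activation of every CN given a vector of CS inputs (1 = present, 0 = absent)."""
    net = weights @ cs_input
    return 1.0 / (1.0 + np.exp(-net))

# Hypothetical setup: 4 CSs (A, B, C, D) feeding 6 configural units
W = make_configural_weights(n_cs=4, n_cn=6, seed=1)
A = np.array([1.0, 0.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0, 0.0])
AB = A + B
NONE = np.zeros(4)

cn_A, cn_B, cn_AB = (cn_activation(W, x) for x in (A, B, AB))
baseline = cn_activation(W, NONE)
# Nonzero interaction: the CN response to AB is not the sum of the responses to A and B
interaction = (cn_AB - baseline) - ((cn_A - baseline) + (cn_B - baseline))
```

Because the same six units respond to A, B, AB, and any other compound, no unit has to be reserved for a single combination.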
Attention to the configural units, zCN, increases when the model has failed, over some time, to correctly predict the presence or absence of the US. Because it is presently unclear whether zCN decreases over time and organisms revert to elemental processing and linear solutions, we provisionally assume that zCN does not decrease.
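The rule above fixes only the qualitative behavior of zCN: it grows when the US prediction keeps failing and, provisionally, never declines. The sketch below is a hypothetical illustration of that behavior, not the model's equation; the moving-average window, error threshold, and growth rate are our own assumptions.

```python
def simulate_zcn(errors, window=5, threshold=0.2, rate=0.05):
    """Track attention to configural units, zCN, across trials.

    errors: absolute US prediction errors, one per trial.
    zCN rises toward 1 only when the mean error over the last `window`
    trials stays above `threshold`; it is never decreased.
    """
    z_cn = 0.0
    history = []
    for t in range(len(errors)):
        recent = errors[max(0, t - window + 1): t + 1]
        mean_error = sum(recent) / len(recent)
        if t + 1 >= window and mean_error > threshold:
            z_cn += rate * (1.0 - z_cn)  # bounded growth toward 1
        history.append(z_cn)
    return history

# Linear problem: errors shrink quickly, so configural attention is never recruited
linear = simulate_zcn([0.5 * 0.5 ** t for t in range(30)])
# Nonlinear problem (e.g., negative patterning): errors persist, so zCN grows
nonlinear = simulate_zcn([0.5] * 30)
```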
The attention-modulated representation of the CN, XCN, is proportional to zCN.
Importantly, a CS might simultaneously have an excitatory role (through an excitatory VCS-US association) and an inhibitory role (by activating a CN with an inhibitory VCN-US association).
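A hand-built illustration of this dual role, using negative patterning (A+, B+, AB−). The single configural unit below is a threshold unit that fires only to the AB compound, and all weights are chosen by hand for clarity; in the model itself, CNs are driven by random fixed weights and the associations are learned, so this is a sketch of the principle only.

```python
# Hand-picked, hypothetical weights for negative patterning (A+, B+, AB-)
V_CS_US = {"A": 1.0, "B": 1.0}   # direct, excitatory CS-US associations
CS_TO_CN = {"A": 1.0, "B": 1.0}  # fixed CS-to-CN connections
CN_THRESHOLD = 1.5               # the CN fires only when both A and B are present
V_CN_US = -2.0                   # inhibitory CN-US association

def cn_output(stimuli):
    """Binary output of the single configural unit for a list of present CSs."""
    net = sum(CS_TO_CN.get(s, 0.0) for s in stimuli)
    return 1.0 if net > CN_THRESHOLD else 0.0

def predict_us(stimuli):
    """Aggregate US prediction: direct CS terms plus the configural term."""
    direct = sum(V_CS_US.get(s, 0.0) for s in stimuli)
    return direct + cn_output(stimuli) * V_CN_US

for trial in (["A"], ["B"], ["A", "B"]):
    print("".join(trial), predict_us(trial))  # prints: A 1.0, B 1.0, AB 0.0
```

Here A is excitatory through VA-US and, in compound with B, inhibitory through the CN, which is what allows A and B alone to predict the US while AB does not.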
Note that attention zCS and XCS (Equations 1 and 3) control the formation of VCS-US and VCS-CS associations during conditioning (Equation 5) and the activation of VCS-US and the CR (Equations 6 and 7).
Although the model is conceptually simple (the number of equations in the SLG model is comparable to that of other models in the present Special Issue), "real time" computer simulations require restating the principles above as the differential equations presented in Appendix A in the online Supplemental Materials. Those differential equations use two or three parameters to express each of the 7 equations above in real time.
Following Grossberg and Schmajuk (1989) and Buhusi and Schmajuk (1999), we simulate timing by assuming that a CS is formed by a temporal spectrum of fractional CSs of limited duration (5 time units [t.u.]) within a regular CS (20 t.u.), each one activating its own short-term memory and recruiting its own attention and associations. Following Buhusi and Schmajuk, we assume (1) that this temporal spectrum is activated by a short-term memory trace of the CS and (2) that its output is modulated by the CR for that CS trace, in order to account for the effects of CS duration and intensity.
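A minimal sketch of the temporal spectrum, assuming (our simplification) that the 20-t.u. CS is divided into four consecutive, non-overlapping 5-t.u. fractional CSs that are switched on in turn from CS onset. The function and parameter names are hypothetical, and the modulation by the CS trace and the CR is omitted.

```python
def temporal_spectrum(t, cs_onset=0, cs_duration=20, fraction_duration=5):
    """Return a one-hot list marking which fractional CS is active at time t (in t.u.)."""
    n_fractions = cs_duration // fraction_duration
    active = [0] * n_fractions
    elapsed = t - cs_onset
    if 0 <= elapsed < cs_duration:
        active[elapsed // fraction_duration] = 1
    return active

# With the ISI of 15 t.u. used in the simulations, US onset falls in the last fraction
us_onset = 15
print(temporal_spectrum(us_onset))  # [0, 0, 0, 1]
```

Because each fractional CS recruits its own attention and associations, only the fractions that overlap the US can become strongly associated with it, which is one way a timed CR could emerge; this reading is our interpretation of the text rather than a statement of the model's equations.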
In our simulations, the number of independent variables (the simulation values: CS and US duration and salience, interstimulus interval [ISI], intertrial interval [ITI], and the sequence and number of trial types) is identical to the number of independent variables used in the experiments. In the experiments, the CSs were sounds, lights, flavors, odors, and shapes, in a wide range of intensities (from 60 to 92 dB) and durations (from 0.5 s to 10 min); the USs were food or shocks in a wide range of intensities (from 0.4 to 4.5 mA) and durations (from 0.05 to 5 s); the ISI ranged from 0.8 s to 25 min, and the ITI ranged from 15 s to 24 h. The number of experimental acquisition trials ranged from 1 in conditioned taste aversion in rats to 1,000 in eyeblink conditioning in rabbits, and the corresponding numbers for extinction were 3 and 560 trials.
In contrast, our simulations used the following fixed values: CS salience, 1; CS duration, 20 t.u.; US strength, 1; US duration, 5 t.u.; ISI, 15 t.u.; CX salience, 0.1; and ITI duration, 200 t.u. In addition, simulations reproduced the sequence of trial types (e.g., CS paired with the US, CS alone) used in each experiment. The number of simulated trials was linearly related to the number of trials used in the experiments, r(76) = .85, p < .05 (percentage of variance explained, 73 %). When we eliminated the 13 experiments using nictitating membrane or eyeblink conditioning (in which conditioning takes a large number of trials), the correlation coefficient increased considerably, r(63) = .92, p < .05 (percentage of variance explained, 85 %). This improvement suggests that, regarding the number of trials, the model approximates some preparations better than others.
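The fixed simulation values listed above can be collected in one immutable record. The sketch below is our own bookkeeping (the class and field names are hypothetical); it also makes explicit a consequence of these values: with an ISI of 15 t.u. and a 5-t.u. US, the US overlaps the final 5 t.u. of the 20-t.u. CS and both end together.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SimulationValues:
    cs_salience: float = 1.0
    cs_duration: int = 20   # t.u.
    us_strength: float = 1.0
    us_duration: int = 5    # t.u.
    isi: int = 15           # t.u., from CS onset to US onset
    cx_salience: float = 0.1
    iti: int = 200          # t.u.

def trial_timeline(values):
    """Return (start, end) intervals in t.u. for the CS and the US on a reinforced trial."""
    cs = (0, values.cs_duration)
    us = (values.isi, values.isi + values.us_duration)
    return cs, us

DEFAULT = SimulationValues()
cs_interval, us_interval = trial_timeline(DEFAULT)
# Cases with odor-enhanced contexts (5.4, 8.11, 10.7) used CX salience 0.3 instead of 0.1
ODOR_CONTEXT = replace(DEFAULT, cx_salience=0.3)
```

Freezing the record keeps every departure from the defaults explicit, since each one has to be written as a `replace(...)` call.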
Using the above-mentioned set of fixed simulation values, we attempted to quantitatively match the experimental results for each of the experiments listed in Table 1. In some cases (e.g., renewal in the context of extinction), we adjusted (1) the CX salience, to reflect the introduction of olfactory cues that increased the experimental CX salience (5.4, 8.11, 10.7); (2) the CS salience, when a less salient CS (e.g., a light) was paired with a more salient CS (e.g., a tone) in a noncounterbalanced manner (7.9); and (3) the CS salience, when the number of color dots used during elemental trials was decreased during compound trials (4.5, 4.6). Note that these changes are not arbitrary but reflect known properties of the experimental stimuli.
Simulation results were compared with the experimental data by (1) applying Pearson's product–moment correlation coefficient (McCall, 1970), which, unlike the sum of standard errors, is sensitive to the ordinal properties of the data, and (2) comparing the ratios between groups (rD, ratio in the data; rS, ratio in the simulations) when the experimental results had only two data points. Note that, although quantitative, these are scale-independent, ordinal measures of the quality of the fit.
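Both fit measures are easy to reproduce. The sketch below computes Pearson's r from scratch and a group ratio; following the usual reporting convention, the degrees of freedom in r(df) are the number of paired data points minus 2. The example numbers are made up for illustration, and the last lines show the scale independence mentioned above: rescaling the simulated values leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired sequences x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def group_ratio(experimental, control):
    """Ratio between two group means (rD for the data, rS for the simulations)."""
    return experimental / control

# Hypothetical data: mean CR per block in an experiment and in a simulation
data = [5.0, 12.0, 20.0, 31.0, 38.0]
sim = [0.10, 0.22, 0.45, 0.61, 0.80]
r = pearson_r(data, sim)
df = len(data) - 2
r_rescaled = pearson_r(data, [100 * s + 3 for s in sim])
```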
Because the configural units, which are connected to the input CSs through random weights, remained inactive in many simulations, the results of those simulations were independent of the random weights. When the nonlinear input–output combinations were not rapidly learned, the configural units became attended and the results depended on the random weights; in those cases, we averaged the simulated results over 10 different sets of random weights.
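The averaging step can be sketched as follows. The function `run_once` is a hypothetical stand-in for one complete simulation (here it merely returns the mean activation of the configural units to an AB compound under one random weight set); only the loop over 10 seeds reflects the procedure described above.

```python
import numpy as np

def run_once(seed, n_cs=3, n_cn=6):
    """Hypothetical stand-in for one simulation run under one set of random CS-to-CN weights."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1.0, 1.0, size=(n_cn, n_cs))
    ab = np.array([1.0, 1.0, 0.0])
    return float(np.mean(1.0 / (1.0 + np.exp(-(weights @ ab)))))

def average_over_weight_sets(simulate, n_sets=10):
    """Run `simulate` once per weight set (seeds 0..n_sets-1) and average the results."""
    results = [simulate(seed) for seed in range(n_sets)]
    return float(np.mean(results)), results

mean_result, per_set = average_over_weight_sets(run_once)
```

Seeding each weight set explicitly makes the averaged result reproducible from run to run.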
Wills and Pothos (2012) suggested that the competence of a “well-defined” model could be assessed by analyzing the number “of irreversible, ordinal, penetrable successes in accounting for empirical phenomena” (p. 110). For Wills and Pothos, “a well-defined model ... is one that considers all input–output combinations appearing in peer-reviewed publications.” In the present issue, the participating authors have selected those input–output combinations (see Table 1). Most importantly, Wills and Pothos defined irreversible success as that achieved by using model parameters “whose specification is general to the whole domain of phenomena that the model is intended to address” (p. 112), and penetrable success as the possibility of (1) understanding the model’s processes in psychological terms and (2) applying the model with little effort.
Bunge (1967) defined the accuracy of a model as the ratio between the number of successes in accounting for experimental data (C) and the number of peer-reviewed experimental results (N), A = C/N. In addition, Bunge defined the efficiency of a model as ρ = 1 − n/(C·D), where n is the number of free parameters (or, we suggest, the number of equations) in the model, C is the number of experimental results that the model correctly describes, and D is the number of dimensions (trial-to-trial, behavioral real-time, neurophysiological) to which the model can be applied. The equation captures the idea that the number of free parameters, which penalizes efficiency, matters little as long as the parameters are applied globally to a large and representative data set. In the equation, the penalty for the number of free parameters is further reduced by the number of dimensions to which the model is applied.
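As a worked example, the two indices can be computed with the figures reported later in this article: C = 82 successes out of N = 87 results, n = 18 parameters, and D = 3 dimensions (behavioral, temporal, and neurophysiological).

```python
def accuracy(successes, results):
    """Bunge's accuracy A = C / N."""
    return successes / results

def efficiency(n_parameters, successes, dimensions):
    """Bunge's efficiency rho = 1 - n / (C * D)."""
    return 1.0 - n_parameters / (successes * dimensions)

A = accuracy(82, 87)
rho = efficiency(18, 82, 3)
print(round(A, 2), round(rho, 2))  # 0.94 0.93
```

Applying the same 18 parameters to a single dimension would give ρ = 1 − 18/82 ≈ 0.78, which shows how the D term reduces the parameter penalty.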
We have applied the SLGK model to the exhaustive list of experimental results related to classical conditioning. Table 1 lists those results and indicates which experiments the model can account for. Due to space limitations, this section offers only a few illustrations of those results. The rest of the simulations listed in Table 1 are presented in the online Supplemental Materials.
Acquisition (section 1 in Table 1)
US- and CS-specific CR (1.3)
The nature of the CR is determined not only by the US, but also by the CS (Holland, 1977). In Ross and Holland (1981), Experiment 1, rats received simultaneous and serial feature-positive discriminations. Whereas rats in the serial group showed strong responding to the tone target (characterized by head-jerk CRCS), the simultaneous group showed strong responding to the light feature (characterized by rearing CRCS). The model explains the results, r(2) = .97, p < .05 (see the online Supplemental Materials, Fig. 3). Because it assumes that different CSs become associated with the US in different nodes (which also receive input from the configural nodes), the model can establish separate CRs (e.g., rearing, head jerk) with different CSs (e.g., light, tone). This is expressed in Equation 8′ by CRCS = f(BCS-US)(1 − OR).
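A minimal sketch of the CS-specific response CRCS = f(BCS-US)(1 − OR). The sigmoid f, its constants, and the example prediction values are our assumptions, since the output function is specified only in the online Supplemental Materials; the point is that each CS drives its own CR through its own prediction, and a strong orienting response (OR) suppresses responding.

```python
def f(b, k=0.5, n=2):
    """Hypothetical sigmoid output function of the CS-US prediction b (negative values clipped)."""
    b = max(b, 0.0)
    return b ** n / (b ** n + k ** n)

def cs_specific_cr(b_cs_us, orienting_response):
    """CR controlled by one CS: f(B_CS-US) scaled down by the orienting response (0..1)."""
    return f(b_cs_us) * (1.0 - orienting_response)

# Hypothetical predictions after simultaneous feature-positive training (light strong, tone weak)
crs = {
    "light (rearing)": cs_specific_cr(0.8, orienting_response=0.1),
    "tone (head jerk)": cs_specific_cr(0.2, orienting_response=0.1),
}
```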
Orienting response and conditioning to a CS after a change in its accuracy as a predictor of another CS (1.6)
Extinction (section 2 in Table 1)
Partial reinforcement extinction effect (2.2)
Generalization (section 3 in Table 1)
Generalization and discrimination (3.3)
Discriminations (section 4 in Table 1)
Adding a common cue to the elements and the compound cue during negative patterning decreases discrimination (4.5)
Biconditional discrimination is harder than component discrimination (4.7–4.9)
Discriminations between compounds (4.10)
Haselgrove et al. (2010, Experiment 3) reported that following AX+, BY+, X−, and Y− trials, the AY+/AX− discrimination was solved faster than the AY+/BY− discrimination, a result in agreement with Mackintosh’s (1975) theory. Figure 7 (lower panel) shows that the model correctly describes the result, r(16) = .91, p < .05. According to the model, during training, the VX-US and VY-US associations, overshadowed by the VA-US and VB-US associations, respectively, decrease. Representations XX and XY are high because of the alternated AX+/X− and BY+/Y− trials.
During discrimination, XX decreases because X correctly predicts the absence of the US (AX− trials), but XB increases because B strongly predicts the US in its absence (BY− trials). Because XX is small and XB is large, the difference between AY+ and AX− is smaller than the difference between AY+ and BY−.
Discrimination between compounds (4.11)
Dopson, Esber, and Pearce (2010, Experiment 1) found that following AW+, BX+, CW−, and DX− training, the AW+/AX− discrimination was learned more slowly than the AW+/BX− discrimination, a result in line with Mackintosh’s (1975) model. The model correctly describes the result, r(18) = .98, p < .05 (see online Supplemental Materials, Fig. 19), because, at the end of training, XA and XB are higher than XX and XW. This is because D, by becoming inhibitory on DX− trials, helps X to correctly predict the absence of the US, which strongly decreases attention to X. Similarly, B helps X to correctly predict the presence of the US during the BX+ trials. Therefore, at the end of training, attention to the partially reinforced cues X and W (XX and XW) is lower than attention to the continuously reinforced cues A and B. During the discrimination phase, the AW+/AX− discrimination is learned slowly because XX is relatively weak, and the AW+/BX− discrimination is learned quickly because XB is relatively strong.
Simultaneous and serial feature-negative discrimination (4.14, 4.15)
Holland (1984) also found that nonreinforced successive X → A presentations, alternated with reinforced presentations of A, result in weaker responding to X–A than to A alone, without X gaining an inhibitory tendency. Figure 8 (lower panel) shows that the model correctly describes the result, r(28) = .67, p < .05. In this case, X does not gain an inhibitory tendency, and the problem is solved by X and A exciting the configural units, which, in turn, inhibit the US prediction (see Fig. 12 in Schmajuk et al., 1998).
Feature-positive discrimination is easier than feature-negative discrimination (4.16)
Hearst (1975) reported that feature-positive discrimination is easier than feature-negative discrimination. The model also describes this result, for the same reasons that positive patterning is easier than negative patterning.
Shared feature in serial feature-positive and feature-negative discriminations (4.17)
Holland (1991) reported that in serial discrimination, X can be trained to concurrently serve as the feature in both a feature-negative and a feature-positive discrimination with different CSs. The model correctly describes the result, r(14) = .84, p < .05 (see online Supplemental Materials, Fig. 22), because X controls responding through different configural units.
Inhibitory conditioning (section 5 in Table 1)
Effect of extinction of the excitor on conditioned inhibition (5.4)
Following conditioned inhibition, nonreinforced presentations of the excitor A decrease retardation (Lysle & Fowler, 1985) but have no effect on a summation test (Rescorla & Holland, 1977). The model correctly describes the retardation results, r(2) = .92, p < .05 (see online Supplemental Materials, Fig. 25), because presentation of the excitor A activates the representation of X and, simultaneously, increases Novelty′ (because X is predicted but absent), thereby increasing attention to X, zX, which decreases retardation. In addition, the model explains the absence of an effect in the summation tests in the same terms as those used to explain the summation results after extinction treatment of the inhibitor.
Effect of nonreinforced presentations of the inhibitor on conditioned inhibition (5.5)
Both Zimmer-Hart and Rescorla (1974) and Pearce, Nicholas, and Dickinson (1982) reported that extinction treatment of the conditioned inhibitor results in no change in its inhibitory properties, as shown by a summation test. Importantly, Pearce et al. also reported increased retardation during reconditioning. The model correctly describes no effect in summation, r(2) = .99, p < .05 (see online Supplemental Materials, Fig. 26, upper panels), and retardation, r(16) = .87, p < .05 (see online Supplemental Materials, Fig. 26, lower panels).
According to the model, no effects are shown in the summation test because attention to X, zX, decreases during X− presentations (while the inhibitory VX-US association does not change). During BX− testing, X is well predicted by the CX in Group X−, which receives X presentations in the CX, but not in Group CX, in which X is absent during exposure to the CX. Therefore, during the summation test trials, X is predicted and Novelty′ is relatively low in Group X−, but large in Group CX. Because attention to both X and B (zX and zB) is relatively small in Group X− (in which X loses inhibitory and B loses excitatory power) and relatively large in Group CX (in which X gains inhibitory and B gains excitatory power), similar differences between responding to B and BX are observed in both groups. In contrast, the decreased attention to X, zX, readily results in retardation in the absence of the transfer CS, B.
Combination of separately trained CSs (section 6 in Table 1)
Summation and modality (6.1)
Kehoe, Horne, Horne, and Macrae (1994) showed that when two CSs independently trained with the same US are tested in combination, there is more likely to be a summative CR when the CSs are in different rather than in the same modality. The model correctly describes the results, r(28) = .63, p < .05 (see online Supplemental Materials, Fig. 28). Because the shared features in a given modality contribute less excitatory power than do the nonshared features in different modalities, the response is weaker when the CSs are in the same modality.
Stimulus competition/potentiation in training (section 7 in Table 1)
Unblocking by increasing or decreasing the US (7.3, 7.4)
In contrast, when the US intensity decreases, A can predict this weaker US better, and Novelty′ and XA decrease. The decrement in XA reduces the competition that A exerts on X (proportional to XA·VA-US), thereby increasing the VX-US association and reducing blocking.
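The effect of a drop in XA can be illustrated with a delta-rule-style update, in which the change in VX-US depends on the US minus the aggregate prediction. This is a simplified stand-in we chose for illustration, not the model's own equation; the learning rate and starting values are assumptions. With a weaker US (0.5) and a fully trained A (VA-US = 1), high attention to A leaves no room for X to gain strength (here it even pushes VX-US down), whereas reduced attention to A lets VX-US grow.

```python
def delta_v_x(x_a, v_a, x_x, v_x, us, rate=0.1):
    """One delta-rule-style change in V_X-US; A competes through the product x_a * v_a."""
    aggregate_prediction = x_a * v_a + x_x * v_x
    return rate * x_x * (us - aggregate_prediction)

weaker_us = 0.5
change_high_attention = delta_v_x(x_a=1.0, v_a=1.0, x_x=1.0, v_x=0.0, us=weaker_us)
change_low_attention = delta_v_x(x_a=0.3, v_a=1.0, x_x=1.0, v_x=0.0, us=weaker_us)
```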
Durlach and Rescorla (1980) showed that the presence of a taste stimulus at the time of conditioning potentiates, rather than overshadows, the resulting odor aversion to a solution that is followed by LiCl injections. As was suggested by Durlach and Rescorla, the model explains potentiation (rD: 2.26, rS: 2.06; see online Supplemental Materials, Fig. 32) in terms of chaining of CS odor–CS taste associations with CS taste–US associations. CS taste–CS odor associations are formed because these two CSs temporally overlap. CS taste–US, but not CS odor–US, associations are formed because we assume that the CS taste trace, but not the CS odor trace, is long enough to overlap with the US.
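The chaining account can be sketched with toy numbers. We assume, for illustration only, that an association forms at an arbitrary strength of 0.8 whenever two traces overlap in time; the trace windows are hypothetical. The odor's aversion is then its direct odor–US association plus the chained odor–taste × taste–US term.

```python
def overlaps(a, b):
    """True if the half-open time windows a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

# Hypothetical trace windows in t.u.: the odor trace is short, the taste trace long
ODOR_TRACE = (0, 10)
TASTE_TRACE = (0, 30)
US_WINDOW = (20, 25)

def learned_strength(window_1, window_2, asymptote=0.8):
    """Toy rule: an association reaches `asymptote` if the two windows overlap, else stays 0."""
    return asymptote if overlaps(window_1, window_2) else 0.0

v_odor_taste = learned_strength(ODOR_TRACE, TASTE_TRACE)  # formed: traces overlap
v_taste_us = learned_strength(TASTE_TRACE, US_WINDOW)     # formed: taste trace reaches the US
v_odor_us = learned_strength(ODOR_TRACE, US_WINDOW)       # not formed: odor trace ends too early

def odor_aversion(taste_present):
    """Direct odor-US association plus the chained odor-taste-US term (if taste was present)."""
    chained = v_odor_taste * v_taste_us if taste_present else 0.0
    return v_odor_us + chained
```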
Temporal primacy overrides prior training (7.11)
CS/US preexposure effects (section 8 in Table 1)
Brief preexposure to the context facilitates contextual conditioning (8.4)
Learned irrelevance (8.6)
De la Casa and Lubow (2002, Experiment 1) reported that a delay placed after conditioning in a conditioned taste aversion experiment resulted in an increased (super-) LI. The model can reproduce the results, r(2) = .99, p < .05 (see online Supplemental Materials, Fig. 47), because attention to the water increases and the water–US association decreases during the postconditioning delay. During testing, the water–US association rapidly becomes inhibitory, decreasing the strength of the CR and increasing LI. Therefore, super-LI is due not to a further decrease in attention to the flavor during the delay (actually, attention to the flavor increases), but to an increase in attention to the water with which the flavor is delivered, which, during testing, becomes a predictor of the absence of malaise, thereby increasing its hedonic value.
Recovery (section 10 in Table 1)
Recovery from latent inhibition (10.1)
Recovery from overshadowing (10.2)
Kaufman and Bolles (1981; Matzel, Schachtman, & Miller, 1985), but not Holland (1999), found that extinction of the overshadowing A results in increased responding to the overshadowed B. The model describes the results, r(1) = .99, p < .05 (Fig. 13, middle panels). In terms of the model (see also Schmajuk & Larrauri, 2006), recovery from overshadowing results from the increased attention to B, zB, during the extinction of A. Notice that whereas our simulations for recovery from overshadowing used relatively few (10) A− trials, simulations for mediated extinction (see 11.6) required relatively many (40) A− trials.
Recovery from forward blocking (10.3)
Blaisdell, Gunther, and Miller (1999), but not Holland (1999), reported that extinction of the blocker A may result in increased responding to the blocked B. In agreement with Blaisdell et al.’s (1999) results, the model describes recovery from forward blocking, r(2) = .99, p < .05 (Fig. 13, lower panels). The explanation is similar to the one offered in 10.2.
Recovery from backward blocking (see 7.7) (10.4)
Pineno, Urushihara, and Miller (2005) reported that a delay following backward blocking results in increased responding to the blocked CS. Although the model describes backward blocking, it does not describe the recovery results.
Spontaneous recovery (10.6)
Higher order conditioning (section 11 in Table 1)
Second-order conditioning and conditioned inhibition versus second-order conditioning (11.2, 11.3)
Temporal properties (section 12 in Table 1)
Trial and intertrial durations (12.3)
Scalar invariance in response timing (12.6)
We showed that the SLGK model, which combines attentional, associative, timing, and “flexible” configural mechanisms, is able to explain a large number of the basic properties of classical conditioning. The model incorporates the original SLG model (Schmajuk et al., 1996; Schmajuk & Larrauri, 2006) and a set of configural (hidden) units (Schmajuk & DiCarlo, 1992; Schmajuk et al., 1998). The model is unique in that attention to these configural units is gradually increased when the system cannot learn a nonlinear problem. This flexible configural mechanism (which implements the variable processing strategy of Melchers et al., 2004) allows the model to reduce or eliminate interference between the simple attentional-associative and configural mechanisms used in the original models.
Following Wills and Pothos's (2012) approach, the present study shows that the "well defined" SLGK model, which considers all input–output combinations appearing in Table 1, achieves a large number of irreversible successes in accounting for classical conditioning data. As is shown in Table 1, 82 out of 87 cases showed either significant correlations or similar ratios between experimental and control group responding in the simulations and the data. In Bunge's (1967) terms, the model is accurate in 94 % of the cases (A = .94).
These 82 successes are "irreversible" because, in addition to using fixed simulation values and a number of simulated trials proportional to the number of experimental trials, the model accounts for all the cases using fixed model parameters (Wills & Pothos, 2012).
Using fixed model parameters, we described experimental results that are “experimental parameter” dependent by varying our simulation values and the number of simulated trials to capture those changes. For instance, in three cases (5.4, 8.11, and 10.7), we used a CX with salience 0.3, instead of 0.1, because the experimental contextual cues had been enhanced with the use of odors to obtain the reported effect. In other cases, the CS salience was reduced because the CS was a nonsalient light paired with a salient tone in a noncounterbalanced fashion (7.9) or because the number of color dots used during elemental trials was decreased during compound trials (4.5, 4.6). Finally, in another case (8.4), we used relatively few trials of CX preexposure to show freezing facilitation, instead of latent inhibition. In sum, the fact that some experimental results are “experimental parameter” dependent is well captured by the model without changing any of its parameters. Except when specifically dependent on the intensity or number of the experimental independent variables (e.g., CX salience, CS salience, or trial numbers), the results are extremely robust within a large range of simulation values and trial numbers. Furthermore, we found that simulated results can approximate the experimental data even more closely by arbitrarily adjusting the context salience or the ITI. These adjustments compensate for the large variability in the experimental parameters.
Finally, following Wills and Pothos (2012), the SLGK model also attains penetrable success because the basic mechanisms in the model—short-term memory, attention, associations, timing, and flexible configurations—are comprehensible psychological terms and the effort required to apply the model is minimized by the program posted on our Web site.
Model parameters and the brain
As was mentioned, the SLGK model is a combination of different subsystems (attentional, associational, timing, and configural), each with its own rules and parameters, that act in a coordinated way. Each mechanism is specified by a number of parameters that capture its properties, from the build-up and decay of short-term memory and novelty traces, the chaining of predictions, and the rates at which associations increase and decrease to the sigmoid functions controlling the strength of the US-specific CR, the CS-specific CR, and the OR. The total number of parameters is 18. Although it would be possible to eliminate at least some parameters in the model (for instance, K11 and n = 2 in Equation A16 in Appendix A in the online Supplemental Materials) and still obtain excellent descriptions of the data, we retain them because Equation A16 describes a sigmoidal trial-to-trial acquisition curve, a basic requisite for a good model of conditioning.
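The role of a constant such as K11 together with n = 2 can be illustrated under an assumption we make here: that Equation A16 has a Hill-type form f(V) = V^n / (V^n + K^n). Applied to a negatively accelerated associative strength, n = 2 yields an S-shaped acquisition curve whose largest trial-to-trial gain comes after the first trial, whereas n = 1 does not. The learning rate and K below are arbitrary.

```python
import numpy as np

def hill(v, k=0.5, n=2):
    """Hypothetical Hill-type output function f(V) = V^n / (V^n + K^n)."""
    return v ** n / (v ** n + k ** n)

trials = np.arange(41)
v = 1.0 - (1.0 - 0.1) ** trials  # negatively accelerated associative strength
cr_n2 = hill(v, n=2)
cr_n1 = hill(v, n=1)
# Trial index of the largest trial-to-trial increase in the CR
peak_gain_trial_n2 = int(np.argmax(np.diff(cr_n2)))
peak_gain_trial_n1 = int(np.argmax(np.diff(cr_n1)))
```

With n = 1 the largest gain occurs on the very first trial (a purely decelerating curve), so the exponent is what produces the sigmoidal shape.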
Significantly, because the above-mentioned subsystems seem to exist in the brain, the model can be applied to both behavioral and neurophysiological dimensions. For example, the CS short-term memory trace (τCS; see Equation A1 in the Appendix in the online Supplemental Materials) approximates the growth and decay in neural activity that accompanies a CS presentation (e.g., Brozoski, Bauer, & Caspary, 2002). The model node that is activated both by the perceived CS input and by the weaker feedback of the CS prediction (BCS, an imagined CS) parallels a brain area (the fusiform face area) that is activated both by visual perception and by the relatively weaker imagery (O’Craven & Kanwisher, 2000). The attentionally modulated short-term memory (XCS) seems to have a physiological correlate in the neural activity of the dorsolateral frontal cortex (Dunsmoor & Schmajuk, 2009). CS-activated CS–US associations might be correlated with neural activity in the amygdala (Dunsmoor & Schmajuk, 2009) or cerebellum (Raymond, Lisberger, & Mauk, 1996). CS–CS and CS–CN associations might be stored in the temporal lobe (Daum, Channon, Polkey, & Gray, 1991; Shimamura & Squire, 1984). Novelty′ might be represented by the dopaminergic activity of the ventral tegmental area (Legault & Wise, 2001; Schmajuk, 2009); timing of the nictitating membrane CR might be implemented in the cerebellum (Raymond et al., 1996); and the cholinergic system might be involved in blocking (Baxter, Gallagher, & Holland, 1999). Therefore, a simpler model would fail to explain the redundant involvement of different brain areas during classical conditioning, and would thereby be less accurate at describing the above experimental data.
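The growth and decay of the CS short-term memory trace can be pictured with a first-order leaky integrator. The Python sketch below assumes the discrete update τ ← τ + K1(CS − τ) at each time step; this form, the function name `stm_trace`, and the rate K1 = 0.2 are illustrative assumptions, not a transcription of Equation A1.

```python
def stm_trace(cs_input, k1=0.2):
    """Hypothetical leaky-integrator trace: tau moves a fraction k1 toward the input each step."""
    tau = 0.0
    trace = []
    for cs in cs_input:
        tau += k1 * (cs - tau)  # build-up while CS = 1, decay while CS = 0
        trace.append(tau)
    return trace


# CS on for 10 time steps, then off for 10 time steps.
cs = [1.0] * 10 + [0.0] * 10
trace = stm_trace(cs)
print([round(x, 3) for x in trace])
```

The trace rises gradually while the CS is present and decays gradually after its offset, which is the qualitative profile the text compares with CS-evoked neural activity.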
Importantly, the number of parameters is more than offset by the number of experimental results that the model correctly describes (82) and by the number of dimensions (behavioral, temporal, and neurophysiological) to which the model can be applied. As was mentioned, the number of free parameters is not important as long as they are fixed and applied globally to large and representative data sets in different domains. The resulting efficiency (Bunge, 1967), ρ = 0.93, together with the model’s accuracy, A = 94 %, makes the SLGK model a very attractive solution to Pavlov’s puzzle.
How to predict novel results using the model
In order to generate novel predictions with the SLGK model, the standard values for the CS, US, and CX salience and durations should be used. On the basis of the correlations mentioned in the Method section, the number of simulated trials should be about half the number of experimental trials. Alternatively, the number of trials used in the predictions could be close to the number used in the simulations of an experimental design similar to the one whose results are to be predicted.
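The rule of thumb for choosing the number of simulated trials can be written as a small helper. The name `suggested_simulated_trials` and the choice to round halves up are our assumptions; the text specifies only “about half.”

```python
def suggested_simulated_trials(experimental_trials: int) -> int:
    """Hypothetical helper: about half the experimental trials, rounded up."""
    if experimental_trials < 1:
        raise ValueError("experimental_trials must be a positive integer")
    return (experimental_trials + 1) // 2


for n in (1, 7, 40):
    print(n, "->", suggested_simulated_trials(n))
# 1 -> 1
# 7 -> 4
# 40 -> 20
```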
Experimental results still outside the power of the SLGK model and how to address them
The SLGK model fails to describe 5 results out of 87: (1) concurrent excitation and inhibition with few trials (1.5; McNish, Betts, Brandon, & Wagner, 1997); (2) negative patterning being easier than a biconditional discrimination (4.8; Harris, Livesey, Gharaei, & Westbrook, 2008), but not biconditional discrimination itself; (3) recovery from backward blocking (10.4; Pineno et al., 2005), but not backward blocking itself; (4) the Espinet effect (11.4; Espinet, González, & Balleine, 2004); and (5) mediated acquisition (11.5; Holland & Sherwood, 2008). If we consider that only concurrent excitation and inhibition with few trials (1.5) and the Espinet effect (11.4) are robust results, the number of serious failures is reduced to two.
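As an arithmetic check, 5 failures out of 87 results leave 82 correctly described results, which is where the accuracy value A = 94 % comes from. A minimal Python sketch:

```python
total_results = 87
failures = 5
correct = total_results - failures  # 82 results correctly described
accuracy = correct / total_results
print(f"A = {accuracy:.1%}")  # A = 94.3%
```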
Computer simulations indicate that the model can describe concurrent excitation and inhibition with few trials (1.5) and negative patterning being easier than a biconditional discrimination (4.8) when, in contrast to the flexible configural approach adopted here, the configural units are active from an early stage of training. Also, the model could address the Espinet effect (11.4) and mediated acquisition (11.5) if the prediction of λ were added to the teaching signal λ in Equation 6.
In addition, the relatively weak simulated CRs reported in some cases, such as the effect of CX preexposure (8.4), backward blocking (7.7), and second-order conditioning (11.2), can easily be strengthened by adopting different CR sigmoid functions (with K11 ≈ 0.01 instead of 0.15 in Equation A11 in the online Supplemental Materials) for freezing behavior and for the time to complete a number of licks. Finally, the model could be extended to describe the elimination of the deleterious effect of US devaluation with extended training (Holland, 2004) by (1) combining appetitive US–aversive US associations with mutual inhibition between appetitive and aversive USs, and (2) incorporating CS–CR associations. Incorporation of CS–CR associations would explain why second-order conditioning sometimes survives the extinction of the A–US association (Rizley & Rescorla, 1972).
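The effect of lowering K11 on a weak association can be illustrated with a Hill-type sigmoid. Only the two K11 values come from the text; the form x^n/(x^n + K^n) with n = 2, the function name `cr_sigmoid`, and the associative strength 0.05 are assumptions for illustration.

```python
def cr_sigmoid(x, k, n=2):
    """Hypothetical Hill-type CR output: x**n / (x**n + k**n), zero for x <= 0."""
    if x <= 0:
        return 0.0
    xn = x ** n
    return xn / (xn + k ** n)


weak_association = 0.05  # assumed weak CS-US strength
for k in (0.15, 0.01):
    print(f"K11 = {k}: CR = {cr_sigmoid(weak_association, k):.2f}")
# K11 = 0.15: CR = 0.10
# K11 = 0.01: CR = 0.96
```

A smaller K shifts the midpoint of the sigmoid toward zero, so an association that yields little output with K11 = 0.15 yields a substantial CR with K11 = 0.01.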
The SLGK model, which combines attentional, associational, timing, and “flexible” configural mechanisms, is able to explain a large number of the basic properties of classical conditioning. The model provides an excellent fit to 94 % of the experimental data by using fixed model parameters and simulation values (CS, US, CX salience, and duration) and numbers of trials roughly proportional to the numbers of trials in the experiments. Although the approach permits the use of the same model across different preparations, our results suggest that a model with specific parameters for different species and preparations (such as different sigmoid functions) would provide even better descriptions and explanations of the data. Ideally, the improved model should also use simulation variables precisely scaled to the salience and duration of the experimental variables, as well as to the number of trials in the experiments.
The authors thank Gonzalo de la Casa, Edgar Vogel, Jose Larrauri, and Andy Wills for their comments on an early version of the manuscript. Thanks also to Avani Vora and Aadya Deshpande for their help in running simulations and preparing figures.