Learning & Behavior, Volume 40, Issue 3, pp 269–291

Solving Pavlov's puzzle: Attentional, associative, and flexible configural mechanisms in classical conditioning

Article

Abstract

This article introduces a new “real-time” model of classical conditioning that combines attentional, associative, and “flexible” configural mechanisms. In the model, attention to both conditioned (CS) and configural (CN) stimuli is modulated by the novelty detected in the environment. Novelty increases with the unpredicted presence or absence of any CS, unconditioned stimulus (US), or context. Attention regulates the magnitude of the associations CSs and CNs form with other CSs and the US. We incorporate a flexible configural mechanism in which attention to the CN stimuli increases only after the model has unsuccessfully attempted to learn input–output combinations with CS–US associations. That is, CSs become associated with the US and other CSs on fewer trials than do CNs. Because the CSs activate the CNs through unmodifiable connections, a CS can become directly and indirectly (through the CN) associated with the US or other CSs. In order to simulate timing processes, we simply assume that a CS is formed by a temporal spectrum of short-duration CSs that are activated by the nominal CS trace. The model accurately describes 94 % of the basic properties of classical conditioning, using fixed model parameters and simulation values in all simulations.

Keywords

Associative learning, Attention, Configural learning

Introduction

Although apparently simple, classical conditioning is a collection of diverse phenomena that cannot be reduced to one simple equation. In contrast, we have suggested that conditioning can be described only by a complex combination of different, sometimes redundant mechanisms, each one with its own rules and parameters (Schmajuk, 2010).

In this article, we attempt to solve Pavlov’s conditioning puzzle with a new model (the SLGK model) that incorporates attentional, associative, and flexible configural pieces. The SLGK model integrates three previous neural network models, namely, Schmajuk, Lam, and Gray's (SLG; Schmajuk, Lam, & Gray, 1996; see also Larrauri & Schmajuk, 2008; Schmajuk & Larrauri, 2006) attentional-associative model, Schmajuk and DiCarlo's (1992; Schmajuk, Lamoureux, & Holland, 1998) configural model, and Grossberg and Schmajuk's (1989; Buhusi & Schmajuk, 1999) timing model. The resulting model successfully accounts for a large number of the basic properties of classical conditioning, which make up a list compiled by all the participants in this special issue of Learning & Behavior (Table 1).
Table 1

Summary of the simulated results (for details of the simulations, see the online Supplemental Materials)

Phenomenon | Correlations | Figures

1. Acquisition
1.1 Acquisition | r(8) = .87, p < .05 | Suppl. Fig. 2
1.2 Partial reinforcement | r(8) = .87, p < .05 | Suppl. Fig. 2
1.3 US- and CS-specific CR | r(2) = .97, p < .05 | Suppl. Fig. 3
1.4 Conditioned diminution of the UR | r(4) = .80, p < .05 | Suppl. Fig. 4
1.5 Concurrent excitation and inhibition with few trials | NO |
1.6 Conditioning proceeds faster with imperfect predictors | r(16) = .80, p < .05 (Stages 1 & 2); r(6) = .87, p < .05 (Stage 3) | Fig. 2 - Suppl. Fig. 5

2. Extinction
2.1 Extinction | r(1) = .86, p < .05 | Suppl. Fig. 54
2.2 Partial reinforcement extinction effect (PREE) | r(8) = .73, p < .05 | Fig. 3 - Suppl. Fig. 6
2.3 Renewal in a context of extinction | See Renewal in 10.7 | Suppl. Fig. 55

3. Generalization
3.1 Generalization | r(3) = .98, p < .05 | Suppl. Fig. 7
3.2 External inhibition | r(1) = .998, p < .05 | Suppl. Fig. 8
3.3 Asymmetrical generalization decrement | r(7) = .86, p < .05 | Fig. 4 - Suppl. Fig. 9

4. Discriminations
4.1 Discrimination between CS+ and CS− | r(18) = .92, p < .05 | Suppl. Fig. 10
4.2 Positive patterning | r(70) = .90, p < .05 | Suppl. Fig. 11
4.3 Negative patterning | r(70) = .90, p < .05 | Fig. 5 - Suppl. Fig. 12
4.4 Positive patterning is easier than negative patterning | r(48) = .94, p < .05 | Suppl. Fig. 13
4.5 Adding a common cue to NP decreases discrimination | r(19) = .93, p < .05 (Group Simple); r(19) = .88, p < .05 (Group Common) | Fig. 5 - Suppl. Fig. 14
4.6 Patterning discriminations with 3 CS is learnable | r(28) = .86, p < .05 | Fig. 5 - Suppl. Fig. 15
4.7 Biconditional discrimination | r(28) = .75, p < .05 | Fig. 6 - Suppl. Fig. 16
4.8 Biconditional discrimination is harder than NP | NO |
4.9 Biconditional discrimination is harder than component discrimination | r(28) = .75, p < .05 | Fig. 6 - Suppl. Fig. 16
4.10 Following A+/B+ and X−/Y− training, AY+/AX− discrimination is faster than AY+/BY− discrimination | r(34) = .97, p < .05 (Exp. 1); r(16) = .91, p < .05 (Exp. 3) | Fig. 7 - Suppl. Figs. 17 & 18
4.11 Following AW+ and BX+ and nonreinforced CW− and DX− training, AW+/AX− discrimination is slower than AW+/BX− discrimination | r(18) = .98, p < .05 | Suppl. Fig. 19
4.12 Simultaneous feature-positive discrimination | r(34) = .63, p < .05 | Fig. 8 - Suppl. Fig. 20
4.13 Serial feature-positive discrimination | r(34) = .63, p < .05 | Fig. 8 - Suppl. Fig. 20
4.14 Simultaneous feature-negative discrimination | r(28) = .67, p < .05 | Fig. 8 - Suppl. Fig. 21
4.15 Serial feature-negative discrimination | r(28) = .67, p < .05 | Fig. 8 - Suppl. Fig. 21
4.16 Feature-positive discrimination is easier than feature-negative | See 4.12 and 4.14 | Fig. 8 - Suppl. Figs. 20 & 21
4.17 Common feature in both a serial feature-negative and serial feature-positive discrimination | r(14) = .84, p < .05 | Suppl. Fig. 22

5. Inhibitory conditioning
5.1 Conditioned inhibition | r(1) = .998, p < .05 | Suppl. Fig. 8
5.2 Contingency | r(1) = .99, p < .05 | Suppl. Fig. 23
5.3 Extinction of conditioned inhibition. Inhibitory conditioning is extinguished by CS1–CS2–US presentations | r(2) = .90, p < .5 | Suppl. Fig. 24
5.4 Following conditioned inhibition, reinforced and nonreinforced presentations of excitor CS1 might modify the power of CS2 in summation and retardation tests | r(2) = .92, p < .05 (CX = 0.3) ODORS | Suppl. Fig. 25
5.5 Following conditioned inhibition, nonreinforced presentations of inhibitor CS2 do not modify the power of CS2 in summation and retardation tests | r(2) = .99, p < .5 (summation), r(16) = .87, p < .05 (Exp. 3) | Suppl. Fig. 26
5.6 Differential conditioning | r(2) = .94, p < .05 (summation test); rD: 4.80, rS: 14 (retardation test) | Suppl. Fig. 27

6. Combination of separately trained CSs
6.1 When two CSs independently trained with the same US are tested in combination, there is more likely to be a summative CR when the CSs are in different than in the same modality | r(28) = .63, p < .05 | Suppl. Fig. 28
6.2 CSs that are trained with aversive USs may acquire broad tendency to potentiate defensive CRs and suppress appetitive CRs | Behavioral Interaction |
6.3 CSs that are trained with appetitive USs may acquire broad tendency to potentiate appetitive CRs and suppress defensive CRs | Behavioral Interaction |

7. Stimulus competition/potentiation in training
7.1 Relative validity | rD = 3.80, rS = 4.2 | Suppl. Fig. 29
7.2 Blocking | r(1) = .99, p < .05 | Suppl. Fig. 30
7.3 Unblocking by increasing the US | r(2) = .90, p < .05 | Fig. 9 - Suppl. Fig. 31
7.4 Unblocking by decreasing the US | r(2) = .90, p < .05 | Fig. 9 - Suppl. Fig. 31
7.5 Overshadowing | r(1) = .99, p < .05 | Suppl. Fig. 30
7.6 Potentiation | rD: 2.26; rS: 2.06 | Suppl. Fig. 32
7.7 Backward blocking | rD: 1.40; rS: 4.90 | Suppl. Fig. 33
7.8 Overexpectation | r(3) = .93, p < .05 | Suppl. Fig. 34
7.9 Superconditioning | r(1) = .95, p < .05 | Suppl. Fig. 35
7.10 Unequal learning about CSs in compound | rD: 1.82; rS: 1.70 | Suppl. Fig. 36
7.11 Temporal primacy overrides prior training | r(18) = .68, p < .05 | Fig. 10 - Suppl. Fig. 37

8. CS/US preexposure effects
8.1 Latent inhibition | r(2) = .91, p < .05 | Suppl. Fig. 38
8.2 A change of context disrupts latent inhibition | r(2) = .91, p < .05 | Suppl. Fig. 38
8.3 Presentation of a different CS before conditioning disrupts latent inhibition | r(1) = .99, p < .05 | Suppl. Fig. 39
8.4 Preexposure to a context facilitates fear conditioning | r(2) = .96, p < .05 | Fig. 11 - Suppl. Fig. 40
8.5 US-preexposure effect | rD: 2, rS: 2 | Suppl. Fig. 41
8.6 Learned irrelevance | r(10) = .059, p < .05 | Fig. 12 - Suppl. Fig. 42
8.7 Perceptual learning | r(1) = .97, p < .05 | Suppl. Fig. 43
8.8 Hall–Pearce effect | r(16) = .79, p < .05 | Suppl. Fig. 44
8.9 Presentation of the US before a CS–US pairing impairs conditioning | r(6) = .87, p < .05 | Suppl. Fig. 45
8.10 Presentation of the CS before a CS–US pairing impairs conditioning | r(4) = .85, p < .05 | Suppl. Fig. 46
8.11 Super latent inhibition | r(2) = .99, p < .05 (CX = 0.3) ODORS | Suppl. Fig. 47

9. Transfer
9.1 Extinction (see 2.1) | r(1) = .86, p < .05 | Suppl. Fig. 54
9.2 Reacquisition | r(16) = .72, p < .05 (Exp. 1A); r(24) = .80, p < .05 (Exp. 1B) | Suppl. Fig. 48
9.3 Counterconditioning. 5.3. Inhibitory conditioning is extinguished by CS1–CS2–US presentations | Behavioral Competition |
9.4 Transfer along a continuum | r(14) = .64, p < .05 | Suppl. Fig. 49

10. Recovery
10.1 Recovery from latent inhibition | r(2) = .91, p < .05 | Fig. 13 - Suppl. Fig. 50
10.2 Recovery from overshadowing | r(1) = .99, p < .05 | Fig. 13 - Suppl. Fig. 51
10.3 Recovery from forward blocking | r(2) = .99, p < .05 | Fig. 13 - Suppl. Fig. 52
10.4 Recovery from backward blocking | NO |
10.5 External disinhibition | r(1) = .96, p < .05 | Suppl. Fig. 53
10.6 Spontaneous recovery | r(18) = .93, p < .05 | Fig. 14 - Suppl. Fig. 54
10.7 Renewal | rD: 2.11; rS: 2.55 (CX = 0.3) ODORS | Suppl. Fig. 55
10.8 Reinstatement | r(4) = .96, p < .05 | Suppl. Fig. 56

11. Higher order conditioning
11.1 Sensory preconditioning | rD: 19.5, rS: 42.20 | Suppl. Fig. 57
11.2 Second-order conditioning | r(4) = .99, p < .05 | Fig. 15 - Suppl. Fig. 58
11.3 Second-order conditioning vs. conditioned inhibition | r(4) = .99, p < .05 | Fig. 15 - Suppl. Fig. 58
11.4 Inhibitory sensory preconditioning is possible | NO |
11.5 Mediated acquisition | NO |
11.6 Mediated extinction | rD: 1.26, rS: 2.35 | Suppl. Fig. 59

12. Temporal properties
12.1 Interstimulus interval (ISI) effects | r(10) = .85, p < .05 | Suppl. Fig. 60
12.2 Intertrial interval effects | r(4) = .88, p < .05 | Suppl. Fig. 61
12.3 Trial spacing effects | r(15) = .72, p < .05 | Fig. 16 - Suppl. Fig. 62
12.4 Timing of the CR | r(8) = .64, p < .05 | Fig. 17 - Suppl. Fig. 64
12.5 Timed responding from the onset of conditioning | r(8) = .95, p < .05 | Suppl. Fig. 63
12.6 Scalar invariance in response timing | r(8) = .64, p < .05 | Fig. 17 - Suppl. Fig. 64
12.7 Temporal specificity of blocking | r(1) = .99, p < .05 | Suppl. Fig. 65
12.8 Temporal specificity of occasion setting | r(8) = .83, p < .05 | Suppl. Fig. 66
12.9 Inhibition of delay occurs with long but not with short ISIs | r(26) = .86, p < .05 (Short Group); r(26) = .81, p < .05 (Long Group) | Suppl. Fig. 67

The SLGK model

In the SLGK model (see Fig. 1a and b), a conditioned stimulus (CS) can become directly and indirectly (through configural stimuli [CNs]) associated with the unconditioned stimulus (US) or other CSs, and these associations can be excitatory or inhibitory. Attention to a CS increases (1) when that CS is not well predicted by itself, other CSs, or the context (CX) or (2) when that CS does not predict well (either the presence or the absence of) other CSs, the CX, or the US. Attention to a CN increases when the presence or absence of the US is not well predicted by the CSs or the CX. Attention to a CS increases and decreases much more rapidly than attention to a CN; that is, the model engages its configural machinery later than its elementary associations. The last assumption, which we refer to as a flexible configuration approach, is similar to Melchers, Lachnit, and Shanks’s (2004) “variable processing strategy.” In our model, the configural system is engaged after the model unsuccessfully attempts to learn a nonlinear problem (e.g., negative patterning) with only its linear associative mechanisms. Finally, timing is explained by assuming that a CS is formed by a temporal spectrum of CSs of limited duration activated by the CS trace. Below, we describe the different subsystems included in the model.
Fig. 1

a Schmajuk–Lam–Gray–Kutlu model: CS–US and CN–US associations. A, X, simple conditioned stimuli; N, group of configural stimuli; US, unconditioned stimulus; τA and τX, trace of A and X; BA and BX, predicted A and X; zA and zX, attention to A and X; XA, XX, and XCN, internal representations of A, X, and CN; VA,US, VX,US, and VCN-US, associations of A, X, and CN with the US; BUS, predicted magnitude of the US; CR, conditioned response; Novelty′, detected novelty; OR, orienting response. Triangles: variable attentions and associations. BA, BX, and BUS are always positive. b Schmajuk–Lam–Gray–Kutlu model: CS–CS and CN–CS associations. A, X, simple conditioned stimuli; N, group of configural stimuli; τA and τX, trace of A and X; BA and BX, predicted A and X; zA and zX, attention to A and X; XA, XX, and XCN, internal representations of A, X, and CN; VA,A, VX,A, and VCN-A, associations of A, X, and CN with A; BA, predicted magnitude of CSA; Novelty′, detected novelty. Triangles: variable attentions and associations. BA, BX, and BUS are always positive.

Attentional mechanism

Attention, defined as the modulation of the strength of the internal representation of a CS by Novelty′ (see below), is important in paradigms such as latent inhibition and extinction. According to the model (see Fig. 1a and b), changes in attention zCS (−1 < zCS < 1) to an active or predicted CS are proportional to the salience of the CS and are given by
$$ \begin{array}{l} \Delta {z_{\text{CS}}} > 0,{\text{ when Novelty}}\prime > {\text{Threshold}}_{\text{CS}} \\ \Delta {z_{\text{CS}}} < 0,{\text{ when Novelty}}\prime < {\text{Threshold}}_{\text{CS}}, \end{array} $$
(1)
where ThresholdCS = K6/K5 (see the online Supplemental Materials) and Novelty′ is computed as
$$ {\text{Novelty}}\prime \sim \sum\nolimits_{\text{S}} {\left| {{\lambda_{\text{S}}} - {B_{\text{S}}}} \right|}, $$
(2)
where λS is the actual value and BS is the predicted value of the US, a CS, or the CX. Novelty′ increases when the CSs or the CX are poor predictors of the US, other CSs, or the CX (i.e., when the US, other CSs, or the CX are underpredicted or overpredicted by the CSs and the CX). In consequence, attention to a CS in Equation 1 increases when any CS (1) is a poor predictor of the US (as in the Pearce & Hall [1980] model), other CSs, or the CX; (2) is poorly predicted by other CSs or the CX (as in Wagner's [1981] SOP model); or (3) is presented together with other CSs that are poor predictors of either the US or other CSs.

The attention-modulated representation of the CS, XCS, is proportional to attention zCS.
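Equations 1 and 2 can be illustrated with a minimal trial-level sketch in Python. Only the form of Novelty′ (a sum of absolute prediction errors), the sign of the attention change, and its dependence on CS salience come from the text; the `rate` parameter, the linear step, and all numeric values are illustrative assumptions, and the actual real-time equations are those in the online Supplemental Materials.

```python
def novelty(actual, predicted):
    """Novelty' ~ sum over stimuli S of |lambda_S - B_S| (Equation 2)."""
    return sum(abs(actual[s] - predicted[s]) for s in actual)


def update_cs_attention(z, nov, threshold, salience, rate=0.1):
    """Raise attention when Novelty' exceeds the threshold, lower it otherwise.

    `rate` and the linear step are assumptions made for illustration;
    the result is kept within [-1, 1].
    """
    step = rate * salience * (nov - threshold)
    return max(-1.0, min(1.0, z + step))


# Hypothetical trial: the US occurs (lambda = 1) but is only partly predicted.
actual = {"US": 1.0, "A": 1.0, "CX": 0.1}
predicted = {"US": 0.2, "A": 0.9, "CX": 0.1}

nov = novelty(actual, predicted)  # 0.8 + 0.1 + 0.0
z_new = update_cs_attention(z=0.0, nov=nov, threshold=0.5, salience=1.0)
```

With these made-up values Novelty′ exceeds the threshold, so attention to the CS rises; a well-predicted trial (Novelty′ below threshold) would lower it instead.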

Configural mechanism

Configurations, the combination of different CSs into a new stimulus, are important for solving nonlinear problems such as negative patterning and occasion setting. As is shown in Fig. 1a and b, configural units (CNs) are activated by all CSs through random, nonmodifiable connections. In contrast to the unique element model (see Saavedra, 1975), which assumes that a configural cue is unique to a specific compound combination, each CN codes for multiple combinations of inputs (e.g., AB, AC, BD, . . . ABC), thereby avoiding a combinatorial explosion. Also, because the connections are fixed, the system is less prone to catastrophic interference (McCloskey & Cohen, 1989).

Attention zCN (−1 < zCN < 1) to an active CN changes in proportion to the salience of the CN, according to
$$ \begin{array}{l} \Delta {z_{\text{CN}}} > 0,{\text{ when Novelty}}{\prime_{\text{US}}} > {\text{Threshold}}_{\text{CN}} \\ \Delta {z_{\text{CN}}} = 0,{\text{ when Novelty}}{\prime_{\text{US}}} < {\text{Threshold}}_{\text{CN}}, \end{array} $$
(3)
where ThresholdCN >> ThresholdCS and Novelty′US is computed as
$$ {\text{Novelty}}{\prime_{\text{US}}} \sim \left| {{\lambda_{\text{US}}} - {B_{\text{US}}}} \right|. $$
(4)

In consequence, attention zCN increases when the model fails to correctly predict the presence or absence of the US after some time. Because it is presently unclear whether zCN decreases over time and organisms revert to elemental processing and linear solutions, we provisionally assume that zCN does not decrease.

The attention-modulated representation of the CN, XCN, is proportional to zCN.
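The one-way character of configural attention (Equations 3 and 4) can be sketched as follows. This is a simplified, trial-level illustration: the function name, the `rate` parameter, and the step size are assumptions, while the use of US-specific novelty, the threshold test, and the absence of any decrease follow the text.

```python
def update_cn_attention(z_cn, lambda_us, b_us, threshold_cn,
                        salience=1.0, rate=0.05):
    """Raise attention to a configural unit only when US novelty is high.

    Equation 3 fixes the sign of the change (positive or zero);
    the size of the step is an illustrative assumption.
    """
    novelty_us = abs(lambda_us - b_us)  # Equation 4
    if novelty_us > threshold_cn:
        z_cn += rate * salience * (novelty_us - threshold_cn)
    return min(1.0, z_cn)  # attention to a CN never decreases


# Hypothetical values: the US is badly mispredicted, so z_CN grows.
z_after_failure = update_cn_attention(0.0, lambda_us=1.0, b_us=0.1,
                                      threshold_cn=0.8)
# The US is well predicted, so z_CN stays where it was.
z_after_success = update_cn_attention(0.3, lambda_us=1.0, b_us=0.9,
                                      threshold_cn=0.8)
```

Because the threshold for CNs is much higher than the threshold for CSs, a sketch like this only recruits configural units when elemental associations leave a large US prediction error.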

Associative mechanism

Associations are defined as the effect that the representation of a given CS has on the prediction of itself, another CS, the CX or the US. Changes in the excitatory or inhibitory CS–US association, VCS-US, between XCS and the US (see Fig. 1a), are proportional to
$$ \Delta {V_{{\text{CS}} - {\text{US}}}} \sim {{\text{X}}_{\text{CS}}}\left( {{\lambda_{\text{US}}} - {B_{\text{US}}}} \right)\left| {{1} - {V_{{\text{CS}} - {\text{US}}}}} \right|, $$
(5)
where (λUS − BUS) is the common error term, λUS is the strength of the US, BUS is the prediction of the US by all CSs and CNs active at a given time (Equation 6), and |1 − VCS-US| is the individual error term that limits the contribution of a CS to the prediction of the US (important for describing maximality in blocking). As was suggested by Zimmer-Hart and Rescorla (1974), BUS = 0 when BUS < 0.
Changes in the excitatory or inhibitory CN–US association, VCN-US, are proportional to
$$ \Delta {V_{{\text{CN}} - {\text{US}}}} \sim {{\text{X}}_{\text{CN}}}\left( {{\lambda_{\text{US}}} - {B_{\text{US}}}} \right)\left| {{1} - {V_{{\text{CN}} - {\text{US}}}}} \right|. $$
(5′)
Changes in the excitatory or inhibitory CS–CS, CS–CX, or CX–CS associations, VCS,CS, VCS,CX, and VCX,CS (see Fig. 1b), are proportional to
$$ \Delta {V_{{\text{CS}} - {\text{CS}}}} \sim {{\text{X}}_{\text{CS}}}\left( {{\lambda_{\text{CS}}} - {B_{\text{CS}}}} \right)\left| {{1} - {V_{{\text{CS}} - {\text{CS}}}}} \right|, $$
(5″)
where λCS is the salience of the CS and BCS is the prediction of the CS by all CSs and CNs active at a given time (Equation 6′). Again, BCS = 0 when BCS < 0.
Changes in the excitatory or inhibitory CN–CS, CN–CX, or CX–CN associations, VCN-CS, VCN-CX, and VCX-CN, are proportional to
$$ \Delta {V_{{\text{CN}} - {\text{CS}}}} \sim {{\text{X}}_{\text{CN}}}\left( {{\lambda_{\text{CS}}} - {B_{\text{CS}}}} \right)\left| {{1} - {V_{{\text{CN}} - {\text{CS}}}}} \right|. $$
(5‴)
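The shared form of Equations 5 through 5‴ can be written as one small Python function. The learning `rate` and the example values are illustrative assumptions; the product of the attention-modulated representation, the common error term, and the individual error term is taken from the equations above.

```python
def delta_v(x, lam, b, v, rate=0.1):
    """Change in an association V, following the form of Equation 5.

    x    : attention-modulated representation of the predictor (X_CS or X_CN)
    lam  : actual strength of the predicted event (lambda)
    b    : aggregate prediction of that event (B), set to 0 if negative
    v    : current strength of the association
    rate : illustrative learning rate (an assumption, not a model parameter)
    """
    b = max(0.0, b)
    return rate * x * (lam - b) * abs(1.0 - v)


# Early acquisition: the US is unpredicted, so V grows.
dv_acquisition = delta_v(x=1.0, lam=1.0, b=0.0, v=0.0)
# Blocking: the US is already fully predicted, so nothing is learned.
dv_blocked = delta_v(x=1.0, lam=1.0, b=1.0, v=0.0)
# Extinction: the US is predicted but absent, so V decreases.
dv_extinction = delta_v(x=1.0, lam=0.0, b=0.8, v=0.8)
```

The three calls show how the common error term alone decides the direction of learning, while attention (through `x`) and the individual error term only scale its size.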

Aggregate prediction

As is shown in Fig. 1a, the aggregate prediction of the US by all CSs and CNs with representations active at a given time, BUS, is given by
$${B_{\text{US}}} = \Sigma {{\text{X}}_{\text{CS}}}{V_{{\text{CS}} - {\text{US}}}} + \Sigma {{\text{X}}_{\text{CN}}}{V_{{\text{CN}} - {\text{US}}}}. $$
(6)

Importantly, a CS might simultaneously have an excitatory role (through an excitatory VCS-US association) and an inhibitory role (by activating a CN with an inhibitory VCN-US association).

As is shown in Fig. 1b, the aggregate prediction of any CS by all CSs and CNs with representations active at a given time, BCS, is given by
$$ {B_{\text{CS}}} = \Sigma {{\text{X}}_{\text{CS}}}{V_{{\text{CS}} - {\text{CS}}}} + \Sigma {{\text{X}}_{\text{CN}}}{V_{{\text{CN}} - {\text{CS}}}}. $$
(6′)

Note that attention zCS and XCS (Equations 1 and 3) control the formation of VCS-US and VCS-CS associations during conditioning (Equation 5) and the activation of VCS-US and the CR (Equations 6 and 7).
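As a minimal sketch of Equations 6 and 6′, the aggregate prediction is a sum of representation-weighted associations, set to zero when negative. The dictionary layout and the numeric values below are hypothetical and chosen only to show a CS whose direct association is excitatory while the CN it activates is inhibitory.

```python
def aggregate_prediction(x, v):
    """B = sum_i X_i * V_i over active CS and CN representations (Equation 6)."""
    b = sum(x[name] * v[name] for name in x)
    return max(0.0, b)  # B is set to 0 when it would be negative


# Hypothetical state: CS A is excitatory, the CN it activates is inhibitory.
x = {"A": 0.8, "CN1": 0.5}    # attention-modulated representations
v = {"A": 0.9, "CN1": -0.6}   # associations with the US

b_us = aggregate_prediction(x, v)  # 0.72 - 0.30
```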

Outputs

The US-specific CR in Fig. 1a is a nonlinear function of the prediction of the US, BUS, and decreases as a function of Novelty′. Because we assume that the orienting response (OR) is a sigmoid function of Novelty′, OR = f(Novelty′), the CR can be expressed as
$$ {\text{CR}} = f\left( {{B_{\text{US}}}} \right){ }\left( {{1} - {\text{OR}}} \right). $$
(7)

Because XCS controls BUS, attention controls both learning (Equation 5) and performance (Equation 7).

Similarly, the CS-specific CR is a sigmoid function of the prediction of the US by that particular CS, BCS-US,
$$ {\text{C}}{{\text{R}}_{\text{CS}}} = f\left( {{B_{{\text{CS}} - {\text{US}}}}} \right){ }\left( {{1} - {\text{OR}}} \right). $$
(7′)
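Equation 7 can be sketched in Python under the assumption that f is a logistic function. The text says only that the CR is a nonlinear function of BUS and that the OR is a sigmoid function of Novelty′, so the `gain` and `midpoint` values here are hypothetical.

```python
import math


def sigmoid(u, gain=10.0, midpoint=0.5):
    """Logistic function; gain and midpoint are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-gain * (u - midpoint)))


def conditioned_response(b_us, novelty):
    """CR = f(B_US) * (1 - OR), with OR = f(Novelty') (Equation 7)."""
    orienting = sigmoid(novelty)
    return sigmoid(b_us) * (1.0 - orienting)


# Same US prediction, tested in a familiar and in a novel situation.
cr_familiar = conditioned_response(b_us=1.0, novelty=0.0)
cr_novel = conditioned_response(b_us=1.0, novelty=1.0)
```

The comparison shows the performance effect described above: with the US prediction held constant, higher Novelty′ produces a larger OR and therefore a smaller CR.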

Real-time equations

Even if conceptually simple (the number of equations in the SLG model is comparable to that of other models in the present Special Issue), "real time" computer simulations with the model require the restatement of the principles above in terms of differential equations presented in Appendix A in the online Supplemental Materials. Those differential equations use two or three parameters to express each of the 7 equations above in real time.

Timing

Following Grossberg and Schmajuk (1989) and Buhusi and Schmajuk (1999), we simulate timing by assuming that a CS is formed by a temporal spectrum of fractional CSs of limited duration (5 time units [t.u.]) within a regular CS (20 t.u.), each one activating its own short-term memory and recruiting its own attention and associations. Following Buhusi and Schmajuk, we assume that this temporal spectrum (1) is activated by a short-term memory CS trace of the CS and (2) that its output is modulated by the CR for that CS trace in order to account for the effects of CS duration and intensity.
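As a rough illustration of the temporal spectrum, the sketch below divides a nominal CS into fractional CSs of 5 t.u. each. The assumption that the fractions tile the CS without overlap is made only for this example; the text does not specify the layout, and the actual activation dynamics are given by the differential equations in the online Supplemental Materials.

```python
def temporal_spectrum(cs_onset, cs_duration=20, fraction_duration=5):
    """Return (start, end) windows of fractional CSs within the nominal CS.

    Non-overlapping tiling is an illustrative assumption.
    """
    cs_offset = cs_onset + cs_duration
    return [(t, min(t + fraction_duration, cs_offset))
            for t in range(cs_onset, cs_offset, fraction_duration)]


# A 20-t.u. CS starting at t = 20 yields four 5-t.u. fractional CSs.
windows = temporal_spectrum(cs_onset=20)
```

With a US active between 35 and 40 t.u., as in the simulations reported below, only the last window in this sketch coincides with the US, which gives an intuition for how a spectrum of short CSs can support timed responding.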

Suppression ratios

Suppression ratios (SRs) were calculated using the following equation:
$$ {\text{SR}} = \frac{\beta - {\text{CR}}\left( {\text{CS}} \right)}{\left( {\beta - {\text{CR}}\left( {\text{CX}} \right)} \right) + \left( {\beta - {\text{CR}}\left( {\text{CS}} \right)} \right)}, $$
(8)
where β is the baseline level of responding (e.g., bar pressing) whose value is approximately equal to the maximum CR, β − CR(CS) is the responding during the presentation of the CS, and β − CR(CX) is the responding during the time period of similar duration of the CS preceding the CS presentation. Except when the CX is reinforced (e.g., contingency experiments), β − CR(CX) is generally similar to β.
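A short Python sketch of the suppression ratio, reading Equation 8 as responding during the CS divided by the sum of responding before and during the CS. The function name and the example values are illustrative; `beta` defaults to 1 on the assumption that the maximum CR is 1.

```python
def suppression_ratio(cr_cs, cr_cx, beta=1.0):
    """SR = (beta - CR(CS)) / ((beta - CR(CX)) + (beta - CR(CS)))."""
    during_cs = beta - cr_cs   # responding while the CS is present
    pre_cs = beta - cr_cx      # responding in the period before the CS
    return during_cs / (pre_cs + during_cs)


# No conditioning: responding is unchanged by the CS.
sr_none = suppression_ratio(cr_cs=0.0, cr_cx=0.0)
# Maximal CR: responding is fully suppressed during the CS.
sr_full = suppression_ratio(cr_cs=1.0, cr_cx=0.0)
# Intermediate CR.
sr_partial = suppression_ratio(cr_cs=0.5, cr_cx=0.0)
```

A ratio of 0.5 therefore indicates no suppression and a ratio of 0 complete suppression. The ratio is undefined when both CR(CS) and CR(CX) equal β, a case this sketch does not handle.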

Simulation method

In our simulations, the number of independent variables (or simulation values: CS and US duration and salience, interstimulus interval [ISI], intertrial interval [ITI], and sequence and number of types of trials) is identical to the number of independent variables used in the experiments. In the experiments, the CSs were sounds, lights, flavors, odors, and shapes, in a wide range of intensities (from 60 to 92 dB) and durations (from 0.5 s to 10 min); the USs were food or shocks in a wide range of intensities (from 0.4 to 4.5 mA) and durations (from .05 to 5 s); the ISI had multiple possible durations (from 0.8 s to 25 min), and the ITI had multiple possible durations (from 15 s to 24 h). The number of experimental acquisition trials ranged from 1 in conditioned taste aversion in rats to 1,000 in eyeblink conditioning in rabbits, and the corresponding numbers for extinction were 3 and 560 trials.

In contrast, our simulations used the following fixed values: CS salience, 1; CS duration, 20 t.u.; US strength, 1; US duration, 5 t.u.; ISI, 15 t.u.; CX salience, 0.1; and ITI duration, 200 t.u. In addition, simulations reproduced the sequence of type of trials (e.g., CS paired with the US, CS alone) used in each experiment. The number of simulated trials was linearly related to the number of trials used in the experiments, r(76) = .85, p < .05 (percentage of variance explained, 73 %). When we eliminated the 13 experiments using nictitating membrane or eyeblink conditioning (in which conditioning takes a large number of trials), the correlation coefficient increased considerably, r(63) = .92, p < .05 (percentage of variance explained, 85 %). This improvement reaffirms our observation that, regarding the number of trials, the model approximates some preparations better than others.

Using the above-mentioned set of fixed simulation values, we attempted to quantitatively match the experimental results for each of the experiments on the list shown in Table 1. In some cases (e.g., renewal in the context of extinction), we adjusted (1) the CX salience, to reflect the introduction of olfactory cues that increased the experimental CX salience (5.4, 8.11, 10.7); (2) the CS salience, when a less salient CS (e.g., a light) was paired with a more salient CS (e.g., a tone) in a noncounterbalanced manner (7.9); and (3) the CS salience, when the number of color dots used during elemental trials was decreased during compound trials (4.5, 4.6). Note that these changes are not arbitrary but reflect known properties of the experimental stimuli.

Simulation results were compared with the experimental data by (1) applying Pearson's product–moment correlation coefficient (McCall, 1970), which, unlike the sum of standard errors, is sensitive to the ordinal properties of the data, and (2) comparing the ratios (rD, ratio data; rS, ratio simulations) between groups when the experimental results had only two data points. Note that even if quantitative, these are scale-independent, ordinal measures of the quality of the fit.
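The two fit measures can be sketched in plain Python. The data values are invented for illustration only; the degrees of freedom are taken as n − 2, the usual convention for reporting r(df), and the ratio is simply one group's value divided by the other's, computed separately for data (rD) and simulations (rS).

```python
import math


def pearson_r(data, sim):
    """Return (r, df) for paired experimental and simulated values."""
    n = len(data)
    mean_d = sum(data) / n
    mean_s = sum(sim) / n
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(data, sim))
    var_d = sum((d - mean_d) ** 2 for d in data)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_d * var_s), n - 2


def group_ratio(group_1, group_2):
    """Ratio between two groups, used when only two data points exist."""
    return group_1 / group_2


# Hypothetical responding across five test points (not from any experiment).
data = [0.1, 0.3, 0.6, 0.8, 0.9]
sim = [0.2, 0.35, 0.5, 0.85, 0.95]
r, df = pearson_r(data, sim)

# Rescaling the simulated values leaves r unchanged (scale independence).
r_rescaled, _ = pearson_r(data, [3.0 * s + 2.0 for s in sim])
```

The rescaling check illustrates why the correlation is described as a scale-independent measure: a simulation that reproduces the pattern of the data at a different overall magnitude yields the same r.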

Because the configural units, which are connected to the input CSs through random weights, were not active in many simulations, those results were independent of those random weights. When the nonlinear input–output combinations were not rapidly learned, the configural units became attended and the results depended on those random weights; in those cases, we averaged the simulated results over 10 different sets of random weights.

Model evaluation

Wills and Pothos (2012) suggested that the competence of a "well-defined" model could be assessed by analyzing the number "of irreversible, ordinal, penetrable successes in accounting for empirical phenomena” (p. 110). For Wills and Pothos, “a well-defined model ... is one that considers all input–output combinations appearing in peer-reviewed publications.” In the present issue, the participating authors have selected those input–output combinations (see Table 1). Most important, Wills and Pothos defined irreversible success as that achieved by using model parameters “whose specification is general to the whole domain of phenomena that the model is intended to address” (p. 112), and penetrable success refers to the possibility of (1) understanding the model’s processes in psychological terms and (2) applying the model with little effort.

Bunge (1967) defined the accuracy of a model as the ratio between the number of successes in accounting for experimental data (C) and the number of peer-reviewed experimental results (N), A = C/N. In addition, Bunge defined the efficiency of a model by ρ = 1 − n/(C·D), where n is the number of free parameters (or, we suggest, the number of equations) in the model, C is the number of experimental results that the model correctly describes, and D is the number of dimensions (trial-to-trial, behavioral real-time, neurophysiological) the model can be applied to. The equation seems to reflect well the fact that the number of free parameters, which imposes a penalty on the efficiency, is not important as long as they are applied globally to a large and representative data set. In the equation, the penalty for the number of free parameters is further decreased by the number of dimensions to which the model is applied.
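Bunge's two measures can be worked through in a few lines of Python, reading the efficiency as ρ = 1 − n/(C·D). The counts used for accuracy are an illustrative tally of Table 1 as printed here (87 rows, 5 of them marked NO), and the authors' own count may differ; the values of n and D in the efficiency example are purely hypothetical.

```python
def accuracy(successes, total_results):
    """Bunge's accuracy, A = C / N."""
    return successes / total_results


def efficiency(n_params, successes, dimensions):
    """Bunge's efficiency, rho = 1 - n / (C * D)."""
    return 1.0 - n_params / (successes * dimensions)


# Illustrative tally of Table 1: 87 listed results, 5 marked NO.
a = accuracy(successes=82, total_results=87)

# Hypothetical n = 20 parameters and D = 2 dimensions.
rho = efficiency(n_params=20, successes=82, dimensions=2)
rho_one_dim = efficiency(n_params=20, successes=82, dimensions=1)
```

Under this tally the accuracy comes to roughly 0.94, and comparing `rho` with `rho_one_dim` shows how applying the model to more dimensions reduces the penalty for its parameters.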

Computer simulations

We have applied the SLGK model to the exhaustive list of experimental results related to classical conditioning. Table 1 lists those results and indicates which experiments the model can account for. Due to space limitations, this section offers a few illustrations of those results. The rest of the simulations listed in Table 1 are presented in the online Supplemental Materials.

Acquisition (section 1 in Table 1)

US- and CS-specific CR (1.3)

The nature of the CR is determined not only by the US, but also by the CS (Holland, 1977). In Ross and Holland (1981), Experiment 1, rats received simultaneous and serial feature-positive discriminations. Whereas rats in the serial group showed strong responding to the tone target (characterized by head jerk CRCS), the simultaneous group showed strong responding to the light feature (characterized by rearing CRCS). The model explains the results, r(2) = .97, p < .05 (see the online Supplemental Materials, Fig. 3); because it assumes that different CSs become associated with the US in different nodes (that also receive input from the configural nodes), the model can establish separate CRs (e.g., rearing, head jerk) with different CSs (e.g., light, tone). This is expressed in Equation 7′ by CRCS = f(BCS-US) (1 − OR).

Orienting response and conditioning to a CS after changing its predictive accuracy of another CS (1.6)

Wilson, Boumphrey, and Pearce (1992) reported that the percentage of ORs decreased with randomly alternated presentation of light–tone–food trials and light–tone-alone trials, but it was restored when the nonreinforced trials contained only the light. Conditioning was faster in the group for which the OR was restored. Figure 2 shows that the SLGK model reproduces the Wilson et al. (1992) data [r(16) = .80, p < .05, for Stages 1 and 2; r(6) = .87, p < .05, for Stage 3]. The increased OR and attention to the CS are the consequence of the increased Novelty′ when the light predicts the tone in its absence. Increased attention in Stage 2 results in faster conditioning in Stage 3.
Fig. 2

Restoration of the light orienting response. Right Upper panel Data from Wilson et al. (1992) Experiment 1-Stage 1 & 2. Left Upper panel Internal representation of A (XA) during 20 LT+ and 20 L- trials followed by 7 LT+ and 7 LT- trials for the Extinction Group (Group E) and 20 LT+ and 20 LT- trials followed by 7 LT+ and 7 L- trials for the Control Group (Group C) during Stage 1 and 2. Right Lower panel Data from Wilson et al. (1992) Experiment 1-Stage 3. Left Lower panel Conditioned responses to A during 4 A+ trials. CS salience 1, CS active between 20-40 t.u., US active between 35-40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient r (16) = 0.80, p < 0.05 for Stage 1 and 2, and r(6) = 0.87, p < 0.05 for Stage 3

Extinction (section 2 in Table 1)

Partial reinforcement extinction effect (2.2)

According to Wagner, Siegel, and Fein (1967), extinction is slower following partial than following continuous reinforcement. Figure 3 shows that the model reproduces those results, r(8) = .73, p < .05. According to the model, Novelty′ and attention to the CS during extinction are smaller after partial reinforcement than after continuous reinforcement because the animal’s expectation of the US is smaller in the first case than in the second case. This decreased attention results in slower extinction. The explanation provided by the model has some similarities with Capaldi’s (1994) view that responding will persist during extinction if the conditions are similar to those during acquisition. That is, the partial reinforcement extinction effect is the consequence of extinction being more similar to partial than to continuous reinforcement.
Fig. 3

Partial reinforcement extinction effect: Mean suppression of bar pressing in the presence of a CS [(PreCS–CS)/PreCS], on the last day of acquisition and subsequent extinction in groups receiving continuous (Continuous) or partial (Partial) shock US reinforcement during bar pressing. Left panel Data from Wagner, Siegel, and Fein (1967, Experiment 2). Right panel Simulations consisted of 5 A+ trials for the continuous reinforcement (BP100) group or 20 A+/A− alternated trials for the partial reinforcement (BP50) group, followed by 10 A− test trials. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(8) = .73, p < .05

Generalization (section 3 in Table 1)

Generalization and discrimination (3.3)

Brandon, Vogel, and Wagner (2000) carried out a generalization experiment using the rabbit conditioned eyeblink response where animals were trained with cues A, AB, or ABC and were tested with A, AB, and ABC. Group A showed maximal responding to A, Group AB to AB, and Group ABC to ABC. Importantly, the effect (generalization decrement) of eliminating a CS was stronger than that of adding one or two additional CSs. Figure 4 shows that the model describes those results, r(7) = .86, p < .05. According to the model, each group showed decreased responding (1) as external inhibition increased when other stimuli were added or eliminated during testing and (2) when stimuli present during training were eliminated during testing. Therefore, generalization decrement is larger when a trained stimulus is removed (external inhibition and elimination of the CS that reduces prediction of the US) than when a novel stimulus is added (external inhibition only).
Fig. 4

Generalization after training with A, AB, or ABC and testing with A, AB, and ABC. Left panels Data from Brandon, Vogel, and Wagner (2000, Experiment 1). Right panels Simulations consisted of 500 A + trials for Group A, 500 AB + trials for Group AB, and 500 ABC + trials for Group ABC. Figures show CRs to A, AB, or ABC during 12 nonreinforced test trials. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(7) = .86, p < .05

Discriminations (section 4 in Table 1)

Adding a common cue to the elements and the compound cue during negative patterning decreases discrimination (4.5)

Redhead and Pearce (1998) reported that adding an additional stimulus C to cues A+, B+, AB − in a negative patterning design retarded the discrimination. Figure 5 (upper and middle panels) shows that the model describes the results, r(19) = .93, p < .05, for Group Simple, and r(19) = .88, p < .05, for Group Common. In terms of the model, this is explained by the increased generalization between the reinforced (AC+, BC+) and nonreinforced (ABC−) compounds.
Fig. 5

Left Upper and Middle Panels Negative Patterning adding a common cue (AC+, BC+, and ABC- Discrimination). Data from Redhead and Pearce (1998) Experiment 1. Right Upper Panel Simulations for Group Simple consisted of 270 of each alternated A+, B+, and AB- trials. Correlation coefficient r (19) = 0.85, p < 0.05. Right Middle Panel Simulations for Group Common consisted of 280 of each alternated AC+, BC+, and ABC- trials. Left Lower Panel Negative Patterning with 3 cues (A+, BC+, and ABC- Discrimination). Data from Redhead and Pearce (1995) Experiment 1. Right Lower Panel Simulations consisted of 50 of each alternated A+, BC+, and ABC- trials. CS salience was 1 during A+ trials, 0.5 during BC+ trials and .1 during ABC- trials, CS active between 20-40 t.u., US active between 35-40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Simulations show the average of 10 sets of random input-hidden weights. Correlation coefficient r (28) = 0.86, p < 0.05

Biconditional discrimination is harder than component discrimination (4.7–4.9)

Saavedra (1975) reported that a biconditional discrimination, in which two compounds (AC + and BD+) are consistently reinforced and two other compounds (AD − and BC−) are consistently nonreinforced, is more difficult than a component discrimination, in which the two compounds containing A (AC + and AD+) are consistently reinforced and the two compounds containing B (BD − and BC−) are consistently nonreinforced. Figure 6 shows that the model correctly describes these results, r(28) = .75, p < .05. According to the model, attention to the configural stimuli takes some time to increase to a level at which the biconditional problem can be solved, whereas attention to the components increases from the first trial.
Fig. 6

Biconditional versus component discriminations. Left panel Data from Saavedra (1975). Right panel Simulations for the biconditional group consisted of 120 of each alternated AC +, AD −, BC −, and BD + trials. Simulations for the component group consisted of 50 of each alternated AC +, AD +, BC −, and BD − trials. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Simulations show the average of 10 sets of random input-hidden weights. Correlation coefficient: r(28) = .75, p < .05
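The reason the biconditional problem has to wait for the configural stimuli can be illustrated with a purely elemental learner. The sketch below is not the SLGK model: it assumes a simple threshold unit trained with an error-correction rule, standing in for learning based on direct CS–US associations alone, and all names in it are hypothetical. Under that assumption the component discrimination is solved within a few passes, whereas no set of elemental weights can ever solve the biconditional discrimination, because AC + and BD + together contain exactly the same cues as AD − and BC −.

```python
def elemental_learner_solves(trials, epochs=100):
    """Train a threshold unit on (compound, reinforced) trials.

    Returns True if some pass through the trials is error-free,
    i.e. the elemental weights alone solve the discrimination.
    """
    cues = sorted({cue for compound, _ in trials for cue in compound})
    weights = {cue: 0.0 for cue in cues}
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for compound, reinforced in trials:
            net = bias + sum(weights[cue] for cue in compound)
            predicted = 1 if net > 0 else 0
            delta = reinforced - predicted
            if delta != 0:
                # Error correction: adjust only the cues present on this trial.
                errors += 1
                bias += delta
                for cue in compound:
                    weights[cue] += delta
        if errors == 0:
            return True
    return False


# Trial types taken from the caption; 1 = reinforced, 0 = nonreinforced.
BICONDITIONAL = [("AC", 1), ("AD", 0), ("BC", 0), ("BD", 1)]
COMPONENT = [("AC", 1), ("AD", 1), ("BC", 0), ("BD", 0)]

component_solved = elemental_learner_solves(COMPONENT)
biconditional_solved = elemental_learner_solves(BICONDITIONAL)
```

In this sketch `component_solved` is true and `biconditional_solved` is false, which is consistent with the account above in which the component problem is learned from the first trials and the biconditional problem only once configural units are attended.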

Discriminations between compounds (4.10)

Haselgrove, Esber, Pearce, and Jones’s (2010) Experiment 1 showed that following A+/B + and X ±/Y ± training, the discrimination between compounds AY + and AX − was solved faster than the discrimination between compounds AY + and BY−, a result in agreement with Pearce and Hall's (1980) model. Figure 7 (upper panel) shows that the model correctly describes the result, r(34) = .97, p < .05. According to the model, attention to the partially reinforced X is higher than attention to the continuously reinforced B, and therefore, the difference in responding to compounds AY + and AX − is greater than the difference in responding to compounds AY + and BY −.
Fig. 7

Upper panel The AY + /AX − discrimination is better than the AY + /BY − discrimination. Left panel Data from Haselgrove, Esber, Pearce, and Jones (2010, Experiment 1). Right panel Simulations consisted of 72 A +, 72 B +, 72 X + /X−, and 72 Y + /Y − trials for the acquisition phase. During the discrimination phase, 24 AY + and 24 AX − trials for the AY + /AX − group and 24 AY + and 24 BY − trials for the AY + /BY − group were used. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(34) = .97, p < .05. Lower panel The AY + /BY − discrimination is better than the AY + /AX − discrimination. Left panel Data from Haselgrove et al., Experiment 3. Right panel For simulations, during training, 35 AX +, 35 BY +, 35 X −, and 35 Y − trials were given. During testing, 56 AY + /28 AX − /28 BY − discrimination trials were given. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(16) = .93, p < .05

Haselgrove et al. (2010, Experiment 3) reported that following AX+, BY+, X−, and Y − trials, the AY+/BY − discrimination was solved faster than the AY+/AX − discrimination, a result in agreement with Mackintosh’s (1975) theory. Figure 7 (lower panel) shows that the model correctly describes the result, r(16) = .91, p < .05. According to the model, during training, the VX-US and VY-US associations, overshadowed by the VA-US and VB-US associations, respectively, decrease. Representations XX and XY are high because of the alternated AX+/X − and BY+/Y − trials.

During discrimination, XX decreases because X correctly predicts the absence of the US (AX − trials) but XB increases because B strongly predicts the US in its absence (BY−). Because XX is small and XB is large, the difference between AY + and AX − is smaller than the difference between AY + and BY −.

Discrimination between compounds (4.11)

Dopson, Esber, and Pearce (2010, Experiment 1) found that following AW+, BX+, CW−, and DX − training, the AW+/AX − discrimination was learned more slowly than the AW+/BW − discrimination, a result in line with Mackintosh’s (1975) model. The model correctly describes the result, r(18) = .98, p < .05 (see online Supplemental Materials, Fig. 19) because, at the end of training, XA and XB are higher than XX and XW. This is due to the fact that the presence of D on DX − trials strongly decreases attention to X by becoming inhibitory and helping X to correctly predict the absence of the US. Similarly, B helps X to correctly predict the presence of the US during the BX + trials. Therefore, at the end of training, attention to the partially reinforced cues X and W (XX and XW) is lower than attention to the continuously reinforced cues A and B. During the discrimination phase, the AW+/AX − discrimination is learned slowly because XX is relatively weak, and the AW+/BW − discrimination is learned quickly because XB is relatively strong.

Simultaneous and serial feature-negative discrimination (4.14, 4.15)

Holland (1984, Experiment 1) reported that nonreinforced simultaneous X–A presentations, alternated with reinforced presentations of A, result in weaker responding to X–A than to A alone (conditioned inhibition). Figure 8 (lower panel) shows that the model correctly describes the results. In this case, X gains a strong inhibitory association with the US.
Fig. 8

Simultaneous and serial feature-positive discriminations. Left Upper panel Data from Ross and Holland (1981, Experiments 1 and 2). Right upper panel Simulations for simultaneous discrimination consisted of 360 alternated A − and XA + trials with A and X presented between 20 and 40 t.u. Right lower panel Simulations for serial discrimination consisted of 780 alternated A − and X → A + trials with X presented between 1 and 20 t.u. and A presented between 20 and 40 t.u. CS salience was 1, US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(34) = .63, p < .05. Simultaneous and serial feature-negative discriminations. Left Lower panel Data from Holland (1984, Experiment 1). Right Lower panel Simulations showing suppression ratios (β = 0.6) for simultaneous discrimination consisted of 180 A + trials followed by 80 alternated A + and XA − trials with A and X presented between 20 and 40 t.u. Simulations for serial discrimination consisted of 640 alternated A + and X → A − trials with X presented between 1 and 20 t.u. and A presented between 20 and 40 t.u. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Simulations show the average of 10 sets of random input–configural weights. Correlation coefficient: r(28) = .67, p < .05

Holland (1984) also found that nonreinforced successive X → A presentations, alternated with reinforced presentations of A, result in weaker responding to X–A than to A alone, without X gaining an inhibitory tendency. Figure 8 (lower panel) shows that the model correctly describes the result, r(28) = .67, p < .05. In this case, X does not gain an inhibitory tendency, and the problem is solved by X and A exciting the configural units, which, in turn, inhibit the US prediction (see Fig. 12 in Schmajuk et al., 1998).

Feature-positive discrimination is easier than feature-negative discrimination (4.16)

Hearst (1975) reported that feature-positive discrimination is easier than feature-negative discrimination. The model also describes the result, for the same reasons that positive is easier than negative patterning.

Shared feature in serial feature-positive and feature-negative discriminations (4.17)

Holland (1991) reported that in serial discrimination, X can be trained to concurrently serve as the feature in both a feature-negative and a feature-positive discrimination with different CSs. The model correctly describes the result, r(14) = .84, p < .05 (see online Supplemental Materials, Fig. 22), because X controls responding through different configural units.

Inhibitory conditioning (section 5 in Table 1)

Effect of extinction of the excitor on conditioned inhibition (5.4)

Following conditioned inhibition, nonreinforced presentations of the excitor A decrease retardation (Lysle & Fowler, 1985) but have no effect on a summation test (Rescorla & Holland, 1977). The model correctly describes the retardation results, r(2) = .92, p < .05 (see online Supplemental Materials, Fig. 25), because presentation of the excitor A activates the representation of X and, simultaneously, increases Novelty′ (because X is predicted but absent), thereby increasing attention to X, zX, which decreases retardation. In addition, the model explains the absence of effect in the summation tests in the same terms as those used to explain the summation results after extinction treatment of the inhibitor.

Effect of nonreinforced presentations of the inhibitor on conditioned inhibition (5.5)

Both Zimmer-Hart and Rescorla (1974) and Pearce, Nicholas, and Dickinson (1982) reported that extinction treatment of the conditioned inhibitor results in no change in its inhibitory properties, as shown by a summation test. Importantly, Pearce et al. also reported increased retardation during reconditioning. The model correctly describes no effect in summation, r(2) = .99, p < .05 (see online Supplemental Materials, Fig. 26, upper panels), and retardation, r(16) = .87, p < .05 (see online Supplemental Materials, Fig. 26, lower panels).

According to the model, no effects are shown in the summation test because attention to X, zX, decreases during X − presentations (while the inhibitory VX-US association does not change). During BX − testing, X is well predicted by the CX in Group X−, which receives X presentations in the CX, but not in Group CX, in which X is absent during exposure to the CX. Therefore, during the summation test trials, X is predicted and Novelty′ is relatively low in Group X, but large in Group CX. Because both attentions zX and zB are relatively small in Group X (in which X loses inhibitory and B loses excitatory power) and relatively large in Group CX (in which X gains inhibitory and B gains excitatory power), similar differences between responding to B and BX are observed in both groups. In contrast, the decreased attention to X, zX, readily results in retardation in the absence of the transfer CS, B.

Combination of separately trained CSs (section 6 in Table 1)

Summation and modality (6.1)

Kehoe, Horne, Horne, and Macrae (1994) showed that when two CSs independently trained with the same US are tested in combination, there is more likely to be a summative CR when the CSs are in different rather than in the same modality. The model correctly describes the results, r(28) = .63, p < .05 (see online Supplemental Materials, Fig. 28); because the shared features in a given modality contribute less excitatory power than do the nonshared features in different modalities, the response is weaker when the CSs are in the same modality.

Stimulus competition/potentiation in training (section 7 in Table 1)

Unblocking by increasing or decreasing the US (7.3, 7.4)

Dickinson, Hall, and Mackintosh (1976, Experiment 3) reported that responding to X can be increased by increasing or decreasing the number of presentations of the US (equivalent to total US intensity) from A + training to AX++ training. The model describes the effects of both increasing and decreasing the intensity of the US, r(2) = .90, p < .05 (see Fig. 9). According to the model, increasing the US intensity increases responding to the blocked X simply because the VX-US association increases.
Fig. 9

Unblocking. Unblocking is observed when the number of US presentations in phase 2 (AB – US) is larger (S – L) or smaller (L – S) than the number used in phase 1 (A – US). Left panel Data from Dickinson, Hall, and Mackintosh (1976, Experiment 3). Right panel Simulations showing suppression ratios (β = 0.3). Group C +/CL ++ received 40 A + trials, followed by 280 AX ++ trials and 1 X − trial. Group C ++/CL ++ received 40 A ++ trials, followed by 280 AX ++ trials and 1 X − trial. Group C ++/CL + received 40 A ++ trials, followed by 280 AX + trials and 1 X − trial. Finally, Group C +/CL + received 40 A + trials, followed by 280 AX + trials and 1 X − trial. All CSs were presented between 20 and 40 t.u. with salience 1; the US was presented between 35 and 40 t.u., with strength 1 for A + and AX + trials and 2 for A ++ and AX ++ trials; CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(2) = .90, p < .05

In contrast, when the US intensity decreases, A can predict this weaker US better, and Novelty′ and XA decrease. The decrement in XA decreases the competition of A (proportional to XAVA-US) with X, thereby increasing the VX-US association and reducing blocking.

Potentiation (7.6)

Durlach and Rescorla (1980) showed that the presence of a taste stimulus at the time of conditioning potentiates, rather than overshadows, the resulting odor aversion to a solution that is followed by LiCl injections. As was suggested by Durlach and Rescorla, the model explains potentiation (rD, 2.26, rS, 2.06; see online Supplemental Materials, Fig. 32) in terms of chaining of CS odor–CS taste associations with CS taste–US associations. CS taste–CS odor associations are formed because these two CSs temporally overlap. CS taste–US, but not CS odor–US associations, are formed because we assume that the CS taste trace, but not the CS odor trace, is long enough to overlap with the US.

Temporal primacy overrides prior training (7.11)

In Kehoe, Schreurs, and Graham’s (1987) Experiment 1, groups received either B + trials (Groups B) or exposure to the context (Groups R), followed by AB compound trials with (1) A preceding and completely overlapping B (Groups OL) or (2) A preceding but terminating before B onset (Groups SQ). Kehoe et al. (1987) found that after prior training of B, there was (1) weak blocking to A but (2) strong decline in responding to B. Figure 10 shows that the model correctly describes the results, r(18) = .68, p < .05. In terms of the model, attention to B increases when A is present, thereby favoring the extinction of the B–US association during the time when B is active in the absence of the US, which is presented in a trace arrangement with B. In addition, our model predicts blocking of A by B.
Fig. 10

Temporal primacy overrides prior training. Left panels Data from Kehoe, Schreurs, and Graham (1987, Experiment 1). Right panels Simulations consisted of 40 B + trials, 10 series of 5 A → B trials preceded and followed by 1 B − test trial. In the overlapping case, A was presented between 20 and 80 t.u., and B was presented between 60 and 80 t.u. In the sequential case, A was presented between 20 and 40 t.u., and B was presented between 40 and 60 t.u. CS salience 1, US active between 55 and 60 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(18) = .68, p < .05

CS/US preexposure effects (section 8 in Table 1)

Brief preexposure to the context facilitates contextual conditioning (8.4)

Fanselow (1990) reported that a 2-min preexposure to the CX, as well as additional 1-, 3-, 9-, 27-, and 81-s exposures to the same CX before presenting a shock US, facilitates fear conditioning to that CX. Figure 11 shows that the model describes the results, r(2) = .96, p < .05, in terms of the increased Novelty′, zCX, and XCX that follow a brief preexposure to the CX. Longer preexposure times result in LI (see Fanselow, 1990, p. 269).
Fig. 11

Context preexposure facilitates contextual conditioning. Left panel Data from Fanselow (1990, Experiment 3). Right panel Simulated freezing response amplitude (10^8 CR) after one CX – US conditioning trial following 200, 600, 1,400, or 1,800 time units in CXA. US duration 5 t.u., US strength 1, and CX salience 0.1. Correlation coefficient: r(2) = .96, p < .05

Learned irrelevance (8.6)

Bonardi and Hall (1996; Bennett, Maldonado, & Mackintosh, 1995) found that groups receiving uncorrelated presentations of the CS and US conditioned significantly more slowly than groups given separate sessions of exposure to the CS followed by sessions of exposure to the US or vice versa. The model describes the results, r(10) = .59, p < .05 (Fig. 12). According to the model, the increased attention to the CX, following presentation of the US, facilitates the formation of CX–CS associations and reduces Novelty′, thereby reducing attention zCS to the CS and increasing the LI effect.
Fig. 12

Learned irrelevance is more than the sum of CS preexposure and US preexposure effect. Left panel Data from Bonardi and Hall (1996, Experiment 2). Right panel Simulations showing suppression ratios (β = 0.3) for the learned irrelevance (LIRR) group consisted of 80 trials of A/ + where the US was presented 55 t.u. after the offset of A trials in which the conditional probability of the US was similar in the absence of the CS and in its presence, followed by 90 A + trials. Simulations of the control (CON) group consisted of 40 A − trials alternated with 40 trials with the US alone. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(10) = .59, p < .05

Super-latent-inhibition (8.11)

De la Casa and Lubow (2002, Experiment 1) reported that a delay placed after conditioning in a conditioned taste aversion experiment resulted in an increased (super-) LI. The model can reproduce the results, r(2) = .99 (see online Supplemental Materials, Fig. 47), because attention to the water increases and the water–US association decreases during the postconditioning delay. During testing, the water–US association rapidly becomes inhibitory, decreasing the strength of the CR and increasing LI. Therefore, super-LI is due not to a further decrease in attention to the flavor during the delay (actually attention to the flavor increases), but to an increase in attention to the water with which the flavor is delivered, which, during testing, becomes a predictor of the absence of malaise, thereby increasing its hedonic value.

Recovery (section 10 in Table 1)

Recovery from latent inhibition (10.1)

Grahame, Barnet, Gunther, and Miller (1994) reported that LI is attenuated by extensive exposure to the training context following CS–US pairings. The model describes the results, r(2) = .91, p < .05 (Fig. 13, upper panels) in terms of the increased attention to CS, zCS, during exposure to the context. The context activates the representation of the CS when Novelty′ increases, since the context predicts both the CS and the US in their absence.
Fig. 13

Recovery from latent inhibition. Left Upper panel Data from Grahame, Barnet, Gunther, and Miller (1994, Experiment 1). Right Upper panel Simulated results showing the strength of the conditioned response to A during 3 test trials, following 40 A − trials, 10 A + trials, and 40 CX trials. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(2) = .91, p < .05. Recovery from overshadowing. Left Middle panel Data from Matzel, Schachtman, and Miller (1985, Experiment 1) showing mean latencies to complete the first 5 cumulative seconds of drinking in the test context. Right Middle panel Simulated results showing the strength of the conditioned response to A during 1 test trial, following 10 AB + trials and 10 CX − trials for Group Over, 10 A + and 10 CX − trials for Group Over-Control, and 10 AB + and 10 B − trials for Group Extinction. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(1) = .99, p < .05. Recovery from blocking. Left Lower panel Data from Blaisdell, Gunther, and Miller (1999, Experiment 3) showing mean times to lick for 5 cumulative seconds during the presentation of the clicker. Right Lower panel Simulated results showing the strength of the conditioned response to A during 5 test trials, following 40 B + trials, 20 AB + trials, and 5 A − trials for the blocking-extinction group and 40 X + trials, 20 AB + trials, and 5 A − trials for the overshadowing-extinction group. For no-extinction groups, 5 A − trials were replaced by 5 CX − trials. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(2) = .99, p < .05

Recovery from overshadowing (10.2)

Kaufman and Bolles (1981; Matzel, Schachtman, & Miller, 1985), but not Holland (1999), found that extinction of the overshadowing A results in increased responding to the overshadowed B. The model describes the results, r(1) = .99, p < .05 (Fig. 13, middle panels). In terms of the model (see also Schmajuk & Larrauri, 2006), recovery from overshadowing results from the increased attention to B, zB, during the extinction of A. Notice that while our simulations for recovery from overshadowing used relatively few (10) A − trials, simulations for mediated extinction (see 11.5) required relatively many (40) A − trials.

Recovery from forward blocking (10.3)

Blaisdell et al. (1999), but not Holland (1999), reported that extinction of the blocker A may result in increased responding to the blocked B. In agreement with Blaisdell, Gunther, and Miller’s (1999) results, the model describes recovery from forward blocking, r(2) = .99, p < .05 (Fig. 13, lower panels). The explanation is similar to the one offered in 10.2.

Recovery from backward blocking (see 7.7) (10.4)

Pineno, Urushihara, and Miller (2005) reported that a delay following backward blocking results in increased responding to the blocked CS. Although the model describes backward blocking, it does not describe the recovery results.

Spontaneous recovery (10.6)

Pavlov (1927; Rescorla, 2004) demonstrated that presentation of the CS after a delay following extinction might yield renewed responding. The model describes the results, r(18) = .93, p < .05 (Fig. 14). During the delay, the CX–CS association decreases, so the target CS becomes novel again in that CX and its presentation increases Novelty′. The increased Novelty′ increases attention, zCS, to the excitatory but unattended CS, but initially not to the inhibitory and even less attended CX.
Fig. 14

Spontaneous recovery. Left panel Data from Rescorla (2004, Experiment 1). Right panel Simulations consisted of 325 A + trials in CXA, 35 A − trials in CXA, 10 home cage trials, and 48 A − test trials in CXA. CS salience 1, CS active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CXA salience 0.1, home cage salience 0.5, and ITI 200 t.u. Correlation coefficient: r(18) = .93, p < .05

Higher order conditioning (section 11 in Table 1)

Second-order conditioning and conditioned inhibition versus second-order conditioning (11.2, 11.3)

As described in 5.1 in Table 1, the same types of trials (A+, AB−) are presented in conditioned inhibition and second-order conditioning. Yin, Darnel, and Miller (1994) found that B becomes inhibitory on a summation test after many, but not few, B → A trials following A + trials. The model describes the results, r(4) = .99, p < .05 (Fig. 15), in terms of the competition between (1) the combined (chained) B–A and A–US associations and (2) the formation of inhibitory B–US associations driven by the decreasing A–US association on nonreinforced B–A trials. Note that the model also explains sensory preconditioning in terms of B–A and A–US chaining. Because the SLGK model uses the same (relatively fast) extinction rate as the SLG model, we measured the CR at 2 t.u. after B onset, when the B–A association was not yet extinguished.
Fig. 15

The number of A–X pairings is critical for obtaining conditioned inhibition or second-order conditioning. Left panel Data from Yin, Darnel, and Miller (1994, Experiment 1). Right panel Simulations consisted of 170 A + trials, which were interspersed with 1 AX − trial for the interspersed few (IF) group and with 85 AX − trials for the interspersed many (IM) group. For the sequential few (SF) group, 170 A + trials were followed by 1 AX − trial, whereas for the sequential many (SM) group 170 A + trials were followed by 85 AX − trials. Interspersed nothing (IN) and sequential nothing (SN) groups received only 170 A + trials. Finally, all groups were given 4 nonreinforced test trials for B. CS salience 1, A active between 40 and 60 t.u., and B active between 20 and 40 t.u., US active between 35 and 40 t.u., US strength 1, CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(4) = .99, p < .05

Temporal properties (section 12 in Table 1)

Trial and intertrial durations (12.3)

Gibbon, Baldock, Locurto, Gold, and Terrace (1977) found that constant values of intertrial-duration/trial-duration (I/T) ratios result in an approximately constant number of trials to acquisition. Trial duration was defined as the duration of the CS followed by the US. Intertrial duration was the time from a CS onset to the next CS onset. High I/T ratios resulted in faster acquisition than did low I/T ratios. The model explains the result, r(15) = .72, p < .05 (Fig. 16), because the CS–US association decreases with increasing T durations (longer periods of extinction; see 12.1) but increases with increasing I durations (less competition by the CX; see 12.2), which results in a constant number of trials to acquisition when both ITI and ISI increase or decrease. If I increases with a constant T, conditioning occurs fast. If I decreases with a constant T, conditioning occurs slowly.
Fig. 16

Trials to criterion are determined by both the interstimulus and the intertrial intervals. Left panel Data from Gibbon, Baldock, Locurto, Gold, and Terrace (1977, Experiment 1). Right panel Simulated trials to acquisition criterion (CR = 0.2) for 20-, 25-, 30-, and 35-t.u. ISI and 100-, 125-, 150-, 175-, 200-, 225-, 250-, and 300-t.u. ITI. CS salience 1, CS active between 20 and 40 t.u., 15 and 40 t.u., 10 and 40 t.u., or 5 and 40 t.u., US active between 35 and 40 t.u., US strength 1, and CX salience 0.1. Correlation coefficient: r(15) = .72, p < .05
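To make the constant-ratio idea concrete, the sketch below groups the simulated ISI and ITI values listed in the caption by their ITI/ISI ratio. It assumes that the simulated ITI and ISI can stand in for the I and T durations of Gibbon et al. (1977), which is a simplification of the definitions given in the text, and the function name is hypothetical. Combinations that fall in the same group are the ones for which an approximately constant number of trials to acquisition would be expected.

```python
from collections import defaultdict
from fractions import Fraction

# Simulation values taken from the caption (in time units, t.u.).
ISI_VALUES = [20, 25, 30, 35]
ITI_VALUES = [100, 125, 150, 175, 200, 225, 250, 300]


def group_by_ratio(isi_values, iti_values):
    """Map each exact ITI/ISI ratio to the (ISI, ITI) pairs that produce it."""
    groups = defaultdict(list)
    for isi in isi_values:
        for iti in iti_values:
            # Fraction keeps the ratio exact, so equal ratios compare equal.
            groups[Fraction(iti, isi)].append((isi, iti))
    return groups


groups = group_by_ratio(ISI_VALUES, ITI_VALUES)
ratio_5 = groups[Fraction(5)]
ratio_10 = groups[Fraction(10)]
```

Here `ratio_5` collects the pairs (20, 100), (25, 125), (30, 150), and (35, 175), and `ratio_10` collects (20, 200), (25, 250), and (30, 300).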

Scalar invariance in response timing (12.6)

Millenson, Kehoe, and Gormezano (1977) found that reinforcing a CS at different ISIs results in CRs that are active over a longer period of time and can be superimposed when divided by the ISI. The temporal spectrum version of the SLGK model approximates the data, r(8) = .64, p < .05 (Fig. 17), because, with a longer ISI, the US overlaps with the traces of more fractional CSs, which results in responding over a longer period of time. Note that the fit provided by the square pulses that represent the fractional CSs in the SLGK model is almost as good as that provided by the more complex traces used by Grossberg and Schmajuk (1989).
Fig. 17

CR peak latencies coincide with the temporal location of the US and the topography of the CR follows a Weber Law (the width of the CR increases with the ISI). Left panel Data from Millenson, Kehoe, and Gormezano (1977, Experiment 1). Right panel Simulations consisted of 800 reinforced presentations of A (represented by a temporal spectrum of five consecutive CSs of salience 1 and duration 5 t.u.) together with a 5-t.u. US of strength 1 presented with a 30-t.u. and 40-t.u. ISI. CX salience 0.1, and ITI 200 t.u. Correlation coefficient: r(8) = .64, p < .05
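The superimposition property described in this section can be illustrated with a toy response profile. The sketch below does not implement the temporal spectrum of the SLGK model: it assumes a hypothetical Gaussian-shaped CR whose peak sits at the ISI and whose width is a fixed fraction of the ISI (the `weber_fraction` value is arbitrary). Under that assumption, profiles obtained with a 30-t.u. and a 40-t.u. ISI coincide once time is divided by the ISI.

```python
import math


def cr_profile(t, isi, weber_fraction=0.2):
    """Hypothetical CR amplitude at time t: peak at the ISI, width proportional to the ISI."""
    spread = weber_fraction * isi
    return math.exp(-0.5 * ((t - isi) / spread) ** 2)


def normalized_profile(isi, relative_times):
    """Sample the CR at times expressed as fractions of the ISI."""
    return [cr_profile(u * isi, isi) for u in relative_times]


relative_times = [0.5, 0.75, 1.0, 1.25]
short_isi = normalized_profile(30, relative_times)
long_isi = normalized_profile(40, relative_times)

# After rescaling time by the ISI, the two profiles superimpose.
superimposed = all(
    math.isclose(a, b) for a, b in zip(short_isi, long_isi)
)
```

In absolute time the 40-t.u. profile is wider than the 30-t.u. profile, which is the Weber-law pattern named in the caption; only the rescaled curves coincide.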

Discussion

We showed that the SLGK model, which combines attentional, associative, timing, and “flexible” configural mechanisms, is able to explain a large number of the basic properties of classical conditioning. The model incorporates the original SLG model (Schmajuk et al., 1996; Schmajuk & Larrauri, 2006) and a set of configural (hidden) units (Schmajuk & DiCarlo, 1992; Schmajuk et al., 1998). The model is unique, in that attention to these configural units is gradually increased when the system cannot learn a nonlinear problem. This flexible configural mechanism (which implements Melchers et al.’s, 2004, variable processing strategy) allows the model to reduce or eliminate interference between the simple attentional-associative and configural mechanisms used in the original models.

Model evaluation

Following Wills and Pothos's (2012) approach, the present study shows that the "well defined" SLGK model, which considers all input–output combinations appearing in Table 1, achieves a large number of irreversible successes in accounting for classical conditioning data. As is shown in Table 1, 82 out of 87 cases showed either significant correlations or similar ratios between experimental and control group responding in the simulations and the data. In Bunge's (1967) terms, the model is accurate in 94 % of the cases (A = .94).
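The two quantities used in this evaluation can be sketched as follows. The accuracy figure uses the counts given above (82 of 87 cases); the correlation function is a generic Pearson r with n − 2 degrees of freedom, which is the usual meaning of the r(df) notation, and is offered as an assumption about how such values are obtained rather than as the authors' own code. The data in the usage lines are invented for illustration.

```python
import math


def pearson_r(data, simulation):
    """Return (r, degrees_of_freedom) for paired data and simulated values."""
    n = len(data)
    if n != len(simulation) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_d = sum(data) / n
    mean_s = sum(simulation) / n
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(data, simulation))
    var_d = sum((d - mean_d) ** 2 for d in data)
    var_s = sum((s - mean_s) ** 2 for s in simulation)
    return cov / math.sqrt(var_d * var_s), n - 2


# Invented values: a simulation that is an exact multiple of the data.
r, df = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Accuracy as the proportion of successfully described cases (counts from the text).
accuracy = 82 / 87
```

Rounded to two decimals, `accuracy` gives the reported A = .94.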

These 82 successes are "irreversible" because, in addition to using fixed simulation values and a number of simulated trials proportional to the number of experimental trials, the model accounts for all the cases using fixed model parameters (Wills & Pothos, 2012).

Using fixed model parameters, we described experimental results that are “experimental parameter” dependent by varying our simulation values and the number of simulated trials to capture those changes. For instance, in three cases (5.4, 8.11, and 10.7), we used a CX with salience 0.3, instead of 0.1, because the experimental contextual cues had been enhanced with the use of odors to obtain the reported effect. In other cases, the CS salience was reduced because the CS was a nonsalient light paired with a salient tone in a noncounterbalanced fashion (7.9) or because the number of color dots used during elemental trials was decreased during compound trials (4.5, 4.6). Finally, in another case (8.4), we used relatively few trials of CX preexposure to show freezing facilitation, instead of latent inhibition. In sum, the fact that some experimental results are “experimental parameter” dependent is well captured by the model without changing any of its parameters. Except when specifically dependent on the intensity or number of the experimental independent variables (e.g., CX salience, CS salience, or trial numbers), the results are extremely robust within a large range of simulation values and trial numbers. Furthermore, we found that simulated results can approximate the experimental data even more closely by arbitrarily adjusting the context salience or the ITI. These adjustments compensate for the large variability in the experimental parameters.

Finally, following Wills and Pothos (2012), the SLGK model also attains penetrable success because the basic mechanisms in the model—short-term memory, attention, associations, timing, and flexible configurations—are comprehensible psychological terms and the effort required to apply the model is minimized by the program posted on our Web site.

Model parameters and the brain

As was mentioned, the SLGK model is a combination of different subsystems (attentional, associational, timing, and configural), each one with its own rules and parameters that act in a coordinated way. Each mechanism is specified by a number of parameters that capture its properties, from the build-up and decay of short-term memory and novelty traces, the chaining of predictions, and the rates at which associations increase and decrease to the sigmoid functions controlling the strength of the US-specific CR, the CS-specific CR, and the OR. The total number of parameters is 18. Although it would be possible to eliminate at least some parameters in the model (for instance, K11 and n = 2 in Equation A16 in Appendix A in the online Supplemental Materials) and still obtain excellent descriptions of the data, Equation A16 describes a sigmoidal trial-to-trial acquisition curve, a basic requisite for a good model of conditioning.

Significantly, because the above-mentioned subsystems seem to exist in the brain, the model can be applied to behavioral and neurophysiological dimensions. For example, the CS short-term memory trace (τCS; see Equation A1 in the Appendix in the online Supplemental Materials) approximates the growth and decay in neural activity that accompanies a CS presentation (e.g., Brozoski, Bauer, & Caspary, 2002). The same node in the model activated by the perceived CS input and the weaker feedback of CS prediction (BCS, an imagined CS) corresponds to the brain area (the fusiform face area) activated by both visual perception and the relatively weaker imagery (O’Craven & Kanwisher, 2000). The attentionally modulated short-term memory (XCS) seems to have a physiological correlate in the neural activity of the dorsolateral frontal cortex (Dunsmoor & Schmajuk, 2009). CS-activated CS–US associations might be correlated with neural activity in the amygdala (Dunsmoor & Schmajuk, 2009) or cerebellum (Raymond, Lisberger, & Mauk, 1996). CS–CS associations and CS–CN associations might be stored in the temporal lobe (Daum, Channon, Polkey, & Gray, 1991; Shimamura & Squire, 1984). Novelty′ might be represented by the dopaminergic activity of the ventral tegmental area (Legault & Wise, 2001; Schmajuk, 2009); timing of the nictitating membrane CR might be implemented in the cerebellum (Raymond et al., 1996); and the cholinergic system might be involved in blocking (Baxter, Gallagher, & Holland, 1999). Therefore, a simpler model would fail to provide an explanation for the redundant involvement of different brain areas during classical conditioning, thereby reducing its accuracy at describing the above experimental data.

Importantly, the number of parameters is more than compensated by the number of experimental results that the model correctly describes (82) and the number of dimensions (behavioral, temporal, and neurophysiological) to which the model can be applied. As was mentioned, the number of free parameters is not important as long as they are fixed and applied globally to large and representative data sets in different domains. The resulting efficiency (Bunge, 1967), ρ = 0.93, together with its accuracy, A = 94 %, makes the SLGK model a very attractive solution for Pavlov’s puzzle.

How to predict novel results using the model

In order to generate novel predictions with the SLGK model, the standard values for the CS, US, and CX salience and durations should be used. On the basis of the correlations mentioned in the Method section, the number of simulated trials should be about half of that of the experimental trials. Alternatively, the number of trials used in the predictions could be close to the number of trials used in the simulations for an experimental design that is similar to the one whose results are to be predicted.
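The procedure above can be sketched as a small planning helper. This is only an illustration under stated assumptions: the default values are those given in the Fig. 17 caption (CS salience 1, US strength 1, 5-t.u. US, CX salience 0.1, ITI 200 t.u.) and are assumed here to be the standard ones, and the class and function names are hypothetical rather than part of the posted program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationValues:
    """Simulation values assumed standard (taken from the Fig. 17 caption)."""
    cs_salience: float = 1.0
    us_strength: float = 1.0
    us_duration_tu: int = 5
    cx_salience: float = 0.1
    iti_tu: int = 200


def simulated_trials(experimental_trials: int, ratio: float = 0.5) -> int:
    """Number of simulated trials, about half of the experimental trials."""
    if experimental_trials <= 0:
        raise ValueError("experimental_trials must be positive")
    return max(1, round(experimental_trials * ratio))


values = SimulationValues()
print(values)
print("simulated trials for a 100-trial experiment:", simulated_trials(100))
```

The `ratio` argument is left adjustable so that the alternative rule, matching the trial count of a similar simulated design, can be expressed by passing a different value.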

Experimental results still outside the power of the SLGK model and how to address them

The SLGK model fails to describe 5 results out of 87, namely, (1) concurrent excitation and inhibition with few trials (1.5; McNish, Betts, Brandon, & Wagner, 1997); (2) negative patterning being easier than a biconditional discrimination (4.8; Harris, Livesey, Gharaei, & Westbrook, 2008), but not biconditional discrimination itself; (3) recovery from backward blocking (10.4; Pineno et al., 2005), but not backward blocking itself; (4) the Espinet effect (11.4; Espinet, González, & Balleine, 2004); and (5) mediated acquisition (11.5; Holland & Sherwood, 2008). If we consider that only concurrent excitation and inhibition with few trials (1.5) and the Espinet effect (11.4) are robust results, the number of serious failures is reduced to two.

Computer simulations indicate that the model can describe concurrent excitation and inhibition with few trials (1.5) and negative patterning being easier than a biconditional discrimination when, contrasting with the flexible configural approach adopted here, the configural units are active from an early stage of training (4.8). Also, the Espinet effect (11.4) and mediated acquisition (11.5) could be addressed by the model by adding the prediction of λ to the teaching signal λ in Equation 6.

In addition, the relatively weak simulated CRs reported in some cases, such as the effect of CX preexposure (8.4), backward blocking (7.7), and second-order conditioning (11.2), can be easily improved by adopting different CR sigmoid functions (with K11 ~ 0.01 instead of 0.15 in Equation A11 in the online Supplemental Materials) for freezing behavior and the time to complete a number of licks. Finally, the model could be extended to describe the elimination of the deleterious effect of US devaluations with extended training (Holland, 2004) by (1) combining USAppetitive–USAversive associations with the mutual inhibition between appetitive and aversive USs, and (2) incorporating CS–CR associations. Incorporation of CS–CR associations would explain why second-order conditioning sometimes survives the extinction of the A–US association (Rizley & Rescorla, 1972).
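The effect of lowering K11 on weak CRs can be illustrated with a generic sigmoid. The sketch below assumes a Hill-type function, x^n / (x^n + K^n), with n = 2. This functional form is an assumption made for illustration and is not copied from the equations in the online Supplemental Materials; the function name `cr_sigmoid` and the input value 0.05 are likewise hypothetical.

```python
def cr_sigmoid(x: float, k: float, n: int = 2) -> float:
    """Hill-type sigmoid (assumed form): output is 0.5 when x equals k."""
    if x < 0 or k <= 0:
        raise ValueError("x must be non-negative and k must be positive")
    return x ** n / (x ** n + k ** n)


weak_prediction = 0.05  # hypothetical weak US prediction

for k in (0.15, 0.01):
    cr = cr_sigmoid(weak_prediction, k)
    print(f"K = {k}: CR = {cr:.3f}")
```

Under this assumed form, the same weak prediction produces a much larger CR with the smaller half-saturation constant, which is the direction of the improvement described above.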

Conclusion

The SLGK model, which combines attentional, associational, timing, and “flexible” configural mechanisms, is able to explain a large number of the basic properties of classical conditioning. The model provides an excellent fit to 94 % of the experimental data by using fixed model parameters, fixed simulation values (CS, US, and CX salience and duration), and a number of trials roughly proportional to the number of trials in the experiments. Although the approach permits the use of the same model across different preparations, our results suggest that a model with specific parameters for different species and preparations (such as using different sigmoid functions) will provide even better descriptions and explanations of the data. Ideally, the improved model should also use simulation variables precisely scaled to the salience and duration of the experimental variables, as well as to the number of trials in the experiments.

Footnotes
1. Schmajuk–Lam–Gray–Kutlu.

Author note

The authors thank Gonzalo de la Casa, Edgar Vogel, Jose Larrauri, and Andy Wills for their comments on an early version of the manuscript. Thanks also to Avani Vora and Aadya Deshpande for their help in running simulations and preparing figures.

Supplementary material

13420_2012_83_MOESM1_ESM.doc (866 kb)
ESM 1(DOC 846 kb)

Copyright information

© Psychonomic Society, Inc. 2012

Authors and Affiliations

  1. Department of Psychology and Neuroscience, Duke University, Durham, USA
