One-back reinforcement dissociates implicit-procedural and explicit-declarative category learning
The debate over unitary/multiple category-learning utilities is reminiscent of debates about multiple memory systems and unitary/dual codes in knowledge representation. In categorization, researchers continue to seek paradigms to dissociate explicit learning processes (yielding verbalizable rules) from implicit learning processes (yielding stimulus–response associations that remain outside awareness). We introduce a new dissociation here. Participants learned matched category tasks with a multidimensional, information-integration solution or a one-dimensional, rule-based solution. They received reinforcement immediately (0-Back reinforcement) or after one intervening trial (1-Back reinforcement). Lagged reinforcement eliminated implicit, information-integration category learning but preserved explicit, rule-based learning. Moreover, information-integration learners facing lagged reinforcement spontaneously adopted explicit rule strategies that poorly suited their task. The results represent a strong process dissociation in categorization, broadening the range of empirical techniques for testing the multiple-process theoretical perspective. This and related methods that disable associative learning—fostering a transition to explicit-declarative cognition—could have broad utility in comparative, cognitive, and developmental science.
Keywords: Category learning; Explicit cognition; Associative learning; Category rules; Procedural learning
Categorization is an essential cognitive function with great evolutionary depth. It increases fitness because categories—that is, psychological equivalence classes—support adaptive behavior toward the members of natural kinds (e.g., members of prey and predator species). Given its importance, categorization is a sharp focus of cognitive research with animals (e.g., Cerella, 1979; Herrnstein, Loveland, & Cable, 1976; Medin, 1975; Pearce, 1994; Smith, Redford, & Haas, 2008; Wasserman, Kiedinger, & Bhatt, 1988) and humans (e.g., Ashby & Maddox, 2011; Brooks, 1978; Feldman, 2000; Knowlton & Squire, 1993; Medin & Schaffer, 1978; Murphy, 2003; Nosofsky, 1987; Rosch & Mervis, 1975; Smith & Minda, 1998).
Categorization could be important enough that organisms bring complementary categorization processes to bear on different situations. Cognitive systems are often diversified, not parsimoniously unitary, as when animals use circadian or interval timing, dead-reckoning or landmark navigation, and so forth. There are trade-offs between alternative processes in human categorization (e.g., Ashby & Maddox, 2011; Blair & Homa, 2003; Homa, Sterling, & Trepel, 1981; Reed, 1978; Smith, Chapman, & Redford, 2010; Smith, Murray, & Minda, 1997). There are parallel trade-offs in animal categorization, pointing to evolutionary continuities (e.g., Cook & Smith, 2006; Smith, Beran, Crossley, Boomer, & Ashby, 2010; Smith et al., 2010; Smith, Coutinho, & Couchman, 2011; Smith, Zakrzewski, Johnson, & Valleau, 2016; Wasserman et al., 1988).
To organize these results, some take a multiple-process theoretical perspective toward categorization (e.g., Ashby & Maddox, 2011; Erickson & Kruschke, 1998; Homa et al., 1981; Minda & Smith, 2001; Rosseel, 2002; Smith & Minda, 1998). They suppose that multiple categorization utilities can be called upon when necessary to learn classifications and discriminations. Not everyone endorses this perspective, though. Some favor explaining categorization as a single, unitary process (e.g., Nosofsky & Johansen, 2000; Nosofsky, Little, Donkin, & Fific, 2011). The present article sheds additional light on this debate by presenting a new dissociative paradigm that broadens the empirical support for a multiple-process theoretical perspective.
Implicit-procedural and explicit-declarative category learning
Our approach is grounded in a multiple-process perspective drawn from cognitive neuroscience (e.g., Ashby & Ell, 2001; Ashby & Valentin, 2005; Maddox & Ashby, 2004). One integrated set of processes—called here implicit-procedural learning—is linked to the basal ganglia. It is an important reinforcement-based learning system. It may underlie humans’ procedural, skill, and habit learning (e.g., Mishkin, Malamut, & Bachevalier, 1984) and performance in instrumental-conditioning, perceptual-categorization, and some discrimination-learning tasks (Ashby & Ennis, 2006; Barnes, Kubota, Hu, Jin, & Graybiel, 2005; Divac, Rosvold, & Szwarcbart, 1967; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996; Konorski, 1967; McDonald & White, 1993, 1994; Nomura et al., 2007; O’Doherty et al., 2004; Packard, Hirsh, & White, 1989; Packard & McGaugh, 1992; Seger & Cincotta, 2005; Waldschmidt & Ashby, 2011; Yin, Ostlund, Knowlton, & Balleine, 2005). Categorization and discrimination are old and crucial adaptations that might have originated in evolutionarily older brain regions such as the basal ganglia. This implicit system learns associatively through procedural-learning processes akin to conditioning. It learns slowly, relying on temporally contiguous reinforcement. Participants generally cannot describe their implicit categorization strategies.
Another integrated set of processes—called here explicit-declarative category learning—is linked to the prefrontal cortex, the anterior cingulate gyrus, the head of the caudate nucleus, and the hippocampus. It uses executive attention (Posner & Petersen, 1990) and working memory (Fuster, 1989; Goldman-Rakic, 1987), capacities that would support hypothesis testing and rule formation (Brown & Marsden, 1988; Cools, van den Bercken, Horstink, van Spaendonck, & Berger, 1984; Elliott & Dolan, 1998; Kolb & Whishaw, 1990; Rao et al., 1997; Robinson, Heaton, Lehman, & Stilson, 1980). It learns by testing hypotheses. It learns rules that participants can describe verbally.
Figure 1a shows stimuli for an information-integration (II) task. Both dimensions present valid but insufficient category information. To categorize successfully, the participant must integrate the dimensional information (thus, an II task). The cognitive system accomplishes this integration implicitly. Humans cannot explain their solution of an II task verbally, especially when the stimulus dimensions are incommensurate. Note that in the II task, one-dimensional rules are nonoptimal. A vertical or horizontal category boundary (i.e., an X rule or Y rule, respectively) will not separate the two categories appropriately, producing poor performance.
Figure 1b shows a rule-based (RB) category task. Only Dimension X presents useful category information. Low and high values on Dimension X define Category A and Category B members, respectively. A one-dimensional rule is the optimal solution (thus, an RB task). These rules are explicit (held in working memory) and declarative (verbalizable). Note that RB (and II) participants are never shown the map of the stimulus space as in Fig. 1. Instead, they must learn to categorize based on the presentation of single stimuli with attendant feedback.
Figure 1 shows that II and RB tasks are simply rotations of one another through stimulus space. The tasks are matched for category size, within-category exemplar similarity, between-category exemplar dissimilarity, overall category discriminability (d′), and for the proportion correct that an ideal observer can optimally achieve. Therefore, there is no objective, a priori difficulty difference between RB and II tasks. Illustrating this equivalence, Smith et al. (2011) showed that pigeons (Columba livia) learned II and RB tasks equally well and quickly. Probably this result was obtained because pigeons lack an explicit category-learning system that selectively supports rule learning. In contrast, humans do learn RB tasks faster than II tasks because they do deliberately learn explicit rules. Accordingly, the II and RB tasks are balanced and useful mutual controls. Moreover, an RB learning advantage suggests that a particular rule task is indeed supported by explicit category-learning processes.
There have also been demonstrations that suggest II–RB dissociations in categorization. For example, delaying feedback temporally following category response impairs II learning more than it impairs RB learning (Maddox, Ashby, & Bohil, 2003; Maddox & Ing, 2005). Additionally, participants can self-instruct to learn RB categories under unsupervised conditions when no feedback is available, but they cannot learn II categories in this way (Ashby, Isen, & Turken, 1999; Ashby, Queller, & Berretty, 1999). However, these demonstrations have not been universally persuasive. Cognitive science has an insistent impulse to pursue parsimonious, unitary explanations of performance. This is why theorists long pursued unitary-code theory in the imagery literature and long doubted the idea of multiple, dissociable systems or processes in the memory literature (e.g., Nairne, 1990; Pylyshyn, 1973). In categorization, too, this hope for parsimony has run deep, so that for 20 years the multiple-systems idea has taken hold only slowly, with difficulty (e.g., Nosofsky & Kruschke, 2002). For example, the RB–II accuracy difference in performance has frequently been cast as a difficulty difference confronting a unitary system, even though the objective difficulty of the tasks (without assuming selective attention and rule formation) is equal. Therefore, there is still a need for converging operations, for new dissociative paradigms that broaden the empirical support for a multiple-process theoretical perspective. This is the goal of the present article. In addition, our new paradigm has a distinctive feature that can grant researchers access to new lines of investigation.
Our goal was to provide a new empirical dissociation between implicit-procedural and explicit-declarative category learning, strengthening the empirical basis for the multiple-process theoretical perspective. We also sought to produce the simplest dissociation of its kind. We wanted our paradigm to scale to constructive research with young children, with children who have language delays and learning challenges, and with children at different places along the autism spectrum. We wanted our dissociative method to scale to any species capable of discrimination learning. This potential reach was the distinctive feature of our paradigm. In contrast, for example, placing a young child or nonhuman primate into an unsupervised category-learning experiment is likely to be quite unsuccessful, because it requires a sophisticated instructional preparation and a mature, self-controlled cognitive orientation by the participant.
To create the simplest and sharpest possible dissociation, we took on the considerable challenge of disabling the implicit-procedural learning system. We did this by disrupting its reinforcement dynamic, disrupting thereby a dominant reinforcement-learning system in the brain. It is helpful to describe that reinforcement dynamic here.
The basal ganglia are important for various kinds of reinforcement-based discrimination learning. In nonhuman primates, for example, extrastriate visual cortex projects directly to the tail of the caudate nucleus—with massive convergence of visual cells onto caudate cells that project on to the premotor cortex (Alexander, DeLong, & Strick, 1986). The caudate is well placed to associate percepts through to actions, perhaps its primary role. Multiple lines of research support that role (Eacott & Gaffan, 1992; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard et al., 1989; Packard & McGaugh, 1992; Rolls, 1994; Wickens, 1993).
Rewards cause dopamine release into the tail of the caudate nucleus (Hollerman & Schultz, 1998; Schultz, 1992; Wickens, 1993). The dopamine signal can strengthen recently active synapses that were plausibly participatory in reward (Arbuthnott, Ingham, & Wickens, 2000; Calabresi, Pisani, Centonze, & Bernardi, 1996). There is a constraint on this mechanism. If reinforcement lags, and the neural system returns to baseline, there is no record of the contributing synapses and no way to strengthen them. This system cannot access working memory or declarative consciousness in assigning neural credit for rewards. In caudate-mediated discrimination learning, the idea of stimulus–response (SR) bonds is literal, because the caudate links (associates) cortical stimulus representations (its direct inputs) to adaptive responses (its indirect outputs). But for this system to operate, the relevant cortical representation must still be active, and the reinforcement signal must arrive promptly.
Illustrating this dynamic, Yagishita et al. (2014) used optogenetic methods to stimulate sensorimotor inputs and dopaminergic inputs separately, gaining control over their temporal asynchrony. Dopamine failed to promote strengthened synapses if delayed beyond 2.0 s. Remarkably, these authors imaged dendritic spine improvement but only saw it given immediate reinforcement. The delay curve they plotted is like that plotted when humans learn categories at different reinforcement delays (Maddox et al., 2003; Maddox & Ing, 2005). This temporal restriction applies to many associative and instrumental-conditioning phenomena familiar to comparative psychologists (Han et al., 2003; Kryukov, 2012; Raybuck & Lattal, 2014; Smith & Church, 2017), and it has been known for a century (Pavlov, 1927; Thorndike, 1911).
The implication of this work is that implicit-procedural learning could be disabled by eliminating the availability of relevant cortical representations or by delaying the arrival of the reinforcement—or both, as in our approach. Implicit-procedural learning would become impossible, and one could evaluate participants’ capacity to adopt alternative learning processes instead.
We instituted a 1-Back reinforcement regimen as a simple way to arrange this disruption. In this regimen, reinforcement lagged one trial behind the stimulus–response pairs as they occurred, so that reinforcement never related to the present stimulus or response. Participants received feedback for Trial 1 after completing Trial 2, for Trial 2 after completing Trial 3, and so forth. They were instructed on the nature of the feedback they would receive. At feedback, the reinforcement-relevant stimulus was gone and masked by the present stimulus. The reinforcement was delayed outside the tolerance of striatal learning. Our hypothesis was that 1-Back reinforcement would disrupt the associative reinforcement-learning system thoroughly (doubly) by blocking it representationally and temporally, even though participants had full knowledge of the reinforcement regimen.
First, 1-Back reinforcement (compared to 0-Back reinforcement) should defeat the reinforcement-based processes underlying II learning. We predicted this learning process would collapse.
Second, 1-Back reinforcement should affect RB learning minimally. RB learners could hold their rule in working memory and evaluate its aptness equally well facing lagged or immediate reinforcement. So RB learning should still succeed under lagged reinforcement.
Third, if 1-Back reinforcement disables II but not RB category learning, II participants facing 1-Back reinforcement might turn—by information-processing necessity—to rules instead. Thus, we predicted that II 1-Back participants would supply their own rule construal of the II task because that was what they still could do—even though such a rule was not much good in the II task.
If confirmed, these predictions would provide an elemental dissociation between RB and II learning and strongly demonstrate that lagged reinforcement disables associative, reinforcement-based learning.
One hundred and seventy-three Georgia State undergraduates with normal or corrected-to-normal vision participated for course credit. Participants’ data were excluded with cause if they completed fewer than 480 trials (one participant each excluded from the RB 0-Back, II 1-Back, and II 0-Back conditions) or if they showed no learning. No learning was defined as failing to score significantly above chance (a criterion of 56.7% correct) in the last half of the trials (15, 16, 13, and 6 participants were excluded for this reason from the RB 1-Back, RB 0-Back, II 1-Back, and II 0-Back conditions, respectively). Because we were interested in the category-learning strategies used in the last 100 trials, it was important to include only participants who were still actively engaging the task and trying to categorize correctly even at the end. This learning criterion allowed us to exclude participants who, because of a lack of motivation, boredom or fatigue, or difficulty understanding the instructions, were no longer trying to make accurate decisions toward the end. Based on previous findings, we used a stopping rule of 30 includable participants per condition. The final sample included 120 participants, 30 in each of the four conditions.
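The 56.7% criterion can be reproduced under one plausible reading, sketched below. The assumptions here are ours and are not stated in the text: that the test was an exact two-tailed binomial test against a chance rate of .5, at α = .05, over the 240 trials that make up the last half of a 480-trial session. The function names are hypothetical.

```python
from math import comb

def two_tailed_binomial_p(k, n):
    """Exact two-tailed p-value for k or more correct out of n when chance is .5."""
    # Integer sum divided by 2**n gives the exact upper-tail probability.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)

def min_count_above_chance(n, alpha=0.05):
    """Smallest number correct (above n/2) whose two-tailed p-value is below alpha."""
    for k in range(n // 2 + 1, n + 1):
        if two_tailed_binomial_p(k, n) < alpha:
            return k
    return None

n_trials = 240  # assumed: last half of a 480-trial session
k = min_count_above_chance(n_trials)
criterion = k / n_trials
print(k, round(criterion, 3))
```

Under these assumptions the smallest qualifying count is 136 of 240, which rounds to the reported 56.7%. A one-tailed test would give a lower threshold, so the match is suggestive only.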
The category structures used were a major-diagonal II structure with size and density relevant and a vertical RB structure with size relevant. The categories were defined by bivariate normal distributions along the stimulus dimensions. Each exemplar was selected as a coordinate pair in the 101 × 101 space, and these abstract levels were transformed into concrete size and density values (see Stimuli, above). Each participant received his or her own sample of randomly selected category exemplars appropriate to the assigned task. To control for statistical outliers, we did not present exemplars whose Mahalanobis distance (e.g., Fukunaga, 1972) from the category mean exceeded 3.0. This ensured well-behaved elliptical stimulus distributions for the categories.
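The sampling procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the category means, standard deviations, and rotation angle are hypothetical placeholders, since the text does not report them. The sketch builds each category in its principal-axis frame (one axis across the category boundary, one along it), rejects exemplars whose Mahalanobis distance from the category mean exceeds 3.0, and then rotates the frame. An angle of 0° gives a vertical-boundary (RB) layout. An angle of −45° is assumed here to give a positively sloped diagonal (II) layout. Rounding to the integer levels of the 101 × 101 space and the mapping to size and density are omitted.

```python
import math
import random

def sample_category(n, mean_across, sd_across, sd_along, angle_deg,
                    center=(50.0, 50.0), max_d=3.0, rng=None):
    """Draw n exemplars from a bivariate normal, rejecting Mahalanobis outliers."""
    rng = rng or random.Random()
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    exemplars = []
    while len(exemplars) < n:
        dev_across = rng.gauss(0.0, sd_across)  # deviation perpendicular to the boundary
        dev_along = rng.gauss(0.0, sd_along)    # deviation parallel to the boundary
        # With independent principal axes, Mahalanobis distance is the
        # Euclidean norm of the standardized deviations.
        if math.hypot(dev_across / sd_across, dev_along / sd_along) > max_d:
            continue
        across = mean_across + dev_across
        # Rotate the principal-axis frame about the center of the space.
        x = center[0] + across * cos_t - dev_along * sin_t
        y = center[1] + across * sin_t + dev_along * cos_t
        exemplars.append((x, y))
    return exemplars

rng = random.Random(0)
# Hypothetical parameters: category means 12 units either side of the boundary.
rb_a = sample_category(200, -12.0, 5.0, 15.0, 0.0, rng=rng)
rb_b = sample_category(200, 12.0, 5.0, 15.0, 0.0, rng=rng)
ii_a = sample_category(200, -12.0, 5.0, 15.0, -45.0, rng=rng)
ii_b = sample_category(200, 12.0, 5.0, 15.0, -45.0, rng=rng)
```

Because the II categories are produced by rotating the same distributions, category size, within-category spread, and between-category separation are identical across the two tasks by construction.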
Design and procedure
The experiment included four between-participant conditions created by crossing two category structures (RB, II) with two reinforcement conditions (0-Back, 1-Back). Participants were assigned randomly to a task and reinforcement condition using their participant number in the experiment.
Our crucial manipulation was to disrupt the normal cycle of immediate reinforcement following the response to a stimulus. In our 0-Back reinforcement condition, this cycle was sustained. Participants saw a stimulus, categorized it by making a Category A or Category B response, and then received immediate feedback. In our 1-Back reinforcement condition, this cycle was disrupted. Participants saw a stimulus, categorized it by making a Category A or Category B response, but then received feedback pertaining to the previous trial they had completed (after Trial 2, feedback for Trial 1 was delivered; after Trial 3, feedback for Trial 2 was delivered, etc.).
The feedback was positioned spatially to make clear to which trial the feedback pertained—this was the purpose of alternating Top and Bottom trials. Notice that the 1-Back reinforcement did not concern a presently available stimulus, or the most recently available stimulus/cortical representation, or the most recently completed behavioral response. It concerned a previous stimulus, cortical representation, and response. Associative learning was disrupted representationally. It was also disrupted temporally because the reinforcement given was delayed several seconds beyond the stimulus–response pair to which it belonged.
On each trial, the to-be-categorized rectangle appeared at the computer screen’s far right. Toward the left of the screen were the large-font letters “A” (on the left) and “B” (on the right), along with a participant-controlled cursor midway between them. Participants pressed the “S” or “L” key on the computer keyboard to choose the response “A” or “B,” indicating to which category they thought the stimulus belonged. The response keys corresponded spatially to the “A” and “B” response icons on the screen and they had tape labeling the appropriate keys “A” and “B.” Top and Bottom trials were arrayed across the top and bottom halves of the screen, respectively, for reasons already explained.
In the 0-Back condition, participants received immediate feedback after each trial. After correct responses, they saw, “This Top (Bottom) trial was correct +1 Points Total Points N+1.” After incorrect responses, they saw, “This Top (Bottom) trial was incorrect −1 Points Total Points N−1.” In the latter case, they received a brief penalty time-out of 2 s.
In the 1-Back condition, the feedback was displaced spatially and temporally. That is, following a Bottom trial, participants received lagged feedback regarding the previous Top trial. For example, they might see, given a correct response, and presented at the top of the screen in the position for Top trial feedback, “This Top trial was correct +1 Points Total Points N+1.” Following a Top trial, participants received lagged feedback regarding the previous Bottom trial. For example, they might see, given an incorrect response, and presented at the bottom of the screen in the position for Bottom trial feedback, “This Bottom trial was incorrect −1 Points Total Points N−1.” Trials continued until the 52-min session ended or the participant completed 480 trials.
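The two feedback schedules can be summarized in a short sketch. It is illustrative only. It assumes that odd-numbered trials are Top trials (the text does not say which position comes first), and it leaves the final trial's feedback undelivered in the 1-Back case because the text does not describe how the session closed. The function name is hypothetical.

```python
def run_feedback(correct, lag):
    """List (trial, fed_back_trial, screen_position, running_total) for each trial.

    correct: list of booleans, one per trial, in order.
    lag: 0 for immediate (0-Back) feedback, 1 for 1-Back feedback.
    """
    events = []
    total = 0
    for trial in range(1, len(correct) + 1):
        target = trial - lag  # the trial this feedback is about
        if target < 1:
            # No earlier trial exists yet, so nothing is shown.
            events.append((trial, None, None, total))
            continue
        # Assumed layout: odd trials on top, even trials on the bottom.
        position = "Top" if target % 2 == 1 else "Bottom"
        total += 1 if correct[target - 1] else -1
        events.append((trial, target, position, total))
    return events

zero_back = run_feedback([True, False, True], lag=0)
one_back = run_feedback([True, False, True], lag=1)
```

In the 1-Back schedule the feedback shown after Trial 2 refers to Trial 1 and is placed in the Top position, so the screen position, not the current stimulus, tells the participant which response is being scored.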
Instructions: 1-Back condition
Participants were told that they would categorize pixel boxes varying in size and dot density as Category A or Category B. They were told that A and B boxes would occur equally often, and that they would have to guess at first, but later would learn to respond correctly. They knew that they would gain or lose 1 point for correct and incorrect responses, respectively, and that they would receive a time-out for incorrect responses. They were told that errors would cost them points, and time to earn points, and that it could make their session longer. They were told that on Trial 2 they would receive feedback on their response on Trial 1, and on Trial 3 receive feedback on their response on Trial 2. They were told that their feedback would always lag one trial behind throughout the task. They were told that even though the boxes alternated top and bottom on the screen, this had nothing to do with their Category A or Category B status. The alternation was used to help them keep track of whether the feedback applied to a top or bottom trial.
Instructions: 0-Back condition
The instructions were similar in many respects for the 0-Back participants, except that they were simply told that they would receive feedback on their responses after each trial.
Following Maddox and Ashby (1993), we fit rule-learning and procedural-learning formal models to each participant’s last 100 categorization responses. The rule-learning model assumes that participants set a criterion on one stimulus dimension. This unidimensional criterion can be visualized as a vertical or horizontal line through the stimulus space of Fig. 1. The modeling specifies the horizontal or vertical line that would best partition the participant’s Category A and Category B responses. The rule-learning model has two free parameters: a perceptual noise variance and a criterion value on the relevant dimension. The procedural-learning model assumes that participants partition the stimulus space consonant with a diagonal decision boundary of some slope and intercept. The modeling lets us specify the line of any slope and intercept that best partitions the participant’s Category A and Category B responses. The procedural-learning model has three free parameters: a perceptual noise variance and the slope and intercept of the decision boundary.
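The two model classes can be sketched as response-probability functions. This is one common decision-bound formulation, given here under stated assumptions and not as the authors' exact implementation. The probability of a Category B response is taken to be the normal CDF of the stimulus's signed distance from the boundary, scaled by perceptual noise. The sketch uses a noise standard deviation where the text speaks of a noise variance. The sign conventions (high values on the rule dimension go to B, and stimuli below the diagonal go to B) and all function names are assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_b_rule(stimulus, criterion, noise_sd, dim=0):
    """Rule model: criterion on one dimension (dim=0 for X, dim=1 for Y)."""
    return normal_cdf((stimulus[dim] - criterion) / noise_sd)

def p_b_procedural(stimulus, slope, intercept, noise_sd):
    """Procedural model: linear boundary y = slope * x + intercept."""
    x, y = stimulus
    # Signed perpendicular distance; positive when the stimulus lies below the line.
    signed_distance = (slope * x + intercept - y) / math.sqrt(1.0 + slope ** 2)
    return normal_cdf(signed_distance / noise_sd)

def log_likelihood(stimuli, responses, p_b):
    """Sum of log probabilities of the observed 'A'/'B' responses under a model."""
    total = 0.0
    for stimulus, response in zip(stimuli, responses):
        p = min(max(p_b(stimulus), 1e-10), 1.0 - 1e-10)  # guard against log(0)
        total += math.log(p if response == "B" else 1.0 - p)
    return total

# Hypothetical data: four stimuli and the responses given to them.
stimuli = [(30, 50), (70, 50), (45, 80), (60, 20)]
responses = ["A", "B", "A", "B"]
ll_rule = log_likelihood(stimuli, responses, lambda s: p_b_rule(s, 50.0, 5.0))
ll_proc = log_likelihood(stimuli, responses, lambda s: p_b_procedural(s, 1.0, 0.0, 5.0))
```

Fitting then amounts to searching the two or three free parameters for the values that maximize the log-likelihood of a participant's last 100 responses.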
The modeling yields the best fitting decision boundary that summarizes the partition between the categories the participant achieved. This boundary summarizes category performance. However, participants may not learn this boundary, or use this boundary, or have this boundary as any aspect of their category knowledge. In particular, in the case of II learning, participants learn in essence SR associations, the correct response mapping to many category instances. This will produce a category partition that the model captures as a boundary and that we draw in the figures below, but this boundary almost certainly has no role in the person’s II categorization performance, and no place in their II category knowledge.
We estimated the best fitting values for the free parameters in the models using the method of maximum likelihood. The process of model fitting asked which model would have created, with maximum likelihood, the distribution of Category A–B responses the participant produced. The best fitting model was chosen as the one with the smallest Bayesian information criterion (BIC; Schwarz, 1978), which is defined as BIC = r ln N − 2 ln L, where r is the number of free parameters, N is the sample size, and L is the likelihood of the model given the data.
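The BIC comparison can be made concrete with a short sketch. The log-likelihood values below are hypothetical and chosen only to show how the penalty term works. The parameter counts (two for the rule model, three for the procedural model) and the 100 modeled trials come from the text.

```python
import math

def bic(n_free_params, n_obs, log_likelihood):
    """BIC = r ln N - 2 ln L; smaller values indicate the preferred model."""
    return n_free_params * math.log(n_obs) - 2.0 * log_likelihood

def select_model(fits, n_obs):
    """Pick the model name with the smallest BIC.

    fits: dict mapping model name -> (number of free parameters, maximized ln L).
    """
    return min(fits, key=lambda name: bic(fits[name][0], n_obs, fits[name][1]))

# Hypothetical maximized log-likelihoods for one participant's last 100 trials.
fits = {"rule": (2, -40.0), "procedural": (3, -38.0)}
best = select_model(fits, n_obs=100)
print(best, round(bic(2, 100, -40.0), 2), round(bic(3, 100, -38.0), 2))
```

In this made-up case the procedural model fits slightly better (ln L of −38 versus −40), but its extra parameter costs ln 100 ≈ 4.61, so the rule model has the smaller BIC (about 89.21 versus 89.82) and would be selected.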
The proportion of correct responses was examined across twenty-four 20-trial blocks in a general linear model (GLM), with task (RB, II) and reinforcement condition (0-Back, 1-Back) as between-participant factors and trial block as a within-participant factor. The significant task main effect, F(1, 116) = 65.423, p < .001, ηp² = .979, confirmed that RB learning was generally stronger than II learning. This is a ubiquitous finding in this cognitive-neuroscience research area because humans’ rule-based processes for category learning are insistent within their cognitive system and privileged in category learning. The significant reinforcement main effect, F(1, 116) = 10.522, p = .002, ηp² = .881, confirmed the intuitive result that 0-Back reinforcement generally produced stronger category learning. The significant trial block main effect, F(23, 2668) = 30.535, p < .001, ηp² = .208, confirmed that learning occurred.
A careful examination of the learning curves shows very fast learning in the RB task with 0-Back feedback. By Block 4, group performance is at a high level of accuracy, suggesting that most participants have discovered the dimension of importance and found a good decision boundary. With 1-Back feedback, RB task participants are slower to come to this discovery, but by Block 10 they are also at a highly accurate performance level. The slower discovery of the dimension of interest is not surprising. The 1-Back condition places greater demands on working memory, and the nature of the feedback can be initially confusing. Examination of the II task condition with 0-Back feedback suggests a much more gradual learning curve that continues to improve slowly through the blocks, though by the end it has not reached the very high accuracy seen in both RB tasks. On the other hand, in the II condition with 1-Back feedback, what little learning takes place seems to happen within the first few blocks, and then performance remains largely constant until the end of the blocks.
We further summarized this result in two ways: first by focusing on participants’ performance during their last trial block and second by examining the change in performance between the first block and the last block. In the last block, participants averaged .920, 95% CI [.870, .970]; .900, 95% CI [.835, .965]; .733, 95% CI [.687, .780]; and .653, 95% CI [.608, .699] correct in the RB-0, RB-1, II-0, and II-1 conditions, respectively. When we examined change in accuracy across the experiment (first block subtracted from last block), we saw a similar pattern. Participants’ average change was .300, 95% CI [.232, .368]; .328, 95% CI [.250, .407]; .207, 95% CI [.138, .275]; and .130, 95% CI [.055, .205] in the RB-0, RB-1, II-0, and II-1 conditions, respectively. Taken together, these patterns suggest that 1-Back reinforcement had essentially no cost to final RB learning but a substantial cost to II learning. The modeling analyses reveal the true extent of this cost.
We modeled participants’ last 100 trials to determine whether they adopted appropriate decision strategies and whether different reinforcement regimens changed their decision strategies in a theoretically meaningful way.
Figure 4b shows modeling results for the RB 1-Back participants. This panel looks like that in Fig. 4a. Twenty-five of the 30 decision boundaries were vertical, indicating the participant’s appreciation of the task’s appropriate X-dimension rule. Three participants misconstrued the task and performed according to a Y-dimension rule (horizontal decision boundaries). One participant was best fit by a procedural-learning model and showed a diagonal decision boundary. There was one random-guessing participant again (no decision boundary).
Overall, the modeling results of the RB tasks confirmed the performance results. That is, participants were easily and equally able to learn the RB task’s size-rule solution under conditions of 0-Back and 1-Back reinforcement. The lagged reinforcement did not alter the character of their final learning. We confirmed this result statistically with a chi-square test. We used the number of participants who were best fit by the X-dimension rule, the Y-dimension rule, and the procedural-learning model in the RB 0-Back condition as the expected category observations, and the number in each category in the RB 1-Back condition as the observed values. These numbers were not significantly different between the conditions: χ²(2, N = 29) = 2.924, p = .404; w = .318.
Figure 4c shows modeling results for the II 0-Back participants. Eighteen of the participants were best fit by a procedural-learning model that indicated a diagonal decision boundary through the stimulus space. Most of these decision boundaries were organized along the stimulus space’s major diagonal. These participants found a way to integrate the informational signals provided by the two stimulus dimensions toward making appropriate category decisions. However, as is always true in experiments of this kind, some humans insisted on imposing adventitious unidimensional rules onto the II structure. In this case, 12 of the participants were best fit by a rule model that indicated for them either a horizontal or a vertical decision boundary. Humans’ rule-seeking category-learning system is insistent and can be dominant even when the result is suboptimal performance. This “misbehavior” by humans in the II task is another indication of the dissociative aspect of humans’ category learning that is the focus of this research.
Figure 4d shows modeling results for the II 1-Back participants. This panel does not look like that in Fig. 4c. Now, only 10 of the participants were best fit by a procedural-learning model of any slope. From the II 0-Back to the II 1-Back condition, the number of sloped decision boundaries was essentially halved. Now, 19 of the participants were best fit by a rule model, showing an adventitious, inappropriate vertical or horizontal decision boundary. From the II 0-Back to the II 1-Back condition, this inappropriate use of a rule framework essentially doubled. This condition also contained one random-guessing participant (no decision boundary drawn).
In fact, II 1-Back learning was even weaker than these counts suggest. Only two participants, compared to 15 participants in the II 0-Back condition, showed the positively sloped diagonal boundary that would suggest any appreciation of the II task’s true underlying category structure. One may almost say that 1-Back reinforcement switched off true II category learning completely and qualitatively. Instead, participants defaulted to a rule strategy with decision boundaries dividing Dimension X or Dimension Y. They may have defaulted to the only categorization strategy that was available to them under 1-Back reinforcement. They had to hold in working memory a description of what they had done on the preceding trial so that the lagged reinforcement, when it finally came, could still support continuing category learning. As suggested by multiple-process theory, this description apparently had the form of a one-dimensional rule; it certainly did not have the form of an appropriate integrative principle across the dimensions.
Overall, the modeling results for the II tasks confirmed the performance results. That is, participants were neither easily nor equally able to learn the II task’s appropriate diagonal partition under 0-Back and 1-Back reinforcement. The lagged reinforcement altered their pattern of learning. We confirmed this result statistically using an analysis similar to that used for RB participants, χ²(2, N = 29) = 8.306, p = .016, w = .535.
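As a quick arithmetic check, the reported p value and effect size can be recovered from the reported chi-square statistic alone. The sketch below relies on two standard facts: with exactly 2 degrees of freedom the chi-square upper-tail probability is exp(-χ²/2), and Cohen's w equals the square root of χ²/N. The function names are illustrative, not part of the original analysis.

```python
import math

def chi2_upper_tail_df2(chi2):
    """Upper-tail probability of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi2 / 2.0)

def cohens_w(chi2, n):
    """Cohen's w effect size for a chi-square statistic based on n observations."""
    return math.sqrt(chi2 / n)

p_value = chi2_upper_tail_df2(8.306)
effect_size = cohens_w(8.306, 29)
print(round(p_value, 3), round(effect_size, 3))  # 0.016 0.535
```

Both values match those reported, which is consistent with N = 29 referring to the II 1-Back participants after the single guessing participant was excluded.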
To further quantify this seeming difference in strategy use, we considered the performance accuracy on the modeled trials (the last 100) of only the participants who were best fit by the “correct” strategy model for their condition (RB participants best fit by an X-dimension rule, II participants best fit by a procedural-learning model). We conducted a GLM using categorization task (RB, II) and reinforcement (0-Back, 1-Back) as the independent variables. The significant main effect of task, F(1, 72) = 213.000, p < .001, ηp² = .747, confirmed that participants in the RB conditions performed more accurately. The significant reinforcement main effect, F(1, 72) = 23.883, p < .001, ηp² = .248, confirmed that 0-Back reinforcement produced stronger category learning. The significant Task × Reinforcement interaction, F(1, 72) = 10.635, p < .001, ηp² = .128, suggested that the effect of reinforcement type on accuracy was different depending on whether participants were “correctly” best fit by the procedural learning or the X-dimension rule model. Planned comparisons found that RB performance levels were not statistically different, t(46) = 1.375, p = .173, Cohen’s d = 0.485. As seen in Fig. 4, both 0-Back and 1-Back X-rule participants showed equivalently accurate performance and similarly correct placement of the rule boundary (.968, 95% CI [.957, .978] and .941, 95% CI [.910, .972] for 0-Back and 1-Back, respectively). However, II performance levels were statistically different, t(26) = 5.050, p < .001, Cohen’s d = 1.640. Even for participants best fit by an II model, 1-Back reinforcement significantly impaired their ability to learn the correct decision boundary (.779, 95% CI [.735, .823] and .644, 95% CI [.590, .698] for 0-Back and 1-Back, respectively).
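The partial eta squared values can be approximately recovered from the reported F ratios and degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three reported effects. The dictionary keys are illustrative labels, and the recovered values agree with the reported ones only to within about .002, a gap that plausibly reflects rounding in the reported statistics.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (reported F, reported partial eta squared), each with df = (1, 72)
reported = {
    "task": (213.000, 0.747),
    "reinforcement": (23.883, 0.248),
    "task_x_reinforcement": (10.635, 0.128),
}

for effect, (f_value, reported_eta) in reported.items():
    recovered = partial_eta_squared(f_value, 1, 72)
    print(f"{effect}: recovered {recovered:.3f}, reported {reported_eta:.3f}")
```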
Indeed, the modeling results strengthen the study’s theoretical interpretation beyond the accuracy-based analyses. The .65 accuracy achieved by II 1-Back participants definitely does not signify 65% successful and appropriate II learning. It signifies heavy dependence on adventitious rules, and it signifies heavy dependence on the wrong information-processing strategy for the II task. Really, there was almost no successful II learning in this condition. Implicit-procedural learning was disabled by 1-Back reinforcement.
Cognitive science often expresses its preference for unitary codes in knowledge representation and for single, all-explanatory learning/memory systems. It is a central issue whether minds are parsimonious in this way, or whether minds have accumulated many useful, nonparsimonious apps during cognitive evolution. The debate over multiple categorization systems reflects this tension again. Thus, categorization researchers continue to seek strong dissociative paradigms to determine whether multiple, qualitatively different processes are suggested. We introduced a new paradigm here.
We predicted that 1-Back reinforcement would disable associative, reinforcement-driven learning and the II category-learning processes that depend on it. This disabling seems to have been complete.
We predicted that RB participants could hold their provisional category rule in working memory, making it accessible for evaluation under 0-Back or 1-Back reinforcement. RB learning survived 1-Back reinforcement. The dissociation from these two results combined provides new support for a multiple-process conception of human categorization. Different category tasks foster different category-learning processes.
We predicted that participants might fall back, by information-processing necessity, to rule strategies when 1-Back reinforcement disabled implicit-procedural learning. This implicit system cannot bridge between a past stimulus–response pair and future reinforcement. Working memory can, but it has been known since Bruner, Goodnow, and Austin (1956) that humans’ explicit classificatory rules are low-dimensional or one-dimensional. In fact, II 1-Back participants largely turned toward one-dimensional rules.
A broader class of learning paradigms
Our paradigm is complementary to others pursuing a similar theoretical goal—to block the influence of immediate reinforcement and foster the recruitment of explicit-declarative learning processes instead. These complementary tasks lie along a spectrum. Our task here separated the trial from its reinforcement through a one-trial lag. It minimally separated trials from reinforcement, it maximally integrated reinforcement into the steady-state trial environment, and it let reinforcement maximally energize and motivate task participation.
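The one-trial lag can be pictured as a simple delivery schedule. The following sketch is only an illustration of the idea, not the experiment's actual software: `feedback_schedule` is a hypothetical helper that reports, for each trial, which trial's outcome is delivered after the current response. How the final trial's pending feedback was handled is not specified in this section, so the sketch simply leaves it undelivered.

```python
def feedback_schedule(n_trials, lag):
    """For each trial t, return the index of the trial whose feedback is
    delivered after the response on trial t, or None if nothing is due yet."""
    return [t - lag if t - lag >= 0 else None for t in range(n_trials)]

print(feedback_schedule(5, 0))  # [0, 1, 2, 3, 4]    (0-Back: immediate)
print(feedback_schedule(5, 1))  # [None, 0, 1, 2, 3] (1-Back: one-trial lag)
```

Under the lagged schedule every reinforcement event still occurs inside the ongoing trial stream, but it never follows the stimulus and response it refers to.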
In an intermediate manipulation, Smith et al. (2014) created a trial block of separation. Participants completed a block of trials before feedback. At block’s end, they received all their rewards clustered and then all their penalty time-outs clustered. Feedback was temporally displaced and scrambled out of trial-by-trial order, doubly defeating stimulus–response learning. However, now reinforcement could only sporadically motivate performance. And now, the instructional set communicated to the participant, and their self-control in executing it, carried a heavier burden.
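The clustering of deferred feedback can be sketched in the same spirit. This is an assumed, minimal illustration of the description above rather than the procedure's actual implementation. The labels "reward" and "timeout" and the function name are hypothetical.

```python
def clustered_block_feedback(outcomes):
    """Given per-trial outcomes for one block (True = correct), return the
    end-of-block feedback sequence: all rewards first, then all time-outs."""
    n_correct = sum(1 for correct in outcomes if correct)
    n_errors = len(outcomes) - n_correct
    return ["reward"] * n_correct + ["timeout"] * n_errors

block = [True, False, True, True, False]
print(clustered_block_feedback(block))
# ['reward', 'reward', 'reward', 'timeout', 'timeout']
```

Because only the counts of correct and incorrect trials survive, the delivered sequence carries no information about which trial earned which outcome.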
In an extreme manipulation, Ashby, Isen, et al. (1999; Ashby, Queller, et al., 1999) created ultimate separation by eliminating feedback entirely through an unsupervised-learning paradigm. Now, the burden on the communicated instructional set and its self-management was very heavy. No reinforcement helped motivate performance. Nonetheless, this technique powerfully elicited explicit-declarative processes in category learning from adult, cognitively sophisticated humans.
These tasks share a goal and a family resemblance, while differing in how far they distance reinforcement, how well they still let reinforcement energize performance, and how demanding they are that participants receive, accept, and execute an elaborate cognitive set. Using varied means, they all disrupt the temporal contiguity of the reinforcement signal, disable the reinforcement-binding properties, and prompt a transition to alternative, explicit learning processes. In a sense, these paradigms all seek to replace concrete reinforcement with feedback (or self-feedback) that has a purely informational function, so that it supports learning at an explicit level even if it can no longer support learning at an associative level. The crux of all these paradigms is to keep immediate reinforcement at a “safe” methodological and theoretical distance that rules out associative learning. Then, feedback provides food for thought, not fuel for habit formation. Collectively, these paradigms are progressively combining into a persuasive and conclusive dissociative framework within the categorization literature.
Empirical and theoretical extensions
As specific paradigms, these techniques naturally have their different strengths and weaknesses. For example, unsupervised learning is a powerful way to dissociate away associative learning—it eliminates concrete reinforcement entirely. However, it demands sophisticated participants. In contrast, the 1-Back technique would suit other populations. It integrates reinforcement more thoroughly. It energizes performance more encouragingly. It depends less on the experimenter-participant instructional/social contract. Accordingly, the 1-Back technique shows promise for less sophisticated, less verbal populations. For instance, developmentalists could use it to explore the earliest roots of explicit-declarative cognition in children. It is not known at what age children can first supply their own hypotheses and cognitive construals when reinforcement-driven learning is disallowed. Yet this is an important developmental step because self-directed, self-construed learning is an essential human capacity. Young children might not self-sustain interest or effort without ongoing reinforcement (stickers!) such as the 1-Back task would provide.
Our paradigm also has implications for comparative psychology. A problem faced in behavioral research is that animals’ performances might reflect either their higher-level cognitive processes or their reinforcement-driven behavioral reactions. There is always the possibility that immediate reinforcement is the true underlying engine of behavior and the integrator of stimulus–response (SR) bonds during learning. Moreover, this problem has often been considered inexorable, given the broad belief that immediate reinforcement is indispensable because it is the reason that animals perform and learn. However, our approach shows that simple dissociative paradigms can be developed that transcend reinforcement-driven learning while sustaining interest and motivation. One can then ask whether animals, in that circumstance, have another level and kind of learning that can replace it.
Our paradigm could be applied to any species that is capable of discrimination learning in simple two-response tasks. Therefore, one could also provide to comparative theory a phylogenetic map of explicit-declarative cognition. One could ask which vertebrate lines are capable of engaging in something like explicit-declarative cognition by asking which lines learn successfully under 1-Back reinforcement. This could be related to their known evolutionary histories and to their frontal-cortical development, also tracing the neuroscientific emergence of explicit cognition during cognitive evolution.
Thus, we believe that the present dissociative paradigm—the 1-Back methodology—represents a complementary methodology of interest to cognitive, comparative, and developmental psychologists, and to many biobehavioral researchers, too. Indeed, the empirical power to qualitatively unplug and shut down associative learning, using techniques like 1-Back feedback that require feedback to be interpreted informationally and explicitly, could become a powerful tool in the next epoch of theoretical development in biobehavioral research (Smith & Church, 2017).
Individuals who best fit the guessing model were not included in the chi-square analyses. This made the analyses more conservative in relation to our hypotheses.
The preparation of this article was supported by Grants HD-060563 and HD-061455 from NICHD, and Grant BCS-0956993 from the National Science Foundation. We want to thank the research assistants in the Complex Cognition Lab at Georgia State University for their help with data collection. Original data and code are available upon request from the first author.
- Alexander, G. E., DeLong, M. R., & Strick, P. L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9, 357–381. doi: https://doi.org/10.1146/annurev.ne.09.030186.002041
- Arbuthnott, G. W., Ingham, C. A., & Wickens, J. R. (2000). Dopamine and synaptic plasticity in the neostriatum. Journal of Anatomy, 196, 587–596. doi: https://doi.org/10.1046/j.1469-7580.2000.19640587.x
- Ashby, F. G., & Ennis, J. M. (2006). The role of the basal ganglia in category learning. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 46, pp. 1–36). San Diego: Academic Press.
- Brooks, L. R. (1978). Nonanalytic concept formation and memory for instances. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 169–211). Hillsdale: Erlbaum.
- Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. Oxford: Wiley.
- Cools, A. R., van den Bercken, J. H., Horstink, M. W., van Spaendonck, K. P., & Berger, H. J. (1984). Cognitive and motor shifting aptitude disorder in Parkinson’s disease. Journal of Neurology, Neurosurgery, and Psychiatry, 47, 443–453. Retrieved from http://jnnp.bmj.com/content/jnnp/47/5/443.full.pdf
- Eacott, M. J., & Gaffan, D. (1992). Inferotemporal-frontal disconnection: The uncinate fascicle and visual associative learning in monkeys. European Journal of Neuroscience, 4, 1320–1332. doi: https://doi.org/10.1111/j.1460-9568.1992.tb00157.x
- Fukunaga, K. (1972). Introduction to statistical pattern recognition. New York: Academic Press.
- Fuster, J. M. (1989). The prefrontal cortex (2nd ed.). Philadelphia: Lippincott-Raven.
- Gaffan, D., & Eacott, M. J. (1995). Visual learning for an auditory secondary reinforcer by macaques is intact after uncinate fascicle section: Indirect evidence for the involvement of the corpus striatum. European Journal of Neuroscience, 7, 1866–1871. doi: https://doi.org/10.1111/j.1460-9568.1995.tb00707.x
- Goldman-Rakic, P. S. (1987). Circuitry of the prefrontal cortex and the regulation of behavior by representational knowledge. In F. Plum & V. Mountcastle (Eds.), Handbook of physiology (pp. 373–417). Bethesda: American Physiological Society.
- Han, C. J., O’Tuathaigh, C. M., van Trigt, L., Quinn, J. J., Fanselow, M. S., Mongeau, R., … Anderson, D. J. (2003). Trace but not delay fear conditioning requires attention and the anterior cingulate cortex. Proceedings of the National Academy of Sciences of the United States of America, 100, 13087–13092. doi: https://doi.org/10.1073/pnas.2132313100
- Kolb, B., & Whishaw, I. Q. (1990). Fundamentals of human neuropsychology (3rd ed.). New York: Freeman.
- Konorski, J. (1967). Integrative activity of the brain. Chicago: University of Chicago Press.
- Maddox, W. T., & Ing, A. D. (2005). Delayed feedback disrupts the procedural-learning system but not the hypothesis testing system in perceptual category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 100–107. doi: https://doi.org/10.1037/0278-7393.31.1.100
- Medin, D. L. (1975). A theory of context in discrimination learning. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 9, pp. 263–314). New York: Academic Press.
- Mishkin, M., Malamut, B., & Bachevalier, J. (1984). Memories and habits: Two neural systems. In G. Lynch, J. L. McGaugh, & N. M. Weinberger (Eds.), Neurobiology of human learning and memory (pp. 65–88). New York: Guilford Press.
- Murphy, G. L. (2003). The big book of concepts. Cambridge: MIT Press.
- Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of multiple-system phenomena in perceptual categorization. Psychonomic Bulletin & Review, 7, 375–402. Retrieved from http://psiexp.ss.uci.edu/research/teaching/Nosofsky_Johansen_2000.pdf
- Packard, M. G., & McGaugh, J. L. (1992). Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: Further evidence for multiple memory systems. Behavioral Neuroscience, 106, 439–446. doi: https://doi.org/10.1037/0735-7044.106.3.439
- Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. London: Oxford University Press.
- Pearce, J. M. (1994). Discrimination and categorization: Animal learning and cognition. In N. J. Mackintosh (Ed.), Handbook of perception and cognition series (2nd ed., Vol. 18, pp. 109–134). San Diego: Academic Press.
- Rao, S. M., Bobholz, J. A., Hammeke, T. A., Rosen, A. C., Woodley, S. J., Cunningham, J. M., … Binder, J. R. (1997). Functional MRI evidence for subcortical participation in conceptual reasoning skills. NeuroReport, 27, 1987–1993. doi: https://doi.org/10.1097/00001756-199705260-00038
- Robinson, A. L., Heaton, R. K., Lehman, R. A. W., & Stilson, D. W. (1980). The utility of the Wisconsin Card Sorting Test in detecting and localizing frontal lobe lesions. Journal of Consulting and Clinical Psychology, 48, 605–614. doi: https://doi.org/10.1037/0022-006X.48.5.605
- Rolls, E. T. (1994). Neurophysiology and cognitive functions of the striatum. Revue Neurologique, 150, 648–660.
- Smith, J. D., Beran, M. J., Crossley, M. J., Boomer, J., & Ashby, F. G. (2010). Implicit and explicit category learning by macaques (Macaca mulatta) and humans (Homo sapiens). Journal of Experimental Psychology: Animal Behavior Processes, 36, 54–65. doi: https://doi.org/10.1037/a0015892
- Smith, J. D., & Church, B. A. (2017). Dissociable learning processes in comparative psychology. Psychonomic Bulletin & Review. Advance online publication. doi: https://doi.org/10.3758/s13423-017-1353-1
- Smith, J. D., Coutinho, M. V. C., & Couchman, J. J. (2011). The learning of exclusive-or categories by monkeys (Macaca mulatta) and humans (Homo sapiens). Journal of Experimental Psychology: Animal Behavior Processes, 37, 20–29. doi: https://doi.org/10.1037/a0019497
- Thorndike, E. L. (1911). Animal intelligence. New York: Macmillan.
- Wickens, J. (1993). A theory of the striatum. New York: Pergamon Press.
- Yagishita, S., Hayashi-Takagi, A., Ellis-Davies, G. C., Urakubo, H., Ishii, S., & Kasai, H. (2014). A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science, 345, 1616–1620. doi: https://doi.org/10.1126/science.1255514