Introduction

In humans, implicit learning is non-episodic learning of complex information in an incidental manner, without awareness of what has been learned (Seger, 1994). Learning is thought to be explicit if subjects can verbalize the activity or rule. This definition implies that one can attend to, or is conscious of, the learning. Neal and Hesketh (1997) suggest that in explicit learning the emphasis should be shifted from attention to intention, another term that implies consciousness. In contrast, implicit learning is thought to be a process in which information is abstracted out of the environment without recourse to explicit strategies for responding or systems for recoding the stimuli (Reber, 1967). De Houwer (2009) proposes that most learning, with the exception of learning by very simple organisms (e.g., Aplysia), can be described as propositional. Propositional models imply reasoning, and according to De Houwer (2009), they provide a better account of associative learning than implicit association formation theory, not only in humans but also in other animals. Shanks (2010) largely agrees, but suggests that it may not be possible to disentangle implicit and explicit learning because explicit cognitive constructs, such as attention and awareness, may also involve implicit associations.

As the various approaches to research with humans show, the distinction between implicit and explicit learning is difficult to make. Complicating matters further, Stadler (1997) reported that some subjects find explicit regularities in the task that are not actually present (explicit rules that turn out to be incorrect), whereas other subjects, who cannot verbalize the pattern and whose improvement in performance as measured by reaction time would be thought to be purely implicit, later realize that they knew more about the task than they first thought.

Although the boundary conditions between implicit and explicit learning may be difficult to define, an example of the distinction that most people have encountered may be helpful. As a child, one may learn to tie one’s shoes explicitly (by verbal rules or motor imitation), but as one gets older, the action of tying one’s shoes becomes automatic or implicit. If, as an adult, one is asked, “How does one tie one’s shoes?” the process may be difficult to describe (explicitly). In fact, it may be easier to go through the motions (implicitly) and, while so doing, describe one’s actions, rather than trying to describe tying one’s shoes without the actions. What has happened is that implicit learning has taken over from explicit learning. Of course, the explicit knowledge can be recovered, as one often must do when describing how to tie a shoe to a child.

In the present context, implicit learning falls within the scope of associative learning. Although associative learning is agnostic as to the distinction between conscious and subconscious learning, associative learning is generally thought to be a gradual process. Of course, it is also possible for very rapid associative learning to occur (e.g., conditioned taste aversion; Garcia & Koelling, 1966), and explicit learning can be gradual, when applied to concept learning tasks (consider, e.g., a child learning the concept of dog). Nevertheless, the distinction between gradual and all-or-none learning may be useful.

Given that distinguishing implicit from explicit learning in humans is not always easy, the task of trying to identify explicit learning in nonverbal animals is even more difficult. Several indirect procedures have been proposed, however, as sources of evidence for explicit learning in nonverbal animals.

One such procedure involves the use of a categorical discrimination in which stimuli fall along a continuum and the task requires that each stimulus be assigned to one of two response categories depending on which side of a criterial value it falls (e.g., Smith et al., 1995, 1997). At stimulus values far from the criterial value, the discrimination is easy. When the value of the stimulus approaches the criterial value, however, the discrimination is more difficult, and errors tend to occur. It has been suggested that the inclusion of an “uncertain” response may allow an animal to avoid these difficult discriminations (e.g., Smith et al., 1995, 1997). It is further proposed that the use of an uncertain response would require the use of controlled decisional mechanisms involving the explicit monitoring of cognition (e.g., “do I know into which category the stimulus should go or should I make the ‘uncertain’ response”).

When humans are subjects, it is assumed that they would rather avoid making an incorrect categorizing response and instead choose the uncertain response because by doing so they would be able to skip to the next trial. But nonhuman animals are typically working for food as a reinforcer, and when the discrimination is difficult, choosing between the categories at random would still provide reinforcement 50% of the time. For this reason, there must be some incentive for the animal to choose the uncertain response. When the uncertain response has been allowed with animals, outcomes for choosing the uncertain response may involve avoiding the aversive timeout that follows an incorrect response and often getting an easy-to-categorize stimulus on the next trial (e.g., Couchman et al., 2010; Smith et al., 1995, 1997). Alternatively, choosing the uncertain response may result in obtaining a reward smaller than the reward for correct categorization (Foote & Crystal, 2007; Hampton, 2001). In fact, it can be argued that in order to encourage the use of the uncertain response, the reward for making the uncertain response should be worth more than a 50% chance of getting the reward for correct categorization; otherwise, the uncertain response may never be made.

It can be argued, however, that with reinforcement for making the uncertain response, the choice among three responses allows for a clear implicit account. With practice, each stimulus would be associated with one of the three response alternatives according to its maximum reinforcement value. When the stimuli are close to the criterial value, the value of a categorical response declines to near 50%; if the value of the uncertain response exceeds the value of 50% reinforcement, an uncertain response should be made (see Foote & Crystal, 2007; Jozefowiez et al., 2009; Le Pelley, 2012; Smith et al., 2008). Thus, the use of an uncertain response for stimuli close to the categorical boundary does not imply that the response was made explicitly.
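The implicit account above can be illustrated with a minimal sketch in which each stimulus is simply mapped to whichever response has the highest expected reinforcement. The logistic accuracy function, the criterion of 5.0, the slope, and the uncertain-response value of 0.6 are all hypothetical choices made for illustration; none of them comes from the studies cited.

```python
import math

def p_correct(stimulus: float, criterion: float, slope: float = 1.0) -> float:
    """Hypothetical accuracy of the better categorical response.

    Equals 0.5 at the criterial value and approaches 1.0 far from it.
    """
    return 1.0 / (1.0 + math.exp(-slope * abs(stimulus - criterion)))

def choose(stimulus: float, criterion: float = 5.0,
           uncertain_value: float = 0.6, slope: float = 1.0) -> str:
    """Pick the response with the highest expected reinforcement.

    The full reward for a correct categorization is taken as 1.0, so the
    expected value of categorizing is just p_correct. uncertain_value is the
    (assumed) expected value of the uncertain response on the same scale.
    """
    value_categorize = p_correct(stimulus, criterion, slope)
    if uncertain_value > value_categorize:
        return "uncertain"
    return "low" if stimulus < criterion else "high"

if __name__ == "__main__":
    for s in (1.0, 4.7, 5.0, 5.3, 9.0):
        print(s, choose(s))
```

No monitoring of one's own knowledge is represented anywhere in this sketch; uncertain responses near the boundary fall out of the comparison of reinforcement values alone, and they disappear if `uncertain_value` is set below 0.5.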

A second presumed source of evidence for a nonverbal distinction between implicit and explicit learning comes from performance on rule-based category-learning tasks, as compared to information-integration tasks. In a rule-based task, for example, stimuli are varied along two dimensions (e.g., size and brightness), but only one of the dimensions is relevant (e.g., size). Thus, a simple rule can govern the correct response (e.g., choose left when the rectangle is large, choose right when the rectangle is small; see Fig. 1a). In an information-integration task, however, the stimuli are varied along two dimensions and both dimensions are relevant, such that no simple category rule will work (see Fig. 1b). There is evidence that information-integration tasks of this kind are learned more slowly by humans than rule-based tasks, presumably because in the case of information-integration tasks there is no simple rule that can be used to categorize the stimuli, so learning is hypothesized to be implicit (Smith et al., 2011). Thus, according to Smith et al., slow learning on the information-integration task suggests it is learned implicitly.

Fig. 1

Rule-based and information-integration category tasks illustrated within a two-dimensional stimulus space. The gray and black symbols represent responses to stimuli from Category A and Category B, respectively (after Smith et al., 2018)

Importantly, when pigeons have been trained on these tasks, both tasks have been found to be acquired at about the same rate (Smith et al., 2011). Thus, it is assumed that the pigeons acquire both tasks implicitly. Le Pelley et al. (2019) have noted, however, that the information-integration task is inherently more difficult for humans than the rule-based task. When they corrected for the difference in task difficulty, they found that for humans there was similar accuracy on both rule-based and information-integration tasks (see also Wills et al., 2019). Thus, these tasks may not differ inherently in the way that they are learned by humans, implicitly or explicitly.

A different approach to making a distinction between implicit and explicit learning was developed by Smith et al. (2020). They proposed that implicit learning relies on temporally contiguous reinforcement, such that one should be able to interfere with implicit learning by delaying feedback following a response. This hypothesis is based on research suggesting that the implicit system relies on direct neural connections to reward centers in the brain (Arbuthnott et al., 2000; Calabresi et al., 1996) and if reinforcement lags, it should not be possible to strengthen the synapses that contribute to implicit learning (Smith et al., 2020). Smith et al. (2020) argue that for the implicit system to function, the relevant cortical representation must still be active, and the signal for reinforcement must arrive promptly (within about 2 s).

With the goal of ruling out implicit learning, Smith et al. (2020) trained monkeys on various conditional discriminations (e.g., in the presence of stimulus A a response to stimulus X, but not stimulus Y, is reinforced, whereas in the presence of stimulus B a response to stimulus Y, but not stimulus X, is reinforced) in which the subjects received feedback for their response on Trial 1 only after completing Trial 2, feedback for their response on Trial 2 only after completing Trial 3, and so forth. Smith et al. (2020) found that three of their four monkeys were quite good at learning these 1-back reinforcement discriminations (see also Smith et al., 2006) and took this as evidence of explicit learning. This conclusion relies on the assumption that the 1-back reinforcement task can only be learned explicitly. Alternatively, the 1-back reinforcement may merely have made implicit learning of the task more difficult.

Pigeons are generally thought to be implicit learners (Jozefowiez et al., 2009; Smith et al., 2011). However, Nosarzewska et al. (2021) found that pigeons, too, are able to learn this 1-back reinforcement task, albeit slowly and to a modest level (see also Zentall et al., in press). Thus, it may be that implicit learning is not disabled by a meaningful delay between the response and the reinforcer.

A different way of characterizing implicit versus explicit learning is in the way learning progresses. One of the characteristics of many human explicit learning tasks is that learning is often abrupt (e.g., all-or-none; Kintsch, 1963), especially when a relatively simple rule is involved (Bower & Trabasso, 1964; Millward & Spoehr, 1973). Subjects typically try out one strategy at a time until they find the correct rule. Thus, with such a task, as humans are trying out incorrect rules, accuracy should be close to chance, but when they discover the correct rule, accuracy should almost immediately reach a high, nearly perfect level. This is quite different from the way pigeons learn this task.

The purpose of the present experiment was to compare the learning functions of humans with the learning functions previously obtained from pigeons (Zentall, Peng, & Mueller, in press). Do humans learn abruptly, or gradually the way pigeons presumably do? And if humans learn, are they able to articulate the rule?

A second purpose of the present experiment was to determine if the ability of human subjects to acquire the 1-back reinforcement task would be affected by instructions that they are given: none (analogous to the task for pigeons), “use your intuition” (thought to encourage subjects to use implicit learning), or previous trial hint instructions (thought to provide an explicit hint to task solution).

Methods

Subjects

The subjects were 103 University of Kentucky students obtained from the SONA psychology subject pool. Of the subjects, 66 identified as female and 39 as male. The ages of the participants ranged from 18 to 50 years. The University Internal Review Board approved the study, and written informed consent was obtained from all participants prior to participation in the experiment.

Apparatus and procedure

The experiment was conducted in a small windowless room 3.2 m wide × 2.3 m deep with a table on the long side divided in two by a partition that rested on the table, separating the subject from the experimenter. The task was a 1-back reinforcement, symbolic matching-to-sample task presented on a computer monitor (19 in. diagonal). The sample stimulus was a yellow (RGB values R = 255, G = 255, B = 0) or blue (RGB values R = 0, G = 0, B = 255) circle (3.8 cm diameter) that appeared in the center of the computer screen. A mouse click on the sample presented red (RGB values R = 255, G = 0, B = 0) and green (RGB values R = 0, G = 255, B = 0) comparison circles (3.8 cm diameter) to the left and right of the center circle, spaced 2.5 cm (edge to edge) from the center stimulus.

Subjects were randomly assigned to one of three instruction conditions. Subjects in the No Instruction Group were given no additional instructions (n = 34). Subjects in the Intuition Group saw “The best way to solve this task is to not overthink it. Go with your intuition” (n = 36). Subjects in the Previous Trial Hint Group saw “What you did on the last trial is important to the feedback you will get on the current trial” (n = 33). The number of males and females in each group was approximately the same.

At the start of each session the following instructions were printed on the computer screen for all subjects: “At the start of each trial, you will see either a yellow or a blue circle. Click on the circle. You will then see two circles, one red and the other green on either side. You should click on one of them.”

Clicking either comparison stimulus ended the trial and started a 3-s intertrial interval. No feedback was provided on the first trial. On the second trial, a randomly selected sample (yellow or blue) was presented. After the subject clicked the sample, red and green comparison stimuli were presented. After comparison choice on Trial 2, if the subject had chosen the correct comparison stimulus on Trial 1, a 1-s tone sounded and a “+1 point” visual stimulus appeared during the first second of the intertrial interval. An incorrect response on Trial N led to no feedback on Trial N + 1. The correct sample-comparison relations were determined randomly for each subject (red for yellow samples and green for blue samples, or green for yellow samples and red for blue samples). Trials proceeded in this way, with the tone and point feedback dependent on whether the response was correct on the preceding trial. “Total points obtained” also appeared on the right side of the screen and was updated following each “+1 point” printed to the screen. Each subject received 100 trials of this task. The task took approximately 10 min to administer. After task completion, each subject was asked “what rule did you use to solve the task?”
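The feedback contingency described above can be summarized in a minimal sketch. The function name and the list representation of a session are illustrative assumptions, not part of the experimental software.

```python
def one_back_feedback(correct: list[bool]) -> list[bool]:
    """Return, for each trial, whether the tone and '+1 point' are delivered.

    correct[i] is True when the comparison choice on trial i (0-indexed) was
    correct. Feedback after the choice on trial i depends only on trial i - 1,
    so the first trial never produces feedback.
    """
    return [False] + correct[:-1]

if __name__ == "__main__":
    # Hypothetical four-trial sequence: correct, error, correct, correct.
    session = [True, False, True, True]
    feedback = one_back_feedback(session)
    print(feedback)        # feedback delivered after each trial's choice
    print(sum(feedback))   # running "Total points obtained" at the end
```

In this representation the outcome of the final trial is never signaled within the list, because its feedback would be due only after a further trial.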

Data analysis

A cumulative plot was constructed of the number of subjects in each instruction group reaching a criterion of nine out of ten trials correct in each ten-trial block. The total number of subjects reaching criterion by the last trial block was subjected to a chi-square analysis to determine the effect of the instructions on learning. To assess the all-or-none nature of the learning, a backward learning curve was created for the subjects who reached criterion on any block of trials, by plotting backward from the first moving ten-trial block on which each subject reached criterion. We were also interested in the relation between the accuracy of the subjects and the rule that they said they used to perform the task.
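One way to compute such a backward learning curve for a single subject is sketched below. The criterial block is located with a moving ten-trial window, as described above; the use of non-overlapping ten-trial blocks counted back from the start of that window is an assumption of this sketch, and the function names are hypothetical.

```python
def first_criterial_start(correct, window=10, criterion=9):
    """Index of the first moving window with at least `criterion` correct, or None."""
    for start in range(len(correct) - window + 1):
        if sum(correct[start:start + window]) >= criterion:
            return start
    return None

def backward_curve(correct, window=10, criterion=9, blocks_back=2):
    """Accuracy on the criterial block (C) and on the blocks preceding it (C-1, C-2, ...).

    Preceding blocks are non-overlapping and counted back from the start of
    the criterial window; blocks that would begin before trial 1 are omitted.
    Returns None for subjects who never reach criterion.
    """
    start = first_criterial_start(correct, window, criterion)
    if start is None:
        return None
    curve = {}
    for k in range(blocks_back + 1):
        lo = start - k * window
        if lo < 0:
            break
        label = "C" if k == 0 else f"C-{k}"
        curve[label] = sum(correct[lo:lo + window]) / window
    return curve

if __name__ == "__main__":
    # Hypothetical subject: 20 trials alternating error/correct, then 10 correct.
    trials = [0, 1] * 10 + [1] * 10
    print(backward_curve(trials))
```

A mean backward learning curve would then average the values for each label across the subjects who reached criterion. Abrupt learning appears as chance accuracy at C-1 followed by near-perfect accuracy at C.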

For comparison purposes, using pigeon data from Zentall et al. (in press), backward learning curves were calculated for pigeons, using the first session that each pigeon reached 70% accuracy on a 1-back reinforcement, color-matching task similar to the one used in the present study. A lower level of criterial accuracy was used for the pigeons because they failed to reach a higher level of accuracy. Most of the pigeons appeared to asymptote at about 70% correct.

Results and discussion

The 1-back reinforcement task proved to be a very difficult task for our subjects. Only about half of the subjects learned to perform to the learning criterion of nine out of ten trials correct on a block of ten trials. A cumulative plot of the number of subjects who learned the 1-back reinforcement task in each of the three groups, in each of the ten blocks of ten trials, is shown in Fig. 2. A chi-square analysis performed on the last block of ten trials indicated that the three groups were not statistically different, χ²(2, N = 105) = 1.40, p = .496.
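For readers who wish to check an analysis of this kind, the sketch below computes a Pearson chi-square statistic for a groups × outcome table and converts a statistic with 2 degrees of freedom into a p value. With 2 degrees of freedom the upper-tail probability has the closed form exp(−χ²/2), so no statistics library is needed. The per-group counts are not reported above, so no table from the study is reproduced here; only the reported statistic of 1.40 is used.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def chi2_p_value_df2(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with exactly 2 df."""
    return math.exp(-statistic / 2.0)

if __name__ == "__main__":
    # Reported statistic for the three instruction groups (df = 2).
    print(round(chi2_p_value_df2(1.40), 4))
```

The value obtained in this way is approximately .50, in line with the reported p value.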

Fig. 2

Cumulative proportion of subjects reaching criterion of nine trials out of ten correct in a block, for each instruction group

Because the subjects who did learn did so at different points in training, to get a better idea of their rate of learning, we plotted a mean backward learning curve for the subjects. Because differences in the instructions did not appear to have an important effect, the data from the three groups were combined. The backward learning curve for the human subjects, plotted from the first criterial moving trial block, is presented in Fig. 3. As one would expect if subjects were learning explicitly, they were no better than chance on the trial block prior to the criterial block.

Fig. 3

Backward learning curve for human subjects with criterial block (C; nine out of ten trials correct). Error bars = ± 1 standard error of the mean

For comparison purposes, the backward learning curve from pigeons trained on the same symbolic matching, 1-back reinforcement task (Zentall et al., in press) is presented in Fig. 4. For the pigeon backward learning plot we considered the criterion as the highest level of accuracy on a given session that almost all of the pigeons achieved, 70% correct.

Fig. 4

Backward learning curve for pigeon subjects with criterial session block (C; first session 70% correct or better) and one session post criterion. Error bars = ± 1 standard error of the mean

Although direct comparison of Fig. 3 with Fig. 4 would be difficult because of the large differences in the amount of training and the low level of task accuracy attained by the pigeons, two aspects of the two graphs are apparent. First, the humans show little evidence of learning the task prior to the criterial block of trials. In fact, prior to the criterial block, several of the subjects appeared to be choosing some trials based on the incorrect color association. The pigeons, on the other hand, show somewhat gradual learning of the task, at least for about five sessions prior to the criterial session. Second, although the criterial block of trials for humans shows a high level of accuracy, suggestive of explicit learning of rules, the pigeons never attained as high a level of task accuracy as the humans.

It should be noted that there is an artifact in the plot of the pigeons’ data resulting from the use of a criterion. For the pigeons, criterion was defined as the first session on which a pigeon performed at 70% accuracy or better. This criterion necessarily means that on the session immediately preceding the criterial session, accuracy for each of the pigeons had to be below 70% correct. Thus, there was necessarily an increase in accuracy for each pigeon from the session immediately before criterion to the criterial session, which would not necessarily be true of any other pair of sessions. As can be seen in Fig. 4, however, the pigeons averaged about 60% correct on the session prior to the criterial session and they showed gradual learning on the preceding sessions. This gradual learning would suggest that implicit learning was involved, learning quite different in kind from that of the humans.

Curiously, the sharp rise in accuracy (about 15 percentage points) from Session C-1 to Session C suggests that the increase in accuracy was not solely due to this artifact. In fact, one measure of the artifact can be obtained by considering the difference between accuracy on Session C+1 (the session immediately following the criterion session) compared with Session C. This comparison indicates that there was, in fact, a small drop in accuracy of about 3 percentage points on Session C+1. That suggests that the remaining difference between Session C-1 and Session C, about 12 percentage points, cannot be accounted for by the artifact. Such a large increase in accuracy would not be expected to result from implicit learning. On the other hand, if some form of explicit learning was involved, one would have expected accuracy to have been considerably better than 70% correct, and as noted earlier, the gradual increase in accuracy over the approximately five sessions prior to the criterial session suggests that the learning was implicit.

Given the nature of the 1-back task, we were interested in how the human subjects had learned the task. Not surprisingly, most of the subjects who chose the correct comparison stimulus significantly more often than chance over the course of training (at least 59 of 100 trials correct; binomial test, p = .044) were also able to describe the rules relating the sample stimulus to the correct comparison stimulus.
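The cutoff of 59 correct trials can be checked with an exact binomial calculation, sketched below. The reported p value is consistent with a one-tailed test against a chance level of .5; that the test was one-tailed is an inference from the reported value rather than something stated above, and the function names are illustrative.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def smallest_significant_count(n: int, alpha: float = 0.05, p: float = 0.5) -> int:
    """Smallest number correct whose one-tailed probability falls below alpha."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, p) < alpha:
            return k
    raise ValueError("no count reaches significance")

if __name__ == "__main__":
    print(round(binomial_upper_tail(59, 100), 3))   # one-tailed p for 59/100
    print(smallest_significant_count(100))          # minimum count with p < .05
```

Under these assumptions, 59 of 100 is the smallest number of correct trials that differs reliably from chance, which matches the cutoff used to classify subjects.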

For the No Instruction Group, 16 of the 18 subjects who performed better than chance were able to describe the sample-comparison rules (89%). For the Intuition Group, 13 of the 16 subjects who performed better than chance described the sample-comparison rules (81%). For the Previous Trial Hint Group, seven of the 11 subjects who performed better than chance described the sample-comparison rules (64%). None of the subjects who did not perform better than chance were able to articulate the rules.

Surprisingly, however, when the subjects were asked how they had learned to solve the task, not one subject mentioned the 1-back reinforcement rule. Although most of the subjects who performed well mentioned which sample went with which comparison stimulus, they did not appear to be aware of the 1-back rule. Apparently, the subjects learned this task without learning the 1-back rule. How could they have learned this task without learning the 1-back rule? They could have performed well on this task without learning the 1-back rule if they happened to have been correct on Trial N - 1 and they had also been correct on Trial N, but they attributed the feedback on Trial N incorrectly to the current trial rather than to the preceding trial. As long as they persisted with that attribution, they would have “solved” the task without actually learning the intended 1-back reinforcement rule. Of course, the probability of two trials in a row being correct by chance was only 25% and, furthermore, it should have been confusing to them because on earlier trials, what subjects thought was an error, because they received no reward, actually may not have been an error. In a sense then, subjects who performed well on this task may have done so because they neglected some of the earlier feedback. That is, when they matched incorrectly on one trial and then matched correctly on the next trial, the feedback that they would have received on that second trial would have suggested that they were wrong, when, in fact, they would have been correct.
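The misattribution account above can be made concrete with a short sketch. It lists the trials on which the delivered (1-back) feedback disagrees with what the current response deserved, which are the only trials that could contradict a subject who believes the feedback refers to the current trial. It also enumerates the four equally likely outcomes of two guessed trials. The function names and example sequences are hypothetical.

```python
from fractions import Fraction
from itertools import product

def one_back_feedback(correct):
    """Feedback delivered on each trial: positive only if the previous trial was correct."""
    return [False] + list(correct[:-1])

def conflicting_trials(correct):
    """Trials (0-indexed) where delivered feedback differs from the current trial's accuracy."""
    feedback = one_back_feedback(correct)
    return [i for i, (c, f) in enumerate(zip(correct, feedback)) if c != f]

def chance_of_two_correct_in_a_row():
    """Probability that two successive guesses between two comparisons are both correct."""
    outcomes = list(product([True, False], repeat=2))
    hits = sum(1 for first, second in outcomes if first and second)
    return Fraction(hits, len(outcomes))

if __name__ == "__main__":
    print(conflicting_trials([True] * 100))               # consistent matcher
    print(conflicting_trials([False, True, True, True]))  # error followed by correct choices
    print(chance_of_two_correct_in_a_row())
```

For a subject who applies the matching rule on every trial, the only conflict is the missing feedback on the first trial, so the mistaken attribution is never corrected. An error followed by a correct response produces exactly the misleading absence of reward described above.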

On the one hand, contrary to the suggestion by Smith et al. (2020), the present results suggest that the 1-back reinforcement task is not an appropriate task to assess learning of the 1-back rule, at least not for humans. Human subjects can perform this task at a high level of accuracy without actually learning the 1-back rule. By misinterpreting the feedback for their response on the prior trial as feedback for their response on the current trial, they are learning the matching rule without learning the 1-back rule. In a sense, they are taking a “short cut.” On the other hand, the rule that the humans did learn suggests strategic explicit learning of the simpler matching rule.

With regard to the way pigeons learn this task, the Zentall et al. (in press) results suggest that pigeons do show some evidence of learning the task, and although they appear to do so implicitly, the backward learning functions suggest that something like explicit learning may be involved. Their slow learning, and to a level far below that of the human subjects, suggests that they may have been learning the 1-back task. When pigeons on this 1-back reinforcement task are compared with pigeons that have learned a similar symbolic matching task with reinforcement for a correct response on the current trial, the simpler symbolic matching task is learned much faster and to a much higher level of accuracy (see, e.g., Carter & Eckerman, 1975; Zentall & Hogan, 1974). Thus, had the pigeons in the 1-back experiments been using the same matching short-cut as the humans did without learning the 1-back rule, they should have achieved a much higher level of accuracy and have learned considerably faster. Paradoxically, then, the pigeons may actually have learned the 1-back reinforcement task, whereas the humans may have bypassed the 1-back rule.

To summarize, the results of the present study together with the results of Nosarzewska et al. (2021) and Zentall et al. (in press) suggest that humans and pigeons learn the 1-back reinforcement task differently and that accurate performance of the 1-back reinforcement task cannot distinguish between implicit and explicit learning. Taken together, the results of research with the uncertain response, research comparing rule-based learning with information-integration learning, and research on 1-back reinforcement learning do not appear to provide evidence that animals learn these tasks by explicit learning. Whether backward learning functions provide an alternative means to demonstrate evidence for explicit learning by animals will have to await further research.