Introduction

Throughout the ages, correct categorization has remained essential for human survival. In prehistory, mistaking a predator for a harmless animal could be fatal. Nowadays, classifying traffic signs correctly is important to avoid accidents. These examples illustrate why categorization has received continuous attention in the field of cognitive science (e.g., Ashby & Maddox, 2005; Ashby & Maddox, 2010; Medin & Schaffer, 1978; Nosofsky, 1987; Pothos & Chater, 2002). Although many categories that people use are acquired during childhood (French, Mareschal, Mermillod, & Quinn, 2004), adults also learn new categories. In human category learning research, the focus is on the learning process itself. To study this learning process, exemplars and non-exemplars of unfamiliar categories are typically presented (Ashby & Maddox, 2005). The behavior of participants is observed during the period when their ability to assign stimuli to these categories increases from chance level to a stable above-chance level (Ashby & Maddox, 2005).

Supervised and unsupervised learning

Traditionally, human category learning was studied using supervised or unsupervised learning paradigms. In supervised learning paradigms, the participant is presented with a stimulus that has to be classified into two or more contrasting categories. Immediately after this response, feedback about the correct category label is always provided. Generally, the participant knows the number of contrasting categories in advance (see Shepard, Hovland, & Jenkins, 1961 for a description of a basic experiment). Numerous studies have demonstrated that, with this paradigm, participants can learn very complex categories if a sufficient number of trials is provided (e.g., Ashby, Queller, & Berretty, 1999; Ashby, Maddox, & Bohil, 2002; Maddox, Filoteo, Hejl, & Ing, 2004c; McKinley & Nosofsky, 1995; Medin & Schwanenflugel, 1981; Maddox, Ashby, & Gottlob, 1998). In unsupervised learning paradigms, the participant never receives feedback or information about the category to which the presented stimulus belongs. The goal is to identify an intuitive or natural classification for a set of objects (Clapper & Bower, 1994; Love, 2002; Medin, Wattenmaker, & Hampson, 1987; Milton, Longmore, & Wills, 2008; Pothos & Chater, 2002, 2005; Pothos et al., 2011). Depending on the paradigm, the number of contrasting categories may or may not be known in advance. Findings based on such unsupervised paradigms reveal that performance is dominated by the use of unidimensional rules, regardless of the complexity of the underlying category structure or the number of training trials (Ashby et al., 1999). These unidimensional rules (e.g., “small stimuli belong to category A and large stimuli to category B”) are easy to verbalize and to apply, whereas complex categorization rules are often hard to express. In sum, in unsupervised learning people tend to use very simple categorization rules, whereas in supervised learning participants are able to learn very complex category structures.

Semisupervised learning

Vandist, De Schryver, and Rosseel (2009) argued that both supervised and unsupervised learning are ecologically rare. Translated to daily life, supervised learning means that for every object we observe, we immediately receive correct information about its category label. In most category learning situations, it seems very unlikely that this occurs after every single encounter with a category member. Strictly speaking, supervised learning would imply that, when we walk in the woods, a label “tree” is attached to every single tree. Moreover, this information is unambiguous, implying that the information provider and receiver always mean the same object. However, ambiguity often occurs in very rich environments. For example, when walking in the woods a parent might point to a bird and call it “bird”, while the child may be watching a nest and hence learns the wrong label. Unsupervised learning, on the other hand, entails that we never receive any information about object categories. This implies that during our entire lives, nobody ever informs us about the name of an object or about which objects belong together. Neither type of category learning therefore represents our daily reality. Vandist et al. (2009) argued that people instead learn in a semisupervised way: when confronted with (new) objects (e.g., a dog), category information will sometimes be provided (“look, a dog”) and sometimes not. This idea is supported by Gibson, Rogers, and Zhu (2013). The semisupervised category learning paradigm incorporates this more realistic scenario. In a block of trials, a predetermined percentage of category responses is followed by feedback (i.e., feedback or labelled trials). The remaining trials receive no feedback (i.e., no-feedback or unlabelled trials). For example, a block in a 25% semisupervised classification learning paradigm consists of 25% feedback trials and 75% no-feedback trials.
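
To make the trial composition concrete, the following minimal sketch (in Python; the function and variable names are our own illustration, not part of the original studies) builds one such block by randomly flagging a fixed proportion of trials as feedback trials:

```python
import random

def make_semisupervised_block(n_trials=80, prop_feedback=0.25, seed=None):
    """Randomly designate a fixed proportion of the trials in a block as
    feedback (labelled) trials; the rest are no-feedback (unlabelled) trials."""
    rng = random.Random(seed)
    n_feedback = round(n_trials * prop_feedback)
    trial_types = (["feedback"] * n_feedback +
                   ["no-feedback"] * (n_trials - n_feedback))
    rng.shuffle(trial_types)
    return trial_types

# A 25% semisupervised block of 80 trials: 20 feedback and 60 no-feedback trials
block = make_semisupervised_block()
print(block.count("feedback"), block.count("no-feedback"))  # 20 60
```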

Vandist et al. (2009) compared the effects of this semisupervised learning process to supervised and unsupervised learning processes by using the information-integration structure. This category structure is frequently used in category learning research (e.g., Ashby et al., 2002; Ashby & Ell, 2001; Ell & Ashby, 2006; Maddox, Ashby, Ing, & Pickering, 2004a; Maddox & Filoteo, 2011; Maddox & Ing, 2005; Maddox, Pacheco, Reeves, Zhu, & Schnyer, 2010b; Paul, Boomer, Smith, & Ashby, 2011; Spiering & Ashby, 2008a, b; Vermaercke, Cop, Willems, D’Hooge, & Op de Beeck, 2014). Figure 1 shows an example of the information-integration category structure used in the experiments reported in the current study. Although the within-category correlation is very high, this structure is difficult to learn. To obtain high-level performance, participants have to combine the perceptual information of the underlying stimulus dimensions simultaneously at some predecisional stage (Ashby & Gott, 1988). This perceptual integration could take many forms; in this case, the optimal strategy is to compute a weighted linear combination of the dimensional values. The optimal decision bound is almost impossible to describe verbally (Ashby, Alfonso-Reese, Turken, & Waldron, 1998) and it cannot readily be discovered via an explicit reasoning process (Ashby & O’Brien, 2007), which makes the category structure difficult for humans to master (Vermaercke et al., 2014). Feedback is essential to learn the structure successfully (Ashby et al., 1999).
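
As an illustration of the difference between such an information-integration strategy and a verbalizable unidimensional rule, the sketch below contrasts the two. The weights are illustrative only (not fitted values), and which side of the bound maps to which category label is arbitrary here:

```python
def information_integration_rule(orientation, frequency,
                                 w_o=1.0, w_f=-1.0, bias=0.0):
    """Weighted linear combination of both dimensions; with these weights
    the decision bound is the diagonal orientation = frequency."""
    h = w_o * orientation + w_f * frequency + bias
    return "A" if h > 0 else "B"

def unidimensional_rule(frequency, criterion=50.0):
    """An easily verbalized rule that ignores orientation entirely; for this
    category structure such a rule can reach at most about 80% accuracy."""
    return "A" if frequency < criterion else "B"
```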

Fig. 1

An example of the information-integration category structure used in the learning task in Experiment 1. The “A” stimuli are shown in squares, the “B” stimuli in solid circles. The decision bound that divides the two categories perfectly is shown in black. The Y-axis is the Orientation dimension, the X-axis the Frequency dimension

The results of the study of Vandist et al. (2009) indicated that, as expected, learning the information-integration structure was successful in the supervised condition but not in the unsupervised condition. In a 50% semisupervised condition, participants managed to learn the structure, suggesting that feedback after every trial is not necessary to learn the complex structure.

The impact of no-feedback trials in semisupervised learning

An additional goal of the study of Vandist et al. (2009) was to understand the contribution of the no-feedback trials to the learning process. In the no-feedback trials, participants were shown a stimulus, processed it, and categorized it. Only after the categorization did it become clear that no feedback followed. Crucially, it was investigated whether the processing of the stimulus in the no-feedback trials had an impact on the learning process or whether the experience was simply neglected. To achieve this, the no-feedback trials were replaced by irrelevant fillers, in which no categorization whatsoever took place. The number of feedback trials was identical in the condition with no-feedback trials and the condition with filler trials. The results indicated that the learning process in both conditions was similar. Hence, the no-feedback trials neither harmed nor helped learning. Apparently, when we encounter an object early in the learning process, we classify it, and when no feedback follows, this has no effect on our category learning. Semisupervised learning was also studied with feedback after 25% of the trials. In this 25% semisupervised condition, participants failed to learn the task. This failure was not due to the low relative percentage of feedback trials in a block, because when the absolute number of trials was doubled (i.e., 25% feedback was maintained, but twice as many feedback trials were received), almost all participants were able to master the category structure. Thus, when given enough trials, even 25% feedback sufficed to learn the category structure. Again, this result suggests that the no-feedback trials have little impact on the initial learning process, and that learning is rather determined by the absolute number of feedback trials one receives.

An important question is whether these findings imply that people encounter objects, classify them, and then simply discard this experience because no confirmation or correction is provided. If so, this would contrast with findings from machine learning, where machines do use no-feedback trials to extend the knowledge gained from feedback trials. Remarkably, when supervised and semisupervised machine learning are compared, semisupervised machine learning can even achieve optimal performance faster (Chapelle, Schölkopf, & Zien, 2006; Zhu & Goldberg, 2009). In machine learning, semisupervised learning is therefore often the method of choice, partly for practical reasons: it requires fewer labelled items, which must be annotated one by one by humans, and thus reduces the time investment (Zhu & Goldberg, 2009).

Since Vandist et al. (2009), several human semisupervised category learning studies have been conducted, and the findings are not always consistent. In the study of McDonnell, Jew, and Gureckis (2012), no impact of the no-feedback trials was found. In this study, the category label was shown on some trials, but not on others. In the labelled trials, all stimuli originated from one subset of the full category. In the unlabelled trials, the presented stimuli covered the full category. After the training phase, the participant's category representation was tested. McDonnell et al. (2012) found that a large weight was given to the labelled stimuli, rendering the unlabelled trials largely irrelevant.

In other studies, the impact of the no-feedback trials depended on the circumstances. First, semisupervised learning was observed in a speeded classification task but not when responses were self-paced (Rogers, Kalish, Gibson, Harrison, & Zhu, 2010). Second, participants did use the no-feedback trials when the underlying categories were distinct and the gap between the categories was large. However, if the underlying categories were more ambiguous and the space between the categories was small but still present, no effect of the no-feedback trials was found (Vong, Perfors, & Navarro, 2014). Third, Kalish, Zhu, and Rogers (2015) showed that the effect of the no-feedback trials depends on the age of the participants: young children (between 4 and 6 years old) were influenced by the no-feedback trials, whereas no effects were found for older children (between 7 and 8 years old).

Finally, some studies did show that the no-feedback trials aided learning (Gibson, Rogers, Kalish, & Zhu, 2015; Kalish, Zhu, & Rogers, 2011; Lake & McClelland, 2011; Zhu, Gibson, Jun, Rogers, Harrison, & Kalish, 2010). However, all of these studies used unidimensional stimuli and a simple underlying category structure. Feedback was always given after a specific subset of stimuli. Based on these feedback trials alone, a particular decision bound splitting the two categories would be expected. The stimuli of the no-feedback trials had a different mean and distribution than the stimuli of the feedback trials, because the latter were extremes of the category. If participants take these no-feedback trials into account, the decision bound should shift. These studies showed that participants indeed used a shifted decision bound, implying that the no-feedback trials do have an impact on the learning process (Gibson et al., 2015; Kalish et al., 2011; Lake & McClelland, 2011; Zhu et al., 2010). Still, it is unlikely that in daily life feedback is always provided after the same subset of examples while other examples of the category are never followed by feedback. Rather, we believe that any example of a category can be followed by feedback.

Automaticity

Given the inconsistent research results on human semisupervised learning and the advantages of semisupervised learning in machines, we aim to further investigate the role of no-feedback trials in human semisupervised category learning. In the current study, we specifically investigate the role of no-feedback trials in developing automaticity. Once a learner reaches automaticity, cognitive or motor skills are executed faster and more accurately, and require less attention, in comparison to initial learners (Ashby & Crossley, 2012; Ashby, Turner, & Horvitz, 2010). Although various definitions and criteria of automaticity exist, researchers agree that automaticity is the result of extensive overtraining after the skilled behavior is well learned (Ashby et al., 2010; Hélie, Waldschmidt, & Ashby, 2010; Moors & De Houwer, 2006; Schneider & Chein, 2003; Nosofsky & Palmeri, 1997; Shiffrin & Schneider, 1977). This consensus holds especially in categorization, because several studies showed that criteria for automaticity proposed by Schneider and Shiffrin (1977), such as the absence of interference from a dual task (Waldron & Ashby, 2001; Zeithamova & Maddox, 2006, 2007) and a decrease in performance after switching response keys (Ashby, Ell, & Waldron, 2003; Maddox, Bohil, & Ing, 2004b; Maddox, Glass, O’Brien, Filoteo, & Ashby, 2010a; Spiering & Ashby, 2008a), already apply to initial information-integration category learning. Consequently, in this article automaticity will be defined as the result of overtraining after good performance has been obtained.

In cognitive science, two influential models of expertise are Logan’s (1988) instance theory of automaticity and Rickard’s (1997) component power laws theory. Both models assume that feedback remains essential throughout the development of automaticity and hence only make predictions about supervised learning, not about semisupervised learning. In category learning, two important models explicitly deal with automaticity: the exemplar-based random walk model (EBRW-model) of Nosofsky and Palmeri (1997) and the Subcortical Pathways Enable Expertise Development model (SPEED-model) of Ashby, Ennis, and Spiering (2007). The EBRW-model assumes that expertise develops as the number of stored exemplars increases: the more stored exemplars, the faster the response is elicited. Since only exemplars followed by feedback will be stored (and activated as belonging to the category), supervised learning is essential, and no clear predictions about semisupervised learning can be made on the basis of this model. For semisupervised learning, the most relevant model of the development of automaticity in the information-integration structure is the SPEED-model. The SPEED-model assumes that categorization is regulated by two different pathways, a slow one and a fast one. The slow pathway originates in the visual cortex, passes through the basal ganglia and the thalamus, and ends in the premotor cortex. This is an indirect, subcortical pathway that includes at least four synapses. When positive feedback is given (after a correct categorization), dopamine is released in the striatum and the active synapses are strengthened. When negative feedback is given (after an incorrect answer), or when no feedback is given at all, the synapses are weakened. In contrast, Ashby et al. (2007) state that the fast pathway involves only one synapse. This is a direct route from the visual association areas to the premotor cortex. In this cortical-cortical pathway, synapses are strengthened whenever there is both pre- and postsynaptic activation (i.e., Hebbian learning). This occurs independently of feedback.

In the SPEED-model, the development of categorization automaticity is a gradual process. Early in learning, the slow subcortical pathway dominates. As learning progresses, the fast cortical-cortical pathway becomes more dominant and the subcortical pathway becomes less important. Eventually, experts rely only on the cortical-cortical pathway for their categorization (Ashby et al., 2007). Because this pathway is independent of feedback and the strength of its connections increases with the number of categorization responses, SPEED predicts that late in learning every trial will strengthen the connections, regardless of whether the categorization response is followed by feedback. Hence, late in learning, adding extra no-feedback trials to the training should have an impact on the development of automaticity, and faster response times can be expected.

To test this hypothesis in Experiment 1, participants were trained in a supervised way on the information-integration structure for 2 days. After reaching an expert level on the trained category structure, half of the participants continued to practice in a supervised way on days 3 and 4. The other half practiced according to a 25% semisupervised scheme. Both groups received an equal number of feedback trials, implying that the semisupervised group received four times as many trials as the supervised group. Based on the premises of the SPEED-model, we hypothesize that categorization in the semisupervised condition will be more automatic, as indexed by faster response times. If the no-feedback trials in semisupervised learning have no impact on the automaticity process, a similar level of automaticity (and thus equal response times) should be observed in both conditions.

Experiment 1

Method

Participants

In total, 34 participants (22 women, average age 21.4 years, SD=1.97, range=18–26 years) took part in the experiment in return for payment. Participants who took part for 2 days received 20 euros; those who took part for 5 days received 35–40 euros.

Design

The experiment was spread over five consecutive days. In this way, learning could benefit from between-session consolidation due to sleep (Censor, Karni, & Sagi, 2006; Stickgold, James, & Hobson, 2000a; Stickgold, Whidbee, Schirmer, Patel, & Hobson, 2000b; Stickgold & Walker, 2005). Participants were randomly divided into two conditions: the semisupervised condition (n=19) and the supervised condition (n=15). The first 2 days were identical for both conditions: on each of these days, 400 training trials were presented, divided into five blocks of 80 trials. Each trial was followed by feedback. The goal of this training phase was to master the category structure. Participants who achieved an average accuracy of 90% or more on the last two blocks of the second day were invited to the following phase. Participants who did not reach this expert level were excluded from the remainder of the experiment. On the third and fourth days, participants in the semisupervised condition were presented with 640 trials (eight blocks of 80 trials), 25% of which were randomly followed by feedback. Participants in the supervised condition were shown 160 trials (two blocks of 80 trials) that were all followed by feedback. Consequently, the number of feedback trials was equal in both conditions. On the fifth and final day, the test phase took place, in which all participants received 134 trials in one block. None of these trials was followed by feedback. It was decided to organize this “test” on a new day to ensure that participants in both conditions were equally fit. Table 1 summarizes the differences between the two conditions.

Table 1 Number of trials in each condition of Experiment 1

Stimuli and apparatus

The experiment was conducted using Tscope (Stevens, Lammertyn, Verbruggen, & Vandierendonck, 2006). Participants viewed the stimuli on a 17-in. LCD monitor with an 800 × 600 resolution at a distance of approximately one arm’s length. The stimuli were gray 300 × 300 pixel Gabor patches, presented on a black screen. Two examples of Gabor patches can be seen in Fig. 2. The gratings varied continuously on two dimensions: spatial orientation and spatial frequency. These dimensions are perceptually separable. The arbitrary stimulus coordinates were converted to physical units using the following transformations: the orientation coordinate x was used directly as an orientation of x degrees, with x varying between 0 and 100; the frequency coordinate y was converted to spatial frequency in cycles/pixel using f(y) = 0.01 + (y/1500), with y varying between 0 and 100. These coordinates originated from the information-integration category structure, an example of which is displayed in Fig. 1. The optimal decision bound, which classifies the stimuli perfectly into two categories, is diagonal. In the semisupervised condition, participants viewed 2,080 stimuli over the first 4 days. These stimuli were generated by randomly sampling from two bivariate normal distributions, yielding 1,040 “A” stimuli and 1,040 “B” stimuli. Category A had a different mean than category B, but the variance and the covariance of both categories were the same. Due to random sampling, the optimal decision bound varied slightly from block to block, although the mean optimal decision bound within a day was y=x. The exact parameter values are shown in Table 2. As in the Ashby et al. (1999) study, the mean, variance, and covariance values were chosen in such a way that a linear decision bound based on one dimension could yield an accuracy of at most 80%. The stimuli in the supervised condition were constructed in the same way as in the semisupervised condition, the only difference being that in this condition participants viewed 1,120 stimuli over the first 4 days: 560 “A” stimuli and 560 “B” stimuli. On the last day, all participants viewed 134 fixed stimuli, half of which were drawn from the category A region and half from the category B region, as can be seen in Fig. 3. Again, the optimal decision bound was y=x and the category means were identical to those of the previous days.
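
To illustrate how such category structures are typically generated, the following sketch samples the two categories from bivariate normal distributions and converts the arbitrary coordinates to physical Gabor parameters. The means and covariance below are illustrative placeholders (the values actually used are listed in Table 2), and the variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative category parameters; the actual values are given in Table 2.
# Both coordinates are arbitrary units in [0, 100]; the optimal decision
# bound lies on the diagonal (frequency coordinate = orientation coordinate).
mean_A = np.array([40.0, 60.0])   # (orientation coordinate x, frequency coordinate y)
mean_B = np.array([60.0, 40.0])
cov = np.array([[140.0, 120.0],   # identical variance/covariance for both categories
                [120.0, 140.0]])

def sample_category(mean, n):
    """Draw n stimuli from one category's bivariate normal distribution."""
    return rng.multivariate_normal(mean, cov, size=n)

def to_physical(coords):
    """Convert arbitrary coordinates to physical Gabor parameters."""
    x, y = coords[:, 0], coords[:, 1]
    orientation_deg = x                  # orientation in degrees
    frequency_cpp = 0.01 + y / 1500.0    # spatial frequency in cycles/pixel
    return orientation_deg, frequency_cpp

stim_A = to_physical(sample_category(mean_A, 1040))
stim_B = to_physical(sample_category(mean_B, 1040))
```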

Fig. 2

Two examples of Gabor patches

Table 2 Parameter values that define the categories of Experiment 1
Fig. 3

An example of the information-integration category structure used in the test (= day 5) in Experiment 1. The “A” stimuli are shown in squares, the “B” stimuli in solid circles. The decision bound that divides the two categories perfectly is shown in black. The Y-axis is the Orientation dimension, the X-axis the Frequency dimension

Procedure

All participants were tested individually in a dimly lit room. On the first 2 days, participants were informed that they would see stimuli appearing one by one that originated from two categories, A and B. They were asked to respond by pressing A on the keyboard if they believed the stimulus was an A and B if they believed it was a B. Participants were informed that they would receive feedback (i.e., the true category label) after each category response. They were also informed that it was possible to do the task without errors. Participants were told that at the end of day 2 their accuracy would be calculated and that only participants who achieved an average accuracy of 90% or more would be allowed to continue on the following days. At the end of each block, the percentage of correct responses was shown on the screen. This percentage served as additional encouragement to do better in the next block. A trial started with a stimulus presented in the middle of the screen, which remained visible until the participant responded. Immediately after the response the stimulus disappeared. Responses were self-paced. After the response, the feedback, consisting of correct/incorrect and the right category label, was visible at the bottom of the screen for 1,500 ms. After that, a new trial started. The procedure on days 3 and 4 was similar, except that in the semisupervised condition participants were informed that some trials would be followed by feedback and others not. In the no-feedback trials, the stimulus disappeared immediately after the response and the screen remained blank for 1,500 ms, after which a new trial started. Hence, the post-response events on feedback and no-feedback trials were identical except for the appearance of the feedback on the screen. Again, in both conditions participants were informed that it was possible to obtain maximum accuracy and they were encouraged to achieve this. On day 5, participants were informed that they would see stimuli similar to those of the preceding days and that feedback would no longer be given. In contrast to the first 4 days, participants were urged to respond as quickly as possible. After a response was given, the stimulus disappeared and the screen remained blank for 1,500 ms, after which a new trial started. To encourage participants to respond as fast as possible, the stimulus disappeared when no response was given within a time limit of 1,800 ms. In this case, the message “Respond faster” was shown during the subsequent intertrial interval of 1,500 ms. The trials on which the participant responded too slowly were presented again at the end of the block. Thus, for each participant we collected 134 valid categorization responses on day 5.
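
The day-5 trial flow can be sketched as follows. The sketch assumes a hypothetical helper, present_and_wait(stimulus, deadline_ms), that displays a stimulus and returns the response and response time, or (None, None) on a timeout; the display and timing code itself is omitted:

```python
from collections import deque

def run_test_block(stimuli, present_and_wait, deadline_ms=1800, iti_ms=1500):
    """Day-5 test block: responses must be given within the deadline;
    trials answered too slowly are re-presented at the end of the block,
    so each participant ends up with one valid response per stimulus."""
    queue = deque(stimuli)
    valid_responses = []
    while queue:
        stimulus = queue.popleft()
        response, rt_ms = present_and_wait(stimulus, deadline_ms)
        if response is None:
            # Too slow: show "Respond faster" during the intertrial interval
            # and re-present the stimulus at the end of the block.
            queue.append(stimulus)
        else:
            valid_responses.append((stimulus, response, rt_ms))
        # Blank screen for the intertrial interval of iti_ms.
    return valid_responses
```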

Results

Selection of participants

Before analyzing the response time patterns in both conditions, it was essential to ensure that the participants had mastered the category structure at the end of day 2. Therefore, accuracy and model-based analyses were performed. High accuracy rates indicate that the participant made few errors. Nevertheless, accuracy alone does not reveal whether these errors were random mistakes or systematic errors. The model-based analyses are therefore a necessary complement to the accuracy analyses. These models were fitted to the responses of the participant on the last two blocks of day 2. For each model, the corresponding BIC score was calculated. The best-fitting model was the model with the lowest BIC score. This model is assumed to correspond to the strategy that the participant used to solve the categorization task. The strategy that yields perfect performance in this experiment is the optimal decision bound. Combining both criteria rules out the possibility that a participant increased accuracy during the experiment without this improvement being reflected in the model-based analyses, which can happen when the errors are systematic and a strategy other than the optimal decision bound is used. Therefore, only participants who passed both criteria were retained for further analyses.

Criterion 1: High accuracy

Participants could achieve perfect accuracy when using the optimal decision bound (i.e., by integrating the information from the two stimulus dimensions at some predecisional stage). In contrast, using a unidimensional decision rule, accuracy could never exceed 80%. This implies that participants with an accuracy above 80% probably adopted a (suboptimal) information-integration decision rule. However, since our study aimed to examine automaticity after participants had become expert learners, category learning was considered successful only when an average performance of at least 90% was obtained. As can be seen in Table 3, 11 participants did not pass this criterion and were therefore excluded from further analyses.

Table 3 Mean accuracy (%) and model-based analyses (BIC scores) of the last two blocks of day 2 (i.e., blocks 9 and 10) for every participant of Experiment 1

Criterion 2: Optimal decision bound

Figures 1 and 2 in the Supplementary Materials show the actual responses during the last two blocks of day 2 for each of the participants retained after applying Criterion 1. These responses (i.e., whether a stimulus was assigned to category A or category B) form the basis on which the individual decision bounds were calculated. Four different types of models were fit to each participant's responses (see the Appendix for details). These models were introduced by Ashby and Gott (1988) and Ashby and Maddox (1993). Three models, namely the horizontal unidimensional model (DIM-O), the vertical unidimensional model (DIM-V), and the general conjunctive classifier (GCC), are rule-based. If participants adopted one of these category decision strategies, category learning failed. The last model, the general linear classifier (GLC), is an information-integration model. Only with this category decision strategy could perfect accuracy be obtained. Consequently, if the general linear classifier was used and its decision bound fell between the two categories, learning was successful. The model parameters were estimated using the method of maximum likelihood, and the model with the smallest Bayesian Information Criterion (BIC) was selected as the best-fitting model. The BIC penalizes models according to their number of free parameters and is defined as BIC = r ln N − 2 ln L, where r is the number of free parameters, N is the sample size, and L is the likelihood of the model given the data (Schwarz, 1978). The BIC values of the four models for each participant are presented in Table 3. In the semisupervised condition, all participants except participant 4 favored the general linear classifier. Hence, participant 4 was excluded from further analyses. As can be seen in Fig. 1 in the Supplementary Materials, all fitted decision bounds fell between the two categories. In the supervised condition, all participants favored a strategy based on the general linear classifier. As can be seen in Fig. 2 in the Supplementary Materials, all fitted decision bounds fell between the two categories, except for participant 15. Hence, participant 15 was excluded from further analyses. As a result, the final sample used in the subsequent analyses consisted of 21 participants (n=11 semisupervised and n=10 supervised); their average age was 21.3 years (SD=1.88, range 18–24 years) and 15 of them were women.
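
A minimal sketch of this model-selection step is given below. The log-likelihoods are hypothetical values for one participant, and the free-parameter counts per model are our assumption for illustration (the exact model specifications are described in the Appendix):

```python
import numpy as np

def bic(log_likelihood, n_free_params, n_obs):
    """Bayesian Information Criterion: BIC = r*ln(N) - 2*ln(L)."""
    return n_free_params * np.log(n_obs) - 2.0 * log_likelihood

N = 160  # responses in the last two blocks of 80 trials

# Hypothetical maximum-likelihood fits for one participant; the parameter
# counts (bound parameters plus a noise parameter) are assumptions only.
fits = {
    "DIM-O": {"logL": -95.2, "r": 2},  # unidimensional bound on orientation
    "DIM-V": {"logL": -98.7, "r": 2},  # unidimensional bound on frequency
    "GCC":   {"logL": -80.4, "r": 3},  # conjunctive rule with two criteria
    "GLC":   {"logL": -55.6, "r": 3},  # general linear (information-integration) bound
}

scores = {name: bic(f["logL"], f["r"], N) for name, f in fits.items()}
best_model = min(scores, key=scores.get)
print(scores, "best:", best_model)
```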

In the following sections, four types of analyses are described: accuracy and model-based analyses to identify the strategy used by the participants, response time (RT) analyses, and speed-accuracy trade-off analyses. The response time analyses were used to compare the semisupervised learning process to the supervised learning process.

Accuracy analysis

Figure 4 shows the average percentage of correct responses and the 95% confidence intervals on each block of trials received during the first 4 days for the supervised and semisupervised condition separately. In the semisupervised condition, the accuracy was based on the feedback trials only. Eighty feedback trials were grouped into a block to facilitate comparison to the supervised condition. As expected, the learning process was similar in both conditions. The mean accuracy increased from an average of 73% (SD=11.06) in the first block for the semisupervised condition and an average of 75% (SD=11.72) for the supervised condition to almost perfect accuracy in the last block of day 2 (97%, SD=2.76 and 95%, SD=3.62, respectively). During the blocks on days 3 and 4, the mean accuracy remained high in both conditions. In the semisupervised condition, mean accuracy on the last response block on day 3 was 98% (SD=1.80) and 98% on day 4 (SD=1.89). Similarly, in the supervised condition mean accuracy on the last block was 96% (SD=2.95) on day 3 and 96% (SD=3.41) on day 4. A repeated measures ANOVA was conducted to determine whether the mean accuracy on the last two blocks differed depending on the day (4 levels: day 1, 2, 3, and 4) and condition (2 levels: supervised and semisupervised). Not surprisingly, there was a main effect of day, F(3,17)=13.97, p<.001, ηp2=.71, indicating that the accuracy increased during the succeeding days. Paired sample t-tests using the Bonferroni correction for multiple comparisons showed that in comparison to day 1, mean accuracy was significantly higher on days 2, 3 and 4 (resp. t(20)=6.41, p<.001; t(20)=6.80, p<.001; t(20)=7.02, p<.001). There was no main effect of condition, F(1,19)=1.71, p=.21, ηp2=.08, nor an interaction between day and condition (F<1, p=.99, ηp2=.007), suggesting that the accuracy in both conditions increased similarly across days. The accuracy on the test (day 5), where the speed of responding was stressed, was lower in both conditions compared to the accuracy reached at the end of day 4: 87% (SD=6.53) in the semisupervised condition and 85% (SD=4.37) in the supervised condition. This difference was significant for the semisupervised condition, t(10)=5.42, p<.001, d=1.64, and for the supervised condition t(9)=7.46, p<.001, d=2.36. Finally, an independent sample t-test revealed that there was no difference in accuracy between the two conditions on the fifth day, t(19)=0.91, p=.37, d=0.40.

Fig. 4

Mean accuracy (%) along with the 95% confidence intervals by block for all participants in the Semisupervised and Supervised Condition of Experiment 1 from day 1 (blocks 1–5), day 2 (blocks 6–10), day 3 (blocks 11–12), day 4 (blocks 13–14), and day 5 (test). Only the participants who reached a performance of minimum 90% at the end of day 2 and revealed a decision bound based on the optimal decision bound were included in the analyses. In the Semisupervised condition, the accuracy for every 80 feedback trials is used

Model-based analysis

Table 4 shows the four model fits on day 4. In both conditions, all participants preferred a decision bound based on the general linear classifier, indicating successful learning. Figures 3 and 4 in the Supplementary Materials show the actual responses during the last two blocks of day 4 for each participant. Table 5 presents the model fits on the test day (i.e., day 5): in the semisupervised condition, most participants chose a strategy based on the optimal decision bound, except for participants 6, 12, and 19. For these participants a strategy based on the general conjunctive classifier fitted slightly better than the optimal decision bound. Figures 5 and 6 in the Supplementary Materials present the responses and the best fitting decision bound for every participant during the test day. In the supervised condition, most participants preferred the optimal decision bound, except for participants 1 and 18. For participant 1, the BIC score of the general conjunctive classifier was slightly lower than the general linear classifier. Participant 18 clearly preferred a strategy based on the general conjunctive classifier.

Table 4 Mean accuracy (%) and model-based analyses (BIC scores) on the last two blocks of day 4 (i.e., blocks 7–8 semisupervised condition; blocks 1–2 supervised condition) for every participant of Experiment 1
Table 5 Mean accuracy (%) and model-based analyses (BIC scores) on all trials of day 5 for every participant of Experiment 1
Fig. 5

Mean response times (ms) and 95% confidence intervals for the semisupervised and the supervised condition of Experiment 1 calculated on the last two blocks of each day

Fig. 6

Mean accuracy (%) and 95% confidence intervals by block for all participants in the Almost supervised and Almost Unsupervised Condition of Experiment 2 and Semisupervised Condition of Experiment 1 from day 1 (blocks 1–5), day 2 (blocks 6–10), day 3 (blocks 11–18), day 4 (blocks 19–26) and day 5 (“t”). Only the participants who reached a performance of minimum 90% at the end of day 2 and revealed a decision bound based on the optimal decision bound were included in the analyses. All trials (feedback and no-feedback trials) were adopted in the analyses

Analysis of the response times

The mean response times (RTs) along with the 95% confidence intervals for the semisupervised and supervised conditions from day 1 to day 4 are presented in Fig. 5. These RTs were calculated on the last two blocks of each day. For day 1, the mean RT in the supervised condition was 944 ms (SD=145.48), whereas the mean RT in the semisupervised condition was 858 ms (SD=131.22). Importantly for this investigation, participants were equally fast in the last two blocks of day 2 (semisupervised mean RT=785 ms, SD=90.10, and supervised mean RT=801 ms, SD=122.22). This was confirmed by an independent sample t-test, t(19)=0.35, p=.73, d=0.15. On days 3 and 4, the mean RTs slowly decreased in the semisupervised condition to, respectively, 794 ms (SD=106.53) and 719 ms (SD=128.59). This decrease in mean RTs was also observed in the supervised condition: 719 ms (SD=108.98) on day 3 and 724 ms (SD=126.85) on day 4. A repeated measures ANOVA was conducted to determine whether the mean RTs on the last two blocks differed depending on the day (4 levels: days 1, 2, 3, and 4) and condition (2 levels: supervised and semisupervised). Not surprisingly, there was a main effect of day, F(3,17)=7.12, p=.003, ηp2=.56, indicating that the RTs decreased across the succeeding days. There was no main effect of condition, F<1, p=.84, ηp2=.002, but the interaction between day and condition reached significance, F(3,17)=3.92, p=.027, ηp2=.41. Post hoc paired-sample t-tests, adjusted by the Bonferroni correction for multiple comparisons, indicated that in the semisupervised condition participants did not speed up across days: none of the paired-sample t-tests was significant, all p>.09. In the supervised condition, responses on later days were all faster than on day 1 (all p<.02). None of the other comparisons was significant (all p>.06). The decrease in RTs was thus only present in the supervised condition.

Decisive for testing our hypothesis was the difference in RTs on day 5. In the semisupervised condition, the mean RT was 579 ms (SD=104.08), whereas the mean RT in the supervised condition was 767 ms (SD=153.20). An independent sample t-test confirmed that this difference in RTs on the test day was significant, t(19)=3.32, p=.004, d=1.45. Participants in the semisupervised condition responded significantly faster than participants in the supervised condition. Paired sample t-tests showed that in the semisupervised condition, participants responded significantly faster on day 5 than on day 4, t(10)=4.71, p=.001, d=1.42, whereas participants in the supervised condition responded equally fast on days 4 and 5, t(9)=1.39, p=.20, d=0.44.

Speed-accuracy trade-off

On the fifth day, participants were instructed to respond as fast as possible. This might result in a speed-accuracy trade-off (SAT): participants giving up decision accuracy in favor of decision speed (see Heitz, 2014). The speed-accuracy trade-off was calculated for each condition. Since the data points were limited and errors were rare, the SAT was quantified as the Pearson correlation, across participants, between mean RT and mean accuracy (Heitz, 2014). In the supervised condition, there was no SAT-effect, r=-.21, p=.57. In the semisupervised condition, there was a SAT-effect, r=.61, p=.046, implying that the faster participants responded, the more errors they made.
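
For clarity, the sketch below shows how such a between-participant SAT index can be computed; the per-participant values are invented for illustration and are not the reported data:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical day-5 summaries for the participants of one condition:
# mean RT (ms) and mean accuracy (proportion correct).
mean_rt = np.array([540, 575, 610, 590, 655, 630, 700, 680, 745, 720, 760], dtype=float)
accuracy = np.array([0.78, 0.80, 0.85, 0.83, 0.88, 0.86, 0.90, 0.90, 0.93, 0.92, 0.95])

r, p = pearsonr(mean_rt, accuracy)
# A significantly positive r indicates a speed-accuracy trade-off:
# participants who responded faster tended to be less accurate.
print(f"r = {r:.2f}, p = {p:.3f}")
```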

Discussion

The objective of Experiment 1 was to test the hypothesis, based on the SPEED-model, that late in learning, when automaticity develops, participants benefit from semisupervised learning, resulting in faster RTs. Therefore, participants were trained for 2 days until a certain level of expertise was reached and then, for the next 2 days, either received feedback on all trials (supervised condition) or on 25% of the trials (semisupervised condition). On days 3 and 4, both conditions received an equal number of feedback trials, but the total number of trials differed: participants in the semisupervised condition categorized four times as many trials as participants in the supervised condition. The results clearly showed that the mean RTs on the test (day 5) were significantly faster in the semisupervised condition than in the supervised condition. The participants in the semisupervised condition thus revealed more automatic behavior than the participants in the supervised condition. Importantly, this difference in mean RTs on day 5 was not observed on day 2, ruling out the possibility that participants in the semisupervised condition were simply always faster. On day 5, a SAT-effect occurred in the semisupervised condition: participants who tended to respond fast also made more errors. This was not the case in the supervised condition. Note that the mean accuracy on day 5 was similar in both conditions: even though semisupervised participants sacrificed accuracy for response speed, they still performed at the same level of accuracy as supervised participants.

There are three possible explanations for these findings. The first is that semisupervised learning does have an impact late in learning, when automaticity develops, and that it leads to faster RTs. Second, in line with the SPEED-model, it is possible that the higher total number of trials in the semisupervised condition is responsible for the faster RTs. On days 3 and 4, participants in the semisupervised condition responded to four times as many trials. The SPEED-model postulates that late in learning the fast pathway is dominant. This pathway is assumed to be independent of feedback. According to the SPEED-model, the many repetitions in the semisupervised condition lead to faster response times, regardless of whether these trials are followed by feedback. Third, the results may have been influenced by a confound: the participants in the supervised condition never experienced no-feedback trials before day 5, and this inexperience might have slowed their performance. Experiment 2 addressed these explanations.

Experiment 2

Experiment 2 again examines whether, late in learning, the nature of feedback (i.e., feedback on every trial, occasional feedback, or no feedback) has an impact on the development of automaticity, indicated by faster RTs. In order to do this without the confounds present in Experiment 1, two control conditions are run and compared to the semisupervised condition of Experiment 1: an almost supervised condition, in which 95% of the trials are followed by feedback, and an almost unsupervised condition, in which 5% of the trials are followed by feedback. Instead of fully supervised and unsupervised control conditions, “almost” supervised and unsupervised conditions are purposefully chosen so that all participants experience no-feedback trials prior to day 5, ruling out the possibility that the novelty of no-feedback trials on day 5 influences the RTs, as it may have done in the supervised condition of Experiment 1. The total number of trials in these two new conditions corresponds to that of the semisupervised condition of Experiment 1, to test the hypothesis that merely the higher number of trials in the semisupervised condition of Experiment 1 led to faster RTs on day 5. If the total number of trials drives the development of automaticity, RTs in these two new conditions should be similar to those in the semisupervised condition of Experiment 1. This is the outcome predicted by the SPEED-model (Ashby et al., 2007). In contrast, if the RTs in the two new conditions differ from those in the semisupervised condition of Experiment 1, the effect must be due to the different number of feedback trials. If RTs in the almost supervised condition are faster than in the semisupervised condition, either the higher number of feedback trials in the almost supervised condition aids automaticity, or the higher number of no-feedback trials in the semisupervised condition slows down learning. On the other hand, if RTs in the almost supervised condition are slower than in the semisupervised condition, semisupervised learning aids the development of automaticity, despite the lower number of feedback trials.

For the almost unsupervised condition, the SPEED-model predicts RTs similar to those in the semisupervised condition of Experiment 1, since only expert participants are selected. If RTs do differ from those in the semisupervised condition, this will provide insight into the minimum number of feedback trials needed to successfully develop automaticity.

Method

Participants, design, stimuli, apparatus and procedure

In total, 38 participants (28 women, average age 20.74 years, SD=3.18, range=18–30 years) took part in the experiment in return for payment. The background of the participants was similar to that of the participants of Experiment 1, and the time of testing within the academic year was comparable. Participants who took part for 2 days received 10 euros; those who took part for 5 days received 30 euros. Participants were randomly divided into two conditions: the almost supervised condition (n=19) and the almost unsupervised condition (n=19). The organization of Experiment 2 was identical to that of the semisupervised condition of Experiment 1, except on the third and fourth days. In the almost supervised condition, participants were presented with 640 trials (eight blocks of 80 trials), 95% of which were randomly followed by feedback, resulting in 608 feedback trials on days 3 and 4. In the almost unsupervised condition, participants also received 640 trials (eight blocks of 80 trials), but only 5% of these trials were randomly followed by feedback, resulting in 32 feedback trials on days 3 and 4. Note that the total number of trials in the semisupervised condition of Experiment 1 is the same as the total number of trials in the two conditions of Experiment 2. Table 6 presents the conditions schematically.

Table 6 Number of trials in each condition of Experiment 2 and the semisupervised condition of Experiment 1

Results

Selection of participants

As in Experiment 1, accuracy and model-based analyses were performed to ensure that the participants mastered the category structure at the end of day 2.

Criterion 1: High accuracy

Recall that participants could achieve perfect accuracy in this task. As in Experiment 1, the criterion was an average performance of at least 90% on the last two blocks of day 2. As can be seen in Table 7, ten participants did not pass this criterion and were therefore excluded from further analyses.

Table 7 Mean accuracy (%) and model-based analyses (BIC scores) of the last two blocks of day 2 (i.e., blocks 9 and 10) for every participant of Experiment 2

Criterion 2: Optimal decision bound

Figures 7 and 8 in the Supplementary Materials show the actual responses during the last two blocks of day 2 for each participant who passed the accuracy criterion. As in Experiment 1, these responses were used to calculate the individual decision bounds of four different models and the corresponding BIC scores. The model with the lowest BIC score presumably reflects the strategy that the participant adopted in the last two blocks of day 2. Only participants who favored the general linear classifier model with a decision bound falling between the two categories were retained. In both conditions, all remaining participants favored a strategy based on the general linear classifier. As can be seen in Figs. 7 and 8 in the Supplementary Materials, all fitted decision bounds fell between the two categories. As a result, the final sample used in the subsequent analyses consisted of 28 participants (19 women; average age 20.9 years, SD=3.24, range=18–30 years; n=15 in the almost supervised condition and n=13 in the almost unsupervised condition).

As in Experiment 1, four types of analyses are reported: accuracy, model-based, response time (RT), and SAT analyses. The response time analyses were used to compare the learning process of the almost supervised and almost unsupervised conditions of Experiment 2 to that of the semisupervised condition of Experiment 1.

Accuracy analysis

Figure 6 shows the average percentage of correct responses and the 95% confidence intervals on each block of trials received during the first 4 days for each of the conditions (almost supervised, almost unsupervised, and the semisupervised condition of Experiment 1) separately. In all conditions, the accuracy was based on all trials (feedback and no-feedback trials). During the first 2 days (blocks 1–10) the learning process was similar in the three conditions. The mean accuracy increased from an average of 72% (SD=9.93) in the first block for the almost supervised condition, an average of 75% (SD=11.76) for the almost unsupervised condition, and an average of 73% (SD=11.06) for the semisupervised condition to almost perfect accuracy in the last block of day 2 (97%, SD=3.11; 96%, SD=2.35; and 97%, SD=2.76, respectively). During the blocks on days 3 and 4, the mean accuracy was almost perfect in the almost supervised condition: the mean accuracy on the last block of day 3 was 98% (SD=2.59) and 99% (SD=1.48) on day 4. In the almost unsupervised condition the mean accuracy was also high: the mean accuracy on the last block of day 3 was 94% (SD=5.54) and 94% (SD=4.92) on day 4. For the semisupervised condition of Experiment 1, the mean accuracy on the last block of day 3 was 97% (SD=2.00) and 97% (SD=2.00) on day 4. A repeated measures ANOVA was conducted to determine whether the mean accuracy on the last two blocks differed depending on the day (4 levels: days 1, 2, 3, and 4) and condition (3 levels: almost supervised, almost unsupervised, and semisupervised). Not surprisingly, there was a main effect of day, F(3,34)=22.67, p<.001, ηp2=.67, indicating that the accuracy significantly increased during the succeeding days. Post hoc paired-sample t-tests, adjusted by the Bonferroni correction for multiple comparisons, indicated that in comparison to day 1, mean accuracy was significantly higher on days 2, 3, and 4 (resp. t(38)=7.79, p<.001; t(38)=5.60, p<.001; t(38)=7.18, p<.001). There was no main effect of condition, F(2,36)=1.60, p=.22, ηp2=.08, nor an interaction between day and condition, F(6,68)=1.60, p=.16, ηp2=.12, suggesting that the increase in accuracy across days was similar in the three conditions.

The accuracy on the test (day 5), where the speed of responding was stressed, was lower in all conditions compared to the accuracy reached at the end of day 4: the average difference was -6.7% (SD=3.35) in the almost supervised condition, -7.5% (SD=7.47) in the almost unsupervised condition, and -10.69% (SD=6.97) in the semisupervised condition. Paired sample t-tests showed that these differences were significant: accuracy dropped significantly between the last block of day 4 and day 5, t(14)=7.69, p<.001, d=2.06, for the almost supervised condition, t(10)=5.08, p<.001, d=1.53, for the semisupervised condition, and t(12)=3.62, p=.003, d=1.00, for the almost unsupervised condition. Finally, and crucially, a one-way ANOVA was conducted to determine whether the mean accuracy on day 5 differed between the almost supervised, almost unsupervised, and the semisupervised condition of Experiment 1. This was the case, F(2,36)=4.18, p=.02, ηp2=.19. Independent sample t-tests with the Bonferroni correction for multiple comparisons showed that this effect was due to the significant difference in accuracy between the almost supervised condition (92%, SD=4.08) and the almost unsupervised condition (86%, SD=6.40), t(26)=2.94, p=.02. The accuracy in the semisupervised condition (87%, SD=6.53) did not differ significantly from the almost unsupervised condition (t(22)=.50, p=1) and, most crucially, did not differ significantly from the almost supervised condition (t(24)=2.19, p=.11).

Model-based analysis

Table 8 shows the four model fits on day 4. In the almost supervised condition, all participants preferred a decision bound based on the general linear classifier, indicating successful learning. In the almost unsupervised condition, 11 of the 13 participants learned successfully: they revealed a decision bound based on the general linear classifier. Participants 25 and 37 preferred a decision bound based on the general conjunctive classifier, indicating that they had switched to another strategy in comparison with day 2.

Table 8 Mean accuracy (%) and model-based analyses (BIC scores) on the last two blocks of day 4 for every participant of Experiment 2

Figures 9 and 10 in the Supplementary Materials show the actual responses during the last two blocks of day 4 for each participant. Table 9 presents the model fits on the test day (i.e., day 5). Again, for the almost supervised condition all participants used a decision bound based on the general linear classifier. In the almost unsupervised condition, 11 participants preferred a decision bound based on the general linear classifier whereas participants 23 and 37 adopted a strategy based on the general conjunctive classifier. Figures 11 and 12 in the Supplementary Materials present the responses and the best fitting decision bounds for every participant during the test day.

Table 9 Mean accuracy (%) and model-based analyses (BIC scores) on all trials of day 5 for every participant of Experiment 2

Analysis of the response times

The mean response times (RTs) along with the 95% confidence intervals from days 1–4 for the almost supervised, almost unsupervised, and the semisupervised condition of Experiment 1 are presented in Fig. 7. These RTs were calculated on the last two blocks of each day. For day 1, the mean RT in the almost supervised condition was 855 ms (SD=162.41), the mean RT in the almost unsupervised condition was 825 ms (SD=156.31), and the mean RT in the semisupervised condition of Experiment 1 was 858 ms (SD=131.22). Importantly for this investigation, participants were equally fast in the last two blocks of day 2 (almost supervised mean RT=806 ms, SD=161.29; almost unsupervised mean RT=800 ms, SD=148.49; and semisupervised mean RT=785 ms, SD=90.10). This was confirmed by a one-way ANOVA comparing the RTs of the last two blocks of day 2 across the three conditions (almost supervised, almost unsupervised, and the semisupervised condition of Experiment 1), F<1, p=.93, ηp2=.004. On days 3 and 4, the mean RTs dropped in the almost supervised condition to, respectively, 783 ms (SD=170.67) and 772 ms (SD=134.68). This decrease in mean RTs was also observed in the almost unsupervised condition: 772 ms (SD=134.68) on day 3 and 749 ms (SD=144.54) on day 4. A repeated measures ANOVA was conducted to determine whether the mean RTs on the last two blocks differed depending on the day (4 levels: days 1, 2, 3, and 4) and condition (3 levels: almost supervised, almost unsupervised, and semisupervised of Experiment 1). Not surprisingly, there was a main effect of day, F(3,34)=6.81, p=.001, ηp2=.38, indicating that the RTs decreased across the succeeding days. Post hoc paired-sample t-tests, adjusted by the Bonferroni correction for multiple comparisons, indicated that participants were significantly faster on days 2, 3, and 4 compared to day 1, t(38)=2.92, p=.03; t(38)=2.95, p=.05; and t(38)=4.03, p=.001, respectively. None of the other comparisons was significant (all p>.42). There was no main effect of condition, F<1, p=.89, ηp2=.01, nor a significant interaction between day and condition, F(6,68)=1.85, p=.10, ηp2=.14.

Fig. 7

Mean response times (in ms) along with the 95% confidence intervals for the almost supervised and the almost unsupervised condition of Experiment 2 along with the semisupervised condition of Experiment 1. The mean response times are calculated on the last two blocks of each day

Decisive for testing our hypothesis was the difference in RTs on day 5. In the almost supervised condition, the mean RT was 730 ms (SD=132.08), the mean RT in the almost unsupervised condition was 742 ms (SD=84.59), and the mean RT in the semisupervised condition of Experiment 1 was 579 ms (SD=104.08). A one-way ANOVA indicated significant differences between the RTs of the three groups on day 5, F(2,36)=8.07, p=.001, ηp2=.31. Post hoc independent-sample t-tests, adjusted by the Bonferroni correction for multiple comparisons, showed that RTs were faster in the semisupervised condition of Experiment 1 than in the almost supervised condition, t(24)=3.15, p=.004, and the almost unsupervised condition, t(22)=4.26, p=.003. The difference between the almost supervised condition and the almost unsupervised condition was not significant, t(26)=0.28, p>.99.

Speed-accuracy trade-off

On day 5 participants were explicitly asked to respond as fast as possible. The speed-accuracy trade-off was again calculated for each condition. In the almost supervised condition, there was no SAT-effect, r=-.08, p=.79. In contrast, there was a SAT-effect in the almost unsupervised condition, r=.85, p<.001: faster responses went together with more errors. Recall that we also observed a SAT-effect in the semisupervised condition of Experiment 1.
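The SAT check can be illustrated as a simple correlation between each participant's speed and accuracy on the test day. The snippet below is a hypothetical sketch: the per-participant values are invented, and the exact variables the authors correlated are not restated here.

```python
# Illustrative sketch (hypothetical data): checking for a speed-accuracy trade-off
# by correlating per-participant mean RT with accuracy on the test day. A positive
# correlation (slower participants being more accurate, i.e., faster participants
# making more errors) points to a trade-off. All values are made up for illustration.
import numpy as np
from scipy import stats

mean_rt_ms = np.array([612, 655, 590, 703, 640, 580, 720, 600, 665, 630, 575, 690, 610])
accuracy   = np.array([0.86, 0.90, 0.84, 0.94, 0.89, 0.82, 0.95, 0.85, 0.91, 0.88, 0.81, 0.93, 0.87])

r, p = stats.pearsonr(mean_rt_ms, accuracy)
print(f"r = {r:.2f}, p = {p:.3f}")
```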

Discussion

In Experiment 2 the total number of trials was equal in each condition to investigate whether the nature of feedback (almost supervised, semisupervised, or almost unsupervised) had an impact on the development of automaticity. Two control conditions were run for comparison with the semisupervised condition of Experiment 1: the almost supervised condition (in which mainly feedback trials were given) and the almost unsupervised condition (in which mainly no-feedback trials were given). The results indicated that the mean RTs on day 5 in the semisupervised condition of Experiment 1 were significantly faster than in the two control conditions of Experiment 2, indicating that a combination of feedback and no-feedback trials boosted the automaticity process. This result cannot be due to generally faster RTs in the semisupervised condition, as RTs were equal for all conditions on day 2. Nor can this result be attributed to the novelty of the no-feedback trials on the test day, since all participants had already experienced feedback and no-feedback trials on days 3 and 4. Remarkably, in the almost unsupervised condition, two participants no longer showed a strategy based upon the optimal decision bound on day 4. Apparently, they changed their categorization strategy during days 3 and 4. This was not the case in the almost supervised condition nor in the semisupervised condition. This might imply that for some participants a minimal percentage of feedback is still needed late in learning, even though near-perfect accuracy was obtained long before.

On the test day, the mean accuracy was higher in the almost supervised condition than in the almost unsupervised condition. There was no significant difference in mean accuracy between the semisupervised condition and either the almost supervised or the almost unsupervised condition. In the semisupervised and the almost unsupervised condition, a few participants showed a drop in accuracy on day 5. This drop in performance was also reflected in the model-based analyses: in the almost supervised condition all participants adhered to a strategy based upon the optimal decision bound, whereas in the semisupervised and the almost unsupervised conditions three and two participants, respectively, switched strategy. When these participants were omitted from the analyses, a one-way ANOVA revealed that the difference in accuracy disappeared, F(2,31)=2.67, p=.12, ηp2=.13. The mean accuracy was 87% (SD=5.90) for the almost unsupervised, 89% (SD=7.04) for the semisupervised, and 92% (SD=4.08) for the almost supervised condition. Still, the effect of faster RTs in the semisupervised condition remained, one-way ANOVA F(2,31)=6.54, p=.004, ηp2=.30. This effect was due to the significant difference between the semisupervised (577 ms, SD=123.14) and the almost supervised condition (730 ms, SD=132.08), t(21)=2.71, p=.015, and between the semisupervised and the almost unsupervised condition (760 ms, SD=79.20), t(17)=3.96, p=.005, while the difference between the almost supervised and the almost unsupervised condition was not significant, t(24)=0.67, p>.99. These post hoc t-tests were Bonferroni-corrected for multiple comparisons. These results show that the faster response times in the semisupervised condition were not due to quick random guessing by some participants (which would yield fast RTs and low accuracy), since the effect remained when the switching participants were omitted.

On day 5 a SAT-effect occurred in the almost unsupervised condition of Experiment 2: participants who tended to respond fast also made more errors. This was not the case in the almost supervised condition, which could explain the significantly lower accuracy in the almost unsupervised condition on day 5. Note that we also observed a SAT-effect on day 5 for the semisupervised condition. Even though this condition showed a SAT-effect, its accuracy was comparable to that of the almost supervised condition.

In conclusion, Experiment 2 shows that, even when the total number of trials is the same, the development of automaticity is enhanced by 25% semisupervised learning. Only in this feedback scheme did accuracy remain high while response times became strikingly faster. When feedback was almost always provided, participants maintained high accuracy but were not able to accelerate their responses. In the almost unsupervised condition, there was a drop in accuracy (some participants even unlearned the category structure) and response times remained high. These results are in contrast with the SPEED-model, which predicts a similar development of automaticity across the three conditions because the number of trials, regardless of feedback, was identical in all conditions.

General discussion

This study investigated the impact of semisupervised category learning late in the learning process, when automaticity develops. Participants were first trained in a supervised way over 2 days on the information-integration category structure, and only participants who performed with at least 90% accuracy and used a decision bound similar to the optimal decision bound were included in the actual experiments. In Experiment 1, half of the participants were trained in a 25% semisupervised way on days 3 and 4: only a quarter of the trials were followed by feedback. The other half were trained in a supervised way, implying that feedback was given after every categorization response. Both conditions received an equal number of feedback trials. On the fifth day, differences in performance between the semisupervised and supervised learners were studied. Participants were urged to respond as fast as possible on this test day. Accuracy was similar in both groups on day 5, which is to be expected, as accuracy was already above 90% at the end of day 2. However, the results clearly showed that participants in the semisupervised condition responded significantly faster than the participants in the supervised condition on day 5. This effect cannot be due to generally faster RTs and/or higher accuracy levels in the semisupervised condition, as evidenced by similar RTs and accuracies for both conditions at the end of day 2. Thus, the findings of Experiment 1 imply that, late in learning, the no-feedback trials in the semisupervised condition aided the development of automaticity.

However, two confounds hampered a clear conclusion that semisupervised learning is superior late in learning. First, even though the number of feedback trials was equal in the semisupervised and the supervised conditions, the total number of trials differed. Participants in the semisupervised condition received four times as many trials on days 3 and 4 as participants in the supervised condition and hence had more practice. It is therefore possible that the larger total number of trials caused the faster RTs in the semisupervised condition. Second, participants in the supervised condition never received no-feedback trials on days 3 and 4. The sudden encounter of no-feedback trials on day 5 might therefore have slowed them down on this test day. To exclude these alternative explanations, two control conditions were administered in Experiment 2 and compared to the semisupervised condition of Experiment 1. The percentage of feedback trials on days 3 and 4 was manipulated. In the almost supervised condition 95% of the trials were randomly followed by feedback. In the almost unsupervised condition 5% of the trials were randomly followed by feedback. This implies that in all conditions (almost supervised, semisupervised, and almost unsupervised), participants encountered both feedback and no-feedback trials on days 3 and 4. Crucially, the total number of trials was identical in all three conditions. Despite these alterations, the results of Experiment 2 again showed that the RTs on the fifth day were significantly slower in the almost supervised and the almost unsupervised conditions than in the semisupervised condition, whereas RTs did not differ at the end of day 2. Since the total number of trials was now identical in all three conditions, we can conclude that the semisupervised learners achieved automaticity faster. Hence, late in learning, semisupervised learning should be preferred. These results are not in line with the predictions of the SPEED-model, which stipulates that the type of trial (feedback or no feedback) should not have an impact on the development of automaticity late in learning and would therefore predict similar RTs in all conditions, as they all contained the same number of trials. Why semisupervised category learning is superior late in learning requires further investigation. Perhaps semisupervised learning increased participants' motivation and attention compared to a condition where feedback is almost always offered. Conversely, participants in the almost unsupervised condition often reported frustration and may have stopped performing to the best of their ability.

In the almost unsupervised condition there was a speed-accuracy trade-off (SAT-)effect: participants who tended to respond fast also made more errors. This can explain why the mean accuracy in the almost supervised condition was significantly higher than in the almost unsupervised condition. Although we also observed a SAT-effect in the semisupervised condition, there was no difference in accuracy between the almost supervised and the semisupervised conditions, suggesting that, even though participants in the semisupervised condition sacrificed accuracy for speed, they still remained at an accuracy level similar to that of participants in the almost supervised condition. The model-based analyses on day 5 showed that a few participants in the semisupervised and the almost unsupervised condition switched strategy. This was not the case in the almost supervised condition, where all participants adhered to the optimal decision bound. Apparently, some participants in the almost unsupervised and in the semisupervised condition unlearned the category structure. Note that when these participants were omitted from the analyses, the faster RTs in the semisupervised condition remained, whereas the difference in accuracy between the almost supervised and almost unsupervised conditions disappeared. These findings suggest that the faster RTs in the semisupervised condition were not due to quick random guessing. It rather seems that individual differences are decisive. In this study, the learning scheme in a condition was fixed and individual differences in learning were not taken into account. It is possible that some participants need a higher percentage of feedback trials on days 3 and 4, even though they successfully learned the structure by the end of day 2. Another explanation could be that the switch point, that is, the point in the learning process at which supervised learning becomes less effective than semisupervised learning, comes later for some participants than the fixed point of 800 trials (end of day 2) used in this study. Thus, even though semisupervised learning appears to be the best learning mode late in learning, there may be exceptions for some participants. These individual differences in learning are interesting directions for further research.

The current study supports the idea that no-feedback trials aid the learning process, as shown in machine learning (Chapelle et al., 2006; Zhu et al., 2009) and in a few human semisupervised studies (Kalish et al., 2011; Lake & McClelland, 2011; Zhu et al., 2010). Contrary to the studies of Kalish et al. (2011), Lake and McClelland (2011), and Zhu et al. (2010), in our study all stimuli (rather than a fixed subset) could be followed by feedback, mimicking real-life category learning. As in the study of Rogers et al. (2010), semisupervised learning was found when participants were urged to respond as fast as possible. It is possible that speed of responding is essential in order to observe semisupervised learning. Again, this can be an interesting direction for future research. Our study also shows that not only young children (Kalish et al., 2015) but also young adults are able to learn in a semisupervised way.

The results of our study may seem to contrast with those of Vandist et al. (2009), where no effect of the no-feedback trials was found in learning the information-integration category structure. However, there are important differences between both studies. First and most importantly, Vandist et al. (2009) focused on effects early in the learning process, whereas the current study dealt with effects late in learning, when automaticity develops. Second, Vandist et al. (2009) studied the impact of the no-feedback trials on accuracy levels, whereas in the present study response time was the dependent variable. Third, in the Vandist et al. (2009) study, participants in the semisupervised condition learned in a semisupervised way from the start, whereas in the present study semisupervised learning was only introduced after expert performance was obtained.

Combining the results of both studies suggests that, early in learning the information-integration structure, the no-feedback trials do not have an impact, but that late in learning they facilitate automaticity. Indeed, the effects of semisupervised learning, at least for the information-integration structure, might be especially apparent late in the learning process. These results can also explain why previous semisupervised category studies failed to find convincing effects, as they all focused on initial learning. This also makes sense if we relate it to category learning in children. When a child is first confronted with items of an unknown category, parents label most of the presented items. As the child becomes more familiar with the category, the parent still labels items, but less often. Once the parent believes that the child has acquired the category, labeling diminishes further but still takes place from time to time. In fact, it is ecologically plausible that semisupervised learning takes place once a solid basis of category expertise has been acquired, and that from that point on it aids learning. Experiment 2 even suggests that from a certain expert level on, semisupervised category learning might be essential to the development of automaticity, since semisupervised category learning speeds up the development of automaticity whereas (almost) supervised learning seems to slow it down. Although speculative, it could, for example, be that the continuous feedback in the almost supervised condition makes the expert learner less attentive or less motivated, leading to decreased performance. Nevertheless, our results also indicated that a certain percentage of feedback trials is still needed to develop automaticity successfully: when feedback was rare, as in the almost unsupervised condition, the mean RTs remained at the same level and at the end of day 4 a few participants even unlearned the category structure.

In conclusion, this is the first study to examine the effect of 25% semisupervised learning late in learning. In Experiment 1, faster RTs were observed in a 25% semisupervised condition in comparison to a supervised condition when the total number of feedback trials in both conditions was the same. In Experiment 2 the total number of trials was kept identical in all conditions, but the 25% semisupervised condition of Experiment 1 still showed faster response times in comparison to the almost supervised and almost unsupervised conditions of Experiment 2. Hence, late in learning, 25% semisupervised learning seemed to have a beneficial effect, as the no-feedback trials facilitated automaticity. A learning condition containing a certain amount of no-feedback trials even seems to outperform a condition where (almost) all trials are followed by feedback, as long as a minimal percentage of the trials is followed by feedback.