Interleaving (i.e., switching between exemplars of different categories) is an effective strategy in category learning — wherein learners acquire knowledge about the underlying category structures by studying multiple individual exemplars. A wealth of research has demonstrated a robust categorization advantage of interleaved sequences compared to blocked sequences (i.e., switching between exemplars of the same category) when learning various types of visual stimuli (for recent meta-analytical evidence, see Brunmair & Richter (2019)). Because learners perceive interleaving as more effortful than blocking (e.g., Janssen et al., 2023; Kirk-Johnson et al., 2019; Onan et al., 2022), interleaving is widely regarded as a desirable difficulty (Bjork & Bjork, 2011). It facilitates learning but makes learning more effortful for learners.

Despite the robust benefits of interleaving in category learning, learners’ study sequence choices usually reveal an overwhelming preference for blocking when learning confusable categories (e.g., Tauber et al., 2013). One explanation for learners’ disengagement from interleaving is that learners perceive the additional effort to be invested in interleaved learning as a motivational cost (de Bruin et al., 2023; see also Feldon et al., 2019, and Grund et al., 2024) and do not recognize sufficient value in incurring this cost. According to this explanation, learners avoid unnecessary effort in their learning strategy choice (de Bruin et al., 2023; Kurzban et al., 2013; Shenhav et al., 2017): If they do not anticipate learning benefits beyond those of a less effortful learning strategy, they tend to refrain from investing additional effort.

From the perspective of the extensive metacognitive literature on learning strategy training (e.g., Dignath & Büttner, 2008; Dignath & Veenman, 2020), learners’ lack of engagement in interleaved learning could be related to deficits in metastrategic knowledge (see Zohar & Peled, 2008) regarding interleaving. That is, learners might lack the conditional knowledge on when and why interleaving can improve learning. From a motivational perspective (e.g., Eccles & Wigfield, 2020; Zepeda et al., 2020), a further reason could be that learners do not see sufficient utility value in engaging in interleaving.

The present research addresses these potential reasons for learners’ lack of engagement in interleaved learning. In two experiments, we investigated the effects of providing learners with different conditional knowledge components regarding when and why interleaving can foster learning and of increasing the utility value of the learning benefits that can be harnessed through interleaving. In both experiments, learners’ study sequence choices were used as the main dependent variable.

The Interleaving Effect in Category Learning

Previous research on sequence effects in category learning has demonstrated a robust categorization advantage of interleaved sequences compared to blocked sequences when learning various types of visual stimuli, ranging across art, artificial, natural, and scientific categories (for a recent meta-analysis, see Brunmair & Richter (2019)). One explanation for these benefits of interleaving is the discriminative contrast hypothesis (Birnbaum et al., 2013; Kang & Pashler, 2012). According to this hypothesis, the ability to detect subtle differences among categories that are similar and thus easily confusable is crucial for successfully distinguishing between them. Interleaving, in turn, provides the necessary contrast that facilitates the detection of subtle differences (see also Abel et al. (2021) and Carvalho and Goldstone (2015)). Only if the categories are very distinct and barely share any features is interleaving expected to be relatively disadvantageous compared to blocking: In such cases, relevant differences do not stand out against the irrelevant ones (Abel et al., 2021; Carvalho & Goldstone, 2015).

The outlined benefits of interleaving as a learning strategy apply to a variety of educationally relevant tasks such as mathematical problem-solving (e.g., Nemeth et al., 2021; Rohrer, 2012), learning from expository texts (e.g., Abel et al., 2020; Zulkiply et al., 2012), and the acquisition of source evaluation skills (Abel et al., 2024). However, learners scarcely engage in interleaving of their own accord. It is true that a few studies found that learners spontaneously switch between categories to a greater extent. For example, Lu et al. (2020) demonstrated learners’ spontaneous engagement in interleaving when learning easily confusable categories (approx. 65% of study sequence choices were interleaved), while Kornell & Vaughn (2018) showed that learners switch between phases of pure blocking and pure interleaving. Furthermore, Abel (2023a) reported, across multiple experiments, no preference for blocking over interleaving when dealing with unfamiliar perceptual learning tasks across various sensory modalities. However, in sum, the studies on learners’ study sequence choices in category learning reveal an overwhelming preference for blocking in learning confusable visual categories (Carvalho et al., 2016; Kornell & Vaughn, 2018; Sun et al., 2022; Tauber et al., 2013; Yan et al., 2016, 2017). From the perspective of the literature on learning strategy training, both lacking metastrategic knowledge regarding interleaving and the low utility value that learners see in the benefits that can be harnessed through interleaving could contribute to this frequent disuse of interleaving.

Metacognitive and Motivational Perspectives on the (Under)Utilization of Interleaving

Metacognitive research on learning strategy usage and interventions attributes learners’ scarce use of effective learning strategies to two types of deficiencies: a mediation deficiency and a production deficiency (e.g., Brown, 1978; Flavell, 1978). A mediation deficiency would be diagnosed if learners lack relevant knowledge on how to use a strategy. That is, learners might not know how to execute the cognitive operations required by a learning strategy. A production deficiency, by contrast, would be diagnosed if learners are in principle able to apply a strategy but nevertheless do not use it. This deficiency is usually explained through the fact that learners lack conditional knowledge on when and why using a strategy would be beneficial (Paris et al., 1983). Due to such lack of metastrategic knowledge (see Zohar & Peled (2008)), learners might not be aware of the learning benefits that a strategy would yield in a certain setting and thus scarcely engage in it.

With respect to interleaving, a production deficiency but not a mediation deficiency is a likely explanation for its underuse, at least when it comes to adult learners with advanced learning skills such as university students. Specifically, as interleaving, a study strategy characterized by switching between categories, primarily relies on relatively simple choices for study sequence, it can be easily executed, and hence, a mediation deficiency appears to be unlikely. Learners’ lack of conditional knowledge on when and why interleaving would benefit learning (i.e., production deficiency), by contrast, appears to be more plausible (cf. Yan et al., 2016). A wealth of research has indicated that even advanced learners such as university students frequently lack knowledge concerning effective learning strategies (e.g., Biwer et al., 2020; see also Trentepohl et al. (2023)) and show persistent misconceptions regarding interleaving (McCabe, 2011; Yan et al., 2016).

The basic conditional knowledge on when and why to engage in interleaved category learning can be conceptualized as conditional (if–then) links. Taken together, these conditional links translate processes into spontaneous use of interleaving (see Fig. 1). First, learners need to be aware that detecting differences is essential for being able to distinguish confusable categories (hereafter referred to as awareness-that-differences-matter). This conditional knowledge component refers to learners’ understanding of the implicit requirements of a category learning task: If you are aware that differences matter for distinguishing, then your goal to distinguish confusable categories should translate into the willingness to invest effort in identifying these differences. Second, learners need to be aware that interleaving is supportive in detecting subtle differences between confusable categories (hereafter referred to as awareness-that-interleaving-highlights-differences): If you are aware that interleaving highlights differences, then your willingness to invest effort in identifying these differences should translate into the spontaneous use of interleaving.

Fig. 1

Potential motivational and metacognitive reasons for learners’ (under)utilization of interleaving when learning confusable categories. Note: Black arrows indicate conditional (if–then) links that altogether translate processes (in boxes) into spontaneous use of interleaving. For example, the first conditional link addresses a motivational reason: If you expect consequences of confusion, then being faced with confusable categories (preceding box) should raise the utility value of distinguishing these categories (subsequent box). In the case of a conditional gap, there should be no or less transition between the processes

Beyond the metacognitive perspective on learners’ disuse of effective learning strategies, motivational deficits might contribute to learners’ deficits in using interleaving as well. Even if learners were aware that detecting predictive differences is essential for being able to distinguish confusable categories and that interleaving can help detect these differences, they might still see hardly any utility in reaching the learning goal of being able to distinguish in the first place if no consequences of confusion are expected (hereafter referred to as utility-value-of-distinguishing; see Fig. 1). From the perspective of situated expectancy-value theory (e.g., Eccles & Wigfield, 2020), the utility value that learners see in the learning goal might be too low because they do not recognize how it is relevant for them or how it aligns with their present or future plans.

Interventions to Engage Learners in Interleaving

With respect to the role of the utility value of being able to distinguish between confusable categories, Abel (2023b) provided initial empirical evidence. In this experiment, the labels of two superordinate mushroom categories were manipulated: edible vs. poisonous (for high utility value of being able to distinguish between the categories) and growing on acidic soil vs. growing on basic to neutral soil (for low utility value). The main finding was that the extent of interleaving increased when the utility value was high. This finding suggests that the utility value of distinguishing recognized by learners can explain parts of learners’ (dis)engagement in interleaving. However, the findings by Abel (2023b) also revealed that low utility value is not the only reason for low engagement in interleaving. The effect of the motivation to distinguish on spontaneous study sequence choices was small, and even the learners in the group with high utility value predominantly blocked. Hence, beyond the lack of utility value and thus of the motivation to distinguish, learners might have gaps regarding the conditional knowledge on when and why to use interleaving, particularly concerning the awareness-that-differences-matter and/or the awareness-that-interleaving-highlights-differences.

Findings by Yan et al. (2017) support this notion. In their studies, students underappreciated the importance of between-category comparisons relative to within-category comparisons, suggesting learners’ lack of the conditional knowledge that for distinguishing, differences matter. Regarding the awareness-that-interleaving-highlights-differences, however, some learners correctly assumed that mixing categories supports between-category comparisons, while others had awareness gaps. In line with this finding, Onan et al. (2022) reported that at least some students spontaneously explained their choice of interleaving over blocking with the aim of detecting differences between categories.

A well-established approach to provide learners with conditional knowledge regarding learning strategies and in turn foster their engagement in the strategies is informed training (e.g., Paris et al., 1983). In informed training, learners are explicitly informed about the benefits of a strategy and about the conditions in which its use is promising (i.e., the when and why of using a strategy). The benefits of informed training in terms of strategy application have been documented in several studies that focused on different learning strategies such as generative strategies and retrieval practice (e.g., Ariel & Karpicke, 2018; Carpenter, 2023; Hübner et al., 2010; Roelle et al., 2017; Wang et al., 2023), but also interleaving (e.g., McCabe, 2011; Onan et al., 2024; Sun et al., 2022).

One frequent shortcoming of studies on informed training, however, is that it is often implemented as a package that includes a variety of potentially relevant metastrategic knowledge components. Evidently, these intervention packages yield beneficial effects. However, they provide little insight into what specifically led to their positive effects. That is, in such interventions, it remains unclear which conditional knowledge components are necessary and crucial to engage learners in a learning strategy. For the design of efficient and adaptive informed training interventions concerning a particular strategy, however, insight into the actual active ingredients would be fruitful.

The Present Research

In view of the outlined theoretical and empirical background, our main goal was to investigate the role of the utility value of being able to distinguish and the two outlined components of conditional knowledge for the (lack of) learners’ engagement in interleaving when learning confusable categories (see Fig. 1). We tested the following hypotheses.

In view of the initial evidence concerning the impact of increasing the utility-value-of-distinguishing by Abel (2023b), we assumed that highlighting the consequences of confusion would increase the degree to which learners engage in interleaving (utility-value-hypothesis). Furthermore, in view of the finding that learners appear to underappreciate the importance of between-category comparisons relative to within-category comparisons (Yan et al., 2017), we assumed that providing learners with the conditional knowledge component that for distinguishing, differences matter would foster learners’ engagement in interleaving as well (differences-matter-hypothesis).

In terms of the role of the conditional knowledge component that interleaving highlights differences, we did not have a clear prediction that providing learners with this component would substantially foster the use of interleaving. As outlined above, previous research indicates that learners might already be aware of this fact, rendering instruction on this conditional knowledge component largely redundant for promoting the use of interleaving. On this basis, we considered it an open research question whether informing learners about the conditional knowledge component that interleaving highlights differences would substantially affect learners’ engagement in interleaving (interleaving-highlights-differences-question).

Our hypotheses and research questions were mainly directed at learners’ study sequence choices as a dependent variable. Study sequence choices are more representative of learners’ actual use of interleaved learning than either a pre-study binary choice for the whole study phase (interleaved or blocked; e.g., Janssen et al., 2023; Onan et al., 2022; Sun et al., 2022) or a three-option sequence preference assessment after the classification test (interleaved, blocked, or equal; e.g., Yan et al., 2016; hereafter referred to as metacognitive preference). This is because study sequence choices offer a wider range of options and, being logged throughout the study phase, are more sensitive to data-driven effort experience (cf. Onan et al., 2024). However, to build on previous research, we assessed the metacognitive preferences as well, assuming the same pattern as implied by our hypotheses on study sequence choices. Furthermore, we were interested in learners’ classification performance and hence in learning outcomes as well. We expected that the use of interleaving should be reflected in higher classification performance scores in the final test.

To address our hypotheses and research questions, we used as visual stimuli distinct mushroom twin pairs from the study by Abel (2023b), because learners might spontaneously recognize the consequences of confusion and the critical importance of being able to reliably distinguish between poisonous and edible mushrooms. We conducted two experiments (see Fig. 1 for manipulations across the experiments). In Experiment 1, we factorially varied the utility value of being able to distinguish between categories (high vs. low) and the conditional knowledge component that for distinguishing, differences matter (instruction vs. no instruction). In Experiment 2, we kept the utility value high across conditions but again factorially varied the conditional knowledge component that for distinguishing, differences matter and, beyond Experiment 1, also varied the conditional knowledge component that interleaving highlights differences. Learners’ study sequence choices were used as the main dependent variable, and classification performance was assessed as well in both experiments.

Experiment 1

In Experiment 1, we investigated the roles of the utility value of distinguishing and of the awareness that differences matter for being able to distinguish in study sequence choices when learning two types of easily confusable mushrooms. In accordance with our hypotheses, we expected to find main effects of both manipulated factors: Participants who see higher utility value in distinguishing should interleave to a higher extent than participants who see lower utility value (utility-value-hypothesis). Furthermore, participants who were informed about the conditional knowledge component that for distinguishing, differences matter should interleave to a higher extent than participants who were not informed (differences-matter-hypothesis). We additionally explored a potential interaction between our manipulations.

Method

The present research was not preregistered.

Sample

We conducted an a priori power analysis with the following parameters to determine the required sample size: alpha level = 0.05, power = 0.80, f = 0.20 (small-to-medium effect size with regard to the use of interleaving). Effect sizes below a small-to-medium range are deemed practically irrelevant (Hattie et al., 1996). The power analysis, which was set up for a 2 × 2 factorial ANOVA, indicated that 199 subjects would be needed. On this basis, we recruited a total of N = 212 participants for our online experiment. As an incentive, the participants were offered the chance to participate in a raffle for 15 vouchers valued at 15€ and received 0.5 credits. Subjects were excluded if they reported a low or a very low level of concentration during the study phase on a 5-point Likert scale or if they were aware of the theoretical background of the present research (that is, the interleaving effect). Thirteen subjects had to be excluded based on the first criterion, a further eight subjects based on the second criterion, and one subject based on both criteria.
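The reported sample size can be sanity-checked with a short computation. The following is a minimal sketch, assuming the standard noncentral F formulation for a single-degree-of-freedom effect in a four-cell design (noncentrality λ = f² · N, numerator df = 1, denominator df = N − 4). The source does not state which software or settings were used, so these settings and all function names are illustrative assumptions.

```python
import math


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry relation keeps the continued fraction in its fast-converging region.
        return 1.0 - reg_inc_beta(b, a, 1.0 - x)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        # Even and odd steps of the modified Lentz recurrence.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_front) * h / a


def f_critical(alpha, df1, df2):
    """Critical value of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        cdf = reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * mid / (df1 * mid + df2))
        if cdf < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def anova_power(n_total, f=0.20, alpha=0.05, df1=1, n_groups=4):
    """Power of the F test (assumed: lambda = f^2 * N, df2 = N - n_groups)."""
    df2 = n_total - n_groups
    lam = f * f * n_total
    crit = f_critical(alpha, df1, df2)
    y = df1 * crit / (df1 * crit + df2)
    cdf = 0.0
    weight = math.exp(-lam / 2.0)  # Poisson weight for j = 0
    for j in range(200):
        # Noncentral F CDF as a Poisson mixture of incomplete beta terms.
        cdf += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, y)
        weight *= (lam / 2.0) / (j + 1)
        if weight < 1e-18 and j > lam / 2.0:
            break
    return 1.0 - cdf


def required_total_n(target_power=0.80, n_groups=4, **kwargs):
    """Smallest total N whose power reaches the target (simple upward search)."""
    n = n_groups + 2
    while anova_power(n, n_groups=n_groups, **kwargs) < target_power:
        n += 1
    return n
```

Calling `required_total_n()` with the defaults searches upward from a small N until the target power is reached; the result can then be compared with the 199 participants reported in the text.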

Of the remaining 190 participants, 181 were students (MAge = 23.28, SD = 4.57). The sample consisted of 145 women, 43 men, and two diverse. The subjects were randomly assigned across four conditions with 44 to 50 participants per condition. The research was conducted in compliance with the Declaration of Helsinki and the ethical standards of the DGPs (German Society of Psychology).

Design

Experiment 1 was based on a 2 × 2 between-subjects design with the factors utility value of distinguishing (high: labelling mushroom stimuli as poisonous and edible vs. low: labelling mushroom stimuli as growing on acidic and alkaline to neutral soil) and instruction on the conditional knowledge component that for distinguishing, the detection of differences matters (instruction vs. no instruction). The learners who received this instruction were informed that to be able to distinguish confusable mushrooms, it is essential to identify their subtle differences, whereas the learners without this instruction received no additional instruction going beyond the generic goal to learn two types of mushrooms for a subsequent categorization test (see Appendix). We analyzed the impact of these two manipulations on learners’ spontaneous study sequence choices in the study phase, self-reported measures after the study phase, classification performance, and their metacognitive preferences after the classification tests.

Stimuli

The present research utilized a set of 72 naturalistic images, consisting of 12 distinct mushrooms shown from six different angles each. During the learning phase, participants were presented with three images of each mushroom from various perspectives, along with the corresponding superordinate category label and actual name. In the category assignment task, two images of each mushroom were displayed, while in the picture assignment task, participants were shown a single frontal image of each mushroom.

The stimuli were split into two groups, with half of the mushrooms being edible and growing on alkaline to neutral soil, and the other half being poisonous and growing on acidic soil. The only difference between the conditions with high and low utility value of distinguishing was which superordinate category labels were used. Each mushroom in one superordinate category was visually similar to one mushroom from the other superordinate category, resulting in six pairs of confusable doubles (twin pairs). The twin pairs were very distinct from each other. That means that overall, the mushroom stimuli were not uniformly of a high or a low between-category similarity: Because the mushrooms were very distinct across pairs but confusable within pairs, high and low between-category similarities were present at the same time. Accordingly, learners were faced with a kind of twin-distinguishing task.

It is important to emphasize that the mushrooms within the same superordinate category did not share predictive features. This is inherent to mushrooms as naturalistic stimuli and makes the comparisons between mushrooms of the same type uninformative for the classification of new mushrooms as edible or poisonous. The comparisons with their twins, in contrast, seem the only informative way. This lack of predictive visual similarities was not a critical limitation of the present research since naturalistic superordinate categories do not necessarily share visual features. We also do not consider the lack of shared predictive features a difference to typical category tasks used in previous studies because therein, no superordinate level was present.

Study Phase

Participants were given the opportunity to create an individual study sequence by selecting mushrooms one by one from a mushroom selection page. The selection page displayed mushroom names in two columns, with the corresponding superordinate category labels (depending on the utility value manipulation) displayed above them (see Fig. 2). The rows were color-coded to indicate the visual similarity of the mushrooms within the same row (i.e., mushroom doubles); participants were not explicitly informed of their visual similarity. However, if learners chose to switch between superordinate categories, the interface indirectly nudged them toward selecting mushroom doubles (as opposed to visually dissimilar mushrooms) by organizing mushrooms in rows of different colors. The remaining number of images that participants could view for each mushroom (3/3, 2/3, 1/3, or 0/3) was also displayed under each mushroom’s name, with a button turning gray after it had been clicked three times.

Fig. 2

Coding schema for study sequence metrics based on participants’ choices on the selection page when studying edible (left column) and poisonous mushrooms (right column). Note: Visually similar mushroom doubles were placed in the same row. Participants clicked on a mushroom to view its image (three images in total for each mushroom). Interleaving (continuous lines) comprises switches between the columns: informative between-switches (horizontal switches for mushroom doubles) and non-informative between-switches (diagonal switches for dissimilar mushrooms). Blocking (dotted lines) comprises within-switches (vertical switches among mushrooms within a column) and no switches (selecting images of the same mushroom). Adapted with permission (cf. Abel, 2023b)

By clicking on a mushroom's name, one of the three randomized images of the corresponding mushroom appeared on a new slide for five seconds, along with the corresponding mushroom name and superordinate category label. After 5 s, the participant was redirected back to the mushroom selection page. The study phase was completed after all 36 images were viewed.

Looking at the interface, there is one obvious methodological difference from previous research: the presence of superordinate category labels (in the conditions with high utility value of distinguishing, these are edible and poisonous). Apart from these two labels, the interface is designed in the same way as in previous research. Participants choose which of twelve mushroom boxes to click. Each box contains images of one particular mushroom, and the images of the same mushroom share predictive features. Such an interface would work well for investigating learners’ study sequence choices in a category induction task. Taken together, at the subcategory level, there is no difference from a typical selection page interface used in previous studies; the superordinate labels are simply added on top.

How do the superordinate labels change the task requirements? At the superordinate category level, each mushroom is very distinct from all other mushrooms with one exception — its twin having the opposite superordinate label. Learners receive a task instruction at the superordinate level, that is to learn edible and poisonous mushrooms. Based on these task requirements, learners’ study sequence choices can be interpreted in terms of their goals, finding between-differences when switching between poisonous and edible vs. finding within-commonalities when switching between different mushrooms of the same type. Moreover, learners’ choices also reveal their preconceptions about the task requirements, that is for example whether they understand how the interface informs them which mushrooms are similar, which between-comparisons are informative (between the similar or the distinct ones), and whether different mushrooms of the same type share predictive features.

Measures

Study Sequence Metrics

Learners’ study sequence metrics were calculated based on their choices. Our coding schema is depicted in Fig. 2.

Interleaving Metrics We calculated an overall interleaving index, which reflects the number of switches between the superordinate categories and includes both horizontal switches (that is, between mushroom doubles) and diagonal switches (that is, across dissimilar mushrooms). It is worth noting that only horizontal switches were informative in detecting predictive differences. Diagonal switches, in contrast, were not informative for finding predictive differences due to their high dissimilarity. Henceforth, we shall refer to these two interleaving metrics as informative between-switches (horizontal) and non-informative between-switches (diagonal). Informative between-switches could occur a maximum of 30 times, while non-informative between-switches could occur a maximum of 35 times.

Blocking Metrics Complementary to the overall interleaving index is the blocking rate, which is operationalized as 1 minus the overall interleaving index. The blocking rate reflects the extent to which learners tended to study mushrooms of the same superordinate category without switching to the second superordinate category. This includes both within-switches (vertical switches between mushrooms within the same superordinate category) and no switches (i.e., subsequently selecting the images of the same mushroom). Within-switches could occur a maximum of 34 times, while no-switches could occur a maximum of 24 times. It is worth noting that the within-switches were not informative for categorization due to the lack of shared predictive features for superordinate category membership.
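The coding schema for the four transition types can be sketched in a few lines of code. This is a minimal illustration, not the authors’ scoring script: it assumes that each click is recorded as a hypothetical (row, column) position on the selection page, with doubles sharing a row and superordinate categories forming the two columns, and that the overall interleaving index is normalized by the total number of transitions (the source states only that the blocking rate is 1 minus this index).

```python
from collections import Counter

KINDS = (
    "informative_between_switch",      # horizontal: same row, other column
    "non_informative_between_switch",  # diagonal: other row, other column
    "within_switch",                   # vertical: other row, same column
    "no_switch",                       # same mushroom selected again
)


def classify_transition(prev, curr):
    """Classify one transition between two (row, column) clicks."""
    if prev == curr:
        return "no_switch"
    if prev[1] == curr[1]:
        return "within_switch"
    if prev[0] == curr[0]:
        return "informative_between_switch"
    return "non_informative_between_switch"


def study_sequence_metrics(clicks):
    """Count transition types and derive the interleaving index and blocking rate."""
    counts = Counter(classify_transition(a, b) for a, b in zip(clicks, clicks[1:]))
    n_transitions = max(len(clicks) - 1, 1)
    between = (counts["informative_between_switch"]
               + counts["non_informative_between_switch"])
    # Assumption: the index is the proportion of between-category transitions.
    index = between / n_transitions
    metrics = {kind: counts[kind] for kind in KINDS}
    metrics["interleaving_index"] = index
    metrics["blocking_rate"] = 1.0 - index
    return metrics


# Hypothetical sequences over 6 rows x 2 columns, 3 images per mushroom (36 clicks).
paired = [(row, col) for row in range(6) for _ in range(3) for col in (0, 1)]
blocked = [(row, col) for col in (0, 1) for row in range(6) for _ in range(3)]

paired_metrics = study_sequence_metrics(paired)
blocked_metrics = study_sequence_metrics(blocked)
```

In the hypothetical `paired` sequence, a learner alternates between the two members of each twin pair before moving to the next row, which reaches the stated maximum of 30 informative between-switches. The `blocked` sequence views each mushroom three times in a row, one column after the other, which reaches the stated maximum of 24 no-switches.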

Self-Reported Measures on Learning Experience

After the study phase, participants received questions concerning their learning experience. We report the wordings of particular items only for the high utility conditions (that is, for learning poisonous and edible mushrooms) and skip the analogous wordings for the low utility conditions (that is, for learning mushrooms growing on acidic and alkaline to neutral soil).

Perceived Confusability The perceived confusability was assessed via a self-designed item, “It is easy to confuse, which mushrooms are poisonous and which edible,” on a 7-point Likert scale with options ranging between “not true at all” and “completely true.”

Utility Value of Distinguishing The utility value of distinguishing the respective mushrooms (i.e., the awareness of the consequences of confusing) was assessed via two self-designed items, “It is important not to confuse, which mushrooms are poisonous and which edible” and “It is dangerous to confuse, which mushrooms are poisonous and which edible,” on a 7-point Likert scale with options ranging between “not true at all” and “completely true,” with Cronbach’s α = 0.76. This scale served as a manipulation check for the utility value of distinguishing.

Interest The interest in learning the respective mushrooms was assessed via three items adapted from the interest dimension of the multidimensional short scale of intrinsic motivation by Wilde et al. (2009) (e.g., “Learning the mushrooms was very interesting”) on a 5-point Likert scale with options ranging between “not true at all” and “completely true,” with Cronbach’s α = 0.86. This scale served as a manipulation check for the utility value of distinguishing.

Perceived Authenticity The perceived authenticity of learning the respective mushrooms was assessed via two self-designed items, “While learning, I imagined myself picking mushrooms in the forest” and “While learning, I imagined holding mushrooms in my hands,” on a 5-point Likert scale with options ranging between “strongly disagree” and “strongly agree,” with Cronbach’s α = 0.70. Like the utility value of distinguishing scale and the interest scale, the perceived authenticity scale served as a manipulation check for the utility value.

Mental Effort Invested in Differences The mental effort invested to detect differences between the mushrooms of the two superordinate categories was assessed via two self-designed items, “I invested effort to detect differences between poisonous and edible mushrooms” and “I invested effort to compare poisonous and edible mushrooms with each other,” on a 7-point Likert scale with options ranging between “not true at all” and “completely true,” with Cronbach’s α = 0.73. This scale particularly served to explore whether learners invest more effort to detect differences when the utility value of distinguishing is high (that is, whether learners have the awareness that differences matter).

Mental Effort Invested in Commonalities The mental effort invested to detect commonalities within a respective superordinate mushroom category was assessed via two items, “I invested effort to detect commonalities among poisonous mushrooms” and “I invested effort to detect commonalities among edible mushrooms,” on a 7-point Likert scale with options ranging between “not true at all” and “completely true,” with Cronbach’s α = 0.92. This scale particularly served to explore whether learners invest less effort to detect commonalities when the utility value of distinguishing is high (that is, whether learners have the awareness that differences matter).
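For readers less familiar with the reliability coefficient reported for these scales, the following minimal sketch shows how Cronbach’s α is conventionally computed from item-level ratings, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The ratings below are hypothetical and are not data from the present research; the function and variable names are illustrative.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of participant ratings."""
    k = len(item_scores)
    # Sum score per participant across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical 7-point ratings from eight participants on a two-item scale.
item_important = [7, 6, 7, 5, 4, 7, 6, 3]
item_dangerous = [7, 5, 6, 5, 3, 7, 7, 4]
alpha = cronbach_alpha([item_important, item_dangerous])
```

With only two items, α depends entirely on how strongly the two items covary, which is worth keeping in mind when comparing values across the two-item scales reported here.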

Classification Performance

We used two measures requiring learners to classify based on the superordinate categories and subordinate categories, respectively.

Category Assignment Task In the category assignment task, participants were presented with two novel images of each of the 12 mushrooms in a randomized order, without their respective names. They were then required to assign each image to one of the two superordinate categories previously learned (e.g., edible vs. poisonous in high utility value conditions).

Picture Assignment Task In the picture assignment task, participants were presented with one mushroom name at a time, in a randomized order, accompanied by six novel mushroom images belonging to the same superordinate category (without labelling the respective superordinate category). Participants had to select a matching mushroom image in response to each mushroom name (i.e., forced choice between six alternatives).

Metacognitive Preference

Participants were asked to indicate their metacognitive preference by choosing the most effective sequence for studying the respective mushroom types: (a) studying mushrooms of one type first and then the other type (blocking), (b) studying mushrooms of both types alternately (interleaving), or (c) no preference. Note that although most learners might actually prefer a mix of blocking and interleaving, we refrained from providing a mixed response option because such an option would conflate a wide range of possible sequences.

Procedure

Our online experiment took approximately 25 min and was conducted using SoSci Survey. After participants provided their consent and answered socio-demographic questions, their previous knowledge of mushrooms was assessed by having them indicate the names of listed mushrooms (used in the experiment) that they could visually recognize. For the subsequent study phase, participants were instructed to learn mushrooms of two superordinate categories by freely creating a study sequence. The stated goal was to be able to assign mushroom images to one of the two respective superordinate categories in the final category assignment test. In the conditions with the instruction that differences matter for distinguishing, participants were additionally informed that, to be able to distinguish confusable mushrooms, it is essential to identify their subtle differences.

Before the classification tests, participants engaged in two distractor tasks, completed at their own pace. These tasks involved matching six words with their synonyms written backwards (time spent Mdn = 65 s) and six arithmetic equations with their results (time spent Mdn = 31 s). Following completion of the classification tests, participants were asked to rate the appropriateness of the learning setting and their level of concentration during the study on a five-point Likert scale ranging from very low to very high. Participants were then asked to indicate their metacognitive preferences. Finally, participants were asked whether they were familiar with interleaved learning and, if so, to provide information about it. Additionally, participants were given the option to participate in a raffle and/or receive 0.5 credits.

Results

The data are publicly available on Open Science Framework at https://osf.io/hs3pk/?view_only=b3ec5b34fc3f4a55b6df63419d06493b. Table 1 displays the mean scores and standard deviations of dependent measures as a function of factor manipulations. Table 2 displays the correlational pattern across dependent measures as well as the mean scores and standard deviations across all subjects.

Table 1 Means and standard deviations of dependent measures in Experiment 1 as a function of factor manipulations, utility value of distinguishing and instruction that differences matter
Table 2 Spearman correlations among dependent measures in Experiment 1

Preliminary Analysis

The previous knowledge of mushrooms was quite low: On average, learners reported being able to visually recognize 0.31 of the 12 mushrooms. The randomization was successful since there was neither a main effect of utility value, F(1, 186) = 0.54, p = 0.464, ηp2 < 0.01, nor a main effect of differences matter instruction, F(1, 186) = 0.21, p = 0.650, ηp2 < 0.01, nor an interaction, F(1, 186) = 1.02, p = 0.313, ηp2 = 0.01.

Hypotheses on Study Sequence Metrics

We hypothesized that learners refrain from interleaving because they do not recognize the utility value of distinguishing as a learning goal in learning confusable categories and are not aware that differences are essential for distinguishing confusable categories. We accordingly expected a positive impact of the utility value of distinguishing (utility-value-hypothesis) and of instruction that differences matter (differences-matter-hypothesis) on the extent of interleaving in participants’ study sequence choices. Figure 3 displays the mean distribution of study sequence choices across conditions and visualizes an interaction pattern.

Fig. 3

Study sequence metrics in Experiment 1 as a function of factor manipulations, utility value of distinguishing and instruction that differences matter for distinguishing. Note: Interleaving metrics embrace informative between-switches (indicated by horizontal lines) and non-informative between-switches (indicated by diagonal lines). Blocking metrics embrace within-switches (indicated by vertical lines) and no switches (indicated by monotonous colouring)

Informative Between-Switches

In terms of the number of switches between the mushroom doubles, there was no main effect of utility value, F(1, 186) = 2.26, p = 0.135, ηp2 = 0.01. Differences matter instruction yielded a main effect, F(1, 186) = 9.94, p = 0.002, ηp2 = 0.05, indicating that participants with instruction interleaved more frequently (M = 13.18, SE = 1.59) than participants without instruction (M = 6.07, SE = 1.61). Both factors interacted, F(1, 186) = 5.19, p = 0.024, ηp2 = 0.03. Participants who learned poisonous and edible mushrooms interleaved more when given the differences matter instruction compared to when given no instruction, p < 0.001, 95% CI [6.02, 18.47], MD = 12.25, SE = 3.16. In contrast, participants who learned mushrooms growing on acidic and alkaline to neutral soil were not affected by the differences matter instruction, p = 0.541, 95% CI [-4.39, 8.34], MD = 1.97, SE = 3.23. Moreover, participants with the differences matter instruction interleaved more when the utility value of distinguishing was high than when it was low, p = 0.008, 95% CI [2.27, 14.78], MD = 8.53, SE = 3.17. In contrast, participants without instruction were not affected by the utility value of distinguishing, p = 0.587, 95% CI [-8.08, 4.58], MD = -1.75, SE = 3.21.
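The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom via ηp2 = (F × df_effect) / (F × df_effect + df_error). The following sketch applies this standard identity to the three omnibus effects reported above. The function name is illustrative, and small deviations from the reported values are possible because the inputs are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and reported effect sizes for informative between-switches
# in Experiment 1 (each test has df = 1, 186)
reported = {
    "utility value": (2.26, 0.01),
    "differences matter instruction": (9.94, 0.05),
    "interaction": (5.19, 0.03),
}

for label, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 186)
    print(f"{label}: recomputed {eta:.3f}, reported {eta_reported:.2f}")
```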

Non-informative Between-Switches

In terms of the number of switches between dissimilar mushrooms of two types, there was neither a main effect of utility value, F(1, 186) = 0.78, p = 0.377, ηp2 < 0.01, nor a main effect of differences matter instruction, F(1, 186) = 2.08, p = 0.151, ηp2 = 0.01. Both factors interacted, F(1, 186) = 4.95, p = 0.027, ηp2 = 0.03. Participants who learned poisonous and edible mushrooms interleaved more when given the differences matter instruction compared to when given no instruction, p = 0.009, 95% CI [1.26, 8.93], MD = 5.10, SE = 1.94. In contrast, participants who learned mushrooms growing on acidic and alkaline to neutral soil were not affected by the differences matter instruction, p = 0.585, 95% CI [-5.01, 2.83], MD = -1.09, SE = 1.99. Overall, this interaction pattern is consistent with the findings on informative between-switches.

Within-Switches

In terms of the number of switches between the mushrooms of the same type, there was no main effect of utility value, F(1, 186) = 0.74, p = 0.391, ηp2 < 0.01. We found no significant main effect of differences matter instruction, F(1, 186) = 3.50, p = 0.063, ηp2 = 0.02. There was no interaction between the factors, F(1, 186) = 0.11, p = 0.746, ηp2 < 0.01.

No Switches

In terms of the number of choices that stick to the same mushroom, there was neither a main effect of utility value, F(1, 186) = 0.27, p = 0.606, ηp2 < 0.01, nor a main effect of differences matter instruction, F(1, 186) = 0.68, p = 0.413, ηp2 < 0.01. Both factors interacted, F(1, 186) = 5.19, p = 0.024, ηp2 = 0.03. Participants who learned poisonous and edible mushrooms blocked less when given the differences matter instruction compared to when given no instruction, p = 0.028, 95% CI [-18.24, -1.06], MD = -9.65, SE = 4.35. In contrast, participants who learned mushrooms growing on acidic and alkaline to neutral soil were not affected by the differences matter instruction, p = 0.309, 95% CI [-4.24, 13.31], MD = 4.54, SE = 4.45.

To sum up, the interaction pattern of results over all study sequence metrics is somewhat inconsistent with our expected main effects: Participants who were motivated to distinguish and informed about the importance of differences interleaved to a higher extent than participants who were not motivated or not informed.
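The four study sequence metrics can be illustrated with a short sketch that classifies each transition between two consecutive study choices. The data structure (a mushroom with a name, a superordinate category, and a pair_id linking each mushroom to its confusable double), the placeholder names, and the example sequence are assumptions made for illustration; they do not reproduce the scoring procedure used in the experiment.

```python
from collections import Counter
from typing import NamedTuple


class Mushroom(NamedTuple):
    name: str       # placeholder label, not a real species name
    category: str   # superordinate category, e.g., "edible" or "poisonous"
    pair_id: int    # assumed link between a mushroom and its confusable double


def classify_transition(prev, curr):
    """Assign one transition between consecutive study choices to a metric."""
    if prev.name == curr.name:
        return "no_switch"
    if prev.category == curr.category:
        return "within_switch"
    if prev.pair_id == curr.pair_id:
        return "informative_between_switch"
    return "non_informative_between_switch"


def sequence_metrics(sequence):
    """Count each transition type over a whole study sequence."""
    return Counter(
        classify_transition(prev, curr)
        for prev, curr in zip(sequence, sequence[1:])
    )


def interleaving_share(metrics):
    """Proportion of transitions that switch between the two categories."""
    total = sum(metrics.values())
    between = (metrics["informative_between_switch"]
               + metrics["non_informative_between_switch"])
    return between / total if total else 0.0


# Hypothetical mushrooms: E1/P1 and E2/P2 are confusable doubles
e1 = Mushroom("E1", "edible", 1)
p1 = Mushroom("P1", "poisonous", 1)
e2 = Mushroom("E2", "edible", 2)
p2 = Mushroom("P2", "poisonous", 2)

sequence = [e1, e1, e2, p2, p1, e1, p2]
metrics = sequence_metrics(sequence)
print(dict(metrics))
print(f"share of between-category switches: {interleaving_share(metrics):.2f}")
```

In this sketch, interleaving is the sum of informative and non-informative between-switches, and blocking is the sum of within-switches and no switches, mirroring the grouping in the note to Fig. 3.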

Analyses on Learning Experience

Perceived Confusability

On a 7-point Likert scale, learners across all conditions recognized that categories were highly similar and thus confusable (M = 5.81, SD = 1.18). The utility value of distinguishing manipulation was neither supposed to increase the similarity between mushrooms nor to highlight it. Accordingly, there was no main effect of utility value; there was also no main effect of differences matter instruction and no interaction between the factors, Fs < 1.

Utility Value of Distinguishing

We found a main effect of utility value, F(1, 186) = 141.41, p < 0.001, ηp2 = 0.43, indicating that participants learning poisonous and edible mushrooms (M = 13.06, SE = 0.26) were more sensitive to the risks of confusing the mushrooms than their counterparts learning mushrooms growing on the acidic and alkaline to neutral soil (M = 8.72, SE = 0.26). Thus, the manipulation check was successful. There was neither a main effect of differences matter instruction, nor did both factors interact, Fs < 1.

Interest

We found a main effect of utility value, F(1, 186) = 8.78, p = 0.003, ηp2 = 0.05, in favor of learning poisonous and edible mushrooms (M = 9.13, SE = 0.30 vs. M = 7.88, SE = 0.30). Thus, the manipulation check was successful. There was neither a main effect of differences matter instruction, F(1, 186) = 1.99, p = 0.160, ηp2 = 0.01, nor did both factors interact, F(1, 186) = 0.26, p = 0.611, ηp2 < 0.01.

Perceived Authenticity

We found a main effect of utility value, F(1, 186) = 12.68, p < 0.001, ηp2 = 0.06, in favor of learning poisonous and edible mushrooms (M = 4.61, SE = 0.22 vs. M = 3.49, SE = 0.22). Thus, the manipulation check was successful. There was neither a main effect of differences matter instruction, nor did both factors interact, Fs < 1. Overall, the pattern of results on perceived authenticity resembles that of interest and utility value of distinguishing, confirming that the utility value of distinguishing manipulation was successful.

Mental Effort Invested to Detect Differences Between Mushroom Types

We found no main effect of utility value, F(1, 186) = 2.57, p = 0.111, ηp2 = 0.01, indicating learners’ lack of awareness that differences matter for distinguishing (see Fig. 1). There was a main effect of differences matter instruction, F(1, 186) = 4.93, p = 0.028, ηp2 = 0.03, indicating that participants who were informed about the importance of differences for being able to distinguish invested more effort to look for differences than participants without an instruction (M = 10.86, SE = 0.33 vs. M = 9.83, SE = 0.33). Both factors did not interact, F(1, 186) = 0.75, p = 0.389, ηp2 < 0.01.

Mental Effort Invested to Detect the Commonalities Between Mushrooms of the Same Type

We found no main effect of utility value, F(1, 186) = 0.06, p = 0.816, ηp2 < 0.01, consistent with the lack of its impact on mental effort for detecting differences. There was a main effect of differences matter instruction, F(1, 186) = 8.18, p = 0.005, ηp2 = 0.04, indicating that participants who were informed about the importance of differences invested less effort to look for commonalities than participants without an instruction (M = 7.83, SE = 0.40 vs. M = 9.46, SE = 0.41). Both factors did not interact, F(1, 186) = 0.14, p = 0.709, ηp2 < 0.01.

Correlational Analysis of Sequence Choices and Classification Performance

Informative between-category switches did not correlate with the performance in the category assignment task, r = -0.02, p = 0.407 (one-tailed).

Analysis of Metacognitive Preference

Finally, to test whether participants show a different preference for interleaving depending on whether they were motivated or informed, we applied ordinal regression. The values of the metacognitive preference measure ranged between blocking, equal, and interleaving. Figure 4 displays the mean proportions of metacognitive preferences across conditions. We found a significant impact of utility value, β = 0.77, SE = 0.38, Wald χ2 = 4.10, df = 1, p = 0.044, indicating a higher preference for interleaving when the utility value of distinguishing is high. We also found a significant impact of differences matter instruction, β = 0.79, SE = 0.38, Wald χ2 = 4.29, df = 1, p = 0.038, indicating learners’ higher preference for interleaving when being informed that differences matter for distinguishing. We found no interaction between both factors, β = 0.75, SE = 0.55, Wald χ2 = 1.87, df = 1, p = 0.172. To sum up, the pattern of results on metacognitive preferences shows the expected impact of the utility value of distinguishing and of instruction that differences matter.
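The Wald statistics of the ordinal regression can be approximately retraced from the reported coefficients and standard errors, because with one degree of freedom the Wald χ2 equals (β / SE) squared and its p value equals the two-sided p value of the corresponding standard normal z. The sketch below uses only the rounded values reported above, so deviations in the second decimal are expected; the function name is illustrative.

```python
from statistics import NormalDist


def wald_test(beta, se):
    """Wald chi-square (df = 1) and its p value from a coefficient and its SE."""
    z = beta / se
    chi2 = z ** 2
    # For df = 1, the chi-square p value equals the two-sided normal p value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return chi2, p_value


# Rounded coefficients and standard errors reported for Experiment 1
effects = {
    "utility value": (0.77, 0.38),
    "differences matter instruction": (0.79, 0.38),
    "interaction": (0.75, 0.55),
}

for label, (beta, se) in effects.items():
    chi2, p_value = wald_test(beta, se)
    print(f"{label}: Wald chi2 = {chi2:.2f}, p = {p_value:.3f}")
```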

Fig. 4

Preferred study sequence in Experiment 1 as a function of factor manipulations, utility value of distinguishing and instruction that differences matter for distinguishing

Discussion

In Experiment 1, we investigated the impact of the utility value of distinguishing and of the conditional knowledge component that differences matter for distinguishing on learners’ spontaneous engagement in interleaving. We assumed that learners would refrain from interleaving because they do not recognize the utility value of distinguishing when dealing with confusable categories and are not aware that differences matter to be able to distinguish. Accordingly, we expected two main effects, of utility value (utility-value-hypothesis) and of the instruction that differences matter (differences-matter-hypothesis). The pattern of results with regard to the study sequence choices is somewhat inconsistent with the expected main effects: Learners interleaved to a higher extent only if they were both motivated to distinguish and informed that to be able to distinguish, one needs to find the subtle differences between confusable mushrooms.

The findings on the metacognitive preferences showed a somewhat different pattern as compared with study sequence choices. Herein, we found no interaction effect but two main effects. These findings are aligned with our hypotheses on study sequence choices (although, numerically, the pattern of proportions between the conditions resembles the pattern of learners’ study sequence choices; see Fig. 4). The metacognitive preference measure, which requires learners to choose which sequence is more effective, probably differs in its sensitivity to the experimental manipulations from the study sequence choices, which can take a wide range of proportions (between pure blocking and pure interleaving).

Taken together, the pattern of results in Experiment 1 suggests that learners refrain from interleaving because they generally lack the motivation to distinguish and they are not aware that detecting differences is essential for distinguishing. In line with this conclusion, we found no impact of the utility value of distinguishing manipulation on the perceived mental effort invested to detect differences (and commonalities), indicating learners’ lack of awareness that differences are essential for being able to distinguish.

However, this is a preliminary interpretation. In Experiment 1, we have neither manipulated learners’ awareness-that-interleaving-highlights-differences, nor have we assessed it. Furthermore, learners’ averaged interleaving rate of 18.53% was still substantially lower than their blocking rate.

Moreover, Experiment 1 yielded an unexpected null finding: The extent of interleaving (especially informative between-switches) had no impact on categorization performance, which is at odds with previous research that found a positive link between the extent of spontaneous interleaved study choices and categorization (Abel, 2023b; Lu et al., 2020; Onan et al., 2024). We consider this null finding a case of utilization deficiency. Utilization deficiency occurs when employing a strategy does not necessarily improve learning (Miller, 2000). One possible explanation for the lack of an interleaving effect in self-regulated learning might be that the discriminative contrast between subsequent images is impaired by a study sequence choice, which probably imposes additional cognitive load due to metacognitive processes (van Gog et al., 2020) and diminishes the visual representation of the preceding image due to temporal spacing (cf. Birnbaum et al., 2013; Yan & Schuetze, 2021). One further explanation could be that learners did not sufficiently understand how the mushroom doubles were distributed on the selection page. In line with this explanation, we found a similar effect of our manipulations on informative and non-informative between-switches. Moreover, we found similar correlations with the effort invested in finding differences for informative (r = 0.25, p < 0.001) and non-informative between-switches (r = 0.22, p = 0.003), z = 0.31, p = 0.759. Experiment 2 was carried out to address these shortcomings and to scrutinize our research question more deeply.

Experiment 2

In Experiment 1, learners applied blocking to a higher extent than interleaving even when motivated to distinguish and informed about the importance of detecting differences for being able to distinguish: Indeed, 68% of their study sequence choices were blocked (especially within-switches). It was, thus, unclear whether learners indeed lack the awareness that interleaving highlights differences or whether they simply lacked the information at the beginning of their study phase that mushroom doubles could be found in the same rows.

In Experiment 2, we manipulated two instructions to target potential awareness gaps, the instruction that differences matter for distinguishing and the instruction that interleaving highlights differences (see Fig. 1; see also Appendix). Furthermore, different from Experiment 1, we now labelled the mushroom stimuli as poisonous and edible in all conditions to motivate all learners to distinguish confusable mushrooms. We also informed all learners (irrespective of the condition) that respectively, similar poisonous and edible mushrooms were found in the same row (so that when they looked for differences, they were aware which mushrooms were likely to be confused with each other). We collected two novel measures by asking learners to indicate which sequence highlights the differences between poisonous and edible mushrooms to a higher extent and which sequence highlights the commonalities between the mushrooms of the same type to a higher extent.

To recap our hypothesis, learners refrain from spontaneous engagement in interleaving because they do not realize the utility value of distinguishing confusable categories as a learning goal and lack the awareness that to be able to distinguish, they need to detect the differences between the categories. We accordingly expected to replicate the main finding from Experiment 1; that is, when students are motivated to distinguish and informed about the importance of detecting differences, they engage in interleaving and prefer interleaving to a higher extent. Because we now used only poisonous and edible mushroom labels, we accordingly expected to observe a main effect of differences matter instruction on the extent of interleaved study choices (differences-matter-hypothesis).

In Experiment 2, we additionally investigated the research question of whether learners lack the awareness that interleaving highlights differences (interleaving-highlights-differences-question). If learners lack this awareness, we would expect an impact of interleaving highlights differences instruction on study sequence choices in learning poisonous and edible mushrooms. By implication, we would expect no additive advantage of interleaving highlights differences instruction going beyond the impact of the differences matter instruction if learners are aware that interleaving highlights differences.

Method

The present research was not preregistered. Overall, in Experiment 2, we used the same stimuli, measures, and procedure as in Experiment 1. In the following, we will thus list only the changes to Experiment 1.

Sample

To determine the required sample size, like in Experiment 1, we conducted an a priori power analysis with the following parameters: alpha-level = 0.05, f = 0.34 for the instruction that differences matter on informative between-switches when learning edible and poisonous mushrooms (i.e., high utility value of distinguishing), obtained in Experiment 1. We had no clear expectation regarding the extent of the advantage of interleaving highlights differences instruction. We thus chose to base our sample size calculations on a medium effect size of f = 0.34, ensuring a large power of 0.95, which led to a required sample size of 115.
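Because the power analysis is expressed in Cohen’s f whereas the results are reported as partial eta squared, it can help to convert between the two metrics. The sketch below uses the standard relation f = sqrt(ηp2 / (1 − ηp2)) and its inverse; the function names are illustrative. Under this relation, f = 0.34 corresponds to a partial eta squared of about 0.10.

```python
import math


def eta_squared_to_f(eta_sq):
    """Convert partial eta squared to Cohen's f."""
    return math.sqrt(eta_sq / (1 - eta_sq))


def f_to_eta_squared(f_value):
    """Convert Cohen's f to partial eta squared."""
    return f_value ** 2 / (1 + f_value ** 2)


eta_from_f = f_to_eta_squared(0.34)
print(f"f = 0.34 corresponds to partial eta squared = {eta_from_f:.3f}")
print(f"back-conversion gives f = {eta_squared_to_f(eta_from_f):.2f}")
```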

A total of N = 148 participants completed the online experiment. The incentives and the exclusion criteria were the same as in Experiment 1. Ten subjects had to be excluded because of a low (or very low) level of concentration and one subject because they were aware of the theoretical background of the present research (interleaving effect). Additionally, we excluded three subjects based on their self-reported previous knowledge of mushrooms exceeding 2 (out of 12).

Of the remaining 134 participants, 128 were students (MAge = 24.08, SD = 5.45). The sample consisted of 102 women and 32 men. The subjects were randomly assigned across four conditions.

Design

Experiment 2 is based on a 2 × 2 between-subjects design with instruction that differences matter (with vs. without) and instruction that interleaving highlights differences (with vs. without) (see Appendix). We now used only the labels poisonous and edible for mushroom stimuli to induce a high utility value of distinguishing in all participants.

Additional Measures

Subsequent to the study phase, participants received additional questions concerning their awareness that interleaving highlights differences. Particularly, learners were asked to indicate on a 5-point Likert scale which sequence (definitely interleaving, somewhat interleaving, equally, somewhat blocking, or definitely blocking) highlights the differences between the edible and poisonous mushrooms to a higher extent and in which sequence the edible and poisonous mushrooms can be better distinguished (Cronbach’s α = 0.87). To assess learners’ awareness that blocking highlights commonalities, we asked them to indicate which sequence highlights the commonalities between the edible mushrooms to a higher extent and which sequence highlights the commonalities between the poisonous mushrooms to a higher extent (Cronbach’s α = 0.93).

Procedure Changes

Before starting with the study phase, all participants were now explicitly informed that visually similar poisonous and edible mushrooms were respectively in the same row on the mushroom selection page, followed by the instruction that differences matter for distinguishing (with vs. without) and the general information (on study phase and subsequent classification test). Before moving to the next instruction, all participants were asked to briefly recap their learning task by writing their responses in a textbox.

Participants who received the interleaving highlights differences instruction were both briefly introduced to two basic study sequences, blocking (that is, studying mushrooms of one type at a time) and interleaving (that is, studying the mushroom types alternately), and informed about their respective effects on detecting commonalities between mushrooms of the same type and differences between mushrooms of different types. Participants without this instruction were merely briefly introduced to the study sequences without being informed about their effects. Before moving to the study phase, all participants were asked to briefly recap what they learned about the study sequences by writing their response in a textbox.

Results

The data are publicly available at https://osf.io/hs3pk/?view_only=b3ec5b34fc3f4a55b6df63419d06493b. Table 3 displays the mean scores and standard deviations of dependent measures as a function of factor manipulations. Table 4 displays the correlational pattern across dependent measures as well as the mean scores and standard deviations across all subjects.

Table 3 Means and standard deviations of dependent measures in Experiment 2 (with stimuli increasing the utility value of distinguishing) as a function of factor manipulations, instruction that differences matter and instruction that interleaving supports the detection of differences
Table 4 Spearman correlations among dependent measures in Experiment 2 (with stimuli increasing the utility value of distinguishing)

Preliminary Analysis

The prior knowledge of mushrooms was quite low: On average, learners reported being able to visually recognize 0.37 of the 12 mushrooms. The randomization was successful since there was neither a main effect of the differences matter instruction, F(1, 130) = 2.60, p = 0.109, ηp2 = 0.02, nor a main effect of the interleaving highlights differences instruction, F(1, 130) = 0.44, p = 0.510, ηp2 < 0.01, nor an interaction, F(1, 130) = 0.35, p = 0.553, ηp2 < 0.01.

Hypotheses on Study Sequence Metrics

We hypothesized that participants who were informed about the importance of differences should interleave to a higher extent than participants who were not informed when dealing with edible and poisonous mushrooms (differences-matter-hypothesis). Furthermore, we explored whether the interleaving highlights differences instruction will add any advantage (interleaving-highlights-differences-question). Figure 5 displays the mean distribution of study sequence choices across conditions.

Fig. 5

Study sequence metrics in Experiment 2 (with stimuli increasing the utility value of distinguishing) as a function of factor manipulations, instruction that differences matter for distinguishing and instruction that interleaving highlights differences. Note: Interleaving metrics embrace informative between-switches (indicated by horizontal lines) and non-informative between-switches (indicated by diagonal lines). Blocking metrics embrace within-switches (indicated by vertical lines) and no switches (indicated by monotonous colouring)

Informative Between-Switches

There was a medium-sized main effect of the differences matter instruction, F(1, 130) = 14.86, p < 0.001, ηp2 = 0.10, indicating that participants with instruction interleaved more frequently (M = 12.71, SE = 1.10) than participants without instruction (M = 6.69, SE = 1.11). The interleaving highlights differences instruction yielded no significant main effect, F(1, 130) = 2.90, p = 0.091, ηp2 = 0.02. There was no interaction, F(1, 130) < 0.01, p = 0.990, ηp2 < 0.01.

Non-informative Between-Switches

There was no significant main effect of the differences matter instruction, F(1, 130) = 3.02, p = 0.085, ηp2 = 0.02. The interleaving highlights differences instruction also yielded no significant main effect, F(1, 130) = 3.32, p = 0.071, ηp2 = 0.03. There was no interaction, F(1, 130) = 0.39, p = 0.531, ηp2 < 0.01.

Within-Switches

There was a moderate main effect of the differences matter instruction, F(1, 130) = 10.88, p < 0.001, ηp2 = 0.08, indicating that participants with instruction switched between mushrooms of the same superordinate category to a lesser extent (M = 9.35, SE = 1.27) than participants without instruction (M = 15.27, SE = 1.27). The interleaving highlights differences instruction yielded no main effect, F(1, 130) = 1.51, p = 0.222, ηp2 = 0.01. There was no interaction, F(1, 130) = 0.43, p = 0.512, ηp2 < 0.01.

No Switches

There was no main effect of the differences matter instruction, F(1, 130) = 1.59, p = 0.210, ηp2 = 0.01. The interleaving highlights differences instruction also yielded no main effect, F(1, 130) = 2.52, p = 0.115, ηp2 = 0.02. There was no interaction, F(1, 130) = 0.21, p = 0.650, ηp2 < 0.01.

To sum up, the pattern of results over all study sequence metrics supports the differences-matter-hypothesis implying that learners are not aware of the importance of differences to be able to distinguish: Learners who were informed about the importance of detecting the differences between poisonous and edible mushrooms switched less between the mushrooms of the same type (i.e., moderate effect on within-switches) but especially switched to a higher extent between poisonous and edible mushroom doubles (i.e., moderate effect on informative between-switches). Regarding the interleaving-highlights-differences-question, the results rather support the view that learners are aware that interleaving highlights differences: There was no additional benefit of informing learners that interleaving highlights differences in terms of any single study sequence metric, particularly not on informative between-switches, indicating no increasing willingness to contrast confusable mushroom doubles. To corroborate our conclusion, we additionally ran a Bayesian ANOVA to quantify the evidence for the presence versus the absence of an impact of the instruction on the interleaving rate. The Bayesian analysis did not favor an impact: Its presence was 0.75 times as likely as its absence, which permits no decisive conclusion due to low statistical power.

Analyses of Learning Experience

Mental Effort Invested to Detect Differences Between Mushroom Types

Learners across all conditions invested much effort to detect differences (in sum 11.63 out of 14, which is higher than 10.35 in Experiment 1). We found no main effect of the differences matter instruction, F(1, 130) = 0.74, p = 0.393, ηp2 = 0.01. There was no main effect of the interleaving highlights differences instruction, F(1, 130) = 1.80, p = 0.183, ηp2 = 0.01. Both factors did not interact, F(1, 130) = 0.12, p = 0.731, ηp2 < 0.01.

Mental Effort Invested to Detect the Commonalities Between Mushrooms of the Same Type

We found no main effect of the differences matter instruction, F(1, 130) = 0.14, p = 0.706, ηp2 < 0.01. There was no main effect of the interleaving highlights differences instruction, F(1, 130) = 0.11, p = 0.739, ηp2 < 0.01. Both factors did not interact, F(1, 130) = 0.37, p = 0.547, ηp2 < 0.01.

As shown by a paired sample two-tailed t test, learners across all conditions invested less effort in detecting commonalities than differences, t(133) = -7.43, p < 0.001, d = -0.64, MD = -2.93, SE = 0.39.

Analyses on Awareness

Awareness That Interleaving Highlights Differences

We found no main effect of the differences matter instruction, F(1, 130) = 0.72, p = 0.399, ηp2 = 0.01. There was no main effect of the interleaving highlights differences instruction, F(1, 130) = 0.25, p = 0.622, ηp2 < 0.01. Both factors did not interact, F(1, 130) = 0.25, p = 0.620, ηp2 < 0.01. A one-sample two-tailed t test showed a significant difference between the observed rating across all participants of 3.8 (out of 5) and a hypothetical rating of 3 for equal effectivity, t(133) = 8.39, p < 0.001, d = 0.73, indicating learners’ awareness that interleaving highlights differences.

Awareness That Blocking Highlights Commonalities

We found no main effect of the differences matter instruction, F(1, 130) = 0.21, p = 0.646, ηp2 < 0.01. There was no main effect of the interleaving highlights differences instruction, F(1, 130) = 2.03, p = 0.157, ηp2 = 0.02. The two factors did not interact, F(1, 130) = 0.06, p = 0.803, ηp2 < 0.01. A one-sample two-tailed t test showed a significant difference between the observed rating across all participants of 4.03 (out of 5) and a hypothetical rating of 3 for equal effectiveness, t(133) = 11.19, p < 0.001, d = 0.97, indicating learners’ awareness that blocking highlights commonalities.

Correlational Analysis of Sequence Choices and Classification Performance

This time, informative between-category switches positively correlated with the performance in the category assignment task, r = 0.17, p = 0.025 (one-tailed).

Analysis of Metacognitive Preference

Finally, to test the impact of both instructions on metacognitive preferences, we applied ordinal regression. Figure 6 displays the mean proportions of metacognitive preferences across conditions. We found a significant impact of the differences matter instruction, β = 1.06, SE = 0.49, Wald χ2 = 4.72, df = 1, p = 0.030. We found no impact of the interleaving highlights differences instruction, β = 0.67, SE = 0.45, Wald χ2 = 2.18, df = 1, p = 0.140. We also found no interaction between the two factors, β = 0.54, SE = 0.46, Wald χ2 = 1.38, df = 1, p = 0.241.
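
For readers who want to relate the coefficients to the test statistics, the following sketch recomputes each Wald χ2 as (β / SE)² and its p value with df = 1 from the standard normal distribution. It is a minimal illustration using the rounded coefficients reported above, not the original analysis script; the function name is hypothetical, and the results differ slightly from the reported values because β and SE are rounded.

```python
from statistics import NormalDist


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return (Wald chi-square, two-sided p value) for a single coefficient (df = 1)."""
    z = beta / se
    chi2 = z * z
    # A chi-square variable with df = 1 is a squared standard normal variable,
    # so the p value equals the two-sided normal tail probability of |z|.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return chi2, p


# Rounded coefficients as reported in the text (labels are for display only).
reported = {
    "differences matter": (1.06, 0.49),
    "interleaving highlights differences": (0.67, 0.45),
    "interaction": (0.54, 0.46),
}

for label, (beta, se) in reported.items():
    chi2, p = wald_test(beta, se)
    print(f"{label}: Wald chi2 = {chi2:.2f}, p = {p:.3f}")
```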

Fig. 6
Preferred study sequence in Experiment 2 (with stimuli increasing the utility value of distinguishing) as a function of factor manipulations, instruction that differences matter for distinguishing and instruction that interleaving highlights differences

To sum up, the results on metacognitive preferences show an impact only of the differences matter instruction. This aligns with the results on study sequence metrics, which mainly showed a higher engagement in interleaving when learners were informed about the importance of detecting differences.

Discussion

In Experiment 2, we manipulated the two conditional knowledge instructions that the detection of differences matters and that interleaving highlights differences and investigated their impact on learners’ study sequence choices in learning poisonous and edible mushrooms. We were able to replicate the main finding from Experiment 1, that is, a moderate positive effect of the differences matter instruction on study sequence choices (fewer within-switches and more informative between-switches) and metacognitive preferences in learning poisonous and edible mushrooms, supporting the differences-matter-hypothesis. Our assumption that learners lack awareness of the importance of detecting differences for being able to distinguish was hence fully supported.

Regarding the interleaving-highlights-differences-question, the results suggest that learners are more or less aware that interleaving highlights differences. We found no impact of this instruction on any single study sequence metric, on the metacognitive preferences, or on the assessed awareness that interleaving highlights differences. Furthermore, across all conditions, the assessed awareness that interleaving highlights differences was well above the scale midpoint of 3, which represents equal effectiveness.

However, it is important to emphasize a potential methodological limitation. If the instruction that interleaving highlights differences had only a small effect, substantially larger sample sizes would be required to detect it. To be precise, for an effect size of f = 0.18, as observed for informative between-switches, and a statistical power of 0.80, 245 subjects would be necessary. It is worth noting, however, that this effect size would be below a threshold of practical relevance (f = 0.20; cf. Hattie et al., 1996).
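
The order of magnitude of this sample size can be illustrated with a rough calculation. The sketch below assumes a two-sided test of a single-df effect at α = 0.05 and uses a normal approximation with noncentrality f·√N; it ignores the denominator degrees of freedom of the F test, so it yields a slightly smaller N than an exact noncentral F computation. The function names and the α level are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def approx_power(f: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power for a 1-df effect of size f with total sample size n."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = f * math.sqrt(n)  # square root of the noncentrality parameter f^2 * n
    # Probability that the test statistic falls in either rejection region.
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


def required_n(f: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest total sample size reaching the target power under the approximation."""
    n = 2
    while approx_power(f, n, alpha) < power:
        n += 1
    return n


print("Approximate required N for f = 0.18:", required_n(0.18))
print(f"Approximate power at N = 245: {approx_power(0.18, 245):.3f}")
```

Under these assumptions, the approximation lands a few participants below the 245 stated above, which is consistent with the stated figure stemming from an exact calculation.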

Comparing Experiment 2 with Experiment 1 (where participants were not informed that interleaving highlights differences) regarding the overall engagement in interleaving, we see an overall boost from M = 18.53% to M = 45.90%. We see this boost particularly in both conditions that were already employed in Experiment 1 for learning poisonous and edible mushrooms: from 29.91 to 51.31% when informed about the importance of differences and from 12.06 to 27.28% when no instruction was provided. We attribute this boost in Experiment 2 to the additional information on the interface of the selection page that similar mushrooms were in the same rows. With this information, learners knew in advance which poisonous and edible mushrooms were confusable with each other, whereas in Experiment 1, learners probably discovered this incidentally during the study phase. This instruction (in combination with distinct color-coding of mushroom pairs on the interface) thus served as indirect support for engaging in interleaved learning. In line with this interpretation, we now found an impact of the instruction that differences matter only on informative between-switches (but not on non-informative between-switches), and the correlation with the effort invested in finding differences was now higher for informative between-switches (r = 0.47, p < 0.001) than for non-informative between-switches (r = 0.23, p = 0.008), z = 2.32, p = 0.020. The utilization deficiency, which occurred in Experiment 1, was consequently reduced in Experiment 2: Learners who were now looking for differences were more likely to translate this intention into informative between-switches and in turn a better ability to tell poisonous and edible mushrooms apart.

General Discussion

The present research investigated the reasons for learners’ disengagement in interleaved learning. We started with the assumption that learners are more inclined to engage in effortful study strategies such as interleaved learning if they believe that their effort pays off, that is, if they understand when and why interleaving should be used to reap its benefits and if they realize the utility value of these benefits in the first place. Building upon metacognitive and motivational perspectives, we targeted, in a fine-grained manner, learners’ potential deficits in conditional knowledge and motivation that presumably prevent them from engaging in interleaved learning when dealing with confusable categories: namely, by increasing the utility value of distinguishing, by informing learners about the importance of between-category differences for being able to distinguish, and by informing them that interleaving highlights these differences (see Fig. 1).

Our findings indicate that learners, on the one hand, have some awareness that interleaving highlights differences. This was shown in Experiment 2, where informing learners that interleaving highlights differences yielded no substantial advantage and appeared largely redundant with respect to their assessed awareness and, especially, their study sequence choices and metacognitive preferences.

On the other hand, consistently in line with our expectations, across two experiments, learners engaged in interleaving and preferred interleaving to a higher extent only when they were both motivated to distinguish (which was the case when they learned edible and poisonous mushrooms) and informed about the importance of differences for being able to distinguish. We conclude that learners lack awareness of both the motivational why-component (Why should I care about distinguishing?) and the metacognitive why-component regarding the crucial role of differences in enabling reliable discrimination (Why is interleaving’s advantage especially important for distinguishing?). The latter lack of awareness was also indicated by the absence of an impact of the utility value of distinguishing manipulation on the effort invested in finding differences in Experiment 1.

On a broader level, these findings highlight that the insights from the wealth of metacognitive and motivational learning strategy research (e.g., Dignath & Büttner, 2008; Dignath & Veenman, 2020; Zepeda et al., 2020) apply to the learning strategy of interleaving as well. Evidently, metastrategic knowledge and utility value matter for learners’ engagement in interleaving, and deficits in this regard can be successfully remedied by informed training interventions. On a specific level, these observations align with the findings of Yan et al. (2017), in which learners tended to perceive between-category comparisons as less important than within-category comparisons, while some were aware that between-category switches highlight differences (as observed in their Experiment 4, where 51% of the students indicated this awareness; see also Onan et al. (2022) for a qualitative analysis of learners’ responses). Furthermore, as shown by Yan et al. (2016), providing learners with additional information that interleaving highlights differences barely affected learners’ metacognitive preferences beyond the presentation of empirical evidence demonstrating the superiority of interleaving over blocking. This was true regardless of whether learners were informed about the misleading experience of fluency caused by blocking.

In Experiment 2, we were able to motivate learners to surpass the benchmark of a 50% interleaving rate by addressing the why-components. We believe that this effect can be interpreted as meaningful in the face of two hurdles. Firstly, the interface implicitly suggested within-superordinate-category comparisons, facilitated by the diversity of different mushrooms per superordinate category and by a low number of distinct images (three per mushroom). This stands in contrast to previous studies that employed a much larger number of images per category (cf. Lu et al., 2020; Kornell & Vaughn, 2018; Tauber et al., 2013; Yan et al., 2017 in Experiment 4).

Secondly, our minimalistic approach yielded the observed effect despite the absence of numerous intervention components that previous research has identified as important for correcting persistent metacognitive misbeliefs (Onan et al., 2024; Sun et al., 2022; Yan et al., 2016). These components include the presentation of empirical evidence demonstrating the superiority of interleaving over blocking, providing information regarding the misinterpretation of mental effort, and personally experiencing these strategies. Taken together, the observed effect underscores the importance of learners’ conditional knowledge of the why-components for fostering their engagement in interleaving.

Why Do Learners Still Block?

The highest extent of interleaving was achieved in Experiment 2 when all potential motivational and conditional knowledge gaps were addressed. However, even under these conditions, two out of five study sequence choices remained blocked. As suggested by the overlapping waves theory (Siegler, 2016), instead of consistently engaging in a new, more advanced strategy, learners still use several competing strategies over a prolonged period of time (see also Kornell & Vaughn (2018)). This leads to the question: What further reasons drive learners to engage in blocking? On the one hand, it might be — as already mentioned — due to the interface features nudging learners to make within-superordinate-category comparisons. On the other hand, the tendency to block can be interpreted in terms of effort avoidance. However, we believe that the latter interpretation falls short as it overlooks what effort has been invested by making blocked switches.

In the present research, we designed the mushroom selection page for study sequence choices in a way that allows us to differentiate between two types of switches within a superordinate category, namely no switches and within-switches. We observed that no switches and within-switches likely serve different aims. This can be seen, for example, in a negative correlation between within-switches and no switches in Experiment 1 (see Table 2) and a null correlation in Experiment 2 (see Table 4). Only in the case of no switches do we believe that learners block to lower their effort experience — which can be explained in terms of effort regulation. Within-switches, in contrast, may be attributed to the typical learning goal to identify commonalities. Accordingly, it can be said that within-switches, unlike no switches, are an effort investment to detect commonalities.

Learners’ pursuit of finding commonalities as a typical learning goal was not intended to be challenged by our minimalist interventions, as evidenced by a consistently higher extent of within-switches compared to no switches across all conditions and experiments, and a significant extent of mental effort investment to detect commonalities (despite its decrease caused by the instruction that differences matter in Experiment 1). Learners correctly believed that by switching between mushrooms of the same superordinate category, their chances of discovering commonalities generally increase. Discovering commonalities via blocking, in turn, is beneficial for category induction if the category exemplars are too diverse to be able to recognize their relevant differences among the irrelevant ones (Abel et al., 2021; Carvalho & Goldstone, 2015). Accordingly, studies that used very distinctive categories sharing only predictive features but providing no explicit information on critical features typically showed a benefit of blocking over interleaving (Carpenter & Mueller, 2013; Sorensen & Woltz, 2016; Yan & Schuetze, 2021). However, in general, learners are unaware of the implicit requirements of a category learning task with confusable stimuli that are distinct only regarding their predictive features; that is, they lack the awareness that differences matter.

Essential Components for Designing an Intervention

The insights about learners’ motivational and conditional knowledge gaps can be conveyed by teachers or digital tools and integrated into learning strategy training programs in real educational settings, such as Study Smart (cf. Biwer et al., 2020). That is, the observation that learners do not recognize the utility value of distinguishing and the relevance of predictive differences for being able to distinguish should be addressed. Thus, by highlighting the utility value of distinguishing as a learning goal in category learning and informing learners about the implicit requirements of a category learning task, learners may gain an understanding of why finding differences matters, change their learning goal toward distinguishing, and recognize the merits of interleaving as a study strategy, which helps to achieve this learning goal by highlighting predictive differences. Moreover, by engaging in interleaving, learners might not only consider that their invested effort pays off but also interpret their effort invested to find predictive differences as goal-driven (cf. Baars et al., 2020; de Bruin et al., 2023; Grund et al., 2024) because the anticipated benefits align with their learning goal.

We have not yet addressed the question of how to raise learners’ perceived utility value of distinguishing as a learning goal. Figure 1 displays the conditional impact of the expected consequences of confusion, which raise the utility value of distinguishing when dealing with confusable categories. In the present research, we relied on superordinate mushroom category labels such as poisonous and edible, for which confusion inherently bears dangerous consequences. Although being able to distinguish mushrooms is hardly an educationally important goal in western societies, it may be representative of categories that look alike and are thus likely but dangerous to confuse, such as in the diagnosis of diseases in medical education (e.g., for discrimination of skin cancer vs. harmless skin lesions, see Beeler et al. (2023)). One further example, which affects us all, is the recognition of deep fakes. In all these cases, it is essential to pay attention to subtle perceptual features that differ between the twins and to reliably classify new cases based on these predictive differences. However, in domains where the confusion between categories is not apparently dangerous (such as applying a formula to solve equations), learners might favor the learning goal of memorizing how to execute a strategy over finding out when a strategy is appropriate (cf. Rohrer, 2012). More research is needed to find effective ways of raising learners’ metacognitive awareness of the likelihood of confusion and its potential consequences (such as failing) to ensure learners’ recognition of the utility value of distinguishing. One possibility is to draw learners’ attention to the fact that the categories can be easily confused and that their acquired knowledge on how to execute a strategy would then be in vain and lead to false conclusions (see, e.g., Roelle et al. (2017) for informing learners about the high frequency and detrimental consequences of overconfidence in learning).

While we do not yet have a complete recipe for designing an intervention for a laboratory set-up, we have identified two essential why-components (emphasizing distinguishing as a learning goal and the importance of differences to achieve this goal) that, if absent, would likely result in a less successful intervention in terms of efficiency and long-lasting effects. These two why-components also aid in disentangling the effects of intervention packages. Looking at theory-based intervention studies in a laboratory set-up, we observe that those incorporating why-components as integral parts tend to be more successful (cf. Onan et al., 2024) than those without (cf. Yan et al., 2016). From the learners’ perspective, being faced with empirical evidence claiming the superiority of interleaving over blocking (theory-based approach) or gaining personal experience with interleaved learning (experience-based approach) does not guarantee acceptance by learners, as there are numerous cognitive biases that can hinder their receptiveness, especially when the instructions go against one’s own experiences (Yan et al., 2016). If learners’ awareness gaps regarding the why-components are left unaddressed, many learners would likely reject the evidence because it does not indicate how well interleaving serves their actual learning goal. Thus, by incorporating why-components, interventions might encounter less psychological resistance and diminish the effects of cognitive bias, as learners perceive a higher alignment between interleaving as a study strategy and their learning goal.

Individual Differences in Motivation and Conditional Knowledge

Up to this point, we have argued that learners in general are more or less motivated and/or aware when dealing with confusable categories. In the next step, which we consider a fruitful avenue for future research, we reflect on interindividual differences in motivation and conditional knowledge on when and why to engage in interleaved learning. In particular, learners might differ regarding the extent to which they recognize the importance of distinguishing in learning confusable categories and the importance of differences for being able to distinguish (why-components). For example, even in a control condition, there might be a few learners who spontaneously recognize the implicit requirements of a category task and in turn predominantly use interleaving. Learners might also vary regarding their awareness that interleaved study choices are useful for highlighting differences between confusable categories. For example, there might be a few learners motivated to distinguish and informed that differences matter, who are willing to comply with the learning goal, but nonetheless stick to blocking due to a lack of awareness that interleaving supports this goal. Such a consideration seems plausible and might account for the numerical tendency of the interleaving highlights differences instruction to engage learners in interleaving and for the absence of a strong ceiling effect on the overall score of the awareness that interleaving highlights differences measure in Experiment 2.

Exploring interindividual differences and the profile clusters, in general, may contribute to understanding the mixed results across studies on spontaneous study sequence choices and intervention studies. Moreover, in terms of intervention design, considering adaptivity and efficiency, it is crucial to provide learners only with the information they lack, as anything else would be redundant and may lead to unnecessary cognitive load (Kalyuga et al., 2003). Therefore, further development and refinement of diagnostic measures for assessing specific motivational and metacognitive awareness gaps is critical for designing adaptive interventions.

Implicit Requirements of a Category Learning Task

A category learning task is sometimes not as straightforward as it seems. There is no wrong way to mix categories if the selected subset of categories is uniformly similar because any category allows a useful comparison with any other category. In a real-world study setting, however, not all categories of the selected subset are uniformly (dis)similar. Previous research using stimuli with a uniformly high (or a uniformly low) between-category similarity might therefore have missed this common challenge in learning natural categories. For example, the bird categories from previous studies (cf. Tauber et al., 2013) are all uniformly similar because experimenters selected similar ones from the variety of very distinct birds. If the categories were not selected to meet a uniform similarity, it is not a good idea to nudge learners just to mix. Hence, a category learning task is sometimes tricky when specific characteristics such as a superordinate category level (especially if no unique features are shared across the stimuli) and variations in similarity across the stimuli (that is, very similar categories, or twins, among very distinct ones) come on top. In this case, learners face a common challenge in learning natural categories, namely deciding which categories to compare with each other. Here, learners’ preconceptions about the specific task requirements may play an important role in their study sequence choices, and these preconceptions might vary considerably across learners.

For example, some learners in Experiment 1 might have quickly figured out by themselves that confusable doubles were placed in the same rows (marked by a distinct color). In turn, if they made between-switches, these were likely the informative ones. In contrast, learners who had not figured this out quickly might also have made non-informative between-switches. Beyond that, learners who believed that edible and poisonous mushrooms, respectively, have no shared visual features anyway might have spontaneously refrained from making within-switches (because there is nothing to find) and preferred between-switches instead.

To counterbalance learners’ preconceptions that cause utilization deficiency by preventing the (effective) use of interleaving, the task requirements should be made recognizable via the instruction (direct support) and by tailoring the interface features of the selection page (indirect support). For example, when informed about which categories are likely to be confused, learners are more likely to apply interleaving not at random but in a more useful way by switching between confusable categories (as shown by the comparison between Experiment 1 and Experiment 2). Beyond that, when learners are explicitly informed (and implicitly shown through the selection page characteristics) that no unique visual features are shared by the stimuli of a superordinate category, their effort investment might fully shift from finding similarities to finding differences.

Educational Implications for Engagement in Desirable Difficulties

According to the Start and Stick to Desirable Difficulties framework, learners avoid unnecessary mental effort. Based on a cost–benefit calculation, learners are willing to invest effort if they think their effort pays off (de Bruin et al., 2023). To use a figure of speech, learners are as selective as customers spending money in a mall. Thus, learners refrain from engaging in interleaving — as a more effortful study strategy — not because they simply avoid investing mental effort, but probably because learners do not recognize that these motivational costs pay off. Faced with a category learning task, learners typically do invest effort to find commonalities between the exemplars of the same category. They correctly believe that by switching within a category, they would achieve this learning goal. However, finding within-category commonalities is still a suboptimal strategy choice as compared with finding differences between confusable categories.

From learners’ perspective, invested effort only pays off if the anticipated benefits align with one’s learning goal. Accordingly, to make learners recognize the value of an effortful study strategy such as interleaving in learning confusable categories, learners need to be aware of the following aspects: the consequences of confusion, the implicit requirements of a category learning task, and that interleaving helps to meet these requirements. The present research sheds light on two gaps, one motivational and one metastrategic, which should be addressed to achieve this understanding in learners: Emphasizing the utility value of distinguishing as a learning goal and of identifying differences to achieve this goal may encourage learners to shift their effort allocation from identifying commonalities toward searching for differences by switching between categories, leading to improved learning outcomes. In a nutshell, to engage learners in desirable difficulties, the key might be not to inform learners how to learn but to reshape their learning goal.