Experiment 1 revealed that the critical factor in the spacing effect in inductive learning is interleaving, not temporal spacing. Hereafter, the term spacing, which Kornell and Bjork (2008) used to describe their manipulation, is replaced by the term interleaving. The results of Experiment 1 suggest that participants may have used interleaving to compare and contrast exemplars of different categories. As was noted previously, such a strategy may be more important when the category discrimination is difficult (Kurtz & Hovland, 1956). Experiment 2 investigated this issue by comparing interleaved immediate and massed immediate conditions, in line with Kornell and Bjork. The stimuli were squares containing familiar and nonsense shapes in varying colors. Two research questions were addressed. First, how does interleaving affect inductive learning as a function of the discriminability of the materials? Specifically, we asked whether the interleaving benefit is observed only when category discrimination is difficult. Second, which presentation style, massing or interleaving, do learners judge to be more helpful for learning categories inductively? On the basis of previous findings when presentation style was varied within participants (e.g., Kornell & Bjork, 2008; Kornell et al., 2010), we hypothesized that the majority of participants in Experiment 2 would report massing to be more helpful than interleaving in learning both high- and low-discriminable categories.
Participants and design
Forty students (28 female, 12 male) from an introductory psychology class participated in the experiment for course credit. The experiment used a 2 (presentation style: interleaved vs. massed) × 2 (degree of discriminability: high vs. low) × 4 (test block: blocks 1–4) mixed factorial design. Category discriminability was varied between participants, while presentation style and test block were varied within participants. Following the approach of Kornell and Bjork (2008), the experimental manipulation involved four steps: a presentation (study) phase, a distractor task, a test phase, and a question phase. In the presentation phase, the picture exemplars from the 12 categories were arranged in 12 learning blocks. The order of the blocks was MIIMMIIMMIIM (M for massed; I for interleaved). In the interleaved learning blocks, the pictures from a category were interleaved with pictures from other categories. The assignment of picture categories to condition (massed vs. interleaved) was counterbalanced over two lists. Two versions of each list were produced in which there was a different assignment of categories to blocks. Thus, there were four lists in total. In the test phase, 48 new picture exemplars from the 12 categories learned in the presentation phase were arranged in four test blocks. Each block consisted of one new picture from each category, presented in a fixed order across participants.
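The block structure of the presentation phase can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: it assumes that each massed block shows all six study exemplars of one massed category and that each interleaved block shows one exemplar of each of the six interleaved categories in a shuffled order (an arrangement consistent with 72 pictures in 12 blocks, but not spelled out in the text). The category labels C00 through C11, the helper names `build_study_sequence` and `make_four_lists`, and the way the second version of each list reorders the massed categories across blocks are all hypothetical.

```python
import random
from collections import Counter

BLOCK_ORDER = "MIIMMIIMMIIM"  # block order reported for the study phase
BLOCK_SIZE = 6                # assumed: 72 study pictures / 12 blocks
PICTURES_PER_CATEGORY = 6     # 72 study pictures / 12 categories


def build_study_sequence(massed_cats, interleaved_cats, seed=None):
    """Return study trials as (block_type, category, exemplar_index) tuples.

    Assumption: a massed block shows all six exemplars of one massed category;
    an interleaved block shows one exemplar of each interleaved category,
    in a shuffled order.
    """
    assert len(massed_cats) == BLOCK_ORDER.count("M")
    assert len(interleaved_cats) == BLOCK_SIZE
    assert BLOCK_ORDER.count("I") == PICTURES_PER_CATEGORY
    rng = random.Random(seed)
    massed_queue = list(massed_cats)
    next_exemplar = {cat: 0 for cat in interleaved_cats}
    trials = []
    for block_type in BLOCK_ORDER:
        if block_type == "M":
            cat = massed_queue.pop(0)
            trials.extend(("M", cat, i) for i in range(PICTURES_PER_CATEGORY))
        else:
            order = list(interleaved_cats)
            rng.shuffle(order)
            for cat in order:
                trials.append(("I", cat, next_exemplar[cat]))
                next_exemplar[cat] += 1
    return trials


def make_four_lists(categories):
    """Hypothetical counterbalancing: two category-to-condition lists, each in
    two versions that place the massed categories in a different block order."""
    half = len(categories) // 2
    a, b = list(categories[:half]), list(categories[half:])
    lists = []
    for massed, interleaved in [(a, b), (b, a)]:
        lists.append((massed, interleaved))
        lists.append((massed[::-1], interleaved))
    return lists


cats = [f"C{k:02d}" for k in range(12)]
all_lists = make_four_lists(cats)
trials = build_study_sequence(*all_lists[0], seed=1)
counts = Counter(cat for _, cat, _ in trials)
print(len(all_lists), len(trials), set(counts.values()))
```

Under these assumptions, every category appears six times across the 72 study trials whichever of the four lists is used; only the temporal arrangement of the exemplars differs between massed and interleaved categories.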
The materials were 240 pictures drawn and prepared using Adobe Illustrator and Adobe Photoshop software (see Fig. 2). The pictures were divided into two sets, each having 120 pictures from 12 categories. Specifically, the first set consisted of pictures that were designated to have a high degree of discriminability, while the second set consisted of pictures that were designated to have a low degree of discriminability. Furthermore, in each set, 72 pictures were used in the presentation/study phase (6 pictures per category), and 48 pictures were used in the test phase (4 pictures per category). Each category was assigned a distinctive nonsense-syllable name. Additionally, each picture consisted of a black square frame containing three elements that defined the category and one or more distractors. Two elements were nonsense shapes (termed category icons) that shared a color composition, and the third element was a shape (circle or triangle; see Table 1). The features that were diagnostic of the category were (1) color composition of the category icons and (2) the shape that accompanied the category icons (circle or triangle). The nondiagnostic features (those that did not distinguish categories) were (1) the shape of the category icons, (2) the position of the category icons and the accompanying shape, and (3) the number of distractors in the square frame. 
The high-discriminable and low-discriminable categories differed in the three nondiagnostic features: (1) for high-discriminable categories, the category icons had the same shape in each picture, whereas for low-discriminable categories, the category icons were of either the same or a different shape in each picture; (2) for high-discriminable categories, the category icons and the shape were all positioned at either the top or the bottom, whereas for low-discriminable categories, they were all positioned at the top, bottom, right, or left; and (3) for high-discriminable categories, there was only one distractor, whereas for low-discriminable categories, there were five distractors, one distractor from each category.
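The distinction between diagnostic and nondiagnostic features can be made concrete with a small data model. This sketch assumes hypothetical color names, icon-shape names, and category labels (the actual stimuli appear in Fig. 2 and Table 1), and it assumes the 12 categories arise from crossing six color compositions with the two accompanying shapes. Exemplars vary only in the nondiagnostic features, following the three high- versus low-discriminability differences listed above. It reads "the same shape" for high-discriminable categories as the two icons in a picture sharing one shape, which is one possible interpretation, and it represents distractors only by their count.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical feature values; not the actual stimulus colors or shapes.
COLOR_PAIRS = [("red", "blue"), ("red", "green"), ("blue", "yellow"),
               ("green", "purple"), ("orange", "blue"), ("yellow", "purple")]
ICON_SHAPES = ["icon_a", "icon_b", "icon_c", "icon_d"]


@dataclass(frozen=True)
class Category:
    label: str
    colors: Tuple[str, str]  # diagnostic: color composition of the icons
    shape: str               # diagnostic: accompanying circle or triangle


@dataclass(frozen=True)
class Picture:
    colors: Tuple[str, str]
    shape: str
    icon_shapes: Tuple[str, str]  # nondiagnostic
    position: str                 # nondiagnostic
    n_distractors: int            # nondiagnostic


def make_categories() -> List[Category]:
    """12 categories = 6 color compositions x 2 accompanying shapes (assumed crossing)."""
    combos = [(c, s) for c in COLOR_PAIRS for s in ("circle", "triangle")]
    return [Category(f"cat{i:02d}", c, s) for i, (c, s) in enumerate(combos)]


def make_exemplar(cat: Category, high_discriminability: bool,
                  rng: random.Random) -> Picture:
    """Vary only the nondiagnostic features, following the three listed differences."""
    if high_discriminability:
        icon = rng.choice(ICON_SHAPES)
        icons = (icon, icon)                                  # (1) one shared shape
        position = rng.choice(["top", "bottom"])              # (2) two positions
        n_distractors = 1                                     # (3) one distractor
    else:
        icons = (rng.choice(ICON_SHAPES), rng.choice(ICON_SHAPES))  # same or different
        position = rng.choice(["top", "bottom", "right", "left"])
        n_distractors = 5
    return Picture(cat.colors, cat.shape, icons, position, n_distractors)


def classify(picture: Picture, categories: List[Category]) -> Optional[str]:
    """Assign a category from the diagnostic features alone."""
    for cat in categories:
        if (cat.colors, cat.shape) == (picture.colors, picture.shape):
            return cat.label
    return None
```

Because `classify` inspects only the diagnostic features, it labels every generated exemplar correctly in both conditions; what separates the two conditions is how much irrelevant variation surrounds those diagnostic features.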
The presentation phase (72 pictures and their names), distractor task, and test phase (48 new pictures) were conducted as in Experiment 1. The only notable difference was that each picture and its category name were displayed for 5 s during the presentation phase. Additionally, following Kornell and Bjork’s (2008) approach, after the test phase, participants read a description of the meanings of the terms “massed” and “spaced” (interleaved) on the computer screen. They were then asked, “Which option do you think helped you learn more?” and were given three possible answers (“massed,” “about the same,” or “spaced”), which ended the experimental manipulation. Participation in the experiment took approximately 40 min, and participants were debriefed before they left the experimental room.
The results of main interest are shown in Fig. 3. As can be seen from the figure, when the degree of discriminability was high, performance was higher in the massed condition (M = 0.51) than in the interleaved condition (M = 0.33), whereas when the degree of discriminability was low, performance was higher in the interleaved condition (M = 0.44) than in the massed condition (M = 0.22).
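The crossover pattern in these cell means can be checked with simple arithmetic. The sketch below computes unweighted marginal means and the simple effects of presentation style (massed minus interleaved) from the four reported means. It is descriptive only: the ANOVA itself requires participant-level data, which are not reproduced here, and the helper name `marginal` is illustrative.

```python
# Cell means (proportion correct) reported for Experiment 2.
cell = {
    ("high", "massed"): 0.51, ("high", "interleaved"): 0.33,
    ("low", "massed"): 0.22, ("low", "interleaved"): 0.44,
}


def marginal(factor_index, level):
    """Unweighted mean of the cells at one level of a factor
    (index 0 = discriminability, index 1 = presentation style)."""
    values = [m for key, m in cell.items() if key[factor_index] == level]
    return sum(values) / len(values)


massed = marginal(1, "massed")            # (0.51 + 0.22) / 2 = 0.365
interleaved = marginal(1, "interleaved")  # (0.33 + 0.44) / 2 = 0.385
high = marginal(0, "high")                # (0.51 + 0.33) / 2 = 0.42
low = marginal(0, "low")                  # (0.22 + 0.44) / 2 = 0.33

# Simple effects of presentation style at each level of discriminability.
effect_high = cell[("high", "massed")] - cell[("high", "interleaved")]  # +0.18
effect_low = cell[("low", "massed")] - cell[("low", "interleaved")]     # -0.22
interaction_contrast = effect_high - effect_low                          # 0.40

print(round(massed, 3), round(interleaved, 3), round(high, 3), round(low, 3))
print(round(effect_high, 2), round(effect_low, 2), round(interaction_contrast, 2))
```

The presentation-style marginals nearly coincide, while the simple effects run in opposite directions; this is the descriptive signature of a negligible main effect of presentation style combined with a large interaction.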
A three-way mixed ANOVA was conducted on the data for Experiment 2. The main effect of degree of discriminability was significant, F(1, 38) = 6.09, MSE = 3.71, p = .018, ηp² = .138, which indicated that, on average, participants’ accuracy was significantly higher in the high-discriminability condition than in the low-discriminability condition. Participants’ performance also significantly increased across test blocks, F(3, 114) = 25.87, MSE = .66, p < .001, ηp² = .405 (as illustrated in Fig. 3). However, the main effect of presentation style was not significant, F(1, 38) = 0.48, MSE = 1.89, p = .494, ηp² = .012. Critically, the interaction between degree of discriminability and presentation style was significant, F(1, 38) = 61.59, MSE = 1.89, p < .001, ηp² = .618, which indicates that the effect of presentation style differed at each level of discriminability (see Fig. 3). Follow-up t-tests revealed significant differences between the interleaved and massed conditions in both the high-discriminability condition, t(19) = 4.14, p = .001, and the low-discriminability condition, t(19) = −8.48, p < .001. The remaining two-way interactions and the three-way interaction were not significant [test block × degree of discriminability, F(3, 114) = 2.47, MSE = .66, p = .066, ηp² = .061; presentation style × test block, F(3, 114) = 0.14, MSE = .61, p = .934, ηp² = .004; presentation style × test block × degree of discriminability, F(3, 114) = 1.11, MSE = .61, p = .350, ηp² = .028].
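Each reported effect size can be recovered from its F ratio and degrees of freedom, because partial eta squared, SS_effect / (SS_effect + SS_error), equals F·df_effect / (F·df_effect + df_error). The function below is a minimal sketch of that identity (the name `partial_eta_squared` is illustrative). Applied to the reported statistics, it reproduces each ηp² value to three decimals; for the three-way interaction, F(3, 114) = 1.11, it gives approximately .028.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df_effect / (F*df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) taken from the text.
reported = {
    "discriminability": (6.09, 1, 38, .138),
    "test block": (25.87, 3, 114, .405),
    "presentation style": (0.48, 1, 38, .012),
    "style x discriminability": (61.59, 1, 38, .618),
    "block x discriminability": (2.47, 3, 114, .061),
    "style x block": (0.14, 3, 114, .004),
}
for effect, (f, d1, d2, eta) in reported.items():
    print(f"{effect:<26} computed {partial_eta_squared(f, d1, d2):.3f}  reported {eta:.3f}")

# Three-way interaction, F(3, 114) = 1.11: computed from F and df only.
print(f"{'style x block x discrim.':<26} computed {partial_eta_squared(1.11, 3, 114):.3f}")
```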
With regard to participants’ judgments of which study presentation had helped them most, a similar preference for massed presentation was observed in both discriminability conditions. In the high-discriminability condition, of a total of 20 participants, a majority of 14 (70 %) claimed massed, 4 (20 %) claimed spaced, and the remaining 2 (10 %) judged massed and spaced to have contributed about the same (as illustrated in Fig. 4). A one-way (goodness-of-fit) chi-square analysis, which compared the observed frequencies of the three responses against equal expected frequencies, confirmed that the majority of participants reported massing to be more helpful than spacing in learning the high-discriminable categories, χ²(2, N = 20) = 12.40, p = .002. In terms of categorization performance, 17 (85 %) of the participants performed better in the massed condition, and 3 (15 %) performed better in the spaced (interleaved) condition.
Similarly, in the low-discriminability condition, of a total of 20 participants, a majority of 12 (60 %) claimed that massed presentation was most effective, 4 (20 %) claimed spaced, and the remaining 4 (20 %) judged the two conditions equally effective, regardless of their performance in the two conditions (massed and spaced; as illustrated in Fig. 5). A one-way chi-square analysis of the judgment data revealed that most participants judged massing to be most helpful in learning the low-discriminable categories, χ²(2, N = 20) = 6.40, p = .041. In terms of categorization performance, 19 (95 %) of the participants performed better in the spaced (interleaved) condition, and 1 (5 %) performed equally in the two conditions.
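Both judgment analyses can be reproduced as chi-square goodness-of-fit tests against equal expected frequencies across the three responses (massed, spaced, about the same). With 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(−x/2), so this sketch needs no statistics library; `scipy.stats.chisquare`, whose default expected frequencies are uniform, should give the same statistics. The helper names are illustrative.

```python
import math


def chi_square_gof(observed):
    """One-way chi-square statistic against equal expected frequencies, with its df."""
    n = sum(observed)
    expected = n / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


def chi2_sf_df2(x):
    """Upper-tail p value for a chi-square with 2 df: P(X >= x) = exp(-x / 2)."""
    return math.exp(-x / 2)


# Response counts in the order massed, spaced, about the same (from the text).
judgments = {"high": [14, 4, 2], "low": [12, 4, 4]}
for condition, counts in judgments.items():
    stat, df = chi_square_gof(counts)
    assert df == 2  # the closed-form p value is valid only for df = 2
    print(f"{condition}: chi2({df}, N = {sum(counts)}) = {stat:.2f}, "
          f"p = {chi2_sf_df2(stat):.3f}")
```

The printed values match the reported χ²(2, N = 20) = 12.40, p = .002 and χ²(2, N = 20) = 6.40, p = .041, which indicates that the tests compared all three response categories rather than massed versus not-massed (the latter would have only 1 df).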
Experiment 2 provided experimental evidence in support of Kurtz and Hovland’s (1956) argument that, in inductive or category learning, massing is more effective in learning the high-discriminable categories, whereas in learning the low-discriminable categories, interleaving the exemplars from different categories is critical in enhancing discrimination learning. These results suggest that the interleaving effect observed with paintings in Experiment 1 and elsewhere (e.g., Kornell & Bjork, 2008) reflects the high difficulty of discriminating the painter categories employed. It must be noted that there were some differences in the materials between the two experiments. The implications of these differences will be considered in the General Discussion section.
On the postexperimental questionnaire, the majority of participants in both discriminability conditions (i.e., high and low) appeared to believe that massed presentation made it easier to recognize the nature of each category of picture exemplars during the study phase. In line with Kornell and colleagues’ observations (Kornell & Bjork, 2008; Kornell et al., 2010) and our previous study (Zulkiply et al., 2012), it is notable that in the low-discriminability condition, the majority of participants rated massing as more effective than interleaving even though their own test performance had demonstrated the opposite. This finding is consistent with the view that people have limited introspective access to their complex mental processes (Kahneman, 2011; Nisbett & Wilson, 1977). Nisbett and Wilson argued that we are typically conscious of the products of our thinking but only vaguely conscious of its processes. The impressions, intuitions, and feelings that guide us are not always justified, and we are often confident even when we are wrong (Kahneman, 2011). It is likely that seeing the exemplars of a category consecutively under massed presentation gave participants a sense of familiarity with each category, regardless of whether the categories were difficult or easy, which then biased their judgment toward massing. Familiarity is one of the sources of cognitive ease, and we tend to trust our intuitions when we are in a state of cognitive ease (Kahneman, 2011). In contrast, interleaving is unlikely to have created an equivalent sense of familiarity with the categories, since the exemplars of each category were spaced apart throughout the presentation phase.