Which Technique Is Most Effective for Learning Declarative Concepts—Provided Examples, Generated Examples, or Both?

Replication

Abstract

Students in many courses are commonly expected to learn declarative concepts, which are abstract concepts denoted by key terms with short definitions that can be applied to a variety of scenarios as reported by Rawson et al. (Educational Psychology Review 27:483–504, 2015). Given that declarative concepts are common and foundational in many courses, an important question arises: What are the most effective techniques for learning declarative concepts? The current research competitively evaluated the effectiveness of various example-based learning techniques for learning declarative concepts, with respect to both long-term learning and efficiency during study. In experiment 1, students at a large Midwestern university were asked to learn 10 declarative concepts in social psychology by studying provided examples (instances of concepts that are provided to students to illustrate how the concept can be applied), by generating examples (instances of concepts that the student generates on his or her own to practice applying the concept), or by receiving a combination of alternating provided and generated examples. Two days later, students completed final tests (an example classification test and a definition cued recall test). Experiment 2 replicated and extended findings from experiment 1. The extension group was a variation of the combination group in which a provided example remained on screen while participants generated their own example. In both experiments, long-term learning and study efficiency were greater following the study of provided examples relative to the other example-based learning techniques.

Keywords

Declarative concepts · Example-based learning · Provided examples · Generated examples

Suppose you are a student in a psychology course who is learning about negative reinforcement in a unit on operant conditioning. Presumably, your instructor’s goal for teaching this concept would not be for you to just know its definition (i.e., increasing the likelihood of a behavior in the future by removing an aversive stimulus after the behavior occurs). Rather, the instructor’s intent is likely to equip you with conceptual knowledge that you could use effectively in many real-world domains (e.g., to train your dog to sit, to motivate your children to clean their rooms, to get your spouse to put dirty laundry in the hamper, and so on). If so, what learning techniques would best support your comprehension and application of this concept and others like it?

In this hypothetical scenario, negative reinforcement is an example of what are referred to as declarative concepts, which are abstract concepts denoted by key terms with short definitions that can be applied to a variety of scenarios (Rawson et al. 2015).1 These concepts represent content that is commonly encountered and foundational for learning in many academic domains. For example, students learn about the concepts kinetic energy and potential energy in physics, symbiosis and mutualism in biology, personification and metaphors in literary analysis, and negative reinforcement and positive reinforcement in psychology. These concepts are all abstract in nature in that they can be mapped onto a variety of concrete situations.

Given that declarative concepts are common and foundational in many courses, an important question arises: What are the most effective techniques for learning declarative concepts? Note that the question here concerns not just what works, but what works best when compared to other possible techniques that an instructor or student might use. The effectiveness of a learning technique has two key components. The first component of effectiveness is the level of long-term learning: what is the level of mastery achieved and maintained over time? Long-term learning is the component of effectiveness that instructors and researchers typically focus on when investigating learning techniques. This focus is understandable, given that a central goal of education is to equip students with high levels of durable knowledge. However, a second component of effectiveness that is often overlooked but arguably also important is efficiency: how much time is required during acquisition to achieve the desired level of long-term learning?

Howe and Singer (1975) note that in practice, the challenge for educators and researchers is not to identify effective learning techniques when time is unlimited. Rather, the problem arises when trying to identify what is most effective when time is fixed. Indeed, long-term learning could easily be achieved if students had an unlimited amount of time and only a limited amount of information to learn (with the caveat that students spend their time employing useful encoding strategies). However, achieving long-term learning is difficult because students have a lot to learn within a limited amount of time (Rawson and Dunlosky 2011). Thus, long-term learning and efficiency are both important to consider when competitively evaluating the effectiveness of learning techniques.

Accordingly, the current research competitively evaluated the effectiveness of various example-based learning techniques for learning declarative concepts, with respect to both long-term learning and efficiency. Example-based learning involves the use of concrete examples to illustrate how abstract concepts can be applied or instantiated in particular contexts or situations. Although example-based learning can take many forms, the current research focuses on provided examples and generated examples. A provided example is an instance of a declarative concept that is provided to students (e.g., by a teacher or textbook) to illustrate how the concept can be applied. A generated example is an instance of a declarative concept that the student generates on his or her own to practice applying the concept. The current research aimed to answer two questions: (1) Are provided examples or generated examples more effective for learning declarative concepts, and (2) is the combination of both provided and generated examples more effective than either technique alone? Below, we describe the current state of the literature and then report two experiments designed to answer these primary questions.

Prior Research on Example-Based Learning of Declarative Concepts

Provided Examples or Generated Examples

Provided examples are often given to students by instructors and are commonly included in textbooks and educational materials (e.g., a social psychology chapter in a widely used introductory psychology textbook provided one or more examples for 93% of the chapter concepts; Rawson et al. 2015). For example, after introducing the definition of the concept negative reinforcement, an instructor or textbook might provide the following example to help further illustrate the concept: “If Judy opens her umbrella when it is raining and avoids getting wet, she will be more likely to use her umbrella when it is raining in the future” (for additional concepts and examples, see Tables 1 and 2). Despite the prevalence of provided examples in educational materials, minimal research has investigated the effectiveness of using provided examples to support learning. However, evidence thus far is promising; on final tests assessing concept comprehension and memory for concept definitions, performance was greater for students who studied provided examples versus students who restudied concept definitions (Balch 2005; Rawson et al. 2015).
Table 1

Declarative concepts and corresponding definitions

Availability heuristic: The tendency to estimate the likelihood that an event will occur by how easily instances of it come to mind
Counterfactual thinking: The tendency to imagine alternative events or outcomes that might have occurred, but did not
Door-in-the-face technique: A strategy used to increase compliance based on the fact that refusal of a large request increases the likelihood of agreement with a subsequent smaller request
Deindividuation: The loss of a person’s sense of individuality that results in a reduction of normal constraints against deviant behavior
Foot-in-the-door technique: A strategy used to increase compliance based on the fact that agreement with a small request increases the likelihood of agreement with a subsequent larger request
Fundamental attribution error: The tendency to believe that another person’s behavior is due to his/her disposition and to underestimate the impact of situations on his/her behavior
Hindsight bias: The tendency, once an event has occurred, to overestimate one’s ability to have foreseen the outcome
Mere exposure effect: The phenomenon whereby the more people are exposed to a stimulus, the more positively they evaluate that stimulus
Representativeness heuristic: The tendency to judge the likelihood that a target belongs to a category based on how similar the target is to typical members of the category
Social facilitation: A process whereby the presence of others enhances performance on simple tasks, but impairs performance on more complex or unfamiliar tasks

Table 2

Sample declarative concepts and provided examples

Availability heuristic: After the movie “Jaws” came out, and again in the early 1990s after “Summer of the Shark,” many people were afraid to go in the ocean because they overestimated the chances of suffering a shark attack. Although shark attacks are very rare, the dramatic scenes in the movies made shark attacks easy to visualize and recall
Hindsight bias: Political assessments are often biased after the outcome is known, with voters commenting “I always knew my candidate would win”
Mere exposure effect: After seeing numerous TV commercials that show the Energizer battery out-performing another brand, a viewer may change his or her attitude toward the product from neutral to positive

Concerning generated examples, students report using example generation to help them learn concepts (Blasiman et al. in press; Gurung 2005; Gurung et al. 2010; Weinstein et al. 2013). Despite students’ common use of this strategy, few studies have examined the effectiveness of example generation. Earlier studies on example generation yielded mixed results (Dornisch et al. 2011; Gorrell et al. 1991; Hamilton 1989, 1997, 1999, 2004), but interpretive difficulties arise due to methodological limitations in these studies (e.g., dosage differences, quasi-experimental designs, weak manipulations; for detailed discussion, see Rawson and Dunlosky 2016). A more recent study designed to overcome these limitations demonstrated greater final test performance for students who generated examples versus students who restudied concept definitions (Rawson and Dunlosky 2016).

Provided Examples versus Generated Examples

Both provided and generated examples have shown promise for improving concept learning, at least compared to restudying concept definitions (Balch 2005; Rawson et al. 2015; Rawson and Dunlosky 2016). However, minimal research has competitively evaluated the relative effectiveness of provided versus generated examples for declarative concept learning, which is the first question of interest in the current research. Specifically, only two studies have directly compared the effectiveness of provided versus generated examples, only one of these examined long-term learning, and neither investigated learning efficiency (Dornisch et al. 2011; Hamilton 1990).

Hamilton (1990) had students learn four concepts from the topic of operant conditioning. After initial learning, students either studied provided examples or generated their own examples. Subsequent final tests included definition cued recall (in which participants were presented with the concept terms and asked to recall the definitions), example classification, and a problem solving test. Problem solving performance was greater for students who were given provided examples versus students who generated examples, but groups did not differ significantly on definition cued recall or example classification (although numerical trends favored the provided examples group). However, all final tests were taken immediately after learning. Importantly, research on other learning techniques often shows that the relative effectiveness of techniques reverses from immediate to delayed tests of learning (e.g., Dunlosky et al. 2013; Rowland 2014).

In the study by Dornisch et al. (2011), students either studied provided examples or generated their own examples to learn concepts from educational psychology. No performance differences emerged on matching, multiple-choice, open-ended, or factual recall questions administered immediately after learning and 7 days later. However, with only 13 example prompts for a 3000-word text, the dosage level of the examples manipulation may have been insufficient to produce detectable differences in learning.

Given the limitations of these two studies, the question of whether provided or generated examples are more effective for learning declarative concepts remains open. However, other related literatures provide some basis for predictions about the relative effectiveness of these two techniques. Regarding long-term learning, in the testing effect literature, the modal finding is that long-term retention is greater after students complete practice tests versus restudy (see Rowland 2014 for recent meta-analysis), particularly when practice tests require students to generate responses (e.g., retrieval of concept definitions, practice application tests). Drawing parallels to the current research, generating examples may be similar to taking practice tests, whereas studying provided examples may be similar to restudy. Based on this parallel, a plausible prediction is that long-term learning will be greater for students who generate examples versus students who study provided examples.

Concerning efficiency (the other component of effectiveness), findings from the literature on rule-based concept learning (i.e., rules, formulas, or algorithms for solving particular kinds of problems, such as the Pythagorean theorem) provide some basis for our predictions in the current research. Research on rule-based concept learning has compared the learning efficiency of worked examples (which consist of a problem statement with a step-by-step illustration of how to arrive at the solution) to problem solving (in which students receive the problem to be solved on their own). The modal finding is that studying worked examples requires less time than problem solving (e.g., Nievelstein et al. 2013; Paas and van Merrienboer 1994; van Gog et al. 2006). Drawing parallels to the current research, provided examples may be similar to worked examples, whereas generated examples may be similar to problem solving. Based on this parallel, a plausible prediction for efficiency is that practice time will be lower for studying provided examples compared to generating examples.2

Provided and Generated Examples

Above and beyond the question of the relative effectiveness of provided versus generated examples, the second question of interest concerns whether the combination of provided and generated examples is more effective than either technique alone. No prior research in the declarative concept literature has investigated this question.

Although no direct evidence is available, the testing effect and rule-based concepts literatures again provide some basis for predictions. In the testing effect literature, long-term learning is consistently greater following practice testing with restudy versus either testing or restudy alone (e.g., Roediger and Butler 2011). Although few studies in the rule-based concept literature have examined learning using delayed tests, the limited evidence shows that the combination of worked examples and problem solving is often better than either technique alone (Carroll 1994; Salden et al. 2010; Leahy et al. 2015; van Gog et al. 2015; but see van Gog and Kester 2012).

Concerning efficiency, predictions are less clear. The testing effect literature is generally silent on efficiency. The rule-based concepts literature shows that the combination of worked examples and problem solving is often more efficient than problem solving alone (e.g., Salden et al. 2010), but it has not compared the combination to worked examples alone. Nonetheless, these parallels provide some basis for predicting that the combination of provided and generated examples will be more efficient than generated examples alone; this prediction also follows directly from the previous prediction that studying provided examples will be more efficient than generating examples. Given that half of the trials for the combination will be provided examples and the other half will be generated examples, a reasonable expectation is that efficiency for this technique will fall in between efficiency for provided examples alone and generated examples alone.
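Formalizing this expectation (our own back-of-the-envelope arithmetic, not a claim from the prior literature): with half of trials provided and half generated, expected practice time for the combination is approximately

$$T_{\text{combination}} \approx \tfrac{1}{2}\left(T_{\text{provided}} + T_{\text{generated}}\right),$$

which falls between the times for the two pure techniques whenever they differ.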

Summary of Prior Research

In sum, minimal research has investigated the effectiveness of provided or generated examples separately, and only two studies have evaluated the effectiveness of provided versus generated examples. Concerning a competitive evaluation of provided versus generated examples, only one of the two studies examined long-term learning, results were inconclusive, and neither study examined efficiency. Finally, no research has investigated the effectiveness of the combination of provided and generated examples versus either technique alone.

Overview of Current Research

Given the state of the extant literature, the current experiments were designed to answer two primary questions. First, are provided or generated examples more effective for declarative concept learning? Second, is the combination of provided and generated examples more effective than either technique alone? In both experiments, participants were asked to learn declarative concepts from social psychology. After initial study of the concepts, participants completed a practice phase in which they studied provided examples of the concepts, generated examples of the concepts, or studied provided examples and generated examples of the concepts. To investigate learning efficiency, all example trials were self-paced and we recorded the time spent on each practice trial. To investigate long-term learning, all participants completed an example classification test 2 days later, in which they were given examples and asked to identify which of the ten concepts each example illustrated.

Concerning predictions, minimal research has directly investigated our two primary questions of interest. However, findings from other related literatures provide indirect evidence to support several predictions. Regarding our first question, we predicted that long-term learning would be greater following generated examples compared to provided examples. Concerning efficiency, we predicted that less time would be spent studying provided examples compared to generating examples. Regarding our second question, we predicted that long-term learning would be greater after a combination of provided and generated examples compared to either technique alone. Concerning efficiency, our prediction was that less time would be spent when students study provided examples and generate examples compared to just generating examples.

To foreshadow, experiment 1 disconfirmed some of these predictions and yielded some surprising outcomes. Given the importance of replication, experiment 2 was designed primarily to replicate these outcomes. As prior research on example-based learning for declarative concepts is scant, the consistent pattern of findings across these experiments is informative and important for the larger purpose of identifying the most effective techniques for learning declarative concepts.

Experiment 1

Method

Participants and Design

Experiment 1 included 133 participants from a large Midwestern university (71% female; 79% White, 14% Black, 8% Asian, 3% First Nations, 3% Hispanic or Latino, 1% Native Hawaiian or Pacific Islander); 57% were in their first year of college (M years = 1.8, SE = 0.1), and 24% were psychology majors. Mean age of participants was 19.6 years (SE = 0.2). Participants were recruited from the Psychology Department’s participant pool.

Participants were randomly assigned to one of three groups: provided examples, generated examples, or a combination of provided and generated examples. An additional 36 participants were randomly assigned to a fourth group that is not of central interest to the primary questions under consideration here. Basic information and outcomes for this group are reported in Appendix A for full disclosure (Simmons et al. 2011).

Data for 19 participants were excluded from analysis due to lost session 1 data (n = 2), failure to return for session 2 (n = 9), or non-compliance (i.e., spending less than 2 s on more than half of the practice trials or on more than half of the classification test trials; n = 8). The final sample included 114 participants (provided examples n = 36; generated examples n = 40; combination n = 38). A sensitivity analysis was conducted using G*Power (Faul et al. 2009). For one-tailed, independent samples t tests of directional predictions with an alpha level of .05 and .80 power, this sample size afforded sufficient sensitivity to detect moderate effects (d ≥ .57).
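For readers who wish to reproduce this sensitivity analysis without G*Power, the following minimal Python sketch (using statsmodels as a stand-in for the G*Power computation actually used; parameter values are those reported above) solves for the smallest detectable effect size:

```python
# Sensitivity analysis for a one-tailed, independent samples t test:
# given the group sizes, alpha, and power, solve for the minimum detectable d.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d_min = analysis.solve_power(
    effect_size=None,      # the unknown we solve for
    nobs1=36,              # provided examples group
    ratio=40 / 36,         # generated examples group size relative to nobs1
    alpha=0.05,
    power=0.80,
    alternative="larger",  # one-tailed directional prediction
)
print(f"smallest detectable effect: d = {d_min:.2f}")  # approximately .57
```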

Materials and Procedure

Materials included a textbook passage (396 words) from social psychology that included definitions for 10 declarative concepts (see Table 1 for all concepts and definitions). Materials also included 100 concrete examples (10 per concept) that were previously harvested from various undergraduate-level psychology textbooks by Rawson et al. (2015). Forty of the examples were used as provided examples during the practice phase and all 100 examples were included on the example classification final test (see Table 2 for sample examples).

All tasks and instructions were administered by a computer program, with participants working at individual stations in the laboratory. Instructions began by stating that our goal was to identify the best ways to help students learn conceptual material in their classes and by encouraging participants to try their best to learn the concepts in the experiment. Participants were then presented with each of the concept terms and were asked whether they had previously learned each concept in a psychology course. Participants then completed a definition cued recall pretest. They were presented with each of the concept terms one at a time in random order with this prompt: “In the field below, type in the definition of this concept. Only type the definition if you think you know the meaning of the concept. If not, simply type ‘I don’t know.’” Performance on these prior knowledge assessments is reported in Table 3 for both experiments 1 and 2. Next, all participants completed the initial study phase, in which they first read the textbook passage and clicked a button when they were finished. Participants were then presented with each concept and its definition one at a time for additional self-paced study.
Table 3

Outcomes for secondary measures

                        Self-reported       Definition        Category learning
                        concepts learned    recall pretest    judgment magnitude
Experiment 1
  Provided examples     3.1 (.5)            4.2 (1.2)         72.0 (3.0)
  Generated examples    1.8 (.4)            1.7 (.6)          50.0 (3.5)
  Combination           2.2 (.4)            1.0 (.4)          67.0 (2.6)
Experiment 2
  Provided examples     2.3 (.4)            3.0 (.6)          73.9 (2.5)
  Generated examples    2.3 (.5)            1.7 (.6)          55.5 (3.3)
  Combination           3.0 (.5)            3.6 (1.0)         68.5 (2.5)
  Simultaneous          2.1 (.4)            1.5 (.5)          66.7 (2.1)

Self-reported concepts is reported as number out of 10 concepts. Definition recall is reported as percentage correct. Category learning judgment values are mean values across all concepts on a scale of 0 to 100. Standard errors are reported in parentheses

After the initial study phase, participants received group-specific instructions for the practice phase. Participants in the combination group were told that they would first study a provided example for a concept and on the next trial, they would be asked to generate their own example of that concept. They were told that they would be asked to generate multiple examples of each concept and to do their best to generate a different example on each trial. They were also told not to use examples already provided to them. Participants in the generated examples group were told that they would be asked to generate multiple examples of each concept and to do their best to generate a different example on each trial. Participants in the provided examples group were told that they would be presented with multiple examples of each concept to study. After receiving group-specific instructions, all participants were told that they would be given a final test in 2 days that would assess their learning of the concepts.

Participants in all groups practiced concepts in a fixed random order. Practice trials were split into two blocks of 20, with two practice trials per concept in each block. In each block, the two practice trials for each concept were consecutive (see Fig. 1). For example, the first concept that participants practiced was availability heuristic. The combination group first received a provided example of the concept and on the next trial was asked to generate their own example. After finishing the two practice trials for availability heuristic, they received two practice trials for each of the remaining nine concepts (a provided example followed by a generated example trial). After all ten concepts had been presented for two practice trials each, participants completed the second block of trials with another two practice trials per concept (a provided example followed by a generated example trial). The provided examples presented in each block were different. The practice schedule in the generated examples group was the same as in the combination group, except that all practice trials were generated example trials (see Fig. 1). The practice schedule in the provided examples group was also the same as the other groups, except that all practice trials were provided examples, with different examples on each trial (see Fig. 1).
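To make the trial structure concrete, the following minimal Python sketch (hypothetical; the function and trial labels are ours, not the experimental software’s) composes the 40-trial practice sequence for each group:

```python
# Hypothetical sketch of the practice schedule: two blocks, with two
# consecutive trials per concept in each block (2 blocks x 10 concepts
# x 2 trials = 40 trials). "PE" = study a provided example; "GE" =
# generate an example. In the provided examples group, each PE trial
# presented a different example (examples A-D in Fig. 1).
def build_schedule(concepts, group):
    schedule = []
    for block in range(2):
        for concept in concepts:  # fixed random order for all groups
            if group == "provided":
                pair = [("PE", concept), ("PE", concept)]
            elif group == "generated":
                pair = [("GE", concept), ("GE", concept)]
            else:  # combination: provided example, then generation prompt
                pair = [("PE", concept), ("GE", concept)]
            schedule.extend(pair)
    return schedule

trials = build_schedule(["availability heuristic", "hindsight bias"], "combination")
print(trials[:4])
```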
Fig. 1

PE: trial in which participants studied a provided example. GE: trial in which participants were prompted to generate their own example. The practice materials included four different provided examples for each concept, denoted by letters A through D. Depending on group assignment, participants studied four (provided examples), two (combination), or zero (generated examples) provided examples per concept. The simultaneous group was only included in experiment 2

After the practice phase, participants made a category learning judgment for each concept. They were shown the concept term and asked “How confident are you that you will be able to accurately identify real-world examples that illustrate the following concept?” They made their judgment on a scale of 0 to 100% confident (see Table 3 for outcomes for both experiments). Participants then completed a demographics questionnaire.

Participants came back 2 days later for the testing session. Final tests included an example classification test and a definition cued recall test. The order of final tests was counterbalanced across participants. Test order did not interact with group for any of the dependent measures.

The example classification test included 100 examples presented in random order. Sixty examples had not been presented to any group during practice (hereafter referred to as novel examples). The remaining 40 examples were those used as provided examples during the practice phase in session 1. Twenty of those 40 examples had been presented to the provided examples group and the combination group (hereafter referred to as studied by provided and combination). The remaining 20 examples had been presented only to the provided group (hereafter referred to as studied by provided). On each trial of the example classification test, participants were shown an example and the list of ten concept terms. Participants were asked to indicate which of the ten concepts the example illustrated by clicking on a button next to the concept term. For the definition cued recall test, participants were presented with the ten concept terms one at a time in random order and were prompted to type in the definition. Trials on both final tests were self-paced.

Scoring

For definition cued recall scoring, research assistants scored definitions by identifying the number of main idea units that the participant recalled and entering percentage correct scores for each definition. Both verbatim restatements and paraphrases that still preserved the meaning of the definition were counted as correct. All idea units were counted as equally important and unweighted. A training set of definition cued recall responses for 20 participants was scored by two trained assistants. Interrater reliability was r = .97 for experiment 1 and r = .87 for experiment 2. The rest of the protocols were scored by one of the raters.

For secondary analyses discussed below, we also scored the examples generated during practice on two dimensions: quality and variability. For scoring of example quality, research assistants assigned each example 0, 1, or 2 points based on the extent to which the example provided a good illustration of the concept. The total number of points each participant received was converted to the percentage of possible points. A training set of generated examples for 28 participants was scored by two trained assistants. Interrater reliability was r = .99 for experiment 1 and r = .95 for experiment 2. The rest of the protocols were scored by one of the raters. For purposes of comparison to the generated examples, we also coded the quality of the provided examples. The mean quality of all of the provided examples (98% correct) was the same for all participants in the provided examples group and was used as the test value for comparison to generated example quality. The mean quality of the provided examples that the combination group were exposed to was also 98%. For each participant in the combination group, an example quality score was computed as the mean quality across both the examples they generated and the provided examples they saw.

For scoring of example variability, research assistants scored pairs of examples for the similarity of their surface characteristics. The four examples for each concept afforded six pairs of examples to be scored (i.e., pairs 1/2, 1/3, 1/4, 2/3, 2/4, and 3/4), summing to 60 pairs per participant across the ten concepts. For each pair, research assistants were instructed to score how similar the surface features (e.g., people, places, events) of the examples were. They were instructed not to consider example quality (i.e., how well the example captured the concept) when scoring for similarity. Research assistants scored the pairs using a scale from 0 to 100% overlap (0 indicating that the examples had completely different surface characteristics and 100 indicating that the examples were the same). Some participants had fewer than 60 pairs to be scored due to omissions (i.e., trials in which participants did not generate an example or typed “I do not know”), and therefore no comparison could be made for those pairs. A training set of 240 pairs of examples was scored by two trained assistants. Interrater reliability for both experiments was r = .71. The rest of the protocols were scored by one of the raters. For purposes of comparison to the generated examples, we also coded the variability of the pairs of provided examples. The mean variability of pairs of provided examples (14% overlap in surface characteristics, indicating high variability) was the same for all participants in the provided examples group and was used as the test value for comparison to generated example variability. For each participant in the combination group, an example variability score was computed as the mean variability of all pairs of examples (some pairs were both provided, some were provided and generated, and others were both generated).
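As a bookkeeping illustration (a hypothetical sketch; the function and variable names are ours), four examples per concept yield C(4,2) = 6 scorable pairs, and a participant’s variability score is the mean rated overlap across all pairs that could be scored:

```python
# Each concept's four examples yield C(4,2) = 6 pairs; omitted examples
# simply drop out, so participants with omissions have fewer than 60 pairs.
from itertools import combinations
from statistics import mean

print(list(combinations([1, 2, 3, 4], 2)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

def variability_score(pair_ratings):
    """pair_ratings: 0-100 overlap ratings across all scorable pairs.
    A higher mean overlap indicates less variable examples."""
    return mean(pair_ratings)
```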

Results and Discussion

For both experiments, directional predictions were evaluated via one-tailed planned comparisons (for recommendations to only conduct the statistical analyses necessary to answer research questions of interest rather than omnibus ANOVAs, see Judd and McClelland 1989; Rosenthal and Rosnow 1985; Tabachnick and Fidell 2001; Wilkinson and APA Task Force 1999). For all t tests, Cohen’s ds were computed using pooled standard deviations (Cortina and Nouri 2000) and are reported as absolute values.
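For reference, the pooled-standard-deviation formulation of Cohen’s d is

$$d = \frac{\lvert M_1 - M_2 \rvert}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$

where M, s, and n are each group’s mean, standard deviation, and sample size.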

Question 1: Are Provided or Generated Examples More Effective for Declarative Concept Learning?

To revisit, we predicted that long-term learning would be greater for generated examples versus provided examples. Outcomes on the final test of example classification are of primary interest. Not surprisingly, performance for examples that had been studied by the provided examples group (middle bars in Fig. 2) was greater for the provided examples group than for the generated examples group, t(74) = 1.86, p = .035, d = .43. This advantage could simply reflect memory for the particular examples that the provided examples group had studied during practice. The more compelling outcome is that classification performance for novel examples was greater for the provided examples group than for the generated examples group (left set of bars in Fig. 2), t(74) = 2.06, p = .022, d = .47, providing evidence that provided examples were more effective than generated examples for enhancing concept comprehension.
Fig. 2

Experiment 1: Performance on the example classification final test. Provided: group that only studied provided examples during study; Generated: group that only generated examples during study; Combination: group that generated examples and studied provided examples during study. Novel classification items are examples that no group saw during study. Studied by Provided are examples that only the provided group saw during study. Studied by Provided and Combination are examples that the provided group and combination groups saw during study. Error bars reflect standard errors

Of secondary interest, the provided examples and generated examples groups did not differ on definition cued recall (M = 25%, SE = 3 versus M = 23%, SE = 3), t(69) = .50, d = .12; to foreshadow, we discuss implications of the patterns for cued recall performance across experiments in “General Discussion.”

Regarding efficiency, we predicted that provided examples would be more efficient than generated examples, as indicated by the amount of time students spent during the practice phase. As shown in Fig. 3, practice was more efficient for the provided examples group than for the generated examples group, t(74) = 5.06, p < .001, d = 1.16.
Fig. 3

Experiment 1. Efficiency outcomes as indicated by minutes spent during the practice phase. Provided: group that only studied provided examples during study; Generated: group that only generated examples during study; Combination: group that generated examples and studied provided examples during study. Error bars reflect standard errors

In addition to examining long-term learning and efficiency as separate components of effectiveness, we calculated a combined measure to examine the incremental gain in learning for each unit of time spent during practice (see Rawson and Dunlosky 2013 for a similar approach). For each participant, we divided their novel example classification score by the total number of minutes spent during the practice phase to create a gain per minute (GPM) measure. For this combined measure of effectiveness, GPM was greater for the provided examples group compared to the generated examples group, t(74) = 6.00, p < .001, d = 1.38 (see Table 4). In sum, provided examples were more effective than generated examples, for both long-term learning and for efficiency.
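In formula form, for each participant,

$$\text{GPM} = \frac{\text{novel example classification score (\%)}}{\text{minutes spent in the practice phase}};$$

for instance, a participant who scored 60% on novel classification after 10 min of practice would have a GPM of 6 percentage points per minute.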
Table 4

Gain per minute outcomes

                        Mean (SE)
Experiment 1
  Provided examples     5.7 (.6)
  Generated examples    2.2 (.2)
  Combination           1.7 (.1)
Experiment 2
  Provided examples     5.7 (.5)
  Generated examples    2.0 (.2)
  Combination           1.7 (.2)
  Simultaneous          1.5 (.1)

Mean values reflect percentage points gained on the novel example classification test per minute spent during the practice phase

SE = standard error of the mean

Question 2: Is the Combination of Provided and Generated Examples More Effective than Either Technique Alone?

We predicted that long-term learning would be greater following the combination versus provided examples or generated examples alone. Again, performance on the example classification test is of primary interest. For examples that had been studied by the provided and combination groups (rightmost bars in Fig. 2), classification performance in the combination group was similar to performance in the provided group and greater than performance in the generated examples group; t(72) = .48, d = .11 and t(76) = 2.13, p = .02, d = .48, respectively. The outcomes of greater interest concern classification performance for novel examples. Contrary to predictions, the combination group showed only a small, nonsignificant advantage over the generated examples group and a numerical disadvantage relative to the provided examples group; t(76) = .96, p = .17, d = .22 and t(72) = 1.24, p = .11, d = .29, respectively. This pattern provides evidence that the combination is not more effective than either technique alone for long-term learning.

Of secondary interest, the combination group did not outperform either group on definition cued recall, combination versus provided: M = 26%, SE = 3 versus M = 25%, SE = 3, t(67) = .30, d = .07; combination versus generated: M = 26%, SE = 3 versus M = 23%, SE = 3, t(72) = .84, d = .20.

As shown in Fig. 3, practice was less efficient for the combination group than for the provided examples group and for the generated examples group; t(72) = 9.40, p < .001, d = 2.19 and t(76) = 1.99, p = .03, d = .45, respectively. GPM was also lower for the combination group than for the provided examples group and for the generated examples group; t(72) = 6.90, p < .001, d = 1.60 and t(76) = 1.78, p = .04, d = .40, respectively (see Table 4).

In sum, the combination of provided and generated examples was not more effective than either technique alone, either for long-term learning or for efficiency. Rather, provided examples were more effective in both aspects.

Exploratory Analyses

Given the long-term learning advantage for the provided examples group versus the other two groups involving example generation, an interesting question arises: Why were provided examples more effective? Although not the primary purpose of this research, we conducted exploratory analyses to investigate two possibilities: to what extent did groups differ in the variability or in the quality of the examples they experienced during practice?

Concerning example variability, we first checked the extent to which participants in the generated examples and combination groups complied with instructions to come up with different examples on example generation trials. Participants did comply with instructions; the generated examples group and combination group did not repeat examples frequently (generated: M = 2% of the time, SE < 1; combination: M < 1% of the time, SE < 1).

Although participants appeared to comply with instructions, groups that generated examples may still have produced examples that were similar to other provided or generated examples, resulting in less example variability than in the provided examples group. To investigate this possibility, one-sample t tests were conducted against the example variability score for provided examples (14% overlap; to revisit, 0% indicates that surface characteristics of examples were entirely different, whereas 100% indicates that examples were the same). On average, example variability was lower (i.e., surface overlap was greater) for the generated examples group (M = 38% overlap, SE = 4) than for the provided examples group, t(45) = 6.70, p < .001. The combination group differed significantly from the provided examples group, but the difference was small in magnitude, M = 9%, SE = 1, t(42) = 7.68, p < .001.

Concerning example quality, groups that generated examples may have generated examples of poorer quality than those to which the provided examples group were exposed during practice (98% correct). To the extent that students use examples to develop their understanding of a concept, an example that inaccurately represents the concept may lead to misconceptions of that concept. First, we examined the quality of all examples participants were exposed to. Total example quality was poorer for the generated examples group and for the combination group compared to the provided examples group, generated: M = 22%, SE = 3, t(38) = 24.51, p < .001; combination: M = 79%, SE = 2, t(36) = 10.15, p < .001.

To revisit, the total example quality score for the combination group included both provided and generated examples, but examining the quality of the generated examples alone is also informative. Quality of generated examples was greater for the combination group than for the generated examples group, M = 60%, SE = 4 versus M = 22%, SE = 3; t(74) = 8.02, p < .001, d = 1.84. This finding suggests that including provided examples in the combination group improved the quality of the examples participants generated relative to example generation alone.

With that said, this difference in the example quality score could also reflect differential frequency of omissions in the two groups, given that omissions were scored as 0. The generated examples group was required to generate 40 examples (versus 20 in the combination group), and participants may have been unable to come up with 40 examples. Consistent with this possibility, significantly more omissions were committed during example generation trials in the generated examples group compared to the combination group, M = 38%, SE = 5 versus M = 7%, SE = 2; t(76) = 6.08, p < .001, d = 1.34. Nonetheless, example quality for just the subset of trials in which participants attempted to generate an example was still lower in the generated examples group than in the combination group, M = 33%, SE = 4 versus M = 63%, SE = 4; t(74) = 5.59, p < .001, d = 1.28.

Experiment 2

Experiment 1 showed that provided examples were the most effective example-based technique for learning declarative concepts, with respect to both long-term learning and efficiency. These findings disconfirmed predictions for long-term learning that were motivated by parallel findings in related literatures. Namely, long-term learning was predicted to be greater following generated versus provided examples and greater following the combination versus either technique alone. None of these outcomes obtained, with arguably surprising advantages for provided examples instead. The first purpose of experiment 2 was to replicate these findings (for recent emphasis on the importance of replicating novel findings, see Lishner 2015; Maner 2014; Schmidt 2009; Simons 2014).

The second purpose of experiment 2 was to investigate whether changing the way in which the combination group received examples would improve the effectiveness of the combination technique. Although the quality of generated examples was greater for the combination group versus the generated examples group, example quality was still somewhat low for the combination group. One possibility is that the sequential presentation of provided and generated example trials may have hindered students’ ability to effectively use the provided examples while generating examples. This hindrance may have led to poorer generated example quality, which may have undercut the value of the combination technique.

In the rule-based concepts literature, Carroll (1994) found that participants often used worked examples during problem solving by referring back to the example while attempting to solve a problem, rather than fully processing the worked example before moving on to problem solving. This observation suggests that if students in the current study had provided examples available to them during example generation, they might refer back to the provided example as a scaffold while generating. Thus, we propose the scaffolding hypothesis, which states that scaffolding (via simultaneous presentation of a provided example) will enhance the quality of generated examples, which in turn will increase the effectiveness of the combination technique.

To test the scaffolding hypothesis, experiment 2 included a combination group that was presented with the provided and generated examples simultaneously in each pair of example trials (see Fig. 1), which we refer to as the simultaneous group hereafter. For question 2, the key prediction is that long-term learning will be greater for the simultaneous group versus the provided examples and generated examples groups.

Method

Participants and Design

Experiment 2 included 206 participants from a large Midwestern university (71% female; 81% White, 12% Black, 8% Asian, 5% First Nations, 4% Hispanic or Latino); 48% were in their first year of college (M years = 2.0, SE = 0.1), and 23% were psychology majors. Mean age of participants was 20.1 years (SE = 0.2). Participants were recruited from the Psychology Department’s participant pool.

Participants were randomly assigned to one of four groups: provided examples, generated examples, combination (replication group from experiment 1), or simultaneous (the new extension group). Data for 25 participants were excluded from analysis due to lost session 1 data (n = 1), failure to return for session 2 (n = 13), or non-compliance (as defined in experiment 1; n = 11). The final sample included 181 participants (provided examples n = 46; generated examples n = 42; combination n = 46; simultaneous n = 47). A sensitivity analysis was conducted using G*Power (Faul et al. 2009). For one-tailed, independent samples t tests of directional predictions with an alpha level of .05 and .80 power, this sample size afforded sufficient sensitivity to detect moderate effects (d ≥ .52).

Materials and Procedure

Materials and procedure for the three replication groups in experiment 2 were identical to experiment 1. The procedure for the simultaneous group was the same as in the combination group, with two exceptions. First, in the instructions prior to the practice phase, participants were told that the provided example would be visible during the example generation trial. Second, they were told that they could use the example as a model but to come up with their own new example. During the practice phase, the provided example and the example generation prompt for each pair of practice trials were presented at the same time on the computer screen.

As in experiment 1, all participants came back 2 days later to complete final tests including example classification and definition cued recall, with the order of final tests counterbalanced across participants. In contrast to experiment 1, test order did interact with group assignment for example classification performance in experiment 2. We ran all analyses for example classification with test order as a covariate, and outcomes supported the same qualitative conclusions as the analyses without the covariate. Therefore, for simplicity and parallelism to experiment 1, we report statistics without the covariate included.

Results and Discussion

Question 1: Are Provided or Generated Examples More Effective for Declarative Concept Learning?

Outcomes on the example classification test are again of primary interest. Replicating experiment 1, performance for examples that had been studied by the provided examples group (middle bars in Fig. 4) was greater for the provided examples group than for the generated examples group, t(86) = 1.99, p = .025, d = .43. More important, classification of novel examples (left set of bars in Fig. 4) was also greater for the provided examples group than for the generated examples group, t(86) = 1.61, p = .055, d = .34. This finding provides evidence that provided examples were more effective than generated examples for enhancing concept comprehension. Also replicating experiment 1, the provided examples and generated examples groups did not differ on definition cued recall, M = 24%, SE = 2 versus M = 23%, SE = 3; t(85) = .29, d = .06.
Fig. 4

Experiment 2: Performance on the example classification final test. Provided: group that only studied provided examples during study; Generated: group that only generated examples during study; Combination: group that sequentially studied provided examples and then generated examples during study. Simultaneous: group that simultaneously studied a provided example while generating an example during study. Novel classification items are examples that no group saw during study. Studied by Provided are examples that only the provided group saw during study. Studied by Provided and Combination are examples that the provided, combination, and simultaneous groups saw during study. Error bars reflect standard errors

Concerning efficiency, practice was again more efficient for the provided examples group than for the generated examples group (see Fig. 5), t(86) = 7.61, p < .001, d = 1.62. GPM was also greater for the provided examples group than for the generated examples group, t(86) = 7.01, p < .001, d = 1.50 (see Table 4). In sum, we replicated all findings from experiment 1. Namely, provided examples were more effective than generated examples, for both long-term learning and for efficiency.
Fig. 5

Experiment 2. Efficiency outcomes as indicated by minutes spent during the practice phase. Provided: group that only studied provided examples during study; Generated: group that only generated examples during study; Combination: group that sequentially studied provided examples and then generated examples during study; Simultaneous: group that simultaneously studied a provided example while generating an example during study. Error bars reflect standard errors

Question 2: Is the Combination of Provided and Generated Examples More Effective than Either Technique Alone?

First, we discuss results relevant to the replication groups. For examples that had been studied by the provided and combination groups (rightmost bars in Fig. 4), classification performance in the combination group was similar to performance in the provided examples group and greater than performance in the generated examples group; t(90) = .39, d = .08 and t(86) = 1.69, p = .05, d = .36, respectively. Again, of greater interest is classification performance for novel examples. Also replicating findings from experiment 1, the combination group showed a small nonsignificant advantage over the generated examples group and a numerical disadvantage relative to the provided examples group, t(86) = .75, p = .23, d = .16 and t(90) = .75, p = .24, d = .16, respectively. Of secondary interest, the combination group again did not outperform either group on definition cued recall; combination versus provided: M = 22%, SE = 3 versus M = 24%, SE = 2; t(89) = .65, d = .14; combination versus generated: M = 22%, SE = 3 versus M = 23%, SE = 3; t(86) = .35, p = .73, d = .07.

As shown in Fig. 5, practice was again less efficient for the combination group than for the provided examples group and for the generated examples group, t(90) = 9.93, p < .001, d = 2.07 and t(86) = 2.09, p = .02, d = .45, respectively. GPM was also lower for the combination group than for the provided examples group and for the generated examples group, t(90) = 7.77, p < .001, d = 1.62 and t(86) = 1.25, p = .11, d = .27, respectively (see Table 4). Thus, all findings from experiment 1 were replicated.

Concerning the extension beyond experiment 1, we next discuss results relevant to the simultaneous group included to test the scaffolding hypothesis. To revisit, this hypothesis states that scaffolding will enhance the quality of generated examples, which in turn will increase the effectiveness of the combination technique. This hypothesis predicts that long-term learning will be greater for the simultaneous group versus either technique alone.

First, for examples that had been studied by the provided and combination groups (rightmost bars in Fig. 4), classification performance was numerically lower for the simultaneous group compared to the provided examples group, t(91) = 1.51, p = .07, d = .31. This finding was surprising, given that these items were studied by both groups. Furthermore, the numerical advantage in classification performance for these items for the simultaneous group compared to the generated examples group was small and not significant, t(87) = .66, p = .26, d = .14.

Concerning classification performance for novel examples, the simultaneous group showed a significant disadvantage compared to the provided examples group and a numerical disadvantage compared to the generated examples group, t(91) = 1.99, p = .025, d = .41 and t(87) = .40, p = .35, d = .08, respectively. These findings disconfirmed the prediction of the scaffolding hypothesis that long-term learning would be greater for the simultaneous group than for the provided examples and generated examples groups.

Of secondary interest, the simultaneous group performed worse on definition cued recall; simultaneous versus provided: M = 18%, SE = 2 versus M = 24%, SE = 2; t(90) = 1.92, p = .03, d = .40; simultaneous versus generated: M = 18%, SE = 2 versus M = 23%, SE = 3; t(87) = 1.55, p = .06, d = .33.

Finally, as shown in Fig. 5, practice was less efficient for the simultaneous group than for the provided examples group and for the generated examples group, t(91) = 10.60, p < .001, d = 2.20 and t(87) = 1.59, p = .06, d = .34, respectively. GPM was also lower for the simultaneous group compared to the provided examples group and for the generated examples group, t(91) = 8.38, p < .001, d = 1.74 and t(87) = 2.38, p = .01, d = .51, respectively (see Table 4).

In sum, the simultaneous technique was not more effective than either technique alone, either for long-term learning or for efficiency. The simultaneous presentation of provided examples and generated examples was also less effective than sequential presentation. One plausible explanation is that learners in the simultaneous group spent less time studying the provided examples, which attenuated the benefit of the provided examples. This possibility is consistent with the finding that the simultaneous group spent numerically less time during study and had lower performance for classification of studied examples (see Fig. 5).

Exploratory Analyses

As in experiment 1, we ran analyses to investigate whether groups differed in example variability or in example quality. Concerning example variability, we first checked that participants complied with instructions; participants who generated examples did not repeat examples frequently (generated: M = 2% of the time, SE < 1; combination: M < 1% of the time, SE < 1; simultaneous: M < 1% of the time, SE < 1). However, as compared to the test value of 14% in the provided examples group, the generated examples group nonetheless produced less variable examples, M = 30% overlap, SE = 3; t(42) = 6.67, p < .001. Example variability in the combination group and the simultaneous group differed significantly from, but was numerically close to, the provided examples group, M = 10%, SE = 1; t(46) = 4.78, p < .001 and M = 12%, SE = 1; t(48) = 2.11, p = .04, respectively.

Concerning example quality, total example quality was greater for the provided examples group (98%) compared to the generated examples group, the combination group, and the simultaneous group, M = 19%, SE = 3; t(39) = 23.55, p < .001; M = 77%, SE = 1; t(45) = 15.07, p < .001 and M = 74%, SE = 1; t(46) = 16.74, p < .001, respectively.

Concerning quality of generated examples only, quality was greater for the combination group than for the generated examples group, M = 57%, SE = 3 versus M = 20%, SE = 3; t(84) = 8.64, p < .001, d = 1.87. Generated example quality was also greater for the simultaneous group than for the generated examples group, M = 51%, SE = 3 versus M = 20%, SE = 3; t(85) = 7.04, p < .001, d = 1.51. Contrary to the prediction of the scaffolding hypothesis (i.e., that the simultaneous presentation would enhance example quality), example quality was greater for the combination group compared to the simultaneous group, t(91) = 1.63, p = .053, d = .34.

Although the quality of generated examples was greatest for the combination group, this advantage may partly reflect differential frequency of omissions. We replicated the findings from experiment 1: significantly more omissions were committed during example generation trials in the generated examples group than in the combination group, M = 27%, SE = 4 versus M = 5%, SE = 2; t(86) = 4.63, p < .001, d = .99. The simultaneous group also committed significantly fewer omissions (M = 9%, SE = 3) than the generated examples group, t(87) = 3.72, p < .001, d = .79, and a similar number of omissions to the combination group, t(91) = .929, p = .36, d = .19. Nonetheless, for the subset of trials on which participants attempted to generate an example, example quality was still lower in the generated examples group than in the combination group (M = 26%, SE = 4 versus M = 61%, SE = 2; t(84) = 8.05, p < .001, d = 1.74) and in the simultaneous group (M = 26%, SE = 4 versus M = 56%, SE = 2; t(84) = 6.91, p < .001, d = 1.49).

General Discussion

Students are often asked to learn declarative concepts in their classes, and provided and generated examples are commonly used to support learning of these concepts. Motivated by the paucity of research on using examples to learn declarative concepts, the purpose of the current research was to competitively evaluate the effectiveness of example-based techniques for learning declarative concepts. In both experiments, we evaluated two key components of effectiveness: long-term learning and efficiency. Given the recent emphasis on the importance of replication and on basing conclusions on multiple estimates of effect sizes, we adopted the continuously cumulating meta-analysis (CCMA) approach recommended by Braver et al. (2014). CCMAs comparing performance between groups for long-term learning (as indexed by novel classification performance) and efficiency (as indexed by time spent during the practice phase) are reported in Tables 5 and 6.
Table 5

Continuously cumulating meta-analysis (CCMA) outcomes for classification of novel examples on the final test

Comparison                        Mean diff   S_pooled   t      p (2-tail)   Cohen's d   Z
Provided vs. generated examples
  Experiment 1                        12         24      2.06      .043        0.47      2.02
  Experiment 2                         7         22      1.61      .112        0.34      1.59
  CCMA results                         –          –        –       .011        0.40      2.56
Combination vs. provided examples
  Experiment 1                        −7         24      1.24      .220        0.29      1.23
  Experiment 2, sequential            −4         23       .75      .457        0.16      0.74
  Experiment 2, simultaneous          −9         23      1.98      .050        0.41      1.96
  CCMA results                         –          –        –       .023        0.28      2.27
Combination vs. generated examples
  Experiment 1                         5         22       .96      .339        0.22      0.96
  Experiment 2, sequential             4         24       .75      .454        0.16      0.75
  Experiment 2, simultaneous          −2         23       .40      .693        0.08      0.40
  CCMA results                         –          –        –       .461        0.09      1.21

Note: Mean diff = mean difference between groups in the percentage correct; S_pooled = pooled standard deviation. Cohen's ds are reported as absolute values; rows labeled "CCMA results" give the combined outcomes. Effect size homogeneity tests were nonsignificant for all CCMAs [provided vs. generated: Q(1) = .17, p = .68; combination vs. provided: Q(2) = .75, p = .69; combination vs. generated: Q(2) = 1.10, p = .58].

Table 6

Continuously cumulating meta-analysis (CCMA) outcomes for minutes spent during practice

Comparison                        Mean diff   S_pooled   t       p (2-tail)   Cohen's d   Z
Provided vs. generated examples
  Experiment 1                       −13         11       5.06     <.001        1.16      4.67
  Experiment 2                       −17         11       7.61     <.001        1.62      6.63
  CCMA results                         –          –         –      <.001        1.39      7.99
Combination vs. provided examples
  Experiment 1                        19          9       9.40     <.001        2.19      7.57
  Experiment 2, sequential            24         12       9.93     <.001        2.07      8.13
  Experiment 2, simultaneous          22          9      10.60     <.001        2.20      8.53
  CCMA results                         –          –         –      <.001        2.15     13.99
Combination vs. generated examples
  Experiment 1                         6         13       1.99      .050        0.45      1.96
  Experiment 2, sequential             7         15       2.09      .040        0.44      2.06
  Experiment 2, simultaneous           5         14       1.59      .116        0.34      1.57
  CCMA results                         –          –         –       .001        0.41      3.23

Note: Mean diff = mean difference between groups in the minutes spent during practice; S_pooled = pooled standard deviation. Cohen's ds are reported as absolute values; rows labeled "CCMA results" give the combined outcomes. Effect size homogeneity tests were nonsignificant for all CCMAs [provided vs. generated: Q(1) = 1.72, p = .19; combination vs. provided: Q(2) = .14, p = .93; combination vs. generated: Q(2) = .17, p = .92].
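To make the CCMA computations in Tables 5 and 6 concrete: the combined Z values above are consistent with Stouffer's method, Z_combined = ΣZᵢ/√k, one of the combination procedures described by Braver et al. (2014). The sketch below is an illustration under that assumption, paired with a simple average of the signed per-experiment ds, which closely reproduces the reported pooled ds; it is not the authors' analysis script.

```python
import math

def stouffer_z(zs):
    """Combine per-experiment Z values: Z_combined = sum(Z_i) / sqrt(k)."""
    return sum(zs) / math.sqrt(len(zs))

def pooled_d(ds):
    """Simple average of signed per-experiment Cohen's d values."""
    return sum(ds) / len(ds)

# Provided vs. generated examples, novel classification (Table 5):
print(round(stouffer_z([2.02, 1.59]), 2))  # 2.55, vs. the reported 2.56
print(round(pooled_d([0.47, 0.34]), 2))    # 0.40, as reported

# Combination vs. generated examples, novel classification (Table 5);
# the simultaneous comparison ran in the opposite direction, hence the sign:
print(round(pooled_d([0.22, 0.16, -0.08]), 2))  # ~0.10, vs. the reported 0.09
```

Applying stouffer_z to the per-experiment Z values in either table reproduces the corresponding CCMA rows within rounding error.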

Question 1: Are Provided or Generated Examples More Effective for Declarative Concept Learning?

Concerning long-term learning, learning was greater for the provided examples group compared to the generated examples group (classification of novel examples: pooled d = .40, 95% CI = .09–.71). Concerning efficiency, less time was spent during learning by the provided examples group compared to the generated examples group (pooled d = 1.39, 95% CI = 1.05–1.74). Taken together, these findings indicate that provided examples are more effective than generated examples for learning declarative concepts.

Question 2: Is the Combination of Provided and Generated Examples More Effective than Either Technique Alone?

Concerning long-term learning, learning was lower following the combination technique than following study of provided examples (pooled d = .28, 95% CI = .04–.53), and it was not greater following the combination technique than following example generation (pooled d = .09, 95% CI = −.15–.34). Concerning efficiency, more time was spent during learning with the combination technique than with studying provided examples (pooled d = 2.15, 95% CI = 1.84–2.46) or with generating examples (pooled d = .41, 95% CI = .16–.66). Thus, the combination was not more effective than either technique alone.

Why were Provided Examples More Effective than Generated Examples?

Both long-term learning and efficiency were greater for provided examples than for generated examples. Note that the long-term learning advantage cannot be explained by differences in time on task, a mundane explanation of perennial concern in research on learning techniques. This explanation is ruled out by the efficiency outcomes, given that much less time was spent studying provided examples than generating examples.

Another possible explanation for the learning advantage of provided examples over generated examples is that provided examples supported better memory for concept definitions. Presumably, learners retrieve concept definitions during example classification to help identify correct answers, and thus better definition memory may have enhanced classification performance. However, this explanation is also unlikely because definition cued recall did not differ between the provided examples and generated examples groups. Furthermore, definition recall was relatively low overall, suggesting that learners may not be able to rely heavily on memory for concept definitions to classify examples.

If learners do not rely on memory for concept definitions to help them classify examples, another possibility is that they remember examples they learned during acquisition and use those to support classification of novel examples via comparison or analogical reasoning. If so, one plausible explanation for the learning advantage of provided examples versus generated examples is that learners had better memory for the high-quality examples they studied during practice. During the classification test, students may be able to recall these examples and then use them to classify novel examples.

Why was the Combination Less Effective than Provided Examples?

Both long-term learning and efficiency were lower for the combination group than for the provided examples group. To revisit the possible explanations noted above, the learning advantage of provided examples over the combination cannot be due to time-on-task differences and is also unlikely to be due to differences in memory for concept definitions. Differences in example memory are also an unlikely explanation. Both the combination and provided examples groups were exposed to high-quality examples during practice, and classification of the examples that these groups studied was similar (see Figs. 2 and 4). This outcome suggests that the two groups had similar memory for high-quality examples, yet classification of novel examples was still greater for the provided examples group. Therefore, although both groups may recall studied examples and use them to help classify novel examples, this process is unlikely to explain the advantage of provided examples over the combination for long-term learning.

A more plausible explanation for why the combination was less effective than provided examples concerns the difference in dosage of high-quality provided examples during practice. The provided examples group received four examples per concept, whereas the combination group received only two. The extra examples may have helped students in the provided examples group develop a better understanding of the concepts. This possibility is consistent with findings from the analogical reasoning literature demonstrating that receiving multiple analogs during acquisition results in better schema development (Gick and Holyoak 1983; Holyoak 2012), which in turn supports better performance in novel problem-solving situations.

For example, Catrambone and Holyoak (1989) had participants study one or two analogs that illustrated how to solve a particular kind of problem. Whereas the underlying solution was the same across analogs, the surface characteristics differed (putting out a fire by bringing in water from many different directions simultaneously, and conquering a fortress by sending in military forces from many different directions simultaneously). Participants were then asked to solve the same underlying problem in a new scenario (destroying a tumor by applying radiation from many different angles simultaneously). Participants who studied two analogs rather than one produced the correct solution to the tumor problem more frequently (71% versus 41%). Paralleling these findings, a greater number of provided examples may have resulted in better schema development for the provided examples group than for the combination group.

Why was the Combination Not More Effective than Generated Examples?

Although differences in the dosage of high-quality provided examples may plausibly explain why provided examples were more effective than the combination, this account is less plausible in light of the finding that the combination was not more effective than generated examples. If the relationship between the number of provided examples and gains in long-term learning were linear, long-term learning should have been greater for the combination group than for the generated examples group. One possibility, however, is that the relationship is non-linear, such that a certain threshold number of examples must be reached before provided examples become beneficial. Investigating the impact of the dosage of provided examples on long-term learning is thus one direction for future research.

Given that the combination group generated better examples than the generated examples group, one might expect the combination group to benefit more from the technique. However, for example-based learning of declarative concepts, Rawson and Dunlosky (2016) found that manipulations designed to enhance the quality of generated examples did not improve long-term learning. The current outcomes further suggest that the benefit of example generation depends more on the attempt than on the level of success. This possibility is consistent with findings from research on elaborative interrogation (a technique in which learners are prompted to explain why a fact is true), which often shows that subsequent fact recall is more strongly related to whether learners attempted an explanation than to the quality of the explanation (e.g., Woloshyn et al. 1994). Concerning the likelihood of attempting generation in the current research, the percentage of example generation trials on which learners attempted to generate an example was greater in the combination group than in the generated examples group. However, the absolute number of generation attempts was lower in the combination group than in the generated examples group (19 versus 27 trials on average across experiments), given that the latter group completed twice as many example generation trials as the combination group. This observation could partially explain why the groups did not differ much in long-term learning. With that said, further research is needed to tease apart the effects of attempting versus successfully generating examples.

Limitations and Future Directions

Given the paucity of research on example-based techniques for learning declarative concepts, many interesting directions for future research remain open. For example, although the instructional materials used in these experiments were representative of the materials students receive in their courses (the textbook passage, definitions, and provided examples were all extracted from actual course textbooks), the absolute level of performance achieved on the final tests was relatively modest. In the current research, students received minimal instruction before practice (specifically, they read the textbook passage and studied each definition only once). In contrast, students in courses would likely receive a lecture and/or other kinds of in-class instruction before practice. This extra instruction would in itself likely improve overall learning of the concepts. Additionally, receiving a lecture first would give students a higher level of knowledge before beginning practice, which in turn may increase the utility of example-based learning strategies, and more extensive initial instruction may also increase their efficiency. Thus, an interesting direction for future research is to investigate the utility of these strategies following more extensive initial instruction.

Another way to potentially increase the effectiveness of these techniques is to implement example-based learning over multiple sessions. To begin establishing the relative effectiveness of these techniques, our experiments involved learning within a single session. However, learning may be greater if examples are distributed across multiple learning sessions, consistent with research on other learning techniques showing greater long-term learning when practice is distributed across sessions (Cepeda et al. 2006; Kornell 2009; Rawson and Dunlosky 2011; Vaughn et al. 2016). This possibility may hold especially for the combination schedule, which might be more beneficial if students studied provided examples in one session to build stronger knowledge of the concepts before moving on to example generation in a second session. This prediction aligns with findings in the rule-based concepts literature, in which learners with higher prior knowledge benefit more from problem solving, whereas learners with lower prior knowledge benefit more from practice schedules involving worked examples (i.e., the expertise reversal effect; see Kalyuga et al. 2012 for a recent review).

Other combinations of provided and generated examples may also increase the effectiveness of the combination technique. The presentation of provided and generated examples in the alternating and simultaneous schedules used here was modeled after the presentation schedules typically used for worked examples and problem solving in the rule-based concepts literature. However, a few studies in this literature have shown that a fading procedure can be more effective than these typical combination schedules (e.g., Renkl et al. 2002). In a fading procedure, students start by studying full worked examples, then solve partially completed problems, and finally transition to full problem solving. Extending this to declarative concept learning, fading from provided examples, to partially completed examples, to full example generation may be more effective than non-faded combination techniques.

Other directions for further research involve extending beyond the outcome measures examined here, with respect to both the kind of measure and the timing of the test. We used a novel classification final test, which primarily measures conceptual understanding (Anderson et al. 2001) and is frequently used in the concepts literature to measure conceptual knowledge (Murphy 2004). In the current study, this measure involved far transfer on dimensions identified in Barnett and Ceci's (2002) taxonomy (e.g., modality, timing). Future research could also include other final test measures, such as an example generation test or a problem solving test to measure concept application, to evaluate the effects of example-based techniques on other aspects of conceptual learning.

Concerning the timing of tests, we chose to implement a 2-day delay for several reasons. First, prior memory and learning research has consistently demonstrated that the most important functional distinction is between final tests given on the same day as practice versus on a different day, given that the steepest part of the forgetting function occurs shortly after learning, with much more modest rates of loss thereafter (Ebbinghaus 1885). Second, we chose to start by investigating the possible effects of example-based techniques over a modest retention interval, because if no differences emerged after 2 days, investigating longer retention intervals that are logistically more difficult to implement would likely be unfruitful. Third, a 2-day delay is representative of the intervals at which students normatively study before an exam, given that students typically report spending most of their study time 1–2 days beforehand (e.g., Blasiman et al. in press; Taraban et al. 1999). The findings here are therefore relevant to what would occur if students used example-based learning techniques to prepare for an upcoming exam. With that said, educators want students to retain knowledge well beyond the exam, with the goal of instilling durable learning that lasts beyond the duration of the course. Given that the current work has established the benefits of provided examples for enhancing learning over shorter retention intervals, an important next step is to investigate the durability of these benefits over longer retention intervals.

Conclusions

Declarative concepts are common and foundational in many courses, and example-based techniques are commonly used by students and instructors. Despite the prevalence of example-based concept learning in practice, surprisingly little research has investigated the effectiveness of these techniques. The current experiments provide foundational evidence concerning the relative effectiveness of three example-based techniques: studying provided examples, generating examples, or a combination of the two. The outcomes across both experiments are highly consistent and point to a clear prescriptive conclusion: provided examples are most effective, with respect to both the level of long-term learning achieved and the efficiency of learning.

Footnotes

  1. Declarative concepts are a type of relational category, which is an umbrella term used to encompass various subtypes of concepts that involve the representation of relations between features or entities. Research on relational categories is relatively scant compared to the vast literature on feature-based categories (i.e., categories that are represented by a set of independent, perceptual attributes). Goldwater and Schalk (2016) note that relational categories are distinct from feature-based categories in content and representational form; relational categories also differ from feature-based categories in the extent to which they involve processes such as structural alignment, mapping, integration, and analogical reasoning. Thus, “without a specific focus on relational categories and knowledge, the degree to which cognitive theories [i.e., theories about feature-based categories] can inform education is ultimately limited” (p. 730). In contrast, research on the learning of other kinds of relational categories can be leveraged to inform the investigation of declarative concept learning of interest here. To foreshadow, we lean heavily on prior research on the learning of rule-based concepts (another type of relational category) to motivate our design and predictions in the current research.

  2. Concerning long-term learning, no parallels can be drawn because the rule-based concepts literature has almost always used immediate rather than delayed tests to assess learning (e.g., Paas and van Merrienboer 1994).

References

  1. Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching and assessing: a revision of Bloom’s taxonomy of educational objectives. Boston, MA: Allyn & Bacon.
  2. Balch, W. R. (2005). Elaborations of introductory psychology terms: effects on test performance and subjective ratings. Teaching of Psychology, 32, 29–34.
  3. Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn?: a taxonomy for far transfer. Psychological Bulletin, 128(4), 612–637.
  4. Blasiman, R. N., Dunlosky, J., & Rawson, K. A. (in press). The what, how much, and when of study strategies: comparing intended versus actual study behavior. Memory.
  5. Braver, S. L., Thoemmes, F. J., & Rosenthal, R. (2014). Continuously cumulating meta-analysis and replicability. Perspectives on Psychological Science, 9(3), 333–342.
  6. Catrambone, R., & Holyoak, K. J. (1989). Overcoming contextual limitations on problem-solving transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(6), 1147–1156.
  7. Carroll, W. M. (1994). Using worked examples as an instructional support in the algebra classroom. Journal of Educational Psychology, 86(3), 360–367.
  8. Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380.
  9. Cortina, J. M., & Nouri, H. (2000). Effect size for ANOVA designs. Thousand Oaks, CA: Sage.
  10. Dornisch, M., Sperling, R. A., & Zeruth, J. A. (2011). The effects of level of elaboration on learners’ strategic processing of text. Instructional Science, 39, 1–26.
  11. Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58.
  12. Ebbinghaus, H. (1885). Memory: a contribution to experimental psychology. New York, NY: Columbia University Press.
  13. Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160.
  14. Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15, 1–38.
  15. Glover, J. A., & Corkill, A. J. (1987). Influence of paraphrased repetitions on the spacing effect. Journal of Educational Psychology, 79, 198–199.
  16. Goldwater, M. B., & Schalk, L. (2016). Relational categories as a bridge between cognitive and educational research. Psychological Bulletin, 142(7), 729–757.
  17. Gorrell, J., Tricou, C., & Graham, A. (1991). Children’s short and long-term retention of science concepts via self-generated examples. Journal of Research in Childhood Education, 5(2), 100–108.
  18. Gurung, R. A. R. (2005). How do students really study (and does it matter)? Teaching of Psychology, 31, 164–166.
  19. Gurung, R. A. R., Weidert, J., & Jeske, A. (2010). Focusing on how students study. Journal of the Scholarship of Teaching and Learning, 10(1), 28–35.
  20. Hamilton, R. (1989). The effects of learner-generated elaborations on concept learning from prose. The Journal of Experimental Education, 57(3), 205–217.
  21. Hamilton, R. (1990). The effect of elaboration on the acquisition of conceptual problem-solving skills from prose. The Journal of Experimental Education, 59(1), 5–17.
  22. Hamilton, R. (1997). Effects of three types of elaboration on learning concepts from text. Contemporary Educational Psychology, 22, 299–318.
  23. Hamilton, R. (1999). The role of elaboration within a text processing and text adjunct context. British Journal of Educational Psychology, 69, 363–376.
  24. Hamilton, R. (2004). Material appropriate processing and elaboration: the impact of balanced and complementary types of processing on learning concepts from text. British Journal of Educational Psychology, 74, 221–237.
  25. Holyoak, K. J. (2012). Analogy and relational reasoning. In K. J. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 234–259). New York: Oxford University Press.
  26. Howe, M. J. A., & Singer, L. (1975). Presentation variables and students’ activities in meaningful learning. British Journal of Educational Psychology, 45, 52–61.
  27. Judd, C. M., & McClelland, G. H. (1989). Data analysis: a model-comparison approach. San Diego, CA: Harcourt Brace Jovanovich.
  28. Kalyuga, S., Rikers, R., & Paas, F. (2012). Educational implications of expertise reversal effects in learning and performance of complex cognitive and sensorimotor skills. Educational Psychology Review, 24(2), 313–337.
  29. Kornell, N. (2009). Optimising learning using flashcards: spacing is more effective than cramming. Applied Cognitive Psychology, 23, 1297–1317.
  30. Leahy, W., Hanham, J., & Sweller, J. (2015). High element interactivity information during problem solving may lead to failure to obtain the testing effect. Educational Psychology Review, 27(2), 291–304.
  31. Lishner, D. A. (2015). A concise set of core recommendations to promote the dependability of psychological research. Review of General Psychology, 19, 52–68.
  32. Maner, J. K. (2014). Let’s put our money where our mouth is: if authors are to change their ways, reviewers (and editors) must change with them. Perspectives on Psychological Science, 9, 343–351.
  33. Murphy, G. (2004). The big book of concepts. Cambridge, MA: MIT Press.
  34. Nievelstein, F., van Gog, T., van Dijck, G., & Boshuizen, H. P. A. (2013). The worked example and expertise reversal effect in less structured tasks: learning to reason about legal cases. Contemporary Educational Psychology, 38(2), 118–125.
  35. Paas, F., & van Merrienboer, J. J. G. (1994). Variability of worked examples and transfer of geometrical problem-solving skills: a cognitive load approach. Journal of Educational Psychology, 86, 122–133.
  36. Rawson, K. A., & Dunlosky, J. (2011). Optimizing schedules of retrieval practice for durable and efficient learning: how much is enough? Journal of Experimental Psychology: General, 140(3), 283–302.
  37. Rawson, K. A., & Dunlosky, J. (2013). Relearning attenuates the benefits and costs of spacing. Journal of Experimental Psychology: General, 142, 1113–1129.
  38. Rawson, K. A., & Dunlosky, J. (2016). How effective is example generation for learning declarative concepts? Educational Psychology Review, 28(3), 649–672.
  39. Rawson, K. A., Thomas, R. C., & Jacoby, L. L. (2015). The power of examples: illustrative examples enhance conceptual learning of declarative concepts. Educational Psychology Review, 27, 483–504.
  40. Renkl, A., Atkinson, R. K., Maier, U. H., & Staley, R. (2002). From example study to problem solving: smooth transitions help learning. Journal of Experimental Education, 70(4), 293–315.
  41. Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27.
  42. Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis: focused comparisons in the analysis of variance. New York: Cambridge University Press.
  43. Rowland, C. A. (2014). The effect of testing versus restudy on retention: a meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463.
  44. Salden, R. J. C. M., Koedinger, K. R., Renkl, A., Aleven, V., & McLaren, B. M. (2010). Accounting for beneficial effects of worked examples in tutored problem solving. Educational Psychology Review, 22, 379–392.
  45. Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100.
  46. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
  47. Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science, 9, 76–80.
  48. Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics. Boston, MA: Allyn & Bacon.
  49. Taraban, R., Maki, W. S., & Rynearson, K. (1999). Measuring study time distributions: implications for designing computer-based courses. Behavior Research Methods, Instruments, & Computers, 31(2), 263–269.
  50. van Gog, T., & Kester, L. (2012). A test of the testing effect: acquiring problem-solving skills from worked examples. Cognitive Science, 36(8), 1532–1541.
  51. van Gog, T., Kester, L., Dirkx, K., Hoogerheide, V., Boerboom, J., & Verkoeijen, P. P. J. L. (2015). Testing after worked example study does not enhance delayed problem-solving performance compared to restudy. Educational Psychology Review, 27(2), 265–289.
  52. van Gog, T., Paas, F., & van Merrienboer, J. J. G. (2006). Effects of process-oriented worked examples on troubleshooting transfer performance. Learning and Instruction, 16(2), 154–164.
  53. Vaughn, K. E., Dunlosky, J., & Rawson, K. A. (2016). Effects of successive relearning on recall: does relearning override the effects of initial learning criterion? Memory and Cognition, 44, 897–909.
  54. Weinstein, Y., Lawrence, J. S., Tran, N., & Frye, A. A. (2013). How and how much do students really study? Tracking study habits with the diary method. Poster presented at the 54th Annual Meeting of the Psychonomic Society, Nov. 14–17, Toronto, ON, Canada.
  55. Wilkinson, L., & Task Force on Statistical Inference, American Psychological Association, Science Directorate. (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604.
  56. Woloshyn, V. E., Paivio, A., & Pressley, M. (1994). Use of elaborative interrogation to help students acquire information consistent with prior knowledge and information inconsistent with prior knowledge. Journal of Educational Psychology, 86(1), 79–89.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

Department of Psychological Sciences, Kent State University, Kent, OH, USA