Introduction

Within the field of gifted education, it is recognized that special educational interventions are needed to optimally support the unique educational needs of gifted students (Hertberg-Davis & Callahan, 2013). Unfortunately, the gifted programs that currently claim to meet the needs of gifted students in the classroom do not always appear to be informed by the research on the learning processes of these students. Questions therefore exist over the effectiveness, sustainability, and defensibility of some of these programs (Dimitriadis, 2016; Freeman et al., 2010; Gagné, 2011; Hertberg-Davis & Callahan, 2013; Renzulli, 2012).

To inform the future development of gifted programs that give due acknowledgement to the specific learning processes of gifted students, this study draws on the latest educational research on learning and instruction, and integrates it with research in the international field of gifted education. Specifically, this study investigated Invention as Preparation for Future Learning (IPL), drawing on Cognitive Load Theory and the learning needs of gifted learners, to advance understanding of the optimal approaches to supporting the learning and instructional needs of gifted learners.

Giftedness and gifted education

The fundamental premise of the field of gifted education is that there exists a group of individuals who have a unique set of abilities, traits, characteristics, and learning profiles that distinguish them from their age peers (Callahan & Hertberg-Davis, 2013). A plethora of definitions and theories of giftedness exist, most of which may be traced back to the work of Terman and Oden (1959), who conducted one of the first studies to develop and establish a specific and narrow definition of giftedness, namely a conception of giftedness based on IQ. According to Terman, the top 1% of scorers on the Stanford–Binet Intelligence Scale may be considered gifted.

Today, multiple other definitions and theories of giftedness that espouse broader and more expanded conceptions of giftedness are in use. One of the most prominent of these is Gagné’s Differentiated Model of Giftedness and Talent (DMGT; Gagné, 2003, 2013), which proposes that giftedness is the possession of outstanding natural abilities in any domain (e.g., intellectual, creative, socio-affective, and sensorimotor) that may be translated into outstanding achievements in one or more fields, through a developmental process that is influenced by intrapersonal factors, environmental factors, and chance (Gagné, 2003, 2009). Someone who is considered gifted or talented under Gagné’s DMGT is placed within the top 10% of ability (i.e., gifted) or the top 10% of achievement (i.e., talented) among their age peers. This differentiation between “gifted” and “talented” in the model acknowledges learners whose high ability may not be translated into equivalent levels of performance for any reason. The DMGT forms the basis of policy and related documents in Australia, where this study was conducted (Bannister-Tyrrell, 2017; Gagné, 2021; Merrotsy, 2017).

Several scholars have argued in recent years that gifted education serves dual purposes. The first purpose is to ensure that educational interventions for gifted students are adapted to fit the particular educational needs of these students. Although gifted students may possess high levels of potential, a lack of access to appropriate educational programs and provisions may lead to negative outcomes for these students, ranging from boredom and underachievement to even depression (Neihart, 2007, 2016). In order to optimally support gifted students, a number of educational interventions such as acceleration, ability grouping, curriculum differentiation, and mentorships are commonly recommended in the literature (Bartley-Buntz & Kronborg, 2018; Eddles-Hirsch, 2019; Jarvis & Henderson, 2012; VanTassel-Baska, 2005). Among the key considerations in making decisions about the specific programs and provisions to be provided to gifted students are their traits and characteristics (Dai & Chen, 2013; Eysink et al., 2015; Subotnik et al., 2011; VanTassel-Baska & Stambaugh, 2006), including a heightened level of curiosity, a high level of motivation when appropriate challenge is provided, effective problem-solving skills, a scientific approach to solving problems, a strong capacity for the detection of underlying patterns and the identification of relationships between concepts, and strong analytic and metacognitive skills (Barfurth et al., 2009; Leikin & Sriraman, 2016; Rogers, 2007; Sriraman, 2003).

To allow gifted students to optimally utilize these skills, which may all be considered to be related to problem-solving, scholars have recommended that gifted students be provided with authentic and open tasks, which are aligned with the gifted student’s specific needs such as challenge, independent work, and creativity (Bonotto & Santo, 2015; Eysink et al., 2015; VanTassel-Baska, 2013). Gifted students are not only likely to prefer such approaches to instruction, but may benefit from them as well (Eysink et al., 2015; Olszewski-Kubilius et al., 2017; Westberg & Leppien, 2018). Relatedly, some scholars have suggested that gifted students may perform better in less structured environments in comparison to non-gifted students. For example, in a study on undergraduate students learning “hold’em poker”, a game with significant complexity, ability (as measured through Scholastic Assessment Test scores) was found to explain a significant proportion of the variance in performance when little to no guidance was provided to the participants. In comparison, ability was not found to predict performance when participants were in conditions that provided more structure and instruction (DeDonno, 2016).

In addition to the provision of appropriate educational interventions, the second and more contentious purpose of gifted education is the development of the next generation of exceptional individuals who make innovative and groundbreaking advances in knowledge for the betterment of society (Subotnik & Rickoff, 2010; Subotnik et al., 2011). Although not all gifted individuals may achieve such creative successes, scholars in the field believe that relevant educational opportunities that support such development should at least be provided (Olszewski-Kubilius et al., 2016). Unfortunately, many gifted students have very limited access to such opportunities. In particular, and despite the recognition of creativity as an important component of the development of gifted students, many researchers suggest that very little has been done to promote it in gifted students (Baer, 2016).

Invention-first instruction

One instructional strategy that may be particularly useful in the promotion of creativity is Invention as Preparation for Future Learning (IPL), or invention-first instruction. Invention-first instruction is an instructional strategy that is hallmarked by two essential components—first, the solving of problems, followed by explicit instruction (Schwartz & Bransford, 1998; Schwartz & Martin, 2004; Schwartz et al., 2005). This instructional strategy is often referred to by other monikers, including productive failure (Kapur, 2008). Although invention-first instruction may be considered to be a relatively new approach or method of learning, its constituent components have had relatively long histories in both theory and practice, with each component being recognized for its respective merits and drawbacks. For example, the first component of invention-first instruction, solving problems, has long been seen as an important part of schema acquisition. The second component, explicit instruction, is commonly viewed as the “gold standard” of teaching methods. Nevertheless, the idea that a problem should be solved prior to the presentation of explicit instruction, which forms the underlying basis of invention-first instruction, has sparked considerable debate amongst scholars and educators.

Among the earliest advocates of invention-first instruction were Schwartz and Martin (2004). In their study, they asked students to complete two activities in sequence: a problem-solving/invention activity, whereby the participating students were presented with the task of inventing solutions to a novel problem, followed by instruction, whereby the students were explicitly taught the canonical solution. The findings of their study showed that task sequence significantly affected performance on a delayed post-test. That is, students who completed an invention activity prior to the instruction stage performed significantly better than students in the “tell-and-practice” condition (i.e., explicit instruction only). Schwartz and Martin (2004) consequently theorized that students in the invention groups were primed for learning (unlike those in the explicit instruction groups), and therefore performed better when presented, post-invention, with relevant learning materials (hence the term Invention as Preparation for Future Learning). These findings align with several learning theories (e.g., impasse-driven learning and the Zone of Proximal Development) which have suggested that beginning instruction with problem-solving activities may facilitate and encourage the uptake of new knowledge when instruction is received later on (Van Lehn, 1988; Van Lehn et al., 2003).

Kapur (2008) built upon the findings of Schwartz and Martin (2004) and coined the term “productive failure” to describe a phenomenon whereby students perform better when completing problems designed to incite failure prior to receiving instruction. It is noted that while the term invention-first instruction makes reference to “instruction” (and therefore highlights the notion that this is an instructional strategy), the term productive failure highlights the proposed outcome of such an instructional strategy (Kapur, 2016). In later work, Kapur and Bielaczyc (2012) compared instruction and problem pairs, whereby students would either receive instruction followed by a problem to solve, or vice versa. These studies found that students in the “problem-instruction” condition, who received a problem to be solved first, performed better than their counterparts who received instruction first. Subsequently, studies on invention-first instruction have been conducted in multiple contexts, including statistics (Kapur, 2014; Schwartz & Martin, 2004), biology (Song, 2018), mathematics (Coppens et al., 2019), and the assessment of learning strategies (Glogger-Frey et al., 2015).

Mechanisms of invention-first instruction

Unlike the explicit instruction approach, invention-first instruction and other “problem-solving first” instructional approaches appear to prepare students for future learning by encouraging the generation of pre-learning constructs (Lorch et al., 2010; Schwartz & Martin, 2004). Students may use such pre-learning constructs to assimilate future instructional content more easily (Glogger-Frey et al., 2015). Unfortunately, the specific mechanisms by which this works are yet to be fully clarified. Indeed, despite the abundance of research on “problem-solving first” instructional approaches, only a handful of studies have investigated the mechanisms of invention-first instruction and how specifically these mechanisms may affect learning (Loibl et al., 2017).

Awareness of knowledge gaps

One possible mechanism to explain the impact of invention-first instruction on learning may be related to the increased awareness of knowledge gaps that results from such approaches (Glogger-Frey et al., 2013, 2015; Loibl & Rummel, 2014; Loibl et al., 2020). Specifically, it is possible that during the process of inventing a solution to a novel problem, many students reach an impasse in conjunction with the realization that their activated prior knowledge is inadequate. Such situations may help students to perceive more clearly what knowledge they are lacking (i.e., knowledge gaps). This in turn may result in a heightened focus on relevant material when instruction is subsequently received (Loibl et al., 2017; Renkl, 2015). This mechanism contrasts with the underlying mechanisms of explicit instruction, which focus mental resources on schema acquisition with minimum cognitive load (Kalyuga & Singh, 2015), and whose advocates maintain that invention-first instruction may impose an excessive amount of extraneous cognitive load on learners (Loibl et al., 2017).

Motivational factors

Aside from the cognitive mechanism noted above, there may also be motivational mechanisms underlying invention-first instruction. Glogger-Frey et al. (2015), for example, point out that the commencement of instruction with the requirement for solutions to be invented may affect motivational states in students, such as epistemic curiosity (i.e., a response to stimuli that motivates knowledge acquisition and exploratory behaviour). Epistemic curiosity may increase when invention-first instruction is required of students—due to the gaps in knowledge that become apparent, or the natural human propensity to invent and produce (Schwartz & Martin, 2004). In turn, an increase in epistemic curiosity may foster intrinsic motivation and continued interest in the activity (Hidi & Renninger, 2006; Schiefele, 1991).

Deep structure information

Another possible mechanism of interest is the role of deep structure information in the facilitation of learning. Studies on the topic have suggested that a recognition of the underlying structure (i.e., the deep features) of a problem may have a positive impact on student learning, particularly with respect to transfer (Jitendra et al., 2013, 2015; Lee et al., 2017; Van Dooren et al., 2010). Of note, studies that have explored strategies promoting the recognition of the deep features of a problem within invention-first instruction have found that such strategies may contribute to the greater effectiveness of invention-first instruction (Loibl et al., 2017). In practice, deep structure recognition in students may be facilitated through the use of high contrast between cases, where problem sets differ from one another with respect to one deep feature at a time for each problem set (and therefore serve to highlight the deep features; Schwartz & Bransford, 1998; Schwartz & Martin, 2004). Problems that present a high contrast between cases may promote the identification of the deep features of a problem for use by students in their attempts to solve the problem (Roll et al., 2014). Not only may this increase the likelihood that students produce more functional solutions (Loibl & Rummel, 2014), it may also allow students to notice when their solutions do not account for deep features (Loibl et al., 2020). In turn, this may cause students to become more aware of their knowledge gaps (Roll et al., 2011), and therefore be primed for future instruction.

Some studies have previously found invention-first instruction with high contrast to be effective (Belenky & Nokes-Malach, 2012; Schwartz et al., 2011). Nevertheless, few studies have investigated the unique impact of contrast by directly comparing the effects of high vs. low contrast in the context of invention-first instruction (Loibl et al., 2020). This, in addition to some variation in study design in the invention-first instruction literature, has meant that there continues to be some lack of clarity on the underlying mechanisms for the beneficial effects of invention-first instruction. As noted by Loibl et al. (2020), there is a need for full-factorial designs that keep manipulations constant between conditions (including contrast in explicit instruction).

Success

The last area of interest within invention-first instruction is success in the invention task. Success may be operationalised in terms of the quality of the solution or how close the solution is to the canonical solution. Some discussion exists on whether spontaneously arriving at high-quality solutions during invention tasks is more conducive to learning in invention-first instruction than failing to reach such solutions (Chin et al., 2016; Schalk et al., 2018; Schwartz et al., 2011; Sinha et al., 2020). The findings are somewhat mixed. On the one hand are a number of studies that have found correlational evidence (ranging from r = .33–.65, p < .01) for a positive relationship between arriving at appropriate solutions in invention tasks and overall performance across various topics (Glogger-Frey et al., 2015; Nachtigall et al., 2020; Schalk et al., 2018). On the other are studies such as Hartmann et al. (2021), which identified no significant relationship between solution quality and conceptual knowledge acquisition, possibly due to factors such as a lack of engagement in extensive prior knowledge activation. In comparison, Loibl and Rummel (2014) did not find significant correlations between solution quality and procedural knowledge, but did find significant correlations between solution quality and conceptual knowledge for unguided problem-solving prior to instruction (r = .33, p = .04). Interestingly, Loibl et al. (2020) have suggested that the solution quality in invention-first instruction conditions may not always be very high.

Cognitive load theory and explicit instruction

Despite the potential of invention-first instruction, a large group of scholars and educators believe that the explicit instruction approach, which requires the upfront provision of all essential information to learners during instruction, may be the most efficient form of learning, particularly for novice learners (Sweller et al., 2011). In fact, such a method of instruction appears to dominate classroom practices around the world today (Hiebert & Stigler, 2004). A widely researched instructional theory, Cognitive Load Theory, forms the basis for many of the principles behind explicit instruction.

In particular, Cognitive Load Theory posits that the way the information processing structures of human cognitive architecture (i.e., working memory and long-term memory) are organized may limit the amount of information that humans are able to process or manipulate in their working memory at any given point in time (Sweller, 1988). Consequently, in order to increase the efficiency of instruction [e.g., increase the number or quality of schemas (organized knowledge structures) students are able to acquire in the shortest amount of time], the theory proposes that the design of instructional materials and learner activities needs to recognize these human limits. One of the ways to achieve this may be the avoidance of any information or activities that may contribute to unnecessary (extraneous) cognitive load, particularly information or activities that are not directly relevant to acquiring the desired schemas (Hsu et al., 2015; Kirschner et al., 2006). For example, a reduction in extraneous cognitive load may be achieved by the avoidance of any random searches for solutions to problems in novel task domains (as is required in the first stage of invention-first instruction).

A large body of research on Cognitive Load Theory has demonstrated the efficiency and advantages of explicit instruction approaches over limited-guidance instructional approaches for novice learners (Sweller et al., 2011). The worked example effect, in particular, has demonstrated that the provision of full guidance to learners in the form of a worked-out example may result in superior performance when compared to an instructional problem-solving task that is provided without any guidance (Sweller et al., 2011). That is, by making all the required knowledge (e.g., problem-solving steps) explicitly available to learners upfront, worked examples may provide an efficient way for learners to acquire the relevant organized knowledge structures or schemas in the task domain (Chen et al., 2015). Scholars have proposed that such an effect may usually be observed only when dealing with relatively complex materials and tasks that involve many interconnected elements of information that need to be processed in working memory at the same time (i.e., materials with high levels of element interactivity; Sweller et al., 2011).

A number of studies have been undertaken to compare the potential effects of the provision of worked examples prior to instruction with invention tasks prior to instruction. Rather than making comparisons of the traditional explicit instruction approach (i.e., instruction followed by problem-solving) with the invention-first instruction approach (i.e., invention followed by instruction), these studies compared the preparatory effects of worked examples and invention tasks on later instruction (Cook, 2017; Glogger-Frey et al., 2015, 2017; Hartmann et al., 2021; Likourezos & Kalyuga, 2017; Newman & DeCaro, 2019; Roelle & Berthold, 2015). While the findings of some of these studies indicated a lack of any meaningful difference between worked examples and invention tasks in conjunction with delayed instruction (Cook, 2017; Likourezos & Kalyuga, 2017), other studies have identified differences.

For instance, Glogger-Frey et al. (2015) found that although the invention-first (i.e., invention task followed by instruction) condition resulted in an increase in perceived knowledge gaps, curiosity, interest, and extraneous cognitive load (d = 0.94), the example-first (i.e., worked example followed by instruction) condition resulted in increased transfer (d = 0.71–0.72). In contrast, Glogger-Frey et al. (2017) found that the invention-first condition led to increased transfer [d = 0.54 (near transfer)–0.61 (far transfer)] but also increased extraneous cognitive load, which was attributed to the opportunity for practice afforded by the inclusion of a second problem-solving activity prior to instruction in the invention-first condition. The importance of the effects of practice on learning outcomes was also identified by Loibl et al. (2020), who pointed out that in studies with neutral to positive effects in favour of invention-first instruction, the participating students in the invention-first conditions were presented with an additional problem-solving activity after instruction (i.e., invention–instruction–problem-solving). These findings suggest that the combination of an invention activity and a problem-solving activity may be what accounts for increased learning in some approaches to invention-first instruction.

Further insights into worked examples in comparison to invention prior to instruction were provided by Newman and DeCaro (2019), who conducted a number of experiments (some of which featured pre-tests) to investigate the optimal level of guidance for students during learning. While one experiment, featuring a pre-test, corroborated the findings of Glogger-Frey et al. (2015) with respect to the greater transfer effects of example-first conditions in comparison to invention-first conditions, a second experiment (which included no pre-test) identified no difference in outcomes between the two instructional conditions. In comparison, the third experiment found that participants in the example-first condition demonstrated higher post-test scores (d = 0.62) than participants in the invention-first condition only in conditions that incorporated a pre-test. At the conclusion of the study, Newman and DeCaro (2019) proposed that due to the similarity of the pre-test to the learning activity (i.e., both tasked students with the identification of the cinema that had the most consistent attendance), the pre-test may have effectively served as an invention task that activated relevant instructional goals. Taken together, the findings of the experiments suggest that students in worked example-first conditions may learn more effectively when the worked example is preceded by an activity that activates instructional goals, similar to those that an invention task may activate (e.g., activating prior knowledge and increasing awareness of knowledge gaps).

It is apparent that the numerous comparisons that have been made between worked examples and invention tasks as preparation for instruction have not produced a conclusive answer on the approach that may be more effective. Nevertheless, as for invention-first instruction, some key underlying mechanisms of worked examples appear to exist that support their use in instruction. First of all, worked examples may present less extraneous cognitive load, to free up working memory capacity for the processing of intrinsic material associated with a learning activity (Glogger-Frey et al., 2017). Additionally, worked examples have been demonstrated to increase the self-efficacy of learners (Crippen & Earl, 2007), which may mediate instructional effects on learning outcomes. Indeed, Glogger-Frey et al. (2015) found that self-efficacy may mediate the effects of example-first instruction on far transfer, which suggests that self-efficacy may partially explain the high performance scores that are often seen among students in example-first conditions.

Invention-first and giftedness (as an individual difference)

Although individual differences have been acknowledged to potentially influence the effects of invention-first instruction (Kapur & Bielaczyc, 2012), few studies in the area have a focus on how specifically individual differences may alter the effect of invention-first instruction on learning outcomes. One speculated individual difference is age. For example, Sinha and Kapur (2021) found, in a meta-analytic review, that while older students seemed to benefit from invention-first methods, this was not the case for students in younger age groups (2nd–5th graders). One possible explanation for the result may be related to the profile of younger students—they may possess insufficient prior knowledge on motivational and metacognitive strategies to respond optimally to invention-first learning, specifically in response to failure (Mazziotti et al., 2015), resulting in some reduction to the preparatory benefits of invention (Sinha & Kapur, 2021).

There has also been research that provides some evidence for the interaction of academic ability with invention-first instruction. For example, Kapur and Bielaczyc (2012) studied three schools, two of which predominantly enrolled students of below-average ability and one of which predominantly enrolled students of average ability. The authors found that while some below-average classrooms did not demonstrate greater benefit from invention-first instruction in comparison to explicit instruction on several learning outcomes (i.e., graphical representation and well-structured items), all the average classrooms did demonstrate such a benefit. Nevertheless, multiple confounding differences between teacher, school and intervention specifics make it difficult to attribute learning outcome differences to any one factor. Relatedly, Wiedmann et al. (2012) found that mixed-ability and high-ability groups generated more solution attempts and more high-quality solutions in comparison to low-ability groups when invention-first instruction was implemented. Unfortunately, the lack of a comparison group made it difficult to conclusively determine whether the high-ability group would have benefitted more from invention-first instruction in comparison to example-first instruction.

For gifted students, the research suggests that invention-first instruction may possibly be a more effective form of instruction than example-first instruction—many gifted students have been found to have an affinity toward problem-solving activities (Leikin & Sriraman, 2016; Rogers, 2007; Sriraman, 2003), with some scholars indicating that performance in problem-solving activities may be used to identify giftedness (Threlfall & Hargreaves, 2008). Of note, problem-solving activities are often recommended to gifted students as an appropriate means to address their educational needs (Koichu, 2011; Ryser & Johnsen, 1996; Uçar et al., 2017). Problem-solving instruction may be particularly useful for gifted students as it may be possible to provide an appropriate level of challenge (Callahan et al., 2015; Leikin & Sriraman, 2016), which has been shown to enhance the academic performance, self-efficacy, and the motivation of gifted students (Reis & Boeve, 2009; Thompson, 2011). Furthermore, problem-solving instruction may provide greater opportunities for gifted students to be creative, which is another commonly recommended approach to support their educational needs (Sternberg et al., 2004).

Complementing the studies that suggest the possible benefits of invention-first instruction for gifted students are studies which indicate that worked examples may not be an ideal form of instruction for these students. Specifically, Schwaighofer et al. (2016), who examined the moderating effects of several variables [including working memory capacity, ability to flexibly switch between different tasks or strategies (shifting ability), prior knowledge, and fluid intelligence] on knowledge acquisition in the presence and absence of worked examples, found that shifting ability and fluid intelligence were moderators of the effect of the worked examples on knowledge acquisition. That is, the higher the shifting ability and fluid intelligence (which may arguably be more common in gifted students than non-gifted students), the lower the benefit of worked examples in comparison to problem-solving activities.

While studies in the area that have been conducted specifically with gifted students are scarce, one exception is Coppens et al. (2019), who examined the performance of gifted and non-gifted elementary school students on problem-example pairs. The authors found no effect of task sequence (i.e., invention–example–problem–example vs. example–problem–example–problem) on performance for either the gifted or non-gifted cohorts. As an explanation, Coppens et al. (2019) indicated that the relatively young age of the participants in combination with the puzzle-like nature of the given task may have produced increased levels of motivation for all participants, which may have resulted in little difference in performance between the gifted and non-gifted participants.

Besides task sequence, a final area of interest in investigations of invention-first instruction with gifted students may be the recognition of the underlying problem structure during instruction. According to the literature in gifted education, it is possible that gifted students may be able to recognize the underlying problem structure without the need for scaffolding offered by a high level of contrast between cases. That is, a number of studies have indicated that gifted individuals may differ from their non-gifted peers in their superior abilities to make generalisations (Amit & Neria, 2008; Kanevsky, 1990; Kanevsky & Keighly, 2003; Sriraman, 2003, 2008). Any such abilities to identify similarities from a collection of entities may mean that gifted students are more readily able than non-gifted students to recognize the deep features in underlying problems, with or without the presence of instructional strategies that promote the identification of deep features.

Research questions and hypotheses

The present study aimed to investigate the potential differential effects of invention-first instruction vs. example-first instruction, and the use of high vs. low contrast materials, on the learning of gifted students in comparison to non-gifted students. Accordingly, the study set out to answer the following research questions:

  (a) How do invention-first and example-first instruction compare in increasing learning outcomes? Does giftedness interact with this effect?

  (b) How does the use of high vs. low contrast materials affect learning outcomes and deep feature recall? Does giftedness interact with this effect?

  (c) Do invention-first learners experience less self-efficacy and more extraneous load, awareness of knowledge gaps, curiosity, or interest?

To answer these questions, an examination was made of the impact of instructional sequence (i.e., invention-first instruction vs. example-first instruction) and contrast (i.e., high vs. low contrast materials) on the learning outcomes of gifted and non-gifted students with respect to transfer and procedural knowledge. A 2 (invention-first vs. example-first instruction) × 2 (high vs. low contrast materials) × 2 (gifted vs. non-gifted learners) design was therefore implemented.

We expected to find similar results to Glogger-Frey et al. (2017), with increased near transfer, far transfer, and deep feature recall in the invention-first conditions in comparison to the example-first conditions, due to the expected increase in prior knowledge activation, perception of knowledge gaps, curiosity, and interest, as well as the incorporation of additional practice. In comparison, no difference was expected between the invention-first and the example-first groups on procedural knowledge (Glogger-Frey et al., 2015, 2017). Additionally, we expected to find that secondary gifted students, in comparison to their non-gifted peers, may have superior learning in invention-first conditions, since learners who are in 6th grade and above (Sinha & Kapur, 2021) and who have high academic ability (Kapur & Bielaczyc, 2012) appear to be more responsive to invention-first instruction.

Furthermore, we expected that increasing the visibility of the underlying problem structure using contrast would increase learning outcomes as assessed using post-tests for all participants. Finally, we expected that due to a superior ability of gifted students to make generalisations (Amit & Neria, 2008), the effect of contrast on learning outcomes would be more beneficial for non-gifted students in comparison to gifted students. Consequently, the following hypotheses have been proposed for this study:

Hypothesis 1

The invention-first groups will perform better in near transfer (1a), far transfer (1b), and deep feature recall (1d), but no differently in calculation (1c), in comparison to the example-first groups.

Hypothesis 2

Gifted students will perform better on the learning outcomes of near transfer (2a), far transfer (2b), and calculation (2c) under invention-first instruction, in comparison to example-first instruction.

Hypothesis 3

Students under invention-first instruction conditions will experience increased perceptions of knowledge gaps (3a), extraneous cognitive load (3b) and curiosity (3c), and decreased self-efficacy (3d), in comparison to students under example-first conditions.

Hypothesis 4

The use of high contrast materials will increase near transfer (4a), far transfer (4b), calculation (4c), and recall of deep features (4d) for the participants.

Hypothesis 5

The positive effect of high contrast materials will be smaller for gifted students than non-gifted students for near transfer (5a), far transfer (5b), calculation (5c), and deep feature recall (5d).

Methods

Participants

The participants of the study comprised 199 (i.e., 92 female and 102 male) seventh-grade students from three different high schools (i.e., two partially academically selective government schools and one Independent school) in Sydney, New South Wales, Australia. All participants were asked several socio-demographic questions, including whether they had been identified as a gifted student, and whether they were enrolled in a selective or accelerated mathematics class. Students (99 participants) who self-identified as gifted and were able to confirm that they were identified by a psychologist or their school as being gifted, were classified as gifted for the purposes of this study.

The students who were classified as gifted may largely be considered to be intellectually gifted and academically talented under Gagné’s DMGT (i.e., top 10% of intellectual ability and academic achievement among age peers). Those students who attended the selective streams of one of the two partially academically selective government schools were able to gain a place in these streams (which comprise less than 5% of the total high school places in the state of New South Wales, Australia; Australian Bureau of Statistics, 2020; New South Wales Department of Education & Communities, 2020a) on the basis of their performance in assessments of their intellectual ability (i.e., a standardized test of their reading, writing, mathematical reasoning and general abilities) and academic achievement (i.e., performance on English and Mathematics assessments; New South Wales Department of Education & Communities, 2020b). In comparison, gifted students who attended the third school (i.e., the Independent school) may also be considered intellectually gifted and academically talented under Gagné’s DMGT, as they were identified by a psychologist (who uses performance in IQ tests; Jung & Worrell, 2017) and/or the school’s gifted identification processes (which include assessments of ability and achievement in mathematics and English; B. Dean, personal communication, March 8, 2021).

In addition to gifted students, the three participating schools enrolled substantial numbers of non-gifted students. While the two partially academically selective schools feature classes that form part of the academically selective stream, they also have non-selective classes for which intellectual/academic criteria are not used to determine enrolment (New South Wales Department of Education and Communities, 2021). In comparison, students at the Independent school did not need to demonstrate their giftedness or talent using intellectual/academic criteria for enrolment at the school.

Design and procedure

A task, which was an adaptation of a task used in Schwartz et al. (2011) and in many previous invention-first studies, was developed to allow students to gain an understanding of the concepts of density and ratios in physics (Schwartz & Martin, 2004). To complete the task, students were required to develop an index with certain specifications as a quantitative measure (i.e., the “crowdedness” of clowns in buses), by working out a ratio structure (i.e., a deep feature) after making comparisons of cases that are presented to them (e.g., if Bus A and Bus B have the same number of clowns, but Bus A is larger in size, Bus B is more crowded; if Bus C and Bus D are the same size, but Bus C has more clowns than Bus D, then Bus C is more crowded; see Fig. 1). It is noted that prior knowledge on the topic was assessed by asking all participants for their self-reported physics/mathematics performance on a 9-point scale, with response options ranging from 1 (very low) to 9 (very high). No pre-test was administered in this study, as the administration of a pre-test may have activated prior knowledge.

Fig. 1

Pre-instruction invention worksheets: the high contrast materials worksheet (left) and the low contrast materials worksheet (right)

As a first step, the participants of the study were randomly allocated to one of four instructional conditions, which differed from one another in terms of the level of contrast (i.e., high or low) and the order in which a problem or worked example associated with the above task was presented:

  (a) Invention-first high contrast;

  (b) Example-first high contrast;

  (c) Invention-first low contrast; and

  (d) Example-first low contrast.
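The four conditions listed above are the crossing of the two manipulated factors (instruction and contrast). The following is a minimal sketch of how such an allocation could be produced. The study reports random allocation but does not describe the procedure, so the shuffled round-robin dealing, the seed, the function name `allocate`, and the participant IDs are all illustrative assumptions rather than the authors' method.

```python
import itertools
import random

INSTRUCTION = ("invention-first", "example-first")
CONTRAST = ("high contrast", "low contrast")

# The four instructional conditions are the crossing of the two manipulated factors.
CONDITIONS = list(itertools.product(INSTRUCTION, CONTRAST))


def allocate(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the four conditions.

    Assumption: round-robin dealing and the fixed seed are illustrative only;
    the source does not specify how randomization was carried out.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


# 199 hypothetical participant IDs, matching the reported sample size.
allocation = allocate(range(1, 200))
```

Dealing after a shuffle keeps the group sizes within one participant of each other, which is one common way to approximate balanced cells. Giftedness is not allocated here because it is a measured participant characteristic rather than a manipulated factor.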

Each of the instructional conditions featured five sequential stages—pre-instruction, problem-solving/worked example, recall, consolidation, and post-instruction (see Fig. 2). In comparison to previous designs, an additional practice stage of a problem-solving task and a worked example (i.e., the “problem-solving/worked example” stage) was included for all conditions prior to final consolidation. This was in acknowledgement of findings from Glogger-Frey et al. (2017)—additional problem-solving practice may increase the benefits of invention-first instruction in comparison to example-first instruction. Contrary to Glogger-Frey et al. (2017), however, where the invention-first conditions and example-first conditions were provided with a task that was congruent with the preparatory task in the additional practice stage (i.e., the invention-first groups received problem solving, and the example-first groups received a worked example), both a problem-solving task and a worked example were presented to all instructional conditions in the present study. Such a design allowed as much consistency between the four instructional conditions as possible, with the only difference being the initial stage. While the order of presentation of the two activities (i.e., the problem-solving task and the worked example) in the additional practice stage was not seen to be crucial, it was important that all four conditions received these two activities in the same order.

Fig. 2

Stages of data collection

In the pre-instruction stage, instructions to participants in the two invention-first conditions were to invent an index, while instructions to participants in the two example-first conditions were to study the provided worked examples. The learning materials for all participants included information on the parameters of the problem: (i) each company has only one crowdedness index; (ii) the procedure used to find one company’s crowdedness index may be used to calculate another company’s crowdedness index; and (iii) a larger crowdedness index indicates that the clowns are more crowded, and a smaller crowdedness index indicates that the clowns are less crowded. The learning materials given to participants in the two example-first conditions featured a worked solution to the problem, and a note explaining that the worked solution was a fictional student’s work on the invention task. As part of the worked solution were the student’s thoughts (represented in “thought bubbles”) which provided details on the solution process, each bus company’s crowdedness index, the final result, and the principle of the crowdedness index (see Fig. 3). Depending on condition (i.e., high or low contrast materials), worked examples either featured a solution of the high contrast or the low contrast invention task.

Fig. 3

Worksheet for the example-first low contrast materials condition. Invention-first worksheets did not include the thought bubbles and the solution (modified from Schwartz et al., 2011)

Participants assigned to all four experimental conditions worked alone on this task for 15 min (i.e., the invention-first groups spent 15 min generating an index while the example-first groups studied the same material in a worked example for 15 min). Participants in the example-first high contrast and invention-first high contrast conditions were also provided with the materials used in Schwartz et al. (2011)—three clown companies with different numbers of clowns in buses of different sizes. The materials were presented in a way to support students to understand the need to use information on both the “number of clowns” and the “number of compartments” in order to produce a ratio. Specifically, the differences between the cases were highlighted by “grouping” the buses that belonged to each company and presenting them side-by-side, to provide an opportunity for the participating students to see that although each bus in each company had a different number of clowns and these buses were of a different size, the crowdedness ratios remained the same. Students in the example-first high contrast conditions received a worked solution of the high contrast invention task, which also contained a side-by-side presentation of “groupings” of buses, with notes highlighting the contrast between cases. It is noted that a side-by-side presentation of information is not the only possible form of presentation of high contrast conditions, and that other forms of presentation (e.g., sequential) are also commonly used (Schwartz et al., 2011; Sidney et al., 2015).

Unlike the students in the two high contrast conditions, students in the two low contrast conditions did not receive problems featuring any contrast. That is, rather than separating the buses by company to increase the saliency of the deep features (as was the case in the high contrast conditions), all the buses in the low contrast conditions were presented in a single box (see Fig. 1). Although this simulated a complex problem where information is placed in one segregated location, a side effect of this design decision was that the participants in the invention-first low contrast condition may not have had access to all of the information necessary to solve the task itself. Indeed, the contrasting cases may be considered to be a crucial element in arriving at a solution to the task. Consequently, while the low contrast conditions were clearly different to the high contrast conditions, they cannot be interpreted to be the “low contrast” equivalent of the high contrast conditions for both the invention-first and example-first groups. As such, the high contrast vs. low contrast conditions in this study may be better considered to be a comparison of high contrast vs. low contrast materials, than a comparison of high contrast vs. low contrast cases. Thereafter, as the final part of the pre-instruction stage, the participants in all four instructional conditions were asked to complete an inventory of questionnaires relating to their self-efficacy, perceived knowledge gaps, extraneous cognitive load, and epistemic curiosity.

The subsequent stages were similar for all four instructional conditions (see Fig. 2), to allow as little difference between conditions as possible. Therefore, regardless of the instructional condition to which a participant was allocated, each participant received similar materials. For example, in the second stage (i.e., “problem-solving/worked example”), all participants were given another problem task that required the use of the principles relating to density and ratios outlined in the worked example (greater details of the worked example and the invention task appear in the “Measures” section below). After completing the problem task, participants were given a worked example with a different cover-story premise (i.e., chickens in coops). The worksheet was otherwise identical to the worked examples in the pre-instruction stage (see Fig. 2). As in the pre-instruction stage, the worksheets given to participants in these two stages corresponded to whether they were in a high or low contrast group (i.e., high contrast groups received worksheets denoting a high contrast problem and a high contrast worked example).

In the third stage (i.e., recall stage), participants were asked to remember what the “clown and buses” worksheet (Fig. 1) looked like, and to reproduce the worksheet by drawing it. The recall stage was followed by the consolidation stage in which students were given instructional booklets that consolidated the previously presented information for study. Finally, during the post-instruction stage (i.e., the “post-test” stage), all participants were asked to solve several problems—near transfer, far transfer, and procedural knowledge—before completing a demographics questionnaire.

Measures

Self-efficacy, perceived knowledge gaps, extraneous load, and curiosity

The self-efficacy, perceived knowledge gaps, extraneous load, and epistemic curiosity of the participants were assessed using a number of established scales with proven psychometric properties. Glogger-Frey et al. (2015) was the source of the self-efficacy (α = .91) and the perceived knowledge gaps (α = .83) scales, which were developed for specific use in invention-first instruction studies. An example of an item in the self-efficacy scale is “I am confident in my ability to calculate a crowdedness index for a clown agency”, while an example of an item in the perceived knowledge gaps scale is “I have realized through the task that I do not know some things”. Extraneous cognitive load was assessed using a scale (α = .77) developed by Leppink et al. (2013). An example of an item in this scale is “It was easy for me to distinguish between important and unimportant information”. Finally, epistemic curiosity was assessed using a scale (α = .82) that was based on the Melbourne Curiosity Inventory State Form and was adapted to this context by Glogger-Frey et al. (2015). Two examples of items in this scale are “I want to know more” and “I feel absorbed in what I am doing”. All four of these scales have 7-point Likert-type scale response options that range from 1 (i.e., not at all true of me) to 7 (i.e., very true of me).

Recall

The instrument used to assess recall prompted participants to recall as much as possible from the pre-instruction stage worksheet and to draw them out in the way that they had been displayed. Drawing allows the participating students to provide a representation of the mental model that they had obtained during the pre-instruction stage (Jee et al., 2009). In this study, the task provided a measure of the degree to which the students had become aware of the structure of the problem from the pre-instruction stage (Jee et al., 2009). The task was adapted from Schwartz et al. (2011), and has been used in previous studies to examine the content of the worksheets that the participating students would spontaneously recall from the pre-instruction stage (Glogger-Frey et al., 2015). Participants were given five minutes to complete the task. The drawings produced by the participants were coded with respect to the presence of deep and surface features. If any participants were able to reconstruct ratio structures, they were recognized as having recalled deep features. These participants received one deep feature point for each distinct ratio, for a maximum of three points (for three distinct ratios produced). The ratio structures in the participant drawings did not need to be identical to the ones in the pre-instruction worksheet to score a deep feature point. That is, drawings only needed to: (a) display a ratio structure and (b) be distinct from other ratios on the answer form (e.g., 2:4 and 1:3 counts as two points, but 1:2 and 2:4 counts as one point). For example, if a student drew four clowns in two compartments, two clowns in one compartment, three clowns in one compartment, and six clowns in two compartments, he/she would receive two deep feature points (for each of the two distinct ratios of 1:2 and 1:3).
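The deep feature scoring rule described above (one point per distinct ratio, capped at three) can be sketched as follows. This is an illustration under stated assumptions, not the authors' coding procedure: the judgement of whether a drawing displays a ratio structure is taken as already made by a human coder, each coded ratio is represented as a hypothetical `(clowns, compartments)` pair, and the function name `deep_feature_points` is invented for the example.

```python
from fractions import Fraction

MAX_DEEP_POINTS = 3  # cap stated in the text: three distinct ratios


def deep_feature_points(coded_ratios):
    """Count distinct ratios among coded (clowns, compartments) pairs, capped at 3.

    Fraction reduces each pair to lowest terms, so 1:2 and 2:4 are treated
    as the same ratio and earn a single point.
    """
    distinct = {Fraction(clowns, compartments) for clowns, compartments in coded_ratios}
    return min(len(distinct), MAX_DEEP_POINTS)


# The worked example from the text: four clowns in two compartments, two in one,
# three in one, and six in two.
example_drawing = [(4, 2), (2, 1), (3, 1), (6, 2)]
example_points = deep_feature_points(example_drawing)
```

For the example drawing, the four buses reduce to two distinct ratios, so the sketch returns two points, in line with the scoring described in the text.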

Participant drawings were also assessed for recall of surface features, which may be considered to be an evaluation of how well the participants remember information that is not relevant to the learning task. Participants could score a total of six surface feature points, with one point being awarded for each of the following: bus company name, incidental text features, whether the clowns had been situated on lines between bus compartments, varying line styles of the buses, wheels that did not correspond to the number of compartments, and details of the clown features. This measure has previously been used in similar prior studies relating to surface feature recall (Glogger-Frey et al., 2015; Schwartz & Martin, 2004; Schwartz et al., 2011).

Consolidation and post-instruction

An information handout that provided an explanation of the solution to the problem, related the problem to the broader concept of density, and explained the overall presence and importance of ratio structures in physics, was provided to all participants during the consolidation stage of the study. This handout was based on materials used to achieve a similar purpose in Schwartz and Martin (2004), and was recollected from the participants after they had an opportunity to study it. Thereafter, a number of post-instruction assessments were made of near transfer, far transfer, and calculation ability using measures developed by Schwartz and Martin (2004).

The measure used to assess near transfer involved solving a task that is structurally similar to the pre-instruction invention task (with clowns and buses), but with a differing cover story (i.e., “chickens in coops”). As such, an assessment could be made of the ability of participants to solve a problem that is structurally identical to the instructional task, but with differences in context and specific numerical values. All participants were required to provide answers relating to three independent ratio structures. The performance of the participants was assessed with the award of two points for each ratio structure (i.e., one point for the correct ratio and one point for the correct answer). One additional point was awarded if participants arrived at a measurement statement that referred to the ratio of chickens to coops. Therefore, the maximum possible score that each participant could be awarded for the near transfer task was seven points.

To assess far transfer, participants were required to complete a task that involved the same problem structure (i.e., the invention of an index) and the underlying concept of ratio, but a different topic area (i.e., spring constant). The task required use of the same problem-solving mechanisms as all the previously presented tasks. Nevertheless, instead of being required to find a crowdedness index, participants were required to determine the “stiffness” of four trampolines. In essence, the task allowed for the measurement of the participants’ ability to transfer knowledge acquired during the learning activity to concepts that are structurally distant. The performance of participants in the task was assessed in the same manner as for the near transfer task, and the maximum possible score that each participant could be awarded for the task was nine points (i.e., two points for ratios/answers relating to each of the four trampolines, and an additional point for a measurement statement).
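The scoring scheme shared by the near and far transfer tasks (two points per ratio structure plus one point for a measurement statement) can be summarised in a short sketch. The function names and the boolean input format are assumptions made for illustration; the source describes only the point allocation.

```python
def max_transfer_score(n_structures):
    """Maximum score: two points per ratio structure plus one for the statement."""
    return 2 * n_structures + 1


def transfer_score(ratio_correct, answer_correct, measurement_statement):
    """Score one participant on a transfer task.

    ratio_correct / answer_correct: one boolean per ratio structure
    (hypothetical coding format). measurement_statement: whether the
    participant produced a statement referring to the ratio.
    """
    if len(ratio_correct) != len(answer_correct):
        raise ValueError("Need one ratio flag and one answer flag per structure")
    points = sum(bool(r) for r in ratio_correct) + sum(bool(a) for a in answer_correct)
    return points + (1 if measurement_statement else 0)


near_max = max_transfer_score(3)  # three ratio structures (chickens in coops)
far_max = max_transfer_score(4)   # four trampolines
```

With three ratio structures the maximum is seven points, and with four trampolines it is nine points, which matches the maxima reported for the two tasks.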

Finally, an assessment was made of the ability of students to use the procedural knowledge that had previously been provided in the information handout during the consolidation stage, and the worked examples during the problem-solving/worked example stage. This involved the participants being asked two calculation questions that could be solved by the use of a formula and given values. Participants were awarded a maximum of two points for this task—one point for each correct solution. All of the post-instruction assessments were adaptations of the materials used in Glogger-Frey et al. (2015) and Schwartz et al. (2011).

Results

The descriptive statistics of the collected data organised by dependent variable are presented in Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. Table 1 shows that there were no observable differences between the experimental groups in self-reported physics performance (F (3, 195) = 1.35, p = .227) in a one-way ANOVA. The main analyses for the study included a multivariate ANOVA [with post-hoc Bonferroni corrections (adjusted p-value)] that was run with the three learning outcome measures (i.e., near transfer, far transfer, and calculation) and the two recall variables as dependent variables, and instruction (i.e., invention-first vs example-first), contrast (i.e., high contrast materials vs. low contrast materials), as well as giftedness (i.e., gifted vs. non-gifted) as independent variables. The multivariate ANOVA revealed a significant multivariate main effect for the dependent variables collectively for giftedness (Wilks’ λ = .822, F (5, 186) = 8.08, p < .001, η2p = .178) and a significant interaction between giftedness and instruction (Wilks’ λ = .923, F (5, 186) = 3.12, p = .010, η2p = .077).
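As a reading aid, the reported multivariate effect sizes can be recovered from the F statistics and their degrees of freedom using the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two multivariate effects reported above; the function name is illustrative, and the comparison with 1 − Wilks' λ relies on the assumption that each of these effects has a single hypothesis degree of freedom, in which case the two quantities coincide.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Multivariate main effect of giftedness: F(5, 186) = 8.08, Wilks' lambda = .822
giftedness_eta = partial_eta_squared(8.08, 5, 186)

# Giftedness x instruction interaction: F(5, 186) = 3.12, Wilks' lambda = .923
interaction_eta = partial_eta_squared(3.12, 5, 186)

# Under the single-degree-of-freedom assumption, eta squared is about 1 - lambda.
giftedness_from_lambda = 1 - 0.822
interaction_from_lambda = 1 - 0.923
```

Rounded to three decimals, both routes give .178 for giftedness and .077 for the interaction, consistent with the values reported in the text.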

Table 1 Means and standard deviations (in parentheses) for physics performance
Table 2 Means and standard deviations (in parentheses) for far transfer
Table 3 Means and standard deviations (in parentheses) for near transfer
Table 4 Means and standard deviations (in parentheses) for calculation
Table 5 Means and standard deviations (in parentheses) for deep feature recall
Table 6 Means and standard deviations (in parentheses) for surface feature recall
Table 7 Means and standard deviations (in parentheses) for perceived knowledge gaps
Table 8 Means and standard deviations (in parentheses) for extraneous load
Table 9 Means and standard deviations (in parentheses) for curiosity
Table 10 Means and standard deviations (in parentheses) for self-efficacy

Thereafter, to answer the research questions, 2 (invention-first vs. example-first) × 2 (high contrast materials vs. low contrast materials) × 2 (gifted vs. non-gifted) univariate ANOVAs were conducted (with Bonferroni adjusted p-values for post-hoc) for each of the dependent variables with giftedness, instruction, and contrast entered as independent variables. A significance level of .05 was used for all analyses. The residuals for dependent variables were found to be skewed. Nevertheless, as the ANOVA has been shown to be robust to non-normal distributions (Schmider et al., 2010), it was retained as the method of analysis.

Univariate analyses

Hypothesis 1

A main effect of instruction was found for far transfer (F (1, 199) = 5.94, p = .016, η2p = .030; Hypothesis 1b). That is, students in the invention-first condition performed significantly better than students in the example-first condition on the far transfer task (see Table 2). This was qualified by a significant instruction × giftedness effect detailed in the next subsection. While this trend was also observed for the near transfer (F (1, 199) = 1.79, p = .183, η2p = .009) and calculation tasks (F (1, 199) = .754, p = .386, η2p = .004), they did not reach statistical significance (Hypotheses 1a, 1c; see Tables 3 and 4). In comparison, no significant main effect of instruction was found for deep feature recall (F (1, 199) = .17, p = .680, η2p = .001; Hypothesis 1d).

Hypothesis 2

The instruction × giftedness interaction was not significant for near transfer (F (1, 199) = 1.30, p = .255, η2p = .007; Hypothesis 2a). Nevertheless, a significant instruction by giftedness interaction was found for the far transfer task (F (1, 199) = 6.99, p = .009, η2p = .035; Hypothesis 2b) and the calculation task (F (1, 199) = 11.45, p = .001, η2p = .057; Hypothesis 2c). Specifically, the follow-up simple effect analyses found that gifted students in the invention-first conditions (M = 3.74, SD = 3.79) performed better in far transfer tasks than those in the example-first conditions (M = 1.48, SD = 2.90), although there was no significant difference in performance for non-gifted students (invention-first: M = 1.32, SD = 2.69; example-first: M = 1.40, SD = 2.77). In comparison, there was no significant difference in performance between gifted students (M = 1.48, SD = 2.90) and non-gifted students in the example-first conditions (see Fig. 4). A similar pattern was seen for calculation performance. That is, follow-up simple effect analyses found a significant difference only for gifted students in invention-first conditions (M = 1.54, SD = .53) who outperformed gifted students in the example-first conditions (M = 1.10, SD = .70), while no difference was identified for non-gifted students in the invention-first (M = .65, SD = .73) and example-first (M = .88, SD = .84) conditions (see Fig. 5).

Fig. 4

Interaction between giftedness and instructional format for the dependent variable of far transfer performance

Fig. 5

Interaction between giftedness and instructional format for the dependent variable of calculation performance

Following these analyses, 2 (instruction) × 2 (contrast) multivariate ANOVAs were conducted separately with the gifted and non-gifted participants. While no significant multivariate main effects or interactions with the non-gifted participants were found, a significant multivariate effect of instruction (Wilks’ λ = .836, F (5, 91) = 3.56, p = .005, η2p = .164) and a significant interaction of instruction and contrast (Wilks’ λ = .882, F (5, 91) = 2.44, p = .040, η2p = .118) were identified for the gifted participants. Specifically, follow-up ANOVAs found main effects of instruction on far transfer (F (1, 98) = 10.31, p = .002, η2p = .098; Hypothesis 2b) and calculation (F (1, 98) = 11.58, p = .001, η2p = .106; Hypothesis 2c), and a marginally significant effect on near transfer (F (1, 98) = 3.61, p = .061, η2p = .037; Hypothesis 2a).

Hypothesis 3

The 2 × 2 × 2 ANOVA with perceived knowledge gaps (α = .80) did not find a significant main effect of instruction (F (1, 199) = 3.273, p = .072, η2p = .017; Hypothesis 3a), but it trended towards increased knowledge gaps with invention-first instruction (see Table 7). Moreover, a main effect of contrast (F (1, 199) = 6.92, p = .009, η2p = .035) was found, with higher perceived knowledge gaps for the low contrast materials than for the high contrast materials (i.e., high contrast: M = 3.78, SD = 1.57; low contrast: M = 4.41, SD = 1.50). In comparison, no main effects or interactions were identified when extraneous cognitive load (α = .72; instruction: F (1, 199) = 1.99, p = .160, η2p = .010; contrast: F (1, 199) = .47, p = .495, η2p = .002; giftedness: F (1, 199) = 1.63, p = .204, η2p = .008; see Table 8) or curiosity (α = .90; instruction: F (1, 199) = .31, p = .579, η2p = .010; contrast: F (1, 199) = .47, p = .495, η2p = .002; giftedness: F (1, 199) = 1.63, p = .204, η2p = .008; see Table 9) were used as dependent variables (Hypotheses 3b, 3c). When self-efficacy (α = .92) was input as the dependent variable, a significant main effect of instruction (F (1, 199) = 3.91, p = .049, η2p = .020) was found favouring the example-first condition over the invention-first condition (see Table 10; Hypothesis 3d).

Hypothesis 4

No significant interactions of instruction × contrast were found for near transfer or far transfer (Hypotheses 4a, 4b). Nevertheless, a significant instruction by contrast interaction was found for calculation performance (F (1, 199) = 5.09, p = .025, η2p = .026; Hypothesis 4c), as a follow-up simple effects analysis indicated significant effects only in the invention-first condition. That is, students in the invention-first condition performed better on the calculation task when they were exposed to high contrast materials (M = 1.23, SD = .82) in comparison to low contrast materials (M = .98, SD = .71; see Fig. 6). In comparison, no significant difference was identified in the performance of students in the example-first conditions when exposed to high contrast (M = .93, SD = .81) and low contrast materials (M = 1.04, SD = .76).

Fig. 6

Interaction between contrasting cases and instructional format for the dependent variable of calculation performance

When deep feature recall was input as the dependent variable, a 2 × 2 × 2 ANOVA only identified a significant interaction between instruction and contrast (F (1, 199) = 4.17, p = .043, η2p = .021; Hypothesis 4d). Specifically, a simple effects analysis found a significant difference only in the invention-first condition, whereby participants who were exposed to high contrast materials (M = 2.28, SD = 2.25; Table 5) performed better in comparison to participants who were exposed to low contrast materials (M = 1.49, SD = 1.64), while there was no significant difference in the performance of the participants in the high contrast (M = 1.63, SD = 1.88) and low contrast (M = 1.95, SD = 2.09) scenarios of the example-first conditions (see Fig. 7). For surface feature recall, only a significant main effect of giftedness was found (F (1, 199) = 9.47, p = .002, η2p = .047; Table 6), whereby gifted students (M = 2.47, SD = 1.01) identified more surface features than non-gifted students (M = 1.96, SD = 1.17).

Fig. 7. Interaction between contrasting cases and instructional format for the dependent variable of deep feature recall

Hypothesis 5

No significant interactions of contrast × giftedness were found for near transfer, far transfer, or deep feature recall (Hypotheses 5a, 5b, 5d). Nevertheless, when 2 (instruction) × 2 (contrast) multivariate ANOVAs were conducted separately with the gifted and non-gifted participants, a significant instruction by contrast interaction was found for calculation among the gifted participants (F (1, 98) = 7.70, p = .007, η2p = .075). That is, analyses of simple effects found that gifted participants in the invention-first conditions performed significantly better when exposed to high contrast materials (M = 1.75, SD = .43) than low contrast materials (M = 1.00, SD = .75), while no such difference was identified in the example-first conditions, or with non-gifted students (Hypothesis 5c).

Discussion

The present study compared the effect of three different factors on instruction. Specifically, an experimental design was utilized that introduced variations to learners in terms of whether they were initially provided with explicit instruction or were initially expected to invent a solution to a problem, whether the problems were presented with high contrast materials or not, and whether the recipients of the instruction were gifted students or not. The results indicated several notable findings:

(a) The observed main effect of instruction suggested that invention-first instruction may be more effective than example-first instruction in increasing transfer for the cohort as a whole, independent of contrast condition (Hypothesis 1a; d = .35).

(b) No main effect of contrast materials was identified (Hypotheses 4a, 4b, 4c). This suggests that there is no advantage to learning outcomes under differing contrast conditions, over and above the effects of instruction or giftedness.

(c) As expected, a main effect of giftedness was found for all three learning outcomes, as gifted students outperformed non-gifted students on every performance measure. Furthermore, a significant interaction (d = 0.38) was identified, indicating that gifted students benefited more from invention-first instruction than example-first instruction with respect to transfer and calculation (Hypotheses 2a, 2b, 2c). In comparison, no significant differences were identified in the manner in which non-gifted students responded to invention-first and example-first instruction. Therefore, the relative effectiveness of invention-first instruction over example-first instruction was substantially greater for gifted students than for non-gifted students.

(d) The investigated factors (instruction, contrast, and giftedness) had no impact on how much extraneous cognitive load or curiosity (Hypotheses 3a, 3c) was perceived by the participants. The participants perceived more knowledge gaps in conditions with low contrast materials than in conditions with high contrast materials, although this did not lead to increased learning performance in low contrast conditions as measured by post-test measures.

(e) While the gifted participants outperformed non-gifted participants in post-test learning outcomes, the two groups did not differ in deep feature recall (Hypothesis 2d). Instruction and contrast did not appear to affect deep feature recall performance either; in fact, the only significant finding with deep feature recall was that participants in the invention-first high contrast conditions performed better than participants in invention-first low contrast conditions (while all participants in the example-first conditions recalled a similar number of deep features).

While Schwartz et al. (2011) found that invention-first instruction was more effective in increasing transfer, Glogger-Frey et al. (2015), using modified worked examples of Schwartz et al. (2011) to increase the level of contrast, found that the example-first group performed better than the invention-first group on transfer tests. The results of this study, which utilised worked example materials from Glogger-Frey et al. (2015), appear to align more with Schwartz et al. (2011) than with Glogger-Frey et al. (2015)—invention-first instruction was more effective than example-first instruction, but only for gifted students. No significant differences were identified between the two instructional methods for non-gifted students.

Hence, the idea that gifted students may benefit from invention-first instruction differently to non-gifted students has been supported in this study. This finding aligns with previous literature that has suggested that gifted students may perform better when given multiple avenues to explore during learning (Coleman, 2005). It is also consistent with research that has suggested that open-ended learning, which requires higher-level thinking, is beneficial for gifted students (Bonotto & Santo, 2015). The finding is inconsistent with Coppens et al. (2019), who found that gifted and non-gifted students performed similarly under example-problem instruction and problem-example instruction. However, their study involved elementary school students and tested only near transfer and isomorphic tasks post-instruction, whereas this study found that gifted secondary school students in invention-first instruction outperformed their non-gifted counterparts only in the performance areas of procedural knowledge and far transfer. Overall, the findings of this study provide an important contribution to the literature on invention-first instruction and giftedness. Previous research has tended to focus on the effects of invention-first instruction on only the average student. As one of the few studies to compare the effects of invention-first instruction on students of varying academic ability levels, this study has opened up a new avenue of invention-first instruction research.

While the reported results are consistent with the findings of previous studies that support invention-first instruction, the mechanisms that were previously seen to underlie invention-first instruction were not fully supported in this study. Specifically, previous literature has suggested that factors such as an increased awareness of gaps in knowledge, increased motivation (e.g., self-efficacy and curiosity) and extraneous cognitive load, and/or increased deep feature awareness may be possible reasons for the greater effectiveness of invention-first instruction in comparison to explicit instruction (Kalyuga & Singh, 2015; Loibl & Rummel, 2014; Loibl et al., 2017). However, this study indicated that not all of these factors may be influenced by the instructional conditions or level of contrast. That is, neither self-efficacy nor perceived knowledge gaps were found to contribute to increased performance with respect to invention-first instruction.

It is noted that this study was not able to fully address the widespread suggestion within the invention-first instruction literature that contrast may increase the effectiveness of invention-first instruction due to the greater likelihood that deep structural features will be internalised under high contrast conditions in comparison to low contrast conditions (Loibl & Rummel, 2014; Loibl et al., 2017). While the study findings indicated that the use of high contrast materials was not a significant predictor of transfer or procedural knowledge performance, these findings should not be interpreted as strong evidence against this literature, nor as strong support for recent studies that have failed to identify any positive effects of high contrast when using a full factorial design (Loibl et al., 2020). This is because the materials for the two invention-first instructional groups in this study were presented to participants in a manner such that the high contrast conditions were not a direct equivalent to the low contrast conditions (which lacked contrasting cases, and as a consequence, the complete means by which to solve the problem). In comparison, the materials for the two example-first instructional groups were direct equivalents of one another. Although such a presentation of materials may not have affected the main findings of the study, it may have resulted in the perception of more knowledge gaps among participants in the low contrast invention-first condition.

Regardless of the manner in which the materials were presented, one possible explanation for the lack of a significant predictive relationship between the use of high contrast materials and transfer or procedural knowledge performance may be the view by some scholars that the full benefits of high contrast may not be realized in all situations, as such benefits may come with several trade-offs. That is, while students who are exposed to high contrast materials may be more likely to notice the deep structure of problems, this may not necessarily translate into an increased application of the noticed deep structure construct in problem-solving (design) or performance (Chase et al., 2019). In fact, students who are presented with high contrast materials may focus strongly on single features of the problem at hand rather than its overall underlying structure, which may hinder rather than promote learning. For example, in Chase et al. (2019), despite the fact that participants were more likely to notice the deep structure of a problem after viewing high contrast examples, they tended to focus their attention on only one of the two deep feature concepts highlighted in the cases. Similarly, Chin et al. (2016) found that without the directive to look for (and invent a solution to find) an analogy across cases, students who were in high contrast conditions tended to focus on discrete surface features, and failed to account for the variation across the cases. Hence, the provision of high contrast may not be enough to support learning. A related explanation for the lack of a significant predictive relationship may be the lack of explicit prompts for self-explanation by the participants in this study while they completed the assigned tasks. For example, Sidney et al. (2015) found that high contrast cases with self-explanation prompts resulted in increased conceptual learning in fraction division, while high contrast cases without such prompts did not promote such understanding.

A number of other limitations of the study need to be acknowledged. First, it is noted that in the two example-first instructional conditions, contrast may be considered to have been automatically high (even in the condition with low contrast materials) as the worked solution provided to participants always comprised contrasts of two instances (i.e., buses) at a time. This may have meant that low contrast materials were effectively presented only to the participants in the invention-first condition. Even in the invention-first condition, the effect of low contrast materials may have been diluted as a result of the provision of worked examples in the “problem-solving/worked example” stage. Second, rather than being assessed using a number of conventionally utilized identification instruments, giftedness in this study was determined through self-reports. Although schools (including academically selective schools) with rigorous entry criteria and processes for the selection of students for gifted classes were approached (and therefore some assurance was obtained of the validity of the gifted vs. non-gifted classifications of participants), even greater confidence may have been achieved if a formal multiple criteria identification process that incorporated identification instruments had been used. Nevertheless, it is noted that such an approach to participant recruitment is impractical, and approaches that are similar to the approaches used to recruit participants in this study are conventional (Jung, 2014, 2017; Margot & Rinn, 2016; Preckel et al., 2010). Third, as participants were drawn only from schools located in one urban area on the eastern seaboard of Australia, the generalizability of the study to the larger student population is limited.
Fourth, it is possible that the quantity of materials in the worked example task may have increased the extraneous cognitive load of the participants, which may have had the effect of reducing the cognitive load benefit of presenting a worked example to participants. It is noted that the materials used in this study were not modified extensively from Glogger-Frey et al. (2015) and Schwartz et al. (2011) to maintain as much similarity as possible between the look and feel of the invention task and the worked example task. Fifth, the sample sizes used in the study, ranging from 17 to 30 in each group, were somewhat small. It is possible that the resulting low power may have contributed to the lack of significant findings in this study. Sixth, ANOVA was used to analyse calculation and deep feature recall. Although the relevant scales did not have a large number of response options, they were treated as continuous variables in the study (Jaeger, 2008). Lastly, it is noted that the study failed to take into account the possible practice effects associated with the two additional tasks that were introduced.
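
The concern about low power with 17 to 30 participants per group can be illustrated with a rough calculation. The sketch below is not an analysis from this study. It uses a normal approximation to the power of a two-sided comparison of two groups at α = .05, and it borrows d = .35 (the main effect of instruction reported above) as an assumed effect size for a single pairwise comparison. The function names are hypothetical, and an exact calculation based on the noncentral t distribution would give slightly different values.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power_two_groups(d, n_per_group, z_crit=1.959964):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation with equal group sizes; z_crit is the critical
    value for alpha = .05. The negligible probability of rejecting in
    the opposite tail is ignored.
    """
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_crit)


# Smallest and largest group sizes mentioned in the limitations,
# with d = .35 as an assumed (not estimated) pairwise effect size.
power_n17 = approx_power_two_groups(0.35, 17)
power_n30 = approx_power_two_groups(0.35, 30)

print(f"n = 17 per group: approximate power = {power_n17:.2f}")
print(f"n = 30 per group: approximate power = {power_n30:.2f}")
```

Under these assumptions, the approximate power for a single pairwise comparison is well below the conventional .80 threshold at both group sizes, which is consistent with the caution that some non-significant results may reflect limited sensitivity rather than the absence of an effect.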

In terms of practice, the findings of the study suggest that invention-first approaches should be actively considered when providing educational interventions for gifted students. In comparison, the negligible differences that were found in the outcomes between the invention-first and example-first instructional approaches for non-gifted students suggest that both instructional approaches may be appropriate for non-gifted students. Further research that compares the impact of these two instructional approaches on non-cognitive variables such as creativity and motivation may be necessary before any recommendations can be made on the relationship between these instructional approaches and non-cognitive variables. Other avenues for future research include an in-depth examination of the potential practice effects relating to the two additional tasks that were introduced in this study, and a longitudinal study on invention-first instruction to gain insights into the longer term impacts of this approach in comparison to explicit instruction in regular classrooms.