# Productive failure in mathematical problem solving

## Authors

- Manu Kapur
DOI: 10.1007/s11251-009-9093-x

- Cite this article as: Kapur, M. Instr Sci (2010) 38: 523. doi:10.1007/s11251-009-9093-x


## Abstract

This paper reports on a quasi-experimental study comparing a “productive failure” instructional design (Kapur in Cognition and Instruction 26(3):379–424, 2008) with a traditional “lecture and practice” instructional design for a 2-week curricular unit on rate and speed. Seventy-five 7th-grade mathematics students from a mainstream secondary school in Singapore participated in the study. Students experienced either a traditional lecture and practice teaching cycle or a productive failure cycle, where they solved complex problems in small groups without the provision of any support or scaffolds up until a consolidation lecture by their teacher during the last lesson for the unit. Findings suggest that students from the productive failure condition produced a diversity of linked problem representations and methods for solving the problems but were ultimately unsuccessful in their efforts, be it in groups or individually. Expectedly, they reported low confidence in their solutions. Despite seemingly failing in their collective and individual problem-solving efforts, students from the productive failure condition significantly outperformed their counterparts from the lecture and practice condition on both well-structured and higher-order application problems on the post-tests. After the post-test, they also demonstrated significantly better performance in using structured-response scaffolds to solve problems on relative speed—a higher-level concept not even covered during instruction. Findings and implications of productive failure for instructional design and future research are discussed.

### Keywords

Ill-structured problems · Failure in problem solving · Persistence · Classroom-based research · Mathematical problem solving

## Introduction

Situative, socio-constructivist theories of learning emphasize the importance of having learners engage in authentic, complex problem-solving activities for meaningful learning to take place (Brown et al. 1989; Scardamalia and Bereiter 2003; Spiro et al. 1992). The complex nature of the problem thus demands, as is argued, that support structures be provided as learners engage in solving such problems, for without the support structures, learners may fail. Structure, broadly conceived, comes in a variety of forms such as structuring the problem itself, scaffolding, instructional facilitation, provision of tools, expert help, and so on, and a substantial amount of research examines the effect of structuring and scaffolding learners within complex problem-solving activities (Reiser 2004; Puntambekar and Hübscher 2005). Inadvertently, this has perhaps led to a commonly-held belief that there is little efficacy in students solving complex problems *without* the provision of external structures to support them. Of course, believing in the efficacy of structuring what might otherwise be a complex, divergent, and unproductive process is well-placed (Kirschner et al. 2006). However, allowing for the concomitant possibility that under certain conditions letting learners persist, struggle, and even fail at tasks that are complex and beyond their skills and abilities may in fact be a productive exercise in failure requires a paradigm shift (Clifford 1984). I explore this very possibility.

## Failure and structure

The role of failure in learning and problem solving is no doubt intuitively compelling. For example, research on impasse-driven learning (VanLehn et al. 2003) in coached problem-solving situations provides strong evidence for the role of failure in learning. Successful learning of a principle (e.g., a concept, a physical law) was associated with events when students reached an impasse during problem solving. Conversely, when students did not reach an impasse, learning was rare despite explicit tutor explanations of the target principle. Instead of providing immediate structure, e.g., in the form of feedback, questions, or explanations, when the learner demonstrably makes an error or is “stuck,” VanLehn et al.’s (2003) findings suggest that it may well be more productive to delay that structure until the student reaches an impasse—a form of failure—and is subsequently unable to generate an adequate way forward. Echoing this delaying of structure in the context of text comprehension, McNamara (2001) found that whereas low-knowledge learners tended to benefit from high-coherence texts, high-knowledge learners benefited from low-coherence texts, especially when a low-coherence text preceded a high-coherence one. This, McNamara argues, suggests that reading low-coherence texts may force learners to engage in compensatory processing by using their prior knowledge to fill in the conceptual gaps in the target text, in turn preparing them better to leverage a high-coherence text subsequently. Further evidence for such preparation for future learning (Schwartz and Bransford 1998) can be found in the inventing-to-prepare-for-learning research by Schwartz and Martin (2004). 
In a sequence of design experiments on the teaching of descriptive statistics, Schwartz and Martin (2004) demonstrated an existence proof for the hidden efficacy of invention activities when such activities preceded direct instruction (e.g., lectures), despite such activities failing to produce canonical conceptions and solutions during the invention phase.

Clearly, the relationship between failure and structure forms a common thread through the above-mentioned studies. It is reasonable to reinterpret their central findings collectively as an argument for a *delay of structure* in learning and problem-solving situations, be it in the form of feedback and explanations, coherence in texts, or direct instruction. Indeed, all of them point to the efficacy of learner-generated structures—comprehensions, conceptions, representations, and understandings—even though these may not be correct initially and the process of arriving at them may not be as efficient. However, the above-mentioned studies deal with students solving well-structured, textbook problems, which is typically the case in schools (Jonassen 2000; Spiro et al. 1992).

While there exists a substantive amount of research examining students solving complex problems *with* the provision of various support structures and scaffolds (e.g., Cho and Jonassen 2002; Ge and Land 2003; Hmelo-Silver 2004), my earlier work on *productive failure* (Kapur 2006, 2008) examined students solving complex, ill-structured problems *without* the provision of any external support structures. In an earlier study (Kapur 2008), I asked 11th-grade student triads from seven high schools in India to solve either ill- or well-structured physics problems in an online chat environment. After group problem solving, all students individually solved well-structured problems followed by ill-structured problems. Ill-structured group discussions were found to be more complex, chaotic, and divergent than those of their well-structured counterparts, leading to poor group performance. However, findings suggested a hidden efficacy in the complex, divergent interactional process even though it seemingly led to failure. I argued that delaying the structure received by students from the ill-structured groups (who solved complex, ill-structured problems collaboratively followed by well-structured problems individually) helped them *discern* how to structure an ill-structured problem, thereby facilitating a spontaneous transfer of problem-solving skills (Dixon and Bangert 2004; Marton 2007).

An important instructional implication that one could derive from the above-mentioned research programs (taken together) and test in a classroom-based setting is that instructional designs need not take learners through a narrow path to success or the “correct” answer in the most efficient manner, especially when learners are engaged in solving complex problems. Designing opportunities for learners to generate and develop their own structures—representations, problem-solving methods, conceptions—in the absence of external structure may well lead to performance failure in the shorter term. However, this very process may be germane to learning in the longer term (Clifford 1984; Schmidt and Bjork 1992). Therein lies the purpose of this study.

## Purpose

The purpose of this study was to design a productive failure instructional cycle for mathematics classrooms in a Singapore school and compare it with a conventional lecture and practice instructional cycle. I wanted to test whether or not there is a hidden efficacy in delaying structure in the learning and performance space (Hatano and Inagaki 1986) of students by having them engage in unscaffolded problem solving of complex problems first, before direct instruction. To achieve this, two classroom-based, quasi-experimental studies with 7th-grade mathematics students were carried out, each targeting a 2-week curricular unit. The first curricular unit was on estimation and approximation; the second was on rate and speed. For the first unit on estimation and approximation, I found evidence for productive failure. This paper, however, focuses only on the second study (targeting the curricular unit on rate and speed) because the second study provided confirmatory evidence and thus bolstered the pedagogical tractability of productive failure in a real classroom context. That said, the reality of doing classroom-based research is that one has to strictly adhere to the curriculum and the time allotted for the targeted curricular unit. However, it is reasonable to argue that if the productive failure hypothesis could be demonstrated with minimal changes to the school curriculum, teacher training, and technological resources, and within a timeframe determined by the school curriculum and schedule, then it would only speak well of productive failure’s ecological significance.

## Method

### Participants

Participants were *n* = 75 Secondary 1 (7th-grade) students (43 male, 33 female; 12–13 years old) at a co-educational secondary school in Singapore. The medium of instruction is English throughout the Singapore school system. Students were from two mathematics classes (37 and 38 students, respectively) taught by the same teacher. The participating school can be described as a mainstream school comprising students of average ability on the grade-six national standardized tests. Students at this school typically come from middle-class socio-economic backgrounds. Students had limited or no experience with the targeted curricular unit—rate and speed—prior to the study.

### Research design

A quasi-experimental design was used with one class (*n* = 37) assigned to the ‘productive failure’ (PF) condition and the other class (*n* = 38) assigned to the ‘lecture and practice’ (LP) condition. Both classes participated in the same number of lessons for the targeted unit, totaling seven 55-min periods over 2 weeks. Thus, the amount of instructional time was held constant for the two conditions. Before the unit, all students wrote a 30-min, 9-item pre-test (Cronbach’s alpha = .72) as a measure of prior knowledge of the targeted concepts. There was no significant difference between the two conditions on the pre-test, *F*(1, 73) = .177, *p* = .675. After the unit, all students took two post-tests (described later in the paper).
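The reported pre-test comparison is a standard one-way ANOVA. As a minimal sketch, with made-up scores standing in for the study’s data, the *F* statistic can be computed directly from its definition:

```python
# One-way ANOVA from first principles: F = mean square between groups
# divided by mean square within groups. The score lists below are
# hypothetical placeholders, not the study's data.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: weighted squared deviations of group means
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

pf = [4, 5, 6, 5, 4, 6, 5, 6]   # hypothetical PF-class pre-test scores
lp = [5, 4, 6, 5, 5, 4, 6, 5]   # hypothetical LP-class pre-test scores
f, dfb, dfw = one_way_anova([pf, lp])
print(f"F({dfb},{dfw}) = {f:.3f}")  # F(1,14) = 0.099 for these placeholder scores
```

With only two groups, this is equivalent to a two-sample *t* test (*F* = *t*²); a small *F* with a large *p*, as reported above, simply says the two class means were statistically indistinguishable at pre-test.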

### Productive failure class

Table 1  Productive failure versus lecture and practice instructional cycle

| Period | Productive failure (PF) class | Lecture and practice (LP) class |
|---|---|---|
|  | Pre-test | Pre-test |
| 1–2 | Complex problem 1 (group) | Lecture, practice (in class and HW), feedback |
| 3 | Problem 1 “What if” extensions (individual) | Lecture, practice (in class and HW), feedback |
| 4–5 | Complex problem 2 (group) | Lecture, practice (in class and HW), feedback |
| 6 | Problem 2 “What if” extensions (individual) | Lecture, practice (in class and HW), feedback |
| 7 | Consolidation lecture (whole class) | Lecture, practice (in class and HW), feedback |
|  | Post-tests 1 and 2 | Post-tests 1 and 2 |

Only during the seventh (and last) period was a consolidation lecture held where the teacher led a discussion of the targeted concepts. In the consolidation lesson, the teacher got the groups to share their problem representations and solution methods and strategies. The goal was to compare and contrast the effectiveness of those representations and solution methods. The teacher then shared the canonical ways of representing and solving the problems with the class. For example, the teacher showed how algebra may be used to represent and solve the problem. Likewise, a domain-general method such as trial and error—a form of means-ends analysis—was also shared. While doing so, the teacher explicated the concept of average speed in the context of the problems. Finally, students practiced three well-structured problems on average speed, and the consolidation ended with the teacher going through the solutions to these problems.

Two complex problem scenarios were developed for the unit on rate and speed (see Appendix 1 for one such problem scenario). The design of the complex problem scenarios was closely aligned to the design typology for problems espoused by several scholars (e.g., Goel and Pirolli 1992; Jonassen 2000; Spiro et al. 1992; Voss 1988, 2005). Accordingly, the complex problems were designed such that they possessed many problem parameters with varying degrees of specificity and relevance. Some of the parameters interacted with each other such that their effect could not be examined in isolation. As a result, the problem scenarios were complex, possessed multiple solution paths leading to multiple solutions (as opposed to a single correct answer), and often required students to make and justify assumptions.

As an instantiation of the above mentioned principles for designing complex problems, consider the problem scenario in Appendix 1. Clearly, the problem scenario contains several parameters such as Hady and Jasmine’s walking and riding speeds, Ken’s average speed and time for driving to the Expo, Ken’s uniform speed and time on the expressway, departure time from Ken’s house, arrival time at the auditions, and so on. However, not all of these parameters are fully specified. For example, the exact time of departure of Hady and Jasmine from Ken’s house is not specified at all. This is a critical parameter, and because it is not specified, the problem requires students to make assumptions or devise ways around the lack of specification of this parameter. Furthermore, not all the parameters are equally relevant. For example, Ken’s uniform speed (90 kmph) and time (3 min) on the expressway are not as relevant as his average speed (75 kmph) and time (7 min) taken to get to the Expo. Therefore, the problem requires students to decide between the two sets of parameters as to which one provides a better indication of the distance between Ken’s house and the Expo. Indeed, part of complex problem solving is to deal with varying degrees of specification and relevance of the parameters (Jonassen 2000; Spiro et al. 1992). Additionally, note that the lack of specification of the parameter—departure time from Ken’s house—interacts with the parameter—distance between Ken’s house and the Expo. For a speed-rate problem, both parameters are critical; lack of specification of one of them does not allow the other to be considered in isolation. Another interaction parameter was designed in the form of a constraint that Jasmine and Hady had to arrive at the Expo at the same time. Once again, this meant that students could not consider Jasmine and Hady’s journeys in isolation. A change in one parameter had an effect on other interacting parameters. 
Taken together, the varying degrees of specification and relevance of the parameters as well as the interactions between them make the problem scenario complex, especially so from the perspective of grade 7 students who had not had any formal instruction on the targeted unit on rate and speed. This point is important because the complexity of a problem is not the property of the problem alone but a relation between the problem and the problem solver. The pilot tests underscored the need for a student-oriented view of problem design (Lobato 2003): this problem may well be well-structured for experts or advanced students, but the pilot tests suggested that it was not so for the grade 7 students in the study.
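To make the relevance judgment concrete, the distance–speed–time relation $d = v\,t$ yields two candidate estimates of the distance from Ken’s house to the Expo from the parameters given in the scenario:

$$
d_{\text{avg}} = 75~\tfrac{\text{km}}{\text{h}} \times \tfrac{7}{60}~\text{h} = 8.75~\text{km},
\qquad
d_{\text{expressway}} = 90~\tfrac{\text{km}}{\text{h}} \times \tfrac{3}{60}~\text{h} = 4.5~\text{km}.
$$

Only the first estimate uses the average speed over the whole trip and therefore covers the full distance; the expressway figures describe a single segment of the journey, which is precisely why they are the less relevant pair of parameters.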

An additional and critical design principle for the complex problem scenarios was that of *persistence*, i.e., the focus was more on students being able to persist in problem solving than on actually being able to solve the problem successfully. A focus on ensuring that students solve a problem which they may not otherwise be able to in the absence of support structures necessitates the provision of relevant support structures and scaffolds during problem solving. However, a focus on persistence does not necessitate such a provision as long as the design of the problem allows students to make some inroads into exploring the problem and solution spaces without necessarily solving the problem successfully.

Based on the above design principles, it is easy to see that the problem design process minimally requires the following decisions: how many parameters are to be included, how many of these parameters are to be specified and to what degree, how many are to be relevant and to what degree, how many interactions are to be designed, how does one know the problem is designed for persistence, and so on? Of course, decisions around the above design principles were not made in isolation but as part of an iterative design and validation process that involved the teachers so that the complexity of the problem scenarios took into account the age and grade level of the students. Achieving such validation of the problem design decisions was obviously difficult in practice. To this end, validation of the problem scenarios was carried out through multiple iterations of design with two mathematics teachers at the school (one of them taught the two participating classes) as well as pilot-testing with a small but representative group of students (*n* = 14; 4 triads, 1 dyad) from the previous cohort of students from the school. The pilot tests, in particular, provided insights into and helped fine-tune the design decisions described above. The validation exercise informed the time allocation for group and individual tasks as well as the above-mentioned design principles for the complex problem scenarios, especially that of persistence.

### Lecture and practice class

The 38 students in the LP class were involved in teacher-led lectures guided by the course workbook. The teacher introduced a concept (e.g., speed) to the class, worked through some examples, encouraged students to ask questions, following which students solved problems for practice. The teacher then discussed the solutions with the class. For homework, students were asked to continue with the workbook problems. Note that the worked-out examples and practice problems were typically well-structured problems with fully-specified parameters, prescriptive representations, predictive sets of solution strategies and solution paths, often leading to a single correct answer (see Appendix 2 for examples). The well-structured problems ranged from simple to moderately difficult. This cycle of lecture, practice/homework, and feedback then repeated itself over the course of seven periods (see Table 1). Students worked independently most of the time although some problems were solved collaboratively.

In short, the LP condition represented a design that was highly structured from the beginning. The major design elements of the LP instructional design were: individual work, scaffolded solving of well-structured problems, and a high level of structure throughout the instructional cycle in the form of teacher-led lectures, proximal feedback, and regular practice, both in-class and for homework (Puntambekar and Hübscher 2005). The PF condition represented a design that delayed structure (in the form of the consolidation lecture) up until students had completed two complex problem scenarios and the corresponding what-if extension problems *without* any instructional facilitation, support structures, or scaffolds. In contrast, therefore, the major design elements of the PF instructional design were: collaborative work, unscaffolded solving of complex problem scenarios designed for persistence, and a delay of structure.

It was hypothesized that as long as students persisted in the problem-solving process, they were not only likely to try different ways of representing the problems but also develop a diversity of qualitative as well as quantitative methods for solving the problems. Based on past research, I did not expect students, who were novices with respect to the targeted content of rate and speed, to use the most effective representations and domain-specific methods for solving the problems nor did I expect them to be successful in their problem-solving efforts (Chi et al. 1981; Hardiman et al. 1989; Kirschner et al. 2006; Kohl et al. 2007). However, based on past research on PF (Kapur 2008; Kapur and Kinzer 2009), I believed that there would be a hidden efficacy in students’ persistent exploration of the problem and solution spaces for representations and methods for solving the problem even though the exploration may not necessarily result in performance success in the shorter term. There are at least two reasons for this efficacy. First, students often do not have the necessary prior knowledge differentiation to be able to discern and understand the affordances of domain-specific representations and methods during direct instruction (e.g., Amit and Fried 2005; Even 1998; Schwartz and Martin 2004; for a similar argument applied to perceptual learning, see Garner 1974; Gibson and Gibson 1955). The PF condition was one way of designing for knowledge differentiation by providing opportunities for students to develop structures—concepts, representations, and methods—for solving complex problems. Second, when concepts, representations, and methods are presented in a well-assembled, structured manner during direct instruction, students may not understand why those concepts, representations, and methods are assembled or structured in the way that they are (Anderson 2000; Chi et al. 1988; Schwartz and Bransford 1998). 
Again, the PF condition was one way of designing for knowledge assembly by providing structure (in the form of a consolidation lecture) but only after students had persisted sufficiently in problem solving.

Therefore, compared to the LP condition, the delay of structure in the PF condition may result in students attempting to assemble key ideas and concepts underlying rate and speed, as well as generate their own structures—representations and methods—for solving the complex problems (Lampert 2001; Lesh and Doerr 2003; Schwartz and Martin 2004; Spiro et al. 1992). The complex problem scenarios afforded opportunities for students to generate their own structures when structure was not provided; developing capacity to generate such structures is indeed a critical dimension of expertise (Anderson 2000; Chi et al. 1981, 1988). Consequently, such a process may be integral to engendering the necessary knowledge differentiation which may help students better discern and understand those very concepts, representations, and methods when presented in a well-assembled, structured form during the consolidation lecture (Dixon and Bangert 2004; Marton 2007; Schwartz and Bransford 1998).

It is important to note that the research design allows for a comparison between two instructional designs as wholes, not their constituent design elements. Unlike laboratory experiments, one is rarely able to isolate individual elements of an instructional design in a single classroom-based research study because it is the complexity of how the individual elements combine that gives rise to the efficacy of a particular design (Brown 1992). Furthermore, to bring about change in teacher practice and pedagogy especially in a system of high-stakes testing such as in Singapore, comparing a new instructional design (e.g., PF) with a design most prevalent in practice (e.g., LP) is an important first step towards convincing teachers and school leaders about the need for such a change. If and when a compelling comparison of such nature has been made, future studies can then unpack the various design elements further. Therefore, for a start, my strategy was to put greater emphasis on a comparison of designs as wholes vis-à-vis causal attribution of effects to design elements (Fishman et al. 2004; Tatar et al. 2008; Tharp and Gallimore 1982).

### Data sources and analysis

Data analysis procedures are described together with the results in the following section. Both process and outcome measures were analyzed. The problem-solving process was analyzed qualitatively as well as quantitatively. For qualitative process measures, I analyzed the problem representations produced by groups. I also analyzed the group discussions, which were captured in audio and transcribed. In accordance with the hypothesis, group discussions were analyzed to understand how representations produced by the groups related to the various qualitative and quantitative methods—both domain-general and domain-specific—that groups used in their attempt to solve the problems. The focus on problem representations and the associated methods for solving the problem is consistent with a substantial body of research on problem solving that underscores the importance of analyzing problem representations and domain-specific and domain-general problem-solving methods (e.g., Chi et al. 1981; Even 1998; Goldin 2002; Kaput 1999; Greeno and Hall 1997; Janvier 1987). Quantitative process measures included analyses of group solutions, individual solutions to the what-if extension problems, and the corresponding confidence ratings. Performance on the two post-tests formed the outcome measures.

## Process analyses and results

### Problem representations

In their attempts at solving the problems, groups typically produced a *diversity* of linked representations. I illustrate this pattern through an analysis of the representations produced by one of the participating groups as it solved the complex problem in Appendix 1. This group was selected on the basis of being a representative example (see Fig. 1) of the pattern.

The complex problem scenario that Fig. 1 refers to can be found in Appendix 1. Recall that the problem essentially described a scenario where two friends, Jasmine and Hady, had to get to an audition by a certain time. They could walk and/or ride a bicycle. The constraint was that they had to reach the Expo at the same time despite having different walking and biking speeds. Furthermore, a little while into their journey, one of the bicycles breaks down, forcing them to re-strategize. Groups had to determine ways in which Jasmine and Hady could ride and/or walk for different periods of time and distances to reach the audition.

Figure 1 reveals that the group used a diverse but linked set of *iconic* (e.g., house, bicycle), *graphical* (e.g., straight lines for Jasmine and Hady), *proportional* (e.g., ratios between Jasmine’s and Hady’s speeds and distances for walking and riding), and *letter-symbolic algebraic* representations (e.g., using letters *X*, *Y*, *A*, *B*, *Q* to represent unknown variables and link with other representations). For example, the group used two house-like icons and a bicycle icon to represent the problem concretely. Connected to this iconic representation is the one-dimensional graphical representation of two straight lines for Jasmine and Hady, respectively. These lines seemingly represent the total distance travelled by Jasmine and Hady. They are of equal length with several partitions on them. The first partition occurs at the same distance from the start of the lines, likely representing the point where the bicycle broke down. The lines are further partitioned, representing the different ways of apportioning the walking and riding portions of the journey. The group also used proportions to represent the ratios between Jasmine’s and Hady’s speeds and distances for walking and riding (a bike). Analysis of the group discussion around these ratios and proportions revealed a domain-specific strategy that the group came up with in their attempt to solve the problem (more on this in the following section). The representations seem linked through a common set of referents, be it text (e.g., walk, ride), numbers representing the speeds (e.g., 50 m, which is Hady’s walking speed of 50 m/min), or symbols representing unknown time variables (e.g., *X*, *Y*, *A*, *B*).

Additionally, the group set up systems of algebraic equations, S1 and S2. S1 is a system of three equations with five unknown time variables, which is obviously not solvable, but it possibly represents a frequently-used domain-specific strategy: when a mathematical quantity is calculated or expressed in two different ways, the two expressions must be equal (Zeitz 1999). S2 is another system of two equations that is used to derive the equation with four unknowns found at the bottom of Fig. 1. Again, one equation in four unknowns is obviously not solvable. I will discuss the graphical, proportional, and algebraic representations during the analysis of group discussions in the following section. For now, however, note that the use of letter-symbolic algebraic representations is significant because the introduction of algebra in the formal curriculum does not happen until after the unit on rate and speed. It seemed that the students were able to produce algebraic representations without having had any prior, formal instruction in algebra.
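The strategy of equating two expressions for the same quantity can be illustrated with the problem’s equal-arrival-time constraint. In hypothetical notation not taken from the students’ worksheets, let $D$ be the remaining distance, let the bike be dropped after a distance $x$, and let $w_J, w_H$ and $r_J, r_H$ denote Jasmine’s and Hady’s walking and riding speeds. If Jasmine rides first and Hady picks up the bike at $x$, then equating the two expressions for the (shared) travel time gives:

$$
\underbrace{\frac{x}{r_J} + \frac{D - x}{w_J}}_{\text{Jasmine: ride, then walk}}
\;=\;
\underbrace{\frac{x}{w_H} + \frac{D - x}{r_H}}_{\text{Hady: walk, then ride}}.
$$

Once the four speeds are fixed, this is a single linear equation in the single unknown $x$; the group’s systems instead carried four or five unknown time variables, which is why they were underdetermined.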

### Analysis of group discussions

Not only did groups use a variety of representations, but a qualitative analysis of the group discussions also revealed that groups, in general, generated several qualitative and quantitative concepts and methods for solving the problems. These concepts, representations, and methods can be seen as structures generated and developed by the group to solve the problem (Anderson 2000; Chi et al. 1981; Schwartz and Martin 2004). The focus of analyzing group discussions was modestly limited to an analysis of the relationship between these student-generated structures, that is, how the representations produced by the groups related to the various qualitative and quantitative methods—both domain-general and domain-specific—that groups discussed and used in their attempt to solve the problem. In so doing, I was able to triangulate the representations evidenced in group-work artifacts with the group discussions. Once again, for the purposes of an illustration, I present an analysis of the discussion of the same group whose problem representations were analyzed in the previous section.


Excerpt 1

| Turn | Speaker | Utterance |
|---|---|---|
| 1 | s3 | This is very complicated |
| 2 | s2 | I don’t even know how to start… |
| 3 | s1 | Never mind. Find the simple things first. What’s the distance? What’s their speed for walking? What’s their speed for riding? |

Excerpt 2

| Turn | Speaker | Utterance |
|---|---|---|
| 1 | s1 | Hey, Hady walks faster than Jasmine. And Hady also rides faster than Jasmine |
| 2 | s2 | Hady, yes |
| 3 | s3 | Yes. So Jasmine should ride the bike longer |
| 4 | s2 | Done |
| 5 | s1 | Wait, we have to find what the problem is, what we have to solve. A 15 km/h times 25 min equals 3.75 km. So 3.75 km…so 8.75 m. Equals 5 km. Hey, they are left with 5 km only. They left 5 km. So…what do we have to solve? Hey, start to solve already. I have everything down here already. Hey help me confirm all the speeds: the biking speed, etc. So…how to solve? |

Excerpt 3

| 1 | s1 | Ok these are the rates. Ok, they only have one bike. So Jasmine will ride first. She will ride for maybe half the way, then during the same time, this fellow, Hady, will be walking. Then after riding a certain distance, she will stop. Then, she will start walking already, she will walk, walk, walk, then when he reaches the bike…they have to reach at the same time. How to calculate? |
| 2 | s2 | That is the problem |
| 3 | s3 | Ok. Must divide equally |
| 4 | s2 | Wait… so Hady walks faster, Hady also rides the bicycle faster, so Hady should walk the most |
| 5 | s1 | Ok. It has something to do with their difference |

Excerpt 4

| 1 | s1 | Hady, he walks the same distance as Jasmine rides. Then Jasmine walks the same distance as Hady rides. It has to be like that. So…we must find the…common factor of their speeds… |
| 2 | s2 | Oh, we can use the ratios, right? |
| 3 | s1 | I know |
| 4 | s2 | Maybe not… |
| 5 | s3 | Wait, let me see… I am not sure whether this can work or not. If we use the other way right, if we use the ratio of the riding speeds right, it is 3:4 |
| 6 | s2 | We cannot find by using both the ratios. We can only use the walking speeds |
| 7 | s1 | But then the timing may be different |
| 8 | s3 | Seriously I don’t understand this part |
| 9 | s1 | This can’t be done. No matter how I think it is very complicated… |
| 10 | s2 | It is very complicated |
| 11 | s3 | We can’t. If we use the biking speed right, ratio is 4:3, not 5:3 |
| 12 | s2 | What do you mean? |
| 13 | s1 | You see, on this line, here is split to 5 here is split into 3. But then down here right, if use riding speed, down here is split to 4 down here is split to 3. Its not 5 you know |
| 14 | s2 | I know |
| 15 | s3 | Can’t find it that way |
| 16 | s2 | How to find it then? |

In utterance 1 of this excerpt, s1 argues that Jasmine’s walking distance should equal Hady’s riding distance, and vice versa, referring to the graphical representation in Fig. 1. He then proposes that they find the common factor of the speeds. As a domain-specific method, the idea is a reasonable one: if the total distance can be divided into base units—the common factor of the speeds—then the distances can be distributed in some relationship with the number of base units in each speed. However, the group once again does not develop this proposal any further. In any case, because there are two different types of speeds, walking and riding, this approach would have been problematic. That said, it is hard to say whether the group recognized the common-factor proposal to be problematic or why this proposal was not developed any further. At any rate, the common-factor proposal provides further evidence that the problem afforded students the opportunities to generate their own structures to solve the problem regardless of whether these structures could be developed further to successfully derive a solution.

In utterance 2, s2 responds by raising the possibility of using ratios, referring to the ratio of the walking and riding speeds of Jasmine and Hady. This was triangulated with the content of their group work, as the use of ratio and proportions in Fig. 1 indeed reveals. Note that using ratios to represent the qualitative insight is yet another structure generated by the group and constitutes a reasonable domain-specific method. However, this method works only if the speeds are in certain critical ratios. The simplest case is when all the speeds are equal, i.e., in a ratio of 1:1; then the distance for walking and riding should be partitioned equally. Likewise and more generally, for symmetrical ratios *k*:1 (*k* being any positive real number), the partition can be easily deduced.^{2} Although s1 seems to agree with s2’s suggestion, s2 herself is not sure whether this method would work. Then s3 realizes that the ratio of the walking speeds is different from that of the riding speeds, following which s2 argues that they should use the ratio of the walking speeds, and not the riding speeds, to find the partition. In utterance 7, s1 reasons that using just the ratio of the walking speeds may result in different overall journey times for Jasmine and Hady. This is a good argument, given that one constraint on the solution was that both Jasmine and Hady had to reach the auditions at the same time. Over the next few utterances, all group members reiterate that the problem is complicated, following which s3 reconfirms (in utterance 15) that this method will not work because the ratios of the respective walking and riding speeds are different.
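The simplest (1:1) case mentioned above can be illustrated numerically (the numbers below are my own, chosen only for illustration, and are not the group's): when both travellers share the same walking and riding speeds, dropping the bike at the midpoint equalizes their journey times.

```python
# Illustrative check of the 1:1 case: equal speeds make an equal
# partition of the distance work. Jasmine rides first, then walks;
# Hady walks first, then rides.
total_km, walk, ride = 5, 5, 15   # distance (km) and shared speeds (km/h)
drop = total_km / 2               # bike left at the midpoint

t_jasmine = drop / ride + (total_km - drop) / walk  # ride, then walk
t_hady = drop / walk + (total_km - drop) / ride     # walk, then ride

print(t_jasmine == t_hady)  # True: the equal partition satisfies the constraint
```

With unequal speeds, as in the problem, this symmetry breaks, which is exactly the difficulty the group ran into.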

Excerpt 5

| 1 | s1 | Why don’t you use guess and check? |
| 2 | s2 | Guess and check will take even longer. Wait, I am thinking, I am thinking |
| 3 | s1 | Can be possible. I found another method |

In Excerpt 5, s1 proposes the use of a domain-general method: guess-and-check, also known as trial and error. In utterance 2, s2 argues that guess and check would take even longer, suggesting that they should consider another method; the domain-general method is thus rejected as too time consuming. Once again, the group generated a structure, here a guess-and-check method for solving the problem, only to reject it because of its perceived inefficiency. In utterance 3, s1 takes up s2’s suggestion and claims that he has found yet another method to solve the problem. This is where the group began to develop an algebraic representation of the problem.

In the algebraic representation (see Fig. 1), Hady’s walking and riding times are denoted as unknowns *A* and *B*, respectively. Likewise, Jasmine’s walking and riding times are denoted as unknowns *Y* and *Z*. These unknowns can be seen multiplied by the respective speeds on the graphical representation in Fig. 1. For example, the horizontal line representing Jasmine’s total distance (the group wrote the letter *J* next to the line to refer to it) is partitioned into riding and walking distances. The group uses the product of 30 m and *Z* to represent Jasmine’s riding distance. Similarly, the product of 150 m and *Y* is used to represent Jasmine’s walking distance. Note that the group did not use the proper unit for speed of m/min and evidently also made an error in switching Jasmine’s walking and riding speeds for some reason. The products, 50 m times *A* and 200 m times *B*, accurately represent Hady’s walking and riding distances. Building on their earlier discussion and calculations, the group set up an equation each for Jasmine and Hady representing the total distance each traveled after the bicycle broke down, which according to the group’s earlier calculations was 5 km. In Fig. 1, the system of equations referred to as S1 comprises Hady’s equation, 50*A* + 200*B* = 5,000, and Jasmine’s equation, 30*Z* + 150*Y* = 5,000, in the four unknowns *A*, *B*, *Y*, and *Z*.

Excerpt 6

| 1 | s1 | Wait, wait. So that means that… |
| 2 | s3 | But the problem is |
| 3 | s1 | That is the total timing |
| 4 | s3 | I know… So who walks for longer? Who walks more? Hady right? |
| 5 | s1 | Yes |
| 6 | s2 | So distance Hady walks should be more than Jasmine riding |
| 7 | s1 | No…it’s the timing. But the distance is the same…But how to find the distance? |
| 8 | s2 | Distance is 5 km…they both have to travel 5 km. Time we still don’t know yet |

By this point the group had developed three equations involving the four unknown time variables; Excerpt 6 reveals the group’s ideas about finding the unknown time variables for the walking and riding portions of Jasmine’s and Hady’s journeys. For this, the group set up another system of equations, referred to as S2 in Fig. 1. They introduced another unknown time variable, *Q*, to represent the total time for the journey. The equations *Z* + *Y* = *Q* and *A* + *B* = *Q* represent the fact that the total times taken by Jasmine and Hady are equal, as captured in the equation *A* + *B* = *Z* + *Y*. At this juncture, the group had two systems of equations (S1 and S2) with a total of five unknown time variables, neither system being solvable for those variables. Within the larger goal of solving the problem, their sub-goal was to find the unknown time variables.
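Written out with the group's numbers from Fig. 1 (distances in meters, times in minutes; exactly how the figure distributes these equations between S1 and S2 is not fully recoverable here, so this is a sketch), the group's equations amount to:

```latex
50A + 200B = 5000, \qquad 30Z + 150Y = 5000, \qquad A + B = Q, \qquad Z + Y = Q
```

These are four independent equations in the five unknowns *A*, *B*, *Y*, *Z*, and *Q*: for any assumed total time *Q*, the component times follow, so the equations admit a one-parameter family of candidate solutions rather than a unique one. This is why the sub-goal of pinning down the times could not succeed through these equations alone.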

In utterance 1, s1 proposes that *A* (Hady’s walking time) equals *Z* (Jasmine’s riding time), and that *B* (Hady’s riding time) equals *Y* (Jasmine’s walking time). Conceptually, this proposal is incorrect. Recall that, in Excerpt 4, the group had already established that Jasmine’s riding distance equals Hady’s walking distance, and vice versa. Hence, given that Jasmine and Hady have different speeds, the times for the various walking and riding components have to be different. In the same utterance, however, s1 realizes that his proposal may be problematic. In utterance 2, s3 responds by suggesting that the problem is that *A* + *B* equals *Q*. In utterance 3, s1 responds that what s3 is referring to is the total time, to which s3 responds (in utterance 4) that she knows. In the same utterance, s3 raises the question of who walks for a longer time and checks whether it is Hady who does so. After s1 confirms (in utterance 5) that it is indeed Hady who walks more, s2 argues that the distance Hady walks should be more than the distance Jasmine rides. It is interesting that s2 makes this conceptual error even though, in Excerpt 4 earlier, the group had already established that Jasmine’s riding distance equals Hady’s walking distance, and vice versa. In any case, in utterance 7, s1 disagrees with s2 and argues that it is the timing that is different; the distances are the same. By distances, s1 seems to be referring to Jasmine’s riding and Hady’s walking distances, and not the total distance, which is clearly the same. In the same utterance, s1 asks how the distances can be found. In utterance 8, s2 misinterprets the distance to mean the total distance, and reiterates that the total distance is 5 km but the time remains unknown. It is not clear whether s2 was referring to the total time, *Q*, or the component times *A*, *B*, *Y*, and *Z*. Regardless, the remainder of the discussion revolved around finding the four unknown time variables, an effort that was expectedly and ultimately unsuccessful. The group did not come up with any additional domain-general or domain-specific methods for solving the problem, and after a long discussion spanning two class periods of 55 min each, s1’s remark at the end of the discussion aptly summed up the group’s problem-solving efforts.

Excerpt 7

| 1 | s1 | We tried so many methods but in the end, all the methods could not be used |

The analyses of the problem representations together with that of the group discussion suggest a plausible interpretation: despite producing various insights, inter-connected graphical, proportional, and algebraic representations, and methods for solving the problem, the group was ultimately unable to develop even one solution to a problem that admitted multiple solutions. These conceptual insights, representations, and methods constitute evidence that students generated various forms of structures to solve the problem. Some of these structures were rejected quickly (e.g., the guess-and-check method) while others were abandoned only when further development revealed that the structure was either not suitable for the problem (e.g., the ratios method) or led to an impasse (the algebraic method). Equally importantly, the analyses also revealed that the group did in fact persist in the problem-solving process despite repeatedly being unable to build on their representations and methods to solve the problem successfully. Classroom observations also suggested that groups were quite engaged and tended to persist in the problem-solving process. However, to confirm whether this inability to develop even one solution held true more generally across the other groups, I analyzed the solutions produced by each group as well as the solutions produced by students working individually on the what-if, extension problems.

### Group solutions

Analysis of the groups’ solutions suggested that all groups were able to identify relevant parameters such as the various distances, speeds, and times, and to perform basic calculations involving these parameters (e.g., calculating time given speed and distance, as shown in Fig. 2 earlier). However, in terms of whether groups were actually able to solve the problems successfully, there was a clear, bimodal pattern: despite their extensive exploration of the problem and solution spaces, groups either solved the problem successfully or did not. Solving a problem successfully meant that a group was able to build on its representations to devise domain-general and/or domain-specific strategies, develop at least one solution (recall that the problem admitted multiple solutions), and support it with quantitative and qualitative arguments (Chi et al. 1981; Anderson 2000; Spiro et al. 1992). For the first complex problem, only 11% of the groups managed to solve the problem; for the second, 21% did. The average success rate was thus low, at only 16%. This was not entirely surprising, because the problem scenarios were carefully designed and validated so that students would persist in attempting the problems without necessarily solving them successfully.

### Individual solutions to what-if extension problems

Analysis of the solutions produced by students for the what-if extension problems revealed a pattern similar to that of group performance. For the first and second extension problems, 3% and 20% of the students, respectively, managed to solve the problem. Thus, the average success rate for the extension problems was also low, at only 11.5%.

### Confidence ratings

After solving the individual what-if extension problems, students in the PF design condition rated the confidence they had in their solutions using a 5-point Likert scale from 0 (0% confidence) to 4 (100% confidence). The average confidence reported by students was low, *M* = 1.22, SD = .82.

### Summary

In sum, the process findings suggested that despite producing various inter-connected graphical, proportional, and algebraic representations and methods for solving the problems, students were ultimately unable to solve the problems successfully, be it in groups or individually. Their self-reported confidence in their solutions was also low. This is not to suggest that their efforts were unproductive, for the findings do suggest that students persisted in the problem-solving process. In fact, part of the argument for productive failure is that there is a hidden efficacy in a persistent exploration of the problem and solution spaces for representations and methods to solve the problem, even though such an exploration may not result in performance success in the conventional sense. However, it is precisely in the conventional sense (e.g., in a traditional classroom setting) that these findings would be considered a failure on the part of the PF students. Indeed, on conventional measures of efficiency, accuracy, and performance success, students in the PF condition seemed to have failed.^{3}

## Outcome analyses and results

Individual outcomes were measured using two post-tests. The first post-test (post-test 1) targeted content covered during the unit on rate and speed. The second post-test (post-test 2) targeted an extension concept—relative speed—that was *not* covered during the unit. The inter-rater reliabilities (*Krippendorff’s alphas*) for scoring post-tests 1 and 2 were .87 and .83, respectively.

### Post-test 1

Students from both the PF and LP classes were given 35 min to complete a 7-item post-test (*Cronbach alpha* = .78) comprising six well-structured problem items similar (though not identical) to those on the pre-test, as well as one higher-order analysis and application problem item^{4} (see below for an example of each).

#### A post-test well-structured item

David travels at an average speed of 4 km/h for 1 h. He then cycles 6 km at an average speed of 12 km/h. Calculate his average speed for the entire journey.
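For reference, a worked solution of my own (not the published answer key): the cycling leg of 6 km at 12 km/h takes 0.5 h, so

```latex
\bar{v} = \frac{\text{total distance}}{\text{total time}} = \frac{4\,\text{km} + 6\,\text{km}}{1\,\text{h} + 0.5\,\text{h}} = \frac{10\,\text{km}}{1.5\,\text{h}} \approx 6.7\,\text{km/h}
```

Note that simply averaging the two speeds would give 8 km/h, which is incorrect; average speed requires dividing total distance by total time.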

#### The higher-order analysis and application item

Hummingbirds are small birds that are known for their ability to hover in mid-air by rapidly flapping their wings. Each year they migrate approximately 8,583 km from Canada to Chile and then back again. The Giant Hummingbird is the largest member of the hummingbird family, weighing 18–20 g. It measures 23 cm long and it flaps its wings between 8 and 10 times per second. For every 18 h of flying it requires 6 h of rest. The Broad Tailed Hummingbird beats its wings 18 times per second. It is approximately 10–11 cm and weighs approximately 3.4 g. For every 12 h of flying it requires 12 h of rest. If both birds can travel 1 km for every 550 wing flaps and they leave Canada at approximately the same time, which hummingbird will get to Chile first?
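The arithmetic behind this item can be sketched as follows (my own worked example, not part of the study materials; I assume the Giant Hummingbird flaps at the midpoint of its 8–10 flaps/s range, and that rest time scales proportionally with flying time):

```python
# Each bird needs distance * flaps-per-km total wing flaps; dividing by
# its flap rate gives flying time, which the fly/rest cycle then stretches.
DISTANCE_KM = 8583
FLAPS_PER_KM = 550

def travel_hours(flaps_per_sec, fly_h, rest_h):
    """Total elapsed hours: flying time plus proportional rest time."""
    flying_h = DISTANCE_KM * FLAPS_PER_KM / flaps_per_sec / 3600
    return flying_h * (fly_h + rest_h) / fly_h

giant = travel_hours(9, fly_h=18, rest_h=6)           # ~194 h
broad_tailed = travel_hours(18, fly_h=12, rest_h=12)  # ~146 h
print("Broad Tailed arrives first" if broad_tailed < giant else "Giant arrives first")
```

Under these assumptions the Broad Tailed Hummingbird arrives first: its doubled flap rate more than compensates for its less favourable rest schedule.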

On post-test 1, after controlling for prior knowledge, students from the PF class significantly outperformed those from the LP class, *F*(1, 72) = 10.69, *p* = .002, ES (effect size) = .75. The adjusted mean performance of students in the PF class, *M* = 37.8, SD = 4.57, was better than that of students in the LP class, *M* = 33.4, SD = 6.52; an average difference of 10 percentage points, given that the maximum score possible on post-test 1 was 43. Levene’s test for homogeneity of variance was not significant. We also conducted further analysis by considering the well-structured and higher-order application items on post-test 1 separately. Findings suggested that:

- 1.
On the well-structured items, students from the PF class scored higher, *M* = 30.8, SD = 4.09, than those from the LP class, *M* = 28.9, SD = 5.13. This effect was statistically significant, *F*(1, 72) = 4.87, *p* = .019, ES = .42. However, this difference amounted to only 6 percentage points (the maximum score on these items was 32), with a low effect size. Notwithstanding, it was remarkable that PF students, who were not given any homework or practice assignments during instruction, still managed to outperform LP students on the well-structured items; the very type of items that the LP students solved and received regular practice and feedback on during instruction.

- 2.
On the higher-order analysis and application item, students from the PF class scored higher, *M* = 7.0, SD = 3.60, than those from the LP class, *M* = 4.5, SD = 3.55; an average difference of 23 percentage points (the maximum score possible on this item was 11). This effect was statistically significant, *F*(1, 72) = 8.95, *p* = .004, ES = .98.

Thus, students from the PF class outperformed those from the LP class on both the well-structured items and the higher-order analysis and application item on post-test 1, suggesting that the PF hypothesis held up to empirical evidence even in a relatively short, 2-week design intervention.

### Post-test 2

Post-test 2 immediately followed post-test 1 and lasted 15 min. The objective of post-test 2 was to determine if there were any differences between students from the PF and LP classes in their ability to learn and apply the extension concept of relative speed on their own. Whereas the concept of speed involves the motion of one body at a time, the concept of relative speed is more difficult and demanding because it involves the motion of two bodies at the same time. It is noteworthy that the formal curriculum does not cover the concept of relative speed until grade 10. Inspired by the assessment experiment designed by Schwartz and Martin (2004), two versions (A and B) of post-test 2 were created, each comprising 2 items on relative speed. Version A comprised the following two items:

#### Item 1

- a.
In 1 s, how many meters will you travel towards your friend?

- b.
In 1 s, how many meters will your friend travel towards you?

- c.
In 1 s, how many meters will the two of you travel towards each other in total?

- d.
Therefore, how many seconds will it take for the two of you to first cross each other?

#### Item 2

Two MRT trains on separate but parallel tracks are traveling towards each other. Train A is 100 m long and is traveling at a speed of 100 km/h. Train B is 200 m long and is traveling at a speed of 50 km/h. How many seconds will it take from the time that the two trains first meet to the time they have completely gone past each other?

For version B, item 2 was exactly the same as in version A, whereas for item 1, parts a, b, and c were removed leaving only part d. Thus, in version A of post-test 2, students first received a structured-response scaffold for item 1 following which the scaffold was removed for item 2. In contrast, both items were unscaffolded in version B of the post-test 2. Further note that item 2 was designed to be conceptually more challenging than item 1. Relative distance between the two runners is explicitly stated in item 1 (i.e., the distance to be covered by the two runners on the track is 400 m). In contrast, for item 2, students had to ascertain the relative distance using the lengths of the two trains before using relative speed to calculate the time.
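The arithmetic behind item 2 can be sketched as follows (a worked check of my own, not the study's answer key): the trains close at the sum of their speeds, and they have completely passed each other once their combined travel equals the sum of their lengths.

```python
# Worked check of item 2: time for two approaching trains to pass completely.
rel_speed_kmh = 100 + 50                    # closing speed of the two trains, km/h
rel_dist_m = 100 + 200                      # sum of the train lengths, m
rel_speed_ms = rel_speed_kmh * 1000 / 3600  # convert km/h to m/s
t = rel_dist_m / rel_speed_ms               # seconds from first meeting to fully past
print(round(t, 1))                          # 7.2
```

The item thus requires both steps described above: deriving the relative distance from the train lengths, then applying relative speed.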

Recall that all students from both the experimental conditions took the same post-test 1. However, for post-test 2, roughly half of the students in each condition (PF and LP) were randomly assigned to take either version A or B of post-test 2. Thus, version A and B of post-test 2 created a nested factor within the experimental condition (PF or LP), i.e., a 2(PF or LP) by 2(version A or B of post-test 2) design. This created four groups of students, namely, PF-A (18 students), PF-B (19 students), LP-A (19 students), and LP-B (19 students), where PF-A refers to students in the PF condition who received version A of post-test 2, PF-B refers to students in the PF condition who received version B of post-test 2, and so on for the LP condition.

Inter-rater reliabilities (*Krippendorff’s alphas*) were also computed for the scoring of items 1 and 2 of post-test 2. Figures 3 and 4 show the percentage of students who were successful or unsuccessful on items 1 and 2, respectively, in each of the four groups: PF-A, PF-B, LP-A, and LP-B.

For item 1 (Fig. 3), two findings are noteworthy. First, students from the PF-A group (PF students who received the scaffolded version of item 1) had a significantly greater success rate than all the other students, including students from the PF-B group who did not receive the scaffolded version, *χ*^{2}(1) = 7.98, *p* = .005. This finding suggested that students from the PF condition were better prepared to use a structured-response scaffold, when provided with one, to solve a problem involving a new concept. If this were not the case, students from the PF-A group would not have outperformed those from the PF-B, LP-A, and LP-B groups. Second, students in the PF condition seemingly demonstrated a greater capacity to generate structures in the absence of externally provided structure or scaffolds. As can be seen from Fig. 3, students from the PF-B condition who solved item 1 *without* a structured-response scaffold were just as successful as students from the LP-A condition who solved item 1 *with* the scaffold. In other words, students from the PF-B condition seemed able to generate the structures needed to solve a new problem when such structures were not provided to them. These two findings are significant because, when confronted with a novel problem, the capacities to utilize structure when it is provided and to generate relevant structures when it is not constitute important dimensions of expertise (Anderson 2000; Chi et al. 1981, 1988; Dixon and Bangert 2004; Hatano and Inagaki 1986).

For item 2 (Fig. 4), the success rates were considerably lower across the four groups than the corresponding success rates for item 1 (Fig. 3). This was expected because item 2 was designed to be conceptually more challenging than item 1, as described earlier. Still, 50% of students from PF-A managed to solve item 2 successfully, a success rate more than twice that achieved in each of the other three groups, and this difference was statistically significant, *χ*^{2}(1) = 6.77, *p* = .009. One may argue that a 50% success rate is not high, but considering that the concept of relative speed was not covered during the unit and is not formally covered until grade 10, it is arguably substantial. Finally, although Fig. 4 may suggest that students in the PF-B condition were less successful in solving item 2 than their counterparts in the LP condition, this difference was not statistically significant.

## General discussion

This study was designed to compare a productive failure instructional design with a conventional lecture and practice instructional design. I wanted to test whether there is a hidden efficacy in delaying structure in the learning and performance space of students by having them engage in unscaffolded solving of complex problems before direct instruction. Findings from the study suggest that: (a) despite seemingly failing in their collective and individual problem-solving efforts, students from the productive failure condition significantly outperformed their counterparts from the lecture and practice condition on the targeted content in post-test 1, and (b) the productive failure instructional design better prepared students to use the structured-response scaffolds as an opportunity to learn the advanced concept of relative speed, which was not even targeted during instruction. This is an important finding because an often-made, albeit implicit, instructional assumption (at least in the classrooms I have been working with) is that students are prepared to use the structure or scaffolds designed for them. From the perspective of the teacher or the instructional designer, structuring the lesson right from the start only makes good pedagogical sense. Findings of this study suggest that this assumption is not always valid. Taken together, the findings are consistent with and add to a growing body of research underscoring the role of productive failure in problem solving and learning (Clifford 1984; Kapur 2008; McNamara 2001; Schwartz and Martin 2004; VanLehn et al. 2003).

As hypothesized, explanation for the above findings comes from two interconnected mechanisms that are germane to learning. First, the productive failure design afforded opportunities for students to generate and develop their own structures—concepts, representations, and methods—for solving the complex problems (Amit and Fried 2005; Even 1998; Kapur 2008; Kapur et al. 2008; Schwartz and Martin 2004). Analysis of group discussions and artifacts did in fact reveal that students in the productive failure condition generated a diversity of inter-connected concepts, representations, and methods to solve the complex problems. The capacity to generate such structures in the absence of scaffolds is indeed an important dimension of expertise (Anderson 2000; Chi et al. 1981; Kaput 1999; Lampert 2001; Lesh and Doerr 2003).

Second, the process of generating a diverse set of structures while exploring the problem and solution spaces may have engendered sufficient knowledge differentiation even though it did not result in a successful solution. Such knowledge differentiation was critical for learning because it prepared students to better *discern* and understand those very concepts, representations, and methods when presented in a well-assembled, structured form during the consolidation lecture (Gibson and Gibson 1955; Marton 2007). Solving the complex problems influenced what and how students learned from the consolidation lecture, and helped them discern critical and relevant aspects of the concepts of rate and speed. This discernment, in turn, may have resulted in better *knowledge assembly* (Schwartz and Bransford 1998; Spiro et al. 1992). Having explored various representations and methods for solving the complex problems, students in the productive failure condition perhaps better assembled the targeted concepts and understood the affordances of the representations and methods delivered by the teacher during the consolidation lecture (Greeno et al. 1993). In other words, when the teacher explained the “correct” concepts, representations, and methods for solving the problem, students better understood not only why the correct ones work but also how and why the “incorrect” ones, the ones they had tried, did not (Kaput 1999). Taken together, the two mechanisms may explain why students from the productive failure condition outperformed their counterparts not only on the well-structured items but also on the higher-order transfer item. They also explain why these students were better prepared to make use of the structured-response scaffold (in post-test 2) to learn a new concept on their own.

### Instructional implications

Though it is hard to derive broad implications from one study, I believe the findings from this study do suggest some warrants for deriving what may be preliminary but important instructional implications, particularly as they relate to: (a) designing instruction for transfer of learning, and (b) the role of persistence in problem solving.

#### Designing for transfer

From the standpoint of transfer, the efficacy of an instructional design can be assessed in two major ways: (a) the extent to which students learn (and can apply) concepts and skills targeted during instruction—a *static* view of transfer, and (b) the extent to which students, when given an opportunity to learn, are able to learn (and apply) new concepts and skills not targeted during instruction—a *generative* view of transfer (Mestre 2005; for a discussion, see Marton 2007). According to Marton (2007), “The [static] notion of transfer as a function of sameness derives from the deeply seated view of learning as being based on repetition and habituation. However, the fundamentally fascinating question, both from the point of basic research on learning and from the point of schooling, is how learning one thing now can prepare us to learn something else in the future… One learns something in some situations, and then one becomes better at learning something else in other situations” (p. 530).

It follows that, from a static view of transfer, the efficacy of an instructional design lies in the extent to which it prepares learners to deal with situations that are similar to those experienced during the instructional cycle. From a generative view, however, the efficacy lies in the extent to which an instructional design prepares learners to deal with novel situations that are different from the ones experienced during the instructional cycle. Hence, an important instructional implication is to focus on designing experiences that prepare learners not only to deal with situations that are similar to the ones they have experienced but also those that are novel (Hatano and Inagaki 1986). On both counts, findings from this study demonstrated a greater efficacy of the productive failure instructional design.

#### The role of persistence in problem solving

Notwithstanding the fact that the present study cannot speak to the effect of the individual design elements of the productive failure instructional design, designing for persistence was central to the productive failure design and has important implications for problem solving. Many instructional designs make either implicit or explicit commitments to the Vygotskian (1978) notion of a zone of proximal development (ZPD) (Puntambekar and Hübscher 2005). The ZPD is defined as the “distance between the child’s actual developmental level as determined by independent problem solving and the higher level of potential development as determined through problem solving under adult guidance and in collaboration with more capable peers” (Vygotsky 1978, p. 86). Enabling the learner to bridge this gap between the actual and the potential requires the provision of support structures, which need not necessarily be in the form of a more capable person (e.g., a teacher, expert) but may also include tools, instructional facilitation, and so on. It is not surprising then that the notion of scaffolding, originally conceived by Wood et al. (1976), was eventually linked with the notion of ZPD (Bruner 1985).

More pertinent to the argument here is that the design of instruction in the ZPD invariably involves support structures designed to help learners successfully solve a problem or carry out a task that they would not be able to achieve on their own (Wood et al. 1976). A ZPD-driven focus on achieving performance success, therefore, clearly necessitates the provision of relevant support structures and scaffolds during problem solving (Puntambekar and Hübscher 2005). In my research program on productive failure, the focus was more on students persisting in problem solving than on actually being able to solve the problem successfully. Unlike a focus on achieving performance success, a focus on persistence does not necessitate the provision of support structures, as long as the design of the problem allows students to make some inroads into exploring the problem and solution spaces without necessarily solving the problem successfully. As the analyses revealed, students produced several representations and methods for solving the problems, which means that they were able to persist in the problem-solving process without achieving performance success. An important instructional implication is that there is efficacy in persistence itself, even though it may not lead to success in performance. Of course, an emphasis on persistence comes with its own set of problems: students have varying levels of persistence, not all students persist in problem solving, and the nature of their persistence varies. How the extent and nature of students' persistence relate to learning remains an open and important question for future research.

### Limitations and future work

It is of course much too early, nor is it my intent, to attempt broad generalization of the claims based on a single study; the scope of inference holds only under the conditions and settings of the study and is thus circumscribed by the content domain, communication modality, age group, socio-cultural factors, and so on. Furthermore, these findings may only be attributed to the productive failure instructional design as a whole and not to its constituent design elements: collaboration, unscaffolded solving of complex problem scenarios designed for persistence, and delay of structure (Brown 1992). The reality of working in real classroom contexts rarely, if at all, allows strict causal attribution to design elements to be achieved in a single study. However, recall that my strategy was to put greater emphasis on a comparison of designs as wholes vis-à-vis causal attribution of effects to design elements (Fishman et al. 2004; Tatar et al. 2008; Tharp and Gallimore 1982). As such, this study essentially presents an existence proof for a productive failure instructional design, and much future research needs to be carried out to provide fuller explanatory support. Going forward, therefore, suggestions for future research may be broadly categorized into two major areas.

First, future research would do well to extend this study to larger samples within and across schools. At the same time, extending to target units other than rate and speed is also important, because I do not expect the productive failure design to be equally effective for all curricular units. Such replications and extensions of productive failure research form the focus of my current research program, and they may also help in unpacking the effect of the various design elements insofar as it is meaningful and possible to do so. For example, one could always argue that students in the PF condition performed better on the post-tests because they had more collaborative activities built into the larger design. This is a perfectly valid argument that the present study was not designed to address, given its a priori focus on comparing instructional designs as wholes. However, I hope to address it in part in the upcoming cycle of studies, where we are designing the LP condition to have a similar emphasis on collaborative activities. Even so, it will still not be possible to unpack the effect of collaboration per se, but at least we will have some modest comparisons that may provide insight into the dynamics of the design elements, and in so doing, help in our efforts towards fuller explanatory support for productive failure.

Second, what also needs immediate unpacking is the variation within PF groups. While, on average, PF groups exhibited productive failure, some naturally “failed” more than others in the shorter term, just as some gained more than others in the longer term. What explains this variation *within* the PF condition? Examining learner characteristics as well as the nature of interactional behaviors, and relating them to eventual gains in group and individual performance, would be most insightful. For example, examining productive failure from cognitive aspects alone is not sufficient; learners’ motivation, engagement, and frustration thresholds for solving complex problems may be particularly critical as well, and low frustration thresholds and motivation levels may not result in productive failure. Further interactional analyses might speak to these concerns and add further explanatory power to productive failure. Therefore, much more work, including follow-up studies, has to be carried out before any meaningful findings emerge.

## Conclusion

In the classrooms that I have been working in, the conventional bias has typically been towards heavy structuring of instructional activities right from the start. The basic argument is: why waste time letting learners make mistakes when you could simply give them the correct understandings? This arguably makes for an efficient process, but what PF suggests is that processes that may seem inefficient and divergent in the shorter term potentially have a hidden efficacy about them, provided one can extract that efficacy. The implication is that not overly structuring the early learning and problem-solving experiences of learners, and leaving them to persist and possibly fail, can be a productive exercise in failure. I am neither alone nor the first to advocate this, and research on productive failure only adds weight to the growing number of voices that have alluded to resisting an all-too-common, efficiency-dominant rush to overly structure the learning and performance space of learners (Dillenbourg 2002; Scardamalia and Bereiter 2003). Perhaps one should resist the near-default rush to structure learning and problem-solving activities, for it may well be more fruitful to first investigate the conditions under which instructional designs lead to productive failure as opposed to just failure.

## Footnotes

1. Minimal language and grammar editing was necessary to make them readily accessible to a wide audience, because students often use a local variant of English called “Singlish” (short for Singapore English) in their interactions with each other.

2. For example, if Jasmine’s walking speed is greater than Hady’s by a certain proportion, say *k*, and if Hady’s riding speed is greater than Jasmine’s by exactly the same proportion *k*, then the ratio method works well to suggest a partition that is also in the same proportion *k*.

3. It is important to note that the process findings also double as a manipulation check, demonstrating that students in the PF condition experienced failure at least on the above-mentioned conventional measures. In contrast, students in the LP condition, *by design*, repeatedly experienced performance success in solving well-structured problems under the close monitoring, scaffolding, and feedback provided by the teacher. This is also why the process analyses focus only on the PF condition.

4. Due to practical time constraints, only one 55-min class period was available for carrying out the two post-tests. Hence, we were unable to include more than one higher-order analysis and application item on the post-test, because such items demand more time and effort. As a result, the higher-order application problem on the post-test did not match the complexity of the ill-structured problem used for group work (such as the one in Appendix 1), for which groups were given two class periods to solve.

## Acknowledgments

The research reported in this paper was funded by a grant from the Learning Sciences Lab of the National Institute of Education of Singapore. I would like to thank the students, teachers, the head of the department of mathematics, and the principal of the participating school for their support for this project. I am also grateful to Professors David Hung, Katerine Bielaczyc, Katherine Anderson, Liam Rourke, Michael Jacobson, and anonymous reviewers for their insightful comments and suggestions on earlier versions of this manuscript. Parts of this manuscript have also been presented at the 2008 meeting of the Cognitive Science Society.