Introduction

The Productive Failure (PF) approach is an instructional design that combines an initial problem-solving phase with a subsequent direct instruction phase (Kapur, 2012; Kapur & Bielaczyc, 2012). During the problem-solving phase, learners work on a challenging, novel problem, which prepares them for learning from the subsequent instruction. Students who attempt to solve a problem prior to instruction have been shown to outperform students who receive direct instruction followed by problem solving with regard to conceptual knowledge; this is known as the PF effect (e.g., Kapur, 2012, 2014; Loibl & Rummel, 2014a, 2014b; see Loibl et al., 2017; Sinha & Kapur, 2021 for an overview). The PF approach is one variant of the instructional designs within the framework of problem solving prior to instruction (PS-I). While PS-I also includes similar instructional designs such as ‘invention as preparation for future learning’ (see e.g., Schwartz & Martin, 2004; Schwartz et al., 2011), PF differs from other PS-I settings in its specific design principles, which have been labelled PF fidelity criteria (Sinha & Kapur, 2021). The PF fidelity criteria specify the design of the problem-solving phase and the instruction phase. That is, the problem needs to afford the generation of multiple solution attempts and have an affective draw that triggers students’ situational interest (Sinha & Kapur, 2021). The students need to work in groups, and motivational scaffolds should be provided to create social surround facilitation during problem solving (Sinha & Kapur, 2021). Subsequently, the instruction phase needs to build on student-generated solutions and, as in the problem-solving phase, create social surround facilitation by encouraging conversation and active participation (Sinha & Kapur, 2021).
The original PF task, which includes all these design principles, deals with the topic of variance (i.e., standard deviation) in the domain of mathematics (Kapur, 2012). The topic is part of the secondary school curriculum; the task therefore addresses (mid) secondary school students. As this original PF task has been used in various studies that have replicated the PF effect (e.g., Kapur, 2012, 2014, 2015; Loibl & Rummel, 2014a, 2014b), we will refer to it as the traditional PF design in the following.

As highlighted by the PF fidelity criteria, collaboration is described as one of the essential design components of the problem-solving phase in PF (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021). It is assumed to facilitate the preparation for subsequent learning by enabling beneficial processes such as explanation and elaboration within the group (Kapur & Bielaczyc, 2012). While most studies in PF adhered to the fidelity criteria and employed a collaborative problem-solving setting (e.g., Kapur, 2012; Kapur & Bielaczyc, 2012; Loibl & Rummel, 2014b), some studies indicate that individual problem-solving activities can also replicate the PF effect (e.g., Kapur, 2014, 2015; Loibl et al., 2020). However, these studies did not compare individual to collaborative problem solving. If individual problem solving also prepares students for learning in PF, this raises the question of whether collaboration is indeed an essential design component that prepares for learning better than individual problem solving. In fact, so far, most studies that have experimentally varied collaborative and individual problem solving in PF settings (see Mazziotti et al., 2019; Sears, 2006; Weaver et al., 2018) found no significant differences in the post-test between collaborative and individual PF activities. However, these studies diverged substantially from the traditional PF design (i.e., they relied on different age groups and domains) and violated important PF fidelity criteria with regard to task design (e.g., not affording the generation of multiple solutions). Indeed, studies that have examined the effectiveness of PF in other domains and age groups have led us to assume that boundary conditions for the PF effect do exist (see Nachtigall et al., 2020 for PF in the domain of social sciences; see Mazziotti et al., 2019 for PF with younger students).
For these reasons, it remains unclear why the studies that compared individual to collaborative problem solving did not find an effect of collaboration: because they violated important design principles of a PF study, because they used a domain or age group that is not suitable for PF, or because collaboration might indeed not be as relevant to the PF design as is assumed. Thus, in order to learn about the true effect of collaboration in PF, a study is needed that compares individual to collaborative problem solving in the traditional PF design for which the PF effect has been shown.

The present study aims to fill this gap in the investigation of collaboration in PF. We investigate the effect of collaboration on conceptual learning in PF within the traditional PF setting of secondary school mathematics. We specifically focus on conceptual knowledge acquisition, as it is the main learning outcome that is facilitated by the PF design (Loibl et al., 2017). In contrast, the PF design does not aim to foster procedural learning and, thus, usually shows no special effect with regard to procedural knowledge, while results on transfer are inconsistent (Loibl et al., 2017). In the following, we first critically examine previous studies that have compared collaborative and individual problem solving in PF and subsequently focus on the cognitive mechanisms that could relate to collaboration and afford learning in PF.

Collaboration in productive failure: previous studies

In the PF problem-solving phase, students deal with an unfamiliar mathematical problem and are instructed to generate as many solutions to it as possible (Kapur & Bielaczyc, 2012). As they still lack the necessary knowledge to solve the problem, they struggle and generate incomplete or erroneous solutions. Subsequently, they receive a direct instruction on the to-be-learned concept. In this instruction, erroneous student solutions are contrasted with the canonical solution, leveraging students’ activated prior knowledge from the previous problem-solving phase, and connecting it to the conceptual features of the canonical solution (Loibl et al., 2017).

Collaboration is regarded as one of the main design principles of PF and is assumed to be a condition for the success of the PF effect (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021). Students are expected to struggle together to produce solutions to the unknown problem. Even in groups, students are not yet capable of solving the problem; however, collaborative problem-solving processes, such as explanation and elaboration of the erroneous solutions, are assumed to prepare students for subsequent instruction (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021). Indeed, qualitative studies that adhered to PF fidelity and compared collaboration processes in PF (albeit without a contrast to individual problem solving) showed that collaboration did influence students’ learning from instruction. In these studies, the quality of the collaboration process during the problem-solving phase significantly affected the students’ post-test scores (Kapur & Bielaczyc, 2012; Kerrigan et al., 2021; Nachtigall & Sung, 2019).

Most of the existing PF studies across different domains have implemented a collaborative PF design (e.g., Chowrira et al., 2019; Kapur, 2014; Loibl & Rummel, 2014a; Nachtigall & Sung, 2019) and have replicated the PF effect, suggesting that collaboration is indeed an essential component of the PF design. Yet, studies with individual problem solving have also shown the PF effect (e.g., Hartmann et al., 2020; Kapur, 2014, 2015). This raises the question whether collaboration indeed affects students’ preparation for learning from instruction better than individual problem solving in PF.

On an empirical level, only a few studies have compared the effect of collaborative and individual problem solving on preparation for learning in PF. While these studies did not reveal significant differences between collaborative and individual conditions, the findings are largely inconclusive with regard to the effect of collaboration in PF for two reasons: (1) the studies diverged considerably from the domain or age group of the traditional PF design for which the PF effect has been shown, or (2) the design of the study lacked PF fidelity and therefore violated important PF design principles. Consequently, the lack of an effect might not mean that collaboration is unimportant to PF; instead, the PF task design might not have been realized adequately, or the domain and age group might not have been suitable for the PF design.

Mazziotti et al. (2019) experimentally varied collaborative and individual problem solving with fifth graders in mathematics. No significant differences between conditions were found. While they adhered to PF fidelity criteria (see Sinha & Kapur, 2021), they examined an age group that diverged from that of the mid secondary school students in the traditional PF design. In line with this result, the meta-analysis by Sinha and Kapur (2021) indicates that PF is less effective for younger learners. The lack of a PF effect for younger students indicates a potential boundary condition of PF (Mazziotti et al., 2019).

Weaver et al. (2018) focused on undergraduate students in physics. The collaborative and individual conditions did not differ in the post-test, but they did differ in their performance during the learning activity, which was superior in the collaborative condition. However, Weaver et al. (2018) violated PF fidelity criteria with regard to the design of the problem-solving and instruction phases: students did not generate multiple solutions, and instruction was not built on student solutions (Sinha & Kapur, 2021). Moreover, Weaver et al. (2018) focused on an older age group and a different domain than the traditional PF design.

Sears (2006) relied on a mathematics task for undergraduate students. He found that students working collaboratively scored significantly lower on a comprehension post-test than individually working students, but higher on far-transfer problems. However, again, the study lacked PF fidelity, as the task did not require students to generate multiple solutions and student solutions were not discussed during instruction (Sinha & Kapur, 2021). Moreover, Sears’ (2006) sample was drawn from an older age group than in traditional PF designs.

Overall, current experimental findings seem to be inconclusive: while most studies did not find significant differences between collaborative and individual preparation for learning in post-tests, some have found beneficial effects of collaboration on process and transfer measures. Even more importantly, the studies do not provide evidence on the true effect of collaboration in PF, as they lack PF fidelity and focus on different domains and age groups than the traditional PF design for which the PF effect has been shown. The absence of effects may therefore be due to the divergence from the traditional PF design and from PF fidelity, rather than to collaboration not contributing to PF.

In conclusion, a study is required that investigates the true effect of collaboration by comparing students’ preparation for learning through collaborative and individual problem solving in the traditional PF setting and with adherence to PF fidelity criteria. Only in this way can reliable empirical evidence for the necessity of collaboration as a design criterion in PF be established. In the following, we examine how collaboration might specifically prepare students for subsequent learning.

The preparatory function of collaborative problem solving

The instructional effect of the traditional PF setting is assumed to be rooted in three preparatory mechanisms: prior knowledge activation, awareness of knowledge gaps, and recognition of deep features (Loibl et al., 2017). If collaboration is indeed a prerequisite for students’ preparation for learning from instruction, it could be assumed that it is directly related to the three preparatory mechanisms that facilitate students’ learning.

It is expected that students need to activate their prior knowledge during the problem-solving phase when they generate solutions to the unknown problem (Kapur & Bielaczyc, 2012; Loibl et al., 2017). The generated solutions are regarded as a proxy for prior knowledge activation (Sinha & Kapur, 2021). Kapur and Bielaczyc (2012) propose that collaboration facilitates the activation and differentiation of prior knowledge by “enrich[ing] the shared representational and solution spaces” (p. 51). This suggests that collaboration helps students activate broader prior knowledge than individual problem solving. Indeed, in collaborative problem solving, students are required to create a shared understanding of the task by sharing their own prior knowledge and functioning as a pool of resources (Schwartz, 1995; Wittenbaum et al., 2002). In an overview of the cognitive advantages and disadvantages of collaboration, Nokes-Malach et al. (2015) draw on studies of collaborative recall (Congleton & Rajaram, 2011; Johansson et al., 2005) to highlight how shared knowledge might help to cue or complement prior knowledge. They conclude that collaboration can make greater cognitive and problem-solving resources available (Nokes-Malach et al., 2015).

However, evidence on whether knowledge is shared effectively and efficiently is mixed (Nokes-Malach et al., 2015). For instance, the brainstorming literature showed that group members block each other’s idea generation and retrieval (Basden et al., 1997; Diehl & Stroebe, 1987). Yet, Nokes-Malach et al. (2012) found that collaborative inhibition did not occur with novices who had some relevant prior knowledge, as opposed to experts or participants with no prior knowledge at all. Similarly, in PF, students have some relevant prior knowledge that allows them to generate multiple solutions, but not the necessary knowledge to find the correct solution to the problem (Loibl et al., 2017). This could lower the risk of collaborative inhibition in this setting. Consequently, collaboration in PF could help students to amplify their prior knowledge, leading to a broader prior knowledge activation than in individual problem solving. Such a broader activation could become apparent through a higher quantity of generated solutions in collaborative than in individual problem solving. As activated prior knowledge facilitates students’ processing of the subsequent instruction, a broader activation could yield a more beneficial preparation for learning from instruction for the collaborative condition (Loibl et al., 2017).

During problem solving, the students largely generate erroneous or incomplete solutions due to their limited prior knowledge (Kapur & Bielaczyc, 2012; Loibl et al., 2017). The erroneous solutions prompt students to reflect on their own present knowledge and to become aware of their knowledge gaps, which can then be filled more easily during instruction (Loibl et al., 2017). In a collaborative setting, students not only share, but also discuss their prior knowledge (Schwartz, 1995). This gives them the opportunity to detect and correct errors (Nokes-Malach et al., 2015; Rajaram & Pereira-Pasarin, 2010). Engaging in discussion of potentially conflicting views or discrepancies in prior knowledge might cause cognitive conflict that is fruitful for the group's collaboration. Experiencing and attempting to solve socio-cognitive conflict has been established as one of the key beneficial processes that enables successful collaborative learning (Mugny & Doise, 1978). Attempting to solve conflict requires students to explain and reflect upon possible misconceptions and gaps in their own knowledge (King, 2007). This could help students in a collaborative setting to gain a higher awareness of knowledge gaps compared to students who work individually. Learning about one’s own knowledge gaps may also raise the students’ curiosity and motivate them to learn about the canonical solution during the subsequent instruction phase (Glogger-Frey et al., 2015). Due to a more pronounced awareness of knowledge gaps, students of a collaborative PF condition may report a higher curiosity to learn about the canonical solution. Therefore, socio-cognitive conflict may enable several potentially beneficial processes that could prepare students better for subsequent learning than individual students.

Furthermore, trying to solve socio-cognitive conflict with the help of group discussion may help students to unravel some of the deep features of the canonical solution. Collaboration is assumed to facilitate attention to, and explanation and elaboration of critical features of the canonical solution (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021). This is supported by literature on collaborative learning, which shows that groups who experience socio-cognitive conflict make progress in resolving it despite still lacking the necessary knowledge (Mugny & Doise, 1978), similar to the PF setting. Thus, socio-cognitive conflict could also beneficially affect the last preparatory mechanism: deep feature recognition. Even though deep feature recognition is mostly afforded during the instruction phase, when student solutions are compared and contrasted with the canonical solution, resolving socio-cognitive conflict in collaboration could enable students to unravel deep features of the canonical solution already before instruction (see Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021), expanding beyond their individual limited knowledge and capabilities (see e.g., Mugny & Doise, 1978). Such progression in the recognition of deep features of the targeted concept would likely become apparent in students’ generated solutions, resulting in a higher quality of solutions, as they already include important features of the canonical solution. Due to the early recognition of deep features, students of a collaborative condition likely are able to structure and integrate new knowledge during instruction better than individual students who only then learn about the deep features (Loibl et al., 2017).

Overall, on a theoretical level, collaboration can be expected to facilitate each of the preparatory mechanisms in PF. This suggests that, in a traditional PF setting, collaborative problem solving prepares students better for learning from subsequent instruction than individual problem solving, which leads to higher conceptual knowledge after instruction.

Research questions and hypotheses

We examine the effect of collaborative (vs. individual) problem solving for the preparation of learning in PF within an experimental study in the traditional PF setting of secondary school mathematics. In PF, students’ preparation for learning is assumed to be facilitated by three preparatory mechanisms: prior knowledge activation, awareness of knowledge gaps, and deep feature recognition. We argue that collaboration specifically affords each of these three preparatory mechanisms, leading to better conceptual knowledge after instruction than individual problem solving. Prior knowledge activation becomes apparent in the quantity of solutions, and deep feature recognition in the quality of solutions. Awareness of knowledge gaps is connected to students' curiosity, both of which are assessed by students’ self-reports.

To test our assumptions, we propose the following hypotheses and mediation model (for an overview, see Fig. 1):

Fig. 1 Mediation hypotheses

H1

We expect that students of the collaborative condition outperform those in the individual condition with regard to conceptual knowledge after instruction.

H2

Collaboration is assumed to foster prior knowledge activation, apparent in a higher quantity of generated solutions (mediator 1), which mediates the effect of the condition on conceptual knowledge after instruction.

H3

Students of the collaborative condition are more likely to experience socio-cognitive conflict and thus become aware of their knowledge gaps (mediator 2a), which in turn raises their curiosity (mediator 2b), and finally, their conceptual knowledge after instruction.

H4

Socio-cognitive conflict could help collaborative students to recognize deep features of the targeted concept, visible in a higher quality of solutions (mediator 3). The quality of solutions is hypothesized to mediate the effect of the condition on conceptual knowledge after instruction.

Method

Sample and design

The sample was selected based on the availability of secondary schools in a densely populated area of Germany. Two secondary schools agreed to participate in the study with nine classes in total (Footnote 1). The sample consisted of 162 tenth-grade students. Each student within a class was randomly assigned to one of two experimental conditions: a collaborative PF condition (PF COLL: n = 105), divided into 35 groups of three, and an individual PF condition (PF IND: n = 57). Twenty-five students had to be removed from the original sample as they did not complete one of the phases. The final sample for analysis consisted of 137 participants (PF COLL: n = 87; PF IND: n = 50; Footnote 2), who were on average 16.16 years old. More female than male students participated in the study (male: n = 44, female: n = 93). In order to take the dropout into account, we ran a post-hoc Monte Carlo power analysis for mediation effects using the online app provided by Schoemann et al. (2017). The power was calculated for each path of the mediation model separately (see Schoemann et al., 2017 for recommended simulation inputs). Based on previous studies, we expected a medium effect for each of the paths in the mediation model (see Fig. 1). Ideally, this effect should be detectable with a power of 0.80 (see Cohen, 1992). To calculate the power of the mediation paths as precisely as possible, we determined the smallest effect size that would be detectable with our sample with a power of around 0.80 and additionally provided the power for the medium effect size that we were expecting. The effect sizes were based on Fritz and MacKinnon (2007) and entered into the app provided by Schoemann et al. (2017) to yield the power of each path for an alpha level of 5%.
This simulation revealed that our sample was sufficient to detect mediation effects not only of medium size (0.39) but also of a size between small and medium (0.26, labelled small-medium from here on; see Fritz & MacKinnon, 2007 for effect sizes) for all paths of the model (see Fig. 1), with a power of 0.78 or higher. Specifically, for a1b1 (quantity of solutions), the power was 0.78 for a small-medium effect and 0.90 for a medium effect; for a2b2 (quality of solutions), the power was 0.81 for a small-medium effect and 1.00 for a medium effect of 0.39; for a3d43b4 (awareness of knowledge gaps and curiosity), the power was 0.86 for a small-medium effect of 0.30 and 1.00 for a medium effect.
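The logic of a simulation-based power analysis for an indirect effect can be illustrated with a minimal sketch. It is not the procedure of the app by Schoemann et al. (2017): as a simplification, it uses a joint-significance test (both the a path and the b path must be significant) with a normal critical value, assumes a single mediator, standardized variables, and no direct effect. The function names, the number of simulations, and the seed are our own illustrative choices.

```python
import numpy as np

def coef_and_se(X, y):
    """OLS coefficients and standard errors; X must contain an intercept column."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * xtx_inv))

def simulated_power(a, b, n, n_sims=2000, z_crit=1.96, seed=1):
    """Share of simulated samples in which both path a and path b are significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumed data-generating model: X -> M -> Y with standardized paths.
        x = rng.standard_normal(n)
        m = a * x + np.sqrt(1 - a**2) * rng.standard_normal(n)
        y = b * m + np.sqrt(1 - b**2) * rng.standard_normal(n)
        ones = np.ones(n)
        beta_m, se_m = coef_and_se(np.column_stack([ones, x]), m)
        beta_y, se_y = coef_and_se(np.column_stack([ones, x, m]), y)
        a_significant = abs(beta_m[1] / se_m[1]) > z_crit
        b_significant = abs(beta_y[2] / se_y[2]) > z_crit
        hits += int(a_significant and b_significant)
    return hits / n_sims

# Path sizes 0.26 and 0.39 and n = 137 are taken from the text above.
power_small_medium = simulated_power(0.26, 0.26, n=137)
power_medium = simulated_power(0.39, 0.39, n=137)
```

Because the test and the data-generating model differ from those of the app, the resulting values should not be expected to reproduce the reported power estimates exactly.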

Procedure

The study was divided into two phases (see Fig. 2). In phase 1, students worked on a 20-minute pre-test before they were randomly assigned to one of the experimental conditions (PF COLL or PF IND). Students in the collaborative PF condition were randomly assigned to groups of three and instructed to work together. Both conditions worked on the mathematical problem for 45 minutes. After the problem-solving phase, students were asked to report their awareness of knowledge gaps and curiosity. Phase 2 consisted of a 45-minute instruction on the targeted concept of variance. The instruction highlighted typical student errors to explain the different components of the canonical solution and was based on the instruction by Loibl and Rummel (2014b). Finally, students completed a 30-minute post-test that measured knowledge outcomes.

Fig. 2 Procedure of the study

Material and instruments

The materials and instruments in this study were based on those in previous traditional PF studies (e.g., Hartmann et al., 2021, 2022a, 2022b; Kapur, 2012; Loibl & Rummel, 2014a). During the problem-solving phase, students engaged with the traditional mathematical problem on the concept of variance (e.g., Kapur, 2014; Loibl & Rummel, 2014a). The students were asked to determine the most reliable soccer player based on a list of three players’ goal scores across ten years. Students were instructed to generate as many solutions to this problem as possible.

As control variables, we assessed students’ prior knowledge, mathematical ability, and mathematical self-concept. Prior knowledge and mathematical ability were selected because they are frequently included as control variables in PF studies and have often been shown to affect students’ learning (e.g., Kapur, 2012, 2014; Loibl & Rummel, 2014a, 2014b). We additionally included mathematical self-concept, as it could be relevant to students’ learning in the context of PF by affecting how students handle struggling when they deal with a challenging and unfamiliar mathematical problem. A short pre-test of five tasks (Cronbach’s alpha = 0.388; Footnote 3) was adopted from Loibl and Rummel (2014a) to measure prior knowledge on descriptive statistics. For the prior knowledge test, inter-rater reliability between two raters was calculated for 10% of the data and yielded an almost perfect agreement (ICC(3, k) = 0.986, 95% CI [0.958, 0.995]).

Students’ mathematical ability was determined by the students’ last two grades in mathematics. For the students’ mathematical self-concept, a scale by Rost and Sparfeldt (2002) measured the students’ beliefs about their skills and achievements in mathematics (e.g., “It is easy for me to solve problems in mathematics.”) with eight items on a 6-point Likert scale (Cronbach’s alpha = 0.93).
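For readers unfamiliar with the internal consistency coefficient reported for the scales, the following minimal sketch shows how Cronbach’s alpha is computed from an item-response matrix. The response data are hypothetical and purely illustrative; they are not taken from the study, and the function name is our own.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; rows are respondents, columns are scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical answers of five students to three 6-point Likert items.
responses = np.array([
    [5, 6, 5],
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [6, 5, 6],
])
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 when the items vary together across respondents and drops when item variances are large relative to the variance of the sum score.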

The mediation variables quantity of solutions, awareness of knowledge gaps, curiosity, and quality of solutions were assessed as follows: the quantity of each student’s generated solutions was based on the number of separate solution attempts that were generated during the problem-solving phase. After the problem-solving phase, we collected students’ self-reported awareness of knowledge gaps (e.g., “During the learning phase, I have realized that I do not yet know some things.”) with seven 6-point Likert scale items (Cronbach’s alpha = 0.694) adopted from Loibl and Rummel (2014b) and Glogger-Frey et al. (2015). The students’ self-reported curiosity was measured using five items (e.g., “Now, I would like to know more about the task and content.”) (Cronbach’s alpha = 0.90) on a 6-point Likert scale (Naylor, 1981).

The measure of solution quality was based on a coding scheme developed by Loibl and Rummel (2014a): each solution attempt of a student was scored depending on how many components of the canonical solution it included. The score ranged from zero (i.e., none of the canonical solution components were included) to four (all components included). A student’s quality of solutions was based on their highest-rated solution attempt. Two raters coded 10% of the data with an almost perfect agreement (ICC(3, k) = 0.979, 95% CI [0.935, 0.993]).
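The two solution-based measures described above can be summarized in a short sketch: quantity is the number of separate attempts, and quality is the highest component score among them. The function name and the choice to assign a quality of zero to a student without any attempt are our own assumptions for illustration.

```python
def solution_measures(attempt_scores):
    """Return (quantity, quality) for one student's coded solution attempts.

    attempt_scores: one code per attempt, from 0 (no canonical component)
    to 4 (all canonical components).
    """
    if any(not 0 <= score <= 4 for score in attempt_scores):
        raise ValueError("each attempt must be coded between 0 and 4")
    quantity = len(attempt_scores)
    # Assumption: a student without attempts receives a quality score of 0.
    quality = max(attempt_scores, default=0)
    return quantity, quality

# Hypothetical student with three attempts coded 1, 3, and 2.
quantity, quality = solution_measures([1, 3, 2])
```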

As the dependent variable, we measured conceptual knowledge on the target concept of variance in a post-test. The test included eight mathematical tasks and was a revised version of the post-test by Loibl and Rummel (2014a). It had a satisfactory internal consistency (Cronbach’s alpha = 0.732). We calculated inter-rater reliability for 10% of the post-test data and found high consistency between the two raters (ICC(3, k) = 0.956, 95% CI [0.770, 0.988]).

Statistical analysis

For all analyses, we included the highest number of students possible. As some students had missing data (e.g., in the prior knowledge and intermediate test), these participants were excluded from the respective analyses. That is, one participant each was excluded from the analyses of the variables mathematical self-concept and awareness of knowledge gaps, resulting in two missing students for the overall mediation analysis (N = 135). Moreover, for all analyses, we used two-tailed tests.

In preparation for the mediation analysis, we first assessed the relevance of three covariates for students’ learning: prior knowledge, mathematical ability, and mathematical self-concept. Each of these variables was included in the mediation analysis if it correlated with the dependent variable conceptual knowledge. For this, Pearson correlations were calculated. All covariates of the model were mean-centered prior to analysis for easier interpretation (see Hayes, 2018). To check for potential confounds, we also tested for significant differences between conditions in the covariates.
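The covariate screening described above can be sketched as follows. The example mean-centers a covariate and computes its Pearson correlation with the outcome; it also shows that centering shifts the scale without changing the correlation. All values are hypothetical, and the helper names are our own.

```python
import numpy as np

def mean_center(values):
    """Subtract the sample mean so that the centered variable has mean zero."""
    values = np.asarray(values, dtype=float)
    return values - values.mean()

def pearson_r(x, y):
    """Pearson product-moment correlation between two variables."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical scores of six students.
prior_knowledge = np.array([2.0, 4.0, 3.0, 5.0, 1.0, 4.0])
conceptual_knowledge = np.array([5.0, 9.0, 6.0, 10.0, 3.0, 7.0])

centered = mean_center(prior_knowledge)
r_raw = pearson_r(prior_knowledge, conceptual_knowledge)
r_centered = pearson_r(centered, conceptual_knowledge)
```

Centering therefore affects only the interpretation of the regression intercepts, not the relationship between covariate and outcome.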

Furthermore, we checked our data for potential dependencies. As our analyses partly dealt with individual assessments of participants who collaborated in a group, dependencies between members of one group in individual variables (e.g., the mediator variable awareness of knowledge gaps and the dependent variable conceptual knowledge) could occur and threaten the validity of the analysis. Specifically, such dependencies could increase the risk of Type I errors (see Kenny et al., 1998) and violate one assumption of mediation analysis (Hayes, 2018). In order to deal with the potential dependencies, we corrected the standard errors in our mediation model for clustering (see Cheah, 2009, for further information on this method).
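To illustrate what a correction of standard errors for clustering does, the following sketch implements a generic cluster-robust (sandwich) estimator for an OLS regression. It is an assumption that this basic variant, without small-sample adjustment, resembles the correction applied by the software used in the study; the code is not that implementation. The cluster layout mirrors the originally assigned sample (35 groups of three and 57 individual students), while the outcome values are simulated and hypothetical.

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS coefficients with cluster-robust standard errors (no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for cluster_id in np.unique(clusters):
        rows = clusters == cluster_id
        score = X[rows].T @ resid[rows]   # summed score of one cluster
        meat += np.outer(score, score)
    covariance = bread @ meat @ bread
    return beta, np.sqrt(np.diag(covariance))

rng = np.random.default_rng(0)
group_ids = np.repeat(np.arange(35), 3)   # 105 students in 35 groups of three
solo_ids = np.arange(35, 35 + 57)         # 57 students, each their own cluster
clusters = np.concatenate([group_ids, solo_ids])
condition = np.concatenate([np.ones(105), np.zeros(57)])
shared_noise = rng.standard_normal(35 + 57)[clusters]  # identical within a cluster
outcome = 0.5 * condition + shared_noise + rng.standard_normal(162)

X = np.column_stack([np.ones(162), condition])
beta, se = cluster_robust_se(X, outcome, clusters)
```

When every observation forms its own cluster, this estimator reduces to the usual heteroskedasticity-robust standard errors, which is why only the collaborative groups contribute additional dependency information.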

We planned the mediation analyses based on mediation model 82 as described by Hayes (2018). For this, as well as for the correction for clustering, we used the SPSS macro PROCESS. In PROCESS, we tested the whole model; in the results section, we additionally report each path of the model. The analysis of indirect effects was based on 5000 bootstrap samples. We chose 95% bootstrap confidence intervals and an alpha level of 5% for all statistical analyses, and we provide adjusted p-values (based on the Holm-Bonferroni correction) for the mediation analysis. We provide confidence intervals for all relevant parameters and use the adjusted p-values (referred to as p’ in the following) to account for alpha error accumulation resulting from potential dependencies in the data as well as from the high number of tests in the mediation analysis. After the mediation analyses, we added further explorations of the variable quality of solutions.
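The Holm-Bonferroni adjustment mentioned above can be sketched in a few lines. The p-values are ordered from smallest to largest, the i-th smallest is multiplied by the number of tests not yet processed, and the results are capped at 1 and forced to be non-decreasing. The function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted

# Hypothetical raw p-values from three tests.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

By construction, an adjusted p-value is never smaller than the corresponding unadjusted p-value.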

Results

Preparation of mediation analysis

For the mediation model, three variables were included as mean-centered covariates, as they significantly correlated with the dependent variable conceptual knowledge: prior knowledge, r(135) = 0.37, p < 0.001, 95% CI [0.21, 0.50]; mathematical ability, r(135) = 0.57, p < 0.001, 95% CI [0.45, 0.68]; and mathematical self-concept, r(134) = 0.46, p < 0.001, 95% CI [0.32, 0.59]. Table 1 shows the descriptive statistics for all variables that were included in the mediation analysis.

Table 1 Descriptive statistics for all variables

When testing for potential confounds before the experimental manipulation, a t-test revealed a significant difference in prior knowledge between conditions, t(135) = − 2.30, p = 0.023, 95% CI [− 1.92, − 0.15]. This indicates a threat of confounding. However, we can control for the potential confounding effect of this variable, as it is included as a covariate in the mediation analysis, as suggested by Hayes (2018).

Furthermore, we corrected the standard error for clustering with PROCESS in order to deal with potential dependencies that were caused by the group structure in the collaborative condition. We also addressed the threat of alpha level inflation due to the dependency by providing adjusted p-values (p’), as explained in the statistical analysis section.

Mediation analysis

In the mediation analysis, we tested the four hypotheses for conceptual knowledge (see Fig. 3 and Table 2). Hypothesis one tested the total effect c of condition on conceptual knowledge. Hypotheses two to four tested the indirect effects of condition on conceptual knowledge through each of the mediators. In the following, we report the findings for each of the hypotheses. For hypotheses two to four, we first report the indirect effects and then the direct effects of each path in the respective mediation (i.e., the effect of condition on mediator, and mediator on conceptual knowledge).
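For orientation, the relation between the total effect and the individual paths can be written out. The decomposition below is a sketch that assumes the standard structure of Hayes' model 82, with two parallel serial chains (quantity to quality via \(d_{21}\), knowledge gaps to curiosity via \(d_{43}\)), and ordinary least squares estimation with the same covariates and cases in every equation. The path \(b_3\) (knowledge gaps on conceptual knowledge) is named by analogy with the other paths.

```latex
\[
c \;=\; c' \;+\; a_1 b_1 \;+\; a_2 b_2 \;+\; a_3 b_3 \;+\; a_4 b_4
      \;+\; a_1 d_{21} b_2 \;+\; a_3 d_{43} b_4
\]
```

Under these assumptions, the total effect \(c\) is the sum of the direct effect \(c'\) and all six specific indirect effects, so a non-significant total effect does not by itself rule out individual indirect paths.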

Fig. 3

Effects of Mediation model 82 (Hayes, 2018) with three covariates that are included as antecedent variables of all mediators and the dependent variable (see Table 2 for adjusted p'-values)

Table 2 Regression results of the mediation analysis (including adjusted p'-values)

Hypothesis 1: total effect of condition on conceptual knowledge

The first hypothesis concerned the total effect of collaborative versus individual problem solving (i.e., including all potential mediation effects) on conceptual knowledge scores after instruction (total effect c). We expected that the collaborative PF condition would outperform the individual PF condition in conceptual knowledge scores. Contrary to this expectation, the total effect of condition on conceptual knowledge was not statistically significant, c = 0.57, SE = 0.48, t = 1.17, p = 0.241, p’ = 1.00, 95% CI [− 0.39, 1.53]. That is, the two conditions did not differ in conceptual knowledge acquisition (see Table 2 for effect values and an overview of all regression results).

The direct effect of collaboration (i.e., the effect of collaborative versus individual problem solving without potential mediation effects) similarly did not reach significance, c’ = 4.71, SE = 0.71, t = 6.67, p = 0.437, p’ = 0.294, 95% CI [− 0.90, 1.63].

Hypothesis 2: indirect effect of condition on conceptual knowledge through solution quantity ( \({{\varvec{a}}}_{1}{{\varvec{b}}}_{1}\) )

The second hypothesis tested for an effect of condition on conceptual knowledge that was mediated through the quantity of generated solutions (\({a}_{1}{b}_{1}\)). We assumed that students in the collaborative condition would generate more solutions and that the more solutions students generated, the higher their conceptual knowledge would be after the instruction phase.

The bootstrap confidence interval showed no evidence for an indirect effect of the condition on conceptual knowledge that is mediated through quantity of solutions, \({a}_{1}{b}_{1}\) = 0.01, SE = 0.05, 95% CI [− 0.06, 0.14]. This indicates that the quantity of solutions did not function as a mediator to influence students’ learning based on whether they worked collaboratively or individually during problem solving.
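To illustrate how a bootstrap confidence interval for an indirect effect is obtained, the following minimal Python sketch estimates a single-mediator indirect effect (a times b) and a percentile bootstrap interval. It is not the PROCESS implementation. It omits the covariates, the additional mediators, and the clustering correction used in the study; the percentile method is an assumption; and the data are synthetic. All function names are hypothetical.

```python
import random


def _centered_sums(x, m, y):
    """Return the centered sums of squares and cross-products needed for OLS."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """Product a*b: a from M ~ X, b is the coefficient of M in Y ~ X + M."""
    sxx, smm, sxm, sxy, smy = _centered_sums(x, m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        # Resample whole cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ms, ys = [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
        try:
            estimates.append(indirect_effect(xs, ms, ys))
        except ZeroDivisionError:
            continue  # degenerate resample, e.g. only one condition was drawn
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: a binary condition with a strong built-in mediation.
rng = random.Random(42)
x = [i % 2 for i in range(120)]
m = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]
y = [1.0 * mi + rng.gauss(0, 0.5) for mi in m]
estimate = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y, n_boot=1000)
print(f"ab = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An indirect effect is taken as supported when the interval excludes zero. The intervals reported for the indirect effects in this study all include zero.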

Effects of condition on solution quantity ( \({a}_{1}\) ) and quantity on conceptual knowledge ( \({b}_{1}\) )

A more specific investigation of the individual paths of the mediation through solution quantity showed that, contrary to our expectations, collaborative or individual problem solving did not predict the quantity of solutions, \({a}_{1}\) = − 0.14, SE = 0.22, t = − 0.63, p = 0.530, p’ = 1.00, 95% CI [− 0.56, 0.29]. Likewise, the quantity of generated solutions did not significantly predict conceptual knowledge, \({b}_{1}\) = − 0.05, SE = 0.19, t = − 0.28, p = 0.780, p’ = 1.00, 95% CI [− 0.66, 0.39]. This means that collaboration did not lead to a different quantity of solutions and the quantity of solutions did not affect students’ conceptual knowledge after instruction in any of the conditions.

Effect of solution quantity on solution quality ( \({d}_{21}\) )

Although we did not specifically hypothesize that solution quantity would predict solution quality, the model tested this path and found no effect, \({d}_{21}\) = 0.07, SE = 0.08, t = 0.90, p = 0.368, p’ = 1.00, 95% CI [− 0.08, 0.22].

Hypothesis 3: indirect effect of condition on conceptual knowledge through knowledge gaps and curiosity ( \({{\varvec{a}}}_{3}{{\varvec{b}}}_{3}; {{\varvec{a}}}_{4}{{\varvec{b}}}_{4}\) )

Hypothesis three dealt with awareness of knowledge gaps (mediator 2a) and curiosity (mediator 2b). We assumed that students of a collaborative condition would report higher awareness of knowledge gaps (\({a}_{3}\)), which in turn would be related to a higher reported curiosity after the problem-solving phase (\({d}_{43}\)), and higher curiosity would be associated with better performance on the conceptual knowledge test (\({b}_{4}\)).

Contrary to the hypothesis, no indirect effect of the condition on conceptual knowledge as mediated through awareness of knowledge gaps and curiosity could be found, which was indicated by the bootstrap confidence interval, \({a}_{3}{d}_{43}{b}_{4}\) = − 0.00, SE = 0.01, 95% CI [− 0.02, 0.02]. Thus, collaborative or individual problem solving did not influence the students’ conceptual knowledge based on how their reported awareness of knowledge gaps and curiosity varied.

Effects of condition on knowledge gaps ( \({a}_{3}\) ) and curiosity ( \({a}_{4}\) )

Subsequently, we examined the individual paths of the above-described mediation.

The conditions did not differ in their reported awareness of knowledge gaps, \({a}_{3}\) = − 0.27, SE = 0.17, t = − 1.51, p = 0.133, p’ = 1.00, 95% CI [− 0.62, 0.08]. Although we did not formulate a hypothesis on the direct effect of condition on students’ curiosity, we did not find an effect for this path either, \({a}_{4}\) = − 0.44, SE = 0.23, t = − 1.90, p = 0.060, p’ = 0.720, 95% CI [− 0.89, 0.02]. This means that whether students collaborated or worked individually did not affect their awareness of knowledge gaps or their curiosity after the problem-solving phase.

Effects of knowledge gaps on curiosity ( \({d}_{43}\) ) and curiosity on conceptual knowledge ( \({b}_{4}\) )

Lastly, we investigated the path between the serial mediators awareness of knowledge gaps and curiosity. Contrary to our expectations, students’ awareness of knowledge gaps did not predict their reported curiosity, \({d}_{43}\) = 0.01, SE = 0.12, t = 0.12, p = 0.907, p’ = 1.00, 95% CI [− 0.22, 0.24]. Furthermore, curiosity was not associated with conceptual knowledge after instruction, \({b}_{4}\) = 0.10, SE = 0.18, t = 0.56, p = 0.580, p’ = 1.00, 95% CI [− 0.25, 0.45].

Hypothesis 4: indirect effect of condition on conceptual knowledge through solution quality ( \({{\varvec{a}}}_{2}{{\varvec{b}}}_{2}\) )

The fourth hypothesis addressed the mediation effect of the condition on conceptual knowledge through the variable quality of solutions (\({a}_{2}{b}_{2}\)). It presumes that collaboration enables students to generate solutions of a higher quality (\({a}_{2}\)) and that the higher the quality of solutions, the higher the students’ conceptual knowledge acquisition (\({b}_{2}\)).

The bootstrap confidence interval of the indirect effect of the condition on conceptual knowledge, mediated through quality of solutions, included zero and thus did not suggest a mediation effect, \({a}_{2}{b}_{2}\) = 0.24, SE = 0.15, 95% CI [− 0.01, 0.59]. Therefore, the form of problem solving did not affect students’ learning via the quality of their solutions.

Effects of condition on solution quality ( \({a}_{2}\) )

We further examined the individual paths of this mediation. The effect of condition on quality of solutions was significant at the unadjusted level but did not remain significant with adjusted p’-values, \({a}_{2}\) = 0.36, SE = 0.18, t = 2.03, p = 0.045, p’ = 0.585, 95% CI [0.01, 0.71]. This indicates that the effect of condition on quality might be a result of alpha error inflation rather than a true effect in the population. Consequently, whether students worked collaboratively or individually on the generation of solutions did not significantly affect the quality of the generated solutions.

Effects of solution quality on conceptual knowledge ( \({b}_{2}\) )

The individual path between quality of solutions and conceptual knowledge initially reached significance in our mediation model, but the effect did not hold for adjusted p’, \({b}_{2}\) = 0.65, SE = 0.28, t = 2.31, p = 0.022, p’ = 0.308, 95% CI [0.10, 1.21]. Thus, while our data might present a descriptive difference in conceptual knowledge based on students’ quality of solutions, the effect of solution quality on conceptual knowledge might again only be a result of alpha error inflation and not a true effect.

Exploratory analyses

Although no indirect or direct effects were found in the mediation model, we observed descriptive differences in solution quality between the two conditions, as well as in conceptual knowledge depending on the quality of the generated solutions. Even though no significant effects were found, our data suggest that the variable solution quality is potentially relevant. In order to explore the descriptive difference and the effect of solution quality on conceptual knowledge, we conducted additional exploratory analyses.

Firstly, we examined the distribution of quality in both conditions. Table 3 gives an overview of the different levels of quality scores for each condition. The quality score represents a student’s best solution, that is, the solution that includes the highest number of conceptual components of the canonical solution (see Loibl et al., 2020). The exploration reveals that almost half of the students in the collaborative condition (41%) and less than a fourth of the individual condition (22%) reached a high-quality score of three or four. Lower quality solutions of one or no functional component were constructed with an almost equal distribution between the conditions (PF IND: 46%; PF COLL: 41.40%).

Table 3 Frequencies of quality of solutions (N = 137)

Secondly, we explored correlational relationships for solution quality with the remaining variables of the main analyses. The correlations were calculated for each condition separately to further explore the descriptive differences. The exploratory correlational analyses revealed that quality of solutions was significantly associated with awareness of knowledge gaps and conceptual knowledge after instruction for the individual condition, but not for the collaborative one (see Table 4).

Table 4 Pearson correlations of the variable quality of solutions

Discussion

In this study, we investigated the effects of collaborative versus individual problem solving in the traditional PF setting of secondary school mathematics. We hypothesized that students who engage in a collaborative PF problem-solving phase would outperform students who had worked individually. A mediation model estimated the effect of the condition (PF COLL and PF IND) on conceptual knowledge after instruction as mediated through three preparatory mechanisms in PF: prior knowledge activation (measured here through quantity of solutions), awareness of knowledge gaps, and deep feature recognition (measured here through the quality of solutions).

Contrary to our expectations, collaborative and individual problem-solving students did not significantly differ in the conceptual knowledge post-test or in any of the mediation variables, nor did a mediation reach significance. Moreover, the mediators did not predict conceptual knowledge.

Theoretically, collaboration is regarded as a vital design aspect of PF (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021) and since several beneficial collaboration processes relate to the PF preparatory mechanisms, it would be expected to facilitate learning from instruction to a higher degree than individual problem solving. Yet, this study did not find any clear indications that support this assumption, agreeing with prior research on the comparison of collaborative and individual problem-solving conditions in PF-like settings (see Mazziotti et al., 2019; Sears, 2006; Weaver et al., 2018).

One explanation for this could be that individual problem solving prepares students for learning in the subsequent instruction as well as collaboration does. Consequently, collaborative problem solving would not foster a more beneficial preparation. For the first preparatory mechanism, prior knowledge activation, we assumed that students in the collaborative condition would be capable of a broader prior knowledge activation, resulting in a higher quantity of generated solutions than individually working students by complementing and cueing each other’s prior knowledge (Nokes-Malach et al., 2015). However, the collaborative and individual conditions did not differ in the quantity of solutions. As discussed above, collaborative inhibition, that is, blocking of each other’s ideas in collaboration, often occurs in collaborative learning (Rietzschel et al., 2006) and could have been the reason why groups did not outperform individual students. More specifically, group members might have been hampered in their idea generation due to group processes (e.g., turn-taking, Rietzschel et al., 2006), and therefore, the collaborative condition did not generate more solutions than the individual condition. However, inhibition did not seem to be high, since no detrimental effects of collaboration occurred. As shown by Nokes-Malach et al. (2012), collaborative inhibition decreased for students with some (but not too much) relevant prior knowledge. However, their study included smaller group sizes than our study (i.e., they used dyads instead of groups of three), and as inhibition was shown to increase with group size (Rajaram & Pereira-Pasarin, 2010), students likely still faced some degree of inhibition that levelled out potential positive effects of collaboration (cf. Nokes-Malach et al., 2015).

With regard to students’ awareness of knowledge gaps, we expected that students in the collaborative PF condition would become more aware of their own knowledge gaps due to the experience of socio-cognitive conflict when discussing erroneous solutions. Subsequently, these students would show a higher curiosity to learn more about the targeted concept, and thus, better learning after instruction than students in the individual condition. However, collaborating students did not show higher gap awareness or curiosity, suggesting that they likely failed to discuss socio-cognitive conflict. This would not be uncommon; studies have shown that if quick consensus building is favored over more elaborate discourse moves, this will be especially harmful to the development of socio-cognitive conflict (Mugny & Doise, 1978; Roschelle & Teasley, 1995). If students of the collaborative PF condition avoided conflict and only engaged with the task on a superficial level, they would likely not have become sufficiently aware of their knowledge gaps, which in turn would not have raised their curiosity and prepared them for subsequent learning. The need for joint group discussion in collaboration in PF was also emphasized by Kerrigan et al. (2021), who found that groups engaging in elaborate discussion by challenging and revising each other’s ideas and identifying key problems and knowledge gaps showed a more successful problem-solving process and higher conceptual knowledge after instruction.

Lastly, we hypothesized that students in the collaborative condition would recognize deep features of the targeted concept when they jointly resolve socio-cognitive conflict during the problem-solving phase, unlike individual students who only learn about the deep features during instruction (Loibl et al., 2017). The recognition of deep features would then be visible in higher-quality solutions and lead to better preparation for subsequent learning. The findings of this study could not clearly confirm this hypothesis. Despite a descriptive difference, we did not find a significant difference between conditions. The quality of solutions also did not function as a mediator, nor did it predict conceptual knowledge. Yet, our explorations revealed that a correlation existed between solution quality and conceptual knowledge, but only for the individual condition, not the collaborative one. Taken together, these findings indicate that the generation of more high-quality solutions may prepare students for learning from instruction under certain conditions. That is, it appears to only function as a preparation for subsequent learning when students generate high-quality solutions themselves. This indicates that students need to actively understand the deep features included in that solution, instead of copying solutions of group members. The lack of a correlation between solution quality and students’ learning in the collaborative condition indicates that likely not all group members understood the deep features included in the solutions. This is another indicator that students did not engage in discussion of socio-cognitive conflict, which would have been needed in order for all group members to comprehend the deep features of the high-quality solutions and be prepared for subsequent learning. Instead, the descriptively higher number of high-quality solutions in the collaborative condition might only be a result of single group members raising the whole group’s quality score. This would be consistent with research on social loafing, in which only some members carry the group (e.g., Latané et al., 1979). Consequently, collaboration in PF does not seem to afford elaborate discussion sufficiently, and in turn, does not prepare for subsequent learning in a better way than individual problem solving.

Further open questions remain about the lack of mediations for the other mediator variables: even though the preparatory mechanisms are assumed to relate to students’ learning, they were not associated with learning scores after instruction. It needs to be noted that all variables presented low values in both conditions, which could have limited the preparatory effect of the mechanisms, eradicating potential effects. Both conditions generated a low number of solutions as compared to earlier studies in this setting, with on average 3.26 solutions in the individual and 3.13 in the collaborative condition (Kapur, 2014 study 1: 6.08; Loibl & Rummel, 2014a study 1: 4.13). Moreover, the descriptive results of the variables knowledge gap awareness and curiosity showed mid-scale results, meaning that both conditions neither felt particularly curious nor particularly aware of knowledge gaps after the problem-solving phase. Despite the missing effects and low descriptive results of the preparatory variables, the conceptual knowledge scores of the students in both conditions are comparable to earlier PF studies within the traditional PF setting, which used similar materials and relied on students with a similar age and prior knowledge as the present study. In the present study, students achieved an average of 44% of the conceptual knowledge scores (M = 5.97, SD = 3.14, max. 13.5 points). In Loibl and Rummel (2014a, 2014b), students in the PF conditions achieved an average of 35% to 49% of the conceptual knowledge scores, compared to 15% to 17% in non-PF, direct instruction conditions (see Table 5 for exact values). These studies did find a significant PF effect in comparison to direct instruction. The similar outcomes on conceptual knowledge of our PF condition and the ones in the Loibl and Rummel studies as well as the respective differences to the direct instruction conditions indicate that PF was successful in our study.

Table 5 Means and standard deviations of conceptual knowledge scores (max. 7 points) in PF and non-PF conditions in previous traditional PF studies with similar settings

Another reason could be that the proposed preparatory mechanisms do not explain preparation for learning in PF adequately. So far, studies investigating the quantity and quality of solutions yielded mixed findings (Loibl & Rummel, 2014a; Loibl et al., 2020; Wiedmann et al., 2012), and some studies proposed alternative operationalizations for prior knowledge activation, such as the diversity of solutions (i.e., the number of different solution approaches, Kapur, 2012). Moreover, it remains to be questioned whether students are capable of reporting their awareness of knowledge gaps.

Lastly, some limitations of this study need to be considered. As the conditions differed in prior knowledge before our experimental manipulation, there is a risk of confounding. While the effect of prior knowledge was controlled in our mediation model (see Hayes, 2018), there might be further unknown confounds in the sample that were not measured, could therefore not be controlled for, and could have blurred potential effects (Hayes, 2018). Furthermore, even though we decreased the risk of alpha error inflation with the help of adjusted p-values (p’) to account for the large number of tests that were conducted, this adjustment increases our risk of committing a type-II error. For this reason, we chose Holm-Bonferroni correction instead of the more conservative Bonferroni correction that decreases statistical power to a higher degree (e.g., Aickin & Gensler, 1996).

In conclusion, previous studies have described collaboration in PF as a vital design aspect (Kapur & Bielaczyc, 2012; Sinha & Kapur, 2021). However, our findings suggest that collaborative problem solving in PF does not show a superior preparatory effect on students’ conceptual learning compared to individual problem solving. There has been accumulating evidence that implementing collaborative and individual problem solving without any additional guidance does not influence learning outcomes differentially. However, there are also no detrimental effects of collaboration when compared to individual problem solving, unlike what is described in the brainstorming literature (e.g., Diehl & Stroebe, 1987). Consequently, PF designs might rely on either collaborative or individual problem solving without compromising learning scores. This is an important finding: when collaborative and individual problem solving are directly compared in PF, collaboration does not appear to be a critical component for learning, contrary to what has been proposed by Kapur and Bielaczyc (2012) and Sinha and Kapur (2021).

While collaboration does not appear to be vital to PF fidelity, it has to be noted that this does not imply that collaboration could not have an effect at all in PF research. Some studies suggest that to improve collaboration in PF, a higher amount of scaffolding (see Kerrigan et al., 2021) or scripting of collaboration (e.g., cognitive conflict) might be needed. Other studies added cognitive guidance to collaborative PF conditions, but this did not significantly improve students’ learning after instruction (Loibl & Rummel, 2014a). While the aim of this study was to investigate the necessity of collaboration as a main PF design and fidelity criterion, future studies could investigate under what conditions guidance could improve collaboration in PF and facilitate students’ learning from instruction. Results of process analyses conducted as part of existing PF studies (e.g., Brand et al., 2018; Brand et al., 2021; Hartmann et al., 2016; Nachtigall & Sung, 2019; Hartmann et al., 2022a, 2022b) could serve as a starting point for the design and study of such guidance. Research on scripting of collaboration, for instance, suggests that instead of providing guidance on a cognitive or task level, social scripts could have a more beneficial effect for students’ learning (see Weinberger & Fischer, 2006).

Furthermore, we suggest that future studies examine the potential of collaboration in PF beyond cognitive effects. While research in PF so far has largely focused on cognition, more recent research has examined the role of affect in PF (Sinha, 2021). Affective and motivational effects of collaboration in PF might specifically target the success of implementing important design components, such as creating safe social surrounds for failure or an affective draw of the problem (i.e., situational interest) (Sinha & Kapur, 2021). In our study, we took a first look at curiosity as one possible measure of affect, but there is still potential for further exploration.