Student teachers’ conceptual knowledge of operant conditioning: How can case comparison support knowledge acquisition?

Teachers need knowledge about operant conditioning as an important way to prevent student misbehavior. In an experiment with 181 student teachers, we investigated how the acquisition of conceptual knowledge about operant conditioning can be promoted through case comparisons. Our results showed that case comparison effectively supported the acquisition of knowledge about operant conditioning. Knowledge acquisition was higher with more guidance during case comparison by providing key features to be used to compare the cases. Furthermore, with more guidance student teachers learned more efficiently. In addition, higher germane load and lower extraneous load were found to mediate the effect of instructional guidance on learning. The case comparison was also associated with a shift in student teachers’ beliefs, with more appropriate beliefs about operant conditioning afterwards. Thus, the results indicate that case comparison is an effective approach to promoting the acquisition of conceptual knowledge and positive beliefs about operant conditioning.


Introduction
Case comparison is a powerful approach to promoting the acquisition of concepts (e.g., Alfieri et al., 2013). However, previous research is inconclusive regarding the question of how much instructional support learners need in case comparison, especially when complex concepts are to be acquired. Therefore, in this study, we investigated how best to support the acquisition of complex concepts through case comparison in the context of teaching. Teaching is a demanding profession (e.g., Doyle, 2006). In particular, handling classroom situations efficiently has been shown to be a major challenge for teachers (Jones, 2006). Thus, student teachers need to acquire concepts about classroom management (Voss et al., 2011) in order to flexibly adapt their behavior in such challenging teaching situations. An important aspect of classroom management is knowledge about concepts of operant conditioning, such as how to reinforce positive student behavior. However, teachers are often skeptical of the potential of operant conditioning and have a poor understanding of its principles (e.g., Dovey et al., 2017; Maag, 2001). This is the starting point of our study. We conducted an experimental study to examine how case comparison (Alfieri et al., 2013) can support the acquisition of conceptual knowledge in the context of teaching (i.e., the acquisition of conceptual knowledge about operant conditioning). First, we examined the effects of different levels of instructional guidance during case comparison on learning. Second, we studied whether the levels of instructional guidance also influenced the extent of extraneous and germane cognitive load during case comparison. Third, we investigated whether the effect of instructional guidance on learning was mediated by the cognitive load experienced during case comparison. Last, we examined whether learning through case comparison is associated with changes in student teachers' skeptical beliefs about operant conditioning.

Conceptual understanding of teaching situations as a foundation of teaching skills
From a cognitive perspective, it is important that teachers not only practice classroom management skills but also learn the conceptual aspects of a classroom situation to flexibly adapt behavior (e.g., Feldon, 2007). For example, the ACT-R learning theory developed by Anderson (1993) assumes that cognitive skills are mentally represented as production rules. The if-part of a production rule refers to a specific situation that triggers a given behavior, that is, the then-part of the production rule. Which behavior is triggered largely depends upon how a situation is conceptually understood.
In professional domains such as teaching, the concepts teachers typically need to interpret a situation are relational. Such concepts refer to relational systems in which multiple role categories each play a specific part (Gentner & Kurtz, 2005). For example, the principle of positive reinforcement is a relational system with behavior and consequence as role categories that satisfy a specified relational structure (i.e., adding a positive consequence to a behavior) such that the behavior is strengthened. Learning relational concepts is challenging for several reasons (Goldwater & Schalk, 2016; Steininger et al., 2022): First, unlike in feature-based concepts, it is not perceptual characteristics (e.g., barking as a defining characteristic of the concept dog) but relational similarities between situations (e.g., a positive consequence is added to a behavior in both situations) that are critical to understanding relational concepts. Second, in addition to knowing which role categories are part of a relational structure (e.g., behavior and consequence), learning relational concepts requires the acquisition of a rule (i.e., a principle) specifying the relationship between the role categories (e.g., if the consequence is negative, then its removal strengthens the behavior). Thus, learning relational concepts is usually working memory-intensive. Third, relational concepts occur in a specific context. However, the superficial characteristics of the context (e.g., whether the student who receives positive reinforcement is male) are usually not relevant for learning relational concepts and might even be distracting. Therefore, when learning relational concepts, it is necessary to abstract from the surface features of a context.

Case comparison
One way to promote the acquisition of relational concepts is to study multiple examples, that is, instantiations of a relational concept (e.g., Gentner et al., 2003; Higgins, 2017). Learning with multiple cases prompts learners to develop a conceptual understanding by making analogies. More specifically, following the process model for learning through case comparison developed by Alfieri et al. (2013), learners effortfully (a) search for commonalities and differences between cases (e.g., how is Case 1 similar to Case 2 and how do they differ?), (b) align the target features of the cases (e.g., what features make the cases similar?; Schwartz et al., 2011), and (c) acquire a reduced representation of the cases without extraneous details of single cases (e.g., what are the key features of the concepts?; e.g., Rittle-Johnson & Star, 2011). In doing so, learners arrive at a schematic representation that includes the role categories and the relations between the role categories, but does not include information unique to single cases. Thus, the resulting representation enables learners to flexibly transfer their knowledge to future cases by removing the case-specific details of the new contexts (Schwartz et al., 2011).

Instructional guidance during case comparison
In their meta-analysis, Alfieri et al. (2013) found that case comparison generally facilitates learning. Specifically, comparing simultaneously presented cases is more effective than learning with single or sequentially presented cases (d = 0.50). Recent findings are in line with this meta-analytic evidence (e.g., Hancock-Niemic et al., 2016; Schalk et al., 2020). However, how to design case comparison most effectively remains an open question.
Usually, case comparisons can differ with regard to the level of instructional guidance. First, case comparison is often accompanied by information about the concepts being learned (e.g., a definition). This information can be provided before or after case comparison (Alfieri et al., 2013). Defining the principles before case comparison provides information about how to link the cases based on their underlying concept. Therefore, receiving information about principles before case comparison can facilitate the search for commonalities and alignment processes during case comparison (Star & Rittle-Johnson, 2009). When learning relational concepts, understanding the relational structure is a prerequisite for knowing which differences between cases can be aligned and which cannot (Higgins, 2017). To successfully learn about operant conditioning, this would mean that student teachers need information about each principle (e.g., negative reinforcement is defined as removing a negative consequence to strengthen a behavior), before they can benefit from comparing cases displaying different principles of operant conditioning (e.g., negative reinforcement and negative punishment).
Second, prompts are usually given during case comparison (e.g., Higgins, 2017). These prompts can differ in how specifically they elicit comparison processes. General prompts simply ask learners to study or compare the cases. Specific prompts instruct learners to compare the cases with regard to key features that are crucial for understanding the concepts. When such key features are provided, learners receive explicit labels that they can use to identify commonalities and align cases (Roelle & Berthold, 2016). Thus, specific prompts should facilitate the search for commonalities and the alignment of the cases (Alfieri et al., 2013; Higgins, 2017).
Despite the potential benefits of receiving information about concepts before case comparison, Alfieri et al. (2013) found that receiving information after instead of before case comparison supported learning more effectively. In addition, providing key features did not moderate the learning effects obtained in the studies included in their meta-analysis. However, in this meta-analysis, it cannot be ruled out that the effect of instructional guidance was confounded with other features such as academic domain or complexity of the cases. For example, Alfieri et al. included studies on different types of learning content such as perceptual, conceptual, or procedural content. Different types of learning content are assumed to vary in complexity and, consequently, in ease of learning and the beneficial effect of instructional guidance. Thus, varying the level of instructional guidance for the same learning material is needed to learn more about the pure effects of features of instructional guidance. Roelle and Berthold (2016), for instance, implemented such a design: They varied the specificity of the prompts during case comparison in a sample of eighth-grade students learning relational concepts and found more specific prompts were more beneficial to learning than more general prompts. In our study, we varied the specificity of the prompts and whether the information on the concept was provided before or after case comparison among student teachers learning relational concepts about operant conditioning. Thus, the complexity of the concepts and domain were kept constant within our study to avoid a potential confounding effect.
Furthermore, Alfieri et al. (2013) concluded in their meta-analysis that cognitive load during case comparison should be investigated to understand the mechanisms underlying learning through case comparison with different levels of instructional guidance. The cognitive load experienced during case comparison might play a role in understanding how different levels of instructional guidance influence knowledge acquisition. According to cognitive load theory (Sweller, 2010), instruction imposes three types of cognitive load on learning. Intrinsic load is determined by the internal complexity of the learning content and a learner's prior knowledge. It can be reduced by matching the learning materials to a learner's prior knowledge (Kalyuga, 2009); it cannot be reduced through instructional manipulations (Paas et al., 2003). Extraneous load refers to features of instruction that are detrimental to learning, such as a poor explanation. Thus, extraneous load can be reduced through instructional manipulations. Intrinsic and extraneous load are experienced passively (Klepsch & Seufert, 2021) and can be contrasted with germane load. Germane load is the active cognitive engagement employed by the learner to master the task demands (Klepsch & Seufert, 2021). To emphasize the active character of the germane load, Kalyuga (2011) also speaks of germane resources. The investment of such resources can be promoted by instructional features.
In case comparison, defining principles before case comparison and providing key features should reduce extraneous cognitive load and enhance germane load (e.g., Paas et al., 2003). Reduced extraneous cognitive load should free up working memory capacity, which can in turn be used to acquire new schemas (e.g., Paas et al., 2003). Thus, a higher germane load and a lower extraneous load should mediate the positive effect of more instructional guidance during case comparison.

Classroom management as a challenging core practice for beginning teachers
A core practice of teachers in which a conceptual knowledge base is of particular importance is classroom management. Classroom situations are challenging for beginning teachers (Tynjälä & Heikkinen, 2011): Many people are involved, several events happen simultaneously and at a rapid pace, and situations often take unpredictable turns. To proactively prevent students' misbehavior, it is necessary to closely monitor student behavior, deal with multiple events at the same time, run the classroom smoothly, and establish clear rules and routines (Kounin, 2006). Behavior modification approaches focus on changing students' misbehavior in the classroom (Bear, 2015). To do so, these approaches rely on operant conditioning as a basic type of associative learning in which the consequences of a behavior modify its strength (e.g., Bear, 2015).
In operant conditioning, the probability of a behavior increases or decreases depending on whether its consequence is perceived as pleasant or unpleasant. If the consequence is perceived as pleasant, the learning process is known as reinforcement. Reinforcement strengthens behaviors by adding a pleasant consequence (positive reinforcement) or removing an unpleasant stimulus (negative reinforcement). If the consequence following the behavior is perceived as unpleasant, the learning process is called punishment. Punishment decreases the likelihood of a behavior by adding an adverse stimulus (positive punishment) or removing a pleasant stimulus (negative punishment). Reinforcement is considered a main strategy to increase engagement and prevent misbehavior (Bear, 2015; Little & Akin-Little, 2003). Operant conditioning has been shown to effectively modify student behavior in the desired direction (e.g., Marzano et al., 2009). However, many teachers believe that operant conditioning strategies are ineffective or even detrimental. In addition, research shows that, despite their skeptical beliefs, teachers do in fact use strategies such as punishment in teaching (Dovey et al., 2017; Little & Akin-Little, 2003). One reason for this discrepancy between beliefs and behavior is a lack of knowledge about the principles of reinforcement and punishment (e.g., Maag, 2001). For example, teachers might have an aversion to the word punishment itself and difficulties distinguishing between positive and negative punishment (e.g., Dovey et al., 2017). Furthermore, people often do not have a clear understanding of the differences between punishment and negative reinforcement (DeBell & Harless, 1992). Thus, it is important to support teachers' acquisition of knowledge about operant conditioning.
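The four principles described above form a 2 × 2 scheme: a stimulus is either added or removed, and it is perceived as either pleasant or unpleasant. The following minimal sketch makes that relational structure explicit. The function name and the string labels are illustrative assumptions, not part of the study materials.

```python
def classify_principle(stimulus_change: str, stimulus_valence: str) -> str:
    """Map a consequence onto one of the four operant conditioning principles.

    stimulus_change:  "add" or "remove" (is a stimulus added or taken away?)
    stimulus_valence: "pleasant" or "unpleasant" (how the learner perceives it)
    Both argument vocabularies are hypothetical labels chosen for this sketch.
    """
    principles = {
        ("add", "pleasant"): "positive reinforcement",
        ("remove", "unpleasant"): "negative reinforcement",
        ("add", "unpleasant"): "positive punishment",
        ("remove", "pleasant"): "negative punishment",
    }
    key = (stimulus_change, stimulus_valence)
    if key not in principles:
        raise ValueError(f"unknown combination: {key!r}")
    return principles[key]


# A student gains library time (a pleasant stimulus is added) after cooperating.
print(classify_principle("add", "pleasant"))
```

Note that "positive" and "negative" refer only to adding or removing a stimulus, not to whether the consequence is desirable, which is the distinction the text reports teachers often find difficult.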

Change in skeptical beliefs about operant conditioning
Despite the importance of behavior modification approaches for successfully managing classroom situations, many teachers hold skeptical beliefs about operant conditioning strategies and often reject such strategies (e.g., Dovey et al., 2017, for punishment). Beliefs are assumed to serve as filters for interpreting new experiences and, thus, are thought to be relatively stable and resistant to change (e.g., Skott, 2015). Consequently, deep and systematic processing is needed to permanently change teachers' skeptical beliefs (e.g., Gregoire, 2003; Kleickmann et al., 2016; Lunn Brownlee et al., 2017). Gregoire (2003) developed a model to explain lasting changes in teachers' beliefs based on conceptual change theory (e.g., Vosniadou et al., 2007). She suggested that effortful systematic cognitive processing is necessary to change beliefs. Empirical evidence supports this model, showing that instruction that fosters deep processing can change teachers' beliefs (e.g., Gill et al., 2004). Instructional approaches that have so far been shown to successfully foster deep processing include refutation texts (e.g., Prinz et al., 2019), expert scaffolding (Kleickmann et al., 2016), and collaborative discursive learning (Decker et al., 2015). Previous research on case comparison has shown that learning with cases goes beyond a surface representation of categories and leads to deeper relational understanding (e.g., Christie & Gentner, 2010). Therefore, it can be assumed that case comparison stimulates deep processing and thus can be powerful for changing teachers' skeptical beliefs.

The present study
In our study, we examined how case comparison should be effectively implemented to promote the student teachers' acquisition of conceptual knowledge about the principles of operant conditioning. We also investigated what type and level of cognitive load student teachers experienced during processing. We experimentally varied the degree of instructional guidance during case comparison by presenting theoretical input about operant conditioning principles before or after case comparison and providing or withholding key features useful for case comparison.

Hypothesis 1: conceptual knowledge
We hypothesized that instructional guidance during case comparison would support the acquisition of conceptual knowledge about operant conditioning. Specifically, we expected to find higher conceptual knowledge when principles were defined before case comparison (vs. after case comparison) and when key features for case comparison were provided (vs. not provided).

Hypothesis 2: cognitive load
We hypothesized that more guidance during case comparison (input before and key features provided) would be associated with lower extraneous load and higher germane load.

Hypothesis 3: mediation
Over and above this effect of guidance on extraneous and germane load, we expected that the lower students' extraneous load and the higher their germane load, the more knowledge about operant conditioning student teachers would obtain.

Hypothesis 4: student teachers' beliefs
We hypothesized that student teachers would, on average, have more positive beliefs about operant conditioning after case comparison than before. We also expected that the more student teachers learned about the concepts (i.e., the more successful the deep processing of the cases), the more strongly they would change their skeptical beliefs.

Sample
Based on prior research (e.g., Alfieri et al., 2013; Kleickmann et al., 2016; Roelle & Berthold, 2016), we expected medium to large effect sizes. To determine the required sample size, we conducted a priori power analyses (alpha = 0.05, power = 0.80, effect size = 0.25). The results indicated that a sample size of N = 124 would be sufficient to detect a medium effect for the tests of conceptual knowledge as dependent variables (correlation among the tests = 0.50) and that a sample of N = 156 would be required to detect a medium effect for the change in student teachers' beliefs (repeated measurement analysis of variance).
In total, N = 181 student teachers from a German university participated in the experiment (59% female). Their mean age was 21.15 (SD = 3.73). Most of the participants were at the beginning of a bachelor's program in teacher education (semester of studies: M = 2.49, SD = 3.74). All participants were provided information about the study and volunteered to participate.

Design
Following a 2 × 2 factorial design, we varied whether the principles of operant conditioning were defined before or after case comparison (timing of input) and whether or not key features for the comparison were provided (provision of key features). The participants were randomly assigned to the four experimental conditions: (a) principles of operant conditioning presented before case comparison and key features provided (input before-key features provided, n = 47), (b) principles of operant conditioning presented before case comparison and key features not provided (i.e., participants received only the general prompt to compare cases, input before-key features not provided, n = 48), (c) principles of operant conditioning presented after case comparison and key features provided (input after-key features provided, n = 41), and (d) principles of operant conditioning presented after case comparison and key features not provided (input after-key features not provided, n = 45).

Input on the principles of operant conditioning
We used a text about operant conditioning comprising 462 words. The text first introduced the general mechanism underlying operant conditioning and its two modes, namely reinforcement and punishment. Second, the four principles (i.e., positive reinforcement, negative reinforcement, positive punishment, negative punishment) were defined and illustrated with a schematic overview and an everyday example.

Case comparison
In all experimental conditions, we introduced participants to a critical classroom management situation with several challenging student behaviors. We constructed eight cases that described how a teacher applied a principle of operant conditioning to manage this situation (see Table 4 in the appendix for example items). The cases were combined in pairs, resulting in four comparative cases. Principles that could be easily confused were paired into a comparative case (see Table 1).
In all experimental conditions, participants compared the cases with regard to teachers' classroom management behavior. Depending on experimental condition, student teachers received a general prompt to compare the cases ("What similarities and differences can you find between the two examples?") or specific prompts ("Are the teachers applying reinforcement or punishment?" and "Are the teachers providing pleasant or unpleasant consequences?").

Pretest
We measured prior knowledge about operant conditioning with the question "Please explain briefly what is meant by operant conditioning." The advantage of this open-ended response format is that the question did not provide information about the principles of operant conditioning prior to the learning phase, as would have been the case with specific closed questions about each principle. Answers were scored using a scoring rubric that assigned 4 points to a fully correct and 1, 2, or 3 points to a partially correct explanation. 50% of the answers were independently scored by two raters with a good inter-rater reliability (ICC = 0.89).

Beliefs
We measured beliefs about operant conditioning before and after case comparison using nine items. Agreement with statements such as "Using operant conditioning strategies is useful to create a pleasant classroom climate" was indicated on a Likert-type scale ranging from 1 (strongly disagree) to 6 (strongly agree). Reliability was good: Cronbach's α = 0.73 (before) and 0.75 (after).

Learning time
Participants noted the time when they started and finished the learning phase. Based on this information, we calculated the learning time in minutes.

Cognitive load
To measure cognitive load, we used a questionnaire with eight items adapted from Leppink et al. (2013) with a 6-point Likert-like scale ranging from 1 (strongly disagree) to 6 (strongly agree). Four items assessed extraneous load (e.g., "It was challenging for me to extract the relevant information from the two cases", Cronbach's α = 0.72), and four items assessed germane load (e.g., "I focused on understanding the underlying principles of operant conditioning in the teachers' behavior", Cronbach's α = 0.72).
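For readers unfamiliar with the reliability coefficient reported for these scales, the following is a minimal sketch of how Cronbach's α is computed from item-level ratings. The function name and the toy ratings are assumptions made for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: list of items, where each item is a list holding one rating
    per participant (all items must cover the same participants in the same order).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum score of each participant across all items of the scale.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy example: two items rated by four hypothetical participants.
toy_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(round(cronbach_alpha(toy_items), 2))
```

The coefficient increases when items covary strongly relative to their individual variances, which is why it is commonly read as an index of internal consistency.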

Posttest
To measure knowledge acquisition after the learning phase, three tests assessed different levels of understanding (Anderson & Krathwohl, 2001; Klausmeier, 1992). The first test required participants to recall the definition of each of the four principles of operant conditioning. Additionally, student teachers were asked to provide two examples of each principle. The definition and the two examples were each coded as correct (1 point) or incorrect (0 points). Thus, up to three points could be obtained for each principle. We summed all points across the four principles to create a definition score (maximum 12 points). Eight examples of different classroom situations were presented in the second test. Each example needed to be classified according to one of the four categories positive reinforcement, negative reinforcement, positive punishment, or negative punishment (e.g., "A student is allowed to spend her free time in the school library because of her good cooperation in class"). Classifying the examples correctly requires an understanding of the similarities within categories and differences between categories. Each correct classification was assigned 1 point. Thus, the maximum classification score that could be obtained was 8 points.
The third test required participants to explain an authentic, videotaped situation by applying the principles of operant conditioning to this situation. The short video showed a classroom situation in which the teacher had implemented a token system combining positive reinforcement and negative punishment strategies. All four principles of operant conditioning were observable in the video example. The task was to explain the behavior of students and teachers using the principles of operant conditioning. The answers were coded with one point each for a correct explanation of each of the four principles. Thus, the maximum explanation score that could be obtained was 4 points.
Two raters independently scored 50% of answers to the open-ended questions with a good inter-rater reliability (definition ICC > 0.89, explanation ICC > 0.81).
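The three posttest scores described above can be summarized in a short sketch. The function names and input formats are hypothetical. Only the point rules (1 point per correct definition, example, classification, or explanation, with maxima of 12, 8, and 4) are taken from the text.

```python
def definition_score(principles):
    """principles: four (definition_ok, example1_ok, example2_ok) tuples of booleans."""
    if len(principles) != 4 or any(len(p) != 3 for p in principles):
        raise ValueError("expected four principles with three codings each")
    return sum(int(bool(flag)) for coding in principles for flag in coding)  # max 12


def classification_score(answers, answer_key):
    """One point for each of the eight examples assigned to the correct principle."""
    if len(answers) != 8 or len(answer_key) != 8:
        raise ValueError("expected eight classifications")
    return sum(int(given == correct) for given, correct in zip(answers, answer_key))  # max 8


def explanation_score(principles_explained):
    """One point for each of the four principles correctly explained in the video task."""
    if len(principles_explained) != 4:
        raise ValueError("expected four codings")
    return sum(int(bool(flag)) for flag in principles_explained)  # max 4


# Hypothetical participant: all definitions correct, one example missed.
codings = [(True, True, True)] * 3 + [(True, True, False)]
print(definition_score(codings))
```

Keeping the three scores separate, rather than summing them, mirrors their use as distinct dependent variables for different levels of understanding.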

Procedure
The experiment consisted of three phases. In the pre-experimental phase, participants answered demographic questions, completed the pretest and answered the questions about their beliefs. In the learning phase, participants read the input on the operant conditioning principles and studied the comparative cases. Depending on experimental condition, the student teachers read the input either before or after comparing the cases and received or did not receive specific prompts regarding the key features about the principles of operant conditioning during case comparison. Participants in all groups answered the questions about cognitive load directly after the case comparison. In the post-experimental phase, participants took the posttest and again completed the questionnaire about their beliefs. Overall, the experiment lasted approximately 40 min.

Statistical analyses
We conducted 2 × 2 analyses of covariance to test Hypothesis 1 (MANCOVA) and Hypothesis 2 (ANCOVA). We controlled in both analyses for the influence of prior knowledge, gender, and semester of studies by including these variables as covariates.
To address the mediation hypothesis (Hypothesis 3), we used the SPSS macro PROCESS. The independent variable (i.e., experimental condition) was categorical. Therefore, we conducted a multi-categorical mediation analysis in which the entire sample (i.e., participants in all experimental conditions) was analyzed simultaneously (Hayes & Preacher, 2014). We calculated 95% bootstrap percentile CIs of the potential mediation effects from 5000 bootstrap samples. We performed a repeated measurement analysis of variance to analyze the changes in student teachers' beliefs (Hypothesis 4, controlled for prior knowledge, gender, and semester of studies).
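The logic of a percentile bootstrap CI for an indirect effect can be illustrated with a deliberately simplified sketch. It assumes a single dummy-coded contrast X, one mediator M, one outcome Y, and no covariates, so it is not a reimplementation of the PROCESS macro or of the multi-categorical model used in the study. All function names are hypothetical.

```python
import random


def indirect_effect(x, m, y):
    """Return a*b, where a is the slope of M on X and b is the slope of Y on M given X."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    a = sxm / sxx                                         # path X -> M
    b = (smy * sxx - sxm * sxy) / (smm * sxx - sxm ** 2)  # path M -> Y, controlling for X
    return a * b


def bootstrap_percentile_ci(x, m, y, n_boot=5000, level=0.95, seed=0):
    """Percentile CI of the indirect effect from case-wise resampling with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g., only one group drawn) is skipped
    if not estimates:
        raise ValueError("no usable bootstrap samples")
    estimates.sort()
    count = len(estimates)
    tail = (1 - level) / 2
    lower_idx = min(count - 1, max(0, round(tail * count)))
    upper_idx = min(count - 1, max(0, round((1 - tail) * count) - 1))
    return estimates[lower_idx], estimates[upper_idx]
```

An indirect effect is typically regarded as statistically significant when the resulting interval excludes zero. The percentile method is used here because the sampling distribution of a product of coefficients is generally not symmetric.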

Results
Descriptive statistics for all measures are depicted in Table 2. Correlations among the scales are presented in Table 3. We found no significant a priori differences among the four experimental conditions with regard to gender, age, or semester of studies. Furthermore, participants in the experimental conditions did not differ significantly on the pretest or on the beliefs questionnaire before learning, all F(3, 171) < 1.74, ns. The learning time significantly differed as a function of the factor provision of key features: The groups with key features needed less time (M = 12.16 min) than the two groups without provided key features (M = 16.88 min), F(3, 161) = 23.10, p < .001, η² = 0.30.

Hypothesis 1: conceptual knowledge
We predicted that analyzing the case comparison with more guidance (i.e., principles defined before case comparison and key features provided) would lead to a better understanding of the principles of operant conditioning than analyzing the cases with less guidance (definition after case comparison and key features not provided). The results of the MANCOVA with the three measures of knowledge (definition, classification, and explanation) as dependent variables and the two factors timing of input (before or after case comparison) and provision of key features (provided or not provided) partially supported this hypothesis. We found a large and significant main effect of provision of key features, F(3, 169) = 6.533, p < .001, ηp² = 0.10, but no significant effect of timing of input, F(3, 169) = 0.19, p = .904, ηp² = 0.003, and no significant interaction effect, F(3, 169) = 0.32, p = .810, ηp² = 0.01. The results of the univariate analysis of variance revealed that the main effect of provision of key features was statistically significant for the highest level of conceptual understanding, that is, explanation, F(3, 169) = 18.45, p < .001, ηp² = 0.10, but not for the lower levels of conceptual understanding, F(3, 169) < 1.44. Hence, the experimental conditions with more guidance in terms of providing key features outperformed the other groups with regard to explanation. None of the covariates significantly influenced knowledge acquisition.

Hypothesis 2: cognitive load
To shed further light on the learning mechanisms, we examined the cognitive load experienced during case comparison. We predicted that analyzing the case comparison with more guidance would lead to lower extraneous and higher germane load than analyzing the cases with less guidance. The results of the MANCOVA with the two measures of cognitive load (extraneous and germane load) as dependent variables and the two factors timing of input (before or after case comparison) and provision of key features (provided or not provided) partially supported this hypothesis. We found a large and significant main effect of timing of input, F(2, 169) = 29.60, p < .001, ηp² = 0.26, but no significant effect of provision of key features, F(2, 169) = 0.18, p = .834, ηp² = 0.002, and no significant interaction effect, F(2, 169) = 0.76, p = .471, ηp² = 0.009. The results of the univariate analysis of variance revealed that the main effect of timing of input was statistically significant for extraneous load, F(2, 169) = 49.29, p < .001, ηp² = 0.23, and germane load, F(2, 169) = 32.73, p < .001, ηp² = 0.61. Hence, the experimental conditions with more guidance in terms of timing of input (i.e., the experimental conditions with input before case comparison) yielded higher germane load and lower extraneous load. None of the covariates significantly influenced the level of cognitive load.

Hypothesis 3: mediation
To test the hypothesis that cognitive load would mediate the effect of instructional guidance on conceptual understanding, we performed multi-categorical mediation analyses (Hayes & Preacher, 2014): We ran a first model to test the indirect effect via germane load (i.e., experimental conditions with more guidance should experience a higher germane load and, in turn, acquire a deeper understanding), and a second model to test the indirect effect via extraneous load (i.e., experimental conditions with more guidance should experience a lower extraneous load and, in turn, acquire a deeper understanding). For these mediation analyses, we created three dummy-coded variables with the experimental condition input after-key features not provided as the reference category (D1 = input after-key features provided vs. the other experimental conditions, D2 = input before-key features not provided vs. the other experimental conditions, and D3 = input before-key features provided vs. the other experimental conditions). The results are depicted in [...]. The results indicate that student teachers in the experimental conditions input before-key features provided and input before-key features not provided perceived a significantly higher germane load and a significantly lower extraneous load than student teachers in the other conditions. In turn, higher germane load and lower extraneous load were related to higher conceptual knowledge.
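The dummy coding described above can be written out as a small sketch. The function name and the argument format are assumptions made for illustration. The assignment of D1 to D3 and the choice of reference category follow the text.

```python
def dummy_code(timing_of_input: str, key_features_provided: bool):
    """Return (D1, D2, D3) for one participant.

    timing_of_input: "before" or "after" (relative to case comparison).
    The reference category (input after, key features not provided) is coded (0, 0, 0).
    """
    if timing_of_input not in ("before", "after"):
        raise ValueError(f"unknown timing: {timing_of_input!r}")
    condition = (timing_of_input, bool(key_features_provided))
    d1 = int(condition == ("after", True))    # input after, key features provided
    d2 = int(condition == ("before", False))  # input before, key features not provided
    d3 = int(condition == ("before", True))   # input before, key features provided
    return d1, d2, d3


for timing in ("after", "before"):
    for provided in (False, True):
        print(timing, provided, dummy_code(timing, provided))
```

With k = 4 conditions, k - 1 = 3 dummy variables are sufficient, because membership in the reference category is implied when all three are zero.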

Hypothesis 4: student teachers' beliefs
A repeated-measures analysis of variance with measurement point as the within-subject factor (i.e., before learning and after learning) and with beliefs as the dependent variable revealed a significant main effect of measurement point, F(1, 169) = 105.16, p < .001, η² = 0.38, but no significant interaction with the between-subject factor experimental condition, F(1, 169) = 1.29, p = .28, η² = 0.02. Thus, regardless of experimental condition, beliefs about operant conditioning were more positive after learning than before learning, with a large effect size. Furthermore, we found positive correlations between knowledge at the posttest and changes in beliefs. This correlation was statistically significant for the classification score (r(173) = 0.15, p = .02) and the explanation score (r(173) = 0.19, p < .01), but lower and not statistically significant for the definition score (r(173) = 0.14, p = .07). This pattern of results indicates that the more knowledge about operant conditioning student teachers acquired, the more they changed their skeptical beliefs about operant conditioning. In addition, the pattern suggests that the higher the required level of relational understanding, the more pronounced the association between acquired knowledge and changes in beliefs.

Discussion
In this study, we investigated how student teachers' acquisition of conceptual knowledge about operant conditioning could be promoted through case comparison (Alfieri et al., 2013). We experimentally varied the level of instructional guidance during case comparison and investigated the effects of this variation on knowledge acquisition and teacher beliefs. To shed light on the learning mechanisms, we examined the role of the cognitive load experienced during case comparison.

The power of case comparison for knowledge acquisition
Consistent with meta-analytic evidence (Alfieri et al., 2013), our results indicate that case comparison supports the acquisition of conceptual knowledge. Descriptively, the acquisition of conceptual knowledge about operant conditioning was high in all experimental groups (see Table 2). At the same time, student teachers acquired more knowledge with more guidance during case comparison. More precisely, student teachers who were provided with key features produced better explanations of a teacher's behaviors shown in an authentic classroom situation than student teachers who generated key features themselves. However, lower levels of understanding (i.e., defining principles and classifying examples) were not affected by whether key features were provided or not. Thus, providing key features was only necessary to successfully complete the most cognitively complex tasks. Further, the student teachers provided with key features needed less time for the case comparisons than student teachers not provided with key features.
The observed beneficial effects of providing key features are not in line with the meta-analytic evidence from Alfieri et al. (2013). However, there was great heterogeneity in the effect sizes between the studies included in the meta-analysis. Furthermore, almost a third of the studies included in the meta-analysis investigated the learning of concrete concepts (i.e., perceptual knowledge acquisition) and thus can be classified as less complex (Gentner, 2003), while the materials and aligned cases used in our study are based on relational concepts, which are assumed to be of higher complexity. When acquiring complex concepts, learners normally focus on surface similarities between cases (e.g., Waltz et al., 2000). To counteract this tendency, key features can help learners focus on structural similarities instead (e.g., Gentner et al., 2003; Roelle & Berthold, 2016) and abstract from the surface features (Higgins, 2017). Our results support this assumption: Although many participants in our study benefited from the learning environment, student teachers who learned with key features retained some advantages.
Contrary to our hypotheses, the timing of the input did not influence learning significantly. In line with theoretical approaches involving an initial problem-solving phase followed by an instruction phase such as the productive failure approach (e.g., Kapur, 2008), engaging in case comparison before reading input about principles might prepare learners to understand the principles when they are presented to them afterwards (Bransford & Schwartz, 1999). For example, one element in the productive failure approach is that learners obtain explicit instructions after they have explored problems (Kapur, 2008). However, in contrast to the productive failure approach, we only delayed the information about the principles in our study and did not implement other crucial elements of the productive failure approach, such as a hypothesis generation phase and instruction directly addressing the generated hypotheses (e.g., Loibl et al., 2017). Thus, our results suggest that just presenting the information about the principles after case comparison is not beneficial for acquiring conceptual knowledge.

The role of cognitive load during case comparison
We found that student teachers who received information about the principles before case comparison experienced higher germane load and lower extraneous load than student teachers who received information about the principles after case comparison. These results are in line with explicit or direct instruction approaches (e.g., Kirschner et al., 2006), which emphasize the importance of an initial instruction phase including an explicit explanation of concepts. Consequently, the results did not support delayed instruction approaches (e.g., Schwartz & Martin, 2004) that emphasize the importance of active knowledge construction through problem-solving and exploratory cognitive processes for learning. For instance, with regard to the productive failure approach (Kapur, 2008), our results on cognitive load support our conclusion that only delaying instructional information is not sufficient to foster beneficial, exploratory cognitive processing (that is, high germane load). On the contrary, germane load was lower in the groups with delayed instruction.
However, was higher germane load beneficial for learning, as postulated by cognitive load theory? A fundamental assumption of instructional design approaches is the importance of linking appropriate instructional activities to specific instructional goals (e.g., Smith & Ragan, 2005). According to Kalyuga and Singh (2016), this assumption has implications for the effects of experienced cognitive load: Different cognitive activities contribute to productive or unproductive cognitive load depending on the specific instructional goals. Consequently, the same learning activity may generate productive or unproductive cognitive load depending on the specific goal (e.g., Scott & Schwartz, 2007). In our experimental conditions with provision of principles after case comparison, germane load was lower and extraneous load was higher than when the principles were provided before case comparison. Extraneous load was unproductive (i.e., negative coefficient b in the mediation analysis; Fig. 1) and germane load was productive in these groups (i.e., positive coefficient b in the mediation analysis; Fig. 1). Introducing the principles to learners at the beginning was associated with high germane and low extraneous load in our study. This pattern of cognitive load (high germane load and low extraneous load) also proved beneficial to learning, as it was associated with deeper conceptual understanding. Thus, our results support the assumptions of cognitive load theory that a low extraneous load and high germane load are beneficial for the acquisition of knowledge (e.g., Sweller, 2010).
A possible explanation for this result could be that the student teachers who received an explanation of the principles first had to keep the principles in working memory while they compared the cases. This may have helped them align the cases by relating their key features to the principles retained in working memory. Following Klepsch and Seufert (2021), high germane load results from actively promoting learners' productive interactions with the intrinsic demands of the task (Kalyuga, 2011). This would indicate that in the experimental conditions with the principles provided prior to the case comparison, the instructional support encouraged learners to invest germane resources to build knowledge. Sweller et al. (2019), in contrast, in their recent additions to cognitive load theory, challenge the conceptualization of the three types of load and question whether germane load should be represented as a separate type of load. They argue that germane load has a redistribution function: The germane resources that must be expended to address the extraneous cognitive load are correspondingly less available to cope with the intrinsic demands of the task. Thus, the pattern of results in our study would suggest that in the experimental conditions with principles provided after case comparison, more germane resources were needed to cope with the extraneous load, leaving less capacity to meet the intrinsic demands and develop learning-relevant schemas than in the experimental conditions with principles provided before case comparison.

The power of case comparison to alter skeptical beliefs
Extending prior research on case comparison, we investigated its effectiveness for changing beliefs about the learning content. In our study, we focused on operant conditioning as a means of preventing student misbehavior (Marzano et al., 2009). The results indicate that learning through case comparison was associated with a change in teacher beliefs independent of experimental condition. Thus, all student teachers apparently processed the cases deeply enough to develop positive beliefs about operant conditioning.
Additionally, we analyzed the associations between conceptual knowledge after case comparison and changes in beliefs and found a positive correlation: The more student teachers learned about operant conditioning, the more their skeptical beliefs diminished. Interestingly, this association was more pronounced with respect to the higher levels of understanding that require deeper relational understanding (i.e., the explanation and classification scores in contrast to the definition scores). This result supports the assumption that deep processing is important for altering beliefs.

Limitations
Although a strength of this study was our application of the instructional approach of case comparison to an important aspect of classroom management, there were also some limitations. First, our tests to measure conceptual knowledge about operant conditioning at the lower levels of conceptual understanding (i.e., the definition and classification scores) were relatively easy for some student teachers, which could have resulted in limited variance in these test scores. Thus, there might have been further differences between the experimental groups as a function of the degree of instructional guidance that we could not detect with our instruments. Second, we used self-report instruments to measure cognitive load. Similar questionnaires have been shown to validly measure the types of cognitive load (e.g., Leppink et al., 2013). However, it would be important to obtain more empirical evidence on the validity of our newly developed questionnaire. Nevertheless, we delivered important findings about the mediating role of experienced cognitive load using this instrument. Future research should combine analogue measures, such as our questionnaire, with digital measures such as think-aloud protocols or eye-tracking techniques to provide further insights into this process. Third, our design did not allow us to draw conclusions about the long-term effects of case comparison. Thus, it would be important for future research to implement follow-up assessments to examine whether case comparison has the power to develop learners' knowledge about concepts in a lasting way and to permanently change beliefs. Fourth, although beliefs about operant conditioning were more positive after learning with case comparison than before, we had no control group in our study. Thus, our study design does not allow us to draw causal conclusions: We cannot conclude that the case comparison was the cause of the changes in beliefs. Therefore, it would be important for future research to investigate beliefs in designs with a control group. Fifth, it would be important to conduct further studies examining how student teachers learn to apply the principles of operant conditioning in their teaching practices. To date, there is little evidence on how to systematically promote classroom management skills during teacher education (Shank & Santiague, 2021). From a cognitive perspective (e.g., Feldon, 2007), it is assumed that conceptual knowledge forms the basis for behavioral procedures. Further empirical studies are needed to support this assumption with regard to operant conditioning.

Conclusion
This study extends research in several ways. First, it adds to evidence on the acquisition of conceptual knowledge about operant conditioning. We applied the case comparison approach to this important topic within teacher education and systematically investigated how to optimally support the acquisition of knowledge about the principles of operant conditioning. Teachers often use strategies such as positive or negative punishment without a clear understanding of the relational concepts (Marzano et al., 2009). Existing training programs on classroom management target in-service teachers' skills (Dicke et al., 2015; Evertson & Weinstein, 2006; Ophardt et al., 2017) and thus do not focus on the systematic promotion of conceptual knowledge. In our experiment, student teachers successfully acquired an understanding of operant conditioning through case comparison. Knowledge acquisition was higher when more guidance was provided during case comparison by identifying key features to be used to compare the cases. Second, cognitive load was an important factor in explaining differences in knowledge acquisition between the experimental conditions. Third, case comparison was associated with more appropriate beliefs about operant conditioning. This is an important finding in light of skeptical beliefs among teachers regarding the importance of operant conditioning for teaching success (Dovey et al., 2017; Maag, 2001).

Appendix
See Table 4.
Ms. Pustel teaches the class in mathematics. To compensate for the many interruptions and the resulting missed time on task, she decides on the following: The class has to stay longer after each lesson.
Ms. Maus is the teacher for STEM. She regularly writes short tests in her lessons. Since the class is often unfocused, the students usually perform poorly. She gives the class the following option: if everyone works with concentration until the vacation, everyone's worst short-test will be dropped and will not be considered for grading.
Comparative Case 4: Positive Reinforcement vs. Negative Punishment
Ms. Albert teaches computer science. The students really enjoy working with tablets. If the class works well for two lessons, she promises to organize the school tablets for the next lesson.
Mrs. Haas is the class teacher and teaches the class in German. She is increasingly annoyed by the students' behavior and the fact that she is only busy admonishing them. Therefore, she cancels the admired reading evening at school, which was planned for the next month.