Introduction

Learners often rely on superficial and inefficient learning strategies that target rote learning, and that do not lead to meaningful knowledge. Meaningful knowledge is an interrelated collection of new and existing knowledge about a particular topic which is necessary to understand the learning material in the long term, and which leads to new knowledge structures (Dunlosky et al., 2013; Fiorella & Mayer, 2013, 2016). It is therefore considered important to encourage learners to process learning material through generative learning strategies. If learning is generative, learners try to make sense of the instructional material presented to them (Fiorella & Mayer, 2016). It is an interplay between selecting, organising, and integrating new and prior information. Hence, generative learning strategies contribute to effective learning because learners are actively involved in making the to-be-learned information meaningful by paying attention to important aspects of the new information, by reorganising and integrating this information with prior knowledge, which facilitates the transfer of what they have learned to new contexts (Fiorella & Mayer, 2016). The transfer of learning is the ultimate educational goal, i.e., learning beyond the initial course, task, or test (Barnett & Ceci, 2002; Lobato, 2006).

Generative learning strategies such as preparing to teach and learning by teaching on video are effective for knowledge building and have more robust effects on memory and transfer of knowledge compared to more superficial learning strategies, such as (massed) re-study (e.g., Annis, 1983; Coleman et al., 1997; Fiorella & Mayer, 2013; Hoogerheide et al., 2019a, 2019b; Kobayashi, 2019; Renkl, 1997; Roscoe & Chi, 2008). Preparing to teach means that learners study the learning material by preparing a lesson with a teaching expectancy (Kobayashi, 2019; Muis et al., 2016). This implies that learners study the material and prepare an actual lesson on paper while keeping in mind that they have to explain it at a later moment to someone else. By doing so, the construction of deeper meaning of the concepts is enhanced compared to the often used learning strategies such as re-reading or highlighting (Dunlosky et al., 2013; Fiorella & Mayer, 2016). This is because the “teacher” benefits from explaining to others because one has to select the relevant information to include in the explanation, organise it in a way that it can be understood by others, and one has to elaborate on the material by incorporating one’s existing knowledge, which leads to new knowledge structures (Duran, 2017; Fiorella & Mayer, 2016).

The learning gains of preparing to teach might be strengthened by the actual act of teaching on video to a (fictitious) audience with the goal of helping others to learn (Fiorella & Mayer, 2013, 2016; Hoogerheide et al., 2016). Teaching on video presumably evokes feelings of social presence. Social presence can be defined as the awareness of a (fictitious) audience (Hoogerheide et al., 2019a, 2019b). Feelings of social presence might in turn generate arousal, which may subsequently result in better learning and transfer compared to re-study (Hoogerheide et al., 2016, 2019a, 2019b). Indeed, the “learning by teaching on video” strategy has shown promising effects on learning and transfer across various ages, learning materials, and domains compared to re-studying the learning material (Fiorella & Mayer, 2013).

Learning by teaching on video has been shown to be a beneficial strategy for acquiring procedural knowledge, e.g., learning to reduce the confirmation bias through three “consider the opposite” stages (Van Brussel et al., 2021), and for acquiring problem-solving skills from step-based worked examples (Hoogerheide et al., 2019a). It is an open question whether learning by teaching on video also supports student teachers’ learning through authentic tasks such as preparing open-minded citizenship lessons. There is no reason to believe that the benefits of teaching on video would not generalize to authentic tasks. However, this generalization question has not yet been addressed. For student teachers, preparing a lesson plan and afterwards teaching it is an authentic task because it simulates the task they have to perform in their future job (Abrami et al., 2015). Simulating teaching while learning by teaching on video might therefore be seen as an additional meaningful learning opportunity for student teachers, which might enhance effective learning.

To prepare an open-minded lesson, it is necessary that student teachers have knowledge about the concept of open-mindedness, the confirmation bias, and the designing principles of open-minded lessons. A fallacy that might hinder designing an open-minded lesson is the confirmation bias (Cavojova et al., 2018; Nickerson, 1998; Schwind et al., 2012; Stanovich et al., 2016; Sternberg & Halpern, 2020). The confirmation bias refers to the finding that people tend to be selective in finding and using evidence that is consistent with their own beliefs or expectations rather than selecting and processing inconsistent information (Cavojova et al., 2018; Nickerson, 1998; Schwind et al., 2012; Stanovich et al., 2016; Sternberg & Halpern, 2020; Tversky & Kahneman, 1974). As a result, the confirmation bias can lead to one-sidedness. Perspective taking is an important element of citizenship education to avoid this one-sidedness (e.g., Abrami et al., 2015; Nickerson, 1998; Schwind et al., 2012).

When a primary education teacher prepares a citizenship education lesson that addresses a topic that might provoke discussion and one-sidedness in the classroom, e.g., racism, it is important to be open-minded. Open-mindedness is a crucial critical thinking disposition and it is defined as one’s willingness and ability to consider opposing experiences, beliefs, values, and perspectives and give these a serious, impartial consideration by setting aside one’s commitment towards one’s own experiences, beliefs, values and perspectives (Baehr, 2011; Facione, 1990; Kwong, 2016). By being open-minded during citizenship education, a teacher provides students with a good example of a consideration mode. In addition, it creates an atmosphere in which pupils feel free to express their own views and, hence, to learn about the views of others. Therefore, when preparing a citizenship education lesson, student teachers must hold the goal of an “open-minded lesson” closely in mind. That is, they have to prepare a lesson, which will allow students to express different perspectives to a social topic such as racism or sexual orientation.

Learning by teaching on video is a promising strategy to gain meaningful knowledge about the confirmation bias and perspective taking, which student teachers have to apply to prepare open-minded citizenship education lessons. In addition, this strategy may have a stronger effect on student teachers’ open-mindedness compared to learning by preparing to teach and re-studying.

The present study

The central question of the present study was which instructional strategy best supports student teachers in preparing an open-minded citizenship education lesson. To address this question, teaching on video was pitted against preparing to teach and re-study. Participants were student teachers who first completed the Actively Open-minded Thinking scale (AOT; Stanovich & West, 2007). After one week, all participants received an instruction on what open-minded lessons, confirmation bias, and perspective taking are and why it is important to gain knowledge about these subjects. Subsequently, participants were assigned to one of the three conditions. Participants in the first condition, Teaching on video (TOV), processed the instructional content by preparing an explanation of that content and teaching this explanation in a video to a fictitious audience. Participants in the second condition, Preparing to teach (PTT), processed the instructional content by preparing an explanation of that content. Participants in the third condition, the control condition (CC), processed the instructional content by re-studying the text for 10 min. During the learning phase, feelings of social presence and arousal were measured through questionnaires in the teaching on video and preparing to teach conditions. After the learning phase, all participants completed the AOT for the second time. As a post-test, all participants wrote a lesson plan on a topic within the context of a citizenship education lesson in primary education (e.g., racism, obesity). Finally, all participants completed a conceptual knowledge test to assess their knowledge about important concepts from the instruction, e.g., confirmation bias and open-mindedness.

Hypotheses

In the present study, the quality of the explanation of the instructional content and the open-mindedness of the lesson plan were the most important variables. We hypothesized that the quality of the explanations in the learning phase and the degree of open-mindedness of the lesson plan after the learning phase would be higher for participants who practised through teaching on video (TOV) than for participants who only prepared to teach (PTT). This hypothesis is based on the following line of reasoning: Participants who learn through teaching on video often experience stronger feelings of social presence, which induce higher arousal levels because they are addressing an audience (i.e., the social presence hypothesis; Gunawardena, 1995; Hoogerheide et al., 2019a, 2019b). As a result, participants are stimulated to generate accurate explanations to ensure that the audience understands the subject. Based on the generative learning hypothesis (e.g., Hoogerheide et al., 2019a, 2019b), we thus expected that participants in the TOV condition would obtain a deeper conceptual understanding of what open-mindedness is and why it is important than participants in the PTT condition and the control condition (CC). In contrast to participants in the TOV and PTT conditions, participants in the control condition only re-studied the instructional content. Unlike generative learning strategies, re-studying does not involve deep processing of the to-be-learned content (Dunlosky et al., 2013; Fiorella & Mayer, 2016). If we assume that deep conceptual understanding is needed to prepare an open-minded lesson, then it is reasonable to hypothesize that the accuracy and completeness of the open-minded lesson plan would be highest in the TOV condition, followed by the PTT condition, and lowest in the control condition, i.e., TOV > PTT > CC.

For conceptual knowledge, based on the idea of generative learning as outlined earlier in the Introduction, we hypothesized that participants who learned through teaching on video would perform better on the conceptual knowledge test compared to participants who only prepared to teach, and subsequently to participants who re-studied: TOV > PTT > CC for conceptual knowledge.

Furthermore, we exploratively compared mean AOT pre-test and post-test scores within and between the conditions. Because participants’ attitudes are unlikely to change during a relatively short intervention, these results have to be interpreted cautiously.

Finally, learning by teaching on video probably induces higher feelings of social presence and arousal. Therefore, we exploratively compared the TOV and PTT conditions on self-reported feelings of social presence and arousal.

Method

Participants and design

To determine the required sample size for a standard sensitivity of the test procedure (i.e., power) of 0.80 for the One-Way (single factor) ANOVA, under a significance level of 0.05, and a medium effect size (f = 0.25), we needed to test at least 159 participants according to G*Power (Faul et al., 2007). This criterion was met, because 176 Dutch student teachers (Mage = 21.60, SD = 4.99, 153 women) from six Dutch primary education teacher education institutions participated in our research. At the time of the experiment, the concepts of confirmation bias and open-mindedness were not yet taught in the curriculum. Participants gave informed consent prior to the experiment. The first 160 participants were randomly assigned to the three conditions of the experiment. While the experiment was in progress, 16 additional participants registered for participation in the study. These took part in the control condition. Therefore, the distribution of the participants was as follows: TOV (n = 51), PTT (n = 54), and CC (n = 71). Participants either received course credits or a shop voucher. The rewards were not correlated with condition. The dependent variables were the quality of the explanation (TOV and PTT) and the open-mindedness of the lesson plan, i.e., the degree to which the lesson plan contained multiple perspectives on the topic at hand, and whether it left room for an open-minded discussion. Feelings of social presence and arousal (TOV and PTT conditions only), conceptual knowledge, and the tendency towards open-minded thinking were also measured.

By pre-registering and storing all data on the Open Science Framework, we refrained from biasing the results by e.g., null hypothesis significance testing (i.e., NHST), or p-hacking. Our view is that pre-registration and open science are important ways to achieve more transparency and objectivity in science (e.g., Conlin et al., 2019; Munafò et al., 2017; Nosek & Lakens, 2014; Simmons et al., 2011).

Materials

All materials and measures were delivered in Dutch through the online Qualtrics platform (Qualtrics, 2017). In the Teaching on video condition, the actual act of teaching was recorded on participants’ smartphones and sent to the researcher via WhatsApp or e-mail. Participants could not click back to previous parts during the experiment.

Open-mindedness

The Dutch version of the Actively Open-minded Thinking scale (Stanovich & West, 2007; translated into Dutch by Heijltjes et al., 2014) was used to measure participants’ open-mindedness. This scale is aimed at measuring the level of one’s open-minded thinking. The test consists of 41 items to which participants respond on a 6-point Likert scale, ranging from (1) strongly disagree to (6) strongly agree. Higher scores on the AOT imply a greater tendency towards open-minded thinking. Lower scores indicate closed-minded thinking, which leads to, e.g., the confirmation bias in reasoning and decision making (Baron, 2008; Stanovich & West, 2007). In general, studies that use the AOT report a high reliability of the test (for an overview, see Janssen et al., 2020, p. 2, Table 1).

Examples of AOT items are: “I believe that the different ideas of right and wrong that people in other societies have may be valid for them,” “Someone who attacks my beliefs is not insulting me personally.” Some items have to be reversed before analysis, e.g., “I tend to classify people as either for me or against me.”

Social presence

The social presence questionnaire with 10 statements (see Appendix B) was constructed by Van Brussel et al. (2021), who reported a Cronbach’s alpha of 0.73. In the social presence questionnaire, participants had to indicate on a scale from (1) strongly disagree to (5) strongly agree to what degree each of the ten statements represented how they felt after the learning phase. We used sum scores per participant, which could range from 10 (10 × 1) to 50 (10 × 5). The higher the score, the stronger the feelings of social presence.

Arousal

To measure arousal, the Activation-Deactivation Adjective Check List (ADACL, see Appendix C) by Thayer (1967, 1986) was used. This checklist was translated into Dutch and used in a former study by the authors (Van Brussel et al., 2021). It assesses core arousal or activation states based on two dimensions (activation and deactivation) and four subscales, i.e., energetic, tiredness, tension, and calmness. Each subscale consists of five adjectives. Participants rated on a four-point scale how well each adjective described their immediate feelings after the explanation phase (4 = “definitely feel”, 3 = “feel slightly”, 2 = “cannot decide”, and 1 = “definitely do not feel this way”). Per subscale, we averaged the scores of the five adjectives. “Wakeful” and “wide-awake” were reversed for the Tiredness subscale. Higher scores indicated higher levels of arousal. Previous studies revealed excellent Cronbach’s alphas for all four subscales (Boyle et al., 2015; Thayer, 1978). Van Brussel et al. (2021) found a Cronbach’s alpha of 0.82 for “energy” and 0.82 for “tiredness”, but questionable Cronbach’s alphas for both “tension” (0.67) and “calmness” (0.60).

Conceptual knowledge test

The conceptual knowledge test consisted of six open questions about the content of the instruction, e.g., “Explain what the confirmation bias is.” This test was designed by the first author and was aimed at testing participants’ knowledge about the concepts that were addressed in the instruction (See Appendix A).

All measures and objectives of the measures that are used in the present study are presented in Table 1.

Table 1 Overview of the measures and objectives

Procedure

Participants were tested online because, owing to the restrictions of the COVID-19 pandemic in 2020, they were not allowed to enter the university building. They were asked to work individually, with focus, and without disturbance. See Fig. 1 for a visualisation of the procedure.

Fig. 1 Procedure of the experiment

One week before the experiment, participants completed the AOT for the first time. After one week, participants received an e-mail with the link that led to the Qualtrics platform. Participants read the assignment and its goals and, by clicking to continue, gave informed consent for the use of their data in the research. The experimenter was available by phone or e-mail throughout the experiment for practical questions. All participants started by studying the instruction without any information on how they would have to process the content afterwards. The text-based instruction consisted of approximately 1800 words. The instructional content concerned the concepts of critical thinking, confirmation bias, open-mindedness, and perspective taking. The steps that are needed to design an open-minded lesson were described. Some examples and didactical and pedagogical suggestions (hints) were provided to further explain the importance of having an open mind and how to prepare a citizenship education lesson on a social topic.

Next, the three groups each followed their own intervention to process the instruction.

In the teaching on video group (TOV), participants received the following instruction: “Prepare an explanation of what you have just learned and then provide this explanation to your peer student teachers who are not participating in this project. You do this by recording the explanation via the camera of your smartphone. They will watch your explanation online later. It is therefore important that you give an accurate and complete explanation.” Participants had to write down their preparation in Qualtrics before the teaching started. To create a situation that was as authentic as possible, they were told that their video would be used for online activities, which were common in the Netherlands during the COVID-19 crisis. The recording of the explanation was sent by e-mail or WhatsApp to the experimenter. In the preparing to teach group (PTT), participants only wrote a preparation of their explanation for their peers in Qualtrics. Their instruction was as follows: “Prepare an explanation of what you have just learned for your peer student teachers who are not participating in this project. They will read your explanation later. This information is also important for your peer students to learn to think critically. Therefore, make sure your explanation is accurate and complete. Start typing your explanation in the text box below.” Participants in the TOV and PTT conditions were asked to set a timer for 10 min for this phase. In the re-study control condition (CC), participants re-studied the instructional text for 10 min, after which they were automatically forwarded to the next page in Qualtrics. The re-study instruction was: “You now have the opportunity to study the subject matter again before continuing with assignments on this subject. You have ten minutes. When the time is up, the program will automatically proceed to the next page. You are not allowed to take notes. At the bottom of the page, you can see how much time you have left.” These participants then had to indicate how often they had re-studied the instructional text: once, twice, or otherwise (specifying how often).

After this, the TOV and PTT conditions completed the social presence and arousal questionnaires, and all three conditions (TOV, PTT, and CC) completed the Actively Open-minded Thinking scale for the second time. Then, all participants, including those in the control condition, received the assignment to prepare a lesson plan for a citizenship education lesson for primary school children in 6th grade (11- and 12-year-olds) on a topic that can provoke discussion: “You will be teaching on one of the themes below. Choose a theme and write your lesson plan, be complete and accurate so that others can also teach your lesson. Be concrete: What do you say, what do you do, what do you ask? The themes are: Radicalization of young people, migration and refugees, LGBTQ+ community, religion / belief, childhood obesity, mouth masks in public transport and (black) Pete. Please note, there is a minimum number of characters that you must use. If you cannot click to the following page yet, you will have to explain more.” In their lesson plan, participants were supposed to show open-mindedness by taking perspective on the chosen topic, that is, by considering opposites and alternatives to their own point of view, without being explicitly instructed to do so. The minimum number of characters was 1250. Participants were not able to continue until they had reached that number of characters. This was to prevent participants from rushing the assignment. None of the conditions had the instructional text at hand.

All participants then completed the conceptual knowledge test. After the test, they were asked for their prior education, sex, and age. Finally, participants were thanked and given the opportunity to receive a summary of the research results.

Data analysis and results

Following the data analysis plan in our pre-registration, we checked the variables for missing data and outliers. There were no missing data. Prior to running a statistical test, we checked for multivariate and univariate outliers. We ran and reported all analyses with and without outliers, but outliers, if any, did not change our results. For the AOT, the conceptual knowledge test, and the social presence and arousal questionnaires, a minimum Cronbach’s alpha of 0.70 was set as a threshold for the analysis of a sum score or average score. In all analyses below, a significance level of 0.05 was used as a threshold for statistical significance. Eta-squared (η²) is reported as a measure of effect size for the ANOVAs, for which 0.01 is considered small, 0.06 medium, and 0.14 large.
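The reliability criterion can be illustrated with a minimal sketch of how Cronbach’s alpha is commonly computed from item-level scores. The function names and the small data set are hypothetical and serve only as an illustration; they are not the study’s analysis script or data.

```python
from statistics import variance

ALPHA_THRESHOLD = 0.70  # minimum alpha stated in the text


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of item scores.

    Uses the standard formula k/(k-1) * (1 - sum(item variances) / variance(totals)),
    with sample variances.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equally long rows")
    item_columns = list(zip(*rows))  # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


def reliable_enough(alpha, threshold=ALPHA_THRESHOLD):
    """True if a scale may be analysed as a sum or average score."""
    return alpha >= threshold


# Made-up responses of four participants to a three-item scale.
example_rows = [[1, 2, 1], [2, 1, 3], [3, 3, 2], [4, 4, 4]]
alpha = cronbach_alpha(example_rows)
print(round(alpha, 3), reliable_enough(alpha))
```

Under this rule, a scale with an alpha of 0.53 or 0.41 would not qualify for analysis as a single score, whereas one with an alpha of 0.83 would.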

We asked TOV and PTT participants to set a timer for 10 min, in which they had to prepare and teach (TOV) or prepare (PTT). Qualtrics automatically recorded when TOV and PTT participants clicked through to the following page (namely the social presence questionnaire). In the control condition, the page automatically proceeded after 10 min. However, the actual time spent on reading and learning by preparing (PTT) or reading and learning by teaching on video (TOV) deviated somewhat from the suggested time. We observed five outliers at the upper end of the reading time for the instructional text. Without these outliers, the mean time spent on reading the instruction was 9.45 min (SD = 11.78). TOV participants spent on average 37.05 min (SD = 116.64) to prepare and teach. PTT participants spent on average 8.78 min (SD = 3.97) to prepare. Analyses of the dependent variables without the time-on-task outliers in the TOV condition did not change the results; therefore, we decided to include the data of all participants. A One-Way ANOVA on the time-on-task did not reveal a significant difference between the means of the TOV and PTT conditions: F(2,102) = 0.645, p = 0.527. In addition, we analysed the correlation between the time-on-task of the preparation phase, collapsed across the TOV and PTT conditions, and the final test results. The results show no significant correlation between the time-on-task and the accuracy of the lesson plan, r(98) = 0.165, p = 0.101, or the completeness of the lesson plan, r(98) = 0.155, p = 0.124. Furthermore, the correlation between the time-on-task and the results of the conceptual knowledge test was also not significant, r(105) = 0.085, p = 0.38. Our results show that the time-on-task differences between the TOV and PTT conditions were due to outliers: without these outliers, the time-on-task means were similar for the TOV and PTT conditions. Moreover, the mean time-on-task was comparable to the time allotted to participants in the re-study condition. In the control condition, 27% of the participants indicated that they re-read the text once, 61% re-read the text twice, and 12% reported ‘other’ (e.g., 3 times, or 1.5 times). Furthermore, and crucially, the outcomes of our correlation analysis indicate that time-on-task was not a confound in the current study.

Analysis of open-mindedness

Negatively formulated AOT items were reversed as indicated on the test form by Heijltjes et al. (2014). The AOT was individually scored on both test moments: For each participant we calculated the mean scores of the 41 items. The initial measurement had a low Cronbach’s alpha of 0.53. The test was reliable on the second measure: Cronbach’s alpha = 0.83. To determine whether there were differences between the three groups, we conducted a 3 (Condition) × 2 (Pre-test, Post-test) Mixed ANOVA.

Analysis of the explanation and lesson plan

The explanation (TOV, n = 51 and PTT, n = 54), and the open-minded lesson plan (TOV, PTT, and CC, n = 71) were analysed. An independent rater and the first author scored 30% of the explanations and 28% of the lesson plans to check for the reliability of the scoring method. The interrater reliability was “substantial” according to the interpretation of Cohen’s kappa (Cohen, 1960; Landis & Koch, 1977): Completeness explanation: κ = 0.73, accuracy explanation: κ = 0.73, completeness lesson plan: κ = 0.71, and accuracy lesson plan: κ = 0.79. Therefore, one rater scored the remainder of the texts and these results were used in the analyses.
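As an illustration of the interrater check, the sketch below computes Cohen’s kappa for two raters and maps it onto the Landis and Koch (1977) labels. The ratings are made up, and the function names are hypothetical; this is not the study’s scoring procedure or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of texts")
    n = len(rater_a)
    # Proportion of texts on which both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper_bound, label in [(0.20, "slight"), (0.40, "fair"),
                               (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper_bound:
            return label
    return "almost perfect"


# Made-up presence (1) / absence (0) scores for ten texts.
first_rater = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second_rater = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(first_rater, second_rater), 2))
print(landis_koch_label(0.73))
```

With these labels, all four kappa values reported above (0.71 to 0.79) fall in the “substantial” band.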

The quality of the explanations and the open-mindedness of the lesson plans were operationalized by scoring items on completeness and accuracy. See Appendix D for the scoring forms. The explanation (TOV and PTT) was first assessed on the presence of six concepts that were addressed in the instruction (e.g., explaining confirmation bias). This resulted in a completeness score. Next, the explanations of the concepts that were present were scored on accuracy (e.g., the explanation of the confirmation bias was accurate).

The lesson plan (TOV, PTT, and CC) was firstly assessed on whether the instructional content was incorporated in the lesson. One of the four items was, for example, “The explanation shows that the teacher presents multiple perspectives with regard to how people can think about the theme (= applying perspective taking to avoid the confirmation bias/showing open-mindedness).” This led to a completeness score. Subsequently, the accuracy of the present incorporated content was scored.

Completeness was measured through the presence (1 point) or absence (0 points) of concepts (e.g., a participant earned 1 point if an explanation of open-mindedness was present). For each accuracy item, the answer was rated as correct (1 point), partly correct (0.5 points), or incorrect (0 points). For example, when there were missing elements in the explanation of open-mindedness, the participant earned 0.5 points. The approach for this analysis was based on Hoogerheide et al. (2019a, 2019b) and Van Brussel et al. (2021). The maximum score for the explanation was 10 points per category, and for the lesson plan 4 points per category. For the analyses, we conducted One-Way ANOVAs.
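The scoring rule can be sketched as follows for a four-item lesson plan form. The rating labels, the use of None for an absent concept, and the function name are assumptions made for illustration; the actual scoring forms are those in Appendix D.

```python
# Points per accuracy rating, as described in the text.
ACCURACY_POINTS = {"correct": 1.0, "partly correct": 0.5, "incorrect": 0.0}


def score_text(ratings):
    """Return (completeness, accuracy) for one explanation or lesson plan.

    `ratings` holds one entry per scoring item: None if the concept is absent,
    otherwise the accuracy rating of the concept that is present.
    """
    present = [r for r in ratings if r is not None]
    completeness = len(present)  # 1 point per concept that is present
    accuracy = sum(ACCURACY_POINTS[r] for r in present)
    return completeness, accuracy


# Hypothetical lesson plan: three of four items present, one only partly correct.
print(score_text(["correct", "partly correct", None, "correct"]))
```

Because accuracy is only rated for concepts that are present, the accuracy score in this sketch can never exceed the completeness score.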

Beyond our pre-registration, we exploratively conducted an overall evaluation of the quality of the lesson plan because retrospectively, in our view, the pre-registered scoring was quite narrowly focused on the exact instructional content. Therefore, it probably left elements underexposed that indicated that a participant learned to prepare an open-minded lesson. The first author, who is an experienced teacher educator and assessor, scored the lesson plans blinded for condition on the following criteria: (1) In general, the lesson is aimed at stimulating open-mindedness towards the topic (i.e., open-mindedness as defined in the Introduction), (2) The content of the lesson plan shows that elements of the instruction are applied in the lesson plan, and (3) Teaching or working methods that contribute to open-mindedness are described. The lesson plans (n = 176), were scored based on the Dutch rating system in which assessment scores range between 1 (very insufficient) and 10 (excellent). We conducted a non-parametric Kruskal–Wallis Test on the overall quality scores.

Analysis of the social presence questionnaire

The analysis for this questionnaire was based on the approach by Van Brussel et al. (2021). Items 4, 9 and 10 were reversed before analysis. To explore any differences on feelings of social presence between the two conditions, an independent samples t-test was conducted on the sum scores per participant.
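A minimal sketch of the sum score, assuming ten responses on the 5-point scale and reverse-scoring of items 4, 9, and 10 as described above. The function names are hypothetical and the example responses are made up.

```python
SCALE_MIN, SCALE_MAX = 1, 5   # (1) strongly disagree ... (5) strongly agree
REVERSED_ITEMS = {4, 9, 10}   # 1-based item numbers that are reverse-scored
N_ITEMS = 10


def reverse_score(value):
    """Mirror a response on the scale, so 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - value


def social_presence_sum(responses):
    """Sum score (10 to 50) for one participant's ten responses."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected exactly ten responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"item {item_number} is outside the 1-5 scale")
        total += reverse_score(value) if item_number in REVERSED_ITEMS else value
    return total


# Made-up participant who answers 4 on every item.
print(social_presence_sum([4] * 10))  # 7 * 4 + 3 * 2 = 34
```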

Analysis of the arousal questionnaire

The ADACL was used to investigate whether arousal level differences between the conditions could explain different effects of the two instructional strategies. “Wakeful” and “wide-awake” were reversed for the Tiredness subscale. For the Tension subscale, after deleting item 6, the Cronbach’s alpha was 0.83. After deleting the first item, the Energetic subscale showed a Cronbach’s alpha of 0.76. However, the other two subscales were not reliable: For Tiredness, even after deleting item 3, Cronbach’s alpha was 0.65, and for Calmness the Cronbach’s alpha was only 0.61 after deleting item 14. Therefore, explorative analyses of these subscales were conducted on individual items. Mean scores per subscale per participant were calculated. For the Tension and Energetic subscales and the individual items of the subscales Tiredness and Calmness, we conducted independent samples t-tests to determine differences between conditions.

Analysis of the conceptual knowledge test

On the conceptual knowledge test, a maximum score of six points could be obtained. For each of the six items, participants earned 1 point (accurate), 0.5 points (partly accurate), or no points (wrong answer). The test was, however, not reliable: Cronbach’s alpha = 0.41. Therefore, contrary to what was pre-registered, we conducted explorative One-Way ANOVAs on the individual items.

Results for open-mindedness

We conducted a 3 (Condition: TOV, PTT, CC) × 2 (Pre-test vs. Post-test) Mixed ANOVA with Condition as a between-subjects factor on the AOT scores. The initial measurement had a low Cronbach’s alpha of 0.53. The test was reliable on the second measure: Cronbach’s alpha = 0.83. In Table 2, the mean individual scores and SDs of both test moments are presented. We found a main effect of Test Moment, F(1,173) = 3515.33, p < 0.001, η² = 0.95: All participants scored higher on the second measurement compared to the first. However, we found no main effect of Condition, F(2,173) = 0.049, p = 0.952. Also, no interaction effect was found, F(2,173) = 0.19, p = 0.824, η² = 0.002.

Table 2 Mean individual item scores and standard deviations of the actively open-minded thinking tests per condition

Results for the quality of the explanation and lesson plan

The 105 participants in the TOV and PTT conditions prepared an explanation after the intervention as part of the instructional strategy. All 176 participants prepared an open-minded lesson plan as a post-test. See Table 3 for the relevant descriptive statistics.

Table 3 Mean scores and standard deviations, per condition for the explanation and the lesson plan

The explanation (TOV and PTT)

The conditions differed significantly neither on completeness, F(1,103) = 0.71, p = 0.40, η2 = 0.007, nor on accuracy, F(1,103) = 0.12, p = 0.73, η2 = 0.001; both effect sizes were small. Two outliers for accuracy were detected, but running the analyses without them did not yield other results: F(1,101) = 0.049, p = 0.825, η2 = 0.001.

The lesson plan (TOV, PTT and CC)

All participants wrote a lesson plan as a post-test and there were no outliers. Participants chose one of the given topics for the lesson plan: Radicalization of young people (1.14%), Migration and refugees (5.47%), Religion (6.31%), Childhood obesity (12.36%), LGBTQ+ community (21.06%), Mouth masks in public transport (22.33%), and Black Pete (31.35%). Participants who chose Black Pete scored highest on completeness and accuracy, whereas participants who chose Radicalization of young people scored lowest on both variables.

There were significant differences between the conditions on the completeness of the lesson plan, F(2,173) = 3.32, p = 0.039, η2 = 0.037, and on its accuracy, F(2,173) = 5.05, p = 0.007, η2 = 0.055. As a follow-up, a planned Helmert contrast was performed. For completeness, the contrast showed that the TOV condition did not score significantly lower than the PTT and CC conditions combined (contrast estimate = −0.33, SE = 0.22, p = 0.133), whereas PTT participants scored lower than CC participants (contrast estimate = −0.47, SE = 0.24, p = 0.050). For accuracy, the TOV condition again did not score significantly lower than the PTT and CC conditions combined (contrast estimate = −0.36, SE = 0.20, p = 0.073), whereas PTT participants scored lower than CC participants (contrast estimate = −0.54, SE = 0.22, p = 0.014). In sum, participants who re-studied the instruction delivered a more complete and accurate lesson plan than participants who processed the instruction through preparing to teach or through teaching on video.
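
The structure of the Helmert contrast can be sketched as follows: each condition is compared with the unweighted mean of all later conditions, so with the order TOV, PTT, CC the first estimate is TOV versus PTT and CC combined and the second is PTT versus CC. The function name and the condition means below are hypothetical and serve only to show the arithmetic; the sketch yields contrast estimates only and does not compute the standard errors or p values.

```python
def helmert_contrasts(means):
    """Compare each level with the unweighted mean of all later levels."""
    estimates = []
    for i in range(len(means) - 1):
        later = means[i + 1:]
        estimates.append(means[i] - sum(later) / len(later))
    return estimates


# Hypothetical condition means in the order TOV, PTT, CC (illustration only)
example_means = [3.0, 3.1, 3.5]

tov_vs_rest, ptt_vs_cc = helmert_contrasts(example_means)
print("TOV vs. mean(PTT, CC):", round(tov_vs_rest, 2))
print("PTT vs. CC:", round(ptt_vs_cc, 2))
```

A negative estimate indicates that the earlier condition in the ordering scored lower than the condition or conditions it is compared with.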

For the analysis of the overall quality, the mean score for the TOV lesson plans was 7.10 (SD = 2.20), for the PTT lesson plans 7.00 (SD = 2.05), and for the CC lesson plans 6.87 (SD = 2.08). The results showed, however, that the three groups did not differ significantly from each other on the quality of the lesson plan, H(2) = 0.311, p = 0.856. See Table 4 for the percentages per condition on each level of the ordinal scale: insufficient, sufficient, good, and excellent.

Table 4 Scores of overall quality of the lesson plan per condition as percentages per scale

We analysed whether topic choice and condition were associated by conducting a Chi-square test. This test failed to reveal a significant association between the choice of topic and condition, χ2(12, n = 176) = 5.11, p = 0.954. Hence, the choice of topic did not vary between conditions and does not offer an alternative explanation for the results we found on the dependent variables.
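
The degrees of freedom of this test follow from the size of the contingency table: with seven topics and three conditions, df = (7 − 1) × (3 − 1) = 12. The sketch below shows how the test statistic and degrees of freedom are obtained from a table of counts. The function name and the counts are hypothetical; they are not the observed frequencies of this study, and the p value (which requires the Chi-square distribution) is not computed.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: 7 topics (rows) x 3 conditions (columns), illustration only
hypothetical_counts = [
    [1, 0, 1],
    [3, 4, 3],
    [4, 3, 4],
    [7, 8, 7],
    [12, 13, 12],
    [13, 13, 13],
    [18, 19, 18],
]

statistic, df = chi_square_independence(hypothetical_counts)
print("chi-square:", round(statistic, 2), "df:", df)
```

A statistic close to zero, as in this hypothetical table, means that the topics are distributed almost identically over the conditions.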

Results for social presence

To achieve the minimum required Cronbach’s alpha level of 0.70, items 3 and 10 were deleted, which resulted in a Cronbach’s alpha of 0.70. The analysis was conducted with the subset of eight items. The mean score for the TOV participants on the subset of the social presence questionnaire was 29.24 (SD = 4.51), and for the PTT participants 30.56 (SD = 4.51). This difference was not significant: t(103) = −1.50, p = 0.137.
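
As an illustration of the independent samples t-test, the sketch below computes a pooled-variance t value from the means and standard deviations reported above. The group sizes are an assumption: only the total of 105 TOV and PTT participants is known from the text, and a split of 52 and 53 is used here for illustration. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t value (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means and SDs from the text; the 52/53 split of the 105 participants is assumed
t_value, df = pooled_t(29.24, 4.51, 52, 30.56, 4.51, 53)
print("t(%d) = %.2f" % (df, t_value))
```

Under the assumed group sizes, the degrees of freedom equal 103 and the t value agrees with the reported one to two decimals.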

Results for arousal

See Table 5 for the descriptive statistics of the ADACL subscales for the TOV and PTT condition. For the Tension subscale, after deleting item 6, the Cronbach’s alpha was 0.83. After deleting the first item, the Energetic subscale showed a Cronbach’s alpha of 0.76. However, the other two subscales were not reliable: For Tiredness, even after deleting item 3, Cronbach’s alpha was 0.65, and for Calmness the Cronbach’s alpha was only 0.61 after deleting item 14. Therefore, explorative analyses of these subscales were conducted on individual items.

Table 5 Mean scores and standard deviations of the subscales of the ADACL

An independent samples t-test showed a significant difference on the Tension subscale, with higher mean scores for the PTT participants: t(103) = −2.541, p = 0.001. No significant difference was found for the Energetic subscale, t(103) = −1.461, p = 0.147. Conducting the analyses without outliers did not yield other results. For the subscales Tiredness and Calmness, the analyses on individual items only revealed a significant difference between TOV and PTT on item 14 (“still”) of the Calmness subscale, t(103) = 3.636, p < 0.001, with a higher mean score for TOV (M = 3.35, SD = 0.98) than for PTT (M = 2.61, SD = 1.11).

Results for the conceptual knowledge test

Since the test was not reliable (α = 0.41), results are presented on item level in Table 6. We conducted explorative One-Way ANOVAs, which revealed significant differences on item 3, F(2,173) = 4.723, p = 0.010, and item 5, F(2,173) = 13.776, p < 0.001. To find out which conditions differed, independent samples t-tests were conducted on these two items.
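
The item-level One-Way ANOVA can be sketched as follows. With three conditions and 176 participants, the degrees of freedom are k − 1 = 2 and N − k = 173, as reported. The function name and the item scores below (0, 0.5, or 1 point per participant) are hypothetical and are not data from this study.

```python
from statistics import mean


def one_way_anova(groups):
    """F ratio with between-groups and within-groups degrees of freedom."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores on one item for TOV, PTT, and CC (illustration only)
tov = [0, 0.5, 0.5, 1]
ptt = [0.5, 0.5, 1, 1]
cc = [0, 0, 0.5, 0.5]

f_value, df_between, df_within = one_way_anova([tov, ptt, cc])
print("F(%d,%d) = %.2f" % (df_between, df_within, f_value))
```

The sketch stops at the F ratio; a significant omnibus result would still need follow-up comparisons, such as the t-tests mentioned above, to locate the difference between conditions.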

Table 6 Conceptual knowledge test scores and standard deviations per proportion correct per item per condition

Discussion and limitations

The aim of this experiment was to examine whether the instructional strategy affects the open-mindedness of student teachers’ lesson plans on social topics within citizenship education at primary school. An instruction on confirmation bias, perspective taking and open-mindedness, and the role of these concepts when preparing an open-minded lesson formed the basis for three instructional strategies: Teaching on video, Preparing to teach, and Re-study. In contrast to our hypothesis, the results showed that teaching the instructional content to a fictitious peer audience did not lead to more complete and accurate explanations compared to only preparing. As a post-test, participants in the control condition designed an open-minded lesson plan that was more accurate and more complete compared to the other two conditions. An analysis of the overall quality of the lesson plan revealed no significant differences between the three conditions. All participants showed progress on the Actively Open-minded Thinking scale after the intervention. We will discuss these findings and limitations of this study.

We expected that participants in the TOV condition would score higher on social presence compared to PTT participants. There were, however, no significant differences between the conditions on feelings of social presence. A potential explanation might be the timing of the measurement. We measured feelings of social presence after the actual act of preparing and teaching, which may not have represented the feelings of social presence during the task; this is a limitation of the current study. Feelings of social presence might have faded away because the actual act of teaching was over. Other measures, such as wristbands that measure a direct change in the electrical resistance or temperature of the skin caused by, e.g., arousal (e.g., Biocca & Harms, 2002; Cui, 2013; Gunawardena & Zittle, 1997; Hoogerheide et al., 2019a, 2019b), or counting the number of personal references in the explanation (e.g., Jacob et al., 2021; Hoogerheide et al., 2016; Lachner et al., 2018), might have yielded other results. For follow-up research, it would be interesting to use both types of measurements and to measure feelings of social presence during the intervention.

Another explanation as to why the TOV and PTT conditions did not differ might be that participants did not receive enough cues indicating that a peer would actually watch their video. Therefore, they might not have really believed that peers would watch their explanation. This might have attenuated their feelings of social presence and arousal and, hence, might have worked against an additional effect of teaching on video relative to preparing to teach. However, in the study by Hoogerheide et al. (2019a, 2019b), which used a similar instruction in the TOV condition as we did, and in which a real audience was also absent, participants in the TOV condition did show higher levels of arousal than participants in the control condition who only studied worked examples. Hence, it is not evident that the cues that we used in our instruction were ineffective in leading participants to believe that their videos would be used for a real audience.

As a post-test, all participants prepared an open-minded lesson on a social topic. The results were, however, not consistent with our predictions, because participants who re-read the instruction created a more complete and accurate lesson plan compared to the participants in the TOV and PTT conditions. An explanation of this result might be that the TOV and PTT participants experienced more mental effort because they had to retrieve the information of the instruction from their working memory during the learning phase in which they prepared, or prepared and taught, an explanation about the instructional material (Paas et al., 2003; Van Gog et al., 2015). Participants in the control condition did have the learning material at hand to re-study. In the teaching on video study by Hoogerheide et al. (2019a, 2019b), participants had to teach a worked example to peers with the example at hand. They reported higher effort investment compared to participants in a control condition who had to study the example. Teaching the example in that study also led to better scores on a post-test with problem solving tasks, which is an indication that their perceived effort investment was beneficial for learning. In our study, however, TOV and PTT participants did not have the learning material at hand when they processed the instruction and had to retrieve it from their working memory during the learning phase. Therefore, participants in the re-study control condition, who had the instructional content at hand during the learning phase, probably scored better, at least on the immediate post-test that we administered.

Another explanation of the absence of performance differences might have been the quality of the explanations of the TOV and PTT participants. This quality was not particularly high in both conditions in both experiments. Effects of self-explanation on learning and performance are contingent on a sufficiently high quality of self-explanations. It might be that the level of understanding participants reached after instruction and practice tasks was not high enough to allow a beneficial effect of teaching on video and preparing to teach to emerge (e.g., Jacob et al., 2021). In sum, it is still not fully clear when increased social presence is desirable for learning and when it is not (Oh et al., 2018), especially in the context of generative learning for novices (Jacob et al., 2021). Further research is needed to examine when and why feelings of social presence are beneficial for learning, and in which learning contexts.

Retrieval practice, sometimes referred to as testing, is a strategy in which learning is enhanced by asking a learner to retrieve information from memory (Agarwal & Roediger, 2018). In our study, during the practice phase, TOV and PTT were in fact combined with retrieval practice, and retrieval practice typically reveals its positive effect on performance after a longer term (Rawson et al., 2013; Roediger & Karpicke, 2006; but see Van Gog et al., 2015, for contrasting findings). Retrieval practice might explain why we did not find an advantage of TOV or PTT on the main dependent variables. This is potentially consistent with other studies in the literature. For example, in the study by Hoogerheide et al. (2019a, 2019b), in which the TOV participants had a worked example at hand during teaching, TOV scored better on a final test compared to a control condition in which the example was re-studied. Furthermore, Jacob et al. (2021) showed that a retrieval practice control condition scored lower compared to a written or oral practice condition in which participants had the instructional materials at hand. Also, Lachner et al. (2018) did not find any differences between a retrieval practice condition and explaining conditions where participants were allowed to consult the instructional materials. The latter finding suggests that retrieval practice might be the effective ingredient in strategies that involve (self-)explanation. It might be interesting for future studies to investigate whether this is true or whether the combination of retrieval practice and (self-)explanation has an effect that is larger than the effect of any of the constituent components. However, in the short term, immediately after processing the learning material, often no differences are found between retrieval practice and a more superficial learning strategy such as re-studying (Hoogerheide, Vincent et al., 2019). Therefore, it might be possible that we would have found a positive effect of TOV and PTT relative to the control condition if we had administered a delayed test, for example 1 week after the instruction. All in all, whether or not retrieval practice was the decisive factor, and whether retrieval practice in combination with teaching on video or preparing to teach has additive effects, is still an open question and might be of interest for future research.

All participants became more aware of the importance of actively searching for opposing evidence against one’s own beliefs and the ability to weigh the available evidence fairly, as measured by the AOT. Higher scores on the AOT are positively related to considering more alternative possibilities than one’s initial point of view (Baron, 2008; Stanovich & West, 2007). Our finding is in line with earlier studies with comparable content in which considering multiple perspectives reduced participants’ confirmation bias (Adame, 2016; Lord et al., 1984; Mussweiler et al., 2000; Van Brussel et al., 2020, 2021). An explanation for this finding might be that all participants received the same instruction on open-mindedness, and that the instruction and the demand characteristics of the subsequent tasks determined the response on the 41 AOT-items. There is, however, an ongoing debate whether the AOT measures open-minded thinking as a unidimensional trait, notwithstanding its high Cronbach’s alphas. Janssen et al. (2020) found that, despite various scale and factor analyses, neither the 41-item AOT nor a subset of items measured open-minded thinking as a single trait that could discriminate between participants. Therefore, we have to interpret the AOT scores cautiously. Taken together, the present study showed that preparing an open-minded lesson plan is an ecologically valid manner to measure student teachers’ confirmation bias. These results might imply that extending this task with observing the actual teaching and the degree to which the confirmation bias is addressed in the classroom might enhance performance even more. This might be the focus of future research.

Finally, we discuss factors in our study that might limit our conclusions. One could suggest adding a retrieval practice control condition in which participants have to summarize the instructional content. However, if one uses a learning strategy that is widely used by higher education students, such as summarizing, then the effect compared to TOV and PTT is probably much smaller, because summarizing also promotes generative learning for skilled summarizers in higher education (Dunlosky et al., 2013; Fiorella & Mayer, 2016). To demonstrate an effect of a generative learning strategy, one needs to make a comparison with a control condition in which a non-generative learning strategy is used. Re-studying, as our participants in the control condition did, is often used for this (e.g., Annis, 1983; Coleman et al., 1997; Fiorella & Mayer, 2013; Hoogerheide, 2016; Hoogerheide et al., 2019a, 2019b; Kobayashi, 2019; Renkl, 1997; Roscoe & Chi, 2008).

The short instruction and the ten minutes of preparation time might not resemble an assignment that would be given to student teachers in a real-world situation. Our earlier studies (Van Brussel et al., 2020, 2021) showed that the confirmation bias was reduced on abstract tasks that could be solved procedurally, such as Wason’s four-card selection tasks (Wason, 1960). Participants in those studies were also briefly instructed. The length of the instruction might therefore not have led to different results in the current study. In contrast with the current study, participants in those studies received feedback on the practice tasks. In the real world, feedback is provided to student teachers on their lesson plans before teaching the lesson. Therefore, expanding the instruction and practice phase with feedback on the explanation in future studies might shed light on whether a higher degree of external validity leads to differences between the quality of the lesson plans in the three conditions.

Conclusion

The current experiment did not reveal any differences between the two generative learning strategies (teaching on video and preparing to teach) and a re-study control condition on conceptual knowledge. Furthermore, re-study had a positive effect on designing an open-minded lesson, at least compared to PTT. These results might in part be due to the fact that TOV and PTT involved both elaboration and retrieval practice, or to the fact that participants in both conditions did not experience remarkably high levels of social presence. Hence, the results of the current experiment indicate that more research is needed to answer the theoretically and practically relevant question of when TOV and PTT are most effective. For example, future studies could examine hypotheses in which social presence and arousal are more distinct from each other; mediation analyses could then provide more insight into the relationship between the treatment and the outcome measures. Lastly, we found an overall increase in mean active open-mindedness. This result might also be relevant to educational practice, as it shows that a small intervention might enhance the adaptive dispositions that teachers need, for example, in citizenship education lessons.