
1 Introduction: Prior Studies, Research Questions, and Significance

The potential of peer tutoring is vast, provided peer tutors have the content mastery and tutoring skills to at least approximate effective adult tutors. High-dosage tutoring with trained adult tutors is consistently identified as the most productive learning intervention, including among students with low socio-economic status (Dietrichson et al., 2017; Fryer, 2017). Unfortunately, to date, there is little evidence that K-12 students can be quickly trained to teach well (Berghmans et al., 2013).

Studies consistently find that tutors tend to do much more explaining than tutees (King, 1997), place minimal demand on tutees when questioning (Graesser et al., 1995), and rarely stimulate deep-level reasoning or monitor the understanding of tutees (Graesser et al., 1995; Roscoe & Chi, 2007). In short, tutors tend to adopt stereotypical, didactic teaching practices, cutting off opportunities for tutees to actively engage with ideas, sometimes severely hampering their learning. Drawing from in-depth observations of peer helping in middle school classrooms, Webb and Mastergeorge (2003) found that receiving highly didactic help actually predicted poorer content understanding than being left alone to struggle. I have come to label these the common sins of the Default Didact. Thus, while recent meta-analyses have found that peer tutoring does significantly increase learning for both tutors and tutees (Bowman-Perrott et al., 2013; Kobayashi, 2019; Leung, 2015), the efficacy of this learning arrangement is limited by our ability to effectively train peer tutors (Topping et al., 2017).

Few prior studies have attempted to train students to overcome these common sins of the Default Didact, and those that have report minimal success. King’s (1998) model, ASK to THINK—TEL WHY, is an example of a program that trains students to ask questions. It is a reciprocal model in which students take turns as the “questioner” or “explainer” following a whole-class lesson, with questioners asking a series of five types of questions using a card of question prompts. Emblematic of this under-researched area, the one experimental study of this model was underpowered, with three groups of just ten dyads each. It found suggestive evidence that students using this structured inquiry model improved their ability to make inferences based on class content, but their comprehension of the content itself did not improve.

In this article, we define learner-centered peer tutoring similarly to learner-centered teaching, which emphasizes learners actively participating and constructing their own knowledge, as opposed to passively receiving transmitted knowledge (Yeh & Swinehart, 2017). Berghmans et al. (2013) attempted to train advanced college math students to adopt learner-centered peer tutoring strategies. Their training lasted 90 minutes, incorporating an overview of facilitative strategies (mainly questioning and hinting) and opportunities for tutoring roleplay with feedback. They then analyzed the instructional moves used by tutors in an introductory math class and interviewed them to better understand the rationales for their decisions. They rigorously evaluated the impact of their training and found that it did not meaningfully shift the behaviors of peer tutors. In line with past findings, and despite the preparation to be more facilitative, tutors consistently gravitated toward directive strategies and “knowledge-telling,” and their questioning was “low level and shallow” (p. 717). The authors concluded that novice tutors require extensive training on deep-level questioning, working with tutees of varying levels, and reshaping beliefs about learning.

To address this persistent challenge, I designed a study to test the efficacy of two different interactive online training approaches for increasing tutors’ use of learner-centered teaching behaviors and promoting tutee learning. One approach was prescriptive, telling subjects the exact learner-centered pedagogical behaviors to use, then prompting practice in identifying and executing them; the other assumed that students inherently possess productive pedagogical notions that must be strategically unearthed and committed to in writing. This comparison intentionally mirrored the classic tension between direct instruction and constructivist approaches to learning new skills. Specifically, this study asked:

  1. Can short, interactive, online modules prescribing key learner-centered pedagogical strategies shift middle schoolers’ tutoring behaviors?

  2. Can short, interactive, online modules embedded with social psychological intervention strategies unearth dormant learner-centered pedagogical inclinations and shift middle schoolers’ tutoring behaviors?

  3. Can increased adoption of learner-centered tutoring behaviors through either intervention approach increase learning for tutees?

Structures for group learning and Peer Assisted Learning (PAL), which includes peer tutoring, have been studied extensively by numerous researchers, perhaps most prominently by Slavin (2006), who co-developed three common structures: Student Teams-Achievement Divisions, Teams-Games-Tournaments, and Cooperative Integrated Reading and Composition. Despite myriad structures and ample scholarship on their implementation and efficacy, there are few evidence-based models for training K-12 students to communicate effectively during group learning. Training for peer tutoring—the most obvious and common form of PAL (Topping & Ehly, 2001), where one student actively supports the academic learning of a peer—should be informed by the mass of accumulated knowledge on teacher professional development, but these connections are rarely made. This research project attempted to bridge this gap by transposing the framework of learner-centered teaching onto peer tutoring and testing the viability of effective training through a web application.

The results of these studies provide strong evidence for the prevalence of the Default Didact and the realistic possibility of tutors becoming what I call Emergent Elicitors. The Default Didact, though often well-meaning, treats teaching and helping interactions as opportunities to lecture and demonstrate competence, embodying and mirroring years of being spoken at by teachers. As Lortie postulated about novice teachers, the Default Didact, too, is a product of the “apprenticeship of observation” (1975, p. 67). These studies suggest, however, that this default state is not as sticky for peer tutors as its prevalence among adult teachers might imply.

2 Prescriptive Intervention Design to Promote Three Learner-Centered Tutoring Strategies

This study aimed to discover ways to quickly train students to be learner-centered tutors capable of eliciting, probing, and guiding the thinking of peers in much the same way that effective teachers do. The hope was that, after just 40 minutes interacting with either PeerTeach training—a short enough duration to fit within one class period—students would be able to teach their peers more effectively. While the goal of both trainings was to promote learner-centered tutoring, their structures were distinct, testing the comparative affordances of a prescriptive training approach versus a more constructivist one. The Talk Moves training provides students with proven teaching strategies, then offers an online environment in which to practice identifying and using them.

Talk moves (at times called “talk tools” or “accountable talk”) are the result of three decades of research aimed at identifying the speaking choices of teachers who are skillful at orchestrating equitable and productive classroom discourse (Godfrey & O’Connor, 1995; O’Connor, 2001; O’Connor & Michaels, 1993, 2015). Among the teacher professional development efforts to increase and improve teacher questioning, this approach is among the most specific, practical, and easy to grasp.

From the teacher talk moves described in this literature, a subset of moves was identified that are ideal for peer tutoring because they are conceptually simple, broadly applicable, and intended for one-on-one interactions. These include (1) eliciting questions that encourage students to express their ideas (e.g., “Say more about that”), (2) probing questions that dig into why students think what they think (e.g., “Why do you think that?”), and (3) revoicing moves where tutors state what they think the learner is saying (e.g., “I hear you saying ______”).

There are two main ways that these three talk moves promote learning. First, eliciting and probing moves encourage tutees to talk, which forces them to make sense of their thoughts in order to verbalize them. It is common for this alone to help learners work through ideas and develop solutions on their own (King, 1998; Webb & Mastergeorge, 2003). At minimum, eliciting and probing push students to take stock of what they do or do not know at any given moment and make them active participants in knowledge creation. Second, all three talk moves enable tutors to better understand their peers, helping them to identify misconceptions, gaps in knowledge, and errors in reasoning, preparing them to scaffold learning more effectively.

In their study of the Talk Science intervention, Michaels and O’Connor (2015) found that their training quadrupled the frequency with which nine teachers used language that video coders perceived to be “helping students deepen their reasoning” (p. 343). This success in uptake is likely a product of talk moves being “easy to remember and easy to pull out with a bit of practice” (p. 336), making them practical and realistic tutoring techniques for children. Thus, it stands to reason that preparing children to use eliciting, probing, and revoicing talk moves could be an effective way to shift students from typical didactic tutoring toward more elicitive strategies that promote better dialogue and deeper learning.

2.1 Design

The first PeerTeach intervention focuses on Talk Moves and uses Sherin and Van Es’s (2005) video-based noticing framework as a vehicle for promoting their uptake. That framework asserts that those who teach must attend to important teaching moments, relate them to useful pedagogical frameworks, and act based on pedagogically sound reasoning. PeerTeach creates such experiences when students watch animated tutoring interactions and practice noticing and tagging effective talk moves (see Fig. 11.1 for an example of this type of PeerTeach level). The theory driving this intervention is that if students are trained to notice and identify effective talk moves, they might internalize and use them in real-world tutoring interactions.

Fig. 11.1
Noticing practice level. Note. Students tag the video each time the cartoon tutor uses one of the focal talk moves. The tagging options shown are “Gather intel,” “Say it back,” and “Press for reasoning”

Figure 11.1 shows the intersection of a curated set of talk moves with the first two elements of Sherin and Van Es’s (2005) noticing framework for professional development: attending to important teaching moments and relating them to useful pedagogical frameworks. To accomplish the third and final element of that framework—acting based on pedagogically sound reasoning—PeerTeach has students practice making teaching decisions. Within the application, students engage in virtual tutoring sessions where they practice selecting the most strategic of three utterances to propel a virtual student forward. After selecting an utterance, students receive two forms of feedback: (1) the virtual learner responds verbally, revealing the impact of the selected utterance, and (2) the learn-o-meter, an indicator of the virtual student’s thinking, goes up or down. See Fig. 11.2 for an example of this type of level.

Fig. 11.2
Practice level for choosing evidence-based teaching moves. Note. Students practice making strategic teaching decisions. The Learn-o-meter ticks up when the virtual tutee is learning

Great tutoring, like great teaching, involves a complicated set of processes. While some peer tutoring models restrict tutors to solely asking questions (King, 1998), this training simply encourages the inclusion of questions alongside other moves.

3 Constructivist Intervention Design to Unearth Learner-Centered Tutoring Strategies

The first intervention was driven by the theory that students (1) lack useful pedagogical intuitions, (2) should be directly told what constitutes effective teaching, and (3) need practice using those learner-centered techniques. The second intervention was premised on the idea that students intuitively possess productive notions of learner-centered teaching—that students believe, either innately or through experience as learners, that learning happens best when the learner is engaged, actively verbalizing thoughts, and in a dialogic back-and-forth with a responsive, question-asking guide. This intervention gives students mild priming to make salient their existing conceptions of learner-centered teaching, then prompts them to describe the helper they want to be in a letter to themselves. It is modeled after “wise” interventions from the social psychological literature, in particular the Saying is Believing intervention strategy (Aronson et al., 2002). This technique has proven effective in prompting psychological shifts in other areas, leading students to believe that intelligence is malleable (Aronson et al., 2002) and that they belong in college (Walton & Cohen, 2011), to name two of many.

Aronson argues that people want to be consistent. If they are prompted to write that learner-centered teaching behaviors are key to good tutoring, they can only maintain consistency and avoid feeling hypocritical if they tutor accordingly. Thus, this intervention works by priming subjects to write down that they believe good tutoring is about asking questions, understanding the other person, and encouraging that person to do the thinking work. In this way, the Wise intervention approach more closely resembles discovery learning, which assumes and calls forth prior knowledge as a central component of learning.

3.1 Design

Through the PeerTeach web application, students who engage with this intervention take notes while watching a series of videos. The first two videos (each approximately one minute long) show a compilation of interview clips in which experienced peer tutors discuss the lessons they have learned (shown in Fig. 11.3). These clips are curated to reinforce specific messages: tutees need to be actively problem solving, and tutors need to be asking questions and probing their tutees’ thinking. Those brief videos are followed by videos of example tutoring sessions, illustrated by Fig. 11.4. While the sessions are not marked “good” and “bad,” extensive user testing made clear that students intuitively pick up on one tutor dominating the conversation and explaining too much while a different tutor asks questions that help the other student think through a problem. Pilot testing showed that students make this discovery themselves, which matters because past research has shown that learning can be longer lasting when students arrive at discoveries on their own, even through computer simulations (De Jong & Van Joolingen, 1998). After watching the videos and taking notes, students write a letter to themselves about the kind of helper they want to be, tacitly committing to enacting those behaviors in the real world.

Fig. 11.3
Priming interviews on PeerTeach. Note. Peer tutors discuss lessons learned, focusing on learner-centered strategies

Fig. 11.4
Contrasting cases videos on PeerTeach. Note. Students watch contrasting tutoring videos, identifying the problematic nature of overly didactic teaching and the learning benefits of more elicitive strategies

4 Methods

These studies took place in a Northern California middle school in partnership with one sixth and one seventh grade teacher. They were conducted with 198 sixth and seventh graders in regular, non-advanced math classes. The students were 53% Latino and 42% White at a school where 33% of students are eligible for free or reduced-price lunch.

4.1 Round One Implementation Sequence

In both rounds of data collection, which were separated by five months, students first engaged in training to become effective helpers, then employed their new skills in real teaching interactions with peers. Students in each of seven classrooms were randomly assigned to one of three conditions: the wise psychological intervention, the Talk Moves (TM) Training, or the control condition. To minimize classroom effects, randomization occurred within each classroom. Students received the same training in both rounds, so Round Two can be considered a re-dosing of treatment.

The first round of data collection was underpowered for detecting learning differences by tutee condition, since only half of the students were tutees. The main aim was to validate that the interventions could successfully shift students’ online tutoring inclinations from didactic knowledge-telling to more learner-centered approaches. Detecting significant learning differences by condition following in-person tutoring was an aspirational outcome, not an expected one.

4.1.1 Day 1—Determining Baseline Content Understanding and Tutoring Inclinations

To measure students’ a priori inclinations toward didactic helping versus elicitive helping, all students in this study—those in both treatment conditions, along with the control students—began their intervention experience making teaching decisions in an online game. On this level, each student individually controlled a virtual peer tutor helping a virtual cartoon learner. For each of four scenarios, students were presented with three speech options: one learner-centered teaching move and two didactic (or overly directive) options that shut down opportunities for the virtual learner to think. Many of these overly directive speech options were cloaked in questions (e.g., “Would you like me to show you how to solve this?”) so that students could not “game” the system by just picking questions.

All students were taught a lesson on ratios, then given an assessment to determine how well they learned the content. The top half of performers were designated as tutors. To increase the likelihood that tutors in each condition would have similar tutoring ability at baseline, tutors were ranked by their baseline tutoring decision-making scores, then sorted into conditions through blocked sampling (i.e., the three top-scoring tutors were randomly distributed, one to each condition, then the next three, and so on), as sketched below. The same blocked sampling strategy was used to assign tutees to conditions. Lastly, tutors and tutees within conditions were paired randomly.
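For concreteness, this blocked assignment could be implemented as follows in R; the `tutors` data frame and its column names here are hypothetical, not part of the study materials:

```r
# Blocked random assignment sketch (hypothetical data frame `tutors`,
# assuming the number of tutors divides evenly into blocks of three).
set.seed(42)
tutors <- tutors[order(-tutors$baseline_score), ]  # rank high to low
# For each successive block of three tutors, shuffle the three condition
# labels so exactly one tutor per block lands in each condition.
tutors$condition <- as.vector(
  replicate(nrow(tutors) / 3, sample(c("control", "wise", "talk_moves")))
)
```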

4.1.2 Day 2—Training and then Tutoring

Students completed their assigned training silently on laptops at desks that were spaced out in their classrooms. Following the intervention, students played a similar game with four new scenarios to reveal any shifts in their online teaching inclinations.

Tutoring pairs were given worksheets with practice problems. Tutors were instructed, “You can do whatever you think is best to help the other student learn.” Tutoring occurred for 10 min, and all students took a final assessment on ratios the following day. That assessment was scored using an adaptation of the “Representing and Solving the Task” portion of the Mathematics Problem Solving Official Scoring Guide used by the Oregon Department of Education Office of Assessment and Evaluation (2011). See Appendix A. Each of four problems was scored on a 1–4 rubric to allow us to distinguish between degrees of mathematical understanding. The author and a research assistant scored the assessments, achieving an interrater reliability of 87.5% on 20% of the data.

4.1.3 Control Group

The aim in designing the control was to mimic every contextual feature of the intervention experience without actually shifting how students thought about peer tutoring. It was hoped that controls would (1) believe they were being trained as effective helpers, but (2) teach in the natural way they would have without any training. To accomplish this, controls were treated identically by facilitators, partnered with a student in the same group, and completed their training through PeerTeach. In order to avoid changing how they conceptualized peer tutoring, leaving intact their natural inclinations, this training focused on the importance of tutors understanding math. A prior survey revealed this belief to be nearly universal among middle school students, making it appropriate for the control “training.” Thus, controls spent their training time engaged in solving math problems accessed through PeerTeach as preparation for future peer tutoring.

4.2 Round Two Implementation Sequence

The second round of data collection took place five months after the first, with the same group of students. It focused on two main questions: (1) do shifted pedagogical mindsets translate into measurably different teaching behaviors in real life, particularly more learner-centered moves? and (2) do these shifts in tutoring style produce more learning for tutees? Fig. 11.5 illustrates the study design.

Fig. 11.5
Implementation flow of Round Two. Note. Students first trained in three facilitated clusters, then learned content in two groups, and finally tutored in pairs without a facilitator

4.2.1 Day 1—Sorting by Condition and Training

Students completed the same assigned training as before through the PeerTeach website, seated next to a new, randomly assigned partner from the same experimental group. The three experimental groups were clustered together, each with an assigned facilitator (one of two researchers or the teacher), facing away from the middle of the classroom to maintain the facade that all students were engaged in the same training. By and large, students only paid attention to their own training, minimizing the cross-pollination of ideas between treatment conditions. Only one student appeared to notice that each cluster was advancing through a different training.

While Round One showed promising training results without interaction between participants, past studies on the learning benefits of collaboration suggested that these interventions might be even more powerful if children could talk through their thinking with one another. As just one of many examples, Bamiro (2015) demonstrated that teachers could produce significant learning gains in chemistry classrooms simply by adding think-pair-shares. As such, facilitators in Round Two encouraged partner pairs to discuss the training ideas to better understand them.

The PeerTeach interventions were administered consistently, largely because students’ experiences were facilitated by a computer program. To ensure that facilitators acted predictably, we collaborated to develop a facilitation script that included what we would say before students opened their laptops, along with three acceptable prompts to encourage collaboration between partners. To account for slight differences that could emerge from the presence of one facilitator instead of another, facilitators rotated between experimental groups each class period.

4.2.2 Day 2—Learning Different Math Content

Each class was split in half to learn different content, either comparing means and medians (taught by the researcher) or comparing rates (taught by the teacher). Partner pairs from Day 1 were split and randomly assigned to these different content groups. These topics, selected through negotiation with the two teachers, were ideal for peer tutoring because they are conceptually rich with multiple solution paths; they were on the pacing guide for the 6th grade teacher and were deemed important, challenging, and worth re-teaching by the 7th grade teacher. In this way, the study was built into the fabric of a legitimate learning sequence, aiming both to answer important research questions and to serve the learners within the context of their classrooms. Following the Day 2 lessons, quizzes were administered to enable later examination of the relationship between tutors’ content knowledge and how well their tutees learned.

4.2.3 Day 3—Peer Tutoring and Post Assessing

Students taught partners the content they learned the prior day. After 20 min of peer tutoring, each student wrote a reflection describing the teaching of their partner, then took an assessment to measure their learning. That assessment, like the one used in Round One, was later scored by the author and a research assistant using an adapted version of a rubric focused on “Mathematics Problem Solving” (Oregon Department of Education, 2011). Again, problems were scored 1–4, and an interrater reliability of 83.5% was achieved on 20% of the data.

4.3 Measures

After both rounds of data collection, the three conditions were compared on a number of variables: the frequency with which students chose elicitive teaching moves in online scenarios, tutees’ assessment scores, and, in Round Two, the frequency with which tutees described particular tutoring behaviors in real life. To account for classroom differences, linear mixed-effects models were implemented with the lme4 package (Bates et al., 2015) in the statistical software R (Version 3.0.3; R Development Core Team, 2008). The primary comparisons were treated as fixed effects while classroom was treated as a random effect. Each dependent variable was regressed using orthogonal contrasts to test two comparisons: whether the treatment conditions combined (coded as +1/3 each) produced more effective outcomes than the control condition (coded as −2/3), and whether one treatment was more effective than the other (coded as −1/2 and +1/2). Only one outlier was excluded.
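A minimal sketch of this model, assuming a data frame `d` with hypothetical column names (`score`, `condition`, `classroom`), might look like:

```r
library(lme4)

# Hypothetical data frame `d`: one row per tutee, with columns
#   score     - tutee outcome (e.g., post-assessment score)
#   condition - "control", "wise", or "talk_moves"
#   classroom - classroom identifier
d$condition <- factor(d$condition, levels = c("control", "wise", "talk_moves"))

# Planned orthogonal contrasts, coded as described above:
#   treat_vs_ctrl: both treatments (+1/3 each) vs. control (-2/3)
#   wise_vs_tm:    one treatment (-1/2) vs. the other (+1/2)
contrasts(d$condition) <- cbind(
  treat_vs_ctrl = c(-2/3, 1/3, 1/3),
  wise_vs_tm    = c(0, -1/2, 1/2)
)

# Condition as a fixed effect, classroom as a random intercept
fit <- lmer(score ~ condition + (1 | classroom), data = d)
summary(fit)
```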

One key difference between Round One and Round Two was that tutor and tutee sample sizes were doubled in Round Two because all students served as tutors, not just the top half of performers on the pre-assessment. To determine appropriate sample sizes, the most reliable method is to identify prior studies with near-identical measures and make a priori power estimates. Unfortunately, no substantive body of research exists measuring the learning impacts of training K-12 peer tutors. Instead, past studies measuring the learning impacts of teacher professional development and teacher questioning were selected as the nearest analogue. Hattie (2012, p. 252) estimates the effect size of teacher questioning on student learning to be 0.48 and the effect size of teacher professional development to be 0.51. With an effect size of approximately 0.5, an alpha of 0.05, and power of 0.80, each group should have about 50 participants for a well-powered one-sided t-test. For this study, after removing students who were absent during any day of the study, the three samples had, on average, 52 students each. Thus, if the effects on student learning resembled prior success levels training adult teachers, this study was adequately powered to detect statistical differences.
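This target can be reproduced with R’s built-in power.t.test:

```r
# Two-sample, one-sided t-test: effect size d = 0.5 (in SD units),
# alpha = 0.05, power = 0.80
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
             power = 0.80, alternative = "one.sided")
# Returns n of roughly 50 per group, matching the target above
```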

4.3.1 Qualitative Measures of Tutoring Behavior

To gauge differences in tutoring behaviors post-intervention, an open-ended survey was administered immediately after peer tutoring occurred. It asked, “What was the most helpful thing your classmate did or said when teaching you? Give as much detail as you can.” The author and a research assistant applied emergent codes to these responses to unearth patterns in the ways that students taught each other (and what their peers considered their best teaching moves). A codebook was developed with 13 main codes (e.g., “Asked questions” or “Checked work/understanding”) and 29 sub-codes (e.g., “Asked probing questions” or “Used yes or no checks for understanding”). Codes were applied to descriptions without names or experimental conditions visible to ensure unbiased coding. The frequency of applied codes is shown in Appendix B.

To ensure accuracy, two procedures were employed, as described by Saldaña (2021, pp. 27–28): a check for intercoder reliability and consensus coding on the full corpus of data. After every response was coded by both the author and a research assistant using NVIVO 11 software, a check for reliability revealed 86% overlap in applied codes, above the 80% threshold recommended by Miles and Huberman (1994). Next, to ensure the accuracy of final codes, the 14% of cases with disagreement were discussed until consensus was reached. Combined, these two procedures ensured that the codebook was reliably employed and that final codes were accurate representations of the data.

5 Results

5.1 Students Default to Didactic Teaching Online, but Shift with Training

Past studies have shown that peer tutors tend to explain more than they should (King, 1997). To measure students’ inclinations toward over-explaining versus more learner-centered behaviors, students made decisions in online scenarios before and after their intervention experiences. Unsurprisingly, before receiving the training, students across conditions tended to choose didactic speech options (e.g., “The first thing you need to do is…”). More surprising was the extent to which students avoided trying to elicit the virtual student’s thinking. Out of four scenarios, students selected the more elicitive move only 1.04 times, on average, markedly lower than the roughly 1.33 expected from choosing randomly among the three options. See Fig. 11.6 for an example of one scenario and the frequency with which students selected utterances.

Fig. 11.6
Example teaching scenario with frequency of selected moves (pre-intervention). Note. The three speech options were chosen by 31%, 44%, and 25% of students, respectively

When given a similar scenario-based game post-intervention, as predicted, students in both the wise intervention group (labeled “WISE Training” in plots) and the Talk Moves training (“TM Training”) became more elicitive online helpers than controls (p < 0.001), as illustrated in Fig. 11.7. This analysis was executed using planned orthogonal contrasts to compare combined treatment groups with controls. Students in the Talk Moves Training chose elicitive moves most often, likely because their training incorporated practice making decisions in similar online scenarios, but their performance was not significantly different from students in the wise intervention condition. Compared to controls, the Cohen’s D effect size was 0.95 for the Talk Moves Training and 0.63 for the Wise Training.

Fig. 11.7
Following training, elicitive teaching moves increase in online scenarios (control n = 63, WISE Training n = 65, TM Training n = 57). Note. Error bars represent 95% confidence intervals

5.2 Learning Gains in Round 1 of Data Collection

More elicitive decision-making in online scenarios was not, however, the ultimate goal. This was an intermediate measure. The true test of the efficacy of these training approaches was how well students’ training experiences translated into effective real-world teaching. As stated earlier, it seemed unlikely that differential learning effects would be detected with such underpowered samples (since only half of students were tutees) and relatively short, 10-min tutoring experiences. Despite those constraints, tutees taught by tutors in treatment conditions did have higher post-assessment scores than controls, a comparison made using planned orthogonal contrasts. To account for possible differences by teacher, a linear mixed effects model was utilized in which condition was treated as a fixed effect and teacher as a random effect. To confirm that post-assessments were not influenced by differing content mastery between intervention groups, pre-assessments were compared across groups and were not significantly different.

Post-assessment analysis suggests that treatment group tutors (combined) were more effective than controls [F(1, 71) = 1.91, p = 0.009]. The results were not significantly different between treatment groups [F(1, 48) = 0.38, p = 0.398]. Table 11.1 summarizes scores by condition. Given that the variance was significantly different between control and treatment conditions, and that the variance of controls is more likely to reflect the true non-treated population variance, Glass’s Delta could be a more accurate measure of effect size than the more commonly used Cohen’s D, which is also provided (Fritz et al., 2012).
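For reference, the two measures differ only in how the mean difference is standardized: Cohen’s D divides by a pooled standard deviation, while Glass’s Delta divides by the control group’s standard deviation alone:

$$D = \frac{M_T - M_C}{SD_{pooled}}, \qquad \Delta = \frac{M_T - M_C}{SD_C}$$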

Table 11.1 Treatment tutors produce tutees with higher assessment scores

5.3 Round Two: Peer Instructional Behaviors Shift to Make Room for Peers to Think

Before examining tutee learning, consider how tutors taught. Several key patterns emerged: while behaviors related to explaining were common across groups, tutees in treatment groups described their tutors as asking questions and promoting active learning (i.e., “helping when needed” and “letting them try to solve the problem”). In Table 11.2 below, the percentages represent how often tutees mentioned these teaching moves, along with the other major categories, as the “most helpful thing” their tutor said or did.

Table 11.2 Frequency of “most helpful” teaching moves as recalled by tutee

These percentages are likely low estimates of how often these teaching practices occurred, as students were not specifically asked about each teaching practice, but rather given a general prompt to recall the “most helpful thing” the tutor did. That said, even though these data are not precise indicators of how often each of these teaching practices occurred, they do draw striking distinctions between treatment students and controls. While control group tutors were almost never described as asking questions, helping when needed, or letting their tutees try to solve problems, these were common descriptions of treatment group tutors. Here are several illustrative examples of tutee descriptions of treatment group tutors:

  • “He kept trying to get my thinking and he did that so he could explain the parts of the problem I did not know.”

  • “She gave me time to think. She also helped me with the problem when I needed it.”

  • “He asked me very helpful questions.”

  • “The most helpful thing was when they let me try the problem without trying to quickly correct my mistakes.”

5.4 Tutoring Improves with Training and Content Mastery

While shifting teaching behaviors is an important intermediate goal, a successful intervention would additionally result in increased learning. Tutee assessment data suggest that both the wise intervention and Talk Moves Training were effective tools for improving peer tutoring quality, particularly when tutors first mastered the content.

Using orthogonal contrasts to compare the effect of tutors’ training on tutee assessment scores, we find that being in either treatment group rather than control had a significant effect on tutee scores [F(1, 152) = 8.65, p = 0.004]. Neither treatment produced significantly different results than the other [F(1, 105) = 0.07, p = 0.79]. The mean score for tutees taught by control tutors (M = 39.9, SD = 14.5) was far below that of tutees taught by Wise Intervention tutors (M = 50.3, SD = 21.1) and Talk Moves tutors (M = 49.2, SD = 21.3). The Cohen’s D effect sizes were 0.58 for the Wise Intervention and 0.51 for the Talk Moves training, compared to controls. Using the Glass’s Delta formula, which substitutes the control SD for the pooled SD in cases where variance differs significantly by condition, the effect sizes were 0.72 for the Wise Intervention and 0.65 for the Talk Moves training, compared to controls.
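These figures can be checked directly from the reported means and SDs; the sketch below assumes the simple unweighted pooled-SD form of Cohen’s D, which approximately reproduces the reported values:

```r
m_ctrl <- 39.9; sd_ctrl <- 14.5   # control tutees
m_wise <- 50.3; sd_wise <- 21.1   # Wise Intervention tutees
m_tm   <- 49.2; sd_tm   <- 21.3   # Talk Moves tutees

# Cohen's D with an unweighted pooled SD; Glass's Delta uses the control SD
cohens_d    <- function(m1, m0, s1, s0) (m1 - m0) / sqrt((s1^2 + s0^2) / 2)
glass_delta <- function(m1, m0, s0) (m1 - m0) / s0

cohens_d(m_wise, m_ctrl, sd_wise, sd_ctrl)   # ~0.57, vs. 0.58 reported
cohens_d(m_tm,   m_ctrl, sd_tm,   sd_ctrl)   # ~0.51
glass_delta(m_wise, m_ctrl, sd_ctrl)         # ~0.72
glass_delta(m_tm,   m_ctrl, sd_ctrl)         # ~0.64, vs. 0.65 reported
```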

To determine how much variance in tutee scores can be explained by tutors’ content knowledge and treatment condition, each controlling for the other, a multiple regression analysis was conducted. To better understand the relationship between tutors’ pre-assessment scores (indicating their content knowledge) and tutees’ scores after being tutored, both sets of scores were converted into standardized z-scores (mean 0, SD 1). As shown in Table 11.3, the analysis revealed significant effects of both tutor knowledge (i.e., tutor pre-assessment scores) and tutor training on tutees’ scores following tutoring. Treatment condition and tutors’ pre-test scores were not significantly associated with one another.
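A minimal sketch of this step, again with hypothetical column names, might be:

```r
# Standardize both scores (z-scores: mean 0, SD 1), then regress tutee
# outcomes on tutor content knowledge and a treatment indicator.
d$tutee_z     <- as.numeric(scale(d$tutee_score))
d$tutor_pre_z <- as.numeric(scale(d$tutor_pretest))
d$treated     <- as.integer(d$condition != "control")

fit <- lm(tutee_z ~ tutor_pre_z + treated, data = d)
summary(fit)  # coefficients correspond to the effects in Table 11.3
```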

Table 11.3 Treatment and pre-test both have significant independent association with tutee learning

5.5 Combining Data from Both Studies Highlights Need for Mastery and Training

Given the similar data collection designs of Round One and Round Two, an even more robust statistical analysis is possible. By standardizing tutee assessment scores and tutor pre-test scores (i.e., calculating z-scores where the mean is 0 and the SD is 1), regressions were enabled for a combined dataset. Multiple regression with this data, which includes all students who participated in the entirety of either study (n = 204), reveals large training effects and large pre-test effects, each independent of the other, as shown in Table 11.4. The Cohen’s D effect sizes were 0.65 and 0.62 for the wise and talk moves trainings, respectively, compared to controls. The Glass’s Delta effect sizes, which use the controls’ variance as their basis, were 0.92 and 0.78 for the wise and talk moves trainings, respectively.

Table 11.4 Treatment versus control and tutor pre-test are both independently associated with tutee learning

To visualize the combined effects of tutors’ content knowledge and treatment condition on tutees’ post-assessment scores, the data was broken down by pre-teaching quiz score bands. About a third of tutors fell into each of three categories: those who scored lowest, middling, or highest on the pre-test. After separating all tutors into these bands in Fig. 11.8, we find that: (1) trained tutors are more effective helpers within every content knowledge band, and (2) tutors who had strong mastery of the math content before teaching and received the PeerTeach training were much more effective helpers than every other group. This suggests that peer tutoring should occur when helping students have both strong content understanding and training on learner-centered teaching practices. Both pieces appear critical.

Fig. 11.8
Combining both rounds of data, tutor pre-test scores and condition both predict tutee learning (low, medium, and high pre-test tutor bands: n = 64, 74, and 66, respectively). Note. Data was combined from both rounds of data collection by first converting tutee assessment scores and tutor pre-test scores into standardized z-scores. Dots represent means. Lines represent 95% confidence limits for the population mean obtained through nonparametric bootstrapping of the data

6 Discussion

As Paul and Elder (2019) write, “The history of education is also the history of educational panaceas, the comings and goings of quick fixes for deep-seated educational problems.” The human tutor is not a novel innovation of the twenty-first century, but its efficacy is unparalleled by modern “panaceas.” Instead of maintaining the churn of new innovations, identifying ways to expand and improve this millennia-old instructional strategy could pay more dividends.

Enlisting students to teach one another is a clear way to expand access to individualized coaching. The limiting factor is students’ ability to teach, as past studies have repeatedly documented their inclinations toward over-explaining and shallow questioning (Roscoe & Chi, 2007), which generally hinder learning. This investigation offers promising solutions. The two PeerTeach interventions increased the frequency of students using elicitive teaching techniques in both virtual and real-life tutoring scenarios, which translated into significant learning gains for tutees. While content mastery was a strong predictor of tutoring success, the combination of math knowledge with PeerTeach training produced more learning at every level of math proficiency. Given the apparent importance of both mastery and training, it seems likely that activity structures that do not vet tutor mastery—for instance, ASK to THINK—TEL WHY—will yield less learning.

The results of this study suggest that (1) both prescriptive and constructivist online training modules can successfully shift peer tutoring behaviors, and (2) when those behaviors shift, tutee learning can be greatly amplified. While one might imagine other ways of improving peer tutoring, these specific intervention approaches are promising. Educators aiming to train tutors should consider combining these evidence-based training techniques with their own strengths as trainers and knowledge of their students. When facilitating teaching between children, confirming the tutor’s mastery of content and monitoring their use of learner-centered teaching strategies will likely increase tutee learning.

The students of this study were split between two math teachers. One teacher’s tutors exhibited learner-centered teaching behaviors at a much higher rate, and their tutees performed significantly better. Consequently, one alternative explanation of the results is that the effect of tutor training depends on how well teachers model the kinds of learner-centered teaching behaviors that are central to the trainings. With only two teachers participating and without systematic measures of their teaching behaviors, this analysis was not possible here. Exploring the link between teachers’ behaviors and student uptake of training ideas should be a priority in future studies.

The PeerTeach interventions are predicated on the consistent finding that tutors tend to explain too much, ask shallow questions, and fail to open up space for tutees to engage thoughtfully with content. To the degree this study underscored the potential for evidence-based training to cultivate Emergent Elicitors, it also highlighted the pervasiveness of the Default Didact. Before the intervention, students were less likely to select a learner-centered utterance out of three options than if selecting at random. When asked to report the most helpful thing their tutor did or said, tutees never described control tutors asking questions, and only once described them helping when needed or letting them try to solve the problem. With this in mind, teachers who casually enlist students to help peers should heed this finding and take a more active role when facilitating peer helping. Indeed, as tutoring becomes a more integral feature for a broader swath of students in a COVID-impacted world, it is increasingly critical that non-expert tutors (peers or otherwise) learn to employ learner-centered pedagogy.

These interventions do not, however, advocate for a model of tutoring that is strictly question-based, like King’s (1998). There is a place for explanation, modeling, and many other non-questioning moves. Peer tutors should assemble a toolbox of varied techniques to be applied when the situation is appropriate (MacDonald, 2000). In fact, backend data showed that tutors who selected learner-centered teaching moves 50–75% of the time (not 100%) helped tutees learn the most.

7 Limitations

These promising results are accompanied by several caveats. First, students’ decisions in four online tutoring scenarios were not identical reflections of how they would behave in real life. They were proxies that suggest where students likely fall on a spectrum between didactic and elicitive endpoints. To predict tutoring tendencies from online behaviors, building a sizable bank of teaching decisions in varied tutoring contexts (e.g., with different types of tutees or problems) could offer a more nuanced and precise indication of students’ inclinations. Allowing students to write their own utterances could also lend further measurement precision. While providing added accuracy and nuance, these changes would also carry drawbacks: drastically increasing the number of scenarios would be much more time-consuming for students, and the inclusion of free responses would make data analysis and reporting more challenging. That said, future work should explore both mechanisms as tools for evaluating students’ teaching inclinations and tracking progress.

Students’ in-person teaching behaviors are also challenging to track. This investigation opted to measure them by asking tutees, “What was the most helpful thing your classmate did or said when teaching you? Give as much detail as you can.” While this technique provided useful insights into the behavioral differences by condition, a more precise or in-depth method would utilize video or audio recordings of tutoring interactions. That way, a permanent record could be transcribed and coded by researchers to pinpoint exactly what students did. While video data was collected and analyzed to better understand the interactional mechanics of about ten tutoring pairs, tutee-written records allowed more coverage for this analysis. With more researchers and resources, video-based measurement will hopefully be utilized more extensively in future iterations of this work.

8 Conclusion

Emerging from COVID's devastating toll on learning, districts are turning to professional tutoring more than ever before. While there is solid evidence of the powerful impacts of high-dosage tutoring (Dietrichson et al., 2017; Fryer, 2017)—often defined as one-on-one instruction at least three times per week—it is logistically challenging to execute in schools (Allor & McCathren, 2004; Bryant et al., 2011) and expensive; even when scaled efficiently, costs are estimated between $2,500 and $3,800 annually per student (Ander et al., 2016). This study provides reason for optimism, suggesting that peer tutoring could be a viable alternative when coupled with the right training or effective assessment and matching systems. After just 40 min with either PeerTeach training, middle schoolers became demonstrably more effective tutors, particularly when they first mastered the math content. This finding was replicated across Round One and Round Two of data collection, offering a robust corpus of evidence.

This demonstration, though, is just a signal of how powerful peer tutoring can be when accompanied by research-based training. The next step in this line of research is to measure the impact of sustained peer tutoring that incorporates other elements of teacher professional development that can be applied to student tutors. For instance, as the Measures of Effective Teaching (MET) project evidenced, feedback from learners and instructional expert observers can be powerful tools for promoting teaching improvement (Rothstein & Mathis, 2013). Future studies could also measure students’ growth in teaching ability over time as they engage in different forms of training, practice, and reflection, offering more precise insights on how to support development. In situating peer tutoring as a classroom routine, there are also opportunities for identifying useful principles for determining which students should teach what content and when.

For decades, we have known that all children can learn more with individualized support (Bloom, 1984), but we forgo such investments in our children. Fortunately, though, the benefits of tutoring may be within every child’s grasp if we can harness the existing talent and ingenuity that abounds in every classroom. If we give students the responsibility of tutoring each other, though, we as educators must take on the responsibility of training children to teach effectively. This study suggests that—so long as students attain sufficient content mastery before tutoring—training them to use more learner-centered teaching strategies is an effective and realistic goal.