Effectively managing challenging student behavior is a common concern among K-12 educators. A widespread response to perceived misconduct is the out-of-school suspension (OSS), which removes students from school as a form of punishment, usually for peer conflict or low-level, non-safety-threatening reasons (e.g., disruption, disrespect; Losen & Martinez, 2020). However, several studies have indicated that students who are suspended from school, relative to their peers, are more likely to have worse academic outcomes and to be held back a grade level or leave school, and that they face a greater risk of substance use, arrest, or incarceration (Anderson et al., 2019; Dong & Krohn, 2020; Hwang, 2018; Lacoe & Steinberg, 2019; Mowen & Brent, 2016; Noltemeyer et al., 2015; Rosenbaum, 2020).

In addition, the disparities in the use of exclusionary discipline have been well documented for several decades (Huang, 2020). For example, Black students are suspended at double the rate (8%) of their White peers (3.8%), and students with disabilities are suspended at twice the rate (8.6%) of their peers without disabilities (4.1%; Harper et al., 2019). Schools across the country have been struggling to reduce the use of exclusionary disciplinary practices as well as eliminate these disparities (Losen & Martinez, 2020).

One approach that has the potential to reduce the use of OSS is restorative practices (RPs) in schools (Gregory et al., 2018). RP initiatives show promise, and the US Department of Education (2014) formally recommended RPs as a means to address racial disparities in school discipline. Over 41% of schools in the USA reported using some level of RPs in 2017–2018 (Diliberti et al., 2019), up from 34% in 2015–2016 (Diliberti et al., 2017).

However, despite RPs’ growing popularity, few experimental studies have investigated their efficacy (Song & Swearer, 2016; Zakszeski & Rutherford, 2021). Reviews of RP studies have lamented that most RP research has limited generalizability and low internal validity (Darling-Hammond et al., 2020; Fronius et al., 2019). By the end of the last decade (i.e., 2020), only two cluster randomized controlled trials (CRCTs) involving RPs had been conducted in the USA, with mixed results (Acosta et al., 2019; Augustine et al., 2018); CRCTs have long been considered the gold standard for providing evidence of a causal relationship in evaluation studies.

Background on Restorative Practices

RPs are often cited as originating in indigenous cultures (e.g., the Maori in New Zealand) and have a focus on community building, repairing harm based on accountability, and restoring relationships (Zehr, 2015). RPs center on the belief that individuals are connected through a web of relationships and when harm occurs between people, those relationships (which are the basis of a community) are damaged and need repair or restoration. Fractured relationships are both a cause and effect of wrongdoings (Zehr, 2015).

RPs in schools include several types of programs and no one single definition exists in the literature, though RPs are often seen as both a nonpunitive approach to dealing with conflict (Fronius et al., 2016) and as a means of building community (Kline, 2016). The theoretical foundation of RP in education is that accountability is based on collective, fair processes and relationships, rather than top-down compulsory control (Thorsborne & Blood, 2013). Although RPs are often seen as a way of mending relationships, RPs also seek to prevent harm by strengthening relationships (Kline, 2016). In contrast to retributive justice, which relies on punishment to address wrongdoing, a restorative approach focuses on building relationships to avoid recurring misbehaviors (Standing et al., 2012).

In schools, community-building activities for students and staff in small groups include weekly circles. Circles are a structured form of communication that allow individuals a chance to speak and listen to one another in a safe environment (Boyes-Watson & Pranis, 2015; Wachtel, 2013). Other activities include holding restorative conferences and problem-solving circles. When conflicts arise, RPs focus on identifying the needs of those harmed, directly or indirectly, as well as the needs of those causing the harm (Evans & Vaandering, 2016). Students and staff involved in incidents jointly share their experiences, identify the harmed parties, diagnose the contributors to the problem, and suggest solutions to repair the harm done. A series of structured questions are often used to guide the problem-solving process (e.g., “What happened?,” “Who has been affected by what you have done?,” “What do you think you need to do to make things right?”) (Wachtel, 2013). Claims that RPs have the potential to reduce suspension rates are based on the theorizing that a focus on relationship-building (prevention) and problem-solving (intervention) can support student social-emotional learning and prosocial community engagement and conflict resolution.

The number of school-based RP studies appearing in peer-reviewed academic journals has increased from 3 in 2006 to 71 in 2020 (Zakszeski & Rutherford, 2021). Quantitative (descriptive or correlational) studies on RP have generally shown that RP use is related to lower suspensions (Anyon et al., 2016; Gregory et al., 2018), fewer behavioral incidents (Boulton & Mirsky, 2006), improved attendance rates (Gonzalez, 2012), and higher academic achievement (Darling-Hammond et al., 2021). However, despite the growth in RP studies focusing on various outcomes (e.g., school climate, student victimization), Fronius et al. (2019) characterized the RP research base as still in its infancy due to the inability of most studies to make strong causal claims about RPs’ effectiveness.

Experimental Evidence Base on Restorative Practices

In a review of 71 studies on school-based RP published in academic journals between 2000 and 2020 (i.e., the last 20 years), Zakszeski and Rutherford (2021) noted that only two peer-reviewed studies published in journals (7.7%) used a cluster randomized controlled trial (CRCT) while the rest of the studies were mostly qualitative or correlational/descriptive studies. In a 2-year CRCT with 40 middle schools (20 intervention and 20 control) in England (students = 5960), Bonell et al. (2018) found that schools assigned to an RP and a social-emotional learning (SEL) intervention had students who experienced less bullying victimization, less substance abuse (e.g., smoking, alcohol), and better psychological well-being. Effects were characterized as small (d = 0.08) and no statistically significant differences in terms of aggression were found.

In a 2-year CRCT with 13 middle schools (seven intervention and six control) in Maine (students = 2834), Acosta et al. (2019) reported no statistically significant differences in terms of 11 outcomes related to school connectedness, school climate, social skills, and bullying victimization. Acosta et al. noted that the lack of differences may be due to intervention schools only having a modest amount of RP experience—which was not that different from what control schools received on their own. However, students who reported having more exposure to RP had more positive outcomes.

Of note, neither of the aforementioned RP CRCTs focused on disciplinary outcomes. In contrast, in another 2-year CRCT conducted in 44 public schools in Pittsburgh (22 treatment and 22 control), Augustine et al. (2018) reported that students in intervention schools had improved perceptions of school climate and fewer days suspended compared to students in control schools. The reductions in days suspended for Black students were greater compared to White students and the lower suspensions were due to reductions in suspensions in the elementary but not the middle or high schools. Teachers in the intervention group (compared to the control group) also reported more positive working conditions and climate managing student behavior (e.g., schools had clearer policies and had a safer environment) and reported better student relationships. However, the study noted no change in overall student behavior, student arrest rates (which were already low at 1.2%), and daily attendance. In addition, academic achievement (math but not reading) of middle school students, in schools with a predominantly Black student enrollment, in the intervention condition worsened.

Overall, the findings of the CRCTs investigating RPs are mixed. However, given that schools across the country are seeking alternatives to suspensions and are already adopting RPs at a fast rate (Diliberti et al., 2019), despite the lack of a compelling evidence base, “it is imperative to invest in well-designed experimental (i.e., cluster randomized) trials” (Zakszeski & Rutherford, 2021, p. 12).

The Current Study

We present the findings of a recent CRCT using the Whole School RP Project in a large urban school district in the US Northeast. The project builds on a collaboration between the Morningside Center for Teaching Social Responsibility and the district. Through a federally funded grant, Morningside developed an RP model with a focus on racial equity and SEL. We hypothesized that the RP Project could shift disciplinary policies/practice, strengthen relationships within the school, and develop student and teacher SEL skills. These in turn would result in a positive climate for learning, resulting in fewer disciplinary incidents, and reducing the need for suspensions.

For the current study, and as required by the funder, we completed a preregistered research protocol (submitted to Abt Associates, the technical advisors of the US Department of Education) prior to randomizing schools to the intervention and control conditions. Preregistration increases the credibility of findings based on a priori hypotheses. We build on recent published findings using the same sample of 5878 students in 18 schools. The prior study focused solely on the reduction of office disciplinary referrals (ODRs; the preregistered primary outcome) as a result of an RP intervention and findings indicated that students in the RP intervention schools were less likely to receive a discipline incident record (11.1%) as compared to students in the comparison schools (18.2%; Gregory et al., 2022). For the current study, we asked the following secondary research questions (RQs):

  1. Does the use of RP result in a reduction in OSS? (preregistered)

  2. Does the effect of RP on OSS vary by student disability status, gender, and race/ethnicity? (preregistered)

  3. Does the effect of RP on OSS depend on the student’s prior history of OSS? (not preregistered)

Although not preregistered, we explored RQ3 as studies have shown interventions can reduce OSS for those with a history of suspension. In a study in New Orleans using a quasi-experimental design (i.e., difference in difference models using propensity scores), Glenn et al. (2020) reported that suspensions did not decline for the general student population that used RP but OSS declined for students that had been suspended at least once prior to treatment. Another set of studies also demonstrated effects for students with prior suspensions: in evaluating two separate empathetic mindset interventions for teachers with the aim of improving student–teacher relationships, an interaction effect for the intervention condition and prior student suspension was statistically significant showing benefits for students who had been suspended in the prior year (Okonofua et al., 2022). Students with a history of suspensions were less likely to receive an OSS and felt more respected by teachers who had received a brief intervention compared to the students whose teachers were in the control group (Okonofua et al., 2016).


To estimate the impact of the Whole School RP Project on OSS, we conducted a CRCT where schools were randomly assigned to be in either the treatment/intervention or the control/business-as-usual condition (BAU). Within one participating district, 23 schools with similar demographic characteristics (i.e., student enrollment based on gender, race, language learner status, disability status, percent poverty, and grade levels offered) were invited to participate in the study. Eighteen schools agreed to participate in the study. Schools were recruited in spring 2018, intervention assignment to the 2-year RP initiative was on June 1, 2018, and outcomes were measured after a single year of the initiative, at the end of the 2018–2019 school year, given the second year was interrupted by the COVID-19 pandemic.


Eighteen schools (i.e., six elementary schools, six middle schools, four high schools, and two combined schools with both middle and high school students) from one large urban school district agreed to participate (see Online Appendix A for a CONSORT chart). The evaluators and project staff met with all 18 principals to discuss the project and the randomization procedures. Blocking by school type (e.g., elementary) resulted in an equal number of elementary, middle, combined, and high schools randomized into the BAU and intervention conditions.

School Sample

At the school level, the average school size was 366 students. On average, the percent of Black and Hispanic student enrollment was 55% and 38%, respectively. White students comprised 2% of the sample. A large proportion of students (91%) qualified for free or reduced price meals (FRPM), a proxy for socioeconomic status (SES). No schools dropped out of the 1-year evaluation (i.e., there was no school-level attrition).

Student Sample

From the 18 participating schools, there were 6507 students enrolled from 1st to the 12th grade. As preregistered, kindergarteners were excluded as they would not have records of prior suspensions. Out of the 6507 students, 421 enrolled 6 weeks after the start of class and were considered “late joiners” and were excluded as was prespecified based on What Works Clearinghouse (WWC, 2020) guidelines. In addition, 208 students (3.4%) were considered attritors who completely left the school district for various reasons (e.g., transferred out of state, withdrew from school after repeated absences, was deceased). Of those that attrited, 4.1% and 2.7% left from the control and intervention groups, respectively, which is not considered problematic based on WWC guidelines.

The final analytic sample consisted of 5878 students. Of these, 253 had transferred to another school within the wider school system (the school system was comprised of over 30 school districts with over a million students) but were retained in the analytic sample, in line with the prespecified intention-to-treat (ITT) analysis, which preserved the original randomized design. The analytic sample was composed of 2919 students in the BAU group and 2959 students in the intervention group (see Fig. 1 and Table 1).
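The sample flow reported above can be checked with a few lines of arithmetic. This is a minimal sketch using only the counts stated in the text; the variable names are illustrative, and treating the post-exclusion count (enrolled minus late joiners) as the attrition denominator is an inference from the reported 3.4%, not something the text states outright.

```python
# Sample flow for the analytic sample, using the counts reported in the text.
enrolled = 6507        # students in grades 1-12 across the 18 schools
late_joiners = 421     # enrolled 6 weeks after the start of class (excluded)
attritors = 208        # left the school district entirely

# Assumption: attrition is computed after late joiners are excluded.
eligible = enrolled - late_joiners
analytic_n = eligible - attritors
attrition_rate = attritors / eligible

bau_n, intervention_n = 2919, 2959   # reported group sizes

print(eligible, analytic_n, round(100 * attrition_rate, 1))
```

Under that assumption, the counts reconcile with the reported analytic sample of 5878 and the two group sizes sum to the same total.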

Fig. 1 Out-of-school suspension (OSS) rates by intervention status and receipt of OSS in the prior year

Table 1 Descriptive statistics comparing students and schools assigned to the intervention and control conditions (n = 5878)

Of the sample (male = 50%), the majority were Black (54%), followed by Hispanic (38%), some other race/ethnicity (7%), and White (2%) students. Twenty-four percent had an identified disability and 91% were classified as coming from low SES backgrounds. Baseline equivalence analysis, based on both school- and student-level characteristics, showed minimal differences across all characteristics (e.g., all ps > 0.10, see Table 1) suggesting successful randomization.

Intervention Condition

The RP Project integrated RP with SEL and racial equity activities with the goals of promoting both systemic and individual change. The RP Project is comprised of five components (see Online Appendix B), which include professional development and coaching with school leaders and staff in order to foster equitable, relationship-focused, and restorative school environments. Given the RP Project was implemented in elementary, middle, and high schools, developmental considerations in the design of the intervention model included grade-level–specific curriculum guiding students’ social-emotional learning and weekly restorative community-building circles. Schoolwide staff training (e.g., restorative mindset, understanding race, racism, and oppression) was similar across school levels given the adult audience, and training in facilitating SEL curricula was specific to students’ developmental levels.

The comprehensive nature of the RP, SEL, and racial equity project was intentionally designed to incorporate social-emotional skill development to address equity through a restorative format. Furthermore, each program component was designed to address equity through adults’ intrapersonal exploration about bias and privilege and through their critical examination of policy/practices that marginalize students resulting in their increased alienation/exclusion and reduced belonging/inclusion. For example, schoolwide staff training included sharing a personal cultural artifact, an interactive activity mapping a timeline of oppression connecting historical racialized policies to current racial harm, exploration of one’s privileged and marginalized identities and experiences, discussion of the four i’s of oppression (i.e., ideological, institutional, interpersonal, individualized), and sharing/reflecting on one’s racialized life experiences. In addition to the staff-wide professional development training, which included the aforementioned activities addressing equity consciousness raising and mindset shifting, the intervention model included coaching throughout the school year through small-group discussion of racialized incidents in the school, individualized mentoring to uncover and dismantle biases, and schoolwide shared readings from marginalized perspectives (e.g., James Baldwin and Angie Thomas). A notable feature of the intervention design is that the RP Coordinators assigned to each school for training and coaching met twice per month for full day collective problem-solving and their own continuous professional development and support from Morningside Center leaders.

The nine intervention schools were assigned to one of eight Morningside Center RP Coordinators and one of three RP/equity principal coaches. RP coordinators worked for 1–3 full days per week in the intervention schools in the 2018–2019 school year. RP coordinators provided training programs and experiential learning opportunities with a focus on capacity building. Intervention schools held 15-h staff training designed to help promote a restorative mindset and build SEL skills with all staff (e.g., teachers, administrators, and support staff). The RP coaches (who were also former school principals) worked individually with the principals during regular 90-min sessions. Individual principal coaching focused on building the principals’ restorative leadership capacity, developing the principals’ SEL skills, and building a racial equity mindset. Students received weekly restorative community-building circles in consistent small-group advisory settings (or homeroom settings in elementary schools) with teachers who were trained and coached in implementing Morningside Center’s grade-level–specific social-emotional learning curriculum (for an evaluation of the effectiveness of the program in elementary schools, see Jones et al., 2011).

Control Condition

At the start of the project, principals in control schools were interviewed to assess the ongoing initiatives at the school. Interviews indicated that none of the control schools was engaged with integrated RP initiatives which focused on capacity building of the school leadership, taught SEL skills, and had school-based teams develop and deliver training programs using the circle process. In four out of the nine schools, principals noted some staff were trained in restorative approaches to discipline, a condition which reflects the disciplinary reforms that are being used throughout the state (as well as the nation). BAU schools were provided $5000 and could avail of the discounted intervention services (e.g., training, coaching) after study completion. BAU schools could continue with their own professional development.

Implementation Fidelity of the Training

Implementation fidelity of the training was measured throughout the project in close contact with RP coordinators and Morningside Center staff. Implementation measures were designed and refined during a pilot year of the project working with three schools (not enrolled in the current study). Fidelity measures included interviews with each intervention and control condition school principal, bi-annual interviews with each RP coordinator, and monthly check-in data collected from each RP coordinator (i.e., Morningside Center staff consultants assigned to each intervention school). Despite some variability across schools, each intervention school met the overall threshold for adequate implementation of training, a threshold developed in collaboration with project designers and the research team and based on a pilot year of implementation (with three different schools). Specifically, for the RP Leadership Team training component, all 9 principals attended the 2-day training and individual coaching sessions (they averaged 11 sessions ranging from 7 to 14 sessions). Also, in all the intervention schools, RP coordinators helped facilitate schoolwide planning through diverse Restorative Intervention (RI) leadership teams, which met an average of 8 times ranging from 4 to 13 meetings during the year. For the RI training component, in all of the intervention schools, RP coordinators regularly consulted with administrators and staff about how to work with students to resolve disputes in a restorative manner. However, only 3 of the 9 schools had multiple staff members participate in the formal 2-day RI training. For the Schoolwide Staff Development training component, RP coordinators, in all 9 schools, held the 15-h training for staff and conducted individualized coaching (an average of 5 sessions per trained circle facilitator with a range of 1 session to 24 sessions per person). Finally, training and support for the Family RP Opportunities component were more variable with 4 of the 9 schools training their parent coordinators in RP and 6 of the 9 schools holding RP family events.


The outcome of interest was whether the student had received a principal/superintendent suspension within the 2018–2019 school year (i.e., 1 = received one or more suspensions, 0 = did not receive a suspension). Instead of the typical in- and out-of-school suspensions used by other school systems, the district distinguishes between three forms of responses that may result in a student’s removal from the classroom: (1) a teacher removal (student is removed from the classroom but can remain in the school building); (2) a principal suspension (which can last for 1 to 5 days); and (3) a superintendent’s suspension, which lasts longer than 5 days but less than a school year. As done by Rodriguez and Welsh (2022), we combined the latter two categories as these represent students who are not allowed to attend school, similar to what is more commonly referred to as an OSS. Suspension data, as well as student demographic data, were provided by the district. Student demographic data included gender, race/ethnicity, eligibility for FRPM, and disability status. In addition, we also used suspension data from the prior school year.
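The outcome coding described above can be sketched as follows. The string labels and the function name are hypothetical (the district's actual record format is not described in the text); only the rule itself, that principal and superintendent suspensions count as an OSS and teacher removals do not, comes from the description.

```python
from typing import Iterable

# Hypothetical labels for the district's three removal types.
TEACHER_REMOVAL = "teacher_removal"                      # stays in the building
PRINCIPAL_SUSPENSION = "principal_suspension"            # 1 to 5 days
SUPERINTENDENT_SUSPENSION = "superintendent_suspension"  # longer than 5 days

# The two categories combined into the OSS outcome.
OSS_TYPES = {PRINCIPAL_SUSPENSION, SUPERINTENDENT_SUSPENSION}


def received_oss(removals: Iterable[str]) -> int:
    """Return 1 if any removal in the school year is a principal or
    superintendent suspension, else 0. Teacher removals do not count."""
    return int(any(r in OSS_TYPES for r in removals))
```

A student with only teacher removals, or with no removals at all, is coded 0; one or more suspensions of either kind yields 1, regardless of how many were issued.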

Analytic Strategy

We examined whether assignment to the RP intervention condition resulted in a lower likelihood of receiving an OSS for students (RQ1). As the intervention status was assigned at the school level, to account for the nesting of students within schools, two-level multilevel models were used. Multilevel linear probability models (LPMs) were prespecified using restricted maximum likelihood estimation with the lme function in the nlme package (Pinheiro et al., 2014) in R 4.1 (R Core Team, 2022). The approach is used to evaluate the ITT impact of being assigned to receive the RP intervention or not. LPMs are valid for experimental studies (Deke, 2014; Huang, 20212022) though we had prespecified robustness checks using logistic regression models as well. Logistic regression models used a CR2 standard error adjustment (Bell & McCaffrey, 2002) that accounted for the limited number of clusters (Huang et al., 2022b; Huang & Li, 2022).

As there was little missing data (5.4%), and all of it came from the baseline measure (i.e., OSS recorded in the prior year), we used the dummy variable indicator method suggested by the Institute of Education Sciences (IES; WWC, 2020, p. 38). The dummy variable method for missing data is simple, produces unbiased treatment effect estimates because of the randomization, and is acceptable when a CRCT is used and only the baseline measure is missing (Puma et al., 2009).
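As a rough illustration of the dummy variable indicator method, a covariate with missing entries is split into a filled-in copy and a 0/1 missingness flag, and both enter the model. The function name and the fill value of 0 are assumptions made for this sketch; the text does not state which constant was used.

```python
from typing import List, Optional, Sequence, Tuple


def dummy_indicator(
    values: Sequence[Optional[int]], fill: int = 0
) -> Tuple[List[int], List[int]]:
    """Split a covariate with missing entries into (filled, missing_flag).

    Missing entries (None) are replaced by a constant (assumed 0 here) and
    flagged with 1 in the indicator; observed entries are flagged with 0.
    """
    filled = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing


# Hypothetical prior-year OSS records for five students (None = no record).
prior_oss = [0, 1, None, 0, None]
prior_filled, prior_missing = dummy_indicator(prior_oss)
```

Because both columns are included as covariates, the choice of fill constant shifts only the coefficient on the indicator, not the fitted values for students with a missing baseline.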

The student-level covariates included the dummy-coded baseline measure (i.e., receipt of an OSS in the prior year), race (Hispanic, White, and Other, with Black as the reference group), gender, eligibility for FRPM, and disability status. A dummy-coded variable indicated if a student was missing the baseline measure (1 = missing, 0 = not missing). At the school level, enrollment size (in hundreds of students), the percent of students eligible for FRPM, and the percent of Black students enrolled were included as covariates. The continuous school-level covariates were grand mean centered. The predictor of interest (at the school level) was intervention status (1 = treatment, 0 = control). School type was included as a series of dummy codes with middle schools as the reference group. The combined impact evaluation (main effects) formula can be expressed as:

$$Y_{ij} = \delta\,\text{Treat}_{j} + \beta_{1}\text{Elem}_{j} + \beta_{2}\text{Comb}_{j} + \beta_{3}\text{High}_{j} + \beta_{4-6}\text{SchDemo}_{j} + \beta_{7-12}\text{Demo}_{ij} + \beta_{13}\text{Prior}_{ij} + \beta_{14}\text{Miss}_{ij} + u_{0j} + u_{1j}\text{Prior}_{ij} + e_{ij}$$

where Yij represents the outcome (receipt of an OSS; 1 = yes, 0 = no) of student i in school j and δ represents the ITT effect. A random slope (u1j) allowed the effect of the baseline measure to vary by school, to investigate whether the effect of prior suspensions varied across the schools. The residual variance for eij was estimated based on the type of school attended to account for violations of homoscedasticity which can affect model results (Huang et al., 2022a). An improvement in model fit was assessed using a likelihood ratio test (LRT; LaHuis & Ferguson, 2009), with a nonstatistically significant LRT indicating that the simpler, more parsimonious model was preferred.
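The covariate coding that feeds this model can be sketched in a few lines. Function names and the enrollment figures are hypothetical, and centering on the mean across schools (rather than across students) is an assumption, since the text says only that the school-level covariates were grand mean centered.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Black is the omitted reference group, so it has no dummy of its own.
RACE_LEVELS = ("Hispanic", "White", "Other")


def race_dummies(race: str) -> Dict[str, int]:
    """Dummy-code race; the reference group (Black) is all zeros."""
    return {level: int(race == level) for level in RACE_LEVELS}


def grand_mean_center(values: Sequence[float]) -> List[float]:
    """Subtract the overall mean so that 0 represents the average school."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical enrollment (in hundreds of students) for four schools.
enrollment = [2.5, 3.0, 4.0, 4.5]
centered = grand_mean_center(enrollment)
```

With this coding, δ is interpreted for a school at the average value of each continuous school-level covariate, and the race coefficients are contrasts against Black students.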

Three moderators were prespecified which assessed if there was an interaction between intervention status with student gender, disability status, and race (RQ2). A final model was tested which investigated if there was an interaction between intervention status and the receipt of a prior OSS (RQ3). LRTs were used to test (using maximum likelihood) for an improvement in model fit of the moderation models compared with the simpler main effects model.
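Written out, the RQ3 model adds a single cross-level product term to the main effects formula. The symbol γ for the interaction coefficient is introduced here for illustration only; all other terms are carried over unchanged from the main effects model.

$$Y_{ij} = \delta\,\text{Treat}_{j} + \gamma\,(\text{Treat}_{j} \times \text{Prior}_{ij}) + \beta_{1}\text{Elem}_{j} + \beta_{2}\text{Comb}_{j} + \beta_{3}\text{High}_{j} + \beta_{4-6}\text{SchDemo}_{j} + \beta_{7-12}\text{Demo}_{ij} + \beta_{13}\text{Prior}_{ij} + \beta_{14}\text{Miss}_{ij} + u_{0j} + u_{1j}\text{Prior}_{ij} + e_{ij}$$

Under this parameterization, δ is the ITT effect for students with Prior = 0 and δ + γ is the effect for students with Prior = 1, so a test of γ asks whether the two differ. The RQ2 models follow the same pattern, with the student characteristic in place of Prior.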


Basic Descriptives

In the year prior to the intervention (school year 2017–2018), the average OSS rate for the 18 schools was 4.4%. In the year of the intervention (school year 2018–2019), the OSS rate was 4.5% (see Table 2). In both pre- and post-intervention periods, the OSS rate was lowest in elementary schools (i.e., 1.2 to 1.3%) and highest in high schools (i.e., 9.1 to 10.2%).

Table 2 Out-of-school suspension (OSS) rates by school type and intervention condition

Regression Results

A null multilevel model (MLM) indicated that the intraclass correlation coefficient for suspensions was 0.04. LRTs indicated that the inclusion of random slopes (χ2 = 74.6, p < 0.001) and allowing residual variances to vary by school type (χ2 = 1231.4, p < 0.001) were both warranted (e.g., suspensions varied by the grade-level configuration of the school).

Main effects of the MLM (see Table 3) indicated no statistically significant differences in the likelihood of suspension between students in the intervention and control schools (B = −0.001, p = 0.95). A succeeding model tested whether the effect varied by student gender, race/ethnicity, or disability status. An LRT, which simultaneously tested if including the interaction terms improved model fit, indicated that the effect of the intervention did not vary by gender, race, or disability status; χ2(5) = 8.4, p = 0.14. This is also evident in model 2 (see Table 3) where none of the interactions was statistically significant (all ps > 0.05).
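The p value attached to an LRT statistic is the upper-tail probability of a chi-square distribution with degrees of freedom equal to the number of added parameters. The sketch below computes it with the Python standard library only; the helper name `chi2_sf` is hypothetical and this is not the routine used in the study (which was run in R).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function with z = x / 2,
    starting from Q(1/2, z) = erfc(sqrt(z)) for odd df and
    Q(1, z) = exp(-z) for even df.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df a positive integer")
    z = x / 2.0
    if df % 2:                       # odd df: start at a = 1/2
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                            # even df: start at a = 1
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:              # step a up to df / 2
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Moderator test reported in the text: chi-square of 8.4 on 5 df.
p_moderators = chi2_sf(8.4, 5)
```

For the reported statistic this gives a value of roughly 0.14, in line with the p value in the text. The five degrees of freedom correspond to one interaction each for gender and disability status and three for the race dummies.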

Table 3 Multilevel linear probability models predict receipt of out-of-school suspensions (OSS; n = 5878)

The final model (see model 3, Table 3) showed that the intervention effect varied as a function of whether the student had received a prior OSS or not (B = −0.14, p = 0.04). This indicates that students with a prior suspension were less likely to be suspended in the intervention schools. To further illustrate the interaction visually (see Fig. 1), students in the control schools with a prior OSS had a 33% probability of being suspended; in the intervention schools, the probability was only 11% (d = 0.77). In addition, the marginal R2, based on the model fixed effects (Nakagawa & Schielzeth, 2013), increased from 0.108 (in the main effects model) to 0.142 in model 3, which is substantial (i.e., an increase of 30% based on only one additional predictor).
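The two comparisons in this paragraph reduce to simple arithmetic on the reported, rounded values. The sketch below is only a check on those figures; the variable names are illustrative.

```python
# Predicted OSS probabilities for students with a prior OSS (from the text).
p_control, p_intervention = 0.33, 0.11

# Marginal R-squared for the main effects model and model 3 (from the text).
r2_main, r2_model3 = 0.108, 0.142

risk_difference = p_control - p_intervention          # absolute gap
r2_relative_gain = (r2_model3 - r2_main) / r2_main    # proportional increase

print(f"{risk_difference:.2f} {r2_relative_gain:.2f}")
```

The gap is 22 percentage points, and the proportional gain in marginal R2 works out to roughly 0.31 from the rounded inputs, consistent with the approximately 30% increase described.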

Robustness Checks

Given that the average OSS rates were low to begin with (i.e., < 5%), another type of regression model may be more suitable (note: the LPM was prespecified as part of the preregistered research protocol and was approved by the technical advisor of the grant funding agency). We had also prespecified that robustness checks would be conducted using logistic regression analysis using cluster robust standard errors.

Robustness checks (see Online Appendix C) indicated that students in the intervention schools were less likely to receive an OSS (OR = 0.57, p = 0.09; one-tailed p = 0.045, d = 0.31) while controlling for all other variables in the model. Results also indicated that the intervention effect did not vary by subgroup as all the interactions were not statistically significant (all ps > 0.20). The interaction with prior OSS also showed lower rates of suspension for those with a prior OSS (B = −0.98, p = 0.08; one-tailed p = 0.04). An additional test was conducted to determine if the intervention effect varied by type of school. An LRT (χ2 = 0.76, p = 0.86) indicated that the intervention effect did not vary by school type.
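The text does not say how the standardized effect sizes were obtained. One common option is the logit conversion, d = ln(OR) × √3 / π, and the sketch below applies it on the assumption that something similar was used. Function names are hypothetical.

```python
import math


def d_from_odds_ratio(odds_ratio: float) -> float:
    """Logit conversion from an odds ratio to a standardized mean difference."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Main effect from the logistic robustness check (OR = 0.57).
d_main = abs(d_from_odds_ratio(0.57))

# Students with a prior OSS: 11% (intervention) versus 33% (control).
d_prior = abs(d_from_odds_ratio(odds(0.11) / odds(0.33)))
```

Under this assumption the odds ratio of 0.57 converts to about 0.31, matching the reported value. The rounded probabilities of 33% and 11% give about 0.76, close to the reported 0.77, with the small gap plausibly due to rounding of the inputs.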


Only a handful of rigorous, true experimental studies have investigated RP effects on different outcomes. We show, using a rigorous CRCT, that after 1 year of implementation, RP did not have a direct effect on suspensions, and effects were consistent regardless of student race, disability status, or gender (i.e., there was no differential effect of RP). Other studies have shown that RP may not necessarily lower suspension gaps between Black and non-Black students as well as students with and without disabilities (Anyon et al., 2016; Hashim et al., 2018).

The current findings contrast with our prior research using the same sample that focused solely on the preventative promise of RP. Discipline incidents are recorded as ODRs which are upstream/proximal precursors to downstream/more distal suspension outcomes. We found that overall, students in the RP Project schools were less likely to receive a discipline incident record as compared to students in the comparison schools (Gregory et al., 2022). One possible reason why RP main effects for suspension may not have materialized is that the time frame may have been too short (i.e., 1 year). Considering that RP initiatives may take several years (e.g., 3 to 5 years) to fully implement (Glenn et al., 2020; Hashim et al., 2018; Mansfield et al., 2018) and that RP coordinators at the school may still be building relationships with school staff during the first year, it is uncertain how effective the RP initiative may have been in the longer term. Although we had originally planned on collecting discipline data together with end-of-year school climate surveys in spring of 2020, the global Coronavirus pandemic forced schools to transition to remote schooling in March 2020, truncating data collection efforts. RP coordinators could not enter school buildings to meet with staff, students were not in school physically, and teachers had to quickly adapt to the sudden shift to online learning. Moreover, during remote schooling, out-of-school suspensions were not being issued (as students were already not in school), which made 2019–2020 incomparable with prior years and a large portion of suspensions typically occur toward the end of the school year (e.g., in the 2018–2019 school year, 42% of suspensions were issued in March or later). 
Although we attempted to collect data in the third year (i.e., through an extension of the implementation to 2021, which was not planned), schools were operating in a hybrid mode in which suspensions were rarely issued (since students were mostly at home), and students who were originally part of the project had progressed to the next level of schooling or beyond (e.g., 8th grade middle schoolers had moved on to high school, high schoolers had graduated).

Another possible reason is that, based on administrator interviews, four of the nine BAU/control schools were also using some form of restorative approach to discipline. However, this is reflective of discipline reform efforts across the country as RPs (and other types of interventions) are more widely adopted (Diliberti et al., 2019). The wider school system has been undergoing discipline reform efforts for several years, and suspension rates in the school system fell from approximately 8% to 5% from 2012 to 2019 (Rodriguez & Welsh, 2022). Yet, even when faced with this comparison and low prevalence rates, the Whole School RP Project had an impact on those with a prior history of suspension.

As a prior suspension is one of the strongest predictors of a future suspension (see Table 2), the finding that RP may differentially impact those with a prior suspension is meaningful. Teachers and administrators may be less reluctant to issue suspensions to students who already have a track record of suspensions (Blakely & Woodward, 2000) and especially to already-suspended Black students, a group particularly impacted by implicit racial bias (Okonofua et al., 2016). Intervention effects for students with a prior history of OSS have been reported by others as well (Glenn et al., 2020; Okonofua et al., 2022).

Students who had received an OSS previously represent a subpopulation of students who are already at greater risk of academic failure (i.e., they had already experienced being suspended). Prior suspensions may lead to a breakdown of trust and respect for the teacher and can trigger a vicious cycle of misconduct and suspensions (Okonofua et al., 2016; Rosenbaum, 2020). In the current study, the large reduction in suspension rates among those who had been suspended previously is practically meaningful and represents the breaking of the cycle of suspensions. Given that outcomes are worse for Black students with a history of suspension relative to suspended students from other racial/ethnic groups (e.g., Rosenbaum, 2020), the current study’s findings also indicate some promise for RP to aid in dismantling the school-to-prison pipeline.

Speculatively, the mechanisms through which the RP initiative may have helped to break the suspension cycle may include (a) improved relationship-building with adults/teachers through the SEL curricula and circle process, (b) shifted adult mindset to attend to the unmet needs of students with a history of suspension, and/or (c) increased student engagement in SEL skill-building interventions that prevented future misconduct (e.g., restorative conferences). As one of the goals of RP is to build and foster connections between individuals, healthy relationships play a key role in the reduction of misbehaviors (or perceived misbehaviors) that can lead to suspensions.


Aside from the disruptions in the study design brought about by the COVID-19 pandemic (e.g., shortened timeframe), a few more limitations should be kept in mind. Although there were reductions in OSS for those with a history of suspensions, we do not know if RP changed student behavior or if the staff response to student misbehavior changed (see Glenn et al., 2020). We also do not know whether observable classroom teacher practices or behaviors changed due to the training and coaching in the project. Moreover, the Whole School RP Project wove together RP, an SEL curriculum, and activities to shift adult mindsets and practices to increase racial equity in discipline. Because the initiative was integrated, the study could not isolate whether specific, well-defined RPs such as circles or conferences caused the findings. Future research should integrate systematic observation to identify whether specific relationship-building adult behaviors shifted after trainings (e.g., provision of socioemotional supports), beyond their implementation of community-building circles.

Another consideration is that an intervention such as the Whole School RP Project requires resources, commitment, and effort (e.g., RP coordinators are in schools several days a week; teacher, administrator, and student buy-in is required). RP has been previously described as “an effective but exhausting alternative” to suspensions (Dominus, 2016). However, these requirements may be similar to those of other school-wide or teacher interventions such as positive behavioral interventions and supports (PBIS; Bradshaw et al., 2010) or MyTeaching Partner (Gregory et al., 2015). Moreover, the proverbial “train has left the station”: RPs are already being adopted at a fast rate by schools despite the lack of an experimental evidence base, which highlights the need for the current study (Diliberti et al., 2019; Song & Swearer, 2016).


Following the preregistered analytic plan, but with the evaluation conducted after 1 year instead of the originally planned 2 years because of the COVID-19 pandemic, the current study found no differences in a student’s likelihood of being suspended over the course of the academic year between schools randomized to the intervention and control conditions. In addition, no differential intervention effects were found based on race, gender, or disability status. Given that the project was evaluated without implementation of its full 2-year design, the results should be interpreted with caution. Notably, after a single year of implementation, the current experimental study found a reduction in the likelihood of receiving an OSS for students with a prior history of suspension. Although main effects were not found, RP may be a promising approach for reducing suspensions in schools, especially for a subsample of students who are already at greater risk of academic failure.