School-based mindfulness training (SBMT) may be an attractive possibility for the promotion of well-being and prevention of mental disorders in children and adolescents. Given that mental health problems commonly have their first onset in adolescence (Kessler et al., 2007), scholars have called for early prevention in elementary school settings (Patel et al., 2007). In the past 15 years, research on SBMTs has increased sharply (Ergas & Hadar, 2019). However, initial enthusiasm about the effectiveness of mindfulness in children and adolescents was dampened by recent, more rigorous evaluation studies questioning overall positive effects (Kuyken et al., 2022; Lassander et al., 2020; Zelazo et al., 2018). Indeed, scholars (e.g., Dunning et al., 2019; Roeser et al., 2020) have emphasized the need to address methodological shortcomings that are often characteristic of relatively young research fields, to gain a better understanding of the potential and limitations of the effectiveness of SBMTs. One crucial factor is the role of implementation, which has rarely been examined in this field (Tudor et al., 2022) and is addressed in the present study. Using a cluster-randomized trial, we examined the effectiveness of two 8-week SBMTs in Swiss elementary school students. A broad array of student outcomes tapping emotional and social aspects of well-being, as well as two aspects of implementation quality (i.e., responsiveness of participants and quality of delivery), were examined.

Mindfulness in School

Mindfulness is defined as "paying attention in a particular way, on purpose, in the present moment, non-judgmentally" (Kabat-Zinn, 2003, p. 145). Thus, mindful awareness involves an intentional focus on thoughts, feelings, sensations, and perceptions with an attitude of kindness, curiosity, and openness (Shapiro et al., 2006). Although being aware of the present moment does not imply a specific activity, it is thought to be enhanced by mindful meditation. Some practices among various forms of mindful meditation include awareness of breath, sound, and bodily experience (Kabat-Zinn, 1994).

Mindfulness practice is well established in intervention programs for adults, which were shown to be particularly effective in reducing depression, anxiety, and stress (Khoury et al., 2013). These promising effects of mindfulness-based interventions for adults have motivated educators, therapists, and researchers to develop a variety of SBMTs to enhance self-regulation, learning and mental health of children and adolescents (Meiklejohn et al., 2012; Weare, 2019). In comparison to interventions for adults, SBMT typically uses more movement-based practices and shorter segments of mindfulness practice (Zelazo & Lyons, 2012). A recent overview by Roeser et al. (2020) looking at characteristics of 36 SBMTs in prekindergarten through secondary school settings (K-12) identified that (a) the majority of these programs applied either a novel curriculum or adapted existing adult programs (an additional 20% of these SBMTs involved brief practices designed for and administered to students), (b) were delivered either by an external facilitator or the classroom teacher, and (c) lasted around 360–900 min. In the present study, we evaluate two distinct SBMTs: One (Binja) is based on a novel curriculum, whereas the other (BTP) is based on brief practices. Both are delivered by the classroom teacher, and the total in-class program time is 360 min.

SBMT is theorized to promote self-regulation by improving attention control, emotion regulation and self-awareness. Improved self-regulation in turn is thought to enhance mental health, school performance and social behavior (Roeser et al., 2020; Tang et al., 2015). Of these proposed mechanisms, self-regulation in childhood in particular has been found to be a strong predictor of academic success (McClelland et al., 2013; Neuenschwander et al., 2012) and mental health (Moffitt et al., 2011). Better emotion regulation has been found to be associated with higher levels of social competence and lower levels of psychopathology in childhood (Zeman et al., 2006). Furthermore, children with symptoms of depression and anxiety have been found to show impoverished emotion regulation (Siener & Kerns, 2012; Suveg & Zeman, 2004). This implies that enhancing self-regulation, and in particular emotion regulation, through SBMT in childhood may hinder the generally early onset of mental disorders such as depression, anxiety, attention deficit disorder, and substance abuse (Kessler et al., 2007), and thus enhance the well-being of children and adolescents. In addition, SBMT appears to offer an ideal tool to reach the educational goals of social-emotional learning (SEL; Grant et al., 2017; OECD, 2021), as it targets the specific SEL competencies of self-awareness, social awareness, relationship skills, and responsible decision-making (Collaborative for Academic, Social, and Emotional Learning, 2013; Lawlor, 2016).

Elementary school might be the ideal setting to improve emotion regulation and SEL competencies (Patel et al., 2007). From middle to late childhood, the ability to recognize, understand and express emotions typically increases, which lays the foundation for adaptive emotion regulation. Furthermore, appropriate social regulation is enabled by emerging perspective taking and understanding of emotional cues. These abilities are important building blocks for the future development of self-regulation in adolescence and are thus at best fostered in this sensitive period (Bailey & Jones, 2019).

Effectiveness of SBMTs

In recent years, not only has the application of SBMT increased, but so has its scientific examination (Ergas & Hadar, 2019). A growing body of research indicates that SBMT reduces internalizing distress such as depression, anxiety, and stress in children and adolescents (Dunning et al., 2019). However, the more recent and to date largest randomized controlled trial (RCT) by Kuyken et al. (2022) found no effects of SBMT on 11- to 12-year-olds’ risk for depression.

Research further indicates that the self-regulation components of executive functions and emotion regulation can be enhanced by SBMT. A recent meta-analysis has shown that SBMT promotes executive functions. However, this effect was lost when only RCTs with an active control group were considered (Dunning et al., 2019; for similar results, see also recent studies, Kuyken et al., 2022; Lassander et al., 2020; Zelazo et al., 2018).

No meta-analytic evidence on the effects of SBMT on emotion regulation is available. Still, several studies have shown that SBMT has a positive effect on the emotion regulation of elementary school (McCurdy et al., 2024; Schonert-Reichl et al., 2015; van de Weijer-Bergsma et al., 2014) and high-school students (Broderick & Metz, 2009). One study, however, found no effect on emotion expression and emotion awareness in elementary school students, although significant decreases in negative affect were found at follow-up (Vickery & Dorjee, 2016).

Findings on the effects of SBMT on social behavior are mixed. Although three studies found that SBMT improved prosocial behavior in elementary school (Schonert-Reichl et al., 2015), kindergarten (Viglas & Perlman, 2018), and preschool children (Flook et al., 2015), in two other studies no effects on preschool children’s prosocial behavior (Thierry et al., 2018), and on elementary school students’ peer relationships were found (Mendelson et al., 2010). Furthermore, a positive effect on social behavior was not corroborated by a meta-analysis (Dunning et al., 2019). So far, there is little evidence that SBMT effectively reduces externalizing distress, namely anger and aggression (Dunning et al., 2019; Roeser et al., 2020). Taken together, evidence for a reduction of symptoms of anxiety and stress is strongest. Findings on the reduction of depression, as well as the enhancement of emotion regulation and social behavior are promising but not as robust.

Although research on SBMT indicates potential benefits for children and adolescents, one should not draw hasty, overly confident conclusions on its effectiveness. Findings in this still young field of research are not as clear-cut as the widespread use of SBMT would suggest. The authors of the most recent comprehensive meta-analysis on the effects of SBMT (Dunning et al., 2019) point out several limitations to their meta-analysis, which can be seen as emblematic of the field. First, methodologies of studies, including SBMT protocols, participant characteristics, and outcome variables, are very heterogeneous, which limits comparability. Second, publication bias is highly likely, which suggests a possible overestimation of benefits. Third, relatively few studies use RCTs. Consequently, in their meta-analysis including exclusively RCTs, the number of studies analyzed for specific effects is generally small, which makes findings less robust. Fourth, in a large part of studies, SBMT is implemented by researchers themselves, which increases the likelihood of bias. Fifth, most studies on SBMT include student self-report measures only, which might cause participants to report in a socially acceptable, rather than a genuine way (Podsakoff & Organ, 1986). In our study, we address these issues by using a cluster-randomized waitlist-control group design, examining interventions that were not developed by the investigating researchers, and utilizing self- and parent-report questionnaires.

The Role of Implementation Quality

Another shortcoming in the research on SBMT is that the role of implementation is hardly ever considered in a systematic way (Tudor et al., 2022). As mindfulness finds its way into more and more classrooms and is therefore more likely to be taught by trained regular teachers rather than program developers (Doyle et al., 2018), knowledge on the implementation of SBMT is especially valuable. To ensure that effects found in pilot studies translate to a large-scale implementation, researchers, facilitators, and trainers of facilitators themselves must understand which aspects of a specific program are essential for a successful implementation (Milat et al., 2015).

In order to facilitate and propel the examination of implementation quality within the realm of SBMT, Feagans Gould et al. (2016) have created a framework for the systematic assessment of implementation quality. Based on previous work by Dane and Schneider (1998) and Durlak and Dupre (2008), they propose four dimensions of implementation quality, including adherence, dosage, quality of delivery, and responsiveness. Adherence refers to the extent to which the core components of a program are delivered. Dosage measures the amount of exposure. Quality of delivery specifies whether the curriculum of a program is delivered well and as intended, and responsiveness indicates how strongly participants engage in a program.

Applying this framework, Sciutto et al. (2021) found that the responsiveness of young children, reported by mindfulness instructors, was associated with better treatment outcomes. Results suggest that children in classrooms with higher levels of teacher and student engagement showed a larger reduction in externalizing behavior and a larger enhancement of prosocial behavior. Two other studies examined the impact of dosage on the effectiveness of SBMT. One study found that adolescents who deliberately practiced mindfulness at home in addition to practicing in class showed a stronger reduction in somatic complaints than students practicing in class only (Broderick & Metz, 2009). Another study found that students who practiced mindfulness more frequently showed greater reductions in depression and stress as well as bigger increases in well-being (Kuyken et al., 2013). However, a more recent study by Montero-Marin et al. (2022) with a large sample found that implementation of SBMT was not associated with student outcomes of well-being, risk of depression, and social-emotional behavioral functioning.

Research on implementation quality of SBMT is scarce; however, the role of implementation quality in the effectiveness of SEL programs is better understood. Even though SEL programs generally do not include mindfulness practice, SBMT and SEL programs show a significant overlap in goals as well as in components, such as psychoeducation and the learning of behavior skills (Lawlor, 2016; Meiklejohn et al., 2012; Semple et al., 2017). A meta-analysis has shown that when problems with the implementation of SEL programs have been reported, treatment effects tend to be smaller (Durlak et al., 2011). In addition, two studies examining SEL programs found that higher implementation quality was associated with several positive outcomes (Dowling & Barry, 2020; Humphrey et al., 2018).

Present Study

This study was conducted to contribute to the ongoing effort to determine the effectiveness of SBMT (Dunning et al., 2019; Kuyken et al., 2022; Lassander et al., 2020; Zelazo et al., 2018) by addressing methodological limitations in prior research. First, we applied a robust study design (i.e., cluster-randomized controlled trial, relatively large sample size, measurement of implementation quality), which allows causal inferences about the effects of the SBMTs. Second, we did not develop the interventions ourselves, which reduces the risk of partiality. Third, we used self- and parent-report questionnaires, which reduces social desirability bias. Thus, the aim of this cluster-randomized trial was to examine whether two distinct and established 8-week SBMTs (Binja and BTP) implemented in elementary school enhance students’ emotion regulation, social well-being, and emotional well-being. Based on previous research, we hypothesized that both SBMTs would enhance emotion regulation (Schonert-Reichl et al., 2015; van de Weijer-Bergsma et al., 2014), social well-being (Schonert-Reichl et al., 2015), and emotional well-being (Dunning et al., 2019).

Furthermore, the current study examined what role the responsiveness of participants and quality of delivery play in the effectiveness of these SBMTs. By addressing two critical aspects of implementation quality, we responded to calls within the research community (Feagans Gould et al., 2016; Milat et al., 2015; Tudor et al., 2022) to support translational efforts in this field. Drawing from a substantial body of SEL research (Dowling & Barry, 2020; Durlak et al., 2011; Humphrey et al., 2018), we hypothesized that higher levels of responsiveness of participants and quality of delivery would be associated with better program effectiveness.

Methods

Participants and Procedure

A total of 246 students aged 9 to 12, from 19 German-speaking Swiss elementary school classes ranging from second to sixth grade, completed the study. The sample also contained 199 parents and 18 teachers (i.e., one teacher taught in two classrooms, with one classroom being assigned to the intervention, the other to the waitlist-control group). See Table 1 for demographic characteristics and Fig. 1 for the flow of participants (including group sizes per measurement occasion). This study was approved by the Ethics Committee of the Faculty of Human Sciences, University of Bern (no. 2021-01-00002). The trial was not preregistered.

Table 1 Demographic characteristics at baseline for each group separately
Fig. 1 Flowchart of Participants. Note. The flowchart shows how many participants enrolled and completed questionnaires at each time point of measurement, including questionnaires that were only partially completed

Participating teachers and their classes were recruited by the program developers of either intervention and were thus set to implement either Binja or BTP, as an intervention group or, after data collection, as a waitlist-control group. In a next step, classes were randomly assigned to either the respective intervention group or the waitlist-control group. All students enrolled in a participating class were invited to take part in the intervention; study participation, however, was voluntary. Letters with information about the study were sent to all parents, and parents of participating students gave their approval by returning a consent form (73.7% of all parents gave consent).

Students and parents completed self- or parent-report baseline questionnaires in February 2021 and post-intervention questionnaires in April 2021. Teachers delivering the SBMT were trained by the program developers beforehand in each respective intervention and delivered the program during 8 weeks in their classroom. Students in the control group received treatment-as-usual during the intervention period. To not contaminate the 2-month-follow-up, the waitlist-control group received the interventions in August 2021. Students completed 89% (90% in Binja, 78% in BTP, and 95% in control) of questionnaires at pretest and 80% (76% in Binja, 74% in BTP, and 89% in control) of questionnaires at posttest. Parents completed 80% of questionnaires at pretest (87% in Binja, 84% in BTP, and 70% in control) and 70% (87% in Binja, 44% in BTP, and 65% in control) of questionnaires at posttest. Since self-report questionnaires were split into two parts, measurements for some students were only partially missing for a measurement occasion.

In addition, to measure implementation quality, teachers completed weekly fidelity logs during the intervention. Teachers in the control group did not complete fidelity logs as they did not deliver the intervention. One teacher of the intervention groups (8.3%) did not complete weekly fidelity logs. In addition to baseline- and post-intervention-measurements, 2-month-follow-up self- and parent-report questionnaires were completed by participants. However, this third measurement occasion was not included in the statistical analysis, because at follow-up, students completed only 52% of questionnaires and parents 31% of questionnaires.

Interventions

The Binja intervention program (Monstein, 2020) has been developed by Ruth Monstein and has been taught and delivered in multiple schools over the past few years. It consists of eight scripted lessons, with a duration of 45 min each, for elementary school students and is accompanied by a picture book and a book with teacher resources. The program is taught to the whole class and aims at the enhancement of self-awareness, emotion regulation, stress regulation, compassion, and self-confidence. It employs mindfulness practice, emotion and social regulation exercises, class discussions, and psychoeducation.

The BTP intervention program (Fankhauser, 2020) has been developed by Erica Fankhauser and has also been taught and delivered in multiple schools over the past few years. It encompasses a collection of mindfulness practices, emotion and social regulation exercises, and class discussions that can be flexibly applied in the classroom, supported by a set of mindfulness cards with instructions. Although mindfulness practices are grouped in eight categories such as awareness of the body or social behavior, no scripted lessons are provided. Teachers were instructed to devote 45 min each week to the mindfulness practices. See Table 11 and Table 12 (Appendix) for training protocols of Binja and BTP.

Teacher training for both programs took place prior to this study and included workshops and/or courses taught by either of the program developers based on the program materials. Most teachers who participated in our study (n = 9) had attended one of these two-day teacher workshops (for Binja, an additional evening course was provided) in the same year or the year before our study took place; the remaining teachers (n = 3) had attended the workshops two, three, or four years prior to the study. During program implementation, teachers and/or classrooms were not systematically supervised. However, both program developers responded promptly to teachers’ questions and were in close contact with the teachers throughout the study.

Measures of Primary Outcomes

Emotion regulation, social well-being, and emotional well-being were measured by self- and parent-report questionnaires.

Emotion Regulation

Measurement of emotion regulation included the assessment of emotion awareness, emotional control, and anger control.

Students’ emotion awareness was assessed by the 30-item German version of the Emotion Awareness Questionnaire (EAQ; Rieffe et al., 2008; Rueth et al., 2019). The four subscales Differentiating Emotions (α = 0.69; α = 0.76), Not Hiding (α = 0.62; α = 0.75), Bodily Awareness (α = 0.70; α = 0.72), and Analyses of Emotions (α = 0.63; α = 0.76) were answered on a 3-point Likert scale. Internal consistencies at baseline and post-intervention measurement were sufficient and similar to internal consistencies found in studies examining psychometric properties of the EAQ (German validation sample with mean age of 13;4 years, αs = 0.74-0.81 in Rueth et al., 2019; Dutch primary school validation sample with mean age of 10;8 years, αs = 0.64-0.68; and secondary school validation sample with mean age of 14;3 years, αs = 0.74-0.77 in Rieffe et al., 2008). The Differentiating Emotions subscale measures the ability to differentiate between emotions. The Not Hiding subscale measures the tendency not to keep emotions hidden. The Bodily Awareness subscale measures the awareness of bodily symptoms that come with emotions (reverse coded), and the Analyses of Emotions subscale measures the willingness to gain understanding by analyzing one’s own and others’ emotions. The two additional subscales of Verbal Sharing and Attending to Others’ Emotions were not used in further analyses as they showed insufficient Cronbach’s alphas (α < 0.60).

Students’ emotional control was parent-rated by the 10-item Emotional Control subscale of the German version of the Behavior Rating Inventory of Executive Function (BRIEF; Drechsler & Steinhausen, 2013; Gioia et al., 2000). Items were scored on a 3-point Likert scale, where higher values indicate lower emotional control. The Emotional Control subscale showed good internal consistency before and after the intervention (α = 0.84; α = 0.86; German validation sample α = 0.93).

Students’ anger control was measured by self- and parent-report using the 10-item Anger Control subscale of the German version of the State-Trait Anger Expression Inventory-2 (STAXI-2 KJ; Brunner & Spielberger, 2009; Kupper & Rohrmann, 2016). Both self-reported Anger Control (α = 0.80; α = 0.86; German validation sample α = 0.77) and parent-reported Anger Control (α = 0.88; α = 0.88; German validation sample α = 0.83) subscales used a 4-point Likert scale and showed good internal consistency before and after the intervention.

Social Well-Being

Social well-being was assessed by the 18-item Self- and Other-Oriented Social Skills measure (SOCOMP; Perren, 2008) both via self- and parent-report with a 3-point Likert scale. The three self-report subscales Social Participation (α = 0.73; α = 0.80; original Swiss validation sample α = 0.71), Prosocial Behavior (α = 0.74; α = 0.66; original Swiss validation sample α = 0.77) and Cooperative Behavior (α = 0.63; α = 0.79; original Swiss validation sample α = 0.49) showed sufficient internal consistency at baseline and post-intervention measurement. Likewise, the three parent-report subscales Social Participation (α = 0.82; α = 0.68; original Swiss validation sample α = 0.78), Prosocial Behavior (α = 0.68; α = 0.74; original Swiss validation sample α = 0.69) and Cooperative Behavior (α = 0.72; α = 0.66; original Swiss validation sample α = 0.68) showed sufficient internal consistency. A fourth subscale of Setting Limits showed insufficient internal consistency and was thus not further considered in the analysis.

Emotional Well-Being

Assessment of emotional well-being consisted of measurements of anxiety, depression, anger, and stress. Except for stress, which was assessed by self-report only, all these measurements were assessed by self- and parent-report.

To measure symptoms of anxiety and depression, students completed the German version of the Anxiety Inventory for Youth (BAI-Y) and the German version of the Depression Inventory for Youth (BDI-Y; Beck et al., 2005; Siefen & Busch, 2018). The BAI-Y scale (α = 0.89; α = 0.83; German validation sample αs = 0.86-0.93) and the BDI-Y scale (α = 0.93; α = 0.95; German validation sample αs = 0.91-0.94) were each measured with 20 items on a 4-point Likert scale and showed excellent internal consistency at baseline and post-intervention measurement. For the parent-report, the 13-item subscale Anxiety Problems and the 9-item subscale Affective Problems of the German version of the Child Behavior Checklist (CBCL/6-18R; Achenbach & Rescorla, 2001; Döpfner et al., 2014) were used. Both subscales for Anxiety Problems (α = 0.69; α = 0.74; German validation sample of 1994, general population, α = 0.75; German validation sample of 1994, clinical population, α = 0.81; US American validation sample, αs = 0.86-0.88) and Affective Problems (α = 0.63; α = 0.63; German validation sample of 1994, general population, α = 0.75; German validation sample of 1994, clinical population, α = 0.81; US American validation sample, αs = 0.86-0.88) used a 3-point Likert scale and showed sufficient internal consistency before and after the intervention.

Stress vulnerability was assessed by self-report only, using the 6-item Stress Vulnerability subscale of the German Questionnaire for the Measurement of Stress and Coping in Children and Adolescents (SSKJ 3–8; Lohaus et al., 2006). Stress vulnerability refers to the amount of stress a child or youth experiences in stressful everyday situations, which was measured on a 4-point Likert scale. The Stress Vulnerability subscale showed sufficient internal consistency (α = 0.69; α = 0.82; original German validation sample α = 0.66) before and after the intervention.

To measure anger, both students and parents completed the Trait Anger subscale of the German version of the STAXI-2 KJ (Brunner & Spielberger, 2009; Kupper & Rohrmann, 2016). Both self-reported Trait Anger (α = 0.83; α = 0.88; German validation sample α = 0.81) and parent-reported Trait Anger (α = 0.84; α = 0.82; German validation sample α = 0.83) employed 10 items on a 4-point Likert scale and showed good internal consistency at both measurement occasions.

Measures of Implementation

The measures of responsiveness and quality of delivery are based on Dane and Schneider (1998), Durlak (2016), and Feagans Gould et al. (2016). These authors conceptualized responsiveness as participants’ attraction to the program and their level of participation, and quality of delivery as the implementer’s enthusiasm and attitude towards the program as well as how well the program was delivered. For reasons of feasibility (costs, time investment, privacy concerns), we chose teacher-reports to measure implementation quality.

Similar to Sciutto et al. (2021), implementation quality was measured by weekly implementation logs that contained items measuring the responsiveness of participants and quality of delivery on a 3-point Likert scale (1 = little, 2 = moderately, 3 = very much). Teachers completed the implementation logs each week during the intervention. Responsiveness was assessed by two items concerning the motivation and involvement of the students. For responsiveness, the Cronbach’s alpha was 0.84. To assess quality of delivery, three items were used that required teachers to rate their own motivation, how much they themselves liked the lesson, and how well they were able to deliver the lesson. For quality of delivery, the Cronbach’s alpha was 0.91. Mean scores over the eight-week period represent the final responsiveness and quality of delivery scores per classroom. In addition, teachers reported each week whether the respective lesson was held and delivered as planned.
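As an illustration only, the two computations named above (Cronbach's alpha for a set of items and a classroom mean over the weekly logs) can be sketched in a few lines of Python. The function names and the example ratings are hypothetical and are not taken from the study; in particular, the text does not state at which level (weekly logs or classroom means) alpha was computed, so the sketch simply takes aligned item ratings as input.

```python
from statistics import mean, pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of ratings per item; position i in every
    inner list belongs to the same rater or log (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum score per log across all items.
    totals = [sum(ratings) for ratings in zip(*items)]
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def classroom_score(weekly_logs):
    """Average the item ratings within each week, then across weeks."""
    return mean(mean(week) for week in weekly_logs)


# Hypothetical ratings on the 3-point scale (1 = little ... 3 = very much).
motivation = [1, 2, 3, 3]
involvement = [1, 2, 3, 2]
alpha = cronbach_alpha([motivation, involvement])
score = classroom_score([[3, 3], [2, 3]])
```

Because the same variance estimator is used in the numerator and denominator, the choice between population and sample variance does not change the resulting alpha.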

Measures of Covariates

Measurement of socioeconomic status (SES) was based on both parents’ educational and occupational status (Wegener, 1988). Covariates of age, gender, and SES were included as various studies found associations of these variables with self-regulation, social cognition, and mental health (e.g., Blakemore & Choudhury, 2006; Bradley & Corwyn, 2002; Kessler et al., 2007; Paus, 2005). Further, class size has been found to affect classroom processes and learning progress (Brühwiler & Blatchford, 2011).

Statistical Analysis

An a priori power analysis with G*Power version 3.1.9.6 (Faul et al., 2009) was conducted to calculate necessary sample sizes with a significance criterion of α = 0.05 and power of 0.80 (Cohen, 1988). Based on previous research (Klingbeil et al., 2017), we expected small effects of f = 0.20 on primary outcomes. The power analysis indicated that a minimum sample size of 134 students is necessary. Thus, the sample sizes of 144 (n = 58 BTP group + n = 86 control group) for the comparison between BTP and the control group and 188 (n = 102 Binja group + n = 86 control group) for the comparison between Binja and the control group (i.e., comparison of each program to the control group, cf. our hypothesis) are sufficient to detect expected intervention effects. We set a significance level threshold of p < 0.05. However, because data on covariates was missing for some students, sample sizes for models with covariates (Model 2) are generally smaller than sample sizes for models without covariates (Model 1, see below). Therefore, marginally significant effects (p < 0.10) in Model 2 will be considered in the discussion.

Measurements of primary outcomes were z-standardized, and covariates were mean centered prior to statistical analysis. This procedure enhances the interpretability of multilevel model outputs and facilitates comparison of coefficients across various measurements. The standardization of primary outcomes also allows for a comparison of resulting standardized regression coefficients with effect sizes reported in other studies and can thus be applied in meta-analyses (Lorah, 2018; Nieminen, 2022). Regression coefficients represent the estimated number of standard deviations of change in the outcome variable.
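A minimal sketch of the two transformations follows, assuming the sample standard deviation (n − 1 denominator) and assuming that missing values are simply left missing; the text does not specify either choice, nor whether outcomes were standardized per measurement occasion or across occasions. Function names and example values are hypothetical.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Return z-scores; missing values (None) are passed through."""
    observed = [v for v in values if v is not None]
    m, s = mean(observed), stdev(observed)
    return [None if v is None else (v - m) / s for v in values]


def mean_center(values):
    """Subtract the mean so that 0 corresponds to the sample average."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical example: an outcome with one missing score, and class sizes.
outcome_z = z_standardize([1.0, None, 3.0])
class_size_centered = mean_center([20, 22, 24])
```

After standardization, a coefficient of 0.30 for an outcome reads as a difference of 0.30 standard deviations, which is what makes coefficients comparable across questionnaires with different scales.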

In multilevel models, an analysis of the full sample is appropriate even if post-intervention data is partially missing (Twisk et al., 2020). Therefore, the full sample was used in statistical analysis, and participants with only one out of two measurements were included. The number of cases analyzed for each variable is reported in the results tables for the examination of intervention effects (see Tables 3 and 4) and the role of implementation quality (see Tables 5 and 6). We did not correct alpha levels for multiple comparisons given the exploratory nature of the study (Rubin, 2017).

Intervention Effects

As teachers and their classes were recruited by the program developers, classes were not randomly assigned to the Binja or BTP intervention. Further, the random assignment of classrooms to either an intervention or control group was carried out in blocks. This demanded a series of descriptive analyses to rule out that groups differed at baseline. A 2-level multilevel model was used to assess group differences of student- and classroom-level covariates and group differences of baseline-measures of emotion regulation, social well-being, and emotional well-being.

To consider the clustering of students in classrooms, multilevel analysis was utilized. An unconditional means model was employed to assess intraclass correlation coefficients (ICCs). To examine the effects of both SBMTs, two 3-level multilevel models were used, where Model 1 included no covariates and Model 2 included student-level and classroom-level covariates. For this analysis, data was restructured into long format. Both models examined repeated measurements of primary outcomes at level 1, which were nested in students at level 2. In addition, students were nested within classrooms at level 3. Dummies for both interventions and interactions between interventions and time as well as random intercepts for students and classrooms were added. In addition to Model 1, Model 2 included the student-level covariates of age, gender, and SES as well as the classroom-level covariate class size. For each primary outcome variable, a separate model was used.
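One way to write the models described above is given below. This is our reading of the verbal description, not the authors' stated specification: the symbols are introduced here for illustration, and the main effect of time is an assumption, since the text names only the intervention dummies and their interactions with time. Index t denotes the measurement occasion, i the student, and j the classroom.

```latex
\begin{align}
% Unconditional means model and classroom-level ICC
Y_{tij} &= \gamma_{0} + v_{j} + u_{ij} + e_{tij},
&
\mathrm{ICC}_{\text{classroom}} &= \frac{\sigma^{2}_{v}}{\sigma^{2}_{v} + \sigma^{2}_{u} + \sigma^{2}_{e}} \\[4pt]
% Model 1 (no covariates)
Y_{tij} &= \gamma_{0} + \gamma_{1}\,\mathrm{Time}_{t}
         + \gamma_{2}\,\mathrm{Binja}_{j} + \gamma_{3}\,\mathrm{BTP}_{j} \notag \\
        &\quad + \gamma_{4}\,(\mathrm{Time}_{t} \times \mathrm{Binja}_{j})
         + \gamma_{5}\,(\mathrm{Time}_{t} \times \mathrm{BTP}_{j})
         + v_{j} + u_{ij} + e_{tij}
\end{align}
```

Here v_j and u_ij are the random intercepts for classrooms and students, e_tij is the occasion-level residual, and the interaction coefficients γ4 and γ5 carry the intervention effects, that is, the difference in change from pretest to posttest relative to the control group. Under the same reading, Model 2 adds fixed effects for age, gender, SES, and class size to the right-hand side.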

In order to interpret and discuss whether groups showed stable, decreasing, or increasing effects over time, group means (see Table 9 and Table 10 in the Appendix) were examined when significant effects were found in multilevel models. However, it is important to note that the determination of stable, decreasing, or increasing effects over time was not statistically tested.

The Role of Implementation Quality

To examine the role of implementation quality, only the intervention groups were included in the analysis as implementation quality was not measured in the control group. Since theoretical assumptions on the role of implementation do not differ between the two interventions, groups were collapsed, which was also advantageous in terms of sample size at the class level. One class from the intervention groups was excluded from the analyses as the teacher did not complete the weekly fidelity logs. Three categories were built based on distributional cut-points of one standard deviation. Scores of responsiveness and quality of delivery more than one standard deviation below or above the mean were categorized as low or high, respectively. Scores within one standard deviation of the mean were categorized as moderate. The distribution of classrooms and students across categories of implementation quality can be found in Table 2. The categorization allows for a direct comparison of low and high implementation quality. Furthermore, it makes results comparable to other research on the role of implementation quality in school intervention programs, where similar categorizations are prevalent (e.g., Dowling & Barry, 2020; Humphrey et al., 2018). A 2-level multilevel model with students at level 1 and classrooms at level 2 was used to examine the role of implementation quality. For this analysis, data was structured into wide format. At level 2, random intercepts for classrooms, dummy variables for responsiveness and quality of delivery as well as a covariate for class size were entered into the model. In addition, baseline scores for the respective primary outcome were added at level 1. A separate model was examined for each primary outcome.
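
The categorization described above can be illustrated with a minimal sketch. The function name, the use of the sample standard deviation, the cut-points taken relative to the mean, and the example scores are all assumptions made for illustration; they are not taken from the study data or analysis scripts.

```python
from statistics import mean, stdev


def categorize_by_sd(scores):
    """Label each score as low, moderate, or high using cut-points
    one standard deviation below and above the mean (assumed rule)."""
    m = mean(scores)
    sd = stdev(scores)  # sample standard deviation (an assumption)
    lower, upper = m - sd, m + sd
    labels = []
    for score in scores:
        if score < lower:
            labels.append("low")
        elif score > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Hypothetical classroom-level responsiveness scores
example_scores = [2.0, 3.0, 3.0, 3.0, 4.0]
print(categorize_by_sd(example_scores))
```

Because the cut-points are relative to the sample distribution, a classroom labeled "low" is low only in comparison with the other participating classrooms, not in absolute terms.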

Table 2 Descriptive statistics for implementation quality variables for each implementation subgroup

Results

Five out of 8 teachers from the Binja intervention group reported that they delivered 100% of the lessons planned, whereas the remaining 3 teachers delivered 87.5%. For the BTP intervention group, 2 out of 3 teachers delivered 87.5% of the lessons planned and one teacher delivered 75%. For one class in the BTP intervention group, no data on the implementation quality is available.

Preliminary Analyses

Intervention and control groups differed on some primary outcome measures and covariates at baseline. Compared to the control group, students in the Binja intervention group showed significantly lower values of differentiating emotions (β = -0.30; SE = 0.15; p = 0.043) and significantly higher values of analyses of emotions (β = 0.31; SE = 0.15; p = 0.031). No significant baseline differences between the BTP intervention group and the control group were found for primary outcomes. Students in both the Binja (β = -0.50; SE = 0.22; p = 0.020) and BTP (β = -1.08; SE = 0.25; p < 0.001) intervention groups were significantly younger than students in the control group. Classroom size was significantly smaller in the Binja group than in the control group (β = -3.09; SE = 0.54; p < 0.001). No significant between-group differences for gender and SES were found. See Table 1 for mean differences of demographic variables between groups.

Across the whole sample, student-level covariates of gender, age, and SES were correlated with several primary outcomes at baseline (see Table 7 and Table 8 in the Appendix). Boys reported higher values of bodily awareness of emotions (r = 0.33; p < 0.001) as well as lower values of prosocial behavior (r = -0.18; p = 0.006) and stress vulnerability (r = -0.32; p < 0.001) at baseline. Age was negatively correlated with self-reported anxiety (r = -0.19; p = 0.028). SES was positively correlated with self-reported social participation (r = 0.16; p = 0.023) and cooperative behavior (r = 0.24; p < 0.001), as well as with parent-reported social participation (r = 0.14; p = 0.043) and cooperative behavior (r = 0.17; p = 0.013). Further, SES was negatively correlated with parent-reported anxiety (r = -0.15; p = 0.035) and low emotional control (r = -0.22; p = 0.002).

The classroom-level covariate of class size was positively correlated with self-reported social participation (r = 0.14; p = 0.040), and cooperative behavior (r = 0.16; p = 0.016), and negatively correlated with self-reported trait anger (r = -0.17; p = 0.011), and parent-reported affective problems (r = -0.17; p = 0.018).

Intervention Effects

Intervention effects on emotion regulation, social well-being, and emotional well-being are reported for Model 1, without covariates, and for Model 2, with covariates, and are shown in Tables 3 and 4. Associations between covariates and several primary outcomes shown by previous research and found in this study (see Table 7 and Table 8 in the Appendix) as well as mean differences of covariates between groups (see Table 1) indicate that it was appropriate to add the covariates of age, gender, SES, and class size to the model. Nevertheless, missing data on covariates led to lower statistical power in Model 2. Significant group by time interactions indicate an intervention effect in comparison to the control group.

Generally, ICCs were very low (≤ 0.05). Although ICCs for some scales were larger than 0.05 (not hiding emotions, anger control, stress vulnerability, trait anger, social participation, and affective problems), they were still small and below 0.10 (refer to Table 3 and Table 4 for ICCs).
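
For readers less familiar with ICCs, the following sketch shows how a classroom-level ICC is commonly derived from the variance components of an unconditional means model. The function name and the variance values are hypothetical, and the formula shown (classroom variance divided by total variance) is the standard definition rather than a documented detail of the present analysis.

```python
def classroom_icc(var_classroom, var_student, var_residual=0.0):
    """Share of total outcome variance located between classrooms.

    All arguments are variance components from an unconditional
    (intercept-only) multilevel model; values are supplied by the user.
    """
    total = var_classroom + var_student + var_residual
    if total <= 0:
        raise ValueError("Total variance must be positive.")
    return var_classroom / total


# Hypothetical variance components, chosen only for illustration
icc = classroom_icc(var_classroom=0.04, var_student=0.50, var_residual=0.46)
print(round(icc, 2))
```

An ICC of this size would mean that only a small share of the variance in an outcome lies between classrooms, which limits how much classroom-level predictors can explain.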

Table 3 Intervention effects of SBMT on primary outcomes reported by students
Table 4 Intervention effects of SBMT on primary outcomes reported by parents

For emotion regulation, in Model 1 and Model 2, Binja showed a significant negative effect on not hiding emotions (β = -0.32; SE = 0.14; p = 0.026 and β = -0.38; SE = 0.18; p = 0.033). In addition, in Model 1, there was a significant positive effect by Binja on self-reported anger control (β = 0.37; SE = 0.18; p = 0.038). No significant intervention effects were found on student-reported differentiating emotions, bodily awareness of emotions, and analyses of emotions, nor on parent-reported low emotional control, or anger control.

For social skills, a positive effect on self-reported social participation by the BTP intervention was found in Model 1 (β = 0.39; SE = 0.18; p = 0.037), and in Model 2 (β = 0.52; SE = 0.21; p = 0.015). For parent-reported social participation, BTP showed a negative effect in Model 1 (β = -0.43; SE = 0.20; p = 0.030). Furthermore, Binja had a significant negative effect on parent-reported prosocial behavior in both Model 1 (β = -0.33; SE = 0.15; p = 0.026), and in Model 2 (β = -0.33; SE = 0.15; p = 0.030). There were no significant effects on self- or parent-reported cooperative behavior in Model 1 and in Model 2.

For emotional well-being, significant positive effects on self-reported stress vulnerability by the Binja intervention were found in Model 1 (β = 0.39; SE = 0.15; p = 0.007), and in Model 2 (β = 0.43; SE = 0.18; p = 0.016). No significant effects on student- and parent-reported anxiety, trait anger nor parent-reported affective problems were found.

Marginally significant effects were found for several outcomes. In Model 1, BTP showed marginally significant effects on not hiding emotions (β = -0.32; SE = 0.17; p = 0.059), self-reported anxiety (β = -0.49; SE = 0.25; p = 0.054), and low emotional control (β = 0.37; SE = 0.20; p = 0.061). In Model 2, BTP showed marginally significant effects on not hiding emotions (β = -0.38; SE = 0.18; p = 0.033) and parent-reported social participation (β = -0.33; SE = 0.20; p = 0.099). Binja showed marginally significant effects in both Model 1 and Model 2, on bodily awareness of emotions (β = -0.26; SE = 0.15; p = 0.073 and β = -0.32; SE = 0.18; p = 0.064) and parent-reported affective problems (β = -0.25; SE = 0.14; p = 0.085 and β = -0.27; SE = 0.15; p = 0.069). In Model 2, Binja showed marginally significant effects on anger control (β = 0.35; SE = 0.21; p = 0.099) and parent-reported anxiety (β = -0.23; SE = 0.14; p = 0.098).

The Role of Implementation Quality

Associations between implementation quality and primary outcomes can be derived from Tables 5 and 6. ICCs were generally very low (≤ 0.05). For some outcomes (not hiding emotions, depression, trait anger, social participation, and affective problems) ICCs were larger but still small (≤ 0.10; refer to Tables 5 and 6 for ICCs). Students in classrooms with high responsiveness (β = 1.13; SE = 0.41; p = 0.006) and moderate responsiveness (β = 0.97; SE = 0.35; p = 0.005) reported higher anger control after the intervention in comparison to students in classrooms with low responsiveness.

Table 5 Associations between implementation levels and primary outcomes reported by students
Table 6 Associations between implementation levels and primary outcomes reported by parents

Students in classrooms with moderate quality of delivery reported significantly lower anger control after the intervention than students in classrooms with low quality of delivery (β = -0.66; SE = 0.27; p = 0.014). No significant associations between any parent-reported primary outcomes and responsiveness or quality of delivery were found.

A marginally significant association between moderate responsiveness and lower levels of self-reported social participation was found in comparison to low responsiveness (β = -0.55; SE = 0.31; p = 0.082). Additionally, a marginally significant association was found between high quality of delivery and lower levels of self-reported anxiety after the interventions in comparison to moderate quality of delivery (β = -0.54; SE = 0.32; p = 0.087).

Discussion

Despite the growing interest in SBMT and its evaluation, evidence on the effectiveness of SBMT, although promising, is still scarce and the number of rigorous studies is limited (Dunning et al., 2019). Particularly noticeable is the lack of knowledge on the role of implementation quality, which is essential for a more widespread use of SBMT (Milat et al., 2015). The goal of this study was to evaluate the impact of two distinct and established SBMTs, Binja and BTP, on primary school students’ emotion regulation, social well-being, and emotional well-being. Furthermore, the study examined whether implementation quality was associated with the effectiveness of SBMT. Our findings align with recent, methodologically rigorous evaluation studies that have failed to uncover overall positive effects of SBMT (Kuyken et al., 2022; Lassander et al., 2020; Zelazo et al., 2018). Consistent with recent research (Montero-Marin et al., 2022), we similarly did not find meaningful moderation of these limited effects by implementation quality. Therefore, our results underscore the need for further thorough scientific examination and consideration, particularly in light of the widespread use of SBMT.

In the following sections, we first contextualize intervention effects for Binja and BTP within the broader evaluation literature. We then provide a detailed discussion on study limitations and potential reasons for failing to find further positive SBMT effects. Diverse avenues for future research are highlighted, including the exploration of informant-, baseline-, and age-dependent intervention effects, investigation into adverse effects, and factors supporting research-practitioner partnerships and student engagement.

Intervention Effects

Effects of SBMT found in models with and without covariates are discussed below. The model without covariates should not be disregarded and will also be considered in our discussion, as missing data on covariates reduced statistical power, rendering in some cases significant effects only marginally significant in models with covariates.

Emotion Regulation

For emotion regulation, both expected and unexpected intervention effects were found. Without covariates, Binja showed a stabilizing effect on anger control, whereas students in the control group reported decreasing levels of anger control. Although Binja partially stabilized emotion regulation (i.e., anger control), it does not reflect theorized mechanisms of SBMTs (Roeser et al., 2020; Tang et al., 2015), as emotion regulation was not clearly enhanced and previous findings on the enhancement of emotion regulation could not be replicated (Broderick & Metz, 2009; Schonert-Reichl et al., 2015; van de Weijer-Bergsma et al., 2014).

Moreover, with and without covariates, students in the Binja group reported stable levels of not hiding of emotions, whereas students in the control group unexpectedly reported increasing levels. Apparently, SBMT led to changes in how students responded to items such as “when I am angry or upset, I try to hide this” (reverse coded) or “when I am upset about something, I often keep it to myself” (reverse coded). One would assume that mindfulness teaches individuals to be more aware of their emotions and to deal with them in a non-judgmental, non-reactive way (Kabat-Zinn, 2003), thus not hiding negative emotions. A possible explanation for this unexpected finding could be that both programs educate children on sharing and discussing their feelings, possibly making them realize that they frequently hide their emotions. Future research needs to corroborate this finding.

When additionally considering marginally significant effects in models with covariates, Binja showed a trend towards stabilizing effects on bodily awareness of emotions, whereas students in the control group reported rising levels of bodily awareness of emotions (reverse coded). Thus, Binja had a positive effect on the awareness of bodily symptoms that come with emotions. Furthermore, marginally stabilizing effects of BTP on not hiding of emotions are in line with significant stable levels of not hiding of emotions in the Binja group. Overall, it can be concluded that neither intervention clearly enhanced emotion regulation.

Social Well-Being

For social well-being, significant effects were either contradictory or unexpected. While students in the BTP group reported an increase in social participation, their parents reported a decrease in social participation. Furthermore, in the Binja group parents reported an unexpected decrease in prosocial behavior. These effects were found with and without covariates, except for the effect on parent-reported social participation, which was only marginally significant when covariates were added.

These inconsistent findings need not be dismissed as unreliable reporting resulting from informants’ biased perspectives or random error: recent data and theoretical reasoning indicate that informant discrepancies are informative and clinically and educationally useful (Achenbach et al., 1987; De Los Reyes & Kazdin, 2005). Indeed, the phenomenon of informant discrepancies in assessments of psychosocial functioning in school-based services and research (De Los Reyes et al., 2019) offers an interesting framework to interpret the present contradictory findings. First, interrater agreements for social participation between parents and children at baseline were similar (r = 0.34, p < 0.05) to cross-informant agreements found in representative samples of students and their parents from grades 3–12 for SEL relationship skills (r = 0.27, p < 0.05; Gresham et al., 2018), suggesting that these low to moderate interrater agreements reflect setting-based differences among informants’ opportunities for observing behavior, as proposed by De Los Reyes and Kazdin (2005).

Second, it has been proposed that multiple outcomes within RCTs may systematically vary in usage of informants and that discrepancies may be used to identify meaningful treatment outcome patterns (De Los Reyes, 2011). Thus, the present findings may be interpreted as follows. Possibly, SBMT and particularly BTP enhanced social participation in children, as reported by themselves. Enhanced self-reported social participation, however, may have also led to more peer conflicts and possibly attempts at withdrawal from peers. Thus, at home, children may have talked more often about these negative experiences, and negative peer experiences may be more salient in children’s communication to their parents (cf. negativity bias, Baumeister et al., 2001); hence, parents may have rated social participation lower at post-intervention. In any case, this hypothesis would need to be tested within future RCTs.

These mixed findings on social well-being reflect the current state of research that has shown no clear benefit for social behavior (Dunning et al., 2019). In their theory of change, Roeser et al. (2020) regard social behavior as a more distal outcome than mental health, which indicates that it might not be enhanced as easily.

Emotional Well-Being

Results indicate that neither intervention enhanced students’ emotional well-being, as no effects on anxiety, depression, or trait anger were found. Surprisingly, students in the Binja group reported an increase in stress vulnerability. Previous research on adverse effects of adults’ meditative practices (for a meta-analysis, see Farias et al., 2020) found that the occurrence of adverse experiences during or after meditation practices is not uncommon, with the most common adverse experience being anxiety. Furthermore, recent advances in mindfulness research propose that attention monitoring skills begin to improve more immediately after initial practice, while a stance of acceptance (non-judgement, non-reactivity) may take longer to cultivate (Desbordes et al., 2015). Thus, it is plausible that the time lag between the development of attention monitoring skills and the development of acceptance may be responsible for heightened emotional reactivity in novice practitioners, which may be experienced as stress vulnerability. Future research is needed to test this hypothesis in children and in the context of SBMT.

When taking the marginally significant effects in models with covariates into account, parents in the Binja group reported a trend towards a decrease in affective problems. Parents in the Binja group additionally reported a marginally significant decrease in anxiety problems. Nevertheless, the unexpected and amplifying effect of Binja on children’s stress vulnerability indicates that SBMT has potentially negative effects on children’s well-being.

Wrapping up Intervention Effects

Taken together, neither of the interventions clearly enhanced emotion regulation, social well-being, or emotional well-being. One possible explanation for this limited effectiveness is that participating students did not belong to an at-risk population. Comparisons of baseline means to norm values (Achenbach & Rescorla, 2001; Beck et al., 2005; Brunner & Spielberger, 2009; Lohaus et al., 2006) and means found in measurement validation studies (Rueth et al., 2019) indicate that on average, participating students and their parents reported typical emotion regulation, as well as normative levels in social and emotional well-being. Supporting the hypothesis that low baseline levels may moderate intervention effects, one study found that positive effects of SBMT on executive functions were larger for children with poorer executive functions (Flook et al., 2010). Thus, children growing up in low-SES environments might benefit more from SBMT, as various poverty-linked stressors impede the development of self-regulation and impair mental health (Blair, 2010; McEwen, 2000). In line with this argument, one study with at-risk 17-year-olds found that SBMT reduced depressive symptoms (Bluth et al., 2016). In contrast, a study with 11- to 12-year-olds found that for pre-adolescents at risk for mental health problems, SBMT had a negative effect on their emotional well-being (Montero-Marin et al., 2022). Future research is needed to better understand differential and possibly age-dependent intervention effects.

Another explanation might be that intervention effects take time to unfold and might have been detected in a later follow-up measurement. A meta-analysis has shown that effects of SBMT were larger at follow-up (Klingbeil et al., 2017) and a study found that beneficial effects on emotional awareness and anxiety of 8- to 12-year-old children were stronger 7 weeks after the intervention (van de Weijer-Bergsma et al., 2014). However, the effect of mindfulness-based interventions for adults was found to be smaller at follow-up than right after the intervention (Khoury et al., 2013). Unfortunately, we were not able to test follow-up effects because the attrition rate at follow-up was very high. One reason for this high attrition rate might have been the extensive self- and parent-report questionnaires. Additionally, teachers and parents were faced with severe challenges during the COVID-19 pandemic (Hascher et al., 2021; Mohler-Kuo et al., 2021), which may have hampered their motivation to complete the questionnaires for a third time. Furthermore, the finding of one study that the effect of SBMT depends on the degree to which students practiced mindfulness (Kuyken et al., 2013) suggests that mindfulness practice would have had to be continued after the intervention to increase its effectiveness at follow-up.

Besides long-term SBMT effects, a school-wide implementation might show different outcomes, whereas in this study interventions were delivered in single classrooms only. A school-wide implementation would allow mindfulness to be integrated into daily routines and could contribute to a positive school climate, similar to the implementation of school-wide SEL programs (Oberle et al., 2016). Although school-wide implementation has proven feasible for SEL programs (Meyers et al., 2019), there are no studies examining school-wide implementations of SBMT. Program developers and scholars should therefore consider school-wide implementation of SBMT.

Last but not least, although only limited beneficial effects were found, anecdotal reports by teachers reveal great appreciation for SBMT. One teacher, for example, expressed astonishment at students’ ability to practice mindful awareness of bodily experiences. Another teacher revealed that some students practiced mindful breathing in moments of distress and that SBMT enabled profound conversations about feelings in class. Thus, it is important to acknowledge that the external validity of our findings may be limited. Therefore, future studies should not solely rely on quantitative questionnaire data but should also consider the "student voice" (Cook-Sather, 2006) by incorporating qualitative data. As proposed by Huynh et al. (2019), employing mixed methods study designs can offer a more comprehensive understanding of why, how, when, and for whom SBMT may be effective.

Effects of Implementation Quality

Results indicate that implementation quality was associated with the effectiveness of SBMT only regarding student-reported anger control. As hypothesized, students in classrooms with higher and moderate responsiveness of participants reported better anger control after the intervention than students in classrooms with low responsiveness. Furthermore, a marginally significant effect indicated an unexpected effect of responsiveness for social participation, as after the intervention, students in classrooms with moderate responsiveness reported marginally lower levels of social participation than students in classrooms with low responsiveness.

As for quality of delivery, we found an unexpected effect for anger control, such that students in classrooms with moderate quality of delivery reported significantly lower anger control after the intervention than students in classrooms with low quality of delivery. Furthermore, a marginally significant effect was found for students in classrooms with high quality of delivery, who reported lower levels of anxiety after the intervention compared to students in classrooms with moderate quality of delivery. However, this finding must be interpreted cautiously, as many classes did not complete this part of the student questionnaire and the sample size was therefore much smaller.

Previous findings by Sciutto et al. (2021), which indicate that responsiveness is associated with the effectiveness of SBMT to reduce externalizing behavior and enhance prosocial behavior, could not be replicated. However, in the study by Sciutto et al. (2021) students’ outcomes were reported by teachers, which could be subject to bias as teachers participated in the SBMT themselves. Furthermore, research indicating that implementation quality affects SEL program effectiveness (Dowling & Barry, 2020; Humphrey et al., 2018) could not be corroborated for SBMT in our study.

Several issues with the measurement of implementation quality may explain these results. First, it is important to note that the categorization of implementation quality is based on statistical and not qualitative terms. Thus, according to the questionnaire, classes categorized as low in responsiveness and quality of delivery still reported moderate implementation quality on average (see Table 2). Second, as teachers rated their own performance, social desirability might have led to an inaccurate assessment of implementation quality (Lillehoj et al., 2004). Third, and purely hypothetically, it is possible that teachers who are more critical of their practice reported lower implementation quality because they saw room for improvement, even though their implementation quality might have exceeded that of teachers less critical of their practice. Previous research supports this tentative interpretation of teachers’ characteristics playing a crucial role in the implementation of school-based interventions. Specifically, one study found that teachers who faced higher levels of work-related stress attended the most teacher trainings and were most highly engaged in a comprehensive, classroom-based intervention in Head Start classrooms compared to their less work-stressed colleagues (Li-Grining et al., 2010). Similarly, Domitrovich et al. (2009) showed that Head Start teachers reporting higher levels of emotional exhaustion were more rather than less involved in implementing a new classroom-based intervention with high fidelity. Finally, ICCs were generally low; thus, only a small amount of between-classroom variance could be explained by classroom-level implementation quality.

Taken together, it can be concluded that implementation quality did not substantially affect the effectiveness of SBMT. This fortifies findings on intervention effects, as by and large, the examination of implementation quality failed to reveal stronger benefits for students in classes with high implementation quality (for similar findings and conclusions, see Montero-Marin et al., 2022).

Limitations and Future Directions

There are several potential limitations concerning the results of this study. First, the rate of attrition regarding primary outcomes was high, which led to the decision to not include follow-up measurements in the analysis. Furthermore, for some variables the number of cases and classes has been severely reduced, especially for all parent-report measurements and self-report measurements of emotional well-being, and parent-report measurements of BTP, possibly leading to bias in the parent-reported BTP effects. In future studies, less extensive questionnaires or individual interviews could lead to less attrition at follow-up assessments. Second, implementation quality was assessed by teacher-reports only, which may be subject to positivity and desirability bias as previous research has shown that teachers rate implementation quality higher than independent observers (Hansen et al., 2014; Lillehoj et al., 2004). Here, additional ratings of video recordings and student-reports could improve the validity of implementation quality measurements. Furthermore, a comparison of self- and observer-ratings would make it possible to investigate whether more experienced and self-reflective teachers are less prone to positivity or desirability bias. While researchers must balance higher measurement validity with the cost, time investment, and privacy concerns associated with ratings based on video, findings from this study that indicate no association between teacher-reported implementation quality and effectiveness of SBMT suggest a need for a more comprehensive evaluation of implementation quality. Third, due to a smaller sample size, statistical power for intervention effects of BTP is lower than for Binja. This limits comparability of intervention effects of Binja and BTP. Fourth, findings on the role of implementation quality are limited by the small sample size at the classroom level. Nevertheless, as this is one of the first studies to systematically investigate the role of implementation quality in the effectiveness of SBMTs, findings should not be disregarded. Fifth, as teachers voluntarily chose to implement SBMT, motivation for in-depth engagement might have been especially high (Bowden et al., 2020). This limits the generalizability of the findings for widespread application and possible inclusion of SBMT in regular school curricula, where teachers might be less motivated. Importantly, though, teacher qualification for delivering the SBMTs was not part of this study (i.e., participating teachers had already been trained in the mindfulness protocols prior to this study). Future research may monitor teacher training more closely to ensure satisfactory levels of teachers’ ability to provide these programs. Sixth, curricula for Binja and BTP slightly differ depending on grade levels and both interventions offer optional content and room for program adaptations. Hence, SBMT varied between classes, which limits comparability. Here, assessment of program adherence and taught components could have explained additional variance between classrooms.

Future research should measure the extent to which single components of SBMT curricula are taught (e.g., Espil et al., 2021). This would make it possible to examine which components of SBMT curricula are effective and thereby disentangle the effectiveness of mindfulness practice and components of psychoeducation. Seventh, some subscales (EAQ, SOCOMP, CBCL/6-18R) showed relatively low reliabilities (Cronbach’s alpha ≤ 0.70) at one measurement point. After careful consideration, we decided to include these subscales nevertheless because reliabilities found in this study do not substantially differ from the ones in the original validation samples. Additionally, we made sure that the reliability was checked with alternative statistics (i.e., the interrelatedness of the items was not threatened by negative covariances) and that the validity was given (i.e., low to moderate interrater agreements for domains that were assessed with the same instrument in students and parents).

Finally, the study took place during the COVID-19 pandemic, which further limits the generalizability of the findings. Although during the spring semester of 2021 in-class teaching took place in Switzerland, teachers and parents were still faced with severe challenges (Hascher et al., 2021), which probably had an impact on the implementation and evaluation of the programs (e.g., poor classroom climate for teaching the lessons and completing the questionnaires). Moreover, effects found in this study may be confounded with the negative effects that the pandemic had on students’ mental health (Schuler et al., 2022).

Although implementation quality hardly affected the effectiveness of either intervention, it would be inappropriate to disregard the importance of implementation quality for SBMT, as implementation quality plays a central role in similar school-based interventions (Durlak et al., 2011). However, this study laid bare that additional research with a comprehensive assessment of implementation quality that goes beyond teacher self-report is necessary. Furthermore, future research should examine whether implementation quality can be enhanced by measures such as training, expert supervision, and team teaching. This knowledge is highly relevant to ensure good implementation quality as SBMT becomes more widespread and is therefore more likely to be taught by regular teachers (Doyle et al., 2018).

A promising avenue for future work may also be the implementation and evaluation of combined mindfulness programs (i.e., teacher training with subsequent classroom training). In light of research suggesting that teaching mindfulness to students is more effective when teachers themselves have established a mindfulness practice, such a combination appears meaningful (Shapiro et al., 2016). Little is known, however, about how the implementation of mindfulness in the classroom setting can sustain or enhance the effects achieved by teacher training sessions (see Rohner et al., under rev., for a recent study disentangling the effects of teacher and classroom training).

On another note, facilitating teachers’, principals’, and parents’ comprehension of the evidence supporting SBMT may prove crucial for fostering stronger research-practitioner partnerships and promoting the advancement of evaluation research overall (Nguyen et al., 2022). Finally, to enhance student engagement, offering optional participation or providing choices among various mental health-promoting approaches may be an intriguing strategy to explore, especially with older students who may also have the opportunity to co-design interventions (Kuyken et al., 2023).

Conclusion

Findings on intervention effects are ambiguous and for the most part not reflective of theorized mechanisms of SBMT. Although findings do not indicate a clear-cut superiority over treatment as usual under all circumstances, partially beneficial effects on social well-being and emotion regulation are promising. Furthermore, anecdotal reports by teachers reveal great appreciation for SBMT. The fact that implementation quality was hardly associated with the effectiveness of SBMT does not necessarily imply that implementation quality does not matter. A more comprehensive assessment of implementation quality and a larger sample size could help determine under which circumstances SBMT can be effective, which is essential for the implementation of SBMT on a larger scale.