Mindfulness-Based School Interventions: a Systematic Review of Outcome Evidence Quality by Study Design

The purpose of this systematic review was to assess the current literature on mindfulness-based school interventions (MBSIs) by evaluating evidence across specific outcomes for youth. We evaluated 77 studies with a total sample of 12,358 students across five continents, assessing the quality of each study through a robust coding system for evidence-based guidelines. Coders rated each study numerically per study design as 1++ (RCT with a very low risk of bias) to 4 (expert opinion) and across studies for the corresponding evidence letter grade, from highest quality (“A Grade”) to lowest quality (“D Grade”) evidence. The highest quality evidence (“A Grade”) across outcomes indicated that MBSIs increased prosocial behavior, resilience, executive function, attention, and mindfulness, and decreased anxiety, attention problems/ADHD behaviors, and conduct behaviors. The highest quality evidence for well-being was split, with some studies showing increased well-being and some showing no improvements. The highest quality evidence suggests MBSIs have a null effect on depression symptoms. This review demonstrates the promise of incorporating mindfulness interventions in school settings for improving certain youth outcomes. We urge researchers interested in MBSIs to study their effectiveness using more rigorous designs (e.g., RCTs with active control groups, multi-method outcome assessment, and follow-up evaluation), to minimize bias and promote higher quality (not just increased quantity) evidence that can be relied upon to guide school-based practice.

Many preschool, elementary, and high school students experience problems related to anger, anxiety, depression, and low self-esteem (Barnes et al., 2003; Fisher, 2006; Langer et al., 2015; Mendelson et al., 2010; Rempel, 2012) that negatively influence their academic and social development (Leigh & Clark, 2018; Maughan et al., 2013; Murphy et al., 2015) and have lasting effects on their well-being (Steger & Kashdan, 2009). Schools can play a pivotal role in promoting students' mental health and their social, emotional, and behavioral development (Barnes et al., 2003; Fisher, 2006; Mendelson et al., 2010). To address these challenges, many schools have adopted mindfulness-based interventions (MBIs). Studies conducted over the past 15 years have examined the impact of MBIs on mental health, educational performance, and related outcomes in children and adolescents (Kallapiran et al., 2015; Meiklejohn et al., 2012).
Mindfulness is the process by which we "pay attention in a particular way: on purpose, in the present moment and nonjudgmentally" (Baer, 2003; Roeser, 2014). Originally adapted for adults, practicing mindfulness typically includes meditation exercises and bringing mindful awareness to daily activities, such as eating and walking. These practices are intended to foster purposeful focused attention, coupled with a nonjudgmental attitude toward moment-to-moment experience (Kabat-Zinn, 2003). Mindfulness-based interventions target many aspects of well-being, resiliency, and mental health by cultivating a present-centered awareness and acceptance (Fjorback et al., 2011; Gawrysiak et al., 2018; Greeson, 2009; Khoury et al., 2013; Roeser, 2014). In particular, emotion regulation has been the focus of much MBI research (Guendelman et al., 2017; Wisner, 2014). Individuals who have difficulty with emotion regulation have problems processing, experiencing, expressing, and managing emotions effectively (Chambers et al., 2009). Furthermore, the nonjudgmental awareness in mindfulness may facilitate a healthy engagement with emotions, allowing individuals to experience and express their emotions without under-engagement (e.g., experiential avoidance and thought suppression) or over-engagement (e.g., worry and rumination; Hayes & Feldman, 2004; Ivanovski & Malhi, 2007). Specifically, research indicates that MBI with adults can increase awareness of moment-to-moment experience and promote reflection, empathy, and caring for others (Hölzel et al., 2011). Mindfulness training with adults can also improve stress regulation, resilience, anxiety, and depression (Forkmann et al., 2014; Hofmann et al., 2010; Irving et al., 2009; Klatt et al., 2015; Li & Bressington, 2019; Marcus et al., 2003; Morton et al., 2020; Tang et al., 2007).
The practices incorporated in MBSIs include psychoeducation about emotions and mindfulness, as well as specific mindfulness exercises, including awareness of breath, mindful body scans, and awareness of thoughts, feelings, and sensations. MBSIs are often delivered in the context of whole class instruction (general population of students) or targeted intervention (at-risk or clinical populations; Kuyken et al., 2013; Napoli et al., 2005; Raes et al., 2014). In addition, MBSIs are offered in a variety of formats (i.e., delivered by the research team or teacher, as multi-session programs or brief single-session workshops, with a variety of activities and exercises included), which previous reviews have shown to impact the effectiveness of MBSIs (Bender et al., 2018; Carsley et al., 2018; Schonert-Reichl & Roeser, 2016; Semple et al., 2017).
Mindfulness practices targeting school-aged populations include developmentally appropriate adaptations for children and adolescents (Bostic et al., 2015; Carsley et al., 2018). For example, practice sessions are shorter, activities incorporate multiple sensory modalities and rely on simplified metaphors to communicate difficult concepts, and more time is given to explaining key concepts (Burke, 2010; Felver et al., 2013). Most MBSIs tested in schools are designed to increase resilience to stress and decrease depression and anxiety symptoms (Wisner, 2014). Early studies showed promising results in decreasing anxiety, fatigue, depressive symptoms, stress-related issues, and disorders for various conditions (Bei et al., 2013; Fjorback et al., 2011; Grossman et al., 2004; Piet & Hougaard, 2011; Piet et al., 2012). Furthermore, mindfulness training for youth has been shown to be efficacious for some neurocognitive, psychosocial, and psychobiological outcomes, and MBIs have been shown to be feasible and acceptable for youth in schools (Black, 2015). Although studies have examined outcomes of MBIs, few reviews have focused solely on school-based interventions. Additionally, it is important to examine which outcomes show promising results together with outcomes that are not improved through MBSIs. Previous reviews and meta-analyses examined the quantity and strength of the evidence but did not weigh this by the quality of the evidence according to research design. Thus, the present study addresses this gap in the literature by providing a systematic review that examines MBSIs on youth outcomes by quality of study design using evidence-based guidelines, which is key to advancing the field of MBSIs. Prior to turning to the present study, we first consider what is known from previous reviews of MBI with youth and in schools.
These reviews indicate the need for future studies to examine the effects of MBI with youth and in schools on symptoms of psychopathology, to include more active controls as the comparison group to allow future meta-analyses to compare the effects of the intervention, to examine moderators that may influence program effectiveness (e.g., length of program), and to investigate the additional benefit of incorporating mindfulness practices with other evidence-based practices.
Considering the findings from the previous meta-analyses and systematic reviews, there seems to be a clear pattern of evidence suggesting that MBIs are, on the whole, safe and effective for use with youth (generally) as well as in schools (specifically) for improving a host of valued outcomes. Although most of the outcomes in most reviews showed small to moderate positive effects, it is noteworthy that some reviews yielded null effects for some outcomes. For example, Maynard et al. (2017) found no effect for behavioral and academic outcomes; similarly, Zenner et al. (2014) found no effect for emotional problems. Therefore, further examination is needed on the consistency of positive outcomes from MBSIs. That said, it is also important to note that none of the previous reviews indicated harmful or iatrogenic effects.
Finally, previous reviews have not focused on grading the quality of evidence but instead produced the average effect sizes. Given that several reviews collapsed all the studies together, the evidential quality is mixed, which makes it challenging to know how strong the quality of evidence is that supports the outcomes (Bender et al., 2018; Black, 2015; Carsley et al., 2018; Klingbeil, Fischer, et al., 2017; Klingbeil, Renshaw, et al., 2017; Maynard et al., 2017; Semple et al., 2017; Zenner et al., 2014; Zoogman et al., 2015). By contrast, one review that only examined RCTs produced much higher quality evidence (Kallapiran et al., 2015). Since these reviews either collapsed all studies together or looked at RCTs only, none of the reviews systematically considered the quality of evidence both across study designs and within RCTs.
To address the growing interest in MBSIs and to inform those choosing programs, we systematically reviewed published studies of MBSIs for youth in schools (cf. Felver et al., 2016; Zenner et al., 2014). Unlike prior systematic reviews and meta-analyses, our review sought to examine the quality of outcome evidence by research design, as well as the quantity of evidence across studies. Specifically, the first objective was to determine the quality of the evidence across diverse outcomes including well-being, self-compassion, social functioning, mental health, self-regulation and emotionality, mindful awareness, attentional focus, psychological and physiological stress, problem behaviors, academic performance, and acceptability. The second objective was to investigate the quantity of the evidence across studies. Finally, quality and quantity were examined together across studies to determine which outcomes are most robustly associated with MBSIs. We anticipated that findings from our systematic review would contribute to the literature by providing evidence-based recommendations to clinicians, educators, and school-based researchers on which specific outcomes can be reliably targeted with MBSIs.

Methods
We identified studies through a systematic search of published articles of MBSIs with youth from the first available date until July 2021. The electronic databases searched were PsycINFO, EBSCOHost, MEDLINE, and CINAHL using terms related to MBSIs: (school-based mindfulness interventions subt.exact (("mindfulness" OR "mindfulness-based interventions" AND "students" OR "preschool students" OR "elementary school students" OR "high school students" OR "adolescent" OR "schools" OR "adolescent development" OR "curriculum" OR "teachers" OR "educational programs" OR "middle school students" OR "elementary school teachers" OR "public school education") NOT ("middle aged" OR "yoga" OR "college students" OR "young adult" OR "occupational stress" OR "parents" OR "chronic pain" OR "drug abuse" OR "neoplasms" OR "parenting" OR "substance-related disorders" OR "relapse prevention" OR "no terms assigned" OR "psychotherapy" OR "test construction" OR "health care services" OR "medical students" OR "mobile phones" OR "adult" OR "pregnancy")) NOT su.exact ("Thirties (30-39 yrs)" OR "Middle Age (40-64 yrs)" OR "Aged (65 yrs & older)" OR "Very Old (85 yrs & older)") NOT po.exact ("Outpatient" OR "Inpatient" OR "Animal") AND PEER(yes) AND la.exact ("English") NOT rtype.exact ("Comment/Reply" OR "Editorial" OR "Erratum/Correction" OR "Review-Book" OR "Letter")). We found 352 articles through this initial search prior to eligibility coding (see Fig. 1 for the study selection process). In defining MBSIs, we selected only intervention studies that applied mindfulness meditation including dialectical behavior therapy (Linehan, 1993) and acceptance and commitment therapy (Strosahl & Wilson, 1999) as intervention frameworks since they both focus on acceptance and mindfulness.

Eligibility Ratings
Two coders assessed the eligibility of each journal article for inclusion based on the following criteria: (1) peer-reviewed journal article; (2) mindfulness-based school intervention, program, or strategies; (3) mindfulness outcome on teachers or children and/or implementation outcomes; (4) review paper on school-based mindfulness interventions; and (5) grade levels from kindergarten to 12th grade. Exclusion criteria included the following: (1) studies focusing only on yoga, creativity, or other approaches not specific to mindfulness; (2) parent-based training on mindfulness; (3) clinic-based mindfulness interventions; (4) student age group ≥ 22 years (as students with disabilities in the USA can stay at school until they are 21 years old). Raters reached high inter-rater reliability (k = 0.98) in determining article eligibility. When raters disagreed, they discussed eligibility to reach a consensus.
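As an illustration of how an agreement statistic of this kind can be computed, the following is a minimal sketch that assumes the reported k is Cohen's kappa for two raters making categorical include/exclude decisions; the source does not name the statistic. The function name `cohens_kappa` and the toy decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy decisions for six articles (not data from the review).
rater_1 = ["include", "include", "exclude", "exclude", "include", "exclude"]
rater_2 = ["include", "include", "exclude", "include", "include", "exclude"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy case the raters agree on five of six articles, chance agreement is 0.5, and kappa is therefore two thirds, which shows how kappa discounts raw agreement by the agreement expected by chance.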

Extracted Data from Studies
The following information was extracted from each study: (1) country, (2) sample characteristics (sample size, mean age [or age range if mean was not provided], percentage of males and females, ethnicity, socioeconomic status, whether children were of a special needs population), (3) information on the school level (preschool, elementary, middle, or high school), classroom setting (general education, special education, or alternative school; private or public), (4) type of intervention, (5) research design (quantitative, qualitative, or mixed), (6) evaluation design (e.g., RCT, pre-post), (7) the mediator (i.e., person who conducted the intervention), (8) the findings on outcomes (outcome measures), (9) outcome measure type (self-report, teacher-report, etc.), (10) control group, and (11) whether teacher training was provided. We believe it is important to consider the research and evaluation design of studies given the impact of methodological variations on the results. Furthermore, it is also essential to examine whether teacher training was provided since research shows that there are significant effects at follow-up

Evidence Ratings
We used a robust system for grading recommendations in evidence-based guidelines (Harbour & Miller, 2001) to weigh evidence per study design in a two-step process.
Using PRISMA 2020 as a guideline for our systematic review, we used the Harbour and Miller (2001) ratings to examine the level of evidence since PRISMA 2020 recommends assessing certainty in the body of evidence of an outcome (item #15 in the PRISMA checklist) and to present assessments of certainty in the body of evidence for each outcome assessed (item #22 in the PRISMA checklist). We are not using the Harbour and Miller guidelines in replacement of the PRISMA 2020 guidelines, but rather to grade evidence per study design in order to adhere to items #15 and #22 in the checklist. As such, we graded evidence based on the methodological rigor of studies to draw conclusions about the state of the science of MBSIs, and to make informed recommendations to advance the field. First, for all eligible articles, two authors independently assigned a numerical rating regarding the level of evidence for each article on a scale outlined by Harbour and Miller (2001), ranging from 1++ (RCTs with a very low risk of bias), 1+ (RCTs with a low risk of bias), 1− (RCTs with a high risk of bias), 2++ (high-quality case-control or cohort studies with a very low risk of confounds, bias, or chance, and a high probability that the relationship is causal), 2+ (well-conducted case-control or cohort studies with a low risk of confounds, bias, or chance and a moderate probability that the relationship is causal), 2− (case-control or cohort studies with a high risk of confounds, bias, or chance and a significant risk that the relationship is not causal), 3 (non-analytic studies, e.g., case reports, case series) to 4 (expert opinion).
We further specified criteria relating to risk of bias; for example, studies rated as 1++ were RCTs that include at least three of the following criteria: competence/fidelity measurement, daily program implementer meetings, high participant attendance rate of 90% or higher, experienced program implementer, large sample size, 8 week or longer sessions, conducted follow-ups post-intervention. See Table 1 for the full grading system of recommendations in evidence-based guidelines. Using the breakdown mentioned above, ratings of studies included in this review ranged from 1++, 1+, 1−, 2++, 2+, 2−, 3 to 4, with high inter-rater reliability (k = 0.91). Raters discussed the six discrepant articles that they initially rated differently until they reached a consensus on the ratings. Second, after determining the level of evidence for each article, a lettered grading system was applied based on a summary of the numbered ratings across studies: A (at least one RCT rated as 1++ and directly applicable to the target population, or a body of evidence consisting principally of studies rated as 1+ directly applicable to the target population and demonstrating overall consistency of results), B (a body of evidence including studies rated as 2++ directly applicable to the target population and demonstrating overall consistency of results), C (a body of evidence including studies rated as 2+ directly applicable to the target population and demonstrating overall consistency of results), and D (a body of evidence including studies rated as 3 or 4).
See Table 1 for the full grading system of recommendations in evidence-based guidelines with further specificity per evidence rating level. There was often variability in the numbered study ratings across outcome measures. The ultimate letter grade was determined by the inclusion of the number and number rating for high-quality studies (1++ or 1+), as described above. For example, for an outcome documented in two studies rated 1+ and 3, the letter grade would be Grade B as there was only one 1+ rated study (if there was a 1++ rated study or a body of 1+ rated studies, the letter grade would be Grade A).

Table 1
Grading system for recommendations in evidence-based guidelines based on Harbour and Miller (2001)

Levels of evidence
• 1++: RCTs with a very low risk of bias, competence/fidelity measured, program implementers meet regularly to prevent drift, facilitator/teacher blind to study condition, participant attendance rate 90% or higher, program implementer has 3+ years of mindfulness training, large sample size (> 100), 8-week or longer, 10 session course, follow-ups on studies that are 12 months or longer
• 1+: RCTs with a low risk of bias, facilitator/teacher blind to study condition, participant attendance rate 80% or higher, medium sample size (40-100), 6-7 week or 8-9 session course
• 1−: RCTs with a high risk of bias, small sample size (< 40), self-reported data, facilitator/teacher not blind to study condition, competence/fidelity not formally measured, single study site (less generalizable), high percentage of female vs. male (or vice versa), < 6 week or < 8 session, implementation of program was shorter than intended
• 2++: High-quality case-control or cohort studies with a very low risk of confounds, bias, or chance and a high probability that the relationship is causal, competence/fidelity measured, program implementers meet regularly to prevent drift, facilitator/teacher blind to study condition, participant attendance rate 90% or higher, program implementer has 3+ years of mindfulness training, large sample size (> 100), 8-week or longer, 10 session course, follow-ups on studies that are 12 months or longer, has a control group
• 2+: Well-conducted case-control or cohort studies with a low risk of confounds, bias, or chance and a moderate probability that the relationship is causal, facilitator/teacher blind to study condition, participant attendance rate 80% or higher, medium sample size (40-100), 6-7 week or 8-9 session course
• 2−: Case-control or cohort studies with a high risk of confounds, bias, or chance and a significant risk that the relationship is not causal, small sample size (< 40), self-reported data, facilitator/teacher not blind to study condition, competence/fidelity not formally measured, single study site (less generalizable), missing data, high percentage of female vs. male (or vice versa), < 6 week or < 8 session, lack of control group, implementation of program was shorter than intended
• 3: Non-analytic studies, e.g., case reports, case series
• 4: Expert opinion

Grades of recommendations
• A: At least one RCT rated as 1++ and directly applicable to the target population, or a body of evidence consisting principally of studies rated as 1+ directly applicable to the target population and demonstrating overall consistency of results
• B: A body of evidence including studies rated as 2++ directly applicable to the target population and demonstrating overall consistency of results, or extrapolated evidence from studies rated as 1++ or 1+
• C: A body of evidence including studies rated as 2+ directly applicable to the target population and demonstrating overall consistency of results, or extrapolated evidence from studies rated as 2++
• D: Evidence level 3 or 4 or extrapolated evidence from studies rated as 2+
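The letter-grade step described above can be sketched as a small decision rule. This is an illustrative simplification, not the authors' coding procedure. It assumes that a "body" of 1+ studies means two or more such studies, it does not model applicability, consistency of results, or extrapolation, and it assigns D to outcomes supported only by 1−, 2−, 3, or 4 rated studies, a case the text does not spell out. The function name `letter_grade` is hypothetical, and ratings use an ASCII hyphen for the minus sign.

```python
VALID_RATINGS = {"1++", "1+", "1-", "2++", "2+", "2-", "3", "4"}


def letter_grade(ratings):
    """Map the numbered ratings of the studies behind one outcome to a letter grade."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("At least one study rating is required.")
    unknown = set(ratings) - VALID_RATINGS
    if unknown:
        raise ValueError(f"Unknown ratings: {sorted(unknown)}")

    n_one_plus = ratings.count("1+")
    # Grade A: at least one 1++ RCT, or a body of 1+ studies
    # (assumption: "body" is operationalized as two or more).
    if "1++" in ratings or n_one_plus >= 2:
        return "A"
    # Grade B: a single 1+ study (as in the text's worked example) or any 2++ study.
    if n_one_plus == 1 or "2++" in ratings:
        return "B"
    # Grade C: evidence including 2+ studies.
    if "2+" in ratings:
        return "C"
    # Grade D: level 3 or 4; assumption: 1- and 2- alone also fall here.
    return "D"


# The worked example from the text: one 1+ study and one level 3 study.
example_grade = letter_grade(["1+", "3"])
```

Under these assumptions the worked example resolves to Grade B, and adding either a 1++ study or a second 1+ study lifts the outcome to Grade A.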

Study Characteristics
We identified 77 eligible articles, which incorporated data from 12,358 students across 5 continents (North America, South America, Europe, Asia, and Australasia). The breakdown of articles by methods was as follows: 9 qualitative, 49 quantitative, and 19 mixed methods. For the control group type, there were 28 active control groups, 21 passive control groups, and 28 without a control group. There were 35 elementary schools, 8 middle schools, 25 high schools, 1 preschool, 5 mixes of elementary and middle schools, and 3 mixes of middle and high schools. Given that all studies took place in a school setting, the data from this review are community-based instead of clinically based.
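As a quick consistency check, the breakdowns reported above can be summed against the 77 eligible articles. The sketch below assumes that each count refers to articles (including the school-level counts) and that the categories within each breakdown are mutually exclusive; the variable names are hypothetical.

```python
TOTAL_ARTICLES = 77

# Counts copied from the Study Characteristics paragraph above.
breakdowns = {
    "research method": {"qualitative": 9, "quantitative": 49, "mixed methods": 19},
    "control group": {"active": 28, "passive": 21, "none": 28},
    "school level": {
        "elementary": 35,
        "middle": 8,
        "high": 25,
        "preschool": 1,
        "elementary and middle": 5,
        "middle and high": 3,
    },
}

# Sum each breakdown and keep only those that do not add up to the total.
totals = {name: sum(counts.values()) for name, counts in breakdowns.items()}
mismatches = {name: total for name, total in totals.items() if total != TOTAL_ARTICLES}
```

All three breakdowns sum to 77, so `mismatches` is empty.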
Forty-three percent of schools did not report on setting (e.g., public, private), but across those that did, 22% were private, 55% public, 5% alternative schools, 2% specialized school, and 16% a combination of schools. Fifty-two percent of children were female. Forty percent of studies did not report race/ethnicity; 44% reported diverse samples and 16% reported homogeneous samples. Likewise, most studies did not include socioeconomic status (62%).
Regarding the person that mediated the treatment delivery, 3% did not report on the mediator, and of the studies that did report on the mediator, 40% were researchers, 28% teachers, 19% trained instructors, 7% mix of researcher and teacher/mindfulness instructor, 4% mindfulness instructors, and 3% counselors. In terms of teacher training on mindfulness interventions, only 31% reported teacher training. Furthermore, 50% reported using self-report as their outcome measure, 17% used both teacher report and self-report, 11% used a cognitive test with teacher or self-report, 8% used only teacher report, 8% used two or more measures, and 7% used other forms of outcome measure (i.e., computer tasks, cognitive tests, observation). See Online Resource 1 and Online Resource 2 for participant demographics, design, and methods for each of the 77 included studies.

Summary of the Highest Quality Evidence Across Outcomes
In this systematic review of the quality of the existing scientific literature base of MBSIs (see the "Methods" section, "Evidence Ratings"), the strongest level of evidence ("A Grade") across outcomes indicated that MBSIs increased prosocial behavior, resilience, executive function, attention, and mindfulness, and decreased anxiety, attention problems/ADHD behaviors, and conduct behaviors. Evidence for well-being was split, with some studies showing increased well-being and some showing no improvements. As described in the "Methods" section, "A Grade" evidence comes from at least one RCT rated as 1++ and directly applicable to the target population, or a body of evidence consisting principally of studies rated as 1+ directly applicable to the target population and demonstrating overall consistency of results. See Table 1 for a description of each level of evidence, Table 2 for the outcomes per study, Fig. 2 for the breakdown of studies for each outcome by quality, and Online Resource 3 for the numbered list of included studies from Table 2.
Below we summarize the results per outcome type, highlighting "A Grade" and "B Grade" evidence, and noting any differences that were apparent between the overall summary of results from pre- to post-treatment incorporating all studies and when examining studies per research design (quantitative, qualitative, and mixed), evaluation design (RCT, pre-post, single case/series, etc.), or per control group type (active, passive, none). For a full breakdown of outcomes by these study characteristics and individual study evidence ratings, see Online Resource 4.

Well-being
Ten of the 77 eligible articles (13%) targeted well-being domain outcomes. Results were mixed regarding well-being outcomes, with 50% of studies showing improved well-being, and the rest showing no difference (42%) or lower well-being (8%). Both the positive and the null findings from studies specifically studying well-being came from "A Grade" evidence. No differences were apparent when examining results per research design, evaluation design, or control group type, except that no study with a pre-post design reported a null result for well-being.

Self-compassion
Five of the 77 eligible articles (6%) targeted self-compassion domain outcomes. All studies that examined self-compassion showed improvement, across research designs, evaluation designs, and control group types. There was no "A Grade" evidence, and the strongest evidence ("B Grade") documented higher school self-concept.

Social Functioning
Fifteen of 77 eligible articles (19%) targeted social functioning domain outcomes. Most studies (86%) that examined social functioning found that MBSIs improved social relationships and social participation as well as reduced social bias, and those that found no improvements were of low evidence quality ("C and D Grades"). The highest quality of evidence documented ("A Grade") was for improvements in prosocial behavior, followed by "B Grade" evidence showing improvements in empathy and social competence, and reduced prejudice towards outgroups. No differences were apparent when examining results per research design, evaluation design, or control group type, except no pre-post or passive design studies reported null improvements in social functioning.

Mental Health
Nineteen of 77 eligible articles (25%) targeted mental health domain outcomes. Most studies reported reduced depression and anxiety symptoms (71% and 80%, respectively). However, higher quality evidence ("A Grade") shows no decrease in depression symptoms (compared to "B Grade" evidence that does show a decrease in depression symptoms). By contrast, studies showing no decrease in anxiety were of lower quality evidence ("C Grade") compared to evidence showing a decrease in generalized anxiety disorder, worry, and panic disorder ("A Grade"), or anxiety symptoms ("B Grade"). The one study examining suicidality and the one study examining trauma each found reduced symptoms. Only one of the three studies examining eating disorder symptoms reported a reduction in symptoms. No differences were apparent when examining results per research design, evaluation design, or control group type, except no pre-post design studies reported null improvements in mental health.

Self-regulation and Emotionality
Thirty-one of 77 eligible articles (40%) targeted self-regulation and emotionality domain outcomes. Most studies (97%) in this category reported improved self-regulation and emotionality across research designs, evaluation designs, and control group types, except for one study of "C Grade" evidence that found no change in negative affect. No differences were apparent when examining positive vs. null improvement studies in terms of research design, evaluation design, or control group type. For the self-regulation category, the highest quality evidence ("A Grade") documented improvements in resilience and executive function, followed by "B Grade" evidence showing improvements in self- and emotion regulation, coping skills, and cognitive control, as well as more frequent relaxed states at school. For the emotionality category, the highest quality studies ("B Grade") documented higher positive moods and lower negative feelings.

Fig. 2
Breakdown of studies for each outcome by quality. Note: Acceptability outcomes were not included in the breakdown as few studies examined this outcome.

Mindful Awareness
Eleven of 77 eligible articles (14%) targeted mindful awareness domain outcomes. All studies documented improved perspective-taking and having a positive outlook, and most (73%) documented improvements in mindfulness; however, evidence showing no improvements in mindfulness was of a lower quality ("C Grade"). No differences were apparent between positive and null improvement studies when examining results per research design, evaluation design, or control group type. The strongest evidence ("A Grade") showed improvements in mindfulness, followed by "B Grade" evidence showing increased awareness of thoughts, feelings, emotions, and bodily sensations, being more present in life as well as decreased mind-wandering.

Attentional Focus
Twenty of 77 eligible articles (26%) targeted attentional focus domain outcomes. Most studies (95%) showed improvements in attention and reduced impulsivity across research designs, evaluation designs, and control group types, except one study finding no effects in task-shifted facilitation; however, evidence showing no improvements was of a lower quality ("C Grade"). The highest quality evidence ("A Grade") found increased attention, and decreased attention problems and ADHD behaviors, followed by "B Grade" evidence showing increased concentration, and decreased distractibility and impulsivity.

Psychological and Physiological Stress
Fifteen of 77 eligible articles (19%) targeted psychological and physiological stress domain outcomes. Overall, most studies (73%) showed that MBSIs decreased psychological and physiological stress. Specifically for psychological stress, eight studies showed a reduction in stress ("B Grade" evidence), one study (7%) showed a null effect on stress ("C Grade" evidence), and two studies (13%) showed an increase in psychological stress ("D Grade" evidence). Specifically for physiological stress, four studies showed a reduction in stress ("B-D Grades" evidence) and one study showed an increase in stress ("B Grade" evidence). There was no "A Grade" evidence for this domain, and regarding research designs, evaluation designs, and control group types, no studies with active control groups found null/negative effects on psychological stress.

Problem Behaviors
Nine of 77 eligible articles (12%) targeted problem behavior domain outcomes. All studies reported a reduction in problem behaviors across research designs, evaluation designs, and control group types, including reduced aggression, disruptive behaviors, conduct behavior, and externalizing problems. The highest quality evidence ("A Grade") showed a decrease in conduct behavior, followed by "B Grade" evidence showing a decrease in aggression.

Academic Performance
Sixteen of 77 eligible articles (21%) targeted academic performance domain outcomes. In most studies (94%) across research designs, evaluation designs, and control group types, MBSIs improved academic performance. One study found null improvements in reading fluency, so this was characterized as "D Grade" evidence. There was no "A Grade" evidence for this domain. The strongest evidence ("B Grade") documented specific improvements in academic performance, auditory-verbal memory, GPA, math performance, math score, and social studies score, as well as an increase in positive attitudes towards academic subjects and lower test anxiety.

Acceptability
Only four of 77 eligible articles (5%) examined the acceptability of MBSIs, with all finding that they were highly acceptable; however, this evidence was of "C and D Grades." There was no "A or B Grade" evidence reported for this domain.
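As an arithmetic check, the domain shares reported above can be recomputed from the stated counts. The following is a minimal sketch in Python, not part of the review's own analysis: the counts are copied from the text, while the names `domain_counts` and `share_of_total` are illustrative assumptions. Because a single study can target several domains, the shares are not expected to sum to 100%.

```python
# Counts of eligible articles per outcome domain, as reported in the text.
# The dictionary and function names are hypothetical, chosen for illustration.
TOTAL_STUDIES = 77

domain_counts = {
    "Mindful awareness": 11,
    "Attentional focus": 20,
    "Psychological and physiological stress": 15,
    "Problem behaviors": 9,
    "Academic performance": 16,
    "Acceptability": 4,
}


def share_of_total(count, total=TOTAL_STUDIES):
    """Return the share of eligible articles as a whole-number percentage."""
    return round(100 * count / total)


# Recompute the percentage reported alongside each domain count.
shares = {domain: share_of_total(n) for domain, n in domain_counts.items()}

for domain, pct in shares.items():
    print(f"{domain}: {domain_counts[domain]} of {TOTAL_STUDIES} ({pct}%)")
```

Each recomputed share matches the percentage given in the corresponding domain paragraph.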

Discussion
Our findings on the highest quality of evidence on MBSIs ("A Grade") are consistent with previous studies on adults which have documented increased prosocial behavior, resilience, executive function, attention, and mindfulness, and decreased anxiety, attention problems/ADHD behaviors, and conduct behaviors (e.g., Goldberg et al., 2021;Guendelman et al., 2017;Hofmann et al., 2010;Hoge et al., 2013;Kemeny et al., 2012;Ramasubramanian, 2017;Rogers, 2013). In addition, these results are in line with recent studies where MBIs have demonstrated therapeutic effects targeting these mental health outcomes with youth in both clinical and school settings (Borquist-Conlon et al., 2019;Dunning et al., 2019). Unlike in previous reviews, examining the evidence grade per outcome measure makes it evident that there is a true split in evidence on well-being outcomes, with some high-quality evidence showing increased well-being and other high-quality evidence showing no improvements (both "A Grade" evidence). When considering the studies rated as 1++ (the highest evidence level), the positive effect study included middle school students from private schools and the null effect study included elementary school students from public schools; therefore, the difference in outcomes may relate to resources or student age groups. Further research is needed to elucidate this issue. Moreover, our re-examination of the evidence per evidence grades has highlighted that MBSIs have a null effect on depression symptoms (as per "A Grade" evidence).
Findings on well-being and depression are in contrast with prior reviews examining adults, where there are many well-designed RCTs examining the efficacy of mindfulness relative to control groups. These RCTs have shown that the intervention is effective in reducing depression and improving well-being (Goldberg et al., 2021;Hofmann & Gómez, 2017;Strauss et al., 2014). Previous reviews have also shown that MBSIs positively affect well-being and depression among youth (Chi et al., 2018;Erbe & Lohrmann, 2015). Our findings are also inconsistent with previous meta-analyses with adults (Khoury et al., 2015) and youth (Dunning et al., 2019;McKeering & Hwang, 2019), which suggested that mindfulness practice improves well-being.
The next tier of evidence (B grade) supported the role of MBSIs in improving self-concept, social competence, self- and emotion regulation, coping, executive function, cognitive control, and mood, as well as reducing social bias and attentional problems. Our review accords with previous studies (Joss et al., 2019;Nejati et al., 2015;Quaglia et al., 2019) and a recent narrative review (Renshaw & Cook, 2017) of MBSIs, which strengthens the evidence that MBSIs improve these outcomes for youth (Barnes et al., 2003;Flook et al., 2010;Mendelson et al., 2010). With improved self-concept and social competence, students can pay attention without judgment to what is happening with themselves and with others (Schonert-Reichl et al., 2015). This can allow them to become resilient and to confront the challenges they will face in classroom settings, such as exam stress, problems concentrating, and dealing with difficult peers (Keye & Pidgeon, 2013). As a result of mindful practice, students may be better able to increase overall self-care by making constructive changes in their personal and professional lives, allowing for a healthier relationship with themselves and with others (Napoli & Bonifas, 2011).
Strong (B grade) evidence also showed that MBSIs improved mindfulness, awareness of thoughts, feelings, emotions, and bodily sensations, being more present in life, concentration, and attention, as well as reduced mind-wandering, distractibility, and impulsivity. Our findings on these outcomes are in line with increasing evidence on the benefits of mindfulness for adults (Norris et al., 2018;Rahl et al., 2017;Shapero et al., 2018) and youth (Dunning et al., 2019;Renshaw, 2020). Although there is strong (B grade) evidence showing improved attention and reduced mind-wandering, there is still insufficient evidence as to how much mindfulness practice is needed to benefit students' attention regulation (Wimmer et al., 2020). Therefore, future studies should focus on the dosage (whether the length of intervention time, number of sessions, or total mindfulness practice time) needed for students to achieve improved attention regulation.
Strong (B grade) evidence also showed that MBSIs improved academic performance, specifically, report card grades, auditory-verbal memory, GPA, math, and social studies performance. Several studies have shown that MBSIs improve academic performance with children (Lu et al., 2017;Thierry et al., 2016), although one review found that MBSIs did not improve academic achievement (Maynard et al., 2017). Given the mixed results, the methodological differences in the quality of reviews compared to studies should be considered before determining whether MBSIs improve academic performance with children. It is noteworthy that gender differences in response to mindfulness may also play an important role in youth academic performance. For example, a preliminary analysis indicated a greater increase in both mindfulness and self-compassion for females compared to males (Bluth et al., 2017). Likewise, in terms of academics, girls tend to achieve higher grades than boys (Duckworth & Seligman, 2006;Duckworth et al., 2015). Therefore, examining potential gender effects is especially important given the prevalence of gender differences in affective disturbances and treatment outcomes among youth (Kang et al., 2018). Future studies are needed to further explore these factors when looking at gender and academic performance to refine and enhance existing programs and to inform future development of MBSIs.
Nonetheless, a smaller group of studies suggested positive changes (B grade) in physiology, neurophysiology, and brain plasticity. MBSIs have been shown to influence physiological changes in adults, although relatively fewer studies examine this connection compared to other behavioral and mental health outcomes (Creswell et al., 2019). Given our knowledge of brain plasticity in early development, future research in this area with children is especially important (Black, 2015;Burke, 2010;Zoogman et al., 2015). Considering the potential neurophysiological processes of mindfulness, future studies should also explore the relationships among length and quality of mindfulness practice, developmental stages of students, and their mental health outcomes (Wielgosz et al., 2019). These factors may benefit MBIs in schools by improving memory and language skills (i.e., reading), which can increase academic success (Mundkur, 2005).
Overall, there were no systematic differences between positive vs. null/negative effect studies in terms of research design (quantitative, qualitative, and mixed), evaluation design (RCT, pre-post, single case/series, etc.), and per control group type (active, passive, none), suggesting overall consistency in terms of these factors in the body of literature to date on MBSIs. However, there were outcomes in need of higher quality evidence, including self-compassion, psychological and physiological stress, academic performance, and acceptability.

Limitations and Future Research
There are several notable strengths in the literature on MBSIs used in schools. All studies reported on group-based interventions conducted in typical classrooms during normal school hours, suggesting the generalizability of the results to school-based practice. Another strength is that many studies in this review used components of MBSR, the mindfulness-based intervention with the most empirical support for its effectiveness (Kabat-Zinn, 2003;Klingbeil, Fischer, et al., 2017;Klingbeil, Renshaw, et al., 2017;Kriakous et al., 2020). Finally, several studies included data on student educational, attentional, and behavioral outcomes, such as student achievement, ability to focus, and grades. However, additional studies and meta-analyses are needed to explore the evidence of the effectiveness of MBSIs on these educational outcomes, which may be relevant to educators and other school-based stakeholders.
Nevertheless, the literature exploring the effects of MBSIs with youth has several limitations. Many studies included in this review relied on small samples, with studies averaging around 35 participants. Future studies may benefit from larger sample sizes to power statistical analyses adequately and to aid in the generalizability of the findings. There also are significant limitations in how outcomes were measured. Most studies relied on questionnaire measures to assess for effects (particularly student self-report), which are limited by possible response bias and retrospective memory biases. Although some included studies used multiple methods (e.g., subjective self-reports, behavioral observations, and objective neurocognitive and physiological testing), the majority relied on a single method. To address these limitations, we recommend that future MBSI studies collect data regarding the training quality of the instructors and the amount of meditation conducted during training, and that they use substantially larger and more diverse samples of students to examine both the immediate and long-term impact of mindfulness training post-treatment.
A third limitation of studies included in this review was the lack of reporting of participant characteristics. For example, 40% of studies in this review did not provide details about participant race and ethnicity, which is important given the underrepresentation of racial and ethnic populations in rigorous trials of MBIs (Waldron et al., 2018). Very few studies included students receiving education supports, and only five studies specifically examined the impact of MBSIs on children with disabilities (see Online Resource 1 for more details). Given that most of these studies were conducted through whole class instruction, it is possible that existing mindfulness interventions are not well suited to the specific needs and reality of a classroom for children with disabilities. Attention to specific developmental child characteristics (e.g., cognitive ability, attention span) is therefore required when adapting MBSIs.
Few studies, all of lower quality, investigated the impact of MBSIs on problem behaviors such as aggression, disruptive behaviors, conduct behavior, and externalizing problems. More studies of higher quality are needed to better address these problem behaviors in schools since they have been positively associated with teacher burnout and self-efficacy (Brouwers & Tomic, 2000;Burke et al., 1996). This leads to poor student-teacher relationships, which could affect students' learning and achievement (Herman et al., 2018). Although many studies examined the acceptability and feasibility of child adaptations to adult MBIs (Bluth et al., 2016;Broderick & Metz, 2009;Hiltz & Swords, 2021;Luiselli et al., 2017;Metz et al., 2013;Quach et al., 2017), future work on MBSIs should consider scalability and other factors known to impact the implementation of other school-based or youth-focused programs. These include principal and district buy-in, individual attitudes towards the intervention, and organizational climate and culture, as well as implementation climate and leadership (Locke et al., 2016). To facilitate effective implementation and sustainment of MBSIs, studies should use a mixed-methods approach to assess both outcomes and acceptability, adopting methods such as teacher reports on student outcomes, review sessions, observations of training sessions, and student questionnaires and interviews (Zenner et al., 2014).
Finally, despite compelling theory and emerging evidence from adult samples (Gu et al., 2015), no studies examined the mechanisms or active ingredients of mindfulness to understand the key components of MBSIs for producing positive outcomes. Such studies are essential to explore the various active ingredients in mindfulness-based interventions such as social support, relaxation, and cognitive-behavioral elements. Examining the central construct of mindfulness itself is also important to determine if the development of mindfulness is what leads to the positive changes that have been observed (Shapiro et al., 2006). This is important to advance knowledge on how to best develop, adapt, and implement MBSIs to optimize outcomes. Also, no studies examined the long-term impact of MBSIs after 1 year, which would be beneficial in learning about the lasting impact that MBSIs have on youth. Future studies should therefore examine both mediating mechanisms and the long-term impact of school-based mindfulness training post-treatment.
We should note several limitations of our review methodology as well. First, we did not include gray/unpublished literature, which may have resulted in missing some relevant studies. Indeed, there may have been a publication bias in the literature included, in that results of published studies may differ systematically from those of unpublished studies due to either non-submission for publication or rejection at the review stage. Second, we did not evaluate specific mindfulness practices (e.g., sitting meditation, body scan, movement meditations) and program delivery aspects (e.g., level of teacher training). Given that mindfulness training is highly variable across studies, it is important for future research to examine these factors to determine which intervention best fits the needs of youth. We also did not examine program fidelity, which is important to moderate the relationship between the intervention and its outcomes as well as to prevent potentially false conclusions from being drawn about the intervention's effectiveness. Third, our review did not analyze the age appropriateness and pedagogy used for MBSIs, so future studies may benefit from examining these factors. We would also like to acknowledge that comparing public versus private schools and integrating socioeconomic status into the analysis would have strengthened the review. Given that our study did not incorporate these factors into our analysis, we recommend that future studies consider them when examining the quality of MBSIs. Furthermore, our "Results" section focused mainly on the outcomes of the MBSIs without reporting the differences in the effectiveness of MBSIs based on the other data that were extracted from individual studies (e.g., research or evaluation design, teacher training, educational level).
Since our review examined the quality of outcome evidence by research design, as well as quantity and strength of evidence across studies, examining the differences in the effectiveness of MBSIs based on the mentioned constructs is beyond the scope of our study. The descriptive information we coded about the studies was intended to describe the characteristics of the population of studies we reviewed rather than to support moderator and mediator analyses. As such, we suggest that future studies include moderator and mediator analyses when looking at the overall effectiveness of MBSIs and consider these factors in further assessments of outcome quality. Finally, there are limitations to using a systematic review methodology, which could have resulted in the variability of our findings. Various design factors such as the educational level of students, type of intervention, and type of delivery may have impacted the lack of effectiveness observed in this present review. We recommend that future studies conduct a meta-analysis using high-quality evidence, especially for the outcomes with mixed results.
This study reviewed studies of MBSIs for youth using a robust system for grading recommendations that considers the methodological rigor of studies to determine how strongly MBSIs can be recommended for producing certain outcomes. Strong evidence (B grade) indicates that MBSIs improve self-compassion, social relationships, mental health, self-regulation and emotionality, mindful awareness, attentional focus, physiological stress, and academic performance. The strongest evidence (A grade) indicated that MBSIs produce improvements in resilience and reductions in anxiety across youth. In addition, the strongest evidence suggests no reduction in depression symptoms and no consistent increase in well-being across youth receiving MBSIs. Given the difficulties that children and adolescents face in an increasingly demanding world, this review demonstrates the promise of incorporating mindfulness interventions for youth in a school setting. Despite the benefits that MBSIs may have with youth, this area of research is still maturing, with many studies incorporating pre-post design or otherwise less rigorous evaluation methods. Therefore, we urge researchers interested in MBSIs to study their effectiveness using more rigorous designs (e.g., RCTs with active control groups, multi-method outcome assessment, and follow-up evaluation), to minimize bias and promote higher quality (not just increased quantity) evidence that can be relied upon to guide school-based practice.
Acknowledgements This paper would not have been possible without the exceptional support of the lead author's friends and family.
Author Contribution MP: conceptualized the research, reviewed the literature, wrote the paper, submitted the manuscript. TR: collaborated in the writing and editing of the final manuscript. JC: reviewed the literature. JG: collaborated in the writing and editing of the final manuscript. EM: conceptualized the research, reviewed the literature, designed measurement approach. ZAD: reviewed the literature. ND: reviewed the literature. HT: reviewed the literature. DM: designed measurement approach, designed analytic approach, and collaborated in the writing and editing of the final manuscript. HN: developed general research design, conceptualized the research, designed measurement approach, designed analytic approach, conducted data analysis, wrote the "Results" section, and collaborated in the writing and editing of the final manuscript. All authors approved the final version of the manuscript for submission.

Conflict of Interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.