Mindfulness can be defined as “the awareness that emerges through paying attention on purpose, in the present moment, and nonjudgmentally to the unfolding of experience moment by moment” (Kabat-Zinn, 2003, p. 145). Empirical research on school-based mindfulness programmes (SBMPs) has increased steadily over the past 10 to 15 years. Yet, compared with other sectors such as health care, research demonstrating the benefits of mindfulness training in educational settings is still at a nascent stage (Schonert-Reichl, 2023). A narrative review of studies conducted between 2000 and 2019 (Roeser et al., 2023a) reported that SBMPs improve students’ mindfulness and self-regulation skills, reduce their feelings of anxiety and depression, support their physical health, and help them to engage in healthy relationships with others.

Although most of the scientific literature on mindfulness has focused on its psychological and physical effects, interest has recently grown in its interpersonal and social effects (Kreplin et al., 2018). One area in which research is still in its infancy is how mindfulness may affect a person’s empathy and prosocial behaviour. In a meta-analysis, Donald et al. (2019) found mindfulness training to have a small positive effect on prosocial behaviour in adults. A more recent meta-analysis (Malin, 2023) likewise found a small pooled effect of mindfulness interventions on prosocial behaviour. Almost all of the studies in that meta-analysis that tested whether the effect depended on pre-existing traits, such as empathy (Chen & Jordan, 2020; Malin & Gumpel, 2022) and empathic care (Berry et al., 2018), found evidence of such a dependency, suggesting a possible link between prosocial behaviour and empathy. Far fewer studies have examined the effect of mindfulness on prosocial behaviour in children than in adults. Helping, sharing, and cooperating are critical aspects of social competence in childhood that predict diverse academic (Wentzel & McNamara, 1999) and interpersonal (Wentzel et al., 2004) outcomes. Prosocial behaviour, in the sense of intentional, voluntary, and sometimes altruistic helping among children, is part of a positive learning culture in the classroom, and mindfulness exercises may help to build a supportive climate among classmates (Salisch & Voltmer, 2023). Investigating the effects of mindfulness on prosocial behaviour may therefore have important implications for its use in schools.

Of the few studies investigating the effects of mindfulness on prosocial behaviour in children, most involved preschoolers. These studies measured prosocial behaviour through observations or behavioural tasks and found increases in sharing (Berti & Cigala, 2020; Flook et al., 2015; Viglas & Perlman, 2018) and in helping and comforting behaviours (Berti & Cigala, 2020; Viglas & Perlman, 2018). Following mindfulness training, preschoolers were also rated as more prosocial by their teachers (Viglas & Perlman, 2018). All of these studies reported small effect sizes, and none included follow-up measures. One study involving older primary school-aged children found that, following an SBMP (MindUP, 2022), children were rated as more prosocial by peers and teachers, reported greater empathy and mindfulness, and showed increased peer acceptance on a sociometric measure; again, all effect sizes were small (Schonert-Reichl et al., 2015). A more recent study (Salisch & Voltmer, 2023) measured prosocial behaviour in 7–11-year-olds, again following the MindUP programme. Teacher and peer ratings of prosocial behaviour, peer acceptance, and classroom climate measures showed that, following the SBMP, girls in the intervention group were more prosocial than those in the active wait-list control group (effect sizes were not reported). To date, there is a lack of research on the effects of UK-based SBMPs on prosocial behaviour in primary school-aged children, and previous studies have not included direct behavioural tasks to measure prosociality. Evaluating a different SBMP with a range of measures, including a direct behavioural task, would allow a broader and deeper understanding of the effects of SBMPs on prosocial behaviour.

There are a number of proposed mechanisms by which mindfulness might increase prosociality in adults. First, mindfulness might foster prosocial behaviour by increasing individuals’ capacity to sustain and direct attention (Condon, 2017). Greater attentional capacity may increase the likelihood that an individual notices the needs of others and is therefore more likely to respond to them (Brown & Ryan, 2003). Although research has not yet investigated the mechanisms behind possible increases in prosocial behaviour in children, studies have reported increases in children’s attentional capacity following mindfulness training (Li et al., 2019). Second, mindfulness training has been shown to increase activity in the insula (Farb et al., 2007), a brain region involved not only in interoceptive awareness but also in processing others’ emotional experiences. Greater interoceptive awareness may increase individuals’ awareness of the needs of others in the social environment. Third, mindfulness may change an individual’s affective experience, i.e. the positive and negative emotions they experience (Donald et al., 2019). Dispositional mindfulness (i.e. mindfulness as a trait) was found to be associated with more positive emotions, such as love/closeness, joy, gratitude, and interest, and with fewer negative emotions, such as anger, fear, guilt, and stress. These positive and negative emotions were in turn associated with greater and lesser self-reported helping behaviour, respectively (Cameron & Fredrickson, 2015). Finally, as noted above, studies with adults also suggest that mindfulness increases empathic concern, which may explain the observed increases in prosocial behaviour (Schindler & Friese, 2022). In the current study, we investigate this last potential mechanism, examining the link between mindfulness training, empathic concern, and prosocial behaviour in children.

Previous studies have demonstrated that children with higher levels of empathy are generally better able to regulate their emotions, show less aggression, and act more prosocially (Eisenberg et al., 1996; Meuwese et al., 2016). In a recent systematic review, Cheang et al. (2019) examined research on the link between mindfulness, empathy, and compassion in children and adolescents. They found support for mindfulness-based interventions (MBIs) increasing empathy in children and adolescents: ten studies measured empathy, seven of which showed a significant increase following an MBI, and four of the eight studies measuring self-compassion showed a significant increase post-intervention. Most outcome measures assessed behaviours hypothesised to be related to empathy (i.e. prosocial behaviour, social skills, peer relationships, social connectedness, social responsibility, and social and emotional competence; e.g. Joyce et al., 2010; Harpin et al., 2016); however, only one study measured empathy specifically, through self-report (Schonert-Reichl et al., 2015). The authors note that, because of the poor methodological quality of many of the included studies, the results of the review should be interpreted with caution. Of the 16 studies included, 9 were underpowered. Only 6 of the 16 studies were RCTs comparing a mindfulness intervention with some form of control group, and of those 6 RCTs, only 2 involved children (the remainder involved adolescents), indicating a lack of evidence with primary-aged children. Neither of the studies with children collected follow-up data. To our knowledge, no study has measured empathy and prosocial behaviour in primary school-aged children in the United Kingdom (UK) using self-report, other-report, and direct behavioural task measures.
There is a need, therefore, for more RCTs involving primary-aged children, measuring both empathy and prosocial behaviour as separate concepts, using a range of measures (e.g. self-assessment, other-assessment, sociometry, and behaviour tasks). Measures should also be taken at follow-up points to consider both the longer-lasting effects of SBMPs and the possibility of skills taking longer to emerge.

One question to consider is how mindfulness training might lead to increased empathy. Drawing on the literature on developing empathy in children and youth, Cotton (2002) highlights how mindfulness practice may do so. To increase children’s ability to take another’s perspective, it is most fruitful to have them focus first on their own feelings, the different kinds of feelings they have, and which feelings are associated with which kinds of situations (Black & Phillips, 1982). This links closely to mindfulness, as an important aspect of mindfulness training is learning to notice the feelings and emotions being experienced at any given moment. Activities that focus children’s attention on similarities between themselves and another person (or other persons) are effective in increasing affective and cognitive empathy (Hughes et al., 1981), and mindfulness teaching includes learning how to focus one’s attention. The analogy of a torchlight beam narrowing and broadening is used to help children understand how their attention can be focused or unfocused, respectively (MiSP (Mindfulness in Schools Project), 2023). Mindfulness training also allows time for inquiry, in which children share their own experiences of practices, encouraging other children to listen and to notice how experiences can differ from person to person.

The present RCT explored the effects of an SBMP called Paws b, developed by the Mindfulness in Schools Project (MiSP (Mindfulness in Schools Project), 2023), on empathy and prosocial behaviour at post-intervention and at 3-month follow-up. Participants were 7- to 10-year-old primary school children in the UK; this age range is the least researched to date, with most studies involving younger or older children. To reduce confounding, the wait-list control group received a teaching-as-usual (TAU) Personal, Social and Health Education (PSHE) curriculum. To address methodological issues in previous studies, we used multi-informant measures and examined differences between the intervention group and the wait-list control group on multiple outcomes: self-assessed mindfulness; self-assessed empathy; a sharing task administered by a blinded research assistant; teacher-assessed prosocial characteristics; peer-assessed prosocial characteristics; and sociometric measures of peer acceptance. We hypothesised that, compared with pupils in the wait-list control group, Paws b pupils would show positive changes from pre-intervention to post-intervention on all outcome measures. Specifically, we hypothesised that, compared with the wait-list control group, the intervention group would have significantly higher scores at post-intervention, maintained at follow-up, for (1) mindfulness, measured with self-report questionnaires; (2) empathy, measured with self-report questionnaires; and (3) prosocial behaviour, measured with teacher- and peer-report questionnaires, a behavioural task, and sociometric measures.

Method

Participants

Three junior schools in Hampshire, all with more than one class per year group and with comparable socio-economic characteristics, were invited to participate (e.g. the majority of pupils were White British, and the proportion of pupils who spoke English as an additional language was below the national average; Ofsted, 2023). Junior schools in the UK comprise four year groups (Years 3, 4, 5, and 6), with children aged 7 to 11 years. One school declined to participate because of timetabling difficulties. Neither of the two schools that agreed to take part had previously implemented an SBMP in its curriculum. School A offered three year groups (Years 3, 4, and 5); Years 3 and 5 consisted of two classes each and Year 4 of three classes. School B offered one year group, Year 3, which also consisted of three classes. Altogether, 10 classes were enrolled in the study.

Randomisation took place within year groups. In year groups with two classes, one class was randomly assigned to the intervention group and the other to the wait-list control group; in year groups with three classes, two classes were randomly assigned to the intervention group and one to the wait-list control group. This resulted in 6 intervention classes and 4 control classes (n = 273; intervention = 165; control = 108). The intervention group consisted of three Year 3 classes, two Year 4 classes, and one Year 5 class, and the control group of two Year 3 classes, one Year 4 class, and one Year 5 class. The first author allocated classes to conditions using an online randomiser, “GIGAcalculator” (Georgiev, 2017). Class size ranged from 25 to 30 children. See Fig. 1 for the participant flow chart. Children were excluded for absence only if they missed more than 2 of the 12 lessons; no children were excluded on this basis.
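The allocation rule above (within each school’s year group, one of two classes or two of three classes go to the intervention) can be sketched in Python. This is a minimal illustration only: the study used the GIGAcalculator web tool, Python’s `random` module stands in for it here, and the class labels are hypothetical.

```python
import random

# Number of intervention classes per year group, keyed by the number of
# classes in that year group (rule as described in the text).
N_INTERVENTION = {2: 1, 3: 2}


def allocate_classes(year_groups, seed=None):
    """Randomly allocate whole classes to conditions within each year group.

    year_groups maps a (school, year) label to a list of class labels.
    Returns a dict mapping each class label to "intervention" or "control".
    """
    rng = random.Random(seed)
    allocation = {}
    for classes in year_groups.values():
        n_intervention = N_INTERVENTION[len(classes)]
        shuffled = list(classes)
        rng.shuffle(shuffled)  # random order within the year group
        for position, label in enumerate(shuffled):
            allocation[label] = "intervention" if position < n_intervention else "control"
    return allocation


# Hypothetical class labels mirroring the enrolled year groups
YEAR_GROUPS = {
    "School A, Year 3": ["A3-1", "A3-2"],
    "School A, Year 4": ["A4-1", "A4-2", "A4-3"],
    "School A, Year 5": ["A5-1", "A5-2"],
    "School B, Year 3": ["B3-1", "B3-2", "B3-3"],
}

if __name__ == "__main__":
    for label, group in sorted(allocate_classes(YEAR_GROUPS, seed=2022).items()):
        print(label, group)
```

Whatever the seed, this rule yields 6 intervention and 4 control classes, with every year group represented in both conditions.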

Fig. 1 Participant flow chart

Procedure

Once the headteachers from each school had agreed to take part, and before randomisation, teachers attended a staff meeting at their school at which the study was presented. The mindfulness teacher gave children in participating classes a brief introduction to the study, invited them to take part, and gave them the chance to ask any questions they had. They were told that they were participating in a research study aimed at understanding changes in children’s behaviour over time. Children were then given information packs to take home to their parents, containing a parental information sheet and a parental/guardian consent form. Before each evaluation session, children were also asked individually whether they would like to participate, and their decisions were respected. After randomisation, and before the study commenced, meetings were held with participating teachers to inform them of the group to which their class had been assigned and to discuss practicalities and schedules. Children whose parents did not provide consent were not evaluated, but all children in intervention classes participated in the Paws b course as part of the regular curriculum. Children in the wait-list control classes were offered the Paws b curriculum at the end of the study. Teaching began after the Easter holidays, in May 2022, and continued at one session per week throughout the summer term.

Data were collected at three time points: pre-intervention (1 week before the intervention), post-intervention (the week following the final teaching session), and follow-up (3 months after completion of the intervention). A trained research assistant (RA), blind to the study hypotheses and conditions, administered the self-report and peer-report assessments during one 45-min whole-class session. To control for differences in reading ability, each questionnaire item was read aloud while students completed the measures. Teacher measures were administered during the same week, although teachers were not blind to their students’ study conditions. The RA also carried out the sharing task with students individually at all three time points.

The Paws b course was developed by MiSP and includes a range of formal and informal mindfulness practices adapted for children aged 7–11 years. Paws b can be delivered either by a class teacher who has completed an 8-week mindfulness course and the Paws b training course, or by a freelance trained Paws b teacher. In this study, the course was delivered by the first author, a trained Paws b teacher and primary school teacher who did not work in either school at the time of the study. The course was delivered as part of PSHE lessons in the classroom setting, with the class teachers present.

Paws b aims to teach children to develop more mindful and less automatic responses to their present-moment experiences. Its six themes (Our Amazing Brain, Puppy Training, Finding a Steady Place, Dealing with Difficulty, The Story Telling Mind, and Growing Happiness) can be delivered as 12 individual lessons or as 6 longer sessions, each combining two lessons (MiSP (Mindfulness in Schools Project), 2023). In this study, the course was delivered as 12 lessons of 45 min each within the PSHE curriculum. Between sessions, participants were also invited to practise the mindfulness activities at home at least three times per week. This was the first time the pupils had taken part in the Paws b course or any other mindfulness-based activities in school. Throughout the study period, the wait-list control group continued their usual PSHE curriculum without mindfulness content; these lessons were also taught by the mindfulness teacher to control for teacher effects.

Measures

Demographic Information

All parents were asked to complete a demographics questionnaire. In the intervention group, 49 out of 81 questionnaires were returned (60%). For the control group, 19 out of 52 questionnaires were returned (36%). The gender, age, and year group of all participants were supplied by each school. The demographics questionnaire aimed to gather further information on the participants’ ethnicity, first language at home, number of siblings, parents’ age, educational background, and marital status.

Self-report Assessments

To assess mindfulness, we used the Mindful Attention and Awareness Scale for Children (MAAS-C; Lawlor et al., 2014), a 15-item measure adapted from the adult Mindful Attention and Awareness Scale (MAAS; Brown & Ryan, 2003) for use with children aged 8 years and above. The scale’s authors adapted the original by (a) altering the language to be age-appropriate and (b) rewording the 6-point Likert-type response scale in a more child-friendly format (1 = almost never; 2 = not very often at all; 3 = not very often; 4 = somewhat often; 5 = very often; 6 = almost always). For analysis, items were reverse-scored and averaged, so that higher scores indicate higher mindfulness. Lawlor et al. (2014) reported the MAAS-C to be a reliable and valid instrument for children, with an internal consistency (Cronbach’s alpha) of 0.84. In this study, reliability was acceptable (α = 0.72; ω = 0.71). The Flesch-Kincaid Grade Level of the scale was 5.8, indicating suitability for 10–11-year-olds; because some participants were younger than 10, the RA read each question aloud to each class.
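The MAAS-C scoring described above (reverse-score each 1–6 rating, then average) can be sketched as follows. This is a minimal illustration assuming, as the text states, that every item is reverse-scored; the function name is hypothetical.

```python
def score_maas_c(responses):
    """Return the MAAS-C score for one child.

    responses: 15 ratings on the 1-6 scale (1 = almost never ... 6 = almost always).
    Each rating is reverse-scored (7 - rating) and the results are averaged,
    so higher scores indicate higher mindfulness.
    """
    if len(responses) != 15:
        raise ValueError("MAAS-C has 15 items")
    if any(not 1 <= r <= 6 for r in responses):
        raise ValueError("ratings must be between 1 and 6")
    reversed_scores = [7 - r for r in responses]
    return sum(reversed_scores) / len(reversed_scores)


# A child who answers "very often" (5) to every item scores 2.0
print(score_maas_c([5] * 15))  # 2.0
```

Because reverse-scoring maps 1 to 6 and 6 to 1, the averaged score stays on the 1–6 range.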

To assess empathy, we used the Empathy Questionnaire for Children and Adolescents (EmQue-CA; Overgaauw et al., 2017), adapted from the Empathy Questionnaire (EmQue) developed by Rieffe et al. (2010). This 18-item questionnaire is suitable for children from the age of 8 and focuses on three aspects of empathy: (1) affective empathy, the extent to which the child/adolescent feels for the emotional state of a suffering person; (2) cognitive empathy, the extent to which the child/adolescent understands why the other person is in distress; and (3) intention to comfort, the extent to which the child/adolescent is inclined to actually help or support the suffering person. Participants rated the extent to which each description was true for them on a 3-point scale: (1) not true, (2) somewhat true, and (3) true (Pouw et al., 2013). All items were (re)scored so that higher scores reflect higher empathy, and mean scores were calculated per scale. In this study, reliability was questionable for affective empathy (α = 0.68; ω = 0.66) and acceptable for cognitive empathy (α = 0.70; ω = 0.70) and intention to comfort (α = 0.77; ω = 0.76). The wording of question 1 was altered slightly from “If my mother is happy, I also feel happy” to “If someone I love is happy, I also feel happy”, in case any child in the study did not have a relationship with their mother or had lost her.
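The Cronbach’s alpha values reported for the MAAS-C, the EmQue-CA scales, and (below) the SDQ prosocial subscale follow the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below computes it for one scale from item-level data; the example responses are hypothetical, and McDonald’s ω, which requires a factor model, is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one row per respondent, each row holding that respondent's
    scores on the k items of the scale (k >= 2, at least 2 respondents).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need at least 2 items and equal-length rows")
    item_variances = [variance(column) for column in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five children to a three-item scale (1-3 ratings)
EXAMPLE_ROWS = [[1, 2, 2], [2, 2, 3], [3, 3, 3], [1, 1, 2], [2, 3, 3]]
print(round(cronbach_alpha(EXAMPLE_ROWS), 3))  # 0.907
```

In practice the function would be applied separately to each EmQue-CA scale, using only that scale’s items.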

Sharing Task

This task was adapted from a sharing task used with preschoolers (Flook et al., 2015), which consisted of four separate trials in which children distributed stickers between themselves and a target recipient. The task took place in a quiet room within each school and was facilitated by the RA, who was blind to the conditions. The four target recipients were the participant’s most-liked and least-liked peers in their class (identified by the participant), an unfamiliar child in the same year group from another school, and an unfamiliar child who was unwell. Gender-neutral names were used for the two unfamiliar recipients. In each trial, children were presented with an envelope labelled “me” and an envelope bearing the name of the target recipient. At the beginning of each trial, children were given 10 stickers and told they could keep as many as they liked for themselves and give as many as they liked to the other person. The RA followed a script and turned away while the participant shared the stickers, so as not to influence the participant’s decisions. At the end of the task, the child took the “me” envelope with them and, after the child had left, the stickers in each of the other envelopes were counted. The number of stickers given to each recipient constituted the sharing score for that recipient (range 0–10). A total score (range 0–40) reflected the number of stickers donated across all four trials, and an average sharing score was also computed.
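The sharing-task scores described above can be summarised per child as in the sketch below. The recipient keys are hypothetical labels, and the average is assumed to be the total divided by the four trials, since the text does not define it further.

```python
# Hypothetical labels for the four target recipients
RECIPIENTS = ("most_liked_peer", "least_liked_peer", "unfamiliar_child", "unwell_child")
STICKERS_PER_TRIAL = 10


def sharing_scores(given):
    """Summarise one child's sharing task.

    given maps each recipient key to the number of stickers (0-10) found in
    that recipient's envelope. Returns per-recipient scores, the total
    (0-40), and the average per trial (0-10).
    """
    per_recipient = {}
    for recipient in RECIPIENTS:
        count = given[recipient]
        if not 0 <= count <= STICKERS_PER_TRIAL:
            raise ValueError(f"{recipient}: expected 0-{STICKERS_PER_TRIAL}, got {count}")
        per_recipient[recipient] = count
    total = sum(per_recipient.values())
    return {"per_recipient": per_recipient, "total": total, "average": total / len(RECIPIENTS)}


example = {"most_liked_peer": 6, "least_liked_peer": 3, "unfamiliar_child": 4, "unwell_child": 7}
print(sharing_scores(example)["total"])  # 20
```

Keeping the per-recipient scores alongside the total allows sharing with liked, disliked, and unfamiliar recipients to be compared separately.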

Teacher-Report Assessments

The Strengths and Difficulties Questionnaire (SDQ), developed by Goodman (2017), is a behavioural screening questionnaire measuring prosocial and maladaptive behaviours in children. Its 25 items are divided equally among five scales: Prosocial Behaviour, Hyperactivity, Conduct Problems, Emotional Symptoms, and Peer Problems. Teachers rate how closely the target child fits each attribute on a 3-point scale (“Not true”, “Somewhat true”, or “Certainly true”). Each item is scored from 0 to 2, and item scores are summed within each scale, giving a possible scale total of 0–10. Class teachers completed the prosocial subscale at pre-intervention, post-intervention, and follow-up to assess children’s prosocial behaviour; previous studies measuring prosocial behaviour have likewise used the prosocial subscale without the remaining four subscales (e.g. Joyce et al., 2010; Waldemar et al., 2016). Reliability in this study was very good (α = 0.88; ω = 0.88).
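The prosocial subscale total described above is a simple sum of five 0–2 item scores. A minimal sketch follows; it assumes, to our understanding of the SDQ, that all five prosocial items are positively worded, so no reverse-scoring applies to this subscale (the official scoring guidance should be checked before reuse).

```python
# Item scoring for the prosocial subscale (all five items assumed positively worded)
RESPONSE_SCORES = {"Not true": 0, "Somewhat true": 1, "Certainly true": 2}


def sdq_prosocial_score(ratings):
    """Sum the five teacher ratings of the SDQ prosocial subscale (range 0-10)."""
    if len(ratings) != 5:
        raise ValueError("the prosocial subscale has 5 items")
    return sum(RESPONSE_SCORES[rating] for rating in ratings)


print(sdq_prosocial_score(
    ["Certainly true", "Somewhat true", "Certainly true", "Not true", "Somewhat true"]
))  # 6
```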

Peer-Report Assessments

Following the procedures outlined by Parkhurst and Asher (1992) and used by Schonert-Reichl et al. (2015), we used unlimited, cross-gender peer nominations to obtain independent assessments of children’s prosocial behaviour, whereby children nominated classmates who fit particular behavioural descriptions. Five prosocial behaviours were assessed: is kind; shares and cooperates; helps others when they have a problem; is trustworthy; and understands others’ points of view. For each behaviour, children were given a list of all of their classmates and asked to circle the names of those who fit the description; they could circle as many or as few names as they wanted. This was a whole-class activity, but children were asked to complete their responses confidentially. For each behaviour, the percentage of nominations each participating child received was computed by dividing the number of nominations received by the total number of children in the class, and the average percentage across the five behaviours was then computed for each child. This method is consistent with published investigations in which peer ratings are considered a reliable and valid way of assessing students’ social behaviour in a school context (Schonert-Reichl et al., 2012; Wentzel et al., 2004).
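The peer nomination score described above can be sketched as follows, for one child. The function name and example counts are hypothetical; the denominator is the total class size, as stated in the text (some studies instead use class size minus one, since children cannot nominate themselves).

```python
def peer_nomination_score(nominations_received, class_size):
    """Average percentage of nominations a child received across behaviours.

    nominations_received: one count per prosocial behaviour (five in this study),
    each the number of classmates who circled the child's name.
    class_size: total number of children in the class (the denominator
    described in the text).
    """
    if class_size <= 0 or not nominations_received:
        raise ValueError("need a positive class size and at least one behaviour")
    percentages = [100 * count / class_size for count in nominations_received]
    return sum(percentages) / len(percentages)


# Hypothetical child in a class of 25, nominated by 10, 5, 8, 2, and 0 classmates
print(peer_nomination_score([10, 5, 8, 2, 0], class_size=25))  # 20.0
```

Expressing nominations as percentages makes scores comparable across classes of different sizes (here, 25 to 30 children).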

Children’s acceptance by peers was assessed with a single sociometric item (“would like to be in school activities with”), using the same nomination procedure as for the behavioural measures (e.g. Oberle et al., 2010; Schonert-Reichl et al., 2015). Three measures were derived from these data: (a) the number of classmates each participant voted for; (b) the number of votes each participant received from classmates; and (c) the average number of reciprocal relationships per class.
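The three sociometric measures can be derived from each class’s nomination lists as sketched below. The child names are hypothetical, and measure (c) is interpreted here, as an assumption, as the mean number of mutual nominations per child in a class (each mutual pair counts once for each of its two members). The conversion to percentages mentioned under Data Analyses is not shown.

```python
def sociometry(choices):
    """Derive sociometric measures for one class.

    choices maps each child to the set of classmates they nominated on the
    item "would like to be in school activities with".
    Returns (votes given per child, votes received per child,
    number of reciprocal pairs, mean reciprocal ties per child).
    """
    votes_given = {child: len(chosen) for child, chosen in choices.items()}
    votes_received = {child: 0 for child in choices}
    for chosen in choices.values():
        for other in chosen:
            if other in votes_received:  # ignore names outside the roster
                votes_received[other] += 1
    reciprocal_pairs = {
        frozenset((child, other))
        for child, chosen in choices.items()
        for other in chosen
        if other != child and child in choices.get(other, set())
    }
    mean_ties = 2 * len(reciprocal_pairs) / len(choices)
    return votes_given, votes_received, len(reciprocal_pairs), mean_ties


# Hypothetical four-child class: only a and b nominate each other
CLASS = {"a": {"b", "c"}, "b": {"a"}, "c": {"b"}, "d": set()}
print(sociometry(CLASS)[2:])  # (1, 0.5)
```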

Cross-contamination Checks

At the end of the study, all participants completed a questionnaire to check for cross-contamination from the intervention group to the wait-list control group. Participants were asked what they had been learning about in PSHE over the last term and were asked specific questions about mindfulness exercises, which, if no cross-contamination had taken place, only participants in the intervention group should have been able to answer correctly. They were also asked about the hypotheses of the study, to check that children had remained blind to these throughout the study.

Debriefing

Following the cross-contamination questionnaire, the first author visited each class and explained what the study was about. It was explained that some classes had been taught mindfulness, whereas the others had been taught their normal PSHE lessons, and that the study was investigating how mindfulness may affect children’s feelings and behaviour towards others, compared with children who had not been taught mindfulness. Children were then given the chance to ask any questions they had about the study.

Data Analyses

An a priori power analysis was conducted using G*Power version 3.1.9.7 (Erdfelder et al., 2009) to determine the minimum sample size required. It indicated that a sample of n = 121 was required to achieve 80% power to detect a medium effect at α = 0.05; the obtained sample of n = 133 was therefore adequate. Analyses were conducted in SPSS 28, with statistical significance set at p < 0.05 (two-tailed). Missing data were replaced using mean imputation, as the proportion of missing data was within the 20% allowance suggested by Peng et al. (2006) for school-based studies. Outliers were not removed, as the extreme values were judged to reflect plausible variability in the measures. The dependent variables were the MAAS-C scores, the EmQue-CA scores, the sharing task scores, the SDQ prosocial subscale scores, the peer nomination scores, and the sociometry scores. MAAS-C scores were item averages ranging from 1 to 6. Because the EmQue-CA consists of three separate scales (affective empathy, cognitive empathy, and intention to comfort), the average score for each scale is reported. The sharing task total score ranged from 0 to 40 and the SDQ prosocial subscale score from 0 to 10. Peer nomination scores and sociometry scores were calculated as percentages.
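Mean imputation, with the 20% missing-data ceiling cited above, can be sketched as follows. The text does not say whether means were computed per time point, per group, or over the whole sample; this sketch assumes imputation within a single variable at a single time point, and the function name is hypothetical.

```python
from statistics import mean


def mean_impute(values, max_missing=0.20):
    """Replace missing values (None) with the mean of the observed values.

    Raises ValueError if the proportion missing exceeds max_missing, the
    20% threshold cited from Peng et al. (2006).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    proportion_missing = 1 - len(observed) / len(values)
    if proportion_missing > max_missing:
        raise ValueError(f"{proportion_missing:.0%} missing exceeds {max_missing:.0%}")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(mean_impute([4, None, 3, 5, 2]))  # [4, 3.5, 3, 5, 2]
```

Mean imputation keeps every participant in the analysis but shrinks the variance of the imputed variable, which is one reason thresholds such as 20% are applied.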

Before running the analyses addressing the research questions, we examined preliminary descriptive statistics, randomisation checks, and baseline scores across the two conditions. This step was important because randomisation was performed at the class level, which could result in non-equivalent samples across the two conditions. To compare the groups on baseline sociodemographics, we used Pearson chi-square tests for categorical data and independent-samples t-tests for continuous data.

For each outcome measure, a separate 2 (group: intervention vs. wait-list control) × 3 (time: pre-intervention, post-intervention, and follow-up) mixed-factorial analysis of covariance (ANCOVA) was conducted to examine changes in the two groups across the three time points. In all analyses, gender, age, school, and year group were included as covariates to control for potential confounds.

Results

Preliminary Descriptive Statistics

The average age of the participants was 8.23 years (SD = 0.98), 52% (n = 69) were female and 48% (n = 64) male. Ninety-six percent of participants described their ethnicity as “white” (n = 128). The majority (57%, n = 76) of the participants were in Year 3, 26% in Year 4 (n = 34), and 17% in Year 5 (n = 23). Baseline sociodemographic characteristics for the intervention group and wait-list control group are displayed in Table 1.

Table 1 Sociodemographic characteristics of participants (n = 133)

Randomisation Checks

An independent-samples t-test found that the wait-list control group was significantly older on average than the intervention group, t(107.92) = 2.89, p = 0.005, d = 0.516, a medium effect size (see Table 1 for means). Pearson chi-square tests revealed no significant difference in gender between the two groups, χ2(1, n = 133) = 0.12, p = 0.73, φ = 0.03, but a significantly higher proportion of children from School B in the intervention group than in the control group, χ2(1, n = 133) = 6.39, p = 0.011, φ = 0.219, and a significantly higher proportion of Year 3 children in the intervention group than in the wait-list control group, χ2(2, n = 133) = 10.84, p = 0.004, φ = 0.286, both with small effect sizes. These differences reflect randomisation at the class level rather than the individual level. School B invited only its Year 3 children to participate, and because it had three Year 3 classes, two of these were randomly assigned to the intervention group. As this school accounted for 44% (n = 59) of all participants, this led to significant differences between the two groups.
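Two details of these checks can be reproduced from first principles. The fractional degrees of freedom in t(107.92) appear to reflect Welch’s unequal-variances correction (reported by SPSS as “equal variances not assumed”), and the reported φ values match φ = √(χ²/n) to within rounding (for the 2-df test, with two groups, φ equals Cramér’s V). The sketch below implements both; the example groups are hypothetical, not study data.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


def phi_from_chi2(chi2, n):
    """Phi effect size for a chi-square test: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


# School comparison reported above: chi2 = 6.39, n = 133
print(round(phi_from_chi2(6.39, 133), 3))  # 0.219

# Hypothetical groups with unequal variances give non-integer df
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```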

Cross-contamination Check

Children were asked what they had been learning about in PSHE over the last term. Answers that included any of the following words were scored as 1: mindfulness; breathing; Paws b; finger breathing; mind; brain; petal practice; puppy training. Any other answer, or no answer, was scored as 0. No participants in the control group scored 1. Children were also asked what the other class in their year group had been learning about, and the same scoring criteria were used; three children in the control group scored 1 on this question. We are therefore confident that no major cross-contamination of learning took place.
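The keyword criterion above could be applied automatically as sketched below. This is a hypothetical aid rather than the study’s procedure: matching is case-insensitive and on whole words (so “reminded” does not count as “mind”), a design choice that may differ from a human coder’s judgement in borderline cases.

```python
import re

# Keywords from the scoring criteria, matched case-insensitively as whole words
KEYWORDS = ("mindfulness", "breathing", "paws b", "finger breathing",
            "mind", "brain", "petal practice", "puppy training")
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)


def code_answer(answer):
    """Return 1 if the free-text answer mentions any keyword, otherwise 0."""
    if not answer:
        return 0  # no answer is scored 0
    return int(PATTERN.search(answer) is not None)


print(code_answer("We did finger breathing and Puppy Training"))  # 1
print(code_answer("Fractions and the Romans"))  # 0
```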

Baseline Comparisons

Independent-samples t-tests revealed no significant baseline differences between the intervention group and the wait-list control group for self-reported mindfulness, t(131) = 1.56, p = 0.12, d = 0.28; empathy (affective, t(131) = 0.14, p = 0.892, d = 0.024; cognitive, t(131) = 0.44, p = 0.658, d = 0.079; intention to comfort, t(131) = 1.48, p = 0.142, d = 0.262); the sharing task, t(131) = 1.97, p = 0.051, d = 0.349; teacher reports, t(131) = 0.88, p = 0.381, d = 0.156, or peer reports, t(131) = 0.03, p = 0.975, d = 0.005, of prosocial behaviour; or sociometry scores (votes for classmates, t(131) = 0.46, p = 0.647, d = 0.081; votes for participants, t(131) = 0.29, p = 0.776, d = 0.051; reciprocal votes, t(8) = 0.69, p = 0.507, d = 0.416) (see Table 2 for means).

Table 2 Descriptive statistics for all outcome measures at three time points

Main Analysis

Table 2 provides descriptive statistics for the complete data set for each measure at pre-intervention, post-intervention, and 3-month follow-up. The results of the intervention analyses are reported below for each outcome variable.

Self-report Assessments

A 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA for mindfulness scores showed no significant difference between the two groups, F(1, 127) = 0.66, p = 0.418, η2 = 0.005, and no significant difference across the three time points, F(2, 254) = 0.64, p = 0.53, η2 = 0.005. There was no significant time × group interaction, F(2, 254) = 0.97, p = 0.369, η2 = 0.008, indicating that change in mindfulness scores over time did not differ between the intervention group and the wait-list control group.

To assess empathy, a 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA was conducted for each of the three scales of the EmQue-CA. There were no significant effects of group or time, and no significant group × time interaction, for any of the empathy scales: children’s self-assessed empathy did not differ between the intervention group and the wait-list control group after the mindfulness intervention or at follow-up. Results are presented in Table 3.

Table 3 ANCOVA results for EmQue-CA scales

Sharing Task

A 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA for the sharing task total scores revealed no significant difference between the two groups, F(1, 127) = 1.36, p = 0.245, η2 = 0.011, no significant difference across the three time points, F(1.79, 226.94) = 0.002, p = 0.997, η2 = 0.000, and no significant time × group interaction, F(1.79, 226.94) = 1.23, p = 0.292, η2 = 0.01. Change over time in the number of stickers given away therefore did not differ significantly between the intervention group and the wait-list control group.
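Because partial η2 is a function of F and its degrees of freedom, partial η2 = (F × df_effect)/(F × df_effect + df_error), the effect sizes reported for the sharing task can be recovered from the F statistics. The fractional degrees of freedom, e.g. F(1.79, 226.94), look like the result of a sphericity correction that multiplies both df by the same ε; the text does not say which correction was applied, so this is an inference. Since ε cancels in the formula, partial η2 is unaffected by the correction. A minimal Python sketch under these assumptions:

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Sharing task results as reported: (F, df_effect, df_error, reported eta squared).
results = {
    "group": (1.36, 1, 127, 0.011),
    "time": (0.002, 1.79, 226.94, 0.000),
    "time x group": (1.23, 1.79, 226.94, 0.01),
}
for label, (f, df1, df2, reported) in results.items():
    print(f"{label}: partial eta^2 = {partial_eta_squared(f, df1, df2):.3f} (reported {reported})")

# Assumed sphericity correction: uncorrected df are (2, 254); epsilon scales both.
epsilon = 226.94 / (2 * 127)
print(f"implied epsilon = {epsilon:.3f}; corrected df_effect = {2 * epsilon:.2f}")
```

The same function reproduces the other η2 values in this section, for example F(1, 127) = 7.35 gives approximately 0.055.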

Teacher-Report Assessments

A 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA for the SDQ scores revealed no significant difference between the two groups, F(1, 127) = 0.13, p = 0.724, η2 = 0.001, or across the three time points, F(1.82, 231.37) = 1.41, p = 0.247, η2 = 0.011. However, the group × time interaction was significant for the first two time points (pre-intervention to post-intervention), F(1, 127) = 7.35, p = 0.008, η2 = 0.055, a small effect size: teachers rated the intervention group as significantly more prosocial than the wait-list control group at post-intervention. This effect was no longer evident at follow-up, F(1, 127) = 0.12, p = 0.73, η2 = 0.001.

Peer-Report Assessments

For the peer nominations, a 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA was conducted. There was no significant difference between the two groups, F(1, 127) = 0.09, p = 0.771, η2 = 0.001, and no significant difference across the three time points, F(1.85, 234.77) = 0.26, p = 0.752, η2 = 0.002. There was also no significant group × time interaction, F(1.85, 234.77) = 0.22, p = 0.788, η2 = 0.002, indicating that change in peer-nominated prosocial behaviour from pre-intervention to post-intervention to follow-up did not differ significantly between the intervention group and the wait-list control group.

To examine prosociality in more detail, the five prosocial characteristics (is kind; is helpful; is trustworthy; understands others’ points of view; shares and cooperates) were analysed separately. Significant time × group interactions emerged for “helpful” and for “shares and cooperates”. After the mindfulness intervention, peers voted children in the intervention group as significantly more helpful than children in the wait-list control group, F(1, 127) = 9.37, p < 0.003, η2 = 0.069, a medium effect size, and this difference was still evident at follow-up, F(1, 127) = 7.96, p = 0.006, η2 = 0.059. For “shares and cooperates”, by contrast, the wait-list control group received significantly more votes than the intervention group at post-intervention, F(1, 127) = 14.27, p < 0.001, η2 = 0.101, a medium effect size, but this difference was no longer evident at follow-up, F(1, 127) = 1.61, p = 0.207, η2 = 0.012.

For the sociometry scores, a 2 (intervention vs wait-list control) × 3 (pre-intervention vs post-intervention vs follow-up) mixed-factorial ANCOVA was conducted. Results are displayed in Table 4. There was a significant time × group interaction from pre-intervention to post-intervention: at post-intervention, participants in the intervention group received a significantly higher percentage of votes from their peers than participants in the wait-list control group. Because children were voting on whom they would like to be with in school activities, this suggests that, following the mindfulness intervention, children in the intervention group were more popular with their peers than children in the wait-list control group.

Table 4 Sociometry outcome data

To investigate the sociometry measure further, we examined reciprocal relationships within each class, i.e. how many friendship nominations were reciprocated in each class. This was analysed at post-intervention but not at follow-up, because children were moved between classes after post-intervention. The scores of the two groups differed significantly at post-intervention, t(8) = 2.52, p = 0.04, d = 1.383, a large effect size: the number of reciprocal relationships increased at post-intervention in the intervention group, whereas it decreased in the wait-list control group.
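Counting reciprocal relationships treats each class's nominations as a directed graph and counts pairs of children who chose each other. The sketch below shows one way to do this; how the study aggregated or normalised these counts is not stated, and the toy class is hypothetical. The resulting per-class scores would then be compared between groups with an independent-samples t-test, which is consistent with the reported df of 8 (i.e. ten classes).

```python
from typing import Dict, Set


def reciprocal_pairs(nominations: Dict[str, Set[str]]) -> int:
    """Count unordered pairs of children who nominated each other (self-nominations ignored)."""
    pairs = set()
    for chooser, chosen in nominations.items():
        for other in chosen:
            if other != chooser and chooser in nominations.get(other, set()):
                pairs.add(frozenset((chooser, other)))
    return len(pairs)


# Hypothetical toy class (not study data): A-B and A-C are mutual choices,
# whereas D's choice of C is not returned.
toy_class = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
    "D": {"C"},
}
print(reciprocal_pairs(toy_class))  # 2
```

Storing each pair as a `frozenset` ensures that A choosing B and B choosing A is counted once rather than twice.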

Discussion

This RCT examined whether the SBMP “Paws b” had an effect on children’s self-assessed mindfulness, self-assessed empathy (affective empathy, cognitive empathy, intention to comfort), observable sharing behaviour, peer-assessed and teacher-assessed prosocial behaviour, and peer relationships. The study also included a 3-month follow-up beyond the immediate post-intervention assessment, and compared the outcomes of the intervention group with those of a wait-list control group, who received their normal planned PSHE lessons. Results showed no significant differences in self-assessed mindfulness or self-assessed empathy between the intervention group and the wait-list control group at post-intervention or at the 3-month follow-up. There were also no significant differences in sharing behaviour between the two groups at post-intervention or follow-up. However, the intervention group was voted as significantly more helpful by their peers than the wait-list control group at post-intervention, and this difference persisted at follow-up. Teachers also rated the intervention group as significantly more prosocial than the wait-list control group at post-intervention, but this difference was no longer evident at follow-up. Following the intervention, the intervention group was voted as significantly more popular by their classmates than the wait-list control group.

Our first hypothesis was that the SBMP would significantly increase self-reported mindfulness compared with the wait-list control group. This hypothesis was not confirmed, as the two groups did not differ significantly at post-intervention or follow-up. This contrasts with findings from previous studies (e.g. Schonert-Reichl et al., 2015). Secondly, we hypothesised that empathy would increase significantly for the intervention group compared with the wait-list control group. Empathy was measured in three forms: affective empathy, cognitive empathy, and intention to comfort. This hypothesis was not confirmed either, as the two groups did not differ significantly at post-intervention or follow-up. Possible reasons for these null findings are discussed below. Thirdly, we hypothesised that children’s prosocial behaviour would increase significantly for the intervention group compared with the wait-list control group. This was measured through teacher assessment, peer assessment, sociometry, and an observable sharing task. The third hypothesis was partially supported by the teacher assessments, peer assessments, and sociometry, but not by observable sharing behaviour.

The significant increase in peer-assessed helping behaviour for the intervention group is in line with findings from Schonert-Reichl et al. (2015), who also assessed helpfulness through peer nominations. A systematic review also found evidence that mindfulness-based interventions (MBIs) increased helping behaviours, with medium effect sizes (Donald et al., 2019). Testing the hypothesis that mindfulness interventions focusing on the cultivation of prosocial emotions would have a larger effect on helping behaviour than those focusing only on cultivating mindful awareness, Donald and colleagues (2019) found no evidence to support this claim. This suggests that mindfulness training alone may be sufficient to produce increases in helping behaviour. Although the Paws b course contains no specific teaching about being helpful, children are taught about gratitude in the last two lessons. They are shown a Venn diagram called the Magic Mix, which includes the words gratitude, kindness, and happiness. The intention here is to help the children recognise that their kindness and warmth in connecting with others can have a direct impact on how they themselves and others feel (MiSP (Mindfulness in Schools Project), 2023), thereby promoting prosocial communication. Because helpfulness was measured through peer assessment, it may also be the case that, rather than children acting in a more helpful way towards their peers, the peers themselves became more attentive to helpful behaviour. The final two lessons of the course encourage children to notice positive events, which may bring happiness, so children may have been more likely to notice their peers being helpful after these lessons.

The increase in the intervention group’s prosocial behaviour, as assessed by their teachers, is in line with Sciutto et al. (2021), who found that teacher ratings of students’ prosocial behaviour increased from pre-programme to post-programme. Similar results have been found with the child version of the SDQ (Waldemar et al., 2016). In this study, however, the effect was significant only at post-intervention and not at follow-up. It may be that this increase in prosocial behaviour fades at some point in the 3 months after the intervention, and that SBMPs do not produce long-lasting effects on prosocial behaviour. Possibly, mindfulness practice needs to continue for the effects to be maintained; as we did not measure continued practice after the intervention, we are unable to test this explanation. We should also note that, between post-intervention and follow-up, all classes had a 6-week summer holiday and then moved to a new teacher for the next academic year. The new teacher, who assessed the children after teaching them for 1 month, may have observed less of their prosocial behaviour than the previous teacher, who had taught them for a whole year. Finally, it was impossible to blind teachers to condition, so the post-intervention teacher ratings may reflect response bias.

The sociometry results are also in line with previous findings by Schonert-Reichl et al. (2015), who found that an SBMP led to significantly greater peer acceptance within classes. Previous research has found that prosocial children are significantly more popular within the classroom (Warden & Mackinnon, 2003) and that children are more likely to be nominated as popular by peers when they exhibit higher levels of prosocial behaviour (Kornbluh & Neal, 2014). It is therefore possible that the increase in helpfulness (one characteristic of prosocial behaviour) following the SBMP in turn led to participants being voted as more popular by their peers. Alternatively, SBMPs may increase peer acceptance, which may in turn lead to popular children being voted as more prosocial by their peers. Greener (2000) found such a link between popularity and peer-nominated prosocial behaviour: popular children were rated as significantly more prosocial than children in all other sociometric groups. Whilst studies support the theory that SBMPs can increase peer acceptance (Schonert-Reichl et al., 2015), whether peer acceptance mediates the effect of SBMPs on peer-rated prosocial behaviour has yet to be investigated.

There are a number of possible explanations for the null findings for self-assessed mindfulness and empathy. First, the null findings for mindfulness may be due to limited practice of mindfulness outside the weekly sessions. Mindfulness-based interventions encourage home practice to cultivate and strengthen skills (Quach et al., 2017). One substantial difference in the study by Schonert-Reichl et al. (2015) was that, alongside the weekly 40–50-min lesson, the mindfulness exercises were practised every day, three times per day for 3 min. As the Paws b course in this study was delivered by an external mindfulness teacher, the untrained class teacher could not lead the mindfulness practices during the week in the mindfulness teacher’s absence. As recommended by MiSP, the children in our study were invited to practise mindfulness exercises at home at least three times per week; however, as practice outside the sessions was not measured directly, we are unable to draw conclusions about its effect. In another study of Paws b (Vickery & Dorjee, 2016), students were asked how often they practised mindfulness outside school on a 4-point Likert scale (never, rarely, often, every day), and 60.6% of participants responded ‘never’ or ‘rarely’. A study that measured home practice through a daily log likewise found that compliance was poor relative to the suggested amount of home practice (Quach et al., 2017): adolescents in the mindfulness meditation group practised outside the intervention sessions for only a quarter of the total recommended time. A measure of home practice in our study, such as a diary or log, could have helped in understanding how dose affected outcomes. As suggested by students in the study by Schwind et al. (2017), audio-recorded step-by-step instructions could have been one way to support consistent home practice.

In addition, the MindUP curriculum used in the study by Schonert-Reichl et al. (2015) included lessons in which children performed acts of kindness for one another and collectively engaged in community service learning activities. These activities aimed to change the ecology of the classroom to one in which belonging, caring, collaboration, and understanding others are emphasised, creating a positive classroom milieu (e.g. Staub, 1988). This may explain the increase in empathy found in that study, which was not evident in our study.

Secondly, it may be that the intervention used in our study, Paws b, is essentially inert and does not deliver sufficient training in key mindfulness skills to affect mindfulness and empathy. However, previous studies using Paws b have found significant positive changes in psychological variables associated with prosociality in classrooms, e.g. decreases in negative affect (Vickery & Dorjee, 2016) and increases in attention skills (Thomas & Atkinson, 2016). Moreover, we were unable to find an alternative mindfulness course for children that has yielded stronger evidence of psychological and interpersonal effects. Identifying and evaluating the specific mindfulness activities that affect empathy may therefore be a fruitful area for further research.

A third possibility for the null findings is that the measures were unsuitable for the youngest participants in the study. The MAAS-C was selected as the most appropriate self-report measure of mindfulness because, to date, it has been validated with the youngest population (8–9-year-olds). Nevertheless, the youngest children in this study were 7 years old, and although all items were read aloud to the children, the questionnaire may have been cognitively challenging for the youngest children, which may have reduced the accuracy of their responses. A similar study comparing children who had received the Paws b course with a control group measured mindfulness using the Child and Adolescent Mindfulness Measure (CAMM; Greco et al., 2011). It also found no significant time × group interaction for mindfulness, and the average age of its participants was below the validated age range for the CAMM (Vickery & Dorjee, 2016). Self-report mindfulness measures may therefore yield inaccurate results when administered to children younger than their validated age range. The same concern applies to the EmQue-CA, which has been validated for children as young as 8 years but not 7.

The null results of the sharing task also contrast with previous studies (Berti & Cigala, 2020; Flook et al., 2015; Viglas & Perlman, 2018). These studies, however, involved 3–5-year-old children rather than 7–10-year-old children. Thus, although studies with preschoolers have shown improvements in sharing behaviour following an SBMP, this was not the case in our study. We are not aware of any study with primary school-aged children that has measured sharing behaviour directly, so it is possible that SBMPs do not increase sharing in this age group; without comparable studies, however, it is difficult to draw conclusions. One study with this age group (Schonert-Reichl et al., 2015) found that children in the intervention group were more likely than the control group to improve from pre-intervention to post-intervention in peer-rated sharing. In our analysis of peer-rated prosocial behaviours, by contrast, children in the control group were rated as significantly better at “sharing and cooperating” at post-intervention than children in the intervention group, who showed no significant increase. These results were unexpected and may reflect differences between the content of MindUP and that of Paws b. It is important to highlight that sharing is just one of many forms of prosocial behaviour. It would be interesting to measure other forms of prosocial behaviour in direct behavioural tasks, e.g. helping behaviour or acts of trustworthiness or kindness.

Finally, other variables may moderate the effects of mindfulness programmes with children. For instance, variations in classroom teacher characteristics (e.g. programme “buy-in”), school structures (e.g. one class teacher or a job share of two class teachers in one class), or contextual influences (e.g. facilitator-teacher communication) may explain why a similar programme has different effects across studies (Dariotis et al., 2017). As the field moves toward larger-scale studies, school-related factors such as teacher buy-in will be important to consider (Baelen et al., 2023). MiSP encourages teachers to embed the mindfulness course into school life as much as possible. It suggests that, for its courses to be most effective, they should be part of a whole-school initiative with as many teachers as possible on board. This includes posters around the school, full teacher participation even when teachers are not teaching the course themselves, regular practice throughout the school week, and encouragement for the children to practise at home. As the schools in our study selected specific year groups to take part, we were unable to implement the whole-school approach recommended by the course developers.

Limitations and Future Research

There were various limitations to this study. First, analyses were conducted at the individual child level even though randomisation to groups was done at the classroom level, which limits the causal inferences that can be drawn from this study. Unfortunately, the small number of classrooms did not provide sufficient statistical power for multilevel modelling (MLM) at the classroom level: at least 40 clusters (e.g. classrooms or schools) are needed to achieve satisfactory statistical power in classroom-based programme evaluation using MLM (Raudenbush, 1997). Analyses were therefore conducted at the level of the individual child. The clustering of children within classrooms results in non-independence of observations, which could bias the statistical tests used to identify intervention effects. This is a major challenge for evaluations of universal, school-based interventions when resources are insufficient to recruit large numbers of classrooms or schools (Stoolmiller et al., 2000). Although methodological research has indicated that significance levels from individual-level analyses of programmes implemented at the classroom level may be overstated (e.g. Donner et al., 2000), it has also supported the notion that effect sizes remain unbiased (Rindskopf, 2010). Nevertheless, this mismatch between a classroom-level intervention and an individual-level analysis could have implications for the interpretation of the results. Future studies with more classrooms could use linear mixed models to estimate classroom-level effects more accurately.
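Why individual-level analysis of a classroom-randomised design can overstate significance can be illustrated with the design effect for cluster samples, DEFF = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. The variance of an estimate is inflated by roughly DEFF, so an analysis that ignores clustering treats the sample as if it were DEFF times larger than its effective size. In the sketch below, the ICC values are hypothetical, the average class size assumes that the 133 children were spread across about ten classrooms, and the formula assumes equal cluster sizes.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


n_children = 133
mean_class_size = 133 / 10  # assumption: about ten classrooms
for icc in (0.0, 0.05, 0.10, 0.20):  # hypothetical intraclass correlations
    deff = design_effect(mean_class_size, icc)
    n_eff = effective_sample_size(n_children, mean_class_size, icc)
    print(f"ICC = {icc:.2f}: design effect = {deff:.2f}, effective n = {n_eff:.1f}")
```

Even a modest ICC roughly halves the effective sample size at this class size, which is why standard errors from an individual-level analysis are likely to be too small, while the point estimates themselves are not shifted by clustering.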

Secondly, neither class teachers nor peers were blind to treatment condition when completing our teacher and peer assessments. Although peers, as participant observers, can provide important information about their classmates’ behaviour both inside and outside the classroom, our peer assessment of prosocial behaviour may have been influenced by peers’ knowledge of the experimental condition. We suggest that peers’ ratings would be less likely than teachers’ ratings to be influenced by knowledge of condition, because children are unlikely to be able to formulate specific hypotheses about the study. Nonetheless, we have no data to support this suggestion, and future investigations of the Paws b course would benefit from collecting data from observers who are blind to condition, to allow a more objective measure of children’s behaviour. Similar concerns arise with teacher-report measures when teachers are not blind to the condition to which their classroom has been assigned.

Thirdly, this study did not collect in-depth assessments of participant responsiveness and experiences, critical aspects of implementation (Baelen et al., 2023) that have been shown to contribute to intended outcomes (e.g. Monteiro, 2020; Roeser et al., 2023b). It would have been helpful to assess the use of mindfulness outside the lessons. This could have been achieved by providing each participant with a home practice diary in which to record each practice, or by asking participants at the end of the study how often they had practised mindfulness at home. Alternatively, teachers could have been given a class practice record on which to note each time the class practised one of the mindfulness exercises, as in the study by Schonert-Reichl et al. (2015). Nonspecific effects of school-based interventions, e.g. interaction, interest, and attention, are valuable parts of the intervention itself that should be acknowledged when studying intervention mechanisms (Donovan et al., 2009).

One direction for future research would be to examine whether the long-term effects of mindfulness differ depending on whether practice and/or teaching of mindfulness continues beyond the initial 12-week course; at this stage, we cannot comment on whether continued practice would result in longer-lasting prosocial effects. It would also be useful to measure empathy and prosocial behaviour through other means, to allow comparison with the findings of this study. One suggestion would be to observe prosocial behaviour in a naturalistic setting, for example in the playground, as in previous studies with preschoolers (e.g. Berti & Cigala, 2020). Finally, examining the mechanisms underlying the changes in prosocial behaviour and peer acceptance would provide a deeper understanding of how and why mindfulness training may increase certain aspects of prosocial behaviour in children.

This study was conducted prior to the release of the SBMP Implementation Framework (SBMP-IF; Baelen et al., 2023). Although every effort has been made to adhere to the reporting recommendations in the SBMP-IF, some aspects of the framework were not measured (e.g. feasibility, acceptability, and responsiveness), and we were therefore unable to report on them. To allow researchers to identify the core components of SBMPs and to discern for whom and under what conditions SBMPs work, future studies should make every effort to adhere to these guidelines when planning SBMP data collection.

Overall, the findings suggest that, for 7–10-year-olds, the SBMP Paws b, delivered by a mindfulness teacher once per week for one school term, does not significantly increase self-assessed mindfulness or empathy, nor does it increase observed sharing behaviour. However, the findings suggest that Paws b does increase helpfulness as judged by peers, and teacher-assessed prosocial behaviour at least in the short term. It also appears to increase the popularity of participating pupils and the number of reciprocal relationships within their classes. To our knowledge, this was the first study in the UK to measure empathy and prosociality following an SBMP with this age group. The results suggest that SBMPs may lead to increases in certain aspects of prosociality in children, and further research into the mechanisms behind these changes may provide a better understanding of the results.