Introduction

Early self-regulation contributes to disparities in success in learning, interpersonal behaviours, and mental and physical health in later life (Robson et al., 2020). Self-regulation is considered the foundation for lifelong functioning in a wide range of domains, from academic achievement to emotional well-being and physical and mental health (e.g., Murray et al., 2015; Perry, 2013). Most importantly, self-regulation has proved to be malleable and can be strengthened with intervention (Dignath et al., 2008; Hattie et al., 1996). Therefore, we propose that with intervention, children’s self-regulation can be improved, with significant positive outcomes spanning from childhood to adulthood.

Reaching a consensus on a definition of self-regulation has proven very difficult. Definitions of self-regulation range from ‘impulse control, self-control, self-management, self-direction, independence’ (Bronson, 2000, p. 3) to ‘the processes whereby learners personally activate and sustain cognitions, affects and behaviours that are systematically oriented towards the attainment of personal goals’ (Zimmerman, 2011, p. 1). For the purposes of this paper, we define self-regulation as managing all aspects of human behaviour, including emotional, social, cognitive and motivational elements, to enable goal-directed actions such as organising behaviour, controlling impulses, and solving problems constructively (Boekaerts & Corno, 2005; Bronson, 2000; Murray et al., 2015). This definition acknowledges metacognition as the central cognitive element of self-regulation (Whitebread et al., 2010), and recognises that self-regulation also ‘encompasses affective, motivational and social elements’ (Whitebread et al., 2009, p. 64). This conceptualisation of self-regulation inspired the C.Ind.Le coding framework of self-regulation and metacognition (Whitebread et al., 2009), on which our study relies heavily. Whitebread et al. (2009) explain that their model of self-regulation was inspired by the cognitive information-processing tradition, which coined the terms ‘metamemory’ and ‘metacognition’ (Brown, 1987; Flavell, 1979), and by the socio-cultural tradition, which introduced the term ‘self-regulation’ (Vygotsky, 1978). Whitebread et al. (2009) further explained that they use both terms, self-regulation and metacognition, in order to recognise the parts of their model that rely heavily on the cognitive tradition. From this point onwards, we also use both terms when referring to Whitebread et al.’s (2009) model and coding framework.

Whitebread et al.’s (2009, pp. 79–80) model of self-regulation and metacognition, which we adopted for this study, proposes three areas of self-regulation: metacognitive knowledge, metacognitive regulation, and emotional and motivational regulation. Metacognitive knowledge involves: metacognitive knowledge of persons (the explicit expression of one’s knowledge in relation to cognition or people as cognitive processors), metacognitive knowledge of tasks (the explicit expression of one’s long-term memory in relation to elements of the task), and metacognitive knowledge of strategies (the explicit expression of one’s knowledge in relation to strategies used when performing a task). Metacognitive regulation involves: planning (selecting procedures necessary for performing a task), monitoring (the ongoing on-task assessment of the task performance), control (changes in the way a task had been conducted as a result of monitoring), and evaluation (reviewing task performance and evaluating its quality). Finally, emotional and motivational regulation involves: emotional/motivational monitoring (assessment of current emotional and motivational experiences) and emotional/motivational control (regulation of one’s emotional and motivational experiences). All the self-regulation and metacognition measures employed in this study are based on Whitebread et al.’s (2009) model of self-regulation and metacognition.

Early childhood provides an important opportunity for intervention to improve self-regulation, since underlying neurobiological changes occurring up to the age of 7 enable self-regulation skills to increase dramatically (Berger et al., 2007; Rueda et al., 2011). The results of Dignath et al.’s (2008) much-cited meta-analysis indicated that self-regulation training programmes are effective for children, even at primary school. In fact, they revealed that even young primary school students’ self-regulation can be improved efficiently. When younger and older students were compared, participants in the first years of primary school benefited more in the areas of strategy use and motivation than older primary school children. Overall, the authors suggested that intervention programmes are most effective when targeting younger primary school students. These results indicate the importance of targeting self-regulation interventions at children in early childhood, particularly in the early years of primary school.

When considering self-regulation interventions that could be used with young children, we examined the mounting evidence which suggests that in early childhood, play is a viable route for supporting self-regulatory development. In a review of the evidence, Whitebread (2018) concluded that play enhances learning because a) it is motivating, which enhances the efficiency of key regions of the brain relevant, for example, for regulating attention and mental flexibility, b) it supports the development of symbolic abilities, and c) it supports children in developing their self-regulation. The theoretical basis for this link between play and self-regulation derives from Vygotsky’s (1978) work, which articulated a cognitive mechanism through which, he argued, play contributed to children’s intentional learning, creativity and problem solving, all contexts in which self-regulation is required. This is the Zone of Proximal Development (ZPD): a metaphorical description of the difference that exists between the developmental level of the child when working alone, and their potential developmental level when working with a more skilled person (Vygotsky, 1978). Children, according to Vygotsky, set their own level of challenge during their play so that it is always developmentally appropriate for them. This ZPD is created through two communicative ingredients: intersubjectivity (i.e., shared understanding) and scaffolding (i.e., effective support by a more skilled person which is gradually relinquished as the child masters a task). This theory inspired researchers such as Elias and Berk (2002) and Whitebread et al. (2009) to explore whether it was upheld in practice. Elias and Berk’s (2002) observational study suggested a positive correlation between children’s play in their classrooms and children’s socially-shared behaviour/responsibility during circle time and cleaning-up time (which were conceived as two instances where self-regulation was required). Whitebread et al. (2005, 2007, 2009) observed 3- to 5-year-old children in their classes for a period of two years, and their results suggested that self-regulation occurred mostly during playful activities rather than during class time.

Research on play and self-regulation has traditionally focused on make-believe play (e.g., Berk et al., 2006). Make-believe play is argued to promote self-regulatory development because it provides children with the grounds to act in their ZPD, that is, to act beyond their age. This happens due to make-believe play’s rule-based and intersubjective nature, and because it affords opportunities for self-regulating language (Berk et al., 2006; Vygotsky, 1978) and for emotional regulation (Fantuzzo et al., 2004; Galyer & Evans, 2001).

Self-regulation and musical play

More recently, musical play surfaced as another type of play which could promote self-regulatory development. Musical play possesses all the characteristics that Whitebread (2018) delineated as supportive of learning: it is motivating, it affords opportunities for symbolic abilities development, and it enables children to develop their self-regulation. The theoretical support for linking musical play and self-regulation was first delineated by Zachariou and Whitebread (2017) and came from the fact that musical play shares all the characteristics that make make-believe play a context fertile for self-regulation: musical play is based on rules (Marsh, 2008; Marsh & Young, 2007), reinforces self-regulatory language and emotional regulation (Bannan & Woodward, 2009; Barrett, 2009) and incorporates intersubjectivity and scaffolding (Bannan & Woodward, 2009; Marsh, 2008; Marsh & Young, 2007; Young, 2005). Musical play refers to the prevalent activities which are described in the literature: hand-clapping games, circle games, movement play, singing play and instrumental play (Harwood, 1998; Lew & Campbell, 2005; Marsh & Young, 2007; Pond, 1980; Tarnowski, 1999; Young, 2003, 2004). Musical play is a promising context for intervention, because it is a universal, innate type of play, prevalent in children’s lives (Papousek, 1996; Trevarthen, 2000; Young, 2005), and could be easily incorporated in the curriculum.

The link between musical play and self-regulation was recently supported by empirical evidence from various studies (Boese, 2017; Williams, 2018; Winsler et al., 2011; Zachariou & Whitebread, 2015, 2017). Winsler et al. (2011) compared the self-regulation of young children (aged 3 to 4) who had participated in music and movement classes incorporating musical play with that of their peers who had not participated in these classes. The children’s self-regulation was assessed through laboratory self-regulation tasks. The children who had experienced musical play (mean length of involvement in these classes was 10 months) showed better self-regulation and used more self-regulatory language in the form of private speech, a strategy which was positively associated with their performance on a selective attention task. Additionally, these children were more likely to use singing or humming to themselves as a facilitative strategy while engaging in a delay of gratification task. This strategy was linked to inhibiting their desire to open a gift or call out to the experimenter, behaviours which were negatively related to performance and self-regulation. It could thus be argued that the children participating in musical play sessions were more likely to successfully engage in strategies that would foster their self-regulation.

Some of the first studies in the area of musical play and self-regulation were conducted by Zachariou and Whitebread (2015, 2017). The aim was to explore if musical play would provide a fertile context for self-regulation. In the first study, children’s self-regulation was evident in their musical play (Zachariou & Whitebread, 2015). Here, the authors observed and video-recorded the musical play of ten children aged 6 to 7 years, in a primary classroom. Using the C.Ind.Le coding framework (Whitebread et al., 2009), they found that the most frequently coded self-regulatory behaviours during musical play were children planning, monitoring, and controlling their play. In the second study, musical play sessions were implemented in six classrooms over five weeks, and the study focused on 36 children (Zachariou & Whitebread, 2017). During their musical play, more than 15,000 short episodes of self-regulation were identified at a micro-genetic utterance level. Some of the children’s predominant self-regulation behaviours included: checking their efforts, checking whether they were on track and self-correcting when they made a mistake (cognitive monitoring), showing understanding of their own and others’ emotions, and monitoring their emotional/motivational reactions (emotional/motivational monitoring). Importantly, Zachariou and Whitebread (2017) also found that children were very likely to share regulation between group members while involved in musical play, and they attributed this to the interdependency that is afforded by musical play.

Since then, many studies have linked musical play to self-regulation (e.g., Boese, 2017; de Bruin, 2018; Williams, 2018) but none of them adopted a focused observational perspective where the effects of musical play were assessed. The only exception to this is a study by Williams and Berthelsen (2019) who implemented an intervention focused on coordinated rhythmic movement with music, taking place over 16 sessions each lasting 30 min. The results suggested positive intervention effects for emotional self-regulation for all participants (children aged 4–5) as reported by the children’s teachers, and positive effects on teacher-reported cognitive and behavioural regulation for one of the three intervention sites. On the one hand, these results are promising and in line with previous research, which suggests that musical play has the potential to support self-regulatory development. On the other hand, it should be noted that self-regulation was not directly assessed but was reported by the teachers, who were not blind to children’s assignment to either the intervention or the control group. This was a methodological constraint with effects on the study’s validity. Therefore, it is fundamental that research on musical play interventions also adopts direct, on-task assessments of children’s self-regulation, a strategy that was followed in the present study.

The current study

Given the gaps in the literature, the key aim of the current study was to investigate the impact of a musical play intervention on young children’s self-regulation and metacognition. Most research on musical play and self-regulation is observational, and the field lacks rigorous experimental research examining the effects of musical play. Therefore, we adopted a quasi-experimental, pre-test and post-test control-group design in order to explore whether introducing musical play as an intervention in schools could have beneficial effects on young children’s self-regulation and metacognition. To address this aim and these gaps, we put forward the hypothesis that musical play would have a beneficial effect on self-regulation and metacognition (as suggested by prior research). More specifically, we generated the following three hypotheses, each addressing one of the three areas of self-regulation and metacognition as delineated by Whitebread et al. (2009):

  • H1: The intervention group will show a steeper increase in their metacognitive knowledge compared to the control group;

  • H2: The intervention group will show a steeper increase in their metacognitive regulation compared to the control group;

  • H3: The intervention group will show a steeper increase in their emotional/motivational regulation compared to the control group.

Based on the meta-analysis of self-regulation research by Dignath et al. (2008) which revealed that children in the younger classes of primary school benefited the most from self-regulation interventions, we decided to focus on the youngest primary school students (Year 1, aged 6). To our knowledge, this is the first musical play intervention happening in real-world primary classrooms. This supports the study’s ecological validity and its applicability in real-world contexts.

Additionally, in prior musical play interventions, the effects on children’s self-regulation were measured through either teacher reports (Williams & Berthelsen, 2019) or lab-based measures (Winsler et al., 2011). These methodological approaches were significant limitations of the studies. In the present study, we employed teacher reports on children’s self-regulation, but we also measured children’s self-regulation based on: a) observations of children while they engaged in naturalistic tasks that were meaningful to them, and b) their responses to a follow-up interview about the task, in order to have an accurate image of children’s abilities. We coded these observations on the basis of an already validated coding framework (Whitebread et al., 2009). We adopted a micro-genetic, utterance-level coding approach, which was intended to ensure that no indications of self-regulation were missed.

Method

Participants

In total, 117 children participated in this study. The participants were aged 6 at the start of the project (M_age = 76 months, 6.3 years, SD = 3.47 months, 57% female) and were recruited from 10 Year 1 classrooms at 4 state primary schools in Cyprus. The sample size was determined using G*Power analysis (Buchner et al., 2011; Cohen, 1988). The calculations indicated that a sample size of 45 children per group was sufficient to attain a statistical power of .91, with an estimated small-medium effect size (f = .25). Out of these 117 children, six were removed because they were multivariate outliers (see the preliminary analyses below) and 13 had missing data on at least one of the main variables in the analyses presented below. Thus, the data from 98 children were used for the present analyses. Forty-five children participated in the intervention, and the control group comprised 53 children who continued with their music curriculum as normal. Participants’ information is shown in Table 1.

Parents reported on their family’s SES by providing their current occupation and their highest level of education at Time 1 (start of project). We coded their responses into socioeconomic metrics employing the Hollingshead (1975) four-factor index of social status. We converted parental occupation into scores ranging from 1 to 9. For example, farm labourers would get a score of 1, and higher executives would get a score of 9. Parents’ education was coded on a 7-point scale. A parent who had completed less than 7th grade would get 1 for their education code, and a parent who had graduate training (beyond university) would receive a 7. We collected information for both parents in a family and calculated mean occupation and education scores based on both parents. In the case of single parents, only their scores were used in the analyses. The final SES score for each child was the mean of their parents’ education and occupation codes. The sample’s mean SES score was 6 out of 8.
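The SES scoring described above can be sketched in a few lines of code. This is only an illustrative sketch, assuming that each parent's occupation (1 to 9) and education (1 to 7) codes have already been assigned; the function name `family_ses` and the tuple layout are hypothetical and are not part of the Hollingshead (1975) index or of the study's materials.

```python
from statistics import mean


def family_ses(parents):
    """Return a child's SES score from a list of (occupation, education) codes.

    Hypothetical helper: occupation is coded 1-9 and education 1-7, as in the
    text. Pass one tuple per parent; a single-parent family passes one tuple.
    """
    if not parents:
        raise ValueError("at least one parent is required")
    for occupation, education in parents:
        if not 1 <= occupation <= 9:
            raise ValueError("occupation code must be between 1 and 9")
        if not 1 <= education <= 7:
            raise ValueError("education code must be between 1 and 7")
    # Average each code across the available parents, then average the two codes.
    mean_occupation = mean(o for o, _ in parents)
    mean_education = mean(e for _, e in parents)
    return (mean_occupation + mean_education) / 2
```

Under this reading, the highest possible score is (9 + 7) / 2 = 8, which is consistent with the sample mean being reported out of 8.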

Materials and procedures

Baseline data (Time 1) were collected from all children across 1 month in November–December 2018. The 13-lesson intervention was then delivered to the intervention group by their music teacher. The intervention and control groups were exposed to the same number of music lessons (2) every week. For the duration of the intervention, the music lessons of the intervention group were dedicated to musical play, whereas the control group continued with their primary school music lessons as usual, following the curriculum. Upon the completion of the intervention, post-test data (Time 2) were collected over 2 weeks.

Time 1 and Time 2 data collection for child participants included assessments of children’s self-regulation and metacognition. Self-regulation and metacognition were assessed using three validated instruments: a) an on-task assessment of children’s self-regulation and metacognition through completion of the Train Track Task (Bryce & Whitebread, 2012; Whitebread et al., 2009), b) an assessment of children’s metacognitive knowledge after the task during a Metacognitive Knowledge Interview about the task (Marulis & Nelson, 2021), and c) teacher reports of children’s self-regulation and metacognition on the CHILD observational checklist (Whitebread et al., 2009).

At Time 1 we also collected information on the children’s gender, age, and family socioeconomic status (SES) through an information sheet sent to the parents at the start of the project. This information was used to calculate control variables.

The study’s procedures were approved under the procedures of Roehampton University’s Ethics Committee and Cyprus’ Ministry of Education, Sport and Youth. Consent for participation in the study was granted by children’s parents/guardians and only children who assented to participate in the study were included as participants.

Musical play intervention

The intervention was designed by the lead music teacher (a highly qualified music educator, holder of a PhD in Music Education) with input from the study’s first author. This same music teacher delivered the intervention to all children in the intervention group. There were 13 musical play sessions of 40 min duration, conducted twice per week across 7 weeks (we allowed 1 contingency session in the case of public holidays).

The musical play sessions were based on musical play activities previously employed in similar research, which involve instrumental play, singing play, circle games, movement play and clapping games (e.g., Boese, 2017; Zachariou & Whitebread, 2015, 2017). The music teacher and the first author had four one-hour-long meetings to discuss the content of this intervention. The music teacher was aware of the study’s focus on musical play and ‘independent learning’ (a term used in the past instead of ‘self-regulated learning’ when addressing non-expert audiences, e.g., Whitebread et al., 2009), and the meetings focused on discussing what musical play is and providing various examples of musical play activities implemented in past research. The lead music teacher developed detailed lesson plans to ensure consistency in their implementation across different classes. In order to reveal musical play’s full potential for self-regulation, the activities had to first and foremost be ecologically valid and meaningful to the children (Whitebread et al., 2009). For this reason, the teacher had relative freedom to develop the activities and make small modifications according to each class’s preferences and capabilities. The fact that the teacher who designed the intervention was also the one implementing it, and that the intervention was delivered over the same time period to all children, supported high levels of intervention fidelity. Additional evidence for high fidelity of intervention came from the note-taking record that the teacher kept next to the musical play lesson plans, where she consistently noted that she was able to complete all activities with all classes, with the only discrepancy being that some classes had opportunities to repeat some activities (the session duration was the same for all classes).

All intervention activities had elements of practising key skills of self-regulation, as these would come up naturally in the musical play activities. None of these activities were externally manipulated in order to further stimulate self-regulation. However, we note that there is clear potential for this, and future studies could consider it. Some common activity elements across the sessions included: start/stop (inhibition), such as moving when the teacher/child plays an instrument and stopping when they stop; working memory games, such as a take-away song where the children had to progressively sing more and more of the song ‘in their heads’ but continue to carry out the accompanying movements of the song; waiting for their turn, for example when in a circle game they had to wait patiently until the instrument reached them; reversal of instruction (e.g., moving in one direction and immediately changing at the sound of a signal); planning, such as preparing their own song, a magical spell to ‘awaken their magical toys’; and controlling and changing their strategies when something would not work, for example when they were trying to create their own movement play with their peers. Musical play sessions are described in Appendix 1.

We should note here that most of the musical play activities consisted of guided play. Looking at play as a spectrum, as proposed by a group of leading researchers of play (Zosh et al., 2018), play ranges from free play (without any adult guidance or support), to guided play, then games, then co-opted play, to playful instruction (closely guided and directed by the adult). Guided play, which is initiated by the adult and directed by the children, includes purposeful adult support but maintains playful elements (Zosh et al., 2018), and is a powerful tool for teaching and learning with catalytic effects on children’s intellectual, emotional, social and linguistic development (Golinkoff et al., 2008; Hirsh-Pasek et al., 2008; Skene et al., 2022).

Control group instruction

The control group continued with their primary school music lessons as usual, following the curriculum. The control group lessons mainly involved activities carried out from children’s desks/chairs such as singing, recognising and identifying musical instruments from musical pieces, learning to play the glockenspiel and the recorder, and identifying musical notes. These activities do not qualify as musical play. It was only rather infrequently that the control group engaged in some activities that resembled musical play (e.g., movement play) but these were mostly very closely guided by the adult who gave very specific instructions. On the spectrum of play as defined by Zosh et al. (2018) these activities would sit at the opposite end of the spectrum from free play (closer to direct instruction) and be categorised as playful instruction: initiated and directed by the adult.

We collected the lesson plans for all the music lessons from the control group teachers for the duration of the intervention. This enabled us to confirm that all control group music teachers were indeed following the standard music curriculum. We should also note that all teachers were highly qualified and had expertise and interest in teaching music.

Measuring self-regulation and metacognition

On-task self-regulation and metacognition

The Train Track Task (TTT) was used at Times 1 and 2 and provided a reliable way of eliciting children’s self-regulation and metacognition skills during a playful problem-solving task. It involves building a train track to match a predefined shape from a plan. The TTT was adapted by Bryce and Whitebread (2012) from Karmiloff-Smith’s (1979) closed-circuit railway task. The children’s attempts were video-recorded and then coded. This task has been widely used in the literature (e.g., Bryce & Whitebread, 2012; Spektor-Levy et al., 2017).

Children were asked to use the available train track pieces to match a train track plan as well as they could, using as many pieces as required. The experimenter presented one of five shapes to the child: 1) circle, 2) oval, 3) goggles, 4) p-shape, 5) g-shape. Three shapes (shapes 2–4) were the same as those used by Bryce and Whitebread (2012) for their study with 5- and 7-year-old children. Bryce and Whitebread had deemed the oval shape the easiest and the p-shape the most difficult. The other two shapes were added based on a similar study by Pino-Pasternak et al. (2014), who had added the circle as the easiest of all shapes and the g-shape as the most difficult. See Appendix 2 for shapes and instructions.

For the purposes of analysing the children’s on-task self-regulation and metacognition, it was imperative that we analysed the task at the right level of challenge for each child (neither too difficult nor too easy for them). At Time 1, all children started their attempts with Shape 1. The experimenter rated the quality of each track created by the child from 0 to 3. If the track did not resemble the model at all, it was rated as 0; if there was some resemblance but it was far from being a replication, it was rated as 1; if it resembled the model but with some small mistakes (e.g., did not link up, some pieces wrong), it was rated as 2; and a perfect replication of the model was rated as 3. Any child who completed the task at quality 2 or above within a specific time frame (calculated from pilot data as the mean time for completion of the specific model + 1 SD) was then asked to attempt the next shape in difficulty. For example, the threshold for shape 1 was having completed the track at quality 2 or 3 within 2 min 32 s. To decide the level of difficulty for the starting shape at Time 2, this same procedure was followed based on the child’s attempts at Time 1.
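The progression rule described above can be expressed as a short decision function. The following is a minimal sketch under stated assumptions: the function and constant names are hypothetical, the time limit is treated as inclusive (the text says only "within"), and only the shape 1 threshold of 2 min 32 s comes from the text. The pilot means and standard deviations for the shapes are not reported here, so `completion_threshold` takes them as inputs.

```python
def completion_threshold(mean_seconds, sd_seconds):
    """Time limit for a shape: pilot mean completion time plus one SD."""
    return mean_seconds + sd_seconds


def advances_to_next_shape(quality, seconds_taken, threshold_seconds):
    """Return True if the child should attempt the next, more difficult shape.

    quality is the experimenter's 0-3 rating of the completed track.
    Assumption: finishing exactly at the threshold counts as within time.
    """
    if quality not in (0, 1, 2, 3):
        raise ValueError("quality must be 0, 1, 2 or 3")
    return quality >= 2 and seconds_taken <= threshold_seconds


# Threshold reported in the text for shape 1: 2 min 32 s.
SHAPE_1_THRESHOLD = 2 * 60 + 32
```

For instance, a child who built shape 1 at quality 2 in 140 s would move on to shape 2, whereas a child who reached quality 3 but needed 200 s would not.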

The video recordings of children’s attempts at the shape of the right level of challenge were analysed for their on-task self-regulation and metacognition on the basis of the C.Ind.Le coding scheme (Whitebread et al., 2009). This observation coding scheme allows for identification of self-regulation behaviours indicative of the following three areas which include nine facets:

  A. Metacognitive knowledge
    1. metacognitive knowledge of persons: knowledge in relation to cognition or people as cognitive processors
    2. metacognitive knowledge of tasks: expression of one’s long-term memory in relation to elements of the task
    3. metacognitive knowledge of strategies: knowledge in relation to strategies used in performing a cognitive task

  B. Metacognitive regulation
    4. metacognitive regulation – planning: verbalisation or behaviour about the selection of procedures necessary for a task
    5. metacognitive regulation – monitoring: ongoing on-task assessment
    6. metacognitive regulation – control: change in the way a task had been conducted
    7. metacognitive regulation – evaluation: reviewing and evaluating task performance

  C. Emotional/motivational regulation
    8. emotional and motivational regulation – monitoring: assessment of current emotional and motivational experiences during a task
    9. emotional and motivational regulation – control: regulation of one’s emotional and motivational experiences while on task

We note here that the first three behaviours (behaviours 1–3) are metacognitive knowledge behaviours observed during on-task activity. In the discussion of our results, we will compare these behaviours with the measures of metacognitive knowledge at interview, which we also collected in this study.

We followed a micro-genetic, utterance-level coding procedure (as adopted in relevant research, e.g., Neale & Whitebread, 2019; Zachariou & Whitebread, 2019, 2022). This means that a research assistant coded every utterance or behaviour that indicated self-regulation, and categorised it as one of the behaviours listed above. Upon completion of coding, we calculated the rate of each self-regulation behaviour for each child by dividing the number of its incidents by the total duration of the task. For example, if a child took 2.5 min to complete the task and engaged in five planning behaviours during the task, the rate of planning behaviours would be 5/2.5 min, which equals a rate of 2 planning behaviours per minute. We also calculated the rate for each area of self-regulatory behaviour by calculating the sum of all self-regulatory facets within that area (e.g., for metacognitive knowledge: the sum of metacognitive knowledge of people, tasks and strategies). The overall rate of on-task self-regulation was the sum of all self-regulatory areas.
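The rate calculation described above can be illustrated with a short sketch. The facet keys, the `AREAS` mapping and the function name `behaviour_rates` are hypothetical labels chosen for this example and are not taken from the C.Ind.Le materials; the arithmetic follows the text (incidents divided by task duration, area rate as the sum of its facet rates, overall rate as the sum of the area rates).

```python
# Hypothetical labels for the nine facets, grouped into the three areas.
AREAS = {
    "metacognitive_knowledge": ("persons", "tasks", "strategies"),
    "metacognitive_regulation": ("planning", "monitoring", "control", "evaluation"),
    "emotional_motivational_regulation": ("em_monitoring", "em_control"),
}


def behaviour_rates(counts, duration_minutes):
    """Return (facet_rates, area_rates, overall_rate) in behaviours per minute.

    counts maps a facet label to the number of coded incidents; facets that
    were never observed may be omitted and are treated as zero.
    """
    if duration_minutes <= 0:
        raise ValueError("task duration must be positive")
    facet_rates = {
        facet: counts.get(facet, 0) / duration_minutes
        for facets in AREAS.values()
        for facet in facets
    }
    # Area rate: sum of the rates of the facets belonging to that area.
    area_rates = {
        area: sum(facet_rates[facet] for facet in facets)
        for area, facets in AREAS.items()
    }
    overall_rate = sum(area_rates.values())
    return facet_rates, area_rates, overall_rate
```

With the example from the text, `behaviour_rates({"planning": 5}, 2.5)` gives a planning rate of 2.0 behaviours per minute.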

Metacognitive knowledge at interview

At the end of the Train Track Task, the children took part in a Metacognitive Knowledge Interview. We chose to include this additional task explicitly targeting the area of children’s metacognitive knowledge because, in the literature, there have been concerns that children’s metacognitive knowledge abilities are underestimated. Of the three areas of self-regulation described by Whitebread et al. (2009), metacognitive regulation and emotional/motivational regulation can be identified through both verbal and non-verbal behaviours. In contrast, identification of metacognitive knowledge on task relies only on children’s verbalisations. This is a limitation of most on-task coding schemes assessing metacognitive knowledge, since verbalisations are not necessarily evident while on task, nor is relying on them a developmentally appropriate way to assess metacognitive knowledge (e.g., see Marulis & Nelson, 2021; Robson & Zachariou, 2022). To overcome this issue, we adopted the Metacognitive Knowledge Interview, which prompts the children to externalise their metacognitive knowledge and was inspired by the work reported in Marulis and Nelson (2021). The procedures followed in this interview are presented in Appendix 3.

Marulis and Nelson (2021) suggested that each interview question assessed either metacognitive knowledge of people, tasks or strategies, which is aligned with Whitebread et al.’s (2009) model of self-regulation. For example they suggested that the question ‘Do you think you did a good, okay or not so good job on the puzzles? Why?’ would elicit responses showing metacognitive knowledge of people. However, we found that children in our study would respond to any single question with responses that could indicate metacognitive knowledge of people (e.g., I am good at this), but they could also give responses indicating metacognitive knowledge of tasks (e.g., knowledge of the shape, I found it easy/difficult) or strategies (e.g., I looked at/copied the model, changed my strategy). Therefore, we coded each child’s response as to whether it was metacognitive knowledge of people, tasks or strategies, without deciding in advance what each question assesses.

Children’s responses were coded as a) no response, b) irrelevant response, c) inconsistent (saying something that was inaccurate, e.g., saying that they had separated curved pieces from straight pieces when we had not observed this), d) consistent but not showing metacognitive knowledge (e.g., first I put this, then this…), or e) showing metacognitive knowledge. If the response indicated metacognitive knowledge, we coded which of the following facets it indicated (metacognitive knowledge of 1. people, 2. tasks or 3. strategies) and counted how many pieces of evidence were present for each type of metacognitive knowledge. Figure 1 illustrates the coding path followed for each response to the metacognitive knowledge interview questions.

Fig. 1 Coding metacognitive knowledge responses to the interview questions

At the end of the interview each child’s score was the sum of metacognitive knowledge responses they gave (with separate counts for metacognitive knowledge of people, tasks or strategies), but children could also have a score of 0, if for example they gave all irrelevant responses.
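The scoring rule for the interview can be sketched as follows. The category labels, the data layout and the function name are assumptions introduced for illustration; only responses coded as showing metacognitive knowledge contribute to the score, as described above.

```python
FACETS = ("people", "tasks", "strategies")


def score_interview(coded_responses):
    """Sum pieces of metacognitive knowledge evidence per facet.

    Each coded response is a (category, evidence) pair. The category is
    one of 'none', 'irrelevant', 'inconsistent', 'consistent' or 'mk'
    (hypothetical labels for codes a to e), and evidence maps a facet to
    the number of pieces of evidence coded for it.
    """
    totals = {facet: 0 for facet in FACETS}
    for category, evidence in coded_responses:
        if category != "mk":
            continue  # codes a to d add nothing to the score
        for facet, pieces in evidence.items():
            if facet not in FACETS:
                raise ValueError(f"unknown facet: {facet}")
            totals[facet] += pieces
    totals["total"] = sum(totals[facet] for facet in FACETS)
    return totals


# Hypothetical child: one irrelevant answer, one answer showing
# metacognitive knowledge, one consistent but non-metacognitive answer.
child = [
    ("irrelevant", {}),
    ("mk", {"strategies": 2, "tasks": 1}),
    ("consistent", {}),
]
print(score_interview(child))
# {'people': 0, 'tasks': 1, 'strategies': 2, 'total': 3}
```

A child who gives only irrelevant responses receives a total of 0 under this rule.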

Coding on-task self-regulation and metacognition, and metacognitive knowledge at interview

A research assistant (RA) coded all the observations and interviews. The RA has a background in Education, an MA degree in Primary Education and is a doctoral researcher in Psychology and Education, with more than three years of practice in the area. The RA received training on the coding scheme and was blind to the study’s aim, hypotheses and children’s assignment to the control or intervention group. 15% of the observations were coded by a second observer (the study’s first author, an expert in the area, who holds a PhD in Education and Psychology and has worked in the area both as a researcher and practitioner for more than ten years) for inter-rater reliability purposes. We went through each of the observations, and compared whether the two coders identified the same utterances of self-regulation for all facets of self-regulation. The intraclass correlation coefficient was .93 which indicates a high consistency between the two observers.

Teacher-reported self-regulation

At Times 1 and 2 the children’s teachers were asked to complete the CHecklist of Independent Learning Development 3–5 (CHILD 3–5) for each child. This is an observational instrument that is used by class-teachers to assess children’s self-regulation and metacognitive ability, after observing children’s daily behaviours. The teachers score 22 statements describing self-regulation behaviours for each child in the A. prosocial (e.g., child shares and takes turns independently), B. emotional (e.g., tackles new tasks confidently), C. cognitive (e.g., is aware of own strengths and weaknesses) and D. motivational (e.g., initiating activities) areas of self-regulation. The teachers attended a one-hour session in which they were introduced to the concept of self-regulation, and underwent training through the C.Ind.Le package (Whitebread et al., 2009) on how to complete the CHILD checklist. For the control group, the class teachers completed these checklists at Times 1 and 2. For the intervention group, the music teacher completed these forms, due to the unavailability of the class teachers.

To derive the final assessment results for each child, each statement on the CHILD checklist was scored from 0, standing for ‘never’, to 3, standing for ‘always’, which was the procedure most recently followed for CHILD (Whitebread et al., 2011; Zachariou & Whitebread, 2015). Following this, an average score for each area of self-regulation was calculated and the mean score of self-regulation was the average of all four aspects.
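As an illustration of this scoring, the sketch below averages the 0 to 3 ratings within each area and then averages the four area means. The ratings and the number of statements per area are hypothetical; the allocation of the 22 statements to areas is not reproduced here.

```python
from statistics import mean


def child_scores(ratings_by_area):
    """Average the 0-3 ratings within each area, then average the areas."""
    area_means = {}
    for area, ratings in ratings_by_area.items():
        if any(r not in (0, 1, 2, 3) for r in ratings):
            raise ValueError("ratings must be integers from 0 to 3")
        area_means[area] = mean(ratings)
    overall = mean(list(area_means.values()))
    return area_means, overall


# Hypothetical ratings for one child (0 = never, 3 = always).
ratings = {
    "prosocial": [3, 2, 3, 2],
    "emotional": [2, 2],
    "cognitive": [1, 2, 3],
    "motivational": [3, 3],
}
area_means, overall = child_scores(ratings)
print(area_means["prosocial"])  # 2.5
print(overall)                  # 2.375
```

Averaging the area means gives each area equal weight in the overall score, regardless of how many statements it contains.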

The psychometric structure and the external validity of this instrument have already been endorsed in a variety of contexts. The CHILD has achieved high levels of internal consistency (Cronbach’s alpha = .97) amongst its 22 statements and provided high inter-rater reliability (level of agreement = 85.9%) (Whitebread et al., 2009). In the present study, internal consistency coefficients were also excellent (Pre-test α control = .96; α intervention = .87; Post-test α control = .97; α intervention = .91) and test–retest reliability of the CHILD checklist over 16 weeks was also high (ICC control = .91; ICC intervention = .79).
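For readers unfamiliar with the internal consistency coefficient, a minimal sketch of the standard Cronbach’s alpha formula is given below. The data are hypothetical and the function is illustrative only; it is not the computation used to obtain the coefficients reported above.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three children rated on two perfectly consistent items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```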

Analytic plan

A series of 2 (control vs intervention group) X 2 (pre-post tests) repeated measures ANCOVA (R-ANCOVA) were conducted separately for each of the three self-regulation and metacognition measures and their areas/facets:

  1. On-task self-regulation (3 areas: A. metacognitive knowledge; B. metacognitive regulation; C. emotional/motivational regulation);

  2. Child’s metacognitive knowledge at interview following train track tasks (3 facets: 1. metacognitive knowledge of people, 2. tasks and 3. strategies); and

  3. Teacher-reported self-regulation, i.e., child’s self-regulation reported by their teachers through the CHILD checklist (4 areas: A. prosocial, B. emotional, C. cognitive and D. motivational).

In all analyses, children’s age in months, gender, and parents’ socioeconomic status were added as covariates.
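The logic of the group × time interaction at the heart of these analyses can be illustrated with a simplified sketch. In a 2 × 2 design without covariates, the interaction contrast equals the difference between the two groups’ mean pre-to-post gains. The code below computes only that contrast; it deliberately omits the covariate adjustment and significance testing of the R-ANCOVA, and all scores and names are hypothetical.

```python
from statistics import mean


def mean_gain(pre, post):
    """Mean post-minus-pre change for one group of children."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return mean([b - a for a, b in zip(pre, post)])


def interaction_contrast(control, intervention):
    """Difference between the groups' mean gains (intervention - control)."""
    return mean_gain(*intervention) - mean_gain(*control)


# Hypothetical (pre, post) scores for three children per group.
control = ([2, 3, 4], [2, 3, 4])        # no change
intervention = ([2, 3, 4], [4, 5, 6])   # each child gains 2
print(interaction_contrast(control, intervention))  # 2
```

A contrast near zero would indicate that both groups changed by a similar amount, that is, no intervention effect on that measure.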

Results

Preliminary analyses

Screening for assumptions of independence and multivariate normality using Mahalanobis distance on all of the dependent, independent and covariate variables revealed that 6 participants had a standardized Mahalanobis distance coefficient above 3 SD. These participants were removed from the analyses. Following this removal, the assumption of normality was met as evidenced by probability plots of residuals approximating normality and further analyses of residuals. Assumptions of independence and non-multicollinearity were also met. Homogeneity of variance was investigated for each analysis separately. When unequal variances (sphericity) were found, Greenhouse–Geisser correction was used.

Baseline measures of self-regulation and metacognition

Independent sample t-tests revealed that the control and intervention groups did not differ significantly at baseline for any of the on-task self-regulation measures. The two groups were also found to be equal at baseline in terms of all the metacognitive knowledge measures at interview. Finally, all of the scores differed at baseline for the teacher-reported self-regulation measures, with the intervention group scoring systematically lower than the control group. Table 1 presents the sample characteristics and t-tests for the self-regulation measures at baseline and Table 2 shows the Pearson’s correlations between all of the variables included in this study.

Table 1 Mean, standard deviation and independent-sample t-tests between the variables at baseline
Table 2 Zero order correlations between all of the variables included in this study

Main results

All of the results presented are controlling for the effects of intervention group allocation (intervention versus control), age, gender and SES.

On-task self-regulation

Box’s tests of equality of covariance revealed that the variances were equal across groups for the on-task self-regulation measure. Results of the multivariate analysis showed no significant differences in on-task self-regulation as a whole between pre- and post-tests, no difference between groups, and no interaction effect. Therefore, there was no intervention effect. However, SES was a significant predictor of on-task self-regulation changes between pre- and post-tests, F(1, 93) = 5.48, p = .02, η2 = .06. Panel A of Fig. 2 shows that, while controlling for age, gender and experimental grouping, children whose parents scored lower on SES (split at the standardized mean; n = 43) displayed a slight increase in self-regulation between pre- and post-test, whereas children whose parents scored higher in SES (n = 55) remained more stable in self-regulation over time.

Fig. 2 Significant interaction effects between the observational measures of on-task self-regulation at pre- and post-test, SES and Age

Area A. Metacognitive knowledge. Variances were not equal between the control and the intervention groups, Box’s F(3, 13886829) = 4.97, p = .002. There were no observed effects of time, group, interaction with the groups, or any of the covariates on this area of on-task self-regulation.

Area B. Metacognitive regulation. Box’s tests of equality of covariance indicated that the variances were equal across groups for metacognitive regulation. Results did not reveal a main effect of time. There was no interaction between time and group. Results showed an interaction between age and metacognitive regulation, F(1, 93) = 8.27, p = .005, η2p = .08. In order to examine age effects, age was standardized and then children were split into two groups (younger than average (n = 50) and older than average (n = 48)). Panel B of Fig. 2 shows that, for the lower age group, metacognitive regulation decreased slightly between pre- and post-test, whereas for older children, it increased. No other effects were found for this variable.

Area C. Emotional/Motivational regulation. Variances were found to be equal across groups. Multivariate effects showed no effect of time, group or interaction between group and time on emotional/motivational regulation. An interaction effect was found between SES and emotional/motivational regulation, F(1, 93) = 6.01, p = .02, η2p = .06. Panel C of Fig. 2 shows that while children on the low SES level showed an increase in emotional/motivational regulation, children on the higher end of SES displayed a slight decrease in emotional/motivational regulation between pre- and post-test. While SES effects are not related to the intervention, they indicate the importance of low SES children attending primary school, since children who start school at lower levels of self-regulation make more significant gains during their first months at school, compared to children of higher SES.
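The standardise-and-split step used to display the age and SES effects can be sketched as follows. The ages are hypothetical, and assigning a value exactly at the mean to the higher group is an assumption made for this illustration.

```python
from statistics import mean, stdev


def standardise(values):
    """Convert raw values to z-scores using the sample mean and SD."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def split_at_mean(values):
    """Label each value 'lower' (z < 0) or 'higher' (z >= 0)."""
    return ["lower" if z < 0 else "higher" for z in standardise(values)]


ages_months = [60, 62, 66, 70, 72]  # hypothetical ages; the mean is 66
print(split_at_mean(ages_months))
# ['lower', 'lower', 'higher', 'higher', 'higher']
```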

In sum, we did not find any interaction effects between group and time on any of the on-task self-regulation observational measures and its three specific areas. Therefore there was no indication of an intervention effect on on-task self-regulation and its three areas.

Children’s metacognitive knowledge at interview following train track tasks

Box’s test of equality of variance was not significant, meaning that the variances were equal between groups. We did not find a main effect of time. However, results revealed a significant interaction between time and group, F(1, 93) = 18.24, p < .001, η2p = .16. Panel A of Fig. 3 illustrates this finding. As can be seen, whereas the control group remained relatively stable between pre- and post-test, the intervention group showed a significant increase in metacognitive knowledge following the intervention.
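As a reading aid, partial eta squared can be recovered from a reported F statistic and its degrees of freedom using the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the interaction reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Time x group interaction for metacognitive knowledge at interview.
print(round(partial_eta_squared(18.24, 1, 93), 2))  # 0.16
```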

Fig. 3 Interaction between pre- and post-test scores on metacognitive knowledge at interview, and group assignment (Control vs Intervention)

Facet 1. Metacognitive knowledge of people. Variances were shown to be equal across groups. There were no significant effects of time, group or covariates on the metacognitive knowledge of people facet of the interviews.

Facet 2. Metacognitive knowledge of tasks. The variances being equal between groups, results revealed no effect of time, group or covariates on metacognitive knowledge of tasks.

Facet 3. Metacognitive knowledge of strategies. Results have shown that variances were not equal between the two groups, F(3, 13886829) = 4.89, p = .002. Multivariate effects revealed no main effect of time but a significant interaction between group and time on metacognitive knowledge of strategies, F(1, 93) = 21.44, p < .001, η2p = .19. Panel B of Fig. 3 reveals that the intervention group displays a significant increase in metacognitive knowledge of strategies between measures at pre- and post-tests, but the control group remained relatively stable between the two time points.

Teacher-reported self-regulation

We found unequal variances between the two groups, F(3, 13886829) = 9.19, p < .001. Multivariate analyses did not find a main effect of time on the teacher-reported self-regulation measure. However, we found a significant interaction effect between group and time, with the intervention group showing a significantly greater increase in teacher-reported self-regulation between pre- and post-tests, F(1, 93) = 127.09, p < .001, η2p = .58. Panel A of Fig. 4 illustrates the results. This figure shows that while in the control group the measure of teacher-reported self-regulation remained stable between pre- and post-tests, it significantly increased in the intervention group. None of the covariates were significant.

Fig. 4 Interaction between pre- and post-test scores on teacher-reported self-regulation, and group assignment (Control vs Intervention)

Area A. Prosocial self-regulation. Box’s test of equality of variance revealed that the variances were unequal between groups, F(3, 13886829) = 6.69, p < .001. Results showed no effect of time or the covariates on prosocial regulation. However, grouping significantly predicted an increase in prosocial regulation between pre- and post-test, F(1, 93) = 59.64, p < .001, η2p = .39. While the control group remained stable between pre- and post-tests, prosocial self-regulation of the intervention group increased significantly at post-test as compared to pre-test. Panel B of Fig. 4 shows this result, which goes in the same direction as the overall measure of teacher-reported self-regulation.

Area B. Emotional self-regulation. Variances were not equal between groups, F(3, 13886829) = 3.55, p = .01. While no main effect of time was found, we found a significant effect of group in predicting change in emotional self-regulation between pre- and post-tests, F(1, 93) = 96.84, p < .001, η2p = .51. Emotional self-regulation increased significantly at post-test as compared to pre-test for the intervention group but remained relatively stable over time for the control group. Panel C of Fig. 4 illustrates the results that show an increase in emotional self-regulation only for the intervention group. The covariates were not significant predictors in this model.

Area C. Cognitive self-regulation. Box’s test of equality of variances was significant, F(3, 13886829) = 10.34, p < .001. Results revealed that time alone, or the covariates, were not significant predictors of cognitive self-regulation as reported by teachers. However, we found a significant effect of group, with the intervention group displaying a significant increase in cognitive self-regulation between pre- and post-tests, F(1, 93) = 75.68, p < .001, η2p = .45. The results are displayed in Panel D of Fig. 4, which shows that only the intervention group had a significant increase in cognitive self-regulation between pre- and post-test.

Area D. Motivational self-regulation. Box’s test of equality of variance indicated that the variances were not equal between groups, F(3, 13886829) = 13.88, p < .001. Results showed no effect of time or the covariates on motivational regulation. However, grouping significantly predicted an increase in motivational self-regulation, F(1, 93) = 105.99, p < .001, η2p = .53. While the control group remained stable between pre- and post-tests, motivational self-regulation of the intervention group increased significantly between pre- and post-test. Panel E of Fig. 4 shows that children’s motivational self-regulation as reported by teachers remained relatively stable between times of measurement for the control group whereas it significantly increased for children in the intervention group.

Discussion

Our results contribute two key findings, namely that (a) when measuring children’s on-task self-regulation and metacognition (including metacognitive knowledge, metacognitive regulation and emotional/motivational regulation), the musical play intervention did not appear to have an effect, although according to the measure of children’s teacher-rated self-regulation and metacognition, the intervention had a positive effect on self-regulation, and (b) when specifically measuring children’s metacognitive knowledge at interview after the task, the musical play intervention did have a positive effect on it.

The intervention had a positive effect on children’s self-regulation as reported by their teachers, but no effect when on-task self-regulation was measured

The results relating to children’s on-task self-regulation differed from the results relating to teacher-reported self-regulation. On the one hand, when looking at children’s self-regulation on task through an observational measure, there were no significant intervention effects. This was also the case when looking at the three areas of on-task self-regulation separately, namely metacognitive knowledge, metacognitive regulation and emotional/motivational regulation. This was an unexpected result since, based on previous studies (Williams & Berthelsen, 2019; Winsler et al., 2011), a positive effect of the musical play intervention was expected on children’s self-regulation. One possible explanation for the differing results could be differences in the length of engagement with musical play. Even though the length of our intervention (520 min of musical play) was comparable to the length of Williams and Berthelsen’s (2019) intervention (480 min), Winsler et al. (2011) had focused on children who had participated in music and movement classes for, on average, 10 months. It could also be that both these other studies had a specific focus on rhythmic movement that led to their results. Alternatively, the difference in results may be due to the different tasks that were used to measure self-regulation in this study compared with previous studies. Therefore, future research could look at whether extending the length of the intervention, focusing more on the movement element and perhaps even intentionally accentuating the self-regulatory elements of musical play could lead to positive effects on children’s self-regulation.

On the other hand, according to the teacher-reported measure of self-regulation, the intervention group’s self-regulation improved significantly more compared to the control group, both for overall self-regulation and for each area (prosocial, emotional, cognitive, motivational). This result is in line with Williams and Berthelsen’s (2019) results, which also used a teacher-reported measurement of self-regulation, and reported positive intervention effects for emotional self-regulation for all children and for cognitive and behavioural regulation for one of the three intervention sites.

This result raises two interesting points. First, it could be due to one of our study’s limitations: for the intervention group, the CHILD checklists were filled in by the teacher who delivered the intervention (i.e., not the class teacher, as was the case for the control group). This highlights the importance of blind assessments, such as the coding that was done for TTT. Future studies could also aim to always have the class teacher (or another teacher not involved in the intervention) complete assessments for the children. Second, and in contrast to the previous point, could it be that the teacher-report observational instrument was more sensitive than the on-task observational instrument? The music teacher was actually observing the children while engaged in the musical play tasks, so she might have been able to capture improvements that were not captured by the train track task, which was of a different nature compared to the musical play tasks. Therefore, we could assume that children were not able to or did not have the time to transfer their self-regulation development from the musical play context to the train track task’s requirements. If so, then this would explain why teacher-reported self-regulation seems to have been affected by the intervention, whereas on-task measures do not suggest this.

The results of our study are insightful in that they illustrate how teacher-reported data and observational data can often be misaligned. Self- or other-reported data not agreeing with observational data has often been identified as a pertinent issue in self-regulation research (see for example Azevedo, 2009; Perry, 2019). The discrepancy between teachers’ and task’s results also highlights how results from previous musical play intervention studies suggesting positive effects of musical play on self-regulation but entirely based on teacher-reported assessments might be misleading. The key point emerging from our research is that teacher-reported or on-task observational data alone are not sufficient when looking at self-regulation, and that a combination of instruments should be employed to triangulate results and ensure adequate levels of validity in studies.

The intervention had a positive effect on children’s metacognitive knowledge at interview

When looking at children’s metacognitive knowledge during an interview following the TTT task, the results revealed a significant intervention effect. Children who participated in the intervention showed a significantly steeper increase from pre- to post-test in their metacognitive knowledge during interview, compared to children in the control group. More specifically, we explored the separate facets of metacognitive knowledge and found that the intervention had a significant effect on children’s metacognitive knowledge of strategies. This means that the children who participated in the intervention improved significantly more in their knowledge of the strategies used in performing a cognitive task. Compared to children in the control group, after the intervention, children in the intervention group were better at defining, explaining or teaching others how they had done or learned something, explaining procedures involved in a task, and evaluating the effectiveness of one or more strategies (Whitebread et al., 2009). This result is consistent with Dignath et al.’s (2008) argument that children in the first years of primary school benefit more in the areas of strategy use when involved in interventions. This reveals that our choice of working with young primary school children was successful and future studies should consider focusing on this age group too.

Three points deserve note here. First, it is worth exploring why children’s metacognitive knowledge of strategies improved following the intervention. We could speculate that this was observed because musical play in groups requires interdependency (Zachariou & Whitebread, 2017), and thus requires children to share with the other children their strategies, to explain the procedures in their task (all these are aspects of metacognitive knowledge of strategies). Further research could investigate whether interdependency is the reason for improvement in metacognitive knowledge of strategies, by comparing activities that require interdependent work to ones that do not. Second, a methodological point should be considered. We assessed metacognitive knowledge observationally on task (see on-task self-regulation, area A: metacognitive knowledge) and at interview following the task. It could be argued that only the latter method was able to capture children’s metacognition. This shows the importance of supplementing observational measures of on-task performance with post-task interviews in which children are prompted to express their metacognitive knowledge, a conclusion aligned with recent research (Marulis & Nelson, 2021; Robson, 2016). The adoption of multiple different types of measurement helps combat the issue of underestimating children’s metacognitive knowledge abilities, by providing alternative, developmentally appropriate ways of assessing children’s metacognitive knowledge. Third, due to the quasi-experimental nature of the study and given that we could not control for every aspect of the environment, it is not possible to argue with certainty that the significant effect we captured is because of the intervention, as a variety of other extraneous variables (such as differences between schools or teaching styles) could have a role to play.

Conclusion

Our study is the first quasi-experimental study to explore the effects of a musical play intervention taking place in real-world primary school classrooms, using a unique combination of naturalistic self-regulation measures including 1) teacher-reported assessments, 2) a micro-genetic utterance-level coding procedure to code for children’s self-regulation on task and 3) a metacognitive knowledge interview following the task.

Our aim was to explore whether introducing a musical play intervention could have beneficial effects on young children’s metacognition and self-regulation (including metacognitive knowledge, metacognitive regulation and emotional/motivational regulation). Taking into consideration all the results of the present study (see Table 3 for a summary of results), there is sufficient supporting evidence to suggest that children’s metacognitive knowledge improved following the intervention (a result that is triangulated by two sources of evidence). With regard to whether self-regulation in general was improved, we can only make tentative claims since teacher-reported data reveal a positive effect, but on-task assessment of self-regulation does not corroborate this finding.

Table 3 Table summarising which self-regulation and metacognition measures indicated a significantly steeper improvement for the intervention group, and which area of self-regulation was improved

By using three different sources of data (on-task self-regulation, metacognitive knowledge at interview, and teacher-reported self-regulation) and reaching different results depending on the measurement tool, this study offers substantial insights for future studies and furthers the discussion about the importance of a variety of measures when assessing self-regulation and metacognition. Future studies focusing on young children’s self-regulation and metacognition should avoid basing their results solely on teacher-reported effects, on-task observational data or metacognitive knowledge interviews. A combination of the three approaches is more likely to provide a more accurate, less biased picture of the results and a comprehensive understanding of constructs, which in turn can help inform educational practice and policy. Finally, together with previous findings, our results may enable impactful shifts in practice and policy to reverse the recent side-lining of music and play from the curriculum.