Included Studies
The number of studies at each stage of the review is reported in Fig. 1.
The five studies included in the final review are summarised in Table 1. Inter-rater reliability for study inclusion/exclusion at the ‘Eligibility’ stage of the process was κ = 0.92 (95% CI 0.87–0.92).
Table 1 Studies included in the systematic review
Four of the included studies are published in peer-reviewed journals, and one is an as yet unpublished manuscript.
Quality Assessment
The quality assessments for different features of the studies in relation to our primary research question are summarised in Figs. 2 and 3. Figure 2 shows quality ratings by category from the EPHPP quality assessment tool.
Inter-rater reliability for quality assessment was κ = 0.69 (95% CI 0.45–0.93). Differences in ratings arose from differing interpretations of the studies (as opposed to different interpretation of the rating criteria), and most frequently occurred in the ‘bias’ rating category. Differences were resolved by discussion and with reference to the online documentation for EPHPP.
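For readers unfamiliar with the statistic, the sketch below shows how Cohen's κ corrects raw agreement between two raters for the agreement expected by chance. The ratings are hypothetical and invented purely for illustration; they are not the ratings from this review, and the function name `cohens_kappa` is our own. The labels follow the strong/moderate/weak scale of the EPHPP tool.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL ratings for eight rating decisions (not data from this review).
rater_a = ["strong", "moderate", "weak", "moderate", "weak", "strong", "moderate", "weak"]
rater_b = ["strong", "moderate", "weak", "weak", "weak", "strong", "moderate", "weak"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 7 of 8 decisions (87.5%), but κ is lower than the raw agreement because some of that agreement would be expected by chance alone.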
Figure 3 shows the number of studies reporting additional features considered important for quality: ethical procedures, conflict of interest (CoI) declarations, funding sources and study pre-registration.
Narrative Synthesis
Study Characteristics
Two studies were conducted in Australia, one in New Zealand, one in the UK and one in the USA. All studies except one were published in peer-reviewed journals (the Bundy and colleagues study was kindly provided to us as a manuscript in preparation).
Interventions
Three of the included studies examine LPP interventions which introduce recycled scrap materials to the playground (Bundy, Wyver, Naughton, Engelen, & Tranter, n.d.; Farmer et al., 2017; Hyndman et al., 2014b). The duration of this type of scrap intervention ranged from 7 weeks to 1 year. In the Farmer and colleagues study, LPP was one of a number of components in an intervention package designed to improve opportunities for risky and challenging play. One study implemented more traditional loose sports materials such as skipping ropes, balls and Frisbees, over a short period of 5 consecutive days (Barton et al., 2015). Finally, the study by Kuh and colleagues evaluated LPP as part of a much larger-scale ‘playscaping’ exercise, where the whole school grounds were transformed over 3 months in the summer (Kuh et al., 2013).
Participant Characteristics
Across all studies, participating children were aged between 4 and 12 years and attended mainstream schools. Ethnicity and SES data were not consistently reported by the studies, making it difficult to aggregate this information.
As evident from the quality assessment above, sampling considerations are important in intervention studies. The ‘target population’ for LPP interventions was not always easy to ascertain, leading to some inter-rater disagreement in the quality ratings. The Bundy study used a random sample of schools in a fairly broad geographical area, while in contrast the Kuh study randomly sampled participants from a single school.
Study Designs
In terms of study design, of the five included studies, two used a cluster-randomised design, one used a quasi-experimental design, and two used observational designs. Both the study by Bundy and colleagues (2016) and the study by Farmer and colleagues (2017) adopt ‘cluster-randomised’ designs, where the random allocation of participants to intervention or control group occurred at the level of the school, rather than individual children. The Hyndman, Benson, Ullah and Telford study used a quasi-experimental design, with an intervention group and a matched control group (Hyndman et al., 2014b). The remaining two studies used observational designs (Barton et al., 2015; Kuh et al., 2013), meaning that baseline and post-intervention measurements were taken but control groups were not used.
Measures
No two studies shared an outcome measurement tool, although some methodological approaches were common, with 3 studies using video coding of observations and 4 studies using questionnaires. All but one study investigated outcomes associated with aspects of social development; examples include co-operative play, prosocial behaviour, experience of bullying and psychosocial quality of life. Emotional outcomes were measured in 3 studies: self-esteem in the Barton and colleagues study, enjoyment in the Hyndman and colleagues study, and happiness at school in the Farmer and colleagues study. No study included in the review used assessment-based indicators of cognitive or academic outcomes, although the Bundy and colleagues paper does contain ratings of self- and teacher-perceived academic competence. We now describe these outcome measures in detail for each study before going on to summarise findings.
The Barton and colleagues study (Barton et al., 2015) investigated effects of the introduction of loose sports equipment on children’s physical activity (PA) and self-esteem. Self-esteem (SE) was measured at baseline and post-intervention using the well-established 10-item Rosenberg SE self-report questionnaire (Rosenberg, 1965). The authors report good test–retest reliability (rs ranging from 0.82 to 0.99) and good internal consistency (Cronbach’s alpha ranging from 0.77 to 0.88) for previous datasets, although not for the sample in their study.
The paper by Hyndman and colleagues included in this synthesis (Hyndman et al., 2014b) reports on two measures which relate to outcomes other than PA (the study’s primary outcome measure): (1) the Pediatric Quality of Life Inventory 4.0 (QoL, Varni & Limbers, 2009), including a sub-scale which focuses on psychosocial development, and (2) the Lunchtime Enjoyment of Activity and Play (LEAP) Questionnaire (Hyndman, Telford, Finch, Ullah, & Benson, 2013), which aims to measure children’s enjoyment of physical, interpersonal (i.e. social) and intra-personal (i.e. individual) aspects of play. The authors state that both measures have good reliability and validity, citing a validation paper as evidence, but they do not report the co-efficients directly, either from that paper or for the dataset under analysis. These measures were completed with a subset of the main sample, composed of those children aged 8–12 years. Presumably, this decision was taken due to practical difficulties in using self-report questionnaires with younger children.
The study reported by Farmer and colleagues used the well-established Peer Relations Questionnaire Revised (PRAQ-R). They used a multi-informant approach, with a different number of questions per category of respondent: child (10 items), parent (3 items) and teacher (8 items). Reliability and validity for the subset of questions adopted for the study are not reported. Outcomes were analysed on an item-by-item basis, rather than using scales summing across all items.
The most comprehensive set of quantitative indicators from an included study is to be found in the unpublished manuscript supplied to us by Bundy and colleagues. This study used a combination of systematic video coding, child self-report and teacher report to measure outcomes related to social and emotional development. Video recordings were taken for 15 min each day during the intervention period. An independent researcher (unaware of the study hypotheses) used the footage to note and quantify pre-specified social and play behaviours. The coding scheme used is not reported in detail; however, the authors state that the behaviours of interest were ‘categories of play and non-play, as well as quantification of social interactions’. A third of the video sample was coded for inter-rater reliability; no specific reliability co-efficient is reported, but the authors report agreement was ‘almost perfect.’
Children’s self-perceptions of their competence in physical and academic domains, together with their perceptions of social acceptance by peers and caregivers, were measured using the Harter and Pike Pictorial Scale of Perceived Competence and Social Acceptance for Young Children (PSPCSAYC) (Harter & Pike, 1984). This measure asks children to report their own assessment of their skills in these domains, using a series of pictorial prompts. The authors report ‘reliability between 0.75 and 0.89’; we assume this refers to the internal consistency of the scale in previous studies although this is not explicitly stated. Social skills were also assessed via the Social Skills Improvement System Rating Scale (SSIS-RS, Gresham & Elliott, 2008), which is a parent or teacher questionnaire used in the assessment of children’s social development. Again, good reliability and validity information are available from the cited study.
The study by Kuh et al. (2013) used a mixed methods approach, where data from systematic observations were combined with field notes and semi-structured interviews with children. Observers were trained to observe children’s behaviour live on the playground and to record the nature and duration of play activities at timed intervals of 30 s. Inter-rater reliability is reported as κ = 0.78, although it is not clear if this was for the observer training data or the study data. These frequency data were combined with field notes on play narratives and with comments from the children to facilitate interpretation. The pre- and post-intervention measurements were taken on a randomly picked sample each time and therefore represent changes in group behaviour, rather than changes at the individual level.
Findings
Having summarised some of the methodological approaches used, we now report the findings. Studies showed good awareness of potential confounding variables, and all studies included measures to control for the effects of at least some of the following: age, gender, SES and baseline scores. Differences in the playground environmental context were accounted for statistically in one study only, and then only in terms of the space available per child. Other studies reported differences in playground type between activities, but these were either not controlled or were part of the intervention ethos itself.
Regarding social outcomes, for the studies with the most robust designs, few statistically significant intervention effects were observed. In the Bundy et al. study, null findings were reported for engagement in play (β = 11.8, 95% CI −1.3 to 24.8, p = 0.08, d = 0.27), self-rated peer social competence (β = −0.13, 95% CI −0.29 to −0.28, p = 0.11) and teacher-rated social skills (β = −1.15 to 2.96, p = 0.1–0.4). For the Farmer et al. study, null findings were reported as follows. Child: Liking classmates at 1 year (Odds Ratio (OR) = 0.83, 95% CI 0.64–1.07, p = 0.15), Liking playtime (OR = 1.06, 95% CI 0.75–1.50, p = 0.76), Playing with others at 2 years (OR = 1.00, 95% CI 0.59–1.70, p = 0.99), Liking school (OR = 0.65, 95% CI 0.41–1.03, p = 0.07), Verbal abuse at playtime (OR = 0.97, 95% CI 0.73–1.29, p = 0.82), Exclusion during playtime (OR = 1.14, 95% CI 0.69–1.90, p = 0.61), Being told off by a teacher (OR = 1.18, 95% CI 0.80–1.75, p = 0.40) and Reporting bullying at 1 year (OR = 1.07, 95% CI 0.76–1.50, p = 0.72). Parent: Child upset by bullying at school (OR = 1.27, 95% CI 0.82–1.97, p = 0.29) and Child has been bullied (OR = 1.60, 95% CI 0.97–2.65, p = 0.07). Teacher: Again, mostly null findings were reported, including Frequency of reported bullying (OR = 0.01, 95% CI −0.15 to 0.16), School safety (OR = 0.12, 95% CI −0.07 to 0.30, p = 0.19) and Name-calling (OR = 0.08, 95% CI 0.13–0.29, p = 0.43), amongst others.
Likewise, for the quasi-experimental study no differences between intervention and control groups were found for the psychosocial QoL, nor for the interpersonal aspect of the LEAP questionnaire. In the observational study (Kuh et al., 2013), a significant increase in co-operative behaviour was observed after the implementation of the intervention.
Three statistically significant between-group differences in social outcomes were observed in the Farmer study: intervention children reported playing with more children at 1-year follow-up (OR = 1.66, 95% CI 1.29–2.15), reported being pushed/shoved more at 2-year follow-up (OR = 1.33, 95% CI 1.03–1.71), and were less likely to tell a teacher about bullying at 2 years (OR = 0.69, 95% CI 0.52–0.92). Corrections for multiple comparisons were not made in the statistical analysis reported.
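To illustrate why the absence of a correction matters, the sketch below backs out approximate two-sided p values from the odds ratios and 95% confidence intervals quoted above, using the standard normal approximation on the log-odds scale, and compares them with a Bonferroni-adjusted threshold. This is our own illustrative calculation rather than an analysis from the Farmer study. The number of comparisons (`N_COMPARISONS = 11`) is an assumption, chosen to match the child-reported comparisons cited in this section; the true number of tests in the original analysis may differ. Because the published intervals are rounded, the resulting p values are approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def p_from_or_ci(odds_ratio, lower, upper):
    """Approximate z and two-sided p from an odds ratio and its 95% CI."""
    # The CI is symmetric on the log scale, so its width gives the standard error.
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Odds ratios and 95% CIs as quoted in the text for the Farmer study.
reported = {
    "played with more children (1 year)": (1.66, 1.29, 2.15),
    "pushed/shoved more (2 years)": (1.33, 1.03, 1.71),
    "told a teacher about bullying (2 years)": (0.69, 0.52, 0.92),
}

N_COMPARISONS = 11  # ASSUMPTION: illustrative count, not stated in the source
ALPHA = 0.05
bonferroni_threshold = ALPHA / N_COMPARISONS

results = {label: p_from_or_ci(*values) for label, values in reported.items()}
survivors = [label for label, (_, p) in results.items() if p < bonferroni_threshold]

for label, (z, p) in results.items():
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
print(f"Bonferroni threshold: {bonferroni_threshold:.4f}")
print("Below threshold:", survivors)
```

Under these assumptions only the first finding falls below the adjusted threshold, which suggests the other two should be interpreted with particular caution.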
With reference to emotional outcomes, the single-group pre- and post-test study by Barton and colleagues did not find any significant changes in self-esteem in children exposed to LPP (mean change = 1.53, SD = 5.94) compared to an orienteering activity (mean change = 1.32, SD = 4.66). Hyndman and colleagues found a small effect of the LPP intervention on intra-personal enjoyment, which had increased at the 7-week time point (+0.24 adjusted mean change, 95% CI 0.004–0.48, p = 0.045). Meanwhile, Farmer and colleagues found higher odds of children in the intervention group being happy at school at 2 years (OR = 1.64, 95% CI 1.20–2.25).
For academic outcomes, the Bundy and colleagues study reported no statistically significant change in teacher-rated academic competence (t = 0.13, 95% CI −0.03 to 0.29, p = 0.11), with a similar outcome for self-ratings (co-efficients not reported in the paper).