Introduction

Rationale

Problems in effective parenting are increasingly seen as a significant public health issue [1], and public policy has come to reflect this. The Positive Parenting Program (Triple P) [2] is a multi-level behavioral family intervention which has been proposed [3, 4] and used [5, 6] in recent years on a whole-population basis as a public health intervention, in addition to its use on a more targeted basis. Many administrative entities (cities or counties) throughout the world have adopted, or are in the process of adopting, the program on a large scale, with substantial cost implications [7]. UK National Institute for Health and Clinical Excellence guidelines suggest Triple P is an effective educational intervention for parents of children with conduct disorder, a recommendation which carries considerable weight in policy and purchasing decisions in England [8].

The evidence base for Triple P appears to be extensive, with more than 200 publications and a large number of published randomized trials. There are four existing meta-analyses of the program [9–12], uniformly reporting positive effects on child behavior, but these reviews did not make systematic attempts to analyze risk of bias beyond the differing effect sizes attributable to different informants [9, 11]. Moderators of effectiveness, such as severity of presenting problems, intensity of intervention and age/gender of the child, were assessed in three reviews [9, 11, 12]. There is some doubt about the effectiveness of Triple P in deprived communities [11], with lone parents [13] and among younger children, and the overall impact at population level has not been examined in detail. Much of the published work is authored by affiliates of the Triple P organization, which raises questions about the independence of the evidence base.

We have used Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14] to examine reporting and other biases in a systematic way and to delineate any gaps in the evidence base supporting Triple P. We have focused on child-based outcomes in this review since the ultimate aim of parenting programs is to improve children's wellbeing.

Objectives

We examined the published data to:

  • Identify characteristics of the populations in which Triple P interventions have been subject to investigation

  • Clarify which comparison conditions were used in Triple P evaluations

  • Identify child-based outcome measures and which informants provided outcome data

  • Examine critically the design of studies in which comparisons with alternative interventions have been reported

  • Clarify any contribution of publication bias to the existing meta-analyses through examination of trial registry entries, funnel plots, and meta-regression approaches

  • Clarify any contribution of outcome reporting bias and selective reporting of results in article abstracts

Methods

Protocol and registration

We did not register the protocol for this review.

Eligibility criteria

Published articles were eligible for inclusion in the systematic review if any level of Triple P (or a precursor behavioral family intervention from the same group of authors) was used, any non-Triple-P comparison condition was employed, and a quantitative child-based outcome was reported. Criteria for the meta-analysis were more restrictive: eligible studies were randomized controlled trials (RCTs) reporting Child Behavior Checklist or Eyberg Child Behavior Inventory scores for intervention and comparison groups.

Journal articles published in English before September 2011 were eligible for inclusion. We also examined book chapters, whole books and electronic documents available locally and through the United Kingdom's inter-library loan system.

Information sources

We searched the PsycINFO (1970 to August 2011), Embase (1980 to August week 3, 2011) and Ovid Medline (1950 to August week 3, 2011) databases. We also included all journal articles, books and book chapters listed on the Parenting and Family Support Centre Triple-P database at the University of Queensland [15] (accessed 16 September 2009 and 29 August 2011) and relevant secondary references from the four available systematic reviews [9, 11, 12, 16].

Search

A search was carried out on 29 August 2011 using the following strategy:

Keywords = "bfi" or "Behav$ Family Intervention" or ["parenting" and "Triple"] or ["positive" and "parenting"].

Study selection

The study selection process is illustrated in Figure 1.

Figure 1

PRISMA diagram.

Screening:

In the first stage, papers were excluded on the basis of title alone if they were clearly not:

  • intervention studies, or

  • studies about the Triple-P parenting program or one of its precursors.

For the next stage papers were rejected which:

  • were not published in the English language

  • were not intervention studies

  • were not conducted using a comparison group

  • did not report a quantifiable child outcome.

In addition, review papers and book chapters which were clearly reviews were excluded.

Full documents were obtained for the remaining records.

Papers were rejected at this stage if they:

  • were not intervention studies

  • were not conducted using a non-Triple P comparison group

  • did not report a quantifiable child outcome

  • did not use Triple P or one of its precursors as an intervention

  • did not report original data.

Eligible papers were tabulated and used in the qualitative synthesis.

For the meta-analysis, papers from randomized trials reporting the two most commonly used outcome measures, the Intensity scale of the Eyberg Child Behavior Inventory (ECBI-I) [17] and the Externalizing Behavior subscale of the Child Behavior Checklist (CBCL) [18], were used. These are the outcome measures reported in other meta-analyses of Triple-P child-based outcomes and are applicable to children 2 to 16 years old. Other child-based outcomes (apart from the Problem subscale of the ECBI and the Internalizing subscale of the CBCL) were reported in too few studies to allow meaningful meta-analysis. Reductions in ECBI-I and CBCL scores represent improvement. Scores on the ECBI and CBCL subscales not reported here generally mirrored those that we have reported, but effect sizes were usually of lesser magnitude.

Data collection process

Data were collected, with permission, onto a form based on that used by the Scottish Intercollegiate Guidelines Network [19] (accessed 11 October 2012). For each paper, two of the authors completed the data collection form. If the authors disagreed, a third author adjudicated. As our analysis concerned only published data, we did not seek to obtain further data from investigators.

Data items

The following variables were assessed:

  • Numbers of patients or families included in the study

  • Main characteristics of the patient population

  • Nature of the intervention being investigated

  • Which outcomes were compared across groups

  • Nature of the control or comparison group

  • Length of follow-up

  • Nature of child-based outcome measure(s) used in the study

  • Which outcomes were reported in article Results and Abstract sections

  • Whether a principal outcome measure was pre-specified

  • Whether a power calculation was included

  • Whether the assignment of subjects to treatment groups was randomized

  • Whether an adequate concealment method was used (RCTs only)

  • Whether reporters of the child-based outcomes were blind to treatment allocation

  • Whether treatment and control groups were similar at baseline

  • Dropout rates for participants recruited into each arm of the study

  • Whether group differences were analyzed by intention to treat

  • Whether subgroup analyses were performed

  • Mean and standard deviation of post-intervention child-based outcome measures (for meta-analysis)

  • Whether a statement of study funding was included

  • Affiliations of authors

  • Whether a conflict of interest statement was included

  • Whether trials were registered with a public trials registry.

Risk of bias in individual studies

Outcome reporting bias within eligible studies was assessed qualitatively. Numerical summaries were made of the likelihood that statistically significant and non-significant results were equally reported in the Results and Abstract sections of published papers.

Summary measures

The effect size (ES) for each study included in the meta-analysis was estimated using the standardized mean difference (SMD), with post-intervention means and the pooled standard deviation. Hedges' g, under a random effects modelling approach, was used to obtain unbiased estimates of ESs. For studies with more than one treatment group, or with subgroups reported separately, an averaged effect was derived based on sample size, standard deviation and mean.
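As an illustration of this calculation (a minimal sketch of our own; the function name and the sign convention are assumptions rather than details taken from the reviewed trials), Hedges' g and its sampling variance can be derived from each study's post-intervention summary statistics:

```python
import math

def hedges_g(mean_c, sd_c, n_c, mean_t, sd_t, n_t):
    """Standardized mean difference (Hedges' g) from post-intervention
    summary statistics, with the small-sample bias correction.

    Sign convention (assumed here): reductions in ECBI-I/CBCL scores
    represent improvement, so the difference is taken as control minus
    treatment and a positive g favors the intervention.
    """
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = (mean_c - mean_t) / sd_pooled       # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)       # Hedges' small-sample correction
    g = j * d
    # Approximate sampling variance of g, used to weight the study
    var_g = j ** 2 * ((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    return g, var_g
```

The variance returned here supplies the inverse-variance weights on which the random effects synthesis described below operates.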

Synthesis of results

Both fixed and random effects models were generated, but the resulting models were very similar and only the random effects model is reported here. Random effects models assume that treatment effects may differ between studies, and this assumption has face validity given that treatment intensities and types of participants varied between studies.

Variation in SMDs attributable to heterogeneity was assessed with the I-squared statistic (that is, the percentage of the total variation across studies attributable to variability in the true treatment effect rather than to sampling variation).
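For reference, the I-squared statistic follows the standard Higgins and Thompson definition (our formula, not quoted from the included studies), computed from the heterogeneity chi-squared statistic Q and the number of studies k:

\[
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
\]

Applied to the main analysis reported below (Q = 60.16 with k - 1 = 22 degrees of freedom), this gives (60.16 - 22)/60.16 ≈ 63.4%, matching the quoted value.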

Risk of bias across studies

Publication bias was assessed with funnel plots, which illustrate the possibility of selective publication of small studies with positive results. Egger's regression-based adjustment method was applied to the data presented in the funnel plots.
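As a minimal sketch of this procedure (our own illustration, not the analysis code actually used; the function name is hypothetical and plain least squares stands in for whichever statistical package performed the test), Egger's method regresses each study's standard normal deviate on its precision and examines the intercept:

```python
import numpy as np

def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry.

    The standard normal deviate (effect / SE) is regressed on precision
    (1 / SE); the intercept estimates the bias ('small-study effect')
    coefficient. Returns the intercept, its standard error, and the
    t-statistic (compare with a t distribution on len(effects) - 2 d.f.).
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    snd = effects / ses                                   # standard normal deviates
    X = np.column_stack([np.ones_like(ses), 1.0 / ses])   # intercept, precision
    coeffs, *_ = np.linalg.lstsq(X, snd, rcond=None)      # ordinary least squares
    resid = snd - X @ coeffs
    dof = len(snd) - 2
    sigma2 = resid @ resid / dof                          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                 # coefficient covariance
    se_intercept = np.sqrt(cov[0, 0])
    return coeffs[0], se_intercept, coeffs[0] / se_intercept
```

A non-zero intercept indicates that small (large standard error) studies report systematically different effects from large ones, which is the asymmetry a funnel plot displays visually.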

Additional analyses

We planned sensitivity analyses in relation to authorship (Triple-P affiliated versus non-Triple-P affiliated). We also planned a subgroup analysis of data obtained on child behavior from sources other than the mother or principal carer (for example, fathers, teachers, independent observers).

In order to assess whether baseline symptom severity moderated treatment effects, we undertook a random effects meta-regression to investigate the association of baseline (pre-intervention) values with ES, examining only those studies which employed the most commonly used outcome measure, the ECBI Intensity scale score.
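The weighted least-squares core of such a meta-regression can be sketched as follows (a simplified illustration of our own: names are hypothetical, and the between-study variance tau2 is supplied as a fixed value rather than estimated iteratively, as full random effects meta-regression software would do). The covariate is centered on its mean, as in Figure 4:

```python
import numpy as np

def meta_regression(es, var_es, baseline, tau2=0.0):
    """Weighted least-squares meta-regression of effect size on a
    study-level covariate (here, mean baseline ECBI-I score).

    Weights are inverse total variances (within-study variance + tau2).
    Returns the coefficient estimates (intercept, slope per baseline
    point) and their approximate standard errors.
    """
    es = np.asarray(es, dtype=float)
    w = 1.0 / (np.asarray(var_es, dtype=float) + tau2)
    x = np.asarray(baseline, dtype=float)
    X = np.column_stack([np.ones_like(x), x - x.mean()])  # centered covariate
    XtW = X.T * w                                         # apply weights
    cov = np.linalg.inv(XtW @ X)                          # approx. covariance
    coeffs = cov @ (XtW @ es)                             # (X'WX)^-1 X'W y
    return coeffs, np.sqrt(np.diag(cov))
```

Under this parameterization the slope is the change in ES per one-point increase in mean baseline score; multiplied by ten, it corresponds to the ten-point association reported in the Results.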

Results

Study selection

The selection process is illustrated in Figure 1.

Study characteristics

The main characteristics of the studies are presented in Additional file 1. Most of the studies (26/33) used a waiting list control condition in which treatment was offered immediately after the post-intervention assessment. This design precluded control group follow-up beyond the end of the intervention. Comparisons of intervention and control groups beyond the duration of the intervention were only possible in five studies.

Risk of bias within studies

Data on the risk of bias of each study are presented in Table 1.

Table 1 Risk of bias in individual studies.

No studies were registered with national or international trials registries. All the studies apart from two [6, 20] used individual or cluster [5, 13, 21, 22] random assignment to the study groups, but the mechanism of randomization was reported in only a minority of studies. No papers reported a pre-specified principal outcome measure, and no power calculations based on specific outcome measures were reported. Four papers reported a power calculation based on a general ES [23–26]. All eligible papers appeared to be co-authored by a Triple-P affiliated author, apart from one [26], although it was difficult to establish affiliation in some cases. We were, therefore, not able to conduct sensitivity analyses in relation to authorship. Conflict of interest statements were found in two papers: one [13] where 'no conflict' was reported, and another [27] where royalty payments to authors were mentioned.

There is substantial risk of outcome reporting bias. Between papers, there is inconsistent reporting of subscale results within the Strengths and Difficulties Questionnaire, the ECBI, the Family Observation Schedule and the Developmental Behavior Checklist. In some papers all subscales are reported, in others only selected subscales, and in two [25, 28] no subscales are reported. Such selective reporting might increase the likelihood of presentation of findings supporting a favored hypothesis and the omission of less favorable analyses. Before- and after-intervention data from the intervention group were usually presented more prominently than between-group comparisons, and this often obscured interpretation of group effects. Of the 33 papers tabulated above, all except one [29] report at least one statistically significant positive child-based outcome for Triple P compared to the control condition in their Results sections, while 25/33 papers report at least one statistically non-significant result. Only 4/33 abstracts report any negative findings, whereas 32/33 report positive findings, so that abstracts tend to give a more favorable picture of the effects of Triple P interventions than is supported by the more detailed findings.

Risk of bias within studies - whole-population ('public health') interventions

Three whole-population studies met our inclusion criteria. The South Carolina study [30] was a well-designed cluster randomized trial, but the presentation did not comply with the recommended Consolidated Standards of Reporting Trials (CONSORT) format for the reporting of cluster randomized trials [31], making an accurate assessment of the implications of the paper difficult. Although it claimed to have achieved a reduction in the incidence of episodes of child maltreatment [5], it actually demonstrated an unexplained rise in reports in control areas rather than a drop in Triple P intervention sites. The description of the random allocation was poor, and the analysis was simplistic, being a two-sample t-test of county-wide measures. In particular, although some form of stratification or matching was used (it was not clear exactly how this had been done), there was no evidence that this had been accounted for in the analysis. For example, if counties were randomized within pairs, then the within-pair differences in the changes from baseline would have been of interest, but these were not reported. Therefore, although there are positive conclusions from this study, some doubt remains as to their validity.

There are two other whole-population Triple P evaluations involving a comparison group. Sanders et al. [6] reported a quasi-experimental study in parts of Brisbane, Sydney and Melbourne. There were substantial baseline differences between the intervention and control populations. Approximately 3,000 parents were interviewed before and after the intervention, but different samples were used in each data collection, so it is not possible to characterize changes in individuals over time. Results are reported only as the proportion of children with 'clinically elevated' scores rather than as mean or median results. Of seven possible child-based outcomes, the positive ones reported with this approach were in the emotional and total problems domains of the Strengths and Difficulties Questionnaire, although neither finding would have attained conventional levels of statistical significance had allowance been made for multiple comparisons. We consider that this study offers relatively little support for any effect of Triple P on children at the whole-population level.

Zubrick et al. [20] reported a further quasi-experimental study in two areas of Western Australia. There were again substantial differences in the characteristics of the intervention and control populations. Recruitment methods differed significantly between the two areas: in the intervention area parents volunteered for active participation, whereas in the control area parents volunteered to take part in a health services survey of child behavior. Analysis using hierarchical linear modelling suggests a short-term improvement in ECBI externalizing behavior scores, but given the potential for confounding by factors such as parental motivation, it is difficult to confirm that this difference is attributable to the intervention.

Results of individual studies

The 23 papers listed in Table 2 and associated data were used in the meta-analysis. These papers report the randomized trials in which the principal carer (usually the mother) of the index child returned ECBI or CBCL data before and after the intervention. Insufficient information is presented in most publications to allow use of intention-to-treat data, so non-imputed data for study completers only are used here.

Table 2 Papers included in the meta-analysis.

Only two studies [29, 32] compared a Triple-P intervention with an active comparison condition: a marital distress prevention program (Couples Coping Enhancement Training, n = 50 per group) [32] and standard dietary education (group n = 12 and 9) [29]. Neither study reported significant differences between the active intervention groups in terms of maternal or paternal reports of child behavior.

The whole population studies were excluded from the meta-analysis on the grounds of non-randomized design or the nature of the reported outcome measures.

Synthesis of results

The forest plot (Hedges' g) depicting the included studies (maternal report, ECBI-I or CBCL-E) is shown in Figure 2.

Figure 2

Forest plot (Hedges' g) of standardized mean differences. Studies reporting data based on ECBI or CBCL questionnaires completed by mothers are presented in increasing order of weight in the final estimate, based on sample size. CBCL, Child Behavior Checklist; ECBI, Eyberg Child Behavior Inventory.

For the (generally) maternally-reported ECBI-I and CBCL-E data, the summary ES was 0.61 (95% CI 0.42, 0.79) under a random-effects model with Hedges' correction. Thirteen of the studies showed a significant positive effect while ten did not, with most ESs falling in the range 0.3 to 1.0.

There is evidence of heterogeneity (chi-squared = 60.16, d.f. = 22, P < 0.001), with the variation in SMD attributable to heterogeneity (I-squared) = 63.4%. This level of heterogeneity indicates that there are significant differences between the studies which cannot be explained by random variation.

Risk of bias across studies

Publication bias was assessed by the use of a funnel plot, which illustrates the relationship between sample size and ES. Asymmetry arises when there is selective publication of small studies with positive results, since larger studies are more likely to be published regardless of ES. The results are shown in Figure 3.

Figure 3

Funnel plot for the random effects model (Hedges' g) based on maternally-reported ECBI-I or CBCL-E data. CBCL, Child Behavior Checklist; ECBI-I, Eyberg Child Behavior Inventory - Intensity scale.

Egger's test (regression of the standard normal deviate of the intervention effect estimate against its precision, the reciprocal of its standard error) yielded limited evidence of small-study effects (P = 0.067), with an estimated bias coefficient of 1.98.

Additional analysis

Sensitivity analysis in relation to author affiliation was not possible because of the small number of articles published without Triple-P affiliated authorship.

The meta-regression comparing baseline severity scores with ES for those studies which employed the ECBI-I outcome is shown in Figure 4. Ninety-four percent of the between-study variance is explained by the covariate mean baseline value: a ten-point increase in baseline ECBI-I score is associated with a 0.15 increase in ES (slope 0.015 per point; 95% CI: 0.005, 0.025). The mean baseline ECBI-I score was 132.7, and the summary ES for the studies included in this meta-regression was 0.65.

Figure 4

Bubble plot of standardized between-group mean difference (SMD, equivalent to effect size) against pre-intervention (baseline) pooled ECBI-I scores. The baseline ECBI-I scores are centered on the mean value across all included studies. The size of the circle represents the study sample size. ECBI-I, Eyberg Child Behavior Inventory - Intensity scale.

Summaries of child-based outcomes reported by informants other than the principal (usually maternal) carer are reported in Table 3.

Table 3 Child based outcomes reported by informants other than the child's mother.

Independent observers reported benefit attributable to Triple P on at least one subscale of an observational measure in two of seven papers in which these data are reported. Teachers reported benefit in one subscale score in one of four papers with relevant data. Seven papers yielded data on paternally reported ECBI-I or CBCL-E. Summary data are reported in Table 4.

Table 4 Papers giving paternally-reported ECBI Intensity scores.

There was strong evidence of heterogeneity (chi-squared = 29.72, d.f. = 5, P < 0.001), with a variation in SMD attributable to heterogeneity (I-squared) of 83%. The summary ES for the six studies for which data were presented was 0.42 (95% CI -0.02, 0.87) under a random-effects model. The remaining study, which reported non-significant results, could have influenced this estimate in either direction.

Discussion

There are a large number of published evaluations of Triple-P parenting interventions, and we were able to identify 33 English-language studies which measured a child-based outcome and compared Triple P interventions with a comparison condition. Most of the studies involved families who responded to media advertisements. These families clearly include children whose parents find their behavior difficult, but they may well not be typical of such families in the population: they are more likely to be motivated and literate, and are sufficiently confident to present for treatment as volunteers. These characteristics would be likely to lead to high levels of compliance with treatment and a better than average treatment response. Only five studies [5, 6, 21, 33, 34] did not rely upon self-referral by potential participants. All the studies involved only children over two years old. There are many forms of the Triple-P program [2], and new versions emerge regularly, but for simplicity we have not distinguished between the levels of intervention. Nevertheless, most of the studies reported on the effectiveness of small group-based Triple-P interventions, usually at level 4 or 5, and synthesis of results from other Triple P levels would be limited by low numbers.

Most of the studies were relatively small, and the great majority used a waiting-list control design in which the participants on the waiting list were offered active treatment immediately after the post-intervention data collection. It is, therefore, not possible to draw conclusions about the longer-term effectiveness of Triple P relative to a comparison condition. Before- and after-intervention data from the intervention group were usually presented more prominently (and frequently) than between-group comparisons, and this method of reporting often obscured interpretation of group effects and tended to increase the impression of a positive effect from the intervention. Only two trials used an active comparison group, and neither showed any advantage for Triple P in terms of child-based outcomes. Trials with waiting list or usual care controls provide intervention estimates which reflect the combined specific and non-specific effects that will accrue in practice, and are more likely to show between-group differences than trials with active controls [35].

A range of child-based outcome measures was used but the most commonly reported was the ECBI, completed by parents. In the majority of cases (31/33) the main informant was the mother, and these data were synthesized in our main meta-analysis. Despite some differences in methodological approach, the ES obtained in our meta-analysis of maternally-reported ECBI scores (0.61) is impressive and is broadly in line with that reported by other authors [9, 11, 12].

There is some evidence of publication bias from our analyses of the published work, and there is additional evidence that the results of a number of evaluations of Triple-P have not been published [30, 36]. Further evaluation of publication bias is not possible given the uniform lack of Triple-P trial registration. Thus, despite the apparent consistency of more recently published work, there is still the possibility that these studies present a particularly favorable picture of Triple-P. The International Committee of Medical Journal Editors (ICMJE) agreed in 2005 that its member journals would consider only registered trials for publication. Allowing for publication time lags, about one third of the studies in our meta-analysis pre-dated this guidance, and two thirds could potentially have benefited from adopting these recommendations.

There are considerable limitations in relying on maternal report data alone: 17 papers gave outcomes reported by informants other than mothers. Five of these studies showed a relative benefit for Triple P on one or more outcome measures compared with no treatment. Meta-analytic synthesis of paternally-reported data identified a pooled ES of 0.42, but there was significant heterogeneity and the overall effect size was not significantly greater than zero. Multiple outcomes were used in five of the six papers reporting significantly positive results: primary outcomes were not pre-specified (in common with all the reported trials), and corrections were not made for multiple comparisons. Paternal reports are often difficult to assess because of missing data, which may not be missing at random. The incorporation of independent direct observations of parent and child behavior into trial design provides important confirmatory information [37, 38], and seven of the Triple P trials (see Table 3) included data from independent observers. Two of these seven studies reported benefit in one or more observational subscale.

It is possible that the discrepancy between maternal and paternal (or independent) reports of child behavior may be accounted for by the fact that maternal mental state improved significantly with most Triple-P interventions [10]; this may have led to a more positive maternal evaluation of the child's behavior, reflecting a more optimistic state of mind. Fathers are less likely to attend sessions, and independent observers are unconnected with the intervention. Related attribution effects have been reported in relation to Triple-P [39]. It is also possible that mothers are more accurate than fathers in reporting their children's behavior difficulties. One paper [13] reported a planned subgroup analysis for lone-parent families and found no benefit from the Triple P intervention.

All of the papers considered here, with one exception [26], were authored or influenced by Triple-P affiliated personnel. This is commonly observed in the early stages of development of non-pharmaceutical interventions, but readers should interpret findings accordingly, particularly when authors may gain financially from the intervention under study. Although authors of Triple P interventions receive royalty payments from sales of training and materials [27], only one of the articles we obtained declared any conflict of interest. Conflict of interest may be of particular importance in interpreting studies (such as many of those reported here) in which subgroup analyses are reported [40]. Outcome reporting bias [41] may also be an important consideration in this respect, with possible significance for the interpretation of meta-analyses [42].

Claims that whole-population parenting programs have significant impact on public health are particularly important, because these may have led to substantial commitment of public funds. We were unable to find any convincing evidence of benefit from the Triple P program in the three whole-population studies eligible for inclusion in the present review.

Summary of evidence

Although a standard meta-analysis confirmed previous findings that mothers report improved child behavior after Triple-P interventions in comparison to a waiting list control condition, fathers and independent observers generally do not report improvements that are significantly different from those attributable to the control condition (Tables 3 and 4). There is an absence of evidence of sustained benefit from Triple P interventions compared to control conditions, and no evidence that Triple P is superior to any other active intervention.

Limitations

Given the highly specific nature of the literature search, and the multiple sources of data, we believe that we have retrieved almost all of the relevant literature. We did not synthesize papers in languages other than English. We were not able to retrieve some book chapters with titles indicating possible eligible studies, but we did not find any new data reported in the many chapters we were able to retrieve. We did not obtain potentially relevant data from studies which were conducted but not subsequently published [30, 36].

Conclusions

The studies to date provide proof of concept that group-based Triple-P may be effective in the short term according to maternal report of child behavior but, given the high risk of bias (or unknown risk of bias where reporting is poor), they do not support the view that Triple P provides other benefits to children.

The lack of convincing evidence of benefit from whole-population interventions is in line with previous work in which no significant improvement in child-based outcomes resulted from a public health parenting program [43], and with a more recent large-scale independent evaluation of Triple P in Zurich which demonstrated no impact on child behavior [44]. Along with findings from a previous systematic review [12], the results of our meta-regression support the view that some benefit might be achieved if interventions were focused on the families of children with more severe problems. A recent Cochrane review of parent training interventions for children with established conduct problems and those at high risk of conduct disorder [45] provides robust evidence of the effectiveness of such targeted programs. An effective case-finding approach combined with offers of intervention to families with identified problems is, therefore, likely to be more effective than the 'public health' approach.

Only one of the studies [26] included in our review had no apparent conflict of interest. We are aware of two further independent evaluations of Triple P which were ineligible for inclusion in our review: one published in German [46] and one recent large-scale trial [44]. Both produced negative results. These findings mirror the frequently observed failure of independent replication of positive results from a range of developer-led studies. Theoretical models developed by Eisner [47], describing the mechanisms by which conflict of interest can lead to research bias, may help to explain this phenomenon.

Given the substantial cost implications, health care providers and policymakers would be well advised to apply the same standard of evidence when purchasing behavioral interventions as they do to the purchase of pharmaceutical agents or medical devices. Compulsory clinical trial registration and full and open declaration of conflicts of interest would address many of the deficits noted in our review. Pending the implementation of such mechanisms, unproven interventions should only be carried out in the context of a robust independent evaluation.

Developers and evaluators of psychological interventions should be encouraged to adhere to the guidelines on good publication practice for communicating company sponsored medical research (GPP2 [48]). Journal editors and reviewers should be encouraged to adhere to CONSORT guidelines for both text and abstracts [49]: we believe that authors who choose journals that do not adhere to these guidelines, and editors who choose not to adopt them, do the field, as well as their own work, a disservice. There should be a clear expectation that instrument subscales will be reported in full and covered even-handedly in article abstracts. Care providers and policy makers should assess the generalizability of findings from socially advantaged and volunteer samples to their own situations. There is a need for registered, large, multicenter trials with prospectively defined, long-term outcomes and active comparison groups, rather than further evaluations using waiting list control groups. Rigorous systematic reviews of parenting interventions (for example, [45]) attest to the importance of including data from independent observers in trials and in reviews, both to reduce risk of bias and to provide more convincing data on the effects of parenting interventions. Whole-population data on child behavior, reported by multiple informants and linked to provision of parenting interventions, may offer an alternative approach to the evaluation of public health parenting programs.

Funding

The study was unfunded, apart from a contribution made by the Gillberg Neuropsychiatry Centre to pay for statistical analysis. The funder played no part in the study design, analysis or interpretation of the data except insofar as one of the authors (CG) is employed within the Gillberg Neuropsychiatry Centre at the Sahlgrenska Academy, University of Gothenburg.