Introduction

Good mental health is a global priority and central to the UN Sustainable Development Goal (SDG) on Good Health and Well-being (UN, 2015). Currently, around 13% of children and young people globally are diagnosed with a mental disorder (UNICEF, 2021). Poor mental health during childhood has been linked with school failure, delinquency, substance misuse, and other health and social problems that persist into later life, such as higher risks of obesity or poverty in adulthood (Jenkins et al., 2011). Social–emotional and mental health interventions early in life therefore have the potential both to improve immediate outcomes and to reduce the likelihood of adverse outcomes in adulthood caused by poor mental health.

Primary schools are central to the lives of the 91% of children enrolled in schools globally (UNESCO Institute for Statistics, 2023) and are a key community setting for improving child and adolescent mental health (Barry et al., 2013; Fazel et al., 2014a, 2014b). Reviews have found strong evidence of the positive effect of school interventions on child and adolescent social–emotional and mental health outcomes (Barry et al., 2013; Fazel et al., 2014a, 2014b). School mental health interventions can be universal, intended for all children in a school regardless of need, or targeted at specific groups of children, such as victims of bullying and those who bully others. Although it reports little evidence on costs and cost-effectiveness, one review suggests universal primary school social–emotional and mental health interventions (curricula and programmes in particular) are less costly and more likely to be adopted in practice than targeted interventions (Fazel et al., 2014a, 2014b). A recently published systematic review of economic evaluations of universal child and adolescent mental health interventions identified three delivered in primary schools (Schmidt et al., 2020). The limited evidence reviewed is mixed. Compared with usual school provision, a social–emotional learning curriculum and a universal anti-bullying intervention that combines a targeted component were found to be cost-effective relative to country cost-effectiveness thresholds (Humphrey et al., 2018; Persson et al., 2018), whilst one social–emotional learning programme was not found to be cost-effective (also compared with usual practice) due to a lack of statistically significant effect (Stallard et al., 2015).

Overview of Economic Evaluations

In this paper, we refer to interventions being “cost-effective” to indicate value for money relative to a comparator (e.g. usual practice or another intervention), irrespective of how outcomes were captured and valued. Cost-effectiveness analyses and benefit–cost analyses are arguably the most common economic evaluations, with several published reference cases to support appropriate evaluation methods and reporting (Robinson et al., 2019; Wilkinson et al., 2016). Cost-effectiveness analyses measure outcomes in natural units (e.g. number of students reached or improvements in test scores) and are called cost-utility analyses when quality of life measures are used, such as quality-adjusted or disability-adjusted life years. These measures consider both length of life and health-related quality of life to improve comparability between interventions and studies. Results from cost-effectiveness analyses are typically reported as an incremental cost-effectiveness ratio (ICER), representing the average difference in costs divided by the average difference in outcomes between an intervention and its comparator. The lower the ICER, the more cost-effective the intervention relative to its comparator, as less money is required per unit of improvement in an outcome.
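
Expressed formally, with \(\bar{C}\) and \(\bar{E}\) denoting mean costs and outcomes under the intervention (int) and its comparator (comp), the definition above corresponds to:

\[ \text{ICER} = \frac{\bar{C}_{\text{int}} - \bar{C}_{\text{comp}}}{\bar{E}_{\text{int}} - \bar{E}_{\text{comp}}} = \frac{\Delta C}{\Delta E} \]

As a purely hypothetical illustration, an intervention costing Int$100 more per child than usual practice and generating 0.005 additional QALYs per child would have an ICER of Int$100/0.005 = Int$20,000 per QALY gained.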

By comparison, benefit–cost analyses report both costs and outcomes, or benefits, in monetary terms. Intervention benefits are monetised either based on stakeholders’ willingness-to-pay for an intervention or by assigning a monetary value to intervention outcomes (e.g. expected lifetime earnings based on improved educational attainment). Benefit–cost ratios represent the average monetised benefits of an intervention divided by its average cost relative to a comparator. Interventions with higher benefit–cost ratios represent greater benefits relative to costs and therefore better value for money.
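
In the same notation, with \(\bar{B}\) denoting mean monetised benefits, the benefit–cost ratio is:

\[ \text{BCR} = \frac{\bar{B}_{\text{int}} - \bar{B}_{\text{comp}}}{\bar{C}_{\text{int}} - \bar{C}_{\text{comp}}} = \frac{\Delta B}{\Delta C} \]

A BCR above 1 indicates that monetised benefits exceed costs; a hypothetical BCR of 4, for instance, corresponds to Int$4 of benefit for every Int$1 invested.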

Whilst cost-utility and benefit–cost analyses aim to improve comparability between interventions and studies, any comparison of average intervention costs, ICERs and benefit–cost ratios must be interpreted with caution for several reasons. In addition to the representativeness of study populations and other common concerns around generalisability (e.g. transferability of findings across settings or countries), economic evaluations can differ based on whether provider/payer costs or wider societal costs are considered, whether set-up costs are captured alongside implementation costs, how outcomes are captured and valued, and the time horizon over which costs and outcomes are analysed, amongst other methodological choices.

Nonetheless, despite limited comparability between economic evaluations, it is widely recognised that interventions must be feasible and cost-effective compared with alternatives (including usual practice) to be implemented at scale (Srikala & Kishore, 2010) and accelerate progress towards the SDGs. Economic evaluations that estimate intervention cost and/or cost-effectiveness compared with alternatives are essential to inform policymakers on how limited public resources can be allocated most efficiently to improve outcomes (Lindstrom Johnson et al., 2023). Whilst evidence on the effectiveness of universal primary school social–emotional and mental health interventions has previously been reviewed, to the best of our knowledge, no review exists of their cost-effectiveness compared with available alternatives or usual practice. In this study, we systematically reviewed published economic evaluations of universal primary school social–emotional and mental health interventions. Our review included studies from both high-income countries and low- and middle-income countries (LMICs), where government budgets are especially limited.

It is important that the limited comparability and availability of economic evaluation findings do not preclude their use; rather, where possible, findings should be adjusted or interpreted with varying degrees of certainty based on the decision-making context, study characteristics and quality (Goeree et al., 2011; Huda et al., 2023). Given that this is a global review, we set out to interpret results at an aggregate level, rather than focus primarily on each of the studies and interventions. In other words, we aim to: (1) draw general conclusions about the value for money of published universal primary school social–emotional and mental health interventions, (2) make context-specific policy recommendations where appropriate, such as for higher-income countries where more evidence is typically available and for interventions for which multiple evaluations are found, and (3) suggest research priorities to improve the evidence base moving forward.

Methods

Objective and Research Questions

The primary objective of our systematic review was to answer the following questions by synthesising the findings and appraising the quality of economic evaluations of universal primary school, school-community, or school-parent interventions to improve child social–emotional and mental health outcomes:

  • How cost-effective are universal primary school interventions to improve child social–emotional or mental health outcomes compared with alternatives or usual practice?

  • What is the current state of evidence from economic evaluations of universal primary school interventions to improve child social–emotional or mental health outcomes, in terms of availability and quality?

This systematic review was guided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses 2020 (PRISMA 2020) statement (Page et al., 2021). Adherence to the PRISMA checklist is reported in Supplement 1. A research protocol was prospectively published for this review in PROSPERO (CRD42020190148) in July 2020: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020190148.

Search Strategy

We searched for English language publications from database inception until 17 October 2022. A full list of the databases searched can be found in Supplement 1 and included MEDLINE (PubMed), EMBASE, Web of Science and EconLit, amongst others. The search strategy (title, abstract, keyword or subject heading searches) used groups of key words that captured variations in terminology related to economic evaluations, primary schools, and school stakeholders. The full list of keywords is included in Supplement 1, and specific search strategies were developed for the different databases.

To ensure that no relevant articles were missed through the systematic database searches described above, we also searched the first ten pages of several Google Scholar searches, as well as the NHS Economic Evaluation Database, Tufts Cost-Effectiveness Analysis Registry (CEAR), Tufts Global Health CEAR, Cochrane Library and Campbell Collaboration database. Last, we screened the reference lists of all included studies and conducted forward citation tracking of included publications.

Screening and Study Selection

Search results were downloaded and imported into the EndNote reference package, and duplicates were removed prior to screening (AUHE Information Specialists University of Leeds, 2016). Studies identified by the search were then screened in a two-step selection process based on the review inclusion and exclusion criteria outlined below. The first screen for inclusion was based on the relevance of study titles and abstracts, before a second screen assessed the full text of remaining articles. Titles and abstracts in the first step were screened independently by a single reviewer (GAJ), with a second reviewer (RMG) screening a randomly selected 5% sample to check for inter-rater agreement. No discrepancies were encountered in the decisions made, with full agreement between the two reviewers. Full text documents were independently assessed for inclusion by two reviewers (GAJ and RMG), and discrepancies were discussed with the wider team until resolved.

The review included universal school, school-community, or school-household interventions aimed at improving social–emotional or mental health outcomes of primary school children. Interventions were deemed universal during screening, as opposed to targeted, if they were implemented school-wide and were not designed solely for a group of primary school children with a particular difficulty (e.g. children with disruptive behaviour or poor educational attainment). Interventions that combined universal and targeted components were also eligible for review. We included both partial (costs only) and full economic evaluations (costs and outcomes), regardless of study design (e.g. modelled or conducted alongside randomised controlled trials (RCTs), amongst others). Outcome measures from any type of economic evaluation were considered and included in the review, whether reporting on: (a) total programmatic cost or average cost per child, (b) cost-effectiveness, (c) cost-utility, (d) benefit-cost, (e) cost consequence, or (f) cost minimisation. We excluded studies published in languages other than English and those published without peer-review.

Data Extraction and Quality Assessment

We summarised the characteristics of included studies (Table 1) using Excel to extract information on: (a) country, (b) participant age, (c) type of intervention and components, (d) intervention focus/objectives, (e) study design, (f) type of economic evaluation, (g) economic evaluation components (perspective, time horizon, primary outcomes, sensitivity analysis), and (h) partial or full economic results, average cost per pupil and benefit-cost or cost-effectiveness/cost-utility ratios. Data were extracted and double-coded separately by two reviewers (GAJ and RLG) and then consolidated. Whilst all included interventions affected social–emotional or mental health outcomes, given the broad scope of the review, interventions were categorised by type (e.g. curriculum/programme, teacher practices) and focus (e.g. behaviour and social–emotional development, bullying). Costs reported in included studies were all converted to International Dollars (Int$) and inflated to the year 2021 (World Bank, 2023). All conversions, tabulations and visualisations were carried out in Excel. No pooled or sensitivity analyses were possible due to heterogeneity across study methods and outcomes.
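
Although the review does not publish its conversion script, the inflate-then-convert step can be sketched as follows in Python; the function, its arguments and every number in the example are illustrative assumptions rather than values from any included study.

    # Minimal sketch: convert a reported cost to 2021 international dollars (Int$).
    # Assumes the cost is reported in local currency units (LCU) for a given price
    # year, a price index is available to inflate within the local currency, and a
    # PPP conversion factor (LCU per Int$) is available for 2021 (World Bank data).

    def to_2021_int_dollars(cost_lcu: float,
                            price_year_index: float,
                            index_2021: float,
                            ppp_lcu_per_int_dollar_2021: float) -> float:
        """Inflate a local-currency cost to 2021 prices, then convert to Int$."""
        inflated_lcu = cost_lcu * (index_2021 / price_year_index)
        return inflated_lcu / ppp_lcu_per_int_dollar_2021

    # Example with placeholder numbers: 30 LCU per child reported in 2015 prices.
    cost_int = to_2021_int_dollars(cost_lcu=30.0,
                                   price_year_index=100.0,  # 2015 index (illustrative)
                                   index_2021=115.0,        # 2021 index (illustrative)
                                   ppp_lcu_per_int_dollar_2021=0.68)  # illustrative
    print(f"Int${cost_int:.1f} per child")  # Int$50.7 per child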

Table 1 Summary table of key study characteristics

The reporting quality of eligible studies was independently assessed by two reviewers (GAJ and RLG), using the 2022 Consolidated Health Economic Evaluation Reporting Standards (CHEERS) (Husereau et al., 2022). Each CHEERS criterion was qualitatively assessed as “yes” if fully addressed by a study, “partially” if some criterion components were addressed, “not reported” if the CHEERS criterion was not reported by a study and “n/a” if a criterion was not applicable to the study design. There was 80.5% agreement on the quality appraisal before discussions to resolve disagreements. Studies were then divided into three categories: “high quality” studies adhered to 75% or more of the CHEERS checklist items, “moderate quality” to between 50% and 75% of items, whilst “low quality” studies complied with less than half of the checklist (Rinaldi et al., 2020).
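
The banding above can be made concrete with a short sketch; note that how “partially” scored items counted towards adherence is not specified in the text, so treating them as not met here is an assumption.

    # Sketch of the three-way quality banding described above. Adherence is
    # assumed to be the share of applicable (non-"n/a") CHEERS items scored
    # "yes"; "partially" items are conservatively counted as not met.

    def cheers_band(scores: list[str]) -> str:
        applicable = [s for s in scores if s != "n/a"]
        met = sum(1 for s in applicable if s == "yes")
        adherence = met / len(applicable)
        if adherence >= 0.75:
            return "high quality"
        if adherence >= 0.50:
            return "moderate quality"
        return "low quality"

    # 20 of 27 applicable items met (74.1%) -> "moderate quality"
    print(cheers_band(["yes"] * 20 + ["partially"] * 4 + ["not reported"] * 3 + ["n/a"]))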

Results

Search Results

The study selection process is outlined in the PRISMA flowchart (Fig. 1) below (Haddaway et al., 2022; Page et al., 2021). We identified a total of 31,254 studies through the systematic database search, of which 6072 were duplicates. Following an initial screen of 25,182 titles and abstracts, 47 articles were selected for full text review. Alongside these, we identified an additional 14 articles for full text screening through citation searching and tracking on Google Scholar. By the end of the screening process, we included a total of 24 publications that report on 25 economic evaluations of interventions.

Fig. 1
figure 1

PRISMA flowchart of the study screening and selection process

Characteristics of Included Studies

Table 1 summarises the key characteristics of included studies. Most studies evaluated curricula or programmes (n = 9) (Belfield et al., 2015; Berry et al., 2016; Connolly et al., 2018; Humphrey et al., 2018; Hunter et al., 2018; Klapp et al., 2017; Long et al., 2015; Stallard et al., 2015; Turner et al., 2020), which were implemented as part of learning activities or class time. Another five studies evaluated interventions with both a universal and a targeted component that focused on the same objective (Clarkson et al., 2019; Huitsing et al., 2020; Jadambaa et al., 2022; Le et al., 2021; Persson et al., 2018), namely bullying. Teacher practices and classroom management interventions were evaluated by three studies (Belfield et al., 2015; Ford et al., 2019; Hickey et al., 2017), involving activities such as teacher training and supervision. An equal number of studies also evaluated service mediation interventions (n = 3) (Bagley & Pritchard, 1998; Bowden et al., 2017, 2020), which provided and linked students to support services through school social workers. One study evaluated an intervention involving changes to school culture at the operational level by involving leadership, students, parents and the community (Greco et al., 2018). The remaining four studies evaluated multicomponent interventions (Chaux et al., 2017; Foster et al., 2006; Peters et al., 2010, 2016), which involved more than one intervention or component targeting multiple different outcomes and objectives. Whilst all interventions targeted social–emotional or mental health outcomes, the primary focus of interventions differed. Most studies evaluated interventions aimed primarily at improving behaviour and social–emotional development (n = 10), followed by five studies that evaluated interventions to reduce bullying, three to reduce aggression and violence and one to reduce anxiety and depression. The remaining six studies evaluated interventions with multiple aims, including reductions in crime and drug use and improvements in educational attainment, amongst others.

Full economic evaluations accounted for 20 of the 25 evaluations reviewed; the remaining five were cost analyses (partial economic evaluations). An equal number of full economic evaluations were benefit–cost (n = 8), cost-effectiveness (n = 8) and cost-utility (n = 8) analyses, with four of the latter performed alongside a cost-effectiveness analysis. Benefit–cost analyses typically monetised outcomes on educational attainment, mental health, bullying and theft, amongst others, by linking these outcomes to lifetime income or to costs such as those of crime and drug use. Cost-effectiveness analyses were based on outcome measures such as the Strengths and Difficulties Questionnaire, Revised Child Anxiety and Depression Scale, Social Skills Improvement System Rating Scale and violence or crime averted. To enable comparability between evaluations using a standardised outcome and to capture intervention impact on overall health, four cost-effectiveness analyses also estimated health utility (cost-utility), in addition to four other studies reporting only cost-utility analyses. Quality-adjusted life years (QALYs), which consider length and quality of life, were commonly used (n = 7) to estimate health utility and generate a cost per QALY gained. Except for one study that used the EuroQol-5 Dimension Youth (EQ-5D-Y), cost-utility analyses used the Child Health Utility 9D (CHU9D) measure to collect quality of life data from trial participants to estimate QALYs. Disability-adjusted life years (DALYs) were used in one study. No cost consequence or cost minimisation analyses were identified.
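
As a reminder of the arithmetic behind these measures, QALYs weight each period of time lived by a health-state utility \(u_t\), anchored at 1 for full health and 0 for death:

\[ \text{QALYs} = \sum_{t} u_t \, \Delta t \]

where \(\Delta t\) is the duration of period \(t\). For example, two years spent at a utility of 0.9 accrue 1.8 QALYs; DALYs follow a mirror-image logic, counting years of healthy life lost rather than gained.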

Most of the included economic evaluations (n = 13) were based on a randomised controlled trial, followed by cohort or quasi-experimental designs (n = 9). Only three studies used a decision analytic model, mainly Markov models. Primary (n = 8) or a mix of primary and secondary data (n = 10) were commonly used in the included studies. A similar number of evaluations were carried out from a provider (n = 12) or societal (n = 13) perspective, with a majority adopting a time horizon between 1 and 10 years (n = 14). Nine (36%) of the included evaluations estimated costs and outcomes for ten years or longer. Overall, only one evaluation was conducted in a low-income country (Uganda) and one in a middle-income country (Colombia), with the remaining 23 evaluations in high-income countries (United Kingdom = 8, United States = 7, Sweden = 2, Canada = 2, Australia = 2, Ireland, and the Netherlands). All but three of the 24 papers included were published from 2015 onward (Fig. 2).

Fig. 2
figure 2

Number of partial and full economic evaluations published over time

Intervention Costs

Table 2 summarises the findings from included studies. Interventions varied substantially in cost (Fig. 3), ranging from an average annual cost of Int$18.7 to Int$83,656 per child. The least costly interventions were universal interventions that combined a targeted component, all of which focused on bullying, with an average annual cost ranging between Int$26.9 and Int$66.8 per child. The school operational culture intervention focused on violence reduction was also amongst the least costly (Int$46.0). This was followed by curricula or programmes, which on average cost between Int$21.4 and Int$396 per child. The large variation in average costs of curricula or programmes was driven by the extent of ongoing coaching, supervision, administration, and management costs required during implementation—including support from curriculum or programme developers, which is typically expensive. Such costs accounted for a large share of the total costs of the four most expensive curricula or programmes. The remaining curricula or programmes, for which coaching, administration and management comprised a smaller share of total costs, were amongst the least costly interventions, with average annual costs per child comparable to those of universal and targeted or school operational culture interventions. Curricula or programmes focused on behaviour and social–emotional development (n = 6), aggression and violence (n = 2) and anxiety and depression (n = 1), without any clear differences in cost based on intervention focus. The costliest interventions were service mediation (Int$887–Int$973) and multicomponent interventions (Int$817 and Int$83,656), all of which focused on multiple outcomes and objectives. It is important to note that excluding the costliest multicomponent intervention (Int$83,656) reduced the range of average costs per child to Int$817–Int$907. Cost estimates varied substantially for teacher practices and classroom management interventions, with the lowest average cost per child found (Int$18.7) coupled with one of the highest (Int$838).

Table 2 Summary of partial and full economic evaluation results
Fig. 3
figure 3

Average annual intervention cost per child (2021 Int$)

Intervention Cost-Effectiveness

A total of 16 full economic evaluations reported comparable or standardised outcomes in the form of a benefit–cost ratio (BCR, n = 8), cost per QALY gained (n = 7) or cost per DALY averted (n = 1). All benefit–cost analyses reported a positive return on investment compared with usual practice, with benefits ranging from Int$1.31 to Int$11.55 for each Int$1 invested (Fig. 4). The highest reported BCRs were for a teacher practices intervention (Belfield et al., 2015) and a curriculum/programme intervention (Klapp et al., 2017), both of which focused on behaviour and social–emotional development. Reported ratios were lowest for multicomponent and service mediation interventions focused on multiple objectives and outcomes (Bagley & Pritchard, 1998; Bowden et al., 2020; Peters et al., 2010, 2016). Cost-utility analyses, which report a cost per QALY gained or DALY averted, found seven of eight interventions evaluated to be cost-effective compared with usual practice (Fig. 5)—i.e. lower than the respective United Kingdom, Australian and Swedish country cost-effectiveness thresholds of Int$29,412-Int$44,118 (£20,000-£30,000), Int$34,483 (A$50,000) and Int$57,339 (SEK550,000). All cost-effective interventions focused either on behaviour and social–emotional development (n = 4) or on bullying (n = 3). The majority of cost-effective interventions were curricula/programmes (n = 3) (Connolly et al., 2018; Humphrey et al., 2018; Turner et al., 2020), ranging between Int$15,527 and Int$25,463 per QALY gained, or universal interventions that combined a targeted component (n = 3) (Jadambaa et al., 2022; Le et al., 2021; Persson et al., 2018), which ranged from being cost-saving to Int$16,068 per QALY gained. One teacher practices intervention was also evaluated and found to be cost-effective compared with usual practice (Int$21,126/QALY) (Ford et al., 2019). Only one intervention, a curriculum/programme focused on anxiety and depression, was not found to be cost-effective, due to negative effect sizes observed when compared with usual practice (Stallard et al., 2015). No cost per QALY gained or DALY averted was estimated for service mediation or multicomponent interventions.
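
The threshold comparisons in this paragraph amount to a simple decision rule, sketched below with figures quoted above (the converted thresholds are specific to this review and its price year).

    # Decision rule applied descriptively above: an intervention is deemed
    # cost-effective when its incremental cost per QALY gained falls at or
    # below the relevant country cost-effectiveness threshold.

    def is_cost_effective(cost_per_qaly: float, threshold: float) -> bool:
        return cost_per_qaly <= threshold

    uk_threshold_upper = 44_118  # Int$ equivalent of 30,000 GBP, as quoted above
    print(is_cost_effective(25_463, uk_threshold_upper))  # True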

Fig. 4
figure 4

Intervention benefit–cost ratio. Note As explained in the introduction, higher benefit–cost ratios signal higher benefits compared with costs, and therefore better value for money from the intervention relative to its study comparator (usual practice)

Fig. 5
figure 5

Incremental cost per QALY gained or per DALY averted. **Cost-per DALY averted. Note As explained in the introduction, a lower incremental cost per QALY gained or DALY averted signals better value for money from the intervention relative to its study comparator (usual practice)

Study Reporting Quality

None of the full or partial economic evaluation studies we appraised complied with all CHEERS checklist items. Full economic evaluations adhered to an average of 72.2% of checklist items, compared with an average of 60.9% across partial economic evaluations. Wide variations were observed between evaluations. Overall adherence to the CHEERS checklist ranged between 40.0% and 86.5% across full economic evaluations, with 11 rated high quality, six moderate quality, and two low quality. Across partial evaluations, adherence to the checklist ranged between 35.7% and 75.0%, with one rated high quality, three moderate quality, and one low quality. As shown in Fig. 6, studies complied especially poorly with CHEERS items related to reporting a health economic analysis plan (item 4), characterising heterogeneity (item 18), characterising distributional effects (item 19) and the effect of stakeholder engagement (item 25). To a lesser degree, studies also complied poorly with CHEERS items related to the abstract (item 2), discount rate (item 10), characterising uncertainty (item 20), describing stakeholder engagement (item 21), effect of uncertainty (item 24), role of funders (item 27) and conflicts of interest (item 28). By study type, the highest reporting quality was observed amongst economic evaluations carried out alongside randomised controlled trials or those using decision (Markov) models. Full details on the appraisal for each paper are included in Supplement 2.

Fig. 6
figure 6

Appraisal outcome for each included study across all 28 CHEERS checklist items. Note 1 = Title; 2 = Abstract; 3 = Background and objectives; 4 = Health economic analysis plan; 5 = Study population; 6 = Setting and location; 7 = Comparators; 8 = Perspective; 9 = Time horizon; 10 = Discount rate; 11 = Selection of outcomes; 12 = Measurement of outcomes; 13 = Valuation of outcomes; 14 = Measurement and valuation of resources and costs; 15 = Currency, price date, and conversion; 16 = Rationale and description of model; 17 = Analytics and assumptions; 18 = Characterising heterogeneity; 19 = Characterising distributional effects; 20 = Characterising uncertainty; 21 = Approach to engagement with patients and others affected by the study; 22 = Study parameters; 23 = Summary of main results; 24 = Effect of uncertainty; 25 = Effect of engagement with patients and others affected by the study; 26 = Study findings, limitations, generalisability, and current knowledge; 27 = Source of funding, 28 = Conflicts of interest

Discussion

We systematically reviewed and summarised the literature on partial and full economic evaluations of universal primary school interventions to improve child social–emotional and mental health outcomes. A total of 24 studies were identified that evaluated 25 interventions, consisting primarily of curricula and programmes, universal interventions combining a targeted component, and multicomponent interventions. Average annual costs per child varied substantially between Int$18.7 and Int$83,656. Universal interventions combining a targeted component were least costly, along with changes to school operational culture and several curricula and programmes, whilst multicomponent interventions were the most expensive. All but one of the 16 full economic evaluations reporting monetised outcomes (benefit–cost analyses), a cost per QALY gained or DALY averted, found that interventions likely represented good value for money in the study settings.

However, value for money alone is insufficient to inform decision-makers effectively; other key considerations are required, including the feasibility and affordability of implementing an intervention at scale given available fiscal space (Baltussen et al., 2023). In addition, the limited comparability between studies in this review, varying degrees of uncertainty and differences in study reporting quality should also be considered. It is important that the intervention cost, cost-effectiveness and benefit–cost estimates reported in this review are not interpreted as directly comparable: they come from different settings, and the types of costs captured vary, along with how costs and outcomes were valued, amongst other study aspects. We recommend that readers using these findings to inform intervention prioritisation consult the full texts of studies included in this review to assess, amongst other criteria, the applicability of a given intervention and its comparator to the setting where prioritisation decisions are being made. Recent developments in the health sector to help assess the applicability of global economic evaluation evidence to a given setting can help with this (Goeree et al., 2011; Huda et al., 2023).

As mentioned in the introduction, economic evaluation findings vary substantially, in part due to methodological choices in data collection and analysis. For example, economic evaluations of the KiVa intervention argued both for and against costing teacher instructional time, based on whether it fell under standard contractual hours and mapped onto existing aspects of the national curriculum or had the potential to “crowd out” other teaching activities. This resulted in an average cost per child (Int$50.1) twice as high in one study (Persson et al., 2018) as in the other (Int$26.9) (Clarkson et al., 2019). Similarly, the perspective adopted by different studies influenced economic evaluation estimates, although disaggregated reporting of results helped mitigate these differences. Benefit–cost analyses in particular adopted a societal perspective, which helped capture cost-savings from a broader set of intervention outcomes (e.g. lower utilisation of government services, reductions in costs due to crime).

Nonetheless, despite this variability across studies, universal primary school social–emotional and mental health interventions were found to be likely cost-effective compared with usual practice in almost all the full economic evaluations reviewed. More specifically, our review identified several potentially cost-effective interventions that should be considered by policymakers and practitioners for adoption instead of usual practice in primary school settings within high-income countries—where all evaluations using standardised or monetised outcomes were carried out. Compared with usual practice, one low-cost intervention in our review (Int$26.9–Int$50.1 per child), the KiVa intervention for bullying (universal with a targeted component), was estimated to provide benefits valued at Int$4.04–Int$6.72 for each Int$1 spent in the Netherlands (Huitsing et al., 2020) and to cost Int$16,068 per QALY gained in Sweden (Persson et al., 2018)—substantially lower than the national cost-effectiveness threshold. The Promoting Alternative Thinking Strategies (PATHS) curriculum, which focused on behaviour and social–emotional development, was found to be low-cost (Int$39.0–Int$48.4 per child) and cost-effective (Int$24,034/QALY–Int$25,463/QALY) compared with usual practice by two separate studies in the United Kingdom (Humphrey et al., 2018; Turner et al., 2020). Two studies in Australia also found the Friendly Schools Programme for bullying to be low-cost (Int$30.9–Int$66.8) and cost-effective (cost-saving to Int$2,661/DALY) (Jadambaa et al., 2022; Le et al., 2021) compared with usual practice. Overall, cost-effectiveness was not clearly related to any single type of intervention or focus area. However, cost-effectiveness may vary if the interventions identified are implemented at scale in the countries studied, and it is likely to differ substantially across countries, not least because countries apply different cost-effectiveness thresholds.

Whilst less costly interventions such as KiVa for bullying or the PATHS curriculum may be more likely to be adopted at scale (Fazel et al., 2014a, 2014b), more costly interventions were also found to be cost-effective compared with usual practice, and there was no clear link between cost and cost-effectiveness in the studies included (Supplement Figures S1, S2). More costly interventions were primarily multicomponent interventions, and it is important to note that results for these were affected by methodological decisions and limitations. Multicomponent interventions such as the Better Beginnings Better Futures project (Peters et al., 2010, 2016), which were typically evaluated using a benefit–cost analysis, appeared to have lower benefit–cost ratios than other interventions such as curricula or teacher practices. This is because benefit–cost analyses encountered difficulties in monetising all relevant outcomes and therefore likely undervalued the benefits of included multicomponent interventions. Benefits were typically expressed and monetised in terms of increased lifetime earnings based on educational attainment and costs averted, such as from reductions in crime or drug use, an approach that does not adequately capture benefits such as improvements in school environment or culture. Willingness-to-pay estimates, such as those for reductions in school bullying (Persson & Svensson, 2013), may capture benefits more comprehensively in benefit–cost analyses. However, participant responses to willingness-to-pay experiments are typically anchored by their experiences and socioeconomic status, which can result in a pro-rich bias (Robinson et al., 2019).

The use of cost consequence analyses may have better captured both the costs and outcomes of multicomponent interventions, but at the expense of comparability across interventions. No cost-utility analyses, which, like benefit–cost analyses, use standardised outcomes and enable comparisons across interventions, were carried out for multicomponent interventions. This is likely due to limitations in the availability of sufficiently broad quality of life outcome measures for children. Quality-adjusted life years were primarily estimated by administering a questionnaire such as the CHU9D; however, such questionnaires focus on health status or functioning and therefore would not adequately capture the benefits of the multicomponent interventions included. Ongoing work on broadening evaluative frameworks of wellbeing through the development of capability indices for children and young people, which can capture outcomes across multiple dimensions of life (Mitchell et al., 2021), may avoid the need for monetising benefits and improve evaluations of multicomponent interventions within a cost-utility framework (Greco et al., 2018).

Implications for Research

Findings from our review highlight several key gaps and research priorities for expanding the knowledge base and better informing policy. First, there is a need for more economic evaluations of tested interventions in high-income countries and an acute need in LMICs, where only one full economic evaluation was identified. In high-income countries, more modelling of existing trial results can provide valuable economic information for policymakers in the short term—only three modelling studies were identified in our review (Jadambaa et al., 2022; Le et al., 2021; Persson et al., 2018). Systematic reviews of intervention effects can be used to inform decision models, which can answer key policy questions by estimating costs and impact at scale and by capturing and visualising uncertainty around likely intervention cost-effectiveness compared with alternatives.

Second, more researchers should incorporate full economic evaluations alongside RCTs, especially in LMICs where these are needed most. Such RCTs could be based on local adaptations of the effective, cost-effective and low-cost interventions (e.g. KiVa for bullying or the PATHS curriculum) identified in this review (Barlas et al., 2022). Interventions should be tested as close as possible to usual school operational conditions and ideally at scale. To facilitate the sustainability and likely adoption of interventions, researchers and intervention development teams should ensure programme administration and management activities do not drive higher costs and should avoid an over-reliance on costly coaches and consultants for implementation.

Economic evaluations should be built into RCT data collection and analysis plans early on by researchers and trial teams. Regardless of economic evaluation methods, policymakers should be engaged early in a trial to ensure that the evaluation is designed to answer key policy questions, which commonly include total implementation costs at scale, initial set-up costs required, feasibility of implementation at scale and equity impact, amongst others. If resources are insufficient to include an economist on a trial or to prospectively carry out an economic evaluation, then efforts should be made to collect data that enable a retrospective evaluation.

Extensive guidance exists on economic evaluation data needs and analysis (Drummond et al., 2015; Levin & Belfield, 2015). In brief, project accounts alone are insufficient for economic evaluations. At the very least, information should be collected on resource use (based on a mapping of all intervention and comparator inputs), project records should capture any donated items and unpaid time or activities (i.e. volunteering), and staff time use should be recorded to understand how staff allocate their time across intervention activities and to help inform cost analysis assumptions (e.g. allocation of joint, or shared, costs across intervention activities). When compiling this information, it is important to differentiate between set-up and ongoing implementation activities and between research and implementation activities. Comparable quality of life outcome measures, or outcome measures that can be monetised, should ideally be used in the economic evaluation and be captured alongside other trial outcomes (Wilkinson et al., 2016).
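
To make these data requirements concrete, the sketch below shows one minimal way a trial team might structure resource-use records; the field names and example entries are illustrative assumptions, not a published costing instrument from the guidance cited.

    # Illustrative resource-use record capturing the elements recommended
    # above: donated inputs, set-up vs implementation, research vs delivery.

    from dataclasses import dataclass

    @dataclass
    class ResourceUseRecord:
        input_name: str   # e.g. "teacher training workshop"
        quantity: float   # units used (hours, items, sessions)
        unit_cost: float  # market or shadow price per unit (2021 Int$)
        donated: bool     # True for unpaid time or in-kind items
        phase: str        # "set-up" or "implementation"
        activity: str     # "research" or "delivery"

        def economic_cost(self) -> float:
            """Full economic cost, valuing donated inputs at shadow prices."""
            return self.quantity * self.unit_cost

    records = [
        ResourceUseRecord("volunteer classroom assistant", 10, 15.0, True,
                          "implementation", "delivery"),
        ResourceUseRecord("curriculum manuals", 30, 8.0, False,
                          "set-up", "delivery"),
    ]
    # Financial accounts would miss the donated time; economic costing keeps it.
    delivery_cost = sum(r.economic_cost() for r in records if r.activity == "delivery")
    print(f"Total delivery cost: Int${delivery_cost:.0f}")  # Int$390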

Adequately long trial follow-ups will also be essential to inform longer time horizons for future analyses, based on whether key outcomes amongst trial participants are sustained over time or fade out. Given that benefits from some interventions early in life are likely to persist and accrue over time, it is important to capture and model these in economic evaluations to avoid underestimating intervention benefits (Knapp & Wong, 2020; Ungar, 2021). For example, based on associations between Intelligence Quotient (IQ) and market productivity, a one-point increase in IQ has been valued at between US$10,600 and US$13,100 in the US (Grosse & Zhou, 2021). Despite this, only nine of the 16 full economic evaluations in the review investigated outcomes for 10 years or more, six of which were benefit–cost analyses and one a cost-effectiveness analysis. Effort should be made to adopt longer economic evaluation time horizons, especially in cost-utility analyses, and a societal perspective to minimise the underestimation of intervention benefits (Ungar, 2021).

In addition, there was great variation in the reporting quality of studies, which is a widely recognised weakness of economic evaluations. It is important for researchers to ensure future economic evaluations adhere closely to reporting guidelines and to publish health economic study protocols ahead of analyses (Husereau et al., 2022). Most included studies were especially poor at reporting a health economic analysis plan, the impact of any stakeholder involvement on the study, variations in results by subgroup and any distributional impact. It is therefore difficult to extrapolate findings to the general population, given that heterogeneity was not adequately captured, the equity impact of included studies was largely unknown, and there was likely unreported bias in the studies identified. There was also a need for better reporting of discount rates and for investigating, in sensitivity analyses, the impact on results of varying the rates and time horizons considered. This affected the ability of some studies to capture and convey uncertainty, which is inherent to decision science and economic evaluations and key to appropriately informing policymaking. Transparency in reporting funder involvement, and in some cases conflicts of interest, was insufficient in 55% of included studies, presenting another potential source of bias. Overall, the highest reporting quality was observed amongst economic evaluations carried out alongside randomised controlled trials or using decision (Markov) models.

Limitations of the Study

Our review has some key limitations that must be considered. First, we may have missed publications not in English, and, given the volume of records identified and the time and resources available, it was not possible for a second reviewer to independently screen all titles and abstracts, only a random sample. Second, similarly to other reviews of economic evaluations (Rinaldi et al., 2020), differences in costing methodology, perspective, outcomes, and other sources of heterogeneity limited comparability between our included studies. We mitigated differences in currency, time and purchasing power by inflating and converting all reported costs to 2021 Int$. However, other sources of heterogeneity remained, and we were only able to compare full economic evaluations reporting monetised outcomes or a cost per QALY gained or DALY averted.

Conclusion

We systematically reviewed economic evaluations of universal primary school interventions to improve child social–emotional or mental health outcomes. Economic evaluations using standardised or monetised outcomes, which enable comparison between studies, found that all but one of the interventions evaluated were cost-effective compared with usual practice. Our review therefore partially addressed its primary objective and identified several cost-effective interventions that should be considered for appraisal and implementation at scale instead of usual practice by policymakers in high-income countries, particularly in Europe and the United States. However, no economic evaluations using standardised or monetised outcomes were carried out in low- or middle-income countries, and it was not possible to infer cost-effectiveness or make recommendations for these settings. Overall, studies were concentrated in a few countries, with variations in quality, and the extent to which evaluated interventions would remain cost-effective compared with usual practice across different contexts remains unclear. Cost-effective interventions identified in this review, particularly those with low average costs per participant, should be adapted and assessed through within-trial economic evaluations in low- and middle-income countries.