Introduction

In England, 29% of adults have obesity (body mass index (BMI) ≥ 30 kg/m2) [1], whilst at least 7% of men and 9% of women have severe obesity (which we define as BMI ≥ 35 kg/m2) [2]. Obesity-related diseases (ORDs) such as type 2 diabetes mellitus (T2DM), cardiovascular diseases, stroke, and obesity-related cancers reduce life expectancy [3] and are detrimental to patient health and quality of life. The economic burden of obesity in England is projected to be approximately £16 billion per year [4]. In 2017/2018, 711,000 hospital admissions were associated with obesity, an increase of 15% from the previous year, demonstrating that obesity is a growing health concern [1].

Economic evaluations are comparative analyses of the costs and benefits of different health care interventions and provide information to help decision makers reach evidence-based decisions on the efficient allocation of scarce health care funding resources. International decision makers, such as the National Institute for Health and Care Excellence (NICE) in the UK and Canadian Agency for Drugs and Technologies in Health (CADTH) in Canada provide funding recommendations on the use of health technologies using economic evidence as an integral part of their decision-making processes. For example, in the UK, NICE published obesity guidance in 2014 [5] that recommended a weight management programme (WMP) for people with obesity, pharmacotherapy if WMPs had failed, a very low calorie diet (VLCD) for people that needed to lose weight quickly (such as for infertility treatment or joint replacement) and bariatric surgery for those with a BMI ≥ 40 kg/m2 and BMI of 35–40 kg/m2 for people with comorbidities.

Despite the substantial health, social and economic burden, there remains a lack of evidence synthesis that clarifies the most effective and cost-effective management strategies for people with severe obesity (and their comorbidities). The aim of this paper is twofold. First, we report the findings of existing cost-effectiveness studies evaluating non-surgical WMPs for people with severe obesity. Secondly, we identify common evaluation challenges, with a view to providing recommendations for the conduct of future obesity economic evaluations.

Methods

Search Strategy

We searched the MEDLINE and EMBASE databases from 1980, and the NHS Economic Evaluation Database (NHS EED), Health Technology Assessment (HTA) database, Cost-effectiveness Analysis Registry, and Research Papers in Economics (RePEc) from inception. Our original searches, up to May 2017, were conducted as part of the REview of Behaviour And Lifestyle interventions for severe obesity: AN evidenCE synthesis (REBALANCE) study [6••]. Updated searches were conducted up until November 2020. Full details of search strategies are provided in our REBALANCE report [6••].

Inclusion and Exclusion Criteria

English language studies reporting full economic evaluations (i.e. cost-utility analysis (CUA), cost-effectiveness analysis (CEA), cost–benefit analysis (CBA) or cost-minimisation analysis (CMA) frameworks), defined as a comparative assessment of two or more non-surgical WMPs, were deemed eligible for inclusion. Eligible populations were adults aged 18 and over with severe obesity (BMI ≥ 35 kg/m2), based on mean or median BMI in source clinical effectiveness studies (or a modelled cohort with BMI ≥ 35 kg/m2). Interventions were eligible for inclusion so long as they were a WMP, where the key target of the intervention was weight loss or weight loss maintenance. This also included VLCDs, defined here as ≤ 800 ± 10% kcal/day. Partial economic evaluations such as evaluations of costs alone or outcomes alone, cost-consequence analyses (costs and consequences not compared but reported separately) and methodological studies were all excluded. The only pharmacotherapy included was Orlistat because, at the time of writing, it was the only drug prescribed for weight loss in the UK.

Data Extraction

Abstract screening was conducted by one health economist. Full texts were evaluated against the inclusion and exclusion criteria and checked by a second health economist for consensus. All included studies were data extracted into a predefined online data extraction form. The data extraction form for our REBALANCE review was designed to include all economic data available within the studies, but in the updated review, a targeted data extraction form was used, extracting only data required for the current article [7]. The updated data extraction form is provided in the Supplementary Material Table 1.

Narrative Evidence Synthesis

Findings from the systematic review were tabulated, and a narrative synthesis of the cost-effectiveness evidence provided. Data were not synthesised quantitatively due to substantial heterogeneity across included studies in terms of evaluation frameworks (CUA, CEA), evaluation approach (within trial evaluations or decision models), scope of evaluation (narrowly defined such as diabetes vs broadly defined multiple ORDs), differences across health care systems, definitions of interventions and comparators. Methodological limitations of the studies were identified and catalogued, with a view to providing guidance for future research.

Quality Assessment

Included studies (in our REBALANCE report [6••]) were quality assessed using standardised checklists, recommended by Cochrane: economic evaluations (EEs) alongside clinical trials and decision analysis models used Drummond and Jefferson [8] and Philips et al. [9] checklists, respectively. Quality assessment was done independently by two health economists for the individual review, the results of which can be found in the REBALANCE report [6••].

Studies identified in this updated review were assessed against the methodological issues identified in the REBALANCE review to identify whether the quality of studies has improved over time.

Results

Identified Studies

The searches, combined for the original and updated reviews, identified 3478 potentially relevant titles and abstracts. N = 352 full texts were retrieved and assessed against the inclusion/exclusion criteria. N = 32 studies were finally included in the review (reported in 36 papers). Further details are provided in the PRISMA flow chart (Fig. 1).

Fig. 1 PRISMA flow chart for identification of studies from 1990 to 2020

Economic evaluations included evaluations of WMPs (n = 29) and pharmacotherapies (n = 5). Two studies evaluated both WMPs and pharmacotherapies [10, 11]. These are listed in Table 1 and categorised in three groups: economic evaluations alongside randomised controlled trials (RCTs) (n = 13), others (neither RCT-based nor model-based) (n = 4) and decision models (n = 15). The majority of studies were published within the past 10 years (n = 29), and the remainder were published in 2005 (n = 3). The WMPs are further categorised as lifestyle WMPs (n = 25) [6••, 10–22, 23••, 24•, 25, 26•, 27–32, 40], VLCDs (n = 4) [6••, 26•, 27, 29], meal replacements (n = 2) [10, 11], group intervention (vs intervention delivered on individual basis) (n = 1) [33], and remote interventions (n = 6) [12–14, 34–36]. Five studies included Orlistat in their assessment [10, 11, 37–39]. Some studies evaluated multiple interventions and therefore a study can have multiple WMP categories. The WMP categories are listed in Table 1, the study characteristics table.

Table 1 Study characteristics

Cost-Effectiveness Results

The cost-effectiveness results are presented in Figs. 2, 3, 4, 5, 6, and 7. The control groups are described in detail in Table 1 and include a variety of minimal interventions such as do-nothing, self-help booklet and usual care. More detailed results are reported in the Supplementary Material Table 2. A summary of results for each WMP category is provided below.

Fig. 2 Cost-effectiveness results–weight management programmes–decision models (cost per QALY (£))

Fig. 3 Cost-effectiveness results–weight management programmes–decision models (cost per QALY (US$))

Fig. 4 Cost-effectiveness results–pharmacotherapy–decision models (cost per QALY (EUR) and cost per DALY (AU$))

Fig. 5 Cost-effectiveness results–weight management programmes–within trial economic evaluations (cost per QALY (US$, £))

Fig. 6 Cost-effectiveness results–weight management programmes–within trial economic evaluations (cost per kg lost (US$))

Fig. 7 Cost-effectiveness results–weight management programmes–neither within trial economic evaluations nor decision models (cost per QALY (US$))

Weight Management Programmes (WMP)

Lifestyle WMPs (11 within trial, 11 decision models and 3 neither within trial nor decision models) included diet and physical activity advice [6••, 12, 13, 15–22, 24•, 25, 30, 31, 40], low carbohydrate diets [14, 21], commercial WMPs (Weight Watchers and Vtrim, Slimming World) [10, 11, 28, 32], the Counterweight programme [19] and Look AHEAD [6••, 23••]. The comparators were either no active treatment (most often occurring in decision models) or usual care, with heterogeneous definitions of usual care across the studies. Many studies included a “usual care” comparison arm containing an active intervention or education component that may not reflect usual care as delivered to the general population. The duration of follow-up varied from 12 weeks to 9.6 (median) years, with the majority of studies having a follow-up of 1–2 years. The intervention with the longest follow-up was Look AHEAD. The ICERs across studies ranged from US$22 to US$1224 per kg lost for CEAs and from dominant (i.e. less costly and more effective vs different dietary advice) to US$335,952 (vs unclearly described usual care) per QALY for CUAs. The ICER for the WMP with the longest follow-up (Look AHEAD) was uncertain in the within trial analysis [23••] and borderline cost-effective (vs baseline population trends) or extendedly dominated (vs other non-surgical and surgical WMPs) in the decision model [6••].

Four studies (all decision models) included a VLCD as an intervention [6••, 26•, 27, 29]. The VLCD interventions (LighterLife Total [27], Optifast [29], Cambridge Weight Plan UK [26•] and different meta-analysed VLCD interventions [6••]) were followed by a WMP of varying intensity. Duration of follow-up varied from 1 to 4 years across the VLCD studies. The ICERs for the VLCD intervention ranged from US$6,475 (vs no intervention) per QALY [29] to dominated (i.e. more costly and less effective compared to other WMPs and bariatric surgery) [6••].

Two meal replacement studies [10, 11] were included (neither was a within trial evaluation nor a decision model; both extrapolated benefits using meta-analysed data). In both studies, the Jenny Craig meal replacement intervention included a prescribed calorie intake and counselling. Jenny Craig was compared to other WMPs, with ICERs ranging from US$369,000 [10] to US$588,620 per QALY [11].

A group intervention (within trial) included counselling through a conference call, instead of individually (control group) [33]. The ICER was US$9249 (less costly, less effective). Follow-up was only 1 year.

The interventions that were delivered remotely (4 within trial, 1 decision model and 1 neither within trial nor decision model) were Internet or telephone-based. Other evaluations were of interventions delivered remotely rather than in person [12–14, 35, 36]. Follow-up ranged from 6 months to 2 years. The ICER ranged from US$275 [12] to US$2204 [34] per kg lost for CEAs and £151,142 to £232,911 (vs usual primary care; the decision modelling study) per QALY [36] for CUAs.

Five studies (3 decision models and 2 neither within trial nor decision model) evaluated the cost-effectiveness of Orlistat and low-fat diet and showed mixed results [10, 11, 37–39]. When compared to placebo (plus a low-fat diet), Orlistat was cost-effective [38, 39]. However, when compared to existing population trends or more intense interventions (that were defined as usual care), Orlistat was no longer cost-effective [10, 11, 37]. Orlistat was not cost-effective in the lifetime decision modelling study [37].

Some interventions were evaluated in multiple studies. Counterweight was deemed cost-effective when compared to no treatment [32]. However, Counterweight was not cost-effective compared to Weight Watchers [27]. Slimming World was cost-effective compared to being given information verbally or through written material [28]. However, in a different study, Slimming World was not found to be cost-effective compared to Counterweight, Weight Watchers and Lighterlife Total [27]. Look AHEAD was borderline cost-effective compared to baseline population trends [6••] but showed mixed results when compared to a lifestyle WMP including physical activity and dietary advice [6••, 23••].

The majority of studies were conducted in the USA (n = 17). The WMPs considered cost-effective in the longer term (in terms of cost per QALY) in the USA were OPTIFAST (a VLCD) [29] (but with a 3-year time horizon) and a lifestyle intervention based on DPP [30] (but with a 5-year time horizon). The WMPs that were considered cost-effective in a UK setting (n = 12) in the longer term were the WMP delivered in a football club [24•, 25], Lighterlife Total [27], Slimming World (only when compared to usual care) [28], the Counterweight Programme (only when compared to no treatment) [32], Cambridge Weight Plan [26•] and NHS Diabetes Prevention Programme [31]. The WMP considered in Sweden (n = 1), Ireland (n = 1) and Australia (n = 1) was Orlistat, with ICERs ranging from €13,125 per QALY (vs placebo plus a low-fat diet) [38] to dominated (vs more intense interventions) [10, 11].

Note that all the cost-effectiveness results here are compared against different thresholds, with differing health care systems and methodological quality. Therefore, in the following section, we will assess the methodological quality of the studies.

Quality Assessment

Trial-Based Economic Evaluations

About half of the economic evaluations were trial-based. The follow-up period for most studies ranged between 1 and 2 years. The studies with follow-up periods longer than 2 years had follow-up of 3.5 years [24•], 5 years [6••] and about 9 years (Look AHEAD). Within-trial economic evaluations do not capture the long-term costs and benefits of a treatment for severe obesity, nor the assumptions associated with them, which matter because of the long-term impact on ORDs.

Decision Models

The following sections reflect the key methodological issues identified in the quality assessment of the included modelling studies. The most common model types were a Markov model and individual level simulation/microsimulation model. The most common framework for analysis was CUA, and the most common benefit measurement was the quality adjusted life year (QALY). The incremental cost effectiveness ratio (ICER) was therefore compared to a commonly used country-specific threshold.
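To make the comparison of an ICER against a threshold concrete, the following minimal Python sketch computes an ICER from incremental costs and QALYs and applies the equivalent net monetary benefit rule. It is illustrative only: the function names, the incremental values and the threshold of 20,000 per QALY are assumptions chosen for demonstration and are not taken from any included study.

```python
def classify_icer(delta_cost, delta_qaly):
    """Place an intervention (vs its comparator) on the cost-effectiveness plane.

    Returns a (label, icer) tuple; icer is None when no ratio is meaningful.
    """
    if delta_cost == 0 and delta_qaly == 0:
        return ("equivalent", None)
    if delta_cost <= 0 and delta_qaly >= 0:
        return ("dominant", None)      # no more costly, no less effective
    if delta_cost >= 0 and delta_qaly <= 0:
        return ("dominated", None)     # no less costly, no more effective
    return ("trade-off", delta_cost / delta_qaly)


def is_cost_effective(delta_cost, delta_qaly, threshold):
    """Net monetary benefit rule: cost-effective if threshold * dQALY - dCost > 0.

    Unlike the raw ratio, this also handles 'less costly, less effective' results.
    """
    return threshold * delta_qaly - delta_cost > 0


# Hypothetical example: +500 cost and +0.05 QALYs against a 20,000 threshold.
label, ratio = classify_icer(500.0, 0.05)
decision = is_cost_effective(500.0, 0.05, threshold=20_000)
```

The net monetary benefit form is used for the decision because a positive ICER in the "less costly, less effective" quadrant has the opposite interpretation to one in the "more costly, more effective" quadrant.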

Model Structure

Decision model time horizons ranged from 3 years to a lifetime horizon across the studies. Eight of the 15 decision models (53%) were built on a lifetime horizon, which is likely required to capture all the costs and consequences of ORDs such as stroke, cancer, diabetes and myocardial infarction. The varying time horizons further limit the comparability between the studies. Short-term decision models, such as those conducted over only 3 years, are insufficient for decision making as they fail to capture the long-term benefits of weight loss interventions on ORDs and may generate cost-effectiveness conclusions biased against WMPs. However, a counterargument is that longer term extrapolations require assumptions about the impact of transient weight loss on ORDs, and assumptions about the long-term rate of weight regain over time (see Weight Regain Assumptions). Longer term extrapolations, based on short-term data, add uncertainty to results, with a risk of drawing cost-effectiveness conclusions that are biased towards WMPs. To determine the most likely cost-effectiveness conclusions from a decision model, it is critical that models include a comprehensive range of sensitivity analyses to ascertain the impact of important assumptions such as transient effects and weight regain rates on results.

Furthermore, many of the obesity models did not include many of the relevant disease health states such as T2DM, stroke, cardiovascular disease, and obesity-related cancers. Some obesity models [6••, 24•, 26•, 31] (all UK studies) did include many of the ORD risk factors such as T2DM (all studies), obesity-related cancers [6••, 26•, 31], stroke [6••, 24•, 26•], coronary heart disease [6••, 24•], hypertension [6••, 24•, 31], knee osteoarthritis [6••, 31] and congestive heart failure [31]. Obesity-related cancers included breast, colon, liver, kidney and pancreas cancers. The populations considered in the decision models were a mixture of the general population with obesity, with T2DM, at high risk of T2DM or with comorbidities. Two decision models only focused on T2DM [30, 38]. Whilst this is suitable for studies only interested in T2DM as an outcome, the exclusion of other health states from studies modelling interventions for severe obesity may tend to underestimate the benefits of weight loss interventions in the long-term.
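For readers less familiar with the model types discussed, a Markov cohort model with ORD health states can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the structure of any included model: the three states, all transition probabilities, the utility values and the 3.5% discount rate are hypothetical assumptions.

```python
def run_markov(transitions, utilities, start, cycles, discount=0.035):
    """Trace a cohort through annual cycles and return discounted QALYs per person."""
    states = list(start)
    dist = dict(start)
    total_qalys = 0.0
    for t in range(cycles):
        # QALYs accrued by the cohort in year t, discounted to present value
        year_qalys = sum(dist[s] * utilities[s] for s in states)
        total_qalys += year_qalys / (1 + discount) ** t
        # Redistribute the cohort according to the transition probabilities
        dist = {j: sum(dist[i] * transitions[i][j] for i in states) for j in states}
    return total_qalys


def make_transitions(p_t2dm):
    """Hypothetical annual transition probabilities; each row sums to 1."""
    return {
        "no_ORD": {"no_ORD": 1 - p_t2dm - 0.01, "T2DM": p_t2dm, "dead": 0.01},
        "T2DM": {"no_ORD": 0.0, "T2DM": 0.97, "dead": 0.03},
        "dead": {"no_ORD": 0.0, "T2DM": 0.0, "dead": 1.0},
    }


utilities = {"no_ORD": 0.80, "T2DM": 0.65, "dead": 0.0}  # illustrative values
start = {"no_ORD": 1.0, "T2DM": 0.0, "dead": 0.0}

# Assume (hypothetically) that a WMP lowers annual T2DM incidence from 3% to 2%
qalys_control = run_markov(make_transitions(0.03), utilities, start, cycles=40)
qalys_wmp = run_markov(make_transitions(0.02), utilities, start, cycles=40)
incremental_qalys = qalys_wmp - qalys_control
```

In this sketch the only route to benefit is avoided T2DM. Adding further ORD states whose incidence also depends on BMI would add further routes, which is why omitting them may understate the benefits of weight loss.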

Weight Regain Assumptions

The modelling assumption on weight regain over time varied widely between the studies. This parameter is subject to uncertainty as we do not know what happens beyond the short trial time period, which was the case for studies on WMPs.

Studies made a variety of weight regain assumptions for the period after the end of intervention delivery. Nine of the 15 decision models (60%) assumed a constant weight regain rate to baseline (often at 1-kg regain per year or a 5-year regain to baseline weight) or a linear projection of the BMI based on trial data. For the remainder of the studies, the assumption was either unclear, not reported or handled differently (i.e. assumed QALY gains from weight loss linearly reduced to zero or extrapolated a person’s measured glycated haemoglobin values instead of their BMI).
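The two most common assumptions described above can be written down explicitly. The following Python sketch is illustrative only: the function names and the 10 kg initial weight loss are hypothetical, and the included studies may have implemented these assumptions differently (for example on the BMI scale).

```python
def loss_linear_to_baseline(initial_loss_kg, years_to_baseline, year):
    """Weight loss (kg) still maintained at `year` when all weight is regained
    linearly over `years_to_baseline` years."""
    remaining = initial_loss_kg * (1 - year / years_to_baseline)
    return max(0.0, remaining)


def loss_constant_regain(initial_loss_kg, regain_kg_per_year, year):
    """Weight loss (kg) still maintained at `year` under a fixed annual regain."""
    return max(0.0, initial_loss_kg - regain_kg_per_year * year)


# Hypothetical 10 kg loss at end of intervention, traced for 10 years
five_year = [loss_linear_to_baseline(10.0, 5, y) for y in range(11)]
one_kg_per_year = [loss_constant_regain(10.0, 1.0, y) for y in range(11)]
```

With these hypothetical inputs, the 1-kg-per-year assumption keeps some weight off for twice as long as the 5-year assumption, so the two assumptions imply different cumulative exposure to reduced ORD risk.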

The weight regain rate has important implications for cost-effectiveness, particularly in models where the risk of ORD is directly linked to time-specific weight/BMI. Long-term follow-up data on WMPs is frequently lacking and therefore exploring the impact that the weight regain assumption has on results is crucially important. The longest follow-up for WMPs identified in the REBALANCE clinical effectiveness review [6••] was from the Look AHEAD study [41], with 9 years of data. This was an intensive longer term WMP which is dissimilar to the other WMPs identified in this review, which had much shorter follow-up. The Look AHEAD study was evaluated in two studies included in this review, one trial-based economic evaluation [23••] and in one decision model [6••]. However, for the majority of WMPs, there is an urgent need for longer term follow-up of RCT evidence to determine the most accurate assumptions for economic modelling.

Variation in Interventions and Comparators

The comparisons identified in this review varied widely. The interventions and comparators differed both between WMP categories and within categories. Lifestyle interventions varied widely and were compared to no active treatment (e.g. country-specific population BMI trajectory) or some form of usual care. VLCDs were compared to WMPs with varying intensity. The meal replacement (Jenny Craig) was compared to different WMPs. The group and remote interventions were compared to in-person lifestyle interventions. Because of the variation in the intervention and comparators, it is difficult to compare across the studies.

Sensitivity Analyses

Sensitivity analyses are key to unravelling the uncertainty in the cost-effectiveness results. Four studies varied the discount rate [6••, 26•, 28, 36], which generally had negligible impact on the cost-effectiveness results. Only a few studies looked at varying the time horizon, and not surprisingly, the longer the time horizon, the more cost-effective the intervention [6••, 29]. This is because costs are often incurred upfront but the benefits in terms of ORD avoided often occur far into the future.
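The interaction between upfront costs, delayed benefits and the time horizon can be illustrated with a short Python sketch. All inputs are hypothetical (a single upfront cost, a constant annual QALY gain with no weight regain, and a 3.5% discount rate applied to both costs and QALYs); it is a simplification for illustration, not a reproduction of any included model.

```python
def discounted_sum(values, rate):
    """Present value of a stream of annual values; year 0 is undiscounted."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))


def cost_per_qaly(upfront_cost, annual_qaly_gain, horizon_years, rate=0.035):
    """Cost per QALY when costs fall in year 0 and QALY gains accrue every year."""
    costs = [upfront_cost] + [0.0] * (horizon_years - 1)
    qalys = [annual_qaly_gain] * horizon_years
    return discounted_sum(costs, rate) / discounted_sum(qalys, rate)


# Hypothetical programme: 1,000 upfront, 0.01 QALYs gained per year
by_horizon = {h: cost_per_qaly(1000.0, 0.01, h) for h in (3, 10, 40)}
```

Under these assumptions the cost per QALY falls as the horizon lengthens, because the numerator is fixed in year 0 while the denominator keeps accumulating. Allowing the annual QALY gain to decline with weight regain would weaken this effect.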

The weight regain rate was varied in 4 studies [6••, 24•, 26•, 28]. In two of the studies where the weight regain rate was assumed to be more conservative (quicker weight regain to baseline weight) [24•, 28], it did not change the cost-effectiveness conclusions. In one study, the intervention was more cost-effective when assuming a weight that was 1 kg below baseline weight beyond 5 years, rather than assuming that all weight was regained after 5 years. The intervention would remain cost-effective as long as the weight is kept off and is not all regained for at least 3 years [26•]. Lastly, in our REBALANCE study [6••], the weight regain was assumed to follow a linear trajectory based on trial data instead of a 5-year weight regain. Look AHEAD went from being borderline cost-effective to cost-effective (vs baseline population trends) but for the other WMPs evaluated it both increased costs and reduced QALY gains (although remained cost-effective compared to baseline population trends) [6••].

In the younger age group (aged 20–34), a total diet replacement programme [26•] (assuming a 5-year weight regain) was not cost-effective, and the cost per QALY was highest in the older age groups. However, this was not the case when assuming that 1-kg weight loss is maintained beyond 5 years (in this case the intervention was cost-effective for all age groups). This further highlights the importance of varying the weight regain assumption.

For the higher BMI groups, the cost per QALY was lower (still cost-effective in all age groups) [26•] and more cost saving [29].

Only three studies [24•, 25, 36] conducted a value of information analysis (VOI). VOI is a framework for identifying where the greatest uncertainty lies to which future research should be directed. Considering the uncertain longer term weight loss, weight loss maintenance and associated clinical event management, VOI could help guide the direction of future research in the area of obesity.
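As an indication of what a VOI analysis computes, the expected value of perfect information (EVPI) can be estimated from probabilistic sensitivity analysis output as the difference between the expected net benefit of choosing the best strategy in each simulation draw and the expected net benefit of the strategy that is best on average. The Python sketch below uses hypothetical net benefit values and is not drawn from any of the three studies cited.

```python
def evpi(net_benefit_samples):
    """Expected value of perfect information per person.

    `net_benefit_samples` holds one tuple per simulation draw, giving the net
    monetary benefit of each strategy under that draw of the parameters.
    """
    n_draws = len(net_benefit_samples)
    n_strategies = len(net_benefit_samples[0])
    mean_nb = [
        sum(draw[j] for draw in net_benefit_samples) / n_draws
        for j in range(n_strategies)
    ]
    value_with_current_info = max(mean_nb)
    value_with_perfect_info = sum(max(draw) for draw in net_benefit_samples) / n_draws
    return value_with_perfect_info - value_with_current_info


# Two hypothetical draws of net benefit for (comparator, WMP)
uncertain = evpi([(0.0, 100.0), (0.0, -50.0)])  # preferred strategy differs by draw
certain = evpi([(0.0, 10.0), (0.0, 20.0)])      # WMP preferred in every draw
```

EVPI is zero when the same strategy is preferred in every draw, and grows as parameter uncertainty (for example about weight regain) makes the preferred strategy less clear.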

Discussion

We identified 32 studies (across 36 papers) evaluating the cost-effectiveness of non-surgical interventions for severe obesity (BMI ≥ 35 kg/m2). The cost-effectiveness findings from the WMP and pharmacotherapy studies were mixed. Half of the WMP studies were economic evaluations alongside RCTs, not extrapolating costs and benefits over a longer time horizon, failing to capture the long-term impact of an intervention on obesity, a chronic disease. Furthermore, studies were subject to heterogeneity with regard to the chosen comparators, study populations, settings, decision model structure, costing methodology, weight regain assumptions and time horizons. To our knowledge, this (both our REBALANCE review and updated review) is the first systematic review of economic evaluations of different WMPs for severe obesity (BMI ≥ 35 kg/m2).

Two reviews have recently been conducted on the cost-effectiveness of interventions for people with obesity [42, 43]. However, unlike our review, they focused on bariatric surgery only, their population of interest was people with obesity (BMI ≥ 30 kg/m2) rather than severe obesity (BMI ≥ 35 kg/m2), and they included partial economic evaluations (e.g. cost-only studies or effectiveness evaluations) in addition to full economic evaluations. As in the REBALANCE study, they also found surgery to be cost-effective. One of their included studies [44] applied a post-surgery complication risk over a 10-year period. This is a step in the right direction considering the evidence showing a longer term risk of complications following bariatric surgery [45, 46]. More recent relevant data on longer term surgery complications would improve future obesity decision models.

The quality of the included studies varied. However, as we have learnt from the REBALANCE study, several important quality issues are not captured in the standard quality assessment checklists [7]. Firstly, weight regain assumptions in the decision models varied widely, were poorly justified and were rarely explored in sensitivity analyses (only in 4 studies). This is important especially for WMPs because the majority of WMPs were of short duration and therefore, the longer term weight regain rate is unknown. The assumed weight regain rate determines the BMI trajectory over time, and a higher BMI is associated with an increased risk of developing ORDs. Therefore, an intervention assuming patients revert to baseline in 5 years’ time is more likely to be cost-effective than one assuming patients revert to baseline BMI immediately. Secondly, many studies did not include all the relevant disease health states such as T2DM and stroke. Lastly, the trial results should be extrapolated over a longer time horizon. Including these items on the quality assessment checklists would help reviewers in assessing the quality of obesity models.

Two UK studies in the review evaluated multiple WMPs and bariatric surgery: one with only a 10-year time horizon for costs and outcomes [27] and the other with a lifetime horizon for costs and outcomes [6••]. The REBALANCE study [6••] included all the relevant comparators (both surgical and non-surgical options) that were identified through a systematic review of RCTs, and modelled over a lifetime horizon. From a UK NHS perspective, the generalisability of the results in the systematic review presented here to a UK setting is poor. A recently published UK RCT (the DROPLET trial) evaluated a VLCD offered in primary care, which was found to be cost-effective over a lifetime horizon [26•]. However, the only comparator was nurse-led support. There is a need for a comparison of commonly available treatments in the UK NHS.

Strengths and Limitations

Key strengths of this study are the systematic approach to the literature review in identifying the cost-effectiveness evidence on interventions for severe obesity and the methodological quality assessment of the included studies. Furthermore, this review brings focus to the population with severe obesity, identifying value for money interventions for treating severe obesity.

Due to study heterogeneity, no quantitative synthesis of the study results by meta-analysis was attempted, a common issue with systematic reviews of economic evaluations. This is because studies were conducted in different countries with different health care systems, different definitions of comparator groups, model structures, costing methods and modelling assumptions. A detailed quality assessment was not conducted for all included studies, only for those identified through the REBALANCE review, but this informed our subsequent assessment of studies.

Conclusions

Most WMPs were cost-effective and pharmacotherapies showed mixed results. However, the cost-effectiveness evidence should be read with caution due to the varying methodological issues and study heterogeneity across the studies. About half of the WMP studies were economic evaluations alongside RCTs, which did not account for the difference in long-term costs and outcomes between the considered interventions, crucial for a chronic disease such as obesity. WMPs tended to have short-term follow-up, rendering it even more important to make use of decision models. Many decision models omitted relevant health states and had varying assumptions around weight regain, which were rarely explored in sensitivity analysis.

Although there exists a decision model assessing different types of interventions [6••], there is still a need for future economic evaluations to focus on effective interventions available on the UK NHS for people with severe obesity. Furthermore, there is room for improvement with regard to obesity models and their methodology. To improve decision models, there is a need for the inclusion of all the important health states, improved consistency in the assumed weight regain rate (which ideally should be based on best available evidence), and improved transparency in the description of the comparators (and interventions) to allow better comparison across studies.