Key Summary Points

The obesity/overweight epidemic is a major public health challenge in an increasing number of countries worldwide

Understanding the multifaceted consequences of obesity/overweight, including its impact on health-related quality of life (HRQoL), will serve to inform decisions for policymakers, physicians and patients

The objective of this study was to investigate the relationship between body mass index (BMI) and HRQoL using data collected in two weight loss intervention studies, SCALE and STEP 1

A one-unit change in BMI is associated with a difference of up to 0.005 in utility indices; although not statistically significant, this finding is in line with those of similar studies in obesity

The results of the current analysis indicate that the differences in BMI alone may not explain the differences in utility indexes observed at baseline in the analyzed populations and that other factors, such as age, gender and presence of chronic or acute complications, need to be considered

Introduction

Obesity, defined as an excessive accumulation of body fat, poses a significant public health concern globally. Obesity can be indicated by the body mass index (BMI), a metric used for defining anthropometric height/weight characteristics in adults, i.e., a BMI ≥ 30 kg/m2 indicates obesity in white, Hispanic and Black individuals and a BMI ≥ 25 kg/m2 indicates obesity in South-East Asian populations [1]. The World Health Organization (WHO) European Regional 2022 Report estimated that almost 23% of adults (aged ≥ 18 years) are living with obesity (22% of men and 24% of women), and > 1.2 million deaths annually are linked to obesity (corresponding to > 13% of total mortality in Europe) [2]. According to the Active Lives Adult Survey England, obesity affected 25.2% of adults (aged ≥ 18 years) in 2020–2021, which is an increase from 24.4% in 2019–2020 and 22.6% in 2015–2016 [3]. Likewise, the World Obesity Federation reported that in Portugal nearly 17.7% of adults (aged ≥ 18 years) were affected by obesity (17.1% of men and 18.2% of women) in 2019 [4].

Obesity increases the risk of type 2 diabetes (T2D), sleep apnea and cardiovascular diseases, and several cancers are more prevalent in people with overweight and obesity [5, 6]. The association between weight loss and a reduction in the incidence and/or prevalence of such comorbidities has been assessed in a few studies. For instance, a previously published weight loss study, Sleep AHEAD (Action for Health in Diabetes), noted a delay in T2D onset and a reduction in the prevalence and/or severity of sleep apnea [7]. Another study based on UK Clinical Practice Research Datalink GOLD data reported a reduction in the incidence of T2D, hypertension, dyslipidemia and chronic kidney disease with a median weight loss of 13%. However, this study showed no changes in cardiovascular outcomes (such as heart failure, atrial fibrillation, unstable angina/myocardial infarction) or overall mortality [8].

The negative impact of obesity also extends beyond morbidity and mortality; it may impair physical, emotional and social functioning, and is associated with poor self-image and self-esteem [9]. Therefore, it is important to better understand how obesity affects an individual’s ability to feel and function. This has been measured by evaluating the health-related quality of life (HRQoL) in individuals with obesity, and almost all reports have shown that HRQoL is lower in people with a higher BMI [10,11,12]. Several cross-sectional studies have shown that obesity affects both physical and psychological functioning. However, this varies by gender, age and ethnicity [13,14,15,16]. For instance, women who have obesity have a relatively lower HRQoL compared with men, and this is more likely to affect physical rather than psychological functioning [16]. Furthermore, age and gender jointly affect different components of HRQoL: younger men and women (aged 18–34 years) with obesity exhibit poor physical health, whereas older women with obesity (aged 35–64 years), but not older men, demonstrate low HRQoL regarding both physical and mental functioning [17]. In addition, there is growing support for HRQoL assessment in clinical studies [18,19,20], and it is considered an adjuvant to clinical and physiological assessments in many chronic conditions. Therefore, HRQoL data collected alongside clinical trials are regarded as the “gold standard” in the evaluation of healthcare services and outcome assessments [21].

One of the most widely used instruments to measure HRQoL in people living with obesity is the Short Form Health Survey 36-item (SF-36) [22]. It is a 36-item generic instrument of health status that yields two summary scores for physical and mental health and eight domain scores: physical functioning, role functioning, bodily pain, general health, vitality, social functioning, emotional and mental health [22]. Higher scores indicate better health status and functional ability [23]. While SF-36 scores provide the means for judging the effectiveness of healthcare interventions, they have limited application in economic evaluation because they are not based on patient preferences [22]. Indeed, the National Institute for Health and Care Excellence (NICE) in England and Wales recommends that changes in HRQoL should be directly measured from participants whereas the utility of these changes should be based on public preferences using a choice-based method [24]. The EuroQoL-5 dimensions-3 levels (EQ-5D-3L) measurement method is preferred for measuring HRQoL in NICE’s reference case to ensure consistency across evaluations. The Short Form 6 dimension (SF-6D) is an accepted alternative in a few countries, including Portugal, when EQ-5D data cannot be derived [25]. Therefore, scores obtained on the SF-36 six-digit health domains need to be converted into preference-based measures, resulting in a utility score. A utility score is anchored at 1 for full health and 0 for death [22]. The results of SF-36 are directly transferable to the SF-6D index using a mapping tool developed by the University of Sheffield [22, 26, 27], for which general population utility weights are available for a number of countries. A published algorithm by Rowen et al. [28] allows the mapping of SF-36 scores to the EQ-5D-3L index with UK general population weights.

The key objective of this analysis was to assess the correlation between BMI and utilities using HRQoL data collected at baseline in two weight loss intervention studies, SCALE and STEP 1. The analysis is intended to provide utilities that can be used in health economic modeling.

Methods

SCALE and STEP 1: Trial Overview

SCALE (NCT01272219) was a randomized, double-blind, placebo-controlled, parallel-group, multicenter, multinational trial in adults with overweight or obesity (BMI ≥ 30 kg/m2 or BMI ≥ 27 kg/m2 with comorbidities) comparing subcutaneous liraglutide 3.0 mg once daily with placebo in inducing and maintaining weight loss over 56 weeks [29] and reducing the risk of T2D in subjects with prediabetes at baseline after 160 weeks [30]. Subjects with diabetes at baseline were excluded from the study. The trial was conducted from June 1, 2011, to March 2, 2015 [29, 30].

STEP 1 (NCT03548935) was a randomized, double-blind, placebo-controlled, multicenter clinical trial in adults with overweight or obesity (BMI ≥ 30 kg/m2 or BMI ≥ 27 kg/m2 with ≥ 1 weight-related comorbidity) comparing semaglutide 2.4 mg with placebo in inducing weight loss over 68 weeks (main trial) [31]. The trial was conducted from June 4, 2018, to March 5, 2021 [32].

The SCALE and STEP 1 trials were chosen because they are the pivotal studies in which the efficacy and safety of liraglutide 3.0 mg and semaglutide 2.4 mg were investigated in people with overweight and obesity.

Measurement of HRQoL

During the SCALE trial, HRQoL was assessed using the Impact of Weight on QoL-Lite (IWQoL-Lite), Treatment-Related Impact Measure-Weight (in which higher scores indicate better QoL) and SF-36 v2 health questionnaires. These scores were captured for the 1-year main trial duration [29] as well as for the extension period and for the delayed onset of the T2D part of the trial [30].

During the STEP 1 trial, patient-reported outcomes were assessed by questionnaires such as the IWQoL-Lite Clinical Trials (version 3.0), Patient Global Impression of Status for physical function (version 1.0), Patient Global Impression of Change, SF-36 v2 acute, Stanford Presenteeism Scale-6 (2001 version) and International Consultation on Incontinence Questionnaire-Urinary Incontinence-Short Form (only in women) [33]. These scores were collected during randomization and the maintenance period and at the end of treatment during the main phase of the STEP 1 trial.

In our current analysis, SF-36 scores from SCALE and STEP 1 were mapped to the EQ-5D-3L using the algorithm of Rowen et al. [28] to derive utilities for the UK, while SF-36 scores from STEP 1 were converted into SF-6D utilities with Portuguese general population weights using the tool developed by the University of Sheffield [22, 26, 27].

Correlation Between BMI and Baseline Utility

The relationships between baseline BMI and utility scores were investigated using multiple linear regression analyses, controlling for demographic and clinical parameters, in line with analyses presented by Søltoft et al. [13]. The response variable was the baseline utility value, and the predictors were BMI linear, BMI quadratic and BMI cubic terms. The model also included the following covariates: gender (men vs. women), age groups (18–24, 25–34, 35–44, 45–54, 55–64, 65–74 and ≥ 75 years), smoking status (never smoked vs. previously smoked vs. current smoker), presence of heart or circulatory diseases at baseline (excluding hypertension) (yes vs. no) and hypertension (yes vs. no) for SCALE. STEP 1 included covariates such as gender (men vs. women), age (years), smoking status (never smoked vs. previously smoked vs. current smoker), presence of coronary artery disease (CAD) at baseline (yes vs. no), hypertension (yes vs. no) and presence of prediabetes at baseline (yes vs. no). The models presented maximum likelihood parameter estimates, standard errors, 95% confidence intervals (CIs) and p values (where applicable) for each covariate. Given the expected differences between genders for many parameters, as shown in Søltoft et al. [13], separate models for men and women were fitted, and the expectation of significant differences between genders was checked in the initial model. Visual diagnostics of the fitted models, based on quantile–quantile plots of studentized residuals and scatter plots of residuals versus the predicted values, were performed. The statistical significance of regression coefficients corresponding to covariates was determined using two-sided t tests at the α = 0.05 significance level. p values were not controlled for multiplicity. The data were analyzed using SAS version 9.4.
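The structure of such a model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the published analysis was run in SAS, whereas the sketch uses NumPy least squares (which coincides with the maximum likelihood estimate of the coefficients under normally distributed errors), runs on synthetic data rather than trial data, centres BMI before forming the polynomial terms, and uses a hypothetical function name (`fit_cubic_bmi_model`) and only two covariates.

```python
import numpy as np

def fit_cubic_bmi_model(bmi, covariates, utility):
    """Least-squares fit of utility on BMI, BMI^2, BMI^3 and extra covariates.

    Returns coefficient estimates and their standard errors, ordered as
    intercept, BMI linear, BMI quadratic, BMI cubic, then the covariates.
    """
    # Assumption: BMI is centred to limit collinearity between polynomial terms.
    bmi_c = bmi - bmi.mean()
    X = np.column_stack([np.ones_like(bmi_c), bmi_c, bmi_c**2, bmi_c**3, covariates])
    beta, *_ = np.linalg.lstsq(X, utility, rcond=None)
    resid = utility - X @ beta
    dof = X.shape[0] - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# Synthetic illustration only; none of these numbers come from SCALE or STEP 1.
rng = np.random.default_rng(0)
n = 2000
bmi = rng.uniform(27, 55, n)
male = rng.integers(0, 2, n).astype(float)   # 1 = men, 0 = women
age = rng.uniform(18, 80, n)                 # continuous age, as in the STEP 1 model
utility = (0.95 - 0.004 * (bmi - bmi.mean()) + 0.01 * male
           - 0.0008 * age + rng.normal(0, 0.05, n))

beta, se = fit_cubic_bmi_model(bmi, np.column_stack([male, age]), utility)
for name, b, s in zip(["intercept", "BMI", "BMI^2", "BMI^3", "men", "age"], beta, se):
    print(f"{name:>9}: {b: .5f} (SE {s:.5f})")
```

Fitting the same function separately to the rows for men and for women mirrors the gender-specific models described above.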

Statistical Analysis

The analysis set in SCALE was the full analysis set (FAS) of all participants (all randomized participants exposed to at least one dose of the trial product and with at least one post-baseline weight measurement) with prediabetes at baseline from the subset of countries where HRQoL was assessed. The analysis set in STEP 1 was the FAS, defined as all randomized subjects, according to the intention-to-treat principle.

Continuous data were described as the number of observations (N), the number of missing observations (N miss), mean and standard deviation (SD). Categorical data were described as the number (n) and percentage (%) of participants in each category.
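As a minimal sketch of these descriptive summaries, the helpers below compute the listed statistics in plain Python. The function names are hypothetical, missing values are assumed to be coded as `None` or NaN, and the sample standard deviation (divisor n − 1) is assumed because the text does not state which divisor was used.

```python
import math

def describe_continuous(values):
    """Return N, N miss, mean and SD for a list that may contain None/NaN."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    n = len(observed)
    mean = sum(observed) / n
    # Assumption: sample SD with divisor n - 1.
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (n - 1))
    return {"N": n, "N miss": len(values) - n, "mean": mean, "SD": sd}

def describe_categorical(labels):
    """Return {category: (n, percentage)} for a list of category labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {label: (c, 100.0 * c / total) for label, c in counts.items()}

# Toy values for illustration only (not trial data).
bmi_summary = describe_continuous([30.0, 35.0, None, 40.0])
sex_summary = describe_categorical(["men", "women", "women", "women"])
print(bmi_summary)
print(sex_summary)
```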

Ethical Approval

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

Baseline Characteristics

The baseline demographics and clinical characteristics of SCALE (study population with prediabetes and a BMI of at least 30 kg/m2 or at least 27 kg/m2 with comorbidities from countries where HRQoL was assessed, n = 1755) and STEP 1 (n = 1961) study participants are shown in Table 1. The mean (SD) BMI was 39.2 (6.5) and 37.9 (6.7) kg/m2 in the SCALE and STEP 1 datasets, respectively, and 22.5% (396/1755) and 25.9% (508/1961) of the SCALE and STEP 1 analysis sets, respectively, were men. The mean (SD) ages in SCALE and STEP 1 were 47.9 (11.7) and 46.5 (12.7) years, respectively. In SCALE, 13.4% of participants had heart or circulatory diseases at baseline, and in STEP 1, 2.3% of participants had CAD at baseline. The mean (SD) baseline EQ-5D-3L scores in SCALE were 0.929 (0.061) in men and 0.923 (0.064) in women. The corresponding STEP 1 EQ-5D-3L scores were 0.903 (0.094) in men and 0.892 (0.106) in women. The mean (SD) baseline SF-6D utilities (Portugal) in STEP 1 were 0.900 (0.065) and 0.898 (0.064) in men and women, respectively.

Table 1 Baseline demographics and disease characteristics of participants from SCALE and STEP 1

Correlation Between Baseline BMI and Utility

SCALE EQ-5D-3L for UK

In the prediabetes FAS population of SCALE, a higher baseline BMI was correlated with decreasing baseline utility scores (Fig. 1), although the correlation was non-significant based on p values of linear, quadratic and cubic effects. The confidence bands widened toward the extremes of the observed BMI.

Fig. 1 Regression model: predicted EQ-5D-3L (UK) utility score from SCALE vs. BMI. Red-colored area corresponds to the prediction interval for men; green-colored area is the prediction interval for women. BMI body mass index, EQ-5D-3L EuroQoL-5 dimensions-3 levels

The results of the regression model on baseline utility (EQ-5D-3L) are presented in Table 2. Assuming all other parameters were fixed, the model predicted statistically significantly higher baseline utilities for men (0.0104; 95% CI 0.0034, 0.0174; p = 0.0036) compared with women and statistically significantly lower utilities for older participants (age 45–54 years: − 0.0145; 95% CI − 0.0247, − 0.0043; p = 0.0054, age 55–64: − 0.0203; 95% CI − 0.0313, − 0.0092; p = 0.0003, age 65–74: − 0.0229; 95% CI − 0.0376, − 0.0083; p = 0.0022, age ≥ 75: − 0.0512; 95% CI − 0.1013, − 0.0012; p = 0.0449) compared with younger participants (25–34 years) and for participants with hypertension (− 0.0089; 95% CI − 0.0152, − 0.0026; p = 0.0055) compared with participants without hypertension. BMI, heart or circulatory disease and smoking status were non-significant predictors of baseline utility. To preserve comparability with the model presented by Søltoft et al. [13], simplified models, e.g., with removed quadratic and cubic terms for BMI, were not further explored.
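The reported estimates, confidence intervals and p values can be cross-checked against one another. The sketch below back-calculates the standard error from a 95% CI and recomputes the two-sided p value for the coefficient for men. It assumes a normal approximation in place of the t distribution used in the analysis (a close approximation at these sample sizes), and the function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def se_and_p_from_ci(estimate, lower, upper):
    """Recover the standard error and two-sided p value from a 95% CI.

    Assumption: normal approximation, i.e. CI = estimate +/- 1.96 * SE.
    """
    se = (upper - lower) / (2 * Z_975)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return se, p

# Coefficient for men vs. women in the SCALE EQ-5D-3L model (values from the text).
se, p = se_and_p_from_ci(0.0104, 0.0034, 0.0174)
print(f"SE = {se:.5f}, p = {p:.4f}")
```

The recomputed p value rounds to 0.0036, matching the value reported for this coefficient.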

Table 2 Regression model between baseline BMI and baseline EQ-5D-3L (UK) scores from SCALE and STEP 1 and SF-6D (Portugal) scores from STEP 1 – full analysis set (men and women)

In the male subset (Table 3), a statistically significant and negative correlation was reported between utility and the age group of 65–74 years (− 0.0279; 95% CI − 0.0555, − 0.0003; p = 0.0474) compared with younger participants. In the female subset (Table 4), the age groups of 45–54 years (− 0.0125; 95% CI − 0.0242, − 0.0009; p = 0.0343), 55–64 years (− 0.0198; 95% CI − 0.0324, − 0.0071; p = 0.0022) and 65–74 years (− 0.0206; 95% CI − 0.0382, − 0.0030; p = 0.0218) compared with younger participants and participants with hypertension (− 0.0115; 95% CI − 0.0188, − 0.0042; p = 0.0021) compared with participants without hypertension were statistically significant and negatively correlated with utility.

Table 3 Regression model between baseline BMI and baseline EQ-5D-3L (UK) scores from SCALE and STEP 1 and SF-6D (Portugal) scores from STEP 1—male population
Table 4 Regression model between baseline BMI and baseline EQ-5D-3L (UK) scores from SCALE and STEP 1, and SF-6D (Portugal) scores from STEP 1—female population

STEP 1 EQ-5D-3L for UK

In the FAS population of STEP 1, a higher baseline BMI was correlated with decreasing baseline utilities (Fig. 2). Note that although we observed a decrease in average utility values with an additional unit of BMI, this trend was not significant based on p values of linear, quadratic and cubic effects. The confidence bands widened toward the extremes of the observed BMI.

Fig. 2 Regression model: predicted EQ-5D-3L (UK) utility score from STEP 1 vs. BMI. Red-colored area corresponds to the prediction interval for men; green-colored area is the prediction interval for women. BMI body mass index, EQ-5D-3L EuroQoL-5 dimensions-3 levels

The results of our regression model on baseline utility (EQ-5D-3L) are presented in Table 2. Assuming all other parameters were fixed, our model predicted higher baseline EQ-5D-3L utilities for men compared with women (0.0123; 95% CI 0.0020, 0.0225; p = 0.0188) and lower baseline utilities for previous smokers (− 0.0119; 95% CI − 0.0224, − 0.0014; p = 0.0257) compared with never-smokers, older participants (− 0.0008; 95% CI − 0.0012, − 0.0004; p = 0.0002) compared with younger participants and participants with CAD (− 0.0705; 95% CI − 0.1003, − 0.0407; p < 0.0001) compared with those without CAD at baseline. Prediabetes status at baseline and hypertension, along with BMI, were not significant predictors of baseline utility. To preserve comparability with the model presented by Søltoft et al. [13], simplified models, e.g., with removed quadratic and cubic terms for BMI, were not further explored.

In the male subset of the FAS (Table 3), the presence of CAD at baseline (− 0.0848; 95% CI − 0.1197, − 0.0499; p ≤ 0.0001) was the only statistically significant (negative) predictor of baseline EQ-5D-3L scores. In the female subset (Table 4), older age (− 0.0010; 95% CI − 0.0014, − 0.0005; p = 0.0001), presence of CAD at baseline (− 0.061; 95% CI − 0.1092, − 0.0136; p = 0.0119) and current (− 0.0177; 95% CI − 0.0346, − 0.0008; p = 0.0403) or previous smoking (− 0.0153; 95% CI − 0.0281, − 0.0026; p = 0.0182) were statistically significantly negatively correlated with baseline utility.

STEP 1 SF-6D for Portugal

In the FAS population of STEP 1, the estimated regression coefficients indicated that participants with higher BMI had lower baseline SF-6D scores, although the correlation was not statistically significant (Fig. 3).

Fig. 3 Regression model: predicted SF-6D (Portugal) utility score from STEP 1 vs. BMI. Red-colored area corresponds to the prediction interval for men; green-colored area is the prediction interval for women. BMI body mass index, SF-6D Short Form 6 dimension

The results of the regression model on baseline utility (SF-6D) are presented in Table 2. Only the presence of CAD at baseline (− 0.0228; 95% CI − 0.0420, − 0.0036; p = 0.0198) was a statistically significant (negative) predictor of SF-6D utility compared with the absence of CAD. No other factor was a statistically significant predictor of baseline utility. The estimated regression coefficients indicated that participants with prediabetes had lower baseline utilities (− 0.0014; 95% CI − 0.0074, 0.0046; p = 0.6519) than those without prediabetes, older participants had higher baseline utilities (0.0001; 95% CI − 0.0002, 0.0004; p = 0.4519) than younger participants, men had higher baseline utilities (0.0028; 95% CI − 0.0038, 0.0094; p = 0.4118) than women, and participants with hypertension (− 0.0004; 95% CI − 0.0106, 0.0024; p = 0.2146) and current (− 0.0010; 95% CI − 0.0100, 0.0080; p = 0.8238) or previous smokers (− 0.0003; 95% CI − 0.0071, 0.0064; p = 0.9238) had lower baseline utilities than those without hypertension and never-smokers, respectively.

In the male subset of the FAS (Table 3), the presence of CAD at baseline (− 0.0303; 95% CI − 0.0555, − 0.0051; p = 0.0184) was the only factor with a statistically significant correlation (negative) with utility. In the female subset of the FAS (Table 4), no statistically significant correlation between the selected factors and baseline SF-6D utilities was identified.

Overall, based on the estimated relationship between baseline BMI and utility (Tables 3 and 4), it can be calculated that an additional unit of BMI (within the range of 30–40 kg/m2) was correlated with 0.0041 and 0.0031 lower EQ-5D-3L scores in SCALE, 0.0039 and 0.0047 lower EQ-5D-3L scores in STEP 1 and 0.0027 and 0.0020 lower SF-6D scores among men and women, respectively, assuming linearity of the relationship.
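One plausible reading of this calculation, offered as an assumption because the text does not spell out the procedure, is the average slope of the fitted cubic curve between 30 and 40 kg/m2. The sketch below uses hypothetical coefficients chosen only to illustrate the arithmetic; they are not the estimates from Tables 3 and 4, and the function names are also hypothetical.

```python
def predicted_utility(bmi, intercept, b1, b2, b3):
    """Cubic polynomial in BMI with all other covariates held fixed."""
    return intercept + b1 * bmi + b2 * bmi**2 + b3 * bmi**3

def average_change_per_bmi_unit(coefs, low=30.0, high=40.0):
    """Average change in predicted utility per BMI unit over [low, high].

    Assumption: 'assuming linearity' is read as the secant slope of the
    fitted curve between the two ends of the BMI range.
    """
    delta = predicted_utility(high, *coefs) - predicted_utility(low, *coefs)
    return delta / (high - low)

# Hypothetical coefficients (intercept, linear, quadratic, cubic); not from the paper.
hypothetical_coefs = (1.0, -0.002, -0.00002, 0.0)
slope = average_change_per_bmi_unit(hypothetical_coefs)
print(f"Average change per BMI unit: {slope:.4f}")
```

With these illustrative coefficients the predicted utility falls from 0.922 at 30 kg/m2 to 0.888 at 40 kg/m2, an average of 0.0034 per BMI unit.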

Discussion

The present analysis assessed the correlation between baseline BMI and utility measures using HRQoL data collected in the SCALE and STEP 1 trials [29,30,31], the results of which can facilitate the translation of the clinical outcome, i.e., weight loss, into HRQoL benefits, as expressed by a utility index. Although improvements in SF-36 scores were reported with liraglutide 3.0 mg and semaglutide 2.4 mg in the SCALE and STEP 1 trials, respectively [29,30,31], these values cannot be translated directly into the health utilities used in health economic evaluations. In contrast, the EQ-5D-3L and SF-6D values obtained in this analysis provide easily interpretable (cardinal) index scores and allow comparability across different diseases and therapeutic areas [22, 26, 27].

Our analysis revealed that after adjusting for relevant confounding factors such as age, gender and obesity-related comorbidities, as present in the dataset, there was a negative correlation between baseline BMI and utility scores in UK EQ-5D-3L (SCALE, STEP 1) and Portuguese SF-6D (STEP 1) utilities, although the observed effect size was small and thus statistical significance was not reached for the model assumptions and dataset herein analyzed. This result indicates that the differences in BMI alone may not explain the differences in utility indexes observed at baseline in the analyzed populations and that other factors, such as age, gender and presence of chronic or acute complications, may need to be considered.

Despite the small effect size and the lack of statistical significance, the characterization and quantification of the multifactorial contributors to utility indexes remain relevant in the context of healthcare allocation and decision-making as they can be used to inform cost-effectiveness analyses whereby an explicit quantification of such factors is needed. This is especially important in the current context of multiple emerging weight loss interventions with further implications on clinical practice and choice of obesity management whereby interventions with demonstrated multifactorial benefits, going beyond the reduction of weight alone, should be prioritized. Some of these benefits have been demonstrated for semaglutide 2.4 mg and liraglutide 3.0 mg and include delays in the occurrence of diabetes [30], improvements in heart function for people with heart failure and preserved ejection fraction [34, 35] or the avoidance of recurrent acute cardiovascular events or mortality [36].

Indeed, the negative correlation between BMI and the utility index estimated in our study is similar to other published estimates [37,38,39]. For instance, Khan et al. [40] estimated that the coefficient of BMI was − 0.006, i.e., a unit increase in BMI was predicted to decrease the utility score by 0.006 (assuming other factors being fixed). Søltoft et al. reported that utility scores peaked at a BMI of 26 kg/m2 in men and 24.5 kg/m2 in women and were lower both above and below these values [13], as did Luah et al. in a more recent study [41]. Our analysis is, however, different from that of Søltoft et al. in certain aspects [13]. The study by Søltoft et al. included data from > 14,000 participants derived from the 2003 Health Survey for England and thus was designed to provide a representative sample of the population of England living in private households [13]. Only 20% of the studied population had obesity, and 45.4% of men and 33.8% of women were overweight (BMI ≥ 25 kg/m2) [13]. Thus, the range of BMI values analyzed by Søltoft et al. was different from that of our dataset. This may explain the lack of significance of the correlation in our study when fitting a curvilinear model.

The results of our study should be interpreted in the context of a few limitations. BMI is acknowledged as a surrogate measure of body fatness as it measures excess weight rather than excess fat. Yet, because it is simple, inexpensive and noninvasive, it is the most widely used measure of overweight and obesity at population level and is recommended as such by the WHO. Furthermore, BMI has been shown to correlate highly with more direct measures of body fat, such as dual-energy X-ray absorptiometry (DXA)-derived fat mass index (FMI) and the percentage body fat (PBF); findings have been consistent across age groups, genders and race-ethnicity [42]. Finally, the target population of our analysis represents people who seek to lose weight and not people who have a high BMI due to higher muscular mass. Thus, in this population, it is more likely that BMI is a good proxy for body fatness.

Next, the current study does not investigate the causal relationship between BMI and utilities, as this is difficult to substantiate due to the multifactorial nature of utilities. However, by controlling for known demographics and the presence of weight-related diseases, which are expected to mediate this relationship partially, the current analysis suggests that such a causal relationship may exist.

Finally, as there are cultural differences in perceptions of weight and physical appearance, preferences for these health state utilities may vary across certain cultures and geographic regions. The extent to which the reported utilities would generalize to populations outside the UK and Portugal, and more generally Europe, is unknown. Therefore, it would make sense to replicate this research in other geographic regions before applying the resulting utilities in health economic evaluations to support decision-making in other countries.

Conclusion

The results of the current analysis highlight the role of factors such as age, smoking status and the presence of comorbidities in influencing utility indexes, i.e., EQ-5D-3L and SF-6D, and thereby the correlation between BMI and HRQoL. The analysis also showed that an additional unit of BMI (within the range of 30–40 kg/m2) was related to a loss on the utility index in both men and women but that the independent effect of weight was small and did not reach statistical significance, hence pinpointing the multifactorial nature of utility indexes and the need to consider other determinants of HRQoL alongside weight loss. Overall, our study findings may be useful to inform cost-effectiveness analyses wherein multiple factors such as weight, weight-related comorbidities and other demographic characteristics are considered to produce a standardized measure of cost per benefit.