Introduction

Globally, the European region has the second-highest obesity rate (24% in women and 22% in men), after the Region of the Americas (30% in women and 24% in men) [1]. In Sweden, this rate has doubled within the past 10 years [2], and the prevalence of morbid obesity (body mass index (BMI) ≥ 40 kg/m2, or BMI ≥ 35 kg/m2 plus at least one obesity-related comorbidity) has increased markedly [2, 3]. Obesity causes early mortality, reduced health-related quality of life (HRQoL), and increased risk of cancer, diabetes mellitus, and cardiovascular disorders [4]. Bariatric surgery is the most effective treatment for weight reduction in people living with morbid obesity [5], and it improves obesity-related comorbidities, including diabetes [6] and cardiovascular disease [7]. The use of bariatric surgery has increased steadily worldwide, with more than 600,000 procedures now performed annually [8].

Risk prediction models are used to guide clinicians and patients in joint decision-making about appropriate treatments [9]. For bariatric surgery, the commonly predicted outcomes are postsurgical weight [10], remission of associated comorbidities [11], and postsurgical complications [12], and risk predictors include patient characteristics (e.g., age, sex, and baseline weight), socioeconomic status, medical history (e.g., comorbidities), and genetic markers. However, as the ultimate goal of bariatric surgery is to improve health, understanding how bariatric surgery might affect HRQoL is important. Only a few studies have investigated this issue [13,14,15,16], and prediction models for HRQoL are scarce. The two available prediction models, developed by Cao et al., predict HRQoL measured by the Obesity-related Problems Scale and the SF-36 (including both subdomain and summary scores) at 2 and 5 years, respectively [17, 18]. These models can support clinicians and patients in decision-making. However, for resource allocation decisions within the health sector, a single overall measure such as quality-adjusted life years (QALYs) would also be useful.

QALYs are a summary measure that incorporates both life years and quality of life, which enables comparisons across different disease areas and populations. Many national reimbursement agencies and advisory bodies, such as the National Institute for Health and Care Excellence (NICE) in the UK and the Dental and Pharmaceutical Benefits Agency in Sweden, require QALYs in economic evaluations that compare treatments [19, 20]. QALYs are calculated by multiplying the time spent in a health state by the health utility of that state. Health utility, which reflects the preference for a particular health state, is a numeric value that varies between 0 and 1, where 0 represents “dead” and 1 represents “full health”. Health utility can be estimated with the standard gamble, time trade-off, or rating scale methods. However, as these direct valuation methods are often difficult to implement, a pre-scored preference-based measure (PBM) [21], such as the EQ-5D [22], SF-6D [23], or the Health Utilities Index (HUI) [24], is applied in practice. The psychometric properties of the SF-36 have been evaluated among patients with obesity, and its validity and reliability were assessed against other clinical indicators and disease-specific instruments [25, 26]. The SF-36 is the most commonly applied generic measure among patients with obesity [25, 26], as well as in Sweden [27]. However, because the SF-36 is not a preference-based measure, its scores cannot be used directly to derive health utility scores. The Short-Form Six-Dimension (SF-6D) was developed to obtain a health utility score from the SF-36 [28] or its 12-item version (SF-12), using the standard gamble method [29].
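As a minimal illustration of the definition above, the Python sketch below sums duration × utility over consecutive health states. The function name `qalys_from_states` and the utility values are hypothetical, chosen only to show the arithmetic; they are not taken from the study.

```python
def qalys_from_states(states):
    """Sum of (years spent in a health state) x (utility of that state).

    `states` is a list of (duration_in_years, utility) pairs. The function
    name and inputs are hypothetical, for illustration only.
    """
    total = 0.0
    for duration, utility in states:
        if duration < 0:
            raise ValueError("durations must be non-negative")
        total += duration * utility
    return total


# Hypothetical patient: 6 months at utility 0.60, then 6 months at 0.80.
example_qalys = qalys_from_states([(0.5, 0.60), (0.5, 0.80)])
print(round(example_qalys, 2))  # 0.7
```

One year lived at an average utility of 0.70 therefore corresponds to 0.70 QALYs, the same as 0.70 years in full health.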

Weight reduction following bariatric surgery is associated with improvements in HRQoL. However, this is not always the case, owing to postsurgical complications or emotional and functional difficulties [30], and the psychological consequences of bariatric surgery need further investigation. Previous studies have mainly reported HRQoL for up to 1 year after surgery, with only a few reporting longer-term follow-up (≥ 2 years) [14]. HRQoL has been found to be more likely to improve within the first 2 years after surgery, but further investigation is needed to understand HRQoL in the longer term. This study aimed to investigate whether the QALYs of patients who underwent bariatric surgery could be predicted using baseline information.

Methods

Data sources

Data on all patients who underwent bariatric surgery in Sweden between January 1, 2011 and March 31, 2019 (n = 47,653) were obtained from the Scandinavian Obesity Surgery Registry (SOReg) for the current study. SOReg is a national quality registry for bariatric surgery in Sweden with nationwide coverage of > 98% [31]. The registry contains information on patients’ sociodemographic characteristics, details of the procedure, postsurgical conditions, and HRQoL assessed by the RAND-36 and the Obesity-related Problems Scale. HRQoL data were reported by the patients before surgery and at 1, 2, and 5 years postoperatively. Because of the large proportion of missing data (> 60%), QALYs at 5 years were not analyzed in the current study. Ethical permission for analyzing the data was granted by the Ethics Authority in Sweden (reference number: 2019–03666).

Health outcomes

The SF-36 measures HRQoL in eight domains (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health), and version 1 of the SF-36 (SF-36-v1) has been applied in the SOReg [15, 21]. The SF-6D was developed to derive a preference-based score from the SF-36 [28] or its 12-item version (SF-12) [29]. A brief description of the method is provided below. First, 11 items from the original SF-36 were selected to construct the six SF-6D domains (physical functioning, role limitations, social functioning, pain, mental health, and vitality), each described on 4–6 functional levels. Combining the levels of the different domains defines a total of 18,000 health profiles. Second, a subset of 249 health profiles was selected and valued by a representative sample of the general population using the standard gamble (SG) method. Third, various econometric models were tested to estimate a social value set. Such value sets have been established in several countries; in the current study, the UK tariff was applied [28] because no Swedish tariff was available.
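The figure of 18,000 health profiles follows from multiplying the number of levels in each SF-6D dimension. The level counts below (6, 4, 5, 6, 5, 5) are assumed from the original SF-6D descriptive system, not reported in this study; the short Python check shows that they reproduce the stated total.

```python
from math import prod

# Number of severity levels per dimension of the original SF-6D
# (assumed from the SF-6D descriptive system, not taken from this study).
SF6D_LEVELS = {
    "physical_functioning": 6,
    "role_limitations": 4,
    "social_functioning": 5,
    "pain": 6,
    "mental_health": 5,
    "vitality": 5,
}

# Every combination of one level per dimension is a distinct health profile.
n_profiles = prod(SF6D_LEVELS.values())
print(n_profiles)  # 18000
```

This is also why direct valuation of every profile is impractical and only a subset (249 profiles) was valued, with the remainder filled in by the econometric model.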

QALYs were calculated using the area under the curve (AUC) method: the SF-6D index at each time point was represented as data points, which were first connected by straight lines to define the “curve”. Then, the AUC was calculated by adding the areas under the curve between each pair of consecutive observations [32].
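The area-under-the-curve calculation amounts to the trapezoidal rule: each pair of consecutive observations contributes the time between them multiplied by the mean of the two utility values. A minimal Python sketch, assuming follow-up times in years with baseline at time 0; the function name and the SF-6D values in the example are hypothetical.

```python
def qalys_auc(times, utilities):
    """Area under the utility curve by the trapezoidal rule.

    times: follow-up times in years (strictly increasing, e.g. 0 = baseline).
    utilities: SF-6D index at each time point.
    """
    if len(times) != len(utilities) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        # Trapezoid between consecutive observations: width x mean height.
        area += width * (utilities[i - 1] + utilities[i]) / 2.0
    return area


# Hypothetical SF-6D indices at baseline, year 1 and year 2.
print(round(qalys_auc([0, 1, 2], [0.65, 0.78, 0.76]), 3))  # 1.485
```

Joining points with straight lines assumes utility changes linearly between assessments; the true trajectory between visits is unobserved.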

Predictors

The following variables were included as predictors in the current study:

Baseline demographic and health characteristics (including age, sex, height, weight, BMI, waist circumference, HbA1c, education level, and smoking status), year of operation, operation time (in minutes), postoperative care period (in days), comorbidities (including sleep apnea, hypertension, diabetes, dyslipidemia, dyspepsia, diarrhea, depression (defined in the present study as pharmacological treatment for depression), and other illnesses), history of deep vein thrombosis (DVT) or pulmonary embolism (PE), previous cholecystectomy, previous anti-reflux surgery, surgery type, primary surgery, initially planned two-step operation, surgical access, reason for conversion, operation method, other simultaneous surgery (including cholecystectomy, gynecological surgery, incisional or umbilical hernia repair, splenectomy, adhesiolysis lasting more than 10 min, cruroplasty, and other surgeries), specific intra-operative complications (including bleeding, leakage, abscess/deep infection, wound rupture, other wound complications, bowel obstruction/ileus, band-related complications, port-related complications, stricture, marginal ulcer, cardiovascular complication, DVT/PE, pulmonary complication, urinary tract infection, and other complications), severity of complication, baseline operation score, and the 11 SF-36 items used to construct the SF-6D (three items for physical functioning, one for role-physical, two for bodily pain, one for vitality, one for social functioning, one for role-emotional, and two for mental health).

Binary and multinomial variables were converted into one or more dummy variables, and continuous and ordinal variables were standardized to have a mean of 0 and a standard deviation of 1. HbA1c was log-transformed before standardization because of its skewed distribution.
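The preprocessing can be sketched in Python with NumPy as follows. Two choices are assumptions, since the text does not state them: reference (treatment) coding with one dummy per non-reference category, and the population standard deviation (NumPy's default `ddof=0`). The variable names and values are hypothetical.

```python
import numpy as np


def dummy_code(values, reference):
    """Reference (treatment) coding: one 0/1 column per non-reference category."""
    categories = sorted(set(values) - {reference})
    columns = np.array(
        [[1.0 if v == c else 0.0 for c in categories] for v in values]
    ).reshape(len(values), len(categories))
    return columns, categories


def standardize(x):
    """Rescale a variable to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


# Hypothetical inputs for four patients.
surgery_type = ["gastric_bypass", "sleeve", "gastric_bypass", "other"]
hba1c = np.array([36.0, 41.0, 52.0, 78.0])  # mmol/mol, right-skewed

surgery_dummies, dummy_names = dummy_code(surgery_type, reference="gastric_bypass")
hba1c_z = standardize(np.log(hba1c))  # log first, then standardize

print(dummy_names)  # ['other', 'sleeve']
print(surgery_dummies)
print(hba1c_z.round(2))
```

Taking the logarithm before standardizing compresses the long right tail of HbA1c, so that a few very high values do not dominate the standardized scale.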

Missing values in the predictors were imputed using the multiple imputation method in R using the “mice” package [33]. This involved predictive mean matching with chained equations and k-nearest neighbors to draw imputed values, and five imputed datasets were generated for the later prediction analysis [34].
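The core of predictive mean matching can be sketched for a single incomplete variable. This is a simplified illustration, not the `mice` implementation: it fits one OLS model on the observed rows and draws each imputed value from the k observed "donors" with the closest predicted means, but it omits the Bayesian draw of regression coefficients and the cycling over all incomplete variables (the "chained equations") that `mice` performs. The function name, k = 5, and the data are assumptions.

```python
import numpy as np


def pmm_impute(y, X, k=5, seed=None):
    """One predictive-mean-matching draw for the missing entries of y.

    Simplified sketch: fit OLS on the observed rows, predict a mean for every
    row, and for each missing row copy the observed value of a donor drawn at
    random from the k observed rows with the closest predicted means. The full
    `mice` algorithm also draws the regression coefficients from their
    posterior and cycles over all incomplete variables; both are omitted here.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    y_hat = design @ beta
    donor_means, donor_values = y_hat[observed], y[observed]
    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        closest = np.argsort(np.abs(donor_means - y_hat[i]))[:k]
        imputed[i] = donor_values[rng.choice(closest)]
    return imputed


# Hypothetical data: one incomplete variable predicted from two complete ones.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
y_full = 1.0 + x @ np.array([0.8, -0.5]) + rng.normal(scale=0.3, size=200)
y_missing = y_full.copy()
y_missing[rng.choice(200, size=40, replace=False)] = np.nan

# Five imputed datasets, each with a different random seed.
imputed_sets = [pmm_impute(y_missing, x, k=5, seed=s) for s in range(5)]
```

Because imputed values are always copied from real observations, PMM never produces implausible values (for example, negative HbA1c), which is one reason it is a common default.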

Prediction models

A general linear regression model and regularized linear regression models, including Lasso regression (with L1 regularization), ridge regression (with L2 regularization), and elastic net regression (with combined L1 and L2 regularization) [35], were used to predict postoperative QALYs at follow-up years 1 and 2.
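The relationship between the four models can be written as one penalized least-squares objective. The form below is a common parameterization (used, for example, by glmnet and by scikit-learn's ElasticNet), with λ ≥ 0 the penalty parameter and α ∈ [0, 1] the L1 penalty ratio; the study does not report its exact parameterization, so the scaling of λ may differ from the values in Figs. 2–5.

```latex
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0, \beta}
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \mathbf{x}_i^{\top} \beta \right)^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

Setting α = 1 gives the Lasso, α = 0 gives ridge regression, 0 < α < 1 gives the elastic net, and λ = 0 recovers ordinary least squares (the general linear model). The intercept β₀ is not penalized, and the penalty treats all coefficients on a common scale, which is why the predictors are standardized first.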

A recursive variable elimination (RVE) method, which selects variables by recursively considering smaller and smaller sets of variables [36], was used during model training. The models were first trained on the initial set of variables; the importance of each variable was then evaluated, and the least important variables were pruned from the current set. The procedure was repeated recursively on the pruned set until the target number of variables was reached, and the number of variables that maximized the R2 was retained. The candidate numbers of variables were arbitrarily set to 1, 11, 21, 31, 41, 51, 61, and 71 in the current study. For hyperparameter tuning of the Lasso, ridge, and elastic net regression models, an exhaustive grid search over the penalty parameter λ and the L1 penalty ratio was used to determine the parameters that maximized the R2 [37]. The relative importance of the variables selected in each model was evaluated using permutation importance [38].
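The elimination loop can be sketched in Python with NumPy. This is a simplified sketch under stated assumptions: it uses ordinary least squares instead of the regularized models, ranks variables by the absolute value of their coefficients on standardized predictors, drops 10 variables per round so that the set sizes pass through the candidate values (1, 11, 21, ...), and scores each size by R2 on a single validation split rather than by cross-validation. All function names and data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def r2_score(y, y_pred):
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)


def recursive_elimination(X_tr, y_tr, X_val, y_val, targets, step=10):
    """Recursively drop the `step` least important variables.

    Importance is |coefficient| on standardized predictors (an assumption of
    this sketch). Returns the set size with the highest validation R2 and a
    dict {size: (kept_column_indices, validation_R2)}.
    """
    keep = list(range(X_tr.shape[1]))
    smallest = min(targets)
    results = {}
    while True:
        beta = fit_ols(X_tr[:, keep], y_tr)
        if len(keep) in targets:
            y_hat = beta[0] + X_val[:, keep] @ beta[1:]
            results[len(keep)] = (list(keep), r2_score(y_val, y_hat))
        if len(keep) <= smallest:
            break
        n_drop = min(step, len(keep) - smallest)
        # Positions (within `keep`) of the n_drop smallest absolute coefficients.
        least_important = {int(j) for j in np.argsort(np.abs(beta[1:]))[:n_drop]}
        keep = [v for j, v in enumerate(keep) if j not in least_important]
    best_size = max(results, key=lambda size: results[size][1])
    return best_size, results


# Hypothetical standardized data: 31 candidate predictors, only 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 31))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=600)
X_tr, X_val, y_tr, y_val = X[:480], X[480:], y[:480], y[480:]

best_size, results = recursive_elimination(
    X_tr, y_tr, X_val, y_val, targets={1, 11, 21, 31}
)
for size in sorted(results):
    print(size, round(results[size][1], 3))
print("best number of variables:", best_size)
```

In a fuller version, the OLS fit would be replaced by each regularized model, with λ and the L1 ratio tuned by grid search inside every elimination round.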

Model training, validation and test

The whole dataset was randomly split into two parts: a training dataset with 80% of the patients and a held-out test dataset with 20% of the patients. Within the training dataset, k-fold cross-validation (k = 5 in the current study) was used: the training data were randomly divided into k groups of equal size, the models were trained on k − 1 folds, and the remaining fold was used to validate them. The process was repeated until the models had been validated on all k folds. For each patient in the training dataset, the predicted value was the one obtained when that patient's fold served as the validation fold. The final models were then evaluated on the test dataset [39].
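The split and cross-validation scheme can be sketched in Python with NumPy as below. OLS stands in for the study's models, and refitting the final model on the whole training set before scoring the test set is an assumption, since the text does not say how the final models were obtained. Function names and data are hypothetical.

```python
import numpy as np


def split_train_test(n, test_fraction=0.2, seed=0):
    """Random 80/20 split of row indices into training and held-out test sets."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))
    return perm[n_test:], perm[:n_test]


def out_of_fold_predictions(X, y, fit, predict, k=5, seed=0):
    """k-fold CV: each row is predicted by a model trained on the other k-1 folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    oof = np.empty(len(y))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        model = fit(X[train_idx], y[train_idx])
        oof[val_idx] = predict(model, X[val_idx])
    return oof


def fit_ols(X, y):
    return np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)[0]


def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]


# Hypothetical data standing in for the registry.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.2, size=1000)

train_idx, test_idx = split_train_test(len(y))
oof = out_of_fold_predictions(X[train_idx], y[train_idx], fit_ols, predict_ols, k=5)
final_model = fit_ols(X[train_idx], y[train_idx])  # assumed refit on all training data
test_pred = predict_ols(final_model, X[test_idx])
```

Because the test rows never enter `out_of_fold_predictions`, agreement between cross-validated and test performance is a direct check for overfitting.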

Model performance metrics

The performance of the models was evaluated using R2 and the relative root mean squared error (RRMSE) [40]. R2 is the proportion of the variation in the outcome variable that is explained by the predictors; it is commonly used to assess how well observed outcomes are reproduced by linear models. Although adjusted R2 is recommended for comparing performance between models, there was no detectable difference between R2 and adjusted R2, because the sample size was large relative to the number of variables in the final models [41]. RRMSE is another common measure of predictive performance in statistical modelling, particularly in regression analyses. It quantifies the deviations of the predictions from the true values relative to the mean of the true values; the smaller the RRMSE, the better. A model is considered excellent when RRMSE ≤ 10%, good if 10% < RRMSE ≤ 20%, fair if 20% < RRMSE ≤ 30%, and poor if RRMSE > 30% [40].
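The three metrics and the RRMSE rating bands can be written out directly. A minimal Python sketch with hypothetical function names; the sample size (32,232) and maximum number of candidate variables (71) are taken from the text to show how small the R2 adjustment becomes at this scale.

```python
import numpy as np


def r_squared(y, y_pred):
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


def adjusted_r_squared(r2, n, p):
    """Adjusted R2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rrmse_percent(y, y_pred):
    """Root mean squared error divided by the mean observed value, in percent."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 100.0 * np.sqrt(np.mean((y - y_pred) ** 2)) / y.mean()


def rate_rrmse(value):
    """Qualitative rating using the thresholds cited in the text."""
    if value <= 10:
        return "excellent"
    if value <= 20:
        return "good"
    if value <= 30:
        return "fair"
    return "poor"


# With n = 32,232 patients and at most 71 predictors, the adjustment is tiny.
print(round(adjusted_r_squared(0.57, n=32_232, p=71), 3))  # 0.569
print(rate_rrmse(9.6), rate_rrmse(16.0))  # excellent good
```

The factor (n − 1)/(n − p − 1) stays close to 1 whenever n is much larger than p, which is why R2 and adjusted R2 coincide here.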

Because multiple imputation produced five imputed versions of the data, and therefore five test datasets, the final R2 and RRMSE of a model were calculated as the averages of the values obtained from the five test datasets.

All analyses were conducted in Python 3.7 (Python Software Foundation, Beaverton, OR, U.S.) and R 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria).

Results

Patient characteristics

After excluding patients aged < 18 years or with BMI < 30 kg/m2, HbA1c < 10 mmol/mol, a postoperative care period > 365 days, or an incomplete HRQoL form, 32,232 and 16,141 patients were included in the final analysis for follow-up years 1 and 2, respectively. The mean age of the patients was 41.2 years (standard deviation (SD) 11.3 years), and most were women (76.6%, Table 1). The standardized differences (SDDs) between all patients and the patients included in the study at follow-up year 1 or 2 were less than 0.1 for all variables except operation time (Table 1 and supplementary Table S1). In general, the included patients were representative of those who underwent bariatric surgery in Sweden.
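The text does not state which SDD formula was used. The Python sketch below uses the commonly applied definitions (mean difference divided by the pooled SD, with the binomial variance for proportions) and the conventional 0.1 threshold. The included-group values echo figures quoted in the text; the comparison-group values are hypothetical.

```python
import math


def smd_continuous(mean_a, sd_a, mean_b, sd_b):
    """Standardized difference for a continuous variable (pooled SD)."""
    return (mean_a - mean_b) / math.sqrt((sd_a**2 + sd_b**2) / 2)


def smd_binary(p_a, p_b):
    """Standardized difference for a proportion."""
    return (p_a - p_b) / math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)


# Hypothetical comparison: all operated patients vs. those analysed at year 1.
d_age = smd_continuous(41.0, 11.4, 41.2, 11.3)
d_female = smd_binary(0.760, 0.766)
print(round(abs(d_age), 3), round(abs(d_female), 3))
print("balanced" if max(abs(d_age), abs(d_female)) < 0.1 else "imbalanced")
```

Unlike a p-value, the SDD does not shrink toward significance as the sample grows, which is why it suits comparisons of tens of thousands of patients.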

Table 1 Baseline demographic characteristics and SF-36 item scores of the patients (follow-up year 1)

Similarly, although statistically significant differences were found in the SF-36 item scores, the differences were generally not clinically significant. Nevertheless, inverse probability weighting (IPW) was used in later analyses to account for each patient's probability of being included in the analysis.
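The weighting step can be sketched as follows, assuming the inclusion probabilities have already been estimated (for example, with a logistic regression on baseline covariates, not shown). Unstabilized weights 1/p are an assumption, because the text does not specify the weight form. Function names and values are hypothetical.

```python
import numpy as np


def ipw_weights(p_included, included):
    """Inverse probability weights: 1/p for included patients, 0 otherwise."""
    p = np.asarray(p_included, dtype=float)
    included = np.asarray(included, dtype=bool)
    if np.any((p <= 0) | (p > 1)):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return np.where(included, 1.0 / p, 0.0)


def weighted_mean(values, weights):
    """Weighted mean over patients with positive weight (excluded values unused)."""
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    mask = weights > 0
    return np.sum(weights[mask] * values[mask]) / np.sum(weights[mask])


# Hypothetical inclusion probabilities and outcomes;
# NaN marks a patient with no follow-up data.
p_hat = np.array([0.5, 0.5, 0.8, 0.8])
included = np.array([True, False, True, True])
qalys = np.array([0.70, np.nan, 0.80, 0.60])

w = ipw_weights(p_hat, included)
print(w)
print(round(weighted_mean(qalys, w), 3))  # 0.7
```

The included patient with inclusion probability 0.5 receives weight 2, standing in for a similar patient who was not included.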

The mean QALYs of the included patients at follow-up years 1 and 2 were 0.72 (SD = 0.10) and 1.50 (SD = 0.20), respectively (Tables 1 and S1).

Model performance

In general, all the models showed equivalent performance in terms of R2 for both training and test data, with no observed overfitting (Figs. 1–5). For the general linear regression model, performance increased with the number of variables; however, the improvement was negligible when the number of variables exceeded 30 and 50 for follow-up years 1 and 2, respectively (Fig. 1).

Fig. 1

R2 of the general linear regression by number of optimal variables (left, follow-up year 1; right, follow-up year 2)

The results indicated that L1 regularization could not improve model performance, because R2 decreased as the penalty parameter λ of the Lasso regression model increased (Fig. 2).

Fig. 2

R2 of the Lasso regression by number of optimal variables and penalty parameter (left, follow-up year 1; right, follow-up year 2)

Although L2 regularization may improve the model performance, indicated by the increased R2 with a larger penalty parameter λ, the improvement was negligible when the number of variables was more than 30 and 40 for follow-up years 1 and 2, respectively (Fig. 3).

Fig. 3

R2 of the ridge regression by number of optimal variables and penalty parameter (left, follow-up year 1; right, follow-up year 2)

The elastic net regression models showed similar results: a smaller L1 penalty ratio and a smaller penalty parameter λ corresponded to a larger R2 (Figs. 4 and 5), meaning that less regularization yielded better predictive ability. Performance improved as more variables were included; however, the improvement was negligible when the number of variables exceeded 20 (Figs. 4 and 5).

Fig. 4

R2 of the elastic net regression for follow-up year 1 by number of optimal variables, penalty parameter, and L1 penalty ratio (left: training data; right: test data)

Fig. 5

R2 of the elastic net regression for follow-up year 2 by number of optimal variables, penalty parameter, and L1 penalty ratio (left: training data; right: test data)

The performance metrics of the optimal models are summarized in Table 2. In general, the four models showed equivalent predictive ability for QALYs at follow-up year 1, with an R2 of approximately 0.57 and an RRMSE of approximately 9.6%. These results indicate that the linear regression models, whether regularized or not, can explain nearly 60% of the variance in 1-year QALYs after bariatric surgery, with a relative prediction error below 10%, suggesting excellent predictive ability.

Table 2 Performance of the models in terms of R2 and RRMSE

However, all the models showed poorer performance for predicting 2-year QALYs. Although the RRMSEs (approximately 16%) indicated a good performance of the models, the R2 of the models was only approximately 35%.

The selected variables for the optimal models are also presented in Table 2. In general, baseline SF-36 items, sex, age, height, weight, ongoing treatment, postoperative complications, and postoperative complications within 6 weeks, among others, were the critical variables for predicting postoperative QALYs. The most parsimonious model was the elastic net regression model, which used fewer variables to predict the QALYs at follow-up year 1 while providing almost equivalent performance (Table 2).

The relative importance of the variables selected in each model for follow-up year 1 is shown in supplementary Figure S1, which indicates that the baseline SF-36 items BP2, SF2, VT2, PF10, MH1, MH4, RP3, RE2, and BP1 are relatively more important for the prediction. Other variables contributed little to the prediction, partly because each multinomial variable was split into several dummy variables.
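Permutation importance, as described in the Methods, measures how much a performance score drops when one predictor is randomly shuffled, breaking its link with the outcome. A minimal Python sketch, assuming R2 as the score and a fitted linear model with hypothetical coefficients; the number of repeats and the data are also assumptions.

```python
import numpy as np


def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Mean drop in `score` when each column of X is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, breaking its association with y.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def r_squared(y, y_pred):
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


# Hypothetical fitted linear model with coefficients 2.0, 0.5 and 0.0.
coefs = np.array([2.0, 0.5, 0.0])
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = X @ coefs + rng.normal(scale=0.5, size=500)

imp = permutation_importance(lambda data: data @ coefs, X, y, r_squared)
print(imp.round(3))
```

Because each dummy variable is shuffled separately, the importance of a multinomial predictor is spread across its dummies, consistent with the observation that such variables each contributed little.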

Discussion

In the current study, baseline indicators of importance for predicting QALYs after bariatric surgery were identified, including HRQoL, age, sex, BMI, smoking, glycemic control, and pre- and postoperative complications within 6 weeks. Additionally, factors unrelated to the patient, in particular year of surgery and annual hospital volume, were also of importance. The results of the study could be useful for supporting decision-making in health resource allocation. The QALYs improved after bariatric surgery at both year 1 and year 2 relative to baseline, and the QALYs accrued during the second year (1.50 − 0.72 = 0.78) were slightly higher than those accrued during the first year (0.72). This finding is in line with previous studies showing that bariatric surgery is effective in improving health among patients with morbid obesity [42].

The current study focused mainly on baseline characteristics and their ability to predict postsurgical QALYs. Baseline characteristics predicted postsurgical QALYs well at year 1, but their predictive ability decreased at year 2. This indicates that baseline characteristics alone may be useful for predicting QALYs during the initial weight-loss phase but are insufficient for predicting long-term QALYs, for which information on postsurgical weight, complications, behavioral traits, and comorbidities may also be necessary [17, 18]. Other factors, such as mental health and social support, are also likely to play significant roles in patients' HRQoL [43, 44]. Baseline HRQoL (defined by the SF-36 items) was important for QALYs at both year 1 and year 2 across all the models, in line with previous studies [17, 18]. This was also confirmed by the relative importance of the HRQoL variables for the prediction at follow-up year 1 (Figure S1).

Both socioeconomic status and lifestyle factors are known to be strongly associated with health [20]. Specifically, age, sex, BMI, and socioeconomic status have all been associated with changes in HRQoL after bariatric surgery [13, 15], and these associations were confirmed in the present study (Table 2). In addition, smoking was an important predictor of postoperative health, which also concurs with a previous study [13]. Knowledge about these factors can help identify groups in need of more intense and individualized support in the preoperative, perioperative, and postoperative settings. However, it remains important to stress that although some factors are associated with smaller improvements in perceived health, all groups benefited from the surgical intervention and thus should not be excluded from this important treatment. In addition, experiencing a surgical complication was associated with a smaller improvement in HRQoL. This finding supports a previous study suggesting that patients who suffer a serious postoperative complication have higher use of antidepressants and opioids, as well as a higher rate of readmissions, over the 2 years after surgery [45]. Conversely, preoperative use of antidepressants or chronic opioid use might be associated with an increased risk of overall and serious postoperative complications [46, 47]. Increased awareness of the mental impact of a serious postoperative complication, and increased support for patients who suffer such complications, may be important.

Our findings indicated that regularization (L1, L2, or both) offered at most a minor improvement in the performance of the linear models used to predict the 1-year postoperative QALYs of bariatric patients. Approximately 20 variables appear sufficient to predict 1-year QALYs with the data currently available. Although the RRMSE was below 10%, an R2 of approximately 0.6 is moderate and might be improved with additional baseline information.

Strengths and limitations

QALYs combine both quality of life (health utility) and quantity of life (life years) simultaneously and can summarize health outcomes from multiple follow-ups, which might provide a better overall measure of health improvement. Information from this study is useful for economic evaluation studies. The variables used in the study were all based on data from high-quality sources that are continuously validated.

However, the current study also has limitations. First, the SF-6D weights used to calculate QALYs were based on the UK population, as no Swedish SF-6D tariff is currently available. Therefore, the QALY estimates for the Swedish patient population may deviate, which needs to be verified in future studies. Second, missingness in the HRQoL data is substantial, especially for follow-up year 2, which is common in HRQoL studies. Although we did not detect significant differences in baseline characteristics between the whole patient population and the patients with follow-up information, the missing-at-random assumption underlying multiple imputation might not hold, and potential sampling bias cannot be ruled out [48].

Conclusions

Patient characteristics before bariatric surgery, including HRQoL, age, sex, BMI, smoking, and glycemic control, together with pre- and postoperative complications within 6 weeks, may provide important information for predicting patients' postoperative QALYs. Additionally, factors unrelated to the patient, in particular year of surgery and annual hospital volume, are also of importance. Knowledge of these factors can help identify groups in need of more intense and individualized preoperative, perioperative, and postoperative support.