Introduction

Health state utility values (HSUVs) quantify the degree of preference for a particular health state [1]. In model-based economic evaluations, obtaining precise HSUVs for the various health states is crucial. HSUVs are used to integrate survival time and quality of life into Quality Adjusted Life Years (QALYs), which are integral to the evidence base in pharmacoeconomic analyses [2]. These values can be measured through direct or indirect methods, and the resulting figures must be aligned with a standard value tariff derived from the general population to determine the equivalent HSUVs. The choice of measurement instruments and value sets can significantly influence HSUVs. Moreover, patient demographics, treatments, and the distinct health outcomes associated with different complications are key factors that influence HSUVs. Research indicates that 36% of the HSUVs cited in existing literature have to be adjusted due to a lack of clarity; variables such as age, sex, and side effects can markedly affect the magnitude of HSUVs [3]. Thus, the careful selection of HSUVs is pivotal in reducing uncertainty within economic modelling.
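As a minimal illustration of how HSUVs combine survival time and quality of life into QALYs, the sketch below multiplies each utility by the time spent in the corresponding state and sums the products. The function name, the durations, and the two-state trajectory are hypothetical, and discounting is deliberately omitted.

```python
def qalys(trajectory):
    """Sum utility-weighted time; each item is (utility, years spent in that state)."""
    return sum(utility * years for utility, years in trajectory)


# Hypothetical patient: 5 years at utility 0.80, then 3 years at utility 0.65.
total_qalys = qalys([(0.80, 5.0), (0.65, 3.0)])
print(round(total_qalys, 2))
```

A lower utility in any state reduces the QALY total in proportion to the time spent there, which is why the choice of HSUV can shift the result of an economic evaluation.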

Type 2 diabetes (T2DM) is highly prevalent [4] and is a leading contributor to global mortality [5]. As T2DM progresses, it often gives rise to multiple complications that can significantly degrade quality of life and may even result in death [6, 7]. This widespread condition consequently incurs substantial health resource utilisation. The tension between this financial strain and constrained healthcare resources requires health systems to perform health technology appraisals, particularly economic evaluations, of therapeutic agents, including the new medications that are continuously introduced. Such evaluations are vital for the judicious distribution of societal resources and for extending benefits to a broader patient population. This underscores the need for careful selection of HSUVs for T2DM patients across different health states.

The literature on HSUVs in diabetes is extensive. Among these studies, the systematic review by Redenz et al. [8] assessed HSUVs of T2DM and its complications, summarising how complications, evaluative methodologies, and national backgrounds could influence outcomes. Mok et al. [9] built a suite of reference sets specifically for T2DM complications in East and Southeast Asia, attributing the variation in study results to nationality, assessment instruments, and value sets. Jing et al. [10] found that several factors, including physical activity, glucose monitoring frequency, co-morbidities or co-existing conditions such as hypertension, duration of diabetes, dietary patterns involving red meat, and mental health factors like depression, contributed to the variability in HSUVs for individuals with T2DM. While these investigations have recognised that patient characteristics, complications, nationality, reference sets, and assessment instruments bear upon the average health utility value, they have not quantified the statistical association between these diverse factors and HSUVs, nor have they offered concrete guidance for selecting HSUVs for T2DM in various decision-making contexts. Our research builds on these studies to further clarify the association between these factors and HSUVs of T2DM and to provide guidance on choosing appropriate HSUVs in future analyses.

Based on this notion, Wang et al. [11] conducted a systematic review and meta-regression to examine the association between HSUVs and factors such as age, health status, treatments received, and timing of utility measurements. Age was used as an independent variable to assess its impact on HSUVs in older women diagnosed with breast cancer. The study found that the mean HSUV declines as health status worsens, with age playing a significant role in determining health utility values in this population. Specifically, the study reported a change in breast cancer-specific utility of -0.001 per one-year increase in age (95% CI: -0.004, 0.002). This work highlights the effectiveness of meta-regression in exploring the relationships between patient demographics, treatment variables, and HSUVs, and it serves as a methodological model for our current study. The purpose of this systematic review was to consolidate and quantitatively analyse, through meta-regression, the HSUVs associated with T2DM and its complications. The aim was to ascertain statistical associations and devise a statistical model [11, 12] that will enable analysts to select the health utility value estimates most pertinent to their specific policy or clinical decision-making contexts.

Method

The systematic review aimed to identify previously published studies reporting HSUVs for T2DM and its complications according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA 2020) [13]. The protocol was PROSPERO-registered (CRD42023432948).

Inclusion and exclusion criteria

Eligibility for inclusion required articles to (1) report health state utility values (HSUVs) of T2DM; (2) use either direct or indirect methods to assess participants’ HSUVs; and (3) be published in the English language. Articles were excluded for the following reasons: (1) inclusion of participants who were pregnant or diagnosed with gestational diabetes; (2) lack of separate reporting for T1DM and T2DM; (3) reporting of participants’ health status without corresponding utility value estimates; (4) absence of original human research, including reviews, reports, conference proceedings, and guidelines; (5) non-English publication.

Literature search

The search for studies adhering to our inclusion criteria was conducted in two distinct phases. Initially, studies published from January 2000 to April 2020 were identified from the systematic review by Redenz et al. (2023) [8] and its related journal articles. This review comprehensively identified T2DM and its complications’ HSUVs measured using preference-based instruments (such as Standard-Gamble (SG), Time Trade-Off (TTO), the Health Utilities Index mark 3 (HUI-3), the Three-Level EuroQol Five-Dimensional Questionnaire (EQ-5D-3L) and the Five-Level EuroQol Five-Dimensional Questionnaire (EQ-5D-5L)) through three steps: (1) structured search in electronic databases including MEDLINE and Cochrane Library; (2) a free-term search in the School of Health and Related Research Health Utilities Database (ScHARRHUD); (3) a complementary search from the references of previously published systematic review and journal articles. This study shared similar inclusion and exclusion criteria as well as search strategies to those used in the review by Redenz et al. (2023) [8]. To ensure comprehensive coverage of the literature, we searched mainstream medical databases. Given the alignment in focus, the review by Redenz et al. (2023) [8] served as a valuable source for identifying studies reporting T2DM HSUV from 2000 to 2020. From this initial collection of references, we identified and retrieved studies that utilized direct or indirect instruments to measure HSUV for full-text review.

In the second stage, we conducted a comprehensive search of the PubMed, Embase, and Web of Science databases using a structured strategy to aggregate all relevant literature on HSUVs for T2DM and its complications from inception to March 2024 (Supplementary Materials Appendix 1 and Appendix 2).

Study selection

Two independent reviewers (YRX & HTS) initially evaluated the titles and abstracts from the electronic database search against pre-determined inclusion criteria. The level of agreement between reviewers was quantified using the intra-class correlation coefficient (ICC) [14]. ICC values are interpreted as follows: below 0.50 indicates poor reliability, 0.50 to 0.75 suggests moderate reliability, 0.75 to 0.90 reflects good reliability, and above 0.90 represents excellent reliability [15]. The same reviewers (YRX & HTS) then assessed the full texts of studies that met the eligibility requirements. Any disagreements encountered during this phase were resolved by consulting a third reviewer (YW) to confirm the final selection of studies, ensuring stringent adherence to the inclusion and exclusion criteria throughout the review process.
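The ICC interpretation bands can be written as a small lookup. This is only a sketch: the function name is hypothetical, and the handling of the exact boundary values (0.50 and 0.75 assigned to the higher band, 0.90 assigned to "good" because "excellent" is defined as above 0.90) is an assumption, since the text does not say how boundary values are classified.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC value to the reliability band described in the text.

    Boundary handling is an assumption made for this sketch.
    """
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


for value in (0.42, 0.60, 0.90, 0.95):
    print(value, interpret_icc(value))
```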

Data extraction

All authors agreed on a data extraction table beforehand. The data of interest comprised the primary outcome and secondary outcomes. The primary outcome was the HSUVs of patients with T2DM and its complications [7, 16, 17, 18]. The secondary outcomes included (i) general characteristics of the patients, including gender, age, race, weight, blood pressure, and the duration since T2DM diagnosis; (ii) characteristics of the study, e.g. the title, author(s), year of the study, country, country/region of respondents, research objectives, study design, sample size, sampling method, inclusion/exclusion criteria, selection and recruitment of respondents, and any other potential study issues; and (iii) health utility assessment methods, including diagnostic criteria for T2DM, instruments used for measuring HSUVs, value sets used, evaluation standards, and statistical approaches.

Quality assessment

Given the absence of established reporting and evaluation criteria for assessing the quality of studies reporting HSUVs, it might be inappropriate to select an assessment list solely based on the design of the primary study [12, 19]. Quality assessment of HSUV studies may usefully focus on the selection and recruitment of respondents, inclusion and exclusion criteria, and the description of the background characteristics of the sample population from which values are derived [20]. To address this, we extracted four questions (Table 1) from the 17-question evaluation tool developed by Nerich et al. [12, 21, 22] (full appraisal tool in Appendix 3), which are tailored for assessing study quality. Zoratti et al.’s systematic review of HSUV appraisal tools recognises these four items as being particularly suited to the critical evaluation of HSUV literature in health utility research [23].

Table 1 Four questions for quality appraisal

Data synthesis

We describe the characteristics of the included subjects using descriptive statistics and present the key statistics of the HSUVs, including the mean (with standard deviation, SD, or standard error, SE), median (with interquartile range, IQR), and the range of variation (or 95% confidence interval, 95% CI). The study results are presented in narrative and graphical form, with detailed categorization of the T2DM population according to different health conditions and instruments. For studies where SDs were not directly reported, the missing SDs were estimated from the mean, sample size, SE, or 95% CI, as recommended by the Cochrane Library [24]. We attempted to integrate the acquired data using meta-analysis, but the variability in countries, measurement modalities, patient characteristics and characteristics of the disease itself made the integrated results highly heterogeneous and not directly usable; thus, the HSUVs were synthesized through a meta-regression following the methodology of Wang et al. (2022) [11] to determine the association between HSUVs and various independent variables. The large number of values identified for each state of T2DM allowed us to synthesize the data using meta-regression. We applied a linear regression model, with the average HSUV from each study serving as the dependent variable. The method was simple pooled ordinary least squares. Several variables that could potentially influence HSUVs were used as independent variables, and the rationale for selecting these variables is detailed below.
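The recovery of a missing SD from an SE or a 95% CI can be sketched with the standard large-sample conversions, SD = SE × √n and SD = √n × (upper − lower) / (2 × 1.96). The function names and input values below are hypothetical, and the normal-approximation multiplier of 1.96 is an assumption; for small samples a t-distribution multiplier would be more appropriate.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Recover the SD of individual values from the SE of the mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Recover the SD from a confidence interval around the mean.

    Assumes a symmetric, normal-approximation interval (z = 1.96 for 95%).
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)


# Hypothetical study: mean HSUV 0.80 in 100 patients.
sd_a = sd_from_se(0.02, 100)
sd_b = sd_from_ci(0.76, 0.84, 100)
print(round(sd_a, 3), round(sd_b, 3))
```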

The reviews by Mok et al. [9], Jing et al. [10], and Redenz et al. [8] suggested that factors such as complications, instruments, tariffs, nationality, and general patient characteristics, including hypertension and diabetes duration, could affect the measurement and valuation of HSUVs. Additionally, hyperglycaemia [25], increased body mass index (BMI) [26], and age [27] were considered significant risk factors for the development of T2DM. The analysis therefore incorporated several variables that might influence HSUVs: disease health state, utility measurement instrument, valuation tariff, mean age, duration of diabetes, blood pressure, and BMI. Disease health state (e.g., T2DM or T2DM with cardiovascular disease), utility measurement instrument (e.g., EQ-5D-3L or EQ-5D-5L), and valuation tariff (e.g., UK or US) were defined as categorical variables. Because the literature reports mean age, diabetes duration, blood pressure, and BMI inconsistently (as continuous values or as intervals) and data on these variables are scarce, they were also defined as categorical variables. To avoid collinearity among categorical independent variables, other study characteristics such as the country of the study, the study design (clinical trial or observational study), and the study population were excluded from the meta-regression as independent variables. Treatment was also excluded where it was not reported precisely.
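Defining a variable such as mean age as categorical amounts to assigning each reported value to a band and then creating one 0/1 indicator per non-reference band. The sketch below illustrates this under stated assumptions: the band edges follow the age groups mentioned in the results (50–60, 60–70, 70 and over), while the "<50" reference band, the treatment of boundary values, and both function names are hypothetical.

```python
AGE_BANDS = ["<50", "50-60", "60-70", ">=70"]  # "<50" as reference is an assumption


def age_band(mean_age: float) -> str:
    """Assign a reported mean age to a band (lower edge inclusive, by assumption)."""
    if mean_age < 50:
        return "<50"
    if mean_age < 60:
        return "50-60"
    if mean_age < 70:
        return "60-70"
    return ">=70"


def dummy_code(value: str, levels: list, reference: str) -> dict:
    """Return one 0/1 indicator per non-reference level."""
    return {level: int(value == level) for level in levels if level != reference}


indicators = dummy_code(age_band(63.2), AGE_BANDS, reference="<50")
print(indicators)
```

Each indicator's regression coefficient is then read as the difference in mean HSUV relative to the reference band.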

Given the varying sample sizes and error magnitudes associated with each variable, the contributions of individual observations to changes in the regression model differed. To address this, we assigned greater weight to values from studies with smaller SDs of the mean estimate than to those with larger SDs. Consequently, we evaluated three regression model specifications. The first specification used the estimated sample size of each HSUV as the weighting coefficient, recognizing that not all studies provided SDs. The second specification used the reciprocal of the estimated sample SD for each HSUV as the weighting coefficient, considering that studies with smaller SDs provided more reliable utility values. The third specification did not include any weighting coefficients. We used cluster-robust SEs to account for within-study correlations, given that some studies contributed multiple HSUVs to the meta-regression, which were likely to be correlated [28]. The coefficient of determination (R²) was used to assess the goodness of fit [29]. The meta-regression analysis was performed in Stata 18.0 (Stata Corp, College Station, TX) [30].
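The three weighting specifications and the cluster-robust SEs can be illustrated with a small pure-Python sketch. It is not the Stata analysis: the six observations are hypothetical, the model has only an intercept and one complication indicator, and the sandwich variance is computed without the small-sample correction that statistical packages typically apply.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def wls_cluster(X, y, w, groups):
    """Weighted least squares with cluster-robust SEs (no small-sample correction)."""
    k = len(X[0])
    xtwx = [[sum(wi * r[p] * r[q] for r, wi in zip(X, w)) for q in range(k)] for p in range(k)]
    xtwy = [sum(wi * r[p] * yi for r, yi, wi in zip(X, y, w)) for p in range(k)]
    beta = solve(xtwx, xtwy)
    # "Bread": rows of the inverse of X'WX (symmetric, so columns equal rows).
    bread = [solve(xtwx, [float(i == j) for i in range(k)]) for j in range(k)]
    # "Meat": weighted score vectors summed within each study (cluster).
    scores = {}
    for r, yi, wi, g in zip(X, y, w, groups):
        resid = yi - sum(b * x for b, x in zip(beta, r))
        s = scores.setdefault(g, [0.0] * k)
        for p in range(k):
            s[p] += wi * r[p] * resid
    se = []
    for p in range(k):
        var = sum(sum(bread[p][q] * s[q] for q in range(k)) ** 2 for s in scores.values())
        se.append(var ** 0.5)
    return beta, se


# Hypothetical observations: (study id, mean HSUV, sample size, SD, has complication)
obs = [
    ("A", 0.82, 200, 0.15, 0), ("A", 0.70, 80, 0.22, 1),
    ("B", 0.85, 500, 0.12, 0), ("B", 0.66, 120, 0.25, 1),
    ("C", 0.78, 60, 0.20, 0), ("C", 0.72, 40, 0.18, 1),
]
X = [[1.0, float(c)] for _, _, _, _, c in obs]  # intercept + complication indicator
y = [m for _, m, _, _, _ in obs]
groups = [s for s, _, _, _, _ in obs]
weights = {
    "sample_size": [float(n) for _, _, n, _, _ in obs],
    "inverse_sd": [1.0 / sd for _, _, _, sd, _ in obs],
    "unweighted": [1.0] * len(obs),
}
results = {name: wls_cluster(X, y, w, groups) for name, w in weights.items()}
for name, (beta, se) in results.items():
    print(name, [round(b, 4) for b in beta], [round(s, 4) for s in se])
```

With a single indicator, the intercept equals the weighted mean HSUV of the observations without complications and the slope equals the weighted difference between the two groups, which makes the effect of each weighting scheme easy to inspect.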

Results

Selection of studies

Seventy-six eligible articles were identified from the systematic review by Redenz et al. [8] and its related journal articles, and 6392 articles from the database search (Fig. 1). In total, 118 studies met the inclusion criteria and were included in the systematic review. The inclusion process is illustrated in Fig. 1, and the reasons for exclusion are shown in Supplementary Appendix 4. The ICC between the two reviewers was 0.90, indicating good to excellent reliability.

Fig. 1 PRISMA flow diagram for selection of studies

Study characteristics

In total, 1044 HSUVs were collected from 118 manuscripts involving over 44 countries and regions (Table 2). Of the 1044 HSUVs, 977 were reported as mean values, of which 732 were reported with an SD and 245 without. The remaining 67 HSUVs were reported as median values with interquartile ranges. Among the 977 HSUVs reported as mean values, 25 health states were defined, including 11 complications: T2DM with cardiovascular diseases (n = 68, where n denotes the number of HSUVs), T2DM with cerebrovascular disease (n = 29), T2DM with diabetic foot (n = 9), T2DM with hypoglycemia (n = 22), T2DM with macrovascular disease (n = 5), T2DM with microvascular and macrovascular disease (n = 6), T2DM with microvascular disease (n = 10), T2DM with nephropathy (n = 23), T2DM with neuropathy (n = 15), T2DM with peripheral vascular disease (n = 16), and T2DM with retinopathy (n = 38). The pooled HSUVs for T2DM and each complication stratified by instrument are presented in Fig. 2. Of the 732 HSUVs reporting mean values with SD, 441 reported the HSUVs of T2DM with or without complications. The pooled HSUVs are presented in Fig. 3.

Table 2 Characteristics of identified studies

Fig. 2 Utility values for health states stratified by instrument

Fig. 3 Health state utility values of T2DM by instrument

Across the 977 HSUVs, nine different valuation instruments were used, with the EQ-5D-3L being the most widely used (n = 751), followed by the EQ-5D-5L (n = 122). The other instruments, namely the short-form 6-dimension (SF-6D) (n = 13), 15-dimension (15-D) (n = 1), SG (n = 2), the HUI mark 2 (HUI-2) (n = 9), HUI-3 (n = 50), feeling thermometer (FT) (n = 9), and TTO (n = 20), were applied less frequently. Thirty-one different tariffs were applied, with the EQ-5D-3L UK tariff (n = 263), EQ-5D-3L US tariff (n = 139), and EQ-5D-3L China tariff (n = 125) being the most widely used.

Among the 977 HSUVs, T2DM without any complications (n = 14, mean: 0.87; median: 0.88; range: 0.78–0.95) had the highest mean HSUV. In contrast, the mean HSUV for patients with T2DM, with or without complications, was lower (n = 573, mean: 0.80; median: 0.82; range: 0.39–0.95). For the subset of T2DM with complications, the mean HSUV reported directly in the manuscripts (n = 22, mean: 0.65; median: 0.66; range: 0.52–0.88) [44, 65, 86, 118, 130, 139] was lower than the estimate of the HSUV from all publications for T2DM patients with complications (n = 263, mean: 0.72; median: 0.72; range: 0.40–0.93). Of these 573 HSUVs, compared with HSUVs measured by EQ-5D-5L (n = 95, mean: 0.83; median: 0.83; range: 0.61–0.94) and EQ-5D-3L (n = 418, mean: 0.80; median: 0.82; range: 0.39–0.95), those measured by 15-D (n = 1, mean: 0.89; median: 0.91; SD: 0.09) had the highest value, and those measured by HUI-3 (n = 17, mean: 0.70; median: 0.68; range: 0.59–0.86) had the lowest.

By complication, the highest HSUVs were for T2DM with neuropathy (n = 15, mean: 0.77; median: 0.79; range: 0.62–0.85), while the lowest were for T2DM with cerebrovascular disease (n = 29, mean: 0.65; median: 0.67; range: 0.42–0.82). Among the 732 HSUVs reporting mean values with SD, the mean HSUVs of T2DM with complications, T2DM with cerebrovascular disease and T2DM with microvascular disease had a decrement of 0.01. In contrast, the mean HSUVs of T2DM with cardiovascular diseases had a decrement of 0.02.

Quality assessment

All studies fully or partially addressed the four questions from the quality appraisal tool, with seventy-five studies (63.6%) providing detailed reports on these aspects. Specifically, 81.3%, 83%, 90.6%, and 98.3% of the studies provided thorough reporting on the four quality assessment issues. The vast majority (83%) adequately described the measurement instruments used, and almost all (98.3%) provided detailed information on the characteristics of the study population. Based on this quality assessment, these 118 studies were classified as high-quality (Table 3).

Table 3 Quality assessment

Regression analysis

Table 4 reports the results of the meta-regression analyses. The model weighted by sample size had the best goodness of fit (R² = 0.6238, compared with 0.4316 for the unweighted model and 0.4537 for the SD-weighted model).

Table 4 Regression models for HSUVs

In the sample-size-weighted specification, differences in the choice of instruments significantly affected the HSUVs; SF-6D (0.083, 95%CI: 0.064, 0.101) was estimated to have the highest positive coefficient, while FT (-0.139, 95%CI: -0.195, -0.084) had the most negative coefficient. Meanwhile, the variables for tariff had a statistically significant (p < 0.05) association with the mean HSUV. Relative to the EQ-5D-3L UK tariff, the EQ-5D-3L US tariff (0.015, 95%CI: -0.008, 0.038), EQ-5D-3L Chinese tariff (0.08, 95%CI: 0.052, 0.107) and EQ-5D-3L Japanese tariff (0.061, 95%CI: -0.003, 0.125) had a positive effect on the mean HSUV, while the EQ-5D-5L UK tariff (-0.068, 95%CI: -0.166, 0.03) had a negative effect. Among the health states, T2DM with diabetic foot (-0.17, 95%CI: -0.192, -0.147) had the largest negative coefficient, while T2DM without complications (0.023, 95%CI: 0.001, 0.046) had the largest positive coefficient.

The regression results for mean age showed positive increments in HSUVs for cohorts with a mean age below 70 years (0.029, 95%CI: -0.055, 0.112 for ages 50–60; 0.025, 95%CI: -0.058, 0.108 for ages 60–70), but this result was not statistically significant. HSUVs decreased with a duration of illness exceeding ten years (-0.006, 95%CI: -0.002, 0.008), hypertension (-0.025, 95%CI: -0.108, 0.058), overweight (-0.088, 95%CI: -0.135, -0.041) and obesity (-0.071, 95%CI: -0.103, -0.039), although the results for duration and hypertension were not statistically significant (p > 0.05).
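Because the model is linear and additive, an analyst can combine the reported coefficients to adjust an HSUV for a given scenario. The sketch below uses only coefficients quoted in the text for the sample-size-weighted specification; the intercept value, the dictionary layout, and the function name are hypothetical, since the fitted constant is not quoted in the text, and any level not listed is treated as a reference category.

```python
# Coefficients quoted in the text (sample-size-weighted specification).
COEFFICIENTS = {
    ("health_state", "T2DM with diabetic foot"): -0.170,
    ("health_state", "T2DM without complications"): 0.023,
    ("tariff", "EQ-5D-3L US"): 0.015,
    ("tariff", "EQ-5D-3L China"): 0.080,
    ("bmi", "overweight"): -0.088,
    ("bmi", "obesity"): -0.071,
}

HYPOTHETICAL_INTERCEPT = 0.80  # placeholder only, not a reported estimate


def adjustment(profile: dict) -> float:
    """Sum the coefficients for a profile; unlisted levels count as reference (0)."""
    return sum(COEFFICIENTS.get((var, level), 0.0) for var, level in profile.items())


foot_china = {"health_state": "T2DM with diabetic foot", "tariff": "EQ-5D-3L China"}
no_comp_china = {"health_state": "T2DM without complications", "tariff": "EQ-5D-3L China"}

predicted = HYPOTHETICAL_INTERCEPT + adjustment(foot_china)
contrast = adjustment(foot_china) - adjustment(no_comp_china)
print(round(predicted, 3), round(contrast, 3))
```

The contrast between two scenarios does not depend on the intercept, so it can be computed from the reported coefficients alone, whereas an absolute predicted HSUV requires the fitted constant.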

Discussion

This study provided a valuable set of utility values for patients with T2DM to support future economic evaluations and decision-making. We synthesised 118 studies to summarise the HSUVs for patients with T2DM and its 11 complications, and the effects of different measurement instruments on HSUVs. In addition, meta-regression quantified the disutility associated with disease-related complications in patients with T2DM and estimated modifiers of HSUVs by controlling for country, selected measurement instrument, age, disease duration, blood pressure and body mass index. Overall, these estimates improved the robustness of the evidence for future quality-of-life studies and health economic assessments of patients with T2DM.

The economic evaluation of diabetes-related interventions relies heavily on HSUVs as an outcome measure of the impact of different factors on patients’ quality of life [150, 151]. There is consensus among health providers that having complications reduces the health utility value of patients with T2DM [49, 121], and it is therefore important to incorporate this reduction in HSUVs in economic evaluations to improve the robustness of QALY estimates. The influence on HSUVs of other key factors, such as country, instrument, tariff and general patient characteristics such as blood pressure and duration of illness, has already been assessed in existing studies (Mok et al. [9], Jing et al. [10], and Redenz et al. [8]). Our study also included these factors as control variables to quantify the associations between these variables and HSUVs. In addition, possible influences such as age and BMI were included as variables to strengthen the model's goodness of fit. Our meta-regression results provide new insights for healthcare analysts conducting future studies of T2DM-related management decisions.

Greenland et al. [152] noted that relying solely on statistical significance is inadequate for drawing inferences or making decisions about associations or effects. Moreover, when health economic evidence is used to inform healthcare decision-making, decisions are based on the incremental expected costs and health benefits of care, irrespective of statistical significance [153]. Thus, although there were no statistically significant associations between certain variables and HSUVs, the incremental or decremental utility values quantified in our analysis can still be used for healthcare decision-making. First, the association between HSUV measurement instruments and quality of life remains controversial. Our study suggested that different instruments would bring about different degrees of incremental or decremental HSUVs, which was consistent with the findings of Redenz et al. [8] and supported by the study of Glasziou et al. [154]. Lung et al. [155] noted that the utility scores obtained from the TTO method were greater than those obtained from the EQ-5D, which were greater than those obtained from the HUI2, HUI3, and SF-6D. However, our study yielded different trends and variations in the magnitude of the coefficients. Meanwhile, it has also been shown that the utility decrements were comparable between the EQ-5D and SF-6D instruments [9]. The validity of future research on HSUVs may require additional attention to the incremental and decremental utility values derived from specific instruments. Second, tariff differences also affected the measurement of HSUVs: the US (EQ-5D-3L) and China (EQ-5D-3L) tariffs brought increments to mean HSUVs compared with the UK (EQ-5D-3L) tariff. The effect of the tariffs on HSUVs was interpreted as arising from different socio-demographic factors in the study by Sullivan et al. [129]. Despite the observed variation in tariff application across regions such as the European countries, the United Kingdom, and the United States, which share similar ethnic compositions, it is advisable to employ tariffs that are representative of their respective jurisdictions to ensure high relevance and accuracy.

T2DM patients with complications usually have lower HSUVs than T2DM patients (with or without complications). The idea that complications have a negative impact on the quality of life of T2DM patients has been confirmed in several studies [8,9,10, 156,157,158]. Compared with other published studies, the utility reduction due to complications ranged from -0.007 to -0.177 in Mok et al. [9] and from -0.007 to -0.17 in our study. The lowest utility value was for end-stage renal impairment in the study by Lung et al. [155], with a utility value of 0.48 (0.25–0.71); however, in our study, the complication with the lowest HSUVs was diabetic foot (-0.17, 95%CI: -0.192, -0.147). This difference may be attributed to differences in the countries, tariffs, assessment instruments, and the essential characteristics of the population. Our study quantified associations across 9 instruments and 31 tariffs, in contrast to the previous studies of Shao et al. [121], which included only a single instrument and tariff, and Mok et al. [9], which included a limited number of countries and regions. In addition, advances in therapeutic strategies, medical treatments, and the progression of complications may also account for the difference in the size of the negative coefficients.

In the model weighted by sample size, the decrements in HSUVs associated with disease duration, blood pressure, overweight and obesity were consistent with previous studies [10, 149, 158]. Meanwhile, the positive association between age <70 and HSUVs was consistent with the findings of Imayama et al. [157], which may be explained by increased satisfaction with quality of life as age increases. Age ≥ 70 yielded negative coefficients, for which there are two possible explanations: firstly, age ≥ 70 typically corresponds with lower HSUVs due to weaker physical functioning and higher complication and acute mortality rates [4]; secondly, only 8 utility values were included in the regression for this group, so the result is not highly credible. The controversial effect of age on HSUVs needs to be verified by further research [33].

Utility estimates naturally vary depending on factors such as study design, utility measurement instruments, health status classification, demographic characteristics and valuation tariffs [119, 127, 129, 159]. Ideal data for decision-making must take these factors into account [160]. One of the strengths of our study is that we expanded extensively on these factors to include more comprehensive variables: we covered a wide range of HSUVs elicited by direct or indirect measurement instruments, 31 tariffs and 11 complication states, and, for the first time, synthesised them using meta-regression to provide a range of reference values. Decision makers can select the most appropriate HSUVs based on their specific variables to robustly support future economic evaluations.

One limitation of this review is the search process. We searched only three databases, PubMed, Embase, and Web of Science, identifying published manuscripts from peer-reviewed scholarly journals but potentially missing grey literature, unpublished work, and other data sources. Bramer et al. reported a 92.8% search rate for Medline and Embase, highlighting the robustness of these databases in identifying relevant studies [161]. Therefore, any potential omissions are unlikely to have significantly affected the overall findings of our study. The measurement of T2DM health utility values in countries around the world is carried out using a variety of standardised and validated instruments, and the diversity of the value sets is determined by differences in demographic characteristics between countries, which inevitably led to the high number of variables we included in the meta-regressions and to small sample sizes for some variables. This is the second limitation of our study: the small number of observations prevents us from modelling the full diversity of methods used to generate utility values, which may affect the model’s reliability. Another area of uncertainty lies in the inability to determine the impact of the gender distribution in the patient population on quality of life and utility values. In addition, studies often did not adequately account for the timescales involved, either from the stage of the condition or from the start of treatment. Similarly, 18.7% and 17% of the studies did not adequately explain the methods used to derive the utility values. However, our analysis found that differences in measurement methodologies significantly impacted HSUVs. To maximize the inclusion of available data, we opted not to exclude these studies, provided they addressed at least some of the four quality assessment questions. As more studies of HSUVs for people with T2DM are published, these effects could be further explored to improve the validity of meta-regression model estimates and the quality of the evidence to inform healthcare decisions.

Conclusion

Our study quantified the extent to which 11 complications, adjusted for valuation instruments and tariffs, affected patients’ quality of life, reinforcing the HSUVs evidence base and informing future decision-making processes about patients with T2DM. Analysts can use the data sources provided in this review to identify specific HSUV estimates most appropriate to their decision-making. Estimated condition-specific incremental decrements in health utility would provide more robust evidence for researchers to improve the quality of economic assessments in diabetes.