Key Points for Decision Makers

This study provides an algorithm to map the condition-specific Insomnia Severity Index (ISI) to EQ-5D-3L utilities. The EQ-5D is a generic health-related quality-of-life questionnaire commonly used in health technology assessment.

The mapping algorithm builds upon previous work, expanding it through the use of a large, multinational dataset and adjustment for respondent characteristics.

1 Introduction

According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) [1], insomnia disorder is a sleep disturbance (difficulty falling asleep, staying asleep or early morning awakening) marked by predominant dissatisfaction with sleep quantity or quality that is associated with substantial distress and impairments of daytime functioning. The sleep difficulty must occur at least 3 nights per week and be present for at least 3 months, while occurring despite adequate opportunity for sleep [1]. The prevalence of insomnia varies widely depending on the definition used; it ranges from an estimated 6 to 10% (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition [DSM-IV] criterion) to approximately one-third of the general population (at least one symptom of insomnia) [2]. Insomnia disorder induces fatigue and mood disturbance, and impairs social, vocational, educational, and behavioural functioning [3]. In a bidirectional relation, insomnia increases the risk of psychiatric and medical comorbidities (including anxiety and depression, obesity or weight gain, obstructive sleep apnoea and hypopnea syndrome) [4, 5]. Insomnia impacts next-day functioning, health, and quality of life (QoL) and results in substantial humanistic and economic burden for the healthcare system and society [6, 7].

Several insomnia-related generic and disease-specific patient-reported outcome (PRO) instruments have been used to identify and describe the condition. These instruments include, but are not limited to, the 36-Item Short-Form Health Survey (SF-36), the Leeds Sleep Evaluation Questionnaire, the Medical Outcomes Study Sleep Scale 12, the Epworth Sleepiness Scale, the Functional Outcomes of Sleep Questionnaire, the Pittsburgh Sleep Quality Index, the Insomnia Daytime Symptoms and Impacts Questionnaire (IDSIQ) and the Insomnia Severity Index (ISI). The ISI is one of the most commonly used disease-specific measures for self-perceived insomnia severity [8]. It comprises seven items assessing the severity of sleep onset and sleep maintenance difficulties (both nocturnal and early morning awakenings), satisfaction with current sleep pattern, interference with daily functioning, worry about sleep, and sleep dissatisfaction [9]. Each item is rated on a 0–4 scale and the total score ranges from 0 to 28 (with higher score denoting higher severity). The recall period is 2 or 4 weeks. To date, the ISI has been used as a PRO tool to diagnose insomnia cases, document insomnia prevalence and burden, assist clinicians in their initial evaluation of patients, determine the need for treatment, and evaluate treatment response.
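The scoring rule described above (seven items, each rated 0–4, summed to a 0–28 total) can be sketched as follows. This is a minimal illustration only; the function name `isi_total` and the example ratings are hypothetical and do not come from the ISI documentation or the study data.

```python
from typing import Sequence


def isi_total(items: Sequence[int]) -> int:
    """Sum seven ISI item ratings (each 0-4) into the 0-28 total score."""
    if len(items) != 7:
        raise ValueError("the ISI has exactly seven items")
    if any(not isinstance(r, int) or r < 0 or r > 4 for r in items):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(items)


# Hypothetical respondent: ratings for the seven items.
example = [2, 3, 1, 4, 2, 3, 2]
print(isi_total(example))  # 17
```

The total returned here is the quantity that the mapping models in this study take as their covariate of interest.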

The EuroQol EQ-5D and other preference-based measures are rarely reported in insomnia-related clinical trials, partially due to the lack of sensitivity of the EQ-5D for disease-specific health complaints such as fatigue and cognitive problems [10]. Therefore, as recommended in the latest guidance by the National Institute for Health and Care Excellence (NICE) [11], sleep-specific instruments need to be mapped to QoL health state utility scores to enable cost-utility analyses.

Mapping of insomnia-specific PROs to the EQ-5D has been detailed in prior literature. In August 2021 [12], NICE released a guideline that included a comparison of the different types of continuous positive airway pressure machines. The treatment effect of continuous positive airway pressure machines for obstructive sleep apnoea was measured on the Epworth Sleepiness Scale and mapped to the EQ-5D using an ordinary least squares (OLS) regression [13]. Regarding the ISI, in a 2011 study focused on insomnia, Gu et al. employed a generalised linear model (GLM) to map the ISI’s seven items (Model I), summary score (Model II), and clinical categories (Model III), onto the EQ-5D-3L [14]. However, the mapping function of Gu et al. is limited by the use of a United States (US) dataset, the use of a single type of model (a GLM), and the lack of adjustment for confounding factors (e.g. comorbidities).

The primary objective of this study was to perform a new mapping between ISI and EQ-5D using a large representative dataset that includes both European (France, Germany, Italy, Spain and the UK) and US data.

2 Methods

This mapping study was conducted following the recommendations of the Professional Society for Health Economics and Outcomes Research (ISPOR) Task Force report on ‘Mapping to estimate health-state utility from non-preference-based outcome measures’ [15].

2.1 Data Source

Observational studies with representative patient groups, large sample sizes, and reporting of potential confounding factors (e.g. age) are the preferred mapping datasets [15]. The mapping dataset used in this study came from the 2020 National Health and Wellness Survey (NHWS) for France, Germany, Italy, Spain, the UK and the US. The NHWS is a cross-sectional, self-administered, nationwide, internet-based survey of adults (aged ≥ 18 years) that is fielded annually in select global markets. Respondents were recruited through a general purpose, web-based consumer panel via channels such as opt-in e-mails, co-registration with panel partners, and e-newsletter campaigns. A stratified random sampling procedure, with strata by sex, race/ethnicity (in the US) and age, was implemented to obtain a representative sample of the adult population in each country. The protocol and questionnaire for the NHWS were reviewed and granted exemption status by Pearl Institutional Review Board (IRB; Indianapolis, IN, USA; 19-KANT-204).

2.2 Study Population

Respondents were included in the modelling sample if they had reported experiencing insomnia in the past 12 months and had completed the ISI measure. Respondents excluded from the study population were those (1) reporting that they had experienced or been diagnosed with narcolepsy, sleep apnoea or sleep difficulties other than insomnia in the past 12 months; (2) reporting that they had experienced or been diagnosed with another serious condition (any type of cancer, chronic liver disease, cirrhosis, epilepsy, multiple sclerosis, muscular dystrophy or Parkinson's disease); or (3) who were pregnant at the time of data collection.

2.3 Data Inputs

NHWS data used in the mapping included sociodemographic and general health characteristics, comorbidity burden, insomnia-related measures (including ISI), current insomnia treatment, and the EQ-5D-5L health states. The respondent characteristics used as covariates for model fitting included age, sex, marital status, employment status, education level, smoking status, drinking status, comorbidities (both individual conditions and the aggregate Charlson Comorbidity Index [CCI] score [16]), body mass index (BMI), insomnia diagnosis status, treatment for insomnia status, and geography. To enable modelling of all relevant country-specific observations using a similar tariff, and to align with NICE guidelines, a UK perspective was adopted and all EQ-5D-5L health state data were converted to EQ-5D-3L utility scores using the crosswalk function of Hernández-Alava and Pudney [17].

2.4 Modelling Approach

When mapping, it is recommended that specific aspects of the EQ-5D distribution should be accounted for, i.e. the presence of large spikes, upper and lower limits, skewness, multimodality, and gaps in the range of feasible values [15]. Figure 1 illustrates the distribution of EQ-5D-3L utility data from the NHWS for the insomnia cohort. The left skew of the data can easily be converted to right skew by transforming from utility to disutility using the simple linear transformation: disutility = 1 − utility (Fig. 1).
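The effect of the transformation disutility = 1 − utility on skewness can be checked numerically. The sketch below uses a small set of made-up utility values (not NHWS data) and a simple moment-based skewness estimator; the helper name `skewness` is an assumption introduced for illustration.

```python
from statistics import fmean


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical utilities: bunched near 1 with a tail of low values.
utility = [1.0, 0.95, 0.9, 0.88, 0.85, 0.8, 0.7, 0.5, 0.2, -0.1]
disutility = [1.0 - u for u in utility]

print(skewness(utility))     # negative value: left skew
print(skewness(disutility))  # positive value of the same magnitude: right skew
```

Because the transformation is a reflection plus a shift, the skewness changes sign but not magnitude, which is what makes right-skew models such as a gamma GLM candidates for the disutility scale.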

Fig. 1 EQ-5D-3L index a disutility and b utility distributions for the National Health and Wellness Survey dataset (N = 17,955). EQ-5D-3L EuroQol 5 dimensions 3 levels

We selected and tested multiple potentially appropriate models. First, we mapped EQ-5D-3L and ISI using OLS regression as a reference model. As well as being the most frequently used mapping model [18], OLS performs well in mean prediction [19]; however, some commentators have suggested that OLS results in systematic bias when data are not continuously distributed, as is the case for EQ-5D utility data [15]. The censored least absolute deviation (CLAD) model was also tested to tackle the heteroscedasticity, non-normality, and ceiling of the EQ-5D at 1. CLAD is a form of median regression and is expected to perform well on the mean absolute error (MAE) metric; the question is whether it also minimises mean square prediction error [19]. The ISI mapping work performed by Gu et al. [14] was also replicated using a gamma-log GLM. To fit this model, the EQ-5D-3L was transformed into disutility as 1 − utility, so that the natural left skew of the data became a right skew (Fig. 1). A gamma-log GLM is considered an appropriate model choice because it accommodates the skewness of EQ-5D data and prevents predictions outside of the data range. Finally, the adjusted limited dependent variable mixture model (ALDVMM) was used to take into account both the limited nature and the multimodality of the EQ-5D [20]. ALDVMM fits should not be judged on a single point estimate; rather, the estimation should be re-initialised multiple times and different optimisation methods tried [21]. For this reason, several methods (Broyden–Fletcher–Goldfarb–Shanno, conjugate gradient, Nelder–Mead, nlminb, Rcgmin, and Rvmmin) were employed, along with several initialisation options (zero, constant, and sann); only the best results were kept. To investigate the appropriate number of mixture components for our dataset, we fitted four different ALDVMMs ranging from two to five components. Since the ALDVMM with five components did not converge, only the two- to four-component models are described in this study.
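The expectation that a median-type model such as CLAD does well on MAE while a mean-type model such as OLS does well on squared error can be illustrated with the simplest possible case, a constant prediction with no covariates. The values below are hypothetical and the helper names are introduced only for this sketch; it is not the CLAD or OLS fit used in the study.

```python
from statistics import fmean, median


def mean_abs_error(values, c):
    """MAE when every observation is predicted by the constant c."""
    return fmean([abs(v - c) for v in values])


def mean_sq_error(values, c):
    """MSE when every observation is predicted by the constant c."""
    return fmean([(v - c) ** 2 for v in values])


# Hypothetical left-skewed utilities with a ceiling at 1.
sample = [1.0, 1.0, 0.9, 0.85, 0.8, 0.7, 0.6, 0.3, 0.0]
mean_pred, median_pred = fmean(sample), median(sample)

# The median minimises absolute error; the mean minimises squared error.
print(mean_abs_error(sample, median_pred) <= mean_abs_error(sample, mean_pred))
print(mean_sq_error(sample, mean_pred) <= mean_sq_error(sample, median_pred))
```

With skewed data the two constants differ, so a model that wins on one criterion will generally not win on the other.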

2.5 Covariate of Interest

ISI was the covariate of interest. We used direct mapping models by regressing the ISI total score as a continuous variable varying from 0 to 28 onto EQ-5D-3L utility.

2.6 Adjustment Covariates

Candidate covariates were derived from the information gathered in the NHWS. All models were adjusted on the entire list of available covariates that could potentially act as confounding factors. Dummy variable coding was used for all binary and categorical variables; continuous variables (including ISI) were standardised.

Binary and categorical variables encoded the respondent’s sex (male, female); regular experience of pain (yes, no); having obtained an undergraduate degree or higher (yes, no); marital status (married/living with partner, single/never married/divorced/separated/widowed); current employment (employed [full time/part time/self-employed], not employed) and retirement status (retired, not retired); smoking status (current smoker, former smoker, never smoked); drinking habits (heavy drinker: 4+ times per week, low/moderate, abstains); DASD (a variable denoting self-reported depression, anxiety or post-traumatic stress disorder [yes, no]); and country (UK nationality [yes, no]). Age, BMI and the CCI were encoded as continuous variables.

With regard to insomnia-related variables, self-reported clinician-diagnosed insomnia (yes, no) and self-reported prescribed treatment for insomnia (yes, no) were included as binary variables. Potential interactions with ISI were tested for both the self-reported diagnosis and self-reported prescribed treatment variables since these potentially relate to the severity of insomnia. Ultimately, only the Treated × ISI interaction variable was retained as there was no evidence of interaction with self-reported diagnosis.
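One way to picture the insomnia-related terms is as entries in a single respondent's design row: two 0/1 indicators and the product of the treatment indicator with the standardised ISI score. The function and key names below are hypothetical, and the diagnosis indicator is shown only as a main effect, consistent with the retained Treated × ISI interaction.

```python
def insomnia_terms(isi_z: float, treated: bool, diagnosed: bool) -> dict:
    """Build the insomnia-related design terms for one respondent.

    isi_z is the standardised ISI total score; treated and diagnosed
    are the self-reported yes/no variables, dummy coded as 1.0 / 0.0.
    """
    t = 1.0 if treated else 0.0
    d = 1.0 if diagnosed else 0.0
    return {
        "isi": isi_z,
        "treated": t,
        "diagnosed": d,
        "treated_x_isi": t * isi_z,  # interaction: non-zero only if treated
    }


print(insomnia_terms(0.5, treated=True, diagnosed=False))
```

For an untreated respondent the interaction term is zero, so the ISI effect reduces to the main ISI coefficient alone.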

2.7 Statistical Analysis

First, the models were fit to the full dataset, and, in the case of the ALDVMM, the optimum number of components was assessed. We then explored the predictive validity of the models by splitting the entire dataset randomly 50/50 into training and validation datasets. Models were then fit on the training dataset and the predictive ability of the models was assessed on the validation dataset. Continuous variable standardisation was performed using the mean and standard deviation (SD) from the training dataset. Due to the time taken to fit the ALDVMMs, only the best fitting model from the different numbers of components was taken forward to this training and testing stage. For the best fitting model, the same process of optimising the model was followed as described above. This process was repeated 100 times and the results presented as averages across those 100 repetitions.
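The repeated split-and-standardise procedure can be sketched as below. This is an assumption-laden illustration, not the study's R code: the ISI values are simulated, the seed is arbitrary, the sample SD is used (the text does not say which SD estimator was applied), and the model-fitting step is left as a comment.

```python
import random
from statistics import fmean, stdev


def split_half(n, rng):
    """Return two disjoint index lists forming a random 50/50 split of range(n)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


def standardise_with_training(train, valid):
    """Z-score both sets using the training mean and SD only."""
    mu, sd = fmean(train), stdev(train)
    return [(v - mu) / sd for v in train], [(v - mu) / sd for v in valid]


rng = random.Random(2020)                       # fixed seed, illustrative only
isi = [rng.randint(0, 28) for _ in range(200)]  # hypothetical ISI totals

validation_means = []
for _ in range(100):                            # 100 repetitions, as in the text
    train_idx, valid_idx = split_half(len(isi), rng)
    train_z, valid_z = standardise_with_training(
        [isi[i] for i in train_idx], [isi[i] for i in valid_idx]
    )
    # A model would be fitted on train_z and scored on valid_z here.
    validation_means.append(fmean(valid_z))

print(len(train_idx), len(valid_idx))  # 100 100
```

Taking the mean and SD from the training half only keeps information from the validation half out of the fitted model.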

Metrics employed to measure the fitting performance were the log-likelihood, Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC), which provide means for comparisons of specifications within model types [15]. The AIC and BIC provide measures of model performance that account for model complexity by penalising the model for the number of parameters included. In addition, MAE, mean-squared error (MSE), and the coefficient of determination (R²) were computed to measure the predictive performance. These metrics quantify how far predicted values deviate from observed values. Graphical representations of model performance depicting EQ-5D observed versus predicted values were created for each mapping model.
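For reference, the standard textbook definitions of these metrics can be written compactly. The sketch below assumes R² is computed as 1 − SSE/SST (the study may instead have used a squared correlation), and the observed and predicted values are made up for illustration.

```python
import math
from statistics import fmean


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return fmean([abs(o - p) for o, p in zip(obs, pred)])


def mse(obs, pred):
    """Mean squared error between observed and predicted values."""
    return fmean([(o - p) ** 2 for o, p in zip(obs, pred)])


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SSE / SST."""
    mean_obs = fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def aic(loglik, k):
    """Akaike's information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# Hypothetical observed and predicted utilities.
observed = [0.9, 0.7, 0.5, 0.3]
predicted = [0.8, 0.7, 0.6, 0.3]
print(round(mae(observed, predicted), 4),
      round(mse(observed, predicted), 4),
      round(r_squared(observed, predicted), 4))
```

Lower AIC and BIC values indicate a better trade-off between fit and complexity; BIC penalises additional parameters more heavily than AIC once n exceeds about 7.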

The predicted EQ-5D utilities from the best fitting models developed as part of the current study were compared with those obtained in a mapping algorithm between the ISI and the EQ-5D, previously published by Gu et al. [14].

The statistical analyses and modelling procedures were performed using publicly available libraries in R (aldvmm [20]), and SAS 9.4 (SAS Institute Inc., Cary, NC, USA).

3 Results

3.1 Sample Characteristics

A total of 17,955 respondent observations were included in the overall model (Table 1). Most respondents came from the US (n = 8920), followed by the UK (n = 2128), France (n = 2069), Italy (n = 1853), Germany (n = 1792) and Spain (n = 1193). Among the overall sample, the mean (SD) age was 46 (16.3) years, the majority of respondents were female (66.8%), married/living with partner (54.9%), had less than a 4-year university degree (54.9%), had some type of employment (58.7%), never smoked (49.0%), were low/moderate drinkers (61.2%), did not have any diagnosis of pain (58.2%) or depression/anxiety/post-traumatic stress disorder (58.9%), and had a mean (SD) BMI of 26.7 (6.5) and a mean (SD) CCI of 0.30 (0.72). The sex imbalance is corroborated by epidemiologic literature on insomnia [2]. Most respondents were experiencing insomnia but had not been diagnosed by a clinician (72.9%) and currently received no treatment for their insomnia (86.5%). Mean (SD) EQ-5D-3L scores for the overall sample, France, Germany, Italy, Spain, UK, and the US were 0.71 (0.23), 0.72 (0.21), 0.69 (0.23), 0.75 (0.17), 0.78 (0.18), 0.68 (0.26), and 0.71 (0.23), respectively. EQ-5D-3L disutility and utility (Fig. 1) distributions across the full sample combined were positively skewed and negatively skewed, respectively, which is consistent with expectations.

Table 1 Sample characteristics of the NHWS 2020 for the whole sample, UK, US, France, Germany, Italy and Spain

3.2 Model Performance

Table 2 shows the analysis aiming to optimise the number of components to select the appropriate ALDVMM. The ALDVMM with three components (hereafter, ALDVMM3) reported the best performance metrics (R² = 0.320 and MSE = 0.0347). Table 3 presents the performance of the models when estimating EQ-5D utilities when fitting the entire dataset. Performance of the fits to observed data provided by each model can also be assessed visually from Fig. 2, although it is important to note that this relationship between ISI and EQ-5D-3L utility is not adjusted for covariate patterns that differ across the range of ISI. The ALDVMM3 was the best performing model on the R² (0.31982) and the MSE (0.034691), while the GLM gamma-log was the second-best performing model (R² = 0.30309 and MSE = 0.03534).

Table 2 ALDVMM component identification – model fitting performance based on the full dataset
Table 3 All models – model fitting performance based on the full dataset
Fig. 2 Average predictions of the various models on the full dataset (solid line) and observed values (crosses): a GLM gamma-log; b OLS regression; c CLAD; and d–f ALDVMM with two to four components, respectively. Note: the relationship between ISI and EQ-5D in this figure does not control for covariate patterns that vary across the ISI categories. Regarding the apparently low EQ-5D-3L utility corresponding to an ISI score of 0, very few respondents scored a zero on the ISI (n = 81), therefore the low EQ-5D-3L utility may be considered an anomaly due to the low sample size. ALDVMM adjusted limited dependent variable mixture model, CLAD censored least absolute deviation, EQ-5D-3L EuroQol 5 dimensions 3 levels, GLM generalised linear model, ISI insomnia severity index, OLS ordinary least squares

Following the predictive modelling approach, splitting the data 50/50 into training and testing sets, Table 4 indicates the performance of the models on the test dataset. The ALDVMM3 was the best performing model based on the MSE (0.0351), and the gamma-log GLM was the second-best performing model (MSE 0.0355). Working as a median regression model, the CLAD reported the best MAE score in both the fitting and predictive approaches but performed poorly when assessed using the MSE criterion.

Table 4 Model performance based on the predictive test dataset across 100 repetitions

Model coefficients for the gamma-log GLM and ALDVMM3 are presented in Table 5 alongside the mean covariate values for the NHWS data, with the non-UK and treated variables set to zero, such that the algorithm predicts values for a UK non-treated population. Also included is the sum-product of the covariate and model coefficient columns, which provides an alternative ‘intercept’ value for the algorithms for those users without detailed covariate information for their own application. These algorithms from Table 5 were used to map the ISI to the EQ-5D-3L, and the predicted EQ-5D-3L utility values for each ISI score are listed in Table 6. Figure 3 shows the EQ-5D-3L disutility associated with each one-point increase in ISI for the ALDVMM3 and gamma-log GLM compared with the Model II algorithm reported by Gu et al., and Fig. 4 shows the predicted utility scores for the same three models. In contrast to Fig. 2, Figs. 3 and 4 show the independent effect of ISI on EQ-5D-3L utility after controlling for other covariates.
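As a hedged sketch of how a gamma-log GLM algorithm of this kind might be applied with the alternative intercept, the code below standardises an ISI total score, exponentiates the linear predictor to obtain disutility, and converts it back to utility. Every number in it is a hypothetical placeholder and is NOT taken from Table 5; the names `ALT_INTERCEPT`, `BETA_ISI_Z`, `ISI_MEAN` and `ISI_SD` are assumptions introduced for illustration.

```python
import math

# Hypothetical placeholders only; substitute the published Table 5 values.
ALT_INTERCEPT = -1.30  # sum-product of mean covariates and their coefficients
BETA_ISI_Z = 0.25      # coefficient on the standardised ISI total score
ISI_MEAN = 13.0        # mean used to standardise ISI
ISI_SD = 6.0           # SD used to standardise ISI


def predict_utility(isi_total: int) -> float:
    """Predict EQ-5D-3L utility from an ISI total score (0-28).

    The log link means the linear predictor is exponentiated to give the
    expected disutility, and utility is recovered as 1 - disutility.
    """
    if not 0 <= isi_total <= 28:
        raise ValueError("ISI total score must lie between 0 and 28")
    z = (isi_total - ISI_MEAN) / ISI_SD
    disutility = math.exp(ALT_INTERCEPT + BETA_ISI_Z * z)
    return 1.0 - disutility


# Predicted utility per score, and the disutility of a one-point increase.
for score in (0, 14, 28):
    print(score, round(predict_utility(score), 3))
print(round(predict_utility(22) - predict_utility(23), 4))
```

Because the exponential is always positive, predicted utility from this form stays below 1, and with a positive ISI coefficient the disutility of each additional ISI point grows as the score rises.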

Table 5 Regression coefficients for the gamma-log GLM and ALDVMM models
Table 6 EQ-5D utility scores predicted for gamma-log GLM and ALDVMM3 models
Fig. 3 Mapping functions on the full dataset: EQ-5D health state disutilities per ISI summary total point increase in the current study compared with the study by Gu et al. [14]. In this figure, the independent effect of ISI on utility is shown after controlling for other covariates. ALDVMM adjusted limited dependent variable mixture model, EQ-5D EuroQol 5 dimensions, GLM generalised linear model, ISI Insomnia Severity Index

Fig. 4 Mapping functions on the full dataset: EQ-5D health state utilities per ISI summary total score (0–28) in the current study compared with the study by Gu et al. [14]. In this figure, the independent effect of ISI on utility is shown after controlling for other covariates. ALDVMM adjusted limited dependent variable mixture model, EQ-5D EuroQol 5 dimensions, GLM generalised linear model, ISI Insomnia Severity Index

The R scripts to run the analysis are provided in the electronic supplementary material (ESM). In addition, a Microsoft Excel™ ‘calculator’, demonstrating how EQ-5D-3L utility can be obtained from the ISI score using the gamma-log GLM and ALDVMM3 models presented in the manuscript, is also available as part of the ESM. This calculator also includes the full regression coefficients, standard errors, and covariance matrices for the gamma-log GLM and ALDVMM3 models.

3.3 Validation Against the Model Developed by Gu et al.

Similar to the study by Gu et al. [14], we obtained a sufficient correlation between ISI summary total score and EQ-5D-3L health state utilities. Absolute utility values in our dataset were lower; this was expected given the use of the UK EQ-5D-3L tariff, which is known to give lower utility scores than tariffs from other countries (see, for example, Kiadaliri et al. [22]). The shape of the final mapping functions of ISI total score to EQ-5D-3L was similar between both studies (Figs. 3, 4); the current analysis resulted in a more slowly declining curve toward the highest ISI total scores, indicating a worse QoL in those patients with more severe insomnia. Our models and that of Gu et al. [14] cross over at ISI = 12; under this threshold, any point increase in ISI results in higher disutility in our model, while the opposite is true above this threshold. As an example, a 1-point ISI increase from ISI = 22 to ISI = 23 results in a disutility of 0.020 in the study by Gu et al. [14] and 0.023 in the present study (Fig. 3). The gap between the two models widens at higher ISI scores because our model adjusts for confounding factors such as comorbidities, unlike Model II reported by Gu et al. [14]. Therefore, we believe that our analysis is more conservative. Finally, as highlighted in Figs. 3 and 4, the gamma-log GLM and the ALDVMM3 showed very similar predicted values, meaning that the difference between them is unlikely to prove important in most applications.

4 Discussion

This study expands upon prior literature by providing an updated mapping between insomnia severity (via the ISI) and a preference-based measure of health-related QoL, the EQ-5D. A prior mapping between ISI and EQ-5D performed by Gu et al. was reported in 2011 for a US population using US tariff values. The updated mapping presented here is based on this earlier work, using their same preferred functional form and supplementing it with the use of a contemporary representative dataset based on UK tariff values, consisting of validated PROs from multiple countries (France, Germany, Italy, Spain, the UK and the US), and adjusting for potential confounding factors (including patient characteristics, comorbidities and insomnia characteristics) to obtain the independent effect of ISI on EQ-5D score. The importance of adjusting for potential covariates cannot be overemphasised. Despite the similarities between our preferred algorithm and that of the earlier Gu et al. algorithm, the greater slope at higher values of ISI for the Gu et al. algorithm can likely be attributed to an association with greater comorbidities. We see this effect in our own data when comparing the fitted values in Fig. 2, where the algorithm is predicted back onto the observed data, with Fig. 4, where the ISI relationship is shown at the mean of all other covariates. The lower slope after adjustment shows the independent effect of ISI on utility, which represents the most appropriate approach to linking a clinical effect on ISI to estimated effect on utility.

In addition to the work by Gu et al. [14] and in agreement with the ISPOR Task Force report on ‘Mapping to estimate health-state utility from non-preference-based outcome measures’ [15], other appropriate models (i.e. OLS regression, CLAD and ALDVMM) were tested against the gamma-log GLM model. The goal of our study was to develop a mapping algorithm from a UK perspective, which was accomplished by applying the latest NICE-approved UK tariff to all observations based on a crosswalk algorithm derived from the original EQ-5D-3L tariff. The use of a UK tariff for all five non-UK countries in the NHWS dataset means that remaining country-specific differences relate to potential differences in how subjects complete the instruments in different jurisdictions, not the tariff weights. This was handled in our analysis by using a fixed-effect covariate adjusting for subjects not in the UK, leaving the UK as the reference category.

The ALDVMM3 model proved to be the best fitting/predictive model, albeit by a very small margin, compared with the GLM gamma-log model, the second-best fitting/predictive model. The OLS regression and the CLAD models were the poorest performing on both fitting and prediction criteria. Despite its better performance, the ALDVMM3 was a time-consuming model to fit, as it was solved by iteration and required numerous starting values to ensure that the fit was to a global rather than a local maximum (this took more than 2 h to fit on a standard laptop). It is not surprising that the ALDVMM3 results showed a better fit to the entire dataset, given that this model uses five times as many parameters as the gamma-log GLM. Although the advantage of ALDVMM3 over the gamma-log GLM is reduced in the predictive task, it still performs better overall and therefore may be preferred by some analysts who are concerned only with getting an accurate algorithm for mapping purposes. For others, who also value parsimony and interpretability, the gamma-log GLM algorithm may be preferred to the ALDVMM3. Nevertheless, the practical differences between them are small (Figs. 3, 4) and readers who want to implement an ALDVMM algorithm can easily do so using the simple Microsoft Excel calculator available in the ESM.

In the absence of preference-based measures, the algorithms presented herein can be used to predict utility values from the ISI total summary score. Mapping studies between disease-specific instruments and generic preference-based instruments are common, driven largely by the need for quality-adjusted life-year (QALY) metrics in reimbursement decisions. Indeed, the latest NICE guidance specifically refers to the need for mapping studies to obtain EQ-5D estimates, where direct measurement of the EQ-5D is lacking. However, the legitimacy of any mapping exercise must be grounded in the conceptual overlap between the two instruments [24].

Our analysis is limited in that it may potentially underestimate the burden of insomnia due to the condition being potentially insufficiently captured by the EQ-5D. There is no empirical test for this, but some conceptual understanding of the problem can be inferred from the literature. For example, Perneger and Courvoisier [25] identify ‘Sleep’ and ‘Fatigue/energy’ as two of five possible dimensions missing from the EQ-5D. Fatigue has been the subject of so-called ‘bolt-on’ developments of the EQ-5D [10], most notably because fatigue is a common adverse effect of many health conditions and treatments for them, but it is clear that fatigue will also be a major consequence of next-day functioning for individuals suffering from insomnia. If the EQ-5D itself fails to capture important dimensions of QoL related to sleep deprivation, then no mapping, however statistically accurate, will be able to account for this deficiency and the QALY burden of insomnia will inevitably be underestimated. Another limitation of this study is that adjusting the results for other countries would require re-running the analyses with both a country-specific tariff and a country-specific fixed-effect covariate. However, we believe our approach is optimal since it utilises the full dataset of values and provides the most robust results. Moreover, EQ-5D-5L value sets exist for several countries [23]. Even though full health is reported less frequently on the EQ-5D-5L, as compared with the EQ-5D-3L, the average severity of reported problems is also less (i.e. slight problems instead of moderate problems). The comparability between scores obtained from the 3L and 5L versions of the EQ-5D depends on the values attached to health states, which vary across countries. In some countries, the results obtained from the 3L and 5L versions of the EQ-5D will be similar, while in other countries, the two EQ-5D versions may produce different results [26]. Therefore, using country-specific value sets instead of the UK EQ-5D-3L in our analysis could have produced different results and provided a more international perspective.

5 Conclusions

In the absence of preference-based measures, this study provides updated mapping algorithms for estimating EQ-5D utilities from the ISI summary total score. This new mapping draws its strengths from the use of a large international dataset and the incorporation of adjustment variables (including sociodemographic and general health characteristics) to reduce the effects of confounders. Nevertheless, users should consider whether the limitations of the EQ-5D instrument itself in capturing all relevant domains of insomnia could still lead to underestimation of the utility burden of insomnia when applying the algorithms reported here.