The estimation of a preference-based single index for the IBS-QoL by mapping to the EQ-5D-5L in patients with irritable bowel syndrome

Purpose The Irritable Bowel Syndrome Quality of Life (IBS-QoL) questionnaire is a commonly used and validated IBS-specific QoL instrument. However, in contrast to the EQ-5D-5L, this questionnaire is not preference-based and therefore does not allow calculation of QALYs. The objective of this study was to describe the convergent and known-group validity of both questionnaires and to develop a mapping algorithm to the EQ-5D-5L that enables IBS-QoL scores to be transformed into utility scores for use in economic evaluations. Methods We used data from two multicenter randomized clinical trials, which represented the estimation and external validation datasets. The convergent validity was investigated by examining correlations between the EQ-5D-5L and IBS-QoL, and the known-group validity by calculating effect sizes. Ordinary least squares (OLS), censored least absolute deviations (CLAD), and mixture models were used in this mapping approach. Results A total of 273 IBS patients were included (n = 189 in the estimation set and n = 84 in the validation set). Mean IBS-QoL score was 71.13 (SD 15.66) and mean EQ-5D-5L utility score was 0.73 (SD 0.19). The overall sensitivity of the IBS-QoL and EQ-5D-5L to discriminate between patient and disease characteristics was similar. CLAD model 4, containing the total IBS-QoL score and squared IBS-SSS (IBS severity scoring system), was chosen as the most appropriate model to transform IBS-QoL scores into EQ-5D-5L utility scores. Conclusion This study reports the development of an algorithm with which the condition-specific questionnaire IBS-QoL can be used to calculate utility values for use in economic evaluations. Including a clinical measure, IBS-SSS, in the model improved the performance of the algorithm. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-021-02995-y.


Introduction
Irritable bowel syndrome (IBS) is a chronic disorder of the gut-brain interaction characterized by altered bowel habits (constipation, diarrhea, or mixed pattern) and abdominal pain. IBS affects a large number of people worldwide, 4.4-4.8% according to the Rome IV criteria [1]. These symptoms have a substantial impact on patients' quality of life (QoL) and are associated with considerable use of healthcare resources and secondary significant economic impact on individuals, healthcare systems, and society. Between 15 and 50% of patients with IBS report absenteeism (work time missed) due to their symptoms and up to 34% report presenteeism (impairment while at work) [2,3]. To accomplish symptom control to improve quality of life (QoL), various treatments for IBS are available nowadays. These include diets, psychological interventions, and several types of pharmacological agents [4,5].
The cost-effectiveness of these treatments is generally examined using cost-utility analysis [6,7]. In health care decision-making and reimbursement procedures, the outcome of a cost-utility analysis is expressed in quality-adjusted life years (QALYs), which are used to determine whether a new therapy delivers value for money [8]. The quality of life side of the QALY can be assessed with a generic questionnaire, such as the frequently used EQ-5D (European Quality of Life Five Dimension questionnaire), which is designed to cover the core dimensions of health that are relevant across all medical conditions and to allow comparisons between patient groups [9,10]. In order to capture the impact of IBS on QoL, patients complete the EQ-5D-3L or the more recently developed 5-level questionnaire (EQ-5D-5L), after which a value set is applied to generate utility values. For example, mean utility values in IBS patients range between 0.50 and 0.75, where 1 is equivalent to perfect health and 0 is death [3,8,11-14]. The utility scores are subsequently used to calculate QALYs [8]. However, in clinical studies, a non-preference-based condition-specific questionnaire is often preferred because it captures more disease-specific or relevant aspects of the disease from a clinical and patient perspective. The Irritable Bowel Syndrome Quality of Life questionnaire (IBS-QoL) is a condition-specific instrument for IBS patients which incorporates specific subdomains such as food avoidance, bowel habits, and the effect on social/sexual relationships [15]. The EQ-5D and IBS-QoL have both been shown to be valid for assessing QoL in IBS patients [3,8,15]. However, previous studies, in which different disease populations were examined, have suggested that condition-specific measures are more responsive than generic measures with regard to capturing changes in health [16-19].
The inclusion of the psychological domain in QoL questionnaires is relevant for IBS patients, due to the high prevalence of anxiety and depression disorders among this category of patients, which has a significant impact on the disease course and the choice of therapy [2,12,20-23]. The EQ-5D-5L has one Anxiety/Depression dimension, whereas the IBS-QoL has several domains containing psychological questions. Whether the generic EQ-5D and the condition-specific IBS-QoL are both sensitive enough to capture (mental) health changes has not yet been investigated in IBS patients. Therefore, the difference in responsiveness between the EQ-5D-5L and IBS-QoL should be further explored.
Because the IBS-QoL is specifically designed for IBS patients and uses aspects that are salient to this specific patient group, the IBS-QoL is often preferred in clinical studies. Up to now, however, there is no proper method available to convert IBS-QoL scores into utilities to calculate QALYs. A mapping approach from the IBS-QoL to the EQ-5D-5L would be highly valuable to enable prediction of utility scores for modeling studies that use evidence from past trials in which only the IBS-QoL questionnaire was included [24]. Mapping is recognized by the National Institute for Health and Clinical Excellence (NICE) for generating utility information for non-preference-based measures, and the ISPOR (International Society for Pharmacoeconomics and Outcomes Research) guidelines have provided recommendations on constructing such an algorithm between a base measure and a target measure [25-27].
To the best of our knowledge, no study thus far has performed a mapping approach to predict utility values for the condition-specific measure IBS-QoL for use in IBS patients. The goal of this study is to examine the convergent and known-group validity of the EQ-5D-5L and the IBS-QoL and to use empirical mapping to predict EQ-5D-5L utility values from the non-preference-based IBS-QoL scores in IBS patients.

Datasets
Two studies were included for this mapping approach. The first study (N = 189) is a three-armed multicenter placebo-controlled randomized controlled trial in which the efficacy of peppermint oil was assessed, the PERSUADE study (NCT02716285) [28]. Patient inclusion took place in the Netherlands from August 2016 through March 2018. This study was used as the estimation data set to create the mapping algorithm.
The second study (N = 84) is a three-armed multicenter randomized controlled non-inferiority trial where the efficacy of online hypnotherapy versus face-to-face hypnotherapy is compared with online psychoeducation as control condition (FORTITUDE NCT03899779). Patient inclusion commenced in the Netherlands in July 2019 and is still ongoing. This trial was used as study data set to test the algorithm for external validation.
Inclusion criteria were similar in both studies. Across the two studies, subjects were between 16 and 75 years of age, were diagnosed with IBS according to the Rome IV criteria, and had no history of other causes for the abdominal complaints, such as Crohn's disease and coeliac disease [29]. In both studies, they were recruited via primary and secondary/tertiary healthcare. There was a slight difference in the age limits for inclusion between the studies: in the estimation data set, subjects between 18 and 75 years of age were included, whereas in the validation set, subjects between 16 and 65 years of age were included. This is due to a change in the age limit under the Dutch Medical Research Involving Human Subjects Act in August 2016, which allows research with subjects aged 16 years and older [30]. The upper age limit was adjusted because of the involvement of online therapies. Exclusion criteria of both trials included insufficient command of the Dutch language, major surgery to the lower gastrointestinal tract, current pregnancy or lactation, and, respectively, peppermint oil usage or hypnotherapy in the last 3 months prior to inclusion. Patients with a positive screening for anxiety or depression (score ≥ 10 on the GAD-7 and PHQ-9, respectively) in the validation dataset were interviewed by the researcher, and only patients with clinically significant anxiety or depression were excluded. In the estimation dataset, these scores were not incorporated during patient screening.
All procedures were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964, as revised in 2013. Both studies were reviewed and approved by the ethics committee at the Maastricht University Medical Center (METC 162,009; METC 18-037). Informed consent was obtained from all patients prior to being included in the study.

Questionnaires
Both the EQ-5D-5L and IBS-QoL were completed in these studies. The EQ-5D-5L is a preference-based measure and consists of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with five severity levels (no, slight, moderate, severe, extreme problems) [10,31]. This questionnaire is validated for use in IBS patients [3,8]. In the Netherlands, it is recommended by the National Health Care Institute (ZIN) for use in cost-utility analyses, and a Dutch tariff for the EQ-5D-5L is applied to create the utility values [32].
The IBS-QoL is a condition-specific instrument that is used to assess the impact of IBS and the effects of treatment. It consists of 34 questions which cover eight domains: dysphoria, interference with activity, body image, health worry, food avoidance, social reaction, sexual, and relationships [15,33]. Each item has a five-point response scale (not at all, slightly, moderately, quite a bit, extremely). The responses are summed and averaged for a total score and transformed to a scale between 0 and 100, with higher scores indicating better IBS-specific QoL. The Generalized Anxiety Disorder-7 (GAD-7) [34] and Patient Health Questionnaire-9 (PHQ-9) [35,36] were completed to screen for anxiety disorders and depressive disorders, respectively. A score of 10 or higher on either questionnaire was considered the cut-off point for a (possible) diagnosis of the corresponding disorder (generalized anxiety disorder or depressive disorder), and further examination to confirm the diagnosis is recommended at that point. The Irritable Bowel Syndrome Severity Scoring System (IBS-SSS) was completed to measure the severity of the symptoms (0-500) [37]. It consists of five items, each with a maximum score of 100; higher scores indicate more severe symptoms.
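The transformation of the 34 item responses to a 0-100 total score can be illustrated with a minimal sketch. It assumes that items are coded 1 (not at all) to 5 (extremely) and that the raw sum is reversed so that higher transformed scores mean better QoL; the function name `ibs_qol_total` and the coding are illustrative assumptions, not the official scoring manual.

```python
def ibs_qol_total(responses):
    """Transform 34 item responses into a 0-100 IBS-QoL total score.

    Assumption: each item is coded 1 (not at all) to 5 (extremely), so a
    higher raw sum means more impairment. The raw sum is reversed and
    rescaled so that 100 is the best and 0 the worst IBS-specific QoL.
    """
    if len(responses) != 34:
        raise ValueError("expected 34 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    raw = sum(responses)
    lowest, highest = 34 * 1, 34 * 5  # possible raw range: 34 to 170
    return 100.0 * (highest - raw) / (highest - lowest)
```

Under these assumptions, a respondent answering "not at all" to every item scores 100, and one answering "extremely" to every item scores 0.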

Statistical Analysis
Descriptive analyses were performed for patient characteristics. Whether the IBS-QoL and EQ-5D-5L are sensitive to discriminate between relevant disease or patient characteristics was examined by comparing the mean values using paired t tests [38]. We hypothesized that both questionnaires would show similar levels of discriminatory power with regard to patient characteristics (age, gender, and BMI). In addition, we hypothesized that the IBS-QoL would have greater discriminatory power for disease characteristics (IBS severity, depression, and anxiety) compared to EQ-5D-5L.
The known-group validity was analyzed using standardized effect sizes, obtained by dividing the difference in means by the standard deviation. We used Cohen's d, which divides the difference by the pooled standard deviation, where 0.2 was considered a small effect, 0.5 a medium effect, and 0.8 a large effect [39]. If the sample size was small (< 20), Hedges' g was used to describe the effect size [40]. Glass' delta was chosen if the variance in both groups differed significantly [41].
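The three effect sizes named above can be sketched as follows. This is a minimal illustration using standard textbook formulas, not the code used in the study; the small-sample correction in `hedges_g` is the common approximation 1 - 3/(4·df - 1), and the "negligible" label for effects below 0.2 is an added assumption.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d with an approximate small-sample bias correction."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))


def glass_delta(a, reference):
    """Difference in means divided by the SD of the reference group only."""
    return (mean(a) - mean(reference)) / stdev(reference)


def effect_label(effect):
    """Label by the 0.2 / 0.5 / 0.8 thresholds; 'negligible' is an assumption."""
    size = abs(effect)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"
```

Glass' delta avoids pooling, which is why it is the natural choice when the two group variances differ.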
The data from both trials were used to estimate a direct response mapping algorithm between IBS-QoL and EQ-5D-5L. The mapping approach was conducted following the principles described by Brazier et al. [42] and the ISPOR guidelines [27]. One of the criteria for mapping is that the start and target measures must overlap sufficiently to cover the important aspects of HRQoL. Mapping would be unsuccessful if there is no conceptual overlap [26]. First, convergent validity was investigated by examining the correlations between the paired observations and their domains using Spearman correlation coefficients. Correlation coefficients of 0.10, 0.10-0.50, and > 0.50 were considered as weak, moderate, and strong associations, respectively [43]. Second, different types of regression models were estimated with increasing complexity. As recommended by Brazier et al., our initial analysis included a simple model in which the target measure is regressed on the total score of the starting measure (IBS-QoL) [42]. Afterward, the domain scores of the IBS-QoL, whether or not combined with covariates, were added to the algorithm [42]. We tested whether the models improved when including clinical covariates (age, BMI, sex, IBS-subtype, IBS-SSS) [27]. Only age and the clinical variable IBS-SSS significantly improved the models (p ≤ 0.05). These two variables were therefore included in the final models.
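As a rough illustration of the convergent validity step, Spearman's coefficient is the Pearson correlation of the ranks, with ties given their average rank. The sketch below uses only the standard library; the function names are hypothetical, and treating values below 0.10 as weak and values from 0.10 up to and including 0.50 as moderate is an assumed reading of the thresholds.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def strength(rho):
    """Classify |rho|; the handling of the 0.10 and 0.50 boundaries is assumed."""
    size = abs(rho)
    if size > 0.50:
        return "strong"
    if size >= 0.10:
        return "moderate"
    return "weak"
```

Because only ranks enter the calculation, the coefficient is unaffected by the skewed distribution of the utility values.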
The included models were specified with the following terms: EQ-5D-5L is the EQ-5D-5L utility score; IBS-QoL is the IBS-QoL total score; Dysphoria score is the score of the Dysphoria domain of the IBS-QoL; Body Image score is the score of the Body Image domain of the IBS-QoL; and the IBS-SSS score is the IBS Severity Score (0-500). The squared term of the IBS-SSS is also included in the models to capture non-linear effects. β0 is a constant, and β1, β2, and β3 are the coefficients to be estimated. Overall, the EQ-5D-5L utility score is the dependent variable in the different regression equations, while the IBS-QoL total score, the separate domains, and the IBS-SSS score were used as predictors. Three statistical approaches were used to estimate these five models. The first technique was the ordinary least squares (OLS) estimator, because it is the most widely used analysis and generally produces good estimates, often better than the alternatives [26,44]. It estimates parameters by minimizing the sum of squared errors. However, because the utilities of the EQ-5D-5L in our population were censored (skewed left), we also investigated estimators that account for censoring. The censored least absolute deviations (CLAD) estimator was chosen over the Tobit estimator because CLAD is robust against departures of the errors from homoskedasticity and normality [45,46].
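For orientation only, a specification consistent with the description of a model containing the total IBS-QoL score and the squared IBS-SSS could be written as in the first line below. Whether a linear IBS-SSS term or age also enters is not stated here, so both are omitted as an assumption. The second and third lines give the textbook OLS and CLAD objectives, with $y_i$ the utility, $x_i$ the predictor vector, and $c$ an assumed upper censoring point (for example, 1 for full health).

```latex
\begin{align}
\text{EQ-5D-5L}_i &= \beta_0 + \beta_1\,\text{IBS-QoL}_i + \beta_2\,\text{IBS-SSS}_i^{2} + \varepsilon_i \\
\hat{\beta}_{\text{OLS}} &= \arg\min_{\beta} \sum_{i} \left( y_i - x_i^{\top}\beta \right)^{2} \\
\hat{\beta}_{\text{CLAD}} &= \arg\min_{\beta} \sum_{i} \left| \, y_i - \min\!\left( c,\; x_i^{\top}\beta \right) \right|
\end{align}
```

The absolute-value loss makes CLAD a median-type estimator, which is why it is less sensitive than OLS to skewed or heteroskedastic errors.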
The Adjusted Limited Dependent Variable Mixture Model (ALDVMM) was used as third mapping model, which was developed to deal with the distributional features of the EQ-5D [47]. It accounts for the gap between 1 (full health) and the highest EQ-5D index value below 1 (truncation point). We used the command aldvmm in Stata to fit these models [48]. First, we estimated the mixture models with two to five components to determine that the model with 4 components has the best fit (highest Likelihood and the lowest BIC (Bayesian information criterion)). Models were conducted with and without inclusion of the truncation point. Model fit was better when the truncation point was included.
Models that were developed using data from one trial were used to predict EQ-5D values in the other trial (external validation). Model fit was assessed by comparing the mean absolute error (MAE) and the root-mean-square error (RMSE) in this sample [27]. The lower the MAE/RMSE, the better the predictive accuracy of the model. A scatter plot of observed and predicted values in the estimation sample was provided for the best model. The best fitting model was selected on the basis of the MAE/RMSE, the predictive performance, and the convenience of the algorithm (e.g., simplicity) for usage in clinical practice [49]. Greater complexity of the algorithm through the inclusion of more clinical and demographic characteristics does not always seem to be beneficial [42]. A significance level of p < 0.05 was applied for all analyses. All analyses were performed in Stata version 14.1 (Stata Corp., College Station, Texas, USA) and IBM SPSS Statistics version 27.0 (Armonk, NY: IBM Corp.).
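The two fit statistics can be computed from paired observed and predicted utilities as in this minimal sketch. The function names are illustrative, and the values in the usage note are toy numbers, not study data.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted utilities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error; penalizes large errors more heavily than MAE."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

For toy observed values `[0.8, 0.6, 1.0]` and predictions `[0.7, 0.7, 0.8]`, the errors are 0.1, 0.1, and 0.2, so the RMSE exceeds the MAE because the largest error is weighted more heavily.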

Baseline characteristics of the population
In total, 273 IBS patients were included in this mapping approach. The estimation data set consisted of 189 IBS patients. The external validation data set consisted of 84 patients. The baseline patient characteristics are shown in Table 1. The mean age of the population was 35.07 years and 76.20% was female. The mean IBS-SSS score was 278.17 (SD 76.17), which implies a moderate severity of IBS symptoms. The mean quality of life (QoL) according to the generic questionnaire EQ-5D-5L was 0.73 (SD 0.20) and the mean QoL according to the condition-specific questionnaire IBS-QoL was 71.13 (SD 15.66).

Convergent and known-group validity
The convergent validity between the IBS-QoL instrument and the EQ-5D-5L was investigated using Spearman's correlation coefficient, and the results are available in Table 2. The total scores of the two instruments showed a moderately strong significant correlation (0.472). The majority (57.50%) of the correlations between the subscores of the IBS-QoL and the subscores of the EQ-5D-5L were statistically significant. All subscores of the IBS-QoL were significantly positively correlated with the total EQ-5D-5L scores. The subdomains dysphoria (0.420*) and body image (0.438*) of the IBS-QoL reached the strongest significant correlation with the total EQ-5D-5L score.
The analysis of the known-group validity of both HRQoL instruments is shown in Table 3. Both the IBS-QoL and the EQ-5D-5L revealed a similar (very) small non-significant difference in QoL score or utility value with respect to gender (in the estimation set, males have lower health state scores/values, whereas in the validation set, females have lower health state scores/values). Patients younger than 40 years old showed lower quality of life scores or utility values and this effect was significant in the validation set. Greater effect sizes were seen in the validation set compared to the estimation set. This is probably due to a higher mean level of age and the presence of a greater percentage of the subgroup of patients aged ≥ 40 years in the validation dataset. The IBS-QoL score and EQ-5D-5L value were both lower in patients with severe IBS symptoms compared to patients with mild/moderate symptoms (all were significant).
Table 1 Baseline characteristics of the population
a The IBS-SSS, IBS symptom severity score, consists of 5 items with a maximum score of 100; a higher score indicates severe IBS symptoms. The total score (range 0-500) can be categorized as Mild IBS (score < 175), Moderate IBS (175-300), and Severe IBS (300-500).
b The PHQ-9, Patient Health Questionnaire-9, is a 9-item questionnaire to screen for a depressive disorder. The total score (range 0-27) can be categorized as Minimal symptoms (score 0-4), Mild depression (5-9), Moderate depression (10-14), Moderately severe depression (15-19), Severe depression (20-27).
c The GAD-7, Generalized Anxiety Disorder-7, is a 7-item questionnaire to screen for an anxiety disorder. The total score (range 0-21) can be categorized as Minimal symptoms (score 0-4), Mild anxiety (5-9), Moderate anxiety (10-14), Severe anxiety (15-21).
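The severity categories in the Table 1 footnotes can be expressed as simple lookups, sketched below. The function names are hypothetical, and because the Moderate (175-300) and Severe (300-500) IBS-SSS bands share the value 300, counting a score of exactly 300 as moderate is an assumption.

```python
# Upper bound of each band (inclusive) paired with its label.
PHQ9_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")]
GAD7_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")]


def _band(score, bands):
    """Return the label of the first band whose upper bound covers the score."""
    if score < 0 or score > bands[-1][0]:
        raise ValueError("score outside the questionnaire range")
    for upper, label in bands:
        if score <= upper:
            return label


def phq9_category(score):
    return _band(score, PHQ9_BANDS)


def gad7_category(score):
    return _band(score, GAD7_BANDS)


def ibs_sss_category(score):
    """Mild < 175, moderate 175-300, severe above 300 (300 itself: assumed moderate)."""
    if not 0 <= score <= 500:
        raise ValueError("IBS-SSS total must be between 0 and 500")
    if score < 175:
        return "mild"
    if score <= 300:
        return "moderate"
    return "severe"


def screens_positive(score):
    """Screening cut-off of 10 or higher, as used for both GAD-7 and PHQ-9."""
    return score >= 10
```

With these cut-offs, the reported mean IBS-SSS of 278.17 falls in the moderate band, in line with the interpretation given for the baseline characteristics.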

The difference in effect sizes between both datasets could be explained by the greater percentage of patients included with mild/moderate symptoms in the estimation dataset compared to the validation dataset. This observation is therefore reflected in the different effect sizes of the depression and anxiety subgroups, whereby patients with more severe symptoms have often more psychopathology. Patients with a depression had significantly lower health scores and values, both according to IBS-QoL and EQ-5D-5L in the validation set. Patients with anxiety also had lower health-related quality of life, according to both instruments. The IBS-QoL reported greater effect sizes compared to EQ-5D-5L with respect to the characteristics gender (male vs female), age (< 40 vs > 40 years old), and the severity of symptoms according to the IBS-SSS (mild/moderate symptoms vs severe symptoms) and is therefore more sensitive to discriminate here. The discriminatory power of the IBS-QoL and EQ-5D-5L is similar for the BMI score and the presence of depression, but for anxiety, the EQ-5D-5L is slightly more sensitive.

Mapping results
Data of IBS-QoL and EQ-5D-5L in the estimation dataset were both left-skewed, where the EQ-5D-5L values were bimodally distributed. The EQ-5D-5L values were distributed as follows: 25% of the observations were between -0.02 and 0.68, 25% were between 0.68 and 0.82, 25% were between 0.82 and 0.86, and 25% were between 0.86 and 1.00 (full health). The truncation point for EQ-5D-5L is 0.92.
The goodness of fit results of the five models are shown in Table 4. The regression coefficients for all models are reported in Supplementary Tables 1, 2, and 3. The predicted EQ-5D utilities nearly reached the value of 1 (full health); CLAD model 4 came closest to 1, with a maximum value of 0.940. Figure 1 shows scatter plots of the observed and predicted utility values for model 4 from all three mapping models. The OLS model shows that the prediction is good at the upper end of the EQ-5D-5L, but worsens when the QoL is at the lower end. The CLAD model shows a good prediction for the higher QoL scores (> 0.7), where the predicted values are equal to the expected values in some cases. However, when the QoL is at the lower end of the EQ-5D-5L, the prediction is worse. A large proportion of the observations lie below the truncation point, between 0.7 and 0.9. The ALDVMM tends to underestimate good health and overestimate poor health, but observations near the mean are well predicted.
When assessing the goodness of fit results from the validation analysis, by the constructed models from the estimation data set,  Table 5.
Given the ease and straightforwardness of the algorithm, the good prediction of the mean and minimum/maximum

Discussion
This is the first study to present an algorithm to predict utility values in IBS patients from the condition-specific IBS-QoL questionnaire. Results of our mapping approach showed that CLAD model 4, containing the total IBS-QoL score and the squared IBS-SSS score, is the most appropriate model to enable prediction of health state utilities. This algorithm was chosen because of its simplicity; the low MAE/RMSE; and the small deviation of the predicted mean, minimum, and maximum from the observed values. The mapping from the IBS-QoL to the EQ-5D-5L provides utility scores that can be converted into QALYs, which is increasingly important now that economic evaluations are necessary to design reimbursement rules for drugs and medical services. The reported mean IBS-QoL score in our study of 273 patients was 71.1. This result is similar to the baseline scores reported in other IBS studies. In the literature, IBS-QoL scores vary between 61.4 and 71.2 [3,12,15,33,50,51]. Therefore, the patient sample used to derive and validate the mapping algorithm covers the most commonly observed IBS-QoL data in clinical practice. The subdomains "Sexual Function" and "Relationships" were the least affected in our cohorts. This finding is also in line with earlier studies [33,51-53]. Patients in our cohorts were most affected on the subdomain "Food Avoidance" (estimation set 58.5; validation set 47.9). This finding was also confirmed by other studies in IBS patients [33,51]. In both datasets, the second most affected subdomain was "Health Worry" (estimation set 70.19; validation set 64.19), which reflects the impact of IBS on a psychological level.

The total IBS-QoL score in the validation set was lower than that in the estimation set (66.88 and 73.02, respectively). This is probably caused by the higher prevalence of moderate depression and mild and moderate anxiety among the IBS patients in the validation set, because psychological therapies were offered in this trial. The domains "dysphoria" and "body image" of the IBS-QoL were strongly correlated with the EQ-5D-5L total utility score, which highlights the relevance of these domains for IBS patients. Other disease-specific domains such as "food avoidance," "social reaction," "sexual," and "relationships" were less correlated with the total EQ-5D-5L scores and are not represented in the generic questionnaire EQ-5D-5L. Still, these domains are specific and important for the psychological well-being of IBS patients [53]. The overall known-group validity of the IBS-QoL and EQ-5D-5L was quite similar. The IBS-QoL had greater discriminatory power with regard to age, gender, and the severity of symptoms (IBS-SSS), whereas the EQ-5D-5L had greater discriminatory power with regard to the presence of anxiety. Both questionnaires showed comparable discriminatory power with regard to BMI and the presence of depression. Therefore, our initial hypothesis has to be rejected because the IBS-QoL is not more sensitive than the EQ-5D-5L in discriminating between disease characteristics. However, the condition-specific questionnaire IBS-QoL could be preferable when different aspects of the disease need to be addressed during a clinical study.
Other condition-specific measures intended for patients who suffer from epilepsy and cancer had a similar sensitivity in comparison to the general EQ-5D [54,55]. However, in studies involving patients with asthma and urinary incontinence, construct validity of EQ-5D was not as strong as the condition-specific measures [56,57].
For the final mapping algorithm, we included not only age but also the symptom severity score (IBS-SSS). According to the ISPOR guidelines, the inclusion of covariates, such as sociodemographic variables and disease characteristics, should be explored to avoid mis-specification of the model [27]. The prediction of the utility values will be more accurate in that way. A recent review of mapping studies showed that age was included in the algorithm in 51% of the reports and gender in 55% [44]. Clinical measures, such as BMI, were included in the analysis in only 20% of the reports. When performing a mapping study, inclusion of covariates in the algorithm should be explored more extensively in the future to enhance performance. This is the first study to enable the estimation of utility values from IBS-specific questionnaire scores. A strength of this study is its applicability to other IBS study populations. The current study population was representative of IBS populations in general because our IBS population has comparable basic patient characteristics (i.e., age, gender) and includes the full range of IBS patient disease severity (range   [3,58,59]. The mean IBS symptom severity score of 278.17 in this study is similar to previous studies (range 259.   [51,58,60,61]. The two data sets used had similar inclusion criteria and the populations had similar baseline characteristics, which facilitates the development of a valid mapping approach. Another strength of this study is that a different data set was used for external validation of the models and the model performance was reported by assessing the MAE and the RMSE [42].
A limitation of the present study is that our predicted EQ-5D-5L utilities did not capture the full range of observed EQ-5D-5L utilities. The over-prediction of the lowest utilities and the under-prediction of the highest utilities may result in an underestimation of the utility gain. This is a general problem with mapping studies, especially when using linear regression [42,62]. Accordingly, both CLAD and ALDVMM showed a better model fit than the OLS functions. CLAD model 4 performed slightly better than ALDVMM model 4, with the lowest MAE/RMSE. A large proportion of the observations in our dataset was between 0.7 and 0.9. This is below the truncation point, which is an important feature of the ALDVMM, and could explain why CLAD model 4 performed better. The ALDVMM could be a good option when data are distributed differently than in our dataset.
Furthermore, our algorithm is not directly applicable for usage in trial-based economic evaluations when a comparison with EQ-5D-3L data is requested. However, it is possible to use this data to generate 5L data by conducting a mapping function online [63].
In conclusion, this study investigated a mapping approach in which scores on the condition-specific questionnaire IBS-QoL were mapped to EQ-5D-5L utility values. This algorithm is useful for modeling studies in which only the IBS-QoL is included and in trial-based economic evaluations to estimate QALYs. Including a clinical measure in the model, such as the severity score of the disease (IBS-SSS), will improve the performance of the algorithm in predicting utility values.