HIT-6 and EQ-5D-5L in patients with migraine: assessment of common latent constructs and development of a mapping algorithm

Objective The aims of this study were to assess whether there is a conceptual overlap between the HIT-6 and EQ-5D questionnaires and to develop a mapping algorithm allowing the conversion of HIT-6 scores to EQ-5D utility scores for Germany. Methods This study used data from an ongoing randomised controlled trial of patients suffering from migraine. We assessed the conceptual overlap between the two instruments with correlation matrices and exploratory factor analysis. Linear regression, tobit, mixture, and two-part models accounting for repeated measurements were used for mapping, and tenfold cross-validation was conducted to validate the models. Results We included 1010 observations from 410 patients. The EQ-5D showed a substantial ceiling effect (47.3% had the highest score) but no floor effect, while the HIT-6 showed a very small ceiling effect (0.5%). The correlation between the instruments’ total scores was moderate (−0.30), and correlations between individual domains were low to moderate (0.021–0.227). The exploratory factor analysis showed insufficient conceptual overlap between the instruments, as they load on different factors. Thus, there is reason to believe that the instruments’ domains do not capture the same latent constructs. To facilitate future mapping, we provide coefficients and a variance–covariance matrix for the preferred model, a two-part model with the total HIT-6 score as the explanatory variable. Conclusion This study showed that the German EQ-5D and the HIT-6 lack the conceptual overlap needed for appropriate mapping. Thus, the estimated mapping algorithms should only be used as a last resort for estimating utilities to be employed in economic evaluations. Supplementary Information The online version contains supplementary material available at 10.1007/s10198-021-01342-9.


Introduction
Migraine is a common neurological condition affecting 10.6% of the German population (one-year prevalence) [1]. It is associated with comorbidities such as psychiatric disorders (depression and anxiety, among others), respiratory disorders, and chronic pain, and it leads to a significant reduction in quality of life [2,3].

Tobias Kurth and Annette Aigner contributed equally to this work.
This condition also imposes an economic burden on health care systems due to increased demand for goods and services and work-related productivity losses [4,5]. As healthcare systems face the challenge of limited resources, economic evaluations provide tools for decision-makers to analyse competing alternatives in terms of both costs and consequences [6]. Cost-utility analyses, a form of economic evaluation, measure consequences with generic measures of health gain, commonly expressed in quality-adjusted life years (QALYs) [6]. The EuroQol five-dimensional questionnaire (EQ-5D) is a generic utility-based instrument which allows the estimation of utility scores and, thus, the calculation of country-specific QALYs [7]. It covers five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The initial version of the EQ-5D had only three levels per dimension, whereas the improved EQ-5D-5L (henceforth referred to as EQ-5D) has five levels while retaining the same five dimensions. The levels indicate no problems (1), slight problems (2), moderate problems (3), severe problems (4), and unable to/extreme problems (5). Health states are defined by combining one level from each of the five dimensions, yielding 3125 (5 × 5 × 5 × 5 × 5) possible health states. Health states can be represented as five-digit codes or converted into a single index value using a country-specific value set.
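The combinatorics of the descriptive system can be illustrated with a short Python sketch. It enumerates every five-digit EQ-5D-5L profile and splits a code into its dimension levels. The function names are hypothetical, and converting a profile into an index value would additionally require the coefficients of a country-specific value set, which are not reproduced here.

```python
from itertools import product

# Dimension order used for the five-digit EQ-5D-5L code.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = range(1, 6)  # 1 = no problems ... 5 = unable to/extreme problems


def all_health_states():
    """Return every EQ-5D-5L profile as a five-digit string, e.g. '11111'."""
    return ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DIMENSIONS))]


def parse_profile(code):
    """Split a five-digit profile into a {dimension: level} dictionary."""
    if len(code) != len(DIMENSIONS) or any(c not in "12345" for c in code):
        raise ValueError(f"not a valid EQ-5D-5L profile: {code!r}")
    return {dim: int(c) for dim, c in zip(DIMENSIONS, code)}


states = all_health_states()
print(len(states))            # 3125
print(states[0], states[-1])  # 11111 55555
print(parse_profile("21314"))
```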
Clinical trials in migraine often use monthly migraine days as the primary endpoint, and the International Headache Society recommends the use of monthly migraine days as the primary endpoint for health technology assessments (HTAs) of preventive treatments [8]. Where generic preference-based measures are not deemed ideal, other approaches include using condition-specific instruments or condition-specific preference-based measures (to our knowledge, there is none for migraine). However, analyses with disease-specific instruments do not allow decision-makers to compare resource allocation across different conditions. Nevertheless, several trials in the migraine field (e.g. [9,10]) only collect migraine-specific health-related quality-of-life (HRQOL) instruments and do not collect generic preference-based ones, which could be used to conduct cost-utility analyses. There are several migraine-specific HRQOL questionnaires, such as the Headache Impact Test-6 (HIT-6), the Migraine Disability Assessment (MIDAS), and the Migraine-Specific Quality-of-Life Questionnaire (MSQ). The HIT-6 is a headache-specific questionnaire which evaluates how headaches affect a person's ability to function at work, at school, at home, and in social situations [11]. This instrument does not have a preference-based scoring system, and thus it does not permit the calculation of QALYs. Mapping overcomes this issue by providing an algorithm which allows utility values, and hence QALYs, to be estimated even if no preference-based HRQOL instrument was included in the study. However, to perform mapping between two instruments, there should be a conceptual overlap between them. ISPOR guidelines on mapping state that these algorithms can only be successful if there is sufficient overlap between the analysed instruments [12]. Although the selected instruments do not have to measure the same symptoms or functional (dis)abilities, they do need to address the same underlying concepts.
One study by Gillard et al. has already developed a mapping from the HIT-6 to the EQ-5D, but applied the value set for England to a Brazilian population [13]. Several studies have shown the impact of using different country-specific EQ-5D value sets on the interpretation of results [14,15]. Furthermore, the authors included variables in their algorithm which are not always collected in trials, such as ethnicity. Many trials that do not involve drugs, e.g. trials of behavioural interventions, do not collect information on ethnicity (e.g. [16][17][18][19]). In addition, validation by splitting the available data set into two has been criticised because of its limited ability to depict the uncertainty in the results and the increased bias in performance estimates when test sets are proportionally large [20].
Based on these considerations, we will assess whether there is enough conceptual overlap between the two instruments using not only correlation tables but also exploratory factor analysis (EFA). Using regression-type approaches, we will develop a mapping function to predict EQ-5D utility values for Germany from HIT-6 values, including variables widely collected in migraine trials, and validate it with tenfold cross-validation.

Data
This study is based on data from the SMARTGEM project. SMARTGEM is an ongoing national randomised controlled clinical trial which seeks to assess whether a digital intervention, delivered via a headache app and online consultations, leads to a decrease in migraine frequency. The intervention consists of a certified medical app, in which patients document trigger factors, attacks, and medication in an electronic calendar and which analyses the diary, evaluates trigger factors, and proposes individually tailored treatment plans; and a web-based tool through which patients communicate both with other patients and with specialists. The HIT-6 and EQ-5D were completed by all participants at baseline and after 3, 6, 9, and 12 months. The registration ID in the German Clinical Trials Register is DRKS00016328.

Conceptual overlap
To analyse the strength of the relationship between HIT-6 scores and EQ-5D domains, correlation coefficients accounting for repeated measurements were computed. We also examined the capacity of each instrument to detect changes in HRQOL over time, referred to as responsiveness, by computing standardised response means (SRMs). The SRM is defined as the mean change between baseline and follow-up values divided by the standard deviation (sd) of that change. We considered SRM values of less than 0.2 as small, from 0.2 to 0.5 as moderate, and values above 0.8 as large, following Cohen's criteria [21]. EFA was conducted to explore the overlap in the underlying constructs of the two instruments. If factors have meaningful loadings from both instruments (EQ-5D and HIT-6), the instruments are assumed to capture the same underlying latent structure. We considered factor loadings above 0.3 as 'meaningful' [22]. For ordered data, the preferred method to determine the number of factors is parallel analysis with polychoric rather than Pearson correlations [23], as Pearson correlations are thought to underestimate the relationship between ordered categorical variables because of the categorisation [24]. Furthermore, Glorfeld (1995) showed that parallel analysis performs well with non-normally distributed data [25]. The chosen factoring method was weighted least squares, which makes no distributional assumption and is thus appropriate for ordinal data [26]. Varimax (orthogonal) and promax (oblique) rotations were used to interpret factor loadings.
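A minimal Python sketch of the SRM as defined above follows, using hypothetical HIT-6 totals for four patients. For the HIT-6 a decrease indicates improvement, so the SRM is negative when patients improve; comparing the absolute value with the cut-offs is our assumption, as no sign convention is stated.

```python
import statistics


def standardised_response_mean(baseline, follow_up):
    """SRM = mean of the paired changes / standard deviation of those changes."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.mean(changes) / statistics.stdev(changes)


# Hypothetical HIT-6 totals for four patients (lower scores = less impact).
baseline = [60, 62, 58, 64]
follow_up = [56, 60, 57, 59]
srm = standardised_response_mean(baseline, follow_up)
print(round(srm, 3))  # -1.643
```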

Mapping model development
Since guidelines on best practices for mapping do not recommend a specific model, we applied several models [12]. As our data contain repeated measurements per individual over time, we accounted for dependencies between observations by including random effects in our models, and estimated mixed-effects linear regression models (fit by maximum likelihood), mixed-effects tobit models censored at the upper bound of 1, adjusted limited dependent variable mixture models, mixture beta regression models, and two-part models. For the mixed-effects linear and tobit models, we compared models using the overall HIT-6 score with models using the individual HIT-6 items as independent variables. Interaction terms and quadratic terms were considered, and the models with the lowest Bayesian information criterion (BIC) were chosen. In the two-part model, a mixed-effects logistic regression was first fitted to predict the probability of a respondent being in full health. In a second stage, a mixed-effects linear regression was estimated using only observations not in full health. The overall expected EQ-5D index score was then calculated using an expected value approach [27].
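The expected value approach for a two-part model can be sketched in Python as follows. The covariate names and all coefficient values are hypothetical placeholders, not the published Model E, and random effects are set to zero, so the sketch returns a population-level prediction.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(coefs, covariates):
    """Sum of coefficient * covariate over the terms present in `coefs`."""
    return sum(beta * covariates[name] for name, beta in coefs.items())


def two_part_expected_utility(hit6, age, female, chronic, part1, part2):
    """E[U] = P(full health) * 1 + (1 - P(full health)) * E[U | not full health].

    Random effects are set to zero, i.e. this is a population-level prediction.
    """
    covariates = {"const": 1.0, "hit6": hit6, "hit6_sq": hit6 ** 2,
                  "age": age, "female": float(female), "chronic": float(chronic)}
    p_full = logistic(linear_predictor(part1, covariates))
    mean_not_full = linear_predictor(part2, covariates)
    return p_full * 1.0 + (1.0 - p_full) * mean_not_full


# Hypothetical coefficients for illustration only (NOT the published Model E).
part1 = {"const": 6.0, "hit6": -0.10, "age": -0.01, "female": -0.2, "chronic": -0.5}
part2 = {"const": 1.2, "hit6": -0.008, "hit6_sq": 0.0, "age": -0.002,
         "female": 0.0, "chronic": -0.05}
print(two_part_expected_utility(60, 41, True, False, part1, part2))
```

Keeping the first part on the logit scale guarantees a probability between 0 and 1, so the combined prediction is a proper weighted average of full health and the conditional mean.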
We also fitted adjusted limited dependent variable mixture models with one to four components, using the Stata command aldvmm, which was specifically developed to deal with health utility data [28]. These models allow the dependent variable to be limited to the country-specific EQ-5D range, while taking into account the gap between 1 and the next feasible value (0.974 in the case of Germany). We estimated both adjusted limited dependent variable mixture models and mixture beta regression models with and without this truncation point, as well as with and without a probability mass at full health and at the truncation point (for beta mixture models only). Because mixture models are known to have multiple optima, we used the estimated parameters from a constant-only model as starting values in our mixture models to help find the global maximum [28]. Unlike in the other models, we could not account for repeated measures by including random effects; we did, however, compute robust cluster-corrected standard errors.
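The limited dependent variable structure behind these models can be sketched in Python. The sketch assumes each component is a latent normal variable y* ~ N(μ, σ²) and that observed utility equals the German minimum (−0.661) when y* falls at or below it, equals 1 when y* exceeds the gap threshold 0.974, and equals y* otherwise; the expected utility of the mixture is the probability-weighted sum over components. This is our own illustration of the idea, not the aldvmm implementation, and in the fitted models the component means, standard deviations, and probabilities depend on covariates.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


LOWER = -0.661   # worst utility in the German EQ-5D-5L value set
GAP_TOP = 0.974  # next feasible value below full health in Germany


def component_expectation(mu, sigma, lower=LOWER, gap_top=GAP_TOP):
    """Expected observed utility for one latent normal component."""
    a = (lower - mu) / sigma
    b = (gap_top - mu) / sigma
    p_lower = norm_cdf(a)            # mass piled up at the minimum
    p_full = 1.0 - norm_cdf(b)       # mass above the gap, mapped to 1
    # Partial expectation of y* between the minimum and the gap threshold.
    middle = mu * (norm_cdf(b) - norm_cdf(a)) + sigma * (norm_pdf(a) - norm_pdf(b))
    return lower * p_lower + middle + 1.0 * p_full


def mixture_expectation(weights, mus, sigmas):
    """Weighted sum of component expectations; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1")
    return sum(w * component_expectation(m, s)
               for w, m, s in zip(weights, mus, sigmas))


# Hypothetical two-component mixture.
print(mixture_expectation([0.6, 0.4], [0.9, 0.5], [0.1, 0.3]))
```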
HIT-6-related variables, sex, age, and migraine type were pre-defined as mandatory components of the model. Given that mapping algorithms are intended to be used by other researchers, we only considered age and sex as possible socio-demographic explanatory variables, since they are almost always collected in studies. Studies have shown that age has an impact on the symptoms of people with migraine, e.g. a decrease in the frequency of photophobia and phonophobia [29]. Migraine also affects three times as many women as men, and fluctuations in female hormones are known to play an important role in this relationship [30,31]. Since the impact of migraine varies with age, especially in women, we tested whether there is an interaction between age and sex [29]. We also included information on whether patients suffered from episodic or chronic migraine, as well as its interaction with age. Because migraine characteristics evolve over time (e.g. the conversion of episodic to chronic migraine), we tested the inclusion of an interaction term between migraine type and age [32].
We conducted a complete-case analysis with respect to the following variables: EQ-5D domains, HIT-6 domains, migraine type, age, and sex.

Validation
We plotted the observed and predicted EQ-5D values to visualise the models' performance. Given the lack of external data for an external validation, tenfold cross-validation was carried out to compare the predictions of each model with the observed EQ-5D scores. This method is recommended for small samples [20]. The models' predictive performance was assessed with the root mean squared error (RMSE), mean absolute error (MAE), and R², reporting the mean over all 10 folds.
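The tenfold cross-validation loop can be sketched in Python with NumPy. A simple least-squares line on the HIT-6 total stands in for the mapping models, the out-of-fold R² is computed as 1 − SSE/SST within each fold, and the optional `groups` argument (assigning all observations of one patient to the same fold) is our assumption, since how folds were formed is not stated. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def kfold_metrics(x, y, fit, predict, k=10, groups=None, seed=0):
    """Mean out-of-fold RMSE, MAE and R^2 over k folds.

    If `groups` (e.g. patient IDs) is given, whole groups are assigned to folds,
    so repeated measurements of one patient never sit in both training and test data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    if groups is None:
        folds = np.array_split(rng.permutation(len(y)), k)
    else:
        groups = np.asarray(groups)
        shuffled = rng.permutation(np.unique(groups))
        folds = [np.flatnonzero(np.isin(groups, chunk))
                 for chunk in np.array_split(shuffled, k)]
    rmse, mae, r2 = [], [], []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit(x[train], y[train])
        err = y[test_idx] - predict(model, x[test_idx])
        rmse.append(np.sqrt(np.mean(err ** 2)))
        mae.append(np.mean(np.abs(err)))
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        r2.append(1.0 - np.sum(err ** 2) / ss_tot)
    return float(np.mean(rmse)), float(np.mean(mae)), float(np.mean(r2))


# Stand-in model: ordinary least squares on the HIT-6 total score.
def fit_ols(x, y):
    return np.polyfit(x, y, deg=1)


def predict_ols(coef, x):
    return np.polyval(coef, x)


# Synthetic data: 100 hypothetical patients with two observations each.
rng = np.random.default_rng(1)
hit6 = rng.integers(36, 79, size=200)
eq5d = np.clip(1.4 - 0.01 * hit6 + rng.normal(0, 0.1, size=200), -0.661, 1.0)
patient = np.arange(200) // 2
print(kfold_metrics(hit6, eq5d, fit_ols, predict_ols, groups=patient))
```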
Statistical analyses were performed using R 3.6.3 and Stata 15 [33,34]. We used additional R packages for data handling [35] and plotting [36], repeated measures correlation [37], and factor analysis [38,39], a Stata package for variable selection [40] (a preliminary version of gsreg 2.0 provided by the authors was used, which allowed its use with a mixed-effects linear regression estimated by maximum-likelihood), the package aldvmm to fit adjusted limited dependent variable mixture models [28], as well as the betamix package for conducting beta mixture regressions [40].

Results
Because 16 patients had missing data, 22 of the 1032 observations (2.13%) had to be removed; 7 of these 16 patients were excluded from the analysis entirely because they did not have complete data at other time points. The dataset used for the analysis therefore contains 1010 observations from 410 patients.
87.3% of all participants were female, with an average age of 41 years (Table 1).
Health utility values derived from the EQ-5D ranged from −0.57 to 1. We observed a ceiling effect in EQ-5D scores, with a skewness of −2.33 and a kurtosis of 9.45, pointing to a left skew with few negative observations (Fig. 1). Data are considerably more skewed for patients with episodic migraine than for patients with chronic migraine. The mean EQ-5D utility value was 0.82 (sd 0.23) for all patients, 0.86 (sd 0.18) for patients with episodic migraine, and 0.72 (sd 0.30) for patients with chronic migraine.
HIT-6 scores ranged from 44 to 78 (possible score range 36–78). The skewness of −0.64 indicated that the HIT-6 scores are only slightly skewed to the left (Fig. 2). There was no floor effect (no patient had the lowest possible score), and the ceiling effect was small (5 out of 1010 observations; 0.5%).
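The possible range of 36–78 follows from the standard HIT-6 item scoring, in which each of the six items contributes 6, 8, 10, 11, or 13 points. The Python sketch below encodes that scoring; the dictionary and function names are hypothetical.

```python
# Standard HIT-6 item scoring: each of the six items is scored
# never = 6, rarely = 8, sometimes = 10, very often = 11, always = 13.
HIT6_ITEM_POINTS = {"never": 6, "rarely": 8, "sometimes": 10,
                    "very often": 11, "always": 13}


def hit6_total(responses):
    """Sum the item points of exactly six HIT-6 responses (range 36-78)."""
    if len(responses) != 6:
        raise ValueError("the HIT-6 has exactly six items")
    return sum(HIT6_ITEM_POINTS[r.strip().lower()] for r in responses)


print(hit6_total(["never"] * 6))                       # 36
print(hit6_total(["always"] * 6))                      # 78
print(hit6_total(["sometimes"] * 3 + ["rarely"] * 3))  # 54
```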
In the EQ-5D, there was no floor effect (proportion of respondents reporting the worst level on all five dimensions), i.e. no patient had the lowest possible utility score (−0.661 in the German value set). However, the ceiling effect (proportion of participants reporting the best level on all dimensions) amounted to 47.3% (194/410).
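The ceiling and floor definitions above translate directly into shares of five-digit profiles. In this Python sketch, the function name is hypothetical and the toy data only reproduce the reported count (194 of 410 at '11111'); the remaining profile '21121' is an arbitrary placeholder.

```python
def ceiling_floor(profiles):
    """Return (ceiling, floor) shares of EQ-5D-5L five-digit profiles.

    Ceiling = best level on all dimensions ('11111');
    floor = worst level on all dimensions ('55555').
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("no profiles supplied")
    ceiling = sum(p == "11111" for p in profiles) / n
    floor = sum(p == "55555" for p in profiles) / n
    return ceiling, floor


# Toy data reproducing the reported count: 194 of 410 respondents at '11111'.
profiles = ["11111"] * 194 + ["21121"] * 216
ceiling, floor = ceiling_floor(profiles)
print(f"ceiling {ceiling:.1%}, floor {floor:.1%}")  # ceiling 47.3%, floor 0.0%
```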

Conceptual overlap
We consider that occupation and daily activities can be measured by the EQ-5D dimension "usual activities" and by questions 2, 3, and 4 from the HIT-6. Physical health is captured by "pain/discomfort" and "mobility" in the EQ-5D, and by question 5 from the HIT-6. Self-care is only measured by the EQ-5D.
The correlation coefficient between the EQ-5D index value and the HIT-6 total score was −0.30. Between the EQ-5D index value and the individual HIT-6 items, the coefficients ranged from −0.153 to −0.234. The correlation coefficients between each EQ-5D domain and the overall HIT-6 score ranged from 0.077 to 0.300 (Table A.1). Lastly, the correlation coefficients between individual domains of the two instruments ranged from 0.021 to 0.227, with the highest correlation (0.227) found between EQ-5D pain/discomfort and HIT-6 q4. See Supplementary Tables A.1, A.2, and A.3 for correlation tables, additionally stratified by migraine severity level.
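A correlation that accounts for repeated measurements can be sketched in the spirit of the repeated measures correlation of Bakdash and Marusich: each variable is centred on the patient's own mean, and the centred values are correlated, which equals the common-slope within-subject coefficient from an ANCOVA of y on subject and x. This is an illustration only and not necessarily the exact procedure behind the reported coefficients; the function name and the toy values are hypothetical.

```python
import numpy as np


def repeated_measures_correlation(subject, x, y):
    """Within-subject correlation with a common slope across subjects.

    Each variable is centred on the subject's own mean; the Pearson correlation of
    the centred values equals sign(slope) * sqrt(SS_x / (SS_x + SS_error)) from the
    ANCOVA y ~ subject + x. Degrees of freedom are N - (number of subjects) - 1.
    """
    subject = np.asarray(subject)
    xc = np.asarray(x, dtype=float).copy()
    yc = np.asarray(y, dtype=float).copy()
    for s in np.unique(subject):
        mask = subject == s
        xc[mask] -= xc[mask].mean()
        yc[mask] -= yc[mask].mean()
    r = np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    df = len(yc) - len(np.unique(subject)) - 1
    return float(r), df


# Toy data: within each patient, higher HIT-6 goes with lower EQ-5D.
subject = [1, 1, 1, 2, 2, 2]
hit6 = [50, 55, 60, 64, 68, 72]
eq5d = [0.80, 0.75, 0.70, 0.98, 0.94, 0.90]
print(repeated_measures_correlation(subject, hit6, eq5d))
print(np.corrcoef(hit6, eq5d)[0, 1])  # pooled Pearson ignores the clustering
```

In this toy example the pooled Pearson coefficient is positive because patient 2 has both higher HIT-6 and higher EQ-5D values, while the within-patient association is perfectly negative; this is the kind of distortion that ignoring repeated measurements can introduce.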
The EQ-5D total score and its individual dimensions showed small SRMs, while the HIT-6 total score and its individual questions showed small to moderate responsiveness. SRM values ranged from 0.088 to 0.280 for the EQ-5D dimensions and from 0.211 to 0.669 for the HIT-6 (see Supplementary Table A.4). Although the low responsiveness may partly reflect the inclusion of patients from the control group, this does not explain why the responsiveness of the HIT-6 is higher than that of the EQ-5D.
We considered three factors in the EFA. Factor 1 had meaningful loadings (i.e. higher than 0.3) on all EQ-5D domains, but not on any HIT-6 domain. Factors 2 and 3 loaded only on HIT-6 domains: questions 2–6 on Factor 2 and questions 1 and 2 on Factor 3. Because question 2 had a higher loading on Factor 2, we assigned it to that factor; Factor 2 thus had meaningful loadings on five of the six HIT-6 domains (Table 2). Similarly, using an orthogonal rotation, all EQ-5D items loaded on the same factor, while HIT-6 items loaded on Factors 2 and 3 (Supplementary Table A.5).
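The decision rule used here (a factor indicates overlap only if it has meaningful loadings, above 0.3, from items of both instruments) can be expressed as a short Python check. The loadings below are illustrative values that mimic the reported pattern, not the values in Table 2, and using the absolute loading is our assumption.

```python
import numpy as np


def shared_factors(loadings, instrument, threshold=0.3):
    """Indices of factors with meaningful loadings (|loading| > threshold)
    from items of more than one instrument."""
    loadings = np.asarray(loadings, dtype=float)
    instrument = np.asarray(instrument)
    shared = []
    for j in range(loadings.shape[1]):
        meaningful = np.abs(loadings[:, j]) > threshold
        if len(set(instrument[meaningful])) > 1:
            shared.append(j)
    return shared


# Illustrative loadings (NOT the values in Table 2), mimicking the reported
# pattern: Factor 1 = EQ-5D items, Factors 2 and 3 = HIT-6 items only.
items = ["EQ-5D"] * 5 + ["HIT-6"] * 6
loadings = [[0.6, 0.1, 0.0]] * 5 + [
    [0.1, 0.2, 0.7],   # HIT-6 q1
    [0.1, 0.5, 0.4],   # HIT-6 q2 (cross-loads; higher on Factor 2)
] + [[0.1, 0.6, 0.1]] * 4  # HIT-6 q3-q6
print(shared_factors(loadings, items))  # []
```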
As the EFA does not correctly take the repeated measurement nature of the data into account, we performed a sensitivity analysis based on baseline data only. The results did not differ meaningfully in terms of the number of factors and meaningful loadings.
The lack of overlap in all three factors, using the two different types of rotations, suggests that the EQ-5D and the HIT-6 potentially do not capture the same latent constructs.

Table 3 and the Excel file in the Electronic Supplementary Material present information on the models' coefficients and their predictive ability. Overall, for the same statistical method, models which included the HIT-6 total score performed better than those which included all HIT-6 questions as independent variables. The inclusion of interaction terms (between age and sex, and between migraine type and age) did not relevantly improve the prediction of EQ-5D scores within any of the six models. In contrast, adding quadratic terms for the overall HIT-6 score and for several HIT-6 dimensions improved the goodness-of-fit of some of the models. In the two-part model, the first part only included the total HIT-6 score, the type of migraine, age, and sex, while the second part included the same variables plus the quadratic term of the HIT-6 score. Figure 3 shows the observed and predicted EQ-5D values for the different models. Our models overestimated utilities for those with poorer health states and underestimated them for those with better health states, as is common in mapping studies [41]. Although linear regression models can yield estimates above 1 (given that there is no upper bound), Model A (mixed-effects linear regression with the total HIT-6 score as an independent variable) did not generate estimates outside the bounds: its maximum predicted value was 0.98, compared with 1.07 for Model B (mixed-effects linear regression with the individual HIT-6 questions as independent variables). No model performed best across all goodness-of-fit measures. Model E (two-part model with the total HIT-6 score as the explanatory variable) performed best in terms of RMSE (Table 3).
Although the R² value is higher for Model G, this model predicts both individuals in full health and those with poorer health states less well than Model E. The R² value is also higher for Model I than for Model E, but the latter predicts poorer health states better. The adjusted limited dependent variable mixture models and beta mixture models took into account the gap between full health and the next feasible health state. However, the small number of observations (4) at the health state immediately below full health (0.974) may explain why these models did not perform better.

Mapping models
Hence, if researchers wish to estimate utilities from the HIT-6 for use in cost-utility analyses, Model E should be the preferred model. The corresponding variance–covariance matrix is available in the Electronic Supplementary Material (Table A.6), allowing probabilistic sensitivity analysis to be carried out so that uncertainty can be accounted for. However, we emphasise that this mapping algorithm should only be used as a last resort.
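In a probabilistic sensitivity analysis, coefficient vectors are typically drawn from a multivariate normal distribution defined by the point estimates and the variance–covariance matrix, and utilities are recomputed for each draw. The Python sketch below shows this for a hypothetical two-coefficient linear predictor; the numbers are placeholders rather than the Model E estimates, and for a two-part model both parts would need to be drawn using the covariance structure actually supplied.

```python
import numpy as np


def draw_coefficients(mean, cov, n_draws, seed=0):
    """Draw n_draws coefficient vectors from N(mean, cov) for PSA."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.asarray(mean, dtype=float),
                                   np.asarray(cov, dtype=float), size=n_draws)


# Hypothetical intercept and HIT-6 slope with a hypothetical covariance matrix.
mean = [1.40, -0.010]
cov = [[4.0e-3, -6.0e-5],
       [-6.0e-5, 1.0e-6]]
draws = draw_coefficients(mean, cov, n_draws=5000)
utility_at_60 = draws[:, 0] + draws[:, 1] * 60  # one prediction per draw
print(utility_at_60.mean(), np.percentile(utility_at_60, [2.5, 97.5]))
```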

Discussion
We aimed to assess whether there is a conceptual overlap between the HIT-6 and the EQ-5D and to present a mapping algorithm for estimating EQ-5D scores (with German weights) from the HIT-6 questionnaire, a disease-specific survey widely used in clinical trials with migraine patients. Our study points to major differences in the underlying constructs of the HIT-6 and the EQ-5D. The EQ-5D showed a high ceiling effect and small SRMs over time, whereas the HIT-6 showed only a negligible ceiling effect and higher responsiveness. This study also provides a mapping algorithm which can be used to map HIT-6 values to EQ-5D utility values.
We expected some overlap between the two instruments, since both have been validated in migraine patients. However, the strength of association between the instruments measured with correlation coefficients was only low to moderate, both for the total scores and for each instrument's individual questions. Furthermore, the EFA showed that the HIT-6 and EQ-5D do not have sufficient conceptual overlap and potentially measure different underlying constructs. Several reasons might explain the lack of overlap. First, the recall periods of the instruments' questions differ: while all EQ-5D questions refer to the day the questionnaire is filled out, three questions in the HIT-6 refer to the previous 4 weeks. Second, the HIT-6 has frequency-based response categories (ranging from never to always), while the EQ-5D has response categories based on levels of severity. Third, the specificities of both the EQ-5D and the HIT-6 may also play a role. A criticism of the use of the EQ-5D to describe health utilities in patients with migraine is that the survey is completed at random points in time and thus does not differentiate whether or not patients were having a migraine attack when they filled it out [42]. The 47.3% of participants reporting level 1 on all five dimensions (ceiling effect) may indicate that the EQ-5D discriminates poorly between patients with migraine. To our knowledge, only two studies have validated the use of the HIT-6 in German patients with chronic migraine. Although the study by Rendas-Baum et al. [43] included German patients, the authors could not carry out country-specific assessments because of insufficient sample sizes in the four European countries included; they therefore treated the data as one group. Another study, by Martin et al. [44], evaluated whether the German version of the HIT-6 is comparable to the United States English HIT-6.
Unfortunately, there is no information on whether the recruited patients suffered from episodic or chronic migraine. Thus, further research on the validation of the HIT-6 in German patients with episodic and chronic migraine could help explain the lack of conceptual overlap between this questionnaire and the EQ-5D. Given the lack of responsiveness and the substantial ceiling effect of the EQ-5D in migraine patients, economic evaluations in this population should consider other approaches to determining value, not necessarily QALYs obtained from generic utility-based instruments. In fact, the guidelines of the International Headache Society state that QALYs may fail to account for specific patient preferences due to the insensitive nature of utility instruments [8]. Thus, the use of QALYs may not be appropriate even where utility values were collected in the study and no mapping algorithm is needed. Using clinical effectiveness endpoints (such as monthly migraine days) to conduct cost-effectiveness analyses may therefore be more suitable for economic evaluations in migraine. However, such analyses with disease-specific outcomes pose a different problem, as they do not allow decision-makers to compare resource allocation across different conditions.

Strengths of our study include the fact that trained migraine neurologists provided the migraine diagnoses for the study participants, and the low percentage of missing data. Furthermore, we could use multiple observations per person and analysed these data with methods suitable for repeated measurements where possible. The conceptual overlap of the EQ-5D and HIT-6 was evaluated carefully prior to investigating mapping algorithms, and the latter were developed with a broad set of multivariable modelling approaches.
A limitation of our study is that no external validation could be carried out, since no other dataset containing both EQ-5D and HIT-6 responses was available. Randomised controlled trials are often considered the 'gold standard' for evidence-based medicine [45], and although they have several strengths compared with other designs, their estimates may lack generalisability to different settings [46]. The ISPOR Task Force Report on Mapping notes that such trials frequently include less diverse patients than observational studies, due to their inclusion criteria, and have limited follow-up [12]. We therefore compared some socio-demographic characteristics of our study population with those of migraine patients from a study by the German Migraine and Headache Society, which included 7417 adults from three regions in Germany (see Supplementary Table A.7) [47]. The mean ages reported for episodic migraine were 47.5 (Dortmund Health Study), 50.0 (KORA Augsburg Study), and 50.1 (SHIP Study). For episodic migraine (excluding medication overuse headache, an exclusion criterion of our study), the corresponding age values were 60.8 in the KORA Augsburg Study and 61.0 in the SHIP Study (no values were available for the Dortmund Health Study). In our study, participants were somewhat younger, with a mean age of 40.1 for chronic migraine and 41.5 for episodic migraine, which may be explained by the fact that participants needed some affinity for using apps and by Berlin being the federal state with the second lowest average age [48]. In terms of sex distribution in the episodic migraine population, the Dortmund Health Study reported 78.7% women, the KORA study 84.2%, and the SHIP 85.6%. The proportion of women among participants with episodic migraine in our study was comparable, at 86%. It should also be highlighted that many of those suffering from migraine never seek professional care, such that their characteristics may not be reported in the literature.
In Germany, only about two thirds of those suffering from migraine consult a physician to receive treatment [49]. Response mapping models were not estimated, since this method requires many observations in each response category and this dataset contained few responses at the worst levels [50]. The EFA was conducted without taking repeated measurements into account; however, the sensitivity analysis with baseline data only yielded the same results in terms of the number of factors and meaningful loadings. As in other mapping studies, mapped EQ-5D values underestimate scores for those in 'perfect' health and overestimate scores for those in worse health states compared with observed EQ-5D values [51]. We ran mixed-effects models with random intercepts only (i.e. a different intercept for each cluster), hence assuming that the association between the independent and dependent variables is the same across clusters. Unfortunately, it is not possible to introduce random effects in the adjusted limited dependent variable mixture models.

Conclusion
Our results suggest that the German versions of the EQ-5D and the HIT-6 do not measure the same underlying concepts. Therefore, mapping algorithms should only be used as a last resort for estimating utilities to be employed in cost-utility analyses.
Funding Open Access funding enabled and organized by Projekt DEAL. No direct funding was received to write this manuscript. Data for this study stem from SMARTGEM, a randomised controlled trial financially supported by the German Innovation Committee for the promotion of new forms of care (01NVF17038).
Availability of data and material (data transparency) The datasets generated and/or analysed during the current study are not publicly available due to data protection reasons.
Code availability (software application or custom code) The code used to conduct the current study is available from the corresponding author on reasonable request.

Declarations
Conflicts of interest (include appropriate disclosures) Both Ana Sofia Oliveira Gonçalves and Lars Neeb receive financial support from SMARTGEM, but not for the specific publication of this manuscript. Outside of the submitted work: TK received honoraria from Eli Lilly, Newsenselab, TotalEnergies, Teva, and The BMJ.

Ethics approval (include appropriate approvals or waivers)
The local ethics review board at the Charité – Universitätsmedizin Berlin approved the protocol for this study (approval number: EA4/110/18).

Consent to participate and for publication
The manuscript does not contain individual person's data in any form.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.