This study compared three widely used HRQL measures and sought to clarify whether they measure HRQL in a similar way. The three measures showed different score ranges and distributions, but EQ5D and HUI3 were more similar to each other than to SF6D in distribution, mean, median, maximum and minimum score. SF6D scores appeared to be more normally distributed and covered a narrower range of values. EQ5D showed a ceiling effect, with 31% of people scoring the highest value. As other studies with similar results have highlighted, EQ5D seems to describe mild-severity health levels poorly [16, 27].
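A ceiling effect is usually quantified as the share of respondents who obtain the instrument's maximum score. The snippet below is a minimal sketch under two assumptions: the maximum attainable index value is 1.0, as for EQ5D full health, and the helper name `ceiling_effect` and the toy scores are hypothetical illustrations, not data from this study.

```python
def ceiling_effect(scores, max_value=1.0, tol=1e-9):
    """Return the share of respondents whose score equals the instrument maximum.

    max_value=1.0 is an assumption (full health on a 0-1 anchored index);
    tol guards against floating-point representation of the maximum.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    at_ceiling = sum(1 for s in scores if abs(s - max_value) <= tol)
    return at_ceiling / len(scores)


# Hypothetical EQ5D-style index values, NOT data from this study.
toy_scores = [1.0, 1.0, 0.85, 0.73, 1.0, 0.62, 0.80, 0.52, 1.0, 0.69]
print(f"{ceiling_effect(toy_scores):.0%}")  # 4 of 10 at the maximum -> 40%
```

A high share at the maximum means that respondents in mild-severity states cannot be told apart from those in full health.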
In fact, with only three levels for each dimension, EQ5D cannot discriminate between fair and good health. Conversely, SF6D produced higher values for health conditions at the lower end of the scale. The lowest SF6D score obtained in this study was 0.301, whereas EQ5D and HUI3 produced negative values. Other studies with similar results [11, 13, 21] interpreted this as evidence of a limited ability of SF6D to distinguish severely impaired health states. Overall, these findings seem to confirm those of other studies: SF6D produced higher values at the lower end of the scale, and EQ5D at the upper end.
Differences between the questionnaires were also reflected in the low level of agreement between their scores. The 95% limits of agreement were quite wide, ranging approximately from −0.5 to 0.3, which is an important discrepancy on a score scale from 0 to 1. The best agreement was achieved by SF6D and EQ5D and the worst by EQ5D and HUI3. These results were probably due to both construct and statistical issues. Regarding the construct issue, HUI3 describes eight dimensions of health, six of which relate to particular aspects (such as vision or hearing), all focussed on the physical area of health. Only two dimensions relate to emotional and mental health, and none to social aspects of health. In fact, HUI3 is based on a "within the skin" approach to health status assessment that concentrates on physical and emotional areas and excludes the social one because it is "outside the skin" [31–34]. By contrast, SF6D and EQ5D describe not only physical and emotional dimensions of health but also the social one. These different constructs could explain the poorer agreement between HUI3 and the other measures. Regarding the statistical issue, the particularly poor agreement between HUI3 and EQ5D could be due to the skewed distributions of their scores. Both showed a ceiling effect, but EQ5D scores were also heavily clustered around a few values, which reduces the heterogeneity of the sample and hence lowers agreement statistics that depend on variance. Moreover, extreme score values can produce outliers in the distribution of differences and therefore widen the limits of agreement.
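The sketch below illustrates the calculation and the widening effect of a single extreme difference. It assumes the 95% limits of agreement follow the usual Bland-Altman definition: the mean of the paired differences ± 1.96 standard deviations of those differences. The function name and all scores are hypothetical illustrations, not the study's code or data.

```python
import statistics


def limits_of_agreement(a, b, z=1.96):
    """Bland-Altman 95% limits of agreement for paired scores a and b.

    Returns (lower, upper) = mean difference -/+ z * sample SD of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)  # systematic difference between instruments
    sd = statistics.stdev(diffs)   # sample SD of the paired differences
    return bias - z * sd, bias + z * sd


# Hypothetical paired index scores for the same five respondents (not study data).
eq5d = [1.00, 0.80, 0.73, 1.00, 0.62]
hui3 = [0.95, 0.78, 0.70, 0.97, 0.60]
lo1, hi1 = limits_of_agreement(eq5d, hui3)

# Adding one extreme disagreement (a negative score on one instrument only)
# inflates the SD of the differences and widens both limits.
lo2, hi2 = limits_of_agreement(eq5d + [-0.10], hui3 + [0.45])
print(f"without outlier: {lo1:.3f} to {hi1:.3f}")
print(f"with outlier:    {lo2:.3f} to {hi2:.3f}")
```

Because the limits scale with the SD of the differences, a handful of respondents placed at opposite extremes by two instruments can dominate the result.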
Nevertheless, the scores were well correlated. A Spearman coefficient of 0.6 has been described as very high [43]. In our study, the Spearman coefficient was around 0.6 for all pairs of questionnaires, indicating a good level of correlation, especially between SF6D and EQ5D. This level of correlation was similar to, though somewhat lower than, those found in other studies [18, 20].
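Spearman's coefficient is the Pearson correlation computed on ranks. The minimal sketch below assigns tied scores the average of their ranks, which matters for instruments such as EQ5D whose scores cluster on a few values. The helper names and toy data are hypothetical. In practice a statistics package (for example `scipy.stats.spearmanr`) would normally be used instead.

```python
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores (not study data); note the tied EQ5D-style values.
eq5d_toy = [1.0, 1.0, 0.80, 0.73, 0.62, 1.0]
sf6d_toy = [0.90, 0.82, 0.75, 0.70, 0.64, 0.78]
print(round(spearman(eq5d_toy, sf6d_toy), 2))
```

Because only ranks enter the calculation, two instruments can correlate well even when their raw scores disagree, as the limits of agreement show.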
These findings highlight that the results of HRQL measures may be influenced by their frameworks and by the different methods used to calculate the scores. For example, SF6D could overestimate the health status of persons with severe illness and could be more suitable for surveys of the general population or of people in fair to good health. By contrast, EQ5D seems to overestimate middle-severity health status and could therefore be less suitable for describing the health status of the general population and more useful for patients with disabling disease. Moreover, the three questionnaires are not interchangeable: their scores show poor agreement, especially between HUI3 and EQ5D, so results obtained with one cannot be directly compared with those obtained with another. This could be a major issue in comparisons among populations, because health status measured with different questionnaires is unlikely to be comparable.
Considering the above, it is surprising that the three measures performed similarly in relation to socio-demographic and clinical variables, especially SF6D and EQ5D. In fact, this study showed that SF6D and EQ5D scores are influenced by the same pattern of factors. Multivariate analysis showed that scores on both questionnaires were influenced in particular by gender, with females reporting poorer health than males, and by chronic diseases, especially musculoskeletal ones. HRQL seems to be influenced by the impact that diseases have on daily life rather than by the severity or possible complications of a disease. Indeed, musculoskeletal diseases, which are usually painful and debilitating, influence HRQL more than hypertension or cardiovascular diseases. Both questionnaires show a gender difference in health, and both are also influenced by lifestyle-related factors, such as smoking and BMI.
HUI3, by contrast, performed slightly differently. It seemed to be influenced by educational level and especially by hypertension, cardiovascular and musculoskeletal diseases. However, HUI3 did not reveal health differences between males and females and did not seem to be influenced by lifestyle-related factors. This different performance could be related, as mentioned above, to the approach of the questionnaire, which focusses on physical and emotional aspects and excludes social ones.
These results raise several points. First, the scores of all questionnaires were influenced by musculoskeletal diseases, which are characterised by physical pain, difficulty moving, immobility and often, partly because of the associated pain, impaired daily activities, and which could therefore also affect vitality and social life. HRQL thus seems to be influenced more by painful, and consequently activity-limiting, conditions than by diseases that may be more serious but are often asymptomatic. This could be a limitation as well as a way to highlight different aspects of health. In fact, HRQL measures may underestimate health status in painful but non-fatal diseases, while overestimating it in serious but asymptomatic diseases. HRQL measures may therefore help detect health needs that would otherwise remain concealed, but they should probably be used to complement other health measures, such as mortality, which are more objective but cruder, or they could be "adjusted" for morbidity conditions assessed by more objective methods (such as morbidity indexes, which describe the severity of a disease).
Secondly, of the three questionnaires, only EQ5D seemed to be influenced by age after adjustment for the other variables. This suggests that most of the decrease in HRQL in old age is due to factors other than age itself, such as diseases or other conditions like loneliness, which are more frequent in the elderly.
Thirdly, SF6D and EQ5D show a similar discriminative capacity, while HUI3 seems less able to distinguish different categories of people. In particular, EQ5D seems to be the only one to detect some health differences between age groups and between smokers and non-smokers, whilst SF6D identifies differences between non-smokers and ex-smokers. These results should be considered when choosing a measure. Although the questionnaires have different frameworks and their crude scores may differ and be difficult to compare, they appeared to be influenced by socio-demographic and morbidity variables in a similar way, especially EQ5D and SF6D. This shifts the emphasis from the structural and construct similarities of different instruments to the behaviour they reveal when applied in the field. The study set out to examine the performance of the three measures when used to describe patients' health status and its determinants, rather than merely comparing ranges or distributions of scores. The results could help in the choice of instrument, particularly since this study deliberately did not focus on patients with a specific disease, so that the results would be more generalisable.
The present study has some limitations: (1) all information about morbidity indicators was self-reported by patients, who could therefore have been misclassified; (2) participation was voluntary, so there could be selection bias; (3) refusals were not recorded, so it was impossible to assess whether people who refused to answer the questionnaires differed from those who agreed. For the aim of the study, however, these possible sources of error should not be of great concern, because the biases would affect all three instruments in the same way and the comparison would not be altered. Nevertheless, they could limit the generalisability of the findings. Another problem could be the order in which the questionnaires were administered. Since they were always administered in the same order, the last one could have suffered from a loss of accuracy. However, EQ5D, the last one administered to patients, is the shortest and easiest, so its completion was as good as that of the other two.
In conclusion, our results show that EQ5D and HUI3 were closer to one another in many respects (score distribution, mean, median, minimum and maximum), but SF6D and EQ5D scores were more similar in the way they were influenced by socio-demographic and morbidity indicators. In some cases, such as for smoking and age, EQ5D had better discriminative capacity. It is difficult to determine which is the best instrument, but, in addition to a descriptive capacity similar to or better than that of the other instruments, EQ5D seems to have the advantage of being easier to answer.