US population norms for the EQ-5D-5L and comparison of norms from face-to-face and online samples

Purpose Normative scores (norms) allow for comparisons between population(s) of interest and the general population, which is useful for burden of disease studies and cost-effectiveness analysis. The primary aim of this study was to estimate US visual analogue scale (EQ VAS) and utility-based norms for the EQ-5D-5L using the face-to-face sample. The secondary aim was to compare norms estimated in the face-to-face and online populations. Methods This study estimated population norms from two general population surveys: (a) face-to-face and (b) online. In these surveys, respondents provided their health state using the EQ-5D-5L health classifier and the EQ VAS. Descriptive statistics, including mean, standard deviation (SD), 95% confidence interval, and median for the 5L utility and EQ VAS were estimated for each sample and across relevant respondent characteristics to serve as the basis for US EQ-5D-5L norms. Results Face-to-face sample respondents (n = 1134) were representative of the US adult general population. In this sample, mean (SD) utility decreased with increasing age until age 45 or greater (age 45–54: 0.816 (0.249); age 55–64: 0.815 (0.243); age 65–74: 0.824 (0.217); age 75+: 0.811 (0.218)). With increasing age, more problems were reported on all dimensions except anxiety/depression; a smaller proportion of respondents age 65 and older reported problems with anxiety/depression (23.8%) as compared to the youngest respondents (42.1%). Online (n = 2018) mean utility and EQ VAS values were consistently lower than those of the face-to-face sample. Conclusions The availability of US EQ-5D-5L norms facilitates interpretation and understanding of general population and patient health. Electronic supplementary material The online version of this article (10.1007/s11136-020-02650-y) contains supplementary material, which is available to authorized users.


Introduction
The EQ-5D is the most widely used generic multi-attribute utility instrument in the world, and it has numerous applications in health care [1]. It is applied in a variety of research and clinical practice settings, such as clinical trials, cost-utility analysis (CUA), patient surveillance, and population health measurement [2][3][4][5][6][7][8]. The EQ-5D was developed as a brief, generic measure of health which includes a health state classifier that has five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The first version of the EQ-5D had 3 levels of health problems (EQ-5D-3L), and in 2011 a more descriptively rich 5-level version (EQ-5D-5L) was introduced, which describes 3125 health states (5 levels on each of 5 dimensions: 5^5 = 3125) [9,10]. An index-based utility score can be generated from self-assessments using the descriptive system, by applying a value set based on societal preferences for EQ-5D health states. The value set is based on preference choice tasks elicited from the general population. The health state classifier is accompanied by the visual analogue scale (EQ VAS), anchored at 0 for "the worst health you can imagine" and 100 for "the best health you can imagine" [9]. Thus, the EQ-5D-5L can provide two summary scores of health that can inform decision-making: the patient's self-rating of health on the EQ VAS, and an index-based utility score. The latter is used to facilitate the generation of a common metric of health in the form of quality-adjusted life-years (QALYs) in CUA that can inform resource allocation across health care sectors.
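As a small illustration of the descriptive system, the following Python sketch enumerates every EQ-5D-5L profile and confirms the count of 3125. The function and constant names are hypothetical and chosen only for this example; the five-digit string convention (e.g., 11111 for no problems on any dimension) follows the notation used later in this article.

```python
from itertools import product

# The five EQ-5D dimensions, in the order used for five-digit state codes.
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3, 4, 5)  # 1 = no problems ... 5 = extreme problems/unable to


def all_health_states():
    """Return every EQ-5D-5L profile as a five-digit string such as '11111'."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(LEVELS, repeat=len(DIMENSIONS))
    ]


states = all_health_states()
print(len(states))            # 3125
print(states[0], states[-1])  # 11111 55555
```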
Among the numerous applications of the EQ-5D-5L, a set of utility index and EQ VAS benchmark values for the general population, i.e., population reference data or population norms, are useful for comparing burden of disease and as normative reference values [11]. Patient or sample values can be compared to these benchmark values to determine how individuals or groups measure against the general population in decision models, evaluation of clinical programs, assessment of public health in large-scale applications, etc. [6,[11][12][13][14][15][16][17][18][19][20][21][22][23]. Although the use of the EQ-5D-5L continues to expand, the US does not yet have population norms for comparison. In 2019, an EQ-5D-5L value set for the US was developed [24]. It was developed utilizing the standardized international protocol recommended by the EuroQol Group and used preferences from an adult sample representative of the US general population. Apart from the valuation of EQ-5D-5L health states, these respondents also self-reported their own health using the EQ-5D-5L and the EQ VAS, allowing for estimation of US EQ-5D-5L population norms [24].
In the past, most studies reporting population norms have been based on data collected in-person as part of population health studies or using mail surveys [11,12]. The standardized international protocol for EQ-5D-5L valuation studies was developed for a face-to-face, interviewer-assisted setting to ensure respondent comprehension of the tasks and to preserve data quality for the challenging preference choice tasks [25,26]. The US EQ-5D-5L study used quota-based sampling based on age, gender, race, and ethnicity to ensure representativeness of the sample for the population and examined non-quota-based respondent characteristics like education to support comparability to the general population. A substudy was also conducted that sought to replicate the face-to-face protocol using online panels, applying the same quota-based sampling criteria [27,28]. The online and face-to-face samples were dissimilar in terms of health, and the online sample was also less comparable to the general population than the face-to-face sample in terms of non-quota-based characteristics. Thus, the face-to-face valuation study was selected as the primary source of population norms for the present study. However, as online data collection continues to gain relevance, there is value in understanding the differences between the two sources of respondents.
The primary aim of this study was to estimate EQ-5D-5L EQ VAS and index-based norms for the US general population using the data from the US valuation face-to-face study. A secondary aim was to compare face-to-face and online samples based on non-quota-based characteristics to understand the issues of generalizability with respect to mode of data collection.

EQ-5D-5L face-to-face valuation study
The purpose of the EQ-5D-5L face-to-face valuation study was to elicit preferences for EQ-5D-5L health states from a representative sample of the US adult general population. This study was granted Institutional Review Board (IRB) exemption by the University of Illinois at Chicago IRB. Collected preferences were used as the basis for modeling the US EQ-5D-5L value set [24]. The face-to-face study followed the internationally standardized EQ-5D-5L valuation study protocol based on a robust body of evidence [25,[29][30][31][32]. Respondents were recruited using a variety of in-person, community, and online advertising methods. They were quota-sampled using age, gender, race, and ethnicity to match the most recently available US population parameters at the time. Respondents indicated informed consent to the interviewer prior to survey participation.
Eleven interviewers traveled around the United States between May and August 2017 to survey respondents in one-on-one sessions in six metropolitan areas: Chicago, Philadelphia, Phoenix, Birmingham, Seattle, and Denver [24]. Interview areas were chosen based on their representativeness of the general US population and to ensure data collection in each of the four US census regions: Northeast, Midwest, West, and South. Interview sessions occurred at several locations around each metropolitan area, including both city and suburban/rural areas. Each interview was conducted as a face-to-face, computer-assisted personal interview (CAPI). Surveys were completed in Spanish or English based on the respondent's preferred language.
At the start of these surveys, respondents described their own health using the EQ-5D-5L visual analogue scale (EQ VAS) and the EQ-5D-5L descriptive system by indicating their level of problems on each of the 5 dimensions on the day of the survey (no, slight, moderate, severe, or extreme problems/unable to) [24]. Index-based utility scores for each respondent's self-reported EQ-5D-5L health state were estimated by applying the US value set [24]. The US value set was based on the composite time trade-off (cTTO) preferences, and index values ranged from −0.573 for the worst (55,555) to 1 for the best (11,111) EQ-5D-5L health state. Respondent self-reported EQ VAS values were also used to estimate general population norms.
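To make the step from a self-reported health state to an index score concrete, the sketch below shows one way a value set could be applied in Python. It assumes a simple additive form in which each dimension level carries a decrement subtracted from full health (1.0). Both that form and every number in the table are placeholders invented for illustration; they are not the published US value set [24], and the function name is hypothetical.

```python
# TOY decrements for illustration only. These are NOT the published US
# EQ-5D-5L value set; real coefficients must be taken from the value set paper.
TOY_DECREMENTS = {
    "mobility":           (0.0, 0.05, 0.10, 0.20, 0.30),
    "self-care":          (0.0, 0.05, 0.10, 0.20, 0.30),
    "usual activities":   (0.0, 0.05, 0.10, 0.20, 0.30),
    "pain/discomfort":    (0.0, 0.05, 0.10, 0.20, 0.30),
    "anxiety/depression": (0.0, 0.05, 0.10, 0.20, 0.30),
}
DIMENSIONS = tuple(TOY_DECREMENTS)  # order matches the five-digit state code


def toy_utility(state, decrements=TOY_DECREMENTS):
    """Map a five-digit state such as '21311' to an index score.

    Assumes an additive model: 1 minus the sum of per-dimension decrements.
    """
    if len(state) != len(DIMENSIONS) or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    total = sum(
        decrements[dim][int(ch) - 1] for dim, ch in zip(DIMENSIONS, state)
    )
    return 1.0 - total


print(toy_utility("11111"))            # full health under any such table
print(round(toy_utility("21311"), 3))  # slight mobility, moderate activity problems
```

With a real value set, only the decrement table (and possibly the functional form) would change; under the toy table the worst state scores below zero, which mirrors the fact that the US value set assigns negative values to the most severe states.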

EQ-5D-5L online valuation study
The EuroQol valuation study protocol and survey were also replicated in an online valuation substudy. The study was determined to be eligible for an IRB exemption by the Western Institutional Review Board. Respondents in the online study were recruited from established online survey panels using the same strata as the face-to-face study. Respondents indicated informed consent on the first survey screen prior to participation. The sequence and content of self-reported respondent characteristics, including the EQ VAS and the EQ-5D-5L health state, were the same between online and face-to-face surveys. Online respondents self-completed the survey without any interviewer supervision.

Analyses
Norms for the utility index and EQ VAS were estimated using descriptive statistics, including mean, standard deviation, 95% confidence interval, and median. Proportions of respondents endorsing each level of severity for the five dimensions were also calculated. Norms were computed for the face-to-face and online samples separately. Characteristics important for informing population health, CUA, and clinical outcomes research were included as stratifying variables in the present analyses, including sociodemographic information (age, gender, race, ethnicity), and health (general health status, number of regular prescription medications) [33,34]. Age was divided into seven age bands (18-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75+).
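The descriptive statistics described above can be sketched in Python as follows. The analyses in this study were run in SAS, so this is only an illustrative re-expression: the function names and the input format (a list of age and value pairs) are assumptions, and the 95% confidence interval uses a normal approximation (1.96 standard errors) rather than whatever interval method the original software applied.

```python
from math import sqrt
from statistics import mean, median, stdev

# Closed age bands from the analysis plan; anything above the last band is "75+".
AGE_BANDS = ((18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74))


def age_band(age):
    """Return the age-band label for an adult respondent."""
    if age < 18:
        raise ValueError("norms are defined for adults (18+) only")
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "75+"


def describe(values):
    """Mean, sample SD, normal-approximation 95% CI, and median."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator)
    half_width = 1.96 * sd / sqrt(n)
    return {
        "n": n,
        "mean": m,
        "sd": sd,
        "ci95": (m - half_width, m + half_width),
        "median": median(values),
    }


def norms_by_age(records):
    """Group (age, value) pairs by age band and summarize each band."""
    groups = {}
    for age, value in records:
        groups.setdefault(age_band(age), []).append(value)
    # A sample SD needs at least two observations per band.
    return {band: describe(vals) for band, vals in groups.items() if len(vals) >= 2}
```

The same `describe` helper could be reused for any other stratifying variable (gender, general health status, number of prescriptions) by changing the grouping key.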
During the surveys, respondents also self-reported diagnoses of certain illnesses, and the mean utility and EQ VAS values were estimated for each diagnosis. Only illnesses that were self-reported by more than 100 respondents were included in these analyses due to instability of the estimates with small sample sizes.
Utility and EQ VAS were analyzed as continuous variables. Statistical testing for differences in mean VAS or utility across groups was conducted using t-tests (gender, ethnicity, experience with illness) or ANOVA (age, race, general health status, number of prescription medications). Significance was designated at p < 0.05. All statistical analyses were completed in SAS 9.4 (Cary, NC, USA). The online and face-to-face norms were qualitatively compared.
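For readers unfamiliar with these tests, the sketch below computes the two test statistics in plain Python. It is not the SAS code used in this study. The pooled-variance (equal variance) form of the t-test is an assumption, since the text does not state which variant was used; the function names and the sample values are hypothetical; and converting a statistic to a p-value would additionally require the t or F distribution (available, for example, in a statistics library), which is omitted here.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical utility values for two small groups, for illustration only.
group_a = [0.9, 0.8, 1.0, 0.7]
group_b = [0.6, 0.5, 0.8, 0.7]
t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])
print(round(t_stat, 3), round(f_stat, 3))
```

With exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a convenient consistency check between the two procedures.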

US EQ-5D-5L norms
One thousand one hundred and thirty-four adult respondents were recruited as part of the US face-to-face valuation study [24]. The sample was representative of the US general population for age, gender, race, and ethnicity (Table 1). Compared to the general US population, respondents in the face-to-face sample were more likely to have no child dependents under 18 and to have attained education beyond secondary school. There were no missing data for the EQ-5D-5L health states or the VAS. Some covariates were missing for a single respondent who had to terminate the interview early. Five respondents identified as non-binary for gender, but norms could not be generated for this group due to the limited sample size.
Of the face-to-face sample, 31.2% of the respondents reported no problems on any of the EQ-5D-5L dimensions (11,111) (Table 2). The mean (standard deviation) utility value for the face-to-face sample was 0.851 (0.205) (Table 3). Mean utility differed across age groups (p < 0.001) and decreased with increasing age until the 45-54 age band. Means for age bands 45-54 and 55-64 were similar: 0.816 (0.249) and 0.815 (0.243), respectively. The mean increased for age band 65-74, to 0.824 (0.217), and decreased again for the oldest age band of 75+, to 0.811 (0.218). Women and men had similar mean utility scores: 0.856 (0.191) versus 0.847 (0.219) (p = 0.487). No statistically significant differences in mean utility scores were identified across race and ethnicity categories.
Mean utility scores decreased with poorer general health: respondents with excellent, very good, good, fair, and poor health had mean (SD) scores of 0.951 (0.096), 0.910 (0.131), 0.835 (0.160), 0.632 (0.262), and 0.338 (0.380), respectively (Table 3). The mean utility also decreased with increasing number of regular prescriptions taken (p < 0.001). Respondents with experience with serious illness, whether personal, intra-familial, or caring for others, had lower mean utility and EQ VAS than those without the experience (p < 0.01 for all comparisons).
The mean (SD) EQ VAS for the sample was 80.4 (15.6) (Table 3). The pattern of mean EQ VAS across age bands differed from that of the mean index. EQ VAS decreased with increasing age until the 45-54 age band, where it reached a nadir of 75.9 (18.6); it then increased through the rest of the age bands, reaching 81.1 (15.6) for respondents 75 and older. Across other respondent characteristics, VAS norm trends were comparable to those observed for utility norms.
Of the EQ-5D-5L dimensions, fewest respondents reported problems with self-care (6.5% of respondents) and the most respondents reported any problems with pain/discomfort (51% of respondents). (Table 4) For both the index score and EQ VAS-based norms across the entire sample, means generally decreased with increasing problems on each dimension. Few respondents endorsed severe (level 4) and extreme problems/unable to (level Norms were also estimated separately by gender with additional stratification by age and general health status. (Table 5) Across age bands, women had higher mean index than men except for the 45-

Online descriptive statistics and comparison to face-to-face norms
Two thousand and eighteen respondents were recruited from online panels to participate in the online EQ-5D-5L valuation study. Online respondents were also generally representative of the US general population, but the Caucasian race was overrepresented; 77.8% of online respondents were White whereas only 65.5% of the US adult general population was White (Table 1). Similar to the face-to-face sample, online respondents were also more likely to have attained at least secondary education compared to the US general population. Respondents who began but did not complete the survey were not included in these analyses. Two thousand and eighteen respondents completed the online survey, and there were no missing data in this sample.
Of the online sample, 23.9% of the respondents reported no problems on any of the EQ-5D-5L dimensions (Supplementary material Appendix B). In comparison to face-to-face respondents, online respondents had similar patterns of mean utility and EQ VAS values across different levels of covariates, e.g., decreasing mean norms with worsening general health status and lack of consistent decrease of mean norms with increasing age (Tables 3-5, Supplementary material Appendices C-E). Online respondents were also more likely to report issues on EQ-5D-5L dimensions with increasing age except for anxiety/depression (Supplementary material Appendices A, F). Online respondents consistently had lower mean index and EQ VAS values compared to the matching face-to-face age subgroup (Fig. 1). The mean index and EQ VAS values of the overall online sample were 0.800 (0.236) and 74.6 (18.7), respectively, whereas the corresponding values in the face-to-face sample were 0.851 (0.205) and 80.4 (15.6) (Table 3 and Supplementary material Appendix C). For both the EQ VAS and the utility index, the corresponding online values were consistently lower than the face-to-face values across age bands and levels of general health (Fig. 1).
The effect of mode of administration on index and VAS norms was isolated in linear regressions. After controlling for age, race, gender, ethnicity, and general health status, online respondents had average index and VAS values that were 0.027 and 3.0 units lower than face-to-face respondents, respectively (results not shown).
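A short arithmetic check, written in Python, puts the adjusted differences in context against the unadjusted differences in overall sample means reported above. All input numbers are taken from the text; the variable and function names are hypothetical, and the "share remaining" ratio is only a rough descriptive comparison, not a result reported by the study.

```python
# Overall sample means reported in the text.
FACE_TO_FACE = {"utility": 0.851, "eq_vas": 80.4}
ONLINE = {"utility": 0.800, "eq_vas": 74.6}
# Regression-adjusted online deficit reported in the text.
ADJUSTED_GAP = {"utility": 0.027, "eq_vas": 3.0}


def gap_summary(measure):
    """Return the raw gap and the share of it that remains after adjustment."""
    raw_gap = FACE_TO_FACE[measure] - ONLINE[measure]
    return {
        "raw_gap": raw_gap,
        "adjusted_gap": ADJUSTED_GAP[measure],
        "share_remaining": ADJUSTED_GAP[measure] / raw_gap,
    }


for measure in ("utility", "eq_vas"):
    s = gap_summary(measure)
    print(measure, round(s["raw_gap"], 3), round(s["share_remaining"], 2))
```

Under these figures, roughly half of the unadjusted gap on each measure remains after controlling for the listed covariates, which is consistent with the interpretation that sample composition explains part, but not all, of the difference between modes.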

Comparison of older adult respondents to the US general population
As the plateau effect of norms with increasing age was unexpected, additional comparisons of respondents who were at least 65 years old to the US population were completed. Two hundred and twenty-three face-to-face respondents and 290 online respondents were in this age segment (Supplementary material Appendix G). Face-to-face respondents were more similar to the US general population in terms of distribution of age bands, race, and gender. Both sets of respondents were more educated than the US general population: larger portions of face-to-face and online respondents had achieved at least a bachelor's degree as compared to the US general population. The face-to-face sample was healthier than the respondents recruited for the Medical Expenditure Panel Survey (MEPS), but the comparison of respondent health between the online sample and MEPS respondents was less straightforward.

Discussion
In this study, we reported age- and gender-based norms for the EQ-5D-5L in the US general population. The reference norms were based on data collected from implementation of the EuroQol international standardized protocol using a study design that used quota-based sampling and involved face-to-face, interviewer-assisted administration of the survey and valuation tasks. Data based on the same quotas for sampling collected via online panels provided systematically different respondents and norms, so these were reported separately. Index-based utility and EQ VAS mean scores did not consistently decrease with increasing age. Instead, the norms were characterized by plateaus or minor increases in older age bands. This pattern may be explained by the lower reported prevalence of anxiety/depression with increasing age as well as different psychometric properties of the EQ-5D-5L (e.g., differential item functioning) and varying priorities for the dimensions by age [35][36][37][38][39][40][41]. In the face-to-face sample, approximately 24% of respondents 65 and older reported any problems with anxiety/depression whereas 42.1% of respondents 18 to 24 years old reported the same (Supplementary material Appendix A). Although issues with mental health may be more often associated with social stigma in older adults and cause fewer older adults to indicate issues with mental health [42], differing levels of social desirability bias due to interviewer presence by age band were unlikely to be the only contributor to this unexpected distribution by age; the same differential between age bands was observed in the online sample: 24-30% of online respondents 65 and older and 57.1% of respondents 18-24 reported any problems with anxiety/depression (Supplementary material Appendix F).
Similar to the norms reported here, other research corroborates higher prevalence of mental health (e.g., anxiety and depression) issues in US young adults [35][36][37][38]. In 2017, the National Institute of Mental Health (NIMH) and the Substance Abuse and Mental Health Services Administration (SAMHSA) found that 4.7% of US adults 50 years and older had a major depressive episode in the past year compared to 13.1% in respondents 18 to 25 years old [35]. Older data from 2001 to 2003 showed that 9.0% of respondents 60 years and older had anxiety disorder in the past year whereas 22.3% of respondents 18 to 29 years old reported the same [36]. Further, mental health reasons for leaving jobs were highest in youngest members of the workforce, and anxiety has also been increasing among young adults since 2008 [37,38]. In developed, Western countries such as Germany and the UK, some younger segments of the population were also more likely to report mental health issues compared to older segments [43,44].
Response shift may also have occurred in the older respondents, potentially leading them to interpret or conceptualize the meaning of "no problems" differently from younger respondents who may impose a more ideal expectation when self-reporting their health [39-41, 45, 46]. These interpretation differences could contribute to distinctive psychometric properties of the EQ-5D-5L in different age segments of the population. A combination of increased mental health problems in younger adults and differing interpretation of the problem severity labels could have contributed to the decreased prevalence of anxiety/depression in older respondents and the observed patterns of norms across age groups.
The patterns seen in the US EQ-5D-5L norms are also present in other international norms. Similar to this study, a greater proportion of younger respondents in Chinese urban and Indonesian general populations reported problems with anxiety/depression on the EQ-5D-5L [17,19]. In China, 34.5% of men 20 to 29 reported any problems with anxiety/depression whereas only 11.5% of men over 70 did the same; the trend was similar in Chinese women [19]. Approximately 40.1% of Indonesian respondents aged 17 to 30 indicated they had any issues with anxiety/depression compared to 32% of respondents older than 50 [17]. Similar to the present study, Canadian (specifically Alberta and Quebec) and Indonesian EQ-5D-5L norms also plateaued in adjacent, older age groups [17,18,47].
The systematically lower mean utility and VAS values noted in online respondents relative to face-to-face respondents were likely affected by a combination of factors. Social desirability bias may have contributed to the observed disparities between samples. A greater proportion of online respondents reported issues across all EQ-5D-5L dimensions compared to face-to-face respondents (Supplementary material Appendices D and E). Further, a smaller portion of online respondents indicated that their general health was excellent compared to the face-to-face respondents (12.1% versus 20%; Table 1). If the populations had similar health, face-to-face respondents may have been unwilling to admit health issues in front of an interviewer due to social desirability bias. Past research also found that self-reported health and norms differed when surveys were self-administered versus interviewer-administered via telephone [48].
However, respondents could also have truly differed between modes of data collection due to the varying selection pressures of recruitment and survey needs for each mode of data collection [49]. US online panels tend to be disproportionately White and unrepresentative of minorities [50,51]. The company which administered the online surveys used in this study noted comparable patterns of minority under-representation in the online panels they employ, which contributed to the low prevalence of non-Black minorities recruited into the online sample [52]. In addition, to be selected for the study, online respondents must belong to a survey panel and have reliable access to the internet and to a computer with which to access it. Face-to-face respondents needed to participate in the study at centralized locations, potentially requiring transportation to attend interviews outside of their homes. Respondent characteristics which determine agreement to survey participation may additionally differ by mode. Distinctions between samples were most noticeable when respondent characteristics unlikely to be susceptible to social desirability bias were examined. For example, online respondents were more likely to have children under 18 than face-to-face respondents (Table 1). Finally, face-to-face respondents aged 65 and older were more comparable to the US general population than their counterparts in the online sample, particularly in terms of age and gender distribution (Supplementary material Appendix G).
As the online population appeared to be less representative of the US general population, the two samples were kept distinct. Based on the aforementioned differences between online and face-to-face respondents as well as the improved representativeness in older respondents, the face-to-face norms should be the primary set of normative values for the US population, particularly for data collected in-person. The online norms may be most useful for comparison of unsupervised data elicited from online panels.
The study had several limitations. Quota-sampling was employed for practical time and cost considerations, and random sampling may have allowed for a more representative sample of the US population, particularly in the face-to-face sample. This shortcoming can be observed in the educational attainment and general health of respondents who were at least 65 years old (Supplementary material Appendix G). The sample sizes included in the present study were relatively small in comparison to the US population size, and each face-to-face respondent represented approximately 290,000 US inhabitants. As the EQ-5D-5L is not included in any large-scale US general population surveys, these data were the best available sources to estimate US general population norms. Finally, the mean index and VAS values estimated for each diagnosis in both online and face-to-face samples should only be used as general guidance for disease burden, as these subgroups are likely heterogeneous in disease stage, acuity, and patient experience.
With the present study, US EQ-5D-5L utility and EQ VAS norms are now available as general population benchmarks to support health services research across research, clinical, and policy settings for two major methods of data collection. End-users of the EQ-5D-5L may choose the set most appropriate for the application.
Funding Funding was provided by the EuroQol Group. This work was completed while Ruixuan was a graduate student at the University of Illinois at Chicago, during which she was supported by the UIC Dean's Dissertation Fellowship and PhRMA Foundation Pre-Doctoral Health Outcomes Fellowship.