Data used in the study were obtained from the 2020 Tianjin Health Service Survey, which was conducted by the Tianjin Health Commission between July and August 2020. Tianjin is one of the four municipalities directly under the central government of China, with 16 districts and a permanent population of more than 15 million. A multi-stage, stratified cluster random sampling strategy was used. First, five subdistricts (or townships) were randomly selected in each of the 16 districts. Second, two communities (or villages) were randomly selected within each of the 80 subdistricts (or townships). Third, 60 households were randomly selected within each of the 160 communities (or villages), yielding a total of 9600 households. All residents registered under each household were invited to participate in the survey.
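The three sampling stages can be illustrated with a short Python sketch. It is a minimal illustration only, assuming a hypothetical nested sampling frame (`build_frame`, with made-up sizes of 10 subdistricts per district, 5 communities per subdistrict, and 200 households per community); it is not the survey's actual sampling code. Each of the 16 districts is treated as a stratum, and simple random sampling without replacement is applied at every stage.

```python
import random

def build_frame(n_districts=16, n_sub=10, n_comm=5, n_hh=200):
    """Hypothetical frame: district -> subdistrict -> community -> household IDs.
    The frame sizes are illustrative assumptions, not survey figures."""
    return {
        f"D{d}": {
            f"D{d}-S{s}": {
                f"D{d}-S{s}-C{c}": [f"D{d}-S{s}-C{c}-H{h}" for h in range(n_hh)]
                for c in range(n_comm)
            }
            for s in range(n_sub)
        }
        for d in range(n_districts)
    }

def multistage_sample(frame, n_sub=5, n_comm=2, n_hh=60, seed=2020):
    """Draw households stage by stage; every district is kept as a stratum."""
    rng = random.Random(seed)
    households = []
    for district, subdistricts in frame.items():
        # Stage 1: randomly select subdistricts (townships) within each district.
        for sub in rng.sample(sorted(subdistricts), n_sub):
            communities = subdistricts[sub]
            # Stage 2: randomly select communities (villages) within each subdistrict.
            for comm in rng.sample(sorted(communities), n_comm):
                # Stage 3: randomly select households within each community.
                households.extend(rng.sample(communities[comm], n_hh))
    return households

sample = multistage_sample(build_frame())
print(len(sample))  # 16 * 5 * 2 * 60 = 9600
```

The final count depends only on the per-stage sample sizes, which is why the design fixes the sample at 9600 households regardless of how large each district actually is.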
To comply with China's COVID-19 administrative policies, data for the 2020 Tianjin Health Service Survey were collected through three approaches: face-to-face paper-based interviews at the respondent's home, face-to-face paper-based interviews at unified public venues (the governmental subdistrict office or the community health service center), and self-report at the respondent's home. The face-to-face interview proceeded as follows. First, the household member most familiar with the family's situation answered the household-level questions, including annual household medication expenditure and the distance from home to the nearest healthcare institution. Second, all respondents reported their demographic characteristics (e.g., gender and age) and socioeconomic status (e.g., education level, marital status, and employment status). Third, respondents aged ≥ 15 years completed both the EQ-5D-5L and SF-6Dv2 and then answered questions on health indicators, including the presence of chronic diseases, receipt of health examinations, and illness in the past two weeks. Fourth, parents of children aged < 5 years were asked about the number of health examinations their child had received in the past 12 months and whether the child had a vaccination certificate. Fifth, female respondents aged 15–64 years were asked about the number of their children and the place of delivery. Last, all respondents were asked about their knowledge of, and satisfaction with, the hierarchical diagnosis and treatment model developed in China. Informed consent was obtained from all respondents included in the survey. Detailed information on sampling and data collection can be found elsewhere.
This study used data from the second and third parts of the survey. Respondents aged < 18 years were excluded because both the EQ-5D-5L and SF-6Dv2 are recommended for use in adults [20, 53]. Respondents were also required to meet the following inclusion criteria: (1) no missing data on the EQ-5D-5L and SF-6Dv2; and (2) no missing data on the other variables used in this study, including demographic characteristics, socioeconomic status, and health indicators.
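The exclusion rule and the two complete-case criteria can be expressed as a single filter. The Python sketch below is illustrative only: the record layout and the field names in `REQUIRED` are hypothetical assumptions, since the survey's variable names are not reported here.

```python
# Hypothetical field names; the survey's actual variable names are not given in the text.
REQUIRED = ["eq5d_profile", "sf6d_profile", "gender", "education", "chronic_disease"]

def eligible(record, min_age=18):
    """Keep adults with no missing values for both measures and all study variables."""
    age = record.get("age")
    if age is None or age < min_age:
        return False
    return all(record.get(field) is not None for field in REQUIRED)

# Illustrative records only (not survey data).
respondents = [
    {"age": 45, "eq5d_profile": "11111", "sf6d_profile": "111211",
     "gender": "F", "education": "college", "chronic_disease": False},
    {"age": 16, "eq5d_profile": "11111", "sf6d_profile": "111111",
     "gender": "M", "education": "school", "chronic_disease": False},
    {"age": 62, "eq5d_profile": None, "sf6d_profile": "211221",
     "gender": "M", "education": "primary", "chronic_disease": True},
]

analytic = [r for r in respondents if eligible(r)]
print(len(analytic))  # only the 45-year-old with complete data remains
```

Here the second record is dropped by the age rule and the third by the missing EQ-5D-5L profile.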
The EQ-5D-5L descriptive system comprises five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with five severity levels (no, slight, moderate, severe, and extreme problems). The EQ-5D-5L also includes a visual analog scale (hereafter EQ VAS) ranging from 0 (worst imaginable health state) to 100 (best imaginable health state). Combining the dimension levels yields 3125 (= 5⁵) distinct health states. The Chinese EQ-5D-5L value set was developed using the time trade-off (TTO) approach, with utility values ranging from −0.391 (state 55555) to 1 (state 11111).
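The count of 3125 states follows directly from five levels on each of five dimensions. The Python sketch below enumerates the profiles with `itertools.product`; it only illustrates the state space and does not reproduce the Chinese value set, whose coefficients are not given here.

```python
from itertools import product

DIMENSIONS = ["mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression"]
LEVELS = range(1, 6)  # 1 = no problems ... 5 = extreme problems

# Each health state is a 5-digit profile, e.g. "11111" (full health) or "55555".
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DIMENSIONS))]

print(len(states))            # 5**5 = 3125
print(states[0], states[-1])  # 11111 55555
```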
The SF-6Dv2 is derived from 10 items of the SF-36. Its health state classification system comprises six dimensions: physical functioning, role limitation, social functioning, pain, mental health, and vitality. The pain dimension has six response levels and each of the other dimensions has five, giving 18,750 (= 5⁵ × 6) distinct health states. The Chinese SF-6Dv2 value set was also developed using the TTO approach, with utility values ranging from −0.277 (state 555655) to 1 (state 111111).
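Because the pain dimension has one more level than the others, the SF-6Dv2 state space is not a simple power of five. A minimal Python sketch, assuming the dimension order listed above, makes the 5⁵ × 6 arithmetic and the position of the six-level pain digit explicit.

```python
from itertools import product

# Number of response levels per SF-6Dv2 dimension, in the order listed in the text.
SF6D_LEVELS = {
    "physical functioning": 5,
    "role limitation": 5,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}

states = ["".join(map(str, combo))
          for combo in product(*(range(1, n + 1) for n in SF6D_LEVELS.values()))]

print(len(states))            # 5**5 * 6 = 18750
print(states[0], states[-1])  # 111111 555655
```

The worst state, 555655, carries a 6 in the fourth (pain) position, matching the lower anchor of the Chinese value set.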
Respondent characteristics were described using means and standard deviations (SD) for continuous variables and frequencies and proportions for categorical variables. The distribution of response levels on each dimension of the EQ-5D-5L and SF-6Dv2 was reported using histograms. Descriptive statistics (mean, SD) were also computed for the EQ-5D-5L and SF-6Dv2 utility values and the EQ VAS scores. In this study, the EQ VAS score was used as an indicator of self-reported health status and classified into four sub-groups: < 65 (bad), 65–79 (fair), 80–89 (good), and 90–100 (excellent) [27, 41, 55].
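The EQ VAS grouping above can be written as a small mapping function. This Python sketch is an illustration, assuming integer or real-valued scores on the 0–100 scale; the function name `vas_category` is hypothetical.

```python
def vas_category(score):
    """Map an EQ VAS score (0-100) to the four self-reported health groups."""
    if not 0 <= score <= 100:
        raise ValueError("EQ VAS scores must lie between 0 and 100")
    if score < 65:
        return "bad"
    if score < 80:   # 65-79
        return "fair"
    if score < 90:   # 80-89
        return "good"
    return "excellent"  # 90-100

print([vas_category(s) for s in (50, 70, 85, 95)])
```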
Agreement between the EQ-5D-5L and SF-6Dv2 utility values was examined using the intraclass correlation coefficient (ICC), computed with a two-way mixed-effects model based on absolute agreement. An ICC above 0.7 indicates acceptable agreement. In addition, because the distributions of utility values were highly skewed, paired differences between the EQ-5D-5L and SF-6Dv2 utility values were tested using the Wilcoxon signed-rank test.
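The single-measure, absolute-agreement ICC can be computed from a two-way ANOVA decomposition. The pure-Python sketch below follows the standard ICC(A,1) formula, (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n), whose point estimate is the same under two-way random and two-way mixed models; it is an illustration of the statistic, not the Stata routine used in the study.

```python
def icc_absolute_agreement(pairs):
    """Single-measure ICC for absolute agreement from a two-way ANOVA.

    `pairs` holds one (EQ-5D-5L, SF-6Dv2) utility pair per respondent.
    """
    n, k = len(pairs), len(pairs[0])
    grand = sum(sum(row) for row in pairs) / (n * k)
    row_means = [sum(row) / k for row in pairs]
    col_means = [sum(row[j] for row in pairs) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between measures
    ss_total = sum((x - grand) ** 2 for row in pairs for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Illustrative data: the SF-6Dv2 values are the EQ-5D-5L values shifted by 0.1.
shifted = [(0.5, 0.6), (0.8, 0.9), (1.0, 1.1)]
print(round(icc_absolute_agreement(shifted), 3))  # 0.927
```

The example shows why the absolute-agreement form matters here: a constant offset between the two utility scales leaves a consistency ICC at 1, but lowers the absolute-agreement ICC (to 38/41 ≈ 0.927 in this toy case).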
Measurement properties of the EQ-5D-5L and SF-6Dv2
The measurement properties evaluated in this study included ceiling and floor effects, convergent validity, discriminative validity, agreement, and sensitivity.
Ceiling and floor effects
Ceiling and floor effects for each measure were assessed as the percentage of respondents reporting the best and worst health states, respectively (i.e., 11111 and 55555 for the EQ-5D-5L, and 111111 and 555655 for the SF-6Dv2). A ceiling or floor effect was considered present if more than 15% of respondents were at the corresponding extreme of the scale.
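The 15% rule can be checked with a few lines of Python. This is a minimal sketch assuming responses are stored as profile strings; the function name `ceiling_floor` and the example data are illustrative only.

```python
def ceiling_floor(profiles, best, worst, threshold=15.0):
    """Percentage of respondents in the best and worst states, flagged against the 15% rule."""
    n = len(profiles)
    ceiling = 100 * sum(p == best for p in profiles) / n
    floor = 100 * sum(p == worst for p in profiles) / n
    return {
        "ceiling_pct": ceiling,
        "floor_pct": floor,
        "ceiling_effect": ceiling > threshold,
        "floor_effect": floor > threshold,
    }

eq5d = ["11111"] * 60 + ["21111"] * 30 + ["55555"] * 10  # illustrative data only
print(ceiling_floor(eq5d, best="11111", worst="55555"))
```

For the SF-6Dv2, the same function would be called with `best="111111"` and `worst="555655"`.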
Convergent validity refers to the extent to which a measure of interest (e.g., the pain/discomfort dimension of the EQ-5D-5L) shows the expected association with a measure of a similar construct (e.g., the pain dimension of the SF-6Dv2) assessed at the same time point [30, 59]. It was assessed by examining correlations between the EQ-5D-5L and SF-6Dv2 dimensions using Spearman's rank correlation coefficient (r). An absolute coefficient greater than 0.5 indicates a strong correlation, 0.35–0.49 a moderate correlation, 0.20–0.34 a weak correlation, and less than 0.20 a poor correlation [28, 60].
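Spearman's r is the Pearson correlation of the ranks, with tied values given their average rank, which matters here because dimension responses have only five or six levels and ties are very common. The Python sketch below is illustrative; the `strength` helper applies the cut-offs stated above, and resolving the small gaps between the stated bands (e.g., 0.49 to 0.50) into the lower band is an assumption.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's r as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def strength(r):
    """Label |r| with the cut-offs in the text (gaps between bands assigned downward)."""
    a = abs(r)
    if a > 0.5:
        return "strong"
    if a >= 0.35:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "poor"

# Illustrative pain responses: EQ-5D-5L pain/discomfort vs SF-6Dv2 pain levels.
eq_pain = [1, 2, 2, 3, 1, 4]
sf_pain = [1, 3, 2, 4, 1, 5]
r = spearman(eq_pain, sf_pain)
print(round(r, 3), strength(r))
```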
The mean utility value of each measure was calculated and compared across respondent characteristic groups to evaluate each measure's discriminative capacity. Independent t-tests were used for dichotomous variables (e.g., gender), and one-way analyses of variance for polytomous variables (e.g., age group and body mass index [BMI] group). Effect sizes (ES), calculated as the difference between the mean utilities of two sub-groups divided by the pooled standard deviation, were also used to quantify the discriminative capacity of the EQ-5D-5L and SF-6Dv2. For polytomous variables, effect sizes were calculated between the extreme sub-groups (e.g., between the 18–29 and ≥ 70 age sub-groups). A larger effect size indicates better discriminative ability [11, 34, 36, 42, 62]. As an extended test of validity, known-group validity was used to assess the extent to which a measure distinguishes between sub-groups that are theoretically expected to differ. Based on the published literature [34, 42, 44], we hypothesized that older, female, and obese respondents, as well as respondents with poorer self-reported health status or chronic diseases such as hypertension and diabetes, would have lower utility values.
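The effect size described above is a standardized mean difference using the pooled SD. A minimal Python sketch using the standard-library `statistics` module follows; the group data are illustrative values, not study results.

```python
from statistics import mean, stdev

def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
            / (na + nb - 2)) ** 0.5

def effect_size(a, b):
    """Difference in mean utility (a minus b) divided by the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

# Illustrative utilities for the extreme age sub-groups (not study data).
aged_18_29 = [1.0, 0.95, 0.9]
aged_70_plus = [0.8, 0.7, 0.9]
print(round(effect_size(aged_18_29, aged_70_plus), 2))
```

Computing the ES separately for EQ-5D-5L and SF-6Dv2 utilities on the same two sub-groups puts both measures on a common, unit-free scale, so their discriminative capacity can be compared directly.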
The sensitivity of the EQ-5D-5L and SF-6Dv2 in detecting differences in external and self-reported health indicators was tested using the relative efficiency (RE) statistic. RE was calculated as the ratio of the squared t-statistic from the t-test of the comparator measure (SF-6Dv2) to that of the reference measure (EQ-5D-5L) [42, 43, 46]. An RE of 1.0 indicates that the SF-6Dv2 is as efficient as the EQ-5D-5L in detecting differences in a given health indicator; an RE above 1 indicates that the SF-6Dv2 is more sensitive than the EQ-5D-5L, and an RE below 1 indicates the opposite. The receiver operating characteristic (ROC) curve was also used to evaluate the sensitivity of the two measures, as it provides a useful way to assess the performance of measures against external dichotomous indicators of health status. The area under the ROC curve (AUC) was computed to compare the discriminative power of the EQ-5D-5L and SF-6Dv2. The measure with the larger AUC is regarded as more sensitive in detecting differences; an AUC of 1.0 indicates perfect discrimination, whereas an AUC of 0.5 indicates no discriminative capacity. In the current analyses, the presence of chronic diseases (i.e., hypertension and diabetes), illness in the past 2 weeks, and hospitalization in the past 12 months served as the external health indicators. Self-reported health status was dichotomized as (1) excellent versus good, fair, or bad; (2) excellent or good versus fair or bad; and (3) excellent, good, or fair versus bad.
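Both sensitivity statistics can be sketched in a few lines of Python. The sketch below assumes Student's pooled-variance t-test for each indicator (the exact t-test variant is not stated above) and computes the AUC through its equivalence with the Mann-Whitney statistic: the probability that a respondent without the condition has a higher utility than one with it, counting ties as one half. The function names are hypothetical.

```python
from statistics import mean, stdev

def student_t(a, b):
    """Independent-samples t statistic with pooled variance (assumed test variant)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def relative_efficiency(sf_without, sf_with, eq_without, eq_with):
    """RE = t^2 for SF-6Dv2 (comparator) divided by t^2 for EQ-5D-5L (reference)."""
    return student_t(sf_without, sf_with) ** 2 / student_t(eq_without, eq_with) ** 2

def auc(scores_without, scores_with):
    """Empirical AUC: P(utility without condition > utility with it), ties count 0.5."""
    wins = 0.0
    for s_no in scores_without:
        for s_yes in scores_with:
            if s_no > s_yes:
                wins += 1
            elif s_no == s_yes:
                wins += 0.5
    return wins / (len(scores_without) * len(scores_with))

# Illustrative utilities for respondents without / with hypertension (not study data).
eq_no, eq_yes = [1.0, 0.95, 0.9, 1.0], [0.8, 0.9, 0.7, 0.95]
sf_no, sf_yes = [0.9, 0.85, 0.95, 0.8], [0.7, 0.75, 0.6, 0.85]
print(round(relative_efficiency(sf_no, sf_yes, eq_no, eq_yes), 2))
print(auc(eq_no, eq_yes), auc(sf_no, sf_yes))
```

The pairwise loop is quadratic in sample size and is meant only to make the definition explicit; for a survey of this size, a rank-based computation or a statistical package would be more practical.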
Statistical analyses were performed using Stata 15.0 (StataCorp LLC, College Station, TX, USA). All statistical tests were two-sided, with a significance level of 0.05.