Background

Interest in health-related quality of life (HRQoL) has increased in recent decades, and the number of citations for “quality of life” in the medical literature has grown substantially. HRQoL instruments are essential for evaluating HRQoL as an outcome measure of community- or hospital-based interventions [1]. The Short Form-36 Health Survey, version 2 (SF-36 v2) is one of the most widely used generic instruments for evaluating HRQoL worldwide. The SF-12 v2 is a shorter version of the SF-36 v2 that uses only 12 questions. Because the SF-12 v2 is brief yet measures various aspects of health status, it has become the instrument of choice in population health surveys and in clinical studies that combine it with disease-specific instruments [2, 3]. Several studies have reported the validity and reliability of the SF-12 as a measure of HRQoL in a range of medical conditions, as well as in the general population [4–8]. Although the psychometric properties of the Korean SF-36 v2 have been evaluated in the general population [9, 10], a similar evaluation of the Korean SF-12 v2 has yet to be performed.

In addition, there is some evidence of cultural differences in how respondents interpret the items of HRQoL instruments [11, 12]. When an instrument developed in another country is adapted for the Korean population, its feasibility and psychometric properties should therefore be assessed before it is applied in research. Accordingly, the aim of the present study was to evaluate the psychometric properties of the Korean version of the SF-12 v2 in the general population and to report SF-12 v2 domain scores according to the general characteristics of the study population.

Methods

Study design

This study was conducted using individual face-to-face interviews. The survey was performed from August 2013 to November 2013 by 27 trained interviewers. Respondents were asked to complete the Korean version of the SF-12 v2 for HRQoL. Data on demographic factors (i.e. age, sex, level of education, and occupation) and health-related factors (i.e. current disease, outpatient visits in the past 2 weeks, and hospitalization in the past year) were also collected.

Setting and samples

Of the 3,206 households contacted for interviews, 1,000 interviews were completed (response rate, 31.2%). The target population included individuals aged 19 years or older living in Korea (excluding Jeju Island) who consented to participate in the survey. Sampling was performed using a multistage stratified quota method. Sample quotas were assigned to each of the 15 Korean regions according to the population structure (gender, 10-year age group, and level of education [12 years or less vs. more than 12 years]), as defined by the resident registration data of the Ministry of Administration and Security of South Korea in June 2013.
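Proportional quota allocation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the survey: the stratum labels and population counts are hypothetical, and the largest-remainder rounding rule is our assumption, because the source does not say how fractional quotas were rounded. The last line reproduces the response rate reported above.

```python
from __future__ import annotations

from math import floor


def allocate_quotas(population: dict[str, int], total_sample: int) -> dict[str, int]:
    """Allocate interviews to strata in proportion to their population counts.

    Fractional quotas are rounded with the largest-remainder method
    (an assumed rounding rule), so the quotas always sum to total_sample.
    """
    pop_total = sum(population.values())
    exact = {cell: total_sample * n / pop_total for cell, n in population.items()}
    quotas = {cell: floor(x) for cell, x in exact.items()}
    shortfall = total_sample - sum(quotas.values())
    # Hand the leftover interviews to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda cell: exact[cell] - quotas[cell], reverse=True)
    for cell in by_remainder[:shortfall]:
        quotas[cell] += 1
    return quotas


# Hypothetical strata (region x sex x age group x education) with made-up counts.
hypothetical_population = {
    "region A / women / 40-49 / >12 y": 5_000,
    "region A / men / 40-49 / >12 y": 3_000,
    "region B / women / 70+ / <=12 y": 2_000,
}
print(allocate_quotas(hypothetical_population, 10))

# Response rate reported in the text: completed interviews / households contacted.
print(f"Response rate: {100 * 1000 / 3206:.1f}%")  # Response rate: 31.2%
```

Largest-remainder rounding is only one reasonable choice; it guarantees that the regional quotas add up exactly to the planned sample size.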

Ethical considerations

This study was approved by the Institutional Review Board of the National Evidence-based Healthcare Collaborating Agency (approval number: NECA IRB13-002), and all participants provided written informed consent.

Measurements

Our present study used the Korean SF-12 v2. The SF-12 v2 is a multipurpose, short-form health survey comprising 12 items taken directly from the SF-36 v2. It yields eight scale scores: physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE), and mental health (MH). Four scales (PF, RP, RE, and MH) are calculated from two items each, whereas the remaining four (BP, GH, VT, and SF) are each represented by a single item [13]. Several items were reverse-coded so that higher scores indicate better health. Scale scores were transformed to a 0 to 100 range according to the scoring manual [14]. The 12 items are also used to derive two summary measures, the physical component summary (PCS) and the mental component summary (MCS) [15].
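The reverse-coding and 0 to 100 transformation described above can be sketched as a generic linear rescaling. This is a minimal illustration, not the official scoring algorithm: the item response ranges and which items are reverse-coded are assumptions supplied by the caller, the helper names are hypothetical, and the sketch does not reproduce the manual's norm-based scores or the PCS/MCS derivation [14, 15].

```python
from __future__ import annotations


def recode(value: int, low: int, high: int) -> int:
    """Reverse-code a response so that a higher value means better health."""
    return low + high - value


def scale_score(
    responses: list[int],
    item_ranges: list[tuple[int, int]],
    reverse: list[bool],
) -> float:
    """Sum a scale's items (after reverse-coding) and rescale the sum to 0-100.

    responses, item_ranges ((lowest, highest) response code per item), and
    reverse (True if the item must be reverse-coded) are aligned by item.
    """
    if not len(responses) == len(item_ranges) == len(reverse):
        raise ValueError("responses, item_ranges and reverse must have equal length")
    raw = 0
    for value, (low, high), rev in zip(responses, item_ranges, reverse):
        if not low <= value <= high:
            raise ValueError(f"response {value} outside {low}-{high}")
        raw += recode(value, low, high) if rev else value
    min_raw = sum(low for low, _ in item_ranges)
    max_raw = sum(high for _, high in item_ranges)
    # Linear transformation: lowest possible sum -> 0, highest possible sum -> 100.
    return 100 * (raw - min_raw) / (max_raw - min_raw)


# Hypothetical two-item scale coded 1-5 where only the first item is reversed.
ranges = [(1, 5), (1, 5)]
reversed_items = [True, False]
print(scale_score([1, 5], ranges, reversed_items))  # 100.0 (best possible state)
print(scale_score([3, 3], ranges, reversed_items))  # 50.0
```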

Data analysis

The SF-12 v2 was assessed according to the data quality indicators recommended by its developer [13]. These included data completeness, defined as the percentage of items with a valid response, and the percentage of responses outside the permissible range. Convergent validity was then tested to determine whether each item correlated with the component summary measure (PCS or MCS) it was hypothesized to represent; convergent validity was considered acceptable when all hypothesized item-component correlations were 0.30 or greater. The PCS was hypothesized to relate to the PF, RP, GH, and BP items, and the MCS to the MH, RE, VT, and SF items. Finally, discriminant validity was assessed to determine whether each item correlated more highly with its hypothesized component summary score than with the alternative component summary score; item discriminant validity was considered satisfactory when all hypothesized item-component correlations were significantly higher than the alternative item-component correlations. In addition, the percentages of respondents who achieved the highest score (ceiling) or the lowest score (floor) were calculated, because large ceiling and floor effects may limit the responsiveness of the SF-12 v2 [9, 13].
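The completeness, out-of-range, and ceiling/floor indicators can be computed per item roughly as follows. This sketch assumes that responses have already been recoded so that the highest code is the best health state, that a missing answer is stored as None, and that ceiling and floor percentages are taken over valid responses; the function names and example data are hypothetical.

```python
from __future__ import annotations


def item_quality(values: list[int | None], low: int, high: int) -> dict[str, float]:
    """Data quality indicators for one item with permissible codes low..high."""
    n = len(values)
    answered = [v for v in values if v is not None]
    valid = [v for v in answered if low <= v <= high]
    n_valid = len(valid)
    nan = float("nan")
    return {
        "pct_valid": 100 * n_valid / n,
        "pct_out_of_range": 100 * (len(answered) - n_valid) / n,
        # Ceiling/floor: share of valid responses at the best/worst code.
        "pct_ceiling": 100 * sum(v == high for v in valid) / n_valid if n_valid else nan,
        "pct_floor": 100 * sum(v == low for v in valid) / n_valid if n_valid else nan,
    }


def pct_at_ceiling_on_all_items(rows: list[list[int]], highs: list[int]) -> float:
    """Percentage of respondents who gave the best response to every item."""
    at_top = sum(all(v == h for v, h in zip(row, highs)) for row in rows)
    return 100 * at_top / len(rows)


# Made-up responses to one 1-5 item: one missing answer, one out-of-range code.
print(item_quality([5, 5, 4, None, 1, 7], low=1, high=5))
```

The instrument-level function corresponds to the "highest response on all items" figure reported in the Results, as distinct from the per-item ceiling effects.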

To assess construct validity, SF-12 v2 scale scores were compared across sociodemographic and health-related subgroups. Scale scores were expected to be lower in women, older persons, less educated persons, the unemployed, those with any current disease, and recent health service users [11, 16–19]. Differences in scale scores between groups were tested using Student’s t-test or analysis of variance with Tukey’s post hoc test.
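The two test statistics named above can be written out directly. The sketch below computes only the statistics and their degrees of freedom; p-values would come from the t and F distributions, and in Python `scipy.stats.ttest_ind`, `scipy.stats.f_oneway`, and `scipy.stats.tukey_hsd` provide the complete tests (the study itself used SAS). The function names and group data are hypothetical.

```python
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, df


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


# Made-up scale scores for two groups.
women = [50.0, 75.0, 100.0, 75.0]
men = [75.0, 100.0, 100.0, 100.0]
t, df = student_t(women, men)
f, dfs = one_way_anova_f(women, men)
print(f"t = {t:.3f} (df = {df}); F = {f:.3f} (df = {dfs})")
```

With exactly two groups the ANOVA F equals the square of Student's t, which is a convenient sanity check; with three or more groups (for example, age groups) the ANOVA is followed by Tukey's pairwise comparisons.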

Internal consistency reliability was analyzed with Cronbach’s alpha, and reliability was considered acceptable when alpha was ≥0.7 [20]. To test whether the Korean SF-12 v2 reproduced the hypothesized structure of the original survey, exploratory item-level factor analysis was performed using principal component analysis with varimax rotation. Factor loadings ≥0.4 were considered significant [21]. All statistical analyses were conducted using SAS (version 9.1; SAS Institute Inc., Cary, NC).
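Cronbach’s alpha follows from the item variances and the variance of the summed score: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). A minimal sketch, assuming complete data (no missing responses) and sample variances; the function name and the toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items; items[i][j] is item i's response by respondent j."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's summed score
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


ALPHA_THRESHOLD = 0.7  # acceptability criterion used in the study [20]

toy_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 3), alpha >= ALPHA_THRESHOLD)  # 0.954 True
```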

Results

The mean age of the participants was 45.0 years (standard deviation [SD], 14.3), and 50.1% were women. A total of 126 participants (12.6%) reported a current disease, and most participants were employed or self-employed (Table 1). Data completeness was 100%, and there were no out-of-range values. Descriptive statistics for the SF-12 v2 items are presented in Table 2. Ceiling effects were considerably larger for the PF, RP, BP, SF, and RE items, although only 23 participants (2.3%) gave the highest possible response to all items. Floor effects were below 2% for most items.

Table 1 General characteristics of the study respondents
Table 2 SF-12 v2 items and descriptive statistics (n = 1,000)

The Spearman correlation coefficients between the SF-12 v2 items and the component summaries are shown in Table 3. All items correlated with their hypothesized component summaries at ≥0.30; correlations between each item and its hypothesized component ranged from 0.59 to 0.78. In terms of discriminant validity, all items were more highly correlated with their hypothesized components than with the alternative components.
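The convergent and discriminant criteria can be checked item by item as sketched below. Spearman’s coefficient is computed as the Pearson correlation of average ranks, which handles the many ties in Likert-type items. This is an illustration under assumptions: the helper names and data are hypothetical, and the discriminant check only compares magnitudes, whereas the study required the hypothesized correlation to be significantly higher, which would need a formal test for dependent correlations that is not shown here.

```python
from __future__ import annotations

from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def check_item(item, own_component, other_component, threshold=0.30):
    r_own = spearman(item, own_component)
    r_other = spearman(item, other_component)
    return {
        "r_own": r_own,
        "r_other": r_other,
        "convergent": r_own >= threshold,  # hypothesized correlation >= 0.30
        "discriminant": r_own > r_other,  # magnitude comparison only, no significance test
    }


# Made-up item responses and component summary scores for five respondents.
item = [1, 2, 3, 4, 5]
own = [12, 18, 30, 41, 55]
other = [40, 35, 38, 30, 33]
print(check_item(item, own, other))
```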

Table 3 Correlations between SF-12 v2 items and component summaries (n = 1,000)

Table 4 shows the scale scores of the Korean SF-12 v2 according to sociodemographic and health-related variables; scores differed significantly across several of these groups. As expected, women scored significantly lower than men on all scales except SF and RE. In post hoc Tukey comparisons, the oldest age group (≥70 years) scored significantly lower than the other age groups on most scales other than MH. More highly educated participants tended to report higher scores than less educated participants on all scales. Participants with a current disease and those who had recently used hospital services had significantly lower scores than other participants on most scales. Scale scores according to gender and age group are presented in Table 5.

Table 4 SF-12 v2 scale scores according to general characteristics and health-related factors (n = 1,000)
Table 5 Scale scores and component summary scores according to gender and age group (n = 1,000)

Internal consistency reliabilities were 0.84, 0.83, and 0.85 for the PF, RP, and RE scales, respectively, whereas the reliability of the MH scale was 0.37. Cronbach’s alpha across all 12 SF-12 v2 items was 0.88. Cronbach’s alpha was 0.83 for the PF, RP, GH, and BP items and 0.79 for the MH, RE, VT, and SF items. Item factor analysis identified three factors that together accounted for 65.1% of the variance; the results are presented in Table 6. The PF, BP, and GH items loaded onto the physical health concept (factor 1), the VT, MH, and GH items loaded onto the psychological health concept (factor 3), and the SF, RP, and RE items loaded onto factor 2.
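The extraction and rotation steps can be sketched with NumPy. This is a minimal illustration, not the SAS procedure used in the study: loadings come from an eigendecomposition of the item correlation matrix, the varimax routine maximizes Kaiser’s varimax criterion with a standard SVD-based iteration, and retaining components with eigenvalues greater than 1 when no count is given is our assumption, because the source does not state its retention rule. The example uses simulated data, not the study’s responses.

```python
from __future__ import annotations

import numpy as np


def pca_loadings(data: np.ndarray, n_factors: int | None = None):
    """Unrotated principal-component loadings and the share of variance they explain."""
    corr = np.corrcoef(data, rowvar=False)  # item-by-item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_factors is None:
        n_factors = int(np.sum(eigvals > 1.0))  # Kaiser rule: an assumption here
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = float(eigvals[:n_factors].sum() / eigvals.sum())
    return loadings, explained


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation via the standard SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion < previous * (1 + tol):
            break
    return loadings @ rotation


# Simulated responses: items 0-2 driven by one latent trait, items 3-5 by another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
items = np.repeat(latent, 3, axis=1) + rng.normal(scale=0.7, size=(500, 6))

loadings, explained = pca_loadings(items, n_factors=2)
rotated = varimax(loadings)
salient = np.abs(rotated) >= 0.4  # loading threshold used in the study [21]
print(f"variance explained: {explained:.1%}")
print(np.round(rotated, 2))
```

Because varimax is an orthogonal rotation, each item's communality (row sum of squared loadings) and the total variance explained are unchanged; only the distribution of loadings across factors becomes easier to interpret.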

Table 6 Factor loadings of the SF-12 v2 items after varimax rotation

Discussion

Quality of life is a critical component of healthcare, and many HRQoL outcome measures have been used in clinical and health economics research. Before an HRQoL instrument is applied, evidence on its psychometric properties should be considered. Our study assessed the data quality and psychometric properties of the Korean version of the SF-12 v2 in a general population sample. There were no missing data, and the quality criteria recommended by the developer of the instrument were satisfied. All correlations between items and their hypothesized components exceeded 0.3, and every item correlated more highly with its own hypothesized component than with the competing component. Item scores in our sample were generally higher than those reported in other countries, suggesting that Korean respondents tend to rate their health more favorably. Differences in SF-12 v2 scale scores by sex, age, educational level, health status, and use of health services provide evidence of construct validity.

The psychometric properties of the SF-12 have been demonstrated in the general populations of various countries, including the USA [4, 22], Israel [5], Sweden [7], Greece [8], and Hong Kong [19]. The psychometric properties of the SF-12 v2 have also been reported in an American population [6] and in Chinese adolescents [23]. In terms of convergent and discriminant validity, previous publications found that all hypothesized item-component correlations were 0.30 or greater and were significantly higher than the alternative item-component correlations [17]; however, in the study by Jakobsson et al., item-component correlations argued against the suggested structure in a general elderly population (aged 75+) [7]. Scale and component scores were lower in older persons, less educated persons, the unemployed, those with any disease, and recent health service users [8, 11, 16–19]. Cheak-Zamora et al. reported high test–retest reliability for the PCS (intraclass correlation coefficient [ICC] = .78) and moderate reliability for the MCS (ICC = .60) [6]. In some countries, factor analysis yielded two factors, with the items loading onto their hypothesized factors [4, 8, 17]. However, the study performed in Israel revealed three factors, with role-physical loading as a separate factor [5], and the results of Jakobsson et al. failed to support a two-dimensional item structure in the elderly population [7].

This study demonstrated the psychometric properties of the Korean version of the SF-12 v2. The vitality (“a lot of energy”) and MH (“calm and peaceful” and “downhearted and blue”) items scored lower in the Korean population than in Greek and Iranian studies [8, 17]. Our data showed higher ceiling effects than these studies, but our results were similar to those of a previous study in Chinese adolescents [23]. Although the RE and RP items were changed from two response levels in version 1 to five levels in version 2, their ceiling effects remained high, ranging from 70.1% to 82.3%; their floor effects, however, were smaller than those in previous studies [5, 8, 17]. Internal consistency reliability was >0.7 for the PF, RP, and RE scales, but that of the MH scale was low at 0.37, comparable to the value of 0.34 reported in a Chinese study [23]. Korean respondents may answer the two MH items (“calm and peaceful” and “downhearted and depressed”) largely independently of each other; these two items loaded onto different factors in a previous study of the Korean SF-36 [9]. Factor analysis of individual items only partially matched the items to their hypothesized components: the items separated into three factors, with the SF, RE, and RP items aggregating onto a single factor. This pattern may be specific to the Korean population, as the RE and RP items also loaded onto the same factor in the Korean SF-36 v2 [9]. Using item or scale scores, rather than the two summary measures of the SF-12 v2, therefore appears more appropriate in the Korean population.
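Why a two-item scale such as MH can show low alpha follows from the alpha formula with k = 2. Assuming, for illustration only, that the two items have equal variances σ² and inter-item correlation r (an assumption not checked against the study data):

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right),
\quad k = 2,\; \sigma_X^2 = \sigma_1^2 + \sigma_2^2 + 2\sigma_{12}
\;\Longrightarrow\;
\alpha = \frac{4\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + 2\sigma_{12}}
$$

$$
\sigma_1^2 = \sigma_2^2 = \sigma^2,\; \sigma_{12} = r\sigma^2
\;\Longrightarrow\;
\alpha = \frac{2r}{1+r},
\qquad
r = \frac{\alpha}{2-\alpha} = \frac{0.37}{1.63} \approx 0.23
$$

Under this assumption, the observed alpha of 0.37 would correspond to an inter-item correlation of only about 0.23, which is consistent with the two MH items behaving largely independently.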

Our study has some limitations. First, although respondents were recruited nationwide, the external validity of the findings may be limited. The age and sex distributions of our sample were similar to those reported in the 2010 national census, but participants in this study reported lower health care utilization than participants in the fifth Korea National Health and Nutrition Examination Survey (KNHANES), a nationwide health survey of more than 30,000 people. This lower utilization may indicate that our sample was healthier than the general Korean population, and a healthier sample would be expected to produce higher item scores and smaller floor effects. Second, we did not examine face validity, concurrent validity, test–retest reliability, or responsiveness to changes in health status; further research on these psychometric properties of the SF-12 v2 is needed.

Conclusions

The Korean SF-12 v2 appears to be a feasible, valid, and reliable instrument for measuring HRQoL in the general population. The use of scale scores instead of component summaries seems more appropriate in the Korean population. Further research on other psychometric properties of the Korean SF-12 v2 is desirable.