Background

The importance of evaluating the outcomes of health care from the standpoint of the patient is now widely recognized. Measures of health-related quality of life (HRQOL) have been used to track outcomes for many eye diseases [1–6]. HRQOL refers to health status in the physical, mental, and social domains, and to the effect of a disease, its symptoms, and treatments on patients' lives. Conventional clinical measures such as visual acuity and visual field assessments do not fully capture the influence of visual disability on daily visual functioning and on abilities to perform activities of daily living that are valued by patients.

In response to a need for a vision-targeted measure of quality of life, the National Eye Institute (NEI) funded the development of such an instrument in the mid-1990s. The resulting 51-item questionnaire is known as the National Eye Institute Visual Function Questionnaire (NEI VFQ) [7, 8]. To lessen the burden on respondents and to improve data quality, a shorter version was developed: the NEI VFQ-25 [9]. The NEI VFQ-25 has 25 items that measure vision-targeted HRQOL and are grouped into 12 subscales: general health (GH, 1 item); general vision (GV, 1 item); ocular pain (OP, 2 items); difficulty with near-vision activities (NV, 3 items); difficulty with distance-vision activities (DV, 3 items); limitation of social functioning due to vision (SF, 2 items); mental health problems due to vision (MH, 4 items); role limitations due to vision (RL, 2 items); dependency on others due to vision (DP, 3 items); driving difficulties (DR, 2 items); difficulty with color vision (CV, 1 item); and difficulty with peripheral vision (PV, 1 item). Each subscale score is converted to a score between 0 and 100, and higher scores indicate better vision-specific HRQOL. The composite VFQ-25 score is the mean score of all items except for the general health item. The VFQ-25 has adequate reliability and validity, and subscale scores from the shorter form correlate highly with scores on the original long version. This questionnaire has been translated into Italian, French, Spanish, and German, and validated [10–13], and it has been widely used to describe the HRQOL of patients with ocular disease and to assess the treatment of ocular disease [14–20].
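The scoring steps just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the official scoring algorithm: each item is assumed to have been recoded already to a 0-100 value (higher is better), a subscale score is the mean of its non-missing items, and the composite is computed as the unweighted mean of the available subscale scores other than general health, in line with the subscale-based composites reported in the Results. Averaging at the item level instead would weight multi-item subscales more heavily. The item keys and function names are hypothetical.

```python
from statistics import mean

# Hypothetical item-to-subscale map (a subset, for illustration only).
SUBSCALES = {
    "GH": ["gh1"],
    "GV": ["gv1"],
    "OP": ["op1", "op2"],
    "NV": ["nv1", "nv2", "nv3"],
}


def subscale_scores(items, subscales=SUBSCALES):
    """Average the non-missing 0-100 item scores within each subscale."""
    scores = {}
    for name, keys in subscales.items():
        answered = [items[k] for k in keys if items.get(k) is not None]
        scores[name] = mean(answered) if answered else None
    return scores


def composite_score(scores, exclude=("GH",)):
    """Mean of the available subscale scores, leaving out general health."""
    used = [v for k, v in scores.items() if k not in exclude and v is not None]
    return mean(used) if used else None


# Illustrative respondent (not study data); None marks a missing answer.
respondent = {"gh1": 50, "gv1": 60, "op1": 75, "op2": 100,
              "nv1": 50, "nv2": None, "nv3": 100}
scores = subscale_scores(respondent)
composite = composite_score(scores)
```

A subscale with no answered items is returned as None and is simply left out of the composite.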

We developed a Japanese version of the NEI VFQ-25 (Appendix [see additional file 1]) and evaluated its psychometric characteristics. We investigated three points in particular. First, we looked at each question item in the Japanese version quantitatively and qualitatively, taking into consideration Japanese lifestyles, and made the necessary adaptations. Second, although composite NEI VFQ-25 scores can be computed, there is no published evidence of this scale's uni-dimensionality. Therefore, on the basis of the Japanese version's factor structure and other psychometric characteristics, we propose a particular combination of subscales that can be used to compute an appropriate composite score. Third, research on the responsiveness of the NEI VFQ-25 is limited [4, 21], so we quantified its responsiveness, using data obtained before and after cataract surgery.

Methods

Development of the Japanese version

One of us (CMM) was a developer of the original NEI VFQ-25. The Japanese version was developed in conformance with standard methods that have been adopted internationally [22], including forward translation, back-translation, examination of the translation quality and adjudication by bilingual speakers, and a pilot test on 15 persons. One item was changed to make it more appropriate to Japanese lifestyle and culture (details below). The content of the translated questionnaire was reviewed by one of the original developers of the English version, and the Japanese version was considered appropriate for administration in a psychometric field test.

Study design and population

Two groups of patients were studied. The first group was a convenience sample of 276 outpatients who visited the departments of ophthalmology at 5 hospitals. To participate, patients had to be 21 years of age or older, had to have clinical evidence of age-related cataracts, glaucoma, or age-related macular degeneration (ARMD), and had to have been seen at least twice in the past 3 months at the participating hospital. For patients with cataracts, the inclusion criteria were having cataracts in both eyes and 20/30 or worse visual acuity in the better eye. Inclusion criteria for patients with glaucoma were binocular primary open-angle glaucoma, binocular abnormalities as measured with a Humphrey field analyzer, defects in the optic nerve, at least one documented instance (in each eye) of intraocular pressure greater than 21 mmHg, and no incisional surgery for treatment of glaucoma during the previous 3 months. For patients with ARMD, there were three inclusion criteria: having at least one of the following 5 conditions: abnormal retinal pigmented epithelium, sub-retinal neovascular membrane, disciform scar, previous laser treatment to the macula, or geographic atrophy involving the fovea; having small drusen in other areas; and binocular involvement. Also included in Sample 1 was a reference group of patients with refractive error only and hospital employees.

The second sample consisted of 110 patients who had been recruited from 6 different departments of ophthalmology and were scheduled for bilateral cataract surgery (phacoemulsification and implantation of foldable intraocular lenses). Inclusion criteria for these patients were bilateral cataracts and preoperative corrected visual acuity of 20/30 or better in both eyes.

Attending physicians explained the research and ethical considerations to the participants, who then indicated their understanding by signing an informed-consent form. This study was done in accord with the Declaration of Helsinki.

Data collection

All surveys were administered by a trained interviewer. The interviewers had no direct involvement in the medical care of the patients. The interviews included the Japanese version of the NEI VFQ-25, 14 optional items about aspects of vision-specific HRQOL (which were not presented to patients who underwent cataract surgery), and the SF-36 to measure general HRQOL [23, 24].

The attending physician recorded, on a structured form, the type of eye disease, duration of disease, uncorrected vision, maximally refracted vision, vision with habitual correction, and ocular pressure. In addition, severity of age-related cataracts was graded with the Lens Opacities Classification System (LOCS) III (slit lamp, standard testing conditions [25]), and in participants with glaucoma visual field was assessed with a Humphrey field analyzer 30-2. In patients with ARMD, the type of ARMD and the size and location of absolute scotoma were recorded. The data were managed by ID number, and were analyzed in a way that maintained the participants' privacy.

Statistical analysis

All statistical analyses were done with SPSS version 12 for Windows (SPSS Inc, Chicago, IL).

Descriptive analysis and item analysis

The item analysis was done using the data from the multi-condition group (Sample 1). The percentage of missing values was examined for each item. We also examined whether each item's distribution of responses was strongly skewed (large ceiling effect or floor effect).
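The three quantities examined for each item can be computed as in the following sketch. It assumes that missing answers are coded as None and that floor and ceiling percentages are taken over answered responses only; the denominator actually used in the study is not stated, so this is an assumption. The function name and the example data are hypothetical.

```python
def item_summary(responses, lowest, highest):
    """Return (% missing, % at floor, % at ceiling) for one item.

    responses: non-empty list of scores, with None for a missing answer.
    Floor and ceiling percentages use answered responses as the
    denominator (an assumption made for this sketch).
    """
    n = len(responses)
    answered = [r for r in responses if r is not None]
    pct_missing = 100.0 * (n - len(answered)) / n
    if not answered:
        return pct_missing, None, None
    pct_floor = 100.0 * sum(r == lowest for r in answered) / len(answered)
    pct_ceiling = 100.0 * sum(r == highest for r in answered) / len(answered)
    return pct_missing, pct_floor, pct_ceiling


# Illustrative responses on a 0-100 scale (not study data).
item = [100, 100, 75, None, 0, 50, None, 100, 25, 100]
missing, floor, ceiling = item_summary(item, lowest=0, highest=100)
```

An item with a high missing percentage, or with most responses piled at one end of the scale, carries little information for distinguishing among respondents.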

Reliability

Cross-sectional data from the multi-condition group (Sample 1) were used to quantify reliability. Cronbach's alpha coefficient [26] was used as the index of internal consistency for each subscale. To quantify test-retest reliability, intraclass correlation coefficients [27] were used. The test-retest data were obtained from clinically stable patients with age-related cataracts, in surveys done 2 weeks apart.
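For reference, Cronbach's alpha for a k-item subscale is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula for complete cases only; how missing responses were handled in the study is not stated, and the function name and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: one list of item scores per respondent (complete cases only).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Illustrative three-item subscale, five respondents (not study data).
rows = [[100, 75, 100], [50, 50, 75], [25, 50, 25], [75, 75, 100], [0, 25, 25]]
alpha = cronbach_alpha(rows)
```

Single-item subscales have no internal-consistency estimate, which is why the function rejects k < 2.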

Validity

The use of multi-trait analysis to evaluate convergent and discriminant validity has been described previously in detail [28]. What follows is a brief summary of the method: Each item is hypothesized to belong to only one multi-item subscale. For each item, correlations between the score on that item and the scores on all the subscales are computed. Then, for each item, if the correlation between the score on that item and the score on the subscale to which that item belongs is 0.4 or higher, that item is said to have "passed" the test of convergent validity. Also for each item, if the correlation between the score on that item and the score on the subscale to which that item belongs is greater than the correlations between the score on that item and the scores on all the subscales to which that item does not belong, then that item is said to have "passed" the test of discriminant validity [29].
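The two pass/fail rules summarized above can be expressed as a short sketch. It correlates each item with the full subscale score, including the item itself; whether a correction for this overlap was applied in the study is not stated, so that choice is an assumption. Subscale scores are taken as item means, and all names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def multitrait(data, scales, threshold=0.4):
    """data: {item: [scores]}; scales: {scale: [items]}.

    Returns {item: (convergent_pass, discriminant_pass)}.
    """
    n = len(next(iter(data.values())))
    scale_scores = {
        s: [sum(data[i][r] for i in items) / len(items) for r in range(n)]
        for s, items in scales.items()
    }
    results = {}
    for own, items in scales.items():
        for item in items:
            r_own = pearson(data[item], scale_scores[own])
            r_other = [pearson(data[item], scale_scores[s])
                       for s in scales if s != own]
            results[item] = (r_own >= threshold,
                             all(r_own > r for r in r_other))
    return results


# Illustrative data: two subscales with two items each (not study data).
data = {
    "a1": [1, 2, 3, 4, 5, 6], "a2": [2, 1, 4, 3, 6, 5],
    "b1": [1, 3, 2, 1, 3, 2], "b2": [2, 3, 1, 1, 3, 2],
}
scales = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
results = multitrait(data, scales)
```

The percentage of items whose second flag is true corresponds to the discriminant-validity success rate reported for each subscale.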

To assess concurrent validity, we computed correlations between scores on the NEI VFQ-25 and on the SF-36 subscales. We hypothesized that the NEI VFQ-25 "mental health", "social functioning", "role difficulties" and "dependency" scores would be associated more strongly with the SF-36 subscale scores that measured similar domains.

The subscale scores of participants with poor visual acuity were compared to those of participants with better visual acuity. Also, by analysis of variance, the subscale scores were compared among those with age-related cataracts, ARMD, and the reference group. In addition, scores on the peripheral-vision subscale in the patients with glaucoma were compared to those in the reference group. We also computed the correlations between subscale scores and visual acuity with habitual correction in the better and worse eye and deficits in visual fields as measured by the Humphrey Field Analyzer 30-2 in the better and worse eye.

Finally, we used factor analysis to assess the uni-dimensionality of the scale, in preparation for computing a composite score. Factor analysis was done using 10 subscales ('General Health' and 'Driving' were not included), with the maximum-likelihood solution and promax rotation. The 'Driving' subscale was not included because more than 60% of the responses on this subscale were missing.

Responsiveness

Responsiveness was studied using data from the reference group and from the patients who completed the survey before and 2 months after cataract surgery. Differences related to cataract surgery were analyzed with Student's t-test for paired data, and with the responsiveness statistic of Guyatt [30]. The responsiveness statistic is the ratio of the clinically important difference (sometimes denoted by the Greek letter delta in sample-size calculations) to the variability in stable subjects (the square root of twice the mean square error).
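The ratio just described can be sketched as follows. Two assumptions are made here and are not taken from the study: the observed mean change in the treated group stands in for the clinically important difference, and the stable subjects are assessed exactly twice. With two assessments, the repeated-measures error mean square equals half the variance of the stable subjects' change scores, so the square root of twice the mean square error is simply the standard deviation of those changes. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def guyatt_responsiveness(treated_changes, stable_changes):
    """Mean change in treated patients divided by sqrt(2 * MSE).

    MSE is the error mean square in stable subjects; with two
    assessments per subject it is half the variance of their changes.
    """
    mse = stdev(stable_changes) ** 2 / 2
    return mean(treated_changes) / (2 * mse) ** 0.5


# Illustrative change scores (follow-up minus baseline), not study data.
treated = [20, 25, 15, 30, 10]
stable = [2, -3, 1, 0, -2, 2]
index = guyatt_responsiveness(treated, stable)
```

A larger index means that the change after treatment is large relative to the background fluctuation seen in subjects whose condition has not changed.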

Results

Translation and pilot test

On the basis of the translations and discussions among the developers, one item was changed to conform better to Japanese norms. In the item "Because of your eyesight, how much difficulty do you have visiting with people in their homes, at parties, or in restaurants?", "visit at parties" was changed to "going to gatherings". Also, when a pilot test was done in 5 subjects without eye disease and 10 subjects with eye conditions, we found no expression equivalent to 'not applicable'. Therefore, each item was rewritten so that it had a stem, in which the participants were asked whether they did the activity. If they indicated that they did the activity, then they were asked about the degree of difficulty in doing it. If they indicated that they did not do the activity, then they were asked whether this was due to vision problems. All such changes were discussed with, and approved by, one of the original NEI VFQ developers (CMM).

Subjects

Sample 1 had 276 participants and Sample 2 had 110. All those in Sample 1 were included in the analytic sample. In Sample 2, 4 patients did not answer the questionnaire and 11 did not respond after cataract surgery; thus, 95 patients were in the analytic sample for this group. The characteristics of the participants are shown in Table 1.

Table 1 Sociodemographic and clinical characteristics of the two samples

Of the patients with ARMD, 7 had only dry changes in both eyes, 8 had exudative changes in one eye, 56 had exudative changes in both eyes, and the status of 9 patients was unknown. In the patients with glaucoma, the mean dB threshold values on the Humphrey 30-2 threshold perimetry test were -12.8 for the right eye and -12.9 for the left eye. In the cataract patients of Sample 1, the mean values measured by LOCS III in the better eye were 2.04 for nuclear color, 2.07 for nuclear opalescence, 2.42 for cortical opacity, and 1.84 for posterior subcapsular opacity. The mean values in Sample 2 patients were 2.76, 2.78, 3.31, and 2.18, respectively.

Item analysis

Percentages of missing values for each item and proportions of responses at the floor (the lowest possible score) and ceiling (the highest possible score) are shown in Table 2. 'Finding objects on crowded shelf', which was included in the 'Near Vision' subscale, was not endorsed by 28% of the respondents, while 'going out to movies/plays', which was included in the 'Distance Vision' subscale, was not endorsed by 32% of the sample. Three items each from the 'Near Vision' and 'Distance Vision' subscales in the optional item pool were included in the questionnaire (NV: reading small print, reading mail/bills, shaving/styling hair; DV: recognizing faces in room, participating in sports, seeing television). Subsequently, items with low rates of missing data were substituted for those with high rates, as long as the percentage of responses at the ceiling or floor did not exceed 50%. The result was that 'reading small print' was selected for the 'Near Vision' subscale and 'seeing television program' was selected for the 'Distance Vision' subscale.

Table 2 Results of item analysis. Number and percentage of missing data and of responses at the floor and ceiling (n = 276)

More than 60% of the answers were missing for the 'Driving' subscale, which was much higher than the 16% and 31% obtained from surveys done in the United States.

Reliability

Cronbach's alpha (the index of internal consistency reliability) was 0.7 or higher for almost all of the subscales. It was lower for the 'Ocular Pain' and 'Driving' subscales. With regard to test-retest reliability, the intraclass correlation coefficient was 0.7 or higher for all of the subscales except 'General Health', 'General Vision', and 'Peripheral Vision' (Table 3). These values are considered to indicate adequate reliability for group-level comparisons [31]. Substitution of items in the 'Near Vision' and 'Distance Vision' subscales (described above) did not affect the reliability of those subscales.

Table 3 Internal consistency and test-retest reliability of NEI VFQ-25 subscales

Validity

All items passed the test of convergent validity, and 80% passed the test of discriminant validity. The success rates for the 'Near Vision' and 'Distance Vision' subscales were higher after item substitution than before (Table 3).

For concurrent validity, there were high correlations between scores on the NEI VFQ-25 subscales and similar domains of the SF-36 (Table 4). For example, the highest correlations were with the "Vitality" and "Mental Health" subscales, followed by the "Role Physical" and "Role Emotional" subscales. Correlations with the "Bodily Pain" and "Physical Functioning" subscales were low.

Table 4 Correlation of NEI-VFQ 25 subscales and the SF-36

The mean scores and the standard errors after adjustment for sex, age, and number of comorbid conditions are shown in Table 5. All scores were lower for those patients with age-related cataracts than for those in the reference group, with the exception of the 'Peripheral Vision', 'Color Vision', 'Ocular Pain', and 'Dependency' subscales. In addition, the subscale scores were significantly lower for those with ARMD than for those in the reference group, with the exception of the 'Peripheral Vision', 'Color Vision', and 'Ocular Pain' subscales. The item substitution described above resulted in slightly lower scores on the 'Near Vision' subscale and slightly higher scores on the 'Distance Vision' subscale. We also compared the proportion of variance explained by a model based on medical condition with that explained by a model based on visual acuity (Table 5). The two models were similarly associated with the NEI VFQ scores.

Table 5 NEI VFQ-25 subscale scores and composite score, by condition* and the comparison of R2 between medical condition model and visual acuity model

Visual acuity in the better eye (logMAR, the logarithm of the minimum angle of resolution) was strongly correlated with subscales that are influenced by the ability to use central vision: 'General vision', 'Near Vision', and 'Distance Vision' (Table 6). As would be expected, the logMAR was only weakly correlated with the subscales that are less dependent on the quality of central vision: 'General Health', 'Peripheral vision', 'Ocular Pain', and 'Color Vision'. In patients with glaucoma, visual field deficits were strongly correlated with scores on three subscales: 'Distance Vision', 'Driving', and 'Peripheral Vision' (Table 6). These correlations are similar to those observed between clinical measures and NEI VFQ scores in the NEI psychometric field test [9].
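Because acuity is reported elsewhere in this paper as Snellen fractions, it may help to show how such a fraction maps to logMAR. The minimum angle of resolution is the reciprocal of the Snellen fraction, so logMAR is the base-10 logarithm of the denominator divided by the numerator; higher logMAR values therefore correspond to poorer acuity. The function name below is hypothetical.

```python
from math import log10


def snellen_to_logmar(numerator, denominator):
    """Convert a Snellen fraction (e.g. 20/50) to logMAR.

    MAR = denominator / numerator, and logMAR = log10(MAR).
    """
    return log10(denominator / numerator)


normal = snellen_to_logmar(20, 20)     # 20/20 corresponds to logMAR 0
moderate = snellen_to_logmar(20, 50)   # roughly 0.40
poor = snellen_to_logmar(20, 200)      # 20/200 corresponds to logMAR 1
```

On this scale, an improvement from 20/200 to 20/50 is a decrease of about 0.6 logMAR units.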

Table 6 Pearson correlations of NEI VFQ-25 subscale scores with visual acuity and visual field

The results of factor analysis done with 10 subscales ('General Health' and 'Driving' were excluded) are shown in Table 7. Two factors were extracted. The 'Peripheral Vision', 'Ocular Pain', and 'Color Vision' subscales were included in the second factor. The correlation between the two factors was 0.47. The factor analysis done with 22 items (1 item on 'General Health' and 2 items on 'Driving' were excluded) yielded a structure similar to that of the scale-level analysis.

Table 7 Results of factor analysis on 10 subscales of VFQ-25 ('General Health' and 'Driving' were excluded): factor loadings after promax rotation

Responsiveness

The mean visual acuity of the patients with cataracts was 20/200 in the better eye before surgery, and it had improved to 20/50 after surgery. In the reference group, scores were stable over two months. In the cataract surgery group, surgery was associated with significant increases in the composite score and in 8 subscale scores: 'General Vision', 'Near Vision', 'Distance Vision', 'Ocular Pain', 'Social Functioning', 'Mental Health', 'Role Limitation', and 'Dependency' (Figure 1). Guyatt's index of responsiveness for those subscale scores ranged from 1.91 to 7.35. Even the lower limit of that range would be considered to be extremely high [32]. The only exception was the 'General Health' subscale, which would not have been expected to be strongly influenced by cataract surgery.

Figure 1

Adjusted change in NEI VFQ-25 scores in the cataract-surgery group and the reference group. Change score adjusted for sex and age.

On the basis of the results of the factor analysis, we computed 3 different composite scores: composite 11 (all VFQ-25 subscales except 'General Health'), composite 10 (all VFQ-25 subscales except 'General Health' and 'Driving'), and composite 7 (only those 7 subscales of the VFQ-25 that loaded heavily on Factor 1, as indicated in Table 7). The responsiveness indexes of these three composite scores were 7.18, 8.03, and 8.86, respectively, all of which are acceptable from a psychometric perspective.

Discussion

We developed a Japanese version of the NEI VFQ-25, and documented its psychometric characteristics in patients with various chronic eye conditions. Overall, we found that the Japanese version can provide data that are reliable, valid, and responsive to change in visual function.

In developing the Japanese version, a few changes to the content of the questionnaire were needed. For some of the items in the 'Near Vision' and 'Distance Vision' subscales, we found that the rates of missing data in the Japanese version were much higher than in the original English version. To minimize the rates of missing data and thereby to increase the measurement precision, we propose substituting items that are appropriate for patients in Japan. Specifically, instead of 'finding objects on crowded shelf', 'reading small print' can be used in the 'Near vision' subscale; and instead of 'going out to movies/plays', 'seeing television program' can be used in the 'Distance vision' subscale (both are from the pool of optional NEI VFQ items). Rates of missing data were much lower after those substitutions than before. The 'Near Vision' score was slightly higher and the 'Distance Vision' score was slightly lower, but their reliability and validity were virtually unchanged.

The 'Driving' subscale also had a high rate of missing data. We suggest that in Japan the 'Driving' subscale should be optional.

Composite scores can be useful summaries of visual function, particularly when the content of such a score is based on the results of factor analysis. In this study, factor analysis indicated that most of the subscales that are influenced by central vision correlated strongly with the first factor, while the 'Ocular Pain', 'Peripheral Vision', and 'Color Vision' subscales correlated strongly with the second factor. Therefore, if only one composite score is to be computed, that score should not include the 'Ocular Pain', 'Peripheral Vision', or 'Color Vision' subscales. Nonetheless, for studies of interventions involving small numbers of subjects we suggest using the 7-subscale composite score, given the caveat that it would not reflect problems with color vision, peripheral vision, or ocular pain. Furthermore, we suggest using the 10-subscale composite score when evaluating patients who have ocular pain or a disorder involving color vision.

Until now, few reports [4, 21] have been available on the responsiveness of the NEI VFQ-25. Almost no changes were observed in the VFQ-25 scores over 2 months in the reference group. In contrast, in the patients who underwent cataract surgery, many subscale scores increased by about 20 points. These increases occurred not only in the scores on subscales related directly to vision ('General Vision', 'Near Vision', and 'Distance Vision'), but also in the scores on subscales that are less vision-specific ('Mental Health', 'Dependency', 'Social Functioning', and 'Role Limitation'). Scores on the 'General Health', 'Color Vision', and 'Peripheral Vision' subscales did not change with cataract surgery. These results show that with the Japanese version of the NEI VFQ-25, one can easily detect clinically important changes such as those resulting from cataract surgery.

Interpretation of these results is limited in at least four ways. First, this study did not include patients with diabetic retinopathy, low vision, or a large number of other eye conditions. Thus, whether these findings are applicable to patients with diseases other than cataracts, glaucoma, and ARMD remains to be studied. Second, we used a convenience sample of persons with these conditions, and they may not represent the full clinical spectrum of each disease. Third, it is unclear whether the mode of administration (self-administered or interviewer-administered) would have important effects on the results. However, we obtained the present data with trained interviewers, and we note that the findings are similar to those obtained in a field survey with the original English version, even though the questionnaire in that survey was self-administered. Fourth, the responsiveness results were obtained without the aforementioned substitutions in the 'Near Vision' and 'Distance Vision' subscales. Thus, the responsiveness of those two subscales should be examined again, after the recommended substitutions.

Conclusion

In conclusion, psychometric testing indicates that data obtained with the Japanese version of the NEI VFQ-25 are sufficiently reliable, valid, and responsive for group-level comparisons. For reasons described in detail above, we suggest that a few items be substituted and that a few be removed from the composite score. Using this scale in vision-related clinical research in Japan should facilitate evaluations of clinical care and outcomes from the standpoint of the patient.