Background

The global prevalence of gastro-esophageal reflux disease (GERD), defined as heartburn once daily, is estimated to range from 5 to 7% but varies widely and depends on the definition used [1]. For example, 25% of the adult population in Belgium [2], nearly 18% in Australia [3], 20% in the United States, and 9% in Canada reported GERD symptoms once a week or more [4, 5]. While heartburn is the leading symptom in GERD, the disease is associated with a broad range of esophageal problems, including acid regurgitation, epigastric pain, esophageal erosions and complications such as Barrett's esophagus, esophageal adenocarcinoma and esophageal stricture. GERD is also associated with a variety of extra-esophageal problems, including sleep disturbances, noncardiac chest pain, asthma, chronic cough and hoarseness [6].

Clinicians and health services researchers are becoming more aware of the importance of patient-reported outcomes (PROs), including health-related quality of life (HRQL), in understanding the burden of disease and the outcome of medical treatment. While physiologic measures provide information to clinicians, these outcomes are often not important to patients [7] and they correlate poorly with functional status or well-being, the areas that interest patients most. For example, the majority of patients with typical symptoms of GERD do not have endoscopic evidence of esophagitis [8]. PROs provide a better understanding of treatment outcomes from the patient's perspective by translating clinical improvement into patient-important outcomes. Moreover, HRQL assessment is important for measuring quality of care and clinical effectiveness and for informing reimbursement decisions [9, 10].

The use of validated questionnaires is appropriate for measuring PROs in clinical trials [11, 12]. There are two categories of HRQL measures: disease-specific and generic instruments [13, 14]. Disease-specific instruments describe the burden of disease and treatment outcomes in patients with a specific disease. Generic instruments measure the overall HRQL of patients, including physical, emotional, and social function, as well as their level of general performance at work and in daily life, across different diseases [15].

Utility measures, one type of generic health status measure, are based on economic and decision theory [16]. These instruments measure patient preferences and generate preference or utility scores for respondents' health states on a 0 to 1.0 scale where 0 typically equals dead and 1.0 is full health [17]. In this manuscript we will use the term "utility" for all scores generated with preference-based instruments. The standard gamble (SG) is regarded as the reference standard for utility measurement [18–20]. Another utility instrument is the feeling thermometer (FT), a visual analogue scale presented in the form of a thermometer [18]. When completing this instrument, patients choose the score on the thermometer that represents the value they place on their health state. It is far simpler than the SG and has shown good responsiveness and validity in several studies [21–24]. The Health Utilities Index Mark 3 (HUI 3) is a multi-attribute utility instrument designed to classify a patient's health status based on ratings of a set of defined items [25, 26].

The aim of this analysis was to address the impact of GERD on utility scores, HRQL and work productivity in patients with moderate to severe GERD and to evaluate the construct validity of utility instruments as measures of HRQL.

Methods

Patients

We enrolled 249 uninvestigated outpatients with a clinical diagnosis of moderate to severe GERD in 13 specialty centers and 4 general practices across Canada between March 2002 and March 2003. Table 1 lists the definitions of symptom severity. To address the aims described in the introduction, we used data from a study whose primary aim was to compare two different formats of administering the SG and the FT [27]. Therefore, the current publication describes the data from the baseline visit of the 217 patients (87%) who completed the study and the responsiveness of the QOLRAD and the four symptoms questionnaire after four to six weeks of treatment with a proton pump inhibitor (PPI), esomeprazole. The main inclusion criteria were a clinical diagnosis of GERD, a main symptom of heartburn, age greater than 18 years, symptoms for three months or longer, and no PPI use for the 2 weeks prior to questionnaire administration. GERD was defined as a burning feeling, rising from the stomach or lower part of the chest up towards the neck. We describe the detailed eligibility criteria elsewhere [27].

Table 1 Definition of symptom severity

Study design

Patients completed several utility and HRQL instruments during a clinic visit and provided demographic and clinical data. The visit lasted approximately 80 minutes. An experienced research coordinator from the methods center trained all site interviewers in HRQL instrument administration during a daylong session. Ethics review boards at all study sites approved the study protocol, and all patients signed an informed consent form prior to enrollment in the study.

Utility measures

The feeling thermometer (FT) is a visual analogue scale shown as a thermometer in which the best state is full health (equal to a score of 100) and the worst state is dead (a score of 0) [18]. It has shown good responsiveness and validity in several studies [21–24]. In this trial, we used a self-administered form of the FT [27, 28].

The SG [18–20] offers patients two options from which a choice must be made. Choice A is the certain outcome that the patient will stay in a health state (their own health state, or a marker state) for t years until death. We varied t depending on the patient's age as follows: patients aged more than 80 years, t = the rest of the patient's lifetime; age 76–80 years, t = 10 years; age 66–75 years, t = 15 years; age 56–65 years, t = 25 years; age 46–55 years, t = 30 years; age 36–45 years, t = 35 years; age 26–35 years, t = 40 years; age 18–25 years, t = 45 years. Specifying the duration of remaining life means that patients use the same time frame as other patients of the same age, and reduces the random error that might result from patients inferring different time frames. Varying the time frame by age minimizes an additional lack of realism that could arise if one chose a single time frame, so that either young patients had an unrealistically short duration of remaining life or old patients had an unrealistically long one. The alternative (choice B) is a hypothetical treatment with 2 outcomes: 1) returning to full health (probability p) for t years, at the end of which the patient dies, or 2) immediate death (probability 1-p). Interviewers used a chance board with a ping-pong approach, varying the probability p in steps of 0.05 to obtain the value, p*, at which the patient considered choice A equal to choice B. This indifference probability, p*, is the utility value for the patient's own health in choice A in the interval from dead (= 0) to full health (= 1). The greater the respondent's willingness to accept the risk of a worse outcome (e.g., death) to avoid the health state in choice A, the lower the utility of that state to them.
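The elicitation described above can be illustrated with a minimal Python sketch. The age bands and the 0.05 step come from the text; everything else is an assumption made for illustration. In particular, the order in which probabilities are offered (1.0, 0.0, 0.95, 0.05, ...), the midpoint rule used when the respondent never states exact indifference, and the names `sg_time_frame`, `elicit_indifference` and `simulated_patient` are hypothetical and do not describe the interview script actually used.

```python
def sg_time_frame(age):
    """Time frame t (years) for choice A; None means 'rest of lifetime'."""
    if age < 18:
        raise ValueError("study patients were adults")
    # (upper age of band, t in years), taken from the age bands in the text
    bands = [(25, 45), (35, 40), (45, 35), (55, 30), (65, 25), (75, 15), (80, 10)]
    for upper_age, t in bands:
        if age <= upper_age:
            return t
    return None  # older than 80 years


def elicit_indifference(prefers, steps=20):
    """Ping-pong search for p* on a grid of 1/steps (0.05 when steps=20).

    `prefers(p)` is a hypothetical callback returning "A" (certain state),
    "B" (gamble) or "equal". The offer order is an assumption; the text
    only states that a ping-pong approach was used.
    """
    high, low = steps, 0              # grid indices still to be offered
    lowest_b, highest_a = None, None  # tightest bounds seen so far
    while high >= low:
        for k in ([high, low] if high != low else [high]):
            answer = prefers(k / steps)
            if answer == "equal":
                return k / steps
            if answer == "B":
                lowest_b = k if lowest_b is None else min(lowest_b, k)
            else:
                highest_a = k if highest_a is None else max(highest_a, k)
            # Adjacent grid points with opposite answers: take the midpoint.
            if lowest_b is not None and highest_a is not None \
                    and lowest_b - highest_a == 1:
                return (lowest_b + highest_a) / (2 * steps)
        high -= 1
        low += 1
    return None  # preferences never switched


def simulated_patient(true_utility):
    """Hypothetical respondent who answers consistently with one utility."""
    def prefers(p):
        if abs(p - true_utility) < 1e-9:
            return "equal"
        return "B" if p > true_utility else "A"
    return prefers


t_years = sg_time_frame(50)                              # 46-55 years band
p_star = elicit_indifference(simulated_patient(0.80))    # exact grid point
```

Alternating between high and low probabilities is intended to avoid anchoring the respondent at one end of the scale; the sketch only reproduces that alternation, not the chance board itself.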

The HUI 3 is a 15-item self-administered questionnaire. It covers 8 attributes: vision, hearing, speech, ambulation, dexterity, emotion, cognition and pain. We calculated a utility score on a 0 to 1.0 scale where 0 represents dead and 1.0 represents full health [26]. The HUI has been shown to be a reliable, responsive and valid measure in a wide variety of clinical studies [25].

In addition we used the four symptoms questionnaire that comprises a series of four questions on which patients rate how they felt for the past week using a seven-point Likert scale ranging from no problem to very severe problem. The four symptoms questionnaire evaluates heartburn, acid reflux, belching, and stomach ache.

Disease-specific HRQL

The quality of life in reflux and dyspepsia (QOLRAD) questionnaire consists of 25 items across five dimensions: emotional distress, sleep dysfunction, vitality, food/drink problems, and physical/social functioning. Patients provide answers on a seven-point Likert-type scale. The lower the value, the more severe the impact on daily functioning. The QOLRAD is reliable, valid and responsive [29–31].

GERD-specific work productivity and activity impairment questionnaire

The GERD-specific Work Productivity and Activity Impairment questionnaire (WPAI-GERD) contains 8 items (Table 2) and uses a one-week recall period. It measures absence from work, reduced productivity while at work, and reduced productivity while doing regular daily activities other than work [32].

Table 2 GERD-specific work productivity and activity impairment questionnaire (WPAI-GERD)

Health profile measure

The Medical Outcomes Study Short Form-36 (SF-36) is a generic instrument that assesses a wide range of health problems, including GERD [34]. It consists of 8 domains including physical functioning, role limitations-physical, bodily pain, general health, vitality, social functioning, role limitations-emotional, and mental health. The SF-36 scores range from 0 to 100, with higher scores indicating better functioning and well-being.

HRQL and utilities for other health states

We compared the SF-36 scores of our study patients with those of the Canadian general population reported by Hopman et al. [33], and with those of the US general population and of patients with depression, hypertension, and diabetes extracted from the Medical Outcomes Study and reported by Revicki et al. [34–37]. In addition, we compared the utility scores of patients enrolled in this study with those of other patients reported previously, as utility ratings are comparable across conditions because they provide scores between 0 (dead) and 1 (full health) that are not disease specific [38, 39]. A priori, we determined that we would focus on patients with common diseases for whom utility assessments are available.

Statistical analysis

We report baseline instrument scores as means with 95% confidence intervals (CI). The responsiveness of the QOLRAD and the four symptoms questionnaire to treatment is shown as mean change scores with 95% CI. We also evaluated the cross-sectional construct validity of the utility instruments by calculating Pearson's correlation coefficients of the scores on the FT, SG and HUI 3 with the validation instruments (the QOLRAD, the SF-36, and the four symptoms questionnaire). We assumed that higher correlations with the validation instruments would indicate greater construct validity. For interpretation, we considered correlations of less than 0.2 as very weak, from 0.2 to 0.35 as weak, from 0.35 to 0.5 as moderate and of more than 0.5 as strong. We compared mean physical and mental component summary scores on the SF-36 of our study population with the previously reported scores of the Canadian population, the US population, and patients with clinical depression, diabetes, and hypertension using Student's t tests [33–37].
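As an illustration of these summary statistics, the following Python sketch computes a mean with a 95% CI, a Pearson correlation coefficient, and the interpretation bands stated above. It is not the analysis code used in the study. The normal approximation (z = 1.96) for the CI, the use of the absolute value of the coefficient, the assignment of the boundary values 0.35 and 0.5, and all function names and scores are assumptions made for the example.

```python
import math


def mean_ci95(values):
    """Mean and 95% CI using the normal approximation (z = 1.96, assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Map a coefficient to the bands in the text.

    Assumptions: the sign is ignored, 0.35 counts as moderate and 0.5
    counts as moderate; the text does not assign the boundary values.
    """
    size = abs(r)
    if size < 0.2:
        return "very weak"
    if size < 0.35:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


# Hypothetical scores for four patients, for illustration only.
ft_scores = [0.55, 0.60, 0.70, 0.85]
qolrad_scores = [3.1, 4.4, 4.0, 5.8]
label = correlation_strength(pearson_r(ft_scores, qolrad_scores))
```

Taking the absolute value matters for the symptom questionnaire, where higher scores indicate worse symptoms and a negative correlation with utility would be expected.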

Results

Baseline characteristics and burden of disease

Table 3 shows the baseline characteristics of the patients. The mean age was 50 years (range 20 to 82), 53% were female, 69% were employed, and the mean number of months since diagnosis was 86 (range 1 to 504). Table 4 presents mean QOLRAD scores and the mean scores for the four symptoms questionnaire along with the mean score changes after 4 weeks of PPI treatment. GERD had the greatest impact on the vitality and food/drink domains of the QOLRAD (scores of 4.3 and 3.8, respectively). The most severe symptoms were heartburn and acid reflux, with scores of 4.5 and 4.1, respectively. PPI treatment significantly improved the scores of the QOLRAD and the four symptoms questionnaire.

Table 3 Baseline characteristics of patients with moderate or severe GERD (mean age 50 years, range 20 to 82)
Table 4 QOLRAD and four symptoms questionnaire baseline scores and mean change scores after 4 weeks of PPI treatment

WPAI-GERD

A total of 153 (71%) patients were employed. The percentage of overall work impairment secondary to GERD that included absence from work plus time lost due to reduced productivity was 16% (95% CI, 12.9 to 18.8), which corresponds to 6.7 hours lost per week due to GERD symptoms. Furthermore, the reduced productivity during activities other than work in 216 patients was 21% (95% CI, 18.0 to 24.0).
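The arithmetic behind these figures can be sketched in Python. The combination rule below (hours absent plus hours at work weighted by the fraction of reduced productivity) is an assumed simplification for illustration, not the published WPAI-GERD scoring algorithm, and the function names and example hours are hypothetical. The second function simply backs out the working week implied by the reported 16% and 6.7 hours.

```python
def overall_work_impairment(hours_absent, hours_worked, productivity_loss):
    """Return (hours lost, percent impairment) for one week.

    Assumed rule: time lost = hours absent + hours worked x fraction of
    reduced productivity while at work (0 to 1).
    """
    scheduled = hours_absent + hours_worked
    if scheduled <= 0:
        raise ValueError("no scheduled working hours")
    if not 0 <= productivity_loss <= 1:
        raise ValueError("productivity_loss must be between 0 and 1")
    hours_lost = hours_absent + hours_worked * productivity_loss
    return hours_lost, 100 * hours_lost / scheduled


def implied_week_hours(hours_lost, percent_impairment):
    """Working week (hours) implied by a reported loss and percentage."""
    return hours_lost / (percent_impairment / 100)


# Hypothetical patient: 2 hours absent, 38 hours at work, 12.5% less productive.
lost, pct = overall_work_impairment(2, 38, 0.125)

# Reported figures: 16% impairment corresponding to 6.7 hours per week.
week = implied_week_hours(6.7, 16)   # roughly a 42-hour working week
```

Because the reported percentage is rounded, the implied working week of about 42 hours is only approximate.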

Construct validity of the utility instruments

The correlations of the FT and HUI 3 with the QOLRAD and SF-36 domains were moderate. However, the SG showed lower correlations than the FT and the HUI 3. The FT had the highest, albeit weak, correlation with the four symptoms questionnaire (Table 5).

Table 5 Correlation of the FT, SG and HUI 3 with the validation instruments (P-values)

Comparison of utility scores with other diseases

The systematic review of utility measures by Morimoto et al. reported SG weighted means for asthma, chronic renal failure, and angina pectoris of 0.88 (range 0.82–0.91), 0.52 (range 0.49–0.55), and 0.76 (range 0.64–0.97), respectively [38]. The systematic review by Post et al. found time trade-off (TTO) and SG utilities in survivors of minor stroke to be 0.72 (range 0.71–0.81) and 0.89 (range 0.81–0.95), respectively [39]. We previously reported baseline scores for the FT and SG in patients with moderate to severe COPD of 0.60 (SD 0.18) and 0.66 (SD 0.27), respectively [24]. In this study, the utilities obtained for the FT, SG, and HUI 3 were 0.67 (95% CI, 0.64–0.70), 0.78 (95% CI, 0.75–0.80), and 0.80 (95% CI, 0.77–0.82), respectively.

Comparison of the SF-36 scores to other diseases and the general population

The mean scores on the SF-36 ranged from 42.8 to 47.7 across the different SF-36 domains (Table 6). These scores are significantly lower than the Canadian and US general population mean scores on the physical and mental component summaries (Table 7) [33, 35]. Table 7 also compares the SF-36 scores of our patients with those of other groups of patients included in the Medical Outcomes Study and reported by Revicki et al. [35, 36, 40].

Table 6 SF-36 mean scores
Table 7 Baseline mean (standard deviation) SF-36 physical component summary (PCS) and mental component summary (MCS) scores for patients with gastroesophageal reflux disease compared with data reported by Revicki et al. [35] and Hopman et al. [33]

Discussion

We determined the impact of GERD on utility scores, HRQL and work productivity in patients with moderate to severe GERD and the cross-sectional construct validity of three utility instruments (FT, SG, and HUI 3). Although the comparisons we made are indirect, the results of this study indicate that GERD causes important reductions in HRQL and utility when compared to the Canadian and US general population and to patients with a range of other chronic conditions. Heartburn and acid reflux were the most severe symptoms reported by our patients on the four symptoms questionnaire. The QOLRAD as well as the symptom scores improved after esomeprazole treatment, although this study did not include a placebo group. In addition, our data indicate that GERD causes a considerable loss in work productivity.

The strengths of this study include the use of several utility and HRQL instruments that allow comparison to other chronic conditions. By using several quality of life instruments, we have studied GERD patients more comprehensively than previous studies [31, 32, 35]; however, our results are confined to GERD patients with moderate to severe symptoms who were participating in a clinical study. In addition, our comparison with population data is based on historical data. Despite the impressive responsiveness of the QOLRAD scores to esomeprazole treatment, one has to keep in mind that this study was not a randomized controlled trial. With regard to the evaluation of validity, another limitation of our study is that we did not generate a priori predictions regarding correlations between the utility instruments and other measures. Had we generated such a priori predictions, our conclusions about the validity of the instruments might have been stronger.

Data on utility measures in GERD patients are sparse. We observed important disutility measured with the FT, SG, and HUI 3. The results suggest that the FT and HUI 3 are valid tools for the assessment of HRQL in patients with GERD. We were specifically interested in exploring the relative validity of utility instruments in patients with GERD.

In general, the correlations of the FT and HUI 3 with other HRQL instruments were moderate. In contrast, the SG showed poor construct validity. Moreover, the FT showed better correlation with the four symptoms questionnaire than the SG and the HUI 3. Thus, our findings suggest that the FT and the HUI 3 are more appropriate indicators of HRQL impairment than the SG in these patients. Studies that use the FT, SG, HUI and SF-36 simultaneously are rare. We have previously observed a similar pattern of correlations in patients with chronic obstructive pulmonary disease (COPD) [24]: the correlations of the FT and HUI 3 with the SF-36 were higher than those of the SG with the SF-36. Thus, there is external evidence that the FT and HUI 3 show greater validity for the assessment of HRQL than the SG.

The data also indicate that the disutility in patients with moderate to severe GERD is similar to that of moderate to severe COPD [24]. The utility scores obtained with the SG are lower than what was previously reported for SG scores in patients with asthma [38] and comparable to those of minor stroke survivors [39].

We also found reductions in the SF-36 scores on all 8 domains in GERD patients. Compared to the Canadian and US general population, patients in this study had significantly reduced scores in the SF-36 MCS and PCS [33–36]. The SF-36 PCS scores are comparable to those of patients with clinical depression and hypertension. On the other hand, GERD patients have significantly worse SF-36 MCS scores than patients with diabetes mellitus and depression.

The QOLRAD results suggest that GERD has the greatest impact on the vitality and food/drink domains. These results are similar to those reported by Wiklund et al. [31] confirming the negative impact of GERD on the daily functioning of affected patients.

Health administrators and payers are interested in the magnitude of work productivity loss due to GERD [41]. Wahlqvist et al. showed that patients with GERD symptoms in a Swedish population reported 23% reduced productivity while at work and 30% reduced productivity while doing regular daily activities [32]. Our study supports the finding of impaired productivity in patients with moderate to severe GERD, but our estimates of work loss are somewhat lower, indicating that about 16% of work time is lost due to the illness. Since GERD affects approximately 9% of the Canadian population [5], impaired productivity has important economic consequences for society if the disease is not treated effectively.

Conclusion

In summary, we found that GERD has a significant impact on utility scores, HRQL and work productivity in patients with moderate to severe illness. The impact of the disease is similar in magnitude to that of other chronic conditions that are less responsive to treatment. In addition, utility instruments such as the FT and HUI 3 provide valid measurements of the impact of GERD on HRQL.