Background

Heartburn is the primary symptom of gastro-esophageal reflux disease (GERD). The diagnosis relies solely on patients' subjective symptom evaluation unless an endoscopy is performed [1]. Heartburn affects several aspects of patients' lives [2, 3], such as their ability to have a good night's sleep and to eat and drink whatever they want [4]. Hence, the assessment of how upper gastrointestinal symptoms impact patients' health-related quality of life (HRQL) provides important information about the patient's health status and how patients perceive the treatment regime [5]. This information is of interest to clinicians, enabling them to better understand how to tailor treatment to the individual patient's needs. However, in order to be a viable measure of treatment outcome in clinical trials, Patient-Reported Outcomes (PRO) instruments must be extensively documented both to meet scientific standards [6] and to satisfy regulatory criteria, particularly from the perspective of claims for labeling and promotion [7]. The regulatory criterion is twofold: linguistic, cross-cultural adaptation and psychometric documentation. The translation and cross-cultural adaptation of the German version of the QOLRAD were carried out according to the latest guidelines [8]. Its psychometric properties remain to be documented. Thus, the aim of this study was to document the reliability and validity of the German translation of the GSRS and QOLRAD in patients with GERD.

Methods

Patients

Consecutive patients with current or previously verified (no definite time frame was specified) predominant symptoms of heartburn were screened in both general practices and gastroenterology clinics. Heartburn was defined as a 'burning feeling rising from the stomach or lower part of the chest up towards the neck'. Patients were excluded if they had: concurrent diagnosis of Irritable Bowel Syndrome (IBS) or peptic ulcer disease; other significant medical or surgical disease; major psychiatric illness or dementia. Patients treated for peptic ulcer with anti-secretory or anti-Helicobacter pylori therapy and referred for follow-up endoscopy, and those using acetylsalicylic acid (ASA) or other nonsteroidal anti-inflammatory drugs (NSAIDs) daily, were also excluded. Patients had to be able to complete the PRO instruments themselves, i.e. no proxy assessment or interpreter was allowed. The study was conducted between December 2000 and November 2001 in five centers. Written informed consent was obtained from all patients prior to inclusion in the study. The study protocol and consent form were approved by independent local ethics committees in accordance with the revised Declaration of Helsinki. Patients were free to discontinue participation in the study at any time.

Demographic and clinical variables

Clinicians reported: patient demographics (age, sex, race, family and employment status); medical history; history of gastrointestinal disease; and frequency and severity of heartburn symptoms. Investigators also assessed the patients' symptoms using a four-graded scale: 0 = none: no symptoms; 1 = mild: awareness of sign or symptom, but easily tolerated; 2 = moderate: discomfort sufficient to cause interference with normal activities; 3 = severe: incapacitating with inability to perform normal activities. All data were reported in a paper case report form.

Patient-Reported Outcomes (PRO) instruments

Generic instruments are comprehensive, designed to be applicable across diseases, treatments and populations. Both general population norm values and values of populations with a number of chronic diseases are available. Disease-specific instruments, on the other hand, capture details about the disease activity and symptom patterns and are likely to be more responsive to change than generic instruments [9, 10]. Taking into consideration the complementary nature of these different kinds of instruments, disease-specific and generic instruments are in practice often used in tandem.

Patients completed four PRO instruments: Gastrointestinal Symptom Rating Scale (GSRS) [11]; the heartburn version of Quality of Life in Reflux and Dyspepsia (QOLRAD) [12] questionnaire; the Short-Form Health 36 (SF-36) [13]; and the Hospital Anxiety and Depression scale (HAD) [14]. All PRO instruments have been tested in terms of validity and reliability (see below).

Gastrointestinal Symptom Rating Scale (GSRS)

The Gastrointestinal Symptom Rating Scale is a disease-specific instrument that includes 15 items combined into five symptom clusters addressing different gastrointestinal symptoms. The five symptom clusters depict reflux, abdominal pain, indigestion, diarrhea and constipation. The GSRS has a seven-graded Likert-type scale where 1 represents absence of bothersome symptoms and 7 very bothersome symptoms. The GSRS is well documented to be reliable and valid [15] and norm values for a general population are available [16].

Quality of Life in Reflux and Dyspepsia (QOLRAD)

The heartburn version of the QOLRAD is a disease-specific instrument, including twenty-five items combined into five dimensions: Emotional distress, Sleep disturbance, Vitality, Food/drink problems and Physical/social functioning. The questions are rated on a seven-graded Likert-type scale; the lower the value the more severe the impact on the daily functioning. The QOLRAD has been extensively documented in international studies in patients with heartburn with regard to reliability, validity [12] and responsiveness [4], assessing the impact of GERD on patients' HRQL. Previous studies have also revealed that a change of approximately 0.5 represents a clinically relevant change in the QOLRAD [17]. Its factor structure was also replicated in several translations [18].

Short-Form Health 36 (SF-36)

The SF-36 is an extensively used generic questionnaire containing 36 items clustered in eight dimensions. Item scores for each dimension are coded, summed and transformed to a scale from 0 (worst possible health state measured by the questionnaire) to 100 (best possible health state). A higher value indicates a better evaluation of health. The SF-36 is well documented in terms of reliability and validity in all available language versions [19, 20]. This study used the acute version of the SF-36, i.e. a one-week recall period.
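The transformation to a 0 to 100 scale can be illustrated with a minimal sketch. It assumes a plain linear rescaling of the summed raw score between its lowest and highest possible values; the function name and the example dimension (10 items coded 1 to 3) are hypothetical and are not taken from the SF-36 scoring manual.

```python
def transform_to_0_100(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a summed raw dimension score to the 0-100 range.

    Assumption: a simple linear transformation between the lowest and
    highest possible raw scores; item recoding is not shown here.
    """
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw_score lies outside the possible range")
    score_range = highest_possible - lowest_possible
    return 100.0 * (raw_score - lowest_possible) / score_range


# Hypothetical dimension: 10 items, each coded 1-3, so raw scores span 10-30.
print(transform_to_0_100(20, 10, 30))  # 50.0
```

With this rescaling, the lowest possible raw score maps to 0 and the highest to 100, so dimensions with different numbers of items become directly comparable.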

Hospital Anxiety and Depression scale (HAD)

The HAD consists of 14 items divided into two sub-scales for anxiety (7 items) and depression (7 items), in which the patient rates each item on a four-point scale. Higher scores indicate the presence of problems. A score of ≥ 11 implies a definite case, a score of 8–10 a probable case and a score of ≤ 7 no case. The validity and reliability of the HAD have been reported in several studies [21, 22].
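The cut-offs above can be expressed as a small classification rule. The sketch below assumes that the cut-offs are applied to each sub-scale separately and that each of the seven items is scored 0 to 3, giving a sub-scale range of 0 to 21; the function name is hypothetical.

```python
def classify_had_subscale(score):
    """Classify a HAD sub-scale score using the cut-offs quoted in the text.

    Assumption: the score is the sum of 7 items scored 0-3 (range 0-21).
    """
    if not 0 <= score <= 21:
        raise ValueError("sub-scale score must lie between 0 and 21")
    if score >= 11:
        return "definite case"
    if score >= 8:
        return "probable case"
    return "no case"


for example_score in (5, 9, 14):
    print(example_score, classify_had_subscale(example_score))
```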

All the above instruments have been translated and linguistically validated according to international guidelines [8]. The linguistic validation of a questionnaire is not a literal translation of the original questionnaire, but the production of a translation, which is conceptually equivalent to the original and culturally acceptable in the country in which the translation will be used. This translation process includes forward and backward translations by independent translators.

Administration of PRO instruments

All questionnaires were completed on an electronic data capture device (Apple Newton Pad). The method of using Electronic Data Capture (EDC) for HRQL studies has previously been shown to improve the quality of the data and to be well received by patients [23]. All study personnel were trained to use the EDC and to instruct the patients in a standardized way in order to minimize bias and enhance compliance.

Psychometric evaluation of the instruments

Reliability

Internal consistency refers to the extent to which the items are interrelated. Cronbach's alpha coefficient is the most widely used method of assessing internal consistency. Cronbach's alpha was calculated [24] for each dimension of the instruments to assess the internal consistency reliability. A high alpha coefficient (≥ 0.70) suggests that the items within a dimension measure the same construct and supports the construct validity.
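As an illustration of the calculation, the following minimal sketch computes Cronbach's alpha for one dimension from the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The function name and the example responses are hypothetical; the study itself used SAS.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(responses):
    """Cronbach's alpha for one dimension.

    responses: one list per patient, holding that patient's score on each
    item of the dimension (all lists the same length, no missing values).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two patients")
    # Variance of each item across patients.
    item_variances = [
        variance([patient[i] for patient in responses]) for i in range(n_items)
    ]
    # Variance of the summed dimension score across patients.
    total_variance = variance([sum(patient) for patient in responses])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three patients answering a two-item dimension.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 2))  # 0.67
```

In this toy example alpha falls just below the 0.70 threshold, which under the criterion above would not count as high internal consistency.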

Test-retest reliability refers to the stability of a score derived from serial administrations of a measure by the same rater. Repeated measurements are made in the same individuals, presumably with a time interval long enough to ensure independence. Here, patients in the stable phase (between visits one and two) and in whom the treatment – not study mandated – remained unchanged were assessed. A reliability coefficient above 0.70 [10] is considered to be acceptable.
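Test-retest reliability of this kind is commonly summarized with an intraclass correlation coefficient (ICC). The text does not state which ICC form was used, so the sketch below assumes a one-way random-effects, single-measurement ICC computed from two visits; the function name and data are hypothetical.

```python
def icc_oneway(visit1, visit2):
    """One-way random-effects ICC for two measurements per patient.

    Assumption: this ICC form is chosen for illustration only; the study
    does not specify which form was calculated.
    """
    n = len(visit1)
    if n != len(visit2) or n < 2:
        raise ValueError("need paired scores for at least two patients")
    k = 2  # number of measurements per patient
    pairs = list(zip(visit1, visit2))
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    patient_means = [(a + b) / k for a, b in pairs]
    # Between-patient and within-patient mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, patient_means)
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores have zero variance")
    return (ms_between - ms_within) / denominator


# Hypothetical domain scores at visits 1 and 2 for three stable patients.
icc = icc_oneway([1, 2, 3], [2, 3, 4])
print(round(icc, 2), "acceptable" if icc > 0.70 else "below 0.70")
```

Because a one-way ICC penalizes systematic shifts between visits, the uniform one-point increase in this toy example lowers the coefficient even though the rank order of patients is unchanged.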

Construct validity

Construct validity is concerned with whether the indicator actually measures the underlying attribute. The construct validity of the GSRS and QOLRAD was examined by convergent, discriminant and known-groups validity.

Convergent validity consists of showing that a postulated dimension of the instrument correlates appreciably with all other dimensions that theory suggests should be related to it. Here, it was examined by: a) correlating the QOLRAD and the GSRS; b) correlating the QOLRAD and the GSRS with the dimensions of SF-36; and c) correlating the QOLRAD and GSRS with the HAD and the clinician-assessed patient heartburn symptoms. Using Pearson's product moment correlation, similar dimensions in these instruments were expected to have high correlations with each other. A correlation over 0.60 was considered strong, one between 0.30 and 0.60 moderate and one below 0.30 low [25]. Low correlation was expected between those dimensions that are theoretically unrelated constructs, thereby testing the discriminant validity of the instruments. Correlating the QOLRAD and the GSRS with severity and frequency of heartburn symptoms also tested the discriminant validity of the instruments. Finally, known-groups validity [26] was also tested since a PRO instrument should be able to differentiate between groups of patients whose health status differs according to the characteristics of patients' disease, in this case heartburn severity and frequency [27]. Physician-assessed overall severity of symptoms and its relation to the QOLRAD dimensions were also evaluated.
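The correlation criteria can be sketched as follows. The code computes Pearson's product moment correlation and labels its strength using the thresholds quoted above; applying the thresholds to the absolute value of the coefficient is an assumption made here, since instruments scored in opposite directions correlate negatively. Function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cross / sqrt(ss_x * ss_y)


def correlation_strength(r):
    """Label a coefficient: strong (> 0.60), moderate (0.30-0.60), low (< 0.30).

    Assumption: the thresholds are applied to the absolute value of r.
    """
    size = abs(r)
    if size > 0.60:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "low"


# Hypothetical scores: a symptom scale rising as a quality-of-life scale falls.
symptom_scores = [2.0, 3.5, 4.0, 5.5, 6.0]
hrql_scores = [6.5, 5.0, 5.5, 3.0, 2.5]
r = pearson_r(symptom_scores, hrql_scores)
print(round(r, 2), correlation_strength(r))
```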

In addition, the SF-36 summary scores were calculated for the physical component summary scale (PCS) and the mental component summary scale (MCS) based on German data (algorithm) [20] from the IQOLA project [28].

Finally, the HAD was used to evaluate the extent to which anxiety or depression correlates with the QOLRAD and the GSRS scores.

Statistical methods

Statistical analyses were performed utilizing the Statistical Analysis System (SAS, version 8.02) [29]. Test results are reported as significant for P < 0.0003, adjusted for multiplicity (Bonferroni's correction [30], 0.05/165). In the case of missing data in the PRO instruments, the mean of the completed items in one dimension was imputed to substitute the missing item provided that more than 50% of the items in one dimension were completed [31].
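Both rules in this paragraph are simple enough to sketch. The first function reproduces the Bonferroni arithmetic (0.05/165); the second applies the imputation rule to one patient's items in one dimension. Function names are hypothetical, and returning `None` when 50% or fewer of the items are completed is an assumption, since the text does not say how such dimensions were handled.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni's correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def impute_dimension(items):
    """Replace missing items (None) with the mean of the completed items.

    Imputation is applied only when more than 50% of the items in the
    dimension are completed. Assumption: otherwise None is returned,
    i.e. the dimension is treated as missing.
    """
    completed = [value for value in items if value is not None]
    if len(completed) * 2 <= len(items):
        return None
    mean_completed = sum(completed) / len(completed)
    return [mean_completed if value is None else value for value in items]


print(bonferroni_threshold(0.05, 165))       # about 0.000303
print(impute_dimension([4, None, 6, 5]))     # [4, 5.0, 6, 5]
print(impute_dimension([4, None, None, 5]))  # None (exactly 50% completed)
```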

Results

Study population

A total of 142 patients (79 females) were included in the study. The diagnosis was verified at the discretion of the investigator when the patient had a history of episodes of heartburn for six months or longer, with episodes of heartburn on one day or more during the seven days prior to inclusion.

Patients' ages ranged from 19 to 79 years, with a mean of 47.5 ± 14.6 years (Table 1). For patients scheduled for the second visit (consecutive patients) there was no change in the treatment between visits 1 and 2 and the patients were in a stable phase. All participants were Caucasian, 66% were married and 62% were employed full-time. Further demographics and clinical characteristics of the patient population are shown in Table 1.

Table 1 Patient demographics and clinical data (N = 142).

Patients were recruited with diverse severity of symptoms. The majority (70%) had moderate symptoms and 15 percent had either severe or mild heartburn symptoms. The frequency of symptoms was evenly distributed among those experiencing symptoms one to four days per week. Forty-six percent of the patients had symptoms more than five days a week.

The Gastrointestinal Symptom Rating Scale and the Quality of Life in Reflux and Dyspepsia questionnaire

Patients were bothered most by symptoms of 'Reflux' (Mean = 3.9), 'Indigestion' (Mean = 3.3) and 'Abdominal pain' (Mean = 3.0). The consequences of GI symptoms were reflected in the following dimensions of the QOLRAD: 'Food and Drink Problems' (Mean = 4.4); 'Vitality' (Mean = 4.6); 'Emotional Distress' (Mean = 5.0); and 'Sleep Disturbance' (Mean = 5.1), in that order. Hence, patients reported that, because of their symptoms, they could not eat and drink whatever they liked; their vitality was impaired; they were emotionally distressed; and they could not have a good night's sleep.

Internal consistency reliability and test-retest reliability

Cronbach's alpha ranged from 0.53 (Abdominal Pain) to 0.91 (Diarrhea) in GSRS and from 0.90 (Vitality) to 0.94 (Emotional Distress) in QOLRAD (Table 2).

Table 2 Cronbach's alpha at visit 1 and test-retest reliability (Intraclass Correlation Coefficient (ICC)) for GSRS and QOLRAD domains between visits 1 and 2.

The test-retest reliability ranged from 0.49 (Abdominal Pain) to 0.72 (Constipation) in GSRS and from 0.70 (Vitality) to 0.84 (Emotional Distress and Food/Drink Problems) in QOLRAD (Table 2).

Convergent and discriminant validity of Gastrointestinal Symptom Rating Scale and Quality of Life in Reflux and Dyspepsia

Pearson correlation coefficients were used to assess the convergent and discriminant validity (Table 3). There was a negative correlation between the Gastrointestinal Symptom Rating Scale and SF-36 in all domains. GSRS domains of 'Reflux' and SF-36 'Vitality' and 'Bodily Pain' were significantly correlated. The GSRS domains of 'Abdominal Pain' and 'Constipation' were significantly correlated with nearly all SF-36 domains in a negative direction. The relevant GSRS domains, 'Reflux', 'Abdominal Pain', and 'Indigestion', correlated significantly with all QOLRAD domains.

Table 3 Correlation coefficients (Pearson) between GSRS, QOLRAD and SF-36 domains, HAD and physician-assessed frequency and severity of symptoms.

HAD scores yielded positive correlations between GSRS and anxiety, with significant correlations between anxiety and 'Abdominal pain' and 'Indigestion'. In addition, physician-assessed frequency of symptoms and the GSRS domain of 'Reflux' were significantly correlated.

QOLRAD correlated positively with all domains of the SF-36. The strongest correlations (>0.50) were found between QOLRAD 'Emotional Distress', 'Physical/Social functioning', and 'Vitality' and all SF-36 domains except for 'General health' and 'Bodily pain'. Furthermore, significant correlations were found between QOLRAD and anxiety and depression in a negative direction. Finally, QOLRAD and physician-assessed frequency of the symptoms significantly correlated in the expected (negative) direction (Table 3).

Known-groups validity of Gastrointestinal Symptom Rating Scale and Quality of Life in Reflux and Dyspepsia

All domains of the GSRS and QOLRAD questionnaires were able to differentiate between groups of patients whose health status differed according to the physician-assessed overall frequency and severity of heartburn, thereby confirming the known-groups validity of the instruments (Figures 1 and 2).

Figure 1 Physician-assessed overall severity and frequency of symptoms and the GSRS domain scores.

Figure 2 Physician-assessed overall severity and frequency of symptoms and the QOLRAD domain scores.

Discussion

Clear and consistent associations were found between the symptoms of heartburn and their impact on patients' HRQL. In agreement with these results, a recent German study reported that GERD patients with at least moderately severe reflux symptoms had reduced HRQL [32]. The relevance of the patient sample was confirmed when patients reported that their most bothersome symptoms were heartburn and acid regurgitation (reflux), indigestion and abdominal pain, in that order. This finding is in accordance with previous descriptions of symptom patterns [2, 4, 33] in patients with GERD.

One of the most established, validated, reliable and responsive instruments available in this area is the QOLRAD [12], which has been proven to have excellent psychometric characteristics when tested in clinical trials [34, 35].

The primary goal of this study, documenting the psychometric characteristics of the German translation of the GSRS and QOLRAD, was achieved. The reliability of the most relevant GSRS domain (Reflux) was satisfactory, but that of the 'Abdominal Pain' domain was not optimal. The low reliability of 'Abdominal Pain' may suggest that pain can be perceived differently from time to time and/or that pain intensity may vary considerably even during a shorter period of time. More research is needed to explore this issue. All domains of QOLRAD had excellent internal consistency and test-retest reliability.

The construct validity of GSRS and QOLRAD has also been documented. The relevant domain scores of the GSRS and QOLRAD significantly correlated. The GSRS domains of Abdominal Pain and Indigestion significantly (negatively) correlated with most of the domains of the SF-36. All QOLRAD domains correlated significantly with the domains of SF-36, thereby confirming the construct validity of QOLRAD. Known-groups validity was also proven; GSRS and QOLRAD did differentiate between patients with different frequency and severity of symptoms, which is comparable to previous findings [12, 36].

The moderate or low correlation between patient-reported and physician-assessed symptom frequency and severity indicates that symptom assessment should be balanced between clinician examination and patient report [12, 37].

In conclusion, the German translations of GSRS and QOLRAD are reliable and valid.