Background

Graft survival in renal transplant patients has improved steadily over the last decades as a result of improved immunosuppressive therapies. However, immunosuppressive regimens continue to have side effects which patients find burdensome and which may lead to sub-therapeutic dosing and non-compliance by the patient.

Specifically, upper and lower gastrointestinal (GI) side effects such as reflux, diarrhea, and constipation are frequent occurrences with these medications. While the effects are understood on a clinical level, little is known about the patient perspective. Some evidence suggests that patients on different immunosuppressant regimens may feel less bothered by GI side effects or report better health-related quality of life (HRQL)[1]. Therefore patient reports of these GI complications and HRQL are important. Like any clinical measure, patient-reported outcomes must be valid and reliable.

The Gastrointestinal Symptom Rating Scale (GSRS) and the Gastrointestinal Quality of Life Index (GIQLI) are two patient-reported outcome instruments with demonstrated reliability and validity for use among renal transplant populations[2, 3]. However, the study population in that work did not include South American participants. Additionally, test-retest reliability could not be assessed in the original study. The objective of this study was to evaluate the psychometric characteristics (reliability and validity) of the GSRS and GIQLI in South American patients who have had a renal transplant.

Methods

Study population

Data from the PROGIS study (Measurement of Patient Reported Outcomes in Renal Transplant Patients with and without Gastrointestinal Symptoms)[1] were used for this validation study. PROGIS was a longitudinal, observational study of patients post renal transplant designed to assess the impact of GI symptoms on symptom severity and HRQL and changes in these patient-reported outcomes (PROs) that occur as a result of conversion from mycophenolate mofetil (MMF) to an enteric-coated formulation of mycophenolate sodium (EC-MPS) (myfortic®). PROGIS was conducted in twenty-seven clinical sites in six countries, with a total per-protocol population of 278 patients: 177 post-transplant patients who were experiencing GI complaints and were eligible to convert to EC-MPS, and 101 post-transplant patients who were not experiencing GI complaints and remained on MMF. Participants were evaluated at Baseline and again at 4–6 weeks. Additional details and results of the PROGIS study can be found in Chan[1]. This validation study utilized data from five sites in Argentina and Chile only.

Research was conducted according to the ICH Harmonized Tripartite Guidelines for Good Clinical Practice and in compliance with the ethical principles of the Helsinki Declaration. Appropriate Institutional Review Board/Ethics Committee approval was obtained at each center prior to study initiation. Patients were eligible for participation if they were at least 18 years of age; were willing to provide informed consent and adhere to study requirements; had received a renal transplant at least 1 month prior to the study enrollment; and had been on an immunosuppressive regimen including MMF for at least two weeks prior to enrollment. Patients were either eligible to convert to EC-MPS because of GI complaints or were not experiencing GI complaints and were stable on their current regimen. Patients were not eligible for participation if their current GI symptoms were not assumed or known to be caused by MMF; they had an episode of acute rejection less than 1 week prior to study enrollment; they were undergoing an acute medical intervention or hospitalization; they were a woman of child-bearing age not willing to use an effective means of birth control; they had a major psychiatric illness or other medical condition that might interfere with ability to complete the study; or they had received any investigational drug within 30 days prior to study enrollment.

Assessments

Patients who met study entry criteria and provided informed consent were either converted to EC-MPS at Visit 1 (Group 1) or remained on MMF (Group 2). Participants continued on the appropriate immunosuppressant regimen (i.e., EC-MPS or MMF plus other transplant medications) as dictated by good clinical practice. Site study staff provided basic demographic and clinical information. Participants completed three self-administered questionnaires in a private room at the clinic: the Gastrointestinal Symptom Rating Scale (GSRS), the Gastrointestinal Quality of Life Index (GIQLI) and the Psychological General Well-being Index (PGWB) at Baseline. At Visit 2 the participants completed the same questionnaires, with the addition of the Overall Treatment Effect (OTE) scale.

The GSRS [4–6] is a 15-item instrument designed to assess the symptoms associated with common GI disorders. It has 5 subscales (Reflux, Diarrhea, Constipation, Abdominal Pain, and Indigestion Syndrome). Subscale scores range from 1 to 7 and higher scores represent more discomfort. The Spanish for Argentina version of the questionnaire was utilized for this study. The minimal important difference (MID) is the smallest difference in the scores that is perceived as significant by the clinician or the patient[7]. In renal transplant recipients, the MID for the GSRS subscales has been calculated to range from 0.4 for Diarrhea to 0.8 for Reflux[1].

The GIQLI [8] is a 36-item GI-specific HRQL instrument designed to assess HRQL in clinical practice and clinical trials of patients with GI disorders. The GIQLI has five subscales (GI Symptoms, Emotion, Physical Function, Social Function, and Medical Treatment) as well as a Total Score. Higher scores represent better HRQL; subscale scores range from 0–4 while the total score ranges from 0–144. Recent MID calculations were 12.7 for the total score, while for the subscales the range was 0.5 for Social Function to 0.2 for Emotional Status[1]. The Spanish version of the questionnaire, with minor wording modifications introduced by the research team based on well-known differences in medical terms between Spain and South America, was utilized for this study.
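The score ranges above are consistent with 36 items each scored 0–4, a total score equal to the item sum (36 × 4 = 144), and subscale scores expressed as item means. The following is a minimal sketch of that arithmetic only, not the official scoring algorithm: the function name `giqli_scores` and the item-to-subscale mapping in the example are hypothetical, and missing-data handling is omitted.

```python
def giqli_scores(items, subscale_items):
    """Return (total, subscale_means) for one respondent.

    items: 36 item responses, each scored 0-4.
    subscale_items: dict mapping a subscale name to 0-based item indices
                    (hypothetical mapping, supplied by the caller).
    """
    if len(items) != 36 or any(not 0 <= v <= 4 for v in items):
        raise ValueError("expected 36 item scores in the range 0-4")
    total = sum(items)  # ranges from 0 to 144
    subscales = {
        name: sum(items[i] for i in idx) / len(idx)  # ranges from 0 to 4
        for name, idx in subscale_items.items()
    }
    return total, subscales


# Illustrative mapping only; the real item assignment is defined by the instrument.
example_mapping = {"Social Function": [0, 1, 2, 3], "Medical Treatment": [35]}
total, subs = giqli_scores([3] * 36, example_mapping)
```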

The PGWB [9] is a 22-item measure designed to measure generic HRQL through assessment of psychological well-being and distress. The PGWB has 6 subscales (Anxiety, Depression, Positive Well-being, Self-control, General Health, and Vitality) as well as a total score. Higher scores represent better HRQL and the range is 0–100 for both the subscales and the total score. The Spanish version of the questionnaire was utilized for this study.

Additional information collected included demographic and socioeconomic data, time since transplant, type of transplant (deceased vs. living donor), current GI complications, severity of participant's GI complications (per clinician impression), concomitant immunosuppressant medication, concomitant medications which could lead to GI complaints, concomitant medications to treat or prevent GI symptoms and complications, and adverse events/infections for each participant throughout the study duration[1].

Statistical Analyses

All analyses were performed with SAS version 8.02 (SAS Institute, Cary, NC). Demographic variables and clinical conditions were evaluated by descriptive analyses. For the descriptive analyses, chi-square tests were used to evaluate categorical data; t-tests and analyses of variance (ANOVA) were used to evaluate continuous data. Scoring, including imputation of missing data where necessary, was performed according to each questionnaire's guidelines.

Reliability refers to the consistency of an instrument, either over time or internally among its items. Internal consistency reliability is the extent to which all items measure the same construct; values are presented descriptively, on an interval-level scale from 0 to 1.0, with higher scores indicating a more reliable (precise) instrument. A Cronbach's alpha of 0.70 or greater indicates acceptable internal consistency reliability for an instrument used with group data[10]. The internal consistency reliability of the GSRS and GIQLI total and subscale scores was estimated using coefficient alpha.
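For readers unfamiliar with coefficient alpha, the standard formula for a scale of k items is alpha = (k / (k − 1)) × (1 − sum of item variances / variance of the summed score). The sketch below illustrates that formula in Python; it is not the SAS procedure used in the study, and the response data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items on a 1-7 scale
responses = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [6, 5, 6],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
acceptable = alpha >= 0.70  # threshold for group-level data cited in the text
```

A single-item subscale has no defined alpha, which is why the sketch raises an error when k is less than 2.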

Test-retest reliability, or reproducibility of the measure, refers to the degree to which scores remain the same over time when no change is expected [11, 12]. Reproducibility assesses whether stable participants (based on responses on the OTE) scored similarly on the GSRS and GIQLI from Baseline to Visit 2. Hays and colleagues[11] suggest that intraclass correlation coefficients (ICCs) should be greater than 0.60 in stable participants.
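The text does not state which form of the ICC was computed. As an illustration only, the sketch below assumes a one-way random-effects ICC, computed from the between-subject and within-subject mean squares of a one-way ANOVA; the Baseline and Visit 2 scores are invented.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for n subjects with k repeated scores each.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical (Baseline, Visit 2) subscale scores for 5 stable participants
stable_pairs = [[2.0, 2.3], [4.0, 3.7], [1.0, 1.3], [5.5, 5.0], [3.0, 3.3]]
icc = icc_oneway(stable_pairs)
reproducible = icc > 0.60  # cut-off suggested by Hays and colleagues
```

With few stable participants, a handful of subjects whose symptoms fluctuate between visits can inflate the within-subject mean square and lower the ICC markedly.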

Validity refers to the extent to which the instrument measures the construct it purports to measure and also the extent to which the instrument is useful for its intended purpose[10, 11, 13]. Kendall's tau correlation coefficients were used to assess construct validity. Evaluation of validity can use instruments that measure similar or dissimilar constructs. For example, a disease-specific HRQL instrument is often validated by using a generic instrument. However, one can also evaluate validity by using instruments that measure different yet related constructs, in which case the correlations are assumed to be lower. For this study, we used a generic HRQL instrument (the PGWB) to assess the validity of both the disease-specific GIQLI and the GSRS, a measure of symptom impact. Construct validity focused on the pattern and magnitude of the relationship among the GSRS and GIQLI scale scores and the PGWB. We expected to find the following: 1) the relationship between the two instruments measuring HRQL (the GIQLI and PGWB) will be stronger, producing correlations of greater magnitude than that between the GSRS and PGWB; 2) the relationship between the PGWB total score and the GSRS subscales will be low to moderate (0.10 < r < 0.40); 3) the relationship between the PGWB subscales and the GSRS subscales will be low to moderate (0.10 < r < 0.50), with higher correlations being found between the GSRS Abdominal Pain and Indigestion Syndrome subscales and all PGWB total and subscale scores compared to the remaining three GSRS subscales (Diarrhea, Constipation, Reflux Syndrome); 4) the relationship between the GIQLI total score and the PGWB total score will be moderate and significant; and 5) the relationship between the GIQLI Symptom subscale and the PGWB subscales will be low to moderate (0.10 < r < 0.50) and significant[14].
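Kendall's tau compares every pair of participants and counts how often two scores order them the same way (concordant) or the opposite way (discordant). The sketch below assumes the tie-adjusted tau-b variant, which suits questionnaire scores with many tied values; the text does not say which variant was reported, and the data are invented.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of scores."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt(
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    )
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical data: higher symptom scores paired with lower well-being scores
symptom_scores = [1.0, 2.5, 2.5, 4.0, 5.5, 3.0]
wellbeing_scores = [85, 70, 74, 60, 52, 74]
tau = kendall_tau_b(symptom_scores, wellbeing_scores)
in_expected_band = 0.10 < abs(tau) < 0.50  # hypothesised low-to-moderate range
```

Because higher GSRS scores mean more discomfort and higher PGWB scores mean better well-being, a negative tau is the expected direction; the hypothesised ranges refer to the magnitude of the coefficient.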

Discriminant or known groups validity is the extent to which scores from an instrument distinguish between groups of subjects that differ by a key indicator, often clinical in nature[15]. To evaluate known groups validity, GSRS and GIQLI scores were analyzed by the presence or absence of GI complications using Wilcoxon rank-sum tests and by clinical severity (none, mild, moderate, severe) of GI complications (as rated by the clinician) using a Kruskal-Wallis test of overall differences and Wilcoxon rank-sum tests for pairwise comparisons. We expected that scores on the GSRS and GIQLI would be worse for patients with GI complaints than for patients without GI complaints. Additionally, we expected to see worse scores for patients with more severe GI effects.
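Both tests work on the ranks of the pooled scores rather than the raw values. The sketch below shows the rank-sum statistic with a normal approximation and the Kruskal-Wallis H statistic. It is a simplified illustration that omits tie and continuity corrections (statistical packages such as SAS apply these), and the group scores are invented.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum statistic W, z score, and two-sided p (normal approx.)."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])  # sum of ranks in the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return w, z, erfc(abs(z) / sqrt(2))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for two or more groups (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical subscale scores (higher = more discomfort)
no_complaints = [1.0, 1.3, 1.7, 2.0, 1.0]
with_complaints = [2.7, 3.3, 4.0, 2.3, 5.0]
w, z, p = rank_sum_test(no_complaints, with_complaints)
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom.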

Results

Study Sample

Sixty-two participants were enrolled: 44 participants at four sites in Argentina and 18 participants at one site in Chile. Table 1 presents the demographic and clinical characteristics of the participants at Baseline. Participants were, on average, 42 years old, and had their transplant 3.3 years prior to study enrollment. Fifty-seven percent of the participants were male. Almost two-thirds of the participants (65%) had received a deceased-donor transplant, and 68% had GI complaints ranging from mild to severe. Abdominal pain was the most frequently reported complaint (61%); 52% reported dyspepsia, 40% diarrhea, and 34% nausea. None of the differences between participants were statistically significant; however, patients in the test-retest sample (Group 2) were more often male, were older, had lower educational status, and had a shorter time since transplant. Participants were very similar to the participants in the entire PROGIS study (data not shown).

Table 1 Baseline Demographic and Clinical Characteristics

Internal Consistency Reliability

The estimates of the internal consistency reliability of the GSRS sub-scales were good (range 0.72 – 0.90) with the exception of Abdominal Pain (Cronbach's alpha of 0.63). The GIQLI total and subscale scores demonstrated excellent internal consistency reliability (range 0.78–0.96; not calculated for the Medical Treatment subscale because it is a single item subscale).

Test-Retest Reliability

Twenty participants were stable from Baseline to Visit 2 (Table 2) and were used to assess the test-retest reliability of the GSRS and GIQLI. ICCs were adequate (above the established cut-off of 0.60) and statistically significant (p < 0.001) for the GIQLI total score and all subscale scores, and also for three of the five GSRS subscales. Of the remaining two GSRS subscales, Reflux had an ICC of 0.57, very close to the cut-off, while Diarrhea was the exception with an ICC of 0.16.

Table 2 Test-retest Reliability (Reproducibility): Score Stability of the GSRS and GIQLI

Construct Validity

Correlations between the GSRS and the PGWB ranged from r = -0.21 (Diarrhea and Positive Well-being) to r = -0.53 (Indigestion Syndrome and General Health) (Table 3). All GSRS-PGWB correlations were statistically significant (p < 0.0015 or less) except for 8 correlations. Correlations between the GIQLI and the PGWB were higher as both measure similar constructs, ranging from r = 0.43 (Medical Treatment and Positive Well-being) to r = 0.71 (Emotion and Depression) (Table 3). All GIQLI-PGWB correlations were statistically significant (p < 0.001).

Table 3 Construct Validity: Correlations* of GSRS and GIQLI Total Scores and Subscale Scores with PGWB Total

Known Groups Validity

Clinical variables used to assess known groups validity were time since transplant; presence or absence of any GI complaint; and severity of GI complaints. Scores on the GSRS and GIQLI were all poorly correlated with the length of time since transplant and no difference by gender was detected (data not shown). All subscales of the GSRS and GIQLI significantly differentiated between patients with and without GI complaints (Figures 1 and 2). The differences were also clinically significant, because in all cases they were above the MID thresholds for each GSRS and GIQLI subscale and total score. Severity level analyses were conducted using four levels: none, mild, moderate, and severe. The GSRS subscales and the GIQLI total score and subscales were able to differentiate among the clinical severity ratings of none, mild, moderate, and severe (overall Kruskal-Wallis test p < 0.001 for each subscale except GIQLI Social Function, Table 4). Many pairwise comparisons among severity groups were statistically significant (p < 0.0015) and also clinically significant, especially between the none group and the other severity levels, but there was no clear, consistent gradient across all severity levels.

Table 4 GSRS and GIQLI scores by physician's rating of clinical disease severity at baseline

Figure 1 GSRS Subscale Scores by Presence/Absence of GI Complaints.

Figure 2 GIQLI Subscale Scores by Presence/Absence of GI complaints ***.

Discussion

Psychometric evaluation is an on-going process that incorporates quantitative as well as qualitative testing. No single test result provides information regarding an instrument's psychometric soundness; results of both reliability and validity testing must be weighed together.

The results of this study provide evidence of both the reliability and the validity of the Spanish for Argentina version of the GSRS and the Spanish for Spain version of the GIQLI in patients post renal transplant in South America. These instruments were originally developed for use in the GI area and demonstrated good psychometric characteristics when used with a wide variety of GI diseases and surgical procedures [8, 14, 16]. In this study, both questionnaires demonstrated generally good psychometric characteristics, including reliability and validity. Internal consistency reliability of four of the five GSRS subscales was above the 0.70 cut-off for aggregate data. The GIQLI total score and subscales for which a Cronbach's alpha could be calculated were all above 0.70 as well. Four of the five GSRS subscales demonstrated satisfactory (above or very close to the threshold of 0.6) reproducibility over a 4- to 6-week period. The GSRS subscale that did not demonstrate good reproducibility (Diarrhea) was not the same as the subscale that fell below the 0.70 cut-off for internal consistency reliability (Abdominal Pain), indicating that there is no pattern of poor reliability. Nevertheless, the low test-retest reliability of the GSRS Diarrhea subscale was surprising. One potential explanation is that the test-retest population may not have been stable in terms of diarrhea at the time of the second measurement. In the PROGIS study, diarrhea was the most frequent GI complaint in both groups after the study started [1]; however, the occurrence of diarrhea (7.4% in patients with GI events at baseline and 7.1% in patients without GI events at baseline) or of any GI event (17.7% in patients with GI events at baseline and 13.3% in patients without GI events at baseline) was low during the study period. Therefore, new occurrences of diarrhea alone may not explain the low ICC. However, fluctuations in diarrhea combined with a small sample size could explain it.
To investigate this hypothesis, the ICC was recalculated in a larger, stable population, the stable safety population from the PROGIS study (n = 127) [1]. The ICC for Diarrhea improved to a moderate value of 0.49 (still below the predetermined cut-off value of 0.6), while the ICCs for Abdominal Pain, Indigestion Syndrome and Constipation remained above 0.6 and the ICC for Reflux Syndrome decreased to 0.49. The GIQLI Total Score and subscales all had satisfactory reproducibility. A previous study validated the GSRS and GIQLI in renal transplant patients[2, 3]. Due to its cross-sectional design, test-retest reliability was not evaluated in that research. This work adds to the evidence for the validity of the GSRS and the GIQLI, not only by confirming previous findings but also by demonstrating the stability of most scores over time.

Validity of the GSRS and GIQLI was also demonstrated in this study. Construct validity was established through correlations of the questionnaires with another, generic questionnaire, the PGWB. For the GSRS, the highest correlations were seen between the Indigestion Syndrome subscale and the PGWB Anxiety and General Health subscales. A high correlation was also observed between the GSRS Indigestion Syndrome subscale and the PGWB Total Score and between the GSRS Constipation subscale and the PGWB General Health subscale. The GSRS Diarrhea subscale produced low correlations with the PGWB Depressed Mood and Self-control subscales. The GSRS Reflux Syndrome and Abdominal Pain subscales were also poorly correlated with the PGWB Positive Well-being subscale. The patterns of association are similar to those previously reported[14]. The GSRS demonstrates adequate construct validity in this study.

The correlations between the GIQLI and PGWB were higher than those between the GSRS and PGWB, as the GIQLI and PGWB both assess a similar construct (HRQL), whereas the GSRS assesses GI symptoms. The GIQLI Emotion subscale was highly correlated with the PGWB Total score and also with its Anxiety and Depressed Mood subscales. The GIQLI Physical Function and Social Function subscales were highly correlated with the PGWB Total Score and PGWB General Health. The lowest correlations were between the GIQLI Medical Treatment subscale and the PGWB Anxiety, Positive Well-being, and Self-control subscales. Participants received their transplants a little over 3 years prior to enrollment in this study. Medical treatment may not be unusually stressful for them at this point, as they are quite familiar with the healthcare system, their clinicians, and their treatment. Therefore, it is logical that medical treatment would not be associated with anxiety, positive well-being, or self-control.

The GSRS and GIQLI both demonstrated good known groups validity, distinguishing between patients with and without GI complaints. In this South American cohort, all differences between the GI and no-GI groups were also clinically significant, as they exceeded the minimal important difference calculated using the whole PROGIS sample, including the South American participants. In some instances, both questionnaires were able to distinguish statistically and clinically among patients with varying GI complication severity, demonstrating sensitivity not only to the presence of symptoms but also to their severity. However, similar to Kleinman[2, 3], clear, consistent gradient relationships were not observed.

We acknowledge the relatively small sample size of this study. The significant differences in questionnaire scores are notable given the small number of participants with mild and severe GI complications. We faced some limitations in terms of the available translations of the instruments for this study. We used the Argentinean Spanish version of the GSRS as the most accurate available GSRS version to represent the study population. However, one center was in Chile, where this version may not have been optimally culturally adapted. Additionally, we used the Spanish for Spain version of the GIQLI with minimal changes to its wording introduced by the research team based on well-known differences in medical terms between Spain and South America. Our impression was that the questionnaires were acceptable for use in the study, as a high response rate was obtained. However, country-specific versions of the GIQLI and a Chilean-specific version of the GSRS would have been preferable had these translations been available.

Conclusion

The results of the study suggest that the Argentinean Spanish version of the GSRS and the Spanish version of the GIQLI are valid and reliable for use in a post-renal transplant population in South America. These results are a useful addition to the development of patient-reported outcomes research in South America. Patients with GI complaints reported poor HRQL, and strategies are needed to improve these patients' HRQL.