Introduction

Colorectal cancer is common, and both the disease and its treatment strongly impact quality of life (QoL). To allow for the evaluation of new treatments, the European Organisation for Research and Treatment of Cancer (EORTC) developed the colorectal QoL module QLQ-CR38 [1] as an adjunct to the generic EORTC QLQ-C30. This module was later revised to the shorter QLQ-CR29 [2] and validated in an international study [3]. The resulting QLQ-CR29 consisted of four scales and 19 individual items. Later validation studies were reported for the Polish [4] and Spanish [5] versions. Validation of the Danish QLQ-CR38 [6] suggested that the QLQ-CR29 is more valid than the QLQ-CR38. In the Spanish QLQ-CR29, the blood and mucus scale was not confirmed; in the Polish version, only the body image scale was reliable, and the urinary incontinence scale approached acceptable reliability. Construct validity was limited for the Polish version and showed ambiguous results for the Spanish version. In both cases, the authors nevertheless concluded that the questionnaire was reliable and valid. These equivocal results led us to assess the reliability and validity of the Dutch version and to examine whether additional scales might reduce the number of individual items.

Materials and methods

Translation and procedures

The QLQ-CR29 had been translated into Flemish/Dutch by the EORTC Quality of Life Group, following their Translation Procedure Manual Instructions [7]. Because the Dutch language differs between Belgium and the Netherlands, pilot testing was undertaken in 29 patients with colorectal cancer from the Leiden University Medical Center (LUMC) to reword some items for a Dutch population. Suggested changes were discussed with experts of the EORTC, resulting in a final Dutch translation.

Consecutive patients were recruited between May 2011 and December 2012 from two academic and two peripheral hospitals in the western region of the Netherlands: LUMC (Departments of Surgery and Radiotherapy), Erasmus Medical Center Rotterdam (ErasmusMC), and Alrijne Hospital Leiden (former locations Diaconessen Hospital and Rijnland Hospital). In three departments (Diaconessen Hospital, LUMC Surgery, and ErasmusMC), research nurses handed the questionnaire to the patients (n = 123, response rate 79 %) at the time of their follow-up visit, and in one hospital (Rijnland), the questionnaire was sent to patients (n = 80, response rate 83 %) who had undergone treatment for colorectal cancer between May and December 2011. In one department (LUMC Radiotherapy), the questionnaire was sent to patients (n = 93, response rate 78 %) who participated in other studies [8, 9]. Of the 296 patients receiving a questionnaire, 244 returned it, and we included 236 completed questionnaires (response 80 %). Time between surgery and filling out the questionnaire ranged from 5 months to 12 years. Unfortunately, no information is available on the non-responders, but because the task was limited to filling out a short questionnaire, we do not expect major non-response bias.

For convergent validity, participants were additionally asked to fill out the EORTC QLQ-C30. For test–retest reliability, we approached patients who had indicated their willingness in the first questionnaire. The second questionnaire was sent to every fifth participant within 2 weeks of the return of the first. Twenty-seven patients (out of 48 invited, 56 %) filled in the questionnaire twice, on average 19 days after the first (range 4–46 days). Patient characteristics are presented in Table 1.

Table 1 Patient characteristics

Statistical analysis

We assessed item performance by the proportion of respondents at floor and ceiling and by test–retest reliability (intraclass correlation coefficients, ICCs). Because the QLQ-CR29 comprises only a few scales, most of them with two items, we carried out a principal component analysis to detect potential additional subscales, retaining components with eigenvalues >1.0. Items 49–54, which cover bowel problems in patients without a stoma and stoma problems in patients with a stoma, were treated as the same items in both groups. We used varimax rotation to facilitate interpretation [10]. We assessed scale reliability using Cronbach’s α for both the newly found scales and the original four scales. Subscales were constructed from the principal component analysis by summing the unweighted scores of the items that loaded on a factor and normalizing the sum to 0–100. Finally, we assessed construct validity as in the earlier studies [4, 5], using correlations with the QLQ-C30 (correlations below 0.40 indicating no undue overlap between the constructs of the two questionnaires) and known-groups comparisons with Mann–Whitney U tests: older (≥66 years) versus younger (≤65 years) patients, patients with versus without a stoma, and patients treated with curative versus palliative intent.
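As an illustration of the reliability and scoring steps described above, the following Python sketch computes Cronbach's α from item scores and rescales the unweighted mean of a respondent's items linearly to 0–100. It is not the code used in this study; the function names, the 1–4 response range, and the example responses are assumptions made for illustration. Taking the mean rather than the sum of the items gives the same 0–100 score after normalization.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, all for the same respondents
    in the same order (hypothetical input layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across the k items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def scale_score(responses, low=1, high=4):
    """Unweighted mean of item responses, rescaled linearly to 0-100.

    The 1-4 response range is an assumption of this sketch.
    """
    raw = sum(responses) / len(responses)
    return (raw - low) / (high - low) * 100


# Hypothetical responses of five patients to three items
items = [
    [1, 2, 3, 4, 2],
    [1, 3, 3, 4, 2],
    [2, 2, 4, 4, 1],
]
alpha = cronbach_alpha(items)
score = scale_score([1, 3, 2])
```

In this sketch, a respondent who gives the lowest response to every item scores 0 and one who gives the highest response to every item scores 100.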

Results

Characteristics of items

Table 2 presents the item characteristics and the subscales detected. ICCs were low for urinary incontinence and dysuria. The percentage of respondents at floor was rather high (>50 %) for the blood and mucus in stool scale and for 19 individual items.
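The floor and ceiling percentages reported here can be computed as in the following Python sketch. The function name and the example scores are hypothetical, and the sketch assumes scores already normalized to 0–100, so that floor corresponds to 0 and ceiling to 100.

```python
def floor_ceiling(scores, low=0, high=100):
    """Return the proportions of respondents at the lowest and highest score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == low) / n
    at_ceiling = sum(1 for s in scores if s == high) / n
    return at_floor, at_ceiling


# Hypothetical 0-100 scores of ten respondents on one item
scores = [0, 0, 0, 33.3, 0, 66.7, 0, 100, 0, 33.3]
at_floor, at_ceiling = floor_ceiling(scores)

# Flag the item if more than 50 % of respondents are at floor
high_floor = at_floor > 0.50
```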

Table 2 Quality of life scores according to the EORTC QLQ-CR29, structure and reliability

Factor analysis and reliability

Factor analysis revealed seven factors. The original urinary frequency (Cronbach’s α = 0.71) and body image (α = 0.80) scales were reproduced (α in the original study [3]: 0.71 and 0.84, respectively). The original two-item stool frequency scale (items 52 and 53) had a lower α (0.68, originally 0.70 [3]) than the larger factor in which it was included, which covered all bowel and stoma problems (items 49–54: α = 0.80). This latter scale also showed good reliability for patients with (α = 0.80) and without (α = 0.84) a stoma. The blood or mucus in stool scale was reproduced in the factor analysis but had a low α of 0.56 (originally 0.69 [3]). None of the remaining factors formed clearly interpretable scales, and their reliabilities were all below 0.70. We therefore present construct validation for the original scales and items, as well as for the new bowel/stoma problems scale.

Construct validity

Correlations between the subscales and the QLQ-C30 subscales were below 0.40, except for body image, which correlated moderately (r = 0.48) with social functioning.
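The 0.40 criterion can be applied as in the following Python sketch. The text does not state which correlation coefficient was used; Pearson's r is assumed here, and the function names and example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def undue_overlap(r, threshold=0.40):
    """True if the correlation reaches the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return abs(r) >= threshold


# Hypothetical 0-100 scores of six patients on two subscales
body_image = [100, 67, 33, 100, 67, 0]
social_functioning = [100, 83, 50, 67, 67, 33]
r = pearson_r(body_image, social_functioning)
overlap = undue_overlap(r)
```

Under this criterion, the reported correlation of 0.48 between body image and social functioning would be flagged, whereas correlations below 0.40 would not.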

Compared with older patients, younger patients had significantly worse sexual functioning (Table 3) but fewer problems with urinary frequency and incontinence and with a dry mouth. Patients without a stoma had a better body image and less urinary incontinence. Patients treated with curative intent indicated more problems with blood and mucus in stool, defaecation problems, buttock pain, and stool frequency, and fewer problems with hair loss and trouble with taste, than patients treated with palliative intent.
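For readers unfamiliar with the test behind these known-groups comparisons, the following Python sketch computes the Mann–Whitney U statistic by direct pairwise comparison, counting ties as one half. The function name and the example scores are hypothetical, and the sketch returns only the U statistics, not a p value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs in which the value from x exceeds the value
    from y; tied pairs count as 0.5.
    """
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical 0-100 symptom scores in two age groups
younger = [0, 33.3, 33.3, 66.7]
older = [33.3, 66.7, 66.7, 100]
u_younger, u_older = mann_whitney_u(younger, older)
```

In practice, a p value would be obtained from a statistics package, for example `scipy.stats.mannwhitneyu`.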

Table 3 Known-groups comparisons (age, stoma, treatment intent)

Discussion

This study largely replicates the findings of the original study [3] and the Spanish validation [5]. As in the original study, the body image and urinary frequency scales were reliable, while the blood and mucus scale was only moderately reliable. An important result is that we found a reliable scale incorporating the items about bowel problems or stoma problems. Neither the Spanish nor the Polish study performed an exploratory factor analysis; both reported results only for the scales defined in the original paper [3]. Since the original stool frequency scale was incorporated in this new scale, the questionnaire still consists of four scales, but with 14 additional single items instead of 19. For reasons of reliability and multiple testing, a questionnaire should have as few single items as possible, so this reduction is an improvement.

Item performance in our study was notably better than in the Spanish validation, where ceiling effects were present in over 50 % of the scores in four domains (body image, anxiety, weight, and impotence). The patients in our sample scored markedly lower than those in the Spanish study, which likely reflects, in part, cultural values about body image and sexuality. Dysuria had similarly high floor effects in the Spanish [5] and Danish [6] studies. We recommend further assessment of the urinary incontinence and dysuria items, which showed poor reliability and item performance.

Reliabilities of the items in the original study were higher than ours (ICCs > 0.55). The other studies did not report test–retest reliability.

Construct validity was sufficient, as shown by the limited overlap between the QLQ-CR29 and QLQ-C30 (similar to the original study [3], apart from the correlation between body image and social functioning, which only we found). The differences in scores between known groups were also readily interpretable. We found fewer differences between patients with and without a stoma than the original study [3] (which also saw differences for the urinary frequency scale and the faecal incontinence, sore skin, and embarrassment items). Further, patients receiving palliative treatment in that study reported more problems with hair loss, anxiety, faecal incontinence, and dyspareunia, whereas in our study they reported less blood and mucus in stool and buttock pain, and lower stool frequency.

In conclusion, we were able to replicate earlier findings, but could also reduce the number of single items and thus improve on the QLQ-CR29 as published so far. We recommend that the remaining individual items be revised to improve their performance, and that following that, more psychometric research be carried out to reduce the number of individual items.