Malignant diseases of the biliary tree, encompassing cholangiocarcinoma and cancer of the gallbladder, are relatively rare, with evidence that incidence is rising (McGlynn et al, 2006). Incidence in the United States has been reported as between 1 and 2.5 cases per 100 000 per year (Kaushik, 2001; Khan et al, 2005), although rates of up to 96 per 100 000 have been reported in some high-risk populations (Shaib and El-Serag, 2004).

To date, cholangiocarcinoma and cancer of the gallbladder are associated with poor outcomes and impaired quality of life (QoL) (Heffernan et al, 2002). Patients presenting with advanced disease may have a mean survival of ⩽12 months (Anderson et al, 2004). Extrahepatic disease has a marginally better prognosis than intrahepatic disease (Ahmed et al, 2008).

Presenting symptoms in cholangiocarcinoma are dependent on the location of the malignancy in the biliary tree (Anderson et al, 2004) and are directly related to obstruction of the biliary tree (DeOliveira et al, 2007). Intrahepatic disease may manifest as a liver mass or cholangitis with possible intrahepatic abscess formation as well as right upper quadrant pain related to capsular distension, whereas extrahepatic neoplasms lead to painless obstructive jaundice and cholangitis. Gallbladder cancer may be found incidentally at cholecystectomy; however, it may present with pain due to mass effects and may also trigger obstructive jaundice. In addition to local effects, systemic symptoms such as weight loss, fevers and anorexia are commonly seen.

Treatments consist of surgical resection (DeOliveira et al, 2007), biliary stenting or external drainage (Patel and Singh, 2007), chemotherapy (Hong et al, 2007) and radiotherapy (Ben-David et al, 2006). Significant survival benefits have not been reliably demonstrated (Malhi and Gores, 2006; Hong et al, 2007) and treatment-related QoL changes are poorly studied.

Quality of life and its measurements are increasingly regarded as crucial end points in clinical trials, as well as being gradually introduced into day-to-day clinical practice. There is a paucity of data regarding QoL in cholangiocarcinoma and cancer of the gallbladder, and to date there is no disease-specific questionnaire available. The need for a disease-specific QoL questionnaire, and its subsequent development from phase I to III, was first reported in 2011 (Friend et al, 2011). The European Organization for Research and Treatment of Cancer (EORTC) QLQ-BIL21 (Supplementary material) is a new disease-specific module to be used with the existing generic EORTC QLQ-C30 tool (Aaronson et al, 1993).

This study describes an International Phase IV Psychometric Validation Study designed to assess the clinical and psychometric reliability, validity and responsiveness to change of the QLQ-BIL21 questionnaire in patients with cholangiocarcinoma and cancer of the gallbladder.

Materials and methods

The EORTC guidelines (Blazeby et al, 2002) for module development were followed in this international multicentre study. The initial phases involve a process of collating a list of issues from patients, health-care workers and previous publications, which are subsequently developed into a bank of questions from which the module is constructed. The module is then tested in a Phase IV Validation Study.


This study required patients to complete two paper questionnaires at all assessment time points: the well-established EORTC generic cancer questionnaire QLQ-C30 and the QLQ-BIL21. The QLQ-BIL21 consists of 21 questions: 3 single-item assessments relating to treatment side effects, difficulties with drainage bags/tubes and concerns regarding weight loss, in addition to 18 items grouped into 5 proposed scales: eating symptoms (4 items), jaundice symptoms (3 items), tiredness (3 items), pain symptoms (4 items) and anxiety symptoms (4 items). The response format was a four-point Likert scale.

Responses to the QLQ-C30 and QLQ-BIL21 questionnaires were transformed into a 0–100 scale using EORTC guidelines. Translations according to EORTC guidelines (Dewolf et al, 2009) were completed for six languages (German, Dutch, Italian, Spanish, Mandarin Chinese and Hindi).
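The 0–100 transformation can be illustrated with a minimal sketch. It assumes the standard EORTC linear transformation (the raw score is the mean of the answered items, rescaled by the item range) and the commonly used rule that a scale is scored only when at least half of its items were answered. The function names are illustrative and are not taken from the study's SPSS syntax.

```python
from typing import Optional, Sequence


def score_symptom_scale(
    responses: Sequence[Optional[int]],
    min_resp: int = 1,
    max_resp: int = 4,
) -> Optional[float]:
    """Transform the item responses of one scale into a 0-100 score.

    None marks a missing response. Assumed rule: the scale is scored only
    if at least half of its items were answered, using the mean of the
    answered items. A higher score means more severe symptoms.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) * 2 < len(responses):
        return None  # too many missing items to score this scale
    raw = sum(answered) / len(answered)
    item_range = max_resp - min_resp  # 3 for a four-point Likert item
    return (raw - min_resp) / item_range * 100


def score_functional_scale(
    responses: Sequence[Optional[int]],
    min_resp: int = 1,
    max_resp: int = 4,
) -> Optional[float]:
    """Same transformation, reversed so that a higher score means better function."""
    symptom = score_symptom_scale(responses, min_resp, max_resp)
    return None if symptom is None else 100 - symptom
```

For example, `score_symptom_scale([2, 3, None, None])` gives 50.0, because two of the four items were answered and their mean of 2.5 lies halfway along the 1–4 response range.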


Patients were recruited between September 2011 and June 2014, and were eligible for inclusion if they were aged 18 years or above, with a diagnosis of cholangiocarcinoma or gallbladder cancer confirmed by histology or multidisciplinary team (MDT) opinion, were able to give written informed consent, had no prior history of other significant malignant disease, were able to understand the language of the questionnaire and had an expected minimum survival of 3 months. Patients were excluded if there was concurrent malignant disease elsewhere (except basal cell carcinoma of the skin), if prior surgery with curative intent had taken place (with no evidence of recurrence), or if psychological, familial, sociological or geographical reasons would hamper compliance with the study protocol. Ethics committee approval and individual written patient consent were obtained. The study protocol was approved by the EORTC QoL group.

Study design

QoL changes may be different following major surgical interventions when compared with procedural therapies or compared with supportive care alone; thus, patients were assigned to one of three groups depending on their planned treatment: Intervention Group 1 (surgical treatment including prior stents or drains), Intervention Group 2 (chemotherapy/radiotherapy/photodynamic therapy/laser therapy including prior stents or drains) and Intervention Group 3 (supportive care only, but not excluding stents or drains). Prior active treatment was not a barrier to participation provided all residual effects of the intervention had resolved by the time the patient was enrolled.

Patients completed both questionnaires (QLQ-C30 with QLQ-BIL21) at baseline. For Intervention Groups 1 and 2, this was within 1 month before commencing treatment; for Intervention Group 3, the baseline assessment was performed as soon as the decision to proceed with supportive care had been made. All patients completed a second set 2 months post baseline, within a 2-week window. The Karnofsky performance status (KPS) was recorded at both time points. All patients completed a short paper debriefing questionnaire following the baseline assessment, covering issues such as completion time and identifying questions that were confusing, upsetting or difficult to answer.

Test–retest reliability was assessed in 67 clinically stable patients from across all three intervention groups 2 weeks after the 2-month follow-up assessment. Patients receiving intravenous chemotherapy at the time were excluded from participating in the test–retest element.

Statistical analysis


The reliability, or internal consistency, of the QLQ-C30 and QLQ-BIL21 questionnaires was assessed by Cronbach’s α-coefficient (Cronbach and Warrington, 1951). An internal consistency estimate of >0.70 was considered acceptable for group comparison (Nunnally, 1994). Test–retest reliability was measured by calculating intraclass correlations (ICCs) for the QLQ-C30 and QLQ-BIL21 questionnaires. An ICC of >0.90 is considered desirable, but values as low as 0.70 are acceptable for clinical use.
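Both reliability statistics can be sketched in a few lines. The example below is illustrative only: the text does not state which ICC form was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), and it assumes complete data. The function names are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale.

    items[i][p] is the response of patient p to item i (complete cases only).
    """
    k = len(items)
    totals = [sum(patient) for patient in zip(*items)]  # scale total per patient
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a two-way ANOVA decomposition (assumed ICC form).

    scores[p][t] is the scale score of patient p on occasion t,
    for example t = 0 for the test and t = 1 for the retest.
    """
    n, k = len(scores), len(scores[0])
    grand = mean([x for row in scores for x in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Under these assumptions, a scale would be judged acceptable for group comparison when `cronbach_alpha` returns a value above 0.70, and its test–retest reliability acceptable when `icc_2_1` returns at least 0.70.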


Multitrait scaling was used to examine the hypothesised scale structure of the individual items of the QLQ-BIL21. To test item-scale convergent validity, a correlation threshold of ⩾0.40 was used. Comparison of an item's correlation with its own scale against its correlations with other scales was used to support item-discriminant validity (Hays et al, 1988). Each item was expected to correlate significantly better (by at least twice the standard error) with its own scale than with other scales.
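The item-level checks described above can be sketched as follows. This is a hedged illustration rather than the study's procedure: it assumes the item is correlated with its own scale after removing the item itself (correction for overlap), and it approximates the standard error of a correlation as 1/√n, which is an assumption not stated in the text. All names are illustrative.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation (both inputs must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_total(items: Sequence[Sequence[float]]) -> List[float]:
    """Sum the items of a scale for each patient; items[i][p] is patient p on item i."""
    return [sum(patient) for patient in zip(*items)]


def item_scaling_check(
    item: Sequence[float],
    own_scale_other_items: Sequence[Sequence[float]],
    other_scale_items: Sequence[Sequence[float]],
    convergent_min: float = 0.40,
) -> Tuple[float, float, bool, bool]:
    """Return (own_r, other_r, convergent_ok, discriminant_ok) for one item."""
    # Correlate with the rest of its own scale, excluding the item itself.
    own_r = pearson_r(item, scale_total(own_scale_other_items))
    other_r = pearson_r(item, scale_total(other_scale_items))
    se = 1 / sqrt(len(item))  # assumed approximation of the standard error
    return own_r, other_r, own_r >= convergent_min, own_r - other_r >= 2 * se
```

An item passes the convergent check when `own_r` is at least 0.40, and the discriminant check when `own_r` exceeds `other_r` by at least two standard errors.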

Three distinct approaches were used to evaluate the validity of the QLQ-BIL21:

  i) Convergent validity of the QLQ-C30 and QLQ-BIL21 questionnaires: This was examined using Pearson’s product–moment correlation. It was expected that conceptually related scales would correlate substantially with each other (Pearson’s r>0.40), and conversely that scales that are less related would exhibit lower correlations (Pearson’s r<0.40).

  ii) Known group comparison of the QLQ-C30 and QLQ-BIL21 questionnaires: This was performed to explore the ability of the questionnaire to discriminate between subgroups of patients differing in their clinical status, such as site (gallbladder vs intrahepatic cholangiocarcinoma vs extrahepatic cholangiocarcinoma vs disseminated disease) or treatment (resection possible vs inoperable). Differences were assessed using the t-test; any that were close to being nonsignificant were also checked using Wilcoxon’s rank sum test for nonparametric data.

  iii) Responsiveness to clinical change over time of the QLQ-C30 and QLQ-BIL21 questionnaires: This was assessed using the two sets of questionnaires available for each patient over time, with the mean QoL scores over time for items and scales reflecting change in QoL, the development of disease recurrence or metastases and performance status. These changes were compared by repeated-measure analysis of variance (ANOVA).
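For the known group comparison in (ii), the two test statistics can be sketched as below. This is a simplified illustration: it assumes a pooled-variance (Student's) t-test, which the text does not specify, and it reports the rank sum in its equivalent Mann–Whitney U form with midranks for tied scores. P-values are deliberately left to a statistics package. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic and degrees of freedom (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def rank_sum_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U for group a, using midranks for tied scores."""
    combined = sorted(list(a) + list(b))
    first: Dict[float, int] = {}  # first 1-based position of each value
    count: Dict[float, int] = {}  # number of times each value occurs
    for pos, value in enumerate(combined, start=1):
        first.setdefault(value, pos)
        count[value] = count.get(value, 0) + 1
    # A tied value receives the average of the positions it occupies.
    midrank = {v: first[v] + (count[v] - 1) / 2 for v in first}
    rank_sum_a = sum(midrank[v] for v in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2
```

With scale scores on a 0–100 grid, ties between patients are common, which is why the sketch uses midranks rather than assuming distinct values.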

Statistical Package for the Social Sciences (SPSS) software (IBM Corporation, Armonk, NY, USA) was used for data analysis and the linear transformation of responses to the QLQ-C30 and QLQ-BIL21 questionnaires to a 0–100 scale using EORTC guidelines and SPSS syntax. Stata 11 was used for all analyses and a P-value of 0.01 was deemed significant. The method outlined in the EORTC scoring manual was used to impute missing data points (Fayers).

Sample size

The sample size was calculated to be 231 patients (Walter et al, 1998) based on the anticipation of a 5% attrition rate from enrolment to completion, and taking into account that a sample of 220 patients is required to achieve 80% power to detect differences in the α-coefficient under a null hypothesis of 0.60 and the alternative hypothesis of 0.70 using a two-sided F-test with a significance level of 0.05. A minimum sample of 210 is needed to achieve 10 responses per item (Tabachnik, 1993).
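The two figures above can be checked by simple arithmetic. The sketch assumes that the 5% attrition allowance was applied as a 5% uplift on the 220 patients required for power, which is the reading that reproduces the reported 231; the exact formula is not stated in the text.

```latex
\begin{align*}
n_{\text{enrolled}} &= 220 \times (1 + 0.05) = 231 \\
n_{\text{per-item minimum}} &= 21 \ \text{items} \times 10 \ \text{responses per item} = 210
\end{align*}
```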



A total of 263 eligible patients were recruited: 172 with cholangiocarcinoma and 91 with cancer of the gallbladder. Patients were enrolled at eleven centres in the United Kingdom (143 patients recruited overall), 2 in Germany (13 patients recruited overall) and 1 each in the Netherlands (14 patients), Italy (12 patients), Chile (4 patients), India (42 patients) and China (7 patients). Intervention Group 1 consisted of 44 patients receiving surgical treatments (16.9%), and Intervention Group 2 consisted of 103 patients receiving medical treatments (39.6%). Patients in Intervention Group 3 (113 patients or 43.4%) received supportive care (no interventions with the exception of drains or stents). Three of the 263 patients were not attributed to a treatment group. Overall, 98 patients (37.1%) received a stent, and drainage was performed in 31 patients (11.7%).

Questionnaire completion

In total, 478 questionnaires were available for analysis. Of the 263 patients enrolled, 75 were not able to complete the study at 2 months (48 died, 11 were lost to follow-up, 13 too unwell and 3 for other reasons). Notably, 62% (n=30) of patients who died were enrolled in India, representing 70% of all Indian patients.

Debriefing results

Of the 263 patients in the study, 256 completed the debriefing questionnaire at the baseline assessment. Of these, 66% (n=168) did so in the hospital outpatient clinic, 10.9% (n=28) at home, 7.8% (n=20) as an in-patient and 15.2% (n=39) elsewhere. Overall, 89.1% of patients completed both QLQ-C30 and QLQ-BIL21 questionnaires in ⩽20 min, with 38.3% of patients (n=98) completing in <10 min, 34.4% (n=88) in 11–15 min, 16.4% (n=42) in 16–20 min, 9% (n=23) in 21–30 min and 1.6% (n=4) taking over 30 min. Help completing the questionnaire from family members or health-care staff was necessary in 46.4% of patients. A total of 41 patients (16%) found one or more items confusing or difficult to answer, the most common of which were question 49 (found confusing by 10 patients) and question 50 (found confusing by 4 patients) in the QLQ-BIL21 questionnaire. Forty-five per cent of patients (n=115) found at least one question not relevant. In the QLQ-BIL21 questionnaire, question 50 (pertaining to drainage tubes and bags) was considered irrelevant by 29 patients and question 49 by 16 patients, whereas questions 37 and 35 (both pertaining to jaundice) were deemed irrelevant by 12 and 10 patients, respectively.

QLQ-BIL21 scale structure: consistency and reliability

The internal consistency of the QLQ-C30 and QLQ-BIL21 questionnaires was calculated for each scale at baseline, at 2-month follow-up and as an overall global score (Table 1). The overall global Cronbach’s α-coefficients were calculated using all data, from both baseline and 2 months. Cronbach’s α-coefficients for all scales in the QLQ-BIL21 questionnaire exceeded the threshold of 0.70 at baseline (range 0.71–0.87). Overall Cronbach’s α-coefficients for the QLQ-BIL21 scales all met the 0.70 threshold (range 0.71–0.89). For the QLQ-C30 questionnaire, the threshold of >0.70 was met at baseline for all scales except Physical Function (α=0.47), Cognitive Function (α=0.65) and Nausea/Vomiting (α=0.67).

Table 1 Cronbach’s α for BIL21 scales

Intraclass correlations (Table 2) were calculated using the 67 individuals in the study who completed the test–retest questionnaire 2 weeks following the second assessment. ICC values for the QLQ-C30 scales varied from 0.52 to 0.92. All scales in the QLQ-BIL21 questionnaire scored ICC values >0.8, the only ones scoring ⩽0.9 being Eating (ICC=0.87), Jaundice (ICC=0.86), Treatment Side Effects (ICC=0.83) and Weight Loss (ICC=0.81). All QLQ-BIL21 values were therefore well above the acceptable value of >0.70 (Fayers, 2000).

Table 2 Results for test–retest for BIL21 scales

Construct validity

Construct validity was determined for all scales using convergent and discriminant validity. Convergent validity was calculated for each scale in both the QLQ-C30 and QLQ-BIL21 questionnaires. For both questionnaires, all items had Pearson’s r correlations >0.4 with their own scales (Table 3). This supports the clinical reasoning that the individual questions have been placed in the correct scales and are broadly measuring aspects of the same theme. Discriminant validity was calculated for all items in the QLQ-BIL21 questionnaire; none of these correlations were >0.70. This indicates that no item correlates strongly with a scale other than the one in which it has been placed, supporting the allocation of items to scales.

Table 3 Construct validity for QLQ-BIL21 questions

Clinical validity

The ability of the QLQ-BIL21 questionnaire to correlate with clinical scores of function at discrete time points was assessed using KPS scores at baseline stratified into two groups (KPS <70 and >70). The difference in scores for each scale in these two groups subsequently underwent group comparison testing (n=238–256; Table 4). This shows that the QLQ-BIL21 questionnaire is able to identify significant differences in mean scale scores between the two Karnofsky groups in all scales except Jaundice (P=0.139) and Weight loss (P=0.898). This may be explained by the fact that weight loss is a less prominent feature in biliary malignancies than in other GI cancers. The lack of correlation between jaundice and low KPS is of particular interest.

Table 4 Known-group comparison: correlation with KPS

Responsiveness to clinical change over time

Clinical change over time was assessed by comparing mean scale scores at baseline and follow-up in 154–178 patients (this variation was due to some scales not being fully completed at the follow-up assessment) (Table 5). Changes in mean scores by scale over time are a measure of clinical responsiveness. These changes were only significant for eating, jaundice, tiredness, pain and treatment side effects. A change in mean scale score of >10% was only seen in the treatment side effects item (Δ12.54). Changes over the 2-month follow-up period for anxiety (P=0.066), drains (P=0.834) and weight loss (P=0.519) were not significant.

Table 5 Change over time for whole group and intervention subgroups 1–3

Analysis of the changes to scales over time stratified by intervention group (Table 5) showed more subtle changes. Eating symptoms changed minimally, by a nonsignificant margin, in all three intervention groups. Jaundice symptoms showed a modest improvement (change of −13.7) in Intervention Group 1 (surgery), with a smaller change (−5.6) in Intervention Group 2 (medical therapies including chemotherapy and radiotherapy). There were no significant changes to jaundice symptoms in Intervention Group 3 (supportive care only but including biliary stents and drains). Tiredness symptoms worsened significantly in Intervention Groups 1 (+12.0) and 3 (+8.8), with a nonsignificant change in Intervention Group 2. Pain and anxiety symptoms improved significantly (change of −11.5 and −7.9, respectively) in Intervention Group 2, with no significant changes in either Intervention Group 1 or 3. Treatment side effects worsened markedly in Intervention Group 2 (+19.7), with no significant changes in Intervention Groups 1 and 3. Symptoms regarding drains and weight loss did not change significantly in any of the three intervention groups between baseline and follow-up. The worsening of weight loss symptoms (+10.0) in Intervention Group 3 was not statistically significant, but represents a clinically significant change.

Differences between cholangiocarcinoma and cancer of the gallbladder

As this module covers more than one clinical group of diseases, differences between these groups were calculated. Jaundice might be expected to be more prominent in cholangiocarcinoma than in gallbladder cancer, but there was no clear difference in this scale. Mean scores for the QLQ-BIL21 were calculated at baseline and at follow-up, and any group differences between cholangiocarcinoma and cancer of the gallbladder were investigated using repeated-measure ANOVA. Data for the baseline QLQ-BIL21 are displayed in Table 6. There was a significant difference between extra- and intrahepatic cholangiocarcinoma in the drain sites and weight loss questions only, as might be expected clinically since drains would be used more for intrahepatic than extrahepatic disease, and weight loss might be expected if there is a significant bulk of intrahepatic disease. None of the other scales were significantly different at baseline.

Table 6 Mean baseline scores for individual sites of disease showing n, mean and s.d. for extrahepatic, intrahepatic and gallbladder cancer


The current study is the final stage of a process of development and validation of a specific set of questions appropriate for biliary tract cancers. The QLQ-BIL21 questionnaire was conceived as the first disease-specific QoL assessment tool for patients with cholangiocarcinoma and cancer of the gallbladder. The existing EORTC questionnaire modules QLQ-PAN26 (Pancreatic Cancer Module, presently awaiting phase IV validation) (Fitzsimmons et al, 1999) and QLQ-LMC21 (Colorectal Liver Metastasis module) (Blazeby et al, 2009), and the FACT-Hep questionnaire (for all hepatobiliary cancers), cover many similar domains, and indeed some of their issues contributed to the construction of the QLQ-BIL21. However, the differences in presentation and disease course are reflected in a significantly different scale structure.

This International Phase IV Validation Study of the QLQ-BIL21 questionnaire, conducted in accordance with EORTC guidelines, found the reliability and validity of the questionnaire to be robust. Cronbach’s α-coefficients for all scales overall met the threshold of >0.70. The robustness of the testing parameters indicates that the QLQ-BIL21 questionnaire will prove clinically useful in trials as well as in clinical practice. The reliability analysis of the questionnaire did not fully meet the ideal ICC threshold of >0.90 for all scales (the exceptions being eating, jaundice, treatment side effects and weight loss), but exceeded the minimum reported standard of 0.70. The QLQ-C30 questionnaire behaved differently in this population, with two scales failing to score an ICC >0.70. These results are an indication that the QLQ-BIL21 questionnaire may be needed in addition to the QLQ-C30 questionnaire in this patient population.

Analysis of the debriefing questionnaires highlighted that a large number of patients found QLQ-BIL21 question 50 (‘Have you had difficulties with drainage tubes/bags’) irrelevant. This is not surprising as not all patients will encounter drains in the course of their treatment. To maintain clarity, it may therefore be desirable to enable patients to answer this question and also question 49 with ‘not applicable’ rather than a Likert scale in the final version of the questionnaire.

Patients enrolled in India differed statistically from those in other centres for some characteristics. The Indian cohort of 43 patients was significantly younger with a mean age of 52.24 years, a decade below the next youngest cohort in the Netherlands (mean age 62.6 years) and over two decades younger than the oldest cohort in the United Kingdom (Winchester mean age 75.5 years). Furthermore, 93% of Indian patients had primary gallbladder cancer, compared with an aggregated mean of 21.7% for all other centres. Overall, 43% of all gallbladder cancer patients enrolled in the study came from the Indian cohort. Mortality was also significantly higher in the Indian cohort, at 70% (n=30) compared with an overall mortality of 18%; excluding the Indian patients from the analysis yields an overall mortality of just 8%. Despite this demographic background, there seem to be no significant differences in QLQ-BIL21 scores for any scale, with the exception of jaundice, which is broadly congruent with clinical expectation.

Known group comparison testing demonstrated the QLQ-BIL21 questionnaire to be adept at identifying group differences at time points. Of particular interest here was the low correlation between jaundice and a low KPS. This may be a reflection of the fact that even a very small tumour load may generate jaundice without a concurrent effect on systemic symptoms or possibly that jaundice is often treated at an early stage while the tumour progresses. Similar results were found in the development of the QLQ-PAN26, although the jaundice items were retained on expert opinion.

Overall, the measured response to clinical change was not as marked as anticipated, but was significantly different from baseline in most scales. This may be a reflection of the short follow-up period, which was initially selected to prevent patient dropout owing to clinical deterioration and because it represents the period of maximal clinical intervention. It would have been desirable to continue data collection beyond this; however, the poor prognosis of this rare cancer would have led to a highly significant attrition rate. Indeed, even at 2 months 50 out of the 263 patients recruited had died. Comparable studies include a 2010 study of 91 patients in Romania who were followed up for up to 4 years with regular QLQ-C30 assessments (Mihalache et al, 2010). At the first 6-month follow-up appointment, statistically significant increases in global scores were detected. Conversely, a 2015 Thai study of 99 patients found significant decreases in QoL using the FACT-Hep Questionnaire at 2 months (Woradet et al, 2015). It may therefore prove desirable to further investigate the QLQ-BIL21 questionnaire in a smaller cohort over a longer time period to assess response to change over time more thoroughly. When each group was analysed separately, the changes identified in each treatment group by the QLQ-BIL21 questionnaire were broadly congruent with clinical expectation, with less change in Group 3 compared with the other active treatment groups.

Overall, the strengths of this study are that it followed EORTC module development guidelines and that it included many different languages and cultures, as well as patients with a wide variety of stages, therapies and presentations of this disease. A drawback of this study is that a large proportion of patients were recruited from the United Kingdom, which may have caused some biases in terms of language and culture, although it is debatable whether the results would have been materially different had a more even spread of international patients been recruited.

Biliary tract cancer therapeutics are evolving. The first advanced biliary tract cancer randomised phase II trial (ABC-01) of gemcitabine vs cisplatin/gemcitabine was reported in 2009 (Valle et al, 2009), leading to the pivotal randomised phase III study (ABC-02) establishing cisplatin/gemcitabine chemotherapy as the international standard of care regimen for patients with these diseases (Valle et al, 2010). Its sixth iteration (ABC-06), a phase III randomised controlled trial of oxaliplatin and 5-FU vs active symptom control in the second-line setting after failure of cisplatin/gemcitabine chemotherapy, is currently active with recruitment closing in summer 2016; this study is one of the first to incorporate the use of QLQ-BIL21. The changes in therapeutics occurring as a result of these trials may require either a future adaptation of the QLQ-BIL21 questionnaire to include side effects of new drugs or an additional module specifically for new therapies.


The QLQ-BIL21 questionnaire has been demonstrated to be a clinically sensitive, reliable and valid instrument for measuring QoL in patients with cholangiocarcinoma and cancer of the gallbladder by this International Multicentre Phase IV Validation Study. The authors therefore recommend that this tool be used in the clinical trial and clinical practice setting to provide an accurate quantification of QoL to guide therapy and future research.