Study design and participants
Dutch bladder cancer patients participating in the UroLife (Urothelial cell cancer: Lifestyle, prognosis and quality of Life) or BlaZIB (‘BlaaskankerZorg In Beeld’, clinical trial number: NL8106) studies were included in the current analysis. Both studies are population-based, multicenter prospective cohort studies recruiting newly diagnosed bladder cancer patients based on notifications from the nationwide network and registry of histopathology and cytopathology in the Netherlands (PALGA) and successive registration in the Netherlands Cancer Registry (NCR). The main aim of the UroLife study is to evaluate the association between lifestyle habits and the risk of recurrence and progression and HRQoL of patients with NMIBC. BlaZIB aims to gain insight into bladder cancer care and to identify barriers and modulators for optimal care. More detailed information on these studies can be found elsewhere [8, 9]. For this analysis, patients diagnosed with NMIBC (stage Ta, T1, Tis) between April 1, 2014 and March 18, 2016 were selected from the UroLife study, and patients diagnosed with high-risk NMIBC (stage T1 and Tis) between November 1, 2017 and July 7, 2019 were selected from the BlaZIB study. All patients were Dutch speaking, between 18 and 80 years old, and treated with a transurethral resection. This study was performed in line with the principles of the Declaration of Helsinki. The Committee for Human Research in the region Arnhem-Nijmegen provided ethical approval for the UroLife study (CMO 2013-494) and deemed the BlaZIB study exempt from ethical review under the Medical Research Involving Human Subjects Act (WMO). Both studies were approved by the ethical review board of the NCR. Written informed consent was obtained from all patients participating in UroLife or BlaZIB.
Data collection
Both studies collected self-reported questionnaire data online or on paper 6 weeks after diagnosis (T6wk). The online questionnaires were collected via the data collection tool of the Patient Reported Outcomes Following Initial treatment and Long term Evaluation of Survivorship (PROFILES) registry [10]. Baseline data (T6wk) and follow-up data collected at 3 months (T3mo) and 15 months (T15mo) after diagnosis in the UroLife study, and at 6 months (T6mo) and 12 months (T12mo) after diagnosis in the BlaZIB study were used for the current analysis. The measurement points of UroLife were based on the treatment regimen of patients diagnosed with NMIBC, i.e. shortly after histological confirmation of the tumour (T6wk), at time of cystoscopy to investigate whether the tumour was successfully removed (T3mo), and long-term follow-up (T15mo). For the BlaZIB study, which also includes patients with muscle-invasive bladder cancer, different measurement points were selected. Because the measurement points of the two studies do not coincide, most analyses were based on UroLife data and only supplemented with BlaZIB data where necessary (i.e. test–retest, interpretability of change scores; see also Fig. 1). The baseline questionnaires assessed demographics, smoking history, and comorbidity. HRQoL was assessed at T6wk and during follow-up using the EORTC-QLQ-C30 and the QLQ-NMIBC24 questionnaires [2, 11]. Patients who underwent a cystectomy were not or no longer invited to participate in the UroLife study.
In order to assess the test–retest reliability and standard error of measurement (SEM), patients who completed the BlaZIB T12mo questionnaire between March 1st 2019 and December 7th 2019 received an additional questionnaire 2 weeks after the T12mo questionnaire (T12mo + 2wk). In total, 134 patients diagnosed with NMIBC completed the T12mo + 2wk questionnaire (response rate 86.5%). This questionnaire included the QLQ-NMIBC24 and four additional questions to assess whether the symptoms – in terms of urinary, bowel, sexual and total function – had decreased, remained the same or increased compared to the T12mo questionnaire (three-point Likert scale, see Additional file 1: Appendix A). Patients whose symptoms remained the same were regarded as stable and included in the test–retest analysis.
In order to assess the minimal important change (MIC), the follow-up questionnaires of BlaZIB (T6mo, T12mo) contained an anchor question to assess changes over time, i.e. ‘Did your bladder cancer-specific complaints (urinary, bowel, sexual function and overall) change compared to your complaints at diagnosis?’. Patients were asked to score their change on a nine-point Likert scale ranging from 1 (worse than ever) to 9 (no complaints anymore) for urinary, bowel, sexual and total function, separately [12]. We clustered the answers into three categories: importantly deteriorated (1–3), not importantly changed (4–6) and importantly improved (7–9) [13].
HRQoL questionnaires
The EORTC QLQ-C30 is the core HRQoL questionnaire of the EORTC and measures the HRQoL of cancer patients. The questionnaire consists of 30 items organized into a global health status scale, five functioning scales (physical, role, cognitive, emotional, and social), three symptom scales (fatigue, pain, and nausea and vomiting) and six single items (dyspnoea, insomnia, loss of appetite, constipation, diarrhea, and financial impact) [11].
The QLQ-NMIBC24 is an EORTC module for patients diagnosed with NMIBC and should be administered in addition to the core questionnaire (EORTC-QLQ-C30) [4]. The module includes constructs specific to the tumour site and treatment of NMIBC. The QLQ-NMIBC24 consists of 24 items organized into six scales (urinary symptoms, malaise, future worries, bloating and flatulence, sexual functioning, and male sexual problems) and five single items (intravesical treatment issues, sexual intimacy, risk of contaminating partner, sexual enjoyment, female sexual problems) [3].
All items were scored on a four-point Likert scale ranging from 1 (not at all) to 4 (very much), with the exception of the global health status items, which employ a seven-point scale ranging from 1 (very poor) to 7 (excellent). Scores of items were summed and linearly transformed to 0–100 scales and missing data were imputed according to the EORTC guideline [14]. Higher scores on functioning scales and global health status represent better functioning, while higher scores on the symptom scales indicate more symptom burden. Higher scores on the scales and items of the QLQ-NMIBC24 should be interpreted as more symptom burden, with the exception of the sexual functioning scale and the sexual enjoyment item, where higher scores represent better functioning.
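The linear transformation to a 0–100 scale can be illustrated with a short sketch. It assumes the commonly described EORTC approach, which is not spelled out in the text above: the raw score is the mean of the answered items, a scale is scored only when at least half of its items were answered, and some scales are reversed so that higher scores mean better functioning. The function name `eortc_scale_score` and the `reverse` flag are hypothetical, and which scales need reversing should be taken from the scoring guideline [14].

```python
from typing import Optional, Sequence


def eortc_scale_score(
    items: Sequence[Optional[int]],
    item_range: int,
    reverse: bool = False,
) -> Optional[float]:
    """Transform item responses of one scale to a 0-100 score.

    items      : responses, with None for a missing item
    item_range : maximum minus minimum response (3 for a 1-4 item,
                 6 for a 1-7 global health status item)
    reverse    : True if a higher raw score should give a lower
                 transformed score (assumption: set per scale
                 according to the scoring guideline)
    """
    answered = [x for x in items if x is not None]
    # Assumed missing-data rule: score the scale only if at least
    # half of its items were answered.
    if len(answered) * 2 < len(items):
        return None
    raw = sum(answered) / len(answered)  # mean of answered items
    score = (raw - 1) / item_range * 100
    return 100 - score if reverse else score


# Hypothetical four-item scale with one missing response.
print(eortc_scale_score([1, 2, None, 3], item_range=3))
```

With this rule a missing item is effectively replaced by the mean of the items the patient did answer on the same scale.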
Statistical analysis
Floor and ceiling effects were examined for each scale at each assessment point. If more than 15% of the patients scored at the lowest or highest end of the scale, the scale was considered to have a floor or ceiling effect, respectively [15]. Multitrait scaling analysis and confirmatory factor analysis (CFA) were performed to validate the constructs of the QLQ-NMIBC24. Convergent validity was defined as a correlation of 0.40 or greater between an item and its own scale. Discriminant validity was defined as a correlation of less than 0.40 between an item and any other scale [2, 16]. Maximum likelihood (ML) was used as the estimator in the CFA, and missing items were handled using full information maximum likelihood (FIML). Model-data fit of the CFA was assessed with the model chi-square, the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR). A model chi-square P value > 0.05, CFI ≥ 0.95, RMSEA < 0.05 and SRMR < 0.05 indicate a good fit, and CFI > 0.90 and both RMSEA and SRMR > 0.05 but < 0.08 indicate an acceptable fit [15, 17]. Internal consistency was assessed with Cronbach’s α. A Cronbach’s α of 0.70 or higher was considered adequate for group level comparisons. Test–retest reliability was assessed based on the questionnaires administered at T12mo and T12mo + 2wk using the intraclass correlation coefficient for absolute agreement (ICC; two-way mixed model, single measure) [18]. An ICC value of 0.70 or higher was considered acceptable.
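Two of the criteria above, the 15% floor and ceiling rule and Cronbach’s α, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure run in SPSS: scale scores are assumed to lie on the 0–100 range, α is computed from complete cases using sample variances, and the function names and data are hypothetical.

```python
from statistics import variance
from typing import Sequence, Tuple


def floor_ceiling(
    scores: Sequence[float],
    low: float = 0.0,
    high: float = 100.0,
    threshold: float = 0.15,
) -> Tuple[bool, bool]:
    """Return (floor effect, ceiling effect) for one scale.

    An effect is flagged when more than `threshold` of the patients
    score at the lowest or highest possible value.
    """
    n = len(scores)
    floor_share = sum(s == low for s in scores) / n
    ceiling_share = sum(s == high for s in scores) / n
    return floor_share > threshold, ceiling_share > threshold


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one patient's item scores.

    Assumes complete cases (no missing items) and at least two items.
    """
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 10 scale scores and a 3-item scale for 5 patients.
print(floor_ceiling([0, 0, 50, 100, 75, 25, 50, 50, 50, 50]))
print(cronbach_alpha([[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4], [2, 1, 2]]))
```

A scale would meet the internal consistency criterion in the text when the returned α is 0.70 or higher.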
Divergent validity of the QLQ-NMIBC24 was assessed by calculating the Spearman correlation coefficients between the scales of the QLQ-C30 and QLQ-NMIBC24 [19]. Based on previous studies, we expected in general low to moderate correlations (< 0.40) between the scales of both questionnaires. Previous studies have shown that malaise was moderately to strongly correlated (> 0.40) with all the scales of the QLQ-C30 [2, 6, 16]. The urinary symptom scale has previously also been shown to be moderately (0.40–0.69) correlated with role function, cognitive function, social function, fatigue, nausea and vomiting, and pain [2, 6]. Lastly, future worries was expected to be moderately correlated with the emotional function scale of the QLQ-C30 [2] and with fatigue [6].
Known group validity was assessed by comparing patients with low, intermediate and high risk NMIBC using independent t-tests. Patients were divided into risk groups based on the European Association of Urology (EAU) guidelines [1] without taking into account the tumour size (not available) and the recurrent nature of the tumour (only primary tumours included). We hypothesized that patients with high risk NMIBC would have more urinary symptoms, malaise, future worries and intravesical treatment issues at T6wk than patients with low risk NMIBC.
Responsiveness to change was examined using all three questionnaires of the UroLife study (i.e. T6wk, T3mo and T15mo) using paired t-tests. We hypothesized that differences on the scales of the QLQ-NMIBC24 would only be small between T6wk and T3mo, but that symptoms and complaints would decrease from T6wk to T15mo. Effect sizes (ESs) were calculated using Cohen’s d statistic (mean difference divided by pooled standard deviation). These provide a distribution-based estimate of the magnitude of mean differences/changes, where an ES of 0.2 is considered small, 0.5 moderate, and 0.8 large [20].
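As an illustration of the effect size calculation, the sketch below computes Cohen’s d for two assessment points. It assumes the same patients contribute a score at both time points, so that the pooled standard deviation reduces to the square root of the mean of the two sample variances; the exact pooling used in the analysis is not specified in the text. The function names, the "negligible" label for values below 0.2, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(first: Sequence[float], second: Sequence[float]) -> float:
    """Mean change divided by the pooled standard deviation.

    Assumes equal group sizes, so the pooled variance is the mean
    of the two sample variances.
    """
    pooled_sd = sqrt((variance(first) + variance(second)) / 2)
    return (mean(second) - mean(first)) / pooled_sd


def effect_size_label(d: float) -> str:
    """Label |d| with the thresholds 0.2 (small), 0.5 (moderate), 0.8 (large)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"  # label for values below 0.2 is an assumption
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical symptom scores of five patients at two time points.
t6wk = [40.0, 55.0, 30.0, 60.0, 45.0]
t15mo = [30.0, 50.0, 25.0, 45.0, 40.0]
d = cohens_d(t6wk, t15mo)
print(round(d, 2), effect_size_label(d))
```

A negative d here means the symptom score dropped between the two assessments, which on a symptom scale corresponds to fewer complaints.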
MIC was assessed using the visual-anchor distribution method of De Vet et al. [13]. This method determines the smallest change in scores of the QLQ-NMIBC24 that is regarded as either improvement or deterioration by taking into account the variability and importance of the scores. To determine the importance of the scores, an external anchor is used. Correlations between the anchor question and the scales of the QLQ-NMIBC24 were assessed to determine the adequacy of the anchor (r ≥ 0.40) (i.e. does the anchor question measure the same as the change scores?). Then, patients were subdivided into three groups (importantly deteriorated, not importantly changed and importantly improved) using the anchor question, and for each group the distribution of the change scores was plotted. The optimal cut-off point on the receiver operating characteristic (ROC) curve was considered to be the MIC value.
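The anchor grouping and the ROC-based cut-off can be sketched as follows. The grouping follows the 1–3, 4–6 and 7–9 categories of the anchor question. The choice of the optimal cut-off is an assumption: the sketch takes the change score that maximizes sensitivity plus specificity minus 1 (Youden's index), a common criterion that the text does not confirm. Change scores are assumed to be oriented so that higher values mean more improvement, and all names and data are hypothetical.

```python
from typing import Sequence, Tuple


def anchor_category(rating: int) -> str:
    """Cluster the nine-point anchor rating into three groups."""
    if not 1 <= rating <= 9:
        raise ValueError("anchor rating must be between 1 and 9")
    if rating <= 3:
        return "importantly deteriorated"
    if rating <= 6:
        return "not importantly changed"
    return "importantly improved"


def roc_cutoff(
    improved: Sequence[float], unchanged: Sequence[float]
) -> Tuple[float, float]:
    """Return (cut-off, Youden's J) separating improved from unchanged.

    A patient is classified as improved when the change score is at
    or above the cut-off. Ties keep the lowest cut-off (assumption).
    """
    best_cut, best_j = float("nan"), -1.0
    for cut in sorted(set(improved) | set(unchanged)):
        sensitivity = sum(c >= cut for c in improved) / len(improved)
        specificity = sum(u < cut for u in unchanged) / len(unchanged)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical change scores (higher = more improvement).
improved_scores = [10, 15, 20, 25]
unchanged_scores = [-5, 0, 5, 15]
print(anchor_category(8))
print(roc_cutoff(improved_scores, unchanged_scores))
```

The same function could be applied to the deteriorated versus unchanged groups after flipping the sign of the change scores, which would give a separate cut-off for deterioration.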
The CFA was conducted with the software package R using the “lavaan” package [21]. ICCs were calculated in STATA version 16.0 (StataCorp LLC, College Station, Texas, USA) and SEMs were calculated in SAS (SAS Institute, Cary, North Carolina, USA). All other statistical analyses were executed using SPSS version 25 (IBM Corporation, Armonk, New York, USA). P values < 0.05 were considered statistically significant.