Effects of exercise in breast cancer patients: implications of the trials within cohorts (TwiCs) design in the UMBRELLA Fit trial

Purpose: The Trials within Cohorts (TwiCs) design aims to overcome problems faced in conventional RCTs. We evaluated the TwiCs design when estimating the effect of exercise on quality of life (QoL) and fatigue in inactive breast cancer survivors. Methods: UMBRELLA Fit was conducted within the prospective UMBRELLA breast cancer cohort. Patients provided consent for future randomization at cohort entry. We randomized inactive patients 12–18 months after cohort enrollment. The intervention group (n = 130) was offered a 12-week supervised exercise intervention. The control group (n = 130) was not informed and received usual care. Six-month exercise effects on QoL and fatigue as measured in the cohort were analyzed with intention-to-treat (ITT), instrumental variable (IV), and propensity score (PS) analyses. Results: Fifty-two percent (n = 68) of inactive patients accepted the intervention. Physical activity increased in patients in the intervention group, but not in the control group. We found no benefit of exercise for dimensions of QoL (ITT difference global QoL: 0.8, 95% CI = − 2.2; 3.8) and fatigue, except for a small beneficial effect on physical fatigue (ITT difference: − 1.1, 95% CI = − 1.8; − 0.3; IV: − 1.9, 95% CI = − 3.3; − 0.5; PS: − 1.2, 95% CI = − 2.3; − 0.2). Conclusion: TwiCs gave insight into exercise intervention acceptance: about half of inactive breast cancer survivors accepted the offer and increased physical activity levels. The offer resulted in no improvement in QoL, and a small beneficial effect on physical fatigue. Trial registration: Netherlands Trial Register (NTR5482/NL.52062.041.15), date of registration: December 07, 2015.


Introduction
Fatigue is the most frequently reported side effect of breast cancer and its treatment; it can persist for years and negatively affect quality of life (QoL) [1][2][3]. Meta-analyses of randomized controlled trials (RCTs) have shown that physical exercise has positive effects on fatigue and QoL in patients with cancer [4][5][6][7][8]. However, these effects were often small, which may be partly due to the impossibility of blinding the intervention [9]. Patients who decide to participate are generally motivated to exercise and may be disappointed when allocated to the control group. Consequently, they may drop out or start exercising on their own, the latter resulting in contamination and dilution of the intervention effect [10]. Other disadvantages of conventional RCTs are time-consuming accrual and inclusion of a selective study sample [11,12].
The Trials within Cohorts (TwiCs) design, also known as the cohort multiple randomized controlled trial (cmRCT) design, was proposed as an alternative to conventional pragmatic RCTs, and has the potential to overcome the above-mentioned challenges [13][14][15]. Using this design, the intervention study is performed within a prospective cohort. Compared to conventional RCTs, the TwiCs design can lead to more efficient patient recruitment, may improve generalizability of results, and allows evaluation of patients' acceptability of the intervention [14][15][16][17].
The current UMBRELLA Fit study is the first trial applying the TwiCs design in the field of exercise-oncology. UMBRELLA Fit examined the effect of a 12-week supervised exercise intervention on QoL and fatigue. Moreover, we aimed to evaluate the applicability of the TwiCs design in the field of exercise-oncology and the implications of the TwiCs design on effect estimation and interpretation of the results.

Study design and procedures
The study was approved by the Ethics Committee of the University Medical Center Utrecht (UMCU). The UMBRELLA Fit study is a pragmatic, two-arm RCT using the TwiCs design and was conducted within the 'Utrecht cohort for Multiple BREast cancer intervention studies and Long-term evaLuAtion' (UMBRELLA) [13,18,19]. Since September 2013, all patients with breast cancer who are referred to the UMCU for radiation treatment have been invited to participate in the UMBRELLA cohort. At enrollment, first-stage informed consent is asked for collection of clinical data and patient-reported outcomes (Fig. 1) [20]. Additionally, patients can provide broad consent for randomization to future intervention studies. When allocated to an intervention arm in the future, they are offered the intervention and, if they accept it, asked for second-stage informed consent. Patients allocated to the control arm are not notified about the study, and their cohort data are used to estimate intervention effectiveness. UMBRELLA Fit recruited from October 2015 to March 2018. Women participating in the UMBRELLA cohort who met the UMBRELLA Fit inclusion criteria (see below) were randomly allocated to either intervention or control in a 1:1 ratio, stratified by time since cohort enrollment (12 or 18 months). Randomization was performed by an independent data manager, using a computer-generated randomization list. Patients allocated to the exercise intervention group received an offer by mail to participate and were contacted by telephone for further explanation of the study. Patients who refused the offer received usual care. Patients who refused because of bad timing of the offer were given the option to start the exercise program at a later stage. Patients allocated to the control group were not informed about the study and received usual care. After study completion, all patients in the UMBRELLA cohort, irrespective of participation in this study, were informed by a newsletter.

Participants
Patients eligible for the UMBRELLA cohort [19] were eligible for UMBRELLA Fit when meeting the following inclusion criteria: (1) broad informed consent for randomization to future intervention studies; (2) 18-75 years of age; (3) completion of the 12- or 18-month cohort questionnaire; (4) cancer treatment completed (except for hormonal treatment); and (5) a physically inactive lifestyle (< 150 min/week performing moderate-to-vigorous leisure time and sports activities).

Intervention
The 12-week supervised exercise intervention comprised two weekly 1-h combined aerobic and resistance training sessions at a physiotherapy center close to the patient's home (see details in the supplement). Patients were also encouraged to be physically active at moderate intensity for at least 30 min on all days [21].

Outcome measures
Outcome measures were obtained from routine UMBRELLA cohort measurements. The questionnaire that was completed either 12 or 18 months after cohort enrollment served as baseline and the cohort questionnaire 6 months later was used as follow-up measurement.

Quality of life
Quality of life (primary outcome) was measured using the Global QoL score and the five functional scales of the EORTC QLQ-C30 [22]. A QLQ-C30 summary score was calculated using all functional and symptom subscales except global QoL and financial impact [23].

Fatigue
Fatigue was measured with the multidimensional fatigue inventory (MFI-20), a 20-item questionnaire containing five dimensions: general fatigue, physical fatigue, mental fatigue, reduced motivation, and reduced activity [24].

Anxiety and depression
Anxiety and depression were measured with the validated Dutch version of the hospital anxiety and depression scale (HADS), containing seven items for the depression subscale and seven items for the anxiety subscale [25].

Physical activity
Physical activity during an average week in the past months was assessed using the Short QUestionnaire to ASsess Health enhancing physical activity (SQUASH) [26]. A total score was calculated by summing up the minutes per week spent in commuting activities (cycling), leisure time walking and cycling, and sports activities, all with ≥ 4 MET.
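The total score described above can be sketched as a filtered sum. This is only an illustration under stated assumptions: the record layout, the category names, and the helper names are hypothetical, not part of the SQUASH instrument, and applying the 150 min/week eligibility threshold to this total is an assumption made for the example.

```python
# Hypothetical record layout: (activity category, MET value, minutes per week).
# Category names are illustrative labels, not official SQUASH item names.
COUNTED_CATEGORIES = {
    "commuting_cycling",
    "leisure_walking",
    "leisure_cycling",
    "sports",
}
MET_THRESHOLD = 4.0                 # activities with >= 4 MET are counted
INACTIVE_BELOW_MIN_PER_WEEK = 150   # assumed cut-off, taken from the inclusion criteria


def weekly_activity_minutes(activities):
    """Sum minutes/week over the counted categories with MET >= 4."""
    return sum(
        minutes
        for category, met, minutes in activities
        if category in COUNTED_CATEGORIES and met >= MET_THRESHOLD
    )


def is_inactive(activities):
    """Flag a patient as inactive under the assumed 150 min/week cut-off."""
    return weekly_activity_minutes(activities) < INACTIVE_BELOW_MIN_PER_WEEK


example = [
    ("commuting_cycling", 4.0, 40),
    ("leisure_walking", 3.5, 120),  # below 4 MET, not counted
    ("sports", 6.0, 60),
    ("household", 4.5, 200),        # category is not part of the total score
]
total_minutes = weekly_activity_minutes(example)
```

In this toy record only the commuting and sports entries contribute to the total.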

UMBRELLA Fit measurements
Patients participating in the exercise intervention visited the UMCU pre- and 12-week post-intervention for additional measurements, i.e., questionnaires, cardiopulmonary exercise testing, intervention acceptance and compliance (see details in the supplement) [18].

Sample size calculation
A required sample size of 166 patients was estimated based on an expected acceptance rate of 70% in the intervention group and a clinically relevant 10-point difference in global QoL (power of 80% and two-sided alpha of 0.05) [18,27,28]. After recruitment of 152 patients, the actual acceptance rate was lower than expected (i.e., 55% instead of 70%), and the sample size was updated to 260 patients, as recommended by Candlish et al. [29]. This sample size adaptation was solely based on the acceptance rate. No interim analysis of the trial outcome was performed.
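Why a lower acceptance rate inflates the required sample size can be illustrated with a standard two-sample calculation in which the effect in accepters is diluted by the acceptance rate. This is a rough sketch, not the procedure of Candlish et al.: the function name and the standard deviation of 16 points are hypothetical, and the output is not expected to reproduce the reported sample sizes exactly.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(effect_in_accepters, sd, acceptance, alpha=0.05, power=0.80):
    """Total N for a two-arm comparison of means under a diluted ITT effect.

    Assumes refusers gain nothing, so the ITT effect equals the effect in
    accepters multiplied by the acceptance rate (an illustrative assumption).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)
    diluted_effect = effect_in_accepters * acceptance
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diluted_effect ** 2
    return 2 * ceil(n_per_arm)


ASSUMED_SD = 16.0  # hypothetical SD of global QoL, not taken from the source
n_at_70_percent = total_sample_size(10, ASSUMED_SD, acceptance=0.70)
n_at_55_percent = total_sample_size(10, ASSUMED_SD, acceptance=0.55)
```

Because the diluted effect enters the formula squared, the required sample size grows roughly with the inverse square of the acceptance rate.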

Statistical analysis
The statistical analyses were specified in our study protocol, which was approved by the Ethics Committee before recruitment started (NL52062.041.15). The statistical analysis plan remained unchanged. However, because the TwiCs design is relatively new and little was known about suitable analysis methods, the methods were further refined during the analysis process; this refinement was one of the aims of the current study and was stated in the statistical analysis plan.
Baseline characteristics and within-group changes are described for all patients and separately for patients who accepted (including patients who withdrew from the intervention in a later phase) and patients who refused the intervention.

Within-group changes
Mean changes and corresponding 95% CI in QoL, fatigue, anxiety and depression, and physical activity level from baseline to 6 months follow-up (cohort measurements) and from pre- to 12-week post-intervention (exercise group) were calculated.

Between-group differences
ITT linear regression analysis was used to assess between-group differences. ITT analysis might lead to an underestimation of the intervention effect because of intervention refusal [30]. Therefore, as a sensitivity analysis, we also performed instrumental variable (IV) analysis using the two-stage least squares method to account for possible non-acceptance in the intervention group [29,31]. In the first stage, the relation between treatment assignment and treatment acceptance (compliance) was estimated [32]. In the second stage, the effect of the exercise program on the outcome was estimated, using the predicted values from the first stage as an independent variable in a linear regression model. In the ITT and IV analyses, missing values on covariates and baseline measures of the outcome were multiply imputed (15 imputed datasets using the R mice algorithm [33,34]), whereas patients with missing outcome values were omitted [35].
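The two stages described above can be sketched in a few lines for the simplest case: one binary instrument (assignment), no covariates, and no imputation. This is a minimal illustration with made-up toy data, not the adjusted model used in the study, and all function and variable names are hypothetical.

```python
from statistics import mean


def ols(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def itt_and_iv(assigned, accepted, outcome):
    """Return (ITT estimate, first-stage slope, IV estimate)."""
    # ITT: effect of being offered the intervention.
    itt, _ = ols(assigned, outcome)
    # Stage 1: regress acceptance on assignment.
    first_stage, intercept = ols(assigned, accepted)
    predicted = [intercept + first_stage * z for z in assigned]
    # Stage 2: regress the outcome on predicted acceptance.
    iv, _ = ols(predicted, outcome)
    return itt, first_stage, iv


# Toy data: 4 controls, 4 patients offered the intervention (2 accept).
assigned = [0, 0, 0, 0, 1, 1, 1, 1]
accepted = [0, 0, 0, 0, 1, 1, 0, 0]
outcome = [10, 12, 11, 11, 8, 8, 11, 13]   # e.g. a fatigue score
itt, first_stage, iv = itt_and_iv(assigned, accepted, outcome)
```

In this covariate-free setting the IV estimate equals the ITT estimate divided by the acceptance rate, which is why IV estimates are larger in magnitude than ITT estimates when acceptance is incomplete. Standard errors taken from a naive second-stage regression are not valid; dedicated two-stage least squares routines correct for this.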
In the IV analysis, we could not rule out that intervention refusers were influenced by the offer of the intervention [36]. Therefore, we performed propensity score (PS) analysis as a second sensitivity analysis. Here we estimated the effect in intervention accepters by comparing them to control patients who would have accepted the exercise intervention if offered. To this end, first, propensity scores were estimated for patients in the intervention group using a logistic regression model, i.e., the probability of accepting the intervention, given their observed characteristics (baseline measures of the outcome, age, time since diagnosis, BMI, education). Second, based on the propensity scores, intervention accepters were matched to potential accepters in the control group (i.e., patients who would have accepted the intervention if offered) in a 1:1 ratio without replacement using nearest neighbor matching. In the matched sample, balance of covariates between intervention groups was assessed by means of standardized mean differences and by checking the C-statistic of a refitted propensity score model, indicating the ability of the model to predict treatment status. Finally, linear regression analysis was performed. For PS analysis, missing values of the outcomes were also imputed [33].
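The matching step described above can be sketched as follows, assuming propensity scores have already been estimated (a model fitted in the intervention arm and applied to control patients). The greedy processing order, the absence of a caliper, and all names are assumptions made for illustration; the source does not specify these details.

```python
def match_nearest_neighbour(accepter_scores, control_scores):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Both arguments map a patient id to an estimated propensity score.
    Returns a list of (accepter_id, control_id) pairs.
    """
    available = dict(control_scores)   # controls not yet matched
    pairs = []
    for accepter_id, score in accepter_scores.items():
        if not available:
            break                      # no controls left to match
        closest = min(available, key=lambda cid: abs(available[cid] - score))
        pairs.append((accepter_id, closest))
        del available[closest]         # without replacement
    return pairs


# Hypothetical propensity scores (probability of accepting the offer).
accepters = {"A1": 0.70, "A2": 0.55}
controls = {"C1": 0.30, "C2": 0.68, "C3": 0.52, "C4": 0.90}
matched_pairs = match_nearest_neighbour(accepters, controls)
```

The matched controls act as "potential accepters"; the outcome model is then fitted in the matched sample only.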
All models were adjusted for baseline measures of the outcome, age, time since diagnosis, BMI, and education.
We did not adjust for multiple testing, partly because of multicollinearity between outcomes [37]. Therefore, we report effect estimates with 95% confidence intervals for all secondary endpoints, and the inferences drawn may not be reproducible.

Sensitivity analysis
We repeated the ITT and IV analyses with missing values of the outcome also imputed. Additionally, ITT analysis was repeated replacing the cohort measurements by the pre- and 12-week post-intervention measurements for patients who had not yet started or not yet completed the intervention before the cohort follow-up measurement (Fig. 3). Furthermore, PS analysis was performed in refusers (i.e., those who were offered the intervention, yet refused it) to check whether there was an effect of offering the intervention in refusers.
We used SPSS version 25.0 or R Statistical Software version 3.5.1. For all models, model fit was checked and was satisfactory.

Patients and participation
In total, 260 patients were randomly allocated to the intervention (n = 130) or control group (n = 130; Table 1, Fig. 2).
Of the patients allocated to the intervention arm, 52% accepted the intervention (n = 68). Patients who accepted were slightly younger, had a lower BMI, and were higher educated than refusers (Table 1).

Adherence
Eight patients withdrew from the intervention after a median number of four training sessions (range 1-15). The 60 patients who completed the exercise intervention attended on average 92% (SD = 9.9) of 24 sessions. At baseline, intervention accepters had a mean VO2peak of 23.8 ml/min/kg (SD = 5.4), which increased, on average, by 1.8 ml/min/kg (95% CI = − 0.4; 0.4) following the exercise intervention.
Timing of the cohort measurements was fixed (Fig. 3). Consequently, about half (n = 35/68) of the patients completed the follow-up cohort questionnaire before intervention completion, of whom three started the intervention after the follow-up cohort questionnaire. Reasons for a delayed intervention start were planned vacations, physical conditions (e.g., elective heart surgery), family issues, and time constraints. Eleven intervention accepters (16%) did not complete the follow-up questionnaire.

Retention
The 6-month follow-up cohort questionnaire was returned by 87% of the patients. Patients who did not return this questionnaire were lower educated and had a lower baseline global QoL. The response rate was lowest in intervention refusers (77%) and highest in accepters (93%). Eighty percent of control patients returned the follow-up cohort questionnaire.

Quality of life
At baseline, global QoL was comparable to the Dutch general female population (mean = 76.9, SD = 17.9). Within-group changes are shown in Table 3 and between-group differences in Table 4.
In PS analyses, all covariates were well balanced (i.e., standardized mean differences < 0.1). No differences in the QoL measures were found between actual intervention accepters in the intervention arm and potential accepters in the control arm.
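The balance criterion mentioned above can be made concrete with a small sketch of the standardized mean difference for a continuous covariate. The pooled-SD definition used here is one common convention and is an assumption, as are the function name and the toy ages; the source does not state which formula was applied.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical ages in matched accepters and matched controls.
age_accepters = [50, 55, 60, 65]
age_controls = [51, 54, 61, 66]
smd_age = standardized_mean_difference(age_accepters, age_controls)
well_balanced = abs(smd_age) < 0.1   # threshold quoted in the text
```

A covariate is usually judged on the absolute value of this quantity, so the sign only indicates which group has the higher mean.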
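Compared to controls, patients in the intervention group reported larger, but still small, reductions in physical fatigue (ITT difference: − 1.1, 95% CI = − 1.8; − 0.3; IV: − 1.9, 95% CI = − 3.3; − 0.5; PS: − 1.2, 95% CI = − 2.3; − 0.2; Table 4). No between-group differences were found for the other fatigue dimensions.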

Sensitivity analysis
Fifteen percent of the patients in the intervention group (n = 20; of whom 6 were intervention accepters and 14 intervention refusers) and twelve percent of the patients in the control group (n = 16) had a missing value on the primary outcome. Repeating the analyses with missing values on the primary outcome imputed, as well as replacing the cohort measurements by the pre- and 12-week post-intervention measurements for patients who had not yet started or not yet completed the intervention before the cohort follow-up measurement, yielded comparable results (Supplemental Material). However, the intervention group reported larger, but still small, reductions in general fatigue and reduced activity compared to the control group when using the 12-week post-intervention outcomes (− 0.9, 95% CI = − 1.7; − 0.0 and − 0.9, 95% CI = − 1.7; − 0.1, respectively). No differences in QoL and fatigue were found between intervention refusers and potential refusers (control group; data not shown).
Fig. 3 Timing of the intervention (and hence the pre- and 12-week post measurements) relative to the cohort measurements

Discussion
This study showed no effect on QoL, and a small but beneficial effect of a 12-week exercise intervention on physical fatigue in patients after breast cancer treatment. More than half of the patients performing no or little physical activity started exercising. The TwiCs design provided insights into the effect of refusing an intervention offer. Interestingly, in refusers, we observed slightly higher levels of physical activity, QoL and fatigue compared to the control group, although these differences were not statistically significant when formally tested. This may imply that being offered an intervention and actively refusing the offer might induce lifestyle changes.
In contrast to a systematic review reporting positive effects of exercise on QoL in breast cancer patients, we found no effect [8]. In UMBRELLA Fit, eligibility of patients was based on a physically inactive lifestyle and not on a low QoL. QoL of the study population was comparable to the Dutch general female population [38]. Therefore, room for improvement in QoL was limited. In a previous trial from our group in which patients experienced at least three problems on QoL domains, exercise indeed led to a relevant increase in QoL compared to control [40].
In line with a meta-analysis that showed that physical fatigue is the most sensitive fatigue dimension in exercise-oncology trials, we found a positive effect of exercise on physical fatigue [41]. In UMBRELLA cohort patients, QoL decreased during treatment and recovered in the year after treatment, whereas fatigue remained high as compared to a female Dutch reference population, even 18 months after treatment (unpublished observation), and is thus a symptom that indeed needs attention.

Implications of the TwiCs design for effect estimation
Because patients may refuse the intervention, simulation studies recommend performing both ITT and IV analyses to take non-acceptance into account when estimating the 'real' intervention effect [29,30]. The ITT analysis showed the effect of offering an exercise intervention, which resembles clinical practice, but it may dilute the treatment effect depending on the extent of non-acceptance. The IV analysis took the chance of non-acceptance into account.
To check its robustness, we also performed PS analyses and compared intervention accepters with potential accepters in the control group. All (sensitivity) analyses yielded comparable results, indicating that the impact of intervention refusal on the effect size seems small. Although no differences in outcomes were found between patients who refused the intervention and potential refusers in the control group, we cannot exclude that there was an effect of offering the intervention on outcomes, which then also affects the IV and PS analyses. However, analyses were not powered to detect an effect in refusers; therefore, these results are explorative.
Table 2 Physical activity level at baseline and changes from baseline to 6 months follow-up
Table notes: Adjusted for corresponding baseline measures of the outcome, age, time since diagnosis, BMI and education. CI, confidence interval; SD, standard deviation. a Scores ranged from 0 to 100 and a higher score indicated better outcomes. b Normative data from the Dutch general female population [38]. c UMBRELLA Fit pre- and 12-week post-intervention measurements in the patients who accepted the intervention. d Scores ranged from 4 to 20 and a higher score indicated more fatigue. e Scores ranged from 0 to 21 and a higher score indicated higher levels of anxious or depressive state.
Due to the design, it was sometimes challenging to schedule the intervention between the two cohort measurements. Consequently, 35 of the 68 intervention accepters (51%) had not yet completed the intervention when the endpoint was evaluated, including three (4%) who had not yet started it. Therefore, the effects may be underestimated. As a sensitivity analysis, we replaced the cohort measurements by the pre- and 12-week post-intervention measurements for these patients, and this yielded slightly larger effects. We also considered using the first cohort measurement after the study end for patients who had not yet completed the intervention before the planned cohort outcome measurement. These could then be matched to random control patients. However, this could introduce a risk of immortal time bias since both groups need to survive until the extended measurement point. Intervention and control patients could be matched on baseline variables, yet this would require selection based on post-intervention variables (notably survival until end of follow-up), thus potentially introducing a selection bias. Given these considerations, we did not conduct these sensitivity analyses. We recommend being stringent in intervention planning and choosing a suitable follow-up period when designing a TwiCs.
We observed that the 6-month changes were smaller than the 12-week post-intervention changes. First, as described above, not every patient completed the intervention before the endpoint was evaluated. Second, cohort measurements were not obtained in a trial setting, whereas patients may have completed the 12-week post-intervention questionnaire with their trial participation in mind. Third, the 6-month cohort outcomes might represent a 'long-term' effect of the intervention. Therefore, long-term cohort outcomes might be representative of the (long-term) real-world effects.

Implications for clinical practice
Because the TwiCs consent procedure reflects clinical practice better than the procedure in conventional trials, this study provided important insight into intervention uptake and reasons for refusal. Because we used the TwiCs design, we learned that more than half of the patients performing no or little physical activity started exercising and that subsequent adherence was high. Therefore, health care providers should not hesitate to motivate patients to become physically active. Yet, almost half of the patients refused the offer of the exercise intervention, even though it was free of charge. Based on their reasons for refusal, one can consider adaptations to, or alternatives for, the intervention to increase uptake among intervention refusers, e.g., offering these patients a home-based exercise program.

Conclusion
In this TwiCs, more than half of the inactive patients with breast cancer accepted the offer of a 12-week supervised exercise intervention. We found no effects of offering an exercise program on QoL, which was already high at baseline. For physical fatigue, we found small but beneficial exercise effects. Applying the TwiCs design appeared feasible. The use of the TwiCs design could dilute effect sizes because of intervention refusal. In this TwiCs, the impact of intervention refusal seemed small, given the consistent results of the ITT and IV analyses (the latter taking non-compliance into account). In addition, the use of cohort measurements for effect estimation may have diluted the effect size. Therefore, we recommend careful planning of the intervention in between the cohort measurements that will be used as pre- and post-intervention measurements.