Background

Cancer and its medical treatment often lead to impairments in aerobic capacity and consequently decreased physical functioning and health-related quality of life. Literature suggests that low aerobic capacity is associated with increased risks for cancer-recurrence and all-cause and cancer-related mortality [1, 2]. Therefore, it is worrying that cancer survivors experience a longstanding decline in aerobic capacity of 5–22% during the course of their treatment [3, 4]. This decline in aerobic capacity can be countered or prevented, and it is well-known that physical exercise is an effective way to do so [5, 6].

The criterion standard to evaluate aerobic capacity is measuring peak oxygen uptake (VO2peak) during an incremental maximal exercise test with respiratory gas analysis, also referred to as a cardiopulmonary exercise test (CPET) [7]. Measuring VO2peak is of great additional value for pre-operative risk-screening, personalized exercise prescription and monitoring aerobic capacity in patients with cancer [8, 9]. Moreover, CPET is used for exercise pre-participation health screening and to determine the underlying cause of exercise limitation [9, 10]. However, CPET is costly, time-consuming and burdensome for the patient, and it requires advanced equipment and medical supervision [9]. In many clinical circumstances the main aim is to assess aerobic capacity, without an underlying diagnostic question on exercise limitation. Patient-reported outcome measures (PROMs), such as self-reported questionnaires, could be a useful alternative to estimate and monitor aerobic capacity in settings where a CPET is not feasible or necessary.

The Duke Activity Status Index (DASI) and Veterans Specific Activity Questionnaire (VSAQ) are self-reported questionnaires which are often used in clinical healthcare for the assessment of aerobic capacity in patients [11, 12]. The DASI was developed to assess physical functioning in cardiovascular patients and shows good validity compared to VO2peak measured during CPET (CPET-VO2peak) when administered by an interviewer, and moderate validity when self-reported [11]. In a recent study with patients scheduled for major cancer surgery, VO2peak estimated using the DASI (DASI-VO2peak) showed substantial bias with wide 95% limits of agreement (95%-LoA) when compared to CPET-VO2peak [13]. The VSAQ was developed to estimate aerobic capacity in American veterans by describing activities of increasing Metabolic Equivalent of Task (MET) values, and showed a moderate correlation with METs derived from CPET [12]. One MET is considered equal to 3.5 mL·kg−1·min−1 and can be used interchangeably with VO2peak [14]. In a more recent study with healthy adults, VO2peak estimated using the VSAQ (VSAQ-VO2peak) also showed considerable bias with wide 95%-LoA [15]. Although VSAQ and DASI showed a significant correlation with measured VO2peak, agreement was suboptimal. Besides, both questionnaires were developed and validated in an American population. A major drawback of the VSAQ is the use of activities, such as basketball and cross-country skiing, which are not practiced globally [16].
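The MET convention mentioned above can be written out as a short sketch. It only encodes the 3.5 mL·kg−1·min−1 equivalence stated in the text; the function names are illustrative and do not come from the DASI or VSAQ scoring guidelines.

```python
# Conventional oxygen cost of 1 MET, as stated in the text (mL·kg^-1·min^-1).
ML_PER_KG_PER_MIN_PER_MET = 3.5


def mets_to_vo2(mets: float) -> float:
    """Convert METs to oxygen uptake in mL·kg^-1·min^-1."""
    return mets * ML_PER_KG_PER_MIN_PER_MET


def vo2_to_mets(vo2: float) -> float:
    """Convert oxygen uptake in mL·kg^-1·min^-1 to METs."""
    return vo2 / ML_PER_KG_PER_MIN_PER_MET


if __name__ == "__main__":
    # Illustrative input only: an activity rated at 10 METs.
    print(mets_to_vo2(10.0))
```

Under this convention, a questionnaire that reports METs and one that reports VO2peak can be compared on the same scale.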

More recently, the FitMáx©-questionnaire, hereafter called FitMáx, was developed as a self-reported questionnaire to estimate VO2peak (FitMáx-VO2peak) in the general Dutch population. FitMáx-VO2peak is based on the self-reported maximum capacity of walking, stair climbing, and cycling combined with age, sex, and body mass index (BMI). In a recent study, the FitMáx showed a strong intraclass correlation (ICC = 0.93) with CPET-VO2peak, and acceptable bias (−0.24 with 95%-LoA −9.23 to 8.75), in a heterogeneous group of 228 patients (with lung, cardiac and oncologic diseases) and athletes. The results for FitMáx were compared with DASI (ICC = 0.62, bias of 3.32 with 95%-LoA −14.81 to 21.44) and VSAQ (ICC = 0.87, bias of 3.44 with 95%-LoA −10.11 to 16.98) in the same population, and the FitMáx showed better agreement with CPET-VO2peak [17].

The clinical usefulness and applicability of PROMs depend on several measurement properties including validity, responsiveness and reliability. Assessing the responsiveness of an instrument is important to determine whether it is able to detect changes over time. However, the responsiveness of these self-reported questionnaires has not been studied before. Therefore, the aim of this study was to assess and compare the (1) population-specific criterion validity and (2) responsiveness of VO2peak predicted by FitMáx, DASI and VSAQ as self-reported questionnaires, to evaluate aerobic capacity in cancer survivors who participated in a 10-week supervised exercise program.

We hypothesized the population-specific agreement between CPET-VO2peak and FitMáx-VO2peak at T0 to be moderate-to-good, with an ICC of > 0.70 [17,18,19], and the ICC between change over time in CPET-VO2peak and FitMáx-VO2peak to be between 0.40 and 0.60 [20, 21]. Furthermore, the ability of the FitMáx to discriminate between participants who did or did not improve in aerobic capacity was expected to be moderate. As such, the area under the curve (AUC) of the receiver operating characteristic curve (ROC-curve) was expected to be in the range of 0.60–0.80 [18]. Lastly, based on the results of previous studies, the validity and responsiveness of FitMáx-VO2peak in this population were expected to be better than those of the DASI-VO2peak and VSAQ-VO2peak, which were expected to show poor-to-moderate agreement with CPET-VO2peak (ICC < 0.70) [11, 17, 22].

Methods

Setting

Patients who were scheduled to participate in a supervised exercise program as part of usual-care multidisciplinary oncology rehabilitation were prospectively recruited at the Department of Physical Therapy of the Maastricht University Medical Center (MUMC+) between January 2021 and December 2021. The multidisciplinary rehabilitation program consisted of a 10-week supervised physical exercise program, supplemented with psychological and/or occupational therapy, when indicated. The exercise program consisted of combined endurance and resistance training as described elsewhere [23]. Data collection procedures were in compliance with the Declaration of Helsinki [24] and were approved by the medical ethics committee of the MUMC+ (registration number METC 2020–2300). This study was reported according to the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) guidelines [25]. The study was registered as NL8568 in the Netherlands Trial Register (https://trialsearch.who.int).

Participants

Patients were eligible to participate in the rehabilitation program when they were suffering from physical and psychosocial complaints and/or fatigue due to cancer (treatments). Patients were excluded from participation when they were unable to perform basic activities of daily living (e.g. walking) and suffered from disabling comorbidities that seriously hamper physical exercise [23]. Within two weeks before the start (T0) and after the 10-week exercise program (T1) a CPET was conducted as part of usual care. Patients were included in this study when they were willing to complete three self-reported questionnaires during both CPET consultations and gave written informed consent for the use of their questionnaire and CPET data. Patients who were unable to read and understand the questionnaires, or did not show signs of voluntary exhaustion during the CPET at T0 (e.g. due to injuries or joint complaints) were excluded from the study.

Test procedures

Anthropometric measurements were conducted before the CPET. After pre-test instructions, baseline cardiopulmonary values were collected during a 2-minute rest period while seated at the cycle ergometer (Lode Corival, Lode BV, Groningen, The Netherlands). After the rest period, the participant completed a 3-minute warm-up phase of unloaded cycling. Subsequently, the work rate was increased according to an incremental maximal ramp protocol adjusted to the patient’s self-reported physical activity level (assessed by the sports physician independently from the questionnaire results), aimed at reaching maximal effort within 8‒12 min [26, 27]. At T1, the same ramp protocol was applied for CPET as at T0. Participants were instructed to keep cycling until exhaustion, with a pedaling frequency of at least 60 rotations per minute. The protocol continued until the patient stopped cycling or pedaling frequency fell below 60 rotations per minute, despite verbal encouragement. Continuous breath-by-breath analysis was obtained during the test using an ergospirometry system (Vyntus CPX, Vyaire Medical, Mettawa, United States) calibrated for respiratory gas analysis and volume measurements. Peak exercise was defined as the point where the pedaling frequency dropped below 60 rotations per minute. Voluntary exhaustion was considered to be achieved when participants showed clinical signs of intense effort (e.g., unsteady biking, sweating or clear unwillingness to continue exercising). True maximal effort was considered to be reached if at least one of the following two criteria was met: (i) percentage of age-related predicted maximal heart rate or (ii) age-related peak respiratory exchange ratio (RERpeak) [28, 29]. Participants were blinded to test outcomes during both test moments and, during T1 measurements, to their questionnaire answers at T0. Moreover, researchers were blinded to questionnaire data during the CPET and, during the CPET at T1, to test outcomes at T0.
CPET outcomes were analyzed by a trained researcher. Oxygen uptake (VO2) and RER values were averaged over 30 s at peak exercise. The VO2 at the anaerobic threshold (VO2AT) was determined as described elsewhere [30].

Questionnaires

On the same day, shortly before the CPET, subjects were asked to complete the DASI, VSAQ and FitMáx as self-reported questionnaires. The DASI consists of twelve dichotomous questions, of which weighted scores are used in an algorithm to estimate the VO2peak [11]. The VSAQ is a single-answer 13-point scale describing activities of increasing intensity. The VSAQ score and age were used to estimate VO2peak, according to guidelines of the questionnaire [12]. The FitMáx consists of three single-answer, multiple-choice questions assessing the maximum capacity of walking, stair climbing, and cycling on a 14-, 11- and 12-point scale, respectively. Based on the weighted score of the FitMáx combined with sex, age (in whole years) and BMI, VO2peak was estimated [17]. The ability of the current study population to complete the FitMáx was assessed using three additional questions on a scale of 1‒10 for the questions about walking, stair climbing and cycling capacity separately, in which 1 indicates “I cannot estimate properly” and 10 indicates “I can estimate properly”.

Statistical analysis

A sample size estimation was performed using PASS 2008 [31], in which a sample size of n = 55 was determined to achieve a two-way 95% confidence interval with an expected correlation of r = 0.60 (0.40–0.75). This is in line with the minimum of 50 participants as recommended in the COSMIN guidelines [25]. Statistical analyses were performed using SPSS version 23.0 [32]. Continuous variables were checked for normality using histograms and Q-Q plots. Continuous variables are presented as mean ± standard deviation (SD) in case of normal distribution or as median and interquartile range otherwise. Categorical variables are expressed as frequencies with percentages. Mean changes in outcomes between T0 and T1 were reported with 95%-CI. When the 95%-CI did not include zero, the mean change was considered statistically significant. Criterion validity and responsiveness were determined using ICC (two-way random, absolute agreement), with corresponding 95%-CI and standard error of the estimate (SEE). Criterion validity of the FitMáx, DASI and VSAQ was evaluated for all participants at T0, by quantifying the agreement between CPET-VO2peak and VO2peak estimated using the questionnaires (questionnaire-VO2peak). Furthermore, Bland-Altman analysis was conducted with calculation of bias and 95%-LoA to assess the agreement between CPET-VO2peak and questionnaire-VO2peak and to determine whether mean differences between both values depend on the size of the CPET-VO2peak. Proportional bias was assessed using linear regression between the means and the differences of CPET-VO2peak and questionnaire-VO2peak. P-values of < 0.05 were considered statistically significant. In case of proportional bias, the ratio of questionnaire-VO2peak to CPET-VO2peak was calculated for each subject and plotted against the average of the two values with corresponding 95%-LoA, as suggested by Bland and Altman [33].
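The ratio-based Bland-Altman procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SPSS workflow used in the study; the function names and the input values are hypothetical, and the significance test on the regression slope (the p < 0.05 criterion) is deliberately left out.

```python
import statistics


def bland_altman_ratio(questionnaire, cpet):
    """Return (mean ratio, lower 95%-LoA, upper 95%-LoA) of questionnaire/CPET."""
    ratios = [q / c for q, c in zip(questionnaire, cpet)]
    mean_ratio = statistics.mean(ratios)
    sd_ratio = statistics.stdev(ratios)  # sample SD (n - 1 denominator)
    return mean_ratio, mean_ratio - 1.96 * sd_ratio, mean_ratio + 1.96 * sd_ratio


def proportional_bias_slope(questionnaire, cpet):
    """Least-squares slope of the differences regressed on the pairwise means.

    A slope clearly different from zero suggests proportional bias.
    """
    means = [(q + c) / 2 for q, c in zip(questionnaire, cpet)]
    diffs = [q - c for q, c in zip(questionnaire, cpet)]
    mean_x = statistics.mean(means)
    mean_y = statistics.mean(diffs)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical VO2peak values (mL·kg^-1·min^-1), not study data.
    cpet_vo2 = [12.0, 15.0, 18.0, 22.0, 27.0]
    questionnaire_vo2 = [15.0, 18.5, 21.0, 26.0, 33.0]
    print(bland_altman_ratio(questionnaire_vo2, cpet_vo2))
    print(proportional_bias_slope(questionnaire_vo2, cpet_vo2))
```

A mean ratio above 1 corresponds to overestimation by the questionnaire, and the width of the limits reflects how much individual estimates scatter around that ratio.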
To evaluate the responsiveness of the FitMáx, DASI and VSAQ, the ICC and SEE were calculated between the absolute change in CPET-VO2peak (ΔCPET-VO2peak) and questionnaire-VO2peak (Δquestionnaire-VO2peak) between T0 and T1, for participants who completed both exercise tests. As a secondary analysis, the FitMáx-VO2peak without cycling was analyzed as well, since it was expected that not all participants cycle regularly (on a regular bicycle without electric assistance).
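As an illustration of the agreement statistic named above, the sketch below computes a two-way random, absolute-agreement ICC for two measurement methods from the usual ANOVA mean squares. It assumes the single-measures form (often written ICC(2,1)); the text does not state whether single or average measures were used, so this choice is an assumption, and the function name and data are hypothetical.

```python
def icc_absolute_agreement(method_a, method_b):
    """Two-way random, absolute-agreement, single-measures ICC for two methods."""
    data = list(zip(method_a, method_b))
    n, k = len(data), 2  # n subjects, k = 2 measurement methods
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-method mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical change scores (T1 minus T0), not study data.
    delta_cpet = [0.5, 1.8, 2.4, -0.3, 3.1]
    delta_questionnaire = [1.0, 1.2, 3.0, 0.4, 2.2]
    print(icc_absolute_agreement(delta_cpet, delta_questionnaire))
```

Because absolute agreement penalizes systematic offsets between methods, a questionnaire that tracks CPET perfectly but is shifted by a constant still scores below 1, unlike with a consistency-type ICC.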

If the responsiveness to estimate ΔCPET-VO2peak was insufficient (ICC < 0.50), ROC-curves were plotted between the dichotomized ΔCPET-VO2peak (improvement vs. no improvement) and the Δquestionnaire-VO2peak to assess whether the questionnaires at least were able to detect improvement in CPET-VO2peak [19,20,21]. The minimal detectable change for improvement in CPET-VO2peak was defined as a relative increase of ≥ 6% [34]. The AUC of the ROC-curve with corresponding 95%-CI was calculated to evaluate the ability of the questionnaires to detect a true improvement in CPET-VO2peak of ≥ 6% over time. Since both sensitivity and specificity were considered equally important, the value at which the product of both is maximized was chosen as the optimal cut-off value to indicate an improvement in CPET-VO2peak [35]. Sensitivity, specificity, and predictive values (%) were calculated for the cut-off values of the questionnaires.
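The ROC steps above can be sketched in plain Python: dichotomize the CPET change at a relative increase of 6%, compute the AUC, pick the cut-off that maximizes sensitivity × specificity, and derive the predictive values. This is a minimal sketch under stated assumptions, not the study's SPSS analysis: all function names are hypothetical, the AUC is computed through its rank interpretation, candidate cut-offs are restricted to observed change scores, the input is assumed to contain both improvers and non-improvers, and confidence intervals are omitted.

```python
def improved(vo2_t0, vo2_t1, threshold=0.06):
    """True if the relative increase in CPET-VO2peak is at least `threshold`."""
    return (vo2_t1 - vo2_t0) / vo2_t0 >= threshold


def auc(scores, labels):
    """AUC as the chance that an improver scores higher than a non-improver."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion(scores, labels, cutoff):
    """Counts (tp, fn, tn, fp) when score >= cutoff is read as 'improved'."""
    tp = sum(1 for s, lab in zip(scores, labels) if lab and s >= cutoff)
    fn = sum(1 for s, lab in zip(scores, labels) if lab and s < cutoff)
    tn = sum(1 for s, lab in zip(scores, labels) if not lab and s < cutoff)
    fp = sum(1 for s, lab in zip(scores, labels) if not lab and s >= cutoff)
    return tp, fn, tn, fp


def best_cutoff(scores, labels):
    """Observed score that maximizes sensitivity * specificity."""
    best_value, best_product = None, -1.0
    for cutoff in sorted(set(scores)):
        tp, fn, tn, fp = confusion(scores, labels, cutoff)
        product = (tp / (tp + fn)) * (tn / (tn + fp))
        if product > best_product:
            best_value, best_product = cutoff, product
    return best_value


def diagnostics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV and NPV at a given cut-off."""
    tp, fn, tn, fp = confusion(scores, labels, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical questionnaire change scores and CPET values, not study data.
    delta_questionnaire = [0.2, 0.5, 1.1, 1.5, 2.0, 3.0]
    cpet_t0 = [18.0, 20.0, 22.0, 17.0, 19.0, 21.0]
    cpet_t1 = [18.2, 20.5, 22.4, 18.5, 21.0, 23.5]
    labels = [improved(a, b) for a, b in zip(cpet_t0, cpet_t1)]
    cutoff = best_cutoff(delta_questionnaire, labels)
    print(auc(delta_questionnaire, labels), cutoff)
    print(diagnostics(delta_questionnaire, labels, cutoff))
```

Maximizing the product rather than the sum of sensitivity and specificity penalizes cut-offs where either of the two is very low, which matches the stated aim of weighting both equally.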

Results

Participants

Of the 84 patients who were eligible to participate in the study, 70 participants (83%) were included for analysis (15 men and 55 women). Twelve participants (17%) were lost to follow-up, because they did not complete any of the questionnaires and/or the CPET at T1, for several reasons. Outcome measures at T1 were available for 58 participants (83%) (see Fig. 1). Mean age at T0 was 53.2 ± 12.8 years and breast cancer was the most common diagnosis (39%). Surgery, chemotherapy and radiotherapy were the most commonly received treatments and approximately half of the participants were still receiving medical treatment during the study. Three of them (4%) were still receiving chemotherapy (Table 1).

Fig. 1 Participant inclusion flowchart. Abbreviations: CPET, cardiopulmonary exercise test; n, number of subjects

Table 1 Patient characteristics at baseline (T0)

CPET and questionnaire results

Mean CPET-VO2peak at T0 was 18.9 ± 5.9 mL·kg−1·min−1, which is 62 ± 19% of the reference value for healthy Dutch persons of the same age and sex [36]. Mean time between T0 and T1 was 94 ± 16 days. All included participants showed maximal voluntary exhaustion during CPET. At T0, n = 62 participants (89%) met at least one of the objective criteria for true maximal effort during CPET and at T1, n = 46 (79%). For RERpeak and heart rate at peak exercise (HRpeak), no significant differences were seen between T0 and T1. Participants who completed the tests and questionnaires at both T0 and T1 showed a significant mean improvement of 1.6 mL·kg−1·min−1 (95%-CI 1.0‒2.3) or 8% on CPET-VO2peak after completion of the exercise program. Thirty-four participants (59%) showed a relative increase of ≥ 6% in CPET-VO2peak, which we considered a true improvement in aerobic capacity [34]. Body weight, VO2AT during CPET, FitMáx-VO2peak, DASI-VO2peak and VSAQ-VO2peak increased significantly as well (Table 2). Most missing values were observed for DASI-VO2peak. Because some participants did not fill out the FitMáx question about cycling, a subanalysis was performed without the maximum cycling capacity [17]. CPET results and questionnaire-VO2peak are presented in Table 2 for all participants at T0 (N = 70) and for the participants who completed CPET and the questionnaires at both T0 and T1 (n = 58), with corresponding change scores. The participants’ ability to complete the FitMáx on a scale from 1 to 10 is reported as well.

Table 2 CPET and questionnaire results

Criterion validity

An ICC of 0.69 (95%-CI 0.18‒0.86) was found for the agreement between CPET-VO2peak and FitMáx-VO2peak at T0. When the question about maximum cycling capacity was not included, the ICC was 0.62 (95%-CI 0.01‒0.84) for the agreement with CPET-VO2peak. Less agreement was found between CPET-VO2peak and VSAQ-VO2peak (ICC = 0.53) and between CPET-VO2peak and DASI-VO2peak (ICC = 0.37) (Table 3). The agreement between questionnaire-VO2peak and CPET-VO2peak is displayed visually in Fig. 2A-D. Bland-Altman plots showed proportional bias for the agreement between CPET-VO2peak and FitMáx-VO2peak, FitMáx-VO2peak without cycling and VSAQ-VO2peak (p < 0.05). For this reason, bias and 95%-LoA were reported as ratios [33]. The mean ratio of FitMáx-VO2peak/CPET-VO2peak was 1.21 (95%-LoA 0.80–1.62), which means the FitMáx overestimated CPET-VO2peak by 21% on average. The mean ratio bias was 1.28 (95%-LoA 0.81–1.75) for FitMáx-VO2peak without cycling, 1.06 (95%-LoA 0.33–1.79) for VSAQ-VO2peak and 1.26 (95%-LoA 0.55–1.97) for DASI-VO2peak. Bland-Altman plots show wider 95%-LoA for VSAQ and DASI, when compared to FitMáx. The plots for FitMáx-VO2peak with and without maximum cycling capacity look similar, but the results are shifted more towards a ratio above 1 for the FitMáx-VO2peak without maximum cycling capacity. SEE for the agreement between CPET-VO2peak and FitMáx-VO2peak, FitMáx-VO2peak without cycling, VSAQ-VO2peak and DASI-VO2peak was 3.28 mL·kg−1·min−1, 3.31 mL·kg−1·min−1, 4.95 mL·kg−1·min−1 and 5.46 mL·kg−1·min−1, respectively (Fig. 3A-D; Table 3).
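To make the ratio results more tangible, the sketch below inverts the reported FitMáx ratio (1.21, 95%-LoA 0.80–1.62) to show which CPET-VO2peak values would be compatible with a given questionnaire estimate. This back-calculation is only an illustration of how to read a ratio and its limits; it is not a correction procedure proposed in the study, and the function name and the example estimate of 24.2 mL·kg−1·min−1 are hypothetical.

```python
def compatible_cpet_range(questionnaire_vo2, mean_ratio, loa_low, loa_high):
    """Invert ratio = questionnaire / CPET to get (lowest, central, highest) CPET.

    The upper ratio limit gives the lowest compatible CPET value and vice versa.
    """
    lowest = questionnaire_vo2 / loa_high
    central = questionnaire_vo2 / mean_ratio
    highest = questionnaire_vo2 / loa_low
    return lowest, central, highest


if __name__ == "__main__":
    # Ratio and limits as reported for FitMáx; the estimate itself is made up.
    low, mid, high = compatible_cpet_range(24.2, 1.21, 0.80, 1.62)
    print(round(low, 1), round(mid, 1), round(high, 1))
```

The wide span between the lowest and highest compatible values illustrates why the estimates are more informative at group level than for an individual patient.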

Table 3 Agreement between CPET-VO2peak and questionnaire-VO2peak at T0
Fig. 2 A-D. Criterion validity with identity line for relation between questionnaire-VO2peak and CPET-VO2peak at T0. A) FitMáx-VO2peak compared with CPET-VO2peak. B) FitMáx-VO2peak without cycling compared with CPET-VO2peak. C) VSAQ-VO2peak compared with CPET-VO2peak. D) DASI-VO2peak compared with CPET-VO2peak. Abbreviations: CPET, cardiopulmonary exercise test; DASI, Duke Activity Status Index; ICC, intraclass correlation coefficient; kg, kilograms; mL, milliliters; min, minute; VO2peak, peak oxygen uptake; VSAQ, Veterans Specific Activity Questionnaire

Fig. 3 A-D. Bland-Altman plots for the agreement between questionnaire-VO2peak and CPET-VO2peak at T0. The dashed lines represent the 95%-LoA, from −1.96 SD to +1.96 SD. The solid line represents ratio bias and the dotted line represents the zero bias line. A) FitMáx-VO2peak compared with CPET-VO2peak. B) FitMáx-VO2peak without cycling compared with CPET-VO2peak. C) VSAQ-VO2peak compared with CPET-VO2peak. D) DASI-VO2peak compared with CPET-VO2peak. Abbreviations: CPET, cardiopulmonary exercise test; DASI, Duke Activity Status Index; kg, kilograms; mL, milliliters; min, minute; VO2peak, peak oxygen uptake; VSAQ, Veterans Specific Activity Questionnaire

Responsiveness

An ICC of 0.43 (95%-CI 0.18‒0.63) was found for the agreement between individual ΔFitMáx-VO2peak and ΔCPET-VO2peak from T0 to T1. The ICC for the agreement between ΔFitMáx-VO2peak without the question about maximum cycling capacity and ΔCPET-VO2peak was 0.27 (95%-CI 0.00‒0.49). A lower ICC was found for the agreement between ΔCPET-VO2peak and ΔVSAQ-VO2peak (ICC = 0.19, 95%-CI −0.06‒0.42) and the agreement between ΔCPET-VO2peak and ΔDASI-VO2peak (ICC = 0.18, 95%-CI −0.10‒0.44) (Fig. 4A-D; Table 4). Since the responsiveness to estimate ΔCPET-VO2peak was insufficient for all questionnaires, ROC analyses were performed to determine whether the questionnaires are able to detect a true improvement in CPET-VO2peak (≥ 6%) with a corresponding optimal cut-off value [34]. An area under the curve (AUC) of 0.77 (95%-CI 0.63–0.91) was found for FitMáx-VO2peak, while the FitMáx without maximum cycling capacity showed an AUC of 0.72 (95%-CI 0.59–0.86). The ROC-curves for VSAQ-VO2peak and DASI-VO2peak showed an AUC of 0.66 (95%-CI 0.52–0.80) and 0.64 (95%-CI 0.48–0.81), respectively (Table 4; Fig. 5). The maximum product of sensitivity and specificity was found at Δ1.0 mL·kg−1·min−1 for FitMáx-VO2peak and Δ1.8 mL·kg−1·min−1 for FitMáx-VO2peak without maximum cycling capacity. These values were therefore chosen as the optimal cut-off values to discriminate between improvement and no improvement in CPET-VO2peak. The optimal cut-off value was Δ3.4 mL·kg−1·min−1 for VSAQ-VO2peak and Δ2.7 mL·kg−1·min−1 for DASI-VO2peak. Using the cut-off value for FitMáx-VO2peak resulted in a sensitivity of 71%, a specificity of 75%, a positive predictive value (PPV) of 81% and a negative predictive value (NPV) of 63%. Sensitivity, specificity, PPV and NPV for the other questionnaires are presented in Table 4.

Table 4 Agreement between CPET-VO2peak and Questionnaire-VO2peak for changes (∆) from T0 to T1
Fig. 4 A-D. Scatterplots for the relation between changes (Δ) in questionnaire-VO2peak and CPET-VO2peak from T0 to T1. A) ΔFitMáx-VO2peak compared with ΔCPET-VO2peak. B) ΔFitMáx-VO2peak without cycling compared with ΔCPET-VO2peak. C) ΔVSAQ-VO2peak compared with ΔCPET-VO2peak. D) ΔDASI-VO2peak compared with ΔCPET-VO2peak. Abbreviations: CPET, cardiopulmonary exercise test; DASI, Duke Activity Status Index; ICC, intraclass correlation coefficient; kg, kilograms; mL, milliliters; min, minute; VO2peak, peak oxygen uptake; VSAQ, Veterans Specific Activity Questionnaire

Fig. 5 ROC-curves for the ability of questionnaires to detect a true improvement in CPET-VO2peak. Abbreviations: DASI, Duke Activity Status Index; ROC-curve, receiver operating characteristic curve; VSAQ, Veterans Specific Activity Questionnaire

Discussion

In this study among cancer survivors who participated in a 10-week exercise program, we evaluated the criterion validity of three questionnaires and found moderate agreement between FitMáx-VO2peak and CPET-VO2peak. Agreement between CPET-VO2peak and VSAQ-VO2peak was moderate as well, but lower compared to FitMáx-VO2peak, while the DASI-VO2peak showed poor agreement. This implies that the criterion validity of the DASI to evaluate aerobic capacity was insufficient. The criterion validity of the FitMáx and the VSAQ to estimate aerobic capacity is acceptable on group level, but limited for estimating CPET-VO2peak in individuals [19].

Initial Bland-Altman analysis showed proportional bias, indicating that mean differences between questionnaire-VO2peak and CPET-VO2peak, with corresponding 95%-LoA, depend on the size of the CPET-VO2peak values. This is not surprising, since higher measurement errors are expected for higher values of CPET-VO2peak [34]. For this reason, Bland-Altman analyses were performed using ratios instead of differences between questionnaire-VO2peak and CPET-VO2peak, which showed an overestimation of CPET-VO2peak for all questionnaires [33]. Mean ratio bias for FitMáx-VO2peak (+21%) was smaller compared to DASI-VO2peak (+26%), but larger compared to VSAQ-VO2peak (+6%). However, 95%-LoA for VSAQ-VO2peak were wider compared to those for FitMáx-VO2peak. This could be explained by larger measurement errors for VSAQ-VO2peak in both directions, while FitMáx and DASI overestimated CPET-VO2peak in most cases.

The moderate agreement found between questionnaire-VO2peak and CPET-VO2peak is in line with previous research, which showed discrepancies between patient-reported functional capacity and measured VO2peak [13, 37]. A recent study by Meijer et al. reported higher values for the agreement between CPET-VO2peak and FitMáx-VO2peak, DASI-VO2peak and VSAQ-VO2peak. On the other hand, SEE for FitMáx-VO2peak and VSAQ-VO2peak were smaller in the current study, compared to the previous study, indicating more accurate predictions of CPET-VO2peak [17]. It was not possible to compare Bland-Altman results with previous studies, because ratios were used instead of absolute values in the current study. In the original studies about the development of DASI and VSAQ, higher correlation coefficients between estimated and measured aerobic capacity were found, but the populations and research methods differed substantially from our study and both studies were performed more than 25 years ago [11, 12]. Low ICC values for the agreement between questionnaire-VO2peak and CPET-VO2peak at T0 in the current study could be explained by the small range in VO2peak values [38]. The current study population had a relatively low aerobic capacity (62% of predicted) and the population was more homogeneous compared to the original FitMáx study [17]. The fact that participants in the current study reached lower fitness levels compared to participants in the original FitMáx study (in which the questionnaire and its prediction model were developed) may have influenced the performance of the questionnaire as well. It can be expected that estimating physical abilities is easier when someone is fitter and reaches higher physical activity levels in daily life or even in sports. For patients who are mainly sedentary, it might be more difficult to estimate their physical abilities. Moreover, it could be questioned whether the question about cycling of the FitMáx is appropriate for the current study population.
The area around the MUMC+ is hilly, making it difficult for elderly people to cycle on a regular bike, especially after receiving cancer treatment. When patients did not cycle regularly, or cycled on an electric bike, it may have been hard for them to answer the FitMáx question about maximum cycling capacity. This is in line with the fact that participants rated their ability to complete the FitMáx question about cycling with a median of 5 at T0 and 6 at T1, which is lower compared to the other two questions about walking and stair climbing.

All three questionnaires showed poor responsiveness to measure ΔCPET-VO2peak in the current study population. This could be explained by the increased measurement error that comes along with repeated testing and by the little variability in data as well [20, 21, 38]. However, ROC analysis showed that FitMáx-VO2peak was sufficiently responsive to detect a true improvement in CPET-VO2peak (AUC 0.77), when using the optimal cut-off value of 1.0 mL·kg−1·min−1 [34]. This was also the case for the FitMáx-VO2peak without the question about maximum cycling capacity (AUC 0.72 with a cut-off value of 1.8 mL·kg−1·min−1). The AUC for DASI-VO2peak (0.64) and VSAQ-VO2peak (0.66) were insufficient to detect improvement, and therefore it is not recommended to use these questionnaires to monitor changes in aerobic capacity.

Given that a previous study found a mean change of 2.0 ± 2.3 mL·kg−1·min−1 after a 10-week exercise program as part of multidisciplinary oncology rehabilitation in MUMC+, larger improvements in VO2peak were expected in the current study [23]. The smaller improvement could be explained by the fact that the training stimulus in the current study was not given as intended, due to COVID-19. Because of this pandemic, exercise training took place in smaller groups of four instead of eight patients, and in order to avoid a long waiting list, the training frequency was reduced to once a week instead of twice. The smaller improvement may have led to less variability in ΔVO2peak from T0 to T1, which could explain low ICC values for responsiveness [38]. Results for responsiveness could not be compared with literature, because no previous studies were conducted on this matter.

Comparing the results for the different questionnaires, we can conclude that values for criterion validity and responsiveness of the FitMáx-VO2peak are better compared to VSAQ-VO2peak and DASI-VO2peak, in cancer survivors participating in an exercise program. FitMáx-VO2peak was less accurate without the question for maximum cycling capacity, yet superior to the DASI and VSAQ.

Strengths of the current study

This is the first study to investigate the responsiveness of self-reported questionnaires to estimate ΔVO2peak. The direct comparison of the criterion validity and responsiveness of three different self-reported questionnaires, with CPET-VO2peak as criterion standard measure, was a strength of this study. Since both measurements and the exercise training were part of usual care, the current study results can easily be translated into daily care in oncology rehabilitation in the Netherlands. Besides, we included patients who had and had not yet completed medical treatment, resulting in a variation of ΔCPET-VO2peak in both directions, which is ideal to study the responsiveness of a measurement [5, 21]. Another strength of the study was blinding of participants and researchers to test outcomes to avoid bias.

Limitations of the current study

A limitation was the fact that the DASI was often not completed. A possible explanation is the use of twelve dichotomous questions, including some activities which are difficult to recognize for the general Dutch population, such as playing basketball. If even one answer was missing, the DASI-VO2peak could not be calculated. This suggests that the usability of the DASI is limited in this population. The fact that true maximum effort (according to objective criteria) was not reached during all CPETs could be seen as a limitation as well. However, these findings are in agreement with previous studies, which reported that maximal effort criteria are often not reached in cancer survivors [23, 39]. Besides, it can be expected that these participants are also unable to reach and estimate their maximum capacity of walking, stair climbing, cycling and other daily tasks, as described in the self-reported questionnaires. Since mean RERpeak and HRpeak were similar at T0 and T1, it is not expected that the delivered effort affected the study results. Another limitation was the fact that the study population is quite specific (79% women and in general low fitness), so results may not be generalizable to other patients with cancer. Validity and responsiveness for male cancer survivors could differ from the current study results, especially because VO2peak is sex-dependent. Also, the cancer type and treatment may influence the relationship between questionnaire-VO2peak and CPET-VO2peak. For instance, breast surgery and breast radiation may cause limitations in certain activities mentioned in the DASI and VSAQ that include the upper body (e.g. lifting weights). More research is needed in a population with a better distribution of sex, cancer type, treatment and more variation in level of aerobic capacity. Also, research on the responsiveness of PROMs to measure deterioration in VO2peak would be of additional value, since the current study focused on improvement.
Monitoring deterioration in VO2peak would be useful during intensive cancer treatment, like chemotherapy. In this case, rehabilitation can be started as soon as deterioration in VO2peak is noted. Besides, PROMs for estimating aerobic capacity could potentially be improved in the future, by using computerized adaptive test (CAT) methods. CAT methods enable PROMs to be adapted to individual patients while maintaining direct comparability of the scores [40, 41]. Based on the patient’s previous answers, a computer program personalizes the next questions, in order to obtain precise information in an efficient manner. A CAT version of the FitMáx could personalize questions on physical fitness for patients with different diagnoses of cancer, different treatment modalities and different fitness levels, which could potentially lead to more precise estimations of VO2peak and better values of validity and responsiveness.

Clinical relevance

Results of the current study show that the FitMáx is sufficiently valid to estimate aerobic capacity on group level and could be used to detect improvement using a cut-off value of 1.0 mL·kg−1·min−1. The advantage of such a questionnaire is the possibility to monitor aerobic capacity over time with repeated assessments at low cost. When choosing self-reported questionnaires to evaluate aerobic capacity in cancer survivors, the FitMáx can be recommended over the DASI and VSAQ, since this recently developed questionnaire showed better criterion validity, and a responsiveness above the 0.70 AUC threshold. However, some results should be interpreted with caution, since values for criterion validity and responsiveness were still suboptimal, and it should be kept in mind that the FitMáx overestimates by 21% on average in this population [25]. Moreover, CPET is also used to determine the underlying cause of exercise limitations and contra-indications for physical exercise [9]. Therefore, FitMáx should not be considered a full replacement for CPET, but rather an alternative tool to be used in clinical or research settings where exercise testing is not feasible or necessary. In cancer survivors with increased cardiovascular risks, such as pre-existing cardiovascular disease, treatment with cardiotoxic chemotherapy and left-sided chest radiation, performing CPET should still be recommended [42]. An online platform (www.fitmaxquestionnaire.com) was developed to enable healthcare professionals and researchers to use the FitMáx. The online platform provides up-to-date information about the questionnaire and research projects. More information about the research group, hospital and FitMáx can be found on https://www.mmc.nl/english/fitmax/.

Conclusion

The population-specific criterion validity and responsiveness of the self-reported FitMáx-VO2peak are better compared to VSAQ-VO2peak and DASI-VO2peak, in cancer survivors who participated in an exercise program as part of multidisciplinary rehabilitation. The FitMáx is sufficiently valid to estimate CPET-VO2peak in cancer survivors on group level, but overestimates by 21% on average. The responsiveness of the FitMáx to measure absolute changes in CPET-VO2peak was poor, but the questionnaire is able to detect whether aerobic capacity improved when using a cut-off value of only 1.0 mL·kg−1·min−1. Therefore, the self-reported FitMáx can be used to estimate and monitor aerobic capacity in cancer survivors, but absolute values should be interpreted with caution, since the agreement with the criterion standard is limited. Refinements of the questionnaire and the prediction model will be made in the future, potentially leading to further optimization of the validity and responsiveness.