Introduction

Cardiorespiratory fitness (CRF) is an important variable that influences several health outcomes, including quality of life [1, 2]. Cardiopulmonary exercise testing (CPET) is the gold standard for objectively measuring CRF, expressed as the oxygen uptake at peak exercise (VO2peak). It is also used clinically to determine the underlying cause of a limited exercise capacity [3,4,5]. However, CPET is costly and labour-intensive. Patient-Reported Outcome Measures (PROMs) are a simple, safe and cost-effective alternative, especially for repeated testing such as in rehabilitation programs [6, 7].

Máxima Medical Centre (MMC) developed the FitMáx©-questionnaire (FitMáx), which consists of only three single-answer, multiple-choice questions [8]. The FitMáx was developed to estimate CRF, expressed as VO2peak, from the self-reported maximum capacity for walking, stair climbing and cycling. The FitMáx scores are combined with the subject’s age, sex and Body Mass Index (BMI) to estimate VO2peak. A previous validation study showed a strong correlation between VO2peak estimated by the FitMáx (FitMáx-VO2peak) and VO2peak measured with CPET (CPET-VO2peak): r = 0.94 (0.92–0.95), ICC = 0.93 (0.91–0.95), with a Standard Error of the Estimate (SEE) of 4.14 ml/kg/min. Moreover, the FitMáx outperformed commonly used questionnaires such as the Veterans Specific Activity Questionnaire (VSAQ) and the Duke Activity Status Index (DASI) [8,9,10].

The clinical usefulness and applicability of PROMs depend on several clinimetric properties, including validity, responsiveness and reliability [11, 12]. Reliability is defined as the extent to which the test results of subjects whose condition has not changed remain the same over time. To assess the test–retest reliability of an instrument, repeated measurements are performed under the same conditions [11, 13]. This allows quantification of the proportion of the total variance in repeated measurements that is due to true differences between subjects. The measurement error describes the systematic and random error in subjects’ scores that is not caused by true changes in the construct being measured [11].
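In the usual variance-components notation, the absolute-agreement ICC for such a test–retest design can be written as the share of total variance that reflects true between-subject differences. The symbols below are introduced here for illustration and are not taken from the report:

```latex
\mathrm{ICC}_{\text{agreement}} = \frac{\sigma^2_{s}}{\sigma^2_{s} + \sigma^2_{o} + \sigma^2_{e}}
```

Here the three terms are the between-subject (true) variance, the variance due to systematic differences between occasions (T0 versus T1), and the residual random error. The occasion and error terms together make up the measurement error described above.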

The present short report aimed to assess the test–retest reliability of the FitMáx in four different groups (healthy subjects, pulmonary, oncology, and cardiac patients) and in the total study population.

Material and methods

Setting

Pulmonary, oncology, and cardiac patients were recruited prospectively at MMC in Veldhoven and Eindhoven, the Netherlands. Healthy subjects were included at Ancora Health in Eindhoven, the Netherlands. The authorized Medical Research Ethics Committee of the MMC reviewed the study protocol and concluded that the rules laid down in the Medical Research Involving Human Subjects Act (known by its Dutch abbreviation WMO) do not apply to this study (reference number N20.086). The study was registered as NL8846 in the Netherlands Trial Register.

Study population

Subjects were eligible for inclusion if they were aged ≥ 18 years, had a good command of the Dutch language, and no change in CRF was expected within 31 days of enrollment. During their visit to MMC or Ancora Health, cardiac and pulmonary patients and healthy subjects who were scheduled to perform CPET, either for medical reasons or as part of a health check, were asked to participate in a study about CRF questionnaires. The CPET protocol is described extensively in our validation study [8]. Since oncology patients do not perform CPET as part of standard care, they were recruited from the outpatient clinic of the sports department and did not perform CPET. Oncology patients were not eligible if they were undergoing active disease-specific treatment that could affect their CRF during the study period. As in our validation study, subjects were asked to complete the FitMáx, VSAQ and DASI questionnaires. The questionnaires were administered twice to the same subject in paper format: at enrollment (T0) and two weeks later (T1), when all participants received a second information letter and questionnaire. To minimize a possible ‘subject expectancy effect’, participants were deliberately not told that the study aimed to determine the test–retest reliability of these questionnaires. We did not explicitly ask participants whether they had experienced a change in CRF. Subjects were excluded from analysis if the FitMáx was incomplete or if the period between T0 and T1 was > 31 days. All participants gave written informed consent to the use of their anonymized CPET and questionnaire data.

Statistical analysis

We performed a sample size calculation with an expected ICC of 0.85, a minimum acceptable ICC of 0.60 and two measurements per individual, requiring a sample size of n = 26 per subject group to achieve a power of 80%.
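The report does not state which sample-size method was used. One calculation that reproduces n = 26 is the approximation of Walter, Eliasziw and Donner (1998) for testing an ICC against a minimum acceptable value, assuming a two-sided α of 0.05 (a one-sided α gives a smaller n). The sketch below is a reconstruction under those assumptions, not necessarily the authors' procedure, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def walter_sample_size(rho0: float, rho1: float, k: int = 2,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed to test H0: ICC = rho0 against H1: ICC = rho1.

    Approximation of Walter, Eliasziw & Donner (1998) with k measurements
    per subject. alpha is treated as two-sided (an assumption).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Ratio of the variance terms implied by the null and expected ICC
    c0 = (1 + k * rho0 / (1 - rho0)) / (1 + k * rho1 / (1 - rho1))
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)  # round up to whole subjects


# Minimum acceptable ICC 0.60, expected ICC 0.85, two measurements
print(walter_sample_size(rho0=0.60, rho1=0.85))  # → 26
```

With α = 0.10 (equivalent to a one-sided 0.05) the same function returns 21, which is why the two-sided assumption matters for matching the reported figure.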

Statistical analyses were performed using R, version 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria) [14]. Normality of data was tested using the Shapiro–Wilk test, and checked qualitatively by means of histograms and Q–Q plots. Descriptive statistics were provided for demographic characteristics and reported as mean ± standard deviation (SD) in case of normal distribution, and as median and interquartile range (IQR) otherwise. For categorical variables, we reported frequencies and corresponding percentages.
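As an illustration of the reporting rule above, the hypothetical helper below formats a continuous variable as mean ± SD when it was judged normally distributed and as median (IQR) otherwise. The normality judgement (Shapiro–Wilk test plus histograms and Q–Q plots) is passed in as a flag rather than computed, and the quartile method is an assumption chosen to match R's default (type 7).

```python
from statistics import mean, median, quantiles, stdev


def describe(values, normal: bool) -> str:
    """Format a continuous variable as 'mean ± SD' or 'median (Q1–Q3)'.

    `normal` is the outcome of a separate normality assessment.
    """
    if normal:
        return f"{mean(values):.1f} ± {stdev(values):.1f}"  # sample SD
    # method="inclusive" corresponds to R's default quantile type 7
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.1f} ({q1:.1f}–{q3:.1f})"


print(describe([1, 2, 3, 4, 100], normal=False))  # → 3.0 (2.0–4.0)
```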

The Pearson correlation coefficient (r) was used to evaluate the linear relationship between CPET-VO2peak and Questionnaire-VO2peak at T0 [15].
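A minimal sketch of the correlation step, assuming the 95% confidence intervals reported with r are obtained with Fisher's z-transformation (the report does not state the method). `pearson_with_ci` is a hypothetical name; its inputs would be the paired CPET-VO2peak and Questionnaire-VO2peak values at T0.

```python
import math


def pearson_with_ci(x, y, z_crit: float = 1.96):
    """Pearson r with an approximate 95% CI via Fisher's z-transformation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


r, lower, upper = pearson_with_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The transformation is undefined for |r| = 1, so perfectly correlated toy data would make `math.atanh` raise an error.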

To evaluate the test–retest reliability of the questionnaires, the Intraclass Correlation Coefficient (ICC) with its 95% confidence interval (95%-CI) was determined (two-way mixed, absolute agreement, single measurement) [16]. The Standard Error of the Measurement (SEM, see Additional file 1: equations) [17] is related to the ICC but is clinically easier to interpret, because it is expressed in the same unit as the measurement of interest (VO2peak, ml/kg/min). The ICC and SEM between T0 and T1 were calculated for all questionnaires, both for all patient groups together and for each patient group separately. An ICC < 0.50 indicates poor, 0.50–0.75 moderate, 0.75–0.90 good, and > 0.90 excellent test–retest reliability [16]. The higher the ICC, the lower the SEM and vice versa, but there is no universal threshold for an acceptable SEM, because it depends on the standard deviation of the data.
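The ICC model named above (two-way, absolute agreement, single measurement) can be computed from the ANOVA mean squares of a subjects × occasions table. The sketch below uses the standard mean-square formula tabulated by McGraw and Wong (1996), which is identical for the two-way random and two-way mixed models. The SEM helper assumes the common definition SEM = SD × √(1 − ICC); the report's own equation is in its Additional file 1 and may differ. Both function names are hypothetical.

```python
import math


def icc_a1(scores):
    """ICC for absolute agreement, single measurement (two-way model).

    `scores` holds one row per subject with its k repeated measurements
    (here k = 2: T0 and T1).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # occasions
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def sem_from_icc(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC); which SD to use is an assumption here."""
    return sd * math.sqrt(1 - icc)


# Toy T0/T1 data for three subjects (not study data)
print(round(icc_a1([[1, 2], [3, 4], [5, 5]]), 3))  # → 0.9
```

The SEM helper makes the inverse relation stated above explicit: for a fixed SD, a higher ICC always yields a smaller SEM.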

In addition, Bland–Altman plots were used to visualize systematic error and the 95% limits of agreement (95%-LoA), by plotting the difference between Questionnaire-VO2peak at T0 and T1 against the mean of the T0 and T1 values [18].
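The quantities drawn in a Bland–Altman plot can be computed as in the sketch below. It assumes the conventional definition of the 95%-LoA as the mean difference ± 1.96 SD of the differences and takes differences as T0 − T1; the report does not state its sign convention, and the function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(t0, t1, z_crit: float = 1.96):
    """Return (bias, lower LoA, upper LoA, per-subject means)."""
    diffs = [a - b for a, b in zip(t0, t1)]         # y-axis: T0 - T1
    means = [(a + b) / 2 for a, b in zip(t0, t1)]   # x-axis: mean of T0 and T1
    bias = mean(diffs)                              # systematic difference
    sd = stdev(diffs)                               # sample SD of differences
    return bias, bias - z_crit * sd, bias + z_crit * sd, means


bias, lower, upper, means = bland_altman([10, 12, 14, 16], [9, 13, 13, 17])
```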

Results

In this study, 213 subjects participated. A total of 73 subjects did not return the T1 questionnaire, resulting in a response rate of 66%. Eleven subjects returned it > 31 days after T0 and were excluded. In addition, although we did not explicitly ask about it, two subjects reported on paper that their CRF had changed due to a COVID-19 infection, and they were excluded as well. Consequently, a total of 127 participants (84 men and 43 women) were included in the analysis. The interval between the T0 and T1 questionnaires ranged from 11 to 31 days.
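The participant flow above can be checked with simple arithmetic; all counts below are taken directly from the text.

```python
enrolled = 213
not_returned = 73    # T1 questionnaire not returned
late = 11            # T1 returned > 31 days after T0
changed_crf = 2      # self-reported change in CRF (COVID-19)

returned = enrolled - not_returned
response_rate = returned / enrolled
analysed = returned - late - changed_crf

print(returned, f"{response_rate:.1%}", analysed)  # → 140 65.7% 127
```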

Because data collection was completed sooner in some groups than in others, we continued recruitment until every included group (pulmonary, oncology, cardiac and healthy subjects) contained at least n = 26 subjects. The age of the total study population ranged from 19 to 84 years. Ancora Health included healthy subjects during the COVID-19 period and used viral filters (MicroGard II, Vyaire Medical GmbH) that resulted in inaccurate data; we therefore omitted the VO2peak data of this group [19]. As mentioned before, oncology patients were recruited from the outpatient clinic and did not perform CPET as part of standard care. Therefore, we present CPET data for the total group excluding the healthy subjects and oncology patients. In this population, the median VO2peak was 21.94 (IQR 16.89–31.29) ml/kg/min, which is 94.1 (85.7–134.5)% of the predicted reference value for healthy Dutch persons of the same age and sex [20]. Anthropometric data, CPET data and questionnaire data are presented in Tables 1 and 2. Data on the VSAQ and DASI questionnaires can be found in Additional file 2: Table S1.

Table 1 Participant characteristics
Table 2 Intraclass correlations of the questionnaires between T0 and T1

The FitMáx-VO2peak correlated strongly with CPET-VO2peak (r = 0.94 (0.91–0.97); SEE 3.70 ml/kg/min). The correlations of the VSAQ and DASI with CPET-VO2peak were lower (r = 0.85 (0.76–0.91), SEE 5.89 ml/kg/min; and r = 0.76 (0.63–0.85), SEE 6.99 ml/kg/min, respectively), as expected from the results of the validation study [8].

Test–retest reliability

The ICCs and corresponding 95%-CIs for each patient group are displayed in Table 2. The ICC of the FitMáx-VO2peak between T0 and T1 in the total population was 0.97 (0.96–0.98). As a sensitivity analysis, we repeated the ICC analysis in a two-way model examining potential systematic differences and, as expected, found similar results. We found similarly high ICC values for the VSAQ [0.94 (0.92–0.96)] and DASI [0.90 (0.85–0.93)] (see Additional file 2: Table S1). Fig. 1 shows a Bland–Altman plot of the difference between the FitMáx-VO2peak values at T0 and T1 against their mean (see Additional file 3: Figure S1 for all questionnaires). The mean difference was −0.39 ml/kg/min (95%-LoA −5.68 to 4.84), 0.31 ml/kg/min (95%-LoA −8.75 to 9.37) and 0.20 ml/kg/min (95%-LoA −5.56 to 5.96) for the FitMáx, VSAQ and DASI, respectively.

Fig. 1

Bland–Altman plot for the FitMáx questionnaire. Notes: The colors indicate the reason for the CPET visit. The dashed lines represent the limits of agreement (bias ± 1.96 SD). The solid line represents the bias and the dotted line is the zero-bias line.

Discussion

The use of PROMs to assess CRF seems a simple, safe and cost-effective alternative to objective measurement using CPET in clinical settings [7]. The applicability of such PROMs, collected via self-reported questionnaires, depends on several clinimetric properties. An important aspect in the validation of a new questionnaire is its test–retest reliability. The FitMáx showed excellent test–retest reliability between the VO2peak estimated at T0 and T1, with an ICC of 0.97 (95%-CI 0.96–0.98) in the total population. Across the patient groups, the ICC ranged from 0.93 to 0.98 for the FitMáx, 0.83–0.95 for the VSAQ and 0.84–0.95 for the DASI. These ICCs (and the corresponding SEMs) support the reliability of the FitMáx, VSAQ and DASI.

A study by Ravani et al. [21] assessed the test–retest reliability of the DASI in pre-dialysis patients and in patients who had received a kidney transplant, and obtained ICCs of 0.71 and 0.81, respectively. These values are lower than the ICCs found in the current study. The difference may be caused by the 6-month interval used in their study, during which true changes in CRF could have occurred and lowered the observed reliability [21].

Strengths

The strength of the current study lies in its diverse study population: we initially included healthy subjects and oncology, pulmonary, and cardiac patients. Although oncology patients and healthy subjects did not perform (valid) CPET, a wide range of VO2peak values was observed in the current study population, from (extremely) low to above average [median 21.94 (range 9.8–53.3) ml/kg/min]. The FitMáx therefore appears widely applicable in a clinical population with both low and high VO2peak. Moreover, the ICC values of the FitMáx varied little across the subject groups, suggesting that its reliability does not depend on the CPET-VO2peak level or on the patient group. Finally, we minimized the ‘subject expectancy effect’: participants were not told that the study aimed to determine the test–retest reliability of the FitMáx, only that they might be approached a second time for the purpose of this study.

Clinical applicability

The FitMáx is an inexpensive tool with low burden for subjects to assess CRF. Moreover, the questionnaire proves to be effective in various populations and provides information on daily life activities in several dimensions (intensity, frequency and duration). The current study shows that the FitMáx is reliable to assess CRF over time when no change in CRF has occurred. This makes FitMáx a useful tool to assess self-reported CRF among patients and healthy subjects in clinical settings.

Limitations

The study reached a response rate of only 66%. This might be explained by patients assuming that they had already completed exactly the same questionnaires before. The test–retest interval in the current study was on average 18 days, which may have been too short to prevent subjects from recalling their previous FitMáx responses. However, following recommendations, we deliberately chose this short interval to reduce reporting error in CRF estimates caused by fluctuations in experienced physical fitness, especially in patients [2, 22]. The small sample size prohibited statistical testing to compare ICCs between questionnaires. Although inspection of the ICCs in the supplementary material might suggest higher reproducibility for the FitMáx in most patient groups, all three questionnaires showed high ICC values, and this possible difference may not be statistically or clinically relevant.

Conclusion

The FitMáx proves to be highly reliable in repeated measures to assess CRF of patients with different conditions and healthy subjects, when no change in CRF was expected. This increases the applicability and clinical usefulness of the FitMáx.