Introduction

Time-to-exhaustion (TTE) tests are extensively used in exercise physiology to evaluate exercise tolerance. They require a person to sustain a fixed workload for as long as possible, which makes TTE tests feasible even for some clinical patients [1]. Nevertheless, TTE tests have been widely criticized for a number of (somewhat debatable) reasons. The main argument against their use is their supposedly poor within-subject reliability compared with time trials [2]. However, TTE tests and time trials have similar within-subject reliability when the curvilinear relationship between exercise intensity and duration is taken into account [3]. Moreover, they have a similar sensitivity to changes in endurance performance [4]. Notwithstanding conflicting views on this issue, there are experimental conditions where TTE tests are invaluable, such as when the aim is to exclude the potentially confounding effect of a varying workload (as occurs during time trials) when evaluating the effects of an experimental intervention on physiological and psychological responses.

Beyond the reliability issue, the well-documented between- and within-subject variability in TTE [5,6,7] poses another, often overlooked, challenge to researchers: how to deal computationally with differences in exercise duration. Exercise tests of different durations are difficult to analyse, especially when the time course of physiological or psychological responses must be compared across tests. Therefore, appropriate ways to analyse data collected during TTE tests are needed. The most commonly used analysis so far (here termed “group isotime”) includes only the portion of the TTE test that is available for all the participants in the group being analysed [8,9,10,11,12,13,14,15,16], which is limited by the participant with the shortest TTE. Unfortunately, this analysis results in partial data loss for all the other participants, and the extent of data loss increases with the difference between the TTE of a given participant and that of the participant with the shortest TTE. This may negatively affect the interpretation of the phenomenon under study. To avoid any data loss, it is common to normalize data to the TTE of each test for each individual [17]. However, this analysis (here termed “relative isotime”) compares different TTE tests at the same relative duration instead of the same absolute duration, and consequently, it cannot be used to assess the effects of an experimental intervention. For instance, muscle glycogen depletion [18] and different environmental conditions [19] appear to have no effect on rating of perceived exertion (RPE) when data are expressed against relative time instead of absolute time.

There is a third, less-appreciated method of analysis (here termed “individual isotime”) that allows the effect of an experimental intervention to be assessed while limiting data loss. Within each individual, it restricts the comparison to the duration of that individual's shortest test, thereby preserving absolute isotime comparisons between different TTE tests [20,21,22]. Nevertheless, the “individual isotime” method has not been widely used by researchers. This may be because its potential advantages over traditional methods have not been demonstrated empirically, and possibly because the method has not been explained in sufficient detail to be easily reproduced.

The present study compared the three aforementioned methods of analysis when processing the same set of data collected during TTE tests, with the main aim of comparing the “individual isotime” and the “group isotime” methods. Participants performed four TTE tests at the same workload within individuals, and these four performances were rank ordered from best to worst to have four different levels of performance within individuals (within-subject performance ranking). Physiological and psychological variables expected to change with performance were measured, with a particular interest in the RPE and respiratory frequency (fR) responses. RPE and fR reflect effort during exercise [23,24,25], their rate of increase is strongly associated with TTE [17, 26], and they are sensitive to interventions that affect TTE test performance [10, 11, 27,28,29]. We hypothesized that measured variables would be more sensitive to performance ranking when using the “individual isotime” method compared to the “group isotime” method, because the latter results in partial data loss, thereby preventing full exploration of the data available. We used “within-subject performance ranking” as the independent variable in the present study to demonstrate that variability in TTE can only be reduced to a limited extent, which means that researchers need to select methods of TTE analysis that deal with between- and within-subject differences in exercise duration.

Methods

Subjects

Eleven male participants (mean ± SD: age 22 ± 3 years, stature 1.74 ± 0.09 m, body mass 66 ± 8 kg) volunteered to participate in this study. They were well-trained competitive cyclists with at least 3 years of cycling experience and a training volume of at least 200 km per week. This study was approved by the Ethics Committee of the University of Rome Sapienza in compliance with the Declaration of Helsinki. Written informed consent was obtained from all participants.

Study protocol

Participants reported to the laboratory on six separate occasions over a 3-week period, with visits separated by at least 48 h. On the first visit, participants performed a ramp incremental exercise test, followed by a TTE test, with the two tests separated by 30 min of recovery. On the second visit, participants performed two TTE tests separated by 30 min of recovery. The 30-min recovery time used in both visits is in line with previous studies that performed multiple performance tests in the same visit to obtain the power–duration relationship [30, 31]. The three TTE tests differed in exercise intensity and, therefore, in exercise duration. This allowed us to obtain the power–duration relationship for each individual, which was used to set the exercise intensity for the experimental TTE tests. Four identical experimental TTE tests were then performed on separate days (visits 3–6). Based on the power–duration relationship, the exercise intensity corresponding to a TTE of 1200 s was selected for these four tests. All the protocols were performed on an electromagnetically braked cycle ergometer (Lode Excalibur Sport, Groningen, The Netherlands). The positions of the ergometer seat and handlebar during the first visit were recorded for each participant and reproduced in the following visits.

Ramp incremental test

The ramp incremental test was preceded by a 5 min warm-up at 100 W, 3 min of rest, and 2 min pedalling at 20 W. The test consisted of a continuous ramped increase in work rate of 30 W min−1, starting from 20 W. Preferred pedalling cadence was selected by each participant and was kept constant throughout the test, which terminated when cadence fell by more than 10 rpm, despite strong verbal encouragement. The peak power output (PPO) was defined as the highest power output achieved at exhaustion, registered to the nearest 1 W, and the \(\dot{V}{\text{O}}_{{2\;{\text{peak}}}}\) as the highest value of a 30-s average.

Before the ramp incremental test, participants were given standard instructions for providing RPE using the Borg 6–20 scale [32]. During the ramp incremental test, participants were asked to rate their perceived exertion on the RPE scale every minute during exercise and, retrospectively, at exhaustion. This procedure served as a familiarization with the scale.

Preliminary TTE tests

Three preliminary TTE tests were performed during visits 1 and 2 to obtain the power–duration relationship for each individual. These three TTE tests were performed on average at 87 ± 0.3%, 76 ± 1.3%, and 70 ± 2.7% of the ramp PPO, to result in TTEs of approximately 4, 10, and 18 min, although between-subject variability was expected. These intensities were chosen to yield exercise durations suitable for accurately predicting the power output corresponding to a TTE of 20 min. The 76% PPO test was performed during the first visit after the ramp incremental test, while the 70% PPO test was performed before the 87% PPO test during the second visit. The three performance data points were then used to derive the power–duration relationship for each individual using a power-law function, \(Y = cX^{b}\), where Y (power output) and X (time) are the two variable quantities, c is a scaling constant (the theoretical power output at X = 1 s), and b is the scaling exponent describing the decrease in power output over time. For more detailed information on the use of the power-law function in endurance sports, see García-Manso et al. [33]. In the present study, X was fixed to 1200 s, and the corresponding power output was obtained for each individual.
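As an illustration of this step, the sketch below fits \(Y = cX^{b}\) by ordinary least squares on log-log axes (ln Y = ln c + b ln X) and evaluates the fitted curve at 1200 s. The fitting routine is an assumption: the study does not report how the fit was performed, and a nonlinear least-squares fit on the original scale would give slightly different estimates. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(durations_s: Sequence[float], powers_w: Sequence[float]) -> Tuple[float, float]:
    """Fit P = c * t**b by least squares on ln P = ln c + b * ln t (log-log assumption)."""
    xs = [math.log(t) for t in durations_s]
    ys = [math.log(p) for p in powers_w]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                      # scaling exponent (negative for endurance data)
    c = math.exp(mean_y - b * mean_x)  # scaling constant: power predicted at t = 1 s
    return c, b

def power_for_duration(c: float, b: float, duration_s: float) -> float:
    """Power output predicted by the fitted curve for a target TTE."""
    return c * duration_s ** b

# Illustrative only: group means from the Results, not one participant's data.
c, b = fit_power_law([286, 547, 1049], [368, 323, 294])
target_w = power_for_duration(c, b, 1200)
```

Applied to the group-mean values reported in the Results (368, 323, and 294 W at 286, 547, and 1049 s), the sketch predicts roughly 285 W at 1200 s. In the study, however, the fit was performed for each participant separately, so this number is only a plausibility check.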

Preferred pedalling cadence was selected by each participant before the first preliminary TTE test. Participants were asked to keep their cadence within ±7 rpm of their preferred cadence during the test, to reduce potential changes in physiological and psychological variables due to changes in pedalling cadence. Participants had been informed that a 10-s countdown would start whenever pedalling cadence fell outside this range. If the cadence returned within the range before the countdown was completed, the test continued; otherwise, participants were judged to have reached exhaustion, which corresponded to the end of the 10-s countdown. This objective exhaustion criterion allowed us to register TTE to the nearest second. Preferred cadence was kept constant for a given participant throughout both preliminary and experimental TTE tests. Participants did not receive any feedback or encouragement during any of the TTE tests performed in the present study.
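Because the exhaustion criterion is fully objective, it can be expressed as a short rule. The sketch below assumes cadence sampled once per second (index = seconds from the start of the test) and treats the ±7 rpm boundaries as inside the range; both choices, and the function name, are assumptions for illustration.

```python
from typing import Optional, Sequence

def time_to_exhaustion(cadence_rpm: Sequence[float], preferred_rpm: float,
                       tolerance_rpm: float = 7.0, countdown_s: int = 10) -> Optional[int]:
    """Return TTE (s) under the 10-s countdown rule, or None if the rule is never met.

    Assumes cadence_rpm[t] is the cadence during second t of the test (1 Hz).
    """
    countdown_start = None
    for t, rpm in enumerate(cadence_rpm):
        if abs(rpm - preferred_rpm) <= tolerance_rpm:
            countdown_start = None          # back in range: cancel any running countdown
        elif countdown_start is None:
            countdown_start = t             # first second out of range: start countdown
        if countdown_start is not None and t - countdown_start + 1 >= countdown_s:
            return countdown_start + countdown_s  # exhaustion = end of the countdown
    return None
```

For example, `time_to_exhaustion([90] * 50 + [80] * 5 + [90] * 50 + [70] * 15, preferred_rpm=90)` returns 115: the 5-s dip at 50 s is cancelled when cadence recovers, and the test ends 10 s after the second, sustained drop begins at 105 s.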

Every 2 min during the three preliminary TTE tests, RPE and affective valence (i.e., pleasure/displeasure experienced during exercise, measured using the Feeling Scale) were collected to allow the participants to become thoroughly familiar with the two scales. The Feeling Scale was first presented to participants in the first visit before the 76% PPO test, and standard instructions were provided [34]. Feeling Scale scores range from +5 (the exercise feels “very good”) to −5 (the exercise feels “very bad”).

Experimental TTE tests

On visits 3–6, participants performed an identical TTE test at a power output corresponding to a predicted TTE of 1200 s, as detailed in the previous section, with the aforementioned exhaustion criteria. Power output was prescribed based on the individual power–duration relationship in an attempt to reduce the relatively high between-subject variability in TTE that is commonly observed when other methods of exercise prescription are used. The four experimental TTE tests were preceded by a standardized warm-up. This consisted of 3 min at 100 W, 6 min at 50% of PPO, 1 min at 60% of PPO, and 1 min at 100 W. Tests were then preceded by 3 min of rest and 2 min pedalling at 20 W. During all the tests, fR, minute ventilation (\(\dot{V}_{\text{E}}\)) and heart rate (HR) were measured breath-by-breath using a metabolic cart (Quark b2, Cosmed, Rome, Italy). Appropriate calibration procedures were performed following the manufacturer’s instructions. RPE and affective valence were reported every 2 min.

Control of factors potentially confounding performance

In an attempt to limit the within-subject variability in TTE, a number of potentially confounding factors were controlled. All testing was completed in the laboratory at a room temperature of 20–22 °C and at the same time of day (± 1 h) within participants. Participants were asked to refrain from caffeine and alcohol for at least 3 h and 24 h, respectively, before each test. They were asked to record their food intake on the day of the first experimental TTE test and on the day before, and to replicate it before the subsequent experimental TTE tests. Participants were also asked to standardise their training routine and to avoid strenuous exercise the day before each test. At each visit, participants completed a pre-test checklist to verify that they had complied with the instructions given to them. They were also asked to confirm that they were not mentally fatigued, physically fatigued, or sleep deprived, and that they were free of injury and under no medical treatment. A single test was rescheduled because the participant failed to meet some of the requirements. To reproduce a competitive setting and encourage a maximal effort in all tests, a performance-based prize (a £200 voucher) was offered to the participant with the longest average TTE across all four tests. No feedback on performance was provided to participants until all the tests were completed.

Data analysis

Data were analysed with MATLAB (R2016a, The MathWorks, Natick, MA, USA). Before performing the three different TTE analyses, breath-by-breath ventilatory data were filtered for errant breaths (i.e., values resulting from sighs, swallows, coughs, etc.) by deleting values lying more than 3 standard deviations from the local mean [35]. The breath-by-breath ventilatory data were then linearly interpolated to obtain one value per second and smoothed with a 60-s moving average. RPE and affective valence data collected every 2 min were likewise linearly interpolated to obtain continuous values every second. For each individual, the four tests were rank ordered from best to worst based on TTE, to obtain different levels of performance within individuals reflecting within-subject variability in TTE.
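The preprocessing chain described above (errant-breath removal, linear interpolation to 1-s values, 60-s moving average) can be sketched in plain Python. The study used MATLAB and does not report every detail, so the following are assumptions: the “local mean” uses up to five breaths on either side of the breath being tested (excluding that breath), the moving average is trailing rather than centred, and all function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_right
from typing import List, Sequence, Tuple

def remove_errant_breaths(times: Sequence[float], values: Sequence[float],
                          half_window: int = 5, n_sd: float = 3.0
                          ) -> Tuple[List[float], List[float]]:
    """Drop breaths lying more than n_sd SDs from the mean of neighbouring breaths.

    ASSUMPTION: local window = up to half_window breaths on each side, excluding
    the breath being tested. Requires at least 3 breaths.
    """
    kept_t, kept_v = [], []
    for i, v in enumerate(values):
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        mean, sd = statistics.fmean(neighbours), statistics.stdev(neighbours)
        if abs(v - mean) <= n_sd * sd:
            kept_t.append(times[i])
            kept_v.append(v)
    return kept_t, kept_v

def resample_1hz(times: Sequence[float], values: Sequence[float]
                 ) -> Tuple[List[int], List[float]]:
    """Linearly interpolate irregular samples onto whole seconds (times strictly increasing)."""
    grid = list(range(math.ceil(times[0]), math.floor(times[-1]) + 1))
    out = []
    for t in grid:
        k = bisect_right(times, t)          # times[k-1] <= t < times[k]
        if k == len(times):                 # t coincides with the last sample
            out.append(float(values[-1]))
            continue
        t0, t1, v0, v1 = times[k - 1], times[k], values[k - 1], values[k]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return grid, out

def moving_average(values: Sequence[float], window: int = 60) -> List[float]:
    """Trailing moving average (ASSUMPTION: the study may have used a centred window)."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out
```

The same `resample_1hz` function can be applied to the RPE and affective valence ratings collected every 2 min; those ratings would not be passed through the breath filter or the moving average.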

Group isotime method

When data were processed with the “group isotime” method, the worst test of the participant with the shortest TTE (i.e., participant 2; Fig. 1) was used to identify the timepoints at which all the tests of all the participants were segmented. This test lasted 530 s. To obtain 10 equally spaced timepoints characterizing each test, the following timepoints were considered: 53, 106, 159, etc. up to 530 s. This method is termed “group isotime” because all the tests of all the participants are analysed at the same absolute timepoints. The extent of data loss (EDL) that occurs with this method was calculated as \(\mathrm{EDL} = \frac{\mathrm{avgTTE} - \text{isotime duration}}{\mathrm{avgTTE}} \times 100\), where avgTTE is the average TTE of the group in seconds (e.g., 1026 s for the worst test in the present study), and isotime duration is the last timepoint (in seconds) at which all the participants were represented (e.g., 530 s for the worst test in the present study). Where data were available, the same formula was used to calculate the EDL for previous studies that used the “group isotime” method, for comparison with the present study.
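The “group isotime” timepoints and the EDL can be computed as in the minimal sketch below. Rounding each timepoint to the nearest second is inferred from the examples in the text (e.g., 93, 187, 280 s for a 933-s test); the data structure and function names are hypothetical, and the dictionary holds only the TTEs quoted in the text, not the full data set.

```python
import math
from typing import Dict, List

def isotime_points(duration_s: float, n_points: int = 10) -> List[int]:
    """n equally spaced timepoints ending at duration_s, rounded half-up to whole seconds."""
    return [math.floor(k * duration_s / n_points + 0.5) for k in range(1, n_points + 1)]

def group_isotime_points(tte_by_participant: Dict[str, List[float]],
                         n_points: int = 10) -> List[int]:
    """The single shortest test in the whole group fixes the timepoints for everyone."""
    shortest = min(min(ttes) for ttes in tte_by_participant.values())
    return isotime_points(shortest, n_points)

def extent_of_data_loss(avg_tte_s: float, isotime_duration_s: float) -> float:
    """EDL (%) = (avgTTE - isotime duration) / avgTTE * 100."""
    return (avg_tte_s - isotime_duration_s) / avg_tte_s * 100.0

# Hypothetical partial data set: only the TTEs (s) quoted in the text.
ttes = {"P2": [933, 530], "P5": [1173]}
points = group_isotime_points(ttes)    # 53, 106, ..., 530 s for every test
edl = extent_of_data_loss(1026, 530)   # worst-test values from the text
```

With the worst-test values from the text, `extent_of_data_loss(1026, 530)` gives about 48.3%, consistent with the 48% reported in the Results.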

Fig. 1

TTE of the four tests for each participant. The dashed line indicates the predicted TTE of 1200 s. One test of participant 2 is hidden behind the other two tests

Individual isotime method

When data were processed with the “individual isotime” method, each participant was considered in isolation when segmenting the tests into timepoints, hence the name “individual isotime”. For each participant, the worst test was used to identify the ten timepoints at which all four tests were segmented. For participant 2, exactly the same timepoints used for the “group isotime” analysis (53, 106, 159, etc. up to 530 s) were selected. However, the timepoints identified for the other participants differed according to the TTE of their worst test. For instance, the worst test of participant 5 had a TTE of 1173 s; hence, the timepoints considered for the four tests of that participant were 117, 235, 352, etc. up to 1173 s. Importantly, with this analysis, the worst test of each participant did not result in any data loss. This means that a greater portion of the data was included in the between-test comparison than with the “group isotime” analysis. Note that each averaged data point corresponds to different absolute times in different participants. This can be depicted graphically by adding horizontal error bars. However, when using the “individual isotime” method, the horizontal error bars are identical across conditions, apart from the test end value. Therefore, we depicted a single set of horizontal error bars in the lower part of each panel, which preserves the readability of the graph and avoids redundancy (Figs. 2, 3, 4). To promote a full understanding of the “individual isotime” analysis, as well as of the “relative isotime” analysis described below, the code used to run the two analyses is available as Supplementary material (Online Resource 1).
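A minimal sketch of the “individual isotime” segmentation follows. It is not the authors' supplementary MATLAB code; the function names, the dictionary layout, and the assumption that each 1-Hz series is indexed by seconds from exercise onset are illustrative choices, and the dictionary holds only TTEs quoted in the text.

```python
import math
from typing import Dict, List, Sequence

def isotime_points(duration_s: float, n_points: int = 10) -> List[int]:
    """n equally spaced timepoints ending at duration_s, rounded half-up to whole seconds."""
    return [math.floor(k * duration_s / n_points + 0.5) for k in range(1, n_points + 1)]

def individual_isotime_points(tte_by_participant: Dict[str, List[float]],
                              n_points: int = 10) -> Dict[str, List[int]]:
    """Each participant's own worst (shortest) test fixes the timepoints for all of their tests."""
    return {pid: isotime_points(min(ttes), n_points)
            for pid, ttes in tte_by_participant.items()}

def sample_at(series_1hz: Sequence[float], points_s: Sequence[int]) -> List[float]:
    """Pick values at the given seconds; series_1hz[t] is assumed to be the value at second t."""
    return [series_1hz[t] for t in points_s]

ttes = {"P2": [933, 530], "P5": [1173]}  # hypothetical partial data (TTEs quoted in the text)
points = individual_isotime_points(ttes)
# points["P5"] starts 117, 235, 352, ... and ends at 1173 s
```

Because every test of a participant is sampled at the same absolute seconds, comparisons remain isotime within individuals, while each participant's worst test contributes all of its data.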

Fig. 2

fR and RPE responses for the worst (filled circles), third (open circles), second (filled triangles), and best (open triangles) tests analysed with the “group isotime” (a, b), “individual isotime” (c, d), and “relative isotime” (e, f) methods. To preserve the clarity of panels c and d, the horizontal error bar is depicted in the lower part of the two panels, because the time error is identical across the four tests when using the “individual isotime” analysis. #Significant interaction (P < 0.05). §Significant main effect of rank (P < 0.05). *Significant simple main effect of rank (P < 0.05)

Fig. 3

\(\dot{V}_{\text{E}}\) and HR responses for the worst (filled circles), third (open circles), second (filled triangles), and best (open triangles) tests analysed with the “group isotime” (a, b), “individual isotime” (c, d), and “relative isotime” (e, f) methods. To preserve the clarity of panels c and d, the horizontal error bar is depicted in the lower part of the two panels, because the time error is identical across the four tests when using the “individual isotime” analysis. #Significant interaction (P < 0.05). §Significant main effect of rank (P < 0.05). *Significant simple main effect of rank (P < 0.05)

Fig. 4

Affective valence response for the worst (filled circles), third (open circles), second (filled triangles), and best (open triangles) tests analysed with the “group isotime” (a), “individual isotime” (b), and “relative isotime” (c) methods. To preserve the clarity of panel b, the horizontal error bar is depicted in the lower part of the panel, because the time error is identical across the four tests when using the “individual isotime” analysis. #Significant interaction (P < 0.05). *Significant simple main effect of rank (P < 0.05)

Relative isotime method

When data were processed with the “relative isotime” method, each test of each participant was segmented into ten timepoints based on the TTE of that test. Again, for participant 2, the worst test was segmented at the following timepoints: 53, 106, 159, etc. up to 530 s, as for the other two analyses. However, the best test of participant 2 was segmented at 93, 187, 280, etc. up to 933 s because of its longer TTE. The same procedure was applied to the other tests of participant 2 and to all the tests of the other participants. With this method there is no data loss, but within participants, different tests are compared at the same percentages of TTE rather than at the same absolute timepoints. Therefore, this method is here termed “relative isotime”.
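The contrast with the individual method is easiest to see in code: here every test is segmented on its own TTE. The sketch below is a hypothetical illustration (names and data layout are assumptions; the dictionary holds only the two participant 2 TTEs quoted in the text).

```python
import math
from typing import Dict, List

def isotime_points(duration_s: float, n_points: int = 10) -> List[int]:
    """n equally spaced timepoints ending at duration_s, rounded half-up to whole seconds."""
    return [math.floor(k * duration_s / n_points + 0.5) for k in range(1, n_points + 1)]

def relative_isotime_points(tte_by_participant: Dict[str, List[float]],
                            n_points: int = 10) -> Dict[str, List[List[int]]]:
    """Each test is segmented on its own TTE: timepoints at 10%, 20%, ..., 100% of that test."""
    return {pid: [isotime_points(tte, n_points) for tte in ttes]
            for pid, ttes in tte_by_participant.items()}

rel = relative_isotime_points({"P2": [933, 530]})  # best and worst test of participant 2
# rel["P2"][0] -> 93, 187, 280, ..., 933 s; rel["P2"][1] -> 53, 106, ..., 530 s
```

The k-th timepoint therefore falls at different absolute times in the two tests, which is why this method avoids data loss but does not preserve isotime comparisons.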

Statistical analysis

An a priori power analysis was performed using G*Power (version 3.1.9.2; Kiel University, Kiel, Germany). Assuming a large effect size for the sensitivity of fR and RPE to within-subject performance ranking, a sample size of 7 was required for 1 − β = 0.80 and α = 0.05. Eleven participants were recruited to allow for potential dropouts.

Statistical analyses were conducted using IBM SPSS Statistics 20 (SPSS Inc, Chicago, IL, USA) unless otherwise stated. Data were checked for normality prior to analysis. The reliability of TTE was quantified by means of the log-transformed coefficient of variation (CV) with 90% confidence limits, using a published open-source spreadsheet in Microsoft Excel (Microsoft Corp.) [36]. For TTE data, the within- and between-subject variance components were estimated with a linear mixed model using restricted maximum likelihood (REML), with participants as a random between-subjects factor and test as a fixed within-subject factor [7]. The within- and between-subject variance components were summed to obtain the total variance, and the percentage contribution of each component to the total variance was calculated. A one-way repeated-measures ANOVA was used to compare the end values of fR, \(\dot{V}_{\text{E}}\), HR, RPE, and affective valence across the four TTE tests. A two-way repeated-measures ANOVA (rank × time) was used to analyse the effect of rank on the fR, \(\dot{V}_{\text{E}}\), HR, RPE, and affective valence responses. The same statistical analysis was used for the three methods of data processing under study, i.e., “group isotime”, “individual isotime”, and “relative isotime”. When the sphericity assumption was violated, the Greenhouse–Geisser adjustment was applied. For the main effect of rank, the main effect of time, and the interaction, partial eta squared (\(\eta_p^2\)) effect sizes were calculated; \(\eta_p^2\) ≥ 0.01 indicates a small effect, \(\eta_p^2\) ≥ 0.059 a medium effect, and \(\eta_p^2\) ≥ 0.138 a large effect [37]. When a significant main effect of rank was found, Bonferroni-adjusted pairwise comparisons were used as follow-up analysis. When a significant interaction was found, a one-way repeated-measures ANOVA was used to test the simple main effect of rank at each timepoint.
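For readers without access to the cited spreadsheet or SPSS, the sketch below shows one common way to obtain the two TTE reliability quantities from a balanced participants × tests table: a CV derived from the residual (typical) error of log-transformed TTEs, and within-/between-subject variance components from an additive two-way ANOVA. This is an assumption, not the spreadsheet's or SPSS's exact procedure, and it omits the 90% confidence limits. For balanced data with a positive between-subject estimate, these method-of-moments components would be expected to agree with REML, but that should be checked against a mixed-model fit. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]  # rows = participants, columns = tests (balanced)

def _residual_ms(data: Table) -> Tuple[float, List[float], float]:
    """Residual mean square of an additive participants x tests ANOVA (one value per cell)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_res = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(n) for j in range(k))
    return ss_res / ((n - 1) * (k - 1)), row_means, grand

def log_cv_percent(data: Table) -> float:
    """CV (%) from the typical error of ln-transformed scores: 100 * (exp(SD_w) - 1)."""
    ms_res, _, _ = _residual_ms([[math.log(x) for x in row] for row in data])
    return 100.0 * (math.exp(math.sqrt(ms_res)) - 1.0)

def variance_components_percent(data: Table) -> Tuple[float, float]:
    """(within %, between %) of total variance from ANOVA (method-of-moments) estimates."""
    k = len(data[0])
    ms_res, row_means, grand = _residual_ms(data)
    ms_subj = k * sum((m - grand) ** 2 for m in row_means) / (len(data) - 1)
    between = max((ms_subj - ms_res) / k, 0.0)  # negative estimates truncated to zero
    total = ms_res + between
    return 100.0 * ms_res / total, 100.0 * between / total
```

With the toy table `[[1, 2], [3, 6]]`, `variance_components_percent` returns (20.0, 80.0): the residual variance is 1 and the between-subject variance is (9 − 1)/2 = 4.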

Within-subject correlation coefficients (r) between RPE and fR were computed using the method described by Bland and Altman [38]. This method adjusts for repeated observations within participants by using multiple regression with “participant” entered as a categorical factor (dummy variables). A correlation coefficient and a P value were obtained for the four tests together, as well as for each test separately. These correlations were computed using data analysed with the “relative isotime” method, because it is the only analysis method that results in no data loss for any test. A P value < 0.05 was considered statistically significant in all analyses. Results are expressed as mean ± SD in the text and as mean ± SE in the figures.
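The Bland and Altman within-subject correlation corresponds to correlating each participant's deviations from their own means of x and y, which is what the minimal sketch below does. It returns r together with the residual degrees of freedom (observations − participants − 1) used for the associated P value; computing the P value itself is left out, and the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Tuple

def within_subject_correlation(subject_ids: Sequence[Hashable], x: Sequence[float],
                               y: Sequence[float]) -> Tuple[float, int]:
    """Repeated-measures r: correlate deviations from each participant's own means.

    Returns (r, df) with df = observations - participants - 1.
    """
    groups = defaultdict(list)
    for sid, xi, yi in zip(subject_ids, x, y):
        groups[sid].append((xi, yi))
    sxy = sxx = syy = 0.0
    for pairs in groups.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxy += (xi - mx) * (yi - my)
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
    return sxy / math.sqrt(sxx * syy), len(x) - len(groups) - 1

ids = ["A"] * 3 + ["B"] * 3
r, df = within_subject_correlation(ids, [1, 2, 3, 5, 6, 7], [10, 12, 14, 0, 2, 4])
```

In this toy example each participant shows a perfect positive within-subject relationship (r = 1, df = 3), whereas an ordinary correlation pooling all six points would be negative because the two participants differ in their mean levels.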

Results

\(\dot{V}{\text{O}}_{{2\;{\text{peak}}}}\) and the PPO measured during the ramp incremental test were 4341 ± 623 mL min−1 (66 ± 6 mL kg−1 min−1) and 422 ± 42 W, respectively. The power outputs of the three preliminary tests were 368 ± 37 W, 323 ± 33 W, and 294 ± 34 W, and the corresponding TTEs were 286 ± 56 s, 547 ± 56 s and 1049 ± 213 s. The power output for the experimental TTE tests was 288 ± 40 W.

Figure 1 depicts the TTE for the four tests of each participant. On average, the TTE (1319 ± 356 s) was longer than the 1200 s predicted by the power–duration relationship. The TTE CV (with 90% confidence limits) was 25.3% (20.2, 35.2). The within- and between-subject variance components contributed 52.9% and 47.1%, respectively, of the total variance. No significant differences were observed between the four tests when they were ordered from the first to the last experimental visit (1300 ± 284 s, 1288 ± 381 s, 1360 ± 374 s, and 1328 ± 418 s), suggesting that there was no order effect. When the four tests were rank ordered by performance (from best to worst), a significant effect of rank was found (P < 0.001; \(\eta_p^2\) = 0.712). The TTEs were 1526 ± 332 s, 1425 ± 313 s, 1295 ± 325 s, and 1026 ± 265 s, and all pairwise comparisons were significant (P < 0.044).

Table 1 reports the P value and effect size (\(\eta_p^2\)) for the main effect of rank, the main effect of time, and the rank × time interaction for fR, \(\dot{V}_{\text{E}}\), HR, RPE, and affective valence, analysed with the three different methods. While no main effect of rank and no interaction were found for any variable when data were analysed with the “group isotime” method, all the variables except HR showed a main effect of rank and/or an interaction when data were analysed with the “individual isotime” method. For the two variables for which a main effect of rank was found with the “individual isotime” method, follow-up analyses revealed a significant difference (P < 0.022) between the worst test and both the best and second-best tests for fR, whereas only a statistical trend was found for \(\dot{V}_{\text{E}}\) (worst vs. best, P = 0.095; worst vs. second best, P = 0.073).

Table 1 Physiological and psychological variables analysed with the three methods

The differences observed between the “group isotime” and “individual isotime” methods are attributable to the data loss that occurs with the “group isotime” method, with the extent of data loss being 48%. This is evident from the time courses of fR, \(\dot{V}_{\text{E}}\), HR, RPE, and affective valence depicted in Figs. 2, 3, and 4. For variables with a significant interaction, these figures indicate the timepoints at which a simple main effect of rank was found.

No significant between-test differences were found when comparing the end value of fR (from the best to the worst test: 64 ± 13, 62 ± 12, 65 ± 14, and 63 ± 11 breaths min−1), \(\dot{V}_{\text{E}}\) (147 ± 30, 148 ± 29, 145 ± 28, and 147 ± 29 L min−1), HR (185 ± 10, 186 ± 11, 186 ± 12, and 185 ± 10 bpm), and affective valence (0.2 ± 3.3, 0.6 ± 3.1, 0.9 ± 2.8, and 1.0 ± 3.0), while a statistical trend (P = 0.052) was found for RPE (19.6 ± 0.5, 19.5 ± 0.5, 19.3 ± 0.6, and 19.3 ± 0.6).

A strong correlation was found between RPE and fR when the four tests were considered together (P < 0.001, r = 0.80), as well as when the tests were considered separately (P < 0.001; from the best to the worst test: r = 0.84, r = 0.81, r = 0.85 and r = 0.84). These correlations are reported in Fig. 5.

Fig. 5

Correlations between RPE and fR for the worst (filled circles), third (open circles), second (filled triangles), and best (open triangles) tests. Each symbol represents the mean value of all participants at each percentage of the TTE

Discussion

This is the first study to compare different methods of analysis to empirically determine which is the most appropriate way to analyse data collected during TTE tests. The findings show that the choice of method dramatically influences the magnitude and the statistical significance of the effect of the independent variable (within-subject performance ranking) on the dependent variables (physiological and psychological responses). Specifically, fR, \(\dot{V}_{\text{E}}\), RPE, and affective valence were sensitive to performance ranking when data were analysed with the “individual isotime” method, but not when the traditional “group isotime” method was used. This emphasises how the method of analysis influences the interpretation of the phenomenon under study.

To reduce the between-subject variability in TTE, we prescribed exercise based on the power–duration relationship, which is, at least in principle, an ideal method to reduce between-subject variability in TTE, because the power output can be selected on the basis of the desired TTE (20 min in the present study). Using the power–duration relationship, we found that the percentage contribution of the between-subject variability (47.1%) to the overall variability in TTE was lower than the 59.4% reported by Faude et al. [7] during constant-load cycling at the maximal lactate steady state. This suggests that the prescription modality used in the present study limited the extent of between-subject variability in TTE, even though the average TTE was longer than the desired TTE. We also attempted to reduce the within-subject variability in TTE by controlling potential confounding factors. However, the reliability of TTE (CV = 25.3%) was similar to that reported by Faude et al. [7] (CV = 24.6%), suggesting that within-subject variability is an inherent characteristic of performance. Our findings suggest that the overall variability in TTE can only be reduced to a limited extent. Therefore, it is imperative to use appropriate methods of analysis that deal with between- and within-subject differences in performance duration.

With the aim of identifying the most appropriate method to analyse data collected during TTE tests, we purposely selected physiological and psychological variables expected to change with variations in performance, with a special interest in the responses of RPE and fR. The rates of increase in RPE and fR are correlated with TTE, at least during high-intensity exercise [17, 26], and RPE and fR are sensitive to experimental interventions that affect TTE performance, including muscle fatigue [10], muscle damage [28], increased body temperature [39], and hypoxia [27]. Moreover, the linear increase in RPE and fR over time makes the difference between a control and an experimental condition more evident in the second half of a TTE test [10]. Therefore, the partial data loss that occurs when using the “group isotime” method may considerably reduce the sensitivity of RPE and fR to an experimental intervention or another independent variable. This hypothesis was supported in the present study.

When analysed with the “group isotime” method, no significant differences in fR, \(\dot{V}_{\text{E}}\), RPE, and affective valence were found across tests. Conversely, when analysed with the “individual isotime” method, fR, \(\dot{V}_{\text{E}}\) and RPE values were higher, while affective valence values were lower, in the worst test compared to the other tests, and the extent of the differences increased over time. This emphasises how the data loss that occurs with the “group isotime” method profoundly affects the results of the study, and this is evident in Figs. 2, 3, and 4. Therefore, as expected, fR, \(\dot{V}_{\text{E}}\), RPE, and affective valence were sensitive to performance ranking, but only when the “individual isotime” method was used. This shows how the interpretation of the phenomenon under study may change dramatically depending on the analysis conducted. Our findings strongly suggest that researchers should use the “individual isotime” instead of the traditional “group isotime” method.

The importance of our findings can be further appreciated when they are considered alongside previous studies that used the traditional “group isotime” method. The extent of data loss found in the present study when using the “group isotime” method (48%) is similar to that found in previous research [8,9,10,11,12, 14,15,16], where values ranged from 42% [11] to 53% [10]. This suggests that the use of the “group isotime” method may have affected the results and interpretation of a number of previous studies. Quantifying the extent of data loss is therefore useful when interpreting previous data, because the greater the extent of data loss, the more likely it is that the use of the “group isotime” method influenced the results. For instance, Marcora et al. [10] found no effect of muscle fatigue on RPE in the first of two similar studies, in which 53% of the data were lost as a consequence of the “group isotime” method. Conversely, in the second study, the authors [10] found a higher RPE in the muscle fatigue condition when, for each individual, the control condition was performed for exactly the same duration as the muscle fatigue condition, so that no data were lost. Therefore, our findings should be considered when interpreting previous results obtained with the “group isotime” method.

Our findings clearly show that the “relative isotime” method is not an alternative to the “individual isotime” method and that it cannot be used to assess the effects of an independent variable. Indeed, very different effects were found for physiological and psychological variables when the “relative isotime” method was compared with the “individual isotime” method (Table 1). For instance, the large differences in fR across tests observed with the “individual isotime” method were not revealed by the “relative isotime” method. Furthermore, when analysed with the “individual isotime” method, RPE was higher over the last minutes of the worst TTE test than in the other TTE tests, whereas it was lower when analysed with the “relative isotime” method. On the other hand, it is not surprising that variables representing effort show similar responses across different TTE tests when values are compared at the same relative distance from the point where maximal effort is exerted (i.e., using the “relative isotime” method), as also found in other studies [18, 19]. Therefore, the “relative isotime” method is not informative of between-test differences that may occur at the same absolute time points. However, comparisons at the same absolute time points are needed in most studies using TTE tests. For instance, the effects of different environmental conditions [19] or muscle glycogen depletion [18] on RPE are not revealed if data are expressed against relative time instead of absolute time. Nevertheless, the “relative isotime” method can be used to correlate the responses of different variables (as done in the present study for RPE and fR), as it is the only method that results in no data loss.
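The masking effect of the “relative isotime” method can be illustrated with a toy example. This Python sketch assumes two hypothetical RPE traces that rise linearly from 11 to 20 at exhaustion, one in a 600 s test and one in a 480 s test, and resamples them either at fractions of each test's own TTE or at the same absolute time points. The traces and helper functions are illustrative assumptions, and a hand-written linear interpolation is used in place of a library routine.

```python
def interp(t, times, values):
    """Linear interpolation of a sampled signal at time t (times ascending)."""
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the recorded range")


def relative_isotime(times, values, fractions):
    """Values at fixed fractions of this test's own TTE (its last sample time)."""
    tte = times[-1]
    return [round(interp(f * tte, times, values), 1) for f in fractions]


def absolute_isotime(times, values, timepoints):
    """Values at the same absolute time points (s) for every test."""
    return [round(interp(t, times, values), 1) for t in timepoints]


# Hypothetical RPE traces sampled every 120 s; both reach 20 at exhaustion.
test_a = ([0, 120, 240, 360, 480, 600], [11, 12.8, 14.6, 16.4, 18.2, 20])  # TTE 600 s
test_b = ([0, 120, 240, 360, 480], [11, 13.25, 15.5, 17.75, 20])          # TTE 480 s

# Same relative time (50% and 100% of each TTE): no difference between tests.
print(relative_isotime(*test_a, [0.5, 1.0]), relative_isotime(*test_b, [0.5, 1.0]))
# Same absolute time (240 s and 480 s): higher RPE in the shorter test.
print(absolute_isotime(*test_a, [240, 480]), absolute_isotime(*test_b, [240, 480]))
```

Under this assumption, the two tests are indistinguishable at the same relative time, while the shorter test shows higher RPE at the same absolute time, mirroring the pattern described above.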

In the light of our findings, it is surprising that the “individual isotime” method has received limited attention so far. Because it had not previously been compared with other methods of analysis, there may be limited awareness of its importance. In addition, the way the “individual isotime” method has been reported in previous studies [20,21,22,40,41] is inconsistent, sometimes leaving uncertainty over which method of analysis was used. We therefore provide some guidelines for reporting TTE analyses that researchers are encouraged to follow. First, we suggest that researchers use the terminology adopted in the present manuscript, where the rationale for the terms “group isotime”, “individual isotime”, and “relative isotime” has been explained. Second, we discourage expressing data processed with the “individual isotime” method as a percentage of TTE (e.g., [40]), because this may lead researchers to confuse the “individual isotime” method with the “relative isotime” method. Rather, we suggest expressing data against absolute time and adding horizontal error bars (see Figs. 2, 3, 4 for an example). Third, we suggest analysing data collected during TTE tests with the MATLAB code provided here as Supplementary material (Online Resource 1). This would make the analysis used explicit and avoid errors due to manual data processing.
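Plotting “individual isotime” data against absolute time requires an x position and a horizontal error bar for each analysis step, because each participant's isotime differs. The following is a minimal Python sketch, assuming that data are analysed at fixed fractions of each participant's isotime and that the horizontal error bar is the between-participant SD of the corresponding absolute times. The isotimes, the 25% steps, and the helper `absolute_time_axis` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical individual isotimes (s): each participant's shortest TTE.
individual_isotime = {"P1": 480, "P2": 780, "P3": 1020}
fractions = [0.25, 0.5, 0.75, 1.0]  # assumed analysis steps within each isotime


def absolute_time_axis(isotimes, steps):
    """Mean and SD of the absolute time (s) reached at each step across participants.

    The mean gives the x position of a data point; the SD gives its
    horizontal error bar.
    """
    axis = []
    for f in steps:
        times = [f * iso for iso in isotimes.values()]
        axis.append((mean(times), stdev(times)))
    return axis


for f, (x, xerr) in zip(fractions, absolute_time_axis(individual_isotime, fractions)):
    print(f"{f:.0%} of individual isotime: {x:.0f} ± {xerr:.0f} s")
```

The resulting means and SDs could then be passed as `x` and `xerr` to a plotting routine such as Matplotlib's `errorbar`, alongside the group mean of the dependent variable at each step.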

While we used a classic constant-workload exercise test, the present findings also apply to a variety of exercise protocols characterized by variable workloads that are performed to exhaustion. Among these, the incremental exercise test is commonly used to evaluate exercise tolerance and measure key physiological parameters. Intermittent TTE tests are also widely used, particularly in the field of neuromuscular physiology [42]. Furthermore, our findings can be extended to animal studies, where TTE tests are the preferred exercise protocol for evaluating exercise tolerance [43]. Collectively, TTE tests, in their various forms, have contributed substantially to our understanding of the mechanisms underlying exercise tolerance and fatigue in humans and animals. However, for a deeper understanding of the physiological and psychological mechanisms of exercise tolerance, further research should pay careful attention to the method used to analyse data collected during TTE tests.

Conclusion

The present study shows that the method used to process data collected during TTE tests dramatically affects the magnitude and the statistical significance of the effect of the independent variable on the dependent variables. Investigating the effect of within-subject performance ranking on physiological and psychological variables, we found that fR, \(\dot{V}_{\text{E}}\), RPE, and affective valence are sensitive to performance ranking, but only when the “individual isotime” method was used. This method greatly reduces the partial data loss that occurs with the traditional “group isotime” method. We have also provided detailed information on how to use the “individual isotime” method to encourage its use in future studies. Based on our findings, researchers are strongly encouraged to use the “individual isotime” method instead of the “group isotime” method to correctly interpret the phenomenon under study. These implications also extend to the incremental exercise test, another commonly used test of exercise tolerance.