Introduction

Interstitial lung disease (ILD) is the collective term for a series of pulmonary disorders characterised by inflammation, interstitial and alveolar damage, and irreversible declines in lung function [1]. Presently, ILD affects approximately 2 million people [2] and results in approximately 120,000 deaths globally [3]. Traditionally, resting measures of pulmonary function, including forced vital capacity (FVC) and the diffusing capacity of the lungs for carbon monoxide (DLCO), have been utilised to monitor disease progression and evaluate the efficacy of treatments. Both variables are predictive of mortality [4] and provide greater predictive power for survival over 6 months than histopathological factors alone [5]. However, these are not the sole factors predictive of mortality.

Cardiopulmonary exercise testing (CPET) is a dynamic diagnostic and prognostic test that simultaneously stresses multiple organ systems in order to identify causes of exercise intolerance and obtain functionally useful biomarkers [6]. Lower values for peak oxygen uptake (VO2peak), the primary outcome from CPET, are also associated with increased risk of mortality and need for transplantation [7,8,9,10], enhancing the predictive power of static pulmonary function testing [8], whilst also maintaining high independent predictive power when these factors are controlled for [7]. Ventilatory efficiency and exercise-induced hypoxemia are also indicative of poorer prognosis [11], highlighting the importance of the more functionally derived data available from CPET as independent and dynamic prognostic outcome measures in addition to traditional, static, pulmonary variables.

The utility and validity of CPET in a range of pulmonary diseases have been described previously [12], and a key requirement of exercise protocols eliciting VO2peak is confidence that a ‘maximal’ value has been achieved and that sub-maximal values are not mistakenly accepted [12]. However, it is unclear whether maximal exercise has actually been achieved in previous studies utilising CPET in ILD, as these studies have either not reported how maximal exercise was classified [7,8,9], or used only limited criteria to establish a ‘maximal’ value [10]. Equally, there is a lack of data on the repeatability of CPET in ILD; understanding this is needed to accurately interpret significant and clinically meaningful changes in function, to inform and evaluate treatment options, and to appropriately assess disease progression [13].

Therefore, this study sought to characterise CPET responses in patients with ILD, focusing on the validity and repeatability of the test, with particular emphasis on VO2peak, to further the evidence for using this parameter as an independent physiological marker of disease progression in ILD.

Materials and methods

Study design, population and ethics

This analysis forms part of a wider study (PETFIB: Exploring the potential of Cardio-Pulmonary Exercise Testing as a biomarker in patients diagnosed with FIBrosing Lung Disease), for which the clinical feasibility and patient acceptability of CPET, and initial results on participant characteristics, have been previously reported [14]. This study recruited, via convenience sampling, 26 people with ILD [19 male] of differing diagnoses and prescribed differing medications, as per Table 1. All participants attended the research facility on three occasions, over a 20-month period from August 2017 to May 2019, with a period of 3 months (0.2–0.3 years) separating each visit where possible.

Table 1 Baseline anthropometric, pulmonary, and clinical data in study participants

Ethics approval for this study was granted by the Health Research Authority (IRAS 220189) following review by the South West (Frenchay) Research Ethics Committee (17/SW/0059). All participants provided written and informed consent upon recruitment to the study.

Physiological measures

Participants’ stature and body mass were assessed using standard methods, with body mass index (BMI) subsequently calculated. Body fat percentage was assessed using air displacement plethysmography (BodPod; COSMED, Rome, Italy), with subsequent values for fat mass and fat-free mass (FFM) calculated.

Retrospective measures of pulmonary function were obtained from medical records, whereby the measurement date closest to the participant’s first CPET was utilised. Measures included forced expiratory volume in one second (FEV1), FVC and DLCO, expressed as absolute values and as a percent of predicted value for age, sex, and stature [15, 16]. Furthermore, GAP scores, a composite of gender, age and physiology, were also calculated for each participant. Scores range from 0 to 8, whereby an increased score is indicative of a greater risk profile for early mortality [17]. Physical activity status was subjectively assessed using the General Practice Physical Activity Questionnaire [18].

Cardiopulmonary exercise testing

Participants underwent a CPET on an electronically braked cycle ergometer (Lode Excalibur; Lode, Groningen, the Netherlands), whereby the protocol incorporated an initial warm-up at 0 W for three minutes before an incremental ramp phase increased resistance by 10 W min−1. Participants were instructed to maintain a self-selected cadence between 60 and 80 revolutions per minute (rpm) until volitional exhaustion, defined as a decrease in cadence < 10 rpm for 5 consecutive seconds despite verbal encouragement from research staff. Upon exhaustion, the resistance was removed, and participants returned to pedalling at 0 W for a further three minutes to cool down. This protocol has been detailed previously [14].

Throughout the CPET, measures of pulmonary gas exchange were recorded using a metabolic cart (Medgraphics Ultima; Medical Graphics UK Ltd., Gloucester, UK), calibrated for volume and gas concentrations prior to each test. Data were measured breath-by-breath and analysed in 10 s averages, with VO2peak and presence of a plateau in VO2 being determined using methods described previously [19]. Briefly, a linear regression was plotted over the ‘linear’ portion of the exercise test, with data from the first two minutes, and the last two minutes prior to exhaustion (or clinical termination), excluded. The VO2 from this linear portion was then extrapolated over the remainder of the test, and residuals from the final 60 s were isolated and examined against the extrapolated portion. A negative residual indicated a deceleration in VO2 against power output and was defined as a plateau when the magnitude of residuals was ≥ 5% of projected VO2 (Fig. 1a). Either a positive or negative residual < 5% of projected VO2 indicated a linear response (Fig. 1b). Finally, a positive residual ≥ 5% indicated an acceleration in VO2 against power output (Fig. 1c).
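The residual-based classification described above can be illustrated with a minimal Python sketch. This is not the implementation used in this study or in the cited method [19]. The function names (`fit_line`, `classify_vo2_profile`), the use of the mean residual over the final 60 s relative to the mean projected VO2, the inclusive trimming boundaries, and the minimum of three data points in the fitting window are all assumptions made for illustration only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired lists."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def classify_vo2_profile(t, vo2, trim_s=120, final_s=60, threshold=0.05):
    """Classify a ramp-phase VO2 response.

    t   : time since ramp onset in seconds (one entry per 10 s average)
    vo2 : oxygen uptake in L/min at each time point
    Returns 'plateau', 'linear', 'acceleration' or 'insufficient'.
    """
    t_end = t[-1]
    # Fit only the 'linear' portion: drop the first and last two minutes.
    fit = [(ti, vi) for ti, vi in zip(t, vo2) if trim_s <= ti <= t_end - trim_s]
    if len(fit) < 3:  # assumed minimum; the source does not define this
        return "insufficient"
    slope, intercept = fit_line([p[0] for p in fit], [p[1] for p in fit])

    # Extrapolate the fitted line over the final 60 s and take residuals.
    final = [(ti, vi) for ti, vi in zip(t, vo2) if ti > t_end - final_s]
    projected = [intercept + slope * ti for ti, _ in final]
    residual = mean(vi - pi for (_, vi), pi in zip(final, projected))
    relative = residual / mean(projected)

    if relative <= -threshold:
        return "plateau"
    if relative >= threshold:
        return "acceleration"
    return "linear"
```

Applied to a ten-minute ramp sampled every 10 s, a VO2 trace that stops rising in the final minutes would be classified as a plateau, whereas a test of only four minutes would leave too few points in the fitting window and be returned as insufficient.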

Fig. 1

Example VO2 responses to increasing work-rate during cardiopulmonary exercise tests. A: Deceleration of VO2, producing a plateau (64-year-old male with idiopathic pulmonary fibrosis); B: Linear response (70-year-old male with idiopathic pulmonary fibrosis); C: Acceleration of VO2 against power (58-year-old male with chronic hypersensitivity pneumonitis). For all cases, the extrapolated regression line is fitted from 120 s through to volitional exhaustion. VO2: oxygen uptake

Normative values of Jones et al. [20], as suggested by the European Respiratory Society (ERS) [12], were utilised to present VO2peak and peak work rate (WRpeak) as a percent of predicted. Determination of the gas exchange threshold (GET) was undertaken using the V-slope method as previously described [21], and verified using ventilatory equivalents for oxygen (VE/VO2) and carbon dioxide (VE/VCO2).

Subjective ratings of perceived exertion (RPE) and dyspnoea (RPD) were recorded at baseline, throughout the CPET, and at test termination, on validated scales of 6–20 and 0–10 respectively [22].

Participants also wore a 12-lead ECG (Welch Allyn CardioPerfect; Hillrom, Chicago, USA) and pulse oximeter (Choice MMed MD300C2; ChoiceMMed, Dusseldorf, Germany), to monitor cardiac changes and peripheral capillary oxygen saturation (SpO2) respectively. All CPETs were supervised by an exercise physiologist and medical doctor, and the CPET was terminated if either ECG (e.g., arrhythmia) or SpO2 responses warranted early cessation for patient safety. In the first round of CPETs, SpO2 limit was set at < 88%, and extended to < 80% in the second and third CPETs as hypoxemia was shown to be well tolerated in the first CPET.

Determination of validity

CPET was determined to be a ‘maximal’ effort (and therefore valid) if it satisfied at least one of the criteria put forward by a recently published technical standards document from the ERS [12]. In relation to the current study design (cycle ergometry with ramp protocol) and available measures (pulmonary gas exchange, work rate, retrospective spirometry and cardiac function), these criteria included a primary criterion of a plateau in VO2 (as previously described), or one of several secondary criteria: (1) achieving predicted VO2peak, using the aforementioned normative equations of Jones et al. [20]; (2) achieving predicted WRpeak, using the normative equations of Jones et al. [20]; (3) achieving predicted maximal heart rate (HRmax; calculated as 220 − age); (4) peak ventilation (VE) reaching, or exceeding, 85% of estimated maximal voluntary ventilation (MVV; calculated as FEV1 × 40); and (5) respiratory exchange ratio (RER) ≥ 1.05.
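As an illustration of how these criteria combine, the following Python sketch checks each criterion for a single test and flags the test as valid if at least one is met. The function and argument names are hypothetical, the predicted VO2peak and WRpeak values are assumed to be supplied by the user (the normative equations [20] are not reproduced here), and ‘achieving’ a predicted value is assumed to mean reaching at least 100% of it.

```python
def maximal_effort_criteria(*, plateau, vo2peak, vo2peak_pred, wrpeak, wrpeak_pred,
                            hr_peak, age, ve_peak, fev1, rer_peak):
    """Return a dict of criterion name -> True/False for one CPET.

    plateau      : True if a VO2 plateau was identified (primary criterion)
    vo2peak      : measured peak oxygen uptake (L/min)
    vo2peak_pred : predicted VO2peak (L/min), supplied externally
    wrpeak       : measured peak work rate (W)
    wrpeak_pred  : predicted peak work rate (W), supplied externally
    hr_peak      : peak heart rate (beats/min)
    age          : age in years
    ve_peak      : peak minute ventilation (L/min)
    fev1         : forced expiratory volume in one second (L)
    rer_peak     : peak respiratory exchange ratio
    """
    predicted_hrmax = 220 - age      # as stated in the text
    estimated_mvv = fev1 * 40        # as stated in the text
    return {
        "vo2_plateau": bool(plateau),
        "predicted_vo2peak": vo2peak >= vo2peak_pred,
        "predicted_wrpeak": wrpeak >= wrpeak_pred,
        "predicted_hrmax": hr_peak >= predicted_hrmax,
        "ve_85pct_mvv": ve_peak >= 0.85 * estimated_mvv,
        "rer_1_05": rer_peak >= 1.05,
    }


def is_valid_cpet(criteria):
    """A test is treated as maximal (valid) if any one criterion is met."""
    return any(criteria.values())


# Hypothetical participant, values chosen purely for illustration.
example = maximal_effort_criteria(
    plateau=False, vo2peak=1.2, vo2peak_pred=1.9, wrpeak=90, wrpeak_pred=140,
    hr_peak=140, age=70, ve_peak=70, fev1=2.0, rer_peak=1.10,
)
print(sum(example.values()), is_valid_cpet(example))
```

Returning each criterion separately, rather than a single flag, allows the number of criteria met per test to be tallied.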

Determination of repeatability

For participants who performed at least two valid CPETs within either a 3-month or 6-month period (an ecologically valid time frame reflecting frequency of clinical visits), differences between CPETs were established using paired samples t-tests. Reproducibility was established using an existing spreadsheet [23], with calculation of (a) changes in the mean, (b) Pearson’s correlation coefficients, (c) intraclass correlation coefficients (ICC), (d) absolute typical error (TE), and (e) TE expressed as a percentage coefficient of variation (TECV%). Both TE and TECV% were calculated with 95% confidence limits and a smallest worthwhile effect size of 0.2. This approach has previously been utilised for determining repeatability of exercise-based parameters in respiratory disease [24]. Furthermore, Bland–Altman analyses [25] identified the mean bias and limits of agreement (LoA) between repeated measures of VO2.
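To make these quantities concrete, the sketch below computes the change in the mean, the typical error, a percentage coefficient of variation and Bland–Altman limits of agreement for paired tests. It is a simplified illustration and not the spreadsheet used in this study [23]. It assumes the typical error is the standard deviation of the difference scores divided by √2, that the percentage form is derived from log-transformed values, and that the limits of agreement are the mean bias ± 1.96 standard deviations of the differences. Confidence limits, ICC and Pearson’s correlation are omitted, and the function name and example values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import mean, stdev


def repeatability(test1, test2):
    """Summarise agreement between two repeated measures (paired lists)."""
    diffs = [b - a for a, b in zip(test1, test2)]  # test 2 minus test 1
    bias = mean(diffs)
    sd_diff = stdev(diffs)

    # Typical error on the log scale, back-transformed to a percentage.
    log_diffs = [log(b) - log(a) for a, b in zip(test1, test2)]
    te_log = stdev(log_diffs) / sqrt(2)

    return {
        "change_in_mean": bias,
        "typical_error": sd_diff / sqrt(2),
        "typical_error_cv_pct": 100 * (exp(te_log) - 1),
        "loa_lower": bias - 1.96 * sd_diff,
        "loa_upper": bias + 1.96 * sd_diff,
    }


# Hypothetical VO2peak values (L/min) for four participants.
visit_1 = [1.0, 1.5, 2.0, 1.2]
visit_2 = [0.9, 1.3, 2.1, 1.0]
summary = repeatability(visit_1, visit_2)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Differences are taken as the second test minus the first, so a negative change in the mean indicates a lower value at the later visit.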

Statistical analyses

Assessment of validity and repeatability has been discussed previously within the methodology and therefore, given these aforementioned approaches, no formal power calculation was undertaken. With regards to correlation coefficients, magnitudes were described as small (0.1 to < 0.3), medium (0.3 to < 0.5) and large (≥ 0.5) as per existing thresholds [26]. For all analyses, statistical significance was set at p < 0.05.
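The correlation thresholds above can be expressed as a short helper, sketched here in Python. The function name is hypothetical, and two details are assumptions not stated in the text: the absolute value of the coefficient is used so that negative correlations are classified by strength, and coefficients below 0.1 are given the placeholder label ‘below small’.

```python
def correlation_magnitude(r):
    """Describe a correlation coefficient using the thresholds in the text."""
    size = abs(r)  # assumption: classify by strength regardless of sign
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"  # placeholder label; not defined in the text


for r in (0.05, 0.25, 0.45, 0.80):
    print(r, correlation_magnitude(r))
```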

Results

Participant characteristics

Twenty-six participants were recruited, although clinical contraindications resulted in n = 2 being excluded from baseline CPETs, as shown in Fig. 2 and described previously [14].

Fig. 2

Flowchart detailing participant enrolment and successful completion of cardiopulmonary exercise tests within study. CPET: cardiopulmonary exercise test; SpO2: peripheral capillary oxygen saturation

Therefore, n = 24 undertook at least one CPET. Further exclusions during the course of the study period resulted in n = 21 completing all three CPETs. Descriptive participant characteristics of the n = 26 recruited, n = 24 to undertake at least one CPET, and n = 21 to perform all three CPETs are listed in Table 1. A total of 67/78 prospective CPETs were completed during the course of this study.

Testing timeline

For n = 21 participants to undertake all three CPETs, the mean duration between visits 1 and 2 was 26 ± 12 weeks (range = 12–49 weeks), and between visits 2 and 3, this was 15 ± 5 weeks (range = 10–28 weeks). Time between the first and last CPET was 41 ± 14 weeks (range = 23–65 weeks).

A total of n = 17 participants successfully performed at least two CPETs within a 3-month period (14 ± 2 weeks, 11–16 weeks), with this either being between the first and second CPET, or between the second and third. If participants had a 3-month gap between their first and second, as well as second and third CPETs, data between the first and second CPETs was carried forward for repeatability analyses. Within this sample of n = 17, a further n = 12 participants successfully performed two CPETs within a 6-month period (27 ± 1 weeks, 25–29 weeks).

Factors that prevented all participants completing CPETs within the prescribed 3-month (and subsequent 6-month) periods included personal availability, malfunctioning equipment, laboratory availability, participants forgetting to attend scheduled research visits, and staff availability (affected by general clinical rotas and hospital winter pressures).

Changes in exercise and pulmonary function

Exercise based outcomes for the n = 21 to complete all three CPETs are listed in Table 2, whereby WRpeak ranged from 20 to 166 W in this group during the study, and VO2peak ranged from 0.34 to 2.40 L min−1. However, due to the unplanned variances in individual testing timelines as mentioned above, each participant does have differing time frames between each CPET. Therefore, no formal analyses could be undertaken on these exercise-based parameters, and the data displayed in Table 2 are for descriptive purposes only, whereas formal repeatability analyses are presented further below.

Table 2 Changes in anthropometric and exercise responses at each study visit for n = 21 who completed all three study visits

Furthermore, due to the retrospective nature of obtaining pulmonary function data, a number of data points could not be retrieved from participant medical records, leading to incomplete pulmonary function data as seen in Table 1. There was also wide variability in the time difference between pulmonary function tests and CPETs. The smallest difference was zero days, whereby a participant had undertaken pulmonary function testing on the same day as a CPET. The mean difference was − 32 ± 96 days (− 0.09 ± 0.26 years), indicating that pulmonary function tests were, on average, undertaken 1 month prior to each CPET. However, the total range was from − 252 to 317 days (− 0.69 to 0.87 years). Therefore, given this disparity in timelines, and the fact that pulmonary function is not a primary outcome variable within this study, these data are only utilised as a descriptive variable in Table 1, and no further analyses are undertaken with regards to FEV1, FVC or DLCO. In addition to these pulmonary function data, no mean change was identified in GAP score or physical activity status.

Validity of cardiopulmonary exercise testing

The majority of CPETs undertaken were terminated due to patients successfully reaching volitional exhaustion (n = 42), whereas the remaining tests were terminated for clinical reasons (n = 24), and n = 1 reason was not recorded. Of the n = 67 CPETs completed, VO2 responses were linear (n = 32, 48%), accelerations (n = 6, 9%) and plateaus (n = 14, 21%) in nature. A total of n = 15 (22%) CPETs were of insufficient length to analyse VO2 residuals. Of the n = 42 CPETs terminated due to volitional exhaustion, responses were linear (n = 23, 55%), accelerations (n = 4, 10%) and plateaus (n = 9, 21%). A total of n = 6 (14%) were of insufficient length to analyse. A breakdown of these frequencies per CPET is provided in Fig. 3A1, A2.

Fig. 3

Frequency of VO2 profiles, and primary and secondary verification criteria, in each cardiopulmonary exercise test. 1 = All completed CPETs (CPET 1, n = 24; CPET 2, n = 22; CPET 3, n = 21). 2 = CPETs whereby participants reached volitional exhaustion (CPET 1, n = 14; CPET 2, n = 11; CPET 3, n = 17). CPET: cardiopulmonary exercise test; HRmax: maximal heart rate; RER: respiratory exchange ratio; VE/MVV: minute ventilation/maximal voluntary ventilation; VO2: oxygen uptake; VO2peak: peak oxygen uptake; WRpeak: peak work rate

Further to the occurrence of a number of plateaus in VO2, a number of secondary criteria were achieved by participants, with a full breakdown provided in Fig. 3B1, B2. Across all 67 CPETs, multiple secondary criteria were obtained: reaching predicted VO2peak (n = 8), reaching predicted WRpeak (n = 4), reaching predicted HRmax (n = 29), reaching ≥ 85% MVV (n = 12) and reaching RER ≥ 1.05 (n = 60). One criterion (primary or secondary) was fulfilled in n = 24 CPETs, two criteria in n = 22, three criteria in n = 11, four criteria in n = 4 and five criteria in n = 2. No CPET fulfilled every primary and secondary criterion. Of the n = 24 CPETs terminated for clinical reasons (e.g., desaturation, cardiac contraindications), a total of n = 20 were verified as being maximal by reaching a number of primary or secondary criteria (1 criterion, n = 9; 2 criteria, n = 7; 3 criteria, n = 2; 4 criteria, n = 1). The remaining n = 4 that failed to present with any verification criteria were terminated by clinicians for desaturation (n = 3) and right bundle branch block (n = 1).

Therefore, these data indicate that 94% of performed CPETs (63/67) were deemed valid. When solely considering participants who exercised to volitional exhaustion, this figure increases to 100% (42/42).
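The overall validity figure can be checked from the reported counts, as in the short Python tally below. It reads the reported numbers of CPETs fulfilling one to five criteria as mutually exclusive categories, which is an interpretation of the text rather than something stated explicitly; the variable names are illustrative.

```python
# Number of CPETs fulfilling exactly k verification criteria, as reported.
criteria_counts = {1: 24, 2: 22, 3: 11, 4: 4, 5: 2}
total_cpets = 67

valid_cpets = sum(criteria_counts.values())
percent_valid = 100 * valid_cpets / total_cpets
print(f"{valid_cpets}/{total_cpets} = {percent_valid:.0f}% valid")
# prints "63/67 = 94% valid"
```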

Repeatability of cardiopulmonary exercise testing

Of the n = 17 participants who successfully performed two CPETs within a 3-month period, all tests were deemed to be valid, and therefore repeatability data are determined from n = 17. Of the n = 12 participants who completed two CPETs within a 6-month period, one test was deemed to be invalid, and therefore repeatability data are determined from n = 11.

Statistically significant differences were seen between CPETs for all parameters of VO2 over a 3-month period, and most over a 6-month period, as shown in Table 3, with individual changes visualised in Fig. 4. Data in Table 3 also shows the mostly large correlation coefficients, and typical error associated with the repeatability of outcomes from each test, with this ranging from 12.7 to 25.5% over 3 months, and from 15.7 to 33.9% over 6 months, dependent on the variable being assessed.

Table 3 Changes in exercise responses in participants who successfully performed two valid cardiopulmonary exercise tests within a 3-month period (n = 17) and 6-month period (n = 11)
Fig. 4

Individual changes in absolute VO2peak over 3 and 6 months. A, B: 3-month changes; C, D: 6-month changes. Data are provided for all participants (A, C) and for those who reached volitional exhaustion only (B, D). VO2peak: peak oxygen uptake

When only considering participants who reached volitional exhaustion in both analysed pairs of CPETs, statistically significant changes in VO2peak are observed over a 3-month period, but not a 6 month period, although the latter is only representative of n = 5 participants (Table 4). Furthermore, the typical error when expressed as a coefficient of variation is lower in participants who reached volitional exhaustion (20% for absolute VO2peak, Table 4) than for the group which included all participants to produce a valid test (25% error for absolute VO2peak, Table 3).

Table 4 Changes in exercise responses in participants who successfully performed two valid cardiopulmonary exercise tests, whilst reaching volitional exhaustion, within a 3-month period (n = 10) and 6-month period (n = 5)

The mean bias in absolute VO2peak for the CPETs performed 3- and 6-months apart was − 0.21 L min−1 each, although the subsequent standard deviations and LoA differed, as shown in Fig. 5A, B. Furthermore, for participants who reached volitional exhaustion, this mean bias remained − 0.21 L min−1 at 3 months (Fig. 5C), with a similar limit of agreement in those to reach volitional exhaustion at 6 months (Fig. 5D).

Fig. 5

Bland–Altman plots displaying mean bias and limits of agreement for absolute VO2peak obtained from cardiopulmonary exercise tests. A: CPETs performed 3 months apart for n = 17 participants. B: CPETs performed 6 months apart for n = 11 participants. C: CPETs performed 3 months apart for n = 10 participants, who reached volitional exhaustion only. D: CPETs performed 6 months apart for n = 5 participants, who reached volitional exhaustion only. In each instance, difference (y-axis) presents data from CPET 2 minus CPET 1 (i.e., a value above zero indicates CPET 2 was higher than CPET 1 and therefore an increase in function has occurred). CPET: cardiopulmonary exercise test

Discussion

This study, for the first time, has fully characterised the validity of CPET, and repeatability of associated outcomes, in a cohort of patients with ILD. This work has shown that CPET is a valid tool, whereby all participants to reach volitional exhaustion during CPET provide a valid test; and novel data has been generated surrounding the repeatability and mean bias of exercise-based outcomes over a 3- and 6-month period, with particular reference to VO2peak.

Validity of cardiopulmonary exercise testing

In the first analysis of this study, focusing on the presence of a valid CPET, it was identified that 100% of participants to reach volitional exhaustion produced a valid test, and 94% of all tests were deemed valid, even if the participant did not reach volitional exhaustion; a highly encouraging statistic. When this is combined with an expressed preference for CPET above and beyond traditional, static, pulmonary function testing [14], this highlights the ability of CPET to be integrated into respiratory services as an additional biomarker for diagnostic, prognostic, and rehabilitative reasons for an older patient group.

Traditionally, exercise studies have relied on the occurrence of a plateau in VO2 to determine a maximal, and therefore valid, effort [12, 27]. However, a plateau in VO2 does not always occur during incremental exercise, as has been consistently evidenced in adults [28], children [29] and those with chronic respiratory disease [30, 31]. This is corroborated by the present study, whereby only ~ 20% of exercise tests exhibited a plateau in VO2. Reliance on this physiological artefact as the sole indicator of VO2max, and thus a maximal and valid test, is unwise as it can result in dismissal of perfectly valid and clinically useful data. Therefore, secondary criteria, such as reaching predicted values for VO2peak, WRpeak and HRmax, and thresholds for maximal voluntary ventilation and the respiratory exchange ratio, are also used to determine whether a maximal test has been achieved [12].

Within the present study, the majority of participants presented with an RER ≥ 1.05 and a HRmax exceeding their predicted value. In contrast, very few exceeded their predicted values for VO2peak and WRpeak, with this discrepancy likely due to the fact that predicted values for VO2peak and WRpeak are generated from healthy populations and thus patient populations will simply not reach these values because of their disease status. This does not discount using VO2peak and WRpeak values as secondary criteria, but sole reliance upon these criteria is also not recommended, and therefore the entire CPET profile must be considered to evaluate whether a maximal effort has been reached. Further secondary criteria are available, such as changes in inspiratory capacity and blood lactate [12], and whilst these could not be assessed in the present study due to limitations of the study design, they provide a wider profile of physiological thresholds with which to determine a maximal effort.

Furthermore, it should be noted that the present study makes use of the most recent technical document for exercise testing in respiratory disease [12], whereas previous guidelines from over 15 years ago [27] (which are not wholly specific to those with chronic lung disease) utilise a slightly differing set of criteria to determine a maximal effort. The most notable differences include the presence of subjective markers of perceived effort, which are excluded from the ERS document [12], as well as a different critical threshold value for the RER. The recent ERS technical standards suggest a value of 1.05 for maximal exercise [12], whereas older guidelines from the American Thoracic Society (ATS) suggest a more conservative value of 1.15 [27]. However, within the present study, had a value of 1.15 been adopted, then only an additional four CPETs (two of which were terminated by clinicians due to desaturation) would be deemed invalid for failing to satisfy any verification criteria, thus resulting in 88% of all CPETs, and 95% of CPETs to reach volitional exhaustion, being deemed valid. Therefore, the authors believe such a change in a single secondary verification criterion does not detract from the overall validity seen for CPET in ILD. Moreover, subjective ratings of perceived effort and dyspnoea were collated within this study, but are not formally reported as this was only undertaken to safely monitor individual changes throughout exercise, and were recorded on a 6–20 scale, and not a 0–10 scale [22] as suggested by the ATS. However, understanding the repeatability and continued clinical change in perceptual responses to exercise will be of use to both clinicians and patients alike, and should be considered for future research and incorporation into future practice.

To circumvent the reliance on secondary criteria, which are not always robust in determining whether a maximal effort has been reached [29, 32], the use of supramaximal verification testing has been proposed, whereby an additional exercise challenge is presented to participants, at a work rate that is typically in excess of that achieved during a ramp incremental test [33]. This has been shown to be effective in healthy [28] and diseased adults [31], although the feasibility and acceptability of this additional physical work in people with ILD remains unknown, and the development of an optimal protocol for performing CPET in this population should be undertaken [12].

Furthermore, whilst the majority of participants reached volitional exhaustion during their CPET, 36% of participants’ tests were terminated early and subsequently 6% of tests (4/67) were deemed invalid due to a failure to produce sufficient maximal values. Therefore, to account for these situations whereby a clinical termination of a CPET may be required, submaximal exercise parameters should be investigated in those with ILD. Previous research in other clinical populations, such as cystic fibrosis (CF), heart failure and chronic obstructive pulmonary disease (COPD), has focused on parameters of oxygen uptake efficiency (VO2/VE) and ventilatory drive (VE/VCO2) [34,35,36,37], both of which are also important in ILD [11]. The advantage of such parameters is that these do not require a full CPET to be completed, nor volitional exhaustion to be reached, which is in contrast to parameters such as the GET. Whilst the GET is considered sub-maximal, it is normally characterised as a percentage of VO2peak, a process that requires participants to reach volitional exhaustion upon which to anchor the threshold, and therefore defeats the purpose of attempting to source a sub-maximal parameter. Therefore, exploration of truly sub-maximal parameters that are not dependent on maximal anchors, such as the oxygen uptake efficiency plateau [34, 38], is warranted in ILD to ascertain validity and prognostic utility.

Repeatability of cardiopulmonary exercise testing

Within the second analysis of this study, the repeatability of multiple outcome variables associated with CPET was established, notably that of VO2peak. Given the importance of VO2peak as a biomarker in ILD [7, 8], it is critical to be able to identify error and variation in such measures, to allow successful inferences regarding physiological decline and efficacy of therapeutic regimens to be made.

As there is a paucity of data describing natural variation in pulmonary function [39], this study has provided valuable data in identifying variation in the equally valuable marker of VO2peak. This study subsequently identified that absolute VO2peak presented with a typical error, when expressed as a coefficient of variation, of 20% over a 3-month period in participants to reach volitional exhaustion. Whilst this error did reduce to ~ 10% over 6 months in those to reach volitional exhaustion, this analysis was only in a sample of n = 5 and must therefore be treated with caution.

Previous research has predominantly focused on younger, and healthy, individuals to identify repeatability of exercise testing [40, 41], with the repeatability of CPET being established in some clinical groups, such as those with cystic fibrosis [24, 42] and pulmonary arterial hypertension [43], identifying lower rates of error in VO2peak than the present study. However, these studies were far shorter in length, ranging from 48-h repeatability to 6 weeks, and were undertaken in notably younger populations and in conditions which present with notably different pathophysiology, co-morbidities, risk factors and treatment profiles to ILD. The repeatability of CPET in ‘restrictive lung disease’ has only been evaluated once previously, observing a variation of ~ 5% in VO2 at peak exercise over a 28-day period [44]. However, this study from Marciniuk et al. [44] was published nearly 30 years ago, and was undertaken on only six patients, three with idiopathic pulmonary fibrosis (as per the current study), two with sarcoidosis and a single case of systemic sclerosis. Therefore, the results of this prior study should be interpreted with extreme caution, and even ignored when considering advances made in ILD management in the intervening decades. In utilising both a 3- and 6-month period of observation, we have utilised a time frame that is less burdensome than a smaller resolution (e.g., 1 week) that would require more frequent testing, whilst aligning this repeatability window with the schedule of routine clinical appointments that people with ILD have with their clinical support teams, dependent upon disease severity and trajectory [45]. As the research and clinical team were not manipulating any further treatment during this study period, all observed change would be due to disease progression and therefore the authors believe that the data presented in this study are more ecologically valid than the prior data over a 28-day time frame [44].

Further to reporting the repeatability of VO2peak in this population, it must also be acknowledged that the disease trajectory of ILD is markedly different to other chronic respiratory diseases, having a median survival of only 2–3 years from diagnosis [46], unlike conditions such as chronic obstructive pulmonary disease or cystic fibrosis, whereby median survival times are ~ 10 and ~ 40 years from diagnosis and birth respectively [47, 48]. This therefore calls into question how much of the observed variance in VO2peak is representative of ‘normal’ error that would ordinarily be observed between tests, and how much is a genuine decline in physiological function. A number of participants within the present study had decrements in VO2peak of > 0.5 L min−1 over a 3-month period, and therefore distinguishing between genuine variation and disease-driven change is of importance in ILD management, and will require corroboration of the current results to establish the true repeatability of VO2peak. Moreover, further studies to assess repeatability over alternative timeframes (e.g., 1 week, 1 month) are warranted, aligning with the time course of potential health changes in this population.

Study considerations

There are a number of strengths to this study. Primarily, a robust protocol was utilised to elicit VO2peak in participants through gold-standard CPET; and a thorough analysis of contemporary, internationally developed, technical standards was used to establish validity of CPET in this patient group. Moreover, choice of cycle ergometry was optimal in this group, as it is not only acceptable to patients with ILD [14], but is also likely less affected by dynamic stability. As people with ILD demonstrate impaired stability (e.g. stride length, contact time) during treadmill testing [49], cycling is a preferable modality as outcomes, such as VO2peak, will reflect genuine cardiopulmonary function, instead of an ability to simply balance during the test.

In addition to the choice of methodology to elicit VO2peak, the mathematical calculation of VO2 plateaus, use of multiple techniques to assess repeatability (as opposed to relying on a single correlation for example), and determination of such repeatability over multiple, ecologically valid, time frames (3- and 6-months) in an under-reported group further add to the strengths and novelty of this investigation.

In respect to limitations, it is acknowledged that the unexpected variances in individual testing timelines (due to aforementioned clinical interruptions) have reduced the number of participants available for analysis, and the subsequent lack of time-aligned pulmonary function data does not permit comparison of repeatability of VO2peak and FVC. Moreover, the sole use of ramp-incremental testing could be enhanced via the use of supramaximal verification testing [33], and whilst this was not used, it does present a unique avenue for future research in this population. Despite limitations, the presented data remains novel and clinically useful for clinicians developing CPET services, and researchers utilising CPET as a tool for further investigations.

Summary

In conclusion, this study has, for the first time, fully examined, characterised, and established the validity and reproducibility of CPET within a cohort of patients with ILD. This study has utilised gold-standard testing methodologies, and existing analytical methods used previously in respiratory disease, finding CPET to be valid and repeatable in this patient group. These data will prove useful for clinicians and researchers when using CPET as a diagnostic and prognostic tool. In particular, further work is warranted to investigate the utility of sub-maximal parameters in this disease group and to identify the repeatability of exercise-based parameters in relation to the static lung function measures currently used to track disease progression.