Introduction

Cardiopulmonary exercise testing (CPET) is the gold-standard for the evaluation of the cardiovascular, respiratory, and skeletal muscle systems during physical effort1,2,3. CPET has extensive applications in medicine, as it measures cardiometabolic and ventilatory markers that support the functional assessment of patients with heart failure4,5, coronary artery disease6, and pulmonary hypertension7, among others. CPET also plays a crucial role in sports sciences and professional training, as it provides metabolic information that enables the determination of ventilatory thresholds (VTs), key parameters in the prescription of exercise, and performance tracking of athletes8,9. While the impact of CPET in medicine and sports sciences is broad and well established, it requires a technical infrastructure and operation that restricts its use to the clinical and research settings10. This limitation, together with recent advances in physiological wearable sensors, represents a unique opportunity to create new wearable technology to assess cardiopulmonary performance at lower cost and outside complex facilities11,12.

VTs are physiological parameters widely used for the study and monitoring of physical fitness. They are identified by the CPET software and subsequently corroborated through visual analysis by expert physiologists (gold standard). The first ventilatory threshold (VT1), originally defined by Wasserman and McIlroy as the anaerobic threshold (AT)13, denotes the point during exercise where ventilation increases faster than oxygen consumption (VO2). It also manifests as the first non-linear increase in minute ventilation (MV) during graded exercise tests14. The second ventilatory threshold (VT2), also termed the second AT or respiratory compensation point15, occurs at high exercise intensity and is triggered by the overactivation of the anaerobic metabolism needed to supply the energy demands. VT2 is observed during CPET as a second non-linear increase in MV, which arises concurrently with an increase in the ventilatory equivalent for carbon dioxide. These ventilatory changes are a consequence of metabolic compensations that are typically assessed from lactate tests9,16, which require blood sample collection that may not be compatible with mass testing or in-field tracking.

Based on VTs, Skinner and McLellan17 proposed a model composed of three phases (triphasic model), traditionally used by fitness coaches to prescribe exercise according to the ranges of effort-related variables linked to the VTs (e.g., heart rate (HR), blood lactate ([Lac]), power, and speed). Of these, HR is the most widely used, as it increases linearly as exercise progresses. Healthy subjects are expected to reach 130–150 beats·min−1 at VT1 and 160–180 beats·min−1 at VT2. Typically, to improve cardiopulmonary exercise capacity, subjects complete a training program in which they exercise at the HR linked to VT1 during the first 4 weeks, at an HR between VT1 and VT2 from weeks 5 to 8, and at an HR above that linked to VT2 during the last 3 to 4 weeks18. However, these HR ranges vary in clinical contexts depending on the population studied19, as well as in extreme environmental conditions (e.g., altitude20 and hypoxia21) where HR is not a good indicator of exercise intensity, which motivates the exploration of other physiological variables for identifying VTs.

Currently, a handful of sensor technologies and advanced algorithms are being explored for the estimation of VT1 and VT2. Using electrocardiography (ECG) signals acquired from commercially available systems, the time instant when AT occurs has been associated with the dynamics of the fractal correlation index of heart rate variability (HRV) during incremental treadmill tests in a group of healthy subjects22. Recent studies have also used HRV features extracted from single-lead ECG signals to build neural-network classifiers that detect VT1, using data from a cohort of 260 patients with cardiovascular disease undertaking CPET at a hospital facility23. For the determination of VT2, solutions based on near-infrared spectroscopy (NIRS) signals from portable devices have reported high agreement with physiological data from the gold standard24. While these technologies offer promising alternatives, most of them require medical-grade sensors and equipment that restrict their application to clinical and research laboratories. Further, systems based on data-driven algorithms carry the disadvantage of being predictive only for the population they were trained on25. To our knowledge, there is no single wearable technology able to provide accurate VT1 and VT2 estimations.

In this study, we validate a respiratory wearable system and algorithms for estimating VTs during graded cycling exercise tests in healthy subjects. Additionally, we assess the applicability of our system outside laboratory facilities by evaluating VTs from ramp tests on smart trainers in recreational athletes, demonstrating concordance with classical physiological variables commonly used for exercise prescription.

Results

Validation of respiratory rate measurements during exercise

The pilot study on five subjects with repeated measurements showed that placing the wearable sensor inside the face mask does not significantly affect the time evolution of respiratory rate (RR), tidal volume (VT), minute ventilation (MV), and VO2 measured by the ergospirometer (Supplementary Fig. 1). To assess the agreement between the respiratory wearable and ergospirometry in the measurement of RR during exercise, we evaluated 17 healthy, physically active adults who completed an incremental cycling exercise protocol (ramp test). Breath-by-breath RR was measured simultaneously by the wearable sensor and the ergospirometer (Fig. 1). Table 1 shows the participant characteristics and the main variables recorded during the exercise protocol.

Fig. 1: Experimental setup for wearable system validation.

The ergospirometry face mask is installed on top of the sensor to obtain simultaneous measurements during tests. The wearable sensor broadcasts signals to a smartphone via Bluetooth, which records respiratory data for later analysis. The exercise protocol consisted of a ramp test with rest, warm-up, exercise, and cool-down stages. Informed consent for this image was obtained from the volunteer.

Table 1 Participant characteristics and main physiological variables during ramp tests (n = 17)

A comparison between these two signals for one representative subject is shown in Fig. 2a, where we observe high visual agreement. For this subject, the correlation between the wearable and the ergospirometer was very high (r = 0.990, p ≤ 0.001, Fig. 2b). Bland–Altman (B&A) analysis resulted in a low bias of 0.37 breaths per minute (bpm) with 95% limits of agreement of (–2.52, 3.27) bpm (Fig. 2c).

Fig. 2: Wearable respiratory data and validation of RR against ergospirometry during a laboratory ramp for a representative subject (subject 8).

a Visual comparison of RR time evolution, b Scatterplot with correlation analysis; c Bland–Altman plot.

Figure 3 compares the RR measured by the respiratory wearable and the ergospirometer throughout the exercise protocol for all participants (Fig. 3a). The B&A analysis showed a low bias of 0.32 bpm with 95% limits of agreement of (−4.32, 5.03) bpm (Fig. 3b).

Fig. 3: Comparison of RRs between respiratory wearable system and ergospirometer during exercise in all participants.

a Scatterplot with correlation analysis; b Bland–Altman plot.

Table 2 shows the bias, accuracy, and precision determined from B&A analysis for each subject in the study group. The group mean values of bias, precision, and accuracy were 0.33, 2.28, and 2.29 bpm, respectively. Bias in RR was found to be positive in most of the subjects. The Pearson correlation coefficient in the sample ranged from 0.91 to 0.98, with an average value of 0.95.

Table 2 Agreement metrics between respiratory wearable and ergospirometry during the exercise protocol in healthy, physically active adults (n = 17)

Estimation and validation of ventilatory thresholds in laboratory conditions

Figure 4 shows the wearable RR evolution during a ramp test for a representative subject, the trilinear regression for the detection of respiratory breakpoints, and the VT1 and VT2 time points identified independently by two experienced physiologists who analyzed changes in MV and exercise-load during the protocol.

Fig. 4: Comparison between wearable-system RR signal and time evolution of the rate of oxygen consumption during a laboratory ramp test in subject 4.

Ventilatory thresholds predicted by the algorithm display a high agreement with those determined by expert physiologists.

Respiratory breakpoints were evaluated in all subjects and were considered as predictors for the VTs. We compared these predictions with the VTs determined by expert physiologists in terms of the RR, HR, and percentage of peak VO2 value (%VO2-peak) at the associated time points. Figure 5 shows scatter plots reporting these comparisons. For VT1, we observe a high correlation in terms of RR (r = 0.899, p ≤ 0.001) and HR (r = 0.880, p ≤ 0.001), and a moderate correlation in terms of %VO2-peak (r = 0.596, p = 0.001). For VT2, the correlation was moderate for RR (r = 0.745, p ≤ 0.006), HR (r = 0.759, p ≤ 0.001), and %VO2-peak (r = 0.611, p = 0.009).

Fig. 5: Correlation plots for the validation of VT wearable-system predictions.

For VT1: (a) RR (bpm); (b) HR (bpm); and (c) %VO2-peak. For VT2: (d) RR (bpm); (e) HR (bpm); and (f) %VO2-peak.

Figure 6 shows the B&A analysis for the predictions of VT1 and VT2 in the sample group. For VT1, the comparison of the wearable and the ergospirometer in terms of RR resulted in a bias of –1.76 bpm and limits of agreement (LoA) from –6.49 to 2.96 bpm. For HR, the bias was –6.59 bpm and the LoA ranged from –24.90 to 11.70 bpm. For %VO2-peak, the bias was –8.85% with LoA from –30.17 to 12.45%. For VT2, RR resulted in a bias of 0.76 bpm with LoA from –10.11 to 10.64 bpm, with all subject data inside the LoA. For HR, the bias was –0.88 bpm with LoA from –5.47 to 3.71 bpm, and one participant was outside the LoA. Finally, for %VO2-peak, the bias was –2.88% with LoA from –19.30 to 13.76%, and all subject data were inside the LoA.

Fig. 6: Bland–Altman plots showing the comparison between wearable-system and expert predictions of ventilatory thresholds.

For VT1: (a) RR; (b) HR (bpm); and (c) %VO2-peak. For VT2: (d) RR; (e) HR (bpm); and (f) %VO2-peak.

Figure 7 shows a graphical comparison of RR, HR, and %VO2-peak measured at VT1 and VT2 time points determined by experts from ergospirometry analysis and by the wearable system. No significant differences were found in these physiological parameters, except for %VO2-peak at VT1.

Fig. 7: Group distributions of ventilatory thresholds determined during laboratory ramp tests by wearable system and gold standard.

a RR; b HR; and c %VO2–peak.

Study of ventilatory thresholds in a population of recreational athletes using the wearable system

A total of 107 recreational athletes (20 females and 87 males, mean age 31.1 ± 10.5 years) completed ramp tests outside the laboratory using our wearable system. Figure 8a, c shows the distributions of VT1 and VT2, respectively, expressed as the percentage of the peak RR value achieved during the test (%RR–peak) for this population. The %RR–peak group values at VT1 and VT2 were 42.4 ± 12.6% and 58.3 ± 13.5%, respectively. Figure 8b, d shows the distributions of VT1 and VT2, respectively, expressed as the percentage of the maximum theoretical heart rate (%HR–max). The %HR–max group values at VT1 and VT2 were 71.9 ± 10.0% and 88.2 ± 8.4%, respectively.

Fig. 8: Population distribution of ventilatory thresholds in 107 recreational athletes evaluated with the respiratory wearable system.

VTs were determined using the wearable system in ramp tests performed in training environments (outside the laboratory). For VT1, the distribution is shown in terms of (a) %RR–peak, and (b) %HR–max. For VT2, the distribution is shown in terms of (c) %RR–peak, and (d) %HR–max.

Discussion

Our wearable system achieves high accuracy in the continuous estimation of RR during incremental exercise, as evidenced when comparing the sensor predictions with ergospirometry airflow measurements (Fig. 2). RR wearable predictions were generally higher than those measured by ergospirometry, as reflected by the positive bias in RR (Table 2). However, we note that in all subjects RR bias was always lower than 1 bpm, which can be considered almost negligible as 1 bpm represents the smallest unit employed in practical applications of RR monitoring. Interestingly, the performance metrics achieved in this study result in a similar mean error and dispersion to those obtained in healthy subjects breathing under resting conditions26. This result suggests that temperature-based sensors offer a robust approach for monitoring RR over a wide range of respiratory frequencies and effort intensities (rest to maximal effort) in athletes training on a stationary bike. While several wearable sensors have been reported in the literature for monitoring RR in resting conditions27,28, and a few during low-stress walking activities29, systems that track RR during medium- to high-intensity exercise remain underexplored. One recent attempt considers acoustic sensors installed in a mouthguard during moderate exercise. The reported RMSD was 11.28% for RR in the range of 7–19 bpm, as measured in four subjects. Higher activity intensity was associated with a lower signal-to-noise ratio, which poses a challenge in the determination of RR from acoustic signals. A different approach is the use of face masks equipped with thermistors30. This mask, which fully covers the mouth and nose, funnels the airflow into a thermistor located at the front open end of the mask. A validation study in 10 male cyclists resulted in an overall breath-by-breath RR bias and precision of –0.05 bpm and 3.37 bpm, respectively.
Our overall RR metrics compare well with these values (Table 2) while providing a mask-free experience to the athlete that offers higher comfort during exercise.

Our wearable system and breakpoint algorithm achieve high accuracy in the prediction of VT1 and VT2 from ramp tests, with Pearson correlation values of 0.880 and 0.759, respectively, when measured in terms of HR (Fig. 5b, e). Further, the mean error for VT1 and VT2 measured in terms of respiratory rate is below 2 bpm (Fig. 6a, d), which is below the threshold considered relevant for medical applications31. In addition, no significant differences between the gold-standard method and the wearable system predictions were found when comparing group mean values of VTs (Fig. 7). Many of these performance metrics are in the range of those achieved by ECG-based solutions22,23 for VT1 and by NIRS-based technologies for VT224,32, with the convenience of providing both VTs using only one wearable technology. A key advantage of estimating VT1 and VT2 from respiratory signals is the direct and well-established connection between metabolic changes and non-linear changes in respiratory flow during exercise33, which are not necessarily manifested by the cardiac or muscular systems9. On these grounds, we anticipate that our system can be predictive not only for the population considered in this study, but also for other cohorts in non-controlled environmental conditions that may affect VTs34,35,36. This feature may prove advantageous when compared to algorithms that rely on large amounts of labeled data, such as neural networks or other deep learning algorithms, because little or no training data may be available for specific populations or environmental conditions.

One promising application of the respiratory wearable system demonstrated in this work is the evaluation of ventilatory thresholds in a large population outside of the laboratory. While the validation of individual estimations in this setting is unfeasible, the empirical population distributions of VT1 and VT2 obtained with our wearable system can be compared to previous laboratory studies on large groups of volunteers. A recent study in a group of 100 healthy adults with varying levels of fitness showed that the sample average aerobic and anaerobic lactate thresholds were found at 74.9% and 89.0% of the maximum heart rate16. These values compare extremely well with those found for average values of VT1 and VT2 in our study: 72.4% and 88.7% of the maximum heart rate, respectively. Further, unimodal distributions around these values are observed in both studies. In addition to these heart-rate population metrics, our work provides distributions and average values of VT1 and VT2 expressed in terms of the peak respiratory rate (Fig. 8). We note that the percentage of peak respiratory rate correlates highly with the rating of perceived exertion, an index widely used for the subjective assessment of physical intensity37. Thus, monitoring RR during training may provide an objective measurement of perceived exertion for classifying physical intensity, particularly if ventilatory thresholds can be determined in terms of RR.

This work represents the first validation of a wearable system for ventilatory threshold evaluation, and as such it offers several opportunities for future developments and applications. First, we note that while the mean error in estimating VT1 and VT2 was relatively low when measured in terms of RR and HR, the bias achieved in terms of %VO2-peak may be large for certain applications. Further, the dispersion in these parameters, measured in terms of precision and limits of agreement, can be higher than that reported by other solutions22,32. Future developments of the wearable system could reduce this variability by developing a minute-ventilation estimator based on the sensor signal accurate for exercise conditions and then performing a breakpoint analysis on such volumetric data9. Despite this potential improvement, we note that dispersion in VTs determination can be high in the gold-standard method, as significant systematic differences among expert evaluators using the visual method have been reported in the literature38. Second, we remark that the proposed VTs detection algorithm relies on a triphasic exercise model of MV during ramp tests17. While this physiological model applies to healthy trained subjects, it may not be experienced by undertrained participants or patients with chronic cardiorespiratory diseases19,39,40,41. In these low-aerobic capacity subjects, typically VT2 is not attained during ramp tests42. Future efforts may take this modified intensity model into consideration to develop enhanced algorithms for the determination of VT1 using the wearable system. Third, while our population study was performed outside the laboratory, it typically consisted of indoor ramp tests in gymnasiums and home setups. 
One interesting opportunity is the assessment of ventilatory thresholds in adverse environmental conditions, such as high altitude or extreme temperatures, where standard variables like heart rate have been shown to lose accuracy in estimating exercise intensity20,21. This aspect is particularly relevant in sports such as climbing, cycling, mountaineering, triathlon, and trail running, where the extensive competition time and extreme environmental conditions require accurate estimation of effort intensity to achieve optimal sports performance. Finally, future efforts should further explore the evaluation of ventilatory thresholds for clinical purposes. One such opportunity applies to physical rehabilitation programs, where the accurate determination of effort intensity is critical in the recovery of pulmonary functional capacity in survivors of severe COVID-19, in whom oxygen uptake and physiological responses to exercise are impaired43,44. An adequate exercise prescription, based on the determination of VT1 and VT2 as markers of effort intensity, also becomes critical in patients undergoing pharmacological treatment that affects heart rate (e.g., negative chronotropic drugs or beta-blockers), in whom heart rate cannot be used to classify exercise intensity45,46,47. The same applies to patients with significant peripheral atrophy or with high adipose tissue thickness, where non-invasive monitoring of muscle oxygenation loses accuracy48,49, or to patients experiencing exacerbated symptoms of dyspnea and muscular fatigue50,51.

Methods

Wearable respiratory sensor and ventilatory-threshold detection algorithm

The respiratory wearable sensor is built upon a wearable temperature-based sensor previously employed in predicting respiratory rate (RR) and MV in subjects at rest26,52. It consists of a small and lightweight case (length 30 mm, width 16 mm, height 20 mm; weight 8 g) with two thermal sensors directly placed under the nose and in front of the mouth, see Fig. 9.

Fig. 9: Respiratory wearable sensor.

Device dimensions and sensing components.

Nasal and oral sensors measure temperature changes that result from nasal and oral airflow impinging the sensor plates, respectively. These temperature signals are broadcasted via Bluetooth to an external device (smartphone) which registers these signals for offline analysis. Temperature time series are analyzed using a mean-cross algorithm to estimate breath-by-breath RR as detailed in a previous contribution26. The breath-by-breath RR dataset is then interpolated every 0.04 s to create a uniformly spaced time series with sampling frequency of 25 Hz. This RR time series is exported from the smartphone for further offline analysis.
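As a rough illustration of this processing chain, the following Python sketch derives breath-by-breath RR from upward crossings of the signal mean and resamples it onto a 0.04 s grid. It is only a simplified stand-in: the actual mean-cross algorithm is the one detailed in ref. 26, and the function names, the use of upward crossings, the linear interpolation, and the synthetic test signal are all assumptions made here for illustration.

```python
import math

def breath_by_breath_rr(t, x):
    """Return (time, RR) pairs, with RR in breaths per minute.

    Assumption: one breath spans two consecutive upward crossings of the
    signal mean; the published mean-cross algorithm may differ in detail.
    """
    mean = sum(x) / len(x)
    crossings = []
    for k in range(1, len(x)):
        lo, hi = x[k - 1] - mean, x[k] - mean
        if lo < 0.0 <= hi:
            frac = -lo / (hi - lo)  # linear interpolation between samples
            crossings.append(t[k - 1] + frac * (t[k] - t[k - 1]))
    return [(t1, 60.0 / (t1 - t0)) for t0, t1 in zip(crossings, crossings[1:])]

def resample_uniform(points, dt=0.04):
    """Linearly interpolate (time, value) pairs onto a grid of step dt (25 Hz)."""
    times = [p[0] for p in points]
    values = [p[1] for p in points]
    n = int(math.floor((times[-1] - times[0]) / dt + 1e-9)) + 1
    out, j = [], 0
    for i in range(n):
        ti = times[0] + i * dt
        # Advance to the breath interval that contains ti.
        while j < len(times) - 2 and times[j + 1] < ti:
            j += 1
        w = (ti - times[j]) / (times[j + 1] - times[j])
        out.append((ti, values[j] + w * (values[j + 1] - values[j])))
    return out

# Synthetic temperature-like signal (not study data): 0.25 Hz breathing,
# i.e. 15 breaths per minute, sampled at 25 Hz for 60 s.
t = [k * 0.04 for k in range(1500)]
x = [math.sin(2.0 * math.pi * 0.25 * tk) for tk in t]
rr = breath_by_breath_rr(t, x)
rr_25hz = resample_uniform(rr)
```

On this synthetic signal every detected breath lasts 4 s, so both `rr` and `rr_25hz` hold a constant 15 breaths per minute.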

During offline analysis, and to enable a fair comparison with ergospirometry data, the raw RR time series obtained from the wearable system for each subject was processed using a moving average filter (window size of 1 s), from which a subsampled time series with RR measurements every 5 s was created. This processed RR time series was employed in the validation of RR predictions and in the estimation of VTs.
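The smoothing and subsampling step can be sketched in Python as follows. The window length (1 s), the input rate (25 Hz), and the output spacing (5 s) come from the text; the centered window, the truncated windows at the edges, and the function names are assumptions of this sketch, since these details are not specified.

```python
def moving_average(x, fs=25.0, window_s=1.0):
    """Centered moving average over roughly window_s seconds.

    Assumption: the window is centered and is truncated at the series edges.
    """
    half = int(round(fs * window_s)) // 2
    out = []
    for k in range(len(x)):
        seg = x[max(0, k - half): k + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def subsample(x, fs=25.0, step_s=5.0):
    """Keep one sample every step_s seconds."""
    step = int(round(fs * step_s))
    return x[::step]

# Synthetic 60 s ramp at 25 Hz (not study data).
raw = [float(k) for k in range(1500)]
smooth = moving_average(raw)
rr_5s = subsample(smooth)
```

A centered average leaves a linear trend unchanged away from the edges, which is why `smooth[125]` equals `raw[125]` in this example.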

The determination of VTs was performed by detecting abrupt changes in the respiratory flow pattern during a ramp test, measured as breakpoints in the RR time evolution33. To this end, the RR time series is first filtered using a third-order Savitzky–Golay filter (SciPy v1.10.1) with a window size of 30 s to remove short-term fluctuations. This choice of filter parameters captures inflection points within the window while ensuring that each local polynomial is fitted to enough data points to prevent overfitting. To detect breakpoints, we performed a trilinear segmented regression using the filtered RR time series. We consider the following regression function:

$$R{R}_{{reg}}\left(t\right)=\left\{\begin{array}{cc}{a}_{0}+{b}_{0}t & 0\, <\, t\le {t}_{1}\\ {a}_{1}+{b}_{1}(t-{t}_{1}) & {t}_{1} < t\le {t}_{2}\\ {a}_{2}+{b}_{2}(t-{t}_{2}) & {t}_{2} < t\le T\end{array}\right.$$
(1)

where T is the final time at which the ramp test stops, and \({a}_{0},{b}_{0},{a}_{1},{b}_{1},{t}_{1},{a}_{2},{b}_{2},\,{t}_{2}\) are parameters to be determined. In particular, pairs \(({t}_{1},\,{a}_{1})\) and (\({t}_{2},{a}_{2})\) in Eq. (1) represent the time and RR tuples where breakpoints are located, which we use to predict VT1 and VT2. Assuming the regression function is continuous at breakpoints located at t1 and t2, we find expressions for a1 and a2 in terms of an independent set of parameters given by \({a}_{0},{b}_{0},\,{b}_{1},{t}_{1},{b}_{2},\,{t}_{2}\). This last set of unknown parameters is determined by solving a least-squares fit of the regression function to the filtered RR time series. This piecewise regression procedure is implemented by the Python pwlf library (v2.2.1), which we used in our calculations.
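To make the continuity argument concrete: imposing continuity of Eq. (1) at the breakpoints gives a1 = a0 + b0·t1 and a2 = a1 + b1·(t2 − t1), so that six free parameters remain. The Python sketch below fits this continuous trilinear model to a synthetic series. It is not the pwlf implementation used in the study: as an assumption made for illustration, breakpoints are restricted to sample times and found by exhaustive search, and all function names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_given_breaks(t, y, t1, t2):
    """Least-squares fit of the continuous trilinear model for fixed t1 < t2.

    Hinge form: RR = a0 + b0*t + c1*max(0, t - t1) + c2*max(0, t - t2),
    which is Eq. (1) with continuity enforced at t1 and t2.
    """
    rows = [[1.0, ti, max(0.0, ti - t1), max(0.0, ti - t2)] for ti in t]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    a0, b0, c1, c2 = solve(AtA, Aty)
    sse = sum((a0 + b0 * r[1] + c1 * r[2] + c2 * r[3] - yi) ** 2
              for r, yi in zip(rows, y))
    b1, b2 = b0 + c1, b0 + c1 + c2
    a1 = a0 + b0 * t1            # continuity at t1
    a2 = a1 + b1 * (t2 - t1)     # continuity at t2
    return sse, {"a0": a0, "b0": b0, "t1": t1, "a1": a1, "b1": b1,
                 "t2": t2, "a2": a2, "b2": b2}

def trilinear_fit(t, y, min_pts=3):
    """Exhaustive search over candidate breakpoints located at sample times."""
    best = None
    n = len(t)
    for i in range(min_pts, n - 2 * min_pts):
        for j in range(i + min_pts, n - min_pts):
            sse, params = fit_given_breaks(t, y, t[i], t[j])
            if best is None or sse < best[0]:
                best = (sse, params)
    return best[1]

# Synthetic, noise-free RR series (not study data) with breakpoints
# at 200 s and 400 s and slopes 0.02, 0.05, and 0.12 bpm per second.
t = [10.0 * k for k in range(61)]
y = [15.0 + 0.02 * ti + 0.03 * max(0.0, ti - 200.0)
     + 0.07 * max(0.0, ti - 400.0) for ti in t]
fit = trilinear_fit(t, y)
```

The pairs (`fit["t1"]`, `fit["a1"]`) and (`fit["t2"]`, `fit["a2"]`) play the role of the breakpoint tuples in Eq. (1). A continuous optimizer over t1 and t2 would avoid restricting breakpoints to the sampling grid, at the cost of a non-convex search.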

Laboratory cardiopulmonary exercise tests in healthy subjects

Seventeen (n = 17) healthy adults (age = 19–44 years) were recruited by non-probability convenience sampling through social network advertising to perform an incremental CPET to study ventilatory thresholds. Inclusion criteria were being physically active (≥150 min of moderate or ≥75 min of vigorous physical activity per week) and having a normal body mass index (20–25 kg · m–2). Exclusion criteria were a history of respiratory, cardiovascular, metabolic, musculoskeletal, or neoplastic diseases, or any infectious or inflammatory process within the 2 weeks prior to the beginning of the study. All participants were informed verbally of the purpose, protocol, and procedures before informed consent was obtained. This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Pontificia Universidad Católica de Chile (Institutional Review Board, protocol number ID210125002, date of approval: April 8, 2021). Informed consent was obtained from all participants in the study. The authors affirm that human research participants provided informed consent for publication of the images in Fig. 1.

All experiments were carried out in the Laboratory of Exercise Physiology at Alemana Sport, Santiago, Chile, using a commercial CPET bicycle ergometer (MasterScreen™ CPX, Jaeger®, Germany). Procedures were performed under constant laboratory environmental conditions (temperature 20 ± 2 °C; relative humidity, 40 ± 2%) and within a similar time frame (from 9:00 to 14:00 h). Participants were asked to avoid physical activities for 24 h before the measurements and to avoid alcohol, caffeine, and other stimulants and food for at least 3 h prior to the evaluations.

Before the tests, subjects were equipped with the respiratory wearable sensor and with an ergospirometry facial mask. Exhaled gases (VO2 and VCO2) and ventilatory variables (minute ventilation, VE; tidal volume, TV; and RR) were measured using an airflow turbine connected to the facial mask and expressed under standard temperature and pressure, dry (STPD) conditions. Simultaneously, the respiratory wearable sensor registered continuous RR throughout the test. Heart rate and pulse oxygen saturation were also continuously monitored during exercise using a cardiothoracic band and an oximeter, respectively. The exercise protocol consisted of an initial 60-s recording to allow for sensor synchronization, followed by a 2-min baseline phase (0 watts (W), rest phase), a 3-min warm-up period at 50 W, and the exercise phase, with 80-s intervals and an initial load of 100 W (see Fig. 1 for a schematic). The load was increased by 20 W after each interval. Participants were requested to maintain a cadence between 80 and 100 rpm during the protocol. A cool-down phase of 3 min at 60 W was performed during the final stage of the test (Fig. 1).
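For reference, the load schedule of this protocol can be written as a function of elapsed time, as in the Python sketch below. The durations and loads follow the description above; treating the 60 s synchronization period as unloaded, and the function name and the `t_end` argument (time of exhaustion), are assumptions of this sketch.

```python
def ramp_load(t, t_end=None):
    """Prescribed load in watts at elapsed time t (seconds).

    0-60 s: sensor synchronization (assumed unloaded); 60-180 s: baseline
    at 0 W; 180-360 s: warm-up at 50 W; from 360 s: 80 s intervals starting
    at 100 W and rising by 20 W per interval. If t_end (time of exhaustion)
    is given, the 60 W cool-down load is returned from t_end onwards.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t < 180:
        return 0
    if t < 360:
        return 50
    if t_end is not None and t >= t_end:
        return 60
    return 100 + 20 * int((t - 360) // 80)

# Loads at a few illustrative time points.
loads = [ramp_load(s) for s in (0, 200, 360, 440, 520)]
```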

To assess the effect of the wearable sensor on ergospirometry measurements, a pilot study with five volunteers was carried out, in which maximal tests were conducted twice on the same subject, with and without the wearable sensor. The comparison of the time evolution of RR, tidal volume, VE, and VO2 between these cases is reported in Supplementary Fig. 1.

Using the ventilatory variables and exhaled gas values recorded by the ergospirometer during the exercise protocol, two blinded, experienced physiologist researchers determined VT1 and VT2. In case of discrepancy, the opinion of a third blinded researcher could be sought, and the point on which at least two evaluators agreed was accepted as definitive53. The determination of VTs was based on the loss of linearity between VE and the load of each interval during the protocol. Values correspond to the mean of the last 30 s of each load interval15,17,54. To evaluate whether changes in wearable-measured RR during exercise could identify VTs, we compared the HR and %VO2-peak values at the VTs determined from the RR trends recorded by the respiratory wearable with those at the VTs determined from exhaled-gas analysis using the ergospirometer.

Ramp tests using the wearable system outside the laboratory

The respiratory wearable system was used by 107 recreational cyclists (20 women) during ramp tests on bicycles connected to commercial smart trainers. All tests were conducted in training environments under uncontrolled conditions (e.g., gyms or at home). Tests were conducted using a virtual cycling platform (Zwift Inc., California, USA) with bikes connected to an electronically braked indoor cycle trainer (KICKR™, Wahoo Fitness, Atlanta, GA, USA). The protocol consisted of a 1-min resting phase during which the wearable was turned on, a 3-min warm-up phase (50 W), and an exercise phase of 80-s intervals, starting at 100 W and increasing by 20 W per interval until voluntary exhaustion, followed by a 3-min cool-down phase at 60 W. The target cadence was between 80 and 100 rpm. Each subject was assisted remotely in the installation of the respiratory wearable sensor and in connecting the sensor to the app on a smartphone. After the test, the RR signals from the wearable sensor and the cardiac and power data from the virtual cycling platform were collected for offline analysis. The resulting estimates of VT1 and VT2 were computed in terms of HR and RR at the associated time instants and were normalized by the estimated maximum heart rate (equal to 220 minus age) and the peak RR, respectively.
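The two normalizations described above amount to the following short Python sketch. The formula 220 minus age for the estimated maximum heart rate is taken from the text; the function names and the example numbers are hypothetical.

```python
def percent_hr_max(hr_at_vt, age_years):
    """HR at a threshold as a percentage of the estimated maximum HR (220 - age)."""
    hr_max = 220.0 - age_years
    return 100.0 * hr_at_vt / hr_max

def percent_rr_peak(rr_at_vt, rr_series):
    """RR at a threshold as a percentage of the peak RR reached in the test."""
    return 100.0 * rr_at_vt / max(rr_series)

# Hypothetical athlete: 30 years old, HR of 152 beats per minute at VT2,
# RR of 24 breaths per minute at VT2, and peak RR of 48 breaths per minute.
hr_pct = percent_hr_max(152.0, 30)
rr_pct = percent_rr_peak(24.0, [12.0, 30.0, 48.0])
```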

Validation metrics

We performed Bland–Altman and correlation analyses to assess the agreement and association between the wearable system predictions and the gold-standard values. To compare our results with previously reported metrics, we also evaluated the performance of the wearable-system predictions in terms of bias, precision, accuracy, and the Pearson correlation coefficient52. In brief, assuming there are N subjects, for the i-th subject we define \({\hat{X}}_{i}\) and Xi as the predicted and gold-standard values, respectively. Then, absolute bias, precision, and accuracy are determined as follows:

$$\mathrm{Bias}=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{X}_{i}-X_{i}\right)$$
(2)
$$\mathrm{Precision}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{X}_{i}-X_{i}-\mathrm{Bias}\right)^{2}}$$
(3)
$$\mathrm{Accuracy}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{X}_{i}-X_{i}\right)^{2}}$$
(4)

For the analysis of time series data in a single individual (i.e., validation of RR), index i in Eqs. (2)–(4) denotes a time instant and N denotes the total number of measurements in that individual. From their definitions, we interpret bias and precision as the average error and standard deviation of the error of the predicted values, respectively. Further, accuracy can be related to the root-mean-square error of the prediction.
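Equations (2)–(4) translate directly into the Python sketch below, shown here with made-up numbers rather than study data; the function name is hypothetical. A useful consequence of the definitions is that accuracy squared equals bias squared plus precision squared, which the example verifies.

```python
import math

def agreement_metrics(pred, ref):
    """Bias, precision, and accuracy as defined in Eqs. (2)-(4)."""
    err = [p - r for p, r in zip(pred, ref)]
    n = len(err)
    bias = sum(err) / n                                           # Eq. (2)
    precision = math.sqrt(sum((e - bias) ** 2 for e in err) / n)  # Eq. (3)
    accuracy = math.sqrt(sum(e * e for e in err) / n)             # Eq. (4)
    return bias, precision, accuracy

# Illustrative RR values in breaths per minute (not study data).
wearable = [21.0, 19.0, 25.0, 30.0]
reference = [20.0, 20.0, 24.0, 28.0]
bias, precision, accuracy = agreement_metrics(wearable, reference)
```

For these values the errors are 1, −1, 1, and 2 bpm, giving a bias of 0.75 bpm, a precision of √1.1875 bpm, and an accuracy of √1.75 bpm.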

For the validation of VTs, we determined the RR, HR, and %VO2-peak values at VT1 and VT2 and performed Bland–Altman and correlation analyses based on these physiological parameters.
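A minimal Python sketch of these two analyses is given below. The text does not state how the limits of agreement were computed, so the common convention of bias ± 1.96 times the sample standard deviation of the differences (with N − 1 in the denominator) is assumed here; function names and data are hypothetical.

```python
import math

def bland_altman(pred, ref, z=1.96):
    """Return (bias, lower LoA, upper LoA).

    Assumption: LoA = bias +/- z * sample SD of the paired differences.
    """
    diff = [p - r for p, r in zip(pred, ref)]
    n = len(diff)
    bias = sum(diff) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diff) / (n - 1))
    return bias, bias - z * sd, bias + z * sd

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative HR values at a threshold, in beats per minute (not study data).
wearable = [141.0, 150.0, 133.0, 160.0, 147.0]
expert = [144.0, 149.0, 138.0, 158.0, 152.0]
ba_bias, loa_low, loa_high = bland_altman(wearable, expert)
r = pearson_r(wearable, expert)
```

Note that this sample standard deviation differs slightly from the precision of Eq. (3), which divides by N rather than N − 1.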

Statistical analysis

For intergroup comparison, normality was assessed by the Shapiro–Wilk test, and mean differences were tested using the unpaired t-test. Numerical results are expressed in terms of mean and standard deviation.