1 Introduction

Athlete monitoring is a key part of the training process because it helps to control the balance between external and internal loads. In this way, professionals in the field can maximize performance while reducing the potential risks of illness and injury [1]. Continuous innovation in monitoring technologies has produced a wide variety of tools that are now accessible to everyone. Among them, heart rate variability (HRV) has been one of the most widely used metrics because it is non-invasive, inexpensive, and time-efficient [1]. HRV reflects autonomic nervous system (ANS) modulation through the variation of the inter-beat intervals, commonly expressed as time-domain (RMSSD, SDNN, pNN50), frequency-domain (VLF, LF, HF), and nonlinear variables (SD1, SD2) [2]. Each variable is associated with the activity of a branch of the ANS, either the sympathetic (SNS) or the parasympathetic nervous system (PNS) [3]. Of these options, Buchheit [3] and Plews et al. [4] recommended that practitioners use the natural logarithm of the time-domain variable defined as the square root of the mean of the sum of the squares of differences between adjacent normal R-R intervals (Ln RMSSD), for two reasons: (i) it can be captured from a short recording, and (ii) compared with frequency-domain variables, Ln RMSSD is less sensitive to breathing noise, which makes it well suited to daily monitoring without a controlled breathing pattern. Relative to a baseline, an increase in Ln RMSSD, reflecting increased PNS activity, is mainly interpreted as coping well with training, whereas a decrease in Ln RMSSD, reflecting increased SNS activity, is interpreted as accumulated fatigue [3].
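As an illustration of the definition above, the following minimal sketch computes RMSSD and Ln RMSSD from a series of R-R intervals. The function names and the sample intervals are hypothetical and are not taken from the study; artefact correction of the intervals is assumed to have been done beforehand.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences between adjacent R-R intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    # Differences between each interval and the next one
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def ln_rmssd(rr_ms):
    """Natural logarithm of RMSSD."""
    return math.log(rmssd(rr_ms))

# Hypothetical, already-cleaned R-R intervals in milliseconds
rr = [1000.0, 1040.0, 980.0, 1020.0, 1000.0]
print(round(rmssd(rr), 2), round(ln_rmssd(rr), 3))
```

The log transform is applied to the RMSSD value itself, not to the individual intervals.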

Several methodological and practical considerations have been highlighted as crucial for obtaining valid and reliable HRV data in a monitoring context [3]. Given the high sensitivity of the ANS to environmental conditions, resting HRV measurements must be conducted under standardized conditions [3]. Thus, nocturnal recordings during the first hours of sleep, when the third sleep stage, called slow-wave sleep (SWS), mostly occurs, constitute a theoretically ideal moment for HRV monitoring [3]. However, late-night exercise on the previous day and fragmented sleep stages can alter the Ln RMSSD value even when the athlete perceives having rested well [3]. For this reason, measuring HRV upon awakening can be a more appropriate method if a standardized environment is applied, such as the same bed, the same hour, and no daily activity before the measurement [3].

A wide variety of sport watches and mobile phone apps offer this HRV tracking function. One of the most used in the sports community is the Polar Nightly Recharge™ function present in Polar sport watches [5]. Using photoplethysmography (PPG), it monitors ANS activity through pulse rate variability (PRV) [6] during the first four hours of sleep [5]. On the other hand, the R waves of the electrocardiogram have been the principal source for measuring HRV (i.e., the gold-standard measurement), and they can now be tracked easily and validly with a chest strap [7]. Another crucial aspect is signal processing, since any ectopic beat can significantly alter the HRV value [3]. The Kubios software, available as a mobile phone app, has been shown to be a valid tool in this respect, as it corrects the identified artefacts using cubic spline interpolation [8]. Thus, combining a chest strap with the Kubios app is also a common practice in the sports community to track HRV upon awakening.

Since both methods are widely used by athletes and practitioners, it is necessary to compare the two systems to clarify their potential differences and similarities. Given the aforementioned disparities between methods (i.e., night vs. upon-awakening measurement, pulse vs. R waves, and Polar vs. Kubios software signal processing) and the individuality of HRV values, the most appropriate way to determine their level of agreement is to analyze the individual response of each athlete throughout a training program with both devices.

To this end, weekly Ln RMSSD fluctuations represent changes in cardiac ANS modulation more consistently than daily values [9]. With the method proposed by Batterham and Hopkins [10], called magnitude-based inference (MBI), ANS modulation can be evaluated against the smallest worthwhile change (SWC) for each device. This procedure consists of determining a baseline HRV value during a light training period and then comparing it with the following microcycles of the training program. Based on the magnitude and variability of the change with respect to the baseline value from which the SWC is derived, Batterham and Hopkins [10] proposed interpreting any change, in this case an increase or decrease in PNS activity, as indicated in Table 1 to determine how the athlete is coping with training. This novel approach would allow a comparison between both devices despite their different tracking procedures and would reveal their possible similarities based on their interpretations of the change.

Table 1 Magnitude-based inference method change interpretation [10]

Based on the above, the purpose of this study was to determine the level of agreement between different methods of HRV monitoring, represented by two different systems (i.e., the Polar Nightly Recharge™ function present in Polar sport watches and the Polar H10 chest strap synchronized with the Kubios app), in the weekly Ln RMSSD changes during an eight-week concurrent training program in recreational athletes. Given the different methodological considerations (i.e., night recording vs. upon-awakening measurement, pulse vs. R waves, and software signal processing), we hypothesized that HRV data obtained from both devices would be correlated but would differ in absolute values, which would influence the interpretation of the weekly Ln RMSSD changes.

2 Methods

All participants were informed about the research purpose and procedures of the study before signing a written informed consent form. The study protocol adhered to the tenets of the Declaration of Helsinki and was approved by the institutional review board (ref. 2546/CEIH). To address the aim of determining the level of agreement between different methods of HRV monitoring, the data collection, processing, and comparison were carried out, in summary, as follows:

Data collection: Participants’ HRV (RMSSD) was monitored during an 8-week training program using two different systems (i.e., the Polar Nightly Recharge™ function present in Polar sport watches and the Polar H10 chest strap synchronized with the Kubios app).

Data processing: RMSSD data were transformed to Ln RMSSD and averaged over each week (Weekly Ln RMSSD) of the training program following the recommendation of Plews et al. [4, 9].

Data comparison: The relationship, absolute agreement, systematic bias, and random error between devices were determined through Pearson correlation, intra-class correlation coefficients (ICC), and Bland–Altman plots for the eight-week dataset (Ln RMSSD). The interpretation of the weekly Ln RMSSD changes during the eight-week concurrent training program was also determined through the MBI for both devices and summarized in a cross table.

2.1 Study design

All participants completed an eight-week concurrent training program composed of three sessions of 60–90 min per week with all-out sprint intervals and squat and bench press strength exercises (Fig. 1). Individualized loads were applied according to a one-repetition maximum (1RM) test for the aforementioned exercises. The training program progressed every two weeks in volume for the all-out sprint intervals and in load for the strength exercises (Fig. 1). During the training program, athletes were asked to self-monitor their HRV with the Nightly Recharge™ function present in the Polar Ignite sport watch (Version number: 2.1.5) and with the Polar H10 chest strap (Version number: 3.1.1) synchronized to the Kubios app (Version number: 1.1.10). The first two weeks of the training program, in which the lightest micro-cycle load was applied, were used to determine the SWC (mean ± 0.2SD), which in turn was used to determine the weekly Ln RMSSD changes over the following six weeks of the training program (Fig. 1).
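The SWC band described above (baseline mean ± 0.2SD) can be sketched as follows. This is a minimal illustration that assumes the SD is the sample standard deviation of one athlete's baseline Ln RMSSD readings from one device; the function name and the baseline values are hypothetical.

```python
import statistics

def swc_band(baseline_ln_rmssd, factor=0.2):
    """Return (lower, mean, upper) of the smallest worthwhile change band.

    Assumption: the band is the baseline mean +/- factor * sample SD
    of the baseline Ln RMSSD readings of a single athlete and device.
    """
    mean = statistics.fmean(baseline_ln_rmssd)
    sd = statistics.stdev(baseline_ln_rmssd)  # sample SD (n - 1 denominator)
    return mean - factor * sd, mean, mean + factor * sd

# Hypothetical Ln RMSSD readings from the two baseline weeks
baseline = [4.0, 4.2, 4.1, 3.9, 4.3, 4.1]
lower, mean, upper = swc_band(baseline)
print(round(lower, 3), round(mean, 3), round(upper, 3))
```

Because the band is computed per athlete and per device, the same physiological change can fall inside the band for one device and outside it for the other.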

Fig. 1

Training program diagram followed to test the level of agreement in the HRV monitoring between the Polar H10 chest strap synchronized with the Kubios app and the Polar Nightly Recharge™ function present in the Polar Ignite sport watch. SWC Smallest worthwhile change, SIT Sprint interval training, BP Bench press, SQ Squat, 1RM One-repetition maximum, RIR Repetitions in reserve, RMSSD Root of the mean of the sum of the squares of differences between adjacent normal R–R intervals, Ln RMSSD natural logarithm of RMSSD, SD standard deviation, Rec recovery

2.2 Participants

A group of 18 recreational athletes was initially enrolled. After compliance with the intervention period was checked, seven of them were excluded because they did not complete at least three measurements per week, the minimum that Plews et al. [11] recommended for a valid representation of one week of daily monitoring. Finally, a group of 11 recreational athletes, five females and six males (age: 21.73 ± 1.49 years, body mass: 70.74 ± 12.88 kg, height: 173 ± 7.6 cm, body mass index: 23.41 ± 2.99 kg/m²), was included in the analysis of weekly HRV changes. A post hoc analysis of the achieved power for this sample size was conducted (G*Power software, version 3.1), given α = 0.05, r = 0.7, total sample size = 11, and statistical test = correlation: bivariate normal model. This analysis revealed a medium to large power (0.7). All participants met the following inclusion criteria: (i) no injuries within the six months before data collection, and (ii) being physically active according to the guidelines of the ACSM [12].
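As a rough cross-check of the achieved power, the sketch below uses the Fisher z approximation for a two-sided test of a correlation. This is a swapped-in approximation, not the exact bivariate normal routine used by G*Power, so small differences from the reported value are expected; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z method)."""
    z = NormalDist()
    # Fisher-transformed effect divided by its standard error 1 / sqrt(n - 3)
    effect = math.atanh(r) * math.sqrt(n - 3)
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail
    return z.cdf(effect - z_crit) + z.cdf(-effect - z_crit)

power = correlation_power(r=0.7, n=11)
print(round(power, 2))
```

Under these assumptions the approximation gives a power close to 0.69, in line with the value of 0.7 reported above.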

2.3 Data collection and processing

HRV was self-monitored by the athletes with the Nightly Recharge™ function present in the Polar Ignite sport watch and with the Polar H10 chest strap synchronized to the Kubios app. They were asked to wear the Polar Ignite sport watch on the wrist of their non-dominant arm and to tighten the strap until it felt snug to avoid motion artefacts [13]. Athletes were also instructed to leave the chest strap at the bedside so that the upon-awakening measurement with the Polar H10 chest strap synchronized to the Kubios app could be taken without disturbance, in a supine position, with a short-term recording of five minutes [14]. In a shared Excel spreadsheet, participants registered the daily RMSSD value from both devices throughout the training program; these values were then transformed into Ln RMSSD and averaged for each week (weekly Ln RMSSD) following the recommendations of Plews et al. [4, 9].
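The processing step described above (log transform, then weekly average) can be sketched as follows. The data layout, the function name, and the per-week filter are assumptions for illustration: in the study, participants who did not reach three measurements per week were excluded entirely, whereas this sketch simply drops the weeks that fall short.

```python
import math
from collections import defaultdict

def weekly_ln_rmssd(daily, min_days=3):
    """Average Ln RMSSD per week.

    daily: iterable of (week_number, rmssd_ms) pairs for one athlete and device.
    Weeks with fewer than min_days readings are omitted (illustrative rule).
    """
    by_week = defaultdict(list)
    for week, rmssd_ms in daily:
        # Transform each daily value first, then average the logs
        by_week[week].append(math.log(rmssd_ms))
    return {
        week: sum(values) / len(values)
        for week, values in sorted(by_week.items())
        if len(values) >= min_days
    }

# Hypothetical daily RMSSD readings (ms): week 1 has three, week 2 only two
records = [(1, 50.0), (1, 60.0), (1, 55.0), (2, 70.0), (2, 65.0)]
weekly = weekly_ln_rmssd(records)
print(weekly)
```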

2.4 Statistical analysis

Descriptive statistics are presented as mean (SD). The normal distribution of the data and the homogeneity of variances were confirmed through the Shapiro–Wilk and Levene’s tests, respectively (p > 0.05). The relationship between the HRV data (Ln RMSSD) obtained through both devices was quantified through the Pearson correlation coefficient. The criteria for interpreting the magnitude of the coefficients were as follows: trivial (0.00–0.09), small (0.10–0.29), moderate (0.30–0.49), large (0.50–0.69), very large (0.70–0.89), nearly perfect (0.90–0.99), and perfect (1.00) [15]. The Bland–Altman method was used to determine the systematic bias and random error (mean difference ± 1.96 SD) [16]. The ICC was also calculated according to the guidelines reported by Koo and Li [17]. A two-way mixed-effects model of the mean-of-two-raters type (Kubios-H10 vs. Polar Ignite) was used to determine absolute agreement. The 95% confidence intervals of the ICC values were calculated, and interpretations were based on the following classifications: < 0.50 (poor reliability), 0.50–0.75 (moderate reliability), 0.75–0.90 (good reliability), and > 0.90 (excellent reliability) [17]. The level of significance was set at p < 0.05. Data analysis was performed using the SPSS software (version 21, SPSS Inc., Chicago, IL, USA).

The MBI method was used to determine the interpretation of the weekly HRV change of each participant with each device [10]. The data collected over the first two weeks of the training program with each device were used to establish the SWC of each participant (mean ± 0.2SD) [18]. Using Hopkins’ spreadsheet (http://www.sportsci.org) [19], the likelihood of a true change was interpreted as indicated in Table 1 [10]. If both a decrease and an increase were possible, the change was interpreted as unclear [10]. Lastly, the magnitude-based interpretation of the weekly Ln RMSSD changes of each participant with each device was summarized in a cross table to determine the percentage of agreement for the 66 participant-weeks evaluated (11 participants × 6 weeks).
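For readers who want to reproduce the correlation and agreement coefficients outside SPSS, the following sketch computes the Pearson coefficient and an average-measures, absolute-agreement ICC from the two-way ANOVA mean squares. It is an illustration under stated assumptions, not the authors' SPSS procedure: the function names and data are hypothetical, and confidence intervals are omitted.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_agreement_avg(x, y):
    """Two-way, absolute-agreement, average-of-two-raters ICC."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = fmean([*x, *y])
    row_means = [fmean(r) for r in rows]
    col_means = [fmean(x), fmean(y)]
    # Mean squares for subjects (rows), devices (columns), and residual error
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum((v - row_means[i] - col_means[j] + grand) ** 2
              for i, r in enumerate(rows) for j, v in enumerate(r))
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# Hypothetical weekly Ln RMSSD values; the second device reads 0.5 higher
kubios = [4.0, 4.2, 3.8, 4.5, 4.1]
shifted = [v + 0.5 for v in kubios]
print(round(pearson_r(kubios, shifted), 3), round(icc_agreement_avg(kubios, shifted), 3))
```

The example shows why both coefficients are reported: a constant offset between devices leaves the Pearson coefficient at 1 but lowers the absolute-agreement ICC.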

3 Results

A very large correlation (r = 0.714) and good reliability (ICC = 0.817) were obtained between devices throughout the entire training program (Table 2). A higher coefficient of variation was found for the Polar H10-Kubios (9.93%) than for the Polar Ignite (7.71%) over the eight-week training program. In the Bland–Altman plots (Fig. 2), males and females presented a similar systematic bias and random error of − 0.05 ± 0.33 ms and − 0.01 ± 0.28 ms, respectively.

Table 2 Pearson (r), intra-class correlation coefficients (ICCs), and coefficient of variation (CV) of the heart rate variability (Ln RMSSD, in ms) data obtained from two different devices (i.e. Kubios-H10 vs. Polar Ignite) during an eight-week concurrent training program
Fig. 2

Bland–Altman plots with the mean differences of heart rate variability (Ln RMSSD, in ms) obtained through the eight-week concurrent training program from two different devices (Kubios-H10 vs. Polar Ignite). The plots include the mean difference (dotted line), 95% limits of agreement (dashed lines), and the regression line (solid line). Systematic bias and the coefficient of determination (r²) are also presented
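The systematic bias and 95% limits of agreement shown in these plots follow from the paired differences between devices. The sketch below is a minimal illustration with hypothetical data and function names; it assumes the sample SD of the differences and does not draw the plot or the regression line.

```python
import statistics

def bland_altman(a, b):
    """Return (bias, lower_loa, upper_loa) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)        # systematic bias (mean difference)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    half_width = 1.96 * sd                # random error (half-width of the 95% limits)
    return bias, bias - half_width, bias + half_width

# Hypothetical weekly Ln RMSSD values from the two devices
kubios_h10 = [4.0, 4.2, 3.8, 4.5, 4.1]
polar_ignite = [4.1, 4.1, 3.9, 4.4, 4.3]
bias, lower, upper = bland_altman(kubios_h10, polar_ignite)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```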

Regarding the individual HRV analysis, Table 3 summarizes the magnitude-based weekly Ln RMSSD changes of each participant with each device during the 8-week training program. Based on the SWC obtained from the initial 2 weeks (mean ± 0.2SD), the extent to which the 90% confidence interval overlaps the SWC determines the likelihood of the change (Fig. 3). In the example shown, the devices agree in the direction of the change only in week five, and even then they disagree in the likelihood (Fig. 3).
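The logic of this interpretation can be sketched in code. The example below is not Hopkins' spreadsheet: it swaps in a normal approximation for the t distribution, and it assumes the conventional MBI probability scale (25–75% possibly, 75–95% likely, 95–99.5% very likely, above 99.5% most likely) and the conventional rule that a change is unclear when the chances of both an increase and a decrease exceed 5%, which corresponds to a 90% confidence interval spanning both SWC thresholds. All names and numbers are hypothetical.

```python
from statistics import NormalDist

# Upper probability bound of each qualitative term (assumed conventional scale)
SCALE = [(0.005, "most unlikely"), (0.05, "very unlikely"), (0.25, "unlikely"),
         (0.75, "possibly"), (0.95, "likely"), (0.995, "very likely")]

def qualitative(p):
    for upper, label in SCALE:
        if p < upper:
            return label
    return "most likely"

def mbi(change, se, swc):
    """Classify a weekly change from baseline.

    change: weekly mean minus baseline mean (Ln RMSSD)
    se:     standard error of that change
    swc:    smallest worthwhile change (0.2 * baseline SD)
    """
    z = NormalDist()
    p_inc = 1 - z.cdf((swc - change) / se)   # chance the true change exceeds +SWC
    p_dec = z.cdf((-swc - change) / se)      # chance the true change is below -SWC
    p_triv = 1 - p_inc - p_dec
    if p_inc > 0.05 and p_dec > 0.05:
        return "unclear", p_inc, p_dec
    direction, p = max([("increase", p_inc), ("decrease", p_dec), ("trivial", p_triv)],
                       key=lambda item: item[1])
    return f"{qualitative(p)} {direction}", p_inc, p_dec

print(mbi(change=0.15, se=0.05, swc=0.03)[0])
print(mbi(change=0.0, se=0.10, swc=0.03)[0])
```

With these hypothetical inputs, the first call yields a very likely increase and the second an unclear change, because the interval is wide relative to the SWC.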

Table 3 Magnitude-based weekly Ln RMSSD changes during an eight-week concurrent training program determined by two different devices (i.e. Kubios-H10 vs. Polar Ignite)
Fig. 3

Weekly Ln RMSSD changes of a random participant tracked with the Polar H10 chest strap synchronized with the Kubios app (a) and with the Polar Nightly Recharge™ function present in the Polar Ignite sport watch (b). The solid line represents the zero line of the SWC to indicate unclear changes when the 90% CI overlaps. The dashed lines represent the SWC (mean ± 0.2SD) [18]. The dotted line represents the seven-day rolling average

Table 4 summarizes the magnitude-based interpretation of the weekly Ln RMSSD changes of each participant with each device. The white diagonal squares represent agreement between devices in both the likelihood and the direction of the change. The dark grey squares surrounding the white diagonal indicate the weeks in which the devices agree in direction but differ in the likelihood of the change. The absolute agreement between devices for the 66 weeks evaluated was 13.6%, while the agreement in the direction of the change but with discordance in likelihood was 25.8%.
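The two agreement percentages can be computed from paired weekly labels as in the sketch below. The label format (a likelihood term followed by the direction, or the single word "unclear") and the sample labels are assumptions for illustration and do not reproduce the study data.

```python
def direction(label):
    """Last word of the label: 'increase', 'decrease', 'trivial' or 'unclear'."""
    return label.split()[-1]

def agreement(labels_a, labels_b):
    """Return (% full agreement, % agreement in direction only)."""
    pairs = list(zip(labels_a, labels_b))
    full = sum(a == b for a, b in pairs)
    direction_only = sum(a != b and direction(a) == direction(b) for a, b in pairs)
    return 100 * full / len(pairs), 100 * direction_only / len(pairs)

# Hypothetical weekly interpretations from each device
kubios_h10 = ["likely increase", "unclear", "possibly decrease", "very likely increase"]
polar_ignite = ["likely increase", "possibly trivial", "likely decrease", "unclear"]
print(agreement(kubios_h10, polar_ignite))
```

As a point of arithmetic, with 66 weeks evaluated the reported 13.6% and 25.8% would correspond to 9 and 17 weeks, respectively, if the percentages were rounded to one decimal.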

Table 4 Magnitude-based weekly Ln RMSSD changes interpretation during an eight-week concurrent training program determined by two different devices (i.e. Kubios-H10 vs. Polar Ignite)

4 Discussion

Supporting our main hypothesis, the results reveal that despite the very large correlation (r = 0.714) and good reliability (ICC = 0.817) obtained between devices throughout the entire training program, the MBI method applied to determine the weekly Ln RMSSD changes over the 66 weeks monitored showed an absolute agreement of 13.6%, and an agreement in the direction of the change but with discordance in the likelihood of 25.8% (Table 3). These systems obtain HRV data by different methods, and some methodological differences might explain the low level of agreement between devices when the MBI method was applied. Whereas the Polar Nightly Recharge™ function records during the first four hours of sleep, corresponding to SWS [5], the Polar H10 synchronized with the Kubios app is used upon awakening. In this regard, Hynynen et al. [20] found, in a group of 12 overtrained female and male athletes, that ANS modulation was disturbed relative to the control group in upon-awakening measurements but not during sleep. A potential reason is that awakening itself involves a sympathetic stimulus, so the upon-awakening measurement could represent the athlete's state better than the sleep recording [20]. Another cause of disparity could be the sleep stage distribution, as it can alter the Ln RMSSD value even when the athlete perceives having rested well [3]. Since the Polar Nightly Recharge™ function records during the first four hours of sleep, a greater percentage of time spent in the third sleep stage (SWS) during those hours could yield greater Ln RMSSD values than on other nights with the same amount of SWS distributed over different time frames [3]. On the other hand, ANS modulation during these first hours of sleep can also be affected by late-night exercise [21]. In our case, some athletes conducted their training sessions between 20:00 and 21:00 h, which could have altered the measurement on those days.

Another important point concerns the data source. Polar devices include the PPG Polar Precision Prime™ system, which detects the pulse waves produced by cardiac output, whereas the Polar H10 chest strap detects the R waves of the heart's electrical impulses. The main source of error in PPG devices is motion artefacts, which are mainly present during physical activity, whereas at rest these devices show reasonable accuracy [22]. Since these devices have been tested at different running intensities, showing a high level of agreement [23, 24], this potential source of inaccuracy seems to be well controlled with just a correct strap adjustment [13]. The Polar H10 chest strap has been shown to be a valid tool compared with the electrocardiogram for measuring inter-beat intervals [7], and the agreement between PRV and HRV, obtained from pulse and R waves respectively, has been tested in the early morning and during different sleep stages, showing a high level of agreement [4, 25]. In particular, a recent study determined the level of agreement between the Polar Nightly Recharge™ function and the Polar H10 chest strap in Ln RMSSD monitoring during sleep in a group of recreationally trained athletes [26]. Both devices recorded pulse and R-wave intervals during the first four hours of sleep, as established by default in this function of Polar sport watches [26]. A systematic bias and random error of 0.17 ± 0.40 ms were found, and the authors interpreted these differences as indicating appropriate accuracy for this PPG system [26]. Despite the different time frames applied in this study (i.e., night recording vs. upon awakening), our results reveal a systematic bias and random error of − 0.05 ± 0.33 ms and − 0.01 ± 0.28 ms for males and females, respectively, throughout the entire training program. This random error greatly influenced the interpretation of the weekly Ln RMSSD changes when the MBI was applied. Thus, from the practical point of view of their corresponding uses at different times, there is a large discrepancy between devices.

One more point that might explain the differences between systems is signal processing. The Kubios software, available as a phone application, has been shown to be a valid tool for this task, as it corrects the identified artefacts using cubic spline interpolation [8]. Benítez-Herrera et al. [27] compared the Kubios and Polar software for artefact correction of inter-beat intervals during a VO2max test in a group of 12 competitive athletes and found better performance with the Kubios software. However, since HRV in the present study was measured at rest, artefact correction should not be the main source of disparity between devices and software.

The MBI method applied in this manuscript was proposed by Batterham and Hopkins [10] in response to a limitation of null-hypothesis significance testing (i.e., p < 0.05 can be obtained even when the magnitude of the effect is trivial). However, its application in sport science is not without criticism [28]. First, the same mean effect can have different magnitudes due to differences in the standard deviation [28] (e.g., Polar H10: 100 ± 20 ms to 120 ± 20 ms is ES = 1.0; Polar Ignite: 100 ± 40 ms to 120 ± 40 ms is ES = 0.5). However, in the present case, wider confidence intervals reflect the particular variation of each system, and these differences therefore need to be considered when determining the magnitude of the change. The random errors obtained between devices throughout the training program (± 0.33 ms and ± 0.28 ms for males and females, respectively) could partially explain these differences when the MBI is applied, despite the very large correlation (r = 0.714) and good reliability (ICC = 0.817) obtained. Second, the definition of what constitutes a worthwhile change can affect inferences [28]. In this case, the first two weeks of the training program, in which the lightest micro-cycle load was applied, were used to determine the SWC (mean ± 0.2SD). This baseline measure should be updated over longer training periods for a meaningful interpretation of HRV changes. Lastly, type I error rates are high with this method [28]. However, the qualitative assessment indicated in Table 1 warns about the probability of making a wrong inference [10]. These interpretations of HRV changes are what coaches and athletes deal with in daily training decisions, so the analysis conducted closely reflects how training is monitored with these devices in practice.
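The effect size arithmetic in the example above can be checked with a short sketch. It assumes the ES is the mean difference divided by a common standard deviation; the function name is hypothetical.

```python
def standardized_effect(mean_pre, mean_post, sd):
    """Mean difference divided by the (assumed common) standard deviation."""
    return (mean_post - mean_pre) / sd

# Values from the example in the text (ms)
es_h10 = standardized_effect(100, 120, 20)
es_ignite = standardized_effect(100, 120, 40)
print(es_h10, es_ignite)
```

The same 20 ms change therefore halves in standardized terms when the variability doubles.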

Finally, some limitations must be taken into consideration to properly interpret these results. Daily HRV monitoring was not conducted in a laboratory context, nor was a gold standard used. However, participants were instructed and encouraged to reduce any potential disturbance during the night and the upon-awakening measurement. From a pragmatic perspective, this is the real situation that athletes and coaches face, so these potential artefacts in the measurement should be considered when comparing wearable devices. It is also necessary to highlight the small number of participants recruited. However, the main results are derived from the 66 weeks monitored, which provide some clear insights into the level of agreement between devices. Based on this practical interpretation of HRV modulation relative to a baseline measure, future work should compare the level of agreement of other tracking procedures, such as post-exercise monitoring, and determine which one most precisely reflects the athlete's state according to biological markers.

5 Conclusion

From a practical point of view, these results warn coaches and athletes about the influence of the method used to test and monitor HRV on the data and on their interpretation as a measure of internal load. These advances in monitoring technologies are gaining in ecological validity, but their functionality needs to be tested before they are put into practice. Despite the very large correlation and good reliability obtained between devices, the random errors found affected the interpretation of the weekly Ln RMSSD changes when the MBI method was applied. Thus, practitioners in the field of training should be aware of these facts if their training groups use different devices or if an athlete switches between them. The particular characteristics of each device (i.e., night recording vs. upon-awakening measurement, pulse vs. R waves, and software signal processing) can affect the variation of HRV data, which could compromise the interpretation of ANS modulation.