Background

Stress has been investigated as a risk factor for cardiovascular disease [1] and for reduced human performance, which in some situations, such as hazardous work or driving a car, may have negative consequences. Stress influences the balance of the Autonomic Nervous System (ANS) [2].

Heart Rate Variability (HRV) is a non-invasive measure reflecting the variation over time of the period between consecutive heartbeats (RR intervals) [3] and has been shown to be a reliable marker of ANS activity [3].

For this reason, several studies have investigated the cardiovascular reaction induced by stress using Heart Rate Variability (HRV), focussing on acute laboratory stressors: cognitive (e.g., mental arithmetic) [4–6] and psychomotor (e.g., mirror tracing) [4] challenges, and physical stressors [7–9]. Moreover, as standard laboratory stressors do not always engage subjects' affective response, real-life stressors (e.g., precompetitive anxiety [10] or social interaction stressors such as public speaking tasks [11]) are often applied to provide a more appropriate social context in which negative emotions might be elicited [12]. Some studies [13–16] investigated HRV variations during university exams, as these are a real-life stressor. These studies included only linear HRV measures, except for the study by Anishchenko, which considered nonlinear measures such as Approximate Entropy [13]. In the current study, we investigated how the most common nonlinear HRV measures vary in subjects under stress due to a university examination. Furthermore, we proposed a classifier for the automatic detection of stress based on nonlinear HRV features.

Methods

We performed a prospective analysis, examining 5-minute HRV extracted from ECG records of volunteer students in two different conditions: the first recording was made during an ongoing verbal examination (stress session); the second was made after a holiday period (control session).

Sample of data

The data were acquired from 42 students of the School of Biomedical Engineering of the University Federico II, who volunteered to take part in the study. This study was performed in compliance with the Human Study Committee regulations of the University of Naples "Federico II". After obtaining written consent, a 3-lead electrocardiogram (ECG) was recorded on 2 different days: the first recording was performed during an ongoing university verbal examination (stress session), while the second one was taken under controlled resting conditions (rest session) after a holiday period, far from the stress induced by study routines.

Many factors may influence HRV, such as circadian rhythm, body position, activity level prior to recording, medication, verbalization and breathing condition. For that reason, we took special precautions to maintain similar conditions, such as performing both recordings at a similar time of day and in a sitting position after an adaptation time of at least 15 minutes. Furthermore, we asked about the consumption of drugs, and none of the students declared any. Finally, we asked participants to speak during the control session as well.

Short-term nonlinear HRV measures

We performed a short-term 5-minute HRV analysis according to International Guidelines [3]. The RR interval time series were extracted from ECG records using an automatic QRS detector, WQRS, available in the PhysioNet library [17] and based on a nonlinearly scaled ECG curve length feature [18]. Two scientists independently reviewed and corrected the QRS detection and manually labelled the normal beats, obtaining the so-called series of normal-to-normal (NN) beat intervals. QRS review and correction were performed using PhysioNet's WAVE [17]. The fraction of total RR intervals labelled as NN intervals was computed as the NN/RR ratio. This ratio has been used as a measure of data reliability [17, 19], with the purpose of excluding records with a ratio below a 90% threshold. No record was excluded, as the NN/RR ratio was higher than 90% for all of them.

Nonlinear properties of HRV were analyzed by the following methods: Poincaré Plot [19, 20], Approximate Entropy [21], Correlation Dimension [22], Detrended Fluctuation Analysis [23, 24], and Recurrence Plot [25–27]. We focussed on these methods as they are implemented in freely distributed software that is widely used for research activities.

Poincaré Plot

The Poincaré Plot (PP) is a common graphical representation of the correlation between successive RR intervals, that is, the plot of $RR_{j+1}$ versus $RR_j$. A widely used approach to analyzing the Poincaré plot of an RR series consists of fitting an ellipse oriented along the line of identity and computing the standard deviation of the points perpendicular to and along the line of identity, referred to as SD1 and SD2, respectively [20].
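As a minimal sketch of how SD1 and SD2 can be obtained without explicitly fitting an ellipse, the points of the plot can be rotated by 45 degrees, so that one coordinate is perpendicular to the line of identity and the other lies along it. The function name `poincare_sd` and the use of the sample standard deviation are assumptions made for illustration; they are not taken from the software used in this study.

```python
import math
from statistics import stdev


def poincare_sd(rr):
    """Return (SD1, SD2) for a list of RR intervals (any consistent time unit)."""
    pairs = list(zip(rr[:-1], rr[1:]))  # points (RR_j, RR_{j+1}) of the plot
    # Coordinate perpendicular to the line of identity.
    across = [(nxt - cur) / math.sqrt(2) for cur, nxt in pairs]
    # Coordinate along the line of identity.
    along = [(nxt + cur) / math.sqrt(2) for cur, nxt in pairs]
    return stdev(across), stdev(along)
```

With this construction, a strictly alternating series spreads only across the line of identity (SD2 close to zero), whereas a steadily drifting series spreads only along it (SD1 close to zero).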

Approximate entropy

Approximate entropy measures the complexity or irregularity of the RR series [21]. The algorithm for the computation of Approximate Entropy is briefly described here.

Given a series of N RR intervals, $RR_1, RR_2, \ldots, RR_N$, a series of vectors of length m, $X_1, X_2, \ldots, X_{N-m+1}$, is constructed from the RR intervals as follows: $X_i = [RR_i, RR_{i+1}, \ldots, RR_{i+m-1}]$.

The distance $d[X_i, X_j]$ between vectors $X_i$ and $X_j$ is defined as the maximum absolute difference between their respective scalar components. For each vector $X_i$, the relative number of vectors $X_j$ for which $d[X_i, X_j] \le r$, denoted $C_i^m(r)$, is computed, where r is referred to as a tolerance value (see equation 1).

$$C_i^m(r) = \frac{\text{number of } \{ d[X_i, X_j] \le r \}}{N - m + 1} \quad \forall j$$
(1)

Then, the index $\Phi^m(r)$ is computed by taking the natural logarithm of each $C_i^m(r)$ and averaging over i.

$$\Phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_i^m(r)$$
(2)

Finally, the approximate entropy is computed as:

$$ApEn(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$$
(3)
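The three steps above can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the in-house software used in the study: the function name `approximate_entropy` is hypothetical, the tolerance `r` is passed as an absolute value in the same unit as the RR intervals, and self-matches are counted, so the logarithm is always defined.

```python
import math


def approximate_entropy(rr, m=2, r=0.2):
    """ApEn(m, r, N) of the list rr, with r given as an absolute tolerance."""
    n = len(rr)

    def phi(dim):
        # Embedding vectors X_i = [RR_i, ..., RR_{i+dim-1}].
        vectors = [rr[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for xi in vectors:
            # Maximum absolute component difference; the self-match keeps
            # the count at least 1.
            matches = sum(
                1 for xj in vectors
                if max(abs(a - b) for a, b in zip(xi, xj)) <= r
            )
            total += math.log(matches / len(vectors))
        return total / len(vectors)

    return phi(m) - phi(m + 1)
```

A perfectly regular series yields a value at or near zero, and more irregular series yield larger values. The pairwise comparison makes the cost quadratic in the number of beats, which is acceptable for 5-minute excerpts.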

In this study, we computed the ApEn with m = 2 and with three different values of the threshold r:

r = 0.2*SDNN (SDNN being the standard deviation of the NN series);

$r = r_{max}$, that is, the value of r in the interval (0.1*SDNN, 0.9*SDNN) which maximizes the ApEn;

$r = r_{chon}$, that is, the value computed according to the following formula proposed by Chon [28]:

$$r_{chon} = \frac{-0.036 + 0.26 \sqrt{SDDS / SDNN}}{\sqrt[4]{N / 1000}}$$
(4)

where N denotes the length of the NN sequence, and SDDS and SDNN are measures of the short-term and long-term variability of the RR sequence, respectively. Formally, SDDS is the Standard Deviation of the Difference Sequence of the series RR, that is, $[RR_2 - RR_1, RR_3 - RR_2, \ldots, RR_N - RR_{N-1}]$, and SDNN is the Standard Deviation of the series NN.
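As a worked illustration of equation 4, the sketch below evaluates the threshold for a given NN series. It reads the formula as $(-0.036 + 0.26\sqrt{SDDS/SDNN})$ divided by the fourth root of $N/1000$; this reading, the function name `r_chon` and the use of sample standard deviations are assumptions, and the code is not the authors' implementation.

```python
import math
from statistics import stdev


def r_chon(nn):
    """Threshold r from equation 4 for a list of NN intervals."""
    n = len(nn)
    # Difference sequence [NN_2 - NN_1, ..., NN_N - NN_{N-1}].
    diffs = [nxt - cur for cur, nxt in zip(nn, nn[1:])]
    sdds = stdev(diffs)  # short-term variability
    sdnn = stdev(nn)     # long-term variability
    return (-0.036 + 0.26 * math.sqrt(sdds / sdnn)) / (n / 1000) ** 0.25
```

For a synthetic series of 1000 samples alternating between two values, the ratio SDDS/SDNN is very close to 2 and the length correction equals 1, so the function returns approximately 0.33.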

The values of the parameters r and m were chosen according to the recommendation for slow dynamic time series, such as heart rate variability (m = 2 and r = 0.2*SDNN) [29, 30], and to the findings of recent studies [28, 31], which suggested choosing the value of r that maximizes the entropy ($r = r_{max}$) and proposed a formula for the automatic selection of r ($r = r_{chon}$).

In the remainder of the paper, the Approximate Entropy computed with the different values of r is denoted En(0.2), En($r_{max}$) and En($r_{chon}$).

Correlation dimension

The correlation dimension $D_2$ is another method used to measure the complexity of the HRV time series [22].

As for Approximate Entropy, the series of vectors $X_i$ is constructed and $C_i^m(r)$ is computed as in equation 1, but the distance function, in this case, is defined as follows:

$$d[X_i, X_j] = \sqrt{\sum_{k=1}^{m} \left( X_i(k) - X_j(k) \right)^2}$$
(5)

where $X_i(k)$ and $X_j(k)$ refer to the k-th element of the vectors $X_i$ and $X_j$, respectively.

Then, the index $C^m(r)$ is computed by averaging $C_i^m(r)$ over i.

$$C^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} C_i^m(r)$$
(6)

The correlation dimension $D_2$ is defined as the following limit value:

$$D_2(m) = \lim_{r \to 0} \lim_{N \to \infty} \frac{\log C^m(r)}{\log r}$$
(7)

In practice, this limit value is approximated by the slope of the regression line fitted to the points $(\log r, \log C^m(r))$. In the current study, a value of m = 10 [30] was adopted.
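The slope-based approximation can be sketched as follows. The function names and the choice of radii are hypothetical, and the selection of the linear scaling region, which in practice strongly affects the estimate, is left to the caller; this is not the procedure implemented in Kubios.

```python
import math


def correlation_sum(rr, m, r):
    """C^m(r): average fraction of embedding vectors within Euclidean distance r."""
    vectors = [tuple(rr[i:i + m]) for i in range(len(rr) - m + 1)]
    k = len(vectors)
    # Self-pairs are included, as in equation 1, so the sum is never zero.
    close = sum(1 for xi in vectors for xj in vectors if math.dist(xi, xj) <= r)
    return close / (k * k)


def correlation_dimension(rr, m, radii):
    """Least-squares slope of log C^m(r) against log r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(rr, m, r)) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

As a sanity check on synthetic data, a linear ramp embedded with m = 2 lies on a straight line in the embedding space, so the estimated dimension is close to 1.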

Detrended Fluctuation Analysis

Detrended Fluctuation Analysis measures the correlation within the signal [23, 24] and consists of the steps described here.

The average $\overline{RR}$ of the RR interval series is calculated over all N samples. The alternating component of the RR interval series, defined as RR minus its average value $\overline{RR}$, is integrated:

$$y(k) = \sum_{j=1}^{k} \left( RR_j - \overline{RR} \right), \quad k = 1, \ldots, N$$
(8)

The integrated series is divided into non-overlapping segments of equal length n. A least-squares line is fitted within each segment, representing the local trends with a broken line. This broken line is referred to as $y_n(k)$, where n denotes the length of each segment.

The integrated time series is detrended as follows: $y(k) - y_n(k)$. The root-mean-square fluctuation of the detrended time series is computed according to the following formula:

$$F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( y(k) - y_n(k) \right)^2}$$
(9)

The segmentation, detrending and fluctuation computation are repeated for n from 4 to 64.

Representing the function F(n) in a log-log diagram, two parameters are defined: short-term fluctuations ($\alpha_1$) as the slope of the regression line relating log(F(n)) to log(n) for n between 4 and 16; long-term fluctuations ($\alpha_2$) as the slope of the regression line relating log(F(n)) to log(n) for n between 16 and 64.
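The procedure can be sketched in plain Python as below. This is an illustrative implementation, not the one in Kubios: the function names are hypothetical, trailing samples that do not fill a whole segment are dropped (so the mean is taken over the samples actually used rather than over all N), and every integer n in the range is used for the regression. The usage lines at the end run on synthetic white noise, not on real RR intervals.

```python
import math
import random


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def fluctuation(rr, n):
    """Root-mean-square fluctuation F(n) for segments of length n."""
    mean_rr = sum(rr) / len(rr)
    profile, acc = [], 0.0
    for value in rr:  # integrate the mean-removed series
        acc += value - mean_rr
        profile.append(acc)
    n_seg = len(profile) // n  # incomplete trailing segment is dropped
    xs = list(range(n))
    squared = 0.0
    for s in range(n_seg):
        seg = profile[s * n:(s + 1) * n]
        b = _slope(xs, seg)                   # local trend: slope
        a = sum(seg) / n - b * sum(xs) / n    # local trend: intercept
        squared += sum((y - (a + b * x)) ** 2 for x, y in zip(xs, seg))
    return math.sqrt(squared / (n_seg * n))


def dfa_alpha(rr, n_min, n_max):
    """Slope of log F(n) against log n for n_min <= n <= n_max."""
    scales = range(n_min, n_max + 1)
    return _slope([math.log(n) for n in scales],
                  [math.log(fluctuation(rr, n)) for n in scales])


# Usage on synthetic, uncorrelated data (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.8, 0.05) for _ in range(2000)]
alpha1 = dfa_alpha(noise, 4, 16)
alpha2 = dfa_alpha(noise, 16, 64)
```

For uncorrelated noise both exponents are expected to lie near 0.5, with $\alpha_1$ typically somewhat higher because of the small segment lengths.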

Recurrence Plot

Recurrence Plot (RP) is another approach used to measure the complexity of the time series [25–27]. The RP was constructed according to the following steps.

Vectors $X_i = (RR_i, RR_{i+\tau}, \ldots, RR_{i+(m-1)\tau})$, with $i = 1, \ldots, K$ and $K = N - (m-1)\tau$, are defined, where m is the embedding dimension and $\tau$ is the embedding lag.

A symmetrical K-dimensional square matrix $M_1$ is calculated by computing the Euclidean distances of each vector $X_i$ from all the others.

After choosing a threshold value r, a symmetric K-dimensional square matrix $M_2$ is calculated as the matrix whose elements $M_2(i,j)$ are defined as:

$$M_2(i,j) = \begin{cases} 1 & \text{if } M_1(i,j) < r \\ 0 & \text{if } M_1(i,j) > r \end{cases}$$
(10)

The RP is the representation of the matrix $M_2$ as a black (for ones) and white (for zeros) image.

In this study, according to [30, 32], the following values of the parameters introduced above were chosen: m = 10; $\tau$ = 1; $r = \sqrt{m} \cdot SDNN$, with SDNN defined as the standard deviation of the NN series.

In the RP, lines are defined as series of diagonally adjacent black points with no white space. The length l of a line is the number of points of which the line consists.

The following measures of the RP were computed: recurrence rate (REC), defined in equation 11; maximal length of lines ($l_{max}$); mean length of lines ($l_{mean}$); determinism (DET), defined in equation 12; and Shannon Entropy (ShEn), defined in equation 13.

$$REC = \frac{1}{K^2} \sum_{i=1}^{K} \sum_{j=1}^{K} M_2(i,j)$$
(11)

$$DET = \frac{\sum_{l=2}^{l_{max}} l \cdot N_l}{\sum_{i=1}^{K} \sum_{j=1}^{K} M_2(i,j)}, \quad \text{with } N_l = \text{number of lines of length } l$$
(12)

$$ShEn = -\sum_{l=l_{min}}^{l_{max}} n_l \ln n_l, \quad \text{with } n_l = \text{fraction of } N_l \text{ over the total number of lines}$$
(13)
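The construction of the recurrence matrix and the five measures can be sketched as follows. Several choices here are assumptions rather than details given in the text: the main diagonal, which is always fully recurrent, is excluded when searching for lines; lines shorter than `l_min = 2` are ignored; Shannon Entropy is computed with a leading minus sign; and all function names are hypothetical. The sketch is not the Kubios implementation.

```python
import math
from collections import Counter


def recurrence_matrix(rr, m, tau, r):
    """Binary matrix M2: 1 where two embedded vectors are closer than r."""
    k = len(rr) - (m - 1) * tau
    vectors = [tuple(rr[i + j * tau] for j in range(m)) for i in range(k)]
    return [[1 if math.dist(vi, vj) < r else 0 for vj in vectors]
            for vi in vectors]


def diagonal_line_lengths(m2):
    """Lengths of all runs of ones along diagonals, skipping the main diagonal."""
    k = len(m2)
    lengths = []
    for offset in range(-(k - 1), k):
        if offset == 0:
            continue  # assumption: the trivial main diagonal is not counted
        run = 0
        for i in range(max(0, -offset), min(k, k - offset)):
            if m2[i][i + offset]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def rqa_measures(m2, l_min=2):
    """REC, DET, l_max, l_mean and ShEn of a recurrence matrix."""
    k = len(m2)
    points = sum(sum(row) for row in m2)
    hist = Counter(l for l in diagonal_line_lengths(m2) if l >= l_min)
    n_lines = sum(hist.values())
    in_lines = sum(l * count for l, count in hist.items())
    return {
        "REC": points / k ** 2,
        "DET": in_lines / points,
        "l_max": max(hist) if hist else 0,
        "l_mean": in_lines / n_lines if n_lines else 0.0,
        "ShEn": -sum((c / n_lines) * math.log(c / n_lines)
                     for c in hist.values()),
    }
```

On a strictly periodic series, every second diagonal is completely filled, which gives a recurrence rate close to 0.5 and a determinism close to 1, as expected for a fully predictable signal.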

The HRV analysis was performed using Kubios [30] for all the measures except the Approximate Entropy ones, which were computed using in-house software in MATLAB, as their computation is not available in Kubios. All the computed measures are summarized in Table 1.

Table 1 Nonlinear Heart Rate Variability measures selected in the current study

Statistical analysis

We calculated the mean, standard deviation, median, and 25th and 75th percentiles to describe the distribution of HRV features during stress and rest conditions. Moreover, we calculated the same statistics for the individual differences between the stress session and the rest session, and we used the Wilcoxon signed rank test to investigate the statistical significance of the features' variation within each subject. The statistical analysis was performed with in-house software developed in MATLAB version R2009b (The MathWorks Inc., Natick, MA).

Classification and performance measurement

We adopted Linear Discriminant Analysis (LDA) as the classification method. LDA aims to find linear combinations of the input features that provide an adequate separation between two classes, which in the current study are the stress and rest sessions. LDA uses an empirical approach to define linear decision planes in the feature space. The discriminant functions used by LDA are built as linear combinations of the variables that seek to maximize the differences between the classes. Further details about LDA can be found in Krzanowski [33].

In order to evaluate the classifier, we computed the common measures of binary classification performance [34] using the formulae reported in Table 2, considering as test-positive those records classified as under stress. Total classification accuracy represents the ability of the classifier to discriminate between the two sessions, sensitivity refers to the ability to identify records in the stress session, and specificity refers to the ability to identify records in the rest session.

Table 2 Binary Classification Performance Measures
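Since the formulae of Table 2 are not reproduced in the text, the sketch below uses the standard definitions of the three measures discussed above, with stress as the positive class. The function name and the counts in the usage line are hypothetical and are not results of this study.

```python
def binary_performance(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: stress records classified as stress; fn: stress records classified as rest;
    tn: rest records classified as rest;     fp: rest records classified as stress.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for one test fold, chosen only for illustration.
example = binary_performance(tp=8, fn=2, tn=9, fp=1)
```

In this invented example, 17 of 20 records are classified correctly, 8 of 10 stress records are detected, and 9 of 10 rest records are recognized as such.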

To estimate the performance measures, we adopted a 10-fold cross-validation scheme [35]. This technique consists of developing 10 classifiers as follows: (1) randomly dividing the dataset into 10 subsamples; (2) excluding each subsample (testing subset) in turn; (3) developing a classifier with the remaining 9 subsamples (training subset); (4) testing each classifier with the excluded subsample (which is used as an independent testing dataset) and computing the performance measures using the formulae in Table 2. The 10-fold cross-validation estimates of the performance measures are computed as the averages over the 10 classifiers. We divided the dataset into 10 folds by subject and not by record in order to obtain person-independent testing [36].
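The subject-wise split can be sketched as below: subjects, rather than records, are shuffled and assigned to folds, so both records of a subject always fall in the same fold. The function name, the round-robin assignment and the fixed seed are illustrative assumptions, not the procedure used by the authors.

```python
import random


def subject_folds(subject_ids, k=10, seed=0):
    """Split record indices into k folds so that no subject spans two folds.

    subject_ids[i] is the subject to which record i belongs.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Round-robin assignment keeps the fold sizes as equal as possible.
    fold_of = {subject: index % k for index, subject in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for record, subject in enumerate(subject_ids):
        folds[fold_of[subject]].append(record)
    return folds
```

With 42 subjects and two records each, this yields two folds of five subjects and eight folds of four subjects. Each fold is then used once as the testing subset, while the other nine form the training subset.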

Feature selection

It would be possible to use all the 13 selected HRV features reported in Table 1 for the LDA; however, this may decrease the performance of the classifier, particularly because of the curse of dimensionality. Therefore, we tried to find the subset of features which could discriminate the two classes with the highest total classification accuracy: we adopted the so-called exhaustive search method [35], investigating all the possible combinations of k out of N features (with k from 1 to N). Since the number of features N is 13, we investigated $2^{13} - 1 = 8191$ non-empty subsets of features, training and testing the same number of classifiers, as discussed in the previous subsection.
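The size of the search space can be checked by enumeration. Choosing k out of 13 features for every k from 1 to 13 gives $2^{13} - 1 = 8191$ non-empty subsets ($2^{13} = 8192$ if the empty set is also counted). In the sketch below, the feature names are read from the Methods text and Table 1 is assumed to list the same 13 features; `score` stands for a user-supplied callable, for example the cross-validated accuracy of a classifier trained on the subset, and is hypothetical.

```python
from itertools import combinations

FEATURES = ["SD1", "SD2", "En(0.2)", "En(r_max)", "En(r_chon)", "D2",
            "alpha1", "alpha2", "REC", "l_max", "l_mean", "DET", "ShEn"]


def feature_subsets(features):
    """Yield every non-empty subset, from single features up to the full set."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def best_subset(features, score):
    """Return the subset with the highest score (first one found on ties)."""
    return max(feature_subsets(features), key=score)
```

An exhaustive search is feasible here only because N is small; the number of subsets doubles with every added feature.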

For each single feature and for the best subset of features, that is, the one which achieved the highest total classification accuracy, the discriminant function was computed on the whole dataset in order to provide classification rules.

All the analyses were performed with in-house software developed in MATLAB version R2009b (The MathWorks Inc., Natick, MA).

Results

Table 3 shows the descriptive statistics of nonlinear short-term HRV measures in the enrolled subjects during the rest and stress sessions. Table 4 presents how nonlinear short-term HRV measures vary between rest and stress due to an ongoing university examination. The features SD2, $D_2$, En(0.2), En($r_{chon}$), $\alpha_1$ and $l_{max}$ were significantly reduced during the university examination compared with the rest session, while $l_{mean}$, REC and ShEn increased significantly during stress.

Table 3 Descriptive statistics of nonlinear HRV features during holidays and during university examination
Table 4 Comparison of nonlinear HRV features during holidays and during university examination

Table 5 shows the performance of the classifiers based on single nonlinear HRV measures. The highest 10-fold cross-validation estimate of the total classification accuracy was achieved by SD2 and by $D_2$. Applying the linear discriminant analysis to the whole dataset, we obtained classification rules of the following form: for instance, referring to SD2, if SD2 is lower than 0.0646 ms, the record is classified as under stress, otherwise as in rest condition. Moreover, the classifiers based on REC, En($r_{chon}$) and $\alpha_1$ also obtained a satisfactory accuracy rate (71%).

Table 5 Performance of the classification rules based on single features and on the best subset of features

The classifier achieving the highest accuracy is based on the subset of features SD1, SD2 and En(0.2), obtaining a total classification accuracy rate of 90%. All the performance measures are reported in the last row of Table 5. The classification rule can be expressed as follows:

The record is classified as stress if:

$$10.64 + 203.99 \cdot SD1 - 108.74 \cdot SD2 - 8.26 \cdot En(0.2) > 0$$
(15)
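The rule translates directly into a one-line decision function. The coefficients are those of equation 15; the function name and the input values in the usage lines are hypothetical, and SD1 and SD2 are assumed to be expressed in the same units the authors used when fitting the coefficients, which the text does not restate.

```python
def is_stress(sd1, sd2, en_02):
    """Apply the linear rule of equation 15; True means 'classified as stress'."""
    score = 10.64 + 203.99 * sd1 - 108.74 * sd2 - 8.26 * en_02
    return score > 0


# Hypothetical feature values, chosen only to show both outcomes.
first = is_stress(sd1=0.02, sd2=0.05, en_02=1.0)    # score is about  1.02
second = is_stress(sd1=0.02, sd2=0.10, en_02=1.2)   # score is about -6.07
```

The signs of the coefficients show the direction of each effect: lower SD2 and lower En(0.2) push a record towards the stress class, while a lower SD1 pushes it towards the rest class.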

Furthermore, the classification rule can be represented as in Figure 1: in the 3D space of the features SD1, SD2 and En(0.2), the points under the plane defined by equation (15) are classified as STRESS; those above as REST.

Figure 1

3D plot of the classification rule based on SD1, SD2 and En(0.2). The points in the sub-space under the blue plane were classified as STRESS; the ones in the sub-space above the blue plane were classified as REST.

Among the classifiers based on pairs of features, it is worth reporting, for comparison with other studies, the performance of the classifier based on SD1 and SD2, which achieved a total classification accuracy, sensitivity and specificity of 82%, 79% and 86%, respectively.

Discussion

In this study, we compared within-subject variations of short-term nonlinear HRV measures in healthy subjects between a rest condition and a condition of mental stress due to an ongoing university examination.

Almost all the features measuring the complexity of the time series decreased significantly during the stress session, including $D_2$, En(0.2) and En($r_{chon}$), which are widely used complexity measures for HRV [37].

These findings confirm the results obtained by Anishchenko [13], which showed that Approximate Entropy decreased significantly during the stress condition due to a university examination. Among the approximate entropy measures considered in this study, the one based on the threshold value $r_{chon}$ achieved the highest total classification accuracy. These results support previous findings regarding the capability of $r_{chon}$ to detect different physiological conditions [38].

Furthermore, our findings of decreased complexity measures, in particular $D_2$, are in line with studies on the relationship between heart rate complexity and acute physical stress [7–9] or short-term mental stress [1].

The decreased values of the complexity measures reflect a change towards more stable and periodic behaviour of the heart rate under stress, which may be associated with stronger regularity, decoupling of multimodal integrated networks and deactivation of control loops within the cardiovascular system [39–41]. As interpreted by Schubert [1], this reduction in heart rate complexity during a high-stress condition may reflect a lower adaptability and fitness of the cardiac pacemaker.

The results of the classification for automatic detection of high stress reinforce the findings of the statistical analysis: $D_2$ and En($r_{chon}$) each enable detection of the stress condition with a total classification accuracy rate higher than 70%. Furthermore, SD2, which is a measure of long-term variability, and $\alpha_1$, which provides information about short-term fluctuations, achieved comparable performance.

The combination of features achieving the best results consists of the two parameters of the PP (SD1 and SD2) and a measure of complexity (En(0.2)), and enables detection of the stress condition with a total classification accuracy, sensitivity and specificity of 90%, 86% and 95%, respectively. SD1 was selected in the best combination of features, although the classifier based only on SD1 achieved the lowest performance among those based on single features, because it provides information different from that of the other features, particularly SD2 and En(0.2).

We underline that, even though it is not the best combination, the classifier based on the two parameters of the PP (SD1 and SD2) achieved a total classification accuracy higher than 80%, confirming the usefulness of the PP as a valid marker of mental stress [10, 42].

The performance achieved by the selected subset of nonlinear features is higher than that achieved by selected linear features on the same dataset, as reported in our unpublished observations. Furthermore, the performance of the current study is better than that of the study by Kim [2], who adopted logistic regression on linear HRV features to distinguish highly stressed subjects from less stressed ones, achieving a total classification accuracy of 70%. These comparisons confirm the usefulness of nonlinear HRV features for automatic classification [43].

Controlled breathing was not requested in order not to affect student performance during the university exam. However, the effect of breathing pattern on HRV is a debated question. Some studies [44, 45] showed that different breathing conditions may have an impact on the reproducibility of HRV. In contrast, other studies [46–48] found that such factors did not have a significant impact on HRV reliability, and their findings seem to suggest that HRV is reliable and consistent over time, whether or not respiration is controlled.

In the current study, we focussed only on a few nonlinear methods, namely those implemented in Kubios, a free software package for HRV analysis. Although this choice is a limitation of the current study, it may help other investigators to reproduce the experiment.

As regards the classification method, LDA partially succeeded in separating the two classes, providing an intelligible model. The intelligibility of features and classification rules is strongly appreciated in medical domain data mining [49]. However, the adoption of a linear classifier may represent another limitation of the current study, as it did not enable us to consider nonlinear structures in classification. In future work we will use nonlinear methods such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM) with an adequate kernel, in order to achieve a possible improvement in performance. However, we underline that the computational cost of LDA is lower than that of ANN or SVM, which saves time in operation.

Finally, the results of this paper could extend the use of portable sensing devices, usually adopted in cardiac applications [50, 51], to stress detection.

Conclusions

In conclusion, the results of the current study suggest that nonlinear HRV analysis using short-term ECG recordings could be effective in automatically detecting real-life stress conditions, such as a university examination. The proposed classifier, based on the Poincaré Plot measures and on the Approximate Entropy, enables detection of the condition of stress due to a university examination with a total classification accuracy, sensitivity and specificity of 90%, 86% and 95%, respectively.

Further research on a larger sample and on different stressful conditions will help to further elucidate the findings of this study and the effectiveness of HRV analysis for differentiating between low-stress and high-stress conditions.