An automatic segmentation method for heart sounds

Liu, Qingshu; Wu, Xiaomei; Ma, Xiaojing

doi:10.1186/s12938-018-0538-9

An automatic segmentation method for heart sounds

Research
Open access
Published: 06 August 2018

Volume 17, article number 106, (2018)
Cite this article

Download PDF

You have full access to this open access article

BioMedical Engineering OnLine Aims and scope Submit manuscript

An automatic segmentation method for heart sounds

Download PDF

6139 Accesses
22 Citations
Explore all metrics

Abstract

Background

There are two major challenges in automated heart sound analysis: segmentation and classification. An efficient segmentation is capable of providing valuable diagnostic information of patients. In addition, it is crucial for some feature-extraction based classification methods. Therefore, the segmentation of heart sound is of significant value.

Methods

This paper presents an automatic heart sound segmentation method that combines the time-domain analysis, frequency-domain analysis and time–frequency-domain analysis. Employing this method, the boundaries of heart sound components are first located, and the components are then recognized. Finally, the heart sounds are divided into several segments on the basis of the results of boundary localization and component identification.

Results

In order to evaluate the performance of the proposed method, quantitative experiments are performed on an authoritative heart sound database. The experimental results show that the boundary localization has a sensitivity (Se) of 100%, a positive predictive value (PPV) of 99.3% and an accuracy (Acc) of 99.93%. Moreover, the Se, PPV and Acc of component identification reach 98.63, 99.86 and 98.49%, respectively.

Conclusion

The proposed method shows reliable performance on the segmentation of heart sounds. Compared with previous works, this method can be applied to not only normal heart sounds, but also the sounds with S3, S4 and murmurs, thus greatly increasing the applied range.

Segmentation of Heart Sound by Clustering Using Spectral and Temporal Features

Heart Sound Segmentation in Noisy Environments

Key Techniques and Challenges for Processing of Heart Sound Signals

Background

Cardiovascular disease (CAD) remains the leading cause of death worldwide. In China, CAD accounted for more than 40% of deaths in 2014 [1]. Heart sounds, which are generated by the beating of heart, are considered as an important signal for detecting cardiovascular problems. In general, a normal heart sound comprises two components, namely the first heart sound (S1) and the second heart sound (S2). In some special cases, three additional components, namely the third heart sound (S3), fourth heart sound (S4), and murmurs, may appear together or separately [2]. Heart sounds have been used to diagnose cardiovascular problems for hundreds of years. Even currently, auscultation remains a crucial approach for physicians to learn about patients’ heart health. However, this conventional diagnostic approach is highly subjective and relies largely on physicians’ experience [3]. Thus, an accurate and efficient automated heart sound analysis system is required.

There are two major challenges in developing an automated heart sound analysis tool: segmentation and classification. The primary operations of segmentation are positioning the boundaries of heart sound components and identifying the component types, namely S1, S2, S3, and S4. Several segmentation methods have been reported. However, most of the methods focus on either boundary detection or component identification. In addition, many methods only apply on normal heart sounds, thus severely limiting the application of these methods.

Liang et al. [4] proposed a Shannon-entropy-based heart sound segmentation method to recognize S1 and S2, but they did not study S3, S4, and murmurs. Kumar et al. [5] presented an S3 detection algorithm based on wavelet transform; however, this method could not detect the boundaries of the heart sound components. Moukadem et al. [6] developed an S-transform-based heart sound segmentation method; however, this method can only be applied to normal heart sounds. Springer et al. [7] successfully addressed the segmentation problem of noisy S1 and S2 by utilizing a logistic regression–hidden semi-Markov model. Nevertheless, this study, as most other studies, did not address the segmentation of heart sounds containing S3 and S4. In addition to the aforementioned studies, several other researchers have also contributed to the automated segmentation of heart sounds. For example, Naseri et al. [8] proposed a heart sound segmentation method based on the frequency-energy method, which had excellent performance for various heart sounds. Boutana et al. [9] presented a time–frequency-analysis–based heart sound segmentation method and validated the efficiency of the method through some pathological heart sounds. Tang et al. [10] reported a dynamic-clustering-based segmentation method, which was effective for both normal and abnormal heart sounds. However, these methods all have some of the aforementioned limitations.

Based on the advantages and drawbacks of the methods mentioned above, the present study is focused on developing an automated heart sound segmentation algorithm that can position the boundaries and recognize the components for both normal heart sounds and heart sounds with S3, S4 and murmurs.

In this study, a method that combines time-domain analysis, frequency-domain analysis, and time–frequency-domain analysis was developed, and some novel strategies were proposed. The complete method comprises four parts, namely preprocessing, murmur elimination, boundary detection, and component identification. The overall process is presented in Fig. 1. First, the noisy raw heart sound is standardized in the preprocessing phase. Then, the signal goes through murmur elimination algorithm, where any possible murmurs are considerably eliminated without affecting the characteristics of the heart sound signal. In the following boundary detection phase, the onset and offset of each heart sound component are positioned and marked automatically. Finally, in the component identification phase, the cardiac cycle is calculated first, and the heart sound components are then recognized in each heartbeat on the basis of cardiac cycle.

Methods

Materials

To validate the proposed method, two quantitative experiments were performed, and the heart sounds from the University of Michigan’s Heart Sound & Murmur Library were employed [11]. In the first experiment, 12 types of heart sounds with variable murmurs were used to verify the effect of murmur elimination. The total length of these sounds was 759 s, and the cardiac cycle number reached 888. The second experiment was performed to quantify the effectiveness of boundary detection and component identification. For this purpose, 16 types of sounds, including 2 normal and 14 abnormal types, were used, constituting a total of 1039 s (1251 cycles). The heart sound signals were recorded at 44.1 kHz, and each one lasted for at least 56 s. For the convenience of displaying and testing, each sound was divided into three-heart-cycle segments. Moreover, every two adjacent segments had an overlap of one cardiac cycle.

Preprocessing

Down-sampling

Because the maximal frequency of the heart sounds did not exceed 1 kHz [12], the heart sound signals were down-sampled to 2 kHz according to the Nyquist-Shannon sampling theorem.

Denoising

A wavelet-based denoising method with soft threshold was utilized. The db4 wavelet was chosen as mother wavelet because its shape is similar to that of a heart sound signal, and the scale was established as 7 based on signal–noise-ratio (SNR) and normalized root mean squared error (NRMSE) [13]. The denoised heart sound was written as $HS\left( n \right)$.

Normalization

To standardize the processing of heart sound, the heart sound signals were normalized as

$$HS_{N} \left( n \right) = \frac{HS\left( n \right)}{{{ \hbox{max} }\left( {\left| {HS\left( n \right)} \right|} \right)}} , \quad n = 0,1, \ldots ,N - 1$$

(1)

where N is the total number of sampling points.

Unless otherwise specified, the expressions “original heart sound” and “original signal” appearing in the following refer to HS_N(n).

Murmur elimination

Different from the conventional constant-cutoff-frequency-based murmur elimination method (usually 200 Hz) [8, 14], this paper presents a novel low-pass filter to remove the murmurs, namely the automatic-cutoff-frequency low pass filter (ALPF), whose cutoff frequency is calculated by analyzing the fast Fourier transform (FFT) of the heart sound.

Assuming that the modulus of the FFT sequence of $HS_{N} \left( n \right)$ is $FFT_{H} \left( n \right)$, n = 0, 1, …, N − 1, the envelope of $FFT_{H} \left( n \right)$ is obtained by the moving average method as follows:

$$E_{FFT} \left( n \right) = \frac{1}{{L_{F} + n + 1}}\sum\nolimits_{k = 0}^{{n + L_{F} }} {FFT_{H} \left( k \right)} ,\quad 0 \le n \le L_{F} - 1$$

$$E_{FFT} \left( n \right) = \frac{1}{{2L_{F} + 1}}\sum\nolimits_{{k = n - L_{F} }}^{{n + L_{F} }} {FFT_{H} \left( k \right)} , \quad L_{F} \le n \le N - L_{F} - 1$$

(2)

$$E_{FFT} \left( n \right) = \frac{1}{{L_{F} + N - n}}\sum\nolimits_{{k = n - L_{F} }}^{{n + L_{F} }} {FFT_{H} \left( k \right)} ,\quad N - L_{F} \le n \le N - 1$$

where N is the sampling number of FFT_H(n), and L_F is the neighborhood radius of point n, L_F ≪ N. Note that if L_F is too large, some adjacent local peaks of $E_{FFT} \left( n \right)$ are merged together. Considering this and experimental results, L_F is set to 5. The envelope is also normalized using (1) and is written as $E_{F} \left( n \right)$. Unless otherwise specified, the “FFT coefficient” and “FFT envelope” appearing hereinafter refer to $E_{F} \left( n \right)$.

For the heart sounds without murmurs,$E_{F} \left( n \right)$ is generally comprised of primary peak and side peak (see in Fig. 2b). The primary peak is located in the low-frequency area with the highest amplitude, and the side peak is located in the relatively higher-frequency area with much smaller amplitude. Thus, most energy of heart sounds is concentrated within the frequencies of primary peak, and filtering out the side peak does not affect the shape of heart sounds (see in Fig. 2c, d). Therefore, the primary peak preserves the information of heart sound components (S1, S2, S3 and S4).

For the heart sounds with murmurs, the primary peak still exists, but the side peak is usually merged with the frequency band of murmurs (see in Fig. 2f). Considering that the murmurs usually have higher frequency than the primary peak, the frequency of primary peak’s ending point is an appropriate estimation of the cutoff frequency to remove murmurs.

In this study, the ending point of primary peak is searched in the range of 20 to 200 Hz and is determined as the first valley point (the lowest point between two adjacent spikes) after the primary peak point with FFT coefficient smaller than 0.2. The value 0.2 is determined based on experimental experience. If there is no such an ending point between 20 and 200 Hz, the cutoff frequency is set to 200 Hz.

$HS_{N} \left( n \right)$ is filtered using this frequency, and the filtered sound is then normalized; the normalized sound is denoted as HS_NLP.

Boundary detection

Envelope extraction

After removing the interference of murmurs, the next step is to calculate the onsets and offsets of heart sound components. Because the envelope can reduce the complexity of computing while preserving the location information of the signals, the closing operation of mathematical morphology is utilized to obtain the envelope in the proposed method and is defined as

$$\left[ {f \cdot g} \right]\left( n \right) = \left\{ {\left[ {f \oplus g} \right] \otimes g} \right\}\left( n \right)$$

(3)

where f(n) refers to the input signal, and g(n) is a structure element; ⨁ denotes the dilation operation, and ⨂ denotes the erosion operation. The lengths of f(n) and g(n) are P and Q respectively, and generally P > Q [15, 16].

The choice of structure element g(n) directly affects the shape of the envelope obtained by this operation [17]. In this study, in order to make the shape of envelope simple, g(n) is designed as

$$g\left( n \right) = 0, \quad 0 \le n \le Q$$

(4)

Moreover, Q, the length of $g\left( n \right)$, is also a significant parameter. According to physiological knowledge, in general, the time interval between adjacent heart sound components (S1S2 interval, S2S3 interval, S4S1 interval, etc.) is no less than 100 ms [8, 18], which corresponds to 200 points under the sampling rate of 2 kHz. Thus, Q should be smaller than 200. In this study, Q was set as 30 on the basis of the experimental results.

The $f\left( n \right)$ in (3) is replaced by HS_NLP, and the heart sound envelope is obtained. The envelope is then normalized. The result is denoted as E_NLP.

Although the ALPF can effectively reduce murmurs, the filtered heart sound may retain some low-amplitude residuals, which are also reflected in the envelope and affect the detection of boundaries. In this method, three operations are used to overcome this problem.

Thresholding processing

The first operation is thresholding.

For some murmurs, such as early systolic murmurs, their onsets partially overlap the offsets of S1s. After filtering by ALPF, the residues of murmurs may still be linked with S1s. Consequently, the murmur residues and S1s may be mixed together in $E_{NLP}$. Therefore, the purpose of thresholding is to separate the boundaries of murmurs from those of S1s to conduct the subsequent operations. Consequently, a small threshold can complete this task. Due to the use of small threshold, the boundary information of the heart sound components can be preserved as much as possible.

The envelope $\tilde{E}_{NLP}$ after thresholding is calculated as

$$\tilde{E}_{NLP} = \left\{ {\begin{array}{ll} E_{NLP} ,& \quad E_{NLP} \ge \theta \\ 0,&\quad E_{NLP} < \theta \\ \end{array} } \right.$$

(5)

where θ = min(θ_A, θ_C), $\theta_{A} = \lambda \cdot \sqrt {\frac{1}{N}\mathop \sum \nolimits_{n = 0}^{N - 1} \left[ {E_{NLP} \left( n \right) - \mu } \right]^{2} }$, $\lambda = 0.8, \mu = \frac{1}{N}\mathop \sum \nolimits_{m = 0}^{N - 1} E_{NLP} \left( m \right)$ and θ_C = 0.025. The parameter θ_A enables the algorithm to determine an appropriate threshold based on the characteristics of the signal itself, and θ_C is a reference threshold, preventing error in θ_A; the value of θ_C is determined by experimental results.

Opening operation

After thresholding, the boundaries of heart sound components and interfering residues are separated. The remaining residues are considerably smaller than the heart sound components in both duration and amplitude. Because the opening operation of mathematical morphology can reduce spikes, it is utilized as the second stage of residue removal. The opening operation is defined as [19]

$$\left[ {f \circ g} \right]\left( n \right) = \left\{ {\left[ {f \otimes g} \right] \oplus g} \right\}\left( n \right)$$

(6)

The g(n) in opening operation is still defined using (4), but the length of g(n) is denoted as Q’. Besides, the parameters in (6) are same as those in (3).

As in the closing operation, Q’ is also a significant parameter. Because the average durations of S1 and S2 are approximately 100 ms and the durations of S3 and S4 are approximately 50 ms [8], which correspond to 200 and 100 sampling points, respectively. Thus, Q’ must be smaller than 100; in this study, it was set to 50 based on experimental results. The results of the opening operation are normalized, and the normalized signal is denoted as $\tilde{E}_{NO}$.

Energy thresholding processing

Finally, to completely exclude the interference components from the envelope, the energy thresholding approach is used. Considering that $\tilde{E}_{NO}$ is comprised of points of zero and non-zero, i.e., the regions located by S1, S2, S3, S4 and few murmur residues are non-zero, and the rest regions are zero. Therefore, the boundaries of each component (including the interference components escaping the opening operation) can be automatically obtained as follows:

$$If \tilde{E}_{NO} \left( i \right) = = 0 \& \& \tilde{E}_{NO} \left( {i + 1} \right) \ne 0,\quad i = 0,1, \ldots N - 2$$

then, the ith point is determined as a potential onset;

$$If \tilde{E}_{NO} \left( {j - 1} \right) \ne 0 \& \& \tilde{E}_{NO} \left( j \right) = = 0,\quad j = 1,2, \ldots N - 1$$

the jth point is determined as a potential offset. The potential onset and the potential offset always appear in pairs and these onsets and offsets are denoted as SPT(i) and EPT(j) respectively, where $i, j = 0,1, \ldots ,M - 1$. M is the number of potential onsets/offset.

Consequently, the energy of each component can be obtained as

$$E_{k} = \sum\nolimits_{i = SPT\left( k \right)}^{EPT\left( k \right)} {\tilde{E}_{NO} {}^{2}\left( i \right)} ,\quad k = 0,1, \ldots ,M - 1$$

(7)

Assuming for each E_k,

$$If \ E_{k} < \eta$$

then the kth potential onset and offset are considered invalid and removed from $SPT$ and EPT. Moreover, the points between these invalid onsets and offsets are determined as interference parts and are set to zero using (8)

$$\tilde{E}_{NO} \left( n \right) = \left\{ {\begin{array}{l} {0,\ {SPT}\left( k \right) \le n \le EPT\left( k \right), \ 0 \le k \le M - 1, \ if \ E_{k} < \eta } \\ {\tilde{E}_{NO} \left( n \right), \ else } \\ \end{array} } \right.$$

(8)

In this inequality, $\eta$ is set to 0.25 based on experimental experience. The remaining potential points are determined as the final onsets and offsets of the heart sound components.

Component identification

Cardiac cycle calculation

The last step is to recognize the heart sound components. Considering the quasi-periodic nature of heart sounds, this step can be more efficiently accomplished if the cardiac cycle is calculated. In some studies, the cardiac cycle was calculated by using the partial autocorrelation function (PACF) [10, 15]. However, because of the inherent defects of PACF, the calculating results are not satisfactory. In order to overcome the shortcomings of PACF, this study proposes a cardiac cycle calculation method based on the unbiased autocorrelation function (UACF), considerably improving the applicability.

The PACF and UACF are defined as (9) and (10), respectively.

$$R\left( m \right) = \frac{1}{N}\sum\nolimits_{n = 0}^{N - m - 1} {\tilde{E}_{NO} \left( n \right)\tilde{E}_{NO} \left( {n + m} \right)}$$

(9)

$$R^{\prime}\left( m \right) = \frac{1}{N - m}\sum\nolimits_{n = 0}^{N - m - 1} {\tilde{E}_{NO} \left( n \right)\tilde{E}_{NO} \left( {n + m} \right)}$$

(10)

where $m = 0,1, \ldots N - 1, N$ is the sampling number of $\tilde{E}_{NO}$. Figure 3 demonstrates the PACF and UACF.

For PACF, as m increases, the number of sampling points involved in multiplication and accumulation decreases gradually. Consequently, the primary peaks of the PACF exhibit gradual attenuation (see in Fig. 3b), causing the side peaks between the primary peaks to be interference of the cardiac cycle calculation. By contrast, the UACF only averages the terms that are involved in multiplication and accumulation, thereby overcoming the drawback of primary peak’s attenuation in the PACF.

Figure 3c shows that there is a relatively significant difference in amplitude between the primary peaks and side peaks. To further expand the amplitude difference, square energy, which is defined in (11), is utilized. After obtaining the energy signal, thresholding is used to remove the side peaks. The energy signal of the UACF and the energy signal after thresholding are denoted as $E_{{R^{\prime}}}$ and $\tilde{E}_{{R^{\prime}}}$, respectively.

$$E_{{R^{\prime}}} \left( m \right) = R^{{\prime}^{2}} \left( m \right),\quad m = 0,2, \ldots, N - 1$$

(11)

$$\tilde{E}_{{R^{\prime}}} = \left\{ {\begin{array}{ll} E_{{R^{\prime}}} ,&\quad E_{{R^{\prime}}} \ge \sigma \\ 0,&\quad E_{{R^{\prime}}} < \sigma \\ \end{array} } \right.$$

(12)

where $\sigma = 0.4 \times { \hbox{max} }\left( {E_{{R^{\prime}}} } \right)$.

The last peak of $\tilde{E}_{{R^{\prime}}}$ is likely to be a side peak that escaped from thresholding because of its relatively higher amplitude (see in Fig. 4c). This kind of side peak is usually generated by the multiplication and accumulation of S1 and S2. In order to calculate the cardiac cycle accurately, the last peak of $\tilde{E}_{{R^{\prime}}}$ is forcibly removed regardless of whether it is a side peak or a primary peak (see in Fig. 4d). Finally, the UACF of $\tilde{E}_{{R^{\prime}}}$ with the last peak removed is calculated, and this UACF is denoted as R_Final (see in Fig. 4e). Then, thresholding is performed on R_Final with a threshold of 0.5 × max (R_Final), leaving only primary peaks in the thresholded R_Final. The intervals between adjacent peaks of thresholded R_Final are then calculated and averaged to obtain the average cardiac cycle. Figure 4 shows the main procedures of cardiac cycle calculation.

Component recognition

After determining the cardiac cycle, the heart sound components can be recognized beat by beat. Under normal circumstances, in terms of the number of components in one heartbeat, heart sounds can be divided into three categories: two components (S1 and S2), three components (S1, S2, S3, or S4) and four components (S1, S2, S3, and S4). Moreover, in the time domain, S1S2 interval < S2S1 interval, S2S3 interval < S3S1 interval, and S2S4 interval > S4S1 interval, whereas in the frequency domain, the frequencies of S1 and S2 are usually higher than those of S3 and S4 [8, 15]. These priori knowledge along with the mentioned three categories can be utilized to recognize the heart sound components.

Figure 5 presents the overall process of component recognition. The algorithm finds a heartbeat based on cardiac cycle and automatically obtains its component number. Then, the recognition processing is performed according to the component number found.

For the convenience of expression, in the following part, the jth component of the ith heartbeat is denoted as C ⁱ_j , and the time interval between C ^p_m and C ^q_n is denoted as T ^{p,
q}_mn .

Assuming that the $i$ th heartbeat is being analyzed, then the recognition processing can be expressed as follows.

Two components: In this case, T ^{i,
i}₁₂ and T ^{i,
i+1}₂₁ are calculated. If T ^{i,
i}₁₂ < T ^{i,
i+1}₂₁ , C ⁱ₁ is identified as S1, and C ^j₂ is identified as S2; otherwise, C ⁱ₁ is S2, and C ^j₂ is S1.

Three components: In this case, because S3 and S4 are less than S1 and S2 in frequency, these three components can be recognized using three steps: detecting S3/S4, recognizing S1 and S2, and identifying S3/S4.

Time–frequency analysis is required for the detection of S3 and/or S4. In this study, S-transform is employed to accomplish this task. Assuming that the signal to be analyzed is h(n), n = 0, 1, …, N − 1, its Fourier transform is $H\left( {k/NT} \right),k = 0,1, \ldots ,N - 1,$ where T is the sampling interval. Then, the S-transform of $h\left( n \right)$ is given by [20].

$$\left\{ {\begin{array}{l} {S\left( {jT,\frac{n}{NT}} \right) = \sum\nolimits_{m = 0}^{N - 1} {H\left( {\frac{m + n}{NT}} \right)e^{{ - \frac{{2\pi^{2} m^{2} }}{{n^{2} }}}} e^{{\frac{i2\pi mj}{N}}} } ,\quad n \ne 0} \\ {S\left( {jT,0} \right) = \frac{1}{N}\sum\nolimits_{m = 0}^{N - 1} {\left( {\frac{m}{NT}} \right)} ,\quad n = 0} \\ \end{array} } \right.$$

(13)

where j, m, n = 0, 1, …, N − 1.

To obtain sufficient time–frequency information, the S-transform is directly performed on the original heart sound signal HS_N(n). The result is a complex matrix and is written as S. Suppose that the element of the mth row in the nth column of the matrix is S_m,n = a + jb, then the modulus of $S_{m,n}$ is defined as.

$$\left| S \right|_{m,n} = |S_{m,n} | = \sqrt {a^{2} + b^{2} }$$

(14)

The modulus of each element of S is calculated, and the modulus matrix |S| is then obtained. Unless otherwise specified, both the expression “S transform” and “S coefficient” in the following text refer to $\left| S \right|$.

The two-dimensional |S| matrix is relatively difficult to analyze; thus, the one-dimensional instantaneous frequency is utilized to detect S3 and S4, and is defined as.

$$f_{H} \left( n \right) = \frac{{\mathop \sum \nolimits_{m = 0}^{N - 1} F\left( m \right) \cdot \left| S \right|_{m,n} }}{{\mathop \sum \nolimits_{m = 0}^{N - 1} \left| S \right|_{m,n} }}$$

(15)

where $F\left( m \right) = \frac{m}{NT} = \frac{{m \cdot f_{s} }}{N}$ represents the frequency of the mth row of the S-transform, and f_s is the sampling rate. Because the maximal frequency of the sampled signal is half of the sampling frequency, $F\left( m \right)$ was set as $\frac{1}{2}\frac{m}{NT} = \frac{{m \cdot f_{s} }}{2N},m = 0,1, \ldots ,N - 1$ in this study.

Furthermore, the boundaries of heart sound components are obtained in the boundary detection phase (i.e., SPT and EPT). Therefore, by extracting the parts between these onsets and offsets from f_H(n), the instantaneous frequency of heart sound components is obtained (see in Fig. 6(3)).

Finally, the average instantaneous frequency is calculated. Figure 6a–d present an example of these operations. It is observed that the average instantaneous frequency of S4 is significantly lower than those of S1 and S2. Similarly, the instantaneous frequency of S3 is also the lowest among S1, S2 and S3. Consequently, the component with the lowest frequency of each heartbeat is S3 or S4, which means that the previous component is S2 and the next is S1.

Without loss of generality, assuming that C ⁱ₂ is the component with the lowest frequency, C ⁱ₁ and C ⁱ₃ can be easily determined as S2 and S1, respectively. Then, T ^{i,
i}₁₂ and T ^{i,
i}₂₃ are calculated and compared. If T ^{i,
i}₁₂ < T ^{i,
i}₂₃ , C ⁱ₂ is recognized as S4; otherwise, if T ^{i,
i}₁₂ > T ^{i,
i}₂₃ , C ⁱ₂ is S3.

Four components: In this case, the four components are arranged in the order: S1 → S2 → S3 → S4 → ··· S1 → S2. Thus, the first one of the two components with the lowest instantaneous frequencies is S3, and the second is S4. Moreover, the component preceding the detected S3 and S4 is S2, and the following component is S1.

After all the components in the ith heartbeat are recognized, the algorithm finds the next beat, and performs the same recognition processing. When all the heartbeats are analyzed, the recognition process is finished.

Finally, the heart sounds are divided into segments. For the sounds containing S1 and S2 only (including the sounds with murmurs), they are segmented into S1, S1S2 interval, S2 and S2S1 interval. For the sounds containing extra S3/S4 (sounds with murmurs are also included), they are segmented into S1, S1S2 interval, S2, S2S3/S2S4 intervals, S3/S4, S3S1/S4S1 intervals.

Results

Experimental setup

Murmur elimination evaluation

For the evaluation of murmur elimination, a novel index, the signal murmur ratio (SMR), is proposed as follows:

$$SMR = 10 \cdot lg\frac{{\mathop \sum \nolimits_{n \in U} X^{2} \left( n \right)}}{{\mathop \sum \nolimits_{m \in V} X^{2} \left( m \right)}}$$

(16)

where X(n) is the heart sound signal, $U$ is the region where heart sound components (S1, S2, S3, and S4) are located, and V is the region where murmurs are located. To ensure the validity of the indicator, both U and $V$ are manually determined from the original signal HS_N(n). Thereby, for a certain heart sound, U and V are constant when calculating SMRs of the murmur-reduced heart sound and the non-murmur-reduced heart sound (see in Fig. 7). A higher SMR indicates a smaller murmur energy proportion in signal $X\left( n \right)$, which means a stronger murmur removing effect.

Boundary detection and component identification evaluation

To validate the performance of boundary detection and component identification, three classical metrics, sensitivity (Se), positive predictive value (PPV), and accuracy (Acc) are calculated. These three indices are defined as follows:

$${\text{Sensitivity}}\left( {\text{Se}} \right) = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} \times 100{\text{\% }}$$

(17)

$${\text{Positive predictive value }}\left( {\text{PP}} \right) = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} \times 100{\text{\% }}$$

(18)

$${\text{Accuracy }}\left( {\text{Acc}} \right) = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} \times 100{\text{\% }}$$

(19)

In this experiment, the onsets and offsets of heart sound components were automatically marked with green circles and red stars by the algorithm (see in Fig. 8c), respectively. In addition, the names of these components were automatically labeled above them (see in Fig. 8d). The results of boundary localization and component identification were judged by human and quantified using (17)–(19), in which the FN means false negative, FP means false positive, TN means true negative and TP means true positive.

Results of murmur elimination

The performance of ALPF was compared with two existing methods. The first method is the conventional low-pass filter with a constant cutoff frequency of 200 Hz (200-LPF), and the second method is the wavelet package transform method with a db10 mother wavelet and a scale of 5 (WPT) [14]. In the WPT method, the heart sound components were reconstructed by nodes (5, 0) to (5, 5) to remove interference components such as murmurs. The SMR of the heart sounds filtered by the ALPF, WPT and 200-LPF are denoted as SMR_A, SMR_WPT, and SMR₂₀₀, respectively. Moreover, the SMR of the original heart sounds HS_N(n) was calculated and denoted as $SMR_{Ori}$. In addition, the SMR gains of these three methods were achieved by subtracting SMR_Ori from $SMR_{A}$, $SMR_{WPT}$ and $SMR_{200}$, respectively. All the experimental parameters of these three methods were identical.

Tables 1 and 2 show the statistical results of SMR and SMR gains. It is observed that the proposed ALPF has the highest score of SMR and SMR gains. The overall performances were evaluated by two indices, arithmetic average (AA) and weighted average (WA). AA was achieved by simply averaging the SMRs of 12 types of heart sounds and WA was affected by the weight of each type’s cycle number. In terms of SMR, the AA (± SD) and WA (± SD) of SMR_A reach 26.95 ± 4.30 db and 26.92 ± 4.14 db, respectively; by contrast, the AA and WA are 24.42.07 ± 4.54 and 24.38 ± 4.43 for SMR_WPT and 22.61 ± 5.86 and 22.61 ± 5.29 for SMR₂₀₀. In terms of SMR gains, the heart sounds filtered by the ALPF have an arithmetic average and weighted average SMR gain (± SD) of 8.66 ± 4.52 db and 8.59 ± 4.37 db; by contrast, the arithmetic average and weighted average SMR gains (± SD) are 6.07 ± 3.88 and 5.99 ± 3.76 for WPT and 4.26 ± 4.51 and 4.21 ± 4.51 for 200-LPF.

Table 1 Statistics of SMR

Full size table

Table 2 Statistics of SMR gains

Full size table

Figure 7 shows an example of the murmur elimination results using the three methods.

Results of boundary detection and component identification

A detailed quantitative description of the boundary positioning result is shown in Table 3. It is observed that the proposed method achieves a sensitivity of 100% for all 16 types of heart sounds, and both the PPV and accuracy are 99.93%, indicating an outstanding performance.

Table 3 Quantitative estimation of positioning

Full size table

Table 4 presents the quantitative data of component identification. Because of the good positioning effect, the identification performance is high, with an average Se of 98.63%, an average PPV of 99.86%, and an average accuracy of 98.49%.

Table 4 Quantitative estimation of identification

Full size table

Figure 8 shows the complete process of the proposed method, including murmur elimination, boundary detection, and component identification. The first component (S4) in Fig. 8d is not labeled. This is because the identification of S4 requires information obtained from the previous cardiac cycle. However, this cycle lacks the information. The unlabeled S4 component is identified in the previous segment. Therefore, there is no problem of missed detection.

Discussion

Tables 1 and 2 show the results of murmur elimination, all three methods exhibit an improvement in SMR compared with SMR_Ori. As a result of selecting the appropriate cutoff frequency, the proposed ALPF has the highest SMR gain among all the tested methods. The arithmetic and weighted average SMR gains of ALPF reach 8.66db and 8.59db, which are 42.67% and 43.41% higher than WPT’s 6.07db and 5.99db, and 103.29% and 104.04% higher than 200-LPF’s 4.26db and 4.21db, respectively, indicating optimal murmur elimination performance.

In order to verify the superiority of the proposed method for murmur elimination, in this study, the paired two-sample t-test was performed on the SMR gains calculated for 12 types of heart sounds filtered with these three methods. In the paired two-sample t-test, the significance level α was set to 0.01. Under this condition, the rejection region is |t| ≥ t_α/2(n − 1) = t_0.005(11) = 3.1058, and t_A,WPT = 5.5499 > 3.1058, t_A,200 = 5.5039 > 3.1058, and t_WPT,200 = 2.6273 < 3.1058. Thus, the proposed murmur elimination method was verified to have significant advantages compared with WPT and 200-LPF.

For boundary detection results shown in Table 3, among all 16 types of heart sounds, the proposed method failed on one signal labeled as a mid-systolic click: two click components were not removed in boundary detection process and were erroneously positioned as true components. Time–frequency analysis was then performed to determine the reason for this.

The time–frequency analysis revealed that these click components have a relatively high number of overlaps with S1 and S2 in frequency, resulting in the failure in removing them. Compared with this sound, the other signals’ interference components (clicks, murmurs, and residues of noise) show a smaller overlap with the true components; thus, the positioning score is higher. In fact, setting a higher threshold θ can completely remove the clicks, resulting in a higher performance score. However, the higher threshold causes a larger deviation between the calculated boundaries and the real values. After weighing the pros and cons, this approach was abandoned.

The identification results of Table 4 shows that the fourth signal (Umich No. 4) is consistent with the result of positioning. A difference appears in the result of the 13th signal (Umich No. 14). The FN of the signal reaches 36, resulting in a low Se and low Acc of 73.91%. Initially, this result was puzzling because the signal was labeled as normal. After a comprehensive analysis, the rhythm of some parts of this heart sound was found to be irregular, resulting in the failure of heart cycle calculation in these parts. Therefore, components of these parts could not be recognized.

For the performance evaluation, because the cycle number of each heart sound was very close and the performance of each type of signal was analyzed separately, the utilized dataset did not suffer from the problem of unbalance.

Some other operating characteristics obtained in this study and 11 previous studies are listed in Table 5 for comparison. Both the Se and PPV for positioning in literature [6] consist of two values separated by a slash. The two values are for normal and pathological cases of heart sounds, respectively. As mentioned in the “Background”, a complete segmentation is comprised of boundary detection and component identification. However, several existing works only focus on one of them. Moreover, some studies could be applied only to the basic situation, for example, heart sounds without murmurs. Therefore, in addition to achieving excellent performance, the proposed approach achieves the functions of both boundary localization and component identification and can be applied in more conditions.

Table 5 A comprehensive comparison with other studies

Full size table

Conclusions

This paper presents an accurate heart sound segmentation algorithm that combines time-domain, frequency-domain and time–frequency-domain analysis. Compared to existing studies, this method is applicable to a wide range of heart sounds, from normal to those containing S3, S4 and various murmurs. To verify this method, quantitative experiments were performed using the University of Michigan’s Heart Sound & Murmur Library, an authoritative open database. The experimental materials incorporated two types of normal heart sounds and 14 types of abnormal heart sounds. The results show that the boundary localization has an average Se of 100%, an average PPV of 99.3% and an average Acc of 99.93%. Moreover, the Se, PPV and Acc of the component identification reach 98.63%, 99.86% and 98.49%, respectively, indicating outstanding performance of the proposed method. There are still some shortcomings of this work. For example, the component identification relies on the success of cardiac cycle calculation; therefore, this method cannot be applied to the heart sound with severe arrhythmia because of the failure to achieving accurate cardiac cycle by using UACF. The study of segmentation provides a good basis for extracting significant features of heart sounds. Therefore, the further study will focus on the classification of heart sounds.

References

Chen W, Gao R, Liu L. Report on cardiovascular diseases in China 2014. Chin Circ J. 2015;30:617–22.
Google Scholar
Tseng YL, Ko PY, Jaw FS. Detection of the third and fourth heart sounds using Hilbert-Huang transform. Biomed Eng Online. 2012;11:8.
Article Google Scholar
Yuenyong S, Nishihara A, Kongprawechnon W, et al. A framework for automatic heart sound analysis without segmentation. Biomed Eng Online. 2011;10:13.
Article Google Scholar
Liang H, Lukkarinen S, Hartimo I. Heart sound segmentation algorithm based on heart sound envelogram. In: Computers in Cardiology 1997. Lund: IEEE; 1997. 7–10 September 1997, p. 105–8.
Kumar D, et al. Third heart sound detection using wavelet transform-simplicity filter. In: Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE. IEEE, 2007. p. 1277–81.
Moukadem A, et al. A robust heart sounds segmentation module based on S-transform. Biomed Signal Process Control. 2013;8:273–81.
Article Google Scholar
Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans Biomed Eng. 2016;63:822–32.
Google Scholar
Naseri H, Homaeinezha MR. Detection and boundary identification of phonocardiogram sounds using an expert frequency-energy based metric. Ann Biomed Eng. 2013;41:279–92.
Article Google Scholar
Boutana D, Benidir M, Barkat B. Segmentation and identification of some pathological phonocardiogram signals using time-frequency analysis. IET Signal Process. 2011;5:527–37.
Article MathSciNet Google Scholar
Tang H, et al. Segmentation of heart sounds based on dynamic clustering. Biomed Signal Process Control. 2012;7:509–16.
Article Google Scholar
Heart Sound & Murmur Library. University of Michigan, US. http://www.med.umich.edu/lrc/psb_open/html/repo/primer_heartsound/primer_heartsound.html. Accessed 7 Apr 2017.
Choi S, Jiang Z. Comparison of envelope extraction algorithms for cardiac sound signal segmentation. Expert Syst Appl. 2008;34:1056–69.
Article Google Scholar
Qiu S, et al. A stimulus artifact removal technique for SEMG signal processing during functional electrical stimulation. IEEE Trans Biomed Eng. 2015;62:8.
Article Google Scholar
Zheng Y, Guo X, Ding X. A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification. Expert Syst Appl. 2015;42:2710–21.
Article Google Scholar
Chen Y, Duan H. A QRS complex detection algorithm based on mathematical morphology and envelope. In: Engineering in Medicine and Biology Society. IEEE-EMBS 2005. 27th Annual International Conference of the. IEEE, 2006. p. 4654–57.
Chu CHH, Delp EJ. Impulsive noise suppression and background normalization of electrocardiogram signals using morphological operators. IEEE Trans Biomed Eng. 1989;36:262–73.
Article Google Scholar
Trahanias PE. An approach to QRS complex detection using mathematical morphology. IEEE Trans Biomed Eng. 1993;40:201–5.
Article Google Scholar
Barma S, et al. Detection of the third heart sound based on nonlinear signal decomposition and time–frequency localization. IEEE Trans Biomed Eng. 2016;63:1718–27.
Article Google Scholar
Hu JL, Bao SD. An approach to QRS complex detection based on multiscale mathematical morphology. In: Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on. Vol. 2. IEEE, 2010.
Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process. 1996;44:998–1001.
Article Google Scholar
Varghees VN, Ramachandran KI. A novel heart sound activity detection framework for automated heart sound analysis. Biomed Signal Process Control. 2014;13:174–88.
Article Google Scholar
Sepehri AA, et al. A novel method for pediatric heart sound segmentation without using the ECG. Comput Methods Appl Mech Eng. 2010;99:43–8.
Google Scholar
Pedrosa J, Castro A, Vinhoza TTV. Automatic heart sound segmentation and murmur detection in pediatric phonocardiograms. In: Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. IEEE, 2014.
Wang P, et al. Phonocardiographic signal analysis method using a modified hidden markov model. Ann Biomed Eng. 2007;35:367–74.
Article Google Scholar
Schmidt SE, et al. Segmentation of heart sound recordings from an electronic stethoscope by a duration dependent hidden-markov model. In: Computers in cardiology. Bologna: IEEE; 14–17 September 2008, p. 345–48.
Moukadem A, et al. Localization of heart sounds based on S-transform and radial basis function neural network. In: 15th Nordic-Baltic Conference on Biomedical Engineering and Medical Physics (NBC 2011). Springer: Heidelberg, 2011.
Zhong L, et al. A robust envelope extraction algorithm for cardiac sound signal segmentation. In: Bioinformatics and Biomedical Engineering, 2011 5th International Conference on. IEEE, 2011.
Wang X, et al. Detection of the first and second heart sound using heart sound energy. In: Biomedical Engineering and Informatics, 2009. BMEI’09. 2nd International Conference on. IEEE, 2009.

Download references

Authors’ contributions

QL developed the main algorithm. XW and XM provided professional verification of the experimental results. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank Science and Technology Commission of Shanghai Municipality for financial support.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The datasets analyzed during the current study are available in the [Heart Sound & Murmur Library of UMich] repository, (http://www.med.umich.edu/lrc/psb_open/html/repo/primer_heartsound/primer_heartsound.html).

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Funding

This work received financial support from Science and Technology Commission of Shanghai Municipality, Grant No.15DZ2251700 and 13DZ1941802.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Department of Electronic Engineering, Fudan University, Room 522 B, Science Building, 220 Handan Rd., Shanghai, China
Qingshu Liu & Xiaomei Wu
Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention (MICCAI) of Shanghai, 138 Medical College Rd., Shanghai, China
Xiaomei Wu
Children’s Hospital of Fudan University, 399 Wanyuan Rd., Shanghai, China
Xiaojing Ma

Authors

Qingshu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaomei Wu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaojing Ma
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaomei Wu.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Liu, Q., Wu, X. & Ma, X. An automatic segmentation method for heart sounds. BioMed Eng OnLine 17, 106 (2018). https://doi.org/10.1186/s12938-018-0538-9

Download citation

Received: 23 May 2018
Accepted: 30 July 2018
Published: 06 August 2018
DOI: https://doi.org/10.1186/s12938-018-0538-9

An automatic segmentation method for heart sounds

Abstract

Background

Methods

Results

Conclusion

Similar content being viewed by others

Segmentation of Heart Sound by Clustering Using Spectral and Temporal Features

Heart Sound Segmentation in Noisy Environments

Key Techniques and Challenges for Processing of Heart Sound Signals

Background

Methods

Materials

Preprocessing

Down-sampling

Denoising

Normalization

Murmur elimination

Boundary detection

Envelope extraction

Thresholding processing

Opening operation

Energy thresholding processing

Component identification

Cardiac cycle calculation

Component recognition

Results

Experimental setup

Murmur elimination evaluation

Boundary detection and component identification evaluation

Results of murmur elimination

Results of boundary detection and component identification

Discussion

Conclusions

References

Authors’ contributions

Acknowledgements

Competing interests

Availability of data and materials

Consent for publication

Ethics approval and consent to participate

Funding

Publisher’s Note

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation