
Introducing Negative Evidence in Ensemble Clustering: Application in Automatic ECG Analysis

  • David G. Márquez
  • Ana L. N. Fred
  • Abraham Otero
  • Constantino A. García
  • Paulo Félix
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9370)

Abstract

Ensemble clustering generates data partitions by using different data representations and/or clustering algorithms. Each partition provides independent evidence to generate the final partition: two instances falling in the same cluster provide evidence towards them belonging to the same cluster in the final partition.

In this paper we argue that, for some data representations, the fact that two instances fall in the same cluster of a given partition could provide little to no evidence towards them belonging to the same cluster in the final partition. However, the fact that they fall in different clusters could provide strong evidence against them belonging to the same cluster.

Based on this concept, we have developed a new ensemble clustering algorithm which has been applied to the heartbeat clustering problem. By taking advantage of the negative evidence we have decreased the misclassification rate over the MIT-BIH database, the gold standard test for this problem, from 2.25 % to 1.45 %.

Keywords

Clustering ensembles · Evidence accumulation · Heartbeat clustering · Heartbeat representation · Hermite functions · ECG

1 Introduction

Clustering is defined as the task of grouping similar objects into groups called clusters. This process is one of the steps of exploratory data analysis and, due to its usefulness, has been addressed by researchers in many fields. The appropriate clustering algorithm and parameter settings (including the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and the intended use of the results. Therefore, we cannot claim that a given algorithm will always perform better than others [3]. Furthermore, without prior knowledge about the underlying data distributions, different clustering solutions may seem equally plausible. Part of the data analysis process is selecting the best clustering algorithm for the given data based on the information available about the problem. This is done by applying and tuning several algorithms in an iterative process to determine the best choice, which is time-consuming and prone to error.
Fig. 1.

(a) shows the three natural clusters present in the dataset. (b) and (c) show two partitions obtained by the K-means algorithm with K = 3 and K = 20, respectively. (d) shows the result of combining into a single partition 100 partitions created by the K-means algorithm using values of K between 7 and 37.

Inspired by the work on classifier combination, clustering combination approaches have been developed [1, 6, 7, 11, 22] and have emerged as a powerful method to improve the robustness and stability of clustering results. By combining the strengths of many individual algorithms we can improve the overall performance. Figure 1 shows how, by combining the results of 100 partitions created by the K-means algorithm, which by itself can only produce globular clusters, the three natural non-globular clusters are discovered. These clusters could never be discovered in a single execution of the K-means algorithm. Furthermore, the combination of algorithms also reduces the strong dependence of the clustering results on any single algorithm.

In this paper we extend the clustering ensemble method known as EAC, first proposed in [7], by introducing the concept of negative evidence, which is obtained from the instances that are not clustered together. We also illustrate the proposed method with an application to heartbeat clustering.

2 Clustering Ensembles

The result of a clustering algorithm is a data partition P of n elements organized into k clusters. A clustering ensemble is defined as a set of m different data partitions \(P_{1},P_{2},\dots ,P_{m}\), obtained with different algorithms or different data representations, that are ultimately combined into a final data partition \(P_{*}\). This final data partition \(P_{*}\) should, in general, have better overall performance than any of the individual partitions.

The following sections present our algorithm built on the EAC paradigm. First we explain how the data partitions \(P_{1},P_{2},\dots ,P_{m}\) are created. Then we explain how the partitions are combined to obtain both positive and negative evidence. Finally, we show how the final data partition \(P_{*}\) is created from the evidence.

2.1 Generating the Clustering Ensembles

To harness the potential of clustering ensembles, the data partitions \(P_{1},P_{2},\dots ,P_{m}\) must be different. Employing different sets of features, or perturbing the data with techniques like bagging or sampling, produces different partitions. We can also obtain different partitions by changing the initialization or the parameters of the clustering algorithm, or by using different algorithms. By combining several data partitions, the particular quirks of each one can be abstracted away to find a partition that best summarizes all the results.

Among the various clustering methods, the K-means algorithm, which minimizes the squared-error criterion, is one of the simplest. Its simplicity is one of its biggest advantages, making it computationally efficient and fast. Its reduced number of parameters (typically only the number of clusters) is another strong point. Its major limitation is the inability to identify clusters with arbitrary shapes, ultimately imposing hyperspherical clusters on the data. By combining the results of several K-means executions this limitation can be overcome [8].

In this work we shall use K-means to generate each data partition. We use this method due to its simplicity and small computational cost, but any other partitioning clustering algorithm may be used. To obtain different partitions we shall run the algorithm with random initializations and different values for the number of clusters. The number of clusters k in each execution will be determined at random in a range given by
$$\begin{aligned} k \in [\sqrt{n}/{2},\sqrt{n}] \end{aligned}$$
(1)
where n is the number of instances in the data.
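
As a concrete sketch of this procedure (assuming NumPy and scikit-learn's KMeans as the underlying implementation; the function name and the lower guard of 2 clusters are ours, not from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn as the K-means implementation

def generate_ensemble(X, m=100, seed=None):
    """Generate m K-means partitions of X with k drawn at random from Eq. (1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k_min = max(2, int(np.sqrt(n) / 2))        # guard against k < 2 on tiny data
    k_max = int(np.sqrt(n))
    partitions = []
    for _ in range(m):
        k = int(rng.integers(k_min, k_max + 1))          # random number of clusters
        labels = KMeans(n_clusters=k, n_init=1,          # a single random initialization
                        init="random").fit_predict(X)
        partitions.append(labels)
    return partitions
```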

2.2 Combining the Data Partitions

In the literature there are several methods to combine data partitions in ensemble clustering. In this work we adopt the evidence accumulation method proposed in [7] and expand it by introducing the concept of negative evidence. One of its advantages is that it can combine data partitions with different numbers of clusters.

The evidence accumulation method gathers the evidence in C, an \(n \times n\) matrix, using a voting mechanism. For each data partition, the co-occurrence of a pair of instances i and j in the same cluster will be stored as a vote. The underlying assumption is that instances belonging to the same “natural” cluster are more likely to be assigned to the same cluster in the different data partitions \(P_{1},P_{2},\dots ,P_{m}\).

The final evidence matrix built from the m partitions will be calculated as follows:
$$\begin{aligned} C_{i,j}=\frac{n_{ij}}{m}, \end{aligned}$$
(2)
where \(n_{ij}\) is the number of times the instances i and j are assigned to the same cluster in the \(P_{1},P_{2},\dots ,P_{m}\) partitions.
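
Equation (2) translates directly into a co-association matrix computation; a minimal NumPy sketch, assuming the label vectors produced by the ensemble above:

```python
import numpy as np

def positive_evidence(partitions):
    """Co-association matrix C of Eq. (2): C[i, j] = n_ij / m."""
    m = len(partitions)
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        C += labels[:, None] == labels[None, :]   # one vote per co-occurrence
    return C / m
```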

There are some scenarios where the fact that two instances fall in the same cluster of a given partition provides little information about their natural grouping. This weak evidence could introduce noise in the combination of the partitions, worsening the results. However, in these scenarios the fact that two instances fall into different clusters could provide useful evidence against grouping those instances. As we shall argue later, this is the case for some of the features that permit the identification of some arrhythmia types. We shall call this evidence negative evidence.

We shall gather this negative evidence in an \(n \times n\) matrix \(C^{-}\) that is built as follows:
$$\begin{aligned} C^{-}_{i,j}=-\frac{o_{ij}}{m}, \end{aligned}$$
(3)
where \(o_{ij}\) is the number of times the instances i and j are assigned to different clusters among the \(P^{*}_{1},P^{*}_{2},\dots ,P^{*}_{m}\) partitions. It is important to note that the partitions from which we gather the negative evidence cannot be the same partitions used for gathering positive evidence. In fact, the partitions used to gather negative evidence \(P^{*}_{1},P^{*}_{2},\dots ,P^{*}_{m}\) should be generated with different clustering algorithms or data features, better suited to obtaining this type of evidence. Furthermore, negative evidence can only be used in conjunction with positive evidence: negative evidence only indicates which instances should be in different clusters, not which instances should be in the same cluster.
Finally, positive and negative evidence matrices are combined into a single evidence matrix, E, that will be used to generate the final partition:
$$\begin{aligned} E=C+C^{-}. \end{aligned}$$
(4)
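
The negative evidence of Eq. (3) and the combination of Eq. (4) follow the same pattern; in the sketch below, pos_partitions and neg_partitions are hypothetical names for the two disjoint sets of partitions:

```python
import numpy as np

def negative_evidence(partitions):
    """Matrix C- of Eq. (3): C-[i, j] = -o_ij / m."""
    m = len(partitions)
    n = len(partitions[0])
    O = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        O += labels[:, None] != labels[None, :]   # one vote per separation
    return -O / m

# Eq. (4): combine the evidence from the two disjoint sets of partitions.
# E = positive_evidence(pos_partitions) + negative_evidence(neg_partitions)
```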

2.3 Extracting the Final Data Partition

The last step of our clustering ensemble algorithm is extracting from the matrix E the final data partition \(P_{*}\). To this end we apply the average-link hierarchical clustering algorithm, which was developed for clustering correlation matrices such as our evidence matrix [20]. Of the several algorithms we tried, it showed the best performance.

Most clustering ensemble methods rely on a user-specified number of clusters to build the final data partition \(P_{*}\). In [8] an alternative is proposed: the use of a lifetime criterion to determine the number of clusters. In an agglomerative algorithm, such as average-link, each instance starts in its own cluster, and in each iteration the closest pair of clusters is merged until only one cluster remains. The k-cluster lifetime is defined as the absolute difference between the thresholds on the dendrogram that lead to the identification of k clusters. This value is calculated for all possible values of k (i.e., all possible numbers of clusters), and the number of clusters that yields the highest lifetime is the one selected for the final data partition.
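
A sketch of this extraction step with SciPy's hierarchical clustering; converting the similarity matrix E into a dissimilarity by subtracting it from its maximum is our assumption, since the paper does not spell out this detail:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def extract_final_partition(E):
    """Average-link clustering of E, cut at the number of clusters
    with the highest lifetime."""
    n = E.shape[0]
    D = E.max() - E                       # similarity -> dissimilarity (assumption)
    np.fill_diagonal(D, 0.0)
    Z = linkage(D[np.triu_indices(n, k=1)], method="average")
    heights = Z[:, 2]                     # merge thresholds, non-decreasing
    lifetimes = np.diff(heights)          # lifetimes[i] is for k = n - i - 1 clusters
    k = n - int(np.argmax(lifetimes)) - 1
    return fcluster(Z, t=k, criterion="maxclust")
```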

3 Application to Heartbeat Clustering

Cardiovascular diseases are the leading cause of death in the world and are projected to remain so for the foreseeable future [16]. The analysis of the electrocardiogram (ECG) is an important tool for the study and diagnosis of heart diseases. However, this analysis is a tedious task for clinicians due to the large amount of data, especially in long recordings such as Holter recordings. For example, a 72-hour Holter recording contains approximately 300,000 heartbeats per lead, and a recording can have up to 12 leads. In these cases, the amount of data makes automatic tools that support clinicians a necessity. Furthermore, a disadvantage of visual interpretation of the ECG is the strong dependence of the results on the cardiologist who performs the interpretation.

Although heartbeat detection is a satisfactorily solved problem, the classification of heartbeats based on their origin and propagation path in the myocardium is still an open problem. This task, often referred to as arrhythmia identification, is of great importance for the interpretation of the electrophysiological function of the heart and the subsequent diagnosis of the patient's condition.

Several approaches in the literature estimate the underlying mechanisms from a set of labeled heartbeats [5]. However, this entails a strong dependence on the pattern diversity present in the training set. Inter-patient and intra-patient differences mean that it cannot be assumed that a classifier will yield valid results on a new patient, or even for the same patient over time. Furthermore, class labels only provide gross information about the origin of the heartbeats in the cardiac tissue, losing all the information about their conduction pathways. This approach does not distinguish the multiple morphological families present in a given class, as in multifocal arrhythmias.

Heartbeat clustering aims at grouping together in a cluster those heartbeats that show similar properties, preserving the possible differences between heartbeats of the same class. If the clustering is successful, the cardiologist only has to inspect a few heartbeats of each cluster to interpret all the heartbeats that have fallen into that cluster. No clustering method has shown a significant advantage on the heartbeat clustering problem. Here we propose the use of the clustering ensemble technique described above to group the heartbeats according to their different types.

3.1 ECG Database

In this work we used the MIT-BIH Arrhythmia Database [17], which includes a wide range of arrhythmias and can be considered the gold standard test database for automatic detection of arrhythmias [4, 5, 13, 24]. This database is composed of 48 half-hour, two-channel ambulatory ECG recordings. A lead is a recording of the electrical activity of the heart; different leads record this activity from different positions on the patient's body, providing slightly different information about the heart. The recordings were digitized at 360 Hz and annotated by two or more cardiologists. Each heartbeat is annotated with its position in time and its type (16 different types). The leads used are the modified limb lead II (MLII) and the modified leads V1, V2, V3, V4 and V5.

3.2 Preprocessing

Before extracting the features that represent each heartbeat, the ECG signal was filtered. We applied a wavelet filter to eliminate the baseline drift [2]: the low-frequency component of the signal was reconstructed using the coefficients of the Discrete Wavelet Transform (DWT) and subtracted from the original signal, removing the drift. To eliminate high-frequency noise, a low-pass Butterworth filter with a cutoff frequency of 40 Hz was applied.
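
A possible implementation of this preprocessing with PyWavelets and SciPy; the wavelet family, the decomposition depth and the filter order are our assumptions, as the paper does not specify them:

```python
import numpy as np
import pywt                                 # assumption: PyWavelets for the DWT
from scipy.signal import butter, filtfilt

def preprocess(ecg, fs=360.0):
    """Baseline removal via DWT reconstruction plus a 40 Hz low-pass filter."""
    # Keep only the coarsest approximation: it holds the low-frequency drift.
    coeffs = pywt.wavedec(ecg, "db4", level=9)          # family/depth assumed
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]
    baseline = pywt.waverec(coeffs, "db4")[: len(ecg)]
    detrended = ecg - baseline
    # Zero-phase Butterworth low-pass with a 40 Hz cutoff (order assumed).
    b, a = butter(4, 40.0 / (fs / 2.0), btype="low")
    return filtfilt(b, a, detrended)
```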

3.3 Heartbeat Representation

The choice of data representation has a great impact on the performance of clustering algorithms. In ECG analysis there are several options. One is using the samples of the digital signal as the feature vector; this representation suffers from the high dimensionality of the feature vectors and sensitivity to noise [21]. Another approach is representing the heartbeat by features such as the heights and lengths of the waves that make up the beat, the same features clinicians use when reasoning about the beat; however, it is hard to obtain a robust segmentation of the beat to measure them [14]. The last main approach in the literature is using a linear combination of basis functions to represent the heartbeat [23]. The interpretability of the feature vector is lost with this representation, but it is compact and robust in the presence of noise. Hermite functions are the most widely used basis functions for the representation of heartbeats [9, 10, 12, 13, 18]: they are orthonormal and their shape is similar to that of the heartbeat.

To obtain the Hermite representation we start by extracting an excerpt of 200 ms around the beat annotation of the database. This window is large enough to fit the QRS complex while leaving out the P and T waves. Once the QRS has been taken into account, the T wave provides little additional information for arrhythmia detection, so it is generally not represented in the feature vector. The P wave does provide useful information, but the difficulty of identifying it normally leads to approximating the information it conveys by the distance between consecutive heartbeats [5, 13].

Hermite functions converge to zero at \(\pm \infty \). To achieve this behavior in the ECG, a padding of 100 ms of zeros is added on each side of the 200 ms signal excerpt. The resulting window x[l] of 400 ms is represented as:
$$\begin{aligned} x[l]=\sum _{n=0}^{N-1}c_{n}(\sigma )\,\upphi _{n}[l,\sigma )+e[l], \qquad l=-\left\lfloor \tfrac{W f_{s}}{2} \right\rfloor , -\left\lfloor \tfrac{W f_{s}}{2} \right\rfloor +1, \dots , \left\lfloor \tfrac{W f_{s}}{2} \right\rfloor , \end{aligned}$$
(5)
where N is the number of Hermite functions used, W is the window size in seconds and \(f_{s}\) is the sampling frequency. \(\upphi _{n}[l,\sigma )\) is the n-th discrete Hermite function, obtained by sampling at \(f_{s}\) the n-th continuous Hermite function \(\upphi (t,\sigma )\); \(c_{n}\) are the coefficients of the linear combination; e[l] is the error between x[l] and the Hermite representation; and \(\sigma \) controls the width of the Hermite functions, enabling them to adjust to the width of the QRS. The Hermite functions \(\upphi _{n}[l,\sigma )\), \(0\le n<N\), are defined as:
$$\begin{aligned} \upphi _{n}[l,\sigma )=\frac{1}{\sqrt{\sigma 2^{n}n!\sqrt{\pi }}}\, e^{-(l T_{s})^{2}/2\sigma ^{2}}\, H_{n}(l T_{s}/\sigma ), \end{aligned}$$
(6)
where \( T_{s}\) is the inverse of the sampling frequency.
The Hermite polynomial \(H_{n}(x)\) can be obtained recursively:
$$\begin{aligned} H_{n}(x)=2xH_{n-1}(x)-2(n-1)H_{n-2}(x), \end{aligned}$$
(7)
with \(H_{0}(x)=1\) and \(H_{1}(x)=2x\).
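
Equations (6) and (7) can be evaluated directly; a NumPy sketch (the function name is ours):

```python
import math
import numpy as np

def hermite_function(n, l, sigma, Ts):
    """phi_n[l, sigma) of Eq. (6), with H_n computed via the recursion of Eq. (7)."""
    t = l * Ts                                 # sample times
    x = t / sigma
    H_prev, H_cur = np.ones_like(x), 2.0 * x   # H_0 and H_1
    if n == 0:
        Hn = H_prev
    elif n == 1:
        Hn = H_cur
    else:
        for k in range(2, n + 1):              # Eq. (7)
            H_prev, H_cur = H_cur, 2.0 * x * H_cur - 2.0 * (k - 1) * H_prev
        Hn = H_cur
    norm = 1.0 / np.sqrt(sigma * 2.0 ** n * math.factorial(n) * np.sqrt(np.pi))
    return norm * np.exp(-(t ** 2) / (2.0 * sigma ** 2)) * Hn
```
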
Fig. 2.

Original beat and its Hermite approximations with \(N\) = 3, 6, 9, 12 and 15.

The N coefficients of the linear combination \(c_{n}(\sigma )\), \(0 \le n<N\), and \(\sigma \) compose our representation of the heartbeat. Figure 2 illustrates how heartbeats can be reconstructed from Hermite functions. We can always increase the accuracy of the representation using more functions but, after a certain point, we will start to model noise instead of the QRS complex.

For a given \(\sigma \) the coefficients \(c_{n}(\sigma )\) can be calculated by minimizing the summed squared error of the representation:
$$\begin{aligned} \sum \limits _{l}\left( e[l]\right) ^{2}=\sum \limits _{l}\left( x[l]-\sum \limits _{n=0}^{N-1}c_{n}(\sigma )\upphi _{n}[l,\sigma )\right) ^{2}. \end{aligned}$$
(8)
The minimum of the squared error is easily calculated thanks to the orthogonality property:
$$\begin{aligned} c_{n}(\sigma )=\mathbf {x}\cdot \mathbf {\upphi }_{n}(\sigma ), \end{aligned}$$
(9)
where the vectors are defined as \(\mathbf {x}=\{{x[l]}\}\) and \(\mathbf {\upphi } _{n}(\sigma )=\{\upphi _{n}[l,\sigma )\}\).

\(\sigma \) was selected by an iterative stepwise search, recomputing (9) and (8) at each step and keeping the value that minimizes the error.
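
A sketch of this fit, reusing the hermite_function above; the explicit grid of \(\sigma \) values and the numerical re-normalization of the sampled basis (which is only approximately orthonormal after discretization) are our assumptions:

```python
import numpy as np

def fit_heartbeat(x, l, Ts, sigmas, N=16):
    """Return the sigma and coefficients c_n minimizing the error of Eq. (8)."""
    best_err, best_sigma, best_c = np.inf, None, None
    for sigma in sigmas:                       # stepwise search over sigma
        Phi = np.stack([hermite_function(n, l, sigma, Ts) for n in range(N)])
        Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # enforce unit norm
        c = Phi @ x                            # Eq. (9): projection on the basis
        err = np.sum((x - c @ Phi) ** 2)       # Eq. (8)
        if err < best_err:
            best_err, best_sigma, best_c = err, sigma, c
    return best_sigma, best_c
```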

To identify arrhythmias that do not affect the morphology of the QRS we generate two rhythm features from the heartbeat position annotations of the database:
$$\begin{aligned} R_{1}[i]=R[i]-R[i-1], \end{aligned}$$
(10)
$$\begin{aligned} R_{2}[i]=u(\alpha )\cdot \alpha , \qquad \alpha = (R_{1}[i+1]-R_{1}[i])-(R_{1}[i]-R_{1}[i-1]), \end{aligned}$$
(11)
where R[i] is the time of occurrence of the i-th beat, and u(x) is the Heaviside step function. In this work we use 16 Hermite functions to represent the heartbeat, a number high enough to represent most QRS complexes accurately and low enough not to model noise [15]. The representation of each heartbeat is thus made up of 36 features: 16 Hermite coefficients for each of the two leads, one \(\sigma \) value per lead, and the two rhythm features given by (10) and (11).
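
Equations (10) and (11) translate directly to code; a NumPy sketch, where skipping the first and last beats of the recording (which lack enough neighbors) is our choice:

```python
import numpy as np

def rhythm_features(R):
    """R1[i] and R2[i] of Eqs. (10)-(11) from beat occurrence times R."""
    R = np.asarray(R, dtype=float)
    RR = np.diff(R)                            # RR[i-1] = R1[i] = R[i] - R[i-1]
    feats = []
    for i in range(2, len(R) - 1):
        r1 = RR[i - 1]
        alpha = (RR[i] - RR[i - 1]) - (RR[i - 1] - RR[i - 2])
        r2 = alpha if alpha > 0 else 0.0       # u(alpha) * alpha (Heaviside step)
        feats.append((r1, r2))
    return np.array(feats)
```
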
Fig. 3.

The figure shows three normal beats followed by three bundle branch block beats. Note how the distance between beats is not altered at any time. The three pathological beats shown in the image have the same values for the features given by Eqs. 10 and 11 as the normal beats.

Fig. 4.

The third beat is a premature atrial beat. It is morphologically identical to the other four normal beats shown in the image. The fact that the distance between the atrial beat and the preceding and subsequent beats is different from the distance between normal beats is key to identifying this arrhythmia.

4 Experimental Results

We shall use three different strategies for ensemble generation. The first strategy is the classical approach in machine learning of putting together all the information (Hermite parameters extracted from the two ECG leads and the features given by (10) and (11)) in the same feature vector. Based on this feature vector we generate the data partitions \(P_{1},P_{2},\dots ,P_{m}\).

The second strategy relies on the hypothesis that generating data partitions from different types of features yields better results than using all the features together. Each lead in the ECG records the electrical activity from a different viewpoint of the heart, making some configurations better suited to detect certain pathologies. Furthermore, the rhythm features convey completely different information, with another frame of reference: the distance between beats instead of the electrical activity. By dividing the information into several sets we can take advantage of these differences and improve the clustering results. Based on these assumptions, the information is split into three representations: one Hermite representation obtained from each lead and a third obtained from the rhythm features. In this strategy the three representations are used separately to generate data partitions that provide positive evidence.

In heartbeat clustering, the fact that two beats have approximately the same distance to the next and the previous beat does not mean that they are of the same type (see Fig. 3). However, two heartbeats with considerably different distances to the next and the previous beat are most likely of different types (see Fig. 4). The third strategy builds on this knowledge: it uses the same configuration as the second strategy, but the partitions generated from the rhythm features are used to generate negative evidence.

In the first strategy 100 data partitions are generated from the complete feature set with the K-means algorithm, as explained in Sect. 2.1. In the second strategy 100 data partitions are generated for each of the three sets of features, for a total of 300 data partitions. The evidence in the data partitions is gathered in the matrices \(C_{S1}\) and \(C_{S2}\) for the first and the second strategy, respectively. Neither of these strategies uses negative evidence.

In the third strategy we also generate 100 partitions for each of the two Hermite representations extracted from the ECG leads. These partitions provide positive evidence. However, in this case the 100 data partitions corresponding to the rhythm features are treated as negative evidence (see Eq. 3). To obtain the final matrix for this strategy we combine the evidence obtained from the Hermite representations of the ECG leads, \(C_{S3}\), and the evidence obtained from the rhythm features, \(C^{-}_{S3}\), into a final matrix \(E_{S3}\) (see Eq. 4).

The final partitions for each strategy are generated from their respective evidence matrices by applying the average-link method. We ran a test in which the lifetime criterion was used to determine the number of clusters, and an additional test with a fixed number of 25 clusters. This second test allows us to compare our results with the most referenced work in heartbeat clustering [13], where 25 clusters per recording were used.
Table 1.

Results of the clustering for the three strategies (S1, S2, S3). For each strategy we show the number of errors using a fixed number of 25 clusters (25C) and using the lifetime criterion (LT). In the second case we also show the number of clusters chosen by this criterion.

| Record | S1 25C errors | S1 LT errors | S1 LT clusters | S2 25C errors | S2 LT errors | S2 LT clusters | S3 25C errors | S3 LT errors | S3 LT clusters |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | 33 | 33 | 5 | 6 | 33 | 2 | 6 | 33 | 6 |
| 101 | 3 | 3 | 7 | 0 | 3 | 5 | 0 | 3 | 5 |
| 102 | 7 | 47 | 6 | 13 | 58 | 4 | 12 | 30 | 12 |
| 103 | 1 | 1 | 15 | 0 | 1 | 2 | 0 | 0 | 80 |
| 104 | 251 | 257 | 11 | 309 | 351 | 4 | 235 | 230 | 35 |
| 105 | 11 | 12 | 10 | 5 | 5 | 5 | 7 | 5 | 43 |
| 106 | 2 | 10 | 9 | 1 | 28 | 7 | 1 | 1 | 24 |
| 107 | 0 | 1 | 9 | 1 | 1 | 3 | 1 | 1 | 29 |
| 108 | 11 | 16 | 9 | 9 | 9 | 2 | 9 | 9 | 20 |
| 109 | 4 | 9 | 14 | 2 | 10 | 2 | 3 | 2 | 28 |
| 111 | 0 | 0 | 11 | 0 | 0 | 2 | 0 | 0 | 17 |
| 112 | 2 | 2 | 7 | 1 | 2 | 3 | 2 | 2 | 26 |
| 113 | 0 | 0 | 11 | 0 | 0 | 2 | 0 | 0 | 64 |
| 114 | 12 | 16 | 12 | 11 | 16 | 4 | 9 | 9 | 36 |
| 115 | 0 | 0 | 4 | 0 | 0 | 3 | 0 | 0 | 37 |
| 116 | 2 | 2 | 13 | 0 | 2 | 2 | 1 | 1 | 24 |
| 117 | 1 | 1 | 7 | 0 | 0 | 7 | 0 | 0 | 56 |
| 118 | 96 | 96 | 12 | 58 | 100 | 2 | 45 | 7 | 188 |
| 119 | 0 | 0 | 7 | 0 | 0 | 2 | 0 | 0 | 28 |
| 121 | 1 | 1 | 12 | 0 | 1 | 2 | 0 | 0 | 27 |
| 122 | 0 | 0 | 8 | 0 | 0 | 8 | 0 | 0 | 11 |
| 123 | 0 | 0 | 6 | 0 | 0 | 2 | 0 | 0 | 81 |
| 124 | 36 | 43 | 9 | 41 | 41 | 4 | 41 | 41 | 11 |
| 200 | 129 | 130 | 15 | 117 | 531 | 14 | 53 | 52 | 86 |
| 201 | 48 | 54 | 7 | 50 | 65 | 4 | 48 | 49 | 10 |
| 202 | 37 | 42 | 12 | 17 | 56 | 2 | 19 | 19 | 31 |
| 203 | 81 | 82 | 19 | 286 | 385 | 2 | 76 | 59 | 171 |
| 205 | 14 | 14 | 9 | 13 | 15 | 6 | 14 | 14 | 18 |
| 207 | 187 | 196 | 20 | 52 | 318 | 5 | 130 | 23 | 76 |
| 208 | 107 | 109 | 17 | 120 | 449 | 3 | 105 | 143 | 6 |
| 209 | 181 | 298 | 5 | 106 | 162 | 3 | 66 | 136 | 3 |
| 210 | 32 | 37 | 17 | 30 | 71 | 2 | 33 | 18 | 120 |
| 212 | 0 | 0 | 11 | 3 | 4 | 4 | 2 | 2 | 57 |
| 213 | 112 | 351 | 8 | 90 | 396 | 3 | 102 | 103 | 15 |
| 214 | 6 | 6 | 14 | 4 | 5 | 12 | 4 | 4 | 31 |
| 215 | 4 | 5 | 20 | 9 | 26 | 3 | 7 | 4 | 100 |
| 217 | 34 | 50 | 10 | 65 | 69 | 10 | 58 | 58 | 29 |
| 219 | 11 | 12 | 9 | 11 | 18 | 3 | 9 | 9 | 6 |
| 220 | 94 | 94 | 3 | 4 | 94 | 2 | 6 | 5 | 35 |
| 221 | 1 | 3 | 8 | 1 | 1 | 6 | 1 | 1 | 31 |
| 222 | 389 | 389 | 10 | 328 | 421 | 2 | 288 | 156 | 183 |
| 223 | 116 | 125 | 17 | 108 | 265 | 3 | 106 | 62 | 59 |
| 228 | 3 | 3 | 14 | 3 | 4 | 7 | 3 | 3 | 12 |
| 230 | 0 | 0 | 9 | 0 | 0 | 3 | 0 | 0 | 25 |
| 231 | 2 | 2 | 5 | 2 | 2 | 5 | 2 | 2 | 5 |
| 232 | 388 | 398 | 12 | 80 | 89 | 15 | 68 | 97 | 22 |
| 233 | 19 | 20 | 18 | 32 | 35 | 3 | 17 | 17 | 57 |
| 234 | 2 | 50 | 5 | 1 | 50 | 2 | 1 | 1 | 15 |
| Total | 2470 | 3020 | 508 | 1989 | 4192 | 203 | 1590 | 1411 | 2091 |
| % | 2.25 | 2.75 | – | 1.81 | 3.81 | – | 1.45 | 1.28 | – |

Table 2.

P-values of the Wilcoxon test of significance

|  | Strategy 1 vs 2 | Strategy 1 vs 3 | Strategy 2 vs 3 |
| --- | --- | --- | --- |
| 25 clusters | 0.0820593 | 0.0006947 | 0.0402501 |
| Lifetime criterion | 0.0046556 | 0.0000204 | 0.0000028 |

To validate our results, we consider that each cluster belongs to the beat type of the majority of its heartbeats. All heartbeats of a type different from that of their cluster are counted as errors. In practice, this mapping between clusters and beat types could be obtained just by having a cardiologist label one beat of each cluster. Table 1 shows the results for the different strategies using this procedure to count the errors: for each recording of the MIT-BIH Arrhythmia Database, the number of errors when using a fixed number of 25 clusters, and the number of errors and corresponding number of clusters when using the lifetime criterion. Dividing the number of errors by the number of heartbeats in the database (109,966) yields the error percentage. For the fixed number of clusters the error rates are 2.25 %, 1.81 % and 1.45 % for the first, second and third strategies, respectively. Using the lifetime criterion the error rates are 2.75 %, 3.81 % and 1.28 %, respectively.
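
This error-counting procedure can be sketched as follows (assuming NumPy; cluster_labels comes from the ensemble and beat_types from the database annotations):

```python
import numpy as np

def count_errors(cluster_labels, beat_types):
    """Beats whose annotated type differs from their cluster's majority type."""
    cluster_labels = np.asarray(cluster_labels)
    beat_types = np.asarray(beat_types)
    errors = 0
    for c in np.unique(cluster_labels):
        types = beat_types[cluster_labels == c]
        _, counts = np.unique(types, return_counts=True)
        errors += types.size - counts.max()   # every non-majority beat is an error
    return errors
```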

Normality of the number of misclassification errors in each strategy was tested and rejected using a Shapiro-Wilk test [19]. A non-parametric Wilcoxon test was therefore used to assess the significance of the differences between strategies. For a fixed number of clusters the p-values were 0.08, \(<\)0.01 and 0.04 for strategy 1 vs. strategy 2, strategy 1 vs. strategy 3, and strategy 2 vs. strategy 3, respectively. When using the lifetime criterion the p-values were \(<\)0.01 in all comparisons (see Table 2).

5 Discussion

The improvement between the first and the second strategy, from 2.25 % to 1.81 % (p-value = 0.08) with the fixed number of clusters, suggests that the idea of splitting the information of each channel and the rhythm features has merit and should be studied further. This idea is particularly interesting for processing 12-lead ECGs. The 12 leads are usually available in clinical routine and all of them are used in the diagnosis of the patient. However, to date, to avoid an explosion in the size of the feature vectors representing the heartbeat, typically only one or two leads are used when trying to identify arrhythmias. Each lead provides a different perspective on the electrical signal, and an automatic solution would benefit from combining them. Furthermore, some ECG leads may be misplaced, disconnected or noisy. A solution that can combine the 12 leads would be more robust and accurate than the usual approach of using only one or two leads.

When the lifetime criterion is used, the error increases from 2.75 % to 3.81 % (p-value = 0.004) between the first and the second strategies. At the same time, the total number of clusters created goes down from 508 to 203, an average of 10.5 and 4.2 clusters per recording, respectively. A small number of clusters is desirable because it means less work for the cardiologist interpreting the results, but here it comes at the expense of a large increase in the error.

Using a fixed number of clusters, the misclassification error decreases from 1.81 % in the second strategy to 1.45 % in the third (p-value = 0.04). The only difference between them is the treatment of the information from the rhythm features. This result supports our assumption that using some representations as negative evidence can improve the results: using the rhythm features as negative evidence yields a lower misclassification error than using them as positive evidence.

When using the lifetime criterion, the misclassification error decreases from 3.81 % in the second strategy to 1.28 % in the third (p-value \(<\)0.01). However, the lifetime criterion creates 2091 clusters (an average of 43.56 clusters per recording). This proliferation of clusters in some recordings, such as 222, may be due to the noise and artifacts they contain. Especially for the third strategy, the results using the lifetime criterion are unsatisfactory due to the high number of clusters generated. Some adjustments should be made to control the proliferation of clusters, or a different criterion should be used to determine the optimal number of clusters.

In [13] a clustering algorithm based on Self-Organizing Maps (SOM) is used with a fixed number of 25 clusters. There are some differences between that work and ours: Lagerholm et al. used their own annotations, which may be slightly different from the database annotations that we use, and they do not use a high-frequency noise filter. Nevertheless, the differences are small enough that a comparison between their results and ours with the fixed number of clusters is relevant. The best result obtained by Lagerholm et al. was an error rate of 1.51 % for the complete MIT-BIH database. Our first and second strategies obtain worse error rates, 2.25 % and 1.81 %. However, using the negative evidence in the third strategy, our error rate, 1.45 %, is slightly lower.

6 Conclusions

In traditional clustering ensemble algorithms, evidence is accumulated only from instances that fall in the same cluster of a partition. We argue that for some data representations this fact may provide little or no information. However, these representations need not be useless; on the contrary, the fact that two instances fall in different clusters of partitions created from those representations can provide useful evidence towards them belonging to different clusters in the final partition.

In this paper we have introduced the concept of negative evidence, gathered from instances that do not fall in the same cluster of a data partition. Based on this concept, we have designed a new ensemble clustering algorithm that exploits both positive and negative evidence to create the final partition, and we have applied it to the problem of heartbeat clustering. Our hypothesis was that the information derived from the distances from one beat to the previous and next beats is not useful for generating positive evidence, but can be used to generate negative evidence. When applying our algorithm over the MIT-BIH database, just by using the information extracted from the distance between beats to generate negative evidence instead of positive evidence, the misclassification error fell from 1.81 % to 1.45 % when the number of clusters was fixed to 25, and from 3.81 % to 1.28 % when the number of clusters was selected by the lifetime criterion. These results demonstrate the usefulness of negative evidence and encourage further research on this concept.

As future work, we would like to study the applicability of clustering ensembles to the 12-lead ECG and to develop the concept of negative evidence further, including other potential applications.


Acknowledgments

This work was supported by the University San Pablo CEU under the grant PPC12/2014. David G. Márquez is funded by an FPU Grant from the Spanish Ministry of Education (MEC) (Ref. AP2012-5053). Constantino A. García acknowledges the support of Xunta de Galicia under “Plan I2C” Grant program (partially cofunded by The European Social Fund of the European Union).

References

  1. Bakker, B., Heskes, T.: Clustering ensembles of neural network models. Neural Netw. 16(2), 261–269 (2003)
  2. Blanco-Velasco, M., Weng, B., Barner, K.E.: ECG signal denoising and baseline wander correction based on the empirical mode decomposition. Comput. Biol. Med. 38(1), 1–13 (2008)
  3. Celebi, M.E.: Partitional Clustering Algorithms. Springer, Cham (2014)
  4. de Chazal, P., O’Dwyer, M., Reilly, R.B.: Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. 51(7), 1196–1206 (2004)
  5. de Chazal, P., Reilly, R.B.: A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. 53(12 Pt 1), 2535–2543 (2006)
  6. Dudoit, S., Fridlyand, J.: Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19(9), 1090–1099 (2003)
  7. Fred, A.: Finding consistent clusters in data partitions. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 309–318. Springer, Heidelberg (2001)
  8. Fred, A.L., Jain, A.K.: Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 835–850 (2005)
  9. García, C.A., Otero, A., Vila, X., Márquez, D.G.: A new algorithm for wavelet-based heart rate variability analysis. Biomed. Signal Process. Control 8(6), 542–550 (2013)
  10. Gil, A., Caffarena, G., Márquez, D.G., Otero, A.: Hermite polynomial characterization of heartbeats with graphics processing units. In: IWBBIO 2014 (2014)
  11. Hong, Y., Kwong, S., Chang, Y., Ren, Q.: Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recogn. 41(9), 2742–2756 (2008)
  12. Jane, R., Olmos, S., Laguna, P.: Adaptive Hermite models for ECG data compression: performance and evaluation with automatic wave detection. In: Computers in Cardiology (1993)
  13. Lagerholm, M., Peterson, C., Braccini, G., Edenbrandt, L., Sörnmo, L.: Clustering ECG complexes using Hermite functions and self-organizing maps. IEEE Trans. Biomed. Eng. 47(7), 838–848 (2000)
  14. Madeiro, J.P., Cortez, P.C., Oliveira, F.I., Siqueira, R.S.: A new approach to QRS segmentation based on wavelet bases and adaptive threshold technique. Med. Eng. Phys. 29(1), 26–37 (2007)
  15. Márquez, D.G., Otero, A., Félix, P., García, C.A.: On the accuracy of representing heartbeats with Hermite basis functions. In: BIOSIGNALS 2013, pp. 338–341 (2013)
  16. Mathers, C.D., Loncar, D.: Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 3(11), e442 (2006)
  17. Moody, G., Mark, R.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001)
  18. Park, K., Cho, B., Lee, D., Song, S., Lee, J., Chee, Y., Kim, I., Kim, S.: Hierarchical support vector machine based heartbeat classification using higher order statistics and Hermite basis function. In: Computers in Cardiology 2008, pp. 229–232. IEEE (2008)
  19. Rodríguez-Fdez, I., Canosa, A., Mucientes, M., Bugarín, A.: STAC: a web platform for the comparison of algorithms using statistical tests (2015). http://tec.citius.usc.es/stac
  20. Sokal, R.R.: A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38, 1409–1438 (1958)
  21. Sörnmo, L., Laguna, P.: Bioelectrical Signal Processing in Cardiac and Neurological Applications. Elsevier Academic Press, New York (2005)
  22. Topchy, A.P., Jain, A.K., Punch, W.F.: A mixture model for clustering ensembles. In: SDM, pp. 379–390. SIAM (2004)
  23. Young, T.Y., Huggins, W.H.: On the representation of electrocardiograms. IEEE Trans. Bio-Med. Electron. 10(3), 86–95 (1963)
  24. Zhang, Z., Dong, J., Luo, X., Choi, K.S., Wu, X.: Heartbeat classification using disease-specific feature selection. Comput. Biol. Med. 46, 79–89 (2014)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • David G. Márquez (1)
  • Ana L. N. Fred (2)
  • Abraham Otero (3)
  • Constantino A. García (1)
  • Paulo Félix (1)
  1. Centro de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela, Santiago de Compostela, Spain
  2. Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
  3. Department of Information Technologies, University San Pablo CEU, Boadilla del Monte, Madrid, Spain
