Introduction

The electrocardiogram, frequently called ECG, is a routine diagnostic test for assessing the electrical and muscular functions of the heart. A trained reader of an ECG record can, for instance, interpret the rate and rhythm of heartbeats; estimate the size of the heart and the health of its muscles and electrical system; check for effects or side effects of medications on the heart; or check for heart abnormalities caused by other health conditions. At present, ambulatory ECG monitoring serves to detect and characterize abnormal cardiac function during long hours of ordinary daily activities, thereby extending the validated diagnostic role of ECG recording beyond the bedside1,2,3.

The broad use of ECG records, in particular as a means of supporting clinical health care from a distance, enhances the significance of dedicated techniques for compressing this type of data. Compression of ECG signals may be realized without any loss in the signal reconstruction, which is referred to as lossless compression, or allowing some distortion that does not change the clinical information of the data, which is called lossy compression. The latter procedure can enclose an ECG signal within a file significantly smaller than the one containing the uncompressed record.

The literature concerning both lossless4,5,6,7,8 and lossy compression9,10,11,12,13,14,15 of ECG records is vast. It includes emerging methodologies based on compressed sensing16,17,18,19. This work focuses on lossy compression with good performance at low-distortion recovery. Although the approach falls within the standard transform compression category, it achieves remarkable results. Fresh benchmarks on the MIT-BIH Arrhythmia database are produced for values of the percentage root-mean-square difference (PRD) as in recent publications11,12,14,15.

The transformation step applies a Discrete Wavelet Transform (DWT). The fast Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) DWT20 is recommended, but other transforms could also be applied. Techniques for ECG signal compression using a wavelet transform have been reported in numerous publications; for a review with extensive references see21. The main difference introduced by our proposal lies in the compression method, in particular in what we refer to as the Organization and Storage stage. One of the findings of this work is that remarkable compression results are achievable even without the entropy coding step typically used for saving the outputs of the algorithm. High compression is attained in a straightforward manner by saving the outputs in the Hierarchical Data Format (HDF)22, more precisely in the compressed HDF5 version, which is supported by a number of commercial and non-commercial software platforms including MATLAB, Octave, Mathematica, and Python. HDF5 also implements a high-level Application Programming Interface (API) with C, C++, Fortran 90, and Java interfaces. As will be illustrated here, adding an entropy coding process to the algorithm may improve compression further, but, if implemented in software, at the expense of processing time. Either way, the compression results for distortion corresponding to a mean PRD in the range [0.48, 1.71] are shown to significantly improve recently reported benchmarks11,12,14,15 on the MIT-BIH Arrhythmia database. For PRD < 0.4 the technique becomes less effective.

Method

Before describing the method let’s introduce the notational convention. \({\mathbb{R}}\) is the set of real numbers. Boldface lower-case letters represent one-dimensional arrays and standard mathematical fonts indicate their components, e.g. \({\bf{c}}\in {{\mathbb{R}}}^{N}\) is an array of N real components c(i), i = 1, …, N, or equivalently c = (c(1), …, c(N)). Within the algorithms, operations on components are indicated with a dot, e.g. \({\bf{c}}{.}^{2}=(c{\mathrm{(1)}}^{2},\ldots ,c{(N)}^{2})\) and \(|{\bf{c}}.|=(|c\mathrm{(1)|},\ldots ,|c(N)|)\). Moreover, \({\bf{t}}={\rm{cumsum}}(|{\bf{c}}{.|}^{2})\) is a vector of components \(t(n)={\sum }_{i=1}^{n}\,{|c(i)|}^{2},\,n=1,\ldots ,N\).

The proposed compression algorithm consists of three distinctive steps.

  1. Approximation Step. Applies a DWT to the signal, keeping the largest coefficients to produce an approximation of the signal up to the target quality.

  2. Quantization Step. Uses a scalar quantizer to convert the selected wavelet coefficients into integers, i.e. into multiples of a quantization step.

  3. Organization and Storage Step. Organizes the outputs of steps (1) and (2) to economize storage space.

At the Approximation Step a DWT is applied to convert the signal \({\bf{f}}\in {{\mathbb{R}}}^{N}\) into the vector \({\bf{w}}\in {{\mathbb{R}}}^{N}\) whose components are the wavelet coefficients (w(1), …, w(N)). For deciding on the number of nonzero coefficients to be involved in the approximation we consider two possibilities:

  (a) The wavelet coefficients (w(1), …, w(N)) are sorted in ascending order of their absolute value, (w(γ1), …, w(γN)) with |w(γ1)| ≤ ⋯ ≤ |w(γN)|. The cumulative sums \(t(n)={\sum }_{i=1}^{n}\,{|w({\gamma }_{i})|}^{2},\,n=1,\ldots ,N\) are calculated to find all the values n such that \(t(n)\ge {{\rm{tol}}}^{2}\). Let k + 1 be the smallest of these values. Then the indices γi, i = k + 1, …, N identify the coefficients w(γi), i = k + 1, …, N of largest absolute value. Algorithm 1 summarizes the procedure.

  (b) After the quantization step the nonzero coefficients and their corresponding indices are gathered together.

Algorithm 1

Selection of the largest wavelet coefficients. Procedure \([{\bf{c}},\ell ]={\rm{SLWC}}({\bf{w}},\,{\rm{tol}})\).
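
Since the pseudocode figure is not reproduced here, a minimal MATLAB sketch consistent with the description in (a) could read as follows; the procedure name SLWC follows the caption, while the remaining variable names and the handling of the edge case are our assumptions.

```matlab
function [c, ell] = SLWC(w, tol)
% Sketch of the selection of the largest wavelet coefficients.
% w   : wavelet coefficients; tol : tolerance for the discarded energy.
% c   : the K = N - k largest coefficients; ell : their indices in w.
N = numel(w);
[~, gamma] = sort(abs(w), 'ascend');   % sort by increasing magnitude
t = cumsum(abs(w(gamma)).^2);          % cumulative energy t(n)
n1 = find(t >= tol^2, 1, 'first');     % smallest n with t(n) >= tol^2
if isempty(n1), n1 = N + 1; end        % total energy below tol^2: keep nothing
k = n1 - 1;                            % discard the k smallest coefficients
ell = gamma(k+1:N);                    % indices of the kept coefficients
c = w(ell);
end
```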

At the Quantization Step the selected wavelet coefficients c = (c(1), …, c(K)), with K = N − k and c(i − k) = w(γi), i = k + 1, …, N, are transformed into integers by a mid-tread uniform quantizer as follows:

$${c}^{{\rm{\Delta }}}(i)=\left\lfloor \frac{c(i)}{{\rm{\Delta }}}+\frac{1}{2}\right\rfloor ,\quad i=1,\ldots ,K,$$
(1)

where \(\lfloor x\rfloor \) indicates the largest integer smaller than or equal to x and Δ is the quantization parameter. After quantization, the sets of coefficients and indices are further reduced by eliminating those coefficients which are mapped to zero by the quantizer. The above mentioned option (b) follows from this process: it comes into effect by skipping Algorithm 1. The signs of the coefficients are encoded separately using a binary alphabet (1 for + and 0 for −) in an array (s(1), …, s(K)).
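
A minimal MATLAB sketch of this step, assuming the coefficients c and indices ℓ returned by SLWC above (the function name and variable names are ours):

```matlab
function [cD, s, ell] = quantize_coeffs(c, ell, Delta)
% Mid-tread uniform quantization, Eq. (1), followed by elimination of
% the coefficients mapped to zero and separate encoding of the signs.
cD = floor(c(:)/Delta + 1/2);   % Eq. (1), applied to the signed values
keep = cD ~= 0;                 % drop coefficients quantized to zero
cD = cD(keep);
ell = ell(keep);                % indices of the surviving coefficients
s = cD > 0;                     % binary signs: 1 for +, 0 for -
cD = abs(cD);                   % keep magnitudes; signs stored in s
end
```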

Since the indices \({\ell }_{i},\,i=1,\ldots ,K\) are large numbers, in order to store them effectively at the Organization and Storage Step we proceed as follows. The indices are re-ordered in ascending order, \({\ell }_{i}\to {\tilde{\ell }}_{i},\,i=1,\ldots ,K\), which guarantees that \({\tilde{\ell }}_{i} < {\tilde{\ell }}_{i+1},\,i=1,\ldots ,K-1\). This induces a re-ordering of the coefficients, \({{\bf{c}}}^{{\rm{\Delta }}}\to {\tilde{{\bf{c}}}}^{{\rm{\Delta }}}\), and of the corresponding signs, \({\bf{s}}\to \tilde{{\bf{s}}}\). The re-ordered indices are stored as smaller positive numbers by taking differences between consecutive values. Defining \(\delta (i)={\tilde{\ell }}_{i}-{\tilde{\ell }}_{i-1},\,i=2,\ldots ,K\), the array \(\tilde{{\boldsymbol{\delta }}}=({\tilde{\ell }}_{1},\,\delta \mathrm{(2),}\,\ldots ,\,\delta (K))\) stores the indices \({\tilde{\ell }}_{1},\,\ldots ,\,{\tilde{\ell }}_{K}\) with unique recovery. The size of the signal, N, the quantization parameter Δ, and the arrays \({\tilde{{\bf{c}}}}^{{\rm{\Delta }}}\), \(\tilde{{\bf{s}}}\), and \(\tilde{{\boldsymbol{\delta }}}\) are saved in HDF5 format. The HDF5 library operates using a chunked storage mechanism: the data array is split into equally sized chunks, each of which is stored separately in the file. Compression is applied to each individual chunk using gzip. The gzip method is based on the DEFLATE algorithm, which is a combination of LZ7723 and Huffman coding24. Within MATLAB all this is implemented simply by using the function save to store the data.
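
A sketch of this step in MATLAB follows; the v7.3 MAT format is used as the HDF5 container, and the function name is ours. Note that, besides the quantities listed in the text, we also save the wavedec bookkeeping vector L for simplicity of the sketch (it could instead be recomputed from N and the decomposition level).

```matlab
function store_hdf5(N, L, Delta, cD, s, ell, fname)
% Organization and Storage step (sketch). The indices are sorted and
% stored as the first index followed by consecutive gaps; saving with
% '-v7.3' writes an HDF5-based file whose chunks are gzip-compressed.
[ellS, p] = sort(ell(:), 'ascend');
cD = cD(p);                     % induced re-ordering of the coefficients
s = s(p);                       % ... and of the signs
delta = [ellS(1); diff(ellS)];  % (ell_1, delta(2), ..., delta(K))
save(fname, 'N', 'L', 'Delta', 'cD', 's', 'delta', '-v7.3');
end
```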

Algorithm 2 outlines, in pseudocode, the compression procedure described above.

Algorithm 2

Compression Procedure.
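
Since the pseudocode figure is not reproduced here, the whole procedure can be sketched by combining the previous snippets. The MATLAB basis name 'bior4.4' is used on the assumption that it realizes the CDF 9/7 transform (Wavelet Toolbox required); the driver name is ours.

```matlab
function ecg_compress(f, PRD0, Delta, fname)
% End-to-end sketch of the compression procedure, approach (a),
% built from the helper sketches defined above.
lv = 4;                                     % decomposition level
f = f(:);
N = numel(f);
[w, L] = wavedec(f, lv, 'bior4.4');         % Approximation step: DWT
tol = PRD0 * norm(f) / 100;                 % tolerance from target PRD0
[c, ell] = SLWC(w, tol);                    % Algorithm 1
% For approach (b), skip SLWC: c = w; ell = 1:numel(w);
[cD, s, ell] = quantize_coeffs(c, ell, Delta);  % Quantization step
store_hdf5(N, L, Delta, cD, s, ell, fname);     % Organization and Storage
end
```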

The fast wavelet transform has computational complexity O(N). Thus, if approach (a) is applied, the computational complexity of Algorithm 2 is dominated by the sort operation in Algorithm 1, with average computational complexity O(NlogN). Otherwise the complexity is just O(N), because the number K of indices of nonzero coefficients to be sorted is in general much smaller than N. Nevertheless, as will be shown in Numerical Test III, in either case the compression of a 30 min record is achieved on a MATLAB platform in an average time of less than 0.2 s. While compression performance can be improved further by adding an entropy coding step before saving the arrays, if implemented in software such a step slows down the process.

When selecting the number of wavelet coefficients for the approximation by method (a), the parameter tol is fixed as follows: assuming that the target PRD before quantization is PRD0, we set \({\rm{tol}}={{\rm{PRD}}}_{{\rm{0}}}\Vert {\bf{f}}\Vert /100\). The value of PRD0 is fixed as 70–80% of the required PRD. The quantization parameter is then tuned to achieve the required PRD.
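
The text does not prescribe how Δ is tuned; one simple possibility is a bisection search, sketched below assuming the signal f, the parameters PRD0 and targetPRD, the compression sketch above, and the decoder ecg_recover sketched in the next section. The search interval is our assumption.

```matlab
% Tuning of Delta to meet a target mean PRD (sketch; one possibility).
f = f(:);
lo = 1;  hi = 200;                        % assumed search interval
for it = 1:20
    Delta = (lo + hi)/2;
    ecg_compress(f, PRD0, Delta, 'tmp.mat');
    fr = ecg_recover('tmp.mat');          % decoder sketched below
    prd = 100 * norm(f - fr) / norm(f);   % Eq. (3)
    if prd > targetPRD, hi = Delta; else, lo = Delta; end
end
```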

Signal recovery

At the Decoding Stage the signal is recovered by the following steps.

  • Read the number N, the quantization parameter Δ, and the arrays \({\tilde{{\bf{c}}}}^{{\rm{\Delta }}}\), \(\tilde{{\boldsymbol{\delta }}}\), and \(\tilde{{\bf{s}}}\) from the compressed file.

  • Recover the magnitude of the coefficients from their quantized version as

    $${\tilde{{\bf{c}}}}^{{\rm{r}}}={\rm{\Delta }}{\tilde{{\bf{c}}}}^{{\rm{\Delta }}}.$$
    (2)
  • Recover the indices \({\tilde{\ell }}_{i},\,i=1,\ldots ,K\) from the array \(\tilde{{\boldsymbol{\delta }}}\) as: \({\tilde{\ell }}_{1}=\tilde{\delta }\mathrm{(1)}\) and \({\tilde{\ell }}_{i}=\tilde{\delta }(i)+{\tilde{\ell }}_{i-1},\,i=2,\ldots ,K\mathrm{.}\)

  • Recover the signs of the wavelet coefficients as \({\tilde{{\bf{s}}}}^{{\rm{r}}}=2\tilde{{\bf{s}}}-1\).

  • Complete the full array of wavelet coefficients by setting wr(i) = 0, i = 1, …, N, and then \({{\bf{w}}}^{{\rm{r}}}(\tilde{\ell })={\tilde{{\bf{s}}}}^{{\rm{r}}}\mathrm{.}{\tilde{{\bf{c}}}}^{{\rm{r}}}\).

  • Invert the wavelet transform to recover the approximated signal fr.
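
These steps translate into a short MATLAB sketch mirroring the compression sketches above (names and the use of the saved bookkeeping vector L are ours):

```matlab
function fr = ecg_recover(fname)
% Decoding stage sketch, following the steps listed above.
S = load(fname);                    % read N, L, Delta, cD, s, delta
ell = cumsum(S.delta);              % indices: ell_i = delta(i) + ell_{i-1}
cr = S.Delta * double(S.cD);        % Eq. (2): de-quantized magnitudes
sr = 2*double(S.s) - 1;             % signs: {0,1} -> {-1,+1}
wr = zeros(sum(S.L(1:end-1)), 1);   % zero-filled coefficient array
wr(ell) = sr .* cr;                 % place the signed coefficients
fr = waverec(wr, S.L, 'bior4.4');   % invert the DWT
end
```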

As shown in Tables 5–7, the recovery process runs about 3 times faster than the compression procedure, which is already very fast.

Results

We present here four numerical tests with different purposes. Except for the comparison in Test II, all the other tests use the full MIT-BIH Arrhythmia database25, which contains 48 ECG records. Each record consists of N = 650000 11-bit samples taken at a frequency of 360 Hz. The algorithms are implemented in MATLAB on a notebook with an Intel Core i7-3520M processor and 4 GB RAM.

Since the compression performance of lossy compression has to be considered in relation to the quality of the recovered signals, we introduce at this point the measures to evaluate the results of the proposed procedure.

The quality of a recovered signal is assessed with respect to the PRD calculated as follows,

$${\rm{PRD}}=\frac{\Vert {\bf{f}}-{{\bf{f}}}^{{\rm{r}}}\Vert }{\Vert {\bf{f}}\Vert }\times 100 \% ,$$
(3)

where, f is the original signal, fr is the signal reconstructed from the compressed file and \(\Vert \cdot \Vert \) indicates the 2-norm. Since the PRD strongly depends on the baseline of the signal, the PRDN, as defined below, is also reported.

$${\rm{PRDN}}=\frac{\Vert {\bf{f}}-{{\bf{f}}}^{{\rm{r}}}\Vert }{\Vert {\bf{f}}-\overline{{\bf{f}}}\Vert }\times 100 \% ,$$
(4)

where, \(\overline{{\bf{f}}}\) indicates the mean value of f.

When fixing a value of PRD, the compression performance is assessed by the Compression Ratio (CR) as given by

$${\rm{CR}}=\frac{\text{Size of the uncompressed file}}{\text{Size of the compressed file}}.$$
(5)

The quality score (QS), reflecting the tradeoff between compression performance and reconstruction quality, is the ratio:

$${\rm{QS}}=\frac{{\rm{CR}}}{{\rm{PRD}}}.$$
(6)

Since the PRD is a global quantity, in order to detect possible local changes in the visual quality of the recovered signal we define the local PRD as follows. Each signal is partitioned into Q segments fq, q = 1, …, Q, of L samples each. The local PRD with respect to every segment in the partition, which we indicate as prd(q), q = 1, …, Q, is calculated as

$${\rm{prd}}(q)=\frac{\Vert {{\bf{f}}}_{q}-{{\bf{f}}}_{q}^{{\rm{r}}}\Vert }{\Vert {{\bf{f}}}_{q}\Vert }\times 100 \% ,$$
(7)

where \({{\bf{f}}}_{q}^{{\rm{r}}}\) is the recovered portion of the signal corresponding to the segment q. For each record the mean value of prd (\(\overline{{\rm{prd}}}\)) and the corresponding standard deviation (std) are calculated as

$$\overline{{\rm{prd}}}=\frac{1}{Q}\sum _{q=1}^{Q}\,{\rm{prd}}(q)$$
(8)

and

$${\rm{std}}=\sqrt{\frac{1}{Q-1}\sum _{q=1}^{Q}\,{({\rm{prd}}(q)-\overline{{\rm{prd}}})}^{2}}.$$
(9)

The mean value of prd with respect to all the records in the database is a double average, \(\overline{\overline{{\rm{prd}}}}\).

When comparing two approaches on a database we reproduce the same mean value of PRD. The relative gain in CR of one particular approach, say approach 1, with respect to another, say approach 2, is quantified as:

$${\rm{Gain}}=\frac{{{\rm{CR}}}_{1}-{{\rm{CR}}}_{2}}{{{\rm{CR}}}_{2}}\times 100 \% .$$

The gain in QS is defined analogously.
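
For concreteness, the evaluation measures of Eqs. (3)–(9) can be computed along the following lines; the file names are hypothetical and querying the sizes from disk is an implementation choice of ours.

```matlab
% Evaluation measures (sketch; f and fr are column vectors of length N).
PRD  = 100 * norm(f - fr) / norm(f);               % Eq. (3)
PRDN = 100 * norm(f - fr) / norm(f - mean(f));     % Eq. (4)
raw = dir('record.dat');  cmp = dir('record.mat'); % hypothetical files
CR  = raw.bytes / cmp.bytes;                       % Eq. (5)
QS  = CR / PRD;                                    % Eq. (6)
% Local prd over Q segments of length Lseg (assumes Q*Lseg = N)
Lseg = 2000;
Fq  = reshape(f,  Lseg, []);
Frq = reshape(fr, Lseg, []);
prd = 100 * sqrt(sum((Fq - Frq).^2) ./ sum(Fq.^2));  % Eq. (7), per segment
prdbar = mean(prd);                                % Eq. (8)
stdprd = std(prd);                                 % Eq. (9): 1/(Q-1) factor
```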

Numerical test I

We start the tests by implementing the proposed approach using wavelet transforms corresponding to different wavelet families at different levels of decomposition. The comparison between different wavelet transforms is realized using approach (b), because within this option each value of PRD is uniquely determined by the quantization parameter Δ. Thus, the difference in CR is only due to the particular wavelet basis and the concomitant decomposition level. Table 1 shows the average CR (indicated as CRb) and corresponding standard deviation (std) with respect to the whole data set and for three different values of PRD. For each PRD value, CRb is obtained by means of the following wavelet bases: db5 (Daubechies), coif4 (Coiflets), sym4 (Symlets), and cdf97 (Cohen-Daubechies-Feauveau). Each basis is applied at three different decomposition levels (lv).

Table 1 Comparison of CRs for three values of PRD when the proposed approach is implemented using different wavelets at decomposition levels 3, 4, and 5.

As observed in Table 1, on the whole the best CR is achieved with the biorthogonal basis cdf97 for lv = 4. In what follows all the results are given using this basis at decomposition level lv = 4.

Next we produce the CR for every record in the database for a mean value PRD of 0.53.

Table 2 shows the results obtained by approach (a), where the CR and QS produced by this method are indicated as CRa and QSa, respectively. The PRD values for each of the records, listed in the first columns of the left and right parts of Table 2, are given in the fourth columns. The second and third columns show the values of \(\overline{{\rm{prd}}}\) and the corresponding std for each record. The CR is given in the fifth column and the corresponding QS in the sixth column. The mean value of CR obtained by method (b) for the same mean value PRD = 0.53 is CRb = 22.16.

Table 2 Compression results with approach (a), cdf97 DWT, lv = 4, Δ = 35, and PRD0 = 0.4217, for the 48 records in the MIT-BIH Arrhythmia Database listed in the first columns of the left and right parts of the table.

Table 3 shows the variations of the CRa with different values of the parameter PRD0 in method (a).

Table 3 Comparison of the CR achieving PRD = 0.53 with method (a) of the proposed approach for different values of the parameter PRD0.

Numerical test II

Here comparisons are carried out with respect to results produced by the set partitioning in hierarchical trees (SPIHT) approach proposed in26. Thus for this test we use the data set described in that publication. It consists of 10-min long segments from records 100, 101, 102, 103, 107, 108, 109, 111, 115, 117, 118, and 111. As indicated in the footnote of26 on p. 853, the given values of PRD correspond to the subtraction of a baseline equal to 1024. This has generated confusion in the literature, as often the values of PRD in Table III of26 are unfairly reproduced for comparison with values of PRD obtained without subtraction of the 1024 baseline. The values of PRD with and without subtraction of that baseline, which are indicated as PRDB and PRD respectively, are given in Table 4. As seen in this table, for the same approximation there is an enormous difference between the two metrics. A fair comparison with respect to the results in26 should either involve the figures in the second row of Table 4 or, as done in26, specify that a 1024 baseline has been subtracted.
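
Since subtracting the same baseline from both the raw and the recovered signal leaves the numerator of Eq. (3) unchanged, the two metrics differ only in the normalization. A minimal sketch, assuming the baseline value 1024 as stated above:

```matlab
% PRDB (with the 1024 baseline subtracted) vs PRD (without), cf. Table 4.
PRD  = 100 * norm(f - fr) / norm(f);         % no baseline subtraction
PRDB = 100 * norm(f - fr) / norm(f - 1024);  % baseline of 1024 removed
```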

Table 4 Comparison with the results of Table III in26.

The figures in the 3rd row of Table 4 correspond to the CRs in26. The 4th row shows the CRs resulting from method (b) of the proposed approach without entropy coding, and the 5th row the results of adding a Huffman coding step before saving the compressed data in HDF5 format. The last two rows show the quantization parameters Δ which produce the required values of PRDB and PRD.

Numerical test III

This numerical test aims at comparing our results with recently reported benchmarks on the full MIT-BIH Arrhythmia database for mean values of PRD in the range [0.23, 1.71]. To the best of our knowledge the highest CRs reported so far for mean value PRD in the range [0.8, 1.30) are those in12, and in the range (1.30, 1.71] those in14. For PRD < 0.8 the comparison is realized with the results in11, as shown in Table 7. Table 5 compares our results against those in Table III of12 and Table 6 against those in Table 1 of14. In both cases we reproduce the identical mean value of PRD; the differences are in the values of CR and QS. All the Gains given in Table 5 are relative to the results in12, while those given in Tables 6 and 7 are relative to the results in14 and11, respectively.

Table 5 Comparison between the average performance of the proposed method and the method in12 for the same mean value of PRD.
Table 6 Comparison between the average performance of the proposed method and the method in14 for the same mean value of PRD.
Table 7 Comparison between the average compression performance of the proposed method and the method in11 for the same mean value of PRD.

As already remarked, and fully discussed in27, when comparing results from different publications care should be taken to make sure that the comparison is actually on the identical database, without any difference in baseline. From the information given in the papers we compare with, namely the relation between the reported values of PRD and PRDN, we can be certain that we are working on the same database25, which is described in28.

The parameters for reproducing the required PRD with methods (a) and (b) are given in the last 3 rows of Tables 5–7. The previous 3 rows in each table give, in seconds, the average times to compress (tc) and recover (tr) a record. As can be observed, the compression times of approaches (a) and (b) are very similar. The given times were obtained as the average of 10 independent runs. Notice that the CRs in these tables do not include the additional entropy coding step.

Figure 1 gives the plot of CR vs PRD for the approaches being compared in this section.

Figure 1

CR vs PRD corresponding to method (b) of the proposed approach (blue line) and the approaches in12 (green line),14 (yellow line), and11 (red line).

Numerical test IV

Finally we would like to highlight the following two features of the proposed compression algorithm.

  1. One of the distinctive features stems from saving the outputs of the algorithm directly in compressed HDF5 format. In order to highlight this, we compare the size of the file saved in this way against the size of the file obtained by applying a commonly used entropy coding process, Huffman coding, before saving the data in HDF5 format. The implementation of Huffman coding is realized, as in Table 4, by the off-the-shelf MATLAB function Huff06 available at29. In Table 8, CRa and CRb indicate, as before, the CRs obtained when the outputs of methods (a) and (b) are saved directly in HDF5 format. \({{\rm{CR}}}_{{\rm{a}}}^{{\rm{Huff}}}\) and \({{\rm{CR}}}_{{\rm{b}}}^{{\rm{Huff}}}\) indicate the CRs when Huffman coding is applied to the outputs of (a) and (b) before saving the data in HDF5 format. The rows right below the CRs give the corresponding compression times.

    Table 8 Comparison of different storage methods. CRa and CRb are the CRs from approaches (a) and (b) when the outputs are saved directly in HDF5 format. \({{\rm{CR}}}_{{\rm{a}}}^{{\rm{Huff}}}\) and \({{\rm{CR}}}_{{\rm{b}}}^{{\rm{Huff}}}\) are the corresponding values when the Huffman coding step is applied before saving the data in HDF5 format.
  2. The other distinctive feature of the method is the significance of the proposed Organization and Storage step. In order to illustrate this, we compare the results obtained by method (b) with those obtained using the conventional Run-Length (RL) algorithm30 instead of storing the indices of nonzero coefficients as proposed in this work (a minimal sketch of this alternative is given below). The CR corresponding to RL in HDF5 format is indicated in Table 8 as CRRL. When Huffman coding is applied to the RL output before saving it in compressed HDF5 format, the CR is indicated as \({{\rm{CR}}}_{{\rm{RL}}}^{{\rm{Huff}}}\).
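
For comparison purposes, a conventional run-length encoding of the map of nonzero-coefficient locations could be sketched as follows; encoding the binary map rather than the index gaps is our reading of the alternative compared in Table 8.

```matlab
% Run-length encoding of the zero/nonzero map of wavelet coefficients
% (sketch; N is the coefficient array length, ell the nonzero indices).
mask = false(N, 1);
mask(ell) = true;                % 1 where a nonzero coefficient sits
edges = find(diff(mask) ~= 0);   % positions where the run value changes
runs = diff([0; edges; N]);      % lengths of the alternating runs
first = mask(1);                 % value (0 or 1) of the first run
% mask is recovered by expanding 'runs' starting from 'first'
```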

Discussion

We notice that, while the results in Table 1 show some differences in CR when different wavelets are used for the DWT, it is clear from the table that the selection of the wavelet family is not the crucial factor for the success of the technique. The same is true for the decomposition level. That said, since the best results correspond to the cdf97 family at decomposition level 4, we have carried out the other numerical tests with that wavelet basis.

We chose to produce full results for a mean value PRD of 0.53 (cf. Table 2) as this value represents a good compromise between compression performance and high visual similitude between the recovered signal and the raw data. Indeed, in15 the quality of the recovered signals giving rise to a mean value PRD of 0.53 is illustrated in relation to the high performance of automatic QRS complex detection. However, the compression ratio of that method is low; for the same mean value of PRD our CR is 5 times larger (4.515 vs 23.17, Table 2). As observed in Table 2, the mean value of the local quantity prd is equivalent to the global value (PRD). Nevertheless the prd may differ for some of the segments in a record. Figure 2 plots the prd for record 101 partitioned into Q = 325 segments of length L = 2000 sample points. Notice that there are a few segments with significantly larger values of prd than the others. Accordingly, with the aim of demonstrating the visual quality of the recovered signals, for each signal in the database we detect the segment \({q}^{\ast }\) of maximum distortion with respect to the prd as

$${q}^{\ast }=\mathop{\arg \,\max }\limits_{q=1,\ldots ,Q}\,{\rm{prd}}(q).$$
(10)
Figure 2

Values of prd for the Q = 325 segments of length L = 2000 in record 101.

The left graphs of Fig. 3 correspond to the segments of maximum prd over all the records in the database, for segments of length L = 2000. These are: segment 25 of record 101 when applying the approximation approach (a) (top graph), and segment 175 of record 213 for approach (b) (bottom graph). The upper waveforms in all the graphs are the raw ECG data. The lower waveforms are the corresponding approximations, which have been shifted down for visual convenience. The bottom lines in all the graphs represent the absolute value of the difference between the raw data and their corresponding approximation. The right graphs of Fig. 3 have the same description as the left ones, but the segments correspond to values of prd close to the mean value of prd for the corresponding record.

Figure 3

The upper waveforms in all the graphs are the raw data. The lower waveforms are the corresponding approximations which have been shifted down for visual convenience. The bottom lines represent the absolute value of the difference between the raw data and the approximation. The top left graph corresponds to segment 25 in record 101 and the right one corresponds to segment 120 in the same record. The bottom graphs have the same description as the top graphs but for record 213 and segment 175 (left) and 51 (right).

It is worth commenting that the difference in the results between approaches (a) and (b) is a consequence of the fact that the concomitant parameters are set to approximate the whole database at a fixed mean value of PRD. In that sense, approach (a) provides some flexibility (there are two parameters to be fixed to match the required PRD) whereas for approach (b) the only parameter (Δ) is completely determined by the required PRD. As observed in Table 3, when setting the parameter PRD0 much smaller than the target PRD the approximation is only influenced by the quantization parameter Δ and methods (a) and (b) coincide. Conversely, when setting PRD0 too close to the target PRD the quantization parameter needs to be significantly reduced, which affects the compression results. For a target PRD ≥ 0.4 we recommend setting PRD0 to 70–80% of the required PRD.

For values of PRD < 0.4 the storage approach is not as effective as for larger values of PRD. This is noticeable in both Tables 4 and 8. Another feature that appears for PRD < 0.4 is that applying the entropy coding step before saving the data in compressed HDF5 format improves the CR much more than for larger values of PRD. This is because for PRD < 0.4 the approximation fits noise and small details, for which components in higher wavelet bands are required. In contrast, for larger values of PRD the adopted uniform quantization keeps wavelet coefficients in the first bands. As a result, with the proposed technique the locations of the nonzero wavelet coefficients are encoded in an array which contains mainly a long stream of ones. For small values of PRD the array becomes longer and includes a wider range of numbers. This is why the addition of an entropy coding step, such as Huffman coding, which assigns shorter codewords to the most frequent symbols, becomes more important. In any case, if the outputs are saved in HDF5 format, adding the Huffman coding step is beneficial. Nonetheless, since in a software implementation the improvement comes at the expense of computational time, for PRD > 0.4 this step can be avoided and the CR is still very high.

The comparison with the conventional RL algorithm in Table 8 underlines the suitability of the proposal for storing the locations of nonzero coefficients. A similar storage strategy has been successfully used with other approximation techniques for compression of melodic music31 and X-Ray medical images32. In the present case the strategy is even more efficient, because the approximation is realized with a basis and on the whole signal, which intensifies the effectiveness of the storage approach.

Conclusions

An effective and efficient method for compressing ECG signals has been proposed. The proposal was tested on the MIT-BIH Arrhythmia database, giving rise to benchmarks that improve upon recently reported results. The main features of the method are its simplicity and the fact that for values of PRD > 0.4 a dedicated entropy coding of the outputs can be avoided by saving them in compressed HDF5 format. This solution involves a time delay which is practically negligible in relation to the signal length: 0.14 s for compressing a 30 min record. Two approaches for reducing the number of wavelet coefficients have been considered. Approach (b) arises from switching off in approach (a) the selection of the largest wavelet coefficients before quantization. It was shown that, when approximating a whole database to obtain a fixed mean value of PRD, approach (a) may render a higher mean value of CR when the target PRD is greater than 0.4.

The role of the proposed Organization and Storage strategy was highlighted by comparison with the conventional Run-Length algorithm. Whilst the latter produces smaller CRs, its results are still good in comparison with previously reported benchmarks. This outcome leads to the conclusion that, using a wavelet transform on the whole signal, uniform quantization across all the wavelet bands works well in the design of a codec for lossy compression of ECG signals.

Note: The MATLAB codes for implementing the whole approach have been made available on a dedicated website29,33.