Kalman-based Spectro-Temporal ECG Analysis using Deep Convolutional Networks for Atrial Fibrillation Detection

In this article, we propose a novel ECG classification framework for atrial fibrillation (AF) detection using a spectro-temporal representation (i.e., a time-varying spectrum) and deep convolutional networks. In the first step, we use a Bayesian spectro-temporal representation based on estimating the time-varying coefficients of a Fourier series with a Kalman filter and smoother. Next, we derive an alternative model based on a stochastic oscillator differential equation to accelerate the estimation of the spectro-temporal representation for lengthy signals. Finally, after comparative evaluations of different convolutional architectures, we propose an efficient deep convolutional neural network to classify the 2D spectro-temporal ECG data. The ECG spectro-temporal data are classified into four classes: AF, non-AF normal rhythm (Normal), non-AF abnormal rhythm (Other), and noisy segments (Noisy). The performance of the proposed methods is evaluated and scored on the PhysioNet/Computing in Cardiology (CinC) 2017 dataset. The experimental results show that the proposed method achieves an overall F1 score of 80.2%, which is in line with state-of-the-art algorithms.

wide [6]. It is also estimated that by 2030, in the European Union alone, 14-17 million patients will suffer from AF [46]. AF is associated with an increased risk of stroke (5-fold), blood clots, heart failure, coronary artery disease, and death (2-fold) [6]. Therefore, developing automatic algorithms for the early detection of AF is crucial.
During AF, the atrial muscle fibers exhibit chaotic electrical activity, which may emit impulses at a rate of 500 bpm to the atrioventricular (AV) node, from which the impulses pass randomly. This results in an irregular ventricular response, which is one of the main characteristics of AF [39]. In addition, AF has the following characteristics on the electrocardiogram (ECG): 1) "absolutely" irregular RR intervals; 2) the absence of P waves; and 3) variable atrial cycle length (when visible).
The analysis of ECG is the most common approach to AF detection, and during the past ten years, various algorithms have been developed for automatic AF detection [1,2,3,5,9,13,22,42,43]. Most of the existing algorithms follow a traditional pipeline of pre-processing, feature extraction, and classification. Recent deep learning (DL) techniques [21] also provide a promising framework for end-to-end classification. In contrast to traditional approaches, one of the most significant advantages of using deep learning for classification is that hand-crafted features are no longer needed, because deep neural networks can learn the inherent features when provided with sufficient training data [11]. Surprisingly, however, deep learning has only begun to be applied to AF detection in the past few years (see, e.g., [24,28,30,34,40]).
For ECG signals, one can directly adopt 1D convolutional or recurrent network models for the classification task. However, transforming the signals into the spectral domain (spectro-temporal features) is a promising alternative, because the current state-of-the-art deep convolutional neural network (CNN) structures are typically designed for 2D images. Deep CNNs such as AlexNet [20], Inception-v4 [36], and DenseNet [16] have proved their superiority in image classification.
Among the previous studies, only a few have resorted to the time-varying spectrum for AF detection. The reasons might be the following. First, it is not easy to select hand-crafted features from 2D data for traditional classifiers. Second, the temporal features of a spectrogram are usually hard to capture even in a DL setting. Several studies [40,45] have applied DL to AF detection in the spectral domain, but traditional spectral estimation methods such as the short-time Fourier transform (STFT) or the continuous wavelet transform (CWT) may lose important information during the transformation and produce less informative input data. Thus, to address these problems, it is beneficial to consider new spectro-temporal estimation methods that retain the temporal features better.
The contributions of this paper are: 1) We propose two extended models for spectro-temporal estimation using a Kalman filter and smoother. We then combine them with deep convolutional networks for AF detection. 2) We test the proposed spectro-temporal estimation approaches on simulated data and on AF detection, and compare them with other popular estimation methods and different classifiers. 3) For AF detection, we evaluate the proposals using the PhysioNet/CinC 2017 dataset [7], which is considered to be a challenging dataset that resembles practical applications, and our results are in line with the state of the art.
This paper is an extended version of our previous conference paper "Spectro-temporal ECG Analysis for Atrial Fibrillation Detection" [44], presented at the 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing. In addition to the original contributions of the conference article, in this article we use a new stochastic oscillator model and show that the spectro-temporal estimation can also be implemented with a steady-state (stationary) Kalman filter and smoother, which leads to a significant reduction in computation time without losing estimation accuracy. We demonstrate this on both simulated data and AF data classification. Whereas the conference paper showed only a few comparisons among estimation methods and classifiers, here we expand the experiments to a wide range of both standard and modern classifiers (e.g., Random Forests, CNNs, and DenseNet) for a more solid illustration of the classification performance.
The paper is structured as follows. In Section 2, we propose spectro-temporal methods for ECG signal analysis. In Section 3, we apply the proposed estimation method to AF detection using an averaging procedure. In Section 4, we compare and discuss experimental results on both simulated data and the ECG dataset, followed by the conclusion in Section 5.

Spectro-Temporal Estimation Methods
Spectro-temporal signal analysis is an effective and powerful approach that is used in many fields ranging from biosignal analysis [27] and audio processing [26] to weather forecasting [8] and stock market prediction [17]. In ECG analysis, the temporal evolution of spectral information can be captured in spectro-temporal data representation, which can convey important information about the underlying biological process of the heart.
In this section, we develop new methods for spectro-temporal estimation. We first introduce a Fourier series model based upon the Bayesian spectrum estimation method of Qi et al. [25], and put Gaussian process priors on the Fourier coefficients. Then, by adopting the ideas presented in [33], we convert the Fourier series into a more flexible stochastic oscillator model and use a fast stationary Kalman filter/smoother for its estimation. Finally, we demonstrate the estimation performance on simulated data.

Kalman-based Fourier Series Model for Spectro-Temporal Estimation
Apart from the traditional STFT and CWT methods, spectro-temporal analysis can also be done by modeling the signal as a stochastic state-space model and resorting to a Bayesian procedure (i.e., the Kalman filter and smoother) for its estimation [25,31]. The key advantages of this kind of approach over other spectro-temporal methods are that it can be applied to both evenly and unevenly sampled signals [25] and that it requires neither stationarity guarantees nor windowing. Furthermore, as we show here, it can also be combined with state-space methods for Gaussian processes [14,32].
Recall that any periodic signal with fundamental frequency $f_0$ can be expanded into a Fourier series
$$y(t) = a_0 + \sum_{j=1}^{M} \left[ a_j \cos(2\pi j f_0 t) + b_j \sin(2\pi j f_0 t) \right], \tag{1}$$
where the exact representation is obtained with $M \to \infty$, but for sampled (and thus bandlimited) signals it is sufficient to consider a finite series. This stationary model is the underlying model in the STFT approach. STFT applies a window to each signal segment and finds a least squares fit (via the discrete Fourier transform) to the coefficients $\{a_j, b_j : j = 1, \ldots, M\}$.
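In our approach, we start by assuming that the coefficients depend on time, and we put Gaussian process priors on them:
$$a_j(t) \sim \mathcal{GP}\big(0, C^a_j(t, t')\big), \qquad b_j(t) \sim \mathcal{GP}\big(0, C^b_j(t, t')\big). \tag{2}$$
As shown in [14,32], provided that the covariance functions are stationary, we can express the Gaussian processes as solutions to linear stochastic differential equations (SDEs). We choose the covariance functions to have the form
$$C^a_j(t, t') = (s^a_j)^2 \exp\big(-\lambda^a_j |t - t'|\big), \qquad C^b_j(t, t') = (s^b_j)^2 \exp\big(-\lambda^b_j |t - t'|\big), \tag{3}$$
where $s^a_j, s^b_j > 0$ are scale parameters and $\lambda^a_j, \lambda^b_j > 0$ are the inverses of the time constants (length scales) of the processes.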
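The state-space representations (which are scalar in this case) are then given as
$$da_j = -\lambda^a_j a_j \, dt + dW^a_j, \qquad db_j = -\lambda^b_j b_j \, dt + dW^b_j, \tag{4}$$
where $W^a_j, W^b_j$ are Brownian motions with suitable diffusion coefficients $q^a_j, q^b_j$. We can also solve the equations at discrete time steps (see, e.g., [12]) as
$$a_j(t_k) = \psi^a_{j,k} \, a_j(t_{k-1}) + q^a_{j,k}, \qquad b_j(t_k) = \psi^b_{j,k} \, b_j(t_{k-1}) + q^b_{j,k}, \tag{5}$$
where $q^a_{j,k} \sim \mathcal{N}(0, \Sigma^a_{j,k})$, $q^b_{j,k} \sim \mathcal{N}(0, \Sigma^b_{j,k})$, and, with $\Delta t_k = t_k - t_{k-1}$,
$$\psi^a_{j,k} = \exp(-\lambda^a_j \Delta t_k), \qquad \Sigma^a_{j,k} = \frac{q^a_j}{2 \lambda^a_j} \left[ 1 - \exp(-2 \lambda^a_j \Delta t_k) \right], \tag{6}$$
and analogously for $\psi^b_{j,k}$ and $\Sigma^b_{j,k}$.
Let us now assume that we obtain noisy measurements of the Fourier series (1) at times $t_1, t_2, \ldots$. We define a state vector $x = [a_0, a_1, \ldots, a_M, b_1, b_2, \ldots, b_M]^\top$, which stacks all the coefficients $a_j$ and $b_j$. In this way, we can write $H_k = [1, \cos(2\pi f_0 t_k), \ldots, \cos(2\pi M f_0 t_k), \sin(2\pi f_0 t_k), \ldots, \sin(2\pi M f_0 t_k)]$, which leads to
$$y(t_k) = H_k \, x(t_k). \tag{7}$$
We can also rewrite the dynamic model (5) as
$$x_k = \Psi_k \, x_{k-1} + q_k, \tag{8}$$
where $\Psi_k$ contains the terms $\psi^a_{j,k}$ and $\psi^b_{j,k}$ on the diagonal, and $q_k \sim \mathcal{N}(0, \Sigma_k)$, where $\Sigma_k$ contains the terms $\Sigma^a_{j,k}$ and $\Sigma^b_{j,k}$ on the diagonal. If we assume that we actually measure (7) with additive Gaussian measurement noise $r_k \sim \mathcal{N}(0, R)$, then we can express the measurement model as
$$y_k = H_k \, x_k + r_k. \tag{9}$$
Equations (8) and (9) define a linear state-space model in which we can perform exact Bayesian estimation using the Kalman filter and smoother [31]. In the original paper [25], the state vectors $x_1, \ldots, x_N$ are assumed to perform a random walk, but here the key insight is to use a more general Gaussian process, which introduces a finite time constant to the problem. Although here we have chosen a quite simple Gaussian process model for this purpose, it would also be possible to use more general Gaussian process priors for the coefficients, such as state-space representations of Matérn or squared exponential covariance functions [14,32].
The Kalman filter for this problem then consists of the following forward recursion (for $k = 1, \ldots, N$):
$$\begin{aligned} m_k^- &= \Psi_k \, m_{k-1}, \\ P_k^- &= \Psi_k \, P_{k-1} \, \Psi_k^\top + \Sigma_k, \\ K_k &= P_k^- H_k^\top \left( H_k P_k^- H_k^\top + R \right)^{-1}, \\ m_k &= m_k^- + K_k \left( y_k - H_k m_k^- \right), \\ P_k &= P_k^- - K_k H_k P_k^-, \end{aligned} \tag{10}$$
and the RTS smoother of the following backward recursion (for $k = N-1, \ldots, 1$):
$$\begin{aligned} G_k &= P_k \, \Psi_{k+1}^\top \left( P_{k+1}^- \right)^{-1}, \\ m_k^s &= m_k + G_k \left( m_{k+1}^s - m_{k+1}^- \right), \\ P_k^s &= P_k + G_k \left( P_{k+1}^s - P_{k+1}^- \right) G_k^\top. \end{aligned} \tag{11}$$
The final posterior distributions are then given as
$$p(x_k \mid y_1, \ldots, y_N) = \mathcal{N}(x_k \mid m_k^s, P_k^s). \tag{12}$$
The magnitude of the sinusoid with frequency $f_j = j f_0$ at time step $k$ can then be computed by extracting the elements corresponding to $\hat{a}_j(t_k)$ and $\hat{b}_j(t_k)$ from the mean vector $m_k^s$:
$$S_{j,k} = \sqrt{\hat{a}_j(t_k)^2 + \hat{b}_j(t_k)^2}, \qquad j = 1, \ldots, M, \quad k = 1, \ldots, N. \tag{13}$$
From now on, the matrix $S$ is called the spectro-temporal data matrix.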
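The forward and backward recursions of this section can be sketched in a few lines of NumPy. The following is a minimal illustration rather than the implementation used in the experiments: the function name `fourier_ks`, the choice of one shared rate `lam` and diffusion `q` for all coefficients, and the Ornstein-Uhlenbeck discretization are assumptions made for the example.

```python
import numpy as np

def fourier_ks(y, t, f0, M, lam=10.0, q=1.0, R=1.0):
    """Kalman filter + RTS smoother for time-varying Fourier coefficients.

    State x = [a_0, a_1..a_M, b_1..b_M]. Every coefficient is given the same
    Ornstein-Uhlenbeck prior (rate lam, diffusion q); this sharing is an
    assumption of the sketch. Returns the M x N magnitude matrix S.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    N, n = len(y), 2 * M + 1
    j = np.arange(1, M + 1)
    m = np.zeros(n)
    P = np.eye(n) * q / (2.0 * lam)          # stationary prior covariance
    mf = np.zeros((N, n)); Pf = np.zeros((N, n, n))   # filtered moments
    mp = np.zeros((N, n)); Pp = np.zeros((N, n, n))   # predicted moments
    psis = np.zeros(N)
    t_prev = t[0]
    for k in range(N):
        dt = t[k] - t_prev                   # works for uneven sampling
        psi = np.exp(-lam * dt)
        sig = q / (2.0 * lam) * (1.0 - np.exp(-2.0 * lam * dt))
        # Prediction step (the transition matrix is psi * I here)
        m = psi * m
        P = psi ** 2 * P + sig * np.eye(n)
        mp[k], Pp[k], psis[k] = m, P, psi
        # Update step with the time-dependent measurement row H_k
        ang = 2.0 * np.pi * j * f0 * t[k]
        H = np.concatenate(([1.0], np.cos(ang), np.sin(ang)))
        K = P @ H / (H @ P @ H + R)
        m = m + K * (y[k] - H @ m)
        P = P - np.outer(K, H @ P)
        mf[k], Pf[k] = m, P
        t_prev = t[k]
    # Backward (RTS) pass for the smoothed means only
    ms = mf.copy()
    for k in range(N - 2, -1, -1):
        G = psis[k + 1] * Pf[k] @ np.linalg.inv(Pp[k + 1])
        ms[k] = mf[k] + G @ (ms[k + 1] - mp[k + 1])
    a_hat = ms[:, 1:M + 1]
    b_hat = ms[:, M + 1:]
    return np.sqrt(a_hat ** 2 + b_hat ** 2).T
```

For a noise-free 5 Hz sinusoid with `f0 = 1` and `M = 8`, the row of `S` for the fifth harmonic should dominate away from the signal borders.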

Oscillator Model for Spectro-Temporal Estimation
In practice, the computational cost of the Kalman filter and smoother can be extensive when the signal is very long. However, instead of the Fourier series state-space model of the previous section, one can also derive an alternative representation using stochastic oscillator differential equations. In this way, the dynamic and measurement models become linear time-invariant (LTI), so that we can leverage a stationary Kalman filter to reduce the computation time. Stochastic oscillator models of this kind were also considered in [33], and the link to periodic Gaussian process models was investigated in [35]. A single quasi-periodic stochastic oscillator can be described with a stochastic differential equation model [35] whose state is $x_j = [a_j, b_j]^\top$ and whose driving Brownian motion has a suitably chosen diffusion matrix $\zeta_j = q_j I$ [35]. Solving the SDE at discrete time steps gives a linear Gaussian recursion $x_j(t_k) = A_j \, x_j(t_{k-1}) + q_{j,k}$ with $q_{j,k} \sim \mathcal{N}(0, Q_j)$, where $A_j$ and $Q_j$ depend only on the step $\Delta t = t_k - t_{k-1}$.
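A general quasi-periodic signal can be modeled using a superposition of stochastic oscillators of the above form [35]. If we stack the states of all oscillators, together with a bias component, into a single state vector $x$, then the resulting time-invariant model can be written as
$$x_k = A \, x_{k-1} + q_k, \quad q_k \sim \mathcal{N}(0, Q), \qquad y_k = H \, x_k + r_k, \quad r_k \sim \mathcal{N}(0, R),$$
where $A$ and $Q$ are block-diagonal matrices assembled from the $A_j$ and $Q_j$, and $H$ is a constant measurement matrix. In this model, the first component of the state is a slowly drifting Brownian motion with diffusion coefficient $q_b$, which models the possible non-zero mean of the signal.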
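To make the structure of the time-invariant model concrete, the sketch below assembles $A$, $Q$, and $H$ for a bias term plus $M$ oscillators. The explicit forms of the blocks are assumptions of this example, not taken from the text: each $A_j$ is taken to be a damped rotation, each $Q_j$ an isotropic covariance, and $H$ is assumed to read the bias and the first component of every oscillator. The function name `oscillator_lti` and its parameters are likewise hypothetical.

```python
import numpy as np

def oscillator_lti(f0, M, dt, lam=10.0, q=1.0, q_b=1.0):
    """Build (A, Q, H) for a Brownian bias plus M stochastic oscillators.

    Assumed block forms (illustrative only):
      A_j = exp(-lam*dt) * rotation(2*pi*j*f0*dt)
      Q_j = q/(2*lam) * (1 - exp(-2*lam*dt)) * I
    """
    n = 2 * M + 1
    A = np.zeros((n, n))
    Q = np.zeros((n, n))
    H = np.zeros(n)
    # First state: slowly drifting Brownian motion modeling the signal mean
    A[0, 0] = 1.0
    Q[0, 0] = q_b * dt
    H[0] = 1.0
    decay = np.exp(-lam * dt)
    var = q / (2.0 * lam) * (1.0 - decay ** 2)
    for j in range(1, M + 1):
        theta = 2.0 * np.pi * j * f0 * dt
        i = 2 * j - 1                         # start index of oscillator j
        A[i:i + 2, i:i + 2] = decay * np.array(
            [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]])
        Q[i:i + 2, i:i + 2] = var * np.eye(2)
        H[i] = 1.0                            # observe first component only
    return A, Q, H
```

Because none of the three matrices depends on the time index, they can be built once and reused for every segment sampled at the same rate.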
The estimation problem can be solved with a Kalman filter and smoother. However, because the model is LTI, the Kalman filter is known to converge to a steady-state Kalman filter [18]. The steady-state Kalman filter can be obtained by solving the following discrete algebraic Riccati equation (DARE) for the limit covariance $P_k^- \to P_\infty^-$:
$$P_\infty^- = A P_\infty^- A^\top - A P_\infty^- H^\top \left( H P_\infty^- H^\top + R \right)^{-1} H P_\infty^- A^\top + Q. \tag{20}$$
A positive semi-definite solution to the equation is known to exist provided that the pair $[A, H]$ is detectable [18]. Thus we can obtain $P_\infty^-$ by solving the DARE in (20), and the stationary Kalman filter for the forward mean propagation is
$$m_k = (A - K H A) \, m_{k-1} + K \, y_k,$$
where the stationary gain is
$$K = P_\infty^- H^\top \left( H P_\infty^- H^\top + R \right)^{-1}.$$
The corresponding smoother then turns out to converge to its steady state as well, and the backward propagation for the resulting steady-state smoother is
$$m_k^s = m_k + G \left( m_{k+1}^s - A \, m_k \right),$$
where the gain is computed as
$$G = P_\infty A^\top \left( P_\infty^- \right)^{-1}, \qquad P_\infty = P_\infty^- - K H P_\infty^-.$$
In this way, the gains and covariances do not need to be recomputed at every time step, which reduces the computational cost significantly. The disadvantage is that we need to solve the DARE in order to construct the stationary filter and smoother, which also adds to the computational cost.
After computing the estimates $m_k^s$ for each time step, we can extract the estimates $\hat{a}_j(t_k)$ and $\hat{b}_j(t_k)$ and use (13) to compute the spectro-temporal data matrix.
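The steady-state construction can be sketched as follows. This is an illustrative example under stated assumptions: instead of a dedicated DARE solver, the Riccati recursion is simply iterated until it stops changing, and the function names `steady_state_gains` and `stationary_filter_smoother` are hypothetical. A scalar measurement is assumed, so $H$ is passed as a 1D vector and $R$ as a float.

```python
import numpy as np

def steady_state_gains(A, Q, H, R, tol=1e-12, max_iter=10000):
    """Iterate the Riccati recursion to its fixed point.

    Returns (P_pred, K, G): the limiting predicted covariance, the stationary
    filter gain, and the stationary smoother gain.
    """
    n = A.shape[0]
    P = np.eye(n)
    for _ in range(max_iter):
        K = P @ H / (H @ P @ H + R)
        P_next = A @ (P - np.outer(K, H @ P)) @ A.T + Q
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = P @ H / (H @ P @ H + R)
    P_filt = P - np.outer(K, H @ P)           # limiting filtered covariance
    G = P_filt @ A.T @ np.linalg.inv(P)
    return P, K, G

def stationary_filter_smoother(y, A, H, K, G):
    """Propagate means only, using the fixed gains K and G."""
    n, N = A.shape[0], len(y)
    m = np.zeros(n)
    mf = np.zeros((N, n))
    for k in range(N):                        # forward pass
        m_pred = A @ m
        m = m_pred + K * (y[k] - H @ m_pred)
        mf[k] = m
    ms = mf.copy()
    for k in range(N - 2, -1, -1):            # backward pass
        ms[k] = mf[k] + G @ (ms[k + 1] - A @ mf[k])
    return ms
```

The gains are computed once per model, so when many segments share the same $A$, $Q$, $H$, and $R$, only the two mean recursions have to be repeated for each segment.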

Estimation Trials on Simulated Data
A quantitative evaluation of the proposed spectro-temporal methods for ECG classification is discussed in Sections 4 and 5.2. In this section, however, we visually inspect the proposed spectro-temporal representations on simulated data and compare them with other standard time-frequency approaches such as STFT, CWT, and BurgAR. To avoid confusion in terminology, from now on we refer to the proposals in Sections 2.1 and 2.2 as FourierKS and OscKS, respectively.
We simulated a noisily observed multi-sinusoidal signal $y(t)$ as shown in (25). In Fig. 2, we plot the time-varying spectra estimated using FourierKS, OscKS, STFT, CWT, and BurgAR. The estimation settings are described in the figure captions.
Although all methods can approximate the simulated data to a good extent, FourierKS and OscKS have higher frequency resolution and a less noisy representation, which can help us to extract more robust features from the spectro-temporal representation. Moreover, the results of the FourierKS and OscKS methods are almost the same, although they have different state-space models.
To verify the computational efficiency of the stationary proposal in Section 2.2, we run each of the estima-
Fig. 3 Generalized overall processing scheme for ECG analysis.

ECG Dataset
In the AF experiments, we used the ECG dataset provided by the PhysioNet/CinC Challenge 2017 [7]. In total, 8528 short single-lead ECG recordings were collected using AliveCor hand-held devices. The recordings were uploaded automatically through an application on the user's mobile phone. The data were sampled at 300 Hz and band-pass filtered by the AliveCor devices. The duration of the ECG recordings ranged from 9 s to 61 s, with a median of 30 s. The distribution of the ECG recordings among the classes is as follows: Normal (5076 recordings), AF (758), Other (2415), and Noisy (279).

ECG Spectro-Temporal Feature Engineering
Our aim is now to find spectro-temporal features of ECG signals such that they can be classified by deep convolutional neural networks (CNNs). In Fig. 3 we show the overall proposed scheme from input (ECG) to output (predicted label). The first step is QRS detection and ECG segmentation, in which the raw ECG signal is divided into fixed-length segments aligned by their central R peaks. Next, the spectro-temporal data matrix for each segment is calculated using (13). The data matrices are then averaged and normalized to generate a fixed-length spectro-temporal feature matrix. In the final step, the 2D feature matrix (spectro-temporal image) is fed into a deep CNN for classification.
The logic behind the segmentation and averaging steps in the feature engineering procedure (dashed area in Fig. 3) is threefold. First, it handles the problem of ECG recordings of different lengths and generates fixed-length spectro-temporal feature matrices. Second, it captures enough information from the ECG recording to be classified by CNNs. For example, since the central R peaks of the segments are aligned, after averaging we expect sharp edges corresponding to QRS complexes in the feature matrices (spectro-temporal images) for Normal rhythms. For AF rhythms, however, we expect blurred areas in the spectro-temporal images due to the variable R-R intervals. For noisy segments, we do not expect any clear area for QRS complexes, and for the Other class one can expect different patterns in the spectro-temporal images depending on the underlying arrhythmia (see Fig. 4). Finally, the third reason to use the segmentation and averaging steps is to decrease the effect of noise in the ECG recordings. In the following, we discuss the different steps of feature engineering in detail.
In this work, for QRS detection, we use a modified version of the Pan-Tompkins algorithm. The original Pan-Tompkins algorithm [23] is sensitive to burst noise and easily mistakes noise for R peaks. To address this limitation at least partially, we slightly modify the original algorithm such that it iteratively checks the number of detected R peaks; if that number is smaller than a threshold, it ignores the detected R peaks and their neighbouring samples in the ECG signal, and applies the Pan-Tompkins algorithm again to the rest of the signal. In this way, if there are a few instances of high-amplitude burst noise, our algorithm can handle them. One example that illustrates this modification is shown in Fig. 5.
The next step is segmentation, in which fixed-length ECG segments are extracted from the original signal such that each segment potentially covers three QRS complexes. The segmentation process is as follows: if $y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^N$ is the original ECG signal and $p_i \in \{1, 2, \ldots, N\}$ is the position of the $i$th R peak in $y$, then $\mathbf{p} = [p_1, p_2, \ldots, p_D]$ holds the positions of all R peaks, where $D$ is the total number of R peaks in $y$. To extract $D - 2$ ECG segments, we associate each $p_i$, $i \in \{2, \ldots, D-1\}$, with a segment of $y$ such that it potentially covers three adjacent QRS complexes. To do so, we collect $\beta$ samples before and after each $p_i$. Following this procedure, the ECG segment associated with the $i$th R peak can be extracted from $y$ as $y^{(i)} = [y_{p_i - \beta}, \ldots, y_{p_i}, \ldots, y_{p_i + \beta}]$, and using equation (13), the spectro-temporal data matrix corresponding to this ECG segment is $S^{(i)} \in \mathbb{R}^{M \times (2\beta + 1)}$, where $M$ and $2\beta + 1$ are the numbers of frequency and time steps, respectively. It is worth noticing that these two parameters (i.e., $M$ and $2\beta + 1$) determine the size of the matrix $S$ in (13). The choice of the parameter $\beta$ is important, as it determines the length of each segment and how much of the signal enters the average. Usually, $\beta$ should be chosen so that a segment covers at least three QRS complexes, which gives good evidence of the R-R intervals.
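The segmentation rule above can be sketched in plain Python. The function name `extract_segments` is hypothetical, indices are zero-based, and the handling of peaks too close to the signal borders (they are skipped) is an assumption, since the text does not specify it.

```python
def extract_segments(y, r_peaks, beta):
    """Return one window of length 2*beta + 1 centred on each interior R peak.

    y       : sequence of ECG samples
    r_peaks : sorted zero-based sample indices of the detected R peaks
    beta    : number of samples kept on each side of the central peak
    """
    segments = []
    # The first and last peaks are never used as centres (i = 2, ..., D-1)
    for p in r_peaks[1:-1]:
        start, stop = p - beta, p + beta + 1
        if start < 0 or stop > len(y):
            continue  # assumption: drop windows that leave the signal
        segments.append(y[start:stop])
    return segments
```

With five detected peaks and a large enough margin at both ends, the function returns three segments, each with its R peak at index `beta`.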
The spectro-temporal feature matrix $S^\ddagger$ is obtained by averaging over all spectro-temporal data matrices and multiplying with their maximum mask (Equation (26)). The reason for adding a max operation in Equation (26) is that it could, at least to a certain extent, help to preserve intricate details of the spectro-temporal data that were potentially lost during averaging across all segments, while also normalizing the data.
Examples of ECG spectro-temporal feature matrices (images) for the four classes of ECG signals are shown in Fig. 4, where we used the spectro-temporal estimation method proposed in Section 2.2.

Classification
In the last ten years, deep learning techniques, especially convolutional neural networks, have achieved great success in detection and classification tasks. Compared with 1D CNN models, CNNs for 2D image applications have progressed further. The aim here is to leverage advanced CNNs for AF classification using the time-varying spectrum (which is an image).
However, one flaw in most current network models is that the information during training, principally the gradient, may vanish if the network is exceedingly deep (with many layers), which is usually called the "vanishing gradient" problem [10]. In general, this problem can be alleviated in several basic ways, for instance with pre-training, residual connections, or properly selected activation functions (e.g., one should not attach ReLU before batch normalization).
Densely connected convolutional networks (DenseNet) [16], which won the CVPR 2017 best paper award, provide state-of-the-art performance without degradation or over-fitting even when hundreds of layers are stacked. DenseNets can be seen as refined versions of deep residual networks (ResNets) [15]: DenseNet introduces explicit connections between each layer and all preceding layers in a dense block, rather than only between adjacent layers, as shown in Fig. 6. An additional advantage of DenseNet, as mentioned in [16], is feature reuse.
Considering an $L$-layer network and an input image $U_0$, the output of the $l$-th layer is
$$U_l = H^{\mathrm{Res}}_l(U_{l-1}) + U_{l-1} \quad \text{(ResNet)}, \qquad U_l = H^{\mathrm{Den}}_l\big([U_0, U_1, \ldots, U_{l-1}]\big) \quad \text{(DenseNet)},$$
where $H^{\mathrm{Res}}_l$ and $H^{\mathrm{Den}}_l$ are the layer operations (e.g., convolution, batch normalization, or activation) of ResNet and DenseNet, respectively, $[\cdot]$ denotes concatenation of feature maps, and $U_l$ is the output of the $l$-th layer.
The DenseNet we implement here, which we refer to as Dense18 + , is slightly different from the original proposal [16]: we employ both max and average global pooling on the last layer and concatenate them, as shown in Table 2. In our application, because of the size of the input, we remove the initial down-sampling max pooling layer. Each dense block contains four 3 × 3 convolutional layers, with a growth rate of 48 and a reduction rate of 0.5.
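The effect of the growth rate and the reduction rate on the layer widths can be illustrated with a short calculation. This is a sketch of the bookkeeping only, following the channel arithmetic of the original DenseNet proposal; the helper names and the input width of 96 channels used in the usage note are assumptions, since the text does not state the width entering the first block.

```python
def dense_block_channels(c_in, n_layers=4, growth=48):
    """Channel counts inside one dense block.

    Layer l (zero-based) receives the block input concatenated with the
    outputs of all l earlier layers, each of which adds `growth` channels.
    Returns (inputs_per_layer, block_output_channels).
    """
    inputs = [c_in + growth * l for l in range(n_layers)]
    return inputs, c_in + growth * n_layers

def transition_channels(c, reduction=0.5):
    """Channels kept by a transition layer with the given reduction rate."""
    return int(c * reduction)
```

For a hypothetical block input of 96 channels, the four layers see 96, 144, 192, and 240 input channels, the block outputs 288 channels, and the following transition layer halves this to 144.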

Model Assessment and Evaluation Criteria
To evaluate the performance of the proposed methods, we conducted experiments on the ECG dataset described in Section 3.1. The classification performance of the different methods was assessed using the scoring mechanism recommended by the PhysioNet/Computing in Cardiology (CinC) Challenge 2017 [7] over the whole dataset in a 10-fold cross-validation scheme. The data were partitioned such that the same proportions of each class are available in each fold (stratified cross-validation). Moreover, the F1 score,
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},$$
is calculated for each class to summarize the performance on that specific class: Normal ($F_{1N}$), AF ($F_{1A}$), Other ($F_{1O}$), and Noisy ($F_{1\sim}$). Then, as recommended by PhysioNet/CinC 2017, the overall evaluation metric is
$$F_1 = \frac{F_{1N} + F_{1A} + F_{1O}}{3}.$$
Finally, the detailed performance is shown by a 4-class confusion matrix whose diagonal entries are the correct classifications and whose off-diagonal entries are the incorrect classifications. This confusion matrix is the result of stacking the 10 confusion matrices of the test data in the 10-fold cross-validation.
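The scoring can be sketched directly from a confusion matrix of counts. The function names are hypothetical, the class order Normal, AF, Other, Noisy is an assumption of the example, and the overall score is taken here to follow the challenge's convention of averaging the first three classes only.

```python
def f1_scores(conf):
    """Per-class F1 from a square confusion matrix of counts.

    conf[i][j] is the number of recordings of true class i predicted as j.
    F1 for class c equals 2*TP / (row sum + column sum).
    """
    n = len(conf)
    scores = []
    for c in range(n):
        row_total = sum(conf[c])                        # TP + FN
        col_total = sum(conf[r][c] for r in range(n))   # TP + FP
        denom = row_total + col_total
        scores.append(2.0 * conf[c][c] / denom if denom else 0.0)
    return scores

def overall_f1(conf):
    """Mean F1 over Normal, AF, and Other (assumed to be classes 0, 1, 2)."""
    f1 = f1_scores(conf)
    return sum(f1[:3]) / 3.0
```

Summing the ten per-fold confusion matrices entry by entry before calling `f1_scores` gives scores for the stacked matrix described above.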

Experiments
In principle, any time-frequency analysis method can be used for ECG classification. To show the benefit of the spectro-temporal methods proposed in Section 2 over other standard time-frequency analysis methods, we conducted experiments on the ECG dataset. We compared the results of the proposed methods with the short-time Fourier transform (STFT), the continuous wavelet transform (CWT), and a classical power spectral density estimation method. To do so, we used the magnitude of the STFT, the magnitude of the CWT, and the square root of the non-logarithmic power spectral density obtained with the Burg autoregressive model (BurgAR) [19] of the ECG signal to construct the feature matrices.
Table 3 10-fold cross-validation F1 scores of spectro-temporal estimation methods using different classifiers. The best scores for each column and row are rendered in bold and italic, respectively.
In addition, several different convolutional architectures are examined, and their results are compared with the standard RF classifier. The network structures of InceptionV3, ResNet, and DenseNet are taken from their original papers [15,16,37], but we removed the initial sub-sampling layer for a fair comparison with Dense18 + in Table 2. We also construct a plain CNN (CNN18), which has the same structure as Dense18 + but without dense connections. For the random forest, we use 500 decision trees and a random selection of 50 features (out of 2500) at each node. In addition, at each node the random forest minimizes the cross-entropy impurity measure. The settings for spectro-temporal estimation are the same as described in Section 5.1. All spectro-temporal feature matrices (images) are then uniformly resized (down-sampled by local averaging) to 50 × 50 for the classifiers.
With seven classifiers and five time-frequency analysis methods, we have 35 combinations in total, whose performance is reported in Table 3. As can be seen from this table, the best results (overall scores) belong to our proposed spectro-temporal representation methods (i.e., FourierKS and OscKS) with the Dense18 + classifier. Moreover, Table 4 shows the performance for each ECG class for the Dense18 + classifier with different time-frequency representations.
The detailed performance of all five methods (i.e., FourierKS, OscKS, CWT, STFT, and BurgAR) with the Dense18 + classifier is reported in five confusion matrices in Fig. 7. Each confusion matrix is row-wise normalized. The diagonal entries show the recall of each rhythm and the off-diagonal entries show the misclassification rates. For example, the first row of the first confusion matrix shows that 92.1% of normal rhythms are correctly classified as normal, but 0.6%, 6.3%, and 1.0% are incorrectly classified as AF, Other, and Noisy, respectively.
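Down-sampling by local averaging can be sketched as a block mean. The function name is hypothetical, and the sketch assumes that the input height and width are integer multiples of the output size, which the text does not state; other sizes would need interpolation or cropping.

```python
def downsample_mean(img, out_rows=50, out_cols=50):
    """Resize a 2D list of numbers by averaging non-overlapping blocks."""
    rows, cols = len(img), len(img[0])
    if rows % out_rows or cols % out_cols:
        raise ValueError("input size must be a multiple of the output size")
    br, bc = rows // out_rows, cols // out_cols   # block height and width
    out = []
    for r in range(out_rows):
        out_row = []
        for c in range(out_cols):
            block = [img[r * br + i][c * bc + j]
                     for i in range(br) for j in range(bc)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```

Flattening the resulting 50 × 50 matrix gives the 2500 features from which the random forest draws 50 candidates at each node.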
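Row-wise normalization of a confusion matrix can be sketched as follows; the function name and the counts in the usage note are illustrative and are not taken from the experiments.

```python
def row_normalize(conf):
    """Convert each row of counts into percentages of that row's total.

    After normalization the diagonal entry of a row is the recall of that
    class, and the off-diagonal entries are its misclassification rates.
    """
    normalized = []
    for row in conf:
        total = sum(row)
        normalized.append(
            [100.0 * v / total if total else 0.0 for v in row])
    return normalized
```

For example, a hypothetical row of counts `[921, 6, 63, 10]` becomes approximately `[92.1, 0.6, 6.3, 1.0]`.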

ECG Time-Frequency Analysis Methods
We first examine, through visual inspection, how different spectro-temporal estimation methods perform on an ECG signal. As an example, we take the 3223rd recording (Rec. 3223) from the CinC 2017 dataset, which is labelled as AF and shown in Fig. 8(a). For the FourierKS and OscKS methods, we choose different frequency ranges ($M$) and smoothing options, as shown in Fig. 8(b), 8(c), and 8(d). We set the length scale $\lambda$ to a constant 10, and use 1 for the variance of the measurement noise $R$ and identity for the covariance of the process noise $q$. In theory, $\lambda$ could be different for each frequency, which could be used to improve the performance. Fig. 8(e) presents the results of the original method in [25], which adopts a Brownian motion model for the coefficients. For STFT and BurgAR, we apply Hann windows of length 11 with an overlap of 10 for estimation, as shown in Fig. 8(f) and 8(h), respectively. For CWT (Fig. 8(g)), we use the default Morse wavelet implemented in Matlab.
First, we observe that the estimation results of FourierKS (Fig. 8(c)) and OscKS (Fig. 8(d)) are nearly the same, except that the estimates of the base-frequency coefficient $a_0$ are very sensitive to $q_b$ in the OscKS method. If we compare the FourierKS method with STFT, BurgAR, and CWT, which are shown in Fig. 8(c), 8(f), 8(h), and 8(g), respectively, we can identify several advantages: the result of FourierKS is smoother, and it has a higher and more uniform resolution in both time and frequency. For STFT and BurgAR, the resolution is confined by the window type, length, and overlap. CWT mitigates this problem by scaling and translating the wavelet basis function, but due to the uncertainty principle of wavelet signal processing [29], the required resolution in time and frequency cannot be met simultaneously (see Fig. 8(g)). Our approaches model the time-varying Fourier series coefficients of the signal in state space and are therefore free from the use of windows or wavelets.
Another advantage of the proposed OscKS estimation method is that it can be computationally very efficient when we need to perform the estimation many times and the system is fixed (i.e., $A$ and $Q$ remain unchanged). For example, with the averaging strategy, the spectrum estimation has to be done for every segment and recording. For the OscKS method, we merely need to solve for $P_\infty$ in (20) once. As stated in Section 2.2, the computational cost of the OscKS method is substantially reduced by using a steady-state covariance.

ECG Classification for AF Detection
As mentioned before, Table 3 shows that the best results belong to our proposed spectro-temporal representation methods (i.e., FourierKS and OscKS) with the Dense18 + classifier. Table 3 also shows that, independent of the spectro-temporal representation method, Dense18 + has the highest performance among all classifiers. In contrast, the plain CNN (CNN18) has the lowest scores. In addition, RF is generally worse than the convolutional network classifiers (except CNN18), probably because, in contrast to convolutional networks, RF does not benefit from the existing structure in the spectro-temporal representation.
Regarding the different spectro-temporal representations, STFT and BurgAR have the worst results, and FourierKS and OscKS have the best performance. For some classifiers, CWT provides results that are as good as or even better than those of FourierKS and OscKS. However, the best results of FourierKS and OscKS are higher than the best result of CWT. Table 4 shows that the proposed ECG classification methods have the best result for Normal rhythm and the worst result for Noisy. The performance for AF and Other lies between these two, but typically AF has better performance than Other, probably because Other is an umbrella term that covers many abnormal non-AF rhythms, and we do not have enough samples of each abnormality to properly train our classifiers.
To examine how different spectro-temporal features act in AF ECG analysis, one elementary approach is to investigate the feature maps and activations of the first convolutional layer. However, this voxel-based "probing" only produces a limited explanation [38] and cannot fully give the insights. The visualization is shown in Fig. 9. We can see that the feature maps of FourierKS and CWT are more diverse and active than those of STFT and BurgAR, and they have larger activations on "peaks" and background details. Comparing FourierKS with CWT, the lower-frequency areas are better preserved and exploited by the FourierKS method.

Limitations
Typically for AF detection we need at least 30 s ECG data [6]. However, many ECG recordings in the dataset have less than 30 s duration (see Section 3.1) which limits the medical significance of the current study. In addition, the averaging step in feature engineering is robust only when there are enough spectro-temporal segments, which is not the case for very short ECG recordings (see Section 3.2).

Conclusion
In this paper, we proposed a spectro-temporal representation of ECG signals, based on state-space models, for application in deep-network-based atrial fibrillation detection. We empirically showed that if we put Gaussian process priors on the Fourier series coefficients, then by estimating the state of the corresponding linear state-space model using a Kalman filter/smoother we can outperform other time-frequency analysis methods, such as the short-time Fourier transform, the continuous wavelet transform, and autoregressive spectral estimation, for ECG classification.
We also accelerated the estimation of the spectro-temporal representation of signals by using a stochastic oscillator differential equation model and a stationary Kalman filter/smoother. This representation is useful for improving the scalability of the proposed spectro-temporal representation to long ECG recordings. Finally, through a comparative evaluation of multiple convolutional neural network models, we found an efficient convolutional architecture (i.e., Dense18 + ) for AF detection using the spectro-temporal features.