
1 Introduction

Patient-specific cardiac models have shown increasing potential in the personalized treatment of heart diseases [9]. A significant challenge in personalizing these models arises from the estimation of patient-specific tissue properties that vary spatially across the myocardium. Estimating these high-dimensional (HD) tissue properties (in the form of model parameters) is not only algorithmically difficult given indirect and sparse measurements, but also computationally intractable when each model simulation is itself computationally intensive.

Numerous efforts have been made to circumvent the challenge of HD parameter estimation. Many works assume a homogeneous tissue property that can be represented by a single global model parameter [7]. To preserve local tissue properties, a common approach is to reduce the parameter space through an explicit partitioning of the cardiac mesh. These efforts generally fall into two categories. In one approach, the cardiac mesh is pre-divided into 3–26 segments, each represented by a uniform parameter value [11]. Naturally, this artificial low-resolution division has a limited ability to represent tissue heterogeneity that is not known a priori. It has also been shown that the initialization of model parameters becomes increasingly critical as the number of segments grows [11]. Alternatively, a multi-scale hierarchy of the cardiac mesh can be defined for a coarse-to-fine optimization, which allows a spatially-adaptive resolution that is higher in certain regions than in others [3, 4]. However, the representation ability of the final partition is limited by the inflexibility of the multi-scale hierarchy: homogeneous regions distributed across different scales cannot be grouped into the same partition, while the resolution of heterogeneous regions can be limited by the level of scale the optimization can reach [4]. In addition, because these methods involve a cascade of optimizations along the hierarchy of the cardiac mesh, they are computationally expensive. For models that require hours or days for a single simulation, these methods quickly become computationally prohibitive.

In this paper, we present a novel HD parameter optimization approach that replaces the explicit anatomy-based reduction of the parameter space with an implicit low-dimensional (LD) manifold that represents the generative code for HD spatially-varying tissue properties. This is achieved by embedding within the optimization a generative variational auto-encoder (VAE) model, trained on a large set of spatially-varying tissue properties reflecting regional tissue abnormalities of various locations, sizes, and distributions. The VAE decoder is utilized within the objective function of Bayesian optimization [2] to provide an implicit LD search space for HD parameter estimation. Meanwhile, the VAE-encoded posterior distribution of the generative code is used to guide an efficient exploration of the LD manifold. The presented method is applied to estimating the tissue excitability of a cardiac electrophysiological model using non-invasive electrocardiogram (ECG) data. In both synthetic and real-data experiments, the presented method is compared against the use of an anatomy-based LD [11] or multi-scale representation of the parameter space [4]. Experiments demonstrate that the presented method achieves a drastic reduction in computational cost while improving the accuracy of the estimated parameters. To the best of our knowledge, this is the first work that utilizes a probabilistic generative model within an optimization framework for estimating HD model parameters. It provides an efficient and general solution to personalizing HD model parameters.

2 Background: Cardiac Electrophysiological System

Cardiac Electrophysiology Model: Among the different types of cardiac electrophysiological models, phenomenological models such as the Aliev-Panfilov (AP) model [1] can explain the macroscopic process of cardiac excitation with a small number of model parameters at a reasonable computational cost. Therefore, the AP model given below is chosen to test the feasibility of the presented method:

$$\begin{aligned} \begin{aligned} {\partial u}/{\partial t}&= \nabla (\mathbf {D} \nabla u) - cu(u-\theta )(u-1) - uv,\\ {\partial v}/{\partial t}&= \varepsilon (u,v)(-v - cu(u - \theta - 1)). \end{aligned} \end{aligned}$$
(1)

Here, u is the transmembrane action potential and v is the recovery current. The transmural action potential is computed by solving the AP model (1) on a 3D myocardium discretized using the meshfree method [10]. Because u is most sensitive to the value of the parameter \(\theta \) [4], we focus on its estimation in this study.
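
To make the local dynamics of Eq. (1) concrete, the following is a minimal sketch that integrates only the reaction terms of the AP model at a single mesh node with forward Euler (the diffusion term and the meshfree spatial discretization [10] are omitted). The constants c, e0, mu1, mu2 and the particular form of \(\varepsilon (u,v)\) are typical literature values used purely for illustration, not the values used in this study.

```python
import numpy as np

def aliev_panfilov_step(u, v, theta, dt, c=8.0, e0=0.002, mu1=0.2, mu2=0.3):
    """One forward-Euler step of the Aliev-Panfilov reaction terms (no diffusion)."""
    eps = e0 + mu1 * v / (u + mu2)                     # illustrative epsilon(u, v)
    du = -c * u * (u - theta) * (u - 1.0) - u * v      # reaction part of du/dt in Eq. (1)
    dv = eps * (-v - c * u * (u - theta - 1.0))        # dv/dt in Eq. (1)
    return u + dt * du, v + dt * dv

# Local dynamics at one node: a brief stimulus elicits an action potential,
# whose shape depends on the excitability parameter theta.
u, v, theta = 0.0, 0.0, 0.15
trace = []
for step in range(4000):
    if step < 10:
        u += 0.1                                       # short stimulus current
    u, v = aliev_panfilov_step(u, v, theta, dt=0.01)
    trace.append(u)
```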

Body-Surface ECG Model: The propagation of the spatio-temporal transmural action potential \(\mathbf U \) to the potentials measured on the body surface \(\mathbf Y \) can be described by the quasi-static approximation of electromagnetic theory [8]. By solving the governing equations on a discrete heart-torso mesh, a linear relationship between \(\mathbf U \) and \(\mathbf Y \) is obtained: \(\mathbf Y = \mathbf H (\mathbf U (\pmb {\theta }))\), where \(\pmb {\theta }\) is the vector of local parameters \(\theta \) at the resolution of the cardiac mesh.
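
As an illustration of this forward relation, the snippet below applies a precomputed transfer matrix H (e.g., obtained from a boundary-element discretization of the heart-torso mesh) to a simulated action potential sequence; the dimensions and the random placeholders for H and U are hypothetical.

```python
import numpy as np

# Hypothetical dimensions: 120 body-surface leads, 1500 cardiac mesh nodes,
# 300 time samples of the simulated action potential U(theta).
n_leads, n_nodes, n_t = 120, 1500, 300
H = np.random.rand(n_leads, n_nodes)   # placeholder for the precomputed transfer matrix
U = np.random.rand(n_nodes, n_t)       # placeholder for the simulated action potential

Y = H @ U                              # body-surface potentials, shape (n_leads, n_t)
```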

3 HD Parameter Estimation

To estimate \(\pmb {\theta }\), we maximize the similarity between the measured ECG and those simulated by the combined electrophysiological and ECG model \(M(\pmb {\theta })\):

$$\begin{aligned} \hat{\pmb {\theta }} = {\mathop {\hbox {arg max}}\limits _{\pmb {\theta }}}{-||\mathbf Y - M(\pmb {\theta })||^2}. \end{aligned}$$
(2)

To enable the estimation of \(\pmb {\theta }\) at the resolution of the cardiac mesh, the presented method embeds within the Bayesian optimization framework a stochastic generative model that generates \(\pmb {\theta }\) from a LD manifold. It includes two major components as outlined in Fig. 1: (1) the construction of a generative model of HD spatially-varying tissue properties at the resolution of the cardiac mesh, and (2) a novel Bayesian optimization method utilizing the embedded generative model.

Fig. 1. Outline of the presented method, with the dimension of each VAE layer labeled.

3.1 LD-to-HD Parameter Generation via VAE

Generative VAE Model: We assume that the spatially-varying tissue properties at the resolution of the cardiac mesh, \(\pmb {\theta }\), are generated by a small number of unobserved continuous random variables \(\mathbf z \) on a LD manifold. To obtain the generative process from \(\mathbf z \) to \(\pmb {\theta }\), the VAE consists of two modules: a probabilistic deep encoder network with parameters \(\pmb {\alpha }\) that approximates the intractable true posterior density as \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta })\), and a probabilistic deep decoder network with parameters \(\pmb {\beta }\) that probabilistically reconstructs \(\pmb {\theta }\) given \(\mathbf z \) as \(p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )\). Both networks consist of three fully-connected layers as shown in Fig. 1.
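
A minimal sketch of such an encoder-decoder pair is given below in PyTorch (an assumption; the paper does not state the framework). The 2-dimensional latent code matches the dimension used in the experiments, but the hidden-layer sizes are placeholders standing in for the layer dimensions shown in Fig. 1.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Probabilistic encoder q_alpha(z|theta) and decoder p_beta(theta|z),
    each built from three fully-connected layers (hidden sizes are placeholders)."""
    def __init__(self, n_nodes, hidden=(512, 128), latent_dim=2):
        super().__init__()
        h1, h2 = hidden
        self.enc = nn.Sequential(nn.Linear(n_nodes, h1), nn.ReLU(),
                                 nn.Linear(h1, h2), nn.ReLU())
        self.enc_mu = nn.Linear(h2, latent_dim)        # mean of q_alpha(z|theta)
        self.enc_logvar = nn.Linear(h2, latent_dim)    # log-variance of q_alpha(z|theta)
        self.dec = nn.Sequential(nn.Linear(latent_dim, h2), nn.ReLU(),
                                 nn.Linear(h2, h1), nn.ReLU(),
                                 nn.Linear(h1, n_nodes), nn.Sigmoid())  # Bernoulli mean

    def encode(self, theta):
        h = self.enc(theta)
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z):
        return self.dec(z)
```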

To train the VAE, we generate \(\pmb {\varTheta }=\big \{\pmb {\theta }^{(i)}\big \}_{i=1}^{N}\) consisting of N configurations of heterogeneous tissue properties in a patient-specific cardiac mesh. The training involves maximizing the variational lower bound on the marginal likelihood of each training sample \(\pmb {\theta }^{(i)}\) with respect to the network parameters \(\pmb {\alpha }\) and \(\pmb {\beta }\):

$$\begin{aligned} \mathcal {L}(\pmb {\alpha };\pmb {\beta };\pmb {\theta }^{(i)}) = -D_{\mathrm {KL}} (q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta }^{(i)}) || p_{\pmb {\beta }}(\mathbf z )) + E_{q_{\alpha }(\mathbf z |\pmb {\theta }^{(i)})} [\mathrm {log} p_{\pmb {\beta }}(\pmb {\theta }^{(i)}|\mathbf z )], \end{aligned}$$
(3)

where we model \(p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )\) with a Bernoulli distribution. Eq. (3) can be optimized by stochastic gradient descent with standard backpropagation. Assuming the approximate posterior \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta })\) to be Gaussian and the prior \(p_{\pmb {\beta }}(\mathbf z )\sim \mathcal {N}(0,\mathbf I )\), their KL divergence can be derived analytically as:

$$\begin{aligned} D_{\mathrm {KL}} (q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta }^{(i)}) || p_{\pmb {\beta }}(\mathbf z )) = -\frac{1}{2} \sum _{j}(1+\mathrm {log}(\pmb {\sigma }_j^2)-\pmb {\mu }_j^2 - \pmb {\sigma }_j^2), \end{aligned}$$
(4)

where j indexes the dimensions of \(\mathbf z \), and \(\pmb {\mu }\) and \(\pmb {\sigma }^2\) are the mean and variance from \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta }^{(i)})\). Because the latent variables are stochastic, the gradient of the expected reconstruction term cannot be backpropagated directly. The popular re-parameterization trick is therefore utilized to express \(\mathbf z \) as a deterministic variable \(\mathbf z ^{(i)} = \pmb {\mu }^{(i)} + \pmb {\sigma }^{(i)} \pmb {\epsilon }\), where \(\pmb {\epsilon }\sim \mathcal {N}(0,\mathbf I )\) is an auxiliary noise variable [6].
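
A sketch of this training objective, assuming the VAE class above and tissue-property values normalized to [0, 1] (as required by the Bernoulli likelihood), could look as follows; it returns the negative of the lower bound in Eq. (3) for minimization.

```python
import torch
import torch.nn.functional as F

def vae_loss(model, theta_batch):
    """Negative ELBO of Eq. (3): Bernoulli reconstruction term plus the analytic
    KL of Eq. (4), using the re-parameterization z = mu + sigma * eps."""
    mu, logvar = model.encode(theta_batch)
    eps = torch.randn_like(mu)                                  # eps ~ N(0, I)
    z = mu + torch.exp(0.5 * logvar) * eps                      # re-parameterization trick
    recon = model.decode(z)
    # -E_q[log p(theta|z)] for a Bernoulli decoder = binary cross-entropy
    rec_term = F.binary_cross_entropy(recon, theta_batch, reduction='sum')
    # Analytic KL(q(z|theta) || N(0, I)) from Eq. (4)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec_term + kl
```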

Probabilistic Modeling of the Latent Code: The trained encoder provides an approximate posterior density of the LD latent code, \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta })\). This represents valuable knowledge about the probabilistic distribution of \(\mathbf z \) learned from a large training dataset. To utilize this in the subsequent optimization, we integrate \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta })\) over the training data \(\pmb {\varTheta }\) to obtain the density \(q_{\pmb {\alpha }}(\mathbf z )\) as a mixture of Gaussians \(1/N\sum _{i}^{N}{\mathcal {N}(\pmb {\mu }^{(i)},\pmb {\varSigma }^{(i)}})\), where \(\pmb {\mu }^{(i)}\) and \(\pmb {\varSigma }^{(i)}\) are the mean and covariance from \(q_{\pmb {\alpha }}(\mathbf z |\pmb {\theta }^{(i)})\). Because the number of mixture components in \(q_{\pmb {\alpha }}(\mathbf z )\) scales linearly with the number of training data, we approximate \(q_{\pmb {\alpha }}(\mathbf z )\) with a single Gaussian density \(\mathcal {N}\big (\bar{\pmb {\mu }}, 1/N \sum _{i}^N(\pmb {\varSigma }^{(i)} + \pmb {\mu }^{(i)}\pmb {\mu }^{(i)T}) - \bar{\pmb {\mu }}\bar{\pmb {\mu }}^{T}\big )\), where \(\bar{\pmb {\mu }} = 1/N \sum _{i}^N\pmb {\mu }^{(i)}\). Alternatively, we approximate \(q_{\pmb {\alpha }}(\mathbf z )\) with a mixture of \(K \ll N\) Gaussian components, where k-means clustering with the Bregman divergence [5] as a similarity metric is used to reduce the number of mixture components.
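
A possible moment-matching implementation of the single-Gaussian approximation of \(q_{\pmb {\alpha }}(\mathbf z )\) is sketched below, assuming diagonal encoder covariances as produced by the VAE above; the mixture-of-K alternative would instead cluster the components before a similar matching step.

```python
import numpy as np

def aggregate_posterior_gaussian(mus, variances):
    """Moment-match q_alpha(z) = (1/N) sum_i N(mu_i, Sigma_i) with one Gaussian.
    mus: (N, d) encoder means; variances: (N, d) diagonal encoder variances."""
    mu_bar = mus.mean(axis=0)
    # E[z z^T] = (1/N) sum_i (Sigma_i + mu_i mu_i^T);  Cov = E[z z^T] - mu_bar mu_bar^T
    second_moment = np.diag(variances.mean(axis=0)) \
        + np.einsum('ni,nj->nij', mus, mus).mean(axis=0)
    cov_bar = second_moment - np.outer(mu_bar, mu_bar)
    return mu_bar, cov_bar
```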

In this way, we obtain a generative model \(p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )\) of HD tissue properties from an implicit LD manifold, and prior knowledge of the LD manifold \(q_{\pmb {\alpha }}(\mathbf z )\) from the probabilistic encoder. Both will be embedded into Bayesian optimization to enable efficient and accurate HD parameter estimation.

3.2 Bayesian Optimization with Embedded Generative Model

Representing \(\pmb {\theta }\) with the expectation of the trained decoder \(p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )\), we obtain:

$$\begin{aligned} \hat{\mathbf{z }} = {\mathop {\hbox {arg max}}\limits _\mathbf{z }}{-||\mathbf Y - M\big (\mathrm {E}[p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )]\big )||^2}, \end{aligned}$$
(5)

which allows us to optimize the HD parameters \(\pmb {\theta }\) in an implicit LD manifold of \(\mathbf z \). For Bayesian optimization, we assume a zero-mean Gaussian process (GP) with an anisotropic Matérn 5/2 kernel as a prior over the objective function (5). The optimization then consists of two iterative steps: (1) select a point in the LD manifold that allows the GP to globally approximate Eq. (5) (exploration) while locally refining the area of the optimum (exploitation); and (2) update the GP.
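
For concreteness, a minimal numpy sketch of the GP surrogate with an anisotropic Matérn 5/2 kernel is given below; it is a generic GP regression routine written for illustration, not the authors' code, and the jitter term is only for numerical stability.

```python
import numpy as np

def matern52(X1, X2, length_scales, amplitude):
    """Anisotropic Matern 5/2 kernel over latent codes z."""
    diff = (X1[:, None, :] - X2[None, :, :]) / length_scales
    r = np.sqrt((diff ** 2).sum(-1))
    return amplitude * (1 + np.sqrt(5) * r + 5 * r ** 2 / 3) * np.exp(-np.sqrt(5) * r)

def gp_posterior(Z_train, f_train, Z_query, length_scales, amplitude, jitter=1e-8):
    """Zero-mean GP posterior mean and standard deviation at the query points."""
    K = matern52(Z_train, Z_train, length_scales, amplitude) + jitter * np.eye(len(Z_train))
    Ks = matern52(Z_query, Z_train, length_scales, amplitude)
    Kss = matern52(Z_query, Z_query, length_scales, amplitude)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v ** 2).sum(axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))
```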

VAE-Informed Acquisition Function: To select points on the LD manifold, we adopt the expected improvement (EI) function, which picks the point with the maximum expected improvement over the current best objective function value \(f_m\) [2]. For a GP posterior with mean \(\mu (\cdot )\) and standard deviation \(\sigma (\cdot )\), it can be obtained as:

$$\begin{aligned} \mathrm {EI(}{} \mathbf z )&= (\mu (\mathbf z )-f_m)\mathrm {\Phi }\Big (\frac{\mu (\mathbf z )-f_m}{\sigma (\mathbf z )}\Big ) + \sigma (\mathbf z ){\phi }\Big (\frac{\mu (\mathbf z )-f_m}{\sigma (\mathbf z )}\Big ), \end{aligned}$$
(6)

where \(\mathrm {\Phi }\) and \(\phi \) are the cumulative distribution function and probability density function of the standard normal distribution, respectively. Here, the first term controls exploitation (through high \(\mu \)) and the second term controls exploration (through high \(\sigma \)). Because using only \(f_m\) can lead to excessive exploitation, it is common to augment \(f_m\) with a constant trade-off parameter \(\varepsilon \) as \(f_{m}+\varepsilon \) [2]. Here, we utilize the VAE-encoded knowledge about the LD manifold \(q_{\pmb {\alpha }}(\mathbf z )\) to enforce higher exploitation in areas of high probability density for \(\mathbf z \), and lower exploitation elsewhere. Specifically, we define \(\varepsilon (\mathbf z ) = -f_m \sum _{i}w_i(\mathbf z -\pmb {\mu }_i)^T \pmb {\varSigma }_{i}^{-1} (\mathbf z -\pmb {\mu }_i)\), where \(w_i\), \(\pmb {\mu }_i\), and \(\pmb {\varSigma }_i\) are the weight, mean, and covariance of the K Gaussian mixture components in \(q_{\pmb {\alpha }}(\mathbf z )\).
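
A sketch of this VAE-informed acquisition function, evaluated at a single candidate z given the GP posterior mean and standard deviation, is shown below; the mixture is assumed to be a list of (weight, mean, inverse covariance) triples describing \(q_{\pmb {\alpha }}(\mathbf z )\).

```python
import numpy as np
from scipy.stats import norm

def ei_vae(z, gp_mu, gp_sigma, f_best, mixture):
    """Expected improvement of Eq. (6) with the VAE-informed trade-off epsilon(z).
    mixture: list of (w_k, mu_k, Sigma_inv_k) for the components of q_alpha(z)."""
    eps = -f_best * sum(w * (z - m) @ S_inv @ (z - m) for w, m, S_inv in mixture)
    target = f_best + eps                      # f_m + epsilon(z)
    imp = gp_mu - target
    u = imp / max(gp_sigma, 1e-12)
    return imp * norm.cdf(u) + gp_sigma * norm.pdf(u)
```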

GP Update: After a new point \(\mathbf z ^{(n)}\) is selected by maximizing the modified EI, the objective function (5) is evaluated at the HD parameters given by the mean of the generative model \(p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z ^{(n)})\). The GP is then updated by adding the new pair of \(\mathbf z ^{(n)}\) and its objective function value, and by maximizing the log marginal likelihood with respect to the kernel parameters: the length scales and the kernel amplitude.
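
Putting the pieces together, the overall loop of Sec. 3.2 could be sketched as below, reusing the `gp_posterior` and `ei_vae` sketches above. `simulate_ecg` and `decode_mean` are placeholders for the forward model M and the decoder mean \(\mathrm {E}[p_{\pmb {\beta }}(\pmb {\theta }|\mathbf z )]\); for simplicity the EI is maximized over random candidates and the kernel hyper-parameters are kept fixed rather than re-estimated by marginal-likelihood maximization.

```python
import numpy as np

def bo_vae_loop(simulate_ecg, decode_mean, y_meas, mixture, bounds, n_iter=100, seed=0):
    """Simplified BO-VAE loop: decode z -> theta, evaluate Eq. (5), update the GP."""
    rng = np.random.default_rng(seed)
    d = bounds.shape[0]
    Z = [rng.uniform(bounds[:, 0], bounds[:, 1]) for _ in range(3)]   # initial design
    f = [-np.sum((y_meas - simulate_ecg(decode_mean(z))) ** 2) for z in Z]
    length_scales, amplitude = np.ones(d), 1.0                        # fixed for this sketch
    for _ in range(n_iter):
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(500, d)) # candidate latent codes
        mu, sigma = gp_posterior(np.array(Z), np.array(f), cand, length_scales, amplitude)
        ei = np.array([ei_vae(c, m, s, max(f), mixture)
                       for c, m, s in zip(cand, mu, sigma)])
        z_next = cand[int(np.argmax(ei))]                             # acquisition maximizer
        Z.append(z_next)
        f.append(-np.sum((y_meas - simulate_ecg(decode_mean(z_next))) ** 2))
    best = int(np.argmax(f))
    return Z[best], decode_mean(Z[best])
```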

4 Experiments

Synthetic Experiments: We include 27 synthetic experiments on three CT-derived human heart-torso models. In each case, an infarct covering 2%–40% of the heart was placed at different locations using various combinations of the AHA segments. The value of the parameter \(\theta \) is set to 0.5 in the infarcted region and 0.15 in the healthy region. 120-lead ECG is simulated and corrupted with 20 dB Gaussian noise as measurement data. We evaluate the accuracy of the estimated parameters with two metrics: (1) the root mean square error (RMSE) between the true and estimated parameters; and (2) the dice coefficient (DC) = \(\frac{2(|S_1 \cap S_2|)}{|S_1| + |S_2|}\), where \(S_1\) and \(S_2\) are the sets of nodes in the true and estimated regions of infarct; these regions are determined by Otsu's thresholding method.
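
The two evaluation metrics can be computed as follows; the use of `skimage.filters.threshold_otsu` is an assumption about tooling, and any Otsu implementation would serve equally well.

```python
import numpy as np
from skimage.filters import threshold_otsu

def evaluate(theta_true, theta_est):
    """RMSE between parameter maps and Dice coefficient between the infarct
    regions obtained by Otsu thresholding of each map."""
    rmse = np.sqrt(np.mean((theta_true - theta_est) ** 2))
    s1 = theta_true > threshold_otsu(theta_true)
    s2 = theta_est > threshold_otsu(theta_est)
    dice = 2.0 * np.logical_and(s1, s2).sum() / (s1.sum() + s2.sum())
    return rmse, dice
```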

VAE Architecture and Training: For each heart, we generate a training dataset of tissue properties with various heterogeneous infarcts. Each infarct is generated by random region growing: starting with one infarct node, one of the five closest neighbors of the present infarct region is randomly added to the infarct until an infarct of the desired size is obtained, and the result is added to the training data. Because infarcts generated this way tend to be very irregular, we also include infarcts generated by always growing the infarct with the node closest to its center. For the three hearts, we extract 123,896, 155,099, and 116,459 training samples, respectively. Training the VAE with the architecture shown in Fig. 1, using the Adam optimizer on a Titan X GPU, took 9.77, 13.96, and 9.0 min for the respective datasets.
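
A simplified sketch of the random region-growing procedure is given below; `neighbors[i]` is a hypothetical pre-computed list of node i's nearest neighbors on the cardiac mesh, and picking among the first few frontier nodes is a stand-in for choosing among the five closest neighbors of the current infarct region.

```python
import numpy as np

def grow_infarct(neighbors, n_nodes, size, rng, n_candidates=5):
    """Grow one infarct by random region growing and return a parameter map."""
    infarct = {int(rng.integers(n_nodes))}                     # random seed node
    while len(infarct) < size:
        frontier = [j for i in infarct for j in neighbors[i] if j not in infarct]
        if not frontier:
            break
        infarct.add(int(rng.choice(frontier[:n_candidates])))  # add one nearby node at random
    theta = np.full(n_nodes, 0.15)                             # healthy excitability
    theta[list(infarct)] = 0.5                                 # infarcted excitability
    return theta

# Example: one training sample with an infarct covering 10% of the mesh.
# theta = grow_infarct(neighbors, n_nodes, size=int(0.1 * n_nodes),
#                      rng=np.random.default_rng(0))
```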

Fig. 2. Comparison of BO-VAE EI Post-1 (blue bar) with: (1) FH and FS (green bars); and (2) BO-VAE using standard EI, EI Isotropic, and EI Post-K (yellow bars), in terms of DC, RMSE, and number of model evaluations (from left to right).

Fig. 3. Left: Examples of estimated parameters with BO-VAE, FH, and FS. Right: Progression of FH on the multi-scale hierarchy for parameter estimation of (a) (green leaf: homogeneous tissues; red leaf: heterogeneous tissues).

Comparison with Existing Methods: The presented method (termed BO-VAE) is compared against two common approaches based on explicit LD representations: (1) optimization over 18 fixed segments (fixed-segment (FS) method); and (2) coarse-to-fine optimization along a fixed multi-scale hierarchy (fixed-hierarchy (FH) method). As summarized in Fig. 2(a)(b), BO-VAE (blue bar) is more accurate than the other two methods (green bars) in both DC and RMSE (paired t-tests, p \(<0.012\)). This is achieved while reducing the computational cost by \(87.57\%\) relative to the FS method and \(98.73\%\) relative to the FH method (Fig. 2(c)).

The FS method shows the lowest accuracy, with some estimated parameters either missing the infarct or including large false positives (Fig. 3, left). The FH method overcomes this issue, although to a limited extent. In the LD representation obtained by the FH method, shown in Fig. 3 (right), several dimensions are wasted on representing homogeneous healthy regions (green) across different scales, which limits its ability to optimize deeper along the tree. BO-VAE is not limited by such an explicitly-imposed anatomy-based structure, allowing it to attain higher accuracy with only 2 latent dimensions and 1–10\(\%\) of the computation.

The Effect of VAE-Encoded Knowledge About the LD Manifold: To study the effect of incorporating the VAE-encoded \(q_{\pmb {\alpha }}(\mathbf z )\) in the EI, we compare the standard EI with the EI augmented with three types of distributions on \(\mathbf z \): (1) \(p_{\pmb {\beta }}(\mathbf z )\sim \mathcal {N}(0,\mathbf I )\) (EI Isotropic); (2) \(q_{\pmb {\alpha }}(\mathbf z )\) approximated with a single Gaussian density (EI Post-1); and (3) \(q_{\pmb {\alpha }}(\mathbf z )\) approximated with a mixture of 10 Gaussian densities (EI Post-K). As shown in Fig. 2, all three distributions yield higher accuracy than the standard EI, with EI Post-1 achieving the highest accuracy. Figure 4(b) illustrates that, when \(q_{\pmb {\alpha }}(\mathbf z )\) is utilized, the exploration proceeds from regions of high probability density to regions of low probability density. In comparison, with the standard EI, the points are spread to reduce the overall variance (Fig. 4(a)); this could result in incorrect (Fig. 4(c)) or suboptimal (Fig. 4(d)) solutions.

Fig. 4. Comparison of training points selected by EI (a) and EI Post-1 (b), and examples of the estimated parameters by the two acquisition functions (c–d).

Fig. 5. (a–b): Examples of estimated parameters using five vs. two dimensional latent codes. (c–d): Latent code manifold based on (c) infarct location, and (d) infarct size.

We also experimented with higher-dimensional latent codes \(\mathbf z \). As shown in Fig. 5(a)(b), there was only a marginal improvement in accuracy with a five-dimensional vs. a two-dimensional (2d) latent code. This suggests that, given the focus of the training data on local infarcts, a 2d latent code may be sufficient to capture the necessary generative factors. The plot of these 2d latent codes in Fig. 5(c)(d) shows that they cluster by infarct location, while their radial direction accounts for the infarct size.

Real Data Experiments: Real-data studies are conducted on two patients with previous myocardial infarction. Patient-specific heart and torso meshes are constructed from axial CT images. Tissue excitability is estimated from 120-lead ECG data. The results are evaluated against in-vivo bipolar voltage data which, although not a direct measure of tissue excitability, provide a reasonable reference for the region of infarct. The first two columns of Fig. 6 show the original voltage data (red: dense infarct; purple: healthy tissue; green: infarct border) and the same data registered to the cardiac meshes.

The voltage map in case 1 (Fig. 6(a)) shows a highly heterogeneous infarct spread over a large region of the lateral LV. The parameters estimated by all methods capture this region of infarct. To reach this accuracy, the FH and FS methods required 4056 and 1058 model evaluations, whereas BO-VAE required only 105. In contrast, as shown in Fig. 6(b), case 2 has a smaller region of dense scar in the lateral LV. The parameters estimated by BO-VAE and FH correctly reveal this region of scar, whereas the FS method is less accurate. In this case, BO-VAE, the FH method, and the FS method required 105, 5798, and 1501 model evaluations, respectively.

Fig. 6. Model parameters estimated with BO-VAE, FH, and FS in the real-data studies.

5 Conclusion

We present a novel approach to estimating HD model parameters, achieved by embedding within Bayesian optimization a generative model of HD tissue properties from a LD manifold. Experiments show a gain in accuracy with drastically reduced computation. Future work includes the incorporation of training data from high-resolution 3D imaging and the study of alternative ways to incorporate knowledge of the latent manifold into Bayesian optimization.