Robotic-assisted minimally invasive surgery (RAMIS) has become an attractive alternative to traditional and laparoscopic surgery in recent years, since it offers diverse advantages to both surgeons and patients [1]. In particular, RAMIS has enabled complex procedures such as off-pump coronary artery bypass grafting (OPCABG) [2]. Because the heart is not arrested, this procedure avoids the complications associated with cardiopulmonary bypass (CPB). However, surgeons must then deal with a dynamic target, which compromises their dexterity and precision.

Fig. 1
figure 1

Overview of our proposed approach to estimate the cardiac motion in a RAMIS setup which is composed of three main parts

To compensate for the heart motion, several authors have proposed solutions based on mechanical stabilization (see, for example, [3, 4]), in which small devices are positioned over the heart surface to keep the region to be repaired steady. However, works such as the one presented in [5] reported that a significant residual motion (1.5–2.4 mm) remains after mechanical stabilization. This would require manual compensation by the surgeon, which is not feasible since the heart motion exceeds the human tracking bandwidth [6]. Moreover, mechanical stabilizers can only be positioned on a small region of the heart surface and can cause irreversible heart damage that affects the cardiac mechanics [7, 8].

To overcome these difficulties, the pioneering work of Nakamura [9] showed that motion cancelation is possible by tracking the heart dynamics and continuously synchronizing the robot with this motion. Several authors have followed this direction. An image-based motion tracking algorithm was proposed in [10] for retrieving the cardiac surface deformation using a stereo endoscopic system; however, the authors did not take into account the effect of occlusions on the performance and stability of the tracking algorithm. Later, Ortmaier et al. [11] presented a 2D affine matching algorithm using natural landmarks to estimate the heart motion. They dealt with occlusions by integrating a prediction scheme based on Takens' theorem that combines electrocardiogram and respiration pressure signals.

Richa et al. [12] proposed tracking the heart surface using a thin-plate spline (TPS) deformable model and included an illumination compensation solution. Another approach was presented in [13], in which the heart motion was retrieved using a stochastic physics-based tracking technique and occlusions were tackled using an extended Kalman filter (EKF). A 3D tracking approach based on a quasi-spherical triangle was introduced in [14], where the authors modeled the heart surface using a triangle-based model with a curving parameter. They handled occlusions with an algorithm based on the peak–valley characteristics of the motion signals.

In more recent work, the authors of [15] presented a scheme for tracking the heart motion using two recursive processes: the first represents the target region in a joint spatial-color space, while the second fits a thin-plate spline model to the heart shape around the region of interest. Yang [16] proposed a motion prediction scheme for tracking the heart motion during occlusion events based on a dual Kalman filter, in which a point of interest is modeled as a dual time-varying Fourier series.

Aim of this work

In this work, we propose a new approach for estimating the heart motion. The main contributions of our solution are:

  • A diffeomorphic variational framework that is able to deal with the inherent complex deformation of a beating heart while guaranteeing preservation of the anatomy using a topology preserving penalizer. Our framework maintains affine linear transformations by means of the curvature penalizer and incorporates a preprocessing stage for dealing with specular highlights.

  • A prediction stage, which is a key point of this paper since it differs from existing approaches to the problem at hand. We propose sliding the cardiac motion data to formulate a standard supervised learning problem, which is handled via a conditional restricted Boltzmann machine (CRBM).

Toward estimating the beating heart motion

In this section, we present our approach, which is composed of the three main parts illustrated in Fig. 1; in this work, we focus on the second and third parts.

Fig. 2
figure 2

3D diffeomorphic surface reconstruction from the projection of the lattice points defined in each stereo-pair image

Cardiac motion estimation

Specular highlights hinder the performance of vision-based solutions since they partially occlude the targeted surface, appear as additional features, generate discontinuities in the images and cause loss of texture or color information. In this work, we adapted our specular-free image solution, presented in [17], to the case of stereo-pair frames.

Assume a calibrated image sequence \(G=\{g_{s}\}_{s=0}^{S-1}\) composed of S stereo-pair frames, where \(g_{s}=\{f_{\mathrm{r}}^{s},f_{\mathrm{l}}^{s}\}\). Let \(f_{\mathrm{r}}^{s}\) and \(f_{\mathrm{l}}^{s}\) denote the right and left views of \(g_s\), defined on a bounded domain \(\Omega \subset {\mathbb {R}}^2\). To retrieve the heart motion, we start by defining a lattice on each stereo view according to the following definition:

Definition 1

A lattice, \({\mathfrak {L}}\), is a subgroup in a real vector space V of dimension d that has the form \({\mathbb {Z}}v_{1}+\cdots +{\mathbb {Z}}v_{d}\)

Consider \({\mathfrak {L}}_{\mathrm{l}}^{s},{\mathfrak {L}}_{\mathrm{r}}^{s} \subset {\mathbb {R}}^{2}\) as the lattices defined on the left and right views of \(g_{s}\). We recover the 3D heart surface by computing the projections of the corresponding points of \({\mathfrak {L}}_{\mathrm{l}}^{s}\) and \({\mathfrak {L}}_{\mathrm{r}}^{s}\), as illustrated in Fig. 2, which results in the three-dimensional lattice \({\mathfrak {L}}^{s}\subset {\mathbb {R}}^{3}\) with a set of lattice points \({\mathbf {B}}\). In this work, we represent the deformable heart surface by the tensor product of the b-splines \(\xi _{c}\). Assume a given position \(x \in {\mathbb {R}}^{d}\), a d-dimensional lattice point \(z:=y_{1}{\ldots }y_{d}\) and b-splines of degree n. Then the deformation can be represented as:

$$\begin{aligned} \begin{aligned}&\varphi (x;{\mathbf {B}})=\sum _{j_{1}=0}^{n}{\ldots }\sum _{j_{d}=0}^{n}{\mathbf {B}}_{j_{1},{\ldots },j_{d}}\prod _{k=1}^{d}\xi _{k,c}(x_{k})\\&\quad \text {for}\quad c=0,1,2,3 \end{aligned} \end{aligned}$$
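The tensor-product deformation above can be sketched numerically. The following is a minimal illustration of a cubic (n = 3) free-form deformation evaluated over a lattice of control points; the function names, the lattice spacing parameter and the cell-corner indexing convention are our own assumptions, not the paper's implementation.

```python
import numpy as np

def cubic_bspline_basis(u):
    """The four cubic B-spline basis functions xi_c(u), c = 0..3, for u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    ])

def ffd_displacement(x, B, spacing):
    """Evaluate the tensor-product B-spline deformation phi(x; B) at a
    d-dimensional position x, given the lattice of control points B
    (shape (n1, ..., nd, d)) and the lattice spacing per axis."""
    d = len(x)
    # lattice cell containing x, and the local coordinate u in [0, 1)
    idx = np.floor(np.asarray(x) / spacing).astype(int)
    u = np.asarray(x) / spacing - idx
    basis = [cubic_bspline_basis(u[k]) for k in range(d)]
    disp = np.zeros(d)
    # sum over the 4^d neighbouring control points (j_1, ..., j_d),
    # indexed here from the cell corner
    for offsets in np.ndindex(*([4] * d)):
        w = 1.0
        for k in range(d):
            w *= basis[k][offsets[k]]
        disp += w * B[tuple(idx + np.array(offsets))]
    return disp
```

Since the basis functions form a partition of unity, a lattice of identical control points reproduces that value exactly, which is a convenient sanity check.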

After defining the deformation model, the changes on the heart surface’s deformation over time are computed by an energy functional that is composed of three terms: (i) a data term that allows measuring the discrepancy between the current \(f_{\mathrm{r}}\) and \(f_{\mathrm{l}}\), (ii) a regularization term that enforces a plausible transformation and (iii) a topology preservation term which ensures connectivity between the structures created within the lattice.

In particular, we represent the data term with the sum of squared differences, replacing the minimization of the residual error \(\sum _{i}{r_{i}^{2}}\) with \(\sum _{i}\rho ({r_{i}})\), where \(\rho \) is Tukey's M-estimator, which increases robustness against outliers. The second term is formulated using the curvature method, which has the advantage of penalizing oscillations while keeping affine linear transformations [18].
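As an illustration of the robust data term, the following sketch evaluates Tukey's biweight \(\rho\); the cutoff value c = 4.685 is the conventional tuning constant and is an assumption here, since the choice is not stated above.

```python
import numpy as np

def tukey_rho(r, c=4.685):
    """Tukey's biweight M-estimator rho(r). Residuals beyond the cutoff c
    contribute a constant cost, so gross outliers stop influencing the fit."""
    r = np.asarray(r, dtype=float)
    inside = np.abs(r) <= c
    rho = np.full(r.shape, c ** 2 / 6.0)      # saturated cost for outliers
    rho[inside] = (c ** 2 / 6.0) * (1.0 - (1.0 - (r[inside] / c) ** 2) ** 3)
    return rho
```

Unlike the squared residual, this cost saturates, which is what makes the data term robust to specular remnants and other outliers.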

Definition 2

A map \(f:X\rightarrow Y\) preserves topology if its inverse \(f^{-1}\) exists and both f and \(f^{-1}\) are smooth.

Following Definition 2, the third term is the topology preservation term that we first proposed in [19], extended here to 3D. This penalization term controls the Jacobian determinant in order to preserve the anatomical structure of organs. Unlike works in which topology preservation is not considered, such as [12, 14, 20, 21], in this work we demonstrate the relevance of preserving the heart's anatomical structure, especially during complex deformations. With these three terms, our energy functional is given by:

$$\begin{aligned} \begin{aligned}&{\hat{\mathbf {E}}}_{s}({\mathbf {B}})\\&\quad =\,\bigg (\frac{1}{m}\bigg )\underbrace{\int _{\Omega }\rho (f_{\mathrm{r}}^{s}(\varphi (x;{\mathbf {B}})+x)-f_{\mathrm{l}}^{s}(x)){\mathrm{d}x}}_{\text {data term}}\\&\qquad +\,\underbrace{\sum _{i=1}^{d}\int _{\Omega }(\Delta \varphi (x;{\mathbf {B}})_{i})^{2}{\mathrm{d}x}}_{\text {regularization term}} \underbrace{ + \int _{\Omega } \delta _{{\varphi }}(x;{\mathbf {B}}) {\mathrm{d}x} }_{\text {topology preservation term}} \end{aligned} \end{aligned}$$

where m is the number of pixels in the overlapped domain \(\Omega _{f_{\mathrm{r}},f_{\mathrm{l}}}\) and our term \(\delta _{{\varphi }}\) is defined as:

$$\begin{aligned} \delta _{\varphi }(x;{\mathbf {B}}):= \left\{ \begin{array}{ll} \dfrac{\frac{1}{2}\pi -\arctan (| J_{\varphi }(x;{\mathbf {B}})|)}{\pi } +\varphi \sqrt{\vert J_{\varphi }(x;{\mathbf {B}})\vert ^2} &{}\quad \text {if}\;\big |\, |J_{\varphi }(x;{\mathbf {B}})| -1\,\big | \ge \tau \\ 0 &{}\quad \text {otherwise} \\ \end{array}\right. \end{aligned}$$
Fig. 3
figure 3

(Top) Illustration of the RBM and CRBM architectures; (bottom left) how the reconstructed heart motion is used as input to the CRBM; (bottom right) the accumulated lattice points over time

where \(\varphi \in {\mathbb {R}}^+\) balances our penalization and \(\tau \in {\mathbb {R}}^+\) is the margin of acceptance for values close to one. While the main purpose of the first term is to guarantee the positivity of the Jacobian determinant, which translates into avoiding the creation of new structures in the defined lattice, the second term penalizes large values, which translates into the prevention of large expansions and contractions. An illustrative explanation can be found in Supplementary Material Fig. 1. To solve our energy functional described in Eq. 2, we use the Levenberg–Marquardt (LM) method, which combines the advantages of both the gradient descent and Gauss–Newton methods.
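A minimal sketch of the penalizer \(\delta_{\varphi}\) evaluated on precomputed Jacobian determinants may clarify its behavior. The weight is named phi_w here to avoid the clash with the deformation \(\varphi\) and matches the value used in Experiment 2 below; the margin tau is an illustrative assumption.

```python
import numpy as np

def topology_penalty(det_J, phi_w=3e-3, tau=0.03):
    """Per-point topology preservation penalty delta_phi: active only when
    the Jacobian determinant strays more than tau from 1. The arctan term
    punishes non-positive determinants (folding, i.e., creation of new
    structures); the term weighted by phi_w punishes large magnitudes
    (big expansions or contractions)."""
    det_J = np.asarray(det_J, dtype=float)
    active = np.abs(np.abs(det_J) - 1.0) >= tau
    pen = (0.5 * np.pi - np.arctan(det_J)) / np.pi + phi_w * np.abs(det_J)
    return np.where(active, pen, 0.0)
```

A determinant near 1 (a near-rigid local deformation) incurs no cost, while a negative determinant (a fold) is penalized heavily.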

Cardiac motion prediction

During a RAMIS procedure, a common challenge is the presence of partial occlusions, which compromise the tracking precision and can lead to algorithm failure. Studies in the cardiac motion estimation literature cope with this problem using algorithms from classic estimation theory, such as the EKF and the Auto-Regressive eXogenous (ARX) model. In this work, we go beyond those solutions and use tools drawn from machine learning as an alternative for predicting sequential data.

As in any supervised learning problem, a set of n training samples in the form of input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n}\) is needed to find a function M that maps \(X\xrightarrow {M} Y\) and generalizes well to unseen inputs x. In a real clinical scenario, however, it is difficult to extract true observed values Y when estimating the cardiac motion. To mitigate the lack of a set Y and define a standard supervised learning problem, we slide [22] the given sequential data \(\{x_{i}\}_{i=1}^{n}\) by setting \(y_{i}={x}_{i+d}\), where d is the time step size known as the lag, which results in the input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n-d}\). An example illustrating this process can be found in Supplementary Material Fig. 2.
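The sliding step above can be sketched in a few lines; the function name and list-based representation are illustrative.

```python
def slide_series(x, d=1):
    """Restructure a sequence x_1..x_n into supervised input-output pairs
    (x_i, x_{i+d}) by sliding the series by the lag d."""
    return [(x[i], x[i + d]) for i in range(len(x) - d)]
```

For a lag of d, a sequence of n samples yields n − d training pairs.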

Taking the previous restructured data, our goal is to predict the heart motion within the lattice domain not just to deal with occlusion events, but as a feedback information for improving the heart motion estimation.

Definition 3

A restricted Boltzmann machine (RBM) is a two-layer graphical model that learns a probability distribution over a given set of inputs. It is defined by an energy E, and the probability distribution of the visible and hidden units is given in terms of E as:

$$\begin{aligned}&E_{RBM}(v,h|W,b^{v},b^{h})= -( v^{\intercal } W^{vh}h+ v^{\intercal }b^{v}+ h^{\intercal }b^{h})\nonumber \\&= -\left( \sum _{i}\sum _{j}v_{i}W_{ij}h_{j}+\sum _{i}v_{i}b_{i}^{v}+\sum _{j}h_{j}b_{j}^{h}\right) \end{aligned}$$
$$\begin{aligned}&p(v,h)=\frac{1}{Z}exp({-E_{RBM}(v,h)}) \end{aligned}$$

where W is the weight matrix, h and v are the hidden and visible units, \(b^{v}\) and \(b^{h}\) are the unit biases, and Z is the normalization factor.
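The RBM energy and its unnormalized probability can be written directly from the definition above; this sketch omits the partition function Z, which is intractable for all but tiny models.

```python
import numpy as np

def rbm_energy(v, h, W, b_v, b_h):
    """Energy E_RBM(v, h) = -(v^T W h + v^T b_v + h^T b_h) of a restricted
    Boltzmann machine with visible units v and hidden units h."""
    return -(v @ W @ h + v @ b_v + h @ b_h)

def unnormalized_prob(v, h, W, b_v, b_h):
    """exp(-E); dividing by the partition function Z would give p(v, h)."""
    return np.exp(-rbm_energy(v, h, W, b_v, b_h))
```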

Although RBMs are powerful models, they are not able to capture temporal dependencies in the modeled data. To cope with this problem, an extension of RBMs called the conditional restricted Boltzmann machine (CRBM) [23] has recently attracted attention, particularly for motion capture [23, 24]. For illustration purposes, refer to the top part of Fig. 3.

Fig. 4
figure 4

(Left) Specularity elimination and inpainting results. (Right) Error and signal-to-noise ratio (SNR) plots

To improve the cardiac motion estimation within the lattice domain, we exploit the CRBM as a tool to, on the one hand, improve the heart motion estimation and, on the other, predict the motion during occlusion events. Let c be the vector (the conditional) that contains the past lattice-point motion at times \(t-1, t-2, {\ldots }, t-M\); see the illustration in the bottom part of Fig. 3. The joint probability function, given the hidden and visible layers, the conditional data and the M past elements, is expressed in terms of the energy \(E_\mathrm{CRBM}\) as:

$$\begin{aligned}&E_\mathrm{CRBM}(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})= E_\mathrm{RBM}(v,h|W,b^{v},b^{h})\nonumber \\&-\sum _{m}\left( \sum _{k}\sum _{i}v_{ki,t-m}{\mathcal {W}}_{ki,t-m}v_{it} +\sum _{k}\sum _{j}v_{kj,t-m}{\mathcal {W}}_{kj,t-m}h_{j,t}\right) \end{aligned}$$
$$\begin{aligned}&p(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})=\frac{1}{Z}e^{\big (-E_{CRBM}(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})\big )} \end{aligned}$$

To train the CRBM, we used the well-known contrastive divergence algorithm [25]. Details about the architecture, such as the number of units, are given in the experimental results.
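As a hedged illustration of the training procedure, the following sketches one contrastive divergence (CD-1) update for a plain binary RBM. A full CRBM additionally computes dynamic biases from the conditional vector c (the past frames) through the autoregressive weights \({\mathcal {W}}\); that part is omitted here for brevity, and all names are our own, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_v, b_h, lr=1e-2, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM: sample h from the data v0,
    reconstruct v, recompute h, and nudge the parameters toward the
    data statistics and away from the model's reconstruction."""
    p_h0 = sigmoid(v0 @ W + b_h)                       # hidden probs from data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sampled hidden states
    p_v1 = sigmoid(h0 @ W.T + b_v)                     # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_h)                     # hidden probs from recon
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h
```

In practice a momentum term (the paper uses 0.9) is added to each parameter update; it is left out of this sketch.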

Experimental results

Cardiac data description

We used both phantom and in vivo datasets [26] to evaluate our approach. The phantom dataset is a silicone heart undergoing cardiac motion. It is composed of 3389 stereo-pair images of size \(720\times 288\). We refer to this phantom dataset as Dataset I (see the bottom part of Fig. 4).

The in vivo data come from a robotic-assisted totally endoscopic coronary artery bypass surgery. It is composed of 1573 stereo-pair images of size \(720\times 288\). We refer to this sequence as Dataset II (see the top part of Fig. 4).

Fig. 5
figure 5

(From top to bottom) For each dataset: example input raw data frames, accumulated displacement of the reconstructed 3D heart at different time instances and visualization of the recovered region of interest. (Bottom left) The Jacobian Determinant results of our vision-based approach, with and without applying our topology preservation term, in two different cases: retrieval of complex deformation and under illumination variation. (Bottom right) The convergence results of our optimization process on the two datasets while using the topology preservation term

Fig. 6
figure 6

Motion of a point of interest over time used in the prediction stage

Results and discussion

In this section, we focus on evaluating the three parts that compose our approach through a set of numerical results and graphical and visual analyses.

Specular-free approach The evaluation of our specular-free approach is shown in Fig. 4. To offer a quantitative evaluation of our detection approach, we used a ground truth for each of the sequences. The results showed that the specular highlight regions were detected with \(\sim \) 99% accuracy in all datasets. Aside from this numerical evaluation, we also show detection and inpainting results on frames from each dataset in the left part of Fig. 4. From visual inspection, it is clear that our approach adapts well to diverse color variations. The right part of Fig. 4 shows visualizations of the inpainting results along with plots of the Sobolev energy minimization and the signal-to-noise ratio (SNR) improvement during the inpainting process.

Vision-based cardiac motion estimation We start evaluating our vision-based approach (see Eq. 2) by recovering the heart motion. In Fig. 5, we show the resulting 3D reconstruction of the heart surface using Datasets I and II. The top rows of Fig. 5 for both datasets show stereo-pair image samples with the region to be repaired pointed out. The middle rows show the accumulated displacement field over the complete image domain. As the images show, unlike Dataset I, which exhibits strong homogeneity of the surface, Dataset II presents strong visual texture, which provides more stable features when tracking the region of interest. The bottom rows of both datasets illustrate the 3D reconstruction of the region of interest (ROI), which is used as input to the next stage (the prediction stage). We only use information from the ROI since the surgeon's attention is focused on the zone to be repaired. The plots in the bottom rows show visually convincing 3D ROI results on both phantom and in vivo data.

For the quantitative analysis, we evaluated the global performance of our vision-based approach. The first question we pose is: How robust is our vision-based cardiac motion estimation approach? To answer it, we carried out two experiments:

  • Experiment 1: Without topology preservation by setting \(\delta _{{\varphi }}=0\) in Eq. 2.

  • Experiment 2: With our topology preservation term by setting \(\varphi =3\cdot 10^{-3}\) in Eq. 2.

After running both experiments, we found that the average range [min, max] of the Jacobian determinant for Exp. 1 was \([-\,2.5471, 3.0012]\) with an average residual error on the order of \(10^{-2}\), while for Exp. 2, the Jacobian exhibited stable values with an average range of [0.9715, 1.015], yielding an average minimum on the order of \(10^{-7}\). The significance of this minimum lies in the fact that a smaller final energy value indicates a better solution reached by the minimization. Samples of the Jacobian determinant over the region of interest are displayed at the bottom of Fig. 5.

These results, together with a nonparametric Wilcoxon test that revealed a statistically significant difference between the two experiments, lead us to conclude that our penalizer helps to obtain a better minimum and speeds up the convergence of the solution (see the bottom right side of Fig. 5).

Cardiac motion prediction In this subsection, we analyze the performance of our approach during partial occlusions. To do this, we first extracted the motion of a point of interest in the (x, y, z) directions from both datasets, as shown in Fig. 6. These data are used in the remainder of this section.

To offer a detailed analysis of our prediction scheme, we took two well-known predictors from classic estimation theory: NARX and the EKF. We use these two predictors to check whether a statistically significant difference exists between those schemes and the CRBM-based one over 200 frames.
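The error metric used throughout this comparison is the standard root mean square error; a short sketch (with hypothetical data, not the paper's measurements) is:

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error between a predicted and an estimated trajectory."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

The per-direction RMSE values reported below are obtained by applying this metric separately to the x, y and z components of each predictor's output.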

Fig. 7
figure 7

Estimated vs predicted comparison in x, y, z directions and for two predictors from the body of the literature, NARX and EKF, and the one used in this work—CRBM over 200 future frames, and the corresponding RMSE

We begin by analyzing the NARX predictor; Fig. 7 (top left) shows the resulting prediction in the x, y and z directions. From visual inspection, it is clear that the prediction was acceptable in the x and y directions. However, in the z direction, the predicted values were far from the target. This is further supported by the root mean square error (RMSE) computed for all directions and plotted at the bottom of Fig. 7. The RMSE shows that NARX was able to predict the x and y directions within a maximum RMSE of 1.1 mm, while the z direction was far from being retrieved accurately, reaching a maximum of 1.7 mm with an average of 0.69 mm.

We also evaluated the performance of the EKF, which is probably the most widely used predictor. The results are reported in Fig. 7 (top middle). A visual inspection shows that the EKF outperformed the NARX predictor in all directions. This is also evidenced by the RMSE reported at the bottom of Fig. 7, which exhibits a concentration of error values below 0.2 mm. In particular, the maximum errors for x, y and z are 0.38, 0.43 and 0.27 mm, respectively, and the average RMSE is 0.1153 mm.

Finally, we evaluated the CRBM for predicting the cardiac motion. For the CRBM, we set the learning rate to \(10^{-2}\), the momentum to 0.9 and the number of hidden units to 350. The prediction results are shown in Fig. 7 (top right). A visual comparison shows that the values estimated by the CRBM are closest to the target values. This is supported by the RMSE, which reached a maximum of 0.12 mm over all directions with an average of 0.071 mm.

But is there a significant difference in prediction between NARX, the EKF and the CRBM? Results from the nonparametric Friedman test, \(\chi (3)=18.154\), \(p<0.001\), indicated a statistically significant difference. This leads us to conclude that the CRBM achieves a better prediction than NARX and the EKF. The same quantitative analysis was performed on the in vivo dataset, where the results also favored the CRBM (a detailed description can be found in the supplementary material text and Fig. 3).


Conclusions
In this work, we proposed recovering the 3D cardiac motion by means of a variational framework that guarantees the anatomical preservation of the heart. A key point of our solution is its robustness to partial occlusions, achieved with a generative model (a CRBM).

The results revealed a robust vision-based approach that reached an average minimum on the order of \(10^{-7}\), providing stable values of the Jacobian determinant. In terms of prediction, our CRBM-based approach reported the lowest average RMSE of 0.071 mm in comparison with NARX and the EKF. This is further supported by a statistical test that indicated a significant difference in estimation between the three predictors. Together with the RMSE, this demonstrates the potential of using a CRBM (deep learning) in RAMIS scenarios.

While we wanted to demonstrate the potential of combining a diffeomorphic variational framework with supervised learning techniques (particularly the CRBM), from a technical point of view, the aim of this work is to report an initial study as a proof of concept. Future work will include a more extensive evaluation to explore the clinical potential of our approach.