Sliding to predict: vision-based beating heart motion estimation by modeling temporal interactions



Technical advances have become part of modern medical solutions because they enable better surgical alternatives that benefit patients. In cardiovascular surgery in particular, robotic surgical systems allow surgeons to perform delicate procedures on a beating heart, avoiding the complications of cardiac arrest. This advantage comes at the price of having to deal with a dynamic target, which poses technical challenges for the surgical system. In this work, we propose a solution for cardiac motion estimation.


Our estimation approach uses a variational framework that guarantees preservation of the complex anatomy of the heart. An advantage of our approach is that it takes into account different disturbances, such as specular reflections and occlusion events. This is achieved by a preprocessing step that eliminates specular highlights and a prediction step, based on a conditional restricted Boltzmann machine, that recovers information missing because of partial occlusions.


We carried out extensive experiments on two datasets, one from a phantom and the other from an in vivo procedure. The results show that our visual approach reaches an average minimum on the order of \(10^{-7}\) while preserving the heart’s anatomical structure and providing stable values for the Jacobian determinant, ranging from 0.917 to 1.015. We also show that our specular elimination approach reaches an accuracy of 99% with respect to a ground truth. In terms of prediction, our approach compared favorably against two well-known predictors, NARX and EKF, giving the lowest average RMSE of 0.071 mm.


Our approach avoids the risks of using mechanical stabilizers and can also be effective for acquiring the motion of organs other than the heart, such as the lung or other deformable objects.


Robotic-assisted minimally invasive surgery (RAMIS) has become an attractive alternative to traditional and laparoscopic surgery in recent years, since it offers diverse advantages to both surgeons and patients [1]. In particular, RAMIS has enabled complex procedures such as off-pump coronary artery bypass grafting (OPCABG) [2]. Because the heart is not arrested, this procedure avoids the complications associated with cardiopulmonary bypass (CPB). However, surgeons then have to deal with a dynamic target, which compromises their dexterity and precision.

Fig. 1

Overview of our proposed approach, composed of three main parts, for estimating the cardiac motion in a RAMIS setup

To compensate for heart motion, different authors have proposed solutions based on mechanical stabilization (see, for example, [3, 4]), in which small devices are positioned on the heart surface to keep the region to be repaired steady. However, works such as [5] reported significant residual motion (1.5–2.4 mm) even after mechanical stabilization. This would require the surgeon to compensate manually, which is not possible because heart motion exceeds the human tracking bandwidth [6]. Moreover, mechanical stabilizers can only be positioned on a small region of the heart surface and can cause irreversible heart damage that affects cardiac mechanics [7, 8].

To overcome these difficulties, the pioneering work of Nakamura et al. [9] reported that motion cancellation is possible by tracking the heart dynamics and continuously synchronizing the robot with this motion. Different authors have followed this direction. An image-based motion tracking algorithm was proposed in [10] for retrieving the cardiac surface deformation with a stereo endoscopic system; however, the authors did not take into account the effect of occlusions on the performance and stability of the tracking algorithm. Later, Ortmaier et al. [11] presented a 2D affine matching algorithm that uses natural landmarks to estimate heart motion. They dealt with occlusions by integrating a prediction scheme based on Takens’ theorem that combines electrocardiogram and respiratory pressure signals.

Richa et al. [12] proposed tracking the heart surface with a thin-plate spline (TPS) deformable model and included an illumination compensation solution. In [13], the heart motion was retrieved using a stochastic physics-based tracking technique, and occlusions were tackled with an extended Kalman filter (EKF). Another 3D tracking approach, based on quasi-spherical triangles, was introduced in [14], where the authors modeled the heart surface with a triangle-based model with a curving parameter. They handled occlusions with an algorithm based on the peak-valley characteristics of the motion signals.

More recently, the authors of [15] presented a scheme for tracking heart motion using two recursive processes: the first represents the target region in a joint spatial-color space, while the second applies the thin-plate spline model to fit the heart shape around the region of interest. Yang et al. [16] proposed a motion prediction scheme for tracking heart motion during occlusion events based on a dual Kalman filter, in which a point of interest was modeled as a dual time-varying Fourier series.

Aim of this work

In this work, we propose a new approach to estimate heart motion. The main contributions of our solution are:

  • A diffeomorphic variational framework that can deal with the inherently complex deformation of a beating heart while guaranteeing preservation of the anatomy through a topology-preserving penalizer. Thanks to the curvature penalizer, our framework does not penalize affine linear transformations, and it incorporates a preprocessing stage for dealing with specular highlights.

  • A prediction stage, which is a key point of this paper because it differs from existing approaches to the problem at hand. We propose sliding the cardiac motion data to formulate a standard supervised learning problem, which is handled via a conditional restricted Boltzmann machine (CRBM).

Toward estimating the beating heart motion

In this section, we present our approach, which is composed of the three main parts illustrated in Fig. 1; in this work, we focus on the second and third parts.

Fig. 2

3D diffeomorphic surface reconstruction from the projection of the lattice points defined in each stereo-pair image

Cardiac motion estimation

Specular highlights hinder the performance of vision-based solutions because they partially occlude the target surface, appear as spurious features, generate discontinuities in the images, and cause loss of texture or color information. In this work, we adapted our specular-free image solution, presented in [17], to stereo-pair frames.
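
Reference [17] combines adaptive segmentation with mask-specific Sobolev inpainting, and its details are not reproduced here. The sketch below only illustrates the two steps under simplifying assumptions: highlights are flagged with fixed brightness and saturation thresholds (hypothetical values, whereas [17] segments adaptively), and the masked pixels are filled by minimizing the discrete H¹ (Sobolev) seminorm, which amounts to solving Laplace's equation inside the mask. The names `detect_specular` and `h1_inpaint` are illustrative.

```python
import numpy as np

def detect_specular(rgb, v_thresh=0.9, s_thresh=0.2):
    """Flag very bright, weakly saturated pixels as specular highlights.

    rgb: float array with values in [0, 1] and shape (H, W, 3).
    The fixed thresholds are illustrative assumptions, not values from [17].
    """
    v = rgb.max(axis=2)                                   # HSV value
    s = (v - rgb.min(axis=2)) / np.maximum(v, 1e-12)      # HSV saturation (0 for black pixels)
    return (v >= v_thresh) & (s <= s_thresh)

def h1_inpaint(channel, mask, n_iter=500):
    """Fill masked pixels by minimizing the discrete H^1 (Sobolev) seminorm.

    The minimizer satisfies Laplace's equation inside the mask, so each Jacobi
    sweep replaces every masked pixel by the mean of its four neighbors, while
    unmasked pixels act as fixed boundary values.
    """
    u = np.array(channel, dtype=float)                    # working copy
    u[mask] = u[~mask].mean()                             # neutral initial guess
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")                     # replicate image borders
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        u[mask] = avg[mask]                               # update unknown pixels only
    return u

# Synthetic example: a flat reddish background with a white 4 x 4 highlight.
img = np.empty((32, 32, 3))
img[...] = [0.6, 0.2, 0.2]
img[10:14, 10:14] = 1.0
mask = detect_specular(img)
clean = np.stack([h1_inpaint(img[..., k], mask) for k in range(3)], axis=2)
```

Because the synthetic background is flat, the harmonic fill simply restores the background color; on real endoscopic frames, the result depends on the quality of the mask and on the number of iterations.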

Assume a calibrated image sequence \(G=\{g_{s}\}_{s=0}^{S-1}\) composed of S stereo-pair frames, where \(g_{s}=\{f_{\mathrm{r}}^{s},f_{\mathrm{l}}^{s}\}\). Let \(f_{\mathrm{l}}^{s}\) and \(f_{\mathrm{r}}^{s}\) denote the left and right views of \(g_{s}\), defined on the bounded domain \(\Omega \subset {\mathbb {R}}^2\). To retrieve the heart motion, we start by defining a lattice on each stereo view according to the following definition:

Definition 1

A lattice, \({\mathfrak {L}}\), is a subgroup of a real vector space V of dimension d that has the form \({\mathbb {Z}}v_{1}+\cdots +{\mathbb {Z}}v_{d}\), where \(v_{1},\ldots ,v_{d}\) is a basis of V.
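
As a small illustration of Definition 1 in the planar case (d = 2), the hypothetical helper below enumerates the lattice points \(a v_{1} + b v_{2}\), with integer a and b, that fall inside a \(W\times H\) image domain. The 16-pixel square basis is an arbitrary choice for the example; the paper does not report the lattice spacing.

```python
import numpy as np

def lattice_points(v1, v2, width, height, k_max=100):
    """Points a*v1 + b*v2 (integers a, b in [-k_max, k_max]) inside [0, width) x [0, height).

    k_max must be large enough for the coefficients to cover the whole domain.
    """
    basis = np.array([v1, v2], dtype=float)               # rows: basis vectors v1, v2
    if abs(np.linalg.det(basis)) < 1e-12:
        raise ValueError("v1 and v2 must be linearly independent")
    k = np.arange(-k_max, k_max + 1)
    a, b = np.meshgrid(k, k, indexing="ij")
    coeffs = np.stack([a.ravel(), b.ravel()], axis=1)     # integer coefficients (a, b)
    pts = coeffs @ basis                                  # a*v1 + b*v2 for every pair
    inside = (pts[:, 0] >= 0) & (pts[:, 0] < width) & (pts[:, 1] >= 0) & (pts[:, 1] < height)
    return pts[inside]

# Square lattice with 16-pixel spacing over one 720 x 288 view.
grid = lattice_points((16, 0), (0, 16), 720, 288)
print(grid.shape)  # (810, 2)
```

A sheared or rotated lattice only requires a different pair of basis vectors; the membership test stays the same.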

Consider \({\mathfrak {L}}_{\mathrm{l}}^{s},{\mathfrak {L}}_{\mathrm{r}}^{s} \subset {\mathbb {R}}^{2}\) as the lattices defined on the left and right views of \(g_{s}\). We recover the 3D heart surface by computing the projections of the corresponding points from \({\mathfrak {L}}_{\mathrm{l}}^{s}\) and \({\mathfrak {L}}_{\mathrm{r}}^{s}\), as illustrated in Fig. 2, which results in the three-dimensional lattice \({\mathfrak {L}}^{s}\subset {\mathbb {R}}^{3}\) with a set of lattice points \({\mathbf {B}}\). In this work, we represent the deformable heart surface by the tensor product of the b-splines \(\xi _{c}\). Assume a given position \(x \in {\mathbb {R}}^{d}\), a d-dimensional lattice point \(z:=(y_{1},{\ldots },y_{d})\) and b-splines of degree n. Then the deformation can be represented as:

$$\begin{aligned} \begin{aligned}&\varphi (x;{\mathbf {B}})=\sum _{j_{1}=0}^{n}{\ldots }\sum _{j_{d}=0}^{n}{\mathbf {B}}_{j_{1},{\ldots },j_{d}}\prod _{k=1}^{d}\xi _{k,c}(x_{k})\\&\quad \text {for}\quad c=0,1,2,3 \end{aligned} \end{aligned}$$
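
The equation above can be sketched in code for d = 2 with cubic b-splines (n = 3, matching c = 0, 1, 2, 3), using the common free-form-deformation layout in which a point is influenced by the 4 × 4 control points around it. The uniform knot spacing, the one-node padding of the control grid and the names `cubic_bspline_basis` and `ffd_displacement` are assumptions made for illustration; extending to d = 3 adds a third nested loop.

```python
import numpy as np

def cubic_bspline_basis(u):
    """Uniform cubic B-spline basis functions xi_0..xi_3 at local coordinate u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ])

def ffd_displacement(x, ctrl, spacing):
    """Tensor-product cubic B-spline displacement at a 2D point x.

    ctrl: (nx, ny, 2) control-point displacements; ctrl[k] sits at grid node
    k - 1, i.e. one layer of padding so that indices never go negative.
    """
    t = np.asarray(x, dtype=float) / spacing
    i, j = np.floor(t).astype(int)                 # cell containing x
    u, v = t - np.floor(t)                         # local coordinates in [0, 1)
    bu, bv = cubic_bspline_basis(u), cubic_bspline_basis(v)
    disp = np.zeros(2)
    for p in range(4):                             # nodes i-1 .. i+2 along x
        for q in range(4):                         # nodes j-1 .. j+2 along y
            disp += bu[p] * bv[q] * ctrl[i + p, j + q]
    return disp

ctrl = np.tile([1.5, -0.5], (8, 8, 1))             # uniform control field
disp = ffd_displacement((13.0, 7.0), ctrl, spacing=5.0)
```

Because the four basis functions sum to one, a uniform control field reproduces a pure translation, which is a quick sanity check for any implementation.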

After defining the deformation model, we compute the changes in the heart surface’s deformation over time with an energy functional composed of three terms: (i) a data term that measures the discrepancy between the current \(f_{\mathrm{r}}\) and \(f_{\mathrm{l}}\), (ii) a regularization term that enforces a plausible transformation and (iii) a topology preservation term that ensures connectivity between the structures created within the lattice.

Specifically, we base the data term on the sum of squared differences, but replace the minimization of the squared residuals \(\sum _{i}{r_{i}^{2}}\) with that of \(\sum _{i}\rho ({r_{i}})\), where \(\rho \) is Tukey’s M-estimator, to increase robustness to outliers. The second term is formulated using the curvature method, which has the advantage of penalizing oscillations while leaving affine linear transformations unpenalized [18].
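
A minimal sketch of the robust penalty, assuming the standard Tukey biweight form with tuning constant c = 4.685 (a common default; the paper does not report c). Near zero it behaves like \(r^{2}/2\), while any residual larger than c costs a constant \(c^{2}/6 \approx 3.66\), so a single outlier cannot dominate the data term the way it does in a plain sum of squares.

```python
import numpy as np

def tukey_rho(r, c=4.685):
    """Tukey biweight rho(r): about r**2 / 2 near zero, constant c**2 / 6 for |r| > c."""
    r = np.asarray(r, dtype=float)
    z2 = np.minimum((r / c) ** 2, 1.0)             # saturate at |r| = c
    return c**2 / 6 * (1 - (1 - z2) ** 3)

residuals = np.array([0.1, -0.5, 2.0, 50.0])      # the last entry mimics an outlier
ssd = np.sum(residuals ** 2)                       # outlier dominates: 2500 of ~2504
robust = np.sum(tukey_rho(residuals))              # outlier capped at c**2 / 6
```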

Definition 2

A map \(f:X\rightarrow Y\) preserves topology if \(f^{-1}\) exists and both f and \(f^{-1}\) are smooth.

For the third term, following Definition 2, we use the topology preservation term that we first proposed in [19], here extended to 3D. This penalization term is based on controlling the Jacobian determinant to preserve the anatomical structure of organs. Unlike works that do not consider topology preservation, such as [12, 14, 20, 21], we demonstrate the relevance of preserving the heart’s anatomical structure, especially during complex deformations. Combining these three terms, our energy functional is given by:

$$\begin{aligned} \begin{aligned}&{\hat{\mathbf {E}}}_{s}({\mathbf {B}})\\&\quad =\,\bigg (\frac{1}{m}\bigg )\underbrace{\int _{\Omega }\rho (f_{\mathrm{r}}^{s}(\varphi (x;{\mathbf {B}})+x)-f_{\mathrm{l}}^{s}(x)){\mathrm{d}x}}_{\text {data term}}\\&\qquad +\,\underbrace{\sum _{i=1}^{d}\int _{\Omega }(\Delta \varphi (x;{\mathbf {B}})_{i})^{2}{\mathrm{d}x}}_{\text {regularization term}} \underbrace{ + \int _{\Omega } \delta _{{\varphi }}(x;{\mathbf {B}}) {\mathrm{d}x} }_{\text {topology preservation term}} \end{aligned} \end{aligned}$$

where m is the number of pixels in the overlapped domain \(\Omega _{f_{\mathrm{r}},f_{\mathrm{l}}}\) and our term \(\delta _{{\varphi }}\) is defined as:

$$\begin{aligned} \delta _{\varphi }(x;{\mathbf {B}}):= \left\{ \begin{array}{ll} \frac{\frac{1}{2}\pi -\arctan (| J_{\varphi }(x;{\mathbf {B}})|)}{\pi } +\varphi \sqrt{\vert J_{\varphi }(x;{\mathbf {B}})\vert ^2} &{}\quad \text {if}\ \big |\,|J_{\varphi }(x;{\mathbf {B}})| -1\,\big | \ge \tau \\ 0 &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$
Fig. 3

(Top) Illustration of both RBM and CRBM architectures and (left bottom) how the reconstructed heart motion is used as an input for CRBM. (Right bottom) The accumulated lattice points over time

where \(\varphi \in {\mathbb {R}}^+\) balances our penalization and \(\tau \in {\mathbb {R}}^+\) is the margin of acceptance for values close to one. The main purpose of the first term is to guarantee a positive Jacobian determinant, which translates into avoiding the creation of new structures in the defined lattice, while the second term penalizes large values, which translates into preventing large expansions and contractions. An illustrative explanation can be found in Supplementary Material Fig. 1. To minimize the energy functional described in Eq. 2, we use the Levenberg–Marquardt (LM) method, which combines the advantages of the gradient descent and Gauss–Newton methods.
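
To make the penalizer concrete, the sketch below computes the Jacobian determinant of \(x \mapsto x + u(x)\) for a 2D displacement field with finite differences and then evaluates \(\delta _{\varphi }\) pointwise. Reading \(|J_{\varphi }|\) as the determinant of that map is our interpretation; the balance weight defaults to the value \(3\cdot 10^{-3}\) used in Experiment 2, while the margin τ = 0.05 is a hypothetical choice because the paper does not report it. The LM minimization itself is not shown.

```python
import numpy as np

def jacobian_det_2d(ux, uy):
    """Determinant of the Jacobian of x -> x + u(x) for a 2D displacement field.

    ux, uy: displacement components on a unit-spaced grid indexed [row = y, col = x].
    """
    dux_dy, dux_dx = np.gradient(ux)   # np.gradient returns d/d(axis 0) = d/dy, then d/d(axis 1) = d/dx
    duy_dy, duy_dx = np.gradient(uy)
    return (1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx

def topology_penalty(det_j, weight=3e-3, tau=0.05):
    """Pointwise delta_phi: zero when |det J - 1| < tau, else arctan barrier + weight * |det J|.

    tau = 0.05 is a hypothetical margin; the paper does not report it.
    """
    barrier = (0.5 * np.pi - np.arctan(det_j)) / np.pi    # in (0, 1), decreasing in det J
    active = np.abs(det_j - 1) >= tau
    return np.where(active, barrier + weight * np.abs(det_j), 0.0)

yy, xx = np.mgrid[0:20, 0:20].astype(float)
det_id = jacobian_det_2d(np.zeros_like(xx), np.zeros_like(xx))   # identity map: det J = 1
det_fold = jacobian_det_2d(-1.5 * xx, np.zeros_like(xx))         # x -> -0.5 x flips orientation
penalty_id = topology_penalty(det_id).sum()                      # no penalty
penalty_fold = topology_penalty(det_fold).mean()                 # large barrier value
```

Because the barrier decreases monotonically with the determinant, orientation-reversing (negative) determinants receive the largest cost, while values within τ of one are left untouched.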

Cardiac motion prediction

During a RAMIS procedure, a common challenge is the presence of partial occlusions, which compromise tracking precision and can lead to algorithm failure. Studies in the cardiac motion estimation literature cope with this problem using algorithms from classic estimation theory, such as the EKF and the Auto-Regressive eXogenous (ARX) model. In this work, we go beyond those solutions and use tools drawn from machine learning as an alternative for predicting sequential data.

As in any supervised learning problem, a set of n training samples in the form of input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n}\) is needed to find a function M that maps \(X\xrightarrow {M} Y\) and works well on unseen inputs x. In a real clinical scenario, however, it is difficult to extract true observed values Y when estimating the cardiac motion. To compensate for the lack of a set Y and define a standard supervised learning problem, we slide [22] the given sequential data \(\{x_{i}\}_{i=1}^{n}\) as \(Y=\{x_{i+d}\}_{i=1}^{n-d}\), where d is the time step size known as the lag, which yields the input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n-d}\) with \(y_{i}=x_{i+d}\). An example illustrating this process can be found in Supplementary Material Fig. 2.
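
The sliding step can be sketched as follows: `slide` implements the pairing \(y_{i}=x_{i+d}\) described above (with `lag` standing for d), and `slide_window` is a hypothetical generalization that stacks several past values as the input, similar in spirit to the M past frames the CRBM conditions on. The toy series is made up for illustration.

```python
import numpy as np

def slide(series, lag=1):
    """Pairs (x_i, y_i = x_{i+lag}) for i = 1, ..., n - lag."""
    series = np.asarray(series)
    if not 0 < lag < len(series):
        raise ValueError("need 0 < lag < len(series)")
    return series[:-lag], series[lag:]

def slide_window(series, history=3, lag=1):
    """Inputs: `history` consecutive values; target: the value `lag` steps after the last one."""
    series = np.asarray(series)
    n = len(series) - history - lag + 1
    if n <= 0:
        raise ValueError("series too short for this history/lag")
    X = np.stack([series[i:i + history] for i in range(n)])
    Y = np.array([series[i + history - 1 + lag] for i in range(n)])
    return X, Y

x = np.array([0.0, 0.4, 0.9, 1.1, 0.7])   # e.g. one coordinate of a tracked point
X, Y = slide(x, lag=1)
# X = [0.0, 0.4, 0.9, 1.1], Y = [0.4, 0.9, 1.1, 0.7]
Xw, Yw = slide_window(x, history=3, lag=1)
# Xw = [[0.0, 0.4, 0.9], [0.4, 0.9, 1.1]], Yw = [1.1, 0.7]
```

Both helpers also accept a (T, 3) array of x, y, z positions, in which case each input and target is a full 3D sample.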

Using the restructured data, our goal is to predict the heart motion within the lattice domain, not only to deal with occlusion events but also as feedback for improving the heart motion estimation.

Definition 3

A restricted Boltzmann machine (RBM) is a two-layer undirected graphical model that learns a probability distribution over a given set of inputs. It is defined by an energy E, in terms of which the joint probability distribution of the visible and hidden units is given as:

$$\begin{aligned}&E_{RBM}(v,h|W,b^{v},b^{h})= -( v^{\intercal } W^{vh}h+ v^{\intercal }b^{v}+ h^{\intercal }b^{h})\nonumber \\&= -\left( \sum _{i}\sum _{j}v_{i}W_{ij}h_{j}+\sum _{i}v_{i}b_{i}^{v}+\sum _{j}h_{j}b_{j}^{h}\right) \end{aligned}$$
$$\begin{aligned}&p(v,h)=\frac{1}{Z}exp({-E_{RBM}(v,h)}) \end{aligned}$$

where W is the weight matrix, v and h are the visible and hidden units, \(b^{v}\) and \(b^{h}\) are the visible and hidden biases, and Z is the normalization factor (partition function).
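
For binary visible and hidden units (an assumption; the paper does not state the unit types, and real-valued motion data are often modeled with Gaussian visible units), the energy above factorizes so that \(p(h_{j}=1\mid v)=\sigma (b^{h}_{j}+\sum _{i}v_{i}W_{ij})\), with σ the logistic sigmoid. The sketch below checks this closed form against brute-force enumeration of a tiny RBM with random weights; the helper names are illustrative.

```python
import itertools
import numpy as np

def rbm_energy(v, h, W, bv, bh):
    """E(v, h) = -(v^T W h + v^T b^v + h^T b^h)."""
    return -(v @ W @ h + v @ bv + h @ bh)

def p_h_given_v(v, W, bh):
    """Closed form for binary hidden units: p(h_j = 1 | v) = sigmoid(b^h_j + sum_i v_i W_ij)."""
    return 1.0 / (1.0 + np.exp(-(bh + v @ W)))

rng = np.random.default_rng(0)
n_v, n_h = 3, 2
W = rng.normal(scale=0.5, size=(n_v, n_h))
bv, bh = rng.normal(size=n_v), rng.normal(size=n_h)

states_v = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n_v)]
states_h = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n_h)]
# Partition function Z: sum of exp(-E) over every joint configuration.
Z = sum(np.exp(-rbm_energy(v, h, W, bv, bh)) for v in states_v for h in states_h)

v0 = states_v[5]                                  # visible configuration (1, 0, 1)
joint = {tuple(h): np.exp(-rbm_energy(v0, h, W, bv, bh)) / Z for h in states_h}
p_brute = sum(p for h, p in joint.items() if h[0] == 1) / sum(joint.values())
p_closed = p_h_given_v(v0, W, bh)[0]              # should match p_brute
```

Exhaustive enumeration is only feasible for toy sizes, which is exactly why training relies on approximations such as contrastive divergence.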

Although RBMs are powerful models, they are not able to capture temporal dependencies in the data. To cope with this problem, an extension of RBMs called the conditional restricted Boltzmann machine (CRBM) [23] has recently attracted attention, in particular for modeling motion [23, 24]. For an illustration, refer to the top part of Fig. 3.

Fig. 4

(Left) Specularity elimination and inpainting results. (Right) Error and signal-to-noise ratio (SNR) plots

To improve the cardiac motion estimation within the lattice domain, we use the CRBM both to refine the heart motion estimation and to predict the motion during occlusion events. Let c be the conditioning vector that contains the past motion of the lattice points at times \(t-1, t-2, {\ldots }, t-M\) (see the bottom part of Fig. 3). The joint probability of the visible and hidden layers, given the conditioning data of M past elements, is expressed in terms of the energy \(E_\mathrm{CRBM}\) as:

$$\begin{aligned}&E_\mathrm{CRBM}(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})= E_\mathrm{RBM}(v,h|W,b^{v},b^{h})\nonumber \\&-\sum _{m}\left( \sum _{k}\sum _{i}v_{ki,t-m}{\mathcal {W}}_{ki,t-m}v_{it} +\sum _{k}\sum _{j}v_{kj,t-m}{\mathcal {W}}_{kj,t-m}h_{j,t}\right) \end{aligned}$$
$$\begin{aligned}&p(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})=\frac{1}{Z}e^{\big (-E_{CRBM}(v_{t},h_{t}|c,W,{\mathcal {W}},b^{v},b^{h})\big )} \end{aligned}$$
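
One common way to implement the temporal terms, following the CRBM formulation of [23], is to fold the conditioning vector c into dynamic biases: with autoregressive weights A (past to visible) and weights B (past to hidden), the model at time t is an ordinary RBM with biases \(b^{v}+A^{\intercal }c\) and \(b^{h}+B^{\intercal }c\). The sketch below uses that layout with Gaussian, unit-variance visible units and a few mean-field steps started from \(v_{t-1}\) to predict the next frame; the names A and B, the unit types and the inference scheme are assumptions rather than details reported in the paper.

```python
import numpy as np

def dynamic_biases(history, A, B, bv, bh):
    """Fold the conditioning vector c into time-dependent biases.

    history: (M, n_v) array whose rows are v_{t-1}, ..., v_{t-M}.
    A: (M * n_v, n_v) past-to-visible weights; B: (M * n_v, n_h) past-to-hidden weights.
    """
    c = history.reshape(-1)                      # conditioning vector c
    return bv + c @ A, bh + c @ B

def predict_next(history, W, A, B, bv, bh, n_steps=30):
    """Mean-field estimate of v_t for Gaussian (unit-variance) visible units.

    Starts from v_{t-1} and alternates mean hidden activations and mean
    visible reconstructions under the dynamic biases.
    """
    bv_hat, bh_hat = dynamic_biases(history, A, B, bv, bh)
    v = history[0].copy()
    for _ in range(n_steps):
        h = 1.0 / (1.0 + np.exp(-(bh_hat + v @ W)))   # E[h | v, c]
        v = bv_hat + W @ h                             # E[v | h, c] for unit-variance Gaussians
    return v

# Sanity check: with W = 0 only the autoregressive part remains, and
# A = [2I; -I] encodes linear extrapolation v_t = 2 v_{t-1} - v_{t-2}.
n_v, n_h, M = 3, 4, 2
A = np.vstack([2 * np.eye(n_v), -np.eye(n_v)])
B = np.zeros((M * n_v, n_h))
W = np.zeros((n_v, n_h))
history = np.array([[1.0, 2.0, 3.0],     # v_{t-1}
                    [0.5, 1.0, 2.0]])    # v_{t-2}
v_next = predict_next(history, W, A, B, np.zeros(n_v), np.zeros(n_h))
# v_next holds [1.5, 3.0, 4.0]
```

In a trained model, W, A and B are learned jointly, so the hidden layer adds nonlinear corrections on top of the autoregressive term.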

To train the CRBM, we used the well-known contrastive divergence algorithm [25]. Details of the architecture, such as the number of units, are given in the experimental results.
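
A minimal CD-1 sketch for a plain binary RBM is given below, using the learning rate \(10^{-2}\) and momentum 0.9 reported in the experimental results. The toy data, the binary units and the helper names are illustrative assumptions; for a CRBM, analogous positive-minus-negative statistics are accumulated for the conditional weights as well.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, bv, bh, vel, rng, lr=1e-2, momentum=0.9):
    """One CD-1 update, applied in place, for a binary RBM on a batch v0 of shape (n, n_v)."""
    ph0 = sigmoid(v0 @ W + bh)                          # positive phase: p(h | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)    # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + bv)                        # one Gibbs step: reconstruction
    ph1 = sigmoid(pv1 @ W + bh)                         # negative phase: p(h | reconstruction)
    n = v0.shape[0]
    grads = ((v0.T @ ph0 - pv1.T @ ph1) / n,            # approximate log-likelihood gradients
             (v0 - pv1).mean(axis=0),
             (ph0 - ph1).mean(axis=0))
    for param, g, step in zip((W, bv, bh), grads, vel):
        step *= momentum                                # momentum smoothing
        step += lr * g
        param += step                                   # ascent along the CD direction

def recon_error(v, W, bv, bh):
    """Mean squared error of a deterministic up-down pass."""
    return np.mean((v - sigmoid(sigmoid(v @ W + bh) @ W.T + bv)) ** 2)

rng = np.random.default_rng(0)
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 5, dtype=float)   # toy binary patterns
W = 0.01 * rng.standard_normal((4, 2))
bv, bh = np.zeros(4), np.zeros(2)
vel = [np.zeros_like(W), np.zeros_like(bv), np.zeros_like(bh)]

err_before = recon_error(data, W, bv, bh)
for _ in range(3000):
    cd1_step(data, W, bv, bh, vel, rng)
err_after = recon_error(data, W, bv, bh)
```

On this two-pattern toy set, the reconstruction error starts at about 0.25 (all reconstructions near 0.5) and decreases as the weights pick up the structure of the data.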

Experimental results

Cardiac data description

We used both phantom and in vivo datasets [26] to evaluate our approach. The phantom dataset shows a silicone heart undergoing cardiac motion. It consists of 3389 stereo-pair images of size \(720\times 288\). We refer to this phantom dataset as Dataset I (see the bottom part of Fig. 4).

The in vivo data come from a robotic-assisted totally endoscopic coronary artery bypass surgery. It is composed of 1573 stereo-pair images of size \(720\times 288\). We refer to this sequence as Dataset II (see the top part of Fig. 4).

Fig. 5

(From top to bottom) For each dataset: example input raw data frames, accumulated displacement of the reconstructed 3D heart at different time instances and visualization of the recovered region of interest. (Bottom left) Jacobian determinant results of our vision-based approach, with and without our topology preservation term, in two different cases: retrieval of complex deformation and illumination variation. (Bottom right) Convergence of our optimization process on the two datasets when using the topology preservation term

Fig. 6

Motion of a point of interest over time used in the prediction stage

Results and discussion

In this section, we evaluate the three parts that compose our approach through a set of numerical results and graphical and visual analyses.

Specular-free approach The evaluation of our specular-free approach is shown in Fig. 4. To evaluate our detection approach quantitatively, we used a ground truth for each of the sequences. The specular highlight regions were detected with \(\sim \) 99% accuracy in all datasets. Besides this numerical evaluation, we also show detection and inpainting results on frames from each dataset in the left part of Fig. 4. Visual inspection indicates that our approach adapts well to diverse color variations. The right part of Fig. 4 shows the inpainting results along with plots of the Sobolev energy minimization and the signal-to-noise ratio (SNR) improvement during the inpainting process.

Vision-based cardiac motion estimation We start evaluating our vision-based approach (see Eq. 2) by recovering the heart motion. Fig. 5 shows the resulting 3D reconstruction of the heart surface for Datasets I and II. For both datasets, the top rows of Fig. 5 show stereo-pair image samples with the region to be repaired marked. The middle rows show the accumulated displacement field over the complete image domain. As the images show, unlike Dataset I, whose surface is strongly homogeneous, Dataset II presents strong visual texture, which provides more stable features when tracking the region of interest. The bottom rows for both datasets illustrate the 3D reconstruction of the region of interest (ROI), which is used as input to the next stage (the prediction stage). We only use information from the ROI, since the surgeon’s attention is focused on the zone to be repaired. The bottom-row plots show visually plausible 3D reconstructions of the ROI for both phantom and in vivo data.

For quantitative analysis, we evaluated the global performance of our vision-based approach. The first question we pose is: how robust is our vision-based cardiac motion estimation approach? To answer this question, we carried out two experiments:

  • Experiment 1: Without topology preservation by setting \(\delta _{{\varphi }}=0\) in Eq. 2.

  • Experiment 2: With our topology preservation term by setting \(\varphi =3\cdot 10^{-3}\) in Eq. 2.

After running both experiments, we found that the average range [min, max] of the Jacobian determinant for Exp. 1 was \([-\,2.5471, 3.0012]\), with an average residual error on the order of \(10^{-2}\), while for Exp. 2 the Jacobian exhibited stable values with an average range of [0.9715, 1.015], yielding an average minimum on the order of \(10^{-7}\). The significance of this minimum is that a lower final energy indicates a more successful minimization. Some samples showing the Jacobian determinant over the region of interest are displayed at the bottom of Fig. 5.

These results, together with a nonparametric Wilcoxon test that revealed a statistically significant difference between the two experiments, lead us to conclude that our penalizer helps obtain a better minimum and speeds up convergence (see the bottom right of Fig. 5).

Cardiac motion prediction In this subsection, we analyze the performance of our approach during partial occlusions. To do this, we first extracted the motion of a point of interest in the x, y and z directions from both datasets, as shown in Fig. 6. These are the data used in the remainder of this section.

To offer a detailed analysis of our prediction scheme, we took two well-known predictors from classic estimation theory: the nonlinear ARX (NARX) model and the EKF. We use these two predictors to check whether a statistically significant difference exists between them and the CRBM-based scheme over 200 frames.

Fig. 7

Comparison of estimated and predicted motion in the x, y and z directions over 200 future frames for two predictors from the literature, NARX and EKF, and for the one used in this work, CRBM, with the corresponding RMSE

We begin by analyzing the NARX predictor; Fig. 7 (top left) shows the resulting prediction for the x, y and z directions. Visual inspection shows that the prediction was acceptable for the x and y directions, whereas in the z direction the predicted values were far from the target. This is further supported by the root mean square error (RMSE) computed for all directions and plotted at the bottom of Fig. 7. The RMSE shows that NARX predicted the x and y directions within a maximum RMSE of 1.1 mm, while z was not retrieved accurately, reaching a maximum of 1.7 mm with an average of 0.69 mm.
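
The paper does not spell out how the RMSE curves in Fig. 7 are aggregated. Assuming a plain per-axis RMSE between the estimated and predicted trajectories over the prediction horizon, it can be computed as follows (the function name and the toy values are hypothetical).

```python
import numpy as np

def rmse_per_axis(estimated, predicted):
    """Per-axis RMSE between (T, 3) trajectories, in the input units (e.g. mm)."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=0))

est = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
pred = [[0.0, 0.0, 0.3], [1.0, 1.4, 1.0]]
per_axis = rmse_per_axis(est, pred)   # x, y, z errors
overall = per_axis.mean()             # one possible summary across directions
```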

We also evaluated the performance of the EKF, which is probably the most widely used of the well-known predictors. The results are reported in Fig. 7 (top middle). Visual inspection shows that the EKF outperformed the NARX predictor in all directions. This is also evidenced by the RMSE reported at the bottom of Fig. 7, where most error values are below 0.2 mm. In particular, the maximum errors for x, y and z are 0.38, 0.43 and 0.27 mm, respectively, and the average RMSE is 0.1153 mm.

Finally, we evaluated the CRBM for predicting the cardiac motion, setting the learning rate to \(10^{-2}\), the momentum to 0.9 and the number of hidden units to 350. The results are shown in Fig. 7 (top right). Visually, the values predicted by the CRBM are closer to the target values. This is supported by the RMSE, which reached a maximum of 0.12 mm across all directions, with an average of 0.071 mm.

But is there a significant difference in prediction between NARX, EKF and CRBM? The nonparametric Friedman test, \(\chi (3)=18.154\), \(p<0.001\), indicated a statistically significant difference. Together with the RMSE values, this leads us to conclude that the CRBM achieves better prediction than NARX and EKF. The same quantitative analysis was performed on the in vivo dataset, where the results also favored the CRBM (a detailed description can be found in the supplementary material text and Supplementary Material Fig. 3).


In this work, we proposed recovering the 3D cardiac motion by means of a variational framework that guarantees the anatomical preservation of the heart. A key point of our solution is its robustness to partial occlusions, achieved by using a generative model (a CRBM).

The results revealed a robust visual approach that reached an average minimum on the order of \(10^{-7}\) while providing stable values for the Jacobian determinant. In terms of prediction, our approach using the CRBM yielded the lowest average RMSE, 0.071 mm, compared with NARX and EKF. This is further supported by a statistical test that revealed a significant difference in estimation between the three predictors. Together with the RMSE, this demonstrates the potential of using a CRBM (deep learning) in RAMIS scenarios.

While we wanted to demonstrate the potential of combining a diffeomorphic variational framework with supervised learning techniques (particularly the CRBM), from a technical point of view, this work reports an initial proof-of-concept study. Future work will include a more extensive evaluation to explore the clinical potential of our approach.


  1. Wilson EB, Bagshahi H, Woodruff VD (2014) Overview of general advantages, limitations, and strategies. In: Robotics in general surgery. Springer, New York, pp 17–22

  2. Pettinari M, Navarra E, Noirhomme P, Gutermann H (2017) The state of robotic cardiac surgery in Europe. Ann Cardiothor Surg 6:1

  3. Yuen SG, Kettler DT, Novotny PM, Plowes RD, Howe RD (2009) Robotic motion compensation for beating heart intracardiac surgery. Int J Robot Res (IJRR) 28(10):1355–1372

  4. Gagne J, Bachta W, Renaud P, Piccin O, Laroche É, Gangloff J (2014) Beating heart surgery: comparison of two active compensation solutions for minimally invasive coronary artery bypass grafting. In: Garbey M, Bass BL, Berceli S, Collet C, Cerveri P (eds) Computational surgery and dual training. Springer, New York, pp 203–210

  5. Lemma M, Mangini A, Redaelli A, Acocella F (2005) Do cardiac stabilizers really stabilize? Experimental quantitative analysis of mechanical stabilization. Interact Cardiovasc Thorac Surg 4(3):222–226

  6. Falk V (2002) Manual control and tracking a human factor analysis relevant for beating heart surgery. Ann Thorac Surg 74(2):624–628

  7. Dzwonczyk R, Carlos L, Sai-Sudhakar C, Sirak JH, Michler RE, Sun B, Kelbick N, Howie MB (2006) Vacuum-assisted apical suction devices induce passive electrical changes consistent with myocardial ischemia during off-pump coronary artery bypass graft surgery. Eur J Cardiothorac Surg 30(6):873–876

  8. Ling Y, Bao L, Yang W, Chen Y, Gao Q (2016) Minimally invasive direct coronary artery bypass grafting with an improved rib spreader and a new-shaped cardiac stabilizer: results of 200 consecutive cases in a single institution. BMC Cardiovasc Disord 16(1):42

  9. Nakamura Y, Kishi K, Kawakami H (2001) Heartbeat synchronization for robotic cardiac surgery. In: IEEE international conference on robotics and automation (ICRA), vol 2. IEEE, pp 2014–2019

  10. Lau WW, Ramey NA, Corso JJ, Thakor NV, Hager GD (2004) Stereo-based endoscopic tracking of cardiac surface deformation. In: International conference on medical image computing and computer-assisted intervention (MICCAI). Springer, pp 494–501

  11. Ortmaier T, Groger M, Boehm DH, Falk V, Hirzinger G (2005) Motion estimation in beating heart surgery. IEEE Trans Biomed Eng 52(10):1729–1740

  12. Richa R, Poignet P, Liu C (2010) Three-dimensional motion tracking for beating heart surgery using a thin-plate spline deformable model. Int J Robot Res 29:218–230

  13. Bogatyrenko E, Pompey P, Hanebeck UD (2011) Efficient physics-based tracking of heart surface motion for beating heart surgery robotic systems. Int J Comput Assist Radiol Surg (IJCARS) 6(3):387–399

  14. Wong W-K, Yang B, Liu C, Poignet P (2013) A quasi-spherical triangle-based approach for efficient 3-d soft-tissue motion tracking. IEEE/ASME Trans Mechatron 18(5):1472–1484

  15. Yang B, Wong W-K, Liu C, Poignet P (2014) 3D soft-tissue tracking using spatial-color joint probability distribution and thin-plate spline model. Pattern Recognit 47(9):2962–2973

  16. Yang B, Liu C, Zheng W, Liu S (2017) Motion prediction via online instantaneous frequency estimation for vision-based beating heart tracking. Inf Fusion 35:58–67

  17. Alsaleh SM, Aviles AI, Sobrevilla P, Casals A, Hahn JK (2016) Adaptive segmentation and mask-specific Sobolev inpainting of specular highlights for endoscopic images. In: IEEE Engineering in Medicine and Biology Society (EMBC)

  18. Fischer B, Modersitzki J (2004) A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra Appl 380:107–124

  19. Aviles AI, Widlak T, Casals A, Ammari H (2016) Towards estimating cardiac motion using low-rank representation and topology preservation for ultrafast ultrasound data. In: IEEE Engineering in Medicine and Biology Society (EMBC)

  20. Sauvée M, Noce A, Poignet P, Triboulet J, Dombre E (2007) Three-dimensional heart motion estimation using endoscopic monocular vision system: from artificial landmarks to texture analysis. Biomed Signal Process Control 2(3):199–207

  21. Lo B, Chung AJ, Stoyanov D, Mylonas G, Yang G-Z (2008) Real-time intra-operative 3D tissue deformation recovery. In: IEEE International symposium on biomedical imaging (ISBI), pp 1387–1390

  22. Dietterich TG (2002) Machine learning for sequential data: a review. In: Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Springer, pp 15–30

  23. Taylor GW, Hinton GE, Roweis ST (2011) Two distributed-state models for generating high-dimensional time series. J Mach Learn Res (JMLR) 12:1025–1068

  24. Zeiler MD, Taylor GW, Troje NF, Hinton GE (2009) Modeling pigeon behavior using a conditional restricted Boltzmann machine. In: ESANN

  25. Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800

  26. Stoyanov D, Scarzanella MV, Pratt P, Yang G-Z (2010) Real-time stereo reconstruction in robotically assisted minimally invasive surgery. In: International conference on medical image computing and computer-assisted intervention (MICCAI). Springer, pp 275–282

Author information

Corresponding author

Correspondence to Angelica I. Aviles-Rivero.

Ethics declarations

Conflict of Interest:

The authors declare that they have no conflict of interest.

Ethical approval:

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent:

This article does not contain patient data.

Electronic supplementary material

Supplementary material 1 (pdf 3191 KB)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Aviles-Rivero, A.I., Alsaleh, S.M. & Casals, A. Sliding to predict: vision-based beating heart motion estimation by modeling temporal interactions. Int J CARS 13, 353–361 (2018).

  • Motion estimation and prediction
  • Robotic surgery
  • Deep learning