Abstract
Purpose
Technical advancements have become part of modern medical solutions, as they promote better surgical alternatives that benefit patients. Particularly in cardiovascular surgery, robotic surgical systems enable surgeons to perform delicate procedures on a beating heart, avoiding the complications of cardiac arrest. This advantage comes at the price of having to deal with a dynamic target, which presents technical challenges for the surgical system. In this work, we propose a solution for cardiac motion estimation.
Methods
Our estimation approach uses a variational framework that guarantees preservation of the complex anatomy of the heart. An advantage of our approach is that it takes into account different disturbances, such as specular reflections and occlusion events. This is achieved by performing a preprocessing step that eliminates the specular highlights and a predicting step, based on a conditional restricted Boltzmann machine, that recovers missing information caused by partial occlusions.
Results
We carried out exhaustive experiments on two datasets, one from a phantom and the other from an in vivo procedure. The results show that our visual approach reaches an average minimum in the order of magnitude of \(10^{-7}\) while preserving the heart’s anatomical structure and providing stable values for the Jacobian determinant ranging from 0.917 to 1.015. We also show that our specular elimination approach reaches an accuracy of 99% compared to a ground truth. In terms of prediction, our approach compared favorably against two well-known predictors, NARX and EKF, giving the lowest average RMSE of 0.071.
Conclusion
Our approach avoids the risks of using mechanical stabilizers and can also be effective for acquiring the motion of organs other than the heart, such as the lung or other deformable objects.
Introduction
Robotic-assisted minimally invasive surgery (RAMIS) has been an attractive alternative to traditional and laparoscopic surgeries in recent years since it offers diverse advantages to both surgeons and patients [1]. In particular, RAMIS has made it possible to perform complex procedures, including off-pump coronary artery bypass grafting (OPCABG) [2]. This procedure avoids the complications associated with cardiopulmonary bypass (CPB) since the heart is not arrested. As a consequence, surgeons have to deal with a dynamic target, which compromises their dexterity and precision.
To compensate for the heart motion, different authors have proposed solutions based on mechanical stabilization (for example, see [3, 4]), in which small devices are positioned over the heart surface to keep the region to be repaired in a steady state. However, works such as the one presented in [5] reported that there is still a significant residual motion (1.5–2.4 mm) after mechanical stabilization. This entails the need for manual compensation by the surgeon, which is not possible since the heart motion exceeds the human tracking bandwidth [6]. Moreover, these mechanical stabilizers can only be positioned on a small region of the heart surface and can cause irreversible heart damage that affects the cardiac mechanics [7, 8].
To overcome those difficulties, the pioneering work of Nakamura [9] reported that motion cancelation is possible by tracking the heart dynamics and continuously synchronizing this motion with the robot. This direction has been followed by different authors. An image-based motion tracking algorithm was proposed in [10] for retrieving the cardiac surface deformation using a stereo endoscopic system. However, the authors of that work did not take into account the effect of occlusions on the performance and stability of the tracking algorithm. Later on, Ortmaier et al. presented in [11] a 2D affine matching algorithm using natural landmarks for estimating the heart motion. These authors dealt with occlusions by integrating a prediction scheme based on Takens’ theorem and combining electrocardiogram and respiration pressure signals.
Richa et al. [12] proposed tracking the heart surface using a thin-plate spline (TPS) deformable model and included an illumination compensation solution. Another approach was presented in [13], in which the heart motion was retrieved using a stochastic physics-based tracking technique and occlusions were tackled using an extended Kalman filter (EKF). Another 3D tracking approach based on a quasi-spherical triangle was introduced in [14], where the authors modeled the heart surface using a triangle-based model with a curving parameter. They handled occlusions by applying an algorithm based on the peak-valley characteristics of motion signals.
In more recent works, the authors in [15] presented a scheme for tracking the heart motion using two recursive processes. The first represents the target region in joint spatial-color space, while the second applies the thin-plate spline model to fit the heart shape around the region of interest. Yang [16] proposed a motion prediction scheme for tracking the heart motion during occlusion events based on the dual Kalman filter, in which a point of interest was modeled as a dual time-varying Fourier series.
Aim of this work
In this work, we propose a new approach to estimate the heart motion in which the main contributions of our solution are:

A diffeomorphic variational framework that is able to deal with the inherent complex deformation of a beating heart while guaranteeing preservation of the anatomy using a topology preserving penalizer. Our framework maintains affine linear transformations by means of the curvature penalizer and incorporates a preprocessing stage for dealing with specular highlights.

A prediction stage, which is a key point of this paper as it is different from existing approaches related to the problem at hand. We propose sliding the cardiac motion data to formulate a standard supervised learning problem, which is handled via a conditional restricted Boltzmann machine (CRBM).
Toward estimating the beating heart motion
In this section, we present our approach which is composed of three main parts illustrated in Fig. 1, but in this work we focus on the second and third parts.
Cardiac motion estimation
Specular highlights hinder the performance of the vision-based solution as they partially occlude the targeted surface, appear as additional features, generate discontinuities in the images or cause loss of texture or color information. In this work, we adapted our specular-free image solution, presented in [17], to the stereo-pair frames case.
Assume a calibrated image sequence \(G=\{g_{s}\}_{s=0}^{S-1}\) composed of S stereo-pair frames, where \(g_{s}=\{f_{\mathrm{r}}^{s},f_{\mathrm{l}}^{s}\}\). Let \(f_{\mathrm{r}}^{s}\rightarrow {\mathbb {R}}^2\) and \(f_{\mathrm{l}}^{s}\rightarrow {\mathbb {R}}^2\) denote the right and left views of frame s in its bounded domain \(\Omega \). To retrieve the heart motion, we start by defining a lattice on each stereo view according to the following definition:
Definition 1
A lattice, \({\mathfrak {L}}\), is a subgroup in a real vector space V of dimension d that has the form \({\mathbb {Z}}v_{1}+\cdots +{\mathbb {Z}}v_{d}\)
Consider \({\mathfrak {L}}_{\mathrm{l}}^{s},{\mathfrak {L}}_{\mathrm{r}}^{s} \subset {\mathbb {R}}^{2}\) as the lattices defined at the left and right views of \(g_{s}\). We recover the 3D heart surface by computing the projections of the corresponding points from \({\mathfrak {L}}_{\mathrm{l}}^{s}\) and \({\mathfrak {L}}_{\mathrm{r}}^{s}\) as illustrated in Fig. 2, which results in the three-dimensional lattice \({\mathfrak {L}}^{s}\subset {\mathbb {R}}^{3}\) with a set of lattice points \({\mathbf {B}}\). In this work, we represent the deformable heart surface by the tensor product of the B-splines \(\xi _{c}\). Assume a given position \(x \in {\mathbb {R}}^{d}\), a d-dimensional lattice point defined as \(z:=y_{1}{\ldots }y_{d}\) and B-splines of degree n. Then the deformation can be represented as:
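A common way to write such a model, offered here as a sketch under assumptions rather than as the authors' exact expression, is the tensor-product form below. The coefficient vector \(\theta _{z}\) attached to each lattice point, the component index c and the unit lattice spacing are illustrative choices that do not come from the text.

\[
{\mathcal {T}}(x) \;=\; \sum _{z \in {\mathbf {B}}} \theta _{z} \prod _{c=1}^{d} \xi _{c}^{n}\left( x_{c} - y_{c}\right)
\]

Each lattice point contributes only within the support of its B-spline, so a change in one coefficient deforms the surface locally.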
After defining the deformation model, the changes on the heart surface’s deformation over time are computed by an energy functional that is composed of three terms: (i) a data term that allows measuring the discrepancy between the current \(f_{\mathrm{r}}\) and \(f_{\mathrm{l}}\), (ii) a regularization term that enforces a plausible transformation and (iii) a topology preservation term which ensures connectivity between the structures created within the lattice.
In particular, we represent the data term with the sum of squared differences, replacing the minimization of the residual error \(\sum _{i}{r_{i}^{2}}\) with \(\sum _{i}\rho ({r_{i}})\), where \(\rho \) is Tukey’s M-estimator, which increases robustness to outliers. The second term is formulated using the curvature method, which has the advantage of penalizing oscillations while leaving affine linear transformations unpenalized [18].
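The effect of swapping \(\sum _{i}{r_{i}^{2}}\) for \(\sum _{i}\rho ({r_{i}})\) can be illustrated with a minimal sketch of the Tukey biweight loss. The tuning constant c = 4.685 is a commonly used default and an assumption here, as are the function names and the toy residuals; none of them come from the text.

```python
def tukey_rho(r, c=4.685):
    """Tukey biweight loss rho(r) for a scalar residual r.

    c is the tuning constant (an assumed default, not a value from the text).
    For small residuals rho(r) behaves like r**2 / 2.
    """
    if abs(r) <= c:
        u = 1.0 - (r / c) ** 2
        return (c ** 2 / 6.0) * (1.0 - u ** 3)
    return c ** 2 / 6.0  # the loss saturates, so outliers stop adding cost


def robust_data_term(residuals, c=4.685):
    """Sum of rho(r_i), used in place of the plain sum of squared residuals."""
    return sum(tukey_rho(r, c) for r in residuals)


# Toy residuals (made up): three inliers, then the same set plus one outlier.
inliers = [0.1, -0.2, 0.05]
with_outlier = inliers + [50.0]

# How much each cost grows when the outlier is added.
ssd_jump = sum(r * r for r in with_outlier) - sum(r * r for r in inliers)
robust_jump = robust_data_term(with_outlier) - robust_data_term(inliers)
```

With the squared loss the single outlier dominates the cost, whereas with the biweight loss its contribution is capped at c²/6.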
Definition 2
A map \(f:X\rightarrow Y\) preserves topology if there exists \(f^{-1}\) and both f and \(f^{-1}\) are smooth.
For the third term, following Definition 2, we use the topology preservation term that we first proposed in [19], extended here to 3D. This penalization term is based on controlling the Jacobian determinant to preserve the anatomical structure of organs. Unlike works where topology preservation is not considered, such as [12, 14, 20, 21], in this work we demonstrate the relevance of preserving the heart’s anatomical structure, especially during complex deformations. Taking these three terms, our energy functional is given by:
where m is the number of pixels in the overlapped domain \(\Omega _{f_{\mathrm{r}},f_{\mathrm{l}}}\) and our term \(\delta _{{\varphi }}\) is defined as:
where \(\varphi \in {\mathbb {R}}^+\) offers a balance in our penalization and \(\tau \in {\mathbb {R}}^+\) is the margin of acceptance for values close to one. While the main purpose of the first term is to guarantee the positivity of the Jacobian determinant, which translates into avoiding the creation of new structures in the defined lattice, the second term penalizes large values, which translates into preventing large expansions and contractions. An illustrative explanation can be found in Supplementary Material Fig. 1. To solve our energy functional described in Eq. 2, we use the Levenberg–Marquardt (LM) method, which combines the advantages of the Gradient Descent and Gauss–Newton methods.
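The role of the Jacobian determinant can be made concrete with a small sketch. It is written in 2D for brevity (the approach above works in 3D) and uses central finite differences on a regular grid. The penalty shown is a simple hypothetical form built around a margin tau and a weight phi; it is not the authors' \(\delta _{{\varphi }}\), and all names and values are assumptions.

```python
def jacobian_determinants(ux, uy, h=1.0):
    """Jacobian determinant of T(x, y) = (x + ux, y + uy) at interior nodes.

    ux, uy are 2D lists (rows index y, columns index x) of displacements
    sampled on a regular grid with spacing h; central differences are used.
    """
    rows, cols = len(ux), len(ux[0])
    dets = []
    for j in range(1, rows - 1):
        row = []
        for i in range(1, cols - 1):
            dux_dx = (ux[j][i + 1] - ux[j][i - 1]) / (2 * h)
            dux_dy = (ux[j + 1][i] - ux[j - 1][i]) / (2 * h)
            duy_dx = (uy[j][i + 1] - uy[j][i - 1]) / (2 * h)
            duy_dy = (uy[j + 1][i] - uy[j - 1][i]) / (2 * h)
            row.append((1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx)
        dets.append(row)
    return dets


def topology_penalty(dets, tau=0.05, phi=1.0):
    """Hypothetical penalty: zero while |det - 1| <= tau, quadratic outside
    that margin, scaled by phi."""
    total = 0.0
    for row in dets:
        for d in row:
            excess = max(abs(d - 1.0) - tau, 0.0)
            total += phi * excess ** 2
    return total


n = 5
zero = [[0.0] * n for _ in range(n)]
# Uniform 10% stretch along x: ux = 0.1 * x, uy = 0.
stretch = [[0.1 * i for i in range(n)] for _ in range(n)]
# Fold: ux = -2 * x reverses orientation, so the determinant is negative.
fold = [[-2.0 * i for i in range(n)] for _ in range(n)]

identity_dets = jacobian_determinants(zero, zero)
stretch_dets = jacobian_determinants(stretch, zero)
fold_dets = jacobian_determinants(fold, zero)
```

The identity map gives a determinant of one everywhere, the stretch gives values slightly above one, and the fold gives negative values, which is the situation a topology preservation term is meant to rule out.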
Cardiac motion prediction
During a RAMIS procedure, a common challenging factor is the presence of partial occlusions which compromises the tracking precision and could lead to algorithm failure. The studies in the literature of cardiac motion estimation cope with this problem using algorithms from classic estimation theory, such as the EKF and the AutoRegressive eXogenous (ARX) model. In this work, we go beyond those solutions and use tools drawn from machine learning as an alternative to solve prediction of sequential data.
As in any supervised learning problem, a set of n training samples in the form of input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n}\) is needed to find the function M that maps \(X\xrightarrow {M} Y\) and works well on unseen inputs x. In a real clinical scenario, however, it is difficult to extract true observed values Y when estimating the cardiac motion. To mitigate the lack of a set Y and define a standard supervised learning approach, we slide [22] the given sequential data \(\{(x_{i})\}_{i=1}^{n}\) in the form \(Y=\{({x}_{i+d})\}_{i=1}^{n-1}\), where d is the time step size known as the lag, which results in input–output pairs \(\{(x_{i},y_{i})\}_{i=1}^{n}\). An example illustrating this process can be found in supplementary material Fig. 2.
Taking the previous restructured data, our goal is to predict the heart motion within the lattice domain, not just to deal with occlusion events but also as feedback information for improving the heart motion estimation.
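The sliding step can be sketched in a few lines. The function name, the optional window of past samples and the toy sequence are assumptions made for illustration; only the idea of pairing each sample with the one d steps later comes from the text.

```python
def slide(sequence, lag=1, window=1):
    """Turn a sequence into (input, target) pairs.

    Each input holds `window` consecutive samples ending at index i, and
    each target is the sample `lag` steps later, sequence[i + lag].
    """
    if lag < 1 or window < 1:
        raise ValueError("lag and window must be positive")
    pairs = []
    for i in range(window - 1, len(sequence) - lag):
        inputs = list(sequence[i - window + 1 : i + 1])
        pairs.append((inputs, sequence[i + lag]))
    return pairs


motion = [0.0, 0.4, 0.9, 0.5, 0.1, 0.3]  # toy 1D displacement samples (made up)
pairs = slide(motion, lag=1, window=1)
# First pair: input [0.0], target 0.4; last pair: input [0.1], target 0.3.
```

With a lag of one, a sequence of n samples yields n - 1 pairs, since the last sample has no successor to serve as its target.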
Definition 3
A restricted Boltzmann machine (RBM) is a twolayer graphical model that learns a probability distribution of a given set of inputs and can be defined as the energy E where the probability distribution of the visible and hidden units is given in terms of E as:
where W refers to the weights matrix, h and v are the hidden and visible units, \(b^{v}\) and \(b^{h}\) are the unit bias, and Z the normalization factor.
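For reference, the standard energy of an RBM with binary units, written with the symbols just defined, is sketched below. This is the textbook form and is given as an assumption; the variant used in this work (for example, with real-valued visible units) may differ in the visible term.

\[
E(v,h) = -\,(b^{v})^{\top } v \;-\; (b^{h})^{\top } h \;-\; v^{\top } W h, \qquad p(v,h) = \frac{1}{Z}\, e^{-E(v,h)}, \qquad Z = \sum _{v,h} e^{-E(v,h)}
\]

Low-energy configurations of visible and hidden units are therefore the most probable ones.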
Although RBMs are powerful models, they are not able to capture temporal dependencies from the model data. To cope with this problem, an extension of RBMs called conditional restricted Boltzmann machines (CRBM) [23] has been recently a focus of attention, and in particular, in dealing with motion capture [23, 24]. For illustration purposes, refer to the top part of Fig. 3.
To improve the cardiac motion estimation within the lattice domain, we exploit the CRBM as a tool to, on the one hand, improve the heart motion estimation and, on the other, predict the motion during occlusion events. Let c be the vector (the conditional) that contains the past information at times \(t-1, t-2, {\ldots }, t-M\) of the lattice (point motion). See the illustration in the bottom part of Fig. 3. The joint probability function, given the hidden and visible layers, the conditional data and M past elements, is expressed in terms of the energy \(E_\mathrm{CRBM}\) as:
For training the CRBM, we used the well-known contrastive divergence algorithm [25]. Details about the architecture, for example the number of units, are given in the experimental results.
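How a trained CRBM turns the conditional c into a prediction can be sketched as follows. The sketch assumes Gaussian visible units with unit variance and binary hidden units, following the general structure described in [23]. The matrices A and B (weights from the past frames to the visible and hidden units), every numeric value and all function names are illustrative assumptions, and training by contrastive divergence is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec_t(M, x):
    """Return M^T x for a matrix M stored as a list of len(x) rows."""
    return [sum(M[k][j] * x[k] for k in range(len(x))) for j in range(len(M[0]))]


def hidden_probs(v, c, W, B, b_h):
    """p(h_j = 1 | v, c): sigmoid of the dynamic bias plus the visible input."""
    dyn = matvec_t(B, c)  # contribution of the past frames
    vis = matvec_t(W, v)  # contribution of the current frame
    return [sigmoid(b_h[j] + dyn[j] + vis[j]) for j in range(len(b_h))]


def visible_mean(h, c, W, A, b_v):
    """Mean of the Gaussian visible units (unit variance) given h and c."""
    dyn = matvec_t(A, c)
    return [b_v[i] + dyn[i] + sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(b_v))]


# Toy model (made-up weights): 1 visible unit, 2 hidden units, M = 2 past frames.
W = [[0.5, -0.5]]              # visible-to-hidden weights, shape 1 x 2
A = [[0.6], [0.3]]             # past-to-visible weights, shape 2 x 1
B = [[0.2, -0.2], [0.1, -0.1]] # past-to-hidden weights, shape 2 x 2
b_v, b_h = [0.0], [0.0, 0.0]

c = [1.0, 2.0]                 # motion at t-1 and t-2
v0 = [0.0]                     # initial guess for the occluded frame
h = hidden_probs(v0, c, W, B, b_h)
prediction = visible_mean(h, c, W, A, b_v)
```

The past frames act only through the dynamic biases, so the same weights W are reused at every time step while the conditional shifts what the model expects next.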
Experimental results
Cardiac data description
We used both phantom and in vivo datasets [26] to evaluate our approach. The phantom dataset is a silicone heart with cardiac motion. It is composed of 3389 stereo-pair images of size \(720\times 288\). We refer to this phantom dataset as Dataset I (see the bottom part of Fig. 4).
The in vivo data come from a robotic-assisted totally endoscopic coronary artery bypass surgery. It is composed of 1573 stereo-pair images of size \(720\times 288\). We refer to this sequence as Dataset II (see the top part of Fig. 4).
Results and discussion
In this section, we evaluate the three parts that compose our approach through a set of numerical results and graphical and visual analyses.
Specular-free approach
The evaluation of our specular-free approach is shown in Fig. 4. To offer a quantitative evaluation of our detection approach, we used a ground truth from each of the sequences. The results showed that the specular highlight regions were detected with \(\sim \) 99% accuracy in all datasets. Aside from this numerical evaluation, we also show detection and inpainting results on frames from each dataset in the left part of Fig. 4. From visual inspection, it is clear that our approach is able to adapt well to diverse color variations. The right part of Fig. 4 shows visualizations of the inpainting results along with plots that represent Sobolev energy minimization and signal-to-noise ratio (SNR) improvement during the inpainting process.
Vision-based cardiac motion estimation
We start the evaluation of our vision-based approach (see Eq. 2) by recovering the heart motion. In Fig. 5, we show the resulting 3D reconstruction of the heart surface using Datasets I and II. For both datasets, the top rows of Fig. 5 show stereo-pair image samples with the region to be repaired pointed out. The middle rows show the accumulated displacement field of the complete image domain. As evidenced by the images, unlike Dataset I, which exhibits a strong homogeneity in the surface, Dataset II presents strong visual texture, which provides more stable features during the tracking of the region of interest. The bottom rows for both datasets illustrate the 3D reconstruction of the region of interest (ROI), which is used as input to the next stage (the prediction stage). We only use information from the ROI since the surgeon’s attention is focused on the zone to be repaired. The plots in the bottom rows show satisfactory visual results of the 3D ROI with both phantom and in vivo data.
For quantitative analysis, we evaluated the global performance of our vision-based approach. The first question that we pose is: how robust is our vision-based cardiac motion estimation approach? To answer this question, we carried out two experiments as follows:

Experiment 1: Without topology preservation by setting \(\delta _{{\varphi }}=0\) in Eq. 2.

Experiment 2: With our topology preservation term by setting \(\varphi =3\cdot 10^{3}\) in Eq. 2.
After running both experiments, we found that the average range [min, max] of the Jacobian determinant for Exp. 1 was \([-2.5471, 3.0012]\) with an average residual error of the order of magnitude \(10^{-2}\), while for Exp. 2, the Jacobian exhibited stable values with an average range of [0.9715, 1.015], yielding an average minimum in the order of magnitude of \(10^{-7}\). The significance of the minimum lies in the fact that a small value of the energy indicates an efficient minimization. Some samples showing the Jacobian determinant over the region of interest are displayed at the bottom part of Fig. 5.
These results, together with a nonparametric Wilcoxon test that revealed a statistically significant difference between both experiments, lead us to conclude that our penalizer helps to obtain a better minimum and speeds up the convergence of the solution (see bottom right side of Fig. 5).
Cardiac motion prediction
In this subsection, we analyze the performance of our approach during partial occlusions. To do this, we first extracted the motion of a point of interest in the (x,y,z) directions from both datasets as shown in Fig. 6. This is the data used in the remainder of this section.
In order to offer a detailed analysis of our prediction scheme, we took two well-known predictors from classic estimation theory: the NARX and the EKF. We use these two predictors to check whether a statistically significant difference exists between those schemes and the one based on CRBM over 200 frames.
We begin by analyzing the NARX predictor; Fig. 7 (top left) shows the resulting prediction for the x, y and z directions. From visual inspection, it is clear that for the x and y directions, the prediction was acceptable. However, in the z direction, the predicted values were far from the target. This is further supported by the root mean square error (RMSE) computed for all directions and plotted in the bottom of Fig. 7. The RMSE shows that NARX was able to predict the x and y directions within a maximum RMSE of 1.1 mm, while z was far from being retrieved accurately since it reached a maximum of 1.7 mm with an average of 0.69 mm.
We also evaluated the performance of the EKF, which is probably the most widely used predictor. The results are reported in Fig. 7 (top middle). A visual inspection shows that the EKF outperformed the NARX predictor in all directions. This is also evidenced by the RMSE reported in the bottom of Fig. 7, which exhibits a concentration of error values lower than 0.2 mm. In particular, the maximum errors for x, y and z are 0.38, 0.43 and 0.27 mm, respectively, and the average RMSE is 0.1153 mm.
Finally, we evaluated the CRBM for predicting the cardiac motion. For the CRBM, we set the learning rate to \(10^{-2}\), the momentum to 0.9 and the number of hidden units to 350. The results from the prediction are shown in Fig. 7 (top right). In a visual comparison, one can see that the estimated values of the CRBM are closer to the target values. This is supported by the RMSE, which had a maximum value of 0.12 mm for all directions with an average of 0.071 mm.
But is there a significant difference in terms of prediction between NARX, EKF and CRBM? Results derived from the nonparametric Friedman test, \(\chi (3)=18.154\), \(p<0.001\), indicated a statistically significant difference. This leads us to conclude that CRBM achieves a better prediction than NARX and EKF. The same quantitative analysis was performed with the in vivo dataset, in which results also favored the CRBM. (A detailed description can be found in the supplementary material text and Fig. 3.)
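The error measure used in this comparison can be sketched as below. The function name and the toy trajectories are assumptions for illustration only and are not data from the experiments.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


# Toy per-direction trajectories in mm (made up), keyed by direction.
target = {"x": [0.0, 1.0, 2.0, 3.0], "z": [0.0, 0.5, 1.0, 0.5]}
predicted = {"x": [0.0, 1.0, 2.0, 5.0], "z": [0.0, 0.5, 1.0, 0.5]}

errors = {axis: rmse(predicted[axis], target[axis]) for axis in target}
```

Computing the error separately for each direction, as in Fig. 7, makes it visible when a predictor fails along one direction only.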
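The statistic behind a Friedman test can be sketched in plain Python. Each row below is one frame (a block) and each column is one predictor. The error values are made up, the function names are assumptions, and no tie correction is applied, so this is an illustration of the idea and not a reproduction of the reported analysis.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic; one row per block, one column per method."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Made-up absolute errors per frame for three hypothetical predictors.
errors = [
    [0.70, 0.12, 0.07],
    [0.65, 0.11, 0.08],
    [0.72, 0.10, 0.06],
    [0.68, 0.13, 0.07],
]
statistic = friedman_statistic(errors)
```

Because the third column is ranked best in every row of this toy table, the statistic takes its largest possible value for four blocks and three methods.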
Conclusions
In this work, we proposed recovering the 3D cardiac motion by the means of a variational framework that guarantees the anatomical preservation of the heart. A key point of our solution is its robustness to partial occlusions by using a generative model (a CRBM).
The results revealed a robust visual approach that reached an average minimum in the order of magnitude of \(10^{-7}\), providing stable values for the Jacobian determinant. In terms of prediction, our approach using CRBM reported the lowest average RMSE of 0.071 mm in comparison with the NARX and EKF. This is further supported by a statistical test that pointed out a significant difference in estimation between the three predictors. This, together with the RMSE, demonstrates the potential of using a CRBM (deep learning) in RAMIS scenarios.
While we wanted to demonstrate the potential of combining a diffeomorphic variational framework with supervised learning techniques (particularly CRBM), from a technical point of view, the aim of this work is to report an initial proof-of-concept study. Future work will include a more extensive evaluation to explore the clinical potential of our approach.
References
1. Wilson EB, Bagshahi H, Woodruff VD (2014) Overview of general advantages, limitations, and strategies. In: Robotics in general surgery. Springer, New York, pp 17–22
2. Pettinari M, Navarra E, Noirhomme P, Gutermann H (2017) The state of robotic cardiac surgery in Europe. Ann Cardiothor Surg 6:1
3. Yuen SG, Kettler DT, Novotny PM, Plowes RD, Howe RD (2009) Robotic motion compensation for beating heart intracardiac surgery. Int J Robot Res (IJRR) 28(10):1355–1372
4. Gagne J, Bachta W, Renaud P, Piccin O, Laroche É, Gangloff J (2014) Beating heart surgery: comparison of two active compensation solutions for minimally invasive coronary artery bypass grafting. In: Garbey M, Bass BL, Berceli S, Collet C, Cerveri P (eds) Computational surgery and dual training. Springer, New York, pp 203–210
5. Lemma M, Mangini A, Redaelli A, Acocella F (2005) Do cardiac stabilizers really stabilize? Experimental quantitative analysis of mechanical stabilization. Interact Cardiovasc Thorac Surg 4(3):222–226
6. Falk V (2002) Manual control and tracking a human factor analysis relevant for beating heart surgery. Ann Thorac Surg 74(2):624–628
7. Dzwonczyk R, Carlos L, SaiSudhakar C, Sirak JH, Michler RE, Sun B, Kelbick N, Howie MB (2006) Vacuum-assisted apical suction devices induce passive electrical changes consistent with myocardial ischemia during off-pump coronary artery bypass graft surgery. Eur J Cardiothorac Surg 30(6):873–876
8. Ling Y, Bao L, Yang W, Chen Y, Gao Q (2016) Minimally invasive direct coronary artery bypass grafting with an improved rib spreader and a new-shaped cardiac stabilizer: results of 200 consecutive cases in a single institution. BMC Cardiovasc Disord 16(1):42
9. Nakamura Y, Kishi K, Kawakami H (2001) Heartbeat synchronization for robotic cardiac surgery. In: IEEE international conference on robotics and automation (ICRA), vol 2. IEEE, pp 2014–2019
10. Lau WW, Ramey NA, Corso JJ, Thakor NV, Hager GD (2004) Stereo-based endoscopic tracking of cardiac surface deformation. In: International conference on medical image computing and computer-assisted intervention (MICCAI). Springer, pp 494–501
11. Ortmaier T, Groger M, Boehm DH, Falk V, Hirzinger G (2005) Motion estimation in beating heart surgery. IEEE Trans Biomed Eng 52(10):1729–1740
12. Richa R, Poignet P, Liu C (2010) Three-dimensional motion tracking for beating heart surgery using a thin-plate spline deformable model. Int J Robot Res 29:218–230
13. Bogatyrenko E, Pompey P, Hanebeck UD (2011) Efficient physics-based tracking of heart surface motion for beating heart surgery robotic systems. Int J Comput Assist Radiol Surg (IJCARS) 6(3):387–399
14. Wong WK, Yang B, Liu C, Poignet P (2013) A quasi-spherical triangle-based approach for efficient 3D soft-tissue motion tracking. IEEE/ASME Trans Mechatron 18(5):1472–1484
15. Yang B, Wong WK, Liu C, Poignet P (2014) 3D soft-tissue tracking using spatial-color joint probability distribution and thin-plate spline model. Pattern Recognit 47(9):2962–2973
16. Yang B, Liu C, Zheng W, Liu S (2017) Motion prediction via online instantaneous frequency estimation for vision-based beating heart tracking. Inf Fusion 35:58–67
17. Alsaleh SM, Aviles AI, Sobrevilla P, Casals A, Hahn JK (2016) Adaptive segmentation and mask-specific Sobolev inpainting of specular highlights for endoscopic images. In: IEEE Engineering in Medicine and Biology Society (EMBC)
18. Fischer B, Modersitzki J (2004) A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra Appl 380:107–124
19. Aviles AI, Widlak T, Casals A, Ammari H (2016) Towards estimating cardiac motion using low-rank representation and topology preservation for ultrafast ultrasound data. In: IEEE Engineering in Medicine and Biology Society (EMBC)
20. Sauvée M, Noce A, Poignet P, Triboulet J, Dombre E (2007) Three-dimensional heart motion estimation using endoscopic monocular vision system: from artificial landmarks to texture analysis. Biomed Signal Process Control 2(3):199–207
21. Lo B, Chung AJ, Stoyanov D, Mylonas G, Yang GZ (2008) Real-time intraoperative 3D tissue deformation recovery. In: IEEE International symposium on biomedical imaging (ISBI), pp 1387–1390
22. Dietterich TG (2002) Machine learning for sequential data: a review. In: Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Springer, pp 15–30
23. Taylor GW, Hinton GE, Roweis ST (2011) Two distributed-state models for generating high-dimensional time series. J Mach Learn Res (JMLR) 12:1025–1068
24. Zeiler MD, Taylor GW, Troje NF, Hinton GE (2009) Modeling pigeon behavior using a conditional restricted Boltzmann machine. In: ESANN
25. Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800
26. Stoyanov D, Scarzanella MV, Pratt P, Yang GZ (2010) Real-time stereo reconstruction in robotically assisted minimally invasive surgery. In: International conference on medical image computing and computer-assisted intervention (MICCAI). Springer, pp 275–282
Ethics declarations
Conflict of Interest:
The authors declare that they have no conflict of interest.
Ethical approval:
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent:
This article does not contain patient data.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Aviles-Rivero, A.I., Alsaleh, S.M. & Casals, A. Sliding to predict: vision-based beating heart motion estimation by modeling temporal interactions. Int J CARS 13, 353–361 (2018). https://doi.org/10.1007/s11548-018-1702-1
Keywords
 Motion estimation and prediction
 Robotic surgery
 Deep learning