Electromagnetic tracking (EMT) is a key technology to enable navigation in minimally invasive surgery without line of sight. As miniaturized sensors can be integrated into catheters, EMT has potential to be employed for guidewire navigation in abdominal aortic aneurysm repair (AAAR) [15, 16]. In current clinical practice, fluoroscopic X-ray imaging is considered the gold standard for guidewire navigation in endovascular aneurysm repair [5]. However, X-ray imaging exposes both the surgeon and the patient to ionizing radiation [8]. The high accuracy [13] and visual feedback of fluoroscopy means complete removal of X-ray in minimally invasive vascular surgery is unrealistic in near future. A more realistic approach is to consider a hybrid navigation framework. In this framework, continuous navigation will be performed by radiation-free EMT, while X-ray snapshots will be acquired on demand for recalibration or dexterous maneuver. This hybrid navigation reduces the amount of X-ray images that need to be captured during the procedure, which in turn will reduce the radiation exposure for both surgeon and patient.

EMT navigation is negatively affected by the presence of metal or electromagnetic interference within the vicinity of the tracking system. The presence of the c-arm X-ray unit within the operating room (OR) is a dominant source of metallic distortion for the EMT measurement. While it is well-known that passive countermeasures (such as removal of the metallic object) might mitigate such error [6], the c-arm fluoroscopy unit is essential for hybrid tracking procedures. Thus, the c-arm cannot be removed from the OR.

Table 1 Types of error in the active EMT compensation pipeline

Rather, an active error compensation is necessary to improve electromagnetic tracking accuracy. Unlike random error that can be eliminated by averaging recorded sensor data over multiple samples, compensating systematic error requires more sophisticated algorithms. Classical techniques such as lookup-tables [17], interpolation [23] or polynomial regression [9], only work under known distortion characteristics. Such offline compensation requires a tedious data acquisition procedure every time the c-arm position is changed. Clearly, these algorithms are impractical for hybrid navigation in the OR. EMT navigation in surgery thus requires online compensation approaches, where the compensating algorithm can be used in any distortion scenario. Training data for online compensation need to be collected only once for several scenarios. For the sake of brevity, all types of errors related to EMT and countermeasures are summarized in Table 1.

Fig. 1
figure 1

a Calibrated positions (dark dots) on Lego phantom. b Measurement setup in c-arm environment

Fig. 2
figure 2

Lego phantom (brown) on base board. Green area marks specified region of the trakSTAR

In this paper, we present an active online error compensation approach for EMT navigation in AAAR. First, we collect positional EMT data in one laboratory and several OR scenarios with different degrees of distortion. We capture EMT sensor positions on a calibrated Lego phantom (Figs. 1a and 2). Metallic distortion is artificially introduced to the magnetic field by positioning a c-arm fluoroscopy unit (Fig. 1b) in varying alignments next to the Lego setup. We then use an artificial neural network (ANN) for approximating a function that maps erroneous positions to compensated positions. The ANN is evaluated in an online setup to compensate distortions that are not available in the training phase. As model predictions can be uncertain in regions where training data are sparse, we assess spatial uncertainty inherent to the ANN. This uncertainty evaluation is performed at different regions of the navigation volume with varying availability of training data. In a final experiment, we simulate a trajectory resembling the guidewire path through the abdominal aorta. We use our knowledge about ANN inherent uncertainty for finding optimal points to acquire X-ray images. For guidewire insertion in the real OR, these X-ray images can be used to rectify uncertain (and hence erroneous) EMT sensor positions. Our simulation provides initial understanding to the trade-off in loss of precision versus reduction in radiation exposure using hybrid tracking.

This is the first work describing an uncertainty-aware online error compensation approach for EMT in endovascular surgery. Our contributions are twofold; first, we describe a neural network to approximate a spatial compensation function based on relative distances. The approximation works online in scenarios with unknown distortions. Second, we assess spatial model-inherent uncertainty of the neural network regression and its effect on positional error. This analysis provides us insight about the linear relation between model uncertainty and tracking accuracy, and the potential radiation-error trade-off for hybrid guidewire navigation in AAAR.

Related work

As a comprehensive description of all the active EMT error compensation techniques is beyond the scope of the paper, we point the reader to Franz et al. [6] and Kindratenko et al. [10] who provide a comprehensive review of this topic. Instead, in this paper, we mainly focus on the compensation techniques similar to ours. Kindratenko et al. [11] propose a two hidden layer neural network that outperforms polynomial fits and lookup-table compensation in an offline compensation setup.

Online compensation approaches use data from additional sensors [20] or sensor arrays [18] to map metallic distortions in the tracking volume. Sadjadi et al. propose a simultaneous localization and mapping (SLAM) approach that reduces positional error by 67%, but requires auxiliary sensors to be rigidly attached to the tracked instrument—which is not applicable to guidewires or catheters in endovascular navigation.

In endovascular surgery, the use of EMT is evaluated in several phantom [4, 22], swine [15, 22] and patient studies [16]. These studies show that there is potential for EMT to be applied in AAAR, with positional errors of up to 5 mm.

Neural networks, such as those we use for error compensation in this paper, are black boxes due to their complexity and nonlinearity. We therefore employ means to make model predictions traceable. Gal et al. [7] propose to use dropout masks for hidden layers at both training and inference time to obtain a Bayesian approximation for prediction uncertainty in classification problems. In this paper, we generalize this approximation to learn about the limits of the presented regression approach for spatial error compensation.


Positional tracking experiments are performed with an Ascension trakSTAR 3D Guidance system (Northern Digital Inc.) under the use of a 1.8 mm sensor. Positional EMT measurement data are collected on a calibrated Lego measurement phantom (repeatability \(20\,\upmu \hbox {m}\)) similar to the one proposed by us earlier [14]. EMT measurements are performed in laboratory and near a Ziehm Vision 3D c-arm fluoroscopy unit. Software for interfacing the trakSTAR system is developed in C++. Compensation models are implemented in Python (Python Software Foundation) using Keras [3] and tensorflow backend [1].

Table 2 Datasets collected in varying distances to c-arm and in a laboratory setup


First, training and evaluation datasets are acquired in one laboratory and multiple c-arm scenarios. We describe the acquisition and preprocessing in “Data acquisition and preprocessing” section. Acquired datasets are then used for training neural networks for EMT error compensation, which we describe in “Error compensating neural networks” section. We use our ANN in four experimental setups. In the first experiment, the ANN is trained on a multitude of datasets for online compensation (“Compensation of unseen distortions” section). Afterward, we perform an offline evaluation to compare the ANNs to a similar compensation approach (“Known distortion compensation” section). We then evaluate spatial model uncertainty for the online model in “Model uncertainty evaluation” section. Finally, in a simulation experiment (“Simulated hybrid AAAR intervention” section), we use model uncertainty to find a threshold for recalibration.

Data acquisition and preprocessing

Data points are captured by sequentially positioning a Lego block with an embedded EMT sensor on ten calibrated positions (see Fig. 1a) of the phantom. Random EMT error is eliminated by taking the median of 500 samples. We collect positional datasets in multiple scenarios with artificial distortion and in a laboratory scenario. Each distorted scenario uses a different c-arm alignment with respect to the Lego phantom. Table 2 shows the datasets collected for the experiments in “Compensation of unseen distortions” and “Known distortion compensation” sections. Positional data are collected in three phantom elevations in steps of 9.6 mm (height of one Lego brick). With each c-arm position, we also measure positions with the phantom rotated by \(180^\circ \) around its azimuth axis. Error values are calculated as \(e = ||x_2 - x_1|| - y\), where e is error, \(x_1\) and \(x_2\) are two different measuring points and y is the respective ground truth distance on the Lego board.

Error compensating neural networks

We mitigate systematic positional error by approximating a compensation function that maps erroneous to compensated points. The compensation function is approximated by a three-layer ANN with 32 units per layer. These parameters as well as the batch size (512) are estimated by grid search. The ANN uses leaky ReLU activations (\(\alpha =0.01\)) in the hidden layers to prevent vanishing gradients.

The compensation function has four input units for x, y, z and the trakSTAR quality indicator, which are all normalized to an interval [0, 1] to improve model stability. This normalization is reverted after the final layer, which contains three units with linear activations for x, y and z coordinates.

Fig. 3
figure 3

Neural network model for point compensation. \(x_1\), \(x_2\) are input points (x, y, z, quality), \(x_{1,c}\) and \(x_{2,c}\) are compensated output points. \(\ell \) is the displacement distance we use for computing the MSE-loss

During training, two ANNs with shared weights are arranged in parallel, as illustrated in Fig. 3. We can therefore train the compensation function on relative displacements, but use the trained function for absolute point predictions. Training on relative displacements between positions ensures that the exact distance of phantom to the field generator center does not need to be measured [14]. Hence, this approach circumvents the need for a second measurement standard to capture absolute positions, which would contribute to overall measurement uncertainty. In the training phase, the displacement error (mean-squared-error) is minimized by Adam optimizer [12] (\(\hbox {learning rate} = 0.01\)):

$$\begin{aligned} {\mathcal {L}} = ||f(x_{2},q_{2},\omega ) - f(x_{1},q_{1},\omega )||_2 - y \end{aligned}$$

where f denotes the learned compensation function approximated by the ANN, y is the ground truth displacement length, \(x_{1}\), \(x_{2}\) are measured EMT points, \(q_{1}\) and \(q_{2}\) are respective quality indicator values and \(\omega \) is the matrix of learned weights.

As mentioned earlier, we add a quality indicator value that is reported by the trakSTAR system along with every measurement, as additional input to the compensation models in expectation of better generalization performance across scenarios. According to the trakSTAR user manual [2], the quality value is computed from an internal error indication \(\epsilon \) and four user-defined quality-parameters:

$$\begin{aligned} Q = S \cdot (\epsilon - (b + m \cdot r)) \end{aligned}$$

where S, b, and m denote user parameters (sensitivity, offset, slope), r is the sensor-transmitter range. We obtain raw quality values by setting the user parameters to \(S=1,~b=0,~m=0\).

For comparison, we implement mixed-term polynomial regression models as proposed by Kügler et al. [14]. Both compensation models are trained on pairs of EMT sensor positions and corresponding ground truth distances on the Lego phantom.

Fig. 4
figure 4

Comparison of compensation performance in scenarios with unseen (left) and known (right) distortions. ANNs with and without quality indicator (Q) are compared to polynomials. Percentages denote error reduction

Compensation of unseen distortions

We train the ANN on data from four c-arm scenarios (see Table 2). For model validation, we use the data obtained in the laboratory setup. The trained model is evaluated on the remaining four datasets (Fig. 4). Likewise, we train and evaluate the polynomial regression model for comparison to the compensation approach proposed by Kügler et al. [14].

Known distortion compensation

In this experiment, we evaluate ANN models in the same c-arm setup in which they are trained (offline compensation). This experiment measures the best-case outcome for learning based compensation. We examine compensation for all scenarios in Table 2 individually, where training and evaluation sets are chosen to be spatially independent. The data are divided into training/validation/testing sets with a split of 45/5/50.

Model uncertainty evaluation

Model-inherent uncertainty for the ANN is estimated by applying \(10\%\) dropout during training and at inference time (Monte Carlo Dropout [7]). We take 3000 samples from the distribution of compensated output positions to obtain a Bayesian approximation of model-inherent uncertainty. Spatial uncertainty is expressed as the standard deviation \(\sigma =\sqrt{\sigma _x^2 + \sigma _y^2}\) for each point (xy) in the planar full-base-board dataset. Distributions of model predictions cannot be assumed to be Gaussian (see section 4.1 in [14]), so that the 68–95–99.7 rule for confidence interval approximation does not apply here.

Fig. 5
figure 5

Model-inherent uncertainty map projected onto the Lego base board. Dark spots mark measurement points used for training the ANN

Fig. 6
figure 6

Schematic abdominal aortic anatomy with guide wire path (left). Virtual EMT sensor trajectory segments (black arrows) in center of specified tracking volume (right). Dots correspond to measurement positions on Lego phantom

To examine spatial uncertainty for a large portion of the specified tracking volume, we collect a dataset in the 2D plane by moving the phantom to different positions on the gray Lego board. This dataset contains 10,598 different displacements, collected in six different alignments of the c-arm to the tracker. We use this dataset for training a neural network analogously to “Compensation of unseen distortions” section, but with modifications to the ANN layout. That is, a neural network with two layers, 64 units per layer and two output neurons (xy) is employed for the following evaluations.

Fig. 7
figure 7

Relationship between ANN uncertainty and compensation error. Red line shows linear regression

Fig. 8
figure 8

Relationship of ANN uncertainty to distance to nearest training point. Red line shows linear regression

Fig. 9
figure 9

Relationship between max. distance to training point and compensation RMSE. Red line shows linear regression

Fig. 10
figure 10

Error development along simulated trajectories with (blue) and without (red) compensation. Filled area depicts uncertainty

In addition to the training set, we collect measurement data for evaluation from all Lego points within the specified area (green area in Fig. 2). Unlike the training sets, this dataset also contains phantom points that were not calibrated. As phantom uncertainty (\(\approx ~20\,\upmu \hbox {m}\) according to [14]) is negligible compared to model-inherent uncertainty we want to examine, this simplification is valid. This measurement allows for an evaluation of spatial epistemic uncertainty over the whole base board (see Fig. 5).

Although the symmetric ANN is trained and validated on pairs of positions and respective ground truth distances, a single trained ANN can be used for absolute point compensation during inference (compare \(x_{1,c}\) and \(x_{2,c}\) in Fig. 3). In this experiment, we let our trained ANN predict compensated positions for the whole baseboard. Absolute positional error is then estimated by calculating the measured distances to adjacent points in a Moore neighborhood (\(r = 3\)) [21] and averaging the error.

Simulated hybrid AAAR intervention

Based on real EMT data, we simulate a sensor moving on a path inside a virtual abdominal aorta. This simulation is inspired by guidewire insertion in AAAR using hybrid navigation. Path shapes are motivated by those of abdominal aorta, as shown in Fig. 6. Since the average abdominal aorta is 20 cm to 25 cm in length, a simulated path of 21.9 cm is chosen.

Start and end points of each path segment are taken from the full-base-board dataset. Comparing the distances between segment start and end points to respective ground truth distances yields positional error. Uncertainty is determined position-wise as described in “Model uncertainty evaluation” section.

In hybrid navigation, guidewire position can be precisely recalibrated by X-ray pose estimation [13] and fiducial registration [15, 22]. Correcting the EMT sensor location by an X-ray image exposes the patient to radiation, so that recalibrations should rarely be employed. Hence, we are facing a trade-off between radiation dose and tracking accuracy. We introduce the concept of recalibration to our simulation by resetting error and accumulated uncertainty at calculated recalibration points.

We evaluate two different strategies for determining when to perform the recalibration in our simulation: (A) We choose recalibration points based on model uncertainty. Recalibration is performed when accumulated model uncertainty exceeds a certain threshold \(\tau \).

(B) We simulate recalibration in defined constant intervals based on traveled distance. We simulate the recalibration process for different adaptive thresholds \(\tau \). The same is done for different uniform distance intervals ranging from 0 cm to 21.9 cm.

Fig. 11
figure 11

Pareto front for radiation vs. error trade-off at seen (left) and unseen (right) regions of specified tracking volume


Error compensation In unseen scenarios (“Compensation of unseen distortions” section), ANNs clearly outperform polynomial regression models (Fig. 4). However, the compensated EMT error does not reach the results achieved by offline compensation. We observe that including the quality indicator in the model input improves generalization abilities of the neural network. Scenario-wise compensation (“Known distortion compensation” section) appears to be a simple task for both polynomial fits and ANNs, as we can compensate between \(35\%\) and \(75\%\) of error in each scenario. Both algorithms can reduce tracking error to sub-millimeter values in this offline setup.

Model uncertainty evaluation We compare error after compensation to model uncertainty, finding that both quantities are correlated (Fig. 7). Hypothesizing that model-inherent uncertainty grows with less training points nearby, we measure distances between points in the evaluation set and the nearest point in the training set. The resulting distances serve as a proxy measure for training point density. Figure 8 shows the relationship between training point density and the resulting model uncertainty. We observe that for distances to next training point greater than 35 mm, error after compensation grows linearly with decreasing point density (Fig. 9).

Simulated hybrid AAAR intervention Along with the sensor traveling along its path, uncertainty accumulates by \(\sigma _{n+1}=\sqrt{\sum _{i=1}^n\sigma _i^2}\). Figure 10 shows how error and uncertainty develop during virtual guidewire insertion with and without X-ray recalibration. The trade-off between required X-ray recalibrations and tracking error for uncertainty-based (blue, adaptive) and distance-based (red, static) triggering of X-ray recalibrations is illustrated in Fig. 11. Choosing a threshold of \(\tau =2\,\hbox {mm}\), as motivated by Fig. 7, yields a good compromise between tracking error and radiation exposure in both seen and unseen scenarios.


Although localization error of 4 mm are believed to be acceptable in endovascular surgery [22], improving EMT accuracy raises the overall trust of the hybrid navigation system. In this work, we have shown that ANNs can improve positional tracking to achieve sub-millimeter accuracy in an online setting. With higher localization accuracy, less X-ray images are needed for navigation.

Spatial uncertainty can be examined for ANN models and it should be utilized as a measure for model validation whenever compensation algorithms are employed. On the one hand, knowledge about model uncertainty can be used to refine phantom design. We have found that our ANN model requires a training point spacing of 35 mm to be sufficiently accurate, which should be considered in future experimental designs. On the other hand, our simulation experiments show that knowledge about model uncertainty can be exploited to minimize radiation exposure in the online hybrid setting. We envision that model-inherent uncertainty assessment will become an essential part of future EMT error compensation approaches.

The presented method can be further improved by conducting more realistic evaluations. Currently, assumptions about radiation reduction are solely made on the basis of simulations. Consequently, the findings presented in this paper should be assessed in realistic phantom or cadaver studies.

In addition, the rotational degrees of freedom (DOF) need to be considered for realistic evaluations. Especially the roll angle of EMT sensors are of great importance in endovascular procedures and will thus be a major subject of our future work. Our current evaluations show that ANN can compensate static error to sub-millimeter values with only a single sensor. However, the effort of data collection in multiple scenarios with all DOF poses a problem that still needs to be solved, for instance by automatized data acquisition.

Furthermore, only one source of metallic distortion (c-arm) is considered in our experiments. In the real OR, other metallic artifacts contribute to measurement error in addition to the c-arm. Considering additional artifacts, such as the patient bed, and the rotational DOF might require more complex models than the proposed neural network.

Conclusion and future work

In this paper, we present a novel active error compensation framework for EMT in endovascular surgery. We introduce neural networks capable of generalizing across distortion scenarios in single-sensor configuration while providing sub-millimeter accuracy. We also quantify the positional uncertainty of the error compensating neural network. When error compensated EMT reaches its limits, we show that knowledge about positional uncertainty helps to get EMT navigation back on track. Our work suggests inherent limits of spatial uncertainty that can only be realized when EMT and the compensation scheme are evaluated in tandem. In future phantom evaluation protocols, we will consider these spatial uncertainty limits.

In the future, we will work on automatized data acquisition protocols in order to extend our approach to more than two DOF. Moving toward more realistic evaluations, we will evaluate our method in a hybrid setup with 3D printed aortic phantoms and additional metallic artifacts.