Introduction

Cardiovascular (CV) diseases are the most common cause of death across Europe, accounting for more than 4 million deaths each year, with cerebrovascular disease accounting for 25.4% of CV-related mortality across all ages and genders [1]. Mechanical thrombectomy (MT) has become an established treatment for patients with acute ischaemic stroke due to large vessel occlusion, increasing the likelihood that a patient will be functionally independent after a stroke [2,3,4,5]. During such a procedure, an operator navigates a guidewire and guide catheter from an insertion point (typically the common femoral artery) through the common iliac artery and descending aorta to the aortic arch. Subsequently, the guide catheter is typically advanced over the guidewire through the aortic arch to the distal aspect of the cervical segment of the internal carotid artery (ICA), with or without the help of a slip catheter. Once positioned there, the guide catheter serves as an anchor for the next step: navigation of a microguidewire and microcatheter through the distal aspect of the cervical segment of the ICA to the site of the thrombus, typically in the M1 segment of the middle cerebral artery (MCA). Once the target site has been reached, a mesh-like device called a stent retriever is pushed through the thrombus within a microcatheter, expanded to engage the thrombus, and then pulled backwards, removing the thrombus from the blood vessel. Alternatively, an aspiration catheter—where a vacuum sucks the thrombus from the artery—is used instead of a stent retriever.

In acute ischaemic stroke, time from symptom onset to treatment is crucial, as the benefits of MT become more marked the sooner a thrombus is removed [6]. As a result, in the UK for example, only 3.1% of stroke admissions benefit from MT despite at least 10% of patients being eligible for treatment [7, 8]. Other challenges for MT relate to occasional complications, including perforation, thrombosis and dissection in the parent artery, and distal embolization of thrombus [9]. Moreover, angiography requires intravascular contrast agent administration, which can occasionally lead to nephrotoxicity [10]. For operators and their teams, the high cumulative dose of x-ray radiation from angiography is a risk factor for cancer and cataracts [11]. Although exposure can be minimized with current radiation protection practice, some measures involve operators wearing heavy protective equipment, a risk factor for orthopaedic complications, so alternative exposure reduction methods are beneficial [12, 13].

It is hoped that robotic surgical systems can mitigate or eliminate some of the challenges that MT presents. For example, robotic systems could be set up in hospitals nationwide and tele-operated remotely from a central location, increasing the speed of access to treatments such as MT beyond what is currently possible [14]. Additionally, robotic systems might eliminate any operator physiological tremors or fatigue and allow MT to be performed in an optimum ergonomic position while potentially increasing procedural precision (for example, procedure time), thereby improving overall performance and reducing complication rates [15]. Furthermore, as operators would not be required to stand next to the patient, their radiation exposure would be reduced, and the need to wear heavy protective equipment would be eliminated.

Robotic systems such as the Magellan\(^{\textrm{TM}}\) system (Auris Health, Redwood City, USA) and the CorPath GRX® (Corindus Vascular Robotics, USA) have been used to help alleviate some of the challenges of MT; however, they have limitations. The controller-operator structure imposes a reasonably high cognitive workload and can still result in human error, meaning the procedure remains limited by an individual operator’s skill set in terms of both MT and robot handling [16]. These robotic systems rely on user interfaces such as buttons and joysticks, requiring skills different from those used in current clinical practice [17]. Additionally, a lack of haptic feedback from robotic systems makes it difficult to perceive the tactile response of catheters and guidewires as they interact with vessel walls [14].

One emerging method of mitigating some of these challenges is by applying artificial intelligence (AI) techniques to robotic systems. AI, and in particular, machine learning (ML), has accelerated in recent years in its applications for data analysis and learning [18], with many areas of healthcare already making use of this technology for disease prediction and diagnosis [19, 20]. By using ML in the autonomous navigation of guidewires and catheters in MT, it is plausible that in endovascular specialities facing a shortage of highly trained operators, tele-operated MT may be performed safely and effectively by a few highly trained operators based in centralized neuroscience centres. Alternatively, less experienced operators—for example, endovascular operators (e.g. interventional radiologists who are not used to performing neurointerventional procedures)—may use AI-assisted robots in hospitals that are not neuroscience centres (the majority of hospitals are not neuroscience centres). Potentially, such developments would lead to greater accessibility of MT globally [21].

Several papers have investigated the potential use of ML in the automation of catheters and guidewires for endovascular interventions, with a recent systematic review finding 80% (8/10) of studies (all published after 2018) implemented some form of reinforcement learning (RL) [21]. RL is a subset of ML in which an agent learns by interacting with the environment, receiving feedback as rewards, and aims to maximize cumulative reward over time by optimizing its actions based on the current state [22]. However, this trial-and-error approach means that RL often requires many interactions with the environment to learn the optimal policy [23]. ‘Demonstrator data’ has been utilized previously in a variety of ways: it has been used to establish control policies for RL to optimize [24], enhance training speed [25], and function as high-priority samples in the reinforcement learning replay memory [26]. While a ‘dense reward’, characterized by frequent feedback, has been shown to provide quicker training times than a ‘sparse reward’, which offers feedback less frequently [25], no previous work has used demonstrator data to determine a reward function specifically for autonomous endovascular navigation tasks. Inverse reinforcement learning (IRL) can do this by using an agent to infer the underlying reward function from the observed behaviour of an expert [27]. By leveraging the knowledge of experts, the number of trial-and-error cycles needed can potentially be reduced, leading to faster and more effective learning of complex tasks.

Fig. 1

MT environment a used in simulations, with anatomy labelled and all possible targets, b with insertion point and example navigation path to target in each common carotid artery

This study aimed to leverage expert demonstrators through IRL to train autonomous guidewire and guide catheter navigation in simulated MT environments (i.e. in silico), by navigating from the common iliac artery to the distal aspect of the cervical segment of the ICA (representing the early stages of navigation which is completed by obtaining this stable guide catheter position). A primary objective was to train and test three models, each with a different reward function, using the soft actor-critic (SAC) algorithm [28]. A secondary objective was to conduct an analysis of one- and two-device tracking. This study contributes to the advancement of autonomous MT navigation and demonstrates a novel application of IRL in the simulated MT vasculature environment, filling a significant research gap in the field [21].

Methods

In the following sections, we present the navigation task, the environment required to simulate MT, and the simulation data collection process of expert demonstrator data. Subsequently, we describe the IRL approach for devising a reward function.

Navigation task

In MT, a guidewire is typically used to navigate a guide catheter to the ICA. An ‘access catheter’—one with a distal shape to allow access into vessel branches—is placed within the guide catheter and taken ahead of the guide catheter tip during navigation. Once the access catheter is within the ICA, the guide catheter is advanced to make a stable platform. At this point, the guidewire and access catheter are retracted, and a microguidewire within a microcatheter is together passed through the stable guide catheter and navigated to the target site. The navigation task presented in this paper simulates this complete navigation of guidewire and access catheter. A target location, randomly sampled from 30 centreline points within the right internal carotid artery (RICA) or left internal carotid artery (LICA), is chosen; the devices are then navigated to it from an insertion point in the common iliac artery. Twenty targets in each branch were selected for training, and the remaining 10 hold-out target locations were used for evaluation. The target location changes for each training or testing navigation attempt. Figure 1a shows the simulation environment, with anatomy labelled, and the possible targets. Figure 1b shows the insertion point (standard across all runs) and an example navigation path to a target within each carotid artery.
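The per-branch train/hold-out split described above can be sketched as follows; the function and variable names are illustrative, not from the study's code.

```python
import random

def split_targets(centreline_points, n_train=20, seed=0):
    """Split one branch's 30 centreline targets into train and hold-out sets."""
    rng = random.Random(seed)
    points = list(centreline_points)
    rng.shuffle(points)
    return points[:n_train], points[n_train:]

# Hypothetical (x', z') centreline coordinates for one carotid branch.
rica_targets = [(float(i), 100.0 + float(i)) for i in range(30)]
train_targets, test_targets = split_targets(rica_targets)
assert len(train_targets) == 20 and len(test_targets) == 10
assert not set(train_targets) & set(test_targets)  # hold-out is disjoint
```

Fixing the shuffle seed makes the split reproducible across training runs while keeping the hold-out targets unseen during training.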

Simulation environment

The in silico environment for the navigation task builds upon prior contributions from [17, 29], both of which use the Simulation Open Framework Architecture (SOFA), along with the modified BeamAdapter plugin, to create a comprehensive simulation-based training and evaluation environment [30, 31]. SOFA allowed replication of vascular structures derived from two computed tomography angiography (CTA) scans. The first scan encompassed the abdominal and thoracic regions, including the femoral arteries, descending aorta, and the aortic arch. The second CTA scan encompassed the aortic arch and extended to the cerebral vessels, including the carotid and vertebral arteries in the neck. These scans were subsequently processed into surface meshes, each consisting of approximately 850 vertices, which were then manually merged to create a cohesive and continuous model using Blender [32].

Integrating catheters (characterized by 160 vertices with Young’s modulus of 47 MPa) and guidewires (comprising 120 vertices with Young’s modulus of 43 MPa) within the in silico environment were facilitated by the BeamAdapter plugin developed for SOFA. This plugin was used to model the Penumbra Neuron MAX 088 guide catheter (Penumbra, California, USA) and the Terumo 0.035 guidewire (Terumo, Tokyo, Japan) to generate the initial topology and mechanical properties needed for the simulation.

The simulation assumed rigid vessel walls with an empty lumen. The simulation’s fidelity to real-world guidewire behaviour was achieved through iterative fine-tuning of parameters such as the friction between the guidewire and vessel wall and the guidewire’s stiffness, as demonstrated by [29] and [33], the latter of which has shown promising translation from in silico to an ex vivo porcine liver. A qualitative survey showed that six experienced (greater than five years clinical experience) interventional neuroradiologists (UK consultant grade; US attending equivalent), two novice (less than three years clinical experience) interventional neuroradiologists, and one interventional radiologist (UK specialist trainee grade; US fellow equivalent) agreed that the vessel anatomy was accurate, agreed the guide catheter performed realistically, and were indifferent to the realism of the guidewire [17].

Input parameters for the simulation, including guidewire rotation and translation speed, were applied at the proximal end of the catheter and guidewire. The output of the simulation was the catheter and guidewire position as a series of coordinate points directly extracted from the finite element model. The rotational speed was constrained to a maximum of 180 \(^{\circ }\, \textrm{s}^{-1}\), while the translational speed was capped at 40 \(\textrm{mm s}^{-1}\). Feedback during the navigation was given as two-dimensional \((x',z')\) tracking coordinates of the device, and no feedback about the vessel geometry was given. Experiments were performed using solely guidewire tracking (single tracking) and combined guidewire and catheter tracking (dual tracking).
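A minimal sketch of the speed constraints above; the function name and signature are illustrative:

```python
def clamp_inputs(rotation_deg_per_s, translation_mm_per_s,
                 max_rot=180.0, max_trans=40.0):
    """Clamp proximal-end control inputs to the simulation's speed limits:
    rotation to +/-180 deg/s and translation to +/-40 mm/s."""
    rot = max(-max_rot, min(max_rot, rotation_deg_per_s))
    trans = max(-max_trans, min(max_trans, translation_mm_per_s))
    return rot, trans

# Out-of-range commands are saturated; in-range commands pass through.
assert clamp_inputs(250.0, -55.0) == (180.0, -40.0)
assert clamp_inputs(90.0, 10.0) == (90.0, 10.0)
```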

Demonstrator data collection

The navigation task was performed in silico on ten random targets in each carotid artery for demonstrator data collection. Navigation of the targets from the same insertion point in Fig. 1b was performed using inputs from a keyboard. The action (device rotation and translation), device tracking data, and target location were collected at each simulation step. Data were split based on whether the target branch was on the left or right side. An example navigation path of guidewire and catheter is shown in Fig. 3a.

IRL algorithm

IRL was used to determine a suitable reward function from the collected demonstrator data. The maximum entropy IRL (MaxEnt IRL) algorithm was employed to learn the reward functions based on expert demonstrations, specifically for navigating through different branches of vasculature [34]. The algorithm’s objective is twofold: it seeks to replicate the observed behaviour of experts while ensuring that the learned reward functions capture a distribution of behaviour with maximum entropy. To achieve this, MaxEnt IRL involves an iterative process of updating reward functions to maximize the entropy of the expected policy while maintaining consistency with observed expert behaviour. The reward functions are initialized randomly, with the same random seed across experiments, and updated using gradient ascent to maximize the entropy of the policy.

For each branch, the reward function is updated using the gradient ascent step in Eq. (1), where \(R_i\) is the reward function for branch i, \(\nabla R_i\) is the gradient vector, and \(\eta \) is the learning rate.

$$\begin{aligned} R_i \leftarrow R_i + \eta \nabla R_i \end{aligned}$$
(1)

The MaxEnt IRL objective is expressed as in Eq. (2), for a policy \(\pi \left( a\mid s \right) \), which is the probability distribution over actions given a particular state.

$$\begin{aligned} \text {Maximize} \quad \mathbb {E}_\pi \left[ -\sum _a \pi \left( a\mid s \right) \log \pi \left( a\mid s \right) \right] \end{aligned}$$
(2)

The Boltzmann policy calculation for action a given state s is defined as Eq. (3).

$$\begin{aligned} \pi \left( a\mid s \right) = \frac{e^{R_i \left( s,a \right) }}{\sum _{a'}e^{R_i \left( s,a' \right) }} \end{aligned}$$
(3)

A feedforward neural network with four fully connected layers trained the model using expert trajectories, optimizing the policy through \(1 \times 10^6\) iterations. A separate model was trained for each set of targets in each branch. At the start of a new navigation task, the correct model based on the target branch was obtained, and at each step, a reward value was returned based on the current observation given to the model.
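For a small discrete action set, the Boltzmann policy of Eq. (3) and the gradient-ascent update of Eq. (1) can be sketched as below. The study used a four-layer neural network to represent the reward; this tabular version is illustrative only.

```python
import numpy as np

def boltzmann_policy(action_rewards):
    """Eq. (3): softmax over per-action rewards R_i(s, a)."""
    z = np.exp(action_rewards - action_rewards.max())  # shift for numerical stability
    return z / z.sum()

def irl_update(reward_values, gradient, learning_rate=1e-3):
    """Eq. (1): one gradient-ascent step on the branch reward function R_i."""
    return reward_values + learning_rate * gradient

rewards = np.array([1.0, 2.0, 3.0])
policy = boltzmann_policy(rewards)
assert np.isclose(policy.sum(), 1.0)   # valid probability distribution
assert policy.argmax() == 2            # highest-reward action is most probable
```

In the full algorithm, the gradient is derived from the mismatch between expert and expected feature counts; here it is simply an input to highlight the update rule's shape.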

Controller architecture, reward functions, and training

In this study, all RL models were trained using a SAC controller, adapted from the architecture developed in previous work by Karstensen et al. [29] to facilitate two-device tracking and accommodate various reward functions. The architecture consists of a Long Short-Term Memory (LSTM) layer, which functions as an observation embedder, enabling the learning of a trajectory-dependent state representation. The subsequent feedforward layers are responsible for learning the control of the guidewire. The LSTM-based observation embedder is updated exclusively using the q1-network. In this architecture, the controller receives an observation as input, and the Gaussian policy network generates parameters, specifically mean (\(\mu \)) and standard deviation (\(\sigma \)), of a normal distribution for the following action. During training, actions are sampled from this normal distribution, whereas for evaluation, \(\mu \) is directly employed as the action, rendering the behaviour deterministic. Training was completed on an NVIDIA DGX A100 node with 8 GPUs (NVIDIA, Santa Clara, California, USA).
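The stochastic-versus-deterministic action selection described above can be sketched as follows; the tanh squashing and LSTM embedder used by SAC are omitted, and all names are illustrative.

```python
import numpy as np

def select_action(mu, sigma, deterministic=False, rng=None):
    """Gaussian policy head: sample from N(mu, sigma) during training,
    return mu directly at evaluation time."""
    if deterministic:
        return np.asarray(mu)
    rng = rng if rng is not None else np.random.default_rng()
    return rng.normal(mu, sigma)

mu = np.array([0.2, -0.1])       # e.g. normalized rotation and translation commands
sigma = np.array([0.05, 0.05])
# Evaluation is deterministic: the mean is used as the action.
assert np.allclose(select_action(mu, sigma, deterministic=True), mu)
```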

The action was defined as the output of the Gaussian policy network, representing the catheter’s and the guidewire’s rotation and translation speeds. Three points at each instrument’s distal end described its position, denoted \((x', z')_{i=1,2,3}\), with \((x',z')_{1}\) coinciding with the instrument tip. Additionally, the target position was specified by the current target’s \((x', z')\)-coordinates. The observation comprised the current and previous guidewire and catheter positions, the target position, and the previous action taken from the last to the current position.

The controller training procedure performed the navigation task for \(1\times 10^{7}\) exploration steps, defined as a single control loop cycle during the exploration phase. Each navigation task was considered an episode and was deemed complete once the target was reached within a 5 mm threshold. To ensure computational efficiency, we introduced a timeout after 400 exploration steps (equivalent to approximately 53 s) without reaching the target. The control frequency was set to 7.5 Hz.

Three types of reward functions were used in this study to determine whether deriving a reward function from demonstrator data provides any benefit for autonomous endovascular navigation. Each reward function is calculated in every simulation step from the environment state based on the agent’s actions. Therefore, actions will lead to different reward quantities across reward functions and, hence, a variation in the final model.

The current state-of-the-art algorithm for autonomous navigation in endovascular interventions was devised by [29]. This algorithm uses a dense reward function, \(R_{1}\), which does not use IRL and is shown in Eq. (4). Here, pathlength is defined as the distance between the guidewire tip and the target along the centrelines of the arteries, with \(\Delta \text {pathlength}\) representing the change in pathlength from the previous step at time \(t-1\) to the current step at time \(t\).

$$\begin{aligned} R_{1} = -0.005 - 0.001\cdot \Delta \text {pathlength} + {\left\{ \begin{array}{ll} 1.0 & \text {if target reached} \\ 0 & \text {else}\end{array}\right. } \end{aligned}$$
(4)
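Eq. (4) translates directly into code; a minimal sketch:

```python
import math

def dense_reward(delta_pathlength_mm, target_reached):
    """Eq. (4): per-step penalty, pathlength shaping, and terminal bonus."""
    return (-0.005
            - 0.001 * delta_pathlength_mm
            + (1.0 if target_reached else 0.0))

# Moving 10 mm away from the target without reaching it is penalized.
assert math.isclose(dense_reward(10.0, False), -0.015)
# Reaching the target dominates the small step penalty.
assert math.isclose(dense_reward(0.0, True), 0.995)
```

The constant \(-0.005\) penalizes every step, encouraging short procedures, while the pathlength term rewards progress toward the target at each step.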

The second reward function \(R_{2}\) was derived using IRL, as seen in Eq. (5), where \(R_\textrm{RICA}\) and \(R_\textrm{LICA}\) are equal to \(R_{i}\) in Eq. (1) calculated for the right and left carotid arteries, respectively.

$$\begin{aligned} R_{2} = {\left\{ \begin{array}{ll} R_\textrm{RICA} &{} \text {if target in RICA}\\ R_\textrm{LICA} &{} \text {if target in LICA} \end{array}\right. } \end{aligned}$$
(5)

The third reward function \(R_{3}\) was obtained through reward shaping using a combination of the dense reward function \(R_{1}\) and the IRL-derived reward function \(R_{2}\), as shown in Eq. (6). \(\alpha \) is a scaling factor used to provide the correct ratio of \(R_{1}\) and \(R_{2}\). A scaling factor of 0.001 was used in this study.

$$\begin{aligned} R_{3} = R_{1} + \alpha R_{2} \end{aligned}$$
(6)
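Eqs. (5) and (6) combine into a single shaped reward. In this sketch the per-branch IRL reward values are assumed to come from the trained per-branch networks, and all names are illustrative:

```python
import math

def shaped_reward(dense_r1, irl_r_rica, irl_r_lica, target_in_rica, alpha=0.001):
    """Eq. (6): R3 = R1 + alpha * R2, with R2 selected per branch (Eq. (5))."""
    r2 = irl_r_rica if target_in_rica else irl_r_lica  # Eq. (5)
    return dense_r1 + alpha * r2                        # Eq. (6)

# With alpha = 0.001, the IRL term gently biases the dense reward.
r3 = shaped_reward(dense_r1=-0.005, irl_r_rica=2.0, irl_r_lica=0.5,
                   target_in_rica=True)
assert math.isclose(r3, -0.005 + 0.001 * 2.0)
```

The scaling factor keeps the IRL term small relative to the terminal bonus of \(R_{1}\), so reaching the target remains the dominant incentive.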

Evaluations were conducted every \(2.5 \times 10^5\) exploration steps for 100 episodes. Ten targets in each branch unused during training were utilized for evaluation. Each evaluation step records the success rate, procedure times, and path ratio. Comparison is performed between the highest success rate of each model and the corresponding path ratio and procedure times at that evaluation step. The success rate is the percentage of evaluation episodes in which the controller successfully reaches the target; the path ratio is a measure of the remaining distance to the target point in unsuccessful episodes, calculated by dividing the remaining distance by the initial distance; and the procedure time is the time taken from the start of navigation to reaching the target location in successful episodes. Exploration steps is the number of training steps taken to reach the point at which the results are provided. Comparative statistical analyses were conducted using two-tailed paired Student’s t-tests, with a predetermined significance threshold set at p = 0.05.
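The three metrics can be computed from a list of evaluation episodes as below; the field names are illustrative, and the path ratio follows the definition given above (remaining distance divided by initial distance, over unsuccessful episodes):

```python
def evaluation_metrics(episodes):
    """Return (success rate %, mean procedure time s, mean path ratio %)."""
    successes = [e for e in episodes if e["success"]]
    failures = [e for e in episodes if not e["success"]]
    success_rate = 100.0 * len(successes) / len(episodes)
    mean_time = (sum(e["time_s"] for e in successes) / len(successes)
                 if successes else float("nan"))
    path_ratio = (100.0 * sum(e["remaining_mm"] / e["initial_mm"]
                              for e in failures) / len(failures)
                  if failures else 0.0)
    return success_rate, mean_time, path_ratio

episodes = [
    {"success": True, "time_s": 22.0, "remaining_mm": 0.0, "initial_mm": 300.0},
    {"success": False, "time_s": 53.0, "remaining_mm": 60.0, "initial_mm": 300.0},
]
rate, mean_time, ratio = evaluation_metrics(episodes)
assert rate == 50.0 and mean_time == 22.0
```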

Table 1 Results of device tracking training
Fig. 2

a Success rate (%), b path ratio (%) during training for single- versus dual-device tracking

Fig. 3

Trajectories of catheter and guidewire tip for a demonstrator data, b single device tracking (with final catheter position highlighted), c dual device tracking (with final catheter position highlighted), and d IRL

Results

Single- versus dual-device tracking

Results of an investigation into the difference in success rate, procedure times, and path ratio during training between tracking solely the guidewire and tracking both the guidewire and catheter are shown in Table 1. Figure 2 shows the success rate and path ratio during training. Single tracking and dual tracking reach similar maximum success rates of 95% and 96%, respectively (\(p= 0.741\)), with single tracking having a lower procedure time of 22.5 s compared to 24.9 s for dual tracking (\(p=0.047\)). Figure 3 compares the guidewire and catheter trajectories from insertion to a target in the LICA of single and dual-device tracking. The final tip position of catheters for these two models is highlighted for comparison. For single tracking, the catheter is utilized sparingly and is not navigated out of the common iliac artery. In contrast, during dual tracking navigation, the catheter is used throughout the navigation task and is taken up the aortic arch to provide stability. Dual tracking navigation, therefore, mimics neurointerventional radiology experts, whereas single tracking navigation does not.

Table 2 Results of reward function training
Fig. 4

a Success rate, b path ratio during training for a dense reward function, IRL-derived reward function, and reward shaping function

IRL and reward shaping

Table 2 shows the final success rate, procedure time, and path ratio for a dense reward function alone, an IRL-derived reward function alone, and a reward function obtained through reward shaping (which combines a dense reward function and an IRL-derived reward function). All experiments are performed using dual tracking. Figure 4 shows the success rate and path ratio during training. Reward shaping provides the highest success rate of 100%, followed by 96% for the dense reward function (\(p=0.045\)). Reward shaping also has the lowest procedure time out of the examined reward functions at 22.6 s, compared to 24.9 s for the dense reward function (\(p=0.0001\)). Figure 3d shows an example navigation path for guidewire and catheter of the IRL-derived reward function at \(0.76 \times 10^6\) exploration steps, where it can be seen that the low success rate of 48% can be attributed to catheterization of the left vertebral artery, rather than the right carotid artery.

Discussion

This study was the first to perform a navigation task that simulated the MT intervention by autonomously navigating both guidewire and catheter from an insertion point in the femoral artery to a target in the carotid artery. Furthermore, our study introduced, for the first time, dual-device tracking for autonomous endovascular navigation, providing information about both catheter and guidewire during RL training. This study was also the first to investigate reward functions in autonomous endovascular navigation by using IRL to identify a specific reward function from demonstrator data for RL training.

Prior to this work, there had been no studies applying RL in any form to any task related to MT [21]. Our study demonstrated proof of concept of in silico autonomous navigation of instruments in MT. The implication is that if further validation studies demonstrate benefit, our approach could plausibly increase patient accessibility to MT while increasing intervention safety and speed.

Single versus dual tracking

Both models showed an increase in path ratio and success rate over training. Results showed that single and dual tracking models reach a similarly high success rate after training, but the single tracking method records quicker procedure times. However, investigation into the dual tracking model shows that the catheter is utilized from an early stage during navigation, with movements similar to those of an experienced neurointerventional radiologist. The catheter has a higher stiffness than the guidewire; hence, in the dual tracking model, the catheter is taken up the aortic arch to provide stability as the guidewire is navigated into the branching point of the common carotid arteries. For this reason, dual tracking was used in all reward function experiments, as it better replicates the actions of an experienced operator with no trade-off in success rate.

It is noteworthy that in this experiment, a single static mesh and a relatively simple navigation task were used. It is plausible that in a more challenging navigation task, for example, where the anatomy is more tortuous, as seen in old and hypertensive patients, the dynamic use of two instruments (dual tracking with guidewire and catheter) would enable a higher success rate.

Reward function

Results showed that reward shaping using a combined learned IRL and a dense reward function gives a higher overall success rate and a lower procedure time than the state of the art [29]. The model trained using reward shaping can use demonstrator data through IRL to navigate towards the target, while the dense reward function provides an incentive for reaching and moving towards the target while promoting this in the lowest number of steps possible.

Training using only an IRL-derived reward function did not reach the same success rates as the other models, although the model was able to navigate towards the target. The low success rate stems from wrong-branch catheterization, which could be caused by the IRL-derived reward function not taking the vessel walls into account and therefore returning a high reward signal for a small Euclidean distance in the coordinate system while the pathlength is still significant. For anatomical clarity, the distal vertebral artery and the ipsilateral internal carotid artery are approximately 10 mm from each other, yet both arteries are in continuity with the aortic arch, allowing the guidewire to move seamlessly to either location. These issues were overcome through reward shaping: by providing a negative reward for moving away from the target via the pathlength term, the success rate can be increased dramatically.

A model with a high path ratio is still valid for all experiments performed within this study. Failures where the path ratio was above 80% usually meant that the correct branch was catheterized; however, the guidewire did not reach the target point. Despite this, the guidewire would be in the correct region of the vessel and be helpful for the operator and subsequent steps in MT. After all, in MT the initial aim simulated by our experiments is to navigate to this approximate area within the ICA with a standard guidewire facilitated by an access catheter. Thereafter, a guide catheter is advanced to this approximate area, and then, a microguidewire and microcatheter are used. The exact location of the initial navigation within the ICA is often less important than simply having a guidewire in the ICA that is stable enough to allow the subsequent endovascular steps.

Limitations

The results of this investigation towards an autonomous MT navigation system fall under a technology readiness level (TRL) of 3 [35], and this proof-of-concept work showed that IRL can be leveraged effectively as part of reward shaping to deduce a suitable reward function for RL training; nevertheless, there are limitations to the methodology employed. In vitro (e.g. phantom) and clinical validation steps would be required to progress the TRLs.

First, demonstrator data were obtained using a simple keypad controller; therefore, to improve the reward function’s suitability, data could be collected in a way that more accurately reflects the actions performed by an experienced neurointerventional radiologist. Furthermore, 20 demonstration trajectories were used for IRL, and a larger sample may provide a more generalized reward function. Additionally, the reward function could be altered to give consideration to safety during the intervention: by providing a negative reward when vessel wall contact forces exceed a threshold, perforation of a vessel wall could be prevented.

Second, while different targets were used for training and testing, the same mesh was used throughout the experiments. Future in silico validation work should apply these algorithms to a range of patient anatomy (a dataset with multiple training and hold-out testing unique patient anatomy) to increase generalizability, improving the effectiveness during future in vitro experiments.

Third, simulations in this study were performed using tracking coordinates of the guidewire and catheter tip. An alternative would be to use an image-based tracking system, which may have more clinical relevance, as the autonomous system could use the fluoroscopic imaging that already takes place during MT interventions. For future in vitro experiments, further work will investigate the effectiveness of the techniques demonstrated in this study on image-based tracking inputs as well as those captured by device tracking.

Fourth, recent clinical trials of robotic diagnostic cerebral angiography recommend that procedural times, success rates, fluoroscopy times, and radiation doses be compared with the traditional manual approach [36]. The measurement of fluoroscopy times and radiation doses is outside the scope of this in silico study. However, success rate, procedural times, and path ratio were recorded at each evaluation step. Future in vitro experiments could measure fluoroscopy times and radiation doses.

Our study successfully addresses the task of simulating the complete navigation of guidewire and guide catheter in MT, which takes place from the common iliac artery to either the LICA or RICA where the guide catheter is in a stable position. However, a complete MT procedure typically involves a subsequent navigation step from the LICA or RICA to the MCA. Such a subsequent navigation to the MCA requires different instrumentation (often microcatheters and microwires) beyond the scope of our investigation. Therefore, while this study provides valuable insights into a specific aspect of the MT procedure involving several steps where there is challenging anatomy, it does not represent a comprehensive examination of the entire MT navigation procedure.

Conclusion

We showed the feasibility of autonomous guidewire navigation throughout the MT vasculature using IRL with demonstrator data for the first time. We established a comprehensive simulation-based training environment and, for the first time, compared models with different reward functions for autonomous endovascular navigation by utilizing SOFA and SAC. Reward shaping emerged superior, with higher success rates and lower procedure times than the state of the art. Single- versus dual-device tracking revealed that both methods achieved high success rates, but dual tracking utilized both devices, effectively mimicking experienced neurointerventional radiologists. We have highlighted several opportunities for future research beyond our proof-of-concept experiments, in particular relating to improving performance. Another opportunity is to investigate the final stage of navigation which is typically from the distal aspect of the cervical ICA to the M1 segment of the MCA using microcatheter and microwire. Additionally, researchers may wish to apply our approach to other non-MT endovascular systems. In summary, this work offers promising avenues for improved procedural accessibility and precision of in silico autonomous endovascular navigation tasks. For MT, the work plausibly lays the foundation for potentially transformative patient care.