1 Introduction

Moving agents, such as mobile robots, are increasingly employed in complex environments. Starting from the initial applications of mobile robots in manufacturing, a variety of robotic systems have been developed and have proven effective in a range of settings, including smart homes (Khawaldeh et al., 2016), airports (Zhou et al., 2016), shopping malls (Kanda et al., 2009), and manufacturing laboratories (Chen et al., 2014).

Nowadays, path planning is fruitfully employed in many fields, such as entertainment, medicine, mining, rescue, education, military and space applications, agriculture, robots for elderly care, automated guided vehicles for transferring goods in factories, unmanned bomb disposal robots and planetary exploration robots (Robert et al., 2008). Beyond robotics, path planning is used for routing on circuit boards, for computing hierarchical routes in wireless mobile communication networks, for planning the paths of digital artists in computer graphics, for reconnaissance robots, and, in computational biology, for studying probable protein folding paths (Raja & Pugazhenthi, 2012).

In recent years, path planning has also been widely used in surgery (Adhami & Coste-Manière, 2003). In current clinical practice, a growing number of minimally invasive procedures rely on the use of needles, such as biopsies, brachytherapy for radioactive seed placement, abscess drainage and drug infusion (Shi et al., 2016). With respect to standard open surgery, the small diameter of the needle gives access to the targeted anatomy while inflicting limited tissue damage, thus reducing the risks for the patient and speeding up recovery.

Over the last two decades, different research groups have focused their efforts on the development of needles able to autonomously steer inside tissue. These needles can perform curvilinear trajectories planned to maximise the distance from sensitive anatomical structures to be avoided and to reach targets otherwise inaccessible via rectilinear insertion paths (Wang et al., 2011). Accurate placement of the needle tip inside tissue is challenging, especially when the target moves and anatomical obstacles must be avoided. Moreover, the complex kinematics of steerable needles (Favaro et al., 2020) makes path planning difficult and calls for automatic or semi-automatic planning solutions. In our opinion, one of the most challenging path planning scenarios in surgery is neurosurgery. The complexity of planning within the brain stems from the crowded anatomical structures and the delicacy of tissues, which cannot be touched without the risk of permanent damage or even death. In addition, owing to the consistency of brain matter, few catheters are capable of tracing curvilinear trajectories, and those that exist have a minimal degree of curvature (Audette et al., 2020). These factors significantly reduce the number of feasible trajectories that guarantee a high level of safety; for this reason, the planning space in the brain is considered a complex environment. Furthermore, in the majority of cases, reaching a target located deep in the brain is very difficult due to the kinematic limitations of the catheter and the various anatomical regions to be avoided. Therefore, during the planning phase, several poses on the surface of the skull have to be evaluated to identify the best entry pose for reaching the target.

In general, moving an agent in a static or dynamic known environment means finding one or more admissible paths from a starting configuration to a target configuration, avoiding obstacles while respecting restrictions on movement, identified as kinematic constraints. The path planning problem belongs to the larger class of “scheduling” and “routing problems”, and it is known to be NP-hard and NP-complete (Obitko, 1998). In general, the performance of path planning algorithms can be evaluated over two main characteristics: “completeness” and “optimality”. An algorithm is said to be complete if it can find a solution in a finite interval of time, provided that a solution exists, or report a failure otherwise. An algorithm is optimal if it is guaranteed to return the best path according to a given cost criterion. Given a path, the path length is the total distance covered by the moving agent from the starting position to the target, the path safety is the distance from the path to the nearest obstacle, and the computation time is the time required to compute a path.

In this work, we propose a framework that couples inductive and deductive techniques in order to improve path planning performance. In particular, an inductive learning model, relying on demonstrations performed by expert operators, is in charge of generating a set of paths as candidate solutions; a deductive reasoning module then selects the “best” starting point, according to explicit knowledge modelled on domain experts' suggestions. Interestingly, this kind of coupling allows us to transfer to the automated path planner some of the knowledge available at the human level: the inductive learning module “catches”, via demonstrations, expert capabilities that are hard to express explicitly (e.g., visual-spatial, bodily-kinesthetic), while the deductive module formally encodes what has been elaborated by the experts through long-lasting practice (e.g., domain knowledge, best practices). Furthermore, basing the deductive technique on a declarative formalism grants several advantages: on the one hand, it makes the knowledge easy to maintain and update; on the other hand, it allows us to provide the final user with a highly customisable tool with real-time visual feedback, as she can decide what is important for choosing the best path, and to what extent, for each case at hand.

Finally, we assess the viability of the proposed approach in a use case, namely 3-D path planning for a neurosurgical steerable needle, showing that it matches or even outperforms state-of-the-art solutions.

The remainder of the paper is structured as follows: Sect. 2 presents an overview of the path planning approaches proposed in the literature. Section 3 outlines our path planning method. Section 4 describes the experimental protocol used. Section 5 presents the comparison between the proposed solution, the expert's manual path definition and a state-of-the-art path planning method for the specific neurosurgical application. Discussion and conclusions are reported in Sect. 6.

Table 1 Related work

2 Related works

Several approaches for path planning have been proposed in the literature: graph-based, sampling-based, optimisation-based, heuristic-based, learning-based and reasoning-based methods. These methods are described below and summarised in Table 1 according to: optimality (i.e. an algorithm is said to be “optimal” if it can determine the best path, according to a given criterion, at the specific resolution of the approximation); completeness (i.e. an algorithm is said to be “complete” if it can determine whether a solution exists in finite time); scalability (i.e. an algorithm is said to be “scalable” if it can plan a path in a reasonable time even as the search space increases in size); computational time (i.e. the execution time needed to obtain a solution); the ability to plan within a complex environment (i.e. an environment composed of many obstacles with elaborate shapes, narrow passages and tangled locations); and the ability to obtain a smooth path (i.e. the ability to minimise the curvature along the path).

The Dijkstra algorithm (Dijkstra, 1959) and A* (Hart et al., 1968) are graph-based methods based on a discrete approximation of the planning problem. Many methods represent the environment as a square grid, as an irregular graph (Kallem et al., 2011), or as a Voronoi diagram (Garrido et al., 2006), over which a search is performed in order to find an optimal path. These algorithms are known to be “resolution-complete”, such as the one proposed in Fu et al. (2021), which also provides guarantees in terms of efficiency, and “resolution-optimal”. This approach may also be used for identifying a restricted area where further optimisation refinements can be performed (Huang et al., 2009). Notably, even though graph-based methods are relatively simple to implement, they require considerable computational time as the environment size increases (Bellman, 1966) or becomes complex. Tangent graph-based planning methods build, for a given environment, a graph by selecting collision-free common tangents between the obstacles. These methods allow more accurate path planning around curved obstacles, without errors caused by polygonal approximation; however, they are not always suitable when the kinematic limitations of a moving agent must be considered, and they require additional optimisation steps (Tovar et al., 2007) to obtain a smooth path.

Sampling-based methods are based on a random sampling of the working space, with the aim of significantly reducing the computational time. The Rapidly-exploring Random Tree (RRT) (LaValle & Kuffner Jr., 2000) is an exploration algorithm for quick search in high-dimensional spaces, more efficient than brute-force exploration of the state space. This class of methods is scalable and capable of planning in a complex environment. Its enhanced versions, RRT* (Jordan & Perez, 2013; Favaro et al., 2018) and bidirectional RRT (Karaman & Frazzoli, 2011), are “probabilistically complete”, since the probability of finding an existing solution tends to one as the number of samples goes to infinity, and “asymptotically optimal”, as they can refine an initial raw path by increasing the sampling density of the volume.

Paths computed with the approaches mentioned above can be further refined using Bezier curves (Hoy et al., 2015), splines (Lau et al., 2009), polynomial basis functions (Qu et al., 2004), or with optimisation-based methods such as evolutionary algorithms, simulated annealing, and particle swarm (Besada-Portas et al., 2010) to obtain a smooth path. These approaches have the advantage of working properly in complex environments, as demonstrated in Favaro et al. (2021); however, they require higher computational time than the sampling-based methods.

Artificial Intelligence (AI) has been increasingly used for path planning tasks (Erdos et al., 2013) over the last decade. Heuristic-based techniques such as greedy (Sniedovich, 2006) and genetic algorithms (Whitley, 1994) are two examples of AI approaches; both belong to the class of optimisation procedures. Greedy algorithms for path planning are often used in combination with other approaches. This kind of algorithm may fail to find the optimal solution, as it takes decisions merely on the basis of the information available at each iteration step, without considering the overall picture. Genetic algorithms are also used to generate solutions for path optimisation problems based on operations like mutation, crossover and selection. For this kind of algorithm, the most relevant limitation is the computational time, as it increases significantly with the search space. Completeness depends on the heuristic function.

Learning-based methods are more flexible than graph-based and sampling-based methods (Xu et al., 2008); indeed, they allow one to directly include all the expected constraints and optimality criteria (obstacle clearance, kinematic constraints, minimum path length) in the optimisation process, without the need for subsequent refinement steps, which are time-consuming and may still not lead to the optimal path (Segato et al., 2020). Mnih et al. (2015) and Lillicrap et al. (2015) showed that Deep Reinforcement Learning (DRL) is suitable for solving path planning problems, and several studies (Mirowski et al., 2016, 2018; Tai et al., 2017) applied DRL to path planning. Panov et al. (2018) applied DRL to a grid path planning problem, with promising results on small environments. Inverse Reinforcement Learning, an Inductive Learning (IL) technique (Michalski, 1983) in which the learner discovers rules by observing examples, has been applied to a wide range of domains, including autonomous driving (Wulfmeier et al., 2016), robotic manipulation (Crooks et al., 2016) and grid-world planning (Nguyen et al., 2017). In general, learning-based methods are neither optimal nor complete; their computational time is low and does not increase as the search space grows. They perform well in complex environments, even dynamic ones, because they do not need prior information about obstacles, as we demonstrated in our previous work (Segato et al., 2021a).

Reasoning-based approaches for path planning have been successfully designed, providing high-level methods like the one in Lifschitz (2002). They are optimal and complete. Portillo et al. (2011) successfully solved the path planning problem, Gómez et al. (2021) and Erdem et al. (2015) encoded multi-agent pathfinding, and Erdem et al. (2012) used a deductive reasoning-based approach to control and plan the actions of multiple housekeeping robots whose aim is to tidy up a house in the shortest possible time while avoiding collisions with each other and with obstacles. Reasoning-based approaches have the capability of explicitly representing domain knowledge; however, a path planning system for complex environments based only on deductive reasoning or similar approaches might be insufficient, as current implementations cannot handle an excessive increase of the search space or generalise to different environments (Erdem et al., 2012).

Fig. 1

Agent and 3D complex environment. a (on the left) The moving agent's kinematic constraints are the needle diameter (\(d_{out}\)) and the maximum curvature (\(k_{max}\)) that it can perform. At the t-th time step it can translate (\({\textbf {p}}_{agent_t}\)) and rotate (\({\textbf {R}}_{agent_t}\)), performing an action \(a_t\) from its configuration, \({\textbf {T}}_{agent_t}\). b (on the right) The 3D complex environment is represented by a brain structure, the obstacle space (\(C_{obst}\)), the free space (\(C_{free}\)) and the agent configuration (\({\textbf {T}}_{agent}\))

3 Materials and methods

In this work, we propose a novel approach for path planning in 3-D complex environments that combines IL and Deductive Reasoning (DR); we refer to it as the ILDR method. In this way, we can exploit all the advantages of the former technique (scalability, low computation time, the capacity to plan in a complex environment while taking into account the kinematic constraints of the robot and providing a smooth path if necessary) and of the latter (completeness, optimality, the capability of explicitly encoding medical knowledge, solid theoretical bases coupled with a fully declarative nature), which allows producing formal specifications that are directly executable without additional algorithmic coding, thus fostering fast prototyping and easing the interaction with domain experts. The novel aspect of this approach is that it not only includes all the fundamental requirements of the path planning task, but also takes into account the expert's knowledge in order to capture the decision-making process that guides the optimal path selection.

To test the viability and effectiveness of our approach, we have chosen keyhole neurosurgery as a case study. In this context, path planning is crucial when the pathological target (e.g., a tumor or the left subthalamic nucleus (LSTN)) is located in deep brain areas and cannot be safely reached along a rectilinear insertion path, so that a flexible probe must be steered towards it. Thus, the optimisation criteria involved in finding the best path in this scenario are: the maximisation of the minimum and average distances from obstacles, so as to avoid delicate structures in the brain, and the minimisation of the length and curvature of the path, so as to limit damage to the brain matter.

3.1 Problem statement

3.1.1 Moving agent

Let us consider an “agent”, shown in Fig. 1a, moving in a 3D complex environment. The agent in this work is the tip of a steerable needle, a novel biomimetic flexible probe (PBN) (Burrows et al., 2013), that can translate and rotate in space. Its kinematic constraints are the “diameter”, \(d_{out}\), and the “maximum curvature”, \(k_{max}\).

3.1.2 Environment

The “3D complex environment” (Env) is shown in Fig. 1b. The “configuration space”, C-space, is the set of all possible “agent configurations”, \({\textbf {T}}_{agent_t} \in C\)-space, indexed by the time step t. Agent configurations are described by poses, denoted as \(4 \times 4\) transformation matrices:

$$\begin{aligned} {\textbf {T}}_{agent_t}=\begin{pmatrix} {\textbf {R}}_{agent_t} & {\textbf {p}}_{agent_t}\\ {\textbf {0}} & 1 \end{pmatrix} \end{aligned}$$

where \({\textbf {p}}_{agent_t} \in {\mathbb {R}}^3 \) is the tip position and \({\textbf {R}}_{agent_t} \in SO(3)\) is the orientation relative to a world coordinate frame.
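For illustration only, the following NumPy sketch (the actual system is implemented in Unity) shows how such a configuration could be assembled from a rotation matrix and a tip position:

```python
import numpy as np

def make_pose(R, p):
    """Assemble a 4x4 homogeneous transform T_agent from R in SO(3) and p in R^3."""
    T = np.eye(4)
    T[:3, :3] = R   # orientation relative to the world coordinate frame
    T[:3, 3] = p    # tip position
    return T

# Example: identity orientation, tip at (10, 20, 30)
T_agent = make_pose(np.eye(3), np.array([10.0, 20.0, 30.0]))
```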

The “obstacle space”, \(C_{obst} \subset \) C-space, is the space occupied by obstacles. The “free space”, \(C_{free} \subset \) C-space, is the set of collision-free configurations (\({\textbf {T}}_{agent}={\textbf {T}}_{free} \in C_{free}\), \({\textbf {T}}_{agent}\ne {\textbf {T}}_{obst} \in C_{obst}\)). Moreover, we can define: the “start configurations” \({\textbf {T}}_{start_{k}}\in C_{free}\), with \(k \in 1..N\); the “target space” \(C_{target} \subset C_{free}\), i.e. the subspace of the free space to which we want the needle to move; and the “target configuration” \({\textbf {T}}_{target} \in C_{target}\).

3.1.3 Actions

At every t-th time step, the agent can take an action \(a_t=(x_t,y_t,z_t,\alpha _t,\beta _t,\gamma _t)\), moving towards \({\textbf {T}}_{target}\), where x, y and z denote translations along the axes and \(\alpha \), \(\beta \), \(\gamma \) rotations about the respective axes. Actions moving the agent towards a \({\textbf {T}}_{obst}\) or outside the Env are considered inadmissible.
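A minimal sketch of how such an action could update the configuration is given below, assuming angles in radians and a post-multiplicative composition (an assumption of this illustration, not a specification of the simulator):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_action(T_agent, a):
    """Apply an action a = (x, y, z, alpha, beta, gamma) to the current configuration
    T_agent: translate along the axes and rotate about them. Composing in the
    needle-tip frame (post-multiplication) is an illustrative assumption."""
    x, y, z, alpha, beta, gamma = a
    dT = np.eye(4)
    dT[:3, :3] = Rotation.from_euler('xyz', [alpha, beta, gamma]).as_matrix()
    dT[:3, 3] = [x, y, z]
    return T_agent @ dT   # candidate T_agent_{t+1}; must still be checked against C_obst
```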

3.1.4 Path planning problem

The path planning problem, depicted in Fig. 2a, can be formulated as follows: given the “start configurations” \({\textbf {T}}_{start_{k}}\), \(k=1:N\), and a “target configuration” \({\textbf {T}}_{target}\), the task is to find a path \(Q = \{{\textbf {T}}_{agent_0},\,{\textbf {T}}_{agent_1},\,\dots ,\,{\textbf {T}}_{agent_{n-1}}\}\), with \({\textbf {T}}_{agent_0} = {\textbf {T}}_{start_{k}}\) and \({\textbf {T}}_{agent_{n-1}}= {\textbf {T}}_{target}\), where consecutive configurations \({\textbf {T}}_{agent_t}\) and \({\textbf {T}}_{agent_{t+1}}\) are connected by straight segments, forming an admissible sequence of “agent configurations”.

3.2 Inductive learning-based and deductive reasoning-based (ILDR) architecture

The approach proposed herein for path planning in 3-D complex environments, referred to as the ILDR method, is summarised in Fig. 2b. The best path is obtained by means of a learning approach that generates a set of candidate paths and an expert classifier that selects the “preferred” one, according to a set of rules provided by the domain expert.

Fig. 2

Path planning problem and method. a Schematic representation of the 3D path planning problem elements: the start configuration (\({\textbf {T}}_{{start_{{k}}}}={\textbf {T}}_{{agent_{0}}}\)) and the target configuration (\({\textbf {T}}_{target}={\textbf {T}}_{{agent_{{n-1}}}}\)). The task is to find the optimal path (\(Q = \{{\textbf {T}}_{{agent_{0}}},\,{\textbf {T}}_{{agent_{1}}},\,..,\,{\textbf {T}}_{{agent_{{n-1}}}}\}\)). At every t-th time step, \({\textbf {T}}_{{agent_{t}}}\) and \({\textbf {T}}_{{agent_{{t+1}}}}\) are connected by straight segments, forming an admissible sequence of “agent configurations” that takes into account the “obstacle space”, \(C_{obst}\). b The proposed path planning algorithm exploits two merged approaches: an inductive learning-based and a deductive reasoning-based approach. The expert indicates the N start configurations (\({\textbf {T}}_{{start_{{k}}}}\), with \(k \in 1..N\)), the target configuration (\({\textbf {T}}_{target}\)) and the rules, and performs a set of demonstrations. While the agent acts and observes in the virtual environment, the expert's demonstrations and the agent's observations feed the inductive approach, which generates a set of optimal trajectories \(\{Q^{IL}_{k}\}\), with k ranging over the N start configurations. The rules and the kinematic constraints feed the deductive approach, which extracts the best path \(Q^{ILDR}\).

Fig. 3

ILDR architecture. The expert gives in input the dataset (\(C_{free}\) and \(C_{obst}\)), the start and target configurations (\({{\textbf {T}}}_{start}\) and \({{\textbf {T}}}_{target}\)) and the kinematic constraints (\(d_{out}\) and \(k_{max}\)). The IL model is trained through a loop that starts with paths generated by an expert (\(\{Q^{Manual}\}\)) and paths (\(\{Q^G\}\)) performed by the network generator (G) based on a Reinforcement Learning (RL) approach. With a Generative Adversarial Imitation Learning (GAIL) approach, a discriminator (D) with its network (\(D_w\)) takes in input the expert and generator policies (\(\pi _{E}\) and \(\pi _{\theta }\)) and the starting values of the policy's and discriminator's parameters (\(\theta _0\) and \(w_0\)). During the training phase, it compares the two kinds of paths, generating an intrinsic reward (\(r^{in}({{\textbf {T}}}_{agent},a)\)) based on the similarity score (\(r_{sim}\)) and updating the agent's policy \(\pi \). The loop continues until the generator, moving in the environment (Env) with actions (\(a_i\)), collecting observations (\(d_{tot}\), \(d_{min}\), \(d_{avg}\), \(c_{max}\) and \(\alpha _{tr}\)) and computing an extrinsic reward (\(r^{ex}({{\textbf {T}}}_{agent},a)\)), can produce a path similar to the expert's demonstrations that respects the kinematic constraints. Once the IL model is trained, it generates a set of optimal paths; the weights (\(w_{d_{tot}}\), \(w_{d_{min}}\) and \(w_{c_{max}}\)) and the kinematic constraints are taken as input by the DR classifier, which extracts the best path \(Q^{ILDR}\) with an approach based on Answer Set Programming (ASP), applying hard and soft constraints

3.2.1 Inductive learning model

As shown in Fig. 3, the inductive component iteratively trains the moving agent to generate optimal paths, \(\{Q^{IL}_k\}\), thanks to the expert's manual demonstrations, \(\{Q^{Manual}\}\).

3.2.2 Generator

Reinforcement Learning (RL), which searches for optimal policies by interacting with the environment, has seen several successful applications in robot control and is therefore a natural candidate for learning how to plan a path. Segato et al. (2020), building on DRL, demonstrated its capability of directly learning the complicated policy needed to plan a path for a steerable needle.

In the proposed approach, the agent interacts with the environment, Env. At each time step (t), the agent updates its current state (\({\textbf {T}}_{agent}\)) and selects an action (\(a_t\)), according to the policy (\(\pi \)), such that:

$$\begin{aligned} \pi ({\textbf {T}}_{agent}, Env) = a_t \end{aligned}$$
(1)

In response, the agent receives the next state (\({\textbf {T}}_{agent_{t+1}}\)) and observations (explained in the next paragraph). The goal of the agent is to determine an optimal policy \(\pi \) allowing it to take actions inside the environment, maximising, at each t, the cumulative extrinsic reward \(r^{ex}({\textbf {T}}_{agent},a)\). The latter is associated with the real, full state of the system and with the agent's observations. In this way, the moving agent learns to follow the best path while accomplishing its task and respecting its constraints of maximum curvature, \(k_{max}\), and diameter, \(d_{out}\).
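The interaction loop can be sketched as follows; policy, env.reset and env.step are placeholders standing in for the Unity ML-Agents machinery, not its actual API:

```python
def rollout(env, policy, max_steps):
    """Illustrative agent-environment loop; `env` and `policy` are placeholders."""
    T_agent = env.reset()                         # initial configuration (a T_start_k)
    path, cumulative_reward = [T_agent], 0.0
    for t in range(max_steps):
        a_t = policy(T_agent, env)                # Eq. (1): pi(T_agent, Env) = a_t
        T_agent, obs, r_ex, done = env.step(a_t)  # next state, observations, extrinsic reward
        path.append(T_agent)
        cumulative_reward += r_ex
        if done:                                  # target reached or collision
            break
    return path, cumulative_reward
```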

Observations The observations collected step-by-step by the agent while exploring the environment are: the length (\(d_{tot}\)) of the path, the minimum (\(d_{min}\)) and average (\(d_{avg}\)) distances from the obstacle space \(C_{obst}\), the maximum curvature of the path (\(c_{max}\)), the target angle (\(\alpha _{target}\)) and the computational time (T). A code sketch of how these quantities can be computed is given after the list below.

For each path \(Q^G = \{{\textbf {T}}_{agent_0},\,{\textbf {T}}_{agent_1},\,..,\,{\textbf {T}}_{agent_{n-1}}\}\) where \({\textbf {T}}_{agent_0}={\textbf {T}}_{start_k} \, k= 1:N\), the observations are:

  • Path length (\(d_{tot}\)): the distance, d, between any two positions can be calculated based on the Euclidean distance:

    $$\begin{aligned} d(\overline{{\textbf {p}}_{agent_{t}} {\textbf {p}}_{agent_{t+1}}})=\left\| {\textbf {p}}_{agent_{t}} - {\textbf {p}}_{agent_{t+1}} \right\| \end{aligned}$$
    (2)

    with \(t \in 0,\dots ,n-2\), \(n=|Q^G|\) and

    $$\begin{aligned} d_{tot} = \sum _{t={0}}^{n-2}d(\overline{{\textbf {p}}_{agent_{t}} {\textbf {p}}_{agent_{t+1}}}) \end{aligned}$$
    (3)
  • Minimum distance (\(d_{min}\)): given the line segment \((\overline{{\textbf {p}}_{agent_{t}} {\textbf {p}}_{agent_{t+1}}})\) and the m obstacles represented by the occupied configurations \({\textbf {T}}_{obst_j}=\begin{pmatrix} {\textbf {R}}_{obst_j} & {\textbf {p}}_{obst_j}\\ {\textbf {0}} & 1 \end{pmatrix}\) with \(j~\in 1,...,m\), \(m=|C_{obst}|\), the minimum distance of the path from the nearest obstacle, indicating the level of safety, is the minimum distance between the line segments and the closest obstacle, such that:

    $$\begin{aligned} d_{min}= \mathrm {min}\{d(\overline{{\textbf {p}}_{agent_{t}} {\textbf {p}}_{agent_{t+1}}},{\textbf {p}}_{obst_j})\}~\forall _t,~\forall _j \end{aligned}$$
    (4)
  • Average distance (\(d_{avg}\)) of the path from the m obstacles: it is calculated over the whole length of the path with respect to all the obstacles, such that:

    $$\begin{aligned} d_{avg}=\frac{1}{(n-1) \cdot m}\sum _{t=0}^{n-2}\sum _{j=1}^m d(\overline{{\textbf {p}}_{agent_{t}} {\textbf {p}}_{agent_{t+1}}},{\textbf {p}}_{obst_j}) \end{aligned}$$
    (5)
  • Maximum curvature (\(c_{max}\)) of the path: curvature c of a path in a 3-D space is the inverse of the radius \(r_t\) of a sphere passing through 4 positions of the path (\({\textbf {p}}_{agent_{t}}, {\textbf {p}}_{agent_{t+1}}, {\textbf {p}}_{agent_{t+2}}, {\textbf {p}}_{agent_{t+3}}\)) computed for each t (the method is explained comprehensively in “Appendix A.1”). Subsequently, the maximum curvature can be extracted as follows:

    $$\begin{aligned}&c_{t} =\frac{1}{r_t} \end{aligned}$$
    (6)
    $$\begin{aligned}&c_{max} ={\max \limits _{0 \le t <n-3}c_t} \end{aligned}$$
    (7)
  • Target angle (\(\alpha _{target}\)): given the 3-D unit vector representing the agent direction \(\overrightarrow{{\textbf {p}}_{agent_{t}}{\textbf {p}}_{agent_{t+1}}}\) and the one representing the target direction \(\overrightarrow{{\textbf {p}}_{agent_{t}}{\textbf {p}}_{target}}\), the target angle is defined as follows:

    $$\begin{aligned} \alpha _{target} = arccos(\overrightarrow{{\textbf {p}}_{agent_{t}}{\textbf {p}}_{agent_{t+1}}} \cdot \overrightarrow{{\textbf {p}}_{agent_{t}}{\textbf {p}}_{target}}) \end{aligned}$$
    (8)
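As referenced above, a compact NumPy sketch of how these observations can be computed for a discretised path follows; the obstacle space is simplified here to a point cloud, and the curvature term of Eqs. (6)–(7), which relies on the four-point sphere fit of Appendix A.1, is omitted:

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Distance from obstacle point p to the segment joining agent positions a and b."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / max(np.dot(ab, ab), 1e-12), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def path_observations(P, obstacles, p_target):
    """P: (n, 3) agent positions; obstacles: (m, 3) obstacle points; p_target: (3,)."""
    segments = list(zip(P[:-1], P[1:]))
    d_tot = sum(float(np.linalg.norm(b - a)) for a, b in segments)         # Eq. (3)
    dists = [point_segment_dist(p, a, b) for a, b in segments for p in obstacles]
    d_min = min(dists)                                                     # Eq. (4)
    d_avg = float(np.mean(dists))                                          # Eq. (5)
    # Target angle at the current step (Eq. 8), from unit direction vectors
    u = (P[-1] - P[-2]) / np.linalg.norm(P[-1] - P[-2])
    v = (p_target - P[-2]) / np.linalg.norm(p_target - P[-2])
    alpha_target = float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))
    return d_tot, d_min, d_avg, alpha_target
```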

Reward function (Algorithm 1) The reward function associated with each time step t is shaped so as to make the agent learn to optimise the path according to three main requirements:

  1. path length minimisation;

  2. obstacle clearance maximisation;

  3. respect of the moving agent's kinematic constraints.


The reward \(r^{ex}({\textbf {T}}_{agent},a)\) is defined as:

$$\begin{aligned} r^{ex}({\textbf {T}}_{agent_t},a_t) = {\left\{ \begin{array}{ll} r_{goal_t} \;\; \text{ if } {\textbf {T}}_{agent_t}={\textbf {T}}_{target} \\ r_{obst_t} \;\; \text{ if } {\textbf {T}}_{agent_t}={\textbf {T}}_{obst} \\ r_{step_t}+ r_{dist_t}+r_{curv_t}+ r_{target_t} \;\; \text{ otherwise }\end{array}\right. } \end{aligned}$$
(9)
  • A positive constant reward, \(r_{goal_t}\), is given upon reaching the target.

  • A negative constant reward, \(r_{obst_t}\), is given if an obstacle collision is detected.

  • A negative reward, \(r_{step_t}= -\frac{1}{S_{max}}\), is given at each step t of the agent in order to obtain a reduction in the computational time (T), where \(S_{max}\) corresponds to a fixed threshold representing the maximum number of steps t.

  • A negative constant reward, \(r_{dist_t}\), is added whenever the minimum value of \(d_{min_t}\) is lower than a predefined safe distance (\(d_{safe}~=~\frac{d_{out}}{2}\)), corresponding to the occupancy of the moving agent. This reward aims to maximise the \(d_{min_t}\) to reduce the risk of collision and increase the safety rate of the path.

  • A negative constant reward, \(r_{curv_t} \), is assigned if the current path curvature \(c_t\) exceeds the maximum curvature of the moving agent specified by the user (\(k_{max}\)).

  • Finally, a negative reward, \(r_{target_t}=-\frac{\alpha _{target_t}}{\alpha _{max}} r_{deg}\), is added to minimise \(\alpha _{target_t}\) in order to further minimise c and T parameters. The value of this reward is proportional to the ratio between \(\alpha _{target_t}\) and the maximum angle \(\alpha _{max}\).

The parameters of the reward function are reported in Table 2. The optimal parameter values were obtained by fine-tuning with a grid-search procedure. Sub-optimal parameter configurations caused the agent to learn and apply inappropriate actions, e.g. moving too close to obstacles, moving in the opposite direction with respect to the target, or choosing non-optimal paths in terms of distance.
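A Python sketch mirroring the structure of Eq. (9) is given below; all numeric defaults are illustrative placeholders, not the tuned values reported in Table 2:

```python
from dataclasses import dataclass

@dataclass
class StepInfo:
    reached_target: bool
    collided: bool
    d_min: float         # minimum obstacle distance at this step
    c: float             # current path curvature
    alpha_target: float  # angle between agent direction and target direction

def extrinsic_reward(info, r_goal=1.0, r_obst=-1.0, r_dist=-0.1, r_curv=-0.1,
                     r_deg=0.01, s_max=5000, d_out=2.5, k_max=0.014,
                     alpha_max=3.1416):
    """Shaped reward following the structure of Eq. (9); constants are placeholders."""
    if info.reached_target:
        return r_goal                              # positive terminal reward
    if info.collided:
        return r_obst                              # negative terminal reward
    r = -1.0 / s_max                               # r_step: discourage long episodes
    if info.d_min < d_out / 2.0:                   # below the safe distance d_safe
        r += r_dist
    if info.c > k_max:                             # needle curvature limit exceeded
        r += r_curv
    r += -(info.alpha_target / alpha_max) * r_deg  # r_target, proportional to the angle
    return r
```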

Table 2 Training parameters

In this way, with the generator, G, we are able to obtain new paths, \({Q^G}\).

3.2.3 Discriminator

In GAIL (Ho & Ermon, 2016), the task of the discriminator, D, is to distinguish between the paths, \(\{Q^{G}\}\), generated by G with an RL approach, and the demonstrated paths, \(\{Q^{Manual}\}\). When D cannot distinguish \(\{Q^{G}\}\) from \(\{Q^{Manual}\}\), G has successfully matched the demonstrated paths, reaching a level equal to or higher than that of the expert thanks to the additional observations collected and the extrinsic reward received.

As shown in Algorithm 2, the proposed path planning approach receives in input the expert's paths, \(\{Q^{Manual}\}\), the paths generated by the Generator, \(\{Q^{G}\}\), and the initialisation of the policy's parameters, \(\theta _0\), and of the discriminator's, \(w_0\). The paths \(\{Q^{G}\}\) fit a parametric policy \(\pi _\theta \) (where \(\theta \) denotes the network weights), while the manual demonstrations, \(\{Q^{Manual}\}\), fit an expert policy, \(\pi _E\). The discriminator network \(D_w\) (with weights w) learns to distinguish the generated policy, \(\pi _\theta \), from the expert one, \(\pi _E\). The parameters w of \(D_w\) are updated in order to maximise Eq. 10:

$$\begin{aligned} \max _{\nabla _w} E_{\pi _{{\theta }}}\left[ \nabla _w \log \left( D_w\left( {\textbf {T}}_{agent},a \right) \right) \right] + E_{\pi _E}\left[ \nabla _w \log \left( 1-D_w\left( {\textbf {T}}_{agent},a \right) \right) \right] \end{aligned}$$
(10)

where \(\nabla _w\) is the gradient, \(E[\cdot ]\) is the expected value with respect to the policy \(\pi _{\theta }\) or to the expert policy \(\pi _E\), and \(D_{w}\left( {\textbf {T}}_{agent},a \right) \) is the discriminator network that evaluates the state (\({\textbf {T}}_{agent}\)) and the action (a).

TRPO (Schulman et al., 2015) ensures that the change of the parameters \(\theta \) between \(\pi _{\theta _{i+1}}\) and \(\pi _{\theta _i}\) is limited:

$$\begin{aligned} \theta = \min _{\nabla _\theta } E_{\pi _{{\theta }}}[\nabla _\theta \log \pi _{\theta } (a|{\textbf {T}}_{agent}) \cdot Q({\textbf {T}}_{agent},a) ] - \lambda H ( \pi _{\theta } ) \end{aligned}$$
(11)

where \(\nabla _\theta \) is the gradient and \(H\left( \pi _{\theta } \right) = E_{\pi _{\theta }}\left[ -\log \pi \left( a|{\textbf {T}}_{agent} \right) \right] \) is the causal entropy, weighted by \(\lambda \in {\mathbb {R}}\), \(\lambda >0\).

Finally, the discriminator, D, updates the agent's policy \(\pi \) to be close to the expert policy \(\pi _{E}\):

$$\begin{aligned} \pi =\arg \max _{\pi _{\theta }}E_{\pi _{{\theta }_i}}[-\log (1-D_{w}({\textbf {T}}_{agent},a))] \end{aligned}$$
(12)

using an intrinsic reward, defined as \(r^{in}({\textbf {T}}_{agent},a)=r_{sim}\). The similarity reward \(r_{sim}\), proportional to \(-\log (1-D_{w}({\textbf {T}}_{agent},a))\), increases as the output of \(D_w\) approaches \(r_{sim}=1\), i.e. when \(D_w\) is not able to discriminate well between the two paths. The opposite happens when the output of \(D_w\) approaches \(r_{sim}=0\), i.e. when it can discriminate well between the two paths, and the reward goes to 0.
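The discriminator update and the intrinsic reward can be sketched in PyTorch as follows; this is a minimal illustration following the sign conventions of Eqs. (10) and (12), whereas the actual implementation relies on the Unity ML-Agents toolkit and its own network architectures:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """D_w(T_agent, a): maps a state-action pair to a score in (0, 1)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def discriminator_step(D, optimizer, gen_states, gen_actions,
                       exp_states, exp_actions, eps=1e-8):
    """One gradient step on w following Eq. (10): raise log D_w on generator
    samples and log(1 - D_w) on expert samples."""
    loss = -(torch.log(D(gen_states, gen_actions) + eps).mean()
             + torch.log(1.0 - D(exp_states, exp_actions) + eps).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def intrinsic_reward(D, state, action, eps=1e-8):
    """r_sim = -log(1 - D_w(T_agent, a)), as used in Eq. (12)."""
    with torch.no_grad():
        return -torch.log(1.0 - D(state, action) + eps)
```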

The trained IL model generates the paths \(\{Q^{IL}_k\} \, k=1:N\).

3.2.4 Deductive reasoning classifier

As introduced above, the optimal path is selected, among the N paths given by IL, by the deductive module, by means of an ASP-based DR Classifier (Gelfond & Lifschitz, 1991; Brewka et al., 2011; Leone & Ricca, 2015). Among modern declarative formalisms, ASP, originally developed in the field of logic programming and non-monotonic reasoning, has become widely used in AI and is recognised as an effective tool for Knowledge Representation and Reasoning (KRR). This is especially due to its high expressiveness and its ability to deal with incomplete knowledge, and also to the availability of robust and efficient solvers; more details on ASP are given in “Appendix A.2”.

A domain expert provided the knowledge, which we encoded into a logic program, consisting of rules and constraints, readable by the ASP solver. For each instance of the path planning problem, the ASP program is coupled with a proper representation of the pool of paths computed by the IL model, \(\{Q^{IL}_k\}\) (the “input”), and fed to the ASP solver, which chooses the best one, \(Q^{ILDR}\). In the following, we provide more details about the encoded deductive path selector.

A path \(Q^{ILDR}_x\) cannot be selected if the maximum curvature \(c_{max}\) (Eq. 7) measured for \(Q^{ILDR}_x\) is greater than \(k_{max}\), as can be seen in the following code snippet (to fully understand the syntax, some examples are provided in “Appendix A.2”):

$$\begin{aligned} \begin{aligned} \texttt {:- choose}(Q^{ILDR}_x), \texttt {curvature}(k_{max}), \\ \texttt {maxCurve}(Q^{ILDR}_x,c_{max}), k_{max}<c_{max}. \end{aligned} \end{aligned}$$

where the atom maxCurve(\(Q^{ILDR}_x\),\(c_{max}\)) couples the path \(Q^{ILDR}_x\) with the maximum curvature \(c_{max}\).

A path \(Q^{ILDR}_x\) must also be discarded if it approaches the sensitive structures at a minimum distance \(d_{min}\) (Eq. 4) smaller than the radius (r \(= \frac{d_{out}}{2}\)) of the moving agent:

$$\begin{aligned} \begin{aligned} \texttt {:- choose}(Q^{ILDR}_x), \texttt {radius}(r),\\ \texttt {distObst}(Q^{ILDR}_x,d_{min}), d_{min}<r. \end{aligned} \end{aligned}$$

Along with these hard constraints, which cause inappropriate paths to be discarded, we identified several criteria for expressing preferences among the admissible ones. In particular, starting from the expert's knowledge, we designed some “soft constraints” and three corresponding weights (\(w_{d_{tot}}\), \(w_{d_{min}}\) and \(w_{c_{max}}\), with \(w \in {\mathbb {R}}, w > 0\)) used to express preferences towards paths that minimise (or maximise) the corresponding quantities.

  1. Minimization of the path length (\(d_{tot}\)):

    $$\begin{aligned} \begin{aligned} \#\texttt {minimize}\{d_{tot}@w_{d_{tot}}, Q^{ILDR}_x: \texttt {choose}(Q^{ILDR}_x), \\ \texttt {length}(Q^{ILDR}_x,d_{tot})\}. \end{aligned} \end{aligned}$$

    where \(d_{tot}\) is the length of path \(Q^{ILDR}_x\).

  2. Maximization of the distance from obstacles:

    $$\begin{aligned} \begin{aligned} \#\texttt {maximize}\{ d_{min}@w_{d_{min}}, Q^{ILDR}_x: \texttt {choose}(Q^{ILDR}_x), \\ \texttt {distObst}(Q^{ILDR}_x,d_{min}) \}. \end{aligned} \end{aligned}$$

  3. Minimization of the curvature of the moving agent:

    $$\begin{aligned} \begin{aligned} \#\texttt {minimize}\{ c_{max}@w_{c_{max}}, Q^{ILDR}_x: \texttt {choose}(Q^{ILDR}_x), \\ \texttt {maxCurve}(Q^{ILDR}_x,c_{max}) \}. \end{aligned} \end{aligned}$$

Some additional insights are provided in the description of Algorithm 3.

It is worth noting that the purely declarative nature of ASP easily allows fine-tuning the desiderata by combining the constraints. Not only can the system easily be improved when new or more specific knowledge is available from the experts, but the user can also change the behaviour of the classifier at will when in use. The provided interface gives the user the possibility to compose the desiderata and repeat this step multiple times after the IL model has generated its output, \(\{Q^{IL}_k\}\) (which is agnostic to the human-chosen desiderata), changing the inputs until she is satisfied with the obtained path. In this case, new weights and more (or less) restrictive constraints (e.g., increasing \(k_{max}\) or \(d_{out}\)) can be set. Furthermore, the user can decide to take into account all, some, or none of the encoded preferences, depending on the specific application; if she chooses to apply more than one of these rules, different priorities can also be set, expressed by the weights. Hence, the capability to customise the set of rules and (hard/soft) constraints to use for each case study makes our tool highly flexible and generalisable.
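To make the selection semantics concrete, the following Python sketch emulates procedurally what the ASP program expresses declaratively: the two hard constraints discard infeasible paths, and the weighted soft constraints rank the admissible ones. This is only an illustration of the intended behaviour (a plain weighted sum stands in for ASP's level-wise optimisation), not the encoding actually used:

```python
def select_best_path(paths, k_max, d_out, w_d_tot=1.0, w_d_min=1.0, w_c_max=1.0):
    """Procedural illustration of the DR classifier.
    Each path is a dict with keys 'd_tot', 'd_min' and 'c_max'."""
    admissible = [q for q in paths
                  if q['c_max'] <= k_max            # hard constraint on curvature
                  and q['d_min'] >= d_out / 2.0]    # hard constraint on clearance
    if not admissible:
        return None                                 # no feasible path survives
    def score(q):
        return (w_d_tot * q['d_tot']                # minimise length
                - w_d_min * q['d_min']              # maximise obstacle clearance
                + w_c_max * q['c_max'])             # minimise curvature
    return min(admissible, key=score)
```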

4 Experimental protocol

Several criteria can be used to define the “best” surgical path; their relative importance depends on the application at hand. In our experiments, performed in static simulated environments, we focused on Deep Brain Stimulation (DBS) and Convection Enhanced Delivery (CED), two relevant applications of steerable needles in keyhole neurosurgery.

4.1 Neurosurgical environment

3-D brain structures for the CED and DBS environments were identified on two datasets: (1) Time-of-Flight (ToF) Magnetic Resonance (MR) images for vessel visualisation, and (2) T1-weighted images for the visualisation of the brain cortex, skull surface, arterial blood vessels, ventricles and all the relevant deep grey matter structures, segmented using the FreeSurfer software (Fischl, 2012) and 3-D Slicer (Pieper et al., 2004).

Fig. 4

CED and DBS environments. In a, the sagittal, axial and coronal views of the CED environment are reported. The obstacle space (\(C_{obst}=\) {ventricles, thalamus, pallidum and vessels}), the free space (\(C_{free}= \) gyri) and the target space (\(C_{target}= \) tumor) were defined accordingly. In b, the sagittal, axial and coronal views of the DBS environment are reported, defining accordingly the obstacle space (\(C_{obst}=\) {ventricles, thalamus, pallidum, vessels and Corticospinal Tract (CST)}), the free space (\(C_{free}= \) gyri) and the target space (\(C_{target}= \) LSTN)

  • As reported in Fig. 4a, in CED the target space, \(C_{target}\), is the tumor, surrounded by several essential structures (ventricles, thalamus, pallidum and vessels) that represent the obstacle space, \(C_{obst}\), while the gyri represent the free space, \(C_{free}\).

  • As reported in Fig. 4b, in DBS the target space, \(C_{target}\), is the LSTN, located in the central brain core. The obstacle space, \(C_{obst}\), is represented by relevant structures (ventricles, thalamus, pallidum, vessels and CST), while the gyri represent the free space, \(C_{free}\).

4.2 Neurosurgical simulator

Fig. 5

Neurosurgical simulator. It is possible to: (1) start a simulation, (2) select the procedure between DBS and CED, (3) input the physical characteristics of the needle, \(d_{out}\) and \(k_{max}\), and the parameters to be maximised or minimised in the selection of the best path, (4) select the start and target configurations, \({\textbf {T}}_{start_{k}}\) and \({\textbf {T}}_{target}\), (5) perform the trajectories manually with a joystick controller or let the agent perform them autonomously with ILDR and finally (6) visualise all the generated trajectories

A planning tool was implemented in Unity 3D (Goldstone, 2009) and integrated with the Machine Learning (ML)-Agents Toolkit (Juliani et al., 2018), which allows visualising the 3D segmented risk structures of the patient's brain derived from the imaging data. We designed and developed a neurosurgical simulator, i.e. a “Brain Digital Twin”, described in Fig. 5 and shown in the animation (Online Resource 6.2), to create the environment, collect manual demonstrations by the expert surgeon with a joystick controller (through a combination of translation along the z axis and rotation around the x and y axes) and train the moving agent. First of all, the surgeon is asked to choose the curvature, \(k_{max}\), the diameter, \(d_{out}\), and the parameters he/she wants to prioritise in the selection of the best trajectory, \(w_{d_{tot}}\), \(w_{d_{min}}\) and \(w_{c_{max}}\), used either to minimise or to maximise the corresponding rule expression. The surgeon can then select a target configuration in the brain (\({\textbf {T}}_{target}\)), e.g. on the tumor, and N start configurations (\({\textbf {T}}_{start_{k}}\)) on the brain cortex. Once the start and target configurations are defined, it is possible to proceed with either a manual or an automatic, pre- or intra-operative procedure. The difference between the last two lies in the dynamism of the environment, i.e. in whether the needle-tissue interaction, and thus the non-holonomic constraints affecting the needle behaviour and movements, is represented. The surgeon can in fact proceed with a static environment or with a dynamic environment whose model is based on a Position Based Dynamics simulator (Segato et al., 2021b) used to emulate brain tissue deformations in Key-hole Neurosurgery (KN). The needle model is treated as a particle system (O'Brien et al., 2001). For the validation, a pre-operative procedure with a static environment was considered. It is assumed that the motion of the needle tip fully determines the motion of the needle (“follow-the-leader” deployment), with a combination of translation along the z axis and rotation around the x and y axes.

Fig. 6

Schematic representation of the experimental protocol workflow. Each environment \(Env_s\) (with \(1 \le s \le 2\)) takes in input the start configurations, \({\textbf {T}}_{start_{k}}^s\) (with \(1 \le k \le 10\)), the target configuration, \({\textbf {T}}_{target}^{s}\), the weights, \(w_{c_{max}}^s, w_{d_{min}}^s, w_{d_{tot}}^s\), and the kinematic constraints of the moving agent, \(d_{out}\) and \(k_{max}\). j experiments, \(EXP_j\) (with \(1\le j \le 5\)), were conducted for each approach: Manual, DR, RRT* (only for the DBS environment) and ILDR. A pool of surgical paths, {\(path_k\)}, is generated, and the optimal one, \(path_j^{s}\), is selected

4.3 Experimental validation

The results’ assessment for both scenarios, CED and DBS, is based on the comparison of the proposed method, ILDR, with the Manual and DR approaches. Moreover, in the DBS scenario, ILDR was tested against the Rapidly-exploring Random Trees (RRT)* algorithm.

As shown in Fig. 6, an expert surgeon (age: 37, performed surgical procedures: 2440) was asked to select, for each environment \(Env_s\), 10 desired start configurations, \({\textbf {T}}_{start_{k}}^s\), on the brain cortex, a target configuration, \({\textbf {T}}_{target}^{s}\), in the target space, \(C_{target}\), and the weights, \(w_{c_{max}}^s, w_{d_{min}}^s, w_{d_{tot}}^s\), for rule prioritisation, reported for both scenarios in Table 3. j experiments, \(EXP_j\) (with \(1\le j \le 5\)), were conducted for each of the four approaches: Manual, DR, RRT* (only for the DBS environment) and ILDR.

  • Manual approach: For each \(EXP_j\), the surgeon was asked to generate a pool of surgical paths, {\(path_k\)} (with \(1 \le k \le 10\)), and choose the optimal one, \(path_j^{Manual,s}\), based on his expertise.

  • DR approach: For each \(EXP_j\), the same pool of surgical paths generated with the manual approach, {\(path_k\)}, was considered, and the optimal one, {\(path_j^{DR,s}\)}, was selected with the DR classifier, using the rules, weights and kinematic constraints given in input by the surgeon.

  • RRT* approach: For each \(EXP_j\) the pool of paths, {\(path_k\)}, was generated with the RRT* algorithm. The optimal one, {\(path_j^{RRT^*,s}\)}, was selected with a Cost Function \(F_{cost}\), to be minimised:

    $$\begin{aligned} F_{cost}(\{path_k\}) = {\left\{ \begin{array}{ll} \infty \;\; \text{ if } d_{min} \le 0 \\ \infty \;\; \text{ if } c_{max} > k_{max} \\ w_{d_{min}}\frac{1}{d_{min}} + w_{d_{tot}}\frac{1}{\bar{d_{tot}}} + w_{c_{max}} \frac{c_{max}}{k_{max}} \;\; \text{ otherwise } \end{array}\right. } \end{aligned}$$
    (13)

    The rules, weights and kinematic constraints were those given in input by the surgeon; a short sketch of this cost function is given after the list. For more information on the implementation of this approach, please refer to our previous work (Segato et al., 2019).

  • ILDR approach: For each \(EXP_j\) the pool of paths, {\(path_k\)}, was generated with the IL model. The optimal one, {\(path_j^{ILDR,s}\)}, was selected with the DR classifier, using rules, weights and kinematic constraints given in input by the surgeon.
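As mentioned in the list above, a short Python sketch of the cost function of Eq. (13) follows; d_tot_bar stands for the normalised length term \(\bar{d_{tot}}\):

```python
import math

def rrt_star_cost(d_min, d_tot_bar, c_max, k_max, w_d_min, w_d_tot, w_c_max):
    """Cost of a candidate RRT* path following Eq. (13); lower is better."""
    if d_min <= 0 or c_max > k_max:
        return math.inf                      # collision or curvature limit violated
    return (w_d_min / d_min                  # penalise small obstacle clearance
            + w_d_tot / d_tot_bar            # term on the normalised path length
            + w_c_max * c_max / k_max)       # penalise curvature close to the limit
```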

For each path, {\(path_j^{s}\)}, we calculated:

  • The length (\(d_{tot}\)) of the path, as described in Eq. 3;

  • The minimum (\(d_{min}\)) and the mean (\(d_{avg}\)) distances of the path with respect to all the obstacles, as described in Eqs. 4 and 5;

  • The maximum curvature (\(c_{max}\)) of the path, as described in Eq. 7.

4.4 Hardware specification

Experiments were performed on a Linux machine equipped with a 6-core i7 CPU, 16GB of RAM and 1 NVIDIA Titan XP GPU with 12GB of VRAM.

4.5 IL training strategy

The training phase of the IL models, for each environment, takes in input w start configurations (\({\textbf {T}}_{start_{w}}\), with \(1 \le w \le 20\)), z target configurations (\({\textbf {T}}_{target_{z}}\), with \(1 \le z \le 5\)) and y expert manual paths (\(Q^{manual}_y\), with \(1 \le y \le 10\)) for each start-target pair. The number of manual demonstrations, \(|x|=1000\) (with \(1 \le x \le |w| \times |z| \times |y|\)), is obtained by combining the number of demonstrations (\(|y|\)) provided by the expert user for each start-target pair (\(|w| \times |z|\)). At every episode, a new \({\textbf {T}}_{start_w}\) was chosen at random among the available ones, along with its relative \({\textbf {T}}_{target_{z}}\). Table 4 presents the training parameter values for the CED and DBS models.

Table 3 Expert constraints and rules
Table 4 Training parameters

4.6 Statistical analysis

All the performance metrics (\(d_{tot}\), \(d_{min}\), \(d_{avg}\) and \(c_{max}\)) extracted from the paths were analysed using Matlab (The MathWorks, Natick, Massachusetts, R2020a). The Lilliefors test was initially applied to check data normality. Due to the non-normality of the data distribution, pairwise comparisons were performed with the Wilcoxon matched-pairs signed-rank test. Differences were considered statistically significant at p value < 0.05.
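For reference, an equivalent check can be sketched in Python (the original analysis was performed in Matlab); the Lilliefors test is assumed to be available from statsmodels and the Wilcoxon test from SciPy:

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.diagnostic import lilliefors

def compare_metric(metric_a, metric_b, alpha=0.05):
    """Pairwise comparison of one metric (e.g. d_min) between two approaches:
    Lilliefors test for normality, then Wilcoxon matched-pairs signed-rank test."""
    a, b = np.asarray(metric_a, float), np.asarray(metric_b, float)
    _, p_norm_a = lilliefors(a)
    _, p_norm_b = lilliefors(b)
    _, p_value = wilcoxon(a, b)              # paired, non-parametric test
    return {"p_normality": (p_norm_a, p_norm_b),
            "p_value": p_value,
            "significant": p_value < alpha}
```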

Fig. 7

Comparison between the different approaches. For both the CED (a) and DBS (b) environments, the expert's rules and the needle kinematic constraints are reported in the upper part. a A comparison between the Manual, DR, and ILDR approaches in the CED environment. b A comparison between the Manual, DR, RRT*, and ILDR approaches in the DBS environment. The results for both scenarios and all approaches are reported in terms of the minimum (\(d_{min}\)) and mean (\(d_{avg}\)) distances from the critical obstacles, the total path length (\(d_{tot}\)) and the curvature (\(c_{max}\)), calculated over the five best paths for each approach. P values were calculated using the Wilcoxon matched-pairs signed-rank test

Fig. 8

Comparison between the DR and ILDR approaches. For both the CED (scenario 1) and DBS (scenario 2) environments, one example of the obtained path is shown: a a comparison between the DR and ILDR approaches in the CED environment; b a comparison between DR and ILDR in the DBS environment. For both scenarios, the results show an increase in safety (\(>>d_{min}\)) and smoothness (\(<<c_{max}\)) and a reduction in length (\(<<d_{tot}\)) for the proposed method (ILDR)

5 Results

5.1 Convection enhanced delivery

Figure 7a shows a comparison between the Manual, DR and ILDR approaches in terms of \(d_{min}\), \(d_{avg}\), \(c_{max}\) and \(d_{tot}\), calculated over the best paths in the left hemisphere (for each approach, the criteria for selecting the best path are described in Sect. 4.3). The DR approach keeps a greater \(d_{min}\) and \(d_{avg}\) from obstacles and a significantly lower \(d_{tot}\) (p value < 0.01) with respect to the Manual approach, following the rules dictated by the expert. The ILDR approach keeps a significantly greater \(d_{avg}\) (p value < 0.05) from obstacles and a significantly lower \(c_{max}\) (p value < 0.01) and \(d_{tot}\) (p value < 0.01) with respect to both the DR and Manual approaches.

Figure 8a shows the visual comparison between one resulting path obtained with the DR and ILDR approaches considering the previously mentioned optimisation criteria (safety, smoothness and path length), which are better for the ILDR approach even by visual inspection.

5.2 Deep brain stimulation

Figure 7b shows a comparison between the Manual, DR, RRT* and ILDR approaches in terms of \(d_{min}\), \(d_{avg}\), \(c_{max}\) and \(d_{tot}\), calculated over the best paths in the left hemisphere (for each approach, the criteria for selecting the best path are described in Sect. 4.3). The DR approach keeps a lower \(d_{tot}\) and \(c_{max}\) than the Manual approach, following the rules dictated by the expert, who gives more importance to these two parameters in this case. The ILDR approach keeps a significantly greater \(d_{min}\) (p value < 0.01) and \(d_{avg}\) (p value < 0.01) from obstacles and a lower \(c_{max}\) and \(d_{tot}\) than the DR approach. The comparison between the ILDR and RRT* approaches is also shown in terms of \(d_{min}\), \(d_{avg}\), \(c_{max}\) and \(d_{tot}\) calculated over the best paths in the left hemisphere: the ILDR approach keeps a significantly greater \(d_{min}\) (p value < 0.0001) and \(d_{avg}\) (p value < 0.0001) from obstacles, a lower \(c_{max}\) and a significantly lower \(d_{tot}\) (p value < 0.01) than the RRT* approach.

Figure 8b shows the visual comparison between one resulting path obtained with the DR and ILDR approaches considering the previously mentioned optimisation criteria (safety, smoothness and path length), which are better for the ILDR approach even by visual inspection.

5.3 Computational time

In Table 5, the computational time T for all the analysed approaches is reported. The ILDR approach is twice as fast as the Manual one and requires a much lower computational time than the state-of-the-art sampling-based method RRT*. Although this value is not essential for the proposed pre-operative procedure, which is performed off-line, it shows that the proposed method may be potentially applicable to intra-operative procedures requiring fast planning.

6 Discussion and conclusion

The present work proposes a novel automatic path planning approach, called ILDR, for a moving agent in a complex environment. The complexity of the environment represents a worst-case scenario that supports the applicability and reliability of the method. In our experiments, ILDR performed better in terms of obstacle clearance and compliance with the moving agent's kinematic constraints when tested against the Manual approach, the DR classification approach and the RRT* algorithm. By simultaneously optimising paths according to all the requested features, the proposed method outperforms state-of-the-art approaches in terms of path safety, path length, and computational time.

Our method succeeds in obtaining optimal paths that can be followed to reach a specific target according to rules set by an expert. This approach allows fully exploiting the expert's knowledge: he/she first performs the demonstrations used to train the GAIL model and then selects the constraints and their priorities, which ultimately lead to the choice of the best path with ASP.

It is worth noting that one of the main contributions of the present work is the integration of an inductive learning-based approach with a deductive reasoning-based approach. The inductive learning-based method allows the agent to learn the policy from a set of demonstrations provided by an expert, who can thereby introduce into the path planning algorithm requirements and knowledge that cannot always be accommodated in graph- or sampling-based approaches unless additional optimisation steps, with additional computational time, are applied. Explicit programming cannot fully cover the complexity of the environment (represented here by the human brain, with its delicate and very complicated anatomical structures and narrow passages) and the number of parameters and possible complications that have to be considered during path planning. For this reason, we implemented a DR classifier with a user interface, as described in the final part of Sect. 3.2.4, where the experts can express their individual preferences by assigning different weights, thus creating a priority list over the path planning optimisation criteria (e.g., giving more priority to path safety than to path length) while visualising the trajectory and changing the criteria in real time. The DR method is implemented using ASP, which gave us several advantages. First of all, even if the whole machinery can be embedded into a graphical user interface for the user's convenience, under the hood we are dealing with knowledge explicitly expressed via a declarative formalism: modifying and adapting the criteria for dropping unwanted paths and selecting the preferred one(s) is rather easy. Furthermore, as the specifications are formally encoded, once the optimisation criteria are well established we are ensured that the best path is actually chosen; if more than one path achieves the same “score”, picking one or the other is completely indifferent. If, for some reason, this turns out not to be the case, it means that the criteria should be modified, which, as already stated, can easily be done, especially given that the resulting framework allows one to straightforwardly experiment in this respect.

Table 5 Results in terms of computational time

6.1 Clinical translation

As part of the EU Horizon 2020 EDEN2020 project, the current study proposes a novel automatic planner for steerable needles in keyhole neurosurgery. Given the brain environment, a surgeon-defined start and a target, the proposed method can provide an optimal path according to predefined features such as insertion length, clearance from safety regions (such as blood vessels and eloquent morpho-functional landmarks) and compliance with the needle's kinematic limits. It is intended to provide a state-of-the-art combined technology platform for minimally invasive surgery. When tested against the RRT* approach, the proposed method performed better in terms of path smoothness and clearance from safety regions, significantly decreasing the path length with a sensibly lower computational time. Thanks to the possibility of performing curvilinear paths for STN and tumor targeting, the proposed algorithm allows optimising the fundamental aspects of DBS and CED and maximising both the effectiveness and the safety of the procedure.

6.2 Future directions

The proposed methodology favours high applicability and generalisability, as it could potentially be applied to different path planning problems. Future perspectives include the exploitation of this automatic path planning method in applications beyond keyhole neurosurgical procedures. The development of a surgical simulator is an example of an application based on anatomical topology rather than on anatomical dimensions. Different rules and constraints can easily be defined upon expert suggestion, thus making our methodology highly customisable and paving the way to extensions to additional 3-D complex environments beyond brain surgery. Hence, this approach is widely application-independent and can be adapted to other path planning use cases in complex environments where an expert plays a crucial role. The application to a totally different context would require a thorough consultation with domain experts and the creation, if missing, of a specific simulation environment. Nonetheless, once the setup is created for a particular domain, as in the current study with brain surgery, it is easily applicable to different problems in that domain by simply modifying or adding rules.

Supplementary Information Below is the link to the neurosurgical simulator: https://youtu.be/OM4x4W9lWkE.