A hybrid inductive learning-based and deductive reasoning-based 3-D path planning method in complex environments

Traditional path planning methods, such as sampling-based and iterative approaches, allow for optimal path’s computation in complex environments. Nonetheless, environment exploration is subject to rules which can be obtained by domain experts and could be used for improving the search. The present work aims at integrating inductive techniques that generate path candidates with deductive techniques that choose the preferred ones. In particular, an inductive learning model is trained with expert demonstrations and with rules translated into a reward function, while logic programming is used to choose the starting point according to some domain expert’s suggestions. We discuss, as use case, 3-D path planning for neurosurgical steerable needles. Results show that the proposed method computes optimal paths in terms of obstacle clearance and kinematic constraintscompliance,andisabletooutperformstate-of-the-artapproachesintermsofsafetydistance-from-obstaclesrespect, smoothness, and computational time.

Nowadays, path planning is fruitfully employed in many fields, such as entertainment, medicine, mining, rescuing, education, military, space, agriculture, robots for elderly persons, automated guided vehicles, for transferring goods in a factory, unmanned bomb disposal robots and planet exploration robots (Robert et al., 2008). Apart from robotic applications, path planning finds use in planning the routes on circuit boards, obtaining the hierarchical routes for networks in wireless mobile communication, planning the path for digital artists in computer graphics, reconnaissance robots and in computational biology to understand probable protein folding paths (Raja & Pugazhenthi, 2012).
In recent years, path planning has been also widely used in surgery (Adhami & Coste-Manière, 2003). In current clinical practice, a growing number of minimally invasive procedures rely on the use of needles, such as biopsies, brachitherapy for radioactive seeds placement, abscess drainage and drug infusion (Shi et al., 2016). With respect to standard open surgeries, the small diameter of the needle allows to access the targeted anatomy inflicting limited tissue damage and thus reducing the risks for the patient and speed up the recovery.
Over the last two decades, different research groups have focused their efforts on the development of needles able to autonomously steer inside the tissue. These needles can perform curvilinear trajectories planned to maximize the distance from sensitive anatomical structures to be avoided and reach targets otherwise inaccessible via rectilinear insertion paths (Wang et al., 2011). Accurate placement of the needle tip inside tissue is challenging, especially when the target moves and anatomical obstacles must be avoided. Moreover, the complex kinematics of steerable needles (Favaro et al., 2020) make the path planning challenging requiring the aid of automatic or semi-automatic path planning solutions. In our opinion, one of the most challenging path planning scenarios in surgery is neurosurgery. The complexity of planning within the brain is closely linked to the crowded anatomical structures and the relative delicacy of tissues that cannot be damaged without the risk of causing permanent damage or even death. In addition, due to the consistency of brain matter, there are few catheters capable of tracing curvilinear trajectories and those that exist have a minimal degree of curvature (Audette et al., 2020). These reasons significantly reduce the number of feasible trajectories that guarantee a high level of safety. According to this, the planning space in the brain is considered a complex environment. Furthermore, in the majority of cases, reaching the target, located deep in the brain, is very difficult due to the kinematic limitations of the catheter and the various anatomical regions to be avoided. Therefore, during the planning phase, several poses on the surface of the skull have to be evaluated to guarantee the best entry pose for reaching the target.
In general, moving agents in a static or dynamic known environment means finding one or more admissible paths from a starting configuration to a target configuration, avoiding obstacles and some movement possibilities, identified as kinematic constraints. The path planning problem is part of a larger class of "scheduling" and "routing problems", and it is known to be non-deterministic polynomial-time hard and complete (Obitko, 1998). In general, path planning algorithms performance can be evaluated over two main characteristics: "completeness" and "optimality". An algorithm is said to be complete if it can find a solution in a definite interval of time, provided that the solution exists, or report a failure otherwise. An algorithm is optimal if no other algorithm uses less time or space complexity. Given a path, path length is defined as the total distance covered by the moving agent from the starting position to the target, path safety represents the distance from the path to the nearest obstacle, and computation time is the time required to compute a path.
In this work, we propose a framework that couples inductive and deductive techniques in order to improve path planning performances. In particular, an inductive learning model, relying on demonstrations performed by expert operators, is in charge of generating a set of paths as can-didate solutions; a deductive reasoning module selects then the "best" starting point, according to explicit knowledge modeled over domain experts suggestions. Interestingly, this kind of coupling allows us to transfer to the automated path planner some of the knowledge available at human level: the inductive learning module "catches" via demonstrations expert capabilities that are hard to explicitly express (e.g., visual-spatial, bodily-kinesthetic), while the deductive module formally encodes what has been elaborated by the experts upon long-lasting practice (e.g., domain knowledge, best practices). Furthermore, the deductive technique based on a declarative formalism grants several advantages: on the one hand, it makes the knowledge easy to maintain and update; on the other hand, it allows us to provide the final user with a highly customizable tool with real time visual feedback, as she can decide what is important for choosing the best path, and to what extent, for each case at hand.
Eventually, we assess the viability of the proposed approach in a use case, namely 3-D path planning for neurosurgical steerable needle, proving that it stands or even outperform state-of-the art solutions.
The remainder of the paper is structured as follows: in Sect. 2 an overview of the path planning approaches proposed in literature is presented. Section 3 outlines our path planning method. Section 4 describes the experimental protocol used. Section 5 presents the comparison between the presented solution, the expert manual path definition and a state-ofthe-art path planning method for the specific neurosurgical application. Discussion and Conclusions are reported in Sect. 6.

Related works
Several approaches for path planning have been proposed in literature: graph-based, sampling-based, optimisation-based, heuristic-based, learning-based, reasoning-based methods. These methods are described below, and summarised in Table 1, according to optimality (i.e. an algorithm is known to be and "optimal" since it can estimate the best path, according to a certain criteria, given the specific resolution of the approximation); completeness (i.e. an algorithms is known to be "complete", as it can determine whether a solution exists in finite time); scalability (i.e. an algorithms is known to be "scalable" as it can plan a path in a reasonable time even if the search space increase in size); computational time (i.e. the execution time to obtain a solution); the ability to plan within a complex environment (i.e. the environment is composed of many obstacles with elaborate shapes, narrow passages and tangled locations); and the ability to obtain a smooth path (i.e. able to minimise the along the curvature).
Dijkstra algorithm (Dijkstra, 1959) and A* (Hart et al., 1968) are graph-based methods based on the discrete approximation of the planning problem. Many methods represent the environment as a square graph, or as an irregular graph (Kallem et al., 2011), or a Voronoi diagram (Garrido et al., 2006). A search is performed in order to find an optimal path. These algorithms are known to be "resolution-complete", as the one proposed in Fu et al. (2021) that provides more guarantees also in terms of efficiency, and "resolution-optimal". This approach may also be used for identifying a restricted area where further optimisation refinements can be performed (Huang et al., 2009). Notably, even though graph-based methods are relatively simple to implement, they require considerable computational time as the environment size increases (Bellman, 1966) or becomes complex. Tangent graph-based planning methods for a given environment build a graph by selecting collision-free common tangents between the obstacles. These methods allow more accurate path planning around curved obstacles without errors caused by polygonal approximation; however, these methods are not always suitable when considering the kinematics limitations of a moving agent and require additional optimisation steps (Tovar et al., 2007) to obtain a smooth path.
Sampling-based methods are based on a random sampling of the working space, with the aim of significantly reducing the computational time. Rapidly-exploring Random Tree (RRT) (LaValle & Kuffner Jr., 2000) is an exploration algorithm for quick search in high-dimensional spaces, more efficient than brute-force exploration of the state space. In fact, this class of methods is scalable and capable of planning in a complex environment. Its enhanced versions, RRT* (Jordan & Perez, 2013;Favaro et al., 2018) and bidirectional-RRT (Karaman & Frazzoli, 2011) are "probabilistically complete" since the probability to find an existing solution tends to one, as the number of samples goes to infinity, and "asymptotically optimal", as they can refine an initial raw path by increasing the sampling density of the volume.
Paths computed with the approaches mentioned above can be further refined using Bezier curves (Hoy et al., 2015), splines (Lau et al., 2009), polynomial basis functions (Qu et al., 2004), or with optimisation-based methods such as evolutionary algorithms, simulated annealing, and particle swarm (Besada-Portas et al., 2010) to obtain a smooth path. These approaches have the advantage of working properly in complex environments, as demonstrated in Favaro et al. (2021); however, they require higher computational time than the sampling-based methods.
Artificial Intellingence (AI) has been increasingly used while dealing with path planning tasks (Erdos et al., 2013) in the last decade. Heuristic-based techniques such as greedy (Sniedovich, 2006) and genetic algorithm (Whitley, 1994) are two examples of AI approaches; they belong both to the class of optimisation procedure. Greedy algorithms for path planning are often used in combination with other approaches. This kind of algorithms fails to find the optimal solution, as it takes decisions merely on the basis of the information available at each iteration step, without considering the overall picture. Genetic algorithms are also used to generate solutions for path optimisation problems based on operations like mutation, crossover and selection. For this kind of algorithms, the most relevant limit is computational time, as it significantly increases with the search space. Completeness depends on the heuristic function.
Learning-based methods are more flexible than graphbased and sampling-based methods (Xu et al., 2008); indeed, they allow one to directly include all expected constraints and optimality criteria (obstacle clearance, kinematic constraints, minimum path length) in the optimisation process, without the need for subsequent refinement steps, which are time-consuming and may still not lead to the optimal path (Segato et al., 2020). Mnih et al. (2015) and Lillicrap et al. (2015) showed that Deep Reinforcement Learning (DRL) is suitable for solving path planning problems, and several studies (Mirowski et al., 2016(Mirowski et al., , 2018Tai et al., 2017) applied DRL in path planning. Panov et al. (2018) made use of the DRL approach to a grid path planning problem with promising results on small environments. Inverse Reinforcement Learning, also known as an Inductive Learning (IL) technique (Michalski, 1983) that is a process where the learner discovers rules by observing examples, has been applied to a wide range of domains, including autonomous driving (Wulfmeier et al., 2016), robotic manipulation (Crooks et al., 2016) and grid-world planning (Nguyen et al., 2017). In general learning-based method are not optimal or complete, the computational time is not high, and it is not increasing when the search space increase. They perform well in complex environment even if it is dynamic because they don't need prior information about obstacles, as we demonstrated in our previous work (Segato et al., 2021a).
Reasoning-based approaches for path planning have been successfully designed, providing high-level methods like in Lifschitz (2002). They are optimal and complete. Portillo et al. (2011) successfully solved the path planning problem, Gómez et al. (2021) and Erdem et al. (2015) encoded multiagents pathfindings and Erdem et al. (2012) used a deductive reasoning-based approach to control and plan the actions of multiple housekeeping robots whose aim is to tidy up a house in the shortest possible time and to avoid collisions between themselves and other obstacles. Reasoning-based approaches have the capability of explicitly representing domain knowledge; however, a path planning system for complex environments based only on a deductive reasoningbased method or similar approaches might be insufficient, as current implementations cannot handle an excessive increase of the search space and generalise on different environments (Erdem et al., 2012).

Materials and methods
In this work, we propose a novel approach for path planning in 3-D complex environments that combines IL and Deductive Reasoning (DR); we refer to it as the ILDR method. In this way we can exploit all the advantages of the first technique (scalability, low computation time, capacity to plan in a complex environment taking into account kinematic constraints of the robot providing a smooth path if necessary) and the second technique (completeness, optimality, capability of explicitly encoding medical knowledge, solid theoretical bases coupled with a fully declarative nature) which allows to produce formal specifications that are already executable without the need for additional algorithmic coding, thus fostering fast prototyping and easing the interaction with domain experts. The novel aspects of this approach is that it not only includes all the fundamental requirements of the path planning task, but take also into account the expert's knowledge to fully understand the decision-making process that guides the optimal path selection.
To test viability and effectiveness of our approach, we have chosen keyhole neurosurgery as a case study. In this context, path planning is crucial when the pathological target (e.g., a tumor or left subthalamic nucleus (LSTN)) is located in deep brain areas and cannot be safely reached by a flexible probe. Thus, the optimization criteria involved in finding the best path in this case scenario are: the maximisation of the minimum and average distance with to obstacles, so as to avoid delicate structures in the brain and the minimisation of the length and curvature of the path, so as to limit damage to the brain matter.

Moving agent
Let us consider an "agent", showed in Fig. 1a, moving in a 3D complex environment. The agent in this work is the tip of a steerable needle, a new biomimetic flexible probe (PBN) (Burrows et al., 2013), that can translate and rotate in space. Its kinematic constraints are the "diameter", d out , and the "maximum curvature", k max .

Environment
The "3D complex environment" (Env) is showed in Fig. 1b. The "configuration space", C-space, is the set of all the possible t "agent configurations", T agent t , with t ∈ C-space. The agent configurations are described by poses, denoted as 4 × 4 transformation matrices: where p agent t ∈ R 3 is the tip position and R agent t ∈ SO(3) is the orientation relative to a world coordinate frame. The "obstacle space", C obst ∈ C-space, is the space occupied by obstacles. The "free space" C f ree ∈ C-space, is the set of possible configurations (T agent = T f ree ∈ C f ree ; T agent = T obst ∈ C obst ). Moreover we can define: the "start configurations" T start k ∈ C f ree with k ∈ 1..N , the "target space" C target ∈ C f ree , that is a subspace of "free space" which denotes where we want the needle to move to, and the "target configuration" T target ∈ C target .

Actions
The agent at every t-th time step can take an action a t = (x t , y t , z t , α t , β t , γ t ), moving towards the T target , where x,y and z are the axes and α, β, γ the angles around the axes respectively. Actions moving the agent toward an T obst or outside the Env are considered inadmissible.

Path planning problem
The path planning problem, described in Fig. 2a, can be formulated as follows: given the "start configurations" T start k k = 1 : N and a "target configuration", T target , the task is to find a path, Q = {T agent 0 , T agent 1 , ..., T agent n−1 }, T agent 0 = T start k , T agent n−1 = T target , and The moving agent kinematic constraints are the needle diameter (d out ) and the maximum curvature (k max ) that it can perform. At tth time step it can translate (p agentt ) and rotate (R agentt ), performing an action a t , from its configuration, T agentt . b (on the right) The 3D complex environment is represented by a brain structure, the obstacle space (C obst ), the free space (C f ree ), the agent configuration (T agent ) T agent t and T agent t+1 are connected by straight segments, as an admissible sequence of "agent configurations".

Inductive learning-based and deductive reasoning-based (ILDR) architecture
The herein proposed approach to path planning in 3-D complex environments, referred to as the ILDR method, is summarized in Fig. 2b. The best path is obtained by means of a learning approach that generates a set of path candidates and an expert classifier that selects the "preferred" one, according to a set of rules set by the domain expert.

Inductive learning model
As shown in Fig. 3, the inductive component iteratively trains the moving agent to generate optimal paths, {Q I L k }, thanks to expert's manual demonstrations, {Q Manual }.

Generator
With several successful applications in robot control, RL that searches for optimal policies by interacting with the environment becomes one potential solution to learn how to plan a path. Segato et al. (2020), based on DRL, demonstrated its capability of directly learning the complicated policy of planning a path for steerable needle.
In the proposed approach, the agent interacts with the environment, Env. At each time step (t), the agent updates its current state (T agent ) and selects an action (a t ), according to the policy (π ), such that: In response, the agent receives the next state (T agent t+1 ) and observations (explained in the next paragraph). The goal of the agent is to determine an optimal policy π allowing it to take actions inside the environment, maximizing, at each t, the cumulative extrinsic reward r ex (T agent , a). The latter is associated to the real, full state of the system and to the agent's observations. In this way, the moving agent successfully learns to follow the best path in accomplishing its task and its constraints of maximum curvature, k max , and diameter, d out .
Observations The observations collected step-by-step by the agent while exploring the environment are: the length (d tot ) of the path, the minimum (d min ) and average (d avg ) distances from the obstacle space {C Obst }, the maximum curvature of the path (c max ), the target angle (α target ) and the computational time (T ). For each path Q G = {T agent 0 , T agent 1 , .., T agent n−1 } where T agent 0 = T start k k = 1 : N , the observations are:

Fig. 2
Path planning problem and method. a Schematic representation of the 3D path planning problem elements, start configuration (T start k = T agent0 ) and target configuration (T target = T agentn−1 ). The task is to find the optimal path (Q = {T agent0 , T agent1 , .., T agentn−1 }). At every tth time step T agentt and T agentt+1 are connected by straight segments, as an admissible sequence of "agent configurations", taking into account the "obstacle space", C obst . b The proposed path planning algorithm exploits two merged approaches: inductive learning-based and deductive reasoning-based approaches. The expert indicates N start configurations, (T start k with k ∈ N ), the target configuration, (T target ) the rules and performs a set of demonstrations. While the agent acts and observes in the virtual environment, the expert's demonstrations and the agent's observations feed the inductive approach that generates a set of optimal trajectories {Q I L k } witk k equal to the N start configurations. The rules and the kinematic constraints feed the deductive approach that extracts the best path Q I L DR .
-Path length (d tot ): the distance, d, between any two positions can be calculated based on the Euclidean distance: with t ∈ 1, ..., n, n = ||Q G and -Minimum distance (d min ): given the the line segment (p agent t p agent t+1 ) and the m obstacles represented by the occupied configurations T obst j = R obst j p obst j 0 1 with j ∈ 1, ..., m, m = ||C obst , the minimum distance of the path from the nearest obstacle indicating the level of safety is the minimum length between the line segments and the closest obstacle, such that: -Average distance (d avg ) of the path from all the nearest m obstacles: it is calculated over the whole length of the path with respect to all the obstacles, such that: -Maximum curvature (c max ) of the path: curvature c of a path in a 3-D space is the inverse of the radius r t of a sphere passing through 4 positions of the path (p agent t , p agent t+1 , p agent t+2 , p agent t+3 ) computed for each t (the method is explained comprehensively in "Appendix A.1"). Subsequently, the maximum curvature can be extracted as follows: -Target angle (α target ): given the 3-D unit vector representing the agent direction −−−−−−−−−→ p agent t p agent t+1 and the one representing the target direction − −−−−−−−→ p agent t p target , the target angle is defined as follows: The expert gives in input the dataset (C f ree and C obst ), the start and target configuration (T start and T target ) and the kinematics constraints (d out and k max ). The IL model is trained through a loop that starts with paths generated by an expert ({Q Manual }) and paths ({Q G }) performed by the network generator (G) based on a Reinforcement Learning (RL) approach. With a Generative Adversarial Imitation Learning (GAIL) approach a discriminator (D) with its network (D w ), takes in input the expert and generetor policies (π E and π θ ) the starting value of the policy's parameters and of the discriminator (θ 0 and w 0 ). During the training phase, it makes a comparison of these two paths generating an intrinsic reward (r in (T agent , a)) based on the similarity score (r sim ), updating the agent's polici π . The loop continues until the generator, moving in the environment (Env) with actions (a i ), collecting observations (d tot , d min , d avg , c max and α tr ) and computing an extrinsic reward(r ex (T agent , a)), can produce a path similar to the expert's demonstrations and that respects the kinematic constraints. Once the IL model is trained, it generates a set of optimal paths; the weights (w dtot , w d min and w cmax ) and the kinematic constraints are taken as input by the DR classifier that extracts the best path with an approach based on Answer Set Programming (ASP). Applying hard and soft constraints obtaining the best path Q I L DR Reward function (Algorithm 1) the reward function associated with each time step t is shaped in order to make the agent learn to optimise the path, according to three main requirements: 1. path length minimisation 2. obstacle clearance maximisation 3. moving agent's kinematic constraints respect.
The reward r ex (T agent , a) is defined as: -A positive constant reward, r goal t , is given upon reaching the target. -A negative constant reward, r obst t , is given if an obstacle collision is detected. -A negative reward, r step t = − 1 S max , is given at each step t of the agent in order to obtain a reduction in the com-Algorithm 2 Path planning with GAIL Input: Expert paths {Q Manual } ∼ π E , initial policy and discriminator parameters θ 0 , w 0 1: for i =0,1,2,... do 2: Generated paths {Q G } ∼ π θ i 3: Update the D w parameters from w i to w i+1 with the gradient ∇ w : 4: Take a policy step from θ i and θ i+1 with Trust Region Policy (TRPO) cost function log(D w i+1 (T agent , a)) with the gradient ∇ θ : 6: Algorithm 3 Classifying Path with ASP where S max corresponds to a fixed threshold representing the maximum number of steps t.
-A negative constant reward, r dist t , is added whenever the minimum value of d min t is lower than a predefined safe distance (d sa f e = d out 2 ), corresponding to the occupancy of the moving agent. This reward aims to maximise the d min t to reduce the risk of collision and increase the safety rate of the path. -A negative constant reward, r curv t , is assigned if the current path curvature c t overcomes the maximum value of the curvature of the moving agents specified by the user (k max ). -Finally, a negative reward, r target t = − α target t α max r deg , is added to minimise α target t in order to further minimise c and T parameters. The value of this reward is proportional to the ratio between α target t and the maximum angle α max .
The parameters of the reward function are reported in Table 2. The optimal parameters' value have been obtained by fine-tuning with grid-search procedure. Sub-optimal parameter configurations caused the agent to learn and apply inappropriate actions, e.g. moving too close to obstacles, going in the opposite direction with respect to the target, choosing non-optimal paths in terms of distance.
In this way, with the generator, G, we are able to obtain new paths, Q G .

Discriminator
In GAIL (Ho & Ermon, 2016) the task of the discriminator, D, is to distinguish between the paths, {Q G }, generated by G with a RL approach, and the demonstrated paths {Q Manual }.
has successfully matched the demonstrated path, {Q Manual }, reaching a level equal to or higher than one of the expert thanks to the additional observations collected and extrinsic reward received. As showed in Algorithm 2, the proposed path planning approach receives in input the expert's paths, {Q Manual }, the path generated from the Generator, {Q G }, and the initialization of the policy's parameters, θ 0 , and of the discriminator, w 0 . The path {Q G } fits a parameter policy π θ (where θ represent weights), while the manual demonstration, {Q Manual }, fits an expert policy, π E . The discriminator network D w (w, weights) learns to distinguish the generated policy, π θ , from the expert one, π E . The parameters w of D w are updated in order to maximise Eq. 10: where ∇ w is the gradient, E[ ] is the expectation value with respect to a policy, π θ , or to the expert policy, π E , D w T agent , a is the Discriminator network that evaluates the state (T agent ) and the action (a). TRPO (Schulman et al., 2015) assures that the change of parameter θ between π θ + 1 and π θ is limited: where ∇ θ is the gradient, H (π θ ) = E π θ − log π a|T agent is the causal entropy and its value by λ ∈ , λ > 0. Finally the discriminator, D, updates the agent's policy π to be close to the expert policy π E : using an intrinsic reward, defined as r in (T agent , a)) = r sim . The similarity reward r sim proportional to −log(1 − D w (T agent , a) is an increasing reward when the results approach of D w approaches r sim = 1, i.e. D w is not able to Algorithm 4 Calculation of radius curvature 1: Input: four successive position of the path p agent1 (x 1 , y 1 , z 1 ), p agent2 (x 2 , y 2 , z 2 ), p agent3 (x 3 , y 3 , z 3 ), p agent4 (x 4 , y 4 , z 4 ) ∈ Q = {T agent0 , T agent1 , .., T agentn−1 }. 2: Output: Ray of the sphere r i that passes through the input positions.
3: A = det x 1 y 1 z 1 1 x 2 y 2 z 2 1 x 3 y 3 z 3 1 x 4 y 4 z 4 1 4: if A == 0 then 5: The four positions are coplanar 6: else Four positions determine a unique sphere if and only if they are not coplanar 7: D, E, F and G are determined through the Cramer's rule 8: for i = 1 in 4 do 9: x 1 y 1 t 1 1 x 2 y 2 t 2 1 x 3 y 3 t 3 1 x 4 y 4 t 4 1 G = det x 1 y 1 z 1 t 1 x 2 y 2 z 2 t 2 x 3 y 3 z 3 t 3 x 4 y 4 z 4 t 4 12: Coordinate of the centre of the sphere 13: Radius of the sphere discriminate well the two paths. The inverse happens when D w approaches r sim = 0, i.e. it is able to discriminate well between the two paths, and the reward goes to 0. The trained IL model generates the paths {Q I L k } k = 1 : N .

Deductive reasoning classifier
As already introduced above, the optimal path is selected among the N paths given by IL and the paths produced by the deductive module by means of an ASP-based DR Classifier (Gelfond & Lifschitz, 1991;Brewka et al.,2011;Leone & Ricca, 2015). Among modern declarative formalisms, ASP, originally developed in the field of logic programming and non-monotonic reasoning, has became widely used in AI, and it is recognized as an effective tool for Knowledge Representation and Reasoning (KRR). This is especially due to its high expressiveness and the ability to deal with incomplete knowledge, and also because of the availability of robust and efficient solvers; more details on ASP are given in "Appendix A.2".
A domain expert provided the knowledge that we encoded into a logic program consisting of rules and constraints and readable by the ASP solver. For each instance of the path planing problem, the ASP program is hence coupled to a proper representation of the pool of paths calculated by the IL model, {Q I L k } (the "input") and fed to the ASP solver, thus choosing the best one, {Q I L DR }. In the following, we provide more details about the encoded deductive path selector.
A path Q I L DR x cannot be selected if the maximum curvature c max (Eq. 7) measured for Q I L DR x is bigger than the k max , as it can be seen in the following code snippet (In order to fully understand the syntax, we have provided some examples in "Appendix A.2").: where the atom maxCurve(Q I L DR x ,c max ) couples the path Q I L DR x with the maximum curvature c max . A path Q I L DR x must be discarded also if it approaches the sensitive structures at a minimum distance d min (Eq. 4) smaller than the radius (r = d out 2 ) of the moving agent: Along with these hard constraints, that make inappropriate paths to be discarded, we identified several criteria for expressing preferences among admissible ones. In particular, starting from the expert's knowledge, we designed some "soft constraints", and three corresponding weights (w d tot , w d min and w c max , w ∈ R > 0) used to express preferences towards paths that feature minimum (or maximum) adherence to the criteria.

Some additional insights are provided in description of Algorithm 3.
It is worth noting that the purely declarative nature of ASP easily allows fine-tuning the desiderata by combining the constraints. Not only the system can easily be improved if new or more specific knowledge is available from the experts, but the user can change the behavior of the classifier at will when in use. The provided interface gives the user the possibility to compose the desiderata and repeat this step multiple times, after the IL model has generated the output, {Q I L k } (agnostic to the human-chosen desiderata), changing the inputs until she is satisfied with the obtained path. In this case, new weights and more (or less) restrictive constraints (i.e., increasing k max or d out ) can be set. Furthermore, the user can decide to take into account all, some, or none of the encoded preferences, depending on the specific application; if she chooses to apply more than one of these rules, then also different priorities can be set, expressed by the weights. Hence, the capability to customize the set of rules and (hard/soft) constraints to use for each case study makes our tool highly flexible and generalizable.

Experimental protocol
Criteria for defining the "best" surgical path are several; their importance depends on the application at hand. In our experiments, performed in static simulated environments, we focused on Deep Brain Stimulation (DBS) and Convection Enhanced Delivery (CED), which are two relevant applications of steerable needles in keyhole neurosurgery.

Neurosurgical environment
3-D brain structures for CED and DBS environment were identified on two datasets: (1) Time-of-Flight (ToF) Magnetic Resonance (MR) for vessels visualisation, (2) T1 for brain cortex, skull surface, arterial blood vessels, ventricles and all the relevant deep grey matter structures visualisation, segmented through FreeSurfer Software (Fischl, 2012) and 3-D Slicer (Pieper et al., 2004).
-As reported in Fig. 4a, in CED the target space is the tumor=C target , surrounded by different essential structures (ventricles, thalamus, pallidum and vessels), that represent the obstacle space, C obst , while gyri represent the free space, C f ree . -As reported in Fig. 4b , in DBS the target space is the LSTN=C target , located in the central brain core. The obstacle space, C obst , is represented by relevant structures (ventricles, thalamus, pallidum, vessels and CST), while gyri represent the free space, C f ree .

Neurosurgical simulator
A planning tool is implemented in 3D Unity (Goldstone, 2009) and integrated with Machine Learning (ML)-Agents Toolkit (Juliani et al., 2018) that allows to visualize the 3D segmented risk structures of the brain of the patient derived from that data. We designed and developed a neurosurgical simulator, i.e. a "Brain Digital Twin", described in Fig. 5 and shown in the animation (Online Resource 6.2) to create the environment, collect manual demonstrations by the expert surgeon with a joystick controller (with a combination of the translation along the z axis and the rotation around the x and y axes) and train the moving agent. First of all the surgeon is asked to choose the curvature k max , and diameter d out , and the parameters he/she wants to prioritize in the selection of the best trajectory, w d tot , w d min and w c max , used either to minimize or to maximize the rule expression. The surgeon can then select a target configuration in the brain (T target ), e.g. on the tumor and N start configurations (T start k ), on the brain cortex. Once the start and target configurations are defined, it is possible to proceed to either a manual or automatic, pre-or intra-operative procedure. The difference between these last two is the dynamism of the environment in representing the needle-tissue interaction and so the non-holonomic constraints that affect the behaviour and the needle movements. The surgeon can in fact proceed with a static environment or with a dynamic environment whose model is based on a Positions Based Dynamics simulator (Segato et al., 2021b) used to emulate the brain tissues deformations in Key-hole Neurosurgery (KN). The needle model is considered as particle system (O'Brien et  , 2001). For the validation a pre-operative procedure with a static environment was considered. It is assumed that the motion of the needle tip fully determines the motion of the needle ("follow-the-leader" deployment) with a combination of the translation along the z axis and the rotation around the x and y axes.

Experimental validation
The results' assessment for both scenarios, CED and DBS, is based on the comparison of the proposed method, ILDR, with the Manual and DR approaches. Moreover, in the DBS scenario, ILDR was tested against the Rapidly-exploring Random Trees (RRT)* algorithm. As shown in Fig. 6, an expert surgeon (age: 37, performed surgical procedures: 2440) was asked to select, for each environment Env s , 10 desired start configurations, T s start k , on the brain cortex, a target configuraion, T s target , on the target space, C target , and the weights, w s c max , w s d min , w s d tot for the rules prioritisation, reported for both scenarios in Table 3. j experiments, E X P j (with 1 ≤ j ≤ 5), were conducted for each one of the four approaches: Manual, DR, RRT* (only for DBS Environment) and ILDR.
-Manual approach: For each E X P j , the surgeon was asked to generate a pool of surgical paths, { path k } (with 1 ≤ k ≤ 10), and choose the optimal one, path Manual,s j , based on his expertise. -DR approach: For each E X P j was considered the same pool of surgical paths generated in the manual approach,{ path k }, and the optimal one, { path D R,s j }, was selected with the DR classifier, using rules, weights and kinematic constraints given in input by the surgeon. -RRT* approach: For each E X P j the pool of paths, { path k }, was generated with the RRT* algorithm. The optimal one, { path R RT * ,s j }, was selected with a Cost Function F cost , to be minimised: (4) select starting and target configurations , T start k and T target , (5) perform manually the trajectories with a joystick controller or let the agent to perform trajectories autonomously with ILDR and finally (6) visualise all the generated trajectories Using rules, weights and kinematic constraints given in input by the surgeon. For more information on the implementation of this approach please refer to our previous work Segato et al. (2019). -ILDR approach: For each E X P j the pool of paths, { path k }, was generated with the IL model. The optimal one, { path I L DR,s j }, was selected with the DR classifier, using rules, weights and kinematic constraints given in input by the surgeon.
For each path, { path s j }, we calculated: -The length (d tot ) of the path, as described in Eq. 3; -The minimum (d min ) and the mean (d avg ) distances of the path with respect to all the obstacles, as described in Eqs. 4 and 5; -The maximum curvature (c max ) of the path, as described in Eq. 7.

Hardware specification
Experiments were performed on a Linux machine equipped with a 6-core i7 CPU, 16GB of RAM and 1 NVIDIA Titan XP GPU with 12GB of VRAM.

IL training strategy
The training phase, for IL models, for each Environment, takes in input w start (T start w , with 1 ≤ w ≤ 20) and target z (T target z , with 1 ≤ z ≤ 5) configurations and y expert manual path (Q manual y , with 1 ≤ y ≤ 10) for each start and target. The number of manual demonstrations ||x = 1000, (with 1 ≤ x ≤ ||w × ||z × ||y), is obtained by combining the number of demonstrations (||y) provided by the expert user for each couple of start and target (||w×||z). At every episode randomly, a new T start w was chosen among the available ones along with its relative q target z . Table 4 presents the training parameters values referred to the CED and DBS models.  6 Schematic representation of the experimental protocol workflow Each environment Env s (with 1 ≤ s ≤ 2), takes in input the start configurations, T s start k (with 1 ≤ k ≤ 10), the target configuration, T s target , the weights, w s cmax , w s d min , w s dtot and the kinematic constraints of the moving agent, d out and k max . j experiments, E X P j (with 1 ≤ j ≤ 5), were conducted for each approach: Manual, DR, RRT* (only for DBS Environment) and ILDR. A pool of surgical paths, { path k } is generated, and the optimal one, path s j , is selected

Statistical analysis
All the performance metrics (d tot , d min , d avg and c max ), extracted from the path, were analysed employing Matlab (The MathWorks, Natick, Massachusetts, R2020a). Lilliefors test has been initially applied for data normality. Due to the non-normality of data distribution, pairwise comparison was performed with the Wilcoxon matched-pairs signed-rank test. Differences were considered statistically significant at p value < 0.05. Figure 7a shows a comparison between Manual, DR and ILDR approaches in terms of d min , d avg , c max and d tot calculated over the best path of left hemisphere (for each approach, the criteria for the selection of the best path have been described in Sect. 4.3). DR approach keeps a greater Fig. 7 Comparison between the different approaches. For both CED (a) and DBS (b) environment, expert's rules, and needle kinematic constraints are reported in the upper part. a A comparison between Manual, DR, and ILDR approaches in the CED environment. b A comparison between Manual, DR, RRT*, and ILDR in the DBS environment. The results for both considered scenarios and used approaches are reported in terms of the minimum (d min ) and the mean (d avg ) distance from the critical obstacles, the total path length (d tot ) and the curvature (c max ) calculated over the five best paths for each approach. P values were calculated using Wilcoxon matched-pairs signed-rank test Fig. 8 Comparison between DR and ILDR approaches. For both CED (scenario 1) and DBS (scenario 2) environment, one example of the obtained path is shown. In particular, a a comparison between DR and ILDR approaches in the CED environment. b A comparison between DR and ILDR in the DBS environment. The results for both considered scenarios show an increase in safety (>> d min ) and smoothness (<< c max ) and a reduction in length (<< d tot ) for the proposed method (ILDR) d min and d avg from obstacles and a significantly lower d tot ( p value < 0.01) of the path with respect to Manual approach following the rules dictated by the expert. ILDR approach keeps a significantly greater d avg ( p value < 0.05) from obstacles and a significantly lower c max ( p value < 0.01) and d tot ( p value < 0.01) with respect to DR approach and Manual approach. Figure 8a shows the visual comparison between one resulting path obtained with DR and ILDR approaches considering the previously mentioned optimisation criteria (safety, smoothness and path length), that result better for the ILDR approach even by visualization inspection. Figure 7b shows a comparison between Manual, DR, RRT* and ILDR approaches in terms of d min , d avg , c max and d tot calculated over the best path of left hemisphere (for each approach, the criteria for the selection of the best path have been described in Sect. 4.3). DR approach keeps lower d tot and c max of the path than the Manual approach following the rules dictated by the expert who gives more importance to these two parameters in this case. While ILDR approach keeps a significantly greater d min ( p value < 0.01) and d avg ( p value < 0.01) from obstacles and a lower c max and d tot than the DR approach. The comparison between ILDR and RRT* approaches is showed in terms of d min , d avg , c max and d tot calculated over the best path of left hemisphere. ILDR approach keeps a significantly greater d min ( p value < 0.0001) and d avg ( p value < 0.0001) from obstacles and a lower c max and a significantly lower d tot ( p value < 0.01) than the RRT* approach. Figure 8b shows the visual comparison between one resulting path obtained with DR and ILDR approaches considering the previously mentioned optimisation criteria (safety, smoothness and path length), that result better for the ILDR approach even by visualization inspection.

Computational time
In Table 5, the computational time T for all the analysed approaches is reported. The ILDR approach is twice as fast compared to the Manual one and keeps much lower computational time than the state-of-the-art sampling-based method RRT*. Although this value is not essential for the proposed pre-operative procedure that is performed off-line, this demonstrates that the proposed method may be poten-tially applicable to an intra-operative procedure requiring a fast planning time.

Discussion and conclusion
The present work proposes a novel automatic path planning approach, called ILDR, for a moving agent in a complex environment. The complexity of the environment represents the worst-case scenario that grants the applicability and reliability of the method. In our experiments, ILDR performed better in terms of obstacle clearance and moving agent kinematic constraints compliance when tested against the Manual approach, the DR classification approach and the RRT* algorithm. By simultaneously optimising paths according to all the requested features, the proposed method outperforms state-of-the-art approaches in terms of path safety, path length, and computational time.
Our method succeeds in obtaining the optimal paths that can be followed to reach a specific target according to rules set by an expert. This approach allows to fully exploit an expert's knowledge: he/she first performs the demonstrations used to train the GAIL model and then selects the constraints and their priorities, which ultimately lead to the choice of the best path with ASP.
It is worth noting that one of the main contributions of the present work consists in the integration of an inductive learning-based approach with a deductive reasoning-based approach. The inductive learning-based method allows the agent to learn the policy by a set of demonstrations provided by an expert, who can introduce in a path planning algorithm all his requirements and knowledge that cannot always be possible in graph-or sampling-based approaches unless additional optimisation steps are applied with additional computational time. Explicit programming cannot fully cover the complexity of the environment (represented by the human brain in this case, due to the presence of delicate and very complicated anatomical structure, narrow passages), the number of parameters and possible complications that have to be considered during the path planning. For this reason we implemented a DR classifier with a user interface, as described in the final part of Sect. 3.2.4, where the experts can express their individual preferences assigning different weights, thus creating a priority list for maintaining different path planning optimisation criteria (i.e., giving more priority to path safety than path length) while visualizing the trajectory and changing the criteria in real time. The DR method is implemented using ASP, that allowed us enjoy several advantages. First of all, even if the whole machinery can be embedded into a graphical user interface for user's convenience, under the hood we are dealing with knowledge explicitly expressed via a declarative formalism: modifying and adapting the criteria for dropping unwanted paths and selecting the preferred one(s) is rather easy. Furthermore, as the specification are formally encoded, once the optimization criteria are well-established, we are ensured that the best path is actually chosen, and if more than one are present with the same "score", then picking one or the other is completely indifferent. If, for some reasons, this turns to be not the case, this means that the criteria should be modified, which, as already stated, can be easily done, especially given that the resulting framework allows one to straightforwardly experiments with this respect.

Clinical translation
As part of the EU's Horizon EDEN2020 project, the current study proposes a novel automatic planner for steerable needles in keyhole-neurosurgery. Given the environment of the brain, a surgeon-defined start, and a target, the proposed method can provide an optimal path, according to predefined features as insertion length, clearance from safety regions, as blood vessels and eloquent morpho-functional landmarks, and compliance to the needle's kinematics limits. It is intended to provide a state-of-the-art combined technology platform for minimally invasive surgery. When tested against the RRT* approach, the proposed method performed better in terms of path smoothness and clearance from safety regions, significantly decreasing the length and with a sensibly lower computational time. Accordingly to the possibility to perform curvilinear path for STN and tumor targeting, the proposed algorithm allows optimising the fundamental aspects of the DBS and CED and to maximising both the effectiveness and safety of the procedure.

Future directions
The proposed methodology favours high applicability and generalisability, as it could potentially be applied to different path planning problems. Future perspectives may include the exploitation of this automatic path planner method in many applications additional to keyhole neurosurgical procedures. The development of a surgical simulator defines an example of applications based on anatomical topology but not on anatomical dimensions. It is easy to see that different rules and constraints can be defined upon expert suggestion, thus making our methodology highly customisable and paving the way to extensions to additional 3-D complex environments, beyond brain surgery. Hence, this approach is widely application-independent and can be adapted to other use cases for path planning in a complex environment, where an expert has a crucial role. The application to a totally different context would require a thorough consultation with domain experts and the creation, if missing, of a specific simulation environment. Nonetheless, once the setup is created for a particular domain, as in the current study with brain surgery, it is easily applicable to different problems in such domain, by simply modifying or adding rules.

Supplementary Information
Below is the link to the neurosurgical simulator: https://youtu.be/OM4x4W9lWkE.
Acknowledgements Open access funding provided by Politecnico di Milano within the CRUI-CARE Agreement. We thank Eden2020 project consortium partners for precious advice during the project activities. We thank A. Castellano and A. Falini for providing the dataset for the DBS environment simulation.

Data and materials availability
The MRI datasets used for this study have been acquired in the framework of the European Union project EDEN2020 (https://www.eden2020.eu, Grant Agreement No. 688279) and will be made publicly available at the end of the project.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.

A.1 Computing the radius of curvatures
In this appendix the method used for computing the radius of curvature of the path is explained in algorithm 4. The method is based on the computation of the equation of the sphere that passes through four successive positions. The equation of a generic 3D sphere is determined by the Eq. 14 where the four coefficients D, E, F, G are unknown constant.
x 2 + y 2 + z 2 + D(x) + E(y) + F(z) + G = 0 ( 1 4 ) The radius of this sphere is the radius of curvature. First of all, it is important to note that four positions defined a single sphere if they are not coplanar (Purdy & Smith, 2010). In the opposite case there are no spheres that pass through the four positions or there are infinity number of them. For this reason is necessary to verify if the four positions of the trajectory are coplanar. And this is true if and only if the determinant A of the matrix indicated in the algorithm is equal to zero. Since the positions are on the sphere, the substitution of their coordinates in Eq. 14 generates a system of four equations where the unknowns are the coefficients D, E, F, G (Ram, 2009). For resolving this system the Cramer's rule is applied. Then the radius of the sphere can be computed.

A.2 Answer set programming
With ASP, computational problems in a large variety of scenarios can be described by means of simple and elegant logic programs consisting of a set of rules; solutions to a problem instance are then found by computing the semantics of such programs combined with the representation, usually expressed using factual rules, of the instance at hand. One of the main advantages of ASP consists of its purely declarative nature: rather than focusing on algorithm design and coding, and thus on how to solve a computational problem at hand, with ASP one can focus on how to describe such computational problem (or how its solutions should look like), completely avoiding the need for explicitly express the steps to be executed. In turn, order of statements in a ASP logic program is immaterial: explicit updates in the problem specification can be more easily incorporated, thus fostering advantages such as fast prototyping, quick error detection and modularity. Besides, a clean model-theoretic semantics grants correctness; intuitively, an ASP program can be seen as a formal yet executable description of the problem. The basic construct of ASP is a rule, that has a general form of Head ← Body; the Body is a logic conjunction in which nonmonotonic negation may appear, and Head can be either an atomic formula or a logic disjunction. Rules are interpreted according to common sense principles: roughly, the intuitive semantics of a rule corresponds to an implication. The answer set semantics associates a problem specification with none, one, or many intended models, called answer sets, each one corresponding to a solution; an ASP program that models a computational problem, coupled with a proper representation of an instance of such problem, can be fed to an implementation of ASP, called ASP solver, in order to actually compute all corresponding answer sets. Efficient and reliable ASP solvers exist, such as DLV/DLV2 (Adrian et al., 2018;Leone et al., 2006), WASP for solving the prepositional part of DLV2 (Alviano et al., 2019) and clingo (Gebser et al., 2019).
For a full description of ASP syntax and semantics, along with examples of its applications in academy and industry, we refer the reader to Calimeri et al. (2020) and Lifschitz (2019) and the vast literature. We briefly recall here some very basic preliminaries useful to understand the herein proposed approach.
A variable or a constant is a term. Variables are denoted by strings starting with some uppercase letter, while constants can either be integers, strings starting with some lowercase letter or quoted strings. If t 1 ,...,t k are terms (either constants or variables) and p is a predicate symbol of arity k, then p(t 1 ,...,t k ) is an atom of arity k.
A literal l is of the form a or not a, where a is an atom; in the former case l is positive, negative otherwise. A rule is of the following form: On the left, the symbol "|" connects atoms that are part of a disjuction in the head, whereas comma separated literals in the right side, i.e., the body, are part of a conjuction. An ASP program is a finite set of rules.
A fact is a rule with empty body, and represents a piece of information known to be true (typically, facts stand for the knowledge granted before reasoning, or represent the instance of a problem); usually, a fact is immediately followed by the "." symbol (i.e., the implication symbol ":-" is omitted). A constraint is a rule with empty head; hard ("strong" or "classical") and soft ("weak") constraints can be specified in order to cut out undesired models and express preferences, respectively. Weak constraints are expressed with the symbol :∼ instead of :-, that is one used for hard constraints. These latter are conditions that must be satisfied, whereas soft constraints represent conditions that should be fullfilled; intuitively, when a solution violates a soft constraint it pays a cost: this induces an ordering among solutions that allows one to express minimization and/or maximization criteria.
ASP enjoys several additional language features for easing knowledge representation; we mention here choice rules, that are a compact way for expressing disjunction of atoms that must adhere to some cardinality conditions and aggregates, that can be used for compact representations of properties and inductive definitions using sets of propositions . The scientific community agreed on a standard language (Calimeri et al., 2020); furthermore, in addition to the standard, several flavours of ASP are supported by solvers, featuring additional constructs such as the #minimize and #maximize statements for expressing preferences in optimization problems similarly to what can be done via weak constraints.
ASP is a very expressive formalism; indeed, in Eiter et al. (1994) it is proved that disjunctive logic programs under answer set semantics capture the complexity class P 2 (that is, they allow us to express every property which is decidable in non-deterministic polynomial time with an oracle in NP), and weak constraints make ASP well-suited to represent a wide class of problems (including, e.g., NP optimization problems) in a very natural and compact way (Buccafurri et al., 1997).
The following example briefly illustrates how the common "Guess&Check" paradigm is used for modelling problem with ASP.
Example 1 Let us consider the well-known problem of 3colorability, which consists of the assignment of three colors to the nodes of a graph in such a way that adjacent nodes always have different colors; this problem is known to be NPcomplete (Brandstädt et al., 1998). Suppose that the nodes and the arcs are represented by a set F of facts with predicates node (unary) and arc (binary), respectively. Then, the following ASP program allows us to determine the admissible ways of coloring the given graph with the three given colors.
Rule r 1 (guess) above states that every node of the graph must be colored as red or green or blue; r 2 (check) forbids the assignment of the same color to any couple of adjacent nodes. The minimality of answer sets semantics guarantees that every node is assigned only one color. Thus, there is a one-one correspondence between the solutions of the 3coloring problem for the instance at hand and the answer sets of F ∪ {r 1 , r 2 }: the graph represented by F is 3-colorable if and only if F ∪ {r 1 , r 2 } has some answer set.
The following example illustrates the use of weak constraints for expressing preferences while dealing with optimization problems.
Example 2 Let us consider the same 3-colorability problem of Example 1. Imagine that we know that some of our graphs are not 3-colorable; as already stated, for such graphs our programs would have no answer sets (meaning that there is no admissible solution). Nevertheless, for such cases we would like to have admissible colorings, even if not complete. The following program, that is slightly different from the previous, allows us to determine all partial colorings. r 3 : color(X,red) | color(X,green) | color(X,blue) | noColor(X) :node(X). r 2 : :arc(X,Y), color(X,C), color(Y,C). Here, rule r 1 of Example 1 is replaced with rule r 2 ; note that now, for the program F ∪{r 3 , r 2 }, admissible answer sets exist with some nodes with no color assigned. These contains answer sets with no node colored at all, others with some colored, and up to others with all node colored, in case the graph is 3-colorable. We can state that we prefer a solution over another one if it features a higher number of colored nodes by means of the following weak constraint: r 4 : :∼ node(X), noColor(X). [ 1@1, X] Now, for a solution of F ∪ {r 3 , r 2 , r 4 }, while if it violates r 2 is inadmissible, and thus discarded, if it violates r 4 then it is assigned a cost as specified in the square brackets. In this latter case, for each node X that has no color, the solution costs 1 at level 1. Weight and level can be constant values or variables appearing in the body of the constraint; furthermore, weights are additive, grouped by levels, among all constraints in the program: in this simple example, just a weak constraint is present, but one can make use of more in order to express complex sets of desiderata. Higher levels correspond to more important desiderata.