Learning intraoperative organ manipulation with context-based reinforcement learning

Purpose Automation of sub-tasks during robotic surgery is challenging due to the high intra- and inter-patient variability of surgical scenes. For example, the pick and place task can be executed multiple times during the same operation and for distinct purposes. Hence, designing automation solutions that can generalise a skill over different contexts becomes hard. All the experiments are conducted using the Pneumatic Attachable Flexible (PAF) rail, a novel surgical tool designed for robotic-assisted intraoperative organ manipulation. Methods We build upon a previous open-source surgical Reinforcement Learning (RL) training environment to develop a new RL framework for manipulation skills, rlman. In rlman, contextual RL agents are trained to solve different aspects of the pick and place task using the PAF rail system. rlman is implemented to support both low- and high-dimensional state information to solve surgical sub-tasks in a simulation environment. Results We use rlman to train state-of-the-art RL agents to solve four different surgical sub-tasks involving manipulation skills using the PAF rail. We compare the results with state-of-the-art benchmarks found in the literature, and evaluate the agent's ability to generalise over different aspects of the targeted surgical environment. Conclusion We have shown that the rlman framework can support the training of different RL algorithms for solving surgical sub-tasks, analysing the importance of context information for generalisation capabilities. We aim to deploy the trained policy on the real da Vinci using the dVRK and show that the generalisation of the trained policy can be transferred to the real world. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-022-02630-2.

1 Pneumatic Attachable Flexible Rail

Fig. 1: Model of the PAF rail deployed on the kidney surface. The system can be attached to tissue and organ surfaces thanks to the vacuum pressure in the suction cups (grey portion of the system in the figure). A pump creates a vacuum in an airtight chamber, which is connected to the system via the pressure line.

Typically, large organs such as the liver are manipulated with dedicated retractors, which are normally held by an assistant through the support trocar port. For smaller organs or portions of tissue, the main surgeon operates one of the three arms to manipulate and retract structures such as the kidney or portions of the bowel. The surgeon actively controls one arm at a time, positioning it to hold the targeted organ/tissue in the desired position and locking it in place, then moving to the next arm if needed and repeating the sequence. Besides the standard laparoscopic retractors, surgeons often also use the shaft of the robotic tools to manipulate the organ towards the desired position. This is a challenging procedure, mainly because the geometry of the shaft requires the surgeon to continuously reposition the tool until the desired position is reached. The PAF rail has been designed to simplify these steps, significantly reducing the risk of damage associated with the interaction of rigid tools with soft tissue.

Kidney phantom
An anatomically detailed kidney model was added to the scene to simulate an operative field. It was resized to fit a realistic bounding box of 14 × 8.1 × 5.4 cm (length × width × height). A texture was overlaid using pictures of explanted porcine kidneys; the result is displayed in Figure 2a. To accelerate computation in the simulation environment, the kidney was approximated using a cuboid whose dimensions neatly fit the organ model. The approximation error of the kidney surface was computed over fifteen equally spaced points positioned on the surface of the model and projected onto the corresponding face of the cuboid, resulting in an average distance of under 2 mm. Figure 2b illustrates this setup, with the blue plane and white points belonging to the cuboid and kidney surface, respectively. Moreover, this approximation significantly simplifies the place task by laying the rail on a planar surface, as shown in Figure 2c. The kidney has five triplets of dummies on its top surface, randomly used as targets for the placement of the PAF rail.
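The approximation error described above reduces to averaging point-to-plane distances between the sampled surface points and the corresponding cuboid face. A minimal sketch, with illustrative function and variable names:

```python
import numpy as np

def mean_plane_distance(points, plane_point, plane_normal):
    """Average distance from sampled surface points to a cuboid face.

    points: (N, 3) array of points on the kidney surface.
    plane_point, plane_normal: a point on the cuboid face and its
        (not necessarily unit-length) normal vector.
    """
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    # Signed distance along the normal, then take the absolute value.
    d = np.abs((np.asarray(points, dtype=float) - plane_point) @ n)
    return d.mean()
```

For example, points hovering 1.0 mm and 1.5 mm above the plane z = 0 yield an average distance of 1.25 mm.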
The environment has been developed to allow randomisation of the initial positions of the objects, guaranteeing stochasticity. The randomisation volumes are normalised with respect to a fixed initial position of the robot, which can itself have a random pose at reset. The randomisation of the kidney is performed following these steps:

1. The horizontal-plane x and y coordinates of the centre of mass of the cuboid are sampled in the interval [-50, 50] mm;
2. The height is lowered by a vertical translation, given by the projection of the initial position of the robot (a fixed vector) onto the operative table;
3. The height of the centre of mass is set to 38 mm above the table: this height avoids penetration of the table at reset, whatever the orientation of the kidney;
4. The orientation of the cuboid is randomised in intervals chosen to allow some variation while avoiding overly oblique orientations.

The random orientation allows the targets to have random heights; the kidney is therefore mid-air in some circumstances. This was not considered an issue, because the operative field simulates an abdominal cavity, where the kidney is not always flat on a planar surface nor always has a portion in contact with a specific horizontal plane. Given the previous dimensions, the robot is allowed to reach any point in a cube of side 17 cm.

3 Clinical Accuracy

Fig. 3: As mentioned in Section 1, the PAF rail can correctly engage with the organ only when the surface of the suction line is parallel to the organ surface. Experiments carried out with explanted porcine kidneys and livers showed that this is the only configuration that permits active suction.
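The kidney pose randomisation steps can be sketched as follows. Function and variable names are illustrative, and the orientation intervals are passed in as a parameter since their exact values are not reproduced in the text:

```python
import numpy as np

def randomise_kidney_pose(robot_init, orientation_bounds, rng=None):
    """Sample a random cuboid (kidney) pose at environment reset.

    robot_init: fixed initial robot position (x, y, z) in mm.
    orientation_bounds: dict mapping each Euler angle to a (lo, hi)
        interval in radians (placeholder values -- the exact intervals
        are chosen to avoid overly oblique orientations).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: sample the horizontal x, y of the centre of mass in [-50, 50] mm.
    x, y = rng.uniform(-50.0, 50.0, size=2)
    # Step 2: offset by the projection of the robot's initial position
    # onto the operative table (illustrative: here the table plane is z = 0).
    x += robot_init[0]
    y += robot_init[1]
    # Step 3: fix the centre-of-mass height at 38 mm to avoid table
    # penetration at reset, whatever the orientation.
    z = 38.0
    # Step 4: randomise the orientation within the chosen intervals.
    euler = tuple(rng.uniform(lo, hi) for lo, hi in orientation_bounds.values())
    return (x, y, z), euler
```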

Reaching Target Task
The Reaching task has been tested in a 3D environment where the initial target position is randomised, while the tooltip initial pose is fixed.

Agent
For this task, the agent is represented by a Deep Q-Network (DQN) built according to the original network described by Mnih et al. in "Human-Level Control through Deep Reinforcement Learning"; its architecture details can be found below.
Convolutional Neural Network Architecture: The convolutional neural network employed for the Deep Q-Network follows the paper by Mnih et al., except for the input image dimension, which has been decreased to reduce the computational demand.
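A minimal sketch of such a network, assuming a PyTorch implementation: the convolutional stack follows the Mnih et al. architecture, with the input shrunk to 64×64×4 as described below (the class name and the 512-unit hidden width are taken from the original paper, not from this work):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Mnih et al. (2015) convolutional stack with a 64x64x4 input."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> 32 x 15 x 15
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 64 x 6 x 6
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 64 x 4 x 4
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # one Q-value per discrete action
        )

    def forward(self, x):
        # x: (batch, 4, 64, 64) stack of grayscale frames.
        return self.head(self.features(x))
```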

State Observation
The observation for this task is the image acquired by a single vision sensor attached to an Endoscope Camera Manipulator. Specifically, this sensor renders the objects present in its field of view, and the content of the images it captures is accessible through the API of the employed simulator. The frames that are actually fed to the agent are preprocessed by first converting them from RGB to grayscale and then down-sampling them to 64×64 images to decrease the computational load (see Figure 5). The preprocessing step is applied to the four most recent frames, which are stacked along the depth (channel) dimension to produce the deep neural network input (i.e. a 64×64×4 image). This provides the network with state information over time (the arm's direction of motion and velocity cannot be determined from a single frame), which is fundamental for the agent to perform well in dynamic environments.
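The preprocessing pipeline above can be sketched as follows. The luminance weights and nearest-neighbour down-sampling are assumptions (any standard grayscale conversion and resize would do), and the simulator API call that produces the raw RGB frame is omitted:

```python
from collections import deque

import numpy as np

def preprocess(frame_rgb, size=64):
    """RGB frame (H, W, 3) -> grayscale (size, size) float image."""
    gray = frame_rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    # Nearest-neighbour down-sampling (an assumption; any resize works).
    h, w = gray.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return gray[rows][:, cols]

class FrameStack:
    """Keep the four most recent preprocessed frames as a 64x64x4 state."""

    def __init__(self, k=4, size=64):
        self.frames = deque(maxlen=k)
        self.size = size

    def push(self, frame_rgb):
        self.frames.append(preprocess(frame_rgb, self.size))
        while len(self.frames) < self.frames.maxlen:  # pad at episode reset
            self.frames.append(self.frames[-1])
        return np.stack(self.frames, axis=-1)  # (size, size, 4)
```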

Action
In order to employ the Deep Q-Network DRL algorithm, the action space has to be discrete. In this environment, the agent can choose among seven possible actions at each time-step: the arm can stay still, or move forwards or backwards by 3 mm along each of the three spatial dimensions, within a predefined cubic area of 30 × 30 × 30 mm inside the vision sensor's field of view.
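The seven-action space can be written as a lookup table mapping each discrete action index to a Cartesian tooltip displacement. This is an illustrative sketch: the axis ordering, the action indexing, and centring the workspace cube at the origin are assumptions:

```python
import numpy as np

STEP_MM = 3.0  # per-action displacement along one axis

# Action 0: stay still; actions 1-6: +/- STEP_MM along x, y, z.
ACTIONS = [np.zeros(3)] + [
    sign * STEP_MM * axis
    for axis in np.eye(3)
    for sign in (+1.0, -1.0)
]

def apply_action(tip_pos, action_idx, half_side=15.0):
    """Move the tooltip, clipped to the 30 x 30 x 30 mm workspace cube
    (assumed centred at the origin of the workspace frame)."""
    new_pos = np.asarray(tip_pos, dtype=float) + ACTIONS[action_idx]
    return np.clip(new_pos, -half_side, half_side)
```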