1 Introduction

Patient safety is a fundamental issue in medical and health care. It was estimated by The Office of Inspector General for Health and Human Services that approximately 440,000 patients suffer some type of preventable harm due to medical errors in hospitals every year, making such errors the third leading cause of death in the U.S., behind heart disease and cancer [1]. The surgical skill of a surgeon is one of the important contributors to patient safety. The current master-apprentice style of training in the operating room does not fully prepare medical students, since they do not have enough time to practice within the period of their training. It is also recognized that residents-in-training need to be exposed to various surgical emergency situations to be prepared both psychologically and skill-wise. One of the important questions is: how can medical students be well trained without making costly mistakes, given the limited amount of practice time on patients? To resolve this dilemma, one approach is to design a realistic surgical simulator system that provides a training platform for basic skill practice as well as emergency response. The advantages of using a surgical simulator include (1) motion tracking and quantification of hand motion, (2) integration of haptic devices to quantify force feedback, and (3) use of machine learning algorithms to compare hand motion between experienced surgeons and medical residents-in-training.

Although the concept of using visualization for skill training may be traced back to the 1970s and 1980s with video games and primitive flight simulators [2, 3], only in the 1990s (with 3D graphics) and the 2000s (with the use of motion sensors for motion control) did visualization become a tool for constructing highly realistic virtual reality (VR) based training. One of the most successful VR applications is training pilots using flight simulators [4–6]. Similar to flight simulators, VR simulators also play an important role in medical education [7–9]. A VR surgical training simulator is a computer system with a human-machine interface that simulates surgical procedures in a virtual world for the purpose of training medical professionals, without the need for a real patient, cadaver, or animal. A surgical training simulator provides the capability to learn and practice specific techniques in a controlled setting, allowing emphasis on specific aspects of these techniques. Reported evidence shows that VR-based training leads to faster adaptation of novel psychomotor skills and improved surgical performance [10]. It can also reduce the time spent training in the operating room, which lowers training cost and reduces risk to the patient.

Although surgical simulators emerged more than twenty years ago, the question of their effectiveness has continued to be studied until recently [11]. The challenge of teaching a set of complicated surgical skills involves translating heuristic experience from a skillful surgeon to a trainee, who must comprehend the given oral instructions and convert them into hand motions.

In this paper, we present a haptics-enabled surgical training system integrated with a deep learning algorithm that characterizes particular procedures of experienced surgeons in order to guide medical residents-in-training with quantifiable patterns. We have developed a realistic prototype of a VR surgical system for open-heart surgery with specific steps and a biopsy operation. Two abstract surgical scenarios are designed to emulate incision and biopsy operational patterns. Using a version of the deep learning algorithm proposed by Hinton et al. [12], we demonstrate that a vector with 30 real-valued components can quantify both surgical patterns. These values can be further used to compare how a resident-in-training performs relative to an experienced surgeon, so that more quantifiable corrective training guidance can be provided.

2 Haptics-Enabled Virtual Surgery Training System

In the process of learning, visualization as a cognitive skill plays a central role in navigating different modes of representation. Visualization allows one to make cognitive connections between imagined and observed reality and acts as a bridge for disseminating and accepting knowledge between theory and reality [13]. The same principles also apply to medical education [14]. With the addition of haptic devices, the virtual surgery training system can be made more realistic by providing “touch-and-feel” when performing on the virtual system.

Our ultimate goal is to develop a comprehensive VR surgical training system with multiple features and human-machine interfaces, including 3D immersive visualization, haptic devices with three and six degrees of freedom, a free-hand haptic device, and motion-tracking controllers. We have chosen to train for the surgical scenarios of cardiac surgery and the common cancer operation of a biopsy. We have designed the system with the following considerations: (1) setting difficulty levels for each surgical task, (2) incorporating a rationale for each difficulty setting, (3) designing assessment methodology based on learning proficiency, and (4) providing feedback based on performance criteria of expert proficiency. By using advanced visualization to recreate an immersive surgical environment and a realistic human-computer interface, a surgical simulator can provide a virtual training environment for medical students, much as a flight simulator does for pilots.

One of the benefits of using a realistic simulator for surgical training is that, to a certain extent, it could take the place of “cadaver labs” and make it much easier for surgeons to have access to high-fidelity training on “virtual live tissues” that can be made to bleed excessively and present various anatomical variations that complicate the procedures. Using a surgical simulator with a digital patient is tremendously advantageous over “dead tissue” simulation with a cadaver. It also allows us to teach not just the procedure, but how to deal with complications of the procedure that require immediate decisions and changes in management. Realistic training simulation requires both a virtual environment and a realistic haptic interface. This interface needs to track hand movements and allow the user to “grab and use” surgical tools in the virtual environment to actually perform the procedure.

Our design of the surgical simulator consists of three major components: (1) an integrated immersive virtual patient/environment visualization module, (2) a haptic interface module, and (3) a motion tracking and machine learning feedback module. The integrated immersive virtual patient/environment visualization provides a realistic environment for the trainee. Also, given that surgical rooms can vary within hospitals as well as between hospitals, the virtual surgical room can be customized to mirror a specific room in order to better prepare the trainee on where screens, certain tools, and lights may be located. The haptic interface provides trainees with “touch and feel,” which is necessary for their skill training and transfer to real surgery. The motion tracking module records hand motion and quantifies each surgical step, which is then analyzed and categorized by machine learning algorithms to distinguish the level of skillfulness for certain tasks. The comparative analysis shows the differences between a resident-in-training and an experienced surgeon. Using machine learning algorithms (e.g., [15]), feedback is provided for designing the next practice set. Figure 1 shows a preliminary model of a digital patient, a typical section procedure with the haptic interface, and the associated surgical environment.

Fig. 1.

Prototype of the surgical simulator with organ removal function on a digital patient in a virtual surgical room: (a) Virtual surgical room environment with a covered digital patient lying on the surgical table and surgical lighting fixtures (not shown). (b) The digital patient with a clamped open heart, connected to the tubes of a perfusion pump. (c) A realistic digital patient with detailed muscle groups and bone structures. (d) A digital patient lying on his side with a digital nurse checking his vital signs. (e) A digital patient on an operating table; a user can use two-hand haptic control for virtual operation, with both the biopsy needle and the scalpel shown in the figure. (f) A digital patient with exposed internal organs for virtual biopsy operation.

We have developed a 3D modular virtual system that can be visualized with immersive visualization devices such as the Oculus Rift [16] and allows a user to perform incision and organ removal operations (see Fig. 1e). The visualization framework we developed is based on the open-source Processing programming language. Processing is a set of libraries (http://processing.org) that can be considered an extension of the Java language. The Processing language offers rapid development of visualizations while providing an environment that is easy to learn. We chose this environment to make the development of custom scientific visualizations easy for researchers and to minimize the time spent on visualization development. The platform also has the added benefit of being suitable for real-time rendering on any platform running Java. In addition to the Processing language and the MPE library, we have added our own framework for programmable data-driven visualizations, integration with the Oculus Rift virtual reality headset, and integration with the Sixense Razer Hydra motion and orientation controller and the OMNI/Phantom and Quanser/HD2 haptic devices.

The developed programmable data-driven visualization framework, the Immersive Data Visualizer (IDV), consists of eight components: (1) a Wavefront .obj file loader, (2) an XML data file loader, (3) a 3D force-directed graph algorithm, (4) a rendering module, (5) an animation module for time series visualization, (6) an Oculus Rift VR headset integration module, (7) a Sixense Razer Hydra controller integration module, and (8) a VizWall (large tiled screen) integration module. The IDV can also be used for finite-element simulation and visualization in engineering applications. In fact, real-time finite-element simulations can be programmed in Processing and displayed on the VizWall and the Oculus Rift. Alternatively, MATLAB can be used to generate finite-element simulations, and the data can be saved as a .csv point cloud file and an optional .csv link file. The visualization can then, optionally, be programmatically manipulated and combined with other external data loaded from the XML file. To achieve visual realism, both the professional version of 3ds Max (http://www.autodesk.com/products/3ds-max/overview) and Unity (http://unity3d.com/unity) are used to create the digital patients and the surgical room environment.
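As a concrete illustration of the MATLAB-to-IDV path, the minimal sketch below writes a point cloud file from a simple placeholder field; the four-column layout (x, y, z, scalar metadata) and the file name are our assumptions for illustration, since the exact IDV .csv schema is not reproduced here.

    % Minimal sketch (MATLAB): export a point cloud from a simulation to a .csv
    % file for loading by IDV. Column layout is an assumed example, not the
    % definitive IDV format.
    [x, y] = meshgrid(linspace(0, 1, 20));        % simple 20 x 20 nodal grid
    z = 0.1 * sin(2*pi*x) .* cos(2*pi*y);         % placeholder field (e.g., displacement)
    pts = [x(:), y(:), z(:), z(:)];               % assumed columns: x, y, z, scalar metadata
    csvwrite('fe_pointcloud.csv', pts);           % point cloud file to be loaded by IDV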

3 Surgery Training and Pattern Quantification by Deep Learning Algorithm

3.1 Abstraction of Surgical Procedures

As discussed previously, the challenge of training medical students to be future surgeons is how to effectively transfer knowledge and experience from a skillful surgeon to a resident-in-training. The current practice relies heavily on oral instructions with heuristic comments. If typical surgical procedures could be quantitatively described, it would be much easier to teach surgical steps and correct mistakes with precise instructions and commands. To that end, we took two surgical procedures and created two abstract scenarios so that machine learning can be applied. The first scenario is the incision procedure, which is usually performed by cutting through tissues along marked line segments. We designed a template with six letters that represent various curves and sharp turns (Fig. 2). The participants were asked to trace the letters accurately with a time limit in mind. Also, the elbow of the drawing hand could not touch the desk for support while tracing the letters.

Fig. 2.

First abstract surgical procedure for machine learning: tracing the six letters C, D, Δ, H, P, and S.

The second surgical abstraction emulates a biopsy operation. In Fig. 3, three circles represent an organ with an embedded tumor (top), a nerve bundle (bottom left), and a blood vessel (bottom right), respectively. To increase the level of difficulty of the emulated surgical procedure, various sizes and distances are designed so that different biopsy paths need to be chosen in order not to damage either the nerve bundle or the blood vessel. The participants were asked to draw a straight line from the bottom of the square to the black spot representing the tumor. The line has to be drawn as straight as possible with a time limit in mind. Also, the elbow of the drawing hand could not touch the desk for support while drawing.

Fig. 3.

Second abstract surgical procedure for biopsy of tumor tissue embedded in a normal organ (top circle with black spot). The biopsy path cannot penetrate either the nerve bundle (the circle at the bottom left) or the nearby major blood vessel (the circle at the bottom right).

3.2 Data Generation and Imaging Processing

The imaging data for the first abstract surgical scenario were generated by tracing the letters to mimic the incision procedure. One set of images is treated as the master patterns and used as the reference for comparison (Figs. 2 and 3). Fifteen sets of images were made, constituting 7,200 letter-tracing images. The imaging data for the second abstract scenario were generated by drawing 6,000 biopsy-line images. There were fourteen participants, representing fourteen inexperienced residents. The original template is treated as the work of the experienced surgeon. Six letters were chosen to represent both smooth curves and sharp turns. Five biopsy images were designed to represent various sizes and distances for the tumor, nerve bundle, and blood vessel. As instructed, the biopsy line needs to be drawn from the bottom and reach the tumor (black spot in the middle) without touching the nerve bundle (circle on the bottom left filled with small dots) or the blood vessel (circle on the bottom right filled with big dots). Otherwise, the image is considered a surgical accident. All images were scanned and processed for machine learning (see detailed processing steps in Sect. 4.2).
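The accident criterion can be checked automatically. The MATLAB sketch below illustrates one way to flag a traced biopsy path as a surgical accident when any of its pixels falls inside the nerve-bundle or blood-vessel circle; the function name and the circle geometry passed to it are hypothetical, not the actual template parameters used in the study.

    % Minimal sketch (MATLAB): flag a biopsy path as a "surgical accident" if it
    % touches any forbidden circle. Geometry inputs are illustrative placeholders.
    function accident = isAccident(pathMask, centers, radii)
    % pathMask : logical image of the extracted biopsy-line pixels
    % centers  : k-by-2 circle centers [row, col] (nerve bundle, blood vessel, ...)
    % radii    : k-by-1 circle radii in pixels
    [r, c] = find(pathMask);                       % coordinates of the drawn line
    accident = false;
    for k = 1:numel(radii)
        d2 = (r - centers(k,1)).^2 + (c - centers(k,2)).^2;
        if any(d2 <= radii(k)^2)                   % any path pixel inside this circle?
            accident = true;
            return;
        end
    end
    end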

3.3 Deep Learning Algorithms for Pattern Comparison and Feature Extraction

Machine learning algorithms are a set of methods that automatically detect features and patterns in data, which can be used for classification and decision-making. Machine learning, as a scientific discipline, is widely used in many areas [17]. This has been even more so since Hinton et al. demonstrated that the training process can be accelerated by using a deep belief network and efficient gradient calculation via contrastive divergence [12]. In this paper, we are interested in exploring applications of a deep learning algorithm to quantify the features and patterns of the surgical procedures illustrated by the two abstract scenarios, so that surgical outcomes of an experienced surgeon and a resident-in-training can be objectively compared.

The classical deep learning algorithm is built on neural networks by stacking single-layer Restricted Boltzmann Machines (RBMs) on top of each other to form the so-called Deep Belief Network (DBN) [12]. Recognizing the difficulty of training a densely connected, directed belief net with many hidden layers, Hinton et al. pointed out that the true conditional distribution is poorly approximated, due either to unwarranted independence assumptions or to poor scalability as the number of parameters increases. To overcome this challenge, they presented the DBN model, in which the top two hidden layers form an undirected associative memory and the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image. This algorithm extracts features and patterns in the form of compact feature vectors. By comparing these vectors, it is possible to adjust the model parameters so that the predicted data move closer to the given training data. Thus, this algorithm can be used as a form of unsupervised learning. The learning process of a resident-in-training is very similar: medical students learn how to perform surgery through a show-and-tell apprenticeship. However, it is very difficult for a resident-in-training to translate what he or she hears and sees into hand motions precisely. With machine learning to quantify the difference, it is possible to make this translation in a more precise manner so that medical training can be more efficient.
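To make the building block concrete, the MATLAB sketch below shows a single contrastive-divergence (CD-1) update for one binary RBM layer, with illustrative sizes matching the first layer used later in Sect. 4.2; it is a simplified stand-in for the idea, not the modified Hinton code actually used in this work.

    % Minimal sketch (MATLAB): one CD-1 update for a single RBM layer.
    numVis = 4410; numHid = 2100; batch = 96; lr = 0.1;   % illustrative sizes and learning rate
    W = 0.01 * randn(numVis, numHid);                     % weights
    b = zeros(1, numVis); c = zeros(1, numHid);           % visible and hidden biases
    data = rand(batch, numVis);                           % stand-in for one batch of images

    % positive phase: hidden probabilities and samples given the data
    hProb  = 1 ./ (1 + exp(-(data * W + repmat(c, batch, 1))));
    hState = hProb > rand(batch, numHid);

    % negative phase: one Gibbs step to reconstruct the visible layer
    vProb  = 1 ./ (1 + exp(-(hState * W' + repmat(b, batch, 1))));
    hProb2 = 1 ./ (1 + exp(-(vProb * W + repmat(c, batch, 1))));

    % contrastive divergence (CD-1) parameter updates
    W = W + lr * (data' * hProb - vProb' * hProb2) / batch;
    b = b + lr * mean(data - vProb);
    c = c + lr * mean(hProb - hProb2);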

Deep learning is perhaps one of the most rapidly growing fields within machine learning (see a collection of review papers in [18] and the latest review in [19]). In this paper, we have adopted Hinton’s deep learning MATLAB code (http://www.cs.toronto.edu/~hinton/) with modifications so that it is applicable to the images obtained from the two abstract surgical scenarios. We used both the DBN classifier and the autoencoder to process the image data and search for features and patterns between the reference images, representing the surgical outcome of an experienced surgeon, and the images generated by the fourteen participants. The learning rate and other parameters in the code were adjusted for optimal learning.

4 Results and Discussion

4.1 Virtual Surgical Simulation Environment

To ensure training effectiveness, our Virtual Surgical Training (VST) System is designed with built-in advanced features for monitoring, alarming, and engineering changes based on increased knowledge of biomechanical interactions during surgeries. To make the VST System more useful for medical education, adverse events are built in as unexpected emergency scenarios, based on human error or operational patient risk factors documented in the literature or in real surgical cases. In the VST System, when an emergency scenario arises, the resident-in-training is expected first to recognize the problem, then to assess the extent of the problem, and finally to formulate a solution and proceed to perform the surgery with the formulated solution in a systematic, step-wise, methodical manner. In almost all emergency cases, time is a critical factor. All the steps mentioned above have to be accomplished within a few minutes, with or without the availability of additional consultation and assistance from more experienced surgeons.

Our current prototype system (Fig. 1, see also [20]) uses the Oculus Rift virtual reality headset to provide an immersive 3D visualization environment for guiding and controlling simulations. Other viewers are able to watch on the large tiled screen, called the VizWall, which serves as an effective education tool. In addition, the MPE environment (provided by TACC at the University of Texas at Austin) makes loading data, models, and animations from a variety of sources easy and intuitive. The data can contain references to 3D models, animations, CSV point clouds, CSV link clouds, and CSV/XML topological data. Point clouds can contain additional metadata such as vectors, colors, scales, rotations, and an OBJ model name or number. Also, the data-driven visualization can be expanded with the Fruchterman-Reingold force-directed graph algorithm, and the parameters of that expansion can be changed in the data-driven XML file. Furthermore, after the data are loaded, the user can easily program visualizations that dynamically modify point locations and links.

4.2 Training Results and Applications

To emulate the incision procedure, we had participants trace the letters on the templates as quickly as possible, which introduces some variation in the lines so that they resemble scalpel cuts along marked traces. For the biopsy procedure, we had participants start at the bottom edge of each figure and draw a straight line up toward the black dot (tumor region), which signifies the presence of a cancer that needs a biopsy or other surgical intervention. After scanning all marked images, we obtained a dataset of images numbered and grouped by participant.

Each of our fourteen participants marked 80 rows similar to those shown in Fig. 4. The template rows of letters are evenly distributed across 20 sheets of paper with 4 rows per sheet. Similarly, the biopsy lines were drawn by each participant with 4 rows per sheet and 20 sheets per participant (Fig. 5). The participants used a colored pen, which makes the image-processing step of extracting the marked lines much simpler.

Fig. 4.

One row of images from the letter-tracing image dataset

Fig. 5.

One row of surgical lines from the image dataset

The extraction process for the marked images is straightforward. First, we use a ScanSnap S510M scanner to scan all of the sheets. Each participant’s markings for both letters and surgical lines are scanned into one PDF file. We use a GIMP 2 plugin to quickly convert the PDFs into PNG images and save them to separate directories numbered 1 to 14, one for each participant. Some of the template sheets were scanned in upside down; rather than flipping them manually, the MATLAB flipud() function makes the correction simple to program, and the end result is the same. Given images of letter markings, each containing four rows of C, D, Δ, H, P, and S, we isolate the bounds of each letter on the page and extract the colored marking. During this extraction, the marking image is downsampled and converted to a 63 by 70 pixel grayscale image consisting of double-precision floating-point values between 0.0 and 1.0. The purpose of downsampling is to reduce the total amount of data per image and thereby reduce the total dataset size. Each of the isolated grayscale images is added to a MATLAB matrix named “letters”. Similarly, given images containing four rows of surgical line markings, we isolate the bounds of each box containing a surgical scenario with line markings and perform a similar colored-pen extraction tailored to white backgrounds. For both of these extractions, we wrote a MATLAB preprocessing script to obtain the dataset. Each of the isolated grayscale images is added to a MATLAB matrix named “surglines”.
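The following MATLAB fragment sketches this extraction and downsampling for one cropped letter region; the file name, the crop bounds, and the color threshold are illustrative placeholders, whereas the 63 by 70 target size matches the dataset described above.

    % Minimal sketch (MATLAB): extract one colored marking and downsample it.
    letters = [];                                            % one row per extracted image
    img  = im2double(imread('participant01_page03.png'));    % hypothetical scanned sheet
    crop = img(200:600, 100:450, :);                         % illustrative bounds of one letter

    % keep pixels where the colored pen (here assumed red) dominates the printed template
    mask    = crop(:,:,1) > 0.5 & crop(:,:,2) < 0.4 & crop(:,:,3) < 0.4;
    marking = double(mask);                                  % binary image of the marking

    % downsample to 63-by-70 grayscale doubles in [0, 1] to shrink the dataset
    small = imresize(marking, [63 70]);
    small = min(max(small, 0), 1);                           % clamp interpolation overshoot
    letters(end+1, :) = small(:)';                           % 1-by-4410 row added to "letters"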

Next, the “letters” matrix was converted into batches of 96 letters and processed with a Deep Belief Network (DBN). First, we isolate the extent of each letter or surgical line marking and produce 4 random shifts that keep each marking within the bounds of the image. We use an autoencoder DBN and pre-train a series of stacked Restricted Boltzmann Machines (RBMs) with layer sizes 4410-2100-1050-525-30. To train the RBMs, we employ the wake-sleep algorithm with one step of the Contrastive Divergence algorithm per epoch (CD1). Then, we unroll the network into a neural network with layer sizes 4410-2100-1050-525-30-525-1050-2100-4410 and train it by applying conjugate gradient with 4 line searches. On an 8-core Intel machine at 3.5 GHz with 32 GB of RAM, this training process takes approximately 18 h to reach a batch test set MSE of 39.28 after 131 epochs (Fig. 6). However, we expect that the code will run significantly faster on a GPU-based computer. We apply a similar technique to the “surglines” matrix, but with a batch size of 100 and an input/reconstruction layer of size 7007, due to the larger image size of 91 by 77 pixels. After training, we obtain a batch test set MSE of 8.59 after 101 epochs of conjugate gradient.
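As an illustration of how each image is reduced to a 30-value code, the MATLAB sketch below runs one image through the unrolled 4410-2100-1050-525-30-525-1050-2100-4410 network. The random weights stand in for the trained parameters, and the logistic hidden layers with a linear 30-unit code layer are an assumption following the usual Hinton-style autoencoder; the sketch shows the data flow rather than the actual trained model.

    % Minimal sketch (MATLAB): forward pass through the unrolled autoencoder.
    sizes = [4410 2100 1050 525 30];
    for k = 1:4                                              % random placeholders for trained weights
        W{k}   = 0.01 * randn(sizes(k),   sizes(k+1)); b{k}   = zeros(1, sizes(k+1));  % encoder
        W{9-k} = 0.01 * randn(sizes(k+1), sizes(k));   b{9-k} = zeros(1, sizes(k));    % decoder
    end
    sigm = @(z) 1 ./ (1 + exp(-z));

    x = rand(1, 4410);                                       % stand-in for one 63-by-70 image
    % encoder: three logistic layers followed by a linear 30-unit code layer
    code  = sigm(sigm(sigm(x*W{1}+b{1})*W{2}+b{2})*W{3}+b{3})*W{4} + b{4};   % 1-by-30 code
    % decoder: mirror of the encoder reconstructing the image
    recon = sigm(sigm(sigm(sigm(code*W{5}+b{5})*W{6}+b{6})*W{7}+b{7})*W{8} + b{8});
    mse   = mean((x - recon).^2);                            % per-image reconstruction error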

Fig. 6.

The letters during different phases of training using the deep learning algorithm

At the end of training, we obtain the result (Figs. 7 and 8) that letters and surgical lines can each be represented by a series of 30 floating-point values, which are strongly correlated with the characteristics of the shapes of the letters involved and with the individual characteristics of the participants.

Fig. 7.

Original letters (top) and their reconstructions (bottom) from 30 floating-point values, with a test set MSE of 39.28 after 131 epochs.

Fig. 8.

Original surgical lines (top) and their reconstructions (bottom) from 30 floating-point values, with a test set MSE of 8.59 after 101 epochs.

The plots below (Figs. 9 and 10) show the characteristic vectors of 30 floating-point numbers that capture unique characteristics of the surgical lines and can be used to quantitatively assess surgical technique. Further study may allow the choice of an optimal surgical line for a given surgical scenario, based on the association of characteristic vectors with outcomes, by applying a classification-based Deep Belief Network.
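As a simple illustration of how such a comparison could be automated, the MATLAB sketch below compares a participant’s mean 30-value vector to the reference (template) vector using Euclidean distance; the random placeholder codes and the choice of metric are our assumptions for illustration, not the procedure behind Figs. 9 and 10.

    % Minimal sketch (MATLAB): compare a participant's mean characteristic vector
    % to the reference vector. Placeholder codes stand in for autoencoder output.
    expertCodes  = rand(480, 30);                   % reference codes (80 rows x 6 letters)
    traineeCodes = rand(480, 30);                   % one participant's codes
    expertMean   = mean(expertCodes, 1);            % 1-by-30 reference mean pattern
    traineeMean  = mean(traineeCodes, 1);           % 1-by-30 participant mean pattern (cf. Fig. 9)
    deviation    = norm(traineeMean - expertMean);  % scalar distance-from-reference score
    bar(traineeMean, 'FaceColor', [0.5 0.5 0.5]);   % grayscale bar plot of the mean pattern
    title(sprintf('Mean characteristic vector, deviation = %.3f', deviation));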

Fig. 9.

Typical plots of the mean of the 30 floating-point values for each participant and each of the six letters, shown as grayscale bar plots. Each participant has his or her own unique characteristics and mean pattern that is generally consistent across their tracings (only Participants 1, 2, 3, 4, and 6 are shown here).

Fig. 10.

Typical plots of the mean of the 30 floating-point values for each participant and each of the five surgical scenarios, shown as grayscale bar plots. Each participant has his or her own unique mean pattern that is generally consistent across their surgical lines.

5 Conclusions and Future Work

We have demonstrated a prototype virtual reality surgical system built for open-heart surgery with specific steps and a biopsy operation. We analyzed two abstract surgical scenarios designed to emulate incision and biopsy surgical patterns using a deep learning algorithm. We found that the two surgical patterns generated by each participant can be uniquely characterized by a vector with 30 real-valued (floating-point) components. These vectors can be used to compare how a resident-in-training performs relative to an experienced surgeon. We plan to further investigate the correlation of these characteristic vectors with the patterns generated by various hand motions. We will also study the relationship between these vectors and cutting force, surgical path, duration of each cut, and other surgical factors of interest.