Introduction

In its broad definition, augmented reality (AR) can be understood as technology that augments an operator's ability to perform defined tasks by providing holographic or auditory information [1]. Visual cues are commonly provided through head-mounted devices with transparent displays, which allow computer-generated images and graphics to be overlaid directly on the real world. AR has since emerged as a powerful technology for surgery, with the potential to improve surgical skills, reduce the risk of surgical errors, improve the accuracy of surgical execution, and reduce operating room time [2, 3]. The application field in orthopedics is broad, ranging from instrument and implant positioning to osteotomies, tumor and trauma surgery, and surgical training and education. AR head-mounted displays (HMDs) in particular allow intuitive in situ visualization of the surgical planning and important anatomical structures. This surgeon-centered approach reduces mental load by visualizing additional information directly on the patient's anatomy [4]. These advantages have led to a rising number of preclinical and cadaveric studies on AR-HMDs in orthopedics [5,6,7,8,9,10].

In vivo studies in the operating room (OR), which represent the next step toward the clinical implementation of these innovations, have however rarely been described to date [11, 12]. As most of the reported studies took place in a laboratory environment [5,6,7,8,9,10], the true technical performance of AR-HMDs in a realistic OR scenario still requires further analysis.

The OR places different demands on the device than a controlled laboratory environment. Specific OR conditions such as special LED light sources, movement of OR personnel, reflective surfaces, and a large variety of textures may pose challenges for sensor-based HMD systems. The sensors of these systems, such as the infrared time-of-flight (TOF) sensor of Microsoft's HoloLens 2 (Microsoft, Redmond, WA, USA) [13], exhibit relevant performance differences depending on the aforementioned factors [14,15,16,17,18]. The primary aim of this study was to quantify expected differences in AR-HMD performance between an OR and a controlled laboratory setting. Further, we aimed to identify the factors that might influence the basic technical performance of an AR-HMD-based surgical application in a real OR.

Methods

We hypothesized that an AR-HMD performs differently in a real OR versus a laboratory setting and that different OR conditions have a significant influence on its performance. We conducted two series of experiments. The first series directly compared the performance of the device between a laboratory and an OR setting. The second series was designed to investigate the confounding factors that may be present in an OR.

Experimental setup

A lumbosacral spine phantom (Synbone AG, Zizers, Switzerland) was used. Preoperative CT images (SOMATOM Edge Plus, Siemens Healthcare GmbH, Erlangen, Germany) of the specimen were acquired to conduct the 3D surgical planning. For this purpose, 3D models were generated using the Mimics Medical software (version 19.0; Materialise NV, Leuven, Belgium) and imported into a 3D surgical planning software (CASPA, version 5.26, Balgrist, Zurich, Switzerland). Planning for each vertebra (L1 to L5) was conducted by an orthopedic surgeon.

Test application

The performance of the HoloLens 2 (see Fig. 1) was analyzed using the HoloNavigation application (HoloNavigation version 1.30.0.0, Incremed AG, Zurich, Switzerland). This application has previously been used in laboratory studies [7, 19], and a first-in-man experiment has been reported [11]. The application serves as a representative example since it has been extensively evaluated in the laboratory environment [7, 19] and has also been approved and validated for patient treatment within a highly controlled first-in-man in vivo study [11]. Moreover, the application uses common hardware and software components that are also found in comparable AR-based surgical devices (HoloLens 2, marker-based tracking, surface registration).

Fig. 1 (a) HoloLens 2, (b) PD equipped with Clear Guide marker, (c) drill sleeve equipped with Clear Guide marker

The two main components of the application are the intraoperative registration and the surgical navigation [7, 11, 19]. In our study, we focused only on the evaluation of the registration component. Both components rely on monocular marker tracking and pose estimation using the RGB camera of the HoloLens in combination with optical markers (Clear Guide Medical, Baltimore, MD, USA, see Fig. 1). Their patterns originate from the AprilTags library [20, 21]. The markers have to be mounted on a pointing device (PD; see Fig. 1).

The application workflow was as follows. Upon startup, the user is asked to place the world anchor, which serves as the origin of the coordinate system of the HoloLens. World anchors are essential because they fix holograms to a position in the real world by using features in the anchor's environment to calculate its position. Anchoring is persistent and works even if users change their position significantly. Afterward, the registration step is performed. An initial alignment is achieved by landmark identification, where three points (the spinous process and the left and right transverse processes) are marked using the PD. Next, surface digitization is performed by scanning the surface of the dorsal structures of the vertebra with the PD to collect points for subsequent registration (see Fig. 2a). The collected points are then registered via iterative closest point (ICP) registration to obtain the final alignment [7]. After ICP, the registered surgical planning is displayed in situ as a 3D hologram (see Fig. 2b).
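The initial landmark alignment can be illustrated with a minimal Python sketch. It is not taken from the HoloNavigation application, whose implementation is not described here. It assumes that a rigid transform is derived from the three landmarks by building one orthonormal frame from each landmark triple, which is exact only when the two triangles are congruent. All function names and coordinates are hypothetical, and the subsequent ICP refinement is not shown.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("landmarks must be distinct and non-collinear")
    return [c / norm for c in v]

def frame(p0, p1, p2):
    """Right-handed orthonormal axes (x, y, z) built from three landmarks."""
    x = unit(sub(p1, p0))
    z = unit(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return [x, y, z]

def rotate(R, p):
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def landmark_alignment(model_pts, patient_pts):
    """Return (R, t) so that R * p + t maps model landmarks onto patient landmarks."""
    fm = frame(*model_pts)
    fp = frame(*patient_pts)
    # R sends each axis of the model frame to the matching axis of the patient frame.
    R = [[sum(fp[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = sub(patient_pts[0], rotate(R, model_pts[0]))
    return R, t

def transform(R, t, p):
    q = rotate(R, p)
    return [q[i] + t[i] for i in range(3)]

# Hypothetical landmark coordinates in mm (spinous process, left and right
# transverse process) in the planning model and as digitized with the PD.
model_landmarks = [[0.0, 0.0, 0.0], [-30.0, 10.0, -5.0], [30.0, 10.0, -5.0]]
patient_landmarks = [[10.0, -5.0, 2.0], [0.0, -35.0, -3.0], [0.0, 25.0, -3.0]]
R, t = landmark_alignment(model_landmarks, patient_landmarks)
```

In practice the digitized landmarks are noisy, so such an alignment only provides a starting pose. This is consistent with the workflow above, in which the surface points are then used to obtain the final alignment.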

Fig. 2 View from the HoloLens user during the experiments: (a) scanning of the vertebra surface with the PD, green point and line are already saved points, yellow line represents currently sampled points; (b) displayed 3D hologram of the registered surgical planning and vertebra model

Evaluation procedure

The first test series was performed in standardized setups in the lab (SSlab) and in the OR (SSOR, see Fig. 3). The SSOR was designed based on our own experience with AR-HMDs and on descriptions in the literature of how the performance of the HoloLens sensors depends on environmental conditions [14,15,16,17,18].

Fig. 3 Difference between SSlab (a) and SSOR (b). In both images, one of the resident surgeons can be seen in full surgical attire wearing the HoloLens 2

The SSOR consisted of the aforementioned lumbosacral spine phantom, surgical drapes, a conventional OR table, and two partition walls made of standard OR stands covered with additional drapes (see Fig. 4). For the SSOR setup, the anchor was placed at a distance of one meter at the surgeon's eye level (see Fig. 4), and care was taken to rotate the OR lights away from the situs so that the PD marker was not overexposed by direct light. Partition walls were placed to reduce reflections and movement in the direct vicinity of the HoloLens. Since the OR has more light sources, more reflective surfaces, and more movement, all of these measures were taken to create an optimal environment for the HoloLens. With these measures in place, SSlab and SSOR ultimately differ in that SSOR has more and different light sources, a different floor, and more reflective surfaces in the distance.

Fig. 4 Illustration of the four altered OR setups (AOS): (a) OR light setup, red = light focused on 3D model, yellow = anchor location; (b) assistant setup, red = assistant, yellow = anchor location; (c) anchor setup, red and yellow = anchor location; (d) partition setup, red = missing partition, yellow = anchor location

The second test series was performed in the OR, whereby different parameters of the SSOR were systematically changed as follows:

  • OR light setup (AOSlight): The OR lights were positioned directly above the model and the markers.

  • Assistant setup (AOSassistant): An assistant positioned himself on the opposite side and performed a standardized movement sequence in randomized order. The sequence included approaching the OR table, simulating surgery assistance, occluding the anchor, and walking away to the instrument table.

  • Anchor setup (AOSanchor): The anchor was placed far away from the OR table.

  • Partition setup (AOSpartition): No partitioning was used in contrast to the SSOR.

Performance analysis

The experiments were performed by two resident surgeons, both with initial experience using the test application, each performing half of the trials.

Five parameters were defined and assessed (see Table 1). Registration accuracy was defined as the maximum error of the hologram superimposition on the bone model (see Fig. 2) in mm, measured by the surgeon using the PD diameter (3 mm) as a reference. Time to full registration (in seconds) was recorded from the start of the initial landmark registration to the end of the point cloud acquisition. The number of rejected point cloud acquisitions was recorded. The lag of the live superimposition of the PD hologram onto the PD model, defined as the update speed of the hologram during movement, was graded by the surgeon as 1 (bad), 2 (acceptable), or 3 (excellent). Failures were defined as either a registration failure (> 3 rejected point clouds) or a superimposition failure (> 45° angulation or a shift of the hologram by more than half a vertebra) and were recorded. In the event of a failure, an additional test was added to the test series, since the failed test did not provide any usable data on the other parameters.
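The failure definitions can be written down as a small Python sketch. The thresholds of three rejected point clouds and 45° angulation come from the text. The record layout, the field names, and the use of a measured vertebra width as a numeric stand-in for "half vertebra" are assumptions made for illustration only. In the study these judgments were made by the surgeon.

```python
from dataclasses import dataclass
from typing import Optional

MAX_REJECTED_POINT_CLOUDS = 3   # more than this counts as a registration failure
MAX_ANGULATION_DEG = 45.0       # more than this counts as a superimposition failure

@dataclass
class Trial:
    # Hypothetical record of one registration attempt.
    rejected_point_clouds: int
    angulation_deg: float       # angular deviation of the hologram
    shift_mm: float             # translational deviation of the hologram
    vertebra_width_mm: float    # assumed reference size for "half vertebra"

def failure_type(trial: Trial) -> Optional[str]:
    """Return 'registration', 'superimposition', or None for a usable trial."""
    if trial.rejected_point_clouds > MAX_REJECTED_POINT_CLOUDS:
        return "registration"
    if (trial.angulation_deg > MAX_ANGULATION_DEG
            or trial.shift_mm > trial.vertebra_width_mm / 2):
        return "superimposition"
    return None

def usable_trials(trials):
    """Keep only trials without a failure. Failed trials were repeated in the study."""
    return [t for t in trials if failure_type(t) is None]
```

Keeping the failure check separate from the other outcome parameters mirrors the procedure described above, where a failed attempt yields no usable values and is replaced by an additional test.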

Table 1 List of outcome measurements

Statistics

Based on preliminary testing, a sample size calculation was performed (alpha: 0.05, power: 80%) assuming a median registration accuracy of 2 ± 2.5 mm in the SSlab setting, in order to detect a decrease in registration accuracy of 1 mm in the SSOR setting. The calculation resulted in a total of 272 trials to establish a difference between the SSlab and SSOR settings. To investigate the difference between the SSOR and AOS settings, we performed another sample size calculation. Since we compared multiple AOS settings with the SSOR setting, the alpha error was Bonferroni corrected to 0.0125. Power was set to 80%. Assuming a registration accuracy of 2 ± 2.5 mm in the SSOR setting, detecting a decrease of 1 mm would require 136 trials for SSOR and 67 trials for each AOS.
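The corrected alpha level follows from dividing the nominal level by the number of comparisons. The short Python sketch below reproduces the value of 0.0125 under the assumption that the divisor is four, one comparison per AOS setting against SSOR. The function name is illustrative, and the sample size calculation itself is not reproduced because its exact method is not specified.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons

# Assumed: four comparisons (AOSlight, AOSassistant, AOSanchor, AOSpartition vs. SSOR).
alpha_aos = bonferroni_alpha(0.05, 4)
```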

Descriptive statistics were reported as median and IQR for skewed data and as counts and percentages for categorical variables. Normal distribution was assessed using the Shapiro–Wilk test. Differences between the settings (SSlab vs. SSOR, and SSOR vs. each of the four AOS) were evaluated using the Mann–Whitney U test for skewed numerical data and the Chi-square or Fisher's exact test for categorical variables. The significance level was set at p < 0.05. Data were analyzed with SPSS version 23 (SPSS Inc, Chicago, IL, USA).
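For readers without access to SPSS, the descriptive statistics and the Mann–Whitney U test can be sketched in plain Python. This is an illustration rather than the procedure used in the study. It assumes a two-sided test based on the normal approximation with tie correction and without continuity correction, and it uses the "inclusive" quartile definition of the standard library, so values may differ slightly from SPSS output. The sample values in the usage lines are invented.

```python
import math
from statistics import median, quantiles

def median_iqr(values):
    """Median and interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1

def average_ranks(values):
    """1-based ranks with ties sharing their average rank, plus tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:
        return min(u1, u2), 1.0           # all observations identical
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return min(u1, u2), p

# Invented registration times in seconds, for illustration only.
lab_times = [30, 31, 33, 35, 36, 40]
or_times = [41, 44, 47, 50, 52, 60]
u_stat, p_value = mann_whitney_u(lab_times, or_times)
```

For small samples an exact test is usually preferable to the normal approximation. The approximation is used here only to keep the sketch free of external dependencies.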

Results

A total of 560 trials were conducted: 273 in the SSlab and SSOR settings and 287 in the AOS settings (see Fig. 5). Of the 273 standardized-setup trials, 136 were performed with SSlab and 137 with SSOR. Of the 287 AOS trials, 74 were conducted in the AOSlight, 69 in the AOSassistant, 70 in the AOSanchor, and 74 in the AOSpartition setting.

Fig. 5 Flowchart of trial selection

Registration accuracy

A median registration accuracy of 2 mm (IQR 0.5 mm) was achieved in SSlab and 2 mm (IQR 2 mm) in SSOR (see Table 2). There was no significant difference in registration accuracy (p = 0.947).

Table 2 Comparison between SSlab and SSOR

A significant difference in registration accuracy between SSOR and AOSlight (3 mm (IQR 2 mm), p < 0.001, see Table 3) and AOSanchor (1.5 mm (IQR 1 mm), p = 0.012, see Table 4), respectively, was observed. The remaining two AOS settings had no significant effect on registration accuracy (p = 1.00 and p = 1.00).

Table 3 Comparison between SSOR, AOSlight, and AOSassistant
Table 4 Comparison between SSOR, AOSanchor, and AOSpartition

Time to full registration

We found a significant difference between the SSlab and SSOR settings in terms of the time required for completion of registration (33 s (IQR 10 s) vs. 48 s (IQR 24 s), p < 0.01, see Table 2).

Statistically significant differences regarding time to full registration between the SSOR and the following AOS settings could be shown:

  • AOSlight: median time 63.5 s (IQR 21 s, p < 0.01).

  • AOSassistant: median time 53 s (IQR 12 s, p = 0.032).

  • AOSanchor: median time 42 s (IQR 9 s, p < 0.01).

Removal of the partition (AOSpartition) did not significantly change the time to full registration (46.5 s, IQR 18 s, p = 1.0).

Number of rejected point cloud acquisitions

No significant difference between the SSlab and SSOR setting was found regarding the number of rejected point clouds (p = 0.898, see Table 2).

A significant difference in the number of rejected point clouds could be observed only between the SSOR and AOSlight settings (p = 0.004, see Table 3).

Lag of the live superimposition hologram

There was no significant difference in hologram lag between the SSlab and the SSOR settings (p = 0.327). However, we found significant differences between the SSOR and the AOSlight setting (p < 0.001, see Table 3) as well as between the SSOR and the AOSassistant setting (p < 0.001, see Table 3).

Failed attempts

The number of failed attempts did not differ significantly between the SSlab and SSOR settings (p = 1). However, in a direct comparison between the SSOR setting and the four AOS settings (light, assistant, anchor, and partition), we found significant differences in the number of failures for AOSlight and AOSpartition (p = 0.032 and p = 0.032, respectively).

Discussion

Despite increasing research on the use of AR in orthopedics, systematic evaluations of differences in AR-HMD performance between the laboratory and the OR setting have been lacking so far. We found that such differences exist (e.g., in time to full registration) and that they can be influenced by variations of the OR setting, such as the light conditions.

The greatest effect was found in the AOSlight trials. Direct light illumination led to a significant worsening of all analyzed parameters. The narrow light cone led to a position-dependent illumination of the PD, ranging from strongly to poorly illuminated sections. In addition, the marker surface itself sometimes reflected the light strongly. The observed worse performance of the AR-HMD could be explained by impaired functioning of the infrared sensor and of the RGB and grayscale cameras due to increased noise in underexposed areas or direct light exposure of these sensors [14, 15]. This is of special importance for the autofocus of the RGB camera and, consequently, for the marker tracking in the application used. The high range of contrast between strongly and weakly exposed areas, as well as the strong difference in light intensity on the marker depending on its position inside or outside the OR light, could exceed the capture range of the camera, potentially resulting in poor tracking performance. The clinical conclusion is to avoid direct light on the HoloLens and to switch off or turn away the OR lights during the registration step to avoid over- and underexposed areas.

We observed a decreased performance of the HoloLens 2 regarding the time required for registration and hologram lag in the AOSassistant setting. This could be due to motion blur, a common influencing factor in camera tracking that has been shown to negatively impact performance [14, 22, 23]. Since spatial features are constantly tracked by the TOF and grayscale cameras, the motion of the assistant could have led to a distortion, resulting in the observed increased lag and thus a longer time to complete the registration. Additionally, our experience during the tests showed that fast movements of the PD led to more errors, e.g., in point collection, which could also be explained by motion blur during marker tracking with the RGB camera. The practical conclusion is to collect points slowly and steadily and to avoid unnecessary movements in the OR.

In the AOSanchor trials, the results contradicted our hypothesis that remote placement of the world anchor would limit performance because distance estimation worsens with increasing distance to the sensor [15, 17, 18]. Presumably, the environment tracking is sufficient directly after the anchor placement, which is why a measurable performance decrease was not observed.

In AOSpartition, we observed an increased number of registration failures. The decrease in performance without the partition could be attributed to the additional reflective surfaces, the greater distance to the wall on the opposite side of the examiner, and/or an increased effect of ambient light. Tracking via infrared camera is more prone to errors as the distance between object and sensor increases [15, 17, 18]. Furthermore, reflective surfaces have been shown to reduce the performance of the infrared sensor [14,15,16]. In addition, a greater variation in surface materials could also reduce the performance of infrared camera tracking [15, 18, 24]. Grayscale cameras also seem to be influenced by different ambient light conditions [25]. Since these are used for the HoloLens self-localization, increased ambient light due to the removal of the partition could be a further factor contributing to a decrease in performance.

Our findings show that the performance of state-of-the-art AR-HMDs is influenced by a variety of factors present in the OR setting. Our results identified direct light illumination, motion blur caused by movement of personnel, and increased reflections as factors that may impair the performance of the HoloLens. However, measures can be taken to standardize the OR environment such that the performance is less impaired. This is underpinned by the fact that registration time was the only statistically significant difference between the standardized OR setup (SSOR) and the laboratory environment. Recommended standardization measures include avoiding direct light illumination on the situs, setting up partitions, and minimizing the movement of personnel. If these measures are taken, a similar registration accuracy, number of failed attempts, lag, and number of rejected point clouds can be expected. However, the time to full registration remains increased. Based on these experiments, we recommend taking sufficient time for the registration step in order to minimize retries.

Limitations

One limitation of our study is that the method for assessing the registration accuracy of the hologram overlay is examiner-dependent. We tried to address this limitation by employing two examiners. However, the learning curve of the two surgeons is a further factor possibly influencing the results of this study. To reduce the effect of learning on the measured outcomes, the order in which the surgeons performed the different series of experiments was randomized. Additionally, both surgeons had similar experience in the use of the application due to previous pilot tests. Another limitation is that intervention-specific and patient-specific factors were not analyzed. Their influence must be considered in in vivo surgery, and further research on these factors is urgently needed. However, this work is one of the first to analyze the environmental impact of the OR on the performance of a state-of-the-art AR-HMD.

Conclusion

AR-HMDs are affected by different OR setups. Standardization measures for better AR-HMD performance include avoiding direct light illumination on the situs, setting up partitions, and minimizing the movement of personnel. Although AR has arrived in the OR, further improvements and research are necessary to establish the technology in surgical treatment. In particular, the combination of AR and AI will foster this process.