Factors affecting augmented reality head-mounted device performance in real OR

In recent years, interest in and efforts toward implementing augmented reality (AR) in orthopedic surgery through head-mounted devices (HMD) have increased. However, the majority of experiments were preclinical and conducted within a controlled laboratory environment. The operating room (OR) is a more challenging environment with various confounding factors potentially affecting the performance of an AR-HMD. The aim of this study was to assess the performance of an AR-HMD in a real-life OR setting. An established AR application using the HoloLens 2 HMD was tested in an OR and in a laboratory by two users. The accuracy of the hologram overlay, the time to complete the trial, the number of rejected registration attempts, the delay in live overlay of the hologram, and the number of completely failed runs were recorded. Further, different OR setting parameters (light condition, setting up partitions, movement of personnel, and anchor placement) were modified and compared. Time for full registration was higher in the OR, at 48 s (IQR 24 s), than in the laboratory setting, at 33 s (IQR 10 s) (p < 0.001). The other investigated parameters did not differ significantly if an optimal OR setting was used. Within the OR, the strongest influence on the performance of the AR-HMD came from the light conditions, with direct light illumination on the situs being the least favorable. AR-HMDs are affected by different OR setups. Standardization measures for better AR-HMD performance include avoiding direct light illumination on the situs, setting up partitions, and minimizing the movement of personnel.


Introduction
In its broad definition, augmented reality (AR) can be understood as technology capable of augmenting the operator's capabilities to perform defined tasks by providing holographic or auditory information [1]. Visual cues are commonly provided through head-mounted devices with transparent displays, which allow computer-generated images and graphics to be overlaid directly on the real world. Meanwhile, AR has emerged as a powerful technology for surgery with the potential of improving surgical skills, reducing the risk of surgical errors, improving the accuracy of surgical execution, and reducing operating room time [2,3]. The application field in orthopedics is broad, ranging from instrument/implant positioning to osteotomies, tumor and trauma surgery, and surgical training and education. AR head-mounted displays (HMD) in particular allow for intuitive in situ visualization of surgical planning and important anatomical structures. This surgeon-centered approach reduces mental load by visualizing additional information directly on the patient's anatomy [4]. These advantages have led to a rising number of preclinical and cadaveric studies on AR-HMD in orthopedics [5][6][7][8][9][10].
In vivo studies in the operating room (OR), representing the next step toward the clinical implementation of these innovations, have, however, scarcely been described to date [11,12]. As most of the reported studies take place in a laboratory environment [5][6][7][8][9][10], the true technical performance of AR-HMDs in a realistic OR scenario still requires further analysis.
The OR places different demands on the device compared to a controlled laboratory environment. Specific OR conditions such as special LED light sources, movement of OR personnel, reflective surfaces, and a large variance of different textures may pose challenges for sensor-based HMD systems. The sensors of these systems, such as the infrared time-of-flight (TOF) sensor of Microsoft's HoloLens 2 (Microsoft, Redmond, WA, USA) [13], exhibit relevant performance differences depending on the aforementioned factors [14][15][16][17][18]. The primary aim of this study was to quantify expected differences in AR-HMD performance in an OR versus a controlled laboratory setting. Further, we aimed to find the factors that might influence the basic technical performance of an AR-HMD-based surgical application in a real OR.

Methods
We hypothesized that an AR-HMD performs differently in a real OR versus a laboratory setting and that different OR conditions have a significant influence on its performance. We conducted two series of experiments. The first series directly compared the performance of the device between a laboratory and an OR setting. The second series was designed to investigate the confounding factors that may be present in an OR.

Experimental setup
A lumbosacral spine phantom (Synbone AG, Zizers, Switzerland) was used. Preoperative CT images (SOMATOM Edge Plus, Siemens Healthcare GmbH, Erlangen, Germany) of the specimen were acquired to conduct the 3D surgical planning. For this, 3D models were generated using the Mimics Medical software (version 19.0; Materialise NV, Leuven, Belgium) and imported into a 3D surgical planning software (CASPA, version 5.26, Balgrist, Zurich, Switzerland). Planning for each vertebra (L1 to L5) was conducted by an orthopedic surgeon.

Test application
The performance of the HoloLens 2 (see Fig. 1) was analyzed using the HoloNavigation application (HoloNavigation version 1.30.0.0, Incremed AG, Zurich, Switzerland). This application has previously been used in laboratory studies [7,19] and a first-in-man experiment has been reported [11]. The application serves as a representative example since it has been extensively evaluated in the laboratory environment [7,19] and has also been approved and validated for patient treatment within a highly controlled first-in-man in vivo study [11]. Moreover, the application utilizes commonly used hardware and software components that are also found in comparable AR-based surgical devices (HoloLens 2, marker-based tracking, surface registration).
The two main components of the application are intraoperative registration and surgical navigation [7,11,19]. In our study, we focus only on the evaluation of the registration component. Both components rely on monocular marker tracking and pose estimation using the RGB camera of the HoloLens in combination with optical markers (Clear Guide Medical, Baltimore, MD, USA, see Fig. 1). Their patterns originate from the AprilTags library [20,21]. The markers have to be mounted on a pointing device (PD; see Fig. 1).
The application workflow was as follows. Upon startup, the user is asked to place the world anchor, which serves as the origin of the coordinate system of the HoloLens. World anchors fix holograms to a position in the real world by using features in the anchor's environment to calculate that position. Anchoring is persistent and continues to work even if users change their position significantly. Afterward, the registration step is performed. An initial alignment is achieved by landmark identification, where three points (the spinous process and the left and right transverse processes) are marked using the PD. Next, surface digitization is performed by scanning the surface of the dorsal structures of the vertebra with the PD to collect points for subsequent registration (see Fig. 2a). The collected points are then registered via iterative closest point (ICP) registration to obtain the final alignment [7]. After ICP registration, the registered surgical planning is displayed in situ as a 3D hologram (see Fig. 2b).
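The two-stage registration described above (landmark-based initial alignment followed by ICP refinement) can be illustrated with a minimal NumPy sketch. It is not the HoloNavigation implementation: the point-to-point ICP variant, the brute-force nearest-neighbor search, the function names, and the synthetic "vertebra surface" are all assumptions chosen for illustration, and outlier rejection is omitted.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rigid transform (R, t) with dst ~ src @ R.T + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def rms_to(points, targets, R, t):
    """RMS distance between transformed points and their known targets."""
    diff = points @ R.T + t - targets
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def icp(points, model, R, t, iterations=20):
    """Point-to-point ICP: refine (R, t) so that transformed points lie on model."""
    for _ in range(iterations):
        moved = points @ R.T + t
        # Brute-force nearest model point for every digitized point.
        d2 = ((moved[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
        nearest = model[d2.argmin(axis=1)]
        R, t = rigid_fit(points, nearest)
    return R, t

# Synthetic model surface on a 2 mm grid (made-up geometry, units in mm).
gx, gy = np.meshgrid(np.arange(0.0, 40.0, 2.0), np.arange(0.0, 30.0, 2.0))
gx, gy = gx.ravel(), gy.ravel()
model = np.column_stack([gx, gy, 3.0 * np.sin(gx / 7.0) * np.cos(gy / 5.0)])

# Hypothetical ground-truth pose of the model in the world (anchor) frame.
a = np.deg2rad(20.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([100.0, -50.0, 25.0])

def to_world(p):
    return p @ R_true.T + t_true

# Step 1: three landmarks, each picked with a 0.1 mm error in the world frame.
landmarks_model = model[[0, 19, 280]]
landmarks_world = to_world(landmarks_model) + 0.1 * np.eye(3)
R0, t0 = rigid_fit(landmarks_world, landmarks_model)

# Step 2: digitized surface points (noise-free here) refined with ICP.
scan_model = model[::7]
scan_world = to_world(scan_model)
rms_initial = rms_to(scan_world, scan_model, R0, t0)
R, t = icp(scan_world, model, R0, t0)
rms_final = rms_to(scan_world, scan_model, R, t)
print(f"RMS after landmarks: {rms_initial:.3f} mm, after ICP: {rms_final:.2e} mm")
```

The sketch estimates the world-to-model transform; its inverse would place the planning hologram in the world frame. With real, noisy digitization the final error would not vanish as it does in this idealized example.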

Evaluation procedure
The first test series was performed in standardized setups in the lab (SSlab) and OR (SSOR, see Fig. 3). The SSOR was designed based on our own experience with AR-HMDs and on descriptions from the literature regarding the performance of the HoloLens sensors depending on environmental conditions [14][15][16][17][18].
It consisted of the aforementioned lumbosacral spine phantom, surgical drapes, a conventional OR table, and two partition walls consisting of standard OR stands covered with additional drapes (see Fig. 4). For the SSOR setup, the anchor was placed at a distance of one meter at the eye level of the surgeon (see Fig. 4), and care was taken to rotate the OR lights away from the situs so that the PD marker was not overexposed by direct light. Partition walls were placed to reduce reflections and movement in the direct vicinity of the HoloLens. Since there are more light sources, more reflective surfaces, and more movement in the OR setup, all aforementioned measures were taken to create an optimal environment for the HoloLens. With these measures in place, SSlab and SSOR ultimately differ in that SSOR has more and different light sources, a different floor, and more reflective surfaces in the distance.
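The second test series was performed in the OR, whereby different parameters of the SSOR were systematically changed as follows:
• AOSlight: direct light illumination on the situs.
• AOSassistant: movement of an assistant in the OR.
• AOSanchor: remote placement of the world anchor.
• AOSpartition: removal of the partition.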

Performance analysis
The experiments were performed by two resident surgeons with initial experience using the test application, each performing half of the trials. Five parameters were defined and assessed (see Table 1). Registration accuracy was defined as the maximum error of the hologram superimposition on the bone model (see Fig. 2) in mm, measured by the surgeon using the PD diameter (3 mm) as a reference. Time to full registration (seconds) was recorded from the start of initial landmark registration to the end of the point cloud acquisition. The number of rejections was recorded. The lag of the live superimposition of the PD hologram onto the PD model, defined as the update speed of the hologram during movement, was graded as 1 (bad), 2 (acceptable), or 3 (excellent) by the surgeon.
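As a compact summary of the outcome parameters and their units, one trial could be recorded as in the following sketch. The class and field names are hypothetical and not taken from the study's dataset; the fifth parameter (failed attempt) is inferred from the Results headings, and the example values are made up.

```python
from dataclasses import dataclass

# Lag grading scale as described in the text.
LAG_GRADES = {1: "bad", 2: "acceptable", 3: "excellent"}

@dataclass(frozen=True)
class TrialRecord:
    """One registration trial (hypothetical layout, not the study's schema)."""
    setting: str                  # e.g. "SSlab", "SSOR", "AOSlight"
    accuracy_mm: float            # maximum hologram overlay error, in mm
    registration_time_s: float    # landmark start to end of point cloud acquisition
    rejected_point_clouds: int    # number of rejected acquisitions
    lag_grade: int                # 1 (bad), 2 (acceptable), 3 (excellent)
    failed: bool                  # True if the run failed completely

    def __post_init__(self):
        if self.lag_grade not in LAG_GRADES:
            raise ValueError(f"lag_grade must be 1, 2, or 3, got {self.lag_grade}")
        if min(self.accuracy_mm, self.registration_time_s,
               self.rejected_point_clouds) < 0:
            raise ValueError("measurements cannot be negative")

# Made-up example values, for illustration only.
example = TrialRecord("SSOR", 2.5, 51.0, 1, 3, False)
print(example.setting, LAG_GRADES[example.lag_grade])
```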

Statistics
Based on preliminary testing, a sample size calculation was performed (alpha: 0.05, power: 80%) assuming a median registration accuracy of 2 ± 2.5 mm in the SSlab setting to detect a decrease in registration accuracy of 1 mm in the SSOR setting. The calculation resulted in a total number of 272 trials to establish a difference between SSlab and SSOR settings. To investigate the difference between the SSOR and AOS settings, we performed another sample size calculation. Since we compared multiple AOS trials with SSOR trials, the alpha error was Bonferroni corrected to 0.0125. Power was set to 80%. Assuming a registration accuracy of 2 ± 2.5 mm in the SSOR setting, detecting a decrease of 1 mm would require 136 and 67 trials for SSOR and for each AOS, respectively. Descriptive statistics were reported as median and IQR for skewed data and as counts and percentages for categorical variables. Normal distribution was assessed using the Shapiro-Wilk test. Differences between the settings (SSlab vs. SSOR and SSOR vs. each of the four AOS) were evaluated using the Mann-Whitney U test for skewed numerical data and the Chi-square or Fisher's exact test for categorical variables. The significance level was set at p < 0.05. Data were analyzed with SPSS version 23 (SPSS Inc, Chicago, IL, USA).
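The Bonferroni-corrected threshold follows from dividing the alpha error by the four SSOR-versus-AOS comparisons (0.05 / 4 = 0.0125). The sketch below, using only the Python standard library, shows this arithmetic, the equivalent formulation with adjusted p-values, and a median/IQR summary. The helper names and all input values are hypothetical; this section does not state which of the two Bonferroni forms was applied to the reported p-values, and the quartile method used by SPSS may differ from the one chosen here.

```python
from statistics import median, quantiles

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold."""
    return alpha / n_comparisons

def bonferroni_adjust(p_values):
    """Adjusted p-values (raw p times number of comparisons, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def median_iqr(values):
    """Median and interquartile range (inclusive quartile method, an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1

alpha = 0.05
threshold = bonferroni_alpha(alpha, 4)

# Made-up raw p-values for four comparisons, for illustration only.
raw_p = [0.002, 0.008, 0.02, 0.5]
by_threshold = [p < threshold for p in raw_p]
by_adjusted = [p < alpha for p in bonferroni_adjust(raw_p)]

# Made-up registration times in seconds.
times_s = [30, 32, 33, 35, 40]
print(threshold, by_threshold, by_adjusted, median_iqr(times_s))
```

Both formulations lead to the same decisions: comparing a raw p-value with 0.0125 is equivalent to comparing the adjusted p-value with 0.05.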

Results
A total of 560 trials were conducted: 273 in the SSlab and SSOR settings and 287 in the AOS settings (see Fig. 5). Of the 273 standardized-setup trials, 136 were performed with SSlab and 137 with SSOR. Of the 287 AOS trials, 74 were conducted in the AOSlight, 69 in the AOSassistant, 70 in the AOSanchor, and 74 in the AOSpartition setting.

Registration accuracy
A median registration accuracy of 2 mm (IQR 0.5 mm) was achieved in SSlab and 2 mm (IQR 2 mm) in SSOR (see Table 2). There was no significant difference in registration accuracy (p = 0.947).

Time to full registration
We found a significant difference between the SSlab and SSOR settings in terms of the time required for completion of registration (33 s (IQR 10 s) vs. 48 s (IQR 24 s), p < 0.01, see Table 2). Statistically significant differences regarding time to full registration between the SSOR and the following AOS settings could be shown:
• AOSlight: median time 63.5 s (IQR 21 s, p < 0.01).
• AOSassistant: median time 53 s (IQR 12 s, p = 0.032).
• AOSanchor: median time 42 s (IQR 9 s, p < 0.01).
Removal of the partition (AOSpartition) did not significantly change the time to full registration (46.5 s, IQR 18 s, p = 1.0).

Number of rejected point cloud acquisitions
No significant difference between the SSlab and SSOR setting was found regarding the number of rejected point clouds (p = 0.898, see Table 2).
A significant difference in the number of rejected point clouds could be observed only between the SSOR and AOSlight settings (p = 0.004, see Table 3).

Lag of the live superimposition hologram
There was no significant difference in hologram lag between the SSlab and the SSOR settings (p = 0.327). However, we found significant differences between the SSOR and the AOSlight setting (p < 0.001, see Table 3) as well as between the SSOR and the AOSassistant setting (p < 0.001, see Table 4).

Failed attempts
The number of failed attempts did not differ significantly between the SSlab and SSOR settings (p = 1). However, in direct comparison between the SSOR setting and the four AOS settings (AOSlight, AOSassistant, AOSanchor, and AOSpartition), we found significant differences regarding the number of failures for AOSlight and AOSpartition (p = 0.032 and p = 0.032, respectively).

Discussion
Despite increasing research on the use of AR in orthopedics, systematic evaluations of differences in AR-HMD performance between the laboratory and OR setting have been lacking so far. We found that there are such differences (e.g., time for full registration) and that they can be influenced by variation of the OR setting, such as the light conditions. The greatest effect was found in the AOSlight trials. Direct light illumination led to a significant worsening of all analyzed parameters. The narrow light cone led to a position-dependent illumination of the PD, ranging from strongly to poorly illuminated sections. In addition, the marker surface itself sometimes reflected the light strongly.
The reason for the observed worse performance of the AR-HMD could be impaired functioning of the infrared sensor and of the RGB and grayscale cameras due to increased noise in underexposed areas or direct light exposure of these sensors [14,15]. This is of special importance for the autofocus of the RGB camera and, consequently, for the marker tracking in the application used. The high range of contrast between strongly and weakly exposed areas, as well as the strong difference in light intensity of the marker depending on its position inside or outside the OR light, could exceed the capture range of the camera, potentially resulting in poor tracking performance. The clinical conclusion is to avoid direct light on the HoloLens and to switch off or turn away the OR lights during the registration step to avoid over- and underexposed areas.
We observed a decreased performance of the HoloLens 2 regarding time required for registration and hologram lag in the AOSassistant setting. This could be due to motion blur, a common influencing factor in camera tracking that has been shown to negatively impact performance [14,22,23]. Since spatial features are constantly tracked by the TOF and grayscale cameras, the motion of the assistant could have led to a distortion, resulting in the observed increased lag and thus a longer time to complete the registration. Additionally, our experience during the tests showed that fast movements of the PD led to more errors, e.g., in point collection, which could also be explained by motion blur during marker tracking with the RGB camera. The conclusion would be to collect points slowly and steadily and to avoid unnecessary movements in the OR.
In the AOSanchor trials, the results did not support our hypothesis that remote placement of the world anchor would limit performance because distance estimation worsens with increasing distance [15,17,18]. Presumably, the environment tracking is sufficient directly after the anchor placement, which is why a measurable performance decrease was not observed. In AOSpartition, we observed an increased number of registration failures. The decrease in performance without the partition could be attributed to the additional reflective surfaces, the greater distance to the wall on the opposite side of the examiner, and/or an increased effect of ambient light. Tracking via infrared camera becomes more prone to errors as the distance between object and sensor increases [15,17,18]. Furthermore, reflective surfaces have been shown to reduce the performance of the infrared sensor [14][15][16]. In addition, a greater variation in surface materials could also reduce the performance of infrared camera tracking [15,18,24]. Grayscale cameras also seem to be influenced by different ambient light conditions [25]. Since these are used for the HoloLens self-localization, increased ambient light due to the removal of the partition could be a further factor contributing to a decrease in performance.
Our findings show that the performance of state-of-the-art AR-HMDs is influenced by a variety of factors present in the OR setting. Our results identified direct light illumination, motion blur caused by movement of personnel, and increased reflections as factors that may impair the performance of the HoloLens. However, measures can be taken to standardize the OR environment such that the performance is less impaired. This is underpinned by the fact that registration time was the only statistically significant difference between the standardized OR setup (SSOR) and the laboratory environment. Recommended standardization measures include avoiding direct light illumination on the situs, setting up partitions, and minimizing the movement of personnel. If these measures are considered, a similar registration accuracy, number of failed attempts, lag, and number of rejected point clouds can be expected. However, the time to full registration remains increased. Based on these experiments, we recommend taking sufficient time for the registration step in order to minimize retries.

Limitations
One limitation of our study is that the method for measuring the registration accuracy of the hologram overlay is examiner-dependent. We tried to address this limitation by employing two examiners. However, the learning curve of the two surgeons is a further factor possibly influencing the results of this study. To reduce the effect of learning on the measured outcomes, the order in which the surgeons performed the different series of experiments was randomized. Additionally, both surgeons had similar experience in the use of the same application due to previous pilot tests. Another limitation is that intervention- and patient-specific factors were not analyzed. Their influence must be considered in in vivo surgery, and further research on these factors is urgently needed. However, this work is one of the first to analyze the environmental impact of the OR on the performance of a state-of-the-art AR-HMD.

Conclusion
AR-HMDs are affected by different OR setups. Standardization measures for better AR-HMD performance include avoiding direct light illumination on the situs, setting up partitions, and minimizing the movement of personnel. Although AR has arrived in the OR, further improvements and research are necessary to establish the technology in surgical treatment. In particular, the combination of AR and AI will foster this process.

Funding
Open access funding provided by University of Zurich. This research received no other external funding.

Data availability
The datasets used and/or analyzed during the current study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.7171326.