1 Introduction

Novel augmented reality (AR) headsets are wearable devices with see-through displays that can render virtual models aligned with real world features (Kress and Cummings 2017b; Park et al. 2021). Alignment may be achieved by using physical image markers that are captured by the headset’s camera, which triggers the rendering of virtual models on the headset’s lenses in a predefined position (Benmahdjoub et al. 2022). This technology can be used to guide surgeons during the resection of soft tissues by overlaying patient-specific virtual models obtained from medical scans on the patient (Park et al. 2021; Castelan et al. 2021). This application is especially interesting for oncological surgery, where complete tumour removal is key to preventing tumour recurrence. In addition, this technology may enhance the design of free flaps during reconstructive surgery; free flaps are vascularised tissue that is transferred from one part of a patient’s body to the site of a defect to reconstruct anatomical structures that have been damaged, e.g. after tumour removal (Pratt et al. 2018). For this application to be useful, surgeons wearing the headset must be able to localise virtual models accurately. The present study measured the user localisation error (i.e. the difference between the perceived position of features on the virtual models and their actual position), as well as errors in the perceived position and size of the virtual models as a whole, and discusses the potential effect of these errors on the outcomes of surgeries that involve the resection of superficial soft tissues.

1.1 The challenge of accurate soft tissue resection in oncological and reconstructive surgery

Accurate soft tissue resection is critical for surgical success as optimal resection margins are considered the most important factor for tumour control and prevention of repeat intervention (Sheoran et al. 2022; Sugiura et al. 2018).

Soft tissue free flaps are vascularised soft tissue (i.e. skin, fascia and sometimes muscle) transferred from one part of a patient’s body to the site of a defect to restore the function of damaged anatomical structures (Hallock 2009). Perforator vessels originate in a main source artery, pierce the deep fascia (a layer of dense connective tissue that surrounds muscles) and branch into subcutaneous vessels (Higueras Suñé et al. 2011). The accurate localisation of perforator vessels is key to preventing impaired blood flow and flap necrosis (Wilson et al. 2009). Flaps must also have adequate shape and dimensions to fit defects. Oversized flaps may create noticeably raised flap tissue compared to the surrounding tissue after their attachment to the site of the defect (Klaassen et al. 2018). Undersized flaps are prone to distortions due to tissue tension and contraction during the healing process, thus increasing the risk of flap necrosis and/or undesired aesthetic results (Feng et al. 2017). To prevent undersized flaps, surgeons normally add a portion of tissue surrounding the flap, e.g. 5 mm (Feng et al. 2017). If the resulting flap is oversized, surgeons trim the flap before its attachment to the site of the defect. A real-time image guidance tool for flap design does not exist yet; surgeons therefore rely mainly on the visual inspection of the patient’s anatomy to accomplish these tasks, which may lead to extended flap raising and surgery times (Ishii and Kishi 2016).

The excision of small tumours requires submillimetre accuracy (Ghosh et al. 2014). Before the resection of skin and subcutaneous tumour tissue, surgeons delineate tumour margins using tools such as marker pens, rulers and loupes with percentage errors of 8–45% relative to the planned margins (Lalla et al. 2003). Delineation errors as well as the incorrect mapping of tumours may lead to their incomplete excision, which is associated with high recurrence rates, e.g. up to 55% and 60% in basal cell carcinoma and breast tumours, respectively (Waljee et al. 2008; Ríos-Buceta 2007). Incomplete excision may also lead to repeat surgery and increased morbidity, e.g. if critical anatomical structures are damaged (Telfer et al. 2008), and thus additional hospital expenses and a higher risk of complications for patients (Richmond and Davie 1987).

AR-guided localisation of perforator vessels has been investigated in previous studies (Bosc et al. 2017; Jiang et al. 2017; Pratt et al. 2018). However, to the authors’ knowledge, only one study explored the use of AR to indicate planned resection margins of flaps via conventional projection (Hummelink et al. 2017). Previous studies have also explored the AR-guided assessment of tumour resection margins. For instance, Shao et al. (2014) developed a system that used wearable technology to assist surgeons in the identification of tumour margins. This system had the sensitivity and specificity of commercial fluorescence imaging systems. Similarly, Cui et al. (2017) presented a system that rendered fluorescent images on the lenses of an AR headset. They highlighted the potential of this technology to provide an AR visualisation of tumours without blocking the surgeon’s view.

1.2 Limitations of current research

Most previous studies on AR-guided surgery did not study the effect of two variables on the user localisation error: (1) the user’s viewing angle of the virtual models, and (2) the distance between the virtual models and the user’s eyes (Frantz et al. 2018). These variables change with the surgeon’s movement around the patient and thus their effect must be investigated. Luzon et al. (2020) explored the impact of the surgeon’s viewing angle while wearing an AR device, but conclusions should be drawn with caution considering the small sample size of five participants and that the participants’ head position and distance from the markers were not controlled, so that these variables might have influenced the results. Previous studies have often performed accuracy tests under simulated surgical scenarios. Simulations of clinical scenarios are necessary to evaluate AR surgical guidance systems (Pérez-Pachón et al. 2020), but limit the number and type of measurements and number of participants that can be included in the analyses due to the complexity of the experimental design (Herzog et al. 2019). In addition, measuring localisation errors during a surgical task includes variations in the participants’ expertise and dexterity as additional sources of error (Bann et al. 2003; Sondak and Zager 2010). Moreover, X-ray vision (i.e. the perception that virtual models of internal anatomical structures lie on the surface of the patient) is a common problem in AR applications for surgery and may lead to localisation errors (Avery et al. 2009).

1.3 Aims

This study measured the error with which users of an AR headset localise virtual models aligned with real world features. A simple setup was used to reduce perception issues typically associated with X-ray vision and to facilitate a large sample size and thus the collection of a sufficiently large number of measurements for statistical testing. The research questions were:

(RQ1) What is the error in the users’ localisation of virtual models and to what extent does it result in errors in the perceived position and dimensions of the virtual models?

(RQ2) Is there a relationship between these errors and the position and distance of the virtual models from the user’s eyes?

2 Methods

2.1 Participants

We recruited 54 adults aged 20–59 years from staff and students at the University of Aberdeen for this study. All those selected were in good health and without vision problems except those corrected by glasses or contact lenses. The sample size was determined following the results of a power analysis for an ANOVA using G*Power (Faul et al. 2007).

2.2 Experimental setup

The first generation of the AR headset HoloLens (Microsoft Corporation, Redmond, USA) was used in the experiment as the equipment was purchased at the beginning of our project prior to the release of HoloLens 2. An AR app (App-1) was created using Vuforia Engine (Kress and Cummings 2017a). App-1 allowed for the detection of a 12 × 12 cm digital image marker (referred to as “marker” henceforth) by the headset’s camera. This triggered the rendering of a virtual model (a 5-cm-radius virtual hexagon) on the headset’s lenses in a set position (Fig. 1a). The virtual hexagon was rendered as an opaque figure because transparency has been reported to create issues in the users’ perception of the virtual models’ position (Kersten-Oertel et al. 2012). A second app (App-2), run on a laptop connected to a monitor, displayed the marker on the monitor in random order at one of nine predefined positions (Fig. 1a). To assess the effect of the observation distance (i.e. the distance between the virtual models and the user’s eyes) on the localisation error, three distances (distances 1–3) between the monitor and the headset’s camera were set: 65, 85 and 105 cm, respectively (Fig. 1b). Red marks on a table that matched distances 1–3 were used to ensure the correct position of the monitor (Online Resource 1). To measure the intraobserver error, each participant performed the exercises three times. With the combination of nine marker positions, three monitor-camera distances and three repetitions, each participant completed 81 exercises. A chin rest on a lectern was used to ensure that the head of each participant was in the correct position and remained in this position (Fig. 1c). Plastic stops ensured the correct position of the lectern. The chin rest was aligned with the vertical midline of the monitor surface.
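
The factorial design described above (nine marker positions, three distances, three repetitions) can be sketched as a trial schedule. This is only an illustrative sketch: the function name `build_schedule`, the fixed seed, and the choice to shuffle positions within each distance and repetition block are assumptions, since the source only states that markers were displayed in random order.

```python
import random

POSITIONS = tuple(range(1, 10))   # marker positions 1-9 (Fig. 1a)
DISTANCES_CM = (65, 85, 105)      # distances 1-3 (Fig. 1b)
REPETITIONS = 3                   # each exercise was performed three times


def build_schedule(seed=0):
    """Return a list of (distance_cm, repetition, position) trials.

    Assumption: positions are shuffled independently within each
    distance/repetition block; the actual blocking used by App-2 is not
    described in the text.
    """
    rng = random.Random(seed)
    schedule = []
    for distance in DISTANCES_CM:
        for repetition in range(1, REPETITIONS + 1):
            order = list(POSITIONS)
            rng.shuffle(order)
            schedule.extend((distance, repetition, position) for position in order)
    return schedule


schedule = build_schedule()
print(len(schedule))  # 81 exercises per participant
```

Each participant therefore contributes 81 hexagons, i.e. 486 vertex clicks, under this reading of the design.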

Fig. 1

Diagrams of (a) the frontal view of the experimental setup showing a monitor displaying 12 × 12 cm digital image markers at positions 1–9 and virtual hexagons rendered on the headset’s lenses at the positions where we predicted that users would perceive them (i.e. overlaid on the monitor surface and aligned with their corresponding digital image marker); (b) the top view showing the monitor at distances 1–3 (i.e. 65, 85 and 105 cm, respectively) and the virtual hexagons (red arrows); and (c) a photo showing the experimental setup and a researcher adjusting the position of the monitor. Image marker positions 1–8 were at 108 mm from the centre of the monitor and position 9 was aligned with the centre of the monitor

2.3 Procedure

The experiment was conducted under controlled lighting conditions. Participants were asked to adjust the headset to the distance between their eyes (using the HoloLens calibration app) and place their chin on the chin rest (Fig. 1). The centre of the monitor was aligned with the headset camera by placing the monitor at distance one and displaying a digital graph chart, and then placing a laser measuring tool (including a level indicator to ensure its horizontal position) in front of the headset camera pointing at the digital graph chart. A researcher held the laser measuring tool, while another researcher adjusted the monitor height to match the centre of the digital graph chart with the laser pointer. The monitor was levelled to ensure that its screen surface was perpendicular to the headset’s camera plane. This process ensured that the predicted positions and distances of the virtual hexagons from the participants’ eyes remained constant during the experiment (Fig. 2). After these adjustments, the digital graph chart was removed.

Fig. 2

Experimental setup showing a participant wearing the HoloLens headset with their chin on the chin rest, the headset’s camera aligned with the centre of the digital graph chart and their eyes aligned with the digital graph chart and virtual hexagons. In this experimental setup, the participant observes the hexagon from their perspective; however, the participant only sees one rendered hexagon at a time on the grid

For each exercise, the monitor was placed at the corresponding distance from the headset camera. Once the headset camera detected a marker, a virtual hexagon was rendered on the headset lenses. If the system performance and participants’ perception were error-free, the virtual hexagon would be perceived by participants at its predefined position (Fig. 1a). Then, the marker was removed from the visualisation and participants were asked to click on each vertex of the virtual hexagon using a mouse. The x- and y-coordinates of each click were recorded by App-2.

2.4 Data analysis

Vertex localisation errors were expected to generate errors in the virtual hexagons’ position and dimensions (Online Resource 2). SPSS 25 (IBM Statistics, Chicago, USA) and SigmaPlot 14 (Systat Software, San Jose, CA) were used for statistical analysis. The participants’ clicks were used to calculate the error in the position of the virtual hexagons’ vertices:

$${V{\text{err}}}_{n}= \sqrt{\left[{(x-x^{\prime})}^{2}+ {(y-y^{\prime})}^{2}\right]}$$
(1)
$${\mu }_{V{\text{err}}}=\frac{({V{\text{err}}}_{1}+{V{\text{err}}}_{2}+{V{\text{err}}}_{3}+{V{\text{err}}}_{4}+{V{\text{err}}}_{5}+{V{\text{err}}}_{6})}{6},$$
(2)

where \({V{\text{err}}}_{n}\) is the error in the position of the nth vertex of a given virtual hexagon (n = 1–6), x and y are the predicted x and y coordinates, respectively, \(x^{\prime}\) and \(y^{\prime}\) are the x and y coordinates of the participants’ clicks, respectively, and \({\mu }_{V{\text{err}}}\) is the mean error in the position of the virtual hexagon’s vertices.
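
As an illustration of Eqs. 1 and 2, the following minimal sketch computes the per-vertex error and its mean for one hexagon. The function names and the example coordinates (a regular hexagon of 50 mm radius with every click offset by 3 mm and 4 mm) are hypothetical and are not taken from the study data.

```python
import math


def vertex_errors(predicted, clicked):
    """Euclidean distance between each predicted vertex and its click (Eq. 1)."""
    return [
        math.hypot(x - xc, y - yc)
        for (x, y), (xc, yc) in zip(predicted, clicked)
    ]


def mean_vertex_error(predicted, clicked):
    """Mean of the six vertex errors of one hexagon (Eq. 2)."""
    errors = vertex_errors(predicted, clicked)
    return sum(errors) / len(errors)


# Hypothetical example: predicted vertices of a 50 mm radius regular hexagon,
# with each click displaced by (3, 4) mm.
predicted = [
    (50 * math.cos(math.radians(60 * k)), 50 * math.sin(math.radians(60 * k)))
    for k in range(6)
]
clicked = [(x + 3.0, y + 4.0) for x, y in predicted]

mean_err = mean_vertex_error(predicted, clicked)
print(round(mean_err, 6))  # 5.0 (mm), since every click is offset by a 3-4-5 triangle
```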

The participants’ clicks were also used to calculate the error in the virtual hexagons’ area. As the virtual hexagons as perceived by the participants were expected to be irregular polygons, the area was obtained by calculating the sum of the areas of the four triangles forming the virtual hexagons (Online Resource 3), for which the following formulae were used:

$$s= \frac{(a+b+c)}{2}$$
(3)
$${A}_{n}= \sqrt{s(s-a)(s-b)(s-c)}$$
(4)
$$hA= {A}_{1}+ {A}_{2}+ {A}_{3}+ {A}_{4},$$
(5)

where s is the semi-perimeter of a given triangle, a, b and c are the lengths of its sides, \({A}_{n}\) is the area of the nth triangle (n = 1–4), and hA is the virtual hexagon’s area.
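
Equations 3–5 can be sketched as follows. The decomposition into four triangles is given in Online Resource 3, which is not reproduced here, so this sketch assumes a fan triangulation from the first clicked vertex; that choice is valid for convex hexagons only, and the function names are hypothetical.

```python
import math


def triangle_area(p, q, r):
    """Area of a triangle from its side lengths via Heron's formula (Eqs. 3-4)."""
    a = math.dist(p, q)
    b = math.dist(q, r)
    c = math.dist(r, p)
    s = (a + b + c) / 2  # semi-perimeter
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def hexagon_area(vertices):
    """Sum of four triangle areas (Eq. 5).

    Assumption: triangles fan out from vertices[0], i.e. (v0, v1, v2),
    (v0, v2, v3), (v0, v3, v4) and (v0, v4, v5), with vertices given in
    order around a convex hexagon.
    """
    v0 = vertices[0]
    return sum(
        triangle_area(v0, vertices[i], vertices[i + 1]) for i in range(1, 5)
    )


# Hypothetical check against a regular hexagon of 50 mm radius,
# whose area is 1.5 * sqrt(3) * 50**2.
hexagon = [
    (50 * math.cos(math.radians(60 * k)), 50 * math.sin(math.radians(60 * k)))
    for k in range(6)
]
area = hexagon_area(hexagon)
```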

In addition, the participants’ clicks were used to calculate the error in the virtual hexagons’ centroid position:

$${C}_{x}^{\prime}= \frac{1}{6hA} \sum_{i=0}^{n-1}[\left({x}_{i}+ {x}_{i+1}\right)\left({x}_{i}{y}_{i+1}- {x}_{i+1}{y}_{i}\right)]$$
(6)
$${C}_{y}^{\prime}= \frac{1}{6hA}\sum_{i=0}^{n-1}[\left({y}_{i}+ {y}_{i+1}\right)\left({x}_{i}{y}_{i+1}- {x}_{i+1}{y}_{i}\right)]$$
(7)
$$C{\text{err}}= \sqrt{{({C}_{x}-{C}_{x}^{\prime})}^{2}+{({C}_{y}-{C}_{y}^{\prime})}^{2}},$$
(8)

where \({C}_{x}^{\prime}\) and \({C}_{y}^{\prime}\) are the x and y coordinates, respectively, of the centroid defined by the vertex positions of the hexagon marked by the participant, \(({x}_{i}, {y}_{i})\) are the coordinates of the ith marked vertex, n is the number of vertices (n = 6, with vertex n coinciding with vertex 0), hA is the virtual hexagon’s area, Cerr is the error in the centroid position, and Cx and Cy are the predefined x and y coordinates of the centroid, respectively.
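
A minimal sketch of Eqs. 6–8 is given below. One assumption differs from the text: the sketch divides by the signed (shoelace) area rather than by the Heron-based hA, so that the result is the same whether the vertices were clicked clockwise or anticlockwise. For a simple polygon, the absolute value of the signed area equals hA. Function names and example values are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Centroid of a simple polygon from its ordered vertices (Eqs. 6-7)."""
    n = len(vertices)
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around: vertex n is vertex 0
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    signed_area = twice_area / 2
    if signed_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return sum_x / (6 * signed_area), sum_y / (6 * signed_area)


def centroid_error(predicted_centroid, clicked_vertices):
    """Distance between the predicted and the perceived centroid (Eq. 8)."""
    cx, cy = polygon_centroid(clicked_vertices)
    return math.hypot(predicted_centroid[0] - cx, predicted_centroid[1] - cy)


# Hypothetical example: a 50 mm radius hexagon perceived 3 mm right and 4 mm up.
clicked = [
    (3 + 50 * math.cos(math.radians(60 * k)), 4 + 50 * math.sin(math.radians(60 * k)))
    for k in range(6)
]
c_err = centroid_error((0.0, 0.0), clicked)
```

Reversing the click order (`clicked[::-1]`) gives the same centroid, which is the reason for using the signed area in this sketch.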

The virtual hexagons’ area was also used to calculate the expansion or shrinkage of the perceived hexagon (i.e. the “margin error”) by calculating the long diagonal and comparing it with its predicted length:

$$D=2\sqrt{\frac{pA}{1.5\sqrt{3}}}$$
(9)
$$D^{\prime}=2\sqrt{\frac{hA}{1.5\sqrt{3}}}$$
(10)
$$M= \frac{D-{D}^{\prime}}{2},$$
(11)

where D is the predicted length of the long diagonal, pA is the predicted area, \(D^{\prime}\) is the length of the long diagonal of the virtual hexagon as perceived by the participant, hA is the virtual hexagon’s area and M is the margin error. This calculation assumed that the virtual hexagons remained regular polygons after changes in their area and thus changes in the virtual hexagons’ shape were not considered. A summary of the independent and dependent variables is provided in Online Resource 4. The intra-class correlation coefficient (ICC) was used to analyse the intraobserver variability in the vertex localisation error (Koo and Li 2016). Since data were not normally distributed, a non-parametric Kruskal–Wallis H test (p < 0.001) was used to analyse the differences between participants.
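
Equations 9–11 can be illustrated with a short sketch. It relies on the area of a regular hexagon with side length t being \(1.5\sqrt{3}\,{t}^{2}\) and on its long diagonal being 2t. The function names and the example areas are hypothetical. Note that, with the sign convention of Eq. 11, an expanded hexagon yields a negative M and a shrunken hexagon a positive M.

```python
import math


def long_diagonal(area):
    """Long diagonal of a regular hexagon with the given area (Eqs. 9-10)."""
    side = math.sqrt(area / (1.5 * math.sqrt(3)))
    return 2 * side


def margin_error(predicted_area, perceived_area):
    """Margin error M = (D - D') / 2 (Eq. 11), in the units of the diagonal."""
    return (long_diagonal(predicted_area) - long_diagonal(perceived_area)) / 2


def regular_hexagon_area(radius):
    """Area of a regular hexagon whose circumradius (= side length) is radius."""
    return 1.5 * math.sqrt(3) * radius ** 2


# Hypothetical example: a 50 mm radius hexagon perceived with a 51 mm radius.
m_expanded = margin_error(regular_hexagon_area(50), regular_hexagon_area(51))
print(round(m_expanded, 6))  # -1.0 (mm): the perceived hexagon is 1 mm larger
```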

Errors exceeding 1.5 and 3 times the interquartile range (IQR) were classified as weak and strong outliers, respectively (Hoaglin et al. 1986). Errors across groups were compared within each independent variable (i.e. the predicted positions and distances of the virtual hexagons from the participant’s eyes). For comparisons between more than two groups, the non-parametric Kruskal–Wallis H test (p < 0.001) was used and subsequent pairwise comparisons were performed using Dunn’s method (p < 0.05). Three error categories (≤ 0.5, ≤ 1 and ≤ 5 mm) were defined for both the centroid position and margin errors; for the margin errors, these categories corresponded to area errors of ≤ 2.0, ≤ 4.1 and ≤ 21.0%, respectively. The percentage of errors within each category was then calculated. These categories allowed exploring the frequency of: submillimetre errors required for high-precision surgery such as tumour excision, ophthalmology, otology, or micro-reconstructive surgery (Mattos et al. 2016); and ≤ 5 mm errors, which may be acceptable for some surgical tasks such as the preoperative staging of breast cancer tumours (Luparia et al. 2013).
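
The outlier rule and the error categories can be sketched as below. Several details are assumptions: the fences are taken as upper fences measured from the third quartile (Q3 + 1.5 IQR and Q3 + 3 IQR), quartiles are computed with Python's `statistics.quantiles` default method (which may differ slightly from the SPSS definition), and the sample values are invented for illustration.

```python
import statistics

CATEGORY_THRESHOLDS_MM = (0.5, 1.0, 5.0)  # error categories from the text


def classify_outliers(errors):
    """Label each error as 'none', 'weak' or 'strong' outlier.

    Assumption: only upper fences are applied, because errors are
    non-negative magnitudes.
    """
    q1, _, q3 = statistics.quantiles(errors, n=4)
    iqr = q3 - q1
    labels = []
    for e in errors:
        if e > q3 + 3 * iqr:
            labels.append("strong")
        elif e > q3 + 1.5 * iqr:
            labels.append("weak")
        else:
            labels.append("none")
    return labels


def percentage_within(errors, threshold):
    """Percentage of absolute errors that are at or below the threshold."""
    return 100 * sum(abs(e) <= threshold for e in errors) / len(errors)


# Invented sample of errors in mm.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
labels = classify_outliers(sample)
kept = [e for e, lab in zip(sample, labels) if lab != "strong"]
shares = {t: percentage_within(kept, t) for t in CATEGORY_THRESHOLDS_MM}
```

In this invented sample, Q1 = 3 and Q3 = 9, so 20 is labelled a weak outlier and 40 a strong outlier; only the latter is dropped before the category percentages are computed, mirroring the exclusion of strong outliers described in the Results.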

3 Results

3.1 Error in the users’ localisation of virtual models

Results obtained for each participant revealed a high similarity between repetitions (ICC = 0.9). In contrast, the vertex localisation error was significantly different between participants (p < 0.001), showing a wide error range and 1.8% of outliers (Online Resource 5). This suggests that vertex localisation errors were user-dependent. Based on the assumption that strong outliers are likely to be easily detected as incorrect by surgeons (Online Resource 6), we excluded strong outliers from subsequent analyses. Overall, average vertex localisation, centroid position and margin errors all remained below 5 mm (Table 1). Additionally, the average absolute and relative percentage area errors were 3.8% and 1.6%, respectively. When observed from a 65 cm distance, centroid position errors of 1–5 mm and over 5 mm were found in 64.4% and 29.8% of cases, respectively (Table 2). Errors in the virtual hexagons’ margins remained below 5 mm, with 72.2% and 27.8% of the hexagons’ margins showing an expansion or shrinkage, respectively.

Table 1 Errors for all dependent variables including all virtual hexagon positions and distances from the participants’ eyes.
Table 2 Percentage of virtual hexagons showing ≤ 0.5, ≤ 1 and ≤ 5 mm centroid position and margins error for virtual hexagons at distance 1 (65 cm) and virtual hexagon positions 1–9 (n = 4146)

3.2 Effect of the virtual models’ position relative to the users’ eyes

The effect of virtual hexagon position on vertex localisation, centroid position and absolute area errors is summarised in Fig. 3a. Vertex localisation and centroid position errors were significantly different between all virtual hexagon positions (p < 0.001). Most participants localised the virtual hexagons on the right side of their field of vision more accurately than on the left side. Indeed, virtual hexagon positions 3, 4 and 5 (i.e. on the right side of the participant's field of view) and 6 (i.e. on the bottom side of the participant's field of view) produced significantly smaller vertex localisation and centroid position errors (p < 0.05) than the other virtual hexagon positions (Fig. 3b and Online Resource 7). In contrast, area errors were significantly smaller (p < 0.05) for virtual hexagon positions 1, 2, 3, 8 and 9, i.e. generally, on the viewers’ eye level and above this level (Fig. 3c).

Fig. 3

(a) Virtual hexagon positions with the smallest errors for vertex localisation, centroid position and absolute area (highlighted in green) and their mean errors; (b) vertex localisation and (c) absolute area errors for virtual hexagon positions 1–9 (n = 4146). Whiskers indicate ± 1.5 IQR. Weak outliers (i.e. values > 1.5 IQR and < 3 IQR) are indicated with circles and virtual hexagon positions with significantly smaller errors (p < 0.05) with asterisks (Kruskal–Wallis H test). Strong outliers (i.e. values > 3 IQR) were excluded

3.3 Effect of the distance between the virtual models and the users’ eyes

Errors in vertex localisation and centroid position decreased as the distance between the headset’s camera and the monitor increased (Fig. 4a and Online Resource 8). These errors were significantly different (p < 0.001) between distances 1–3 (i.e. 65, 85, and 105 cm, respectively). Distances 2 and 3 produced smaller absolute area errors than distance 1 (Fig. 4b).

Fig. 4

Vertex localisation (a) and absolute area (b) errors for distances 1–3 (n = 4146). Whiskers indicate ± 1.5 IQR. Weak outliers (i.e. values > 1.5 IQR and < 3 IQR) are indicated with circles and the distances associated with the smallest errors (p < 0.05) with asterisks (Kruskal–Wallis H test). Strong outliers (i.e. values > 3 IQR) were excluded

4 Discussion

This study measured the errors in the perceived position and dimensions of virtual models by users of the HoloLens 1 headset and investigated the relationship between these errors and the virtual models’ position and distance from the users’ eyes. Localisation accuracy was higher for virtual hexagons on the right side of the field of view (Fig. 3). This may be partially due to ocular dominance, i.e. the tendency of individuals to favour visual input from one eye, typically the eye with which visual information is perceived more clearly (Lopes-Ferreira et al. 2013). Ocular dominance occurs in 97% of individuals, of whom 65% show a dominant right eye (Reiss 1997).

Vertex localisation and area errors were larger for the shortest distance of the virtual hexagons from the participants’ eyes (65 cm) than for larger distances, i.e. 85 and 105 cm (Fig. 4). A reason for this may be that a 65 cm distance lies outside the optimal zone for the visualisation of virtual content with HoloLens 1, which was determined as 125–200 cm from the eyes by Condino et al. (2018), and lies further from this zone than 85 and 105 cm distances. Another reason may be a reduced viewing quality of the virtual models if users observe them from short distances (Bach et al. 2018).

Centroid position errors over 5 mm represent a risk of incorrect localisation of perforator vessels during soft tissue flap surgery, which may lead to accidental injury to these vessels, thus compromising flap viability (Corbitt et al. 2014). In addition, the observed margin errors, with strong outliers in 4.3% of cases (Online Resource 6), are larger than those in surgeons’ drawings of skin tumour resection margins using traditional tools (surgical markers, rulers and loupes), which have been reported to be around 1 mm (Lalla et al. 2003). The observed centroid position errors and margin shrinkage combined may compromise the complete excision of skin tumours or cause the inaccurate mapping of subcutaneous tumours (Table 2). The European Organization for Research and Treatment of Cancer (EORTC) reports a 17.5% cumulative incidence of tumour recurrence associated with incomplete tumour excision (Poortmans et al. 2009). Therefore, the risk of errors over 1 mm must be carefully considered before implementing the use of AR headsets in clinical practice.

It should be noted as a study limitation that a 12 × 12 cm marker was used as opposed to previous studies that used smaller markers (Luzon et al. 2020). Small markers are preferable for surgery, because they have a lower risk of occluding the surgical working area and thus interfering with surgical workflows. However, small markers may also lead to errors that are not acceptable for surgery (Pérez-Pachón et al. 2021). In addition, our study did not include surgeon participants. Surgeons usually have increased dexterity compared to non-surgeons, due to their surgical training, and thus they are expected to show smaller errors (Sadideen et al. 2013). Moreover, this study did not measure changes in the hexagon shape, although this information would help to prevent complications derived from mismatches between flaps and defects (Kimura 2009). Finally, our results are limited to HoloLens 1 and cannot be generalised to other headsets.

Experiments analysing user accuracy with AR headsets inform the research community and healthcare professionals on the readiness of AR technologies for their use in the operating theatre. Additional key aspects to be considered include usability, reliability, and workflow. These aspects, with accuracy likely being the next key barrier for implementation in the surgical arena, are especially relevant within the context of high-precision surgical tasks. Extensive research has explored this topic for a wide variety of surgical fields such as neurosurgery, cranio-maxillofacial surgery, or cardiovascular medicine (Gsaxner et al. 2023). With the advent of AR technologies tailored to surgical practice, HoloLens-like technologies are expected to become an integral part of high-precision surgical setups within the next decade (Zhang et al. 2023).

Due to the abstract nature of our experiment, our results can also be applied to other fields that involve high-precision tasks. Research on the application of HoloLens-like technologies to high-precision tasks outside of healthcare is currently limited; however, the authors expect that there will be applications within the fields of engineering, manufacturing, and repair in the near future.

5 Conclusion

In this study, we found that some model positions and the shortest distance (65 cm) led to larger localisation errors than other positions and larger distances. Localisation errors tended to be smaller for virtual models rendered on the right side of the user's field of view. Adjusting the location of virtual models accordingly may help to maximise localisation accuracy. Developers and manufacturers should also aim to minimise localisation errors for distances of 65 cm or less between the user’s eyes and the virtual models, especially considering open surgery applications, where surgeons operate at arm’s length from the surgical working area. Finally, the user-dependent errors found in this study indicate that training surgeons on the use of AR headsets may help to minimise localisation errors.