1 Introduction

Computer-assisted surgical navigation systems have revolutionized how patients are treated in challenging medical procedures (Nijmeh et al. 2005). These systems monitor the location of surgical instruments in relation to specific areas of interest on the patient. Building on this tracking, guidance systems help surgeons plan trajectories that minimize the risk of unintended anatomical damage (Hassfeld and Mühling 2001). In recent years, dedicated navigation platforms have been developed for use in fields such as neurosurgery, orthopedic surgery, and maxillofacial surgery (Zhang et al. 2019).

The effectiveness of surgical navigation systems hinges on their ability to continuously track anatomical structures and instruments. Conventionally, surgeons employing free-hand techniques must divert their attention from the patient toward external screens to confirm anatomical positions through medical scans (Burström et al. 2021; Mezger et al. 2013). The integration of instrument and anatomical tracking into the surgeon’s workflow eliminates the need for such redirection.

While surgical navigation can be applied in many settings, its use cases typically fall into two categories: monitoring actively moving instruments over time and guiding instruments to fixed locations. The former, common in neurosurgery, demands systems that track the surgeon's instrument movements (Pivazyan et al. 2023). The latter, common in spinal surgery, requires systems that guide instruments precisely to specific points on the patient's body (Wallace et al. 2020); here, instrument mobility is restricted, with an emphasis on achieving greater precision. How instruments are tracked in the operating room varies from procedure to procedure, but tracking generally relies on either fixed or moving markers. In both cases, markers are usually attached to instrument heads, and the range of movement depends on the procedure. In spinal surgery or pain management procedures, markers remain largely fixed as surgeons make incremental needle insertions toward target anatomies; this contrasts with operations that must track markers while surgeons actively move their instruments. Although the maximum error tolerated by a navigation system depends on the application, errors under 2 mm (Euclidean) are typically accepted (Morley et al. 2023).

Current state-of-the-art systems provide surgeons with high positional precision and improve the success of surgical operations (Mezger et al. 2013). These systems revolve around sensing the depth of objects using infrared (IR) cameras. Specifically, retro-reflective markers, detectable by near-infrared stereoscopic cameras, are attached to medical instruments (Mezger et al. 2013; Wu et al. 2019). To achieve depth perception, stereoscopic cameras utilize a two-camera system, which infers positional information from the difference between the captured images (Smith et al. 2012). Yet, despite advances in medical technology, the cost of implementing such camera systems can be a barrier for small healthcare centers, independent practices, and training purposes (Asselin et al. 2018). For instance, novel 3D navigation systems have been developed by several commercial companies; while innovative, these systems can cost anywhere from $250,000 to $600,000 (Malham and Wells-Quinn 2019).

Recent improvements in consumer-grade cameras have opened the door for fully optical, or videometric, tracking implementations. For instance, researchers have successfully implemented an iPhone-based augmented reality navigation setup for brain lesion localization (Hou et al. 2016). This technology is considerably more accessible to healthcare institutions compared to more premium alternatives due to its low cost (Sorriento et al. 2020). However, optical tracking systems are unable to utilize the reflective properties of traditional IR markers.

Here, we developed a low-cost, fully optical tracking system using inexpensive, readily available off-the-shelf web cameras (Fig. 1). To facilitate position tracking, we utilize open-source fiducial markers as an additional cost-effective measure. This study aims to validate the potential of using low-cost alternatives in surgical navigation. To do this, we design a series of experiments that enable the real-time capture of a marker's movement. By analyzing the error associated with inexpensive stereo tracking, we aim to gain insights into the feasibility of similar systems in surgical applications.

Fig. 1

a Using a stereoscopic camera setup built from commercially available web cameras, input images are captured to generate a depth map of the environment. b Fiducial markers are tracked to obtain the depth of relevant objects. From here, the marker's position can be computed

2 Related work

Surgical navigation systems serve diverse functions and offer various advantages for medical practitioners (Kraus et al. 2010). Broadly speaking, these systems can be classified either as those that provide additional guidance during surgical procedures, or those that perform certain steps of the procedure autonomously, contingent upon the user’s discretion (Musahl et al. 2002). These systems employ a diverse set of tracking techniques, including infrared, electromagnetic, and optical approaches, to facilitate their operation.

Commercial implementations of surgical navigation systems have been offered by medical technology companies for decades. Notably, there are systems on the market designed to enhance the precision of spine surgeries through intraoperative image-based guidance. With infrared (IR) reflective markers attached to surgical instruments and fiducial IR markers fixed to bony landmarks through a back incision, IR cameras can accurately track instrument movements. These systems can cost anywhere between $365,000 and $505,000 (Rossi et al. 2021).

To eliminate the need for extra skin incisions during marker insertion, there have also been non-invasive commercial implementations for pedicle screw placement surgeries. These systems utilize markers equipped with IR LEDs to serve as a positional reference. Smaller markers, with IR LEDs attached, can be affixed to instruments to track them. By detecting the infrared LEDs, the movement of the instruments with respect to the patient can be monitored. Such systems start at $215,000 and can cost as much as $350,000 (Rossi et al. 2021).

The utilization of infrared markers and cameras can significantly increase the cost of surgical navigation systems. To address this, researchers have explored various techniques for surgical navigation, including mechanical tracking, electromagnetic tracking, ultrasound tracking, and more. In particular, Sharma et al. proposed adopting electromagnetic tracking for implant surgery, utilizing monotonic variations in magnetic fields along the X, Y, and Z axes (Sharma et al. 2021). With one microchip attached to an implant within the body and another to a surgical tool, the devices concurrently measured and relayed magnetic field information from their respective locations to an external receiver. Stenmark et al. discuss the possibility of using videometric tracking of 3D dice in navigation systems (Stenmark et al. 2022). The known geometry of the die and its position in each frame, coupled with fiducial markers, were used to estimate the pose in the subsequent frame using the solvePnP (Perspective-n-Point) algorithm.

Regardless of the type of navigation system used, it is vital that accurate tracking be performed in real time. In this paper, we contribute to optical tracking by introducing a low-cost stereoscopic camera for depth perception and by investigating videometric tracking that combines fiducial markers with disparity maps generated from stereoscopic camera images.

3 Stereoscopic camera implementation

Here, we develop an inexpensive tracking system, as seen in Fig. 1a, using off-the-shelf parts. We calibrated a stereoscopic camera setup using two Logitech C920x web cameras. At the time of publication in 2023, these cameras retailed for $60 each, making the total cost of the tracking system significantly lower than that of state-of-the-art systems. Figure 2 displays our stereo camera setup. The baseline distance, defined as the center-to-center distance between the two cameras, is 12.5 cm.

Fig. 2

Stereoscopic camera setup. Here, the baseline distance \(b\) is 12.5 cm, and the focal length \(f\) is 3.67 mm

We calibrated our stereoscopic camera system using a calibration checkerboard with 8 by 11 vertices, in accordance with calibration checkerboard documentation (Zhang 2004).
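
For illustration, a minimal calibration sketch in Python with OpenCV is shown below, assuming an 8 × 11 inner-corner checkerboard as described above. The square size, image file names, and the use of cv2.stereoCalibrate with fixed intrinsics are assumptions for this sketch, not the paper's exact procedure.

```python
import glob

import cv2
import numpy as np

PATTERN = (8, 11)   # inner-corner grid from the text
SQUARE_MM = 25.0    # assumed square size; not stated in the paper

# 3D reference points of the checkerboard corners on the Z = 0 plane
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    gl = cv2.cvtColor(cv2.imread(lf), cv2.COLOR_BGR2GRAY)
    gr = cv2.cvtColor(cv2.imread(rf), cv2.COLOR_BGR2GRAY)
    ok_l, corners_l = cv2.findChessboardCorners(gl, PATTERN)
    ok_r, corners_r = cv2.findChessboardCorners(gr, PATTERN)
    if ok_l and ok_r:  # keep only views where both cameras see the full board
        obj_pts.append(objp)
        left_pts.append(corners_l)
        right_pts.append(corners_r)

size = gl.shape[::-1]  # (width, height)
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
# Solve for the rotation R and translation T between the two cameras
ret, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```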

4 Open-source fiducial marker tracking

Our marker tracking approach utilizes the ArUco marker package from OpenCV to facilitate positional tracking. ArUco markers are a square-based fiducial marker system specifically designed for camera pose estimation (Garrido-Jurado et al. 2014). Leveraging fiducial marker tracking allows for the versatile tracking of both instruments and anatomies, depending on the marker’s placement. For instance, positioning a marker on a surgical instrument enables spatial tracking of the instrument, while positioning a marker on a patient serves as a reference point for relevant anatomical features. In this sense, the versatility of marker tracking has applications in designing low-cost systems for general navigated surgery, as opposed to just being an implementation for a specific domain. As the marker moves, the stereoscopic camera tracks the marker’s center position using OpenCV-based functions.
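
A minimal detection sketch is given below, assuming OpenCV 4.7+ and its ArUco module; the dictionary choice (DICT_4X4_50) is an assumption, as the paper does not name one.

```python
import cv2
import numpy as np

# Dictionary choice is an assumption; the paper does not specify one
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def marker_center(frame_gray):
    """Return the (x, y) pixel center of the first detected marker, or None."""
    corners, ids, _ = detector.detectMarkers(frame_gray)
    if ids is None:
        return None
    # corners[0] has shape (1, 4, 2): the marker's four corner points
    return corners[0][0].mean(axis=0)
```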

Given the marker centers from the left \((x_l, y_l)\) and right \((x_r,y_r)\) cameras, we can compute the disparity between the markers. Disparity, d, is formally defined as the horizontal difference between the two marker centers. Using the focal length f and baseline distance b of the cameras, we are able to compute the 3D position of the marker with respect to the camera.

$$\begin{aligned} d&= x_l - x_r \end{aligned}$$
(1)
$$\begin{aligned} z&= \frac{fb}{d} \end{aligned}$$
(2)
$$\begin{aligned} x&= \frac{x_lz}{f} \end{aligned}$$
(3)
$$\begin{aligned} y&= \frac{y_lz}{f} \end{aligned}$$
(4)
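
As a worked illustration of Eqs. (1)–(4), the sketch below computes the marker position from the matched centers. For the units to be consistent, the focal length is assumed to be expressed in pixels (as returned by calibration), the same units as the disparity, so that \(z\) takes the units of the baseline.

```python
def marker_position(x_l, y_l, x_r, f_px, b):
    """Eqs. (1)-(4): 3D marker position from matched centers.

    f_px: focal length in pixels (from calibration); b: baseline
    (here 12.5 cm), which sets the units of the result.
    """
    d = x_l - x_r        # Eq. (1): horizontal disparity (pixels)
    z = f_px * b / d     # Eq. (2): depth
    x = x_l * z / f_px   # Eq. (3)
    y = y_l * z / f_px   # Eq. (4)
    return x, y, z
```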

In practice, calculating disparity involves more sophisticated algorithms than directly applying Eq. (1). The most common algorithms include Stereo Block Matching (StereoBM) and Stereo Semi-Global Block Matching (StereoSGBM) (Hirschmuller 2008). StereoBM, implemented in OpenCV, employs a small sum of absolute differences (SAD) window to identify matching points between the left and right images. Disparity is then calculated as the horizontal pixel difference between matching points (Kim et al. 2020). StereoSGBM, alternatively, implements pixelwise matching using Mutual Information, which approximates a global 2D smoothness constraint (Hirschmuller 2008). In practice, StereoSGBM has been shown to produce more accurate disparity maps than StereoBM and is less susceptible to outliers (Stenmark et al. 2022). Therefore, StereoSGBM was used in this study to compute disparity values.

Using StereoSGBM, we can rewrite Eq. (1) in terms of its OpenCV implementation, given the left \(L_v\) and right \(R_v\) video frames. The disparity value actually used is sampled from the disparity map at the marker's center.

$$\begin{aligned} d = \text{StereoSGBM}(L_v,R_v) \end{aligned}$$
(5)
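
A sketch of Eq. (5) using OpenCV's StereoSGBM implementation is shown below. The matcher parameters are illustrative placeholders (the settings actually used are listed in Table 1); the division by 16 converts OpenCV's fixed-point disparity output to pixels.

```python
import cv2

# Parameter values are illustrative; the paper's settings are in Table 1
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # must be divisible by 16
    blockSize=5,
)

def disparity_at(left_gray, right_gray, cx, cy):
    """Eq. (5): StereoSGBM disparity map, sampled at the marker center."""
    disp = sgbm.compute(left_gray, right_gray).astype("float32") / 16.0
    return disp[int(cy), int(cx)]
```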

5 Experimental setup

To test the accuracy of our stereoscopic camera system, we developed a positioning platform that moves the marker in the X and Y directions (Tsui et al. 2023a). In 3D space, the marker's position is denoted by the coordinates \((x, y, z)\). The platform spans 500 mm by 500 mm and considers motion in 2D. As shown in Fig. 3a, we use two motors to move the marker in a square pattern. The use of a positioning platform enables us to perform consistent, repeatable tracking trials focused solely on evaluating the system's accuracy. Information regarding the accuracy of the positioning platform is provided in Appendix 3.

Fig. 3

a Positioning platform setup. The moving marker is able to move in the X and Y directions. b Marker movement pattern. Numbers one through eight correspond to the order of travel

During our experiments, we placed the stereoscopic camera system directly above the positioning platform at a height of 500 mm. As the marker was shuttled around according to Fig. 3b, the stereoscopic camera captured and recorded the position of the marker. Table 1 displays the parameters used during testing. The ArUco markers utilized are 40 mm by 40 mm.

Upon capturing a live video feed of the marker moving on the positioning platform, we experimented with converting the video frames to different color spaces. Each color space employs a distinct set of parameters to represent the video's color characteristics. By applying different color spaces to the video frames, we aim to assess which one yields the smallest error. Notably, the measurement speed of the system is determined by the frames per second (fps) of the camera; in this implementation, we capture a live video feed at 30 fps.

Marker tracking tests were performed in Red, Green, and Blue (RGB), Hue, Saturation, and Lightness (HSL), and Hue, Saturation, and Value (HSV) color spaces. For HSL and HSV spaces, we manually thresholded values of Lightness and Value to separate the marker from the background.

In a previous study, we performed a similar experiment using the same three color spaces. However, rather than computing disparity using StereoSGBM, we calculated disparity directly using Eq. (1). We observed minimal differences in error among color spaces: across all of them, the average marker tracking error remained consistently around 5.5 mm (Tsui et al. 2023b). Additionally, the detection percentage of the marker, defined as the number of correct detections relative to the total frames in which it was presented, was equivalent across color spaces. In this set of experiments we look to answer the following question: does a stricter lower bound on Lightness (L) and Value (V), which runs the risk of lowering the detection percentage, result in lower error? When thresholding Lightness and Value in these experiments, the lower threshold for L and V was empirically chosen to be 55%, as sketched below. A total of five experiments were performed for each color space.
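
The following is a minimal sketch of this thresholding step, assuming OpenCV's HLS representation with Lightness stored in [0, 255]; the helper name is hypothetical, and the HSV case is analogous (threshold channel 2, Value, instead).

```python
import cv2

def threshold_lightness(frame_bgr, lower_pct=55):
    """Mask out pixels whose Lightness falls below lower_pct percent (HSL case)."""
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)  # channels: H, L, S
    lo = int(255 * lower_pct / 100)                   # 55% of the 0-255 range
    _, mask = cv2.threshold(hls[:, :, 1], lo, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
```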

Table 1 Parameters used in marker tracking experiments

6 Proposed tracking algorithm

The proposed tracking algorithm given below aims to capture the 3D position \((x, y, z)\) of the ArUco marker A. Given that A is detected in the left \(L_v\) and right \(R_v\) video frames, the function find_pixel_location() is called with the help of the ArUco package to return the pixel locations of the marker in the left \((x_l,y_l)\) and right \((x_r,y_r)\) video frames. get_disparity() is then called to return the disparity value at the center of the ArUco marker. Using the focal length \(f\) and baseline \(b\) values, one can compute the 3D position.

Algorithm 1

Calculate marker positions
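
A minimal Python sketch of Algorithm 1 follows; it reuses the marker_center and disparity_at helpers sketched earlier as stand-ins for find_pixel_location() and get_disparity(), and is an interpretation of the algorithm rather than the paper's exact code.

```python
def calculate_marker_position(left_gray, right_gray, f_px, b):
    """Algorithm 1 sketch: 3D position (x, y, z) of marker A, or None."""
    left = marker_center(left_gray)    # find_pixel_location() on L_v
    right = marker_center(right_gray)  # find_pixel_location() on R_v
    if left is None or right is None:
        return None                    # A not detected in both frames
    x_l, y_l = left
    d = disparity_at(left_gray, right_gray, x_l, y_l)  # get_disparity()
    if d <= 0:
        return None                    # invalid disparity at the center
    z = f_px * b / d
    return (x_l * z / f_px, y_l * z / f_px, z)
```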

7 Results

We divide our results into two sections to examine both the system's reliability and accuracy. In Fig. 4a, we report the detection percentage of the markers in the different color spaces, as well as the percentage of outliers removed. Quantitatively, we eliminated outliers using the Random Sample Consensus (RANSAC) algorithm. In our implementation, RANSAC fits a linear model to the data such that detected outliers do not influence the estimates, allowing us to robustly filter them out. We report in Fig. 4b the average percentage of outliers removed. Even though the detection percentage of the HSL and HSV color spaces decreased significantly compared to RGB due to the Lightness and Value thresholding, this does not appear to reduce the number of outliers filtered out. We also note that in the RGB case, where no thresholding is applied, the marker is detected upwards of 99% of the time.
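
A sketch of this filtering step is shown below using scikit-learn's RANSACRegressor; fitting an independent linear model per axis over time is an assumption about the implementation, and the defaults are illustrative.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

def ransac_filter(timestamps, positions):
    """Keep only samples that every per-axis linear model marks as inliers.

    positions: (n, 3) array of measured (x, y, z); timestamps: (n,) array.
    Returns the filtered positions and the fraction of inliers kept.
    """
    t = np.asarray(timestamps).reshape(-1, 1)
    inliers = np.ones(len(t), dtype=bool)
    for axis in range(positions.shape[1]):
        model = RANSACRegressor().fit(t, positions[:, axis])
        inliers &= model.inlier_mask_
    return positions[inliers], inliers.mean()
```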

Fig. 4

a Detection percentage of markers in RGB, HSL, and HSV color spaces. b Percentage of outliers removed in RGB, HSL, and HSV color spaces.

Using the filtered data, we compute the root mean square error (RMSE) of the experimental positions with respect to their theoretical values, as given in Table 2. Here again, a reduction in detection percentage does not appear to lower the error. Furthermore, while the RMSE of the positional readings is relatively low across all spaces, achieving this comes at the expense of filtering out roughly 25–30% of the data. Keeping the video frame in the RGB color space produces the lowest error estimate of roughly 2 mm while also filtering out the least amount of data.
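
For reference, a standard per-sample Euclidean formulation consistent with the description above, over the \(N\) filtered samples with \(p_i\) the measured and \(\hat{p}_i\) the theoretical marker position, is:

$$\begin{aligned} \text{RMSE} = \sqrt{\frac{1}{N}\sum _{i=1}^{N} \left\| p_i - \hat{p}_i \right\| ^{2}} \end{aligned}$$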

Table 2 Marker tracking RMSE (mm) over five trials in RGB, HSL, and HSV

8 Discussion

Detailed experimentation with fiducial markers interfaced with a basic stereoscopic camera system demonstrates the potential for a fully optical, low-cost surgical navigation system. Given a $120 budget and an open-source marker implementation, this study shows that an error hovering around 2 mm is achievable with inexpensive off-the-shelf components. However, it is important to note that the 2 mm Euclidean error tolerance covers both translation and rotation, whereas our study was conducted in only two degrees of freedom. Future consideration should be given to accurately tracking marker rotations in pitch, yaw, and roll. For example, one possible extension of this system would be to use a five-degree-of-freedom robotic arm to mimic surgeon movements.

One additional caveat to these results is the number of outliers that must be removed in post-processing. In our tests, we noticed that the majority of outliers originate from noisy disparity maps. Addressing these outliers by filtering the disparity maps in real time would enhance the system's robustness. Additionally, our study primarily evaluates the accuracy of a stereoscopic system under translational movements. We broadly classify surgical procedures into two categories: procedures that use fixed markers on instruments to move incrementally closer to patient anatomies, such as pain management procedures, and procedures that actively track surgeon-instrument movements, such as neurosurgery. Our system is intended for instances of fixed markers with minimal rotation, rather than surgeon-instrument tracking.

Importantly, using a stereoscopic camera with an ArUco marker is not the only way to capture its 3D position. In fact, the ArUco module supports using the solvePnP algorithm to approximate both translation and rotation. In our empirical tests, we found that the translation and rotation estimates generated by ArUco vary substantially, so we opted to use stereoscopic vision to capture position. One future consideration for low-cost surgical navigation is how to compute the 3D position of markers using non-ArUco-based methods. Scaling up this system as-is for use in operating rooms may prove challenging due to variables impacting detection. While ArUco markers are effective for fast prototyping, their detection depends strongly on camera distance and marker size. While recent literature suggests the feasibility of using 3D ArUco markers, consideration must be given to more effective detection methods that do not sacrifice the low-cost nature of ArUco (Stenmark et al. 2022).
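
For comparison, a sketch of this single-camera alternative is given below using cv2.solvePnP with the IPPE_SQUARE solver. The camera matrix K, distortion coefficients, and solver flag are assumptions for this sketch; this is the approach we opted against, not the method used in this study.

```python
import cv2
import numpy as np

MARKER_MM = 40.0  # marker side length from Sect. 5

# Marker corners in its own plane (Z = 0), in the order expected by
# SOLVEPNP_IPPE_SQUARE (top-left, top-right, bottom-right, bottom-left)
OBJ_PTS = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                   dtype=np.float32) * (MARKER_MM / 2)

def marker_pose(corners, K, dist):
    """Translation/rotation of one marker from a single camera via solvePnP."""
    img_pts = corners.reshape(4, 2).astype(np.float32)
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, img_pts, K, dist,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)
    return (rvec, tvec) if ok else None
```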

While our implementation achieves roughly 2 mm of error, we aim to further reduce the error in our system. For instance, upgrading the stereoscopic system with higher-quality cameras would likely enhance accuracy and diminish the number of outliers. Another avenue is experimenting with positional data fusion from a multi-stereoscopic camera system. Lastly, filtering the disparity maps produced by block-matching algorithms would address tracking inconsistencies. For example, filters such as the Weighted Least Squares (WLS) filter smooth the disparity map while preserving edges by imposing weighted least squares regularization on the image, potentially reducing the presence of outliers (Farbman et al. 2008).
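
A sketch of such WLS post-filtering, assuming the opencv-contrib build (cv2.ximgproc), is shown below; the lambda and sigma values are illustrative starting points rather than tuned settings.

```python
import cv2

# Left/right matchers; parameters are illustrative, not tuned settings
left_matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                     blockSize=5)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

wls = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
wls.setLambda(8000.0)   # regularization strength: larger -> smoother map
wls.setSigmaColor(1.5)  # sensitivity to image edges

def filtered_disparity(left_gray, right_gray):
    """Disparity map smoothed by WLS while respecting image edges."""
    d_left = left_matcher.compute(left_gray, right_gray)
    d_right = right_matcher.compute(right_gray, left_gray)
    return wls.filter(d_left, left_gray, disparity_map_right=d_right)
```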

9 Conclusion

In this study, we have designed a fully optical tracking system composed of off-the-shelf, low-cost parts and open-source fiducial markers. We designed and calibrated a stereoscopic camera to record the 3D position of a moving ArUco marker. Average error, detection percentage, and susceptibility to outliers were used to evaluate the positioning accuracy of our system in various color spaces. Using optimal experimental settings, we obtained a root mean square error of 1.84 mm. The results suggest the possibility of developing a real-time, cost-effective surgical navigation system.