Introduction

Accurate and precise placement of pedicle screws is foundational to spinal fusion and spinal deformity correction; the use of pedicle screw fixation has improved both fusion rates and surgical correction [1,2,3]. Erroneously placed pedicle screws may result in patient morbidity and mortality, unplanned return to the operating room (UPROR), worse patient-reported outcomes and increased cost of care [4]. Recent studies report UPROR rates of 0.26–1.1% for malpositioned screws, which represents only a small subset of malpositioned screws [5,6,7,8,9].

Studies of pedicle screw accuracy have historically utilized review of computed tomography (CT) scans to assess and measure pedicle breach [10, 11]. The Gertzbein–Robbins (G–R) classification system defines breach as: Grade A, screw placed without breaching the cortex of the pedicle; Grade B, cortical breach < 2 mm; Grade C, cortical breach ≥ 2 mm but < 4 mm; Grade D, cortical breach ≥ 4 mm but < 6 mm; and Grade E, cortical breach ≥ 6 mm [12]. The G–R system has been the clinical “gold standard” for assessing screw safety and breach on postoperative CT [11, 13,14,15,16,17]. Some limitations of the G–R classification include: (1) the influence of surgeon/observer bias; (2) undetermined intrarater/interrater reliability; (3) the effect of metal-induced scatter artifact from the screws; (4) no defined standardized CT view (axial vs. coronal vs. sagittal) for assessment; (5) a lack of external validation.

The advent of computer-assisted surgical navigation has fostered development of an engineering approach to defining pedicle screw accuracy (technical accuracy) by comparing the preoperative screw plan to the actual screw position, and utilizing a Cartesian coordinate system for the reporting of absolute Euclidean error (straight-line distance in three-dimensional [3D] space) [11, 14,15,16,17,18,19,20,21]. However, a major limitation of the methodologies used in these studies is the need for manual alignment of vertebral bodies between pre- and postoperative images, introducing additional variance to the measured accuracy [14, 16, 19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]. Using the term “accuracy” to refer to both anatomic safety measurements (i.e., G–R classification) and plan vs. placement errors results in confusion for readers attempting to evaluate best surgical practice.

The purpose of this study is to present a technically rigorous, clinically useful, and fully automated systematic approach to measure pedicle screw accuracy. This technique eliminates the need for manual alignment of the preoperative and postoperative CTs. Furthermore, we show that a screw-aligned reference frame enables us to generate detailed and clinically meaningful measurements of systematic error and precision. The emphasis on technical accuracy using continuous variables for positional and angular data facilitates direct comparisons between, and iterative improvements to, surgical navigation systems.

Methods

Surgery and imaging

This institutional review board-approved study utilized three adult cadavers. Cadavers were excluded if they had severe metabolic bone disease (e.g., osteoporosis), injury that would compromise the vertebrae, or spinal instrumentation.

Preoperative whole-spine helical CT scans were acquired using a GE Discovery CT750 64-slice CT scanner (GE Healthcare, Chicago, IL). The imaging protocol used 140 kVp and variable tube current (305-485 mA). Slice thickness and spacing were 0.625 mm, with pitch factor of 0.516. Reconstruction diameter (FOV) varied (range 20–30 cm) to accommodate cadaver positioning and spinal curvature, resulting in voxel sizes of 0.39 × 0.39 × 0.625 mm3 to 0.59 × 0.59 × 0.625 mm3. Screw planning was performed using Mimics (Materialize NV, Belgium), a system- and vendor-agnostic, general-purpose medical image visualization and processing software. Bilateral screws T2–L4 were planned using standard criteria to avoid medial, superior, and inferior breach. Planned screw positions were transferred into the robotic trajectory planning environment (Mazor X Stealth, Medtronic, USA).

The cadavers were dissected using a posterior midline approach with subperiosteal exposure from the spinous processes to the tips of the transverse processes. Two surgeons performed pedicle instrumentation using robotic-assisted navigation. Pedicles were drilled through the robotic end-effector, and then 4.5 mm poly-axial pedicle screws of predetermined length were placed through the end-effector.

Postoperative CT scans were performed using the similar protocols as preoperative scans, with the addition of a metal artifact reduction reconstruction. Postoperative CTs were graded by a third senior orthopedic surgeon and a senior radiologist using the Gertzbein–Robbins system. CTs were graded by creating multiplanar reconstruction and adjusting the plane of imaging to align with the pedicle screws at each level. Finally, postoperative laminectomies were conducted by skeletonizing the pedicles to assess superior, inferior, and medial breaches.

Automated processing

Using automated segmentation software, voxel-wise labels were generated for all preoperatively scanned vertebrae, defining the cortex of each bone [34]. Postoperative screw positions were determined by optimizing a poly-axial screw model to fit the CT image data. To compare preoperative plans with postoperative screw placement, automatic rigid-body alignments for each level were performed (Fig. 1).

Fig. 1
figure 1

Registration workflow diagram

Registrations for each vertebral level were initialized using singular value decomposition to align preoperatively planned screw tip and tail landmarks with corresponding points on the detected screws, then optimized by maximizing Mutual Information between voxel intensity of pre- and postoperative CT volumes (Fig. 2) [35, 36]. The optimization step was performed using the diffusion imaging in Python package; segmentation labels were used as inclusive masks for the preoperative volumes while screws were excluded from the postoperative volumes by an intensity threshold [37].

Fig. 2
figure 2

Visualization of outputs from automated CT analysis pipeline. Blue highlights show the preoperative segmentation mask registered on top of the postoperative CT. Red lines show the axes of the detected screws. Successful registration and correct identification of the screw positions indicate that the resulting accuracy measurements will be valid. Note that each image is a stacked average of several slices to visualize the screws across multiple planes

These rigid-body registrations were then applied to map the planned screw trajectories into the postoperative image space. Screw accuracy was measured as translational and angular deviations along all three axes. The entire analysis pipeline (vertebral segmentation, screw detection, image alignment, and error measurements) was fully automated to ensure the objectivity and repeatability of the protocol. Outside of open source tools, all analyses were performed with custom Python code [34, 37].

To test the accuracy of the automated registrations, a total of 16 1 mm tantalum beads were inserted percutaneously across 4 vertebral levels in an equivalently instrumented cadaveric specimen, age 71 years. These beads were manually identified in pre- and postoperative CT and positions were compared after rigid-body alignment (beads being masked out during registration). Perturbation analysis was performed to test registration convergence: initializations for each of the 45 levels involved in the screw analysis were perturbed by 10 random rigid transforms of 2° and 2 mm and the resulting registrations compared to the unperturbed baseline registration.

Coordinate system

Error analysis for each screw was performed in a reference frame oriented to the planned screw trajectory (Fig. 2). For pedicle screws, coordinate axes were approximately aligned with the radiographic right anterior–superior (RAS) standard. For ease of interpretation, we refer to these axes as medial–lateral (ML), anterior–posterior (AP), and superior–inferior (SI). The AP axis was aligned to the planned screw shaft. Next, we fit a plane to the inferior vertebral endplate; the SI axis is defined as the portion of that plane’s normal perpendicular to the AP axis. The ML axis is orthogonal to these two axes.

Accuracy measurements were performed at the screw tip and tail, as well as the “mid pedicle,” defined as the point along the pedicle where the screw comes closest to (or maximally breaches) the medial pedicle wall. All analyses of right sided screws are illustrated as mirrored across the sagittal plane, allowing direct comparison across all screws while maintaining anatomically relevant directionality.

Statistical analysis

Group-level angular and translation errors are expressed as mean absolute error (MAE) and signed mean error (SME, or arithmetic mean), and 2 standard deviations (SD). SD is computed on the raw (signed) errors and represents the “spread” or consistency of screw placement, i.e., precision. Clinical safety is described by breach rates as well as the G–R grade. One-way ANOVA and post hoc multiple comparison tests with Bonferroni correction were done to test for accuracy differences in the upper thoracic (T2–T6), lower thoracic (T7–T12), and lumbar (L1–L4) regions. Statistical tests were completed using SPSS version 29 (IBM Corp, Armonk, NY).

Results

Ninety pedicle screws were planned in 3 cadavers, aged 54, 69 and 84 years. Eighty-eight screws (97.7%) were placed and analyzed in this study. One screw was skipped due to limited arm-reach of the robot and another due to clamp position blocking the screw trajectory. Screw tip, mid-pedicle, and tail positions and angular accuracy are summarized in Table 1.

Table 1 Accuracy of pedicle screws (N = 88)

The mean Euclidean error at the tip and tail were 1.67 mm and 1.78 mm, respectively. The Euclidean error at the mid-pedicle position was 1.04 mm; 2D analysis is utilized since “depth” of screw penetration was not a factor in the mid-pedicle location (Fig. 3). Figure 3A–C provides a graphical presentation of these data as “target analysis” for the tip, tail, and mid-pedicle positions.

Fig. 3
figure 3

Axis-aligned slices of postoperative CT images showing simulated error calculations. Analysis planes (tail, mid-pedicle, and tip) are shown in yellow and the preoperative planned screw position is overlaid in green. Errors are labeled ML for medial–lateral, SI for superior–inferior; in this case, all ML errors are positive (medial) and all SI errors are negative (inferior) A: coronal imaging, B: sagittal imaging, C: axial imaging D: 3 dimensional representation of the vertebrae demonstrating the tail (entry point of the screw), mid-pedicle (point the screw is closest to the pedicular wall, and screw tip

Screw accuracy is reported by regions in Table 2. The most clinically consequential measurement is the medial–lateral mid-pedicular absolute error; one-way ANOVA demonstrates significant differences in this value between regions. Post-hoc pairwise tests showed that upper thoracic and lumbar regions are more accurate than lower thoracic (p < 0.03 and p < 0.01, respectively).

Table 2 Pedicle screw accuracy by region

On visual CT evaluation, 78 (88.6%) of screws were identified as Grade A and 10 (11.4%) identified as Grade B. Direct visualization after laminectomy revealed 3 breaches (3.4%), all less than 2 mm (Grade B). Two of the three breaches identified upon open dissection were also identified on the CT. The third breach was not identified on CT and was scored as Grade A by both raters (33.3% false-negative rate). CT assessment resulted in eight false-positive breach identifications when open dissection was used as the gold standard (8.4% false-positive rate).

Evaluation of registration accuracy was limited by percutaneous placement of the tantalum beads, with only 8 of the 16 beads were available for assessment due to either screw overlap or extra-osseous placement of the beads. Assessment of registration accuracy revealed the positional MAE ± SD was 0.45 mm ± 0.42 mm. The perturbation analysis found a mean translation error of 0.05 mm ± 0.05 mm and a rotation error of 0.07° ± 0.08°.

Discussion

This study is the first description of a fully automated algorithm for determination of the technical accuracy of pedicle screw placement that does not require human intervention in the rigid-body alignment or measurement process. The technical accuracy results from this study of the Mazor X Stealth robotic system are equivalent to or better than reported accuracy from any prior in vivo or cadaver study [5, 19, 20, 26, 27, 32, 33, 38,39,40,41,42]. At the mid-pedicle position, our MAE was 0.75 mm in the ML direction, 0.60 mm in the SI direction, and 1.04 mm in total. Our total angular error of 1.58° was also superior to any prior cadaver or in vivo study [5, 19, 26, 27, 29, 32, 33, 38,39,40,41,42].

Prior to the development of technical accuracy measurement systems, the G–R classification system served a useful role providing categorical accuracy and safety data. If we accept the premise that Grade A and Grade B are clinically acceptable, then only computer-assisted techniques (freehand navigation and robotically assisted navigation) consistently achieve the goal of 100% Grade A and B results (Table 3) [5, 19, 20, 26, 27, 32, 33, 38,39,40,41,42]. The current cadaveric study supports this conclusion, with all screws identified on CT as Grade A (88.6%) or B (11.4%), although only 3 screws were confirmed as breached after open dissection (8.4% false-positive rate).

Table 3 Summary of prior accuracy studies

The difficulties assessing pedicle breach and differentiating between Grade A and Grade B screws secondary to scatter cannot be underestimated, as well as observer bias and lack of a standardized approach to assessing the CT scans. It is likely that prior studies of breach based upon CT evaluation may also suffer from high false-positive rates; however, it is most likely to impact review of pedicle screws with less than 2 mm of breach, which are generally deemed clinically acceptable.

It is difficult to assess the technical accuracy of freehand techniques since freehand surgeons generally do not obtain preoperative 3D imaging or plan their screws on 3D planning software. However, a recent non-consecutive retrospective study of freehand pedicle screw categorical accuracy in 318 pediatric spinal deformity patients with 6,358 screws reported 2.63% of the pedicle screws were Grade C or worse, and 0.26% of screws necessitated UPROR [6]. Another non-consecutive multicenter retrospective review reported 0.26% incidence of both neurologic injury and misplaced instrumentation [7]. A single-center retrospective review of all pediatric patients who underwent spinal fusion over a 30-year period revealed a 1.1% incidence of UPROR related to malpositioned pedicle screws, neurological changes, or pneumothorax (presumably related to implants) [43]. Meta-analyses report overall accuracy rates between 90.6% and 94.9% for freehand and freehand navigation techniques [44, 45]. Collectively, these studies provide useful baseline data for accuracy and revision rates for pedicle screws using non-robotic techniques. The retrospective and non-consecutive nature of the pediatric multicenter database studies limit their utility for comparison with consecutive series of computer-navigated surgical accuracy in vivo and single-center consecutive case reviews [5, 19, 20, 26, 27, 32, 33, 38,39,40,41,42,43].

The advent of computer-assisted surgical navigation has fostered the development of technical accuracy, namely geometric comparisons between the 3D preoperative CT plan and the postoperative CT scan. Freehand navigation and robotically assisted navigation studies have variably described the technical accuracy of pedicle screw placement with increasing sophistication, but without any consensus on terminology or analysis methodology [5, 13,14,15,16,17, 19, 20, 22, 26,27,28,29, 32, 33, 38, 40,41,42]. The protocol described in this work is greatly influenced by a small number of studies utilizing either cadavers, bone models, or in vivo human studies which have iteratively improved the granular reporting of 3D technical accuracy (Table 3) [5, 19, 20, 26, 27, 32, 33, 38,39,40,41,42]. The largest studies of computer-assisted freehand and robotic navigation report total angular errors between 2.0 and 6.3 degrees compared with 1.58 degrees in the current study. Only one study by Volk et al. evaluated mid-pedicle error, which was 1.75 mm in the ML direction and 1.52 mm in the SI direction, compared to 0.75 mm and 0.60 mm in the current study [26].

A combination of standardized clinical safety data (e.g., G–R classification) and technical accuracy including precision is necessary to properly compare different navigation systems and workflows. We advocate a screw-aligned coordinate system as the natural reference frame to present useful feedback for surgeons, and we present “target analysis” visualizations (Fig. 4) to illustrate accuracy (mean error) and precision (2 SD) [19, 26, 33, 38, 40]. Although most prior authors reference screw tip and tail accuracy, we agree with Volk et al. that accuracy data at the mid-pedicle is most critical, as this describes proximity of the pedicle screw to the spinal canal [26]. Furthermore, in addition to MAE, we would strongly endorse reporting SME, as this provides valuable information about directionality of systematic errors and allows direct comparison between screws and across studies. Similarly, the variance (or SD) of the SME indicates the precision/consistency of the surgical technique.

Fig. 4
figure 4

Coronal error at screw tail (a), mid-pedicle (b) and tip (c). The origin (center of the target) is the planned screw trajectory and the blue (circle) marks represent actual individual screw positions; right-side screws have been mirrored across the sagittal plane to standardize laterality. Purple stars indicate the signed mean error, while the shaded ellipse represents two standard deviations in screw position. Grey target circles show 2 and 4 mm coronal error relative to planned position. ML = medial(+)/lateral(−), SI = superior( +)/inferior(−)

A limitation of prior accuracy studies is the requirement for human experts to perform the overlay of the preoperative plan with the postoperative screw position. This study reports the development of a fully automated protocol for determination of pedicle screw accuracy utilizing standard preoperative and postoperative CT scans. Our reported registration accuracy (0.48 mm) was limited by the imaging resolution (0.625 mm), while perturbation analysis showed the algorithm to be extremely consistent. In future studies, we intend to reduce the uncertainty of our registration accuracy, and we believe such assessments should be a prerequisite for technical accuracy data reporting. Some limitations of the current study include the small number of pedicle screws assessed, as well as the use of cadavers instead of in vivo screw assessment. The cadavers did not have any spinal deformity, which likely would impact accuracy results. We recognize that only a subset of accuracy errors in pedicle screw placement result in patient harm. However, improvements to accuracy, precision, and reliability of pedicle screw placement using robotically assisted surgical navigation have the potential to reduce the incidence of patient harm by decreasing the 0.26% incidence of UPROR secondary to malpositioned pedicle screws [6, 7, 43]. A standardized, systematic approach to the reporting of pedicle screw accuracy with computer-assisted pedicle screw insertion techniques using standardized nomenclature as well as a screw-centric 3D coordinate system is vital. Standardized cadaver and human models will greatly facilitate testing, comparison, and improvement of robotic systems (ASTM F2554-18).