An international validation of the AO Spine Subaxial Injury Classification System

To validate the AO Spine Subaxial Injury Classification System with participants of varying experience levels, subspecialties, and geographic regions, a live webinar was organized in 2020. The validation consisted of 41 unique subaxial cervical spine injuries with associated computed tomography (CT) scans and key images. Intraobserver reproducibility and interobserver reliability of the classification system were calculated for injury morphology, injury subtype, and facet injury. Reliability and reproducibility were categorized as slight (κ = 0–0.20), fair (κ = 0.21–0.40), moderate (κ = 0.41–0.60), substantial (κ = 0.61–0.80), or excellent (κ > 0.80) according to the Landis and Koch classification. A total of 203 AO Spine members participated in the validation. The percentage of participants accurately classifying each injury exceeded 90% for fracture morphology and fracture subtype on both assessments. Interobserver reliability was excellent for fracture morphology (κ = 0.87) and substantial for fracture subtype (κ = 0.80) and facet injury (κ = 0.74). Intraobserver reproducibility was excellent for fracture morphology and subtype (κ = 0.85 and 0.88, respectively) and substantial for facet injuries (κ = 0.76). The AO Spine Subaxial Injury Classification System therefore demonstrated excellent interobserver reliability and intraobserver reproducibility for fracture morphology, substantial reliability and reproducibility for facet injuries, and excellent reproducibility with substantial reliability for injury subtype.


Introduction
The AO Spine Subaxial Injury Classification System was designed as a potential tool to help guide the management of traumatic subaxial cervical spine injuries. Although subaxial spine injury classifications have existed since the 1970s, they have predominantly relied on anatomic descriptions of injury mechanisms, resulting in limited clinical utility [1-3]. Furthermore, previous classifications designed to help guide injury management have failed to gain global adoption secondary to poor reliability [4].
The AO Spine Subaxial Injury Classification System was therefore developed with the goal of prognosticating injury severity and creating a classification with good interobserver reliability and intraobserver reproducibility. To accomplish this, the classification system groups traumatic subaxial cervical spine lesions based on their morphology into A (stable-compression), B (potentially unstable-tension band), and C (unstable-translational) type injuries and includes a classification system of associated facet joint injuries. Morphologic injury types are further subdivided hierarchically into subtypes based on stability and injury severity [5]. In this manner, AO Spine created a concise yet comprehensive injury classification system with previous validation studies by the AO Spine Knowledge Forum Trauma group demonstrating substantial interobserver reliability and intraobserver reproducibility [6]. However, large-scale studies demonstrating the high reliability and reproducibility of the classification system are necessary.
Several previous studies have aimed to validate subaxial cervical spine injury classifications, but they have routinely relied on a small subset of validation members [7,8]. Utilizing large study groups or international spine organizations is one way to increase the generalizability of fracture classifications, but the use of such groups has been infrequently reported in the cervical spine literature [9]. Furthermore, no previous study has attempted to validate a subaxial cervical spine fracture classification with hundreds of validation members. Therefore, the primary goal of this study was to determine the reliability and reproducibility of the AO Spine Subaxial Injury Classification System via an open call to all participating AO Spine members.

Methods
A live webinar conference was hosted in 2020 for validation of the AO Spine Subaxial Injury Classification System. All AO Spine members were invited to participate. Prior to participation, each member watched a tutorial video and attended a live training session directed by one of the creators of the fracture classification. The conference was conducted in English. In this validation, 203 AO Spine members from six geographic regions of the world (North America, Central and South America, Europe, Africa, Asia and the Pacific, and the Middle East) elected to participate in reviewing computed tomography (CT) videos of 41 distinct subaxial cervical spine injuries. The CT videos consisted of high-resolution sagittal, axial, and coronal sequences, with the viewing range of each CT limited to the area of injury. At the same time, each participant was able to view key images of the injury. The videos were presented to the validation members in a randomized order (assessment 1).
Each validation member was tasked with classifying each subaxial cervical spine injury according to the AO Spine Subaxial Injury Classification System, which included injury morphology (A, B, C), injury subtype (A1, A2, B1, etc.), and the presence of a facet injury (Fig. 1). After 3 weeks, each participant attended a second live webinar to evaluate the same CT videos (in a new randomized order) and re-classify them (assessment 2). All answers were recorded in an online survey. Demographic data, including nationality, surgical subspecialty (orthopedic spine, neurosurgery, or other), and years of experience (< 5, 5-10, 11-20, and > 20), were recorded.

Statistical analysis
A chi-square test was used to evaluate significant differences in the demographic data. Agreement percentages were used to compare each validation member's classification grade to the "gold standard," defined by a panel of expert spine surgeons and traumatologists who came to unanimous agreement on the classification of each injury. Cohen's kappa (κ) statistic was used to assess the agreement on injury morphology (A, B, or C), injury subtype (A1, A2, A3, etc.), and facet injury (F1, F2, F3, or F4) between independent observers (interobserver reliability) and the consistency of each observer's classification across the two assessments (intraobserver reproducibility). The κ coefficients were interpreted using the Landis and Koch grading system [10]: a κ coefficient of 0.20 or less was defined as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and greater than 0.80 as excellent reliability or reproducibility.
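For two raters, Cohen's kappa compares observed agreement with the agreement expected by chance given each rater's label frequencies. With 203 raters, the reported coefficients presumably aggregate many pairwise (or rater-vs-assessment) comparisons; the sketch below is a minimal two-rater illustration using hypothetical morphology labels (not the study data), together with the Landis and Koch cut-offs stated above.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters grading the same set of cases."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: fraction of cases given identical labels.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1) / n**2
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch category used in the study."""
    if kappa <= 0.20:
        return "slight"
    elif kappa <= 0.40:
        return "fair"
    elif kappa <= 0.60:
        return "moderate"
    elif kappa <= 0.80:
        return "substantial"
    return "excellent"

# Hypothetical example: two observers grading 10 injuries by morphology (A/B/C).
obs1 = ["A", "A", "B", "C", "A", "B", "C", "A", "B", "C"]
obs2 = ["A", "A", "B", "C", "A", "B", "B", "A", "B", "C"]
k = cohens_kappa(obs1, obs2)
print(round(k, 2), landis_koch(k))  # → 0.85 excellent
```

Note that the Landis and Koch boundaries are inclusive at the upper end of each band, so a coefficient of exactly 0.80 (as reported for fracture subtype reliability) grades as substantial, not excellent.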

Results
A total of 203 validation members elected to participate in the AO Spine Subaxial Injury Classification System validation. A significantly greater proportion of validation members lived in Europe (40%) and Asia (24.6%), with the remainder from Central or South America (16.7%), North America (8.9%), the Middle East (7.4%), and Africa (2.5%) (p < 0.001). Most validation members were orthopedic surgeons (60.6%) or neurosurgeons (36.9%), with only five members identifying as "other" physicians (2.5%) (p < 0.001). The "other" group consisted of residents and radiologists (Table 1).

Percent agreement with gold standard
Percent agreement with the gold standard for fracture morphology on assessment 1 (AS1) and assessment 2 (AS2) was 95.4% and 94.7%, respectively. Percent agreement for fracture subtype (AS1: 91.7%, AS2: 90.6%) was lower than that for fracture morphology but similar to that for facet injury (AS1: 88.6%, AS2: 91.3%). Additionally, the validation members had minimal variability in correctly identifying each fracture morphology [range, 87.
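Percent agreement against a gold standard is a simple proportion; a minimal sketch, using hypothetical labels rather than the study data:

```python
def percent_agreement(ratings, gold):
    """Share of a rater's labels matching the gold-standard labels, as a percentage."""
    assert len(ratings) == len(gold)
    matches = sum(r == g for r, g in zip(ratings, gold))
    return 100 * matches / len(gold)

# Hypothetical example: one participant's morphology grades for five injuries.
gold = ["A", "B", "C", "A", "B"]
rated = ["A", "B", "B", "A", "B"]
print(percent_agreement(rated, gold))  # → 80.0
```

Unlike kappa, percent agreement does not correct for chance agreement, which is why the study reports both measures.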
We subsequently stratified interobserver reliability by surgeon experience, surgical subspecialty, and geographic region to determine whether any of these factors introduced variability in the reliability of the injury classification. Surgeon experience did not affect interobserver reliability for fracture morphology [range, AS1: 0.83-0.89, AS2: ...] (Table 4).
Similarly, intraobserver reproducibility was stratified by surgeon experience, surgical subspecialty, and geographic region to determine whether these factors influenced reproducibility. Although surgeons with 11-20 years' experience had slightly higher intraobserver reproducibility in fracture morphology [...] (Table 6).

Discussion
The international validation of the AO Spine Subaxial Injury Classification System resulted in classification accuracy greater than 90% for fracture morphology and fracture subtype on both assessments and demonstrated excellent interobserver reliability and intraobserver reproducibility for fracture morphology, substantial to excellent reliability and reproducibility for fracture subtypes, and substantial reliability and reproducibility for facet injuries. Further, each fracture morphology type (A, B, and C), fracture subtype (A1, B1, C1, etc.), and facet injury type (F1, F2, F3, and F4) had at minimum substantial reliability and reproducibility, indicating the system may be universally applied across all subaxial cervical spine injuries. Overall, the results from this international validation study support the use of the AO Spine Subaxial Injury Classification System as a tool to communicate subaxial cervical spine injury patterns on a global scale.

The first study to validate the AO Spine Subaxial Injury Classification System was a pilot study consisting of ten AO Spine Knowledge Forum Trauma members [6]. That validation demonstrated substantial interobserver reliability for injury subtypes (κ = 0.64) and injury morphology (κ = 0.65), with substantial intraobserver reproducibility for injury morphology (κ = 0.77) and injury subtype (κ = 0.75) [6]. The AO Spine pilot study combined facet injuries into the fracture morphologies (A, B, C, and F) and injury subtypes (A1, B1, C1, F1, etc.), making a direct comparison between the international validation study and the pilot study groups difficult.
However, when comparing the AO Spine pilot group's facet injury interobserver reliability (κ = 0.66) to the international validation group's facet injury reliability (AS1: 0.67, AS2: 0.74; Table 4), both validation groups showed similar substantial reliability. It can also be reasonably assumed that the AO Spine pilot study had similar intraobserver reproducibility (κ = 0.75) compared to the international validation after accounting for the separation of fracture morphology reproducibility (κ = 0.85) and facet injury reproducibility (κ = 0.76) [6]. Given the disparate injury morphology reliability between the international group and the AO Spine pilot study group (κ = 0.87 vs. 0.65, respectively), it is unlikely that the inclusion of facet injuries alone accounted for the large gap in reliability. While substantial, the reproducibility of the facet fracture classification remains lower than that of fracture subtype and morphology. This is likely secondary to difficulty distinguishing between F1 and F2 injuries, which are commonly mistaken for one another; reproducibility would likely improve with CT imaging with 1 mm cuts [11]. Two reasons may explain why the international validation results showed higher injury morphology reliability than the pilot study. First, the international validation group had 203 participants, compared to 10 in the AO Spine pilot study; a larger group reduces the influence of any single participant who has difficulty applying the classification to cervical spine injuries. Perhaps more importantly, the classification system was available for global use five years prior to the international validation study, giving participants time to use the classification system in their spine practice before participating in the international validation.
Even though our results suggest there is no correlation between surgeon experience and improved reliability or reproducibility of the AO Spine Subaxial Injury Classification System, no study has examined whether increased application of the classification to cervical spine injuries improves a participant's accuracy. Of note, no previous study has found a correlation between surgeon experience and the reliability and reproducibility of other AO Spine classifications [12,13].
A neurosurgery attending, an orthopedic spine attending, and three neurosurgery residents performed a separate independent validation of the AO Spine Subaxial Injury Classification System [14]. The intraobserver reproducibility for injury morphology was excellent for both attending spine surgeons (κ = 0.86 and 0.95, respectively) and substantial for residents (κ = 0.66-0.75) [14]. This held true for injury subtypes, with spine surgeons demonstrating excellent reproducibility (κ = 0.80 and 0.93, respectively) and residents demonstrating substantial reproducibility (κ = 0.63-0.67). When evaluating injury morphology and injury subtype reliability, kappa coefficients ranged from moderate (morphology: κ = 0.52 vs. subtype: κ = 0.51) on assessment 1 to substantial (morphology: κ = 0.63 vs. subtype: κ = 0.60) on assessment 2 [14]. The contrast in injury morphology reliability between neurosurgery residents and attending surgeons suggests that additional use of the classification may improve its accuracy and highlights the importance of clinical experience in understanding nuanced spinal anatomy and fracture patterns, but future studies are required to confirm this finding.
The AO Spine Latin America Trauma Study group also validated the reliability of facet injury classification under the AO Spine Subaxial Injury Classification System and found that surgeons practicing in South America (compared to Central America), neurosurgeons (compared to orthopedic spine surgeons), and surgeons with 5-10 years' experience had greater classification accuracy on univariate analysis [15]. On multivariate analysis, however, only South American region remained significant, while hospital type became significant [15]. Although a majority of the participants in our webinar were orthopedic spine specialists, both neurosurgeons and orthopedic spine surgeons had excellent interobserver reliability and intraobserver reproducibility for fracture morphology and subtype, with substantial interobserver reliability and intraobserver reproducibility for facet injuries. Further, there was minimal variation in intraobserver reproducibility and interobserver reliability by geographic region. This is consistent with literature evaluating previous AO Spine fracture classification systems, in which geographic region did not account for any significant variation in the radiographic classification of thoracolumbar fractures [13].
Limitations were present in this study that require discussion. A previous iteration of this study was attempted in 2018 with the intention of validating the AO Spine Subaxial Injury Classification System on an international scale. However, the disappointing validation outcomes resulted in alterations to the methodological design and subsequent revalidation of the classification system in 2020. Although discussed in a separate manuscript, the improvement in validation methodology likely accounted for the substantial to excellent reliability and reproducibility of the classification system reported here. Unique CT videos, which had not previously been circulated, were displayed during the 2020 validation; therefore, any participant who may have had access to the 2018 validation injury films would not have gained an advantage during the 2020 validation. Additionally, because a live webinar was used to validate the subaxial cervical spine injury classification, participating members were given limited time to classify each injury. This may have led some members who process images more slowly, have less experience, or are not fluent in English to struggle to complete the validation in a timely fashion, which could have artificially suppressed the reliability and reproducibility of the classification [16-19]. However, given the substantial to excellent reliability and reproducibility of the classification system on a global level, this was likely of limited significance. While magnetic resonance imaging (MRI) would help to better evaluate the extent of associated soft tissue injuries, AO Spine classification systems use CT scans to classify all injuries in order to minimize the inequality gaps that limit access to MRI in some parts of the world [20,21]. CT remains the gold standard for spinal trauma workup, as it is quicker and more accessible than MRI, with some spine surgeons reporting MRIs taking greater than 24 h to obtain [22].

Conclusion
The AO Spine Subaxial Injury Classification System demonstrated excellent intraobserver reproducibility for fracture morphology and fracture subtype, with substantial reproducibility for facet injury. The classification system also showed substantial to excellent interobserver reliability for fracture morphology, fracture subtype, and facet injury. When assessed for each individual fracture subtype and facet injury variant, the AO Spine Subaxial Injury Classification System demonstrated at minimum substantial reliability and/or reproducibility, indicating its global applicability as a classification tool for subaxial cervical spine injuries.