Background

Burst fractures are compression fractures of the spine involving injury to the posterior wall of the vertebral body and at least one end plate [1]. They often occur in the thoracolumbar transition due to the biomechanical forces being highest in this area [2].

The Swedish Fracture Register (SFR) is a nationwide quality register that collects data on all types of orthopedic fractures [3]. Data collected include the date and cause of injury, fracture classification, treatment, reoperations, and patient-reported outcome measures. The SFR uses a simplified AO/OTA classification [4] with the aid of pictures to guide the registering physician. Registrations in the SFR are made by physicians of different levels of experience, including interns, residents, emergency physicians and orthopedic surgeons.

Spine fractures have been included in the SFR since 2015 [5]. As physicians of any expertise may register the fracture, the accuracy of the classifications, and in turn, the reliability of the data in the SFR may be compromised.

Methods

Aim

The aim of this study was to determine the reliability of the classification of thoracolumbar burst fractures in the SFR.

Study design

This is a retrospective study of prospectively collected data.

Study population

Patients of working age, 18–66 years, with a single-level thoracolumbar fracture from the tenth thoracic vertebra to the third lumbar vertebra classified as a burst fracture were identified in the SFR.

During the study period, the SFR included data from the majority of trauma and orthopedic departments in Sweden, including university hospitals and county hospitals [3]. Data are entered in the web-based platform by the physician treating the patient, often a specialist or resident in orthopedics [5], but may also be registered by less experienced physicians [6]. Registrations are ideally entered in the emergency department, but subsequent treatments can be added later [6]. Details concerning the registration process are described in the Appendix.

Thoracic and lumbar fractures in the SFR are classified using a modified version of the 2013 AO spine injury classification (AOSpine classification) by Reinhold et al. [7]. The physician selects the fracture level, neurological function, and the type of fracture with the assistance of pictures and accompanying text. As opposed to the original classification, the SFR classification does not distinguish between incomplete and complete burst fractures [8]. Compression fractures are divided into three categories: simple compression fractures (A1), pincer fractures (A2) and burst fractures (A3/4). The physician is then asked to indicate whether there is a concomitant injury to the posterior tension band (B-type injury). Finally, the physician is asked to decide if there are signs of ankylotic spinal disorder in the fractured area. A detailed description of the modified AOSpine classification is available in the Appendix.

The flow chart of the study is presented in Fig. 1. In this study, patients with factors that influence the decision for or against operative treatment were excluded. These factors included injury to the spinal cord or cauda equina, open fractures, pathological fractures, fractures on multiple levels and low-energy injuries (defined as a fall from standing height or less). Patients were collected from the inception of spine fracture registrations in the SFR, starting in 2015, until February 2019.

Fig. 1 Flow chart of the study

Patients with any type of medical imaging within two weeks from the time of injury, as well as before surgery for the operatively treated patients, were eligible for the study. Computed tomography (CT), magnetic resonance imaging (MRI) or conventional radiographs of the spine were collected from the treating hospitals. If more than one modality was available, CT and MRI were prioritized.

Medical imaging assessment

Medical images were reviewed and classified independently by two physicians (SB and FB) and by a third physician (PG) where disagreement on classification occurred. The images were anonymized, and the reviewers only had information about the patient’s birthdate and treatment. No information about patient history, cause of injury, sex or the radiologist’s evaluation was available. The fracture level was identified, and the fracture type was classified according to the AOSpine classification by Reinhold et al. [7]. The three reviewers had different levels of experience (SB, orthopedic resident with training in the AOSpine classification; FB, specialist in orthopedic surgery; and PG, specialist in orthopedics and experienced spine surgeon). The combined results of the three reviewers were considered the gold standard.

Agreement

To evaluate the intra-rater reliability, the reviewers SB and FB reassessed a subset of 52 patients 3 to 6 months after the first classification. To assess the inter-rater reliability between reviewers SB and FB, the initial classifications were used. Finally, the inter-rater reliability between the gold standard and the SFR classification was assessed.

Statistics

Data are presented with means, medians, quantiles, minimum, maximum and standard deviations for continuous variables, and numbers and percentages for categorical variables. The intra- and inter-rater reliability was determined with percentages of agreement and weighted Cohen’s kappa. The reliability was interpreted using the criteria first proposed by Landis and Koch [9], with kappa coefficients of 0.0–0.2 representing slight reliability, 0.2–0.4 as fair, 0.4–0.6 as moderate, 0.6–0.8 as substantial, and > 0.8 as excellent. The positive predictive value (PPV) was calculated to determine the accuracy of the classifications. Comparisons were made based on experience of the registering physician, choice of treatment (operative vs. non-operative), and hospital type. Statistical significance was set to p < 0.05.
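The agreement statistics described above can be illustrated with a minimal Python sketch (not the R code used in the study). It computes the percentage of agreement and a weighted Cohen’s kappa, and maps kappa to the interpretation bands quoted above. The function names, the choice of linear weights, the ordinal ordering of the A-type categories, the example ratings, and the handling of band boundaries (upper bounds inclusive, negative values labelled "poor") are assumptions made for illustration; the text does not state which weighting scheme was used.

```python
import math


def percent_agreement(rater_a, rater_b):
    """Share of cases on which both raters chose the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa; `categories` must be in ordinal order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of cases")
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # counts[i][j] = number of cases rated category i by A and category j by B.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: 0 on the diagonal, 1 for the most distant pair.
    def weight(i, j):
        return abs(i - j) / (k - 1) if k > 1 else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * counts[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells) / (n * n)
    if expected == 0:
        return math.nan  # undefined when both raters use one and the same category
    return 1 - observed / expected


def interpret_kappa(kappa):
    """Bands as quoted in the text; boundary handling is an assumption."""
    if math.isnan(kappa):
        return "undefined"
    if kappa < 0:
        return "poor"
    for upper, label in ((0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")):
        if kappa <= upper:
            return label
    return "excellent"


# Hypothetical ratings of six fractures by two reviewers (illustrative only).
order = ["A1", "A2", "A3", "A4"]
first = ["A1", "A3", "A3", "A4", "A4", "A3"]
second = ["A1", "A3", "A4", "A4", "A4", "A1"]
kappa = weighted_kappa(first, second, order)
print(percent_agreement(first, second), round(kappa, 2), interpret_kappa(kappa))
```

With linear weights, a disagreement between adjacent categories (A3 vs. A4) is penalized less than one between distant categories (A1 vs. A4); quadratic weights would be an equally plausible choice.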

RStudio software version 4.1.0 for Windows (R Foundation for Statistical Computing) was used for all statistical analyses.

Results

Patients and descriptive data

The patient characteristics are summarized in Table 1.

Table 1 Patient characteristics

Comparisons between the SFR classification and the gold standard classification are displayed in Table 2. The accuracy and agreement of the fracture level in comparison to the gold standard were excellent (95%). The number of A1 compression fractures misclassified as burst fractures was significantly greater among non-operatively treated patients than among operatively treated patients (25 (15%) vs. 3 (3%)). The experience of the registering physician also differed significantly between the groups: the majority of operatively treated patients were registered by a spine surgeon (85%), while the non-operatively treated patients were registered by physicians of varying levels of experience.

Table 2 Comparison between gold standard and SFR

Intra- and inter-rater reliability

The intra- and inter-rater reliability of the reviewers are presented in Table 3. The percentage of agreement between classifications made by the same reviewer on different occasions was excellent (83–99%). The intra-rater reliability was substantial to excellent, with the exception of B-type injuries for reviewer 2, which had fair reliability (kappa 0.28).

Table 3 Intra- and inter-rater reliability of the reviewers presented as percentage of agreement (PA) and Cohen’s kappa

Third review assessment

Of the total of 277 patients, 71 patients were assessed by the third reviewer. Ten patients were reviewed due to disagreement on the fracture level. Forty-seven patients were reviewed for disagreement in the classification of type A injury. Among these, 19 cases were reviewed for disagreement on whether there was a wedge compression (A1) or an incomplete burst fracture (A3), and 17 cases on whether there was an incomplete (A3) or a complete (A4) burst fracture. In the remaining 11 cases, the disagreement concerned whether there was a simple compression (A1) or a pincer fracture (A2) (3 patients), a wedge compression (A1) or a complete burst fracture (A4) (3 patients), a pincer fracture (A2) or a complete burst fracture (A4) (1 patient), or a fracture that should not be classified as A-type (4 patients). Eight (of 277) patients were reviewed for disagreement in the classification of type B injuries.

The inter-rater reliability between the reviewers and the gold standard was substantial to excellent across all sub-classifications (Table 3).

Comparison between SFR and the gold standard

Accuracy

The accuracy, determined by the PPV for correctly classifying patients as having a burst fracture, remained high regardless of the experience of the physician, treatment (non-operative or operative) and type of hospital (Table 4). However, the classification of B-type injuries was notably less accurate, with PPVs ranging from 0 to 40%.
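For readers unfamiliar with the measure, the PPV here is the share of register entries of a given class that the gold standard confirms. The following minimal Python sketch shows the calculation under stated assumptions: the function name, the label strings, the mapping that collapses gold-standard A3 and A4 into the register’s single A3/4 category, and the five example cases are all hypothetical and are not taken from the study data.

```python
# Assumed mapping: the register has one burst category, the gold standard two.
COLLAPSE_TO_REGISTER = {"A3": "A3/4", "A4": "A3/4"}


def positive_predictive_value(register_labels, gold_labels, positive):
    """PPV = confirmed positives / all cases the register labelled positive."""
    true_pos = false_pos = 0
    for registered, gold in zip(register_labels, gold_labels):
        if registered != positive:
            continue  # PPV only looks at cases the register called positive
        gold_collapsed = COLLAPSE_TO_REGISTER.get(gold, gold)
        if gold_collapsed == positive:
            true_pos += 1
        else:
            false_pos += 1
    if true_pos + false_pos == 0:
        return None  # no positive registrations, so PPV is undefined
    return true_pos / (true_pos + false_pos)


# Hypothetical example: five register entries, all registered as burst fractures.
register = ["A3/4", "A3/4", "A3/4", "A3/4", "A3/4"]
gold = ["A3", "A4", "A1", "A3", "A4"]  # one is a simple compression fracture
print(positive_predictive_value(register, gold, "A3/4"))
```

Because every case in a cohort selected on the register’s burst label is a positive registration, the PPV in such a cohort equals the simple proportion of registrations confirmed by the gold standard.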

Table 4 Accuracy presented as PPVs comparing SFR with gold standard

Agreement between the SFR and the gold standard

The percentage of agreement and inter-rater reliability of the SFR classifications compared with the gold standard are summarized in Table 5. The estimation of Cohen’s kappa for A-type fractures was not possible for this dataset, as all reviewed fractures were determined to be burst fractures according to the SFR. The reliability of the fracture level was excellent (kappa 0.94). The reliability for B-type injuries was only slight (Cohen’s kappa 0.15). When the subtype of B injury was ignored, the reliability improved to fair (kappa 0.34). The overall reliability of the AOSpine classification with the SFR modification was slight (Cohen’s kappa 0.08) and improved only to fair when the subtype of B-injury registered was ignored (kappa 0.23). The inter-rater reliability did not change when analyzing subgroups by age, sex, treatment, registering physician or fracture severity (A3 vs. A4).
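One way to see why kappa is uninformative for A-type fractures in a cohort selected on the register’s burst label: when one rater assigns the same category to every case, the chance-expected agreement equals the observed agreement, so unweighted Cohen’s kappa collapses to zero (or is undefined if the other rater is also constant), however high the percentage of agreement is. A minimal Python sketch with hypothetical labels follows; the function name and data are illustrative assumptions, not the study’s analysis.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa; returns None when it is undefined."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must classify the same non-empty set of cases")
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return None  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical cohort: the register calls all ten fractures burst fractures,
# while the gold standard confirms nine of them.
register = ["burst"] * 10
gold = ["burst"] * 9 + ["other"]
agreement = sum(r == g for r, g in zip(register, gold)) / len(register)
print(agreement, cohen_kappa(register, gold))
```

In this situation the percentage of agreement and the PPV remain meaningful summaries, which is consistent with how the A-type results are reported above.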

Table 5 Inter-rater reliability presented as percentage of agreement (PA) and Cohen’s kappa between the gold standard and SFR classification

Discussion

The accuracy of classifying a burst fracture in the SFR is excellent regarding level of fracture and type A injury, regardless of the experience of the registering physician. However, the classification of B-type injuries in the SFR is far less reliable, with a substantial number of misclassifications even among spine surgeons.

The SFR, being a national quality register that collects data on all types of orthopedic fractures, is a valuable resource for observational research on patients with thoracolumbar burst fractures to identify factors associated with treatment outcomes. Furthermore, as the SFR expands to register-based randomized controlled trials, it is important that the classification of burst fractures is reliable [10]. The present study focused on investigating the reliability of classifications for thoracolumbar burst fractures in selected patients for whom both operative and non-operative options are considered valid. This is a group of patients that may be registered and classified by physicians with a wide range of experience, depending on the local tradition of how they are usually treated.

The intra-rater and inter-rater reliability of the AOSpine classification as assessed by the reviewers in our study demonstrated substantial to excellent reliability. This is in accordance with previous reliability studies made on the classification system which have shown moderate to substantial reliability [11,12,13,14]. In a previous study by Morgonsköld et al., the reliability of spine fractures in the SFR was considered acceptable with moderate agreement [8]. In the study, physicians of varying experience levels were involved, all of whom had previous knowledge of the different classification systems used in the SFR, including AOSpine classification. Their gold standard was determined through the consensus of two experienced physicians and accuracy was measured using Cohen’s kappa. However, no direct comparison was made with the recorded classifications in the SFR. In practice, fractures may be registered by physicians without specific training in the classification systems being used.

Based on our results, the likelihood that a burst fracture is correctly registered as a burst fracture in the SFR is high regardless of the level of experience of the registering physician. Although the percentage of agreement and PPV were high across all levels of experience, the number of simple compression fractures registered as burst fractures was substantially larger among non-operatively treated patients. Most operatively treated spine fractures were registered by the surgeons themselves, whereas non-operatively treated spine fractures were registered by physicians of varying expertise, which may explain the difference. In addition, the proportion of complete burst fractures (A4) compared to incomplete (A3) was higher in the operative group, which is also reasonable, as surgeons are likely to be more inclined to choose operative treatment in patients with more severe fractures. Because the SFR classification does not distinguish between incomplete and complete burst fractures, simple comparisons between treatment groups may be erroneous, as the groups are not fully comparable. The correct identification of complete burst fractures (A4) has been demonstrated to be challenging even among spine surgeons [14]. Consequently, opting not to separate burst fractures into incomplete and complete in the SFR is reasonable, considering the wide range of expertise among the registering physicians.

However, it is necessary to draw attention to the inter-rater reliability for B-type injuries, which was comparatively lower when comparing the gold standard with the classification in the SFR. Previous studies have shown that the inter-rater reliability of posterior tension band injuries tends to be lower compared to type A and C injuries [15,16,17]. Surprisingly, our study demonstrated an even lower level of reliability in this respect. A substantial number of the fractures were misclassified as B1. This discrepancy may be explained by the textual descriptions accompanying the pictures in the SFR, which describe B1 as a “fracture through the vertebral body and rupture of the posterior tension band structures through bone” and B2 as “Rupture of the posterior tension band with or without skeletal injury”. It may not be immediately apparent to a physician without knowledge of the classification system that a B1 fracture signifies a monosegmental osseous failure. This misclassification was also unexpectedly common among spine surgeons, who should be familiar with the concept of the posterior tension band. According to our results, most physicians have registered any skeletal injury to the posterior structures of the lamina, including non-distraction type injuries such as vertical laminar fractures, as B-injuries. It should be noted that B3 type injuries according to the latest version of the AOSpine classification [16] are classified as C1 in the SFR classification, similarly to the classification presented by Reinhold et al. [7]. The absence of B3 injuries in our material can be explained by the fact that we excluded patients with ankylosing disorders, given their increased likelihood of presenting with unstable fractures necessitating surgical fixation [18], and by the distinct injury mechanism associated with B3 injuries, which is primarily characterized by hyperextension [16], in contrast to axial compression for burst fractures [1].

To improve the reliability of B-type injury classifications in the SFR, we suggest that the register consider adding further choices with images depicting various types of fractures to the lamina, including vertical and horizontal spinous fractures. Additionally, it could prompt users to specify whether the horizontal spinous fracture is at the same level as the vertebral body fracture or not. These enhancements may help reduce misclassifications, ultimately improving the reliability of data collected in the SFR.

For thoracolumbar fractures to be reliably classified, CT is the imaging modality of choice [19]. In Swedish medical care, CT is the modality performed when a fracture of the spine is suspected, and conventional radiography has more or less become obsolete. The SFR does not specify which imaging modality has been used for making the diagnosis. In our material most patients had undergone a CT, which was expected. Eleven patients had only conventional radiographs. We chose not to exclude these patients, as we aimed to determine the reliability of thoracolumbar burst fracture classifications in the register. Excluding the cases with only conventional radiographs may have improved the reliability slightly. In 4 cases we only retrieved an MRI from the time of injury. In all 4 cases the patients underwent surgery and postoperative CT was available. We expect these cases to be patients from smaller hospitals that had been referred to the university hospital, although we have no possibility to verify this assumption. We chose to include these patients as the MRI was sufficient to determine whether a burst fracture was present or not, even though MRI is not the modality of choice for determining skeletal injury [19].

In our study, most patients had not undergone an MRI within two weeks of injury. MRI plays an important role in detecting soft tissue injuries, and previous studies have shown that adding MRI can change the classification of thoracolumbar fractures in 10 to 30% of cases [19,20,21,22,23]. The limited utilization of MRI may have implications for the accuracy of the true fracture classification, especially regarding B-type injuries, although the clinical implications regarding a change in treatment with routine MRI are still questionable [19, 23].

Strengths

The collection of consecutive patients from multiple centers and comparison with the classifications made in the SFR by physicians of different levels of experience under real-life conditions constitute the strengths of this study.

Limitations

There are some limitations of this study that should be acknowledged. The cohort was prospectively collected but the analysis and classification of medical images were carried out retrospectively. We did not have clinical information about the patients, such as symptoms, levels of pain and comorbidities. Only burst fractures were collected and reviewed, which meant that the inter-rater reliability of A-type injuries between the gold standard and the SFR could not be assessed. We only reviewed selected patients for whom both operative and non-operative treatment are valid options.

The patients identified for this study are likely to be only a fraction of the true number of patients in Sweden with a thoracolumbar burst fracture during this period. Since then, the coverage of the SFR has increased [6]. Despite this limitation, our study provides valuable insights into the SFR’s classification reliability for thoracolumbar burst fractures, emphasizing the need for further consideration in future research in this area.

Conclusions

The accuracy of classifying thoracolumbar burst fractures in the SFR is high regardless of the level of experience of the registering physician, treatment allocation and treating hospital. However, the inter-rater reliability of the AOSpine classifications of thoracolumbar burst fractures in the SFR is low when compared to reviewers with specific training in the classification system, particularly concerning B-type injuries. There are noticeable differences between operatively and non-operatively treated patients, and simple comparisons between treatments with data from the SFR without further review of medical images may lead to erroneous results. Future register-based studies on burst fractures that rely on SFR classifications, or that compare operatively and non-operatively treated patients, should include a review of medical images to verify the registered diagnosis.