Introduction

Odontoid fractures account for 9 to 18% of all cervical spine fractures and are most frequently caused by either hyperextension or hyperflexion [1,2,3,4]. In the elderly, odontoid fractures are the most common cervical spine fractures [5, 6]. Moreover, as the population ages, these fractures will become increasingly relevant to clinical practice [4]. The optimal treatment of odontoid fractures in the elderly is, however, still subject to controversy. This age group typically suffers from an increased risk of operative complications when treated surgically but is also at a higher risk of non-union and prolonged treatment duration when treated conservatively.

Treatment of odontoid fractures is typically based on the fracture pattern (e.g., as classified by Anderson and d’Alonzo), patient age, neurological deficits and the patient’s medical condition, in an effort to balance fracture healing against treatment complications [2, 4, 7]. The general presumption is that surgical intervention, i.e., either anterior odontoid screw fixation or posterior atlantoaxial fusion, leads to a stable cervical spine. However, (major) cervical spine surgery may itself cause the patient’s condition to deteriorate, and surgical intervention carries significant risks, particularly in very old patients (≥ 80 years). An alternative to surgical stabilization is conservative treatment with rigid or non-rigid immobilization. This treatment, however, can also fail and prolong fracture instability, requiring secondary surgery and thereby unnecessarily lengthening treatment duration. Additionally, conservative treatment can cause immobilization-related complications, such as pneumonia and pressure sores.

The objective of this review was to summarize and compare the outcomes of surgical and conservative treatments for type II and III odontoid fractures in the elderly (≥ 65 years), focusing primarily on clinical outcomes and secondarily on fracture union and stability rates. This review is an update of a systematic review published by the authors in 2013 [8].

Methods

Search methods for identification of studies

The PRISMA checklist was used for this review. To update the authors’ systematic review published in 2013, a systematic search was conducted in seven databases of medical literature: MEDLINE, Embase, Cochrane Central Register of Controlled Trials, Web of Science, Emcare, Academic Search Premier, and PEDro (Supplementary Material). The updated search covered the period from April 2012 to January 2022. No restriction was made with regard to language or date. ‘Os odontoideum’ was included in the search, as this term is sometimes incorrectly used to describe odontoid fractures. Duplicate references were removed. The reference lists of the included studies were also screened to identify additional primary studies not previously identified. Two review authors (JH, CV) independently screened the titles and abstracts retrieved by the electronic search. Full texts were obtained for titles and abstracts approved by both reviewers. If consensus was not reached, a third review author was consulted.

Criteria for considering studies for this review

Studies were included if they met all of the following criteria: (1) the study described one or more outcomes in at least ten patients treated for acute type II or III odontoid fractures, with or without associated fractures or dislocation; (2) participants were at least 65 years old, and, in studies that also involved younger subjects, their data could be extracted separately; (3) inclusion criteria were explicit, and the follow-up period was at least two weeks; (4) the study evaluated any surgical and/or conservative treatment, and results were reported for each distinct treatment; (5) patients had not previously been treated for odontoid fractures; (6) patients did not have a systemic comorbidity expected to influence outcome (e.g., rheumatoid arthritis); and (7) the paper was published in a peer-reviewed journal. Case reports were excluded.

Clinical outcome was the primary outcome. The Neck Disability Index (NDI) was the most commonly used instrument to assess clinical outcome. The NDI is scored from 0 to 50, with higher scores indicating greater disability; its minimal clinically important difference (MCID) was considered to be 7.5 points [9,10,11,12]. The Visual Analogue Scale (VAS) pain score was also commonly reported. The VAS is scored from 0 to 10 (derived from a 100 mm scale), with higher scores indicating more pain; its MCID was considered to be 1 point (10 mm) [13, 14]. The Smiley-Webster Scale (SWS) was the third most commonly used instrument. The SWS is an ordinal scale from 1 to 4, in which 1 represents excellent and 4 represents poor functioning. Fracture union and stability rates were the secondary outcomes. Fracture union was defined as bony consolidation of the fracture. Fracture stability was defined as either bony consolidation or fibrous union of the fracture.

Data collection and analysis

Two review authors (KB, CR) independently conducted the data extraction. From each study, both demographic/descriptive data (e.g., study population, sample size, number of patients followed up, fracture types, age, gender, applied treatment) and quantitative data regarding outcomes and complications were extracted. Outcomes were extracted at 52 weeks when available, and otherwise at the last follow-up time point.

Outcomes reported on wide, equally spaced scales, such as NDI and VAS pain scores, were treated as continuous variables; their means and standard deviations were extracted and meta-analyzed. Unless otherwise specified, a random-effects model was used to calculate pooled point estimates with 95% confidence intervals (CIs) for the NDI, fracture union, and fracture stability. A fixed-effect model was used for VAS scores to avoid negative lower bounds of the 95% CI. Outcomes reported on narrow or unequally spaced scales, such as the SWS score, were treated as ordinal categorical variables, for which medians and ranges were extracted. Because medians cannot be meta-analyzed, weighted medians were calculated by multiplying each study’s median by its sample size, summing these products across studies, dividing the sum by the total number of patients, and rounding the result. This approach is analogous to a fixed-effect model with sample size as the weight. Forest plots were generated to summarize the results. P-values for heterogeneity (threshold < 0.10) and I-squared statistics were computed. I-squared values were categorized as low (0–25%), moderate (25–50%) or substantial (> 50%).

A random-effects multivariable meta-regression model was used to adjust for baseline covariates where these were sufficiently reported. However, both baseline covariates and clinical outcomes were reported heterogeneously and sparsely. Adjustment in the meta-regression analysis was therefore only feasible for mean age and fracture type (II vs. II/III) in relation to the radiological outcomes; the heterogeneous reporting of clinical outcomes precluded further analysis of these outcomes. Three meta-regression models were fitted for each of fracture union and fracture stability. The first model included treatment type, age and fracture type. The other two were fitted separately for surgical and for conservative treatment and included only age and fracture type. A two-tailed p-value < 0.05 was considered statistically significant, unless otherwise indicated. Analyses were performed using Comprehensive Meta-Analysis Software (CMA), version 4.
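The pooling and heterogeneity steps described above can be sketched in Python. This is a minimal illustration, not the CMA implementation: it assumes the DerSimonian-Laird moment estimator for the between-study variance (the estimator used in CMA for this review is not stated), and it assumes that event rates such as union and stability are pooled on the logit scale and back-transformed, which is a common convention. The function names (`pool_random_effects`, `logit_rate`, `inv_logit`, `classify_i2`) and the study counts are hypothetical.

```python
import math
from statistics import NormalDist


def pool_random_effects(estimates, variances, alpha=0.05):
    """Inverse-variance pooling with a DerSimonian-Laird random-effects model.

    estimates: per-study effect sizes (e.g., mean NDI, or logit event rate)
    variances: per-study sampling variances (e.g., SD**2 / n for a mean)
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need >= 2 studies with one variance per estimate")
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # Fixed-effect step: weights are the inverse within-study variances
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran's Q, I-squared (%) and the DerSimonian-Laird tau-squared
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects step: tau-squared is added to every study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return {"estimate": pooled, "ci": (pooled - z * se, pooled + z * se),
            "fixed": fixed, "Q": q, "df": df, "I2": i2, "tau2": tau2}


def logit_rate(events, n):
    """Logit of an event rate and its approximate variance (needs 0 < events < n)."""
    return math.log(events / (n - events)), 1.0 / events + 1.0 / (n - events)


def inv_logit(x):
    """Back-transform a logit to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def classify_i2(i2):
    """Categories from the Methods; assigning exactly 25% and 50% downward is assumed."""
    if i2 <= 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    return "substantial"


# Hypothetical union counts (events, patients) for three studies; not review data
studies = [(30, 40), (12, 35), (45, 60)]
logits, variances = zip(*(logit_rate(e, n) for e, n in studies))
res = pool_random_effects(list(logits), list(variances))
rate = inv_logit(res["estimate"])
lo, hi = (inv_logit(b) for b in res["ci"])
print(f"pooled union rate {rate:.1%} (95% CI {lo:.1%}, {hi:.1%}), "
      f"I2 {res['I2']:.1f}% ({classify_i2(res['I2'])})")
```

Pooling on the logit scale keeps the back-transformed confidence limits between 0% and 100%; for a continuous outcome such as the NDI, the per-study means and their variances (SD² / n) would be passed to `pool_random_effects` directly.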

Assessment of risk-of-bias for the included studies

Two review authors (KB, CR) independently conducted the risk-of-bias assessment. Studies were classified as cohort studies only if they adjusted for confounding variables; otherwise, they were treated as two separate case series extracted from one original study, even if the authors had labelled them as cohort studies. Risk of bias of the individual studies was assessed with methodology scores appropriate to the study type: the Newcastle–Ottawa Quality Assessment Scale (NOS) for cohort studies and a self-designed appraisal form, based on three other studies, for uncontrolled case series [15,16,17,18, Supplementary Material]. For the NOS, cohort selection, comparability and outcome assessment were scored on a range of 0 to 9. For the case-series appraisal form, items were scored as positive if they fulfilled the criterion, negative when bias was likely, or inconclusive if there was insufficient information. Each positively scored item was awarded one point, and the points were summed per study, giving a total score between 0 and 22 for this instrument. Differences in risk-of-bias scoring were discussed during a consensus meeting. For outcomes reported in at least ten studies, the potential for small-study bias was assessed using funnel plots, along with Begg’s test for categorical outcomes and Egger’s test for continuous outcomes [19, 20]. Because of high heterogeneity in the results, the trim-and-fill method was not used to address potential publication bias when funnel plot asymmetry was found [19]. Instead, the classic fail-safe N, i.e., the number of missing studies that would raise the p-value above alpha, was calculated and reported for each outcome.
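The classic fail-safe N is usually computed with Rosenthal's formula, which finds how many additional studies with an average Z of zero would pull the Stouffer combined Z down to the critical value. The sketch below assumes that formulation with a two-tailed alpha of 0.05; whether one- or two-tailed settings were used in CMA for this review is not reported, and the function names and Z values are hypothetical.

```python
import math
from statistics import NormalDist


def classic_fail_safe_n(z_scores, alpha=0.05, tails=2):
    """Rosenthal's classic fail-safe N.

    Solves sum(Z) / sqrt(k + N) = z_crit for N: the number of additional
    studies with an average Z of zero that would make the Stouffer
    combined test non-significant at the given alpha.
    """
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    n = (sum(z_scores) / z_crit) ** 2 - k
    return max(0.0, n)


def stouffer_z(z_scores, extra_null_studies=0):
    """Combined Z after adding studies whose Z is zero."""
    return sum(z_scores) / math.sqrt(len(z_scores) + extra_null_studies)


# Hypothetical per-study Z values (effect / SE); not data from this review
z = [2.0, 2.5, 3.0]
n_fs = classic_fail_safe_n(z)
print(math.ceil(n_fs))  # 12 whole studies needed
```

A large fail-safe N relative to the number of included studies is what supports the robustness argument; it does not, however, test whether the asymmetry has another cause.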

Results

Search and selection results

After removal of duplicates, including references already identified in the search for the previous review [8], the search yielded 1,337 unique references. After screening of titles and abstracts, 127 studies were selected for full-text screening. Reference and citation tracking yielded no further references. Thirty-one studies were initially identified as eligible. Together with the seventeen unique studies from the previous review, this added up to forty-eight studies. Seven studies were subsequently excluded because they were believed to (partially) describe the same patient cohorts as other studies included in this review [20,21,22,23,24,25,26]; in each case, the study retained reported on a larger sample and/or more appropriate clinical or radiological outcomes. A total of 41 studies were eventually included (Fig. 1 and Table 1). Four studies were carried out prospectively. Thirty-nine studies were published in English, one in French and one in German. Twenty-four studies systematically reported clinical outcomes and were therefore included as primary studies, given this review’s focus on clinical outcome. The other seventeen systematically reported only union and/or stability rates and were included as secondary studies. Overall, forty studies reported fracture union, and all forty-one reported fracture stability.

Fig. 1

Modified PRISMA flow diagram depicting the study selection process

Table 1 Characteristics of included studies

Risk-of-bias assessment

Only one study adjusted for confounding variables and was therefore classified as a cohort study; the remaining studies were classified as case series (Supplementary Material). All but four studies were retrospective case series. Quality scores for case series ranged between 10 and 20 on the 22-point scale. In the case series, baseline demographics and results were mostly adequately reported, whereas baseline clinical status was generally poorly and heterogeneously reported. Funnel plots were only feasible for fracture union and fracture stability, as clinical outcomes were reported in fewer than ten studies (Supplementary Material). Begg’s test showed a significant small-study effect in the surgically treated group for both fracture union and fracture stability (p = 0.048 and p = 0.049, respectively). Of note, funnel plot asymmetry can have sources other than publication bias (e.g., true heterogeneity, data irregularities, selection bias) [20]. Even assuming that the asymmetry was driven by publication bias, the classic fail-safe N indicated that a very large number of missing studies would be needed to render the pooled results non-significant. This suggests that the results were robust to potential publication bias.

Baseline characteristics

A total of 2099 patients were included, of whom 1104 (53%) were treated surgically. A total of 1917 patients were followed up clinically and/or radiologically, representing a follow-up rate of 91%. The pooled mean age was 80.6 years (95% CI 79.0, 82.1) for surgically treated patients and 81.7 years (95% CI 79.7, 83.7) for conservatively treated patients. A total of 1742 (83%) patients were treated for type II fractures, while the remaining 357 patients were enrolled in studies describing both type II and III fractures, in which outcomes were typically not reported separately by fracture type. The pooled mean follow-up time was 47.9 weeks (95% CI 39.3, 56.4) for surgically treated patients and 55.9 weeks (95% CI 45.3, 66.5) for conservatively treated patients. American Society of Anesthesiologists (ASA) scores were the most frequently used measure of baseline functioning, yet were provided in only fourteen studies. Mean fracture displacement could be derived from only nine studies. Consequently, differences in baseline functioning and fracture displacement between treatment groups could not be analyzed.
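The percentages in this paragraph follow directly from the reported counts. A quick arithmetic check in Python, with the counts copied from the text (the helper name `percent` is illustrative only):

```python
# Counts reported in the baseline characteristics above
total_patients = 2099
surgical = 1104
followed_up = 1917
type_ii_only = 1742


def percent(part, whole):
    """Whole-number percentage, as reported in the text."""
    return round(100 * part / whole)


print(percent(surgical, total_patients))      # 53
print(percent(followed_up, total_patients))   # 91
print(percent(type_ii_only, total_patients))  # 83
print(total_patients - type_ii_only)          # 357 patients in mixed II/III studies
```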

Clinical outcomes

Analysis of clinical outcome was only feasible for the three most commonly used instruments (Tables 2, 3 and Fig. 2). The remaining studies used other tools that were reported too sparsely to be compared.

Table 2 Results of random effects analyses for clinical and radiological outcomes
Table 3 Median and range for the ordinal clinical outcome as reported by original studies
Fig. 2

Forest plots showing the pooled mean outcome (for continuous data) or pooled incidence (for discrete data), stratified by treatment type (surgical and conservative). The squares represent the point estimate of each study, with the horizontal lines denoting the 95% CI. The size of each square is proportional to the study’s weight. The center of the gray diamond is the pooled point estimate for each subgroup using a random-effects model, and its width reflects the 95% CI.

Neck Disability Index (NDI)

Seven studies reported NDI scores, four of which reported them for both surgically and conservatively treated patients. NDI scores were available for 700 patients, of whom 156 (22%) were treated surgically. The pooled mean NDI score was 14.2 (95% CI 8.79, 19.5) for surgically treated patients and 16.0 (95% CI 12.0, 19.9) for conservatively treated patients. This difference of 1.8 points was below the MCID of 7.5 and therefore not clinically relevant. The data were substantially heterogeneous (p-heterogeneity surgical < 0.001, I-squared 97.4%; p-heterogeneity conservative < 0.001, I-squared 98.9%).

Visual Analogue Scale (VAS) pain

Five studies reported VAS pain scores, one of which reported them for both surgically and conservatively treated patients. VAS scores were available for 180 patients, of whom 150 (83%) were treated surgically. The pooled mean VAS score was 1.53 (95% CI 1.35, 1.72) for surgically treated patients and 0.73 (95% CI 0.30, 1.16) for conservatively treated patients. This difference of 0.8 points was below the MCID of 1 and therefore not clinically relevant. The data were substantially heterogeneous (p-heterogeneity surgical < 0.001, I-squared 98.2%).
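The judgement of clinical relevance in the NDI and VAS subsections amounts to comparing the absolute difference between the pooled means with the MCID defined in the Methods (7.5 points for the NDI, 1 point for the VAS). A minimal sketch using the pooled estimates reported above; this threshold check uses point estimates only and is not a formal equivalence test, and the function name `reaches_mcid` is hypothetical:

```python
# Pooled means (surgical, conservative) reported above and the MCIDs from the Methods
outcomes = {
    "NDI": (14.2, 16.0, 7.5),
    "VAS pain": (1.53, 0.73, 1.0),
}


def reaches_mcid(surgical_mean, conservative_mean, mcid):
    """True if the absolute difference between pooled means is at least the MCID."""
    return abs(surgical_mean - conservative_mean) >= mcid


for name, (surg, cons, mcid) in outcomes.items():
    diff = abs(surg - cons)
    print(f"{name}: difference {diff:.2f} vs MCID {mcid} -> "
          f"clinically relevant: {reaches_mcid(surg, cons, mcid)}")
```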

Smiley-Webster Scale (SWS)

Six studies reported the SWS, two of which reported it for both surgically and conservatively treated patients. Median SWS scores were available for 231 patients, of whom 98 (42%) were treated surgically. The weighted median SWS score was 1 (range 1–4) for surgically treated patients, which was not clinically different from the weighted median of 2 (range 1–4) for conservatively treated patients. Of note, SWS scores of 1 and 2 both represent a return to full-time work or activity; they differ only in that a score of 1 indicates no use of pain medication and a score of 2 occasional use.
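The weighted median described in the Methods can be sketched as follows. The per-study medians and sample sizes are hypothetical, and rounding half up is an assumption, since the text states only that the result was rounded; Python's built-in `round()` would instead round both 1.5 and 2.5 to 2 (round half to even).

```python
import math


def weighted_median_score(medians, sample_sizes):
    """Sample-size-weighted mean of per-study medians, rounded half up.

    Follows the Methods: sum(n_i * median_i) / sum(n_i), then rounded
    to the nearest whole SWS category.
    """
    if not medians or len(medians) != len(sample_sizes):
        raise ValueError("need one sample size per median")
    weighted = sum(m * n for m, n in zip(medians, sample_sizes)) / sum(sample_sizes)
    return math.floor(weighted + 0.5)  # half-up, unlike Python's round()


# Hypothetical per-study SWS medians and sample sizes (illustrative only)
print(weighted_median_score([1, 2, 1], [30, 20, 25]))  # 1
print(weighted_median_score([2, 2, 1], [50, 40, 10]))  # 2
```

The result is a sample-size-weighted average of study medians rather than the median of the pooled patient data, which is why it is only analogous to a fixed-effect estimate.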

Radiological outcome

Fracture union

Forty studies reported extractable fracture union rates, including thirteen that reported union rates for both surgical and conservative groups (Table 2 and Fig. 2). Union data were available for 1900 patients, of whom 988 (52%) were treated surgically. Union was achieved in 72.7% (95% CI 66.1%, 78.5%) of surgically treated patients and in 40.2% (95% CI 32.0%, 49.0%) of conservatively treated patients. This difference was clinically relevant, although the data were substantially heterogeneous (p-heterogeneity surgical < 0.001, I-squared 75.1%; p-heterogeneity conservative < 0.001, I-squared 74.3%).

Fracture stability

Forty-one studies reported extractable fracture stability rates, including fourteen that reported stability rates for both surgical and conservative groups (Table 2 and Fig. 2). Stability data were available for 1917 patients, of whom 994 (52%) were treated surgically. Stability was achieved in 82.6% (95% CI 74.9%, 88.3%) of surgically treated patients and in 70.1% (95% CI 57.7%, 80.1%) of conservatively treated patients. The data were substantially heterogeneous (p-heterogeneity surgical < 0.001, I-squared 75.7%; p-heterogeneity conservative < 0.001, I-squared 88.3%).

Complications and mortality

Complications and mortality were heterogeneously reported across studies. Complications in the surgical group were mostly related to the operation, whether intraoperative (e.g., screw malposition) or postoperative (e.g., wound infections). Complications in the conservative group were mostly immobilization-related, such as pressure ulcerations and pneumonia. Analysis of a difference in complications and mortality between treatment groups was not feasible.

Meta-regression analysis

Meta-regression analysis: Fracture union

In the model including treatment type, surgically treated patients showed significantly higher union rates than conservatively treated patients after adjustment for age and fracture type (p < 0.001), although the data remained substantially heterogeneous (residual I-squared 75.6%). Neither age nor fracture type was significantly associated with fracture union (Table 4).

Table 4 Results of random effects multivariable meta-regression analysis for fracture union and stability

Meta-regression analysis: Fracture stability

In the model including treatment type, no significant difference in stability rates was identified between surgically and conservatively treated patients after adjustment for age and fracture type (p = 0.09). The data were substantially heterogeneous (residual I-squared 83.8%). Neither age nor fracture type was significantly associated with fracture stability (Table 4).
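The meta-regression models described in the Methods can be sketched as a weighted least-squares fit in which each study is weighted by 1 / (v_i + tau²). This is a minimal sketch under stated assumptions: tau² is supplied rather than estimated (in practice the residual between-study variance is estimated from the data, and the estimator used in CMA for this review is not reported), coefficients are tested with a normal (z) approximation, and the covariate coding (surgical = 1, mixed type II/III cohort = 1 versus type II only = 0) is inferred from the text. All function names and study data are hypothetical.

```python
import math
from statistics import NormalDist


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def meta_regression(y, variances, covariates, tau2):
    """Weighted least-squares meta-regression for a given residual tau-squared.

    y: per-study effect sizes (e.g., logit union rates)
    covariates: one row per study, e.g. [surgical, mean_age, mixed_II_III]
    Returns coefficients (intercept first), standard errors and two-sided p-values.
    """
    x = [[1.0] + list(row) for row in covariates]
    w = [1.0 / (v + tau2) for v in variances]
    p = len(x[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, x, y)) for j in range(p)]
    beta = solve_linear(xtwx, xtwy)
    # Diagonal of (X'WX)^-1 gives the coefficient variances
    se = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append(math.sqrt(solve_linear(xtwx, unit)[j]))
    pvals = [2 * (1 - NormalDist().cdf(abs(b / s))) for b, s in zip(beta, se)]
    return beta, se, pvals


# Hypothetical study-level data (not from this review):
# covariates = [surgical (1/0), mean age in years, mixed type II/III cohort (1/0)]
covs = [[1, 78, 0], [0, 80, 0], [1, 82, 1], [0, 84, 1], [1, 85, 0], [0, 79, 1]]
y = [1.1, -0.3, 0.7, -0.6, 0.9, -0.1]
v = [0.05, 0.08, 0.06, 0.10, 0.07, 0.09]
beta, se, pvals = meta_regression(y, v, covs, tau2=0.02)
for name, b, s, pv in zip(["intercept", "surgical", "age", "mixed II/III"], beta, se, pvals):
    print(f"{name:>12}: {b:+.3f} (SE {s:.3f}, p = {pv:.3f})")
```

Fitting the first model with the treatment indicator and the two treatment-specific models without it corresponds to calling `meta_regression` on all studies and on each treatment subgroup, respectively.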

Discussion

Multiple studies describing treatment outcomes for odontoid fractures in the elderly have been published since the authors’ previous systematic review in 2013 [8]. Although these studies typically reported larger samples, only four of the studies included in this updated review were performed prospectively. Only one study adjusted for confounding variables and was therefore classified as a cohort study; the other studies were classified as case series. The reported data were substantially heterogeneous. These factors limited the analyses that could be performed. As a result, no strong recommendations can be made regarding the optimal treatment of odontoid fractures in the elderly, although several notable observations were made.

Studies evaluating the outcomes of odontoid fractures have usually focused on radiological outcome. Clinical outcome was described less often, although it can be considered the most relevant. Focusing primarily on clinical outcomes, the current review observed no clinically relevant differences between surgically and conservatively treated patients in NDI and VAS pain scores. The median SWS score was 1 for surgically treated patients and 2 for conservatively treated patients; however, SWS scores of 1 and 2 both represent a return to full-time work or activity, differing only in that a score of 1 indicates no use of pain medication and a score of 2 occasional use. This difference was therefore also not considered clinically relevant. The clinical outcome measures reported in the remaining studies varied widely and could not be used to draw generalized conclusions.

Fracture union was achieved more often in surgically treated patients than in conservatively treated patients. This difference remained after correction for age and fracture type (II/III vs. II) in the meta-regression analysis. A similar difference in fracture stability was not identified between the treatment groups. Multiple studies used fracture union and/or stability as primary outcome, but the correlation with clinical outcome was not properly studied. It remains unclear whether patients indeed benefit clinically from favorable radiological outcomes. Consequently, debate remains as to what the exact goal of treatment should be (e.g., favorable clinical outcome, osseous union and/or fracture stability), and as to how outcome should be measured.

Patient age was comparable between treatment groups in the included studies. However, age criteria differed among studies, which described patients aged ≥ 65, ≥ 70, ≥ 75 or ≥ 80 years. Moreover, the surgically and conservatively treated groups in the included studies may not have been comparable with respect to other patient characteristics (e.g., comorbidity, osteoporosis, severity of comminution). Outcomes were rarely stratified by age group within the elderly population, which needs further study. Furthermore, although it is often postulated that treatment outcome depends on patient age, other factors must also play a role, as different studies have shown different outcomes for the same treatment that cannot be explained by patient age alone.

Complications and mortality were common in both treatment groups, although they were not uniformly reported and therefore could not be reliably analyzed. Operation-related complications, both intraoperative (e.g., screw malposition) and postoperative (e.g., wound infections), were the most common complications in surgically treated patients. Immobilization-related complications, such as pressure ulcers and pneumonia, were the most prevalent complications in conservatively treated patients.

Vaccaro et al. published the only prospective cohort study included in this review that directly compared surgical with conservative treatment [27]. In this study of 159 patients, of whom 101 were treated surgically, higher union rates were reported after surgical treatment. The study reported an increase in NDI (i.e., clinical worsening) between baseline and 52 weeks after both surgical and conservative treatment. This increase was significant only for conservatively treated patients, although selection bias and residual confounding may have influenced these findings (e.g., no adjustment for osteoporosis, no adjusted odds ratios reported). Of note, the outcomes presented in this systematic review were not changes between baseline and 52 weeks, but pooled point estimates at 52 weeks. In that respect, Vaccaro et al. reported a mean NDI at one-year follow-up of 28.0 (SE 2.49) for surgically treated patients and 31.6 (SE 3.34) for conservatively treated patients; this difference of 3.6 points did not reach the MCID of 7.5, consistent with the findings of this systematic review. As noted above, the most relevant outcome parameter remains a matter of debate.

Strengths and limitations

This meta-analysis had several limitations. Most included studies were case series, with their associated limitations, such as missing data, confounding bias and variability in outcome assessment. Outcomes were not uniformly reported at 52 weeks and, when missing, were extracted at the last available follow-up time point. As a result, the data collected for this review suffered from substantial heterogeneity throughout. Meta-regression analyses were feasible for the radiological outcomes only, and only mean age and fracture type could be adjusted for. Outcomes should therefore be interpreted with caution. The results were certainly affected by residual confounding. Illustrative in this respect is the study by Molinari et al., in which patients with < 50% fracture displacement were treated conservatively and patients with ≥ 50% fracture displacement were treated surgically, introducing systematic differences between treatment groups even within one study [28]. In the primary analysis, type II and III fractures were analyzed as one group. It is plausible that type II fractures were more often treated surgically, whereas conservative management was preferred for type III fractures, which may have influenced the findings. Studies describing patients with both type II and III fractures (n = 10) typically did not report results by fracture type. Consequently, adjustment for fracture type was only possible for type II/III versus type II (the other studies, n = 31) fractures, not for type II versus type III fractures, as would ideally have been the case. Additionally, bone quality was only scarcely described, although it is known to be an important factor in bone healing. Finally, a variety of surgical and conservative treatments were analyzed as only two groups. Further stratification of outcomes by specific surgical (e.g., anterior or posterior approach) and conservative (e.g., collar or halo vest) treatment was not feasible owing to data limitations.

Nevertheless, this study had some strengths. No restriction was made with regard to language or date, which yielded a relatively large number of studies for meta-analysis of some outcomes. This enabled the authors to conduct a multivariable meta-regression analysis to adjust for confounding as far as the reported data allowed. Lastly, the classic fail-safe N provided reassurance that publication bias was unlikely to explain the asymmetry observed in some funnel plots.

Conclusions

Implications for clinical practice

No clinically relevant differences between surgically and conservatively treated patients were identified in terms of NDI, VAS pain and SWS scores. After adjustment for age and fracture type, surgically treated patients showed higher union rates than conservatively treated patients, although selection mechanisms might (partially) explain this difference. After adjustment for age and fracture type, no difference in stability rates was observed between surgically and conservatively treated patients. The data were substantially heterogeneous, limiting the analyses that could be performed and the strength of the recommendations derived from these results.

Implications for research

These results need to be confirmed in well-designed comparative studies with proper adjustment for confounders such as age, fracture characteristics and degree of osteoporosis. The correlation between clinical and radiological outcomes also needs further exploration.