Introduction

The incidence of traumatic cervical spine fractures is estimated at 15–65/100,000 hospital admissions annually [1]. Most cervical spine fractures are not associated with spinal cord injury (SCI). One prospective population-based study from Norway identified SCI in only 10% of cases of cervical fractures [2]. Symptoms resulting from cervical spine fractures vary in severity and in their impact on quality of life. A meaningful and relevant measure of outcomes from cervical fractures is important to understand the personal, population, healthcare and economic impact of these injuries [3,4,5].

Young patients tend to sustain cervical spine injuries from high-energy trauma. In less economically developed countries (LEDCs), this remains the most common mechanism of injury [6, 7]. However, in more economically developed countries (MEDCs), cervical spine fractures due to low-impact trauma are becoming more common in older people [8]. A nationwide database study [1] covering 2005 to 2013 found an approximate 32% increase in the incidence of cervical fractures in American patients, along with an increase in the average age at which the fracture was sustained, from 51 to 59 years. The proportion of patients injured in falls rather than motor vehicle accidents also increased.

These heterogeneous patient demographics present a challenge to selecting outcome measures that are applicable and relevant to all groups, reliable, and can be compared between studies [9]. Consequently, outcome measures are often selected for specific contexts, such as younger or older patients, or those with or without an associated neurological deficit. There is therefore a need to identify a common core outcome set (COS) for cervical spine fractures to provide consistency in reporting of studies and facilitate comparisons across studies [10,11,12,13].

Outcome domains can be classified as per the COMET (Core Outcome Measures in Effectiveness Trials) outcome taxonomy of core areas [14]. COMET is an initiative aimed at guiding the development and application of COSs for use in specific conditions, each COS being a minimum set of outcomes that should be reported in clinical trials. The COMET taxonomy classifies the different types of outcomes used in trials, providing key ‘areas’ that aid development of COSs. We aimed to identify all outcome measures used in studies of cervical spine fractures in adults and classify them into these core areas in order to help inform future development of a COS.

Methods

The protocol for this systematic review was published before starting the review in PROSPERO (CRD42020172311).

Eligibility criteria

We included studies that reported original data from clinical research involving adult human subjects (> 18 years) with fractures of the cervical spine related to trauma. Included studies must have recorded at least one outcome measure. There were no restrictions on year of publication, location of the study or study design.

Exclusion criteria were: non-English language studies; case reports or case series involving < 5 patients; studies in which > 50% of patients sustained associated arterial injuries; studies in which < 50% of patients sustained a traumatic injury; studies in which < 50% of injuries were cervical spine fractures; and studies in which > 50% of cases were children (< 18 years).
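The eligibility rules above can be read as a sequence of simple checks on each candidate study. The following is only a minimal sketch of that logic, not the screening tool used in this review: the `StudyRecord` class, its field names and the `is_eligible` function are hypothetical, and the "< 5 patients" rule is applied to every record here for simplicity, although the text applies it to case reports and case series.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical screening record; all field names are illustrative."""
    english_language: bool
    n_patients: int
    frac_arterial_injury: float    # proportion of patients with arterial injury
    frac_traumatic: float          # proportion of patients with a traumatic injury
    frac_cervical_fracture: float  # proportion of injuries that are cervical fractures
    frac_children: float           # proportion of patients younger than 18 years
    reports_outcome: bool          # at least one outcome measure recorded


def is_eligible(study: StudyRecord) -> bool:
    """Return True if the record passes every criterion listed in the text."""
    if not study.english_language:
        return False
    if study.n_patients < 5:  # simplification: applied to all study types
        return False
    if study.frac_arterial_injury > 0.5:
        return False
    if study.frac_traumatic < 0.5:
        return False
    if study.frac_cervical_fracture < 0.5:
        return False
    if study.frac_children > 0.5:
        return False
    return study.reports_outcome


# Invented example records, for illustration only.
adult_series = StudyRecord(True, 40, 0.0, 1.0, 1.0, 0.0, True)
small_case_series = StudyRecord(True, 3, 0.0, 1.0, 1.0, 0.0, True)
print(is_eligible(adult_series), is_eligible(small_case_series))
```

Writing the criteria as explicit thresholds makes the boundary cases visible: a study in which exactly 50% of patients sustained a traumatic injury is not excluded under this reading.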

Information sources/Search strategy

We searched Ovid Medline, Ovid Embase, and Scopus, on March 4, 2020, using the search strategies documented in the protocol (see supplementary material for full search strategies). Searches of the Cochrane Central Register of Controlled Trials, and Trials registries of ongoing trials (ClinicalTrials.gov, ISRCTN, EU clinical trials registry) revealed no further relevant studies. The search was updated on May 28, 2022.

Selection process

After removal of duplicates, two reviewers independently reviewed each record during abstract screening and full-text eligibility assessment. Disagreements were resolved through discussion with a third reviewer.

Data collection process

Data were extracted from each included paper. Data items included: study characteristics (year of publication, whether prospective/retrospective, whether single/multi-center, country/countries the study was based in, study recruitment period), patient characteristics (number of included patients, patient age, presence of SCI, whether the study assessed only patients > 65 years old, whether the study included > 50% of patients with ankylosing spondylitis or diffuse idiopathic skeletal hyperostosis [AS/DISH]), fracture characteristics (mechanism of injury, level of cervical fracture [upper = C1-2, subaxial = C3-7], proportion of patients with multiple cervical spine fractures, proportion of patients with concomitant thoracolumbar spine fractures), treatment modality, follow-up, timing of outcome assessment following injury and outcomes measured (including: a validated pain score, Frankel grade, American Spinal Injury Association [ASIA] Impairment Scale grade, radiological evidence of fusion, radiological evidence of stability, any other radiological parameter recorded, mortality rates, complications, patient-reported outcome measures [PROMs], other assessments of functional outcomes [including any assessment of ambulatory status, bowel/bladder function, or employment status], length of hospital stay, discharge destination and re-operation rate).

Risk of bias

Risk of bias was not examined, as our aim was to assess the type of outcome measures used and not to perform summary effect measures from included studies.

Effect measures

For each outcome identified, the measure used to define the outcome was recorded.

Synthesis measures

Results were grouped according to study characteristics (decade of publication, whether prospective/retrospective, whether single/multi-center, continent the study was based in), patient characteristics (whether the study included greater/fewer patients than the median number of all studies, whether the study assessed only patients > 65 years old, whether the study included > 50% of patients with AS/DISH, presence/absence of SCI), level of cervical fracture and treatment modality. Heatmaps of the summary outcomes were generated using a graded color scale in order to highlight variance in reporting of outcomes across different studies. Each clinical outcome identified was classified into a core area, as per the COMET outcome taxonomy of core areas [14]. Statistical analysis was performed using SPSS version 24. Nonparametric data were assessed using the Mann–Whitney U test to assess for difference between two independent groups.

Results

Search findings and study characteristics

Our search identified 536 eligible studies involving 393,266 patients (Fig. 1). Articles included were published between 1979 and 2022. There was an increase over time in the number of publications identified per year. Table 1 outlines study characteristics. Most studies were single center (n = 468; 87.3%), and retrospective in nature (n = 480; 88.7%). The median number of patients involved in each study was 40, range 6–167,278 (interquartile range; Q1 = 20, Q3 = 82). Studies assessed surgical intervention (n = 296; 55.2%), a mixture of treatment modalities (n = 178; 33.2%), conservative treatments (n = 33; 6.2%), or management in a halo brace (n = 26; 4.9%). The upper cervical region alone (C1-2) was the focus of the publication in 310 studies (57.8%), the subaxial cervical spine (C3-7) in 88 (16.4%) and the whole cervical spine in 138 (25.7%). In 318 studies (59.3%), cohorts included patients admitted with clinical features attributable to SCI.

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) diagram

Table 1 Study characteristics

Median or mean age was reported in 461 studies (86.0%) with an age range reported in 353 (65.9%). In total, 80 (14.9%) studies included only patients over 65 years of age. Mechanism of injury was reported in 365 studies (59.3%), and of these, 99 (18.5%) reported > 50% of included cases being caused by low energy injuries or falls. In this group of studies mostly assessing patients with low energy injuries, 66 (66.7%) assessed only C1/2 fractures and 32 (32.3%) included only patients > 65 years of age. The number of patients with more than one cervical spine fracture was reported in 36.6% of studies, and the presence of other non-cervical vertebral fractures in 24.8%. Median study duration was 84 months (range 3–564 months [interquartile range; Q1 = 48, Q3 = 120]), with clear documentation of mean/median/minimum follow-up reported in 464 studies (86.6%).

Overall, the most frequently reported outcomes across all studies were: complications (both fracture-related and treatment-related) in 400 (74.6%), radiological assessment of fusion in 365 (68.1%), mortality rate in 325 (60.6%), radiological assessment of spinal stability in 209 (40%), at least one patient-reported outcome measure in 157 (29.3%), other assessment of functional outcome (as defined in our methods) in 150 (28%), ASIA grade in 144 (26.9%), a validated pain score in 118 (22%), another radiological outcome in 104 (19.4%), length of stay in hospital in 93 (17.4%), re-operation rates in 69 (12.9%), Frankel grade in 68 (12.7%), and discharge destination in 26 (4.9%). Most studies did not define a single primary outcome.
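The reporting rates above are simple proportions of the 536 included studies. As a minimal sketch of that arithmetic (the function name `reporting_rate` and the dictionary layout are illustrative assumptions, not part of the review's analysis), the three most frequently reported outcomes can be recomputed as follows.

```python
TOTAL_STUDIES = 536  # number of eligible studies stated in the Results


def reporting_rate(n_studies, total=TOTAL_STUDIES):
    """Percentage of studies reporting an outcome, rounded to one decimal place."""
    return round(100 * n_studies / total, 1)


# Counts taken from the paragraph above.
counts = {
    "complications": 400,
    "radiological fusion": 365,
    "mortality": 325,
}
rates = {outcome: reporting_rate(n) for outcome, n in counts.items()}
print(rates)
```

Using a single fixed denominator in this way keeps percentages comparable across outcomes; subgroup percentages reported later in the Results use the size of the relevant subgroup instead.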

The PROMs most commonly used were a validated pain score in 118 (22%), Neck Disability Index (NDI) in 67 (12.5%), 36-Item Short Form Survey (SF-36) in 18 (3.4%), European Quality of Life Five Dimension (EQ-5D) in 8 (1.5%), Cervical Spine Outcome Questionnaire (CSOQ) in 8 (1.5%), Patient Satisfaction Index (PSI) in 7 (1.3%), Neck Pain Driving Index (NPDI) in 3 (0.6%), and Post-Traumatic Neck Score (PTNC) in 2 (0.4%).

Definition of outcome types

Outcomes were measured using both validated and non-validated tools. Table 2 shows the manner in which outcomes assessed were recorded in the included studies. Some outcomes were uniformly measured across all studies, for example mortality as a percentage of participants. In contrast, recovery of neurological function was measured in some studies using a validated tool such as the ASIA or Frankel grade to depict improvement over follow-up, while other studies recorded a narrative depiction of how patients’ neurological function evolved over the study. Some studies defined their own criterion for outcome assessment. For example, ‘treatment success/failure’ was often defined by the authors. Table 2 shows the outcomes used in studies included in our review, as they relate to the COMET outcome taxonomy.

Table 2 Outcomes measured according to the COMET outcome taxonomy of core areas, showing units of measure used in the studies in brackets

Outcomes measured in different contexts

Tables 3, 4 and 5 identify the outcomes used according to study population, fracture pattern and treatment modality. The use of PROMs was greater in multicenter studies than in single-center studies (34.4% vs. 28.6%), and in prospective than in retrospective studies (32.4% vs. 28.8%), but was similar in studies assessing larger and smaller patient cohorts. Functional assessments were used more commonly in larger studies (29.5% vs. 26.5%) and in prospective studies (33.8% vs. 27.1%). Reporting of hospital stay and discharge destination was greater in larger studies (22.6% and 6.5%) than in smaller studies (12.4% and 3.3%). The same was true of multicenter studies (34.4% and 10.9%) compared to single-center studies (15% and 4%). Reporting of radiological parameters was more common in smaller, retrospective and single-center studies. While mortality rates were more often reported in larger and multicenter studies, reporting of complication rates was similar regardless of study size and of whether studies were prospective or multicenter. Re-operation rates were more commonly reported with prospectively collected data (3.94% vs. 8.8%).

Table 3 Heatmap showing percentage use of outcomes measured for included studies based on study design/study population
Table 4 Heatmap showing percentage use of outcomes measured for included studies based on geographical location of study
Table 5 Heatmap showing percentage use of outcomes measured for included studies based on year of study
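The heatmaps in Tables 3, 4 and 5 shade each cell according to the percentage of studies reporting an outcome. The sketch below shows one way such a graded scale could be assigned; the number of shades, the equal-width bins and the function name `shade_index` are assumptions made for illustration, since the exact color scale used for the tables is not specified in the text.

```python
def shade_index(percent, n_shades=5):
    """Map a percentage in [0, 100] to a shade from 0 (lightest) to n_shades - 1 (darkest)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    # Equal-width bins; 100% is folded into the darkest shade.
    return min(int(percent * n_shades / 100), n_shades - 1)


# Mortality reporting rates quoted in the Results for studies of patients
# over 65 years only (92.5%) and for studies of patients of any age (55.3%).
print(shade_index(92.5), shade_index(55.3))
```

With five equal bins, small differences such as 34.4% versus 28.6% fall in the same shade, whereas large differences such as 92.5% versus 55.3% are separated, which is the kind of contrast a heatmap is intended to highlight.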

When comparing studies including patients of any age with studies assessing only patients over the age of 65, those assessing the older population only more commonly reported PROMs (35% vs. 28.5%), functional assessments (36.3% vs. 26.8%), mortality (92.5% vs. 55.3%), hospital stay (23.8% vs. 16.4%) and discharge destination (11.3% vs. 3.8%). The ASIA/Frankel scales to classify neurological recovery, and radiological parameters, were less commonly used in patients over 65 than in the studies that assessed patients of any age. Bony fusion was reported in 61.3% of studies only assessing patients over 65 years old, compared to 69.2% of other studies.

Studies that did not include SCI patients more commonly used NDI (14.6% vs. 10.7%), a validated pain score (25.8% vs. 19.2%), PROMs (34.7% vs. 25.5%) and functional assessments (31.9% vs. 25.5%) than studies assessing SCI. This phenomenon was accentuated in the AS/DISH population of studies, which all included patients with SCI and in which ASIA score and mortality were the most cited outcomes, at 51.7% and 82.8%, respectively. Despite 59.3% of all papers describing inclusion of patients with SCI, the reporting of Frankel/ASIA in these was only 65.1%, with the rest of the papers not classifying their patients’ neurological function using a validated system. Re-operation rates were more commonly described in the SCI cohorts (15.1% vs. 8.9%).

When comparing studies assessing different types of injuries, the use of a validated pain score was more prevalent in studies assessing upper cervical injuries (27.7% vs. 11.6% in those assessing subaxial injuries only). However, studies assessing only subaxial injuries more commonly used the Frankel/ASIA grade for neurological outcome compared to studies assessing only upper cervical injuries, 23.9% vs. 9.7% for Frankel and 48.9% vs. 18.4% for ASIA. Reporting of radiological fusion was more frequent in studies of upper cervical spine rather than subaxial cervical spine (77.7% vs. 63.6%). Mortality, complication rates, PROMs, functional assessments, length of hospital stay and discharge destination were also reported more frequently in studies of upper rather than subaxial cervical spine fractures. Re-operation rates, however, were more often reported in subaxial injuries (23.9% vs. 11.9%).

Studies reporting outcomes for patients treated purely conservatively more commonly reported PROMs (36.4%), functional outcomes (36.4%), pain scores (30.3%) and mortality (75.8%) than those assessing the use of only halo brace immobilization or surgery. Studies assessing patients managed surgically more frequently reported Frankel/ASIA grade (15.2%/33.8%) for neurological outcome, radiological fusion rates (73%), complication rates (82.8%), duration of hospital stay (15.2%), and re-operation rates (17.6%).

There was a dearth of studies from LEDCs, especially Africa (n = 15) and South America (n = 7), when compared to the output of North America (n = 178), Asia (n = 179) and Europe (n = 143). Table 4 shows that PROMs and functional assessments were more commonly used in Asia, Europe and Oceania than in other continents. Specifically, NDI was more commonly used in Asia (16.2%) and Europe (15.4%), compared to Oceania (9.1%), North America (12.9%), and Africa (6.7%). The EQ-5D score was chiefly used in European studies (4.2%). Studies from Asia more commonly reported a validated pain score (31.8%), ASIA/Frankel grade (54.8%), radiological fusion (79.3%), radiological stability (50.8%) and complications (79.3%) than European or North American studies. Duration of hospital stay was more commonly reported in North American studies (26.4%) than European (18.2%) or Asian (7.8%) studies. Discharge destination was more commonly documented in North American studies (8.4%) than European (4.9%) and Asian studies (0.6%).

Over time, there has been an increase in the reporting of validated pain scores (28% in 2020s vs. 13.3% in 1980s), ASIA/Frankel grade (50.7% in 2020s vs. 26.7% in 1980s), complications (78.7% in 2020s vs. 60% in 1980s), PROMs (34.7% in 2020s vs. 13.3% in 1980s) and functional assessments (36% in 2020s vs. 13.3% in 1980s). The use of radiological parameters has declined, with 62.7% of studies from the 2020s reporting radiological fusion compared to 86.7% in the 1980s, and a similar decline in the reporting of radiological stability and other radiological outcomes (Table 5).

Timing of outcome measures

A clear schedule for the timing of clinical and/or radiological assessment following injury was recorded in 304 (56.7%) studies. The remaining studies either recorded outcomes at last follow-up, with resultant variation in timing between patients, or did not state when the outcomes were measured. Where recorded, the median timing of the first assessment following discharge was 3 months (interquartile range 3–6 months).

Response rates

Where available, the response rates of outcome measures used were analyzed. Response rates were statistically significantly greater in the studies assessing patients of any age than in those assessing only patients over 65 years old (U = 6729.5, p < 0.001). Response rates were statistically significantly lower in the studies using PROMs than in those assessing only non-patient-reported outcomes (U = 12,092, p < 0.001). No difference in response rates was seen between studies assessing populations with or without SCI, nor when comparing studies in different continents.

Discussion

Summarizing existing outcome measures used

Our systematic review has identified heterogeneity in selection of outcome measures for cervical spine fracture research. Larger prospective studies more commonly employed PROMs as a method of collecting patient outcome data. Conversely, radiological outcomes were more commonly recorded in smaller, single unit retrospective studies. Larger prospective studies may be better resourced to collect more outcome data using validated tools, with the ability to prospectively follow-up patients and better ensure completion of questionnaires. Radiological outcomes may be more easily obtained, but the greater reporting of radiological outcomes in smaller retrospective studies may also reflect the complexity of importing radiological source material from different centers or from larger populations. Also, the cost of mandating specific imaging in larger studies could be prohibitive.

Only a small proportion of the published literature focused on the elderly population (14.9%), but the elderly/frail population are increasingly becoming the main casualties of cervical fractures as the world’s population ages [15]. Interestingly, we found studies assessing only patients over 65 years of age more commonly reported PROMs, pain scores, functional assessments, length of hospital stay, and discharge destination, but less frequently reported radiological outcomes. This may reflect the growing evidence that radiological bony fusion is less common in the elderly/frail who sustain an odontoid fracture of the cervical spine, irrespective of treatment strategy [16,17,18,19,20]. Quality of life outcomes may be better suited to this population. The majority of the papers included in our review reported on upper cervical spine fractures and surgical management. In comparison, conservative management strategies and their outcomes are less well defined [21]. This is an unmet need with the ageing population. The optimal methods of non-operative management of different types of cervical spine fractures in the elderly/frail, and how best to measure clinical outcomes in these populations, require further scrutiny.

Studies of patients with SCI were less likely to use PROMs and functional assessments, which was unexpected, given the importance of neurological function following recovery [22]. The use of a validated classification system (e.g., Frankel/ASIA) allows comparison of cohorts across different studies and quantification of recovery [23, 24]. Scivoletto et al. [25] sought expert opinion on the use of outcome measurement tools after SCI, and identified that clinicians most commonly used neurological function, pain, spasticity, gait and ability to self-care to define a patient’s recovery. By contrast, in our study, the use of a validated grading classification to define neurological outcome was noted in only 57.8% of papers including SCI patients.

We identified a disparity in outcomes used in MEDCs compared to LEDCs. This has implications for the extrapolation of study outcomes between populations. It may reflect the financial cost and the logistical complexity of post-discharge follow-up of patients in LEDCs. The low proportion of intercontinental studies (0.4%) suggests that more could be done by MEDCs to include LEDC populations in future trials. The recent trend toward global neurosurgery should help in this regard. Specifically, studies need to address differences between populations in measured clinical outcomes and in patients’ recovery goals [26,27,28].

Patient involvement in reporting of outcomes

Since 1979, as the rate of reporting of radiological outcomes within studies of cervical spine fractures has declined, there has been a shift toward better understanding the functional outcomes of patients. While there are clinician-reported outcome scores (e.g., the Japanese Orthopaedic Association [JOA] score and the Functional Independence Measure [FIM]), PROMs recognize the importance of patients’ involvement and perspectives in the research process [29]. There is often discordance between what physicians and patients perceive as important outcomes [30]. We identified that over twice as many studies in the 2020s reported PROMs compared to the 1980s. Pain scores were the most commonly used type of PROM, used in one in five studies. While the NDI was used in around one in ten studies, in total eleven different PROMs were identified that aim to quantify recovery following injury across numerous different domains.

Timing of outcome assessment

There was a lack of clear reporting on timing of assessments following injury. Duration of follow-up and the timing of outcome measure assessments following injury differed significantly between studies and were not always reported. The optimal duration of follow-up may be context dependent, for example, longer if assessing motor recovery following traumatic SCI, when recovery tends to plateau after a period of approximately 12–18 months [24], but shorter in frail/elderly patients with odontoid fractures who have a mortality rate as high as 34.1–37.5% at one year [31, 32]. Better reporting of the timing of outcome measurements would aid in comparisons of outcomes across study populations.

Strengths and limitations of this review

Using broad search terms and including all treatment modalities meant that this review included a large number and a wide range of studies. It is likely that we have identified all of the common outcome measures in use. Including non-English language studies might have allowed a comparison of outcomes used in different cultures/languages. Nevertheless, with studies performed in every continent, the included outcome measures likely represent those used worldwide for cervical spine fractures.

As we aimed to understand the methods used to report outcomes, we did not assess the quality of the included studies or attempt any meta-analysis of outcome data. Including only studies meeting certain quality criteria may have systematically excluded those reporting certain outcome measures.

Development of a core outcome set

The heterogeneity of outcome measures for cervical spine fractures identified in this systematic review demonstrates the requirement for standardization in future clinical studies. Developing a COS for cervical spine fractures requires ascertainment of which outcome measures are optimal for the patient groups under study. The differences in outcome measures chosen in the existing literature show that these would need to be stratified by patient, population, and injury features. Determining which outcome measures are optimal would benefit from stakeholder input, especially from patients themselves. There is a paucity of literature assessing patients’ beliefs and values regarding the definition of a ‘good outcome’ following a cervical fracture [33]. Patient self-assessment and expectations regarding their recovery will be context specific, and depend on the injury, their age, pre-existing frailty and comorbidities, alongside their values and beliefs. Defining the most appropriate PROMs for cervical spine fractures, as part of a COS would help standardize outcome measures, allowing comparison across studies and modes of treatment. The involvement of patients in defining these COS is vital [34]. We have registered this requirement on the COMET database (https://www.comet-initiative.org/Studies/Details/2030) and plan to conduct a Delphi study to identify these ideal context-specific core outcomes in order to improve the quality of published outcomes for patients with cervical spine fractures.

Conclusions

Overall, the most commonly reported clinical outcomes were complications, radiological fusion and mortality. There was significant heterogeneity in the types of outcome reported across studies of differing populations. Less than a third of the included studies included PROMs, but use of PROMs was more common in larger/prospective studies, in studies assessing patients over 65 years of age and in studies assessing patients managed non-operatively. Over time, the use of radiological outcomes has declined, with a trend to greater use of PROMs. Response rates were poorer in the studies assessing patients over 65 years of age and when PROMs were used, highlighting the challenge in particular in the assessment of outcomes in the elderly cohort. Moreover, the review identified a shortage of studies from LEDCs and a lack of intercontinental studies examining outcomes following cervical spine fractures, which requires a global approach from the neurosurgical community to remedy.