A systematic review identifying outcome measures used in evaluating adults sustaining cervical spine fractures

To assess the outcome measures used in studies investigating cervical spine fractures in adults, with or without associated spinal cord injury, to inform development of a core outcome set. Medline, Embase and Scopus were searched for relevant studies up to May 28, 2022, with no lower limit on publication date. Study characteristics, population characteristics and reported outcomes were extracted and analyzed. Our literature search identified 536 studies that met the inclusion criteria, involving 393,266 patients. Most studies were single-center (87.3%) and retrospective (88.9%), with a median of 40 patients (range 6–167,278). Treatments assessed included surgery (55.2%), conservative management (6.2%), halo immobilization (4.9%), or a mixture (33.2%). Median study duration was 84 months (range 3–564 months); the timing of clinical and/or radiological follow-up assessment after injury was reported in 56.7%. There was marked heterogeneity in the outcomes used, with 79 different outcome measures reported. Differences in use were identified between smaller/larger, retrospective/prospective and single-/multicenter cohorts. Over time, the use of radiological outcomes has declined, with greater emphasis on patient-reported outcome measures (PROMs). Studies of conservative management were more likely to report PROMs and mortality, whereas surgical studies more frequently reported Frankel/ASIA grade, radiological fusion, complication rates, duration of hospital stay and re-operation rates. In studies assessing the elderly population (> 65 years), PROMs, mortality, hospital stay and discharge destination were reported more often, whereas fusion was reported less often. Response rates for outcome assessments were lower in studies assessing elderly patients and in studies using PROMs. We have classified the various outcome measures used for patients with cervical spine fractures according to the COMET outcome taxonomy.
We also describe the contexts in which different outcomes are more commonly employed, to help guide decision-making when designing future research.


Introduction
The incidence of traumatic cervical spine fractures is estimated at 15-65 per 100,000 hospital admissions annually [1]. Most cervical spine fractures are not associated with spinal cord injury (SCI): one prospective population-based study from Norway identified SCI in only 10% of cervical fractures [2]. Symptoms resulting from cervical spine fractures vary in severity and in their impact on quality of life. Meaningful and relevant measures of outcome after cervical fractures are important for understanding the personal, population, healthcare and economic impact of these injuries [3][4][5].
Young patients tend to sustain cervical spine injuries from high-energy trauma, which remains the most common mechanism of injury in less economically developed countries (LEDCs) [6,7]. However, in more economically developed countries (MEDCs), cervical spine fractures due to low-impact trauma are becoming more common in older people [8]. A nationwide database study [1] covering 2005 to 2013 found an approximate 32% increase in the incidence of cervical fractures in American patients, along with an increase in the average age at injury from 51 to 59 years. The proportion of patients injured in falls rather than motor vehicle accidents also increased.
These heterogeneous patient demographics make it challenging to select outcome measures that are applicable and relevant to all groups, reliable, and comparable between studies [9]. Consequently, outcome measures are often selected for specific contexts, such as younger or older patients, or those with or without an associated neurological deficit. There is therefore a need to identify a common core outcome set (COS) for cervical spine fractures, to provide consistency in the reporting of studies and to facilitate comparisons across them [10][11][12][13].
Outcome domains can be classified according to the COMET (Core Outcome Measures in Effectiveness Trials) outcome taxonomy of core areas [14]. COMET is an initiative that guides the development and application of COSs, which are minimum sets of outcomes that should be reported in clinical trials of a specific condition. Its taxonomy classifies the different types of outcomes used in trials into key 'areas' that aid the development of COSs. We aimed to identify all outcome measures used in studies of cervical spine fractures in adults and to classify them into these core areas to help inform the future development of a COS.

Methods
The protocol for this systematic review was registered in PROSPERO (CRD42020172311) before the review began.

Eligibility criteria
We included studies that reported original data from clinical research involving adult human subjects (> 18 years) with traumatic fractures of the cervical spine. Included studies had to record at least one outcome measure. There were no restrictions on year of publication, study location or study design.
Exclusion criteria included: non-English language studies, case reports/series involving < 5 patients, studies in which > 50% of patients sustained associated arterial injuries, studies in which < 50% of patients sustained a traumatic injury, studies in which < 50% of injuries were cervical spine fractures, and studies in which > 50% of cases were children (< 18 years).

Selection process
After removal of duplicates, two reviewers independently screened each record at the abstract stage and assessed full texts for eligibility. Disagreements were resolved through discussion with a third reviewer.

Data collection process
Data were extracted from each included paper. Data items included: study characteristics (year of publication, whether prospective/retrospective, whether single/multicenter, country/countries in which the study was based, study recruitment period), patient characteristics (number of included patients, patient age, presence of SCI, whether the study assessed only patients > 65 years old, whether the study included > 50% of patients with ankylosing spondylitis or diffuse idiopathic skeletal hyperostosis [AS/DISH]), fracture characteristics (mechanism of injury, level of cervical fracture [upper = C1-2, subaxial = C3-7], proportion of patients with multiple cervical spine fractures, proportion of patients with concomitant thoracolumbar spine fractures), treatment modality, follow-up, timing of outcome assessment following injury and outcomes measured (including a validated pain score, Frankel grade, American Spinal Injury Association [ASIA] Impairment Scale grade, radiological evidence of fusion, radiological evidence of stability, any other radiological parameter recorded, mortality rates, complications, patient-reported outcome measures [PROMs], other assessments of functional outcome [including any assessment of ambulatory status, bowel/bladder function or employment status], length of hospital stay, discharge destination and reoperation rate).

Risk of bias
Risk of bias was not examined, as our aim was to assess the types of outcome measures used and not to calculate summary effect measures from the included studies.

Effect measures
For each outcome identified, the measure used to define the outcome was recorded.

Synthesis measures
Results were grouped according to study characteristics (decade of publication, whether prospective/retrospective, whether single/multicenter, continent in which the study was based), patient characteristics (whether the study included more or fewer patients than the median across all studies, whether the study assessed only patients > 65 years old, whether the study included > 50% of patients with AS/DISH, presence/absence of SCI), level of cervical fracture and treatment modality. Heatmaps of the summary outcomes were generated using a graded color scale to highlight variation in the reporting of outcomes across different studies. Each clinical outcome identified was classified into a core area according to the COMET outcome taxonomy of core areas [14]. Statistical analysis was performed using SPSS version 24. Nonparametric data were compared between two independent groups using the Mann-Whitney U test.
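As an illustration of the heatmap step, the sketch below computes, for each subgroup, the percentage of studies reporting each outcome and maps each percentage to a graded color band. It is a minimal Python sketch under stated assumptions, not the authors' workflow (the analysis itself used SPSS): the record layout (a grouping field plus a set of reported outcomes), the function names `outcome_percentages` and `shade`, the band thresholds of 20/40/60/80% and the four example studies are all hypothetical.

```python
from collections import defaultdict


def outcome_percentages(studies, group_key, outcomes):
    """Percentage of studies in each subgroup that reported each outcome."""
    totals = defaultdict(int)                       # studies per subgroup
    counts = defaultdict(lambda: defaultdict(int))  # subgroup -> outcome -> count
    for study in studies:
        group = study[group_key]
        totals[group] += 1
        for outcome in outcomes:
            if outcome in study["outcomes"]:
                counts[group][outcome] += 1
    return {
        group: {o: round(100 * counts[group][o] / n, 1) for o in outcomes}
        for group, n in totals.items()
    }


def shade(pct, thresholds=(20, 40, 60, 80)):
    """Map a percentage to a graded color band (0 = lightest, 4 = darkest).

    The thresholds are hypothetical; any monotone banding would do.
    """
    return sum(pct >= t for t in thresholds)


# Hypothetical coded studies (illustrative only, not data from this review).
studies = [
    {"design": "prospective", "outcomes": {"PROM", "mortality"}},
    {"design": "prospective", "outcomes": {"fusion"}},
    {"design": "retrospective", "outcomes": {"fusion", "mortality"}},
    {"design": "retrospective", "outcomes": {"fusion"}},
]
outcomes = ["PROM", "fusion", "mortality"]
table = outcome_percentages(studies, "design", outcomes)

# Text rendering of the heatmap: one shading character per color band.
blocks = " ░▒▓█"
for group, row in table.items():
    cells = "  ".join(f"{o}:{row[o]:5.1f}{blocks[shade(row[o])]}" for o in outcomes)
    print(f"{group:<13} {cells}")
```

Each printed row corresponds to one row of a heatmap table; in a real analysis the grouping field would be one of the study or patient characteristics listed above (for example, decade of publication or continent).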
Median or mean age was reported in 461 studies (86.0%), with an age range reported in 353 (65.9%). In total, 80 (14.9%) studies included only patients over 65 years of age. Mechanism of injury was reported in 365 studies (59.3%), and of these, 99 (18.5%) reported that > 50% of included cases were caused by low-energy injuries or falls. In this group of studies mostly assessing patients with low-energy injuries, 66 (66.7%) assessed only C1/2 fractures and 32 (32.3%) included only patients > 65 years of age. The number of patients with more than one cervical spine fracture was reported in 36.6% of studies, and whether there were other noncervical vertebral fractures in 24.8%. Median study duration was 84 months (range 3-564 months; interquartile range 48-120 months), with clear documentation of mean/median/minimum follow-up in 464 studies (86.6%).

Definition of outcome types
Outcomes were measured using both validated and nonvalidated tools. Table 2 shows how the outcomes assessed were recorded in the included studies, grouped according to the COMET outcome taxonomy. Some outcomes were measured uniformly across all studies, for example mortality as a percentage of participants. In contrast, recovery of neurological function was measured in some studies using a validated tool such as the ASIA or Frankel grade to depict improvement over follow-up, while other studies gave a narrative description of how patients' neurological function evolved over the study. Some studies defined their own criteria for outcome assessment; for example, 'treatment success/failure' was often defined by the authors.

Outcomes measured in different contexts
Tables 3, 4 and 5 identify the outcomes used according to study population, fracture pattern and treatment modality. PROMs were used more often in multicenter than in single-center studies (34.4% vs. 28.6%) and in prospective than in retrospective studies (32.4% vs. 28.8%), but were used to a similar extent in studies assessing larger and smaller patient cohorts. Functional assessments were used more commonly in larger studies (29.5% vs. 26.5%) and in prospective studies (33.8% vs. 27.1%). Hospital stay and discharge destination were reported more often in larger studies (22.6% and 6.5%) than in smaller studies (12.4% and 3.3%), and more often in multicenter studies (34.4% and 10.9%) than in single-center studies (15% and 4%). Radiological parameters were reported more commonly in smaller, retrospective and single-center studies. While mortality rates were more often reported in larger and multicenter studies, complication rates were reported at similar rates regardless of study size, design or number of centers. Re-operation rates were more commonly reported with prospectively collected data (3.94% vs. 8.8%).
When comparing studies including patients of any age with studies assessing only patients over the age of 65, those assessing only the older population more commonly reported PROMs (35% vs. 28.5%), functional assessments (36.3% vs. 26.8%), mortality (92.5% vs. 55.3%), hospital stay (23.8% vs. 16.4%) and discharge destination (11.3% vs. 3.8%). The ASIA/Frankel scales to classify neurological recovery, and radiological parameters, were less commonly used in studies of patients over 65 than in studies of patients of any age. Bony fusion was reported in 61.3% of studies assessing only patients over 65 years old, compared with 69.2% of other studies.

When comparing studies assessing different types of injuries, a validated pain score was used more often in studies assessing upper cervical injuries (27.7% vs. 11.6% in those assessing only subaxial injuries). However, studies assessing only subaxial injuries more commonly used the Frankel/ASIA grade for neurological outcome than studies assessing only upper cervical injuries (23.9% vs. 9.7% for Frankel and 48.9% vs. 18.4% for ASIA). Radiological fusion was reported more frequently in studies of the upper rather than the subaxial cervical spine (77.7% vs. 63.6%). Mortality, complication rates, PROMs, functional assessments, length of hospital stay and discharge destination were also reported more frequently in studies of upper rather than subaxial cervical spine fractures. Re-operation rates, however, were more often reported for subaxial injuries (23.9% vs. 11.9%).
Over time, there has been an increase in the reporting of validated pain scores (28% in the 2020s vs. 13.3% in the 1980s), ASIA/Frankel grade (50.7% vs. 26.7%), complications (78.7% vs. 60%), PROMs (34.7% vs. 13.33%) and functional assessments (36% vs. 13.33%). The use of radiological parameters has declined, with 62.7% of studies from the 2020s reporting radiological fusion compared with 86.7% in the 1980s, and a similar decline in the reporting of radiological stability and other radiological outcomes (Table 5).

Table 3 Heatmap showing percentage use of outcomes measured for included studies based on study design/study population. *One study was a protocol and included no patients; **in three studies this was not defined; ***in five studies this was unclear

Timing of outcome measures
A clear schedule for the timing of clinical and/or radiological assessment following injury was recorded in 304 (56.7%) studies. The remaining studies recorded outcomes at last follow-up, with resulting variation in timing between patients, or did not state when outcomes were measured. Where recorded, the median timing of the first assessment following discharge was 3 months (interquartile range 3-6 months).

Response rates
Where available, the response rates of the outcome measures used were analyzed. Response rates were significantly greater in studies assessing patients of any age than in those assessing only patients over 65 years old (U = 6729.5, p < 0.001). Response rates were significantly lower in studies using PROMs than in those using only non-patient-reported outcomes (U = 12,092, p < 0.001). No difference in response rates was seen between studies of populations with or without SCI, nor between studies from different continents.
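For readers unfamiliar with the U statistic reported above, the following minimal Python sketch shows how a Mann-Whitney U test compares two independent groups: the pooled values are ranked (ties receive the average rank), U is derived from the rank sum of one group, and a two-sided p-value is obtained from the normal approximation with a tie correction. It is not the SPSS procedure used in the review and may differ from SPSS output in details (for example, exact p-values for small samples). The function names and the response-rate values in the example are hypothetical and are not data from the included studies; the sketch reports the smaller of the two possible U values, which is a common convention.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) using the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U for the first group
    u = min(u1, n1 * n2 - u1)                 # report the smaller U
    n = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                            # all values identical: no evidence of difference
        return u, 0.0, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value from the standard normal
    return u, z, p


# Hypothetical response rates (%) per study, for illustration only.
any_age = [92, 88, 95, 80, 85, 90, 97, 78]
over_65 = [70, 65, 82, 60, 75, 68]
u, z, p = mann_whitney_u(any_age, over_65)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

Because the test uses only ranks, it makes no assumption that response rates are normally distributed, which is why it suits nonparametric data such as these.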

Summarizing existing outcome measures used
Our systematic review identified heterogeneity in the selection of outcome measures for cervical spine fracture research. Larger prospective studies more commonly employed PROMs to collect patient outcome data. Conversely, radiological outcomes were more commonly recorded in smaller, single-center retrospective studies. Larger prospective studies may be better resourced to collect outcome data using validated tools, with the ability to follow up patients prospectively and better ensure completion of questionnaires. Radiological outcomes may be more easily obtained, but their greater reporting in smaller retrospective studies may also reflect the complexity of importing radiological source material from different centers or from larger populations. Also, the cost of mandating specific imaging in larger studies could be prohibitive.
Only a small proportion of the published literature focused on the elderly population (14.4%), yet the elderly/frail are increasingly becoming the main casualties of cervical fractures as the world's population ages [15]. Interestingly, studies assessing only patients over 65 years of age more commonly reported PROMs, pain scores, functional assessments, length of hospital stay and discharge destination, but reported radiological outcomes less frequently. This may reflect the growing evidence that radiological bony fusion is less common in elderly/frail patients who sustain an odontoid fracture of the cervical spine, irrespective of treatment strategy [16][17][18][19][20]. Quality of life outcomes may be better suited to this population. The majority of the papers included in our review reported on upper cervical spine fractures and surgical management. In comparison, conservative management strategies and their outcomes are less well defined [21], an unmet need given the ageing population. The optimal methods of non-operative management of different types of cervical spine fractures in the elderly/frail, and how best to measure clinical outcomes in these populations, require further scrutiny.

Table 5 Heatmap showing percentage use of outcomes measured for included studies based on year of study. *1970s not included as only one study from this decade met inclusion criteria
Studies of patients with SCI were less likely to use PROMs and functional assessments, which was unexpected given the importance of neurological and functional recovery in this population [22]. The use of a validated classification system (e.g., Frankel/ASIA) allows comparison of cohorts across studies and quantification of recovery [23,24]. Scivoletto et al. [25] sought expert opinion on the use of outcome measurement tools after SCI and found that clinicians most commonly used neurological function, pain, spasticity, gait and ability to self-care to define a patient's recovery. By contrast, in our study, a validated grading classification was used to define neurological outcome in only 57.8% of papers including SCI patients.
We identified a disparity in the outcomes used in MEDCs compared with LEDCs. This has implications for extrapolating study outcomes between populations. It may reflect the financial cost and logistical complexity of post-discharge follow-up in LEDCs. The low proportion of intercontinental studies (0.4%) suggests that more could be done by MEDCs to include LEDC populations in future trials. The recent growth of global neurosurgery will certainly help in this regard; specifically, studies need to address differences in measured clinical outcomes and in patients' recovery goals between populations [26][27][28].

Patient involvement in reporting of outcomes
Since 1979, as the reporting of radiological outcomes in studies of cervical spine fractures has declined, there has been a shift toward better understanding patients' functional outcomes. While clinician-reported outcome scores exist (e.g., JOA, FIM), PROMs recognize the importance of patients' involvement and perspectives in the research process [29]. There is often discordance between what physicians and patients perceive as important outcomes [30]. The proportion of studies reporting PROMs was more than twice as high in the 2020s as in the 1980s. Pain scores were the most commonly used type of PROM, used in one in five studies. The Neck Disability Index (NDI) was used in around one in ten studies, and in total eleven different PROMs were identified that aim to quantify recovery following injury across numerous domains.

Timing of outcome assessment
Reporting of the timing of assessments following injury was often unclear. Duration of follow-up and the timing of outcome measure assessments following injury differed markedly between studies and were not always reported. The optimal duration of follow-up may be context dependent: for example, longer when assessing motor recovery following traumatic SCI, which tends to plateau after approximately 12-18 months [24], but shorter in frail/elderly patients with odontoid fractures, whose mortality rate can be as high as 34.1-37.5% at one year [31,32]. Better reporting of the timing of outcome measurement would aid comparison of outcomes across study populations.

Strengths and limitations of this review
Using broad search terms and including all treatment modalities meant that this review captured a large number and a wide range of studies. It is likely that we have identified all of the commonly used outcome measures. Excluding non-English language studies is a limitation, as including them may have allowed comparison of the outcomes used in different cultures and languages. Nevertheless, with studies performed in every continent, the included outcome measures likely represent those used worldwide for cervical spine fractures.
As we aimed to understand the methods used to report outcomes, we did not assess the quality of the included studies or attempt any meta-analysis of outcome data. Including only studies meeting certain quality criteria may have systematically excluded those reporting certain outcome measures.

Development of a core outcome set
The heterogeneity of outcome measures for cervical spine fractures identified in this systematic review demonstrates the need for standardization in future clinical studies. Developing a COS for cervical spine fractures requires establishing which outcome measures are optimal for the patient groups under study. The differences in the outcome measures chosen in the existing literature show that these would need to be stratified by patient, population and injury features. Determining which outcome measures are optimal would benefit from stakeholder input, especially from patients themselves. There is a paucity of literature assessing patients' beliefs and values regarding the definition of a 'good outcome' following a cervical fracture [33]. Patients' self-assessment and expectations regarding their recovery will be context specific and will depend on the injury, their age, pre-existing frailty and comorbidities, alongside their values and beliefs. Defining the most appropriate PROMs for cervical spine fractures as part of a COS would help standardize outcome measures, allowing comparison across studies and modes of treatment. The involvement of patients in defining such COSs is vital [34]. We have registered this requirement on the COMET database (https://www.comet-initiative.org/Studies/Details/2030) and plan to conduct a Delphi study to identify these ideal context-specific core outcomes, in order to improve the quality of published outcomes for patients with cervical spine fractures.

Conclusions
Overall, the most commonly reported clinical outcomes were complications, radiological fusion and mortality. There was marked heterogeneity in the types of outcome reported across studies of differing populations. Fewer than a third of the included studies reported PROMs, but PROMs were used more commonly in larger/prospective studies, in studies assessing patients over 65 years of age and in studies assessing patients managed non-operatively. Over time, the use of radiological outcomes has declined, with a trend toward greater use of PROMs. Response rates were poorer in studies assessing patients over 65 years of age and in studies using PROMs, highlighting the particular challenge of assessing outcomes in the elderly. Moreover, the review identified a shortage of studies from LEDCs and a lack of intercontinental studies examining outcomes following cervical spine fractures; remedying this requires a global approach from the neurosurgical community.

Fig. 1
Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) diagram

We searched Ovid Medline, Ovid Embase and Scopus on March 4, 2020, using the search strategies documented in the protocol (see supplementary material for full search strategies). Searches of the Cochrane Central Register of Controlled Trials and of registries of ongoing trials (ClinicalTrials.gov, ISRCTN, EU clinical trials registry) revealed no further relevant studies. The search was updated on May 28, 2022.

Table 2
Outcomes measured according to the COMET outcome taxonomy of core areas, showing units of measure used in the studies in brackets. VAS, visual analog scale; NRS, numeric rating scale; JOA, Japanese Orthopaedic Association scoring system; SF-36, 36-Item Short Form Survey; CSOQ, cervical spine outcome questionnaire; NPDI, neck pain driving index; PTNC, post-traumatic neck score; PSI, patient satisfaction index; WISCI-2, Walking Index for Spinal Cord Injury 2 score; FIM, functional independence measure score; mRS, modified Rankin Scale; SCIM, spinal cord independence measure; ODI, Oswestry Disability Index

Table 4
Heatmap showing percentage use of outcomes measured for included studies based on geographical location of study