Introduction

Based on a ground-up, evidence-based approach, the AOSpine Knowledge Forum (KF) Trauma has undertaken initiatives to develop a novel disease-specific outcome instrument for spine trauma patients. In addition to outcome measurement from the patient’s perspective, there is a need for a tool that incorporates the most relevant clinical and radiological parameters from the spine surgeon’s perspective, as a complementary tool for predicting outcomes. In daily clinical practice, treating surgeons routinely use a number of clinical and radiological parameters to evaluate the results of conservative or surgical treatment after traumatic spine injuries. To predict the outcome and determine the potential need for additional treatment, spine surgeons commonly estimate expected problems with respect to a number of short-term and long-term outcomes. Surgeons’ perspectives are likely to differ substantially from those of patients [1,2,3,4,5,6,7].

It would be valuable to standardize surgeons’ ‘gut feeling’ and make it measurable. Therefore, we sought to assess the potential utility of a new concept, the AOSpine Clinician Reported Outcome Spine Trauma (AOSpine CROST), as a supplement to a patient-reported outcome tool. Such a tool would be administered by the treating surgeon at various time points during follow-up after the patient’s initial treatment. We hypothesized that, with their content expertise, treating surgeons would be able to estimate and predict clinical and functional outcomes of spine trauma patients using this tool. Standardizing the evaluation of patients’ postoperative course could improve the quality of spine care. The objective of this paper is to report on the development of the AOSpine CROST and the results of an initial reliability study.

Materials and methods

Developmental process of the AOSpine CROST

During development of the tool, two separate surveys were conducted among international spine trauma experts to identify relevant clinical and radiological parameters for the thoracic and lumbar spine [8] and for the cervical spine [9]. Subsequently, by integrating evidence from these preparatory studies with expert opinion, a working draft of the AOSpine CROST was developed. This process followed an iterative approach of multiple cycles of development, review, and revision by an expert clinician panel consisting of members of the AOSpine KF Trauma and its associate members. Particular attention was paid to defining the parameters and adding descriptions to specify them. Various response scales were also investigated. After a draft version of the tool had been developed, it was pilot tested during an expert committee meeting by rating it for various cases from daily clinical practice. After this phase, a definitive version was developed for further validation.

Reliability study

For the validation phase, a study evaluating intra- and inter-rater reliability was performed among an expert panel. The data manager of AOSpine International sent an invitation to the Steering Committee members of the KF Trauma and the KF Spinal Cord Injury, as well as to their associate members. Through an online system, participants were provided with 20 selected spine trauma cases representing the wide range of cases typically seen in daily clinical practice. The cases were selected by the first author (SS) and the senior author (FCO). The web-based system presented the background data of each case scenario and recorded the participant’s AOSpine CROST evaluation, with an additional blank field for comments. To assess test–retest (intra-rater) reliability, the cases were rated on two occasions with a 4-week interval.

Cases

In line with the aim of the AOSpine CROST to evaluate the provided treatment at the first follow-up time point after trauma, the case scenarios mimicked a first outpatient visit after the initial trauma. The cases were selected from a large database of the University Medical Center Utrecht (Utrecht, the Netherlands) and Schön Klinik (Fürth, Germany) and included 14 surgically and 6 conservatively treated patients. Each case consisted of (1) patient characteristics, (2) background lifestyle, (3) trauma-related characteristics together with CT and/or MRI slices obtained at the time of trauma, (4) the further course in the hospital, and (5) the outpatient clinic follow-up together with anteroposterior (AP) and lateral conventional radiographs (CR) or any other imaging modality that was performed. Two example cases with differing AOSpine CROST results are shown in Figs. 1 and 2.

Fig. 1 An example of a case for which the AOSpine CROST was rated with acceptable results

Fig. 2 An example of a case for which the AOSpine CROST was rated with fair results

Statistical analysis

The results of the developmental process of the AOSpine CROST were analyzed using descriptive statistics. Inter- and intra-rater reliability per parameter were analyzed using kappa statistics, with values < 0 indicating poor agreement, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [10]. Inter-rater agreement for the total score was calculated using the intraclass correlation coefficient (ICC) [11]. Internal consistency was analyzed using Cronbach’s α, with α ≥ 0.7 considered acceptable and α ≥ 0.9 excellent.
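The kappa statistic and the agreement bands listed above can be illustrated with a short sketch. The text does not state which kappa variant was used (with more than two raters, a multi-rater statistic such as Fleiss’ kappa is a common choice). The sketch below assumes Cohen’s kappa for one rater’s yes/no answers on the two rating occasions, coded as 1/0; the function names, the data, and the rounding to two decimals before banding are illustrative assumptions.

```python
from collections import Counter

# Landis and Koch bands as listed in the text: (upper bound, label).
LANDIS_KOCH_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def cohens_kappa(first, second):
    """Cohen's kappa for two sets of categorical ratings of the same cases.

    Here: one rater's yes/no answers (1/0) for one parameter on the first
    and second rating occasion (hypothetical data layout).
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of cases rated identically on both occasions.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal frequencies of each occasion.
    counts_first, counts_second = Counter(first), Counter(second)
    categories = set(counts_first) | set(counts_second)
    p_expected = sum(counts_first[c] * counts_second[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Map kappa (rounded to two decimals, as reported) to the bands above."""
    k = round(kappa, 2)
    if k < 0:
        return "poor"
    for upper, label in LANDIS_KOCH_BANDS:
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical answers for one parameter over 10 cases, 4 weeks apart.
occasion_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
occasion_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(occasion_1, occasion_2)
print(round(kappa, 2), landis_koch_label(kappa))  # prints: 0.6 moderate
```

Rounding before banding avoids floating-point values such as 0.6000000000000001 being pushed into the next band.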

Results

AOSpine CROST tool

After pilot testing and multiple evaluations during expert committee meetings, a final version of the AOSpine CROST consisting of 10 parameters was developed (Table 1). Eight parameters were rated for both surgically and non-surgically treated patients, while 2 parameters (‘Wound healing’ and ‘Implants’) applied only to surgically treated patients. The tool was pilot tested using several spine trauma cases from daily clinical practice. In line with the approach of the preparatory studies, each parameter was rated for both the short term and the long term, indicated as ‘within 12 months’ and ‘from 12 months onwards’, respectively.

Table 1 Parameters of the AOSpine CROST (Clinician Reported Outcome Spine Trauma), together with the questions added to specify each parameter

It was decided not to further classify response levels or develop specific cutoff points. After a number of scoring methods were reviewed during the initial testing process, a dichotomous scoring system (‘yes’ or ‘no’ response) was selected to express expected problems or adverse events for each parameter. Each ‘yes’ answer scored 1 point. The total score was the sum of the ‘yes’ answers, with a maximum achievable score of 8 points for non-surgically and 10 points for surgically treated patients. A higher total score indicates worse expected clinical outcomes. The score is regarded as an indication of the surgeon’s anticipation of a change in the treatment plan. The definitive version of the AOSpine CROST used in this study is shown in the Appendix.
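The scoring rule can be sketched as follows for one time horizon (short or long term). The parameter names are those mentioned in the Discussion and are assumed to match Table 1 and the Appendix; the function names and the dictionary input format are hypothetical.

```python
# Parameter names as mentioned in the text (assumed to match Table 1).
COMMON_PARAMETERS = (
    "Neurological status",
    "Radiographic sagittal alignment",
    "General bone quality",
    "Stability of the injured spine level",
    "Spinal column mobility",
    "General physical condition",
    "General psychological condition",
    "Functional recovery",
)
# Rated only for surgically treated patients.
SURGICAL_ONLY_PARAMETERS = ("Wound healing", "Implants")


def applicable_parameters(surgical):
    return COMMON_PARAMETERS + (SURGICAL_ONLY_PARAMETERS if surgical else ())


def crost_total(answers, surgical):
    """Sum of 'yes' answers for one time horizon.

    answers maps parameter name -> True ('yes', problem expected) or
    False ('no'). Parameters that do not apply are ignored.
    """
    applicable = applicable_parameters(surgical)
    missing = [p for p in applicable if p not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    return sum(1 for p in applicable if answers[p])


def crost_max_score(surgical):
    """8 points for non-surgical, 10 points for surgical cases."""
    return len(applicable_parameters(surgical))


# Hypothetical short-term rating of a surgically treated case.
example = {p: False for p in COMMON_PARAMETERS + SURGICAL_ONLY_PARAMETERS}
example["General bone quality"] = True
example["Implants"] = True
print(crost_total(example, surgical=True), "/", crost_max_score(True))  # prints: 2 / 10
```

Requiring an answer for every applicable parameter keeps an incomplete form from silently scoring lower, which would otherwise read as a better expected outcome.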

Participants

Of the 19 international spine trauma experts invited, 16 (84.2%) participated in the first round and 14 (73.7%) in the second round. Ten were affiliated with the AOSpine KF Trauma and 6 with the AOSpine KF Spinal Cord Injury. Different world regions were represented, with 9 (56.3%) experts from North America, 5 (31.3%) from Europe, 1 (6.3%) from Asia, and 1 (6.3%) from South America.

Intra-rater reliability

The intra-rater reliability per parameter ranged from fair to substantial, both for the short term and the long term (Table 2). For the short term, kappa values ranged from 0.40 (‘General bone quality’) to 0.80 (‘Radiographic sagittal alignment’). For the long term, ‘Radiographic sagittal alignment’ (κ = 0.67) again showed the highest agreement. Compared with the short term, long-term agreement was lower for ‘Wound healing’ (κ = 0.31 vs 0.68), ‘Stability of the injured spine level’ (κ = 0.57 vs 0.79), and ‘Implants’ (κ = 0.44 vs 0.67).

Table 2 Intra-rater reliability per item using kappa statistics (κ)

Inter-rater reliability

Inter-rater reliability per parameter was lower than intra-rater reliability, with slight to moderate agreement for both the short term and the long term (Table 3). For the short term, ‘Spinal column mobility’ showed the lowest agreement (κ = 0.18) and ‘Radiographic sagittal alignment’ the highest (κ = 0.60). For the long term, inter-rater reliability was again lowest for ‘Spinal column mobility’ (κ = 0.16) and highest for ‘Radiographic sagittal alignment’ (κ = 0.46).

Table 3 Inter-rater reliability per item using kappa statistics (κ)

Inter-rater agreement for the AOSpine CROST total scores was moderate for both surgically and non-surgically treated cases, and for both the short and the long term. As shown in Table 4, the ICC ranged from 0.52 to 0.60.
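The text does not state which ICC form was used. As an assumption, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a complete cases × raters matrix of total scores, using the standard two-way ANOVA mean squares. The function name and the example matrix are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the total score given to case i by rater j
    (complete data, no missing ratings).
    """
    n = len(scores)                     # cases (rows)
    k = len(scores[0]) if scores else 0  # raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 cases and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Sums of squares for cases, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores: 6 cases (rows) rated by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # prints: 0.29
```

The absolute-agreement form penalizes systematic differences between raters (one surgeon consistently scoring higher), which a consistency form such as ICC(3,1) would ignore; which form fits the study design is a choice the authors would need to confirm.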

Table 4 Inter-rater agreement for the total score of the AOSpine CROST using intraclass correlation coefficient, both for the short term and long term

Internal consistency

Internal consistency of the total AOSpine CROST scores was acceptable, with Cronbach’s α ranging from 0.76 to 0.82 (Table 5).
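Cronbach’s α compares the sum of the item variances with the variance of the total score. A minimal sketch, assuming a cases × items matrix of 0/1 answers (for dichotomous items α coincides with KR-20); the function name and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is the answer of case i on item j."""
    n = len(item_scores)
    k = len(item_scores[0]) if item_scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need a complete matrix with >= 2 cases and >= 2 items")
    # Sample variance of each item across cases.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the total score (sum of 'yes' answers) per case.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical yes/no answers: 5 cases (rows) x 3 parameters (columns).
ratings = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(ratings), 3))  # prints: 0.794
```

Using sample variances for both the items and the total leaves the ratio unchanged compared with population variances, so either convention gives the same α.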

Table 5 Internal consistency of the AOSpine CROST using Cronbach’s α

Participants’ comments

Although several comments were provided by the participants concerning the cases provided, no specific comments were directed at the AOSpine CROST tool.

Discussion

Based on the results of two preparatory studies combined with findings from expert committee meetings, the AOSpine CROST has been developed. An initial reliability study among senior spine trauma experts showed fair inter-rater reliability, whereas intra-rater reliability was moderate and internal consistency was acceptable. We believe this is the first scoring tool for spine trauma care that reflects spine surgeons’ expectations of patient outcomes and is applicable in a routine clinical setting.

The tool was developed iteratively, in several cycles of sequential review and revision during expert meetings. First, a number of parameters were selected on the basis of two preparatory studies [8, 9] and then further refined. During development, multiple versions of the tool including these parameters were pilot tested using several spine trauma cases from daily clinical practice. In this process, parameters such as ‘Neurological status’, ‘Radiographic sagittal alignment’, ‘General bone quality’, and ‘Stability of the injured spine level’ were defined more precisely. However, after extensive sequential reviews, revisions, and pilot tests, the expert committee decided not to define specific cutoff points; instead, one question was added to each parameter to make the tool easier to use and to improve interpretability. In addition, a distinction was made between short-term (‘in the next 12 months’) and long-term outcomes (‘from 12 months onwards’).

There were multiple reasons for these decisions in the current phase. For example, adding a dedicated neurological classification system to the ‘Neurological status’ parameter was considered. Because a variety of neurological classification systems are in use, such as ASIA, the Frankel scale, and the AOSpine injury classification systems [12,13,14,15], it was decided not to specify this domain further. Moreover, the relationship between various types of potential neurological deterioration and outcomes remained controversial and was therefore felt to be too unpredictable to classify at this time [16]. Likewise, ‘Radiographic sagittal alignment’ was not specified in terms of kyphosis angles, as various threshold definitions for their impact on outcomes have been proposed in the literature [17, 18]. In addition, variation in measurement techniques among surgeons worldwide made the creation of specific numeric thresholds undesirable [19]. The parameter ‘Stability of the injured spine level’ was further refined by adding the term ‘mechanical instability’, and ‘Spinal column mobility’ was likewise refined by describing the maintenance of overall spinal column mobility. ‘General bone quality’, as judged by the surgeon, was considered another key descriptor. It was felt to be important because it may play a role in surgical decision making and also affect supplemental interventional treatments [20]. In addition, impaired bone quality has been associated with a higher risk of implant failure and with the possibility of gradual neurological deterioration [21]. Furthermore, the patient’s ‘General physical condition’ and ‘General psychological condition’ were felt to be important factors for both treatment selection and expected outcomes [22, 23].
‘Implant’-related concerns were selected as a separate domain because osteoporosis and the choice of implant may affect anticipated outcomes, for instance in short-segment fixation in patients with poor bone stock [24]. ‘Wound healing’ in surgically treated patients may affect patient care, e.g., through revision surgery or ongoing antibiotic treatment. ‘Functional recovery’ was added to the clinician-based AOSpine CROST although it had not been evaluated in the preparatory studies [8, 9]. We felt that this parameter would make a valuable contribution to the overall tool and provide a direct link to patient-reported outcomes as measured by the AOSpine PROST (Patient Reported Outcome Spine Trauma). The AOSpine PROST was developed and validated on the basis of several foundational studies and following an international consensus conference [25].

In general, inter-rater reliability of the tool was fair to moderate, while intra-rater reliability was moderate to substantial. Internal consistency of the AOSpine CROST parameters was also acceptable. This suggests that the tool measures a coherent underlying construct and can evaluate treatment progress using clinical and radiological parameters. The results indicate that individual surgeons are consistent in their own judgments, but that different surgeons disagree. This difference in the evaluation of crucial clinical parameters among surgeons with substantial experience and interest in spine trauma might also explain some of the ongoing controversies in the care of spine trauma patients. It may reflect regional differences in the treatment of trauma patients (or the lack of worldwide accepted guidelines) and was, to some extent, an expected finding of this study. This view is supported by the better results for intra-rater reliability and internal consistency. In the next phase, when the AOSpine CROST is tested in a clinical setting including surgeons from the same departments and regions, better inter-rater reliability within one region or department is expected. From this perspective, the tool may also help in understanding the reasons for the observed variations in practice. The reliability results may also be related to the current study design, in which case scenarios were presented in an online environment. We hope that direct assessment of actual patients in a realistic clinical setting would allow more consistent assessments by different practitioners, especially for parameters such as ‘General physical condition’ and ‘General psychological condition’, which showed only fair agreement in this validation study.

This study has several limitations. First, the reliability study included a relatively small number of participants. Nevertheless, as each participant rated the AOSpine CROST for 20 cases, a total of 280–320 data points were obtained, which is comparable to or considerably more than in many other inter-rater reliability studies in which 30–50 cases are rated by 3–5 participants. Second, the study design did not include longer-term patient follow-up to investigate the predictive value of the tool. We plan to perform such outcome-based studies with patients in actual clinical settings. Finally, the patients were presented only as online cases with descriptive scenarios. For the initial validation phase of our clinician-reported outcome tool, we felt this was the most expedient way to test the reliability of the AOSpine CROST.
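The 280–320 figure follows from the participation reported in the Results (16 raters in the first round, 14 in the second), each rating 20 cases; the comparison range assumes the smallest and largest of the cited designs (30 cases × 3 raters and 50 cases × 5 raters):

```latex
\begin{aligned}
14 \times 20 &= 280, & 16 \times 20 &= 320 && \text{(case ratings per round, this study)},\\
30 \times 3 &= 90, & 50 \times 5 &= 250 && \text{(range for the cited study designs)}.
\end{aligned}
```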

In conclusion, the AOSpine CROST (Clinician Reported Outcome Spine Trauma) was developed on the basis of two preparatory studies combined with the results of expert committee meetings. An initial reliability analysis showed fair to moderate results and acceptable internal consistency. In the next phase, further prospective validation studies will be performed to investigate the construct validity, reliability, and predictive value of the tool. We believe that the tool has potential for use in the clinical setting, where, together with the AOSpine PROST (Patient Reported Outcome Spine Trauma), it can provide a holistic view of patients’ health and may help resolve some of the ongoing controversies.