Introduction

Measuring the results of interventions on individuals’ health-related quality of life is relevant not only for optimal treatment strategies but also from the standpoint of cost-effectiveness. However, the outcomes of spine trauma patients have traditionally been limited to reporting of mortality and neurologic deficits or expressed with instruments designed for chronic conditions [1,2,3]. The use of different outcome measures, none of which were designed for spine trauma, contributes to the ongoing controversies on the optimal treatment of this specific patient population [4, 5].

To address this void, the AOSpine Knowledge Forum Trauma initiated a project to develop and validate a disease-specific outcome measure for spine trauma patients: the Patient Reported Outcome Spine Trauma (AOSpine PROST). The systematic approach and Core Set development methodology of the International Classification of Functioning, Disability, and Health (ICF) of the World Health Organization (WHO) was used as the basis for the development of the tool [6, 7]. In a preparatory phase, three studies aimed to identify ICF categories relevant to measuring outcomes of traumatic spinal column injuries from different perspectives: research, experts and patients. A fourth study investigated various question and response formats for use in AOSpine PROST. In the next phase, a formal consensus process integrated evidence from the preparatory studies and expert opinion and led to the selection of 25 ICF categories as ‘core categories’ and of the appropriate response scale. Subsequently, a draft Dutch version of the tool was developed by clustering the 25 core ICF categories into 19 items and implementing those into the selected 0–100 Numeric Rating Scale (NRS-101). After pilot testing, a definitive Dutch version to be validated was developed [8].

In the developmental process and initial validation, we sought to focus on patients sustaining injuries to their spinal column and excluded completely paralyzed and polytrauma patients, to identify specific problems related to spine trauma. This study aimed to validate the Dutch version of the AOSpine PROST among traumatic spinal column injury patients. More specifically, the psychometric properties were investigated to assess its reliability, validity and responsiveness.

Materials and Methods

Target population

Adult (≥ 18 years) traumatic spinal column injury patients who were capable of understanding and adequately filling out the questionnaires were included. Polytrauma patients (Injury Severity Score (ISS) > 15) and patients with complete paralysis (American Spinal Injury Association (ASIA) impairment grade A or B at discharge or transfer from hospital) were excluded.

Instruments

For the purpose of concurrent validity, the AOSpine PROST should be compared to a validated outcome instrument designed for patients with traumatic spinal column injuries. However, no such instrument is available. Therefore, a generic health-related quality-of-life (HRQoL) outcome instrument, the Medical Outcomes Study 36-item Short-Form Health Survey (SF-36), was administered to patients as the reference standard.

The AOSpine PROST consists of 19 questions on broad aspects of functioning. (Appendix 1 shows the translated and cross-culturally adapted English version.) Each item has a 0–100 numeric rating scale, with 0 indicating no function at all and 100 the functional level before trauma. The scale is supported by smileys at both ends of the ruler. The SF-36 includes 36 items measuring 8 health subscales, and is widely used to measure the general health status of patients with different diseases [9]. The two summary measures, the physical component summary (PCS) and mental component summary (MCS), are calculated from the 8 health subscales. Scores range from 0 to 100, with higher scores indicating better health status. The Dutch version of SF-36 has shown good validity results [10, 11]. These questionnaires (AOSpine PROST and SF-36), along with a limited number of additional questions, were administered to the patients as one questionnaire via an online system. The additional questions aimed to explore the presence of irrelevant questions in AOSpine PROST, the absence of relevant questions and patients’ self-reported degree of recovery.

The health professionals participating in the study were asked to complete background data, consisting of socio-demographic characteristics and trauma-related variables, and to assess the patient’s degree of recovery based on clinical and radiological assessments (not recovered at all, somewhat recovered, mainly recovered or completely recovered).

Study procedures

Patients were recruited from two level-1 trauma centers in the Netherlands: University Medical Center, Utrecht (UMCU), and Radboud University Medical Center, Nijmegen (RUMC). The study consisted of two arms: test–retest and responsiveness. For the test–retest arm, eligible patients who were seen at the outpatient clinic within 13 months post-trauma were invited to participate, while in the responsiveness arm patients were recruited shortly before discharge from hospital. After informed consent, patients received an email with a link to the questionnaire or postal mail with a login code. For the purpose of test–retest, patients were asked to fill out the same questionnaire once more one week after completion. In the responsiveness arm, the questionnaire was administered three times: at 2 weeks, 6 weeks and 3 months post-trauma. If it was not completed within 3 days, patients received a reminder via email or telephone.

Statistical analysis

Patient characteristics were analyzed using descriptive statistics and frequency analysis. Content validity was assessed by evaluating the number of inapplicable questions and the responses to the open question on whether any question was missing from AOSpine PROST. Floor and ceiling effects were also analyzed; these were considered present if > 15% of the patients achieved the lowest or highest possible score, respectively. The mean total scores in relation to the degree of recovery, both as reported by patients and as assessed by the clinicians, were analyzed using Welch’s ANOVA.
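The 15% criterion for floor and ceiling effects can be illustrated with a minimal sketch. The function name `floor_ceiling`, the 0 to 100 bounds for the total score and the example data are assumptions made for illustration only; they are not the study data or the SAS code used in the analysis.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at each extreme and whether it exceeds the threshold."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,      # more than 15% at the minimum
        "ceiling_effect": ceiling_share > threshold,  # more than 15% at the maximum
    }


# Hypothetical data: 163 total scores, exactly one of them at the assumed maximum.
scores = [100.0] + [62.5] * 162
result = floor_ceiling(scores, min_score=0.0, max_score=100.0)
print(result["floor_effect"], result["ceiling_effect"])
```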

Concurrent validity between AOSpine PROST and SF-36 was analyzed using the Spearman correlation coefficient (rs). The rs can take values from + 1 to − 1, with + 1 indicating a perfect positive association, 0 no association and − 1 a perfect negative association of ranks [12]. Concurrent validity is supported if the coefficient is at least 0.70 [13].
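As an illustration of how rs is obtained, the sketch below ranks both score lists (tied values receive the mean of their ranks) and then computes a Pearson correlation on the ranks. The function names and the example scores are hypothetical; the study itself used SAS.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rs: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for six patients (not study data).
prost_total = [35, 50, 62, 70, 81, 90]
sf36_pcs = [28, 40, 38, 47, 52, 55]
rs = spearman(prost_total, sf36_pcs)
print(round(rs, 2), rs >= 0.70)  # second value: does rs meet the 0.70 criterion?
```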

The internal consistency was assessed by calculating Cronbach’s α and item-total correlation coefficients. It is suggested that the value of α should be > 0.70 for internal consistency to be accepted as satisfactory [13, 14]. Pairwise Spearman correlations were also calculated to investigate the correlation between AOSpine PROST items.
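For readers less familiar with the statistic, Cronbach’s α for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements this standard formula, together with α recomputed after removing one item; the function names and the small data set are illustrative assumptions, not the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all of equal length."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_removed(rows, item):
    """Cronbach's alpha recomputed without the item at index `item`."""
    reduced = [[v for i, v in enumerate(r) if i != item] for r in rows]
    return cronbach_alpha(reduced)


# Hypothetical responses of five patients to four 0-100 items.
responses = [
    [60, 55, 70, 65],
    [80, 85, 75, 90],
    [40, 35, 50, 45],
    [90, 95, 85, 80],
    [55, 60, 50, 65],
]
print(cronbach_alpha(responses) > 0.70)
print([round(alpha_if_item_removed(responses, i), 2) for i in range(4)])
```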

Test–retest reliability was assessed using Intraclass Correlation Coefficients (ICCs), with good and excellent reliability indicated by values of 0.70 to 0.85 and > 0.85, respectively [13].
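The ICC model is not specified above. Purely as an assumption for illustration, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)) from the mean squares of a patients-by-administrations table. The function name `icc_2_1` and the example scores are hypothetical.

```python
def icc_2_1(table):
    """table: one row per patient, one column per administration."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Sums of squares for patients (rows), administrations (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores at the first and second administration.
test_retest = [[55, 60], [70, 68], [82, 85], [40, 45], [90, 88]]
icc = icc_2_1(test_retest)
print(round(icc, 2), "excellent" if icc > 0.85 else "good" if icc >= 0.70 else "below 0.70")
```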

Responsiveness was analyzed using effect size (ES) and standardized response mean (SRM). ES was calculated as the change in score divided by the standard deviation (SD) of the score at 2 weeks. In general, an ES > 0.8 is regarded as large based on Cohen’s criteria [15]. The SRM is the change score divided by the SD of the change score.
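The two responsiveness indices differ only in their denominator, as the following minimal sketch shows. It assumes paired scores per patient and uses the mean change as the numerator; the function names and the three-patient example are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)


# Hypothetical scores at 2 weeks and 3 months for three patients.
at_2_weeks = [10, 20, 30]
at_3_months = [20, 35, 50]
print(effect_size(at_2_weeks, at_3_months))                 # 1.5
print(standardized_response_mean(at_2_weeks, at_3_months))  # 3.0
```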

Finally, exploratory factor analysis was performed to identify the dimensionality of the AOSpine PROST. Factors with an Eigenvalue greater than 1 were selected, and selection was confirmed by visual inspection of the scree plot. The factor loading of each item after varimax rotation was examined.
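The eigenvalue criterion can be sketched as follows. The example assumes the eigenvalues come from a correlation matrix, in which case they sum to the number of items and each eigenvalue divided by that sum gives the proportion of variance explained. The function name and the eigenvalues are hypothetical, and the rotation step is not shown.

```python
def kaiser_selection(eigenvalues):
    """Return (factor number, eigenvalue, share of variance) for eigenvalues > 1."""
    total = sum(eigenvalues)  # equals the number of items for a correlation matrix
    ranked = sorted(eigenvalues, reverse=True)
    return [(i + 1, ev, ev / total) for i, ev in enumerate(ranked) if ev > 1]


# Hypothetical eigenvalues for a five-item instrument (they sum to 5).
for factor, ev, share in kaiser_selection([3.2, 1.1, 0.4, 0.2, 0.1]):
    print(f"Factor {factor}: eigenvalue {ev:.1f}, {share:.0%} of variance")
```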

Statistical analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).

Results

Patient characteristics

Out of 179 patients, a total of 163 (91.1%) were enrolled. Five patients from RUMC and 11 from UMCU were excluded as they did not complete any questionnaire. Out of the included patients, 14 (8.6%) were from RUMC and 149 (91.4%) from UMCU. The basic patient and clinical characteristics are shown in Table 1.

Table 1 Socio-demographic and clinical characteristics of the study population

Content validity

The mean registered time by the online system to complete AOSpine PROST was 7.0 ± 4.3 (range 1–30) minutes. No item in the questionnaire was indicated as inapplicable or irrelevant.

Eight (4.9%) patients reported having difficulties in filling out AOSpine PROST questions. However, in their clarifications patients described their limitations relevant to a specific question in more detail, rather than a practical difficulty with or misunderstanding of the questions. Twenty-three (14.1%) patients responded positively that a relevant question was missing. A detailed analysis of those responses revealed that no specific question was missing. However, patients did indicate that it was somewhat unclear whether the questions should be answered solely for the spine fracture or also for other sustained fractures.

As no patient had the minimum and only one patient (0.6%) the maximum total score, no floor or ceiling effects were observed. AOSpine PROST scores were more strongly related to the degree of recovery as indicated by patients (p < 0.001) than to the clinicians’ assessments (Table 2).

Table 2 Mean AOSpine PROST scores relative to the degree of recovery, both as reported by patients and as assessed by clinicians [mean ± SD (range)]*

Concurrent validity

The Spearman correlations between AOSpine PROST and SF-36 questionnaires are shown in Table 3. The AOSpine PROST most strongly correlated with the physical components (p < 0.001): Physical Functioning (0.79), Role Physical (0.72) and PCS (0.78).

Table 3 Spearman correlation (rs) between AOSpine PROST and SF-36, both for the subscales and summary scales

Internal consistency

The internal consistency of the AOSpine PROST total score was excellent (Cronbach’s alpha = 0.960). With a range of 0.50 to 0.86, item-total correlations showed good results (Table 4). The lowest values were seen for ‘Urinating’ (0.50) and ‘Bowel movement’ (0.58). Cronbach’s alpha did not improve (0.95–0.96) when an item was removed. As shown in Appendix 2, Spearman correlations between AOSpine PROST items also showed good results (0.25–0.79).

Table 4 Results for internal consistency and test–retest reliability. Internal consistency for AOSpine PROST items, both item-total correlation (Rho) and Cronbach’s α if item removed are shown

Test–retest reliability

A total of 64 patients were included in the test–retest arm (Table 4). The mean time after trauma was 5.6 ± 4.1 (range 0–13) months when completing the first questionnaire. The time between the first and second administrations was 9 ± 2.3 (range 4–14) days. Excellent test–retest reliability was seen for the total score (ICC = 0.92). At the item level, all items had acceptable to excellent reliability results except for the ‘Back and/or neck pain’ item (ICC = 0.55).

Responsiveness

Out of the 96 patients initially enrolled in the responsiveness arm, 59 (61.5%) completed the questionnaires at all time points and could be included in the responsiveness analysis. AOSpine PROST mean and median scores increased gradually across the time points (Table 5), as would be expected with gradual recovery over time. The changes in scores from 2 weeks to 3 months are shown in Table 6. The AOSpine PROST scores showed significantly (p < 0.001) larger changes compared to SF-36. Also, the largest ES and SRM were seen for AOSpine PROST (ES = 1.81 and SRM = 2.03). Table 7 shows the changes in AOSpine PROST scores compared to the patient-reported degree of recovery; a higher degree of recovery is reflected by a larger change in score with larger ES and SRM.

Table 5 AOSpine PROST scores at 2 weeks, 6 weeks and 3 months post-trauma (n = 59)
Table 6 Change in outcome scores (AOSpine PROST and SF-36) from 2 weeks to 3 months with effect size and standardized response mean (n = 59)
Table 7 Changes in AOSpine PROST scores from 2 weeks to 3 months according to patients’ self-reported degree of recovery at 3-month follow-up

Factor analysis

Factor analysis revealed that two factors had an Eigenvalue > 1, suggesting two possible dimensions across AOSpine PROST items (Factor 1 and Factor 2). These factors had Eigenvalues of 11.0 and 1.4 and explained 58.1% and 7.3% of the variance, respectively. Rotated factor analysis for the items among those two factors showed that most items load high on Factor 1 and low on Factor 2, indicating that those items contribute considerably to the dimension represented by Factor 1 (see Table 8 and Fig. 1). In contrast, the items ‘Urinating’ and ‘Bowel movement’ load high on Factor 2 and low on Factor 1. No item loaded low on both factors, which indicates that a third factor is not expected.

Table 8 Rotated factor analysis for the AOSpine PROST items among the two identified factors (Factor 1 and Factor 2) with Eigenvalue > 1
Fig. 1 Rotated factor pattern of AOSpine PROST factor analysis. Each dot with its description represents an AOSpine PROST item: prost_1 = Household activities; prost_2 = Work/study; prost_3 = Recreation and leisure; prost_4 = Social life; prost_5 = Walking; prost_6 = Travel; prost_7 = Changing posture; prost_8 = Maintaining posture; prost_9 = Lifting and carrying; prost_10 = Personal care; prost_11 = Urinating; prost_12 = Bowel movement; prost_13 = Sexual function; prost_14 = Emotional function; prost_15 = Energy level; prost_16 = Sleep; prost_17 = Stiffness of your neck and/or back; prost_18 = Loss of strength in your arms and/or legs; prost_19 = Back and/or neck pain

Discussion

This study investigated the psychometric properties of the Dutch language version of the AOSpine PROST (Patient Reported Outcome Spine Trauma), a novel patient-reported outcome measure specifically designed for spine trauma patients. Although a number of outcome instruments have been developed and validated for, or used in, individuals with traumatic spinal cord injury, these tend to focus on the impact of paralysis, e.g., Spinal Cord Independence Measure (SCIM) and Functional Independence Measure (FIM) [3, 16]. A unique approach in AOSpine PROST is asking patients to recall their pre-injury level of health, more specifically to rate their current function on a scale from 0 (no function at all) to 100 (their pre-trauma level of function). This feature might have contributed to the good responsiveness of the tool. Comparing the health and function of spinal trauma patients with normative standardized data is not straightforward because the characteristics of spine trauma patients may very well deviate from those of the general population [17, 18]. This may explain our finding of generally weak correlations between AOSpine PROST and SF-36. Also, various patient characteristics, e.g., cause of trauma and comorbidities, may influence outcomes.

Excellent results were obtained for internal consistency and test–retest reliability. Item-total correlations were high for all items, with the lowest values seen for ‘Urinating’ and ‘Bowel movement.’ These two items were also the ones that loaded on a separate factor in the factor analysis. Within the spine trauma patient population, these functions are most likely to be adversely affected in spinal cord injured patients [19, 20]. It is hypothesized that this bidimensional model will no longer be applicable when the tool is tested among ASIA A and B patients. As this will be performed in the next phase, it was decided not to make any changes to the current version of AOSpine PROST. Redundancy of certain items will also be investigated in future studies, as well as a detailed analysis per item including larger patient samples. Another interesting finding was the discrepancy in AOSpine PROST scores when related to the patient-reported degree of recovery compared to the assessments of the clinicians. This supports the authors’ aim to also develop an outcome instrument from the perspective of the treating surgeons: AOSpine CROST (Clinician Reported Outcome Spine Trauma), using the most relevant clinical and radiological parameters [21].

In a preparatory phase of the AOSpine PROST project, a systematic literature review found SF-36 to be the most frequently used generic instrument in studies including spine trauma patients [1]. This finding is supported by several other studies [22,23,24]. We found good concurrent validity for AOSpine PROST when compared to the physical components of SF-36 and satisfactory but lower concurrent validity for the mental SF-36 scores. This indicates that AOSpine PROST total scores reflect the patient’s experienced physical health more than mental health. Furthermore, as hypothesized, the responsiveness analysis yielded much better results for AOSpine PROST, with the highest ES and SRM.

This study has several limitations. First, the ability to detect minimal clinically important differences has not been investigated completely, as a larger patient sample would be required. We aim to investigate this specific aspect in future studies. Second, the two centers did not contribute equal numbers of patients. This was due to a combination of different amounts of spine trauma exposure and practical difficulties in the enrollment process. Finally, a somewhat heterogeneous patient population was seen, with a relatively high mean age and percentage of males. This aging spine trauma patient population is generally seen in clinics nowadays and is also underpinned by various publications [22, 25]. Also, slight differences were seen between the patients recruited from the two centers. Further investigation of subgroups by age, specific injuries and severity of spinal cord injury would, however, still be very interesting and will be performed in future studies.

In conclusion, this study aimed to analyze the psychometric properties of the Dutch version of the AOSpine PROST and showed very satisfactory results for reliability, validity and responsiveness. In future studies, the applicability of the tool to completely paralyzed patients will also be investigated. The aim is to translate and cross-culturally adapt the AOSpine PROST into many languages in order to make it available for spine trauma patients around the world. Treating surgeons are encouraged to use this novel and validated tool in clinical settings and research to contribute to further evidence-based and patient-centered spine trauma care.