Introduction

The AOSpine Knowledge Forum Trauma initiated a project to develop universal disease-specific outcome instruments for spine trauma patients. Because of the possible discrepancies when comparing outcomes from the patients’ perspective to clinical and radiological assessments by the clinicians, two separate tools were developed: the Patient Reported Outcome Spine Trauma (AOSpine PROST) to represent the patients’ perspective, and the Clinician Reported Outcome Spine Trauma (AOSpine CROST) to cover the perspective of the treating surgeons [1].

Although a number of outcome measures have been used in individuals with traumatic spine injuries, these tend to focus on the impact of paralysis [2]. In the absence of an instrument that is specifically designed and validated for spine trauma patients without complete paralysis, it is difficult to compare outcomes of different treatments of the spinal column injury within and between studies [3]. Because of the persisting controversies on the optimal treatment of many types of these injuries, there is a real need for such an instrument [4,5,6].

The systematic approach and methodology of the International Classification of Functioning, Disability, and Health (ICF) of the World Health Organization (WHO) was used as the basis for the development of the AOSpine PROST [7, 8]. The ICF recognizes that functioning and disability are multi-dimensional concepts relating to different components: body functions (b), body structures (s), activities and participation (d), and environmental factors (e). Figure 1 shows the components of the ICF and the hierarchical organization of more than 1400 categories into different levels of detail. This article reports on the multi-phase process used in developing the AOSpine PROST, as well as the results of its application in a pilot study.

Fig. 1 The bio-psycho-social model of the International Classification of Functioning, Disability and Health (ICF), along with an example of its hierarchical organization into different levels

Phase I: preparatory studies

Four different studies were completed in the preparatory phase of the project, all of which have been published. Three preparatory studies aimed to identify ICF categories relevant to measuring the outcomes of traumatic spinal column injuries from different perspectives. The research perspective was covered by a systematic literature review [3]. Out of 5117 screened references, 245 were included, and 17 frequently used outcome measures in spine trauma research were identified. The content of these measures was linked to 57 ICF categories, using established linking rules [9, 10]. The expert perspective was explored through a web-based survey among 150 experienced spine trauma surgeons from all world regions, which identified 13 ICF categories as most relevant [11]. The patient perspective was investigated in an international empirical study including 187 patients from nine trauma centers in seven countries, which yielded 38 ICF categories as the most important [12]. A fourth study investigated various question and response formats for their potential use in the patient reported outcome instrument [13].

Phase II: international consensus conference

In the next phase, a formal consensus process integrated evidence from the preparatory studies and expert opinion [14].

From a pool of candidates already involved or interested in the project, eleven international spine trauma experts from six countries were selected to attend a consensus conference. The selected experts are globally renowned for their contributions that have advanced the field of spine trauma research and care. Based on voting and group discussions, 25 out of 159 relevant ICF categories were selected as core categories (Table 1). A core ICF category was defined as being (a) relevant for adult traumatic spinal column injury patients, (b) relevant for clinical and functional recovery during the acute and post-acute time frame, and (c) meaningful to include in the outcome instrument. The attendees also agreed on one specific question format and on the 0–100 Numeric Rating Scale (NRS-101) as the response format (Fig. 2).

Table 1 The core ICF categories (n = 25) and their relation to the defined items in the AOSpine PROST version that was pilot tested, along with examples incorporated in each item
Fig. 2 The question and response formats initially agreed on during the international consensus conference, and the format used in AOSpine PROST that was pilot tested

Phase III: development of the AOSpine PROST

Methodology

Taking the results from the consensus conference as the basis, a draft version of the AOSpine PROST was developed in the Dutch language following the steps that we outline here. First, we investigated whether, and which, core ICF categories could be clustered into a single item. Subsequently, the defined items were implemented in the selected question and response formats. This draft version was discussed among the native Dutch-speaking investigators and a senior researcher and professor in spinal cord injury rehabilitation with an academic background in psychology and extensive experience in the development of outcome measures. The draft version was also translated into English so that it could be discussed among the AOSpine Knowledge Forum Trauma members. Based on this feedback, the Dutch version was revised accordingly. Finally, the revised draft version was pilot tested.

From core ICF categories to specific items

The draft version of the AOSpine PROST was developed by clustering the 25 core ICF categories into 19 items (Table 1).

The majority of the core ICF categories (n = 15) were each transformed into one specific item. Three items of the AOSpine PROST were formed by clustering two core ICF categories: Using transportation (d470) and Driving (d475) formed the item ‘Traveling’, Preparing meals (d630) and Doing Housework (d640) were combined in ‘Domestic life’, and Support and Relationships (e3) and Community life (d910) were combined in ‘Social activities’. One item, ‘Personal care’, was formed by clustering three core ICF categories: Washing oneself (d510), Toileting (d530) and Dressing (d540). Products or substances for personal consumption (e110) was the only core ICF category that could not be transformed into a specific item.
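The clustering described above can be checked with a short tally. The sketch below is illustrative only: it encodes just the four clusters and the one excluded category named in the text, and it counts the 15 one-to-one items without naming them, since they are listed in Table 1 rather than in this paragraph. The variable and function names are our own.

```python
# Clusters explicitly named in the text (item -> core ICF category codes).
CLUSTERED_ITEMS = {
    "Traveling": ["d470", "d475"],
    "Domestic life": ["d630", "d640"],
    "Social activities": ["e3", "d910"],
    "Personal care": ["d510", "d530", "d540"],
}

# Core categories that each became exactly one item (listed in Table 1).
SINGLE_CATEGORY_ITEMS = 15

# Core category that could not be turned into an item.
EXCLUDED_CATEGORIES = ["e110"]


def tally():
    """Return (number of items, number of core ICF categories accounted for)."""
    n_items = SINGLE_CATEGORY_ITEMS + len(CLUSTERED_ITEMS)
    n_categories = (
        SINGLE_CATEGORY_ITEMS
        + sum(len(codes) for codes in CLUSTERED_ITEMS.values())
        + len(EXCLUDED_CATEGORIES)
    )
    return n_items, n_categories


print(tally())
```

The tally reproduces the 19 items and 25 core ICF categories reported in the text.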

Subsequently, examples were added to all items, except for the items ‘Work/Study’ and ‘Sexual functioning’ (Table 1). Those examples were primarily selected from the extensive descriptions of each specific ICF category in the ICF manual [8].

Once agreement was reached on the examples, the next step was to implement the items in the selected question and response formats. Unlike patients with degenerative disorders or diseases, who express their function as compared to perfect health, patients recovering from an injury express their health status in relation to their status prior to the accident or injury. For most items, however, expressing them in the selected question format (‘Please indicate your level of functioning NOW [item] compared to BEFORE the accident’) resulted in complicated and cumbersome sentences. Therefore, it was decided to explain the question format at the beginning of the questionnaire instead of presenting it per item, and to define ‘accident’ as the accident that caused the spine injury.

The main focus of each item was the functional impairment and the problems in daily living related to this impairment. To stress this, the phrase ‘the extent to which [item] limits your current level of overall function’ was added to some items, e.g. the item ‘Pain’.

Scoring methodology

Each item is scored on the aforementioned NRS-101 scale. On this scale, 0 indicates no function at all while 100 represents the pre-injury level of function, which does not necessarily correspond to population normative data or to function in a condition of perfect health. During the developmental phase of the AOSpine PROST, it was decided to visualize and support the scale with smileys at both ends of the ruler (Fig. 2). The total score is the sum of all item scores divided by the number of completed items, i.e. the mean of the completed items. Instructions on how to score an item, and the statement that all items should be completed, were added to the questionnaire.
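As a minimal sketch of this scoring rule, the function below averages the completed items of one questionnaire. The function name, the use of None to mark an unanswered item, and the choice to return None when nothing was completed are assumptions made for illustration; they are not specified in the instrument.

```python
from typing import Optional, Sequence


def prost_total_score(item_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Sum of the completed item scores divided by the number of completed items.

    Each score is expected on the 0-100 scale; None marks an unanswered item
    (an assumption of this sketch).
    """
    completed = [score for score in item_scores if score is not None]
    for score in completed:
        if not 0 <= score <= 100:
            raise ValueError(f"score outside the 0-100 range: {score}")
    if not completed:
        return None  # no completed items, so no total score can be computed
    return sum(completed) / len(completed)


# Hypothetical responses: three completed items and one unanswered item.
print(prost_total_score([100, 50, None, 0]))
```

Because the divisor is the number of completed items, an unanswered item does not pull the total score down; it is simply left out of the average.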

Pilot testing AOSpine PROST

Procedures

Patients were recruited from the Orthopaedic outpatient department of a level-1 trauma center in The Netherlands. In line with the patient population in the previous phases of the project, eligibility criteria were defined as adults with a diagnosis of spine trauma and outpatient follow-up within 13 months post-trauma. Poly-trauma (Injury Severity Score >15) and completely paralyzed patients (American Spinal Injury Association grade A or B) were excluded.

Eligible patients were informed about the study and invited to participate. Once informed consent was given, the Dutch draft version of the AOSpine PROST was filled out in a cognitive interview setting. More specifically, the ‘think aloud’ and ‘probing’ methodology was used to assess the comprehensibility, relevance, acceptability, feasibility and completeness of the questions [15]. In this context, the respondents were instructed to complete the AOSpine PROST as they would do at home or at another place, and to verbalize their thoughts while filling out each question. Using the ‘probing’ methodology, the interviewer (SS) asked follow-up questions during the interview in response to patients’ comments, in order to understand their interpretation more precisely. Background data were collected from the medical record and completed during the interviews.

The Medical Ethics Review Committee (MERC) of the University Medical Center Utrecht confirmed that the Medical Research Involving Human Subject Act (WMO) does not apply to this study and that, therefore, an official approval of this study by the MERC was not required under the WMO.

Results cognitive interviews

In total, 25 eligible patients were enrolled consecutively in January and February 2015. The basic socio-demographic and clinical characteristics are shown in Table 2.

Table 2 Patient and clinical characteristics of the study population in the pilot study (n = 25)

The think aloud and probing methodology revealed that the items were very well understood and easy to read, except for some difficulties with two items. ‘Work/Study’ was interpreted as general daily functioning by 7 out of 10 retired patients. They explained that the time they used to spend on their previous paid work was now filled with many other activities. The remaining three retired patients did not provide an answer, on the assumption that the question did not apply to them. The second item that patients experienced difficulties with was ‘Energy level and motivation’. It was considered as two separate questions. All patients indicated that they were highly motivated to recover as soon as possible, but their energy levels were considerably lower. The score they provided was an average of these considerations.

Analyses of the rationale for providing a specific score to an item revealed that the examples were most important. If one example within the same item was scored high, while another was given a low score, patients usually estimated an average score.

The NRS-101 scale was clearly understood by 23 out of 25 patients (92.0%) as a means to compare their current level of function with their pre-injury functional state.

The time to fill out the AOSpine PROST could not be calculated because of probing during the course of the interview. The average total time of the cognitive interview was 14.4 min (range 8–20). Patients indicated that the questionnaire was not too long.

Content validity

All items were considered as relevant by the patients. Two patients (8.0%) suggested that we should add the use of painkillers as an item.

Internal consistency

The internal consistency of the questionnaire was excellent with a Cronbach’s α of 0.926 [16]. As shown in Table 3, a wide range of item-total correlations was seen, from 0.182 (‘Urination’) to 0.897 (‘Personal care’). However, Cronbach’s α became only 0.05 higher after removal of the item with the lowest item-total correlation. The highest median scores were observed for ‘Urination’ and ‘Defecation’ (Table 3).

Table 3 Mean scores per item along with the standard deviation, ranges and median scores, as well as the corrected item-total correlations and alpha if the item is deleted
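For readers unfamiliar with the statistics in Table 3, the sketch below shows how Cronbach’s α, the corrected item-total correlation, and α if an item is deleted are conventionally computed. It uses the standard textbook formulas and only the Python standard library; it is not the software used in the pilot study, and the small data set is invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corrected_item_total(rows, i):
    """Correlation of item i with the sum of all other items."""
    item = [row[i] for row in rows]
    rest = [sum(row) - row[i] for row in rows]
    return pearson(item, rest)


def alpha_if_deleted(rows, i):
    """Cronbach's alpha recomputed after removing item i."""
    return cronbach_alpha([row[:i] + row[i + 1:] for row in rows])


# Invented scores: 4 respondents x 3 items on the 0-100 scale.
scores = [[90, 80, 100], [60, 50, 70], [30, 40, 20], [75, 70, 85]]
print(cronbach_alpha(scores))
print([corrected_item_total(scores, i) for i in range(3)])
print([alpha_if_deleted(scores, i) for i in range(3)])
```

The correlation is called corrected because the item itself is left out of the total it is compared with, which avoids inflating the coefficient. An item with a low corrected item-total correlation is a candidate for removal only if α if deleted rises meaningfully.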

Discussion

Using the systematic approach and methodology of the ICF, and based on the results of four different preparatory studies and an international consensus conference, a disease-specific patient reported outcome instrument for traumatic spinal column injury patients has been developed. A Dutch draft version of this 19-item AOSpine PROST was pilot tested and showed very satisfactory results for comprehensibility, relevance, acceptability, feasibility, and completeness, as well as high internal consistency.

The ICF methodology as well as the ‘think aloud’ and ‘probing’ methods have proven to be very good and valid methodologies for developing and refining outcome instruments [7, 8, 15, 17,18,19].

The 19 items of AOSpine PROST cover a wide range of domains, including and beyond the scope of activities of daily living. With the specific response scale, patients are able to compare their current level of function with their situation before the trauma. This makes the AOSpine PROST valuable compared to outcome measures that solely focus on the level of dependence in patients’ daily activities, such as the SCIM and WISCI [20, 21]. Applying these outcome instruments to patients with only mild or transient neurological deficits would result in ceiling effects. Moreover, the AOSpine PROST includes many items that could be very relevant for spinal cord injured patients, e.g. ‘Urination’, ‘Defecation’, and ‘Changing your body position’. In contrast to many other outcome measures used in this specific patient population, which include generic outcome measures and instruments designed for patient populations with degenerative conditions [3], the AOSpine PROST holds promise as a useful outcome measure in patients with and without neurological deficit, making it more feasible for clinical use as well.

All 25 core ICF categories could be incorporated in the AOSpine PROST, except for Products or substances for personal consumption (e110). This ICF category was defined as a core category, with the rationale of possibly including a separate item that would describe the use of opioids. During the pilot study, two patients indicated that opioid use is a missing item and should be included. However, the overall concept of the outcome instrument relates more to the functional impairment and the problems in daily living related to this impairment and not to specific treatment strategies such as the use of medication. Opioid use could be taken into account for the AOSpine CROST, the future outcome instrument from the perspective of the treating surgeons [22, 23].

The findings obtained from the pilot study are of great value for refining the AOSpine PROST prior to multicenter validation of this instrument. In the next phase, the items ‘Work/Study’ and ‘Energy level and motivation’ will be adjusted because of the difficulties experienced by the patients when answering these questions. Examples will be added to the ‘Work/Study’ item, and the motivation part will be removed from the item ‘Energy level and motivation’, as ceiling effects could be expected for ‘Motivation’ if it were separated into its own item. Another valuable finding was that patients scored their level of function by taking all provided examples into account and calculating an average score. This may lead to lower item-total correlations for the specific items. To remove this ambiguity, instructions will be added to base the score on the situation or example where the patient is most disabled.

We do recognize several limitations of the development process for this outcome instrument. First, this process slightly deviates from the ICF Core Set development guideline, e.g. a focus group was not included in the preparatory phase [7]. Nevertheless, the chosen process provides a solid and systematic base for the selection of the core ICF categories described in this article. Second, a specific trauma patient population was chosen. The rationale was to exclude confounding factors and focus on the effect of spinal column injury on health and function in the acute and post-acute phase. Once validated in this specific patient population, the AOSpine PROST will be subjected to further validation in completely paralyzed patients as well. Third, the number of patients included in the pilot study could be debated. We believe this is a sufficient number to explore the most common obstacles that patients experience when filling out the AOSpine PROST. Fourth, analyses such as test–retest reliability, floor and ceiling effects, or responsiveness were not performed in the pilot study. These analyses will be performed in the next phase in a multicenter validation study including a considerably larger number of patients. Finally, in the development process the Dutch version was freely translated into English to be reviewed by the AOSpine Knowledge Forum Trauma. We believe this is acceptable for this phase of the project. Once a definitive Dutch version is developed and ready to be validated, a careful translation into English will be performed and the linguistic equivalence of both versions will be checked using established guidelines [24].

In conclusion, using the ICF methodology and incorporating the results of four preparatory studies and an international consensus conference, the AOSpine Patient Reported Outcome Spine Trauma (AOSpine PROST) was developed. Taking the results from the subsequent pilot study into account, a definitive version will be developed, followed by international multicenter studies to validate both the Dutch and English versions. Once validated, the AOSpine PROST, together with the AOSpine CROST, has the potential to be useful in clinical practice as well as in research, to evaluate, compare and establish the effectiveness of interventions in the treatment of spine trauma patients. In this context, the outcomes as assessed by these tools could be related to many clinical characteristics, including the type of fractures, the provided treatments and radiological results.