Skip to main content

Measurement Properties of the Spinal Function Sort in Patients with Sub-acute Whiplash-Associated Disorders


Purpose To extensively analyze the measurement properties the Spinal Function Sort (SFS) in patients with sub-acute whiplash-associated disorders (WAD). Methods Three-hundred-two patients with WAD were recruited from an outpatient work rehabilitation center. Internal consistency was assessed by Cronbach’s α. Construct validity was tested based on eight a priori hypotheses. Structural validity was measured with principal component analysis (PCA). Test–retest reliability and agreement was evaluated in a sub sample (n = 32) using intraclass correlation coefficient (ICC) and limits of agreement (LoA). The predictive validity of SFS for future work status at 1, 3, 6, and 12 months follow-up was determined by area under the curve (AUC) of receiver operating characteristics. Non-return to work (N-RTW) was defined with two cut-off points: workcapacity <50 and <100 %. Results N-RTW decreased from 50 %, 1 month follow-up, to 14 %, 12 months follow-up. Cronbach’s α was 0.98, PCA revealed evidence for unidimensionality. ICC was 0.86, LoA was ±33 points. Seven out of eight hypotheses for construct validity were not rejected. AUC reduced with a longer follow-up from 0.71 for 1 month to 0.61 at 12 months, for cut-off point <50 %. For cut-off point <100 % these values were 0.71 and 0.59. Conclusion In patients with sub-acute WAD test–retest reliability, internal consistency, construct- and structural validity of the SFS were adequate. LoA were substantial. Sensitivity to accurately predict N-RTW was poor. The predictive validity of the SFS for N-RTW of patients with sub-acute WAD from an outpatient work rehabilitation setting was only sufficient for the short term (1 month).


Self-report questionnaires have been developed for many types of health conditions, some for use in occupational rehabilitation. One of the reasons for their popularity is the relative efficiency of data collection. In limited time, a broad array of data can be collected about the functional impairments, limitations, and psychological status experienced by the evaluee. This information can be very useful for planning return to work interventions.

However, disability questionnaires have important limitations for use in European occupational rehabilitation settings. The first is that the use of self-reported measures depends on the literacy and linguistic skills of an evaluee which may be limited in evaluees with different cultural backgrounds i.e. mother languages [1]. The second is that most disability instruments do not have a work-related point of reference, but consider an unlimited spectrum of activities. Whether or not the evaluee can actually lift 15 kg at work, for example, is still unknown after filling in the questionnaire. These limitations may be overcome by using a picture-based questionnaire such as the Spinal Function Sort (SFS) [2]. The SFS is a self-report measure of tasks and activities that includes a picture to each item [3]. The items are linked to demonstrable physical ability. The SFS is used in conjunction with a functional capacity evaluation (FCE) to cross-reference self-reported abilities with measured abilities (i.e. functional capacity) [4].

In patients with chronic low back pain (CLBP) the SFS has revealed good clinical practicality, reliability and high predictive validity for non-return to work in various settings and countries [58]. Although, the SFS is used in occupational health for other health conditions as well, the measurement properties including the (predictive) validity for future compensation benefits of SFS other than CLBP are unknown. Furthermore, it is not reported whether the SFS performs differently in samples which are assessed earlier in the course of the disorder.

Hence, the aim of this study was to test measurement properties of the SFS by assessing internal consistency, test–retest reliability, agreement, construct validity and predictive validity for work status of the SFS in patients with sub-acute WAD.


Subjects, Procedure and Context


This study was embedded within usual care of an outpatient work rehabilitation setting. From January 2011 to January 2012 eligible participants were referred for an interdisciplinary rehabilitation assessment at the rehabilitation clinic in Bellikon (Switzerland) by insurance physicians or case managers of Swiss Accident Insurance Fund (SUVA). Participants were from the German-speaking part of Switzerland. The main reasons for referral included: (1) not regaining full work capacity (WC) within 6–12 weeks after a whiplash injury; (2) exceeding expected healing times; (3) or having plateaued with the provided medical and rehabilitative care. Inclusion criteria were: injured workers with WAD related neck pain and, Grade I or II according the Québec Task Force Classification with reduced working capacity of their actual job. They were within 6–12 weeks after initial injury, and received worker’s compensation benefits.

Ethical approval for this study was granted by the Medical Ethics Committee of the Canton Aargau (EK AG 2010/055). Patients gave consent that their data were used for research purpose.


At base line a review of the medical history and a physical examination was performed by a rehabilitation physician (approximately 60 min), followed by FCE tests administered by a physiotherapist. After determination of eligibility, patients completed questionnaires and carried out FCE tests (60 min). Fitness-for-work certificates or work capacity settlement were explicitly not part of this interdisciplinary assessment.


All participants were insured by SUVA, the largest state owned accident insurance in Switzerland. SUVA covers costs for occupational and non-occupational injuries for employed individuals and unemployed job-seeking persons [9]. Injured persons receive compensation up to a maximum of 80 % of the previous salary, and medical and vocational assistance. Invalidity pensions can also be refunded by SUVA to the injured person.



The SFS was used to measure self-reported functional ability to perform work-related tasks and activities of daily life that involve the spine. The SFS contains 50 drawings with simple descriptions (Item example in the Fig. 1). Patients rated their functional ability for each activity on a 5-point Likert scale: “able” (4), to “restricted” (1, 2, 3) or “unable” (0). The SFS yields a single rating ranging from 0 to 200, with higher scores indicating more or better abilities. The scores can be categorized according the work demands as defined by the Dictionary of Occupational Titles (DOT) [10]. SFS scores have been adapted to the DOT categories previously as follows [5]: SFS score <100 ≈ minimal work demands, 100–124 ≈ sedentary work (<5 kg), 125–164 ≈ light work (5–10 kg), 165–179 ≈ medium heavy work (10–25 kg), 180–194 ≈ heavy work (25–45), >195 ≈ very heavy work (>45 kg) These categories allow a comparison between the self-reported functional ability and work demand. For test–retest reliability of the SFS a sample of patients was tested twice within a week after baseline.

Fig. 1
figure 1

Item 14 of the Spinal Function Sort (SFS) questionnaire: Lift a 10 kg milk crate from the floor to eye-level

Physician Determined Work Capacity (WC)

To determine the predictive validity for future work status, the WC was used as an estimate of ability of work. The WC was obtained from the accident insurance’s administrative data. WC was determined at 1, 3, 6 and 12 months after baseline by the treating physician, usually a general practitioner, and represents the proportion workability of pre-injury work. Determination of WC was based on proposed WC-forms and recommendations [11, 12]. WC is expressed in a percentage (0–100 %) and is translated in days or hours modified work. For example, if a worker is deemed WC = 50 %, he will work for 2.5 days/week or 5 half days/week modified work. The remaining 50 % is financially compensated. The reliability and validity of the WC determination is unknown. WC in %, is directly related to compensation costs and reflects the proportion of work loss to the employer, the employee and the insurance. Therefore, this method of WC-determination may be less dependent to distortion compared self-reported measures of WC [13].


FCE is a standardized battery of functional tests that intend to measure a patient’s safe physical ability for work related activity [14]. For the purpose of this study four lifting tests were analyzed: lifting floor to waist, lifting waist to overhead, short two handed carry, long one-handed carry (right). Patients were asked to perform the test to their maximum ability. The tests have good reliability and acceptable agreement in patients with WAD [15].


Pain intensity was measured with an 11-point numeric rating scale (NRS) ranging from no pain (0) to worst pain (10) [16]. The patient was asked to rate his momentary pain (“pain now”). The NRS is a commonly used scale with proven reliability and validity in patients with neck pain [17].


Neck pain-related disability was measured with the Neck Disability Index (NDI) [18]. The NDI contains 10 items: pain intensity, personal care, lifting, reading, headaches, concentration, work, driving, sleeping, and recreation. The scale of each item ranges from no disability (0) to total disability (5). A higher score indicates more severe self-reported disability. The NDI is reliable and valid in several languages and settings [18, 19].

Mental Distress

The Hospital Anxiety and Depression Scale (HADS) was used to assess the symptom severity of anxiety disorders and depression in non-psychiatric populations [21]. The HADS consists of two scales, one for anxiety and one for depression (A- and D-scale respectively). Each scale contains 7 items, with each item rated from 0 (best) to 3 (worst). The scale scores are calculated by summing the responses to the items up to a maximum score of 21 points (severe case) per scale. A higher score indicates more severe anxiety or depression. Good reliability, validity have been reported for the use of the HADS in the general and various clinical populations [20, 21].

Data Analysis

Normal distribution was visually assessed using P–P plots and tested with the Kolmogorov–Smirnov and the Shapiro–Wilk tests. Floor and ceiling effects for the SFS were considered to be present if more than 15 % of participants achieved the lowest or highest possible score of the items [22].

Internal Consistency

Internal consistency was assessed by item-to-total correlations and Cronbach’s alpha. Optimal consistency for measurements at group level was considered when alpha value is between 0.7 and 0.9. Values <0.7 may be indicative for items measuring different traits, values >0.9 may be indicative for item redundancy [23].


The unidimensionality of the 50 SFS items was measured with principal component analysis (PCA) with Kaiser normalization and Varimax rotation. An Eigenvalue criterion of 1.0 was used for the factor analysis. Unidimensionality was assumed when ratio of the first to the second factor was 3:1 [24].

Test–Retest Reliability and Agreement

Test–retest reliability was expressed as an Intraclass Correlation Coefficient (model 1; one-way random) (ICC). ICC was interpreted as follows: ICC ≥ 0.90 is excellent; good when ICC was between 0.75 and 0.90; moderate when ICC was between 0.50 and 0.75; and poor when ICC ≤ 0.50. ICCs were acceptable when ICC ≥ 0.75, and the lower boundary of the 95 % confidence interval of the ICC ≥ 0.50 [25]. Agreement was expressed in limits of agreement (LoA) (mean difference ± 1.96 × SD of mean difference) [26].

Construct Validation: Hypothesis Testing

Eight predefined hypothesis on the strength of the association of SFS and four FCE lifting tests, NDI, Pain NRS, and HADS A + D are displayed in Text Box A. The strength of the association is expressed in the absolute value of the correlation coefficient. Associations were calculated using Spearman rank correlation coefficient and interpreted as follows: 0.00–0.25 little if any (“not correlated”); 0.26–0.49 low or weak; 0.50–0.69 moderate; 0.70–0.89 high or strong; 0.90–1.00 very strong correlation [27]. The SFS was considered valid, when 7 out of 8 hypotheses (≥80 %) of the a priori hypotheses were not rejected [28].

Text Box A Eight hypotheses for examining construct validity of the Spinal Function Sort

Predictive Validity for Work Status at 1, 3, 6 and 12 months

Sensitivity and specificity, positive predictive value as well as likelihood ratio of a positive test were calculated to evaluate the predictive validity of the SFS items at baseline for work capacity at 1, 3, 6 and 12 months after baseline assessment. In a setting of injured workers, who are in a transition phase from acute to chronic disorder, the aim is to identify those patients with a high probability of not returning to work (N-RTW) in order to target specific rehabilitation interventions to those patients. We used two cut-off points to measure N-RTW i.e. WC < 50 %, or WC < 100 %. These two cut-off points were determined based on distribution-plots of WC. The index test was the SFS. Sensitivity was defined as the proportion of patients, identified for different DOT categories based on the SFS score, not have N-RTW. Specificity was defined as the proportion of patients, identified for different DOT-categories based on the SFS score, who did return to work. The positive predictive value for N-RTW was calculated as the percentage of patients within a DOT category that were correctly identified not to have regained full work capacity. Likelihood ratio was calculated as Sensitivity/1 − Specificity. Based on a previous study, it was expected that “minimal”, perceived ability (SFS score <100, less than sedentary work) score would have a high positive predictive value in identifying those patients who would N-RTW at follow-up times [5]. Receiver operating characteristic (ROC) curves were drawn and area under the curve (AUC) was calculated. The AUC has a maximum value of 1.0, indicating a perfect predictive validity which is reached if the curve lies in the upper-left corner; a value of 0.5, represented by the diagonal, means that the measurement instrument cannot distinguish between patients N-RTW or RTW. An AUC of at least 0.70 is considered “appropriate” [29]. As a cut off indicating statistical significance p < 0.05 was used. All analyses were performed using SPSS (Statistical Package for Social Sciences, Version 21).



From January 2011 to January 2012, 313 subjects were eligible based on the inclusion criteria. Seven SFS scores were missing. In the construct validity study 306 subjects were included. From this sample 302 were included in the study on the predictive validity of the SFS because 4 patients no follow-data on WC were available (Table 1). For the test–retest reliability 32, 11 females, 21 males, mean age 39.6 years, were assessed twice within a week. The patients characteristics of the test–retest study are reported elsewhere [15].

Table 1 Characteristics of the patients (n = 302)

Internal Consistency, Ceiling Effects

Internal consistency was Cronbach’s alpha 0.98. Removing 50 % of the items (even or uneven items), resulted in alpha values of 0.97. Ceiling effects were not present, except in items 45–48. The item to total correlation was <0.20 in item 45–48. These four items displayed very heavy material handling tasks (>45 kg). In a post hoc analysis, Cronbach’s alpha values were unchanged when removing item 45–48. All other items showed item to total correlations >0.30.


Correlations coefficients between each of the SFS were in the majority >0.3. PCA with fixed factors showed the presence of six components with Eigenvalues exceeding 1, explaining 55.3, 8.2, 4.6, 3.2, 2.3 and 2.1 % of the variance, respectively. The inspection of the scree plot revealed 2 components. For the interpretation of the components Varimax rotation was executed. The rotated solution revealed the presence of a mixed structure with two components showing a number of strong loadings. The items 45–48 loaded on a different component. The ratio from the first to the second Eigenvalue was 6.87, indicating reasonable evidence for unidimensionality.

Test–Retest Reliability and Agreement

The test–retest reliability measured with the ICC was 0.86 (95 %CI 0.71; 0.93). For the 32 patients in the reliability study, mean SFS scores for test and retest were 146.4 (mean, SD 32.1), and 146.6 (mean, SD 37.2) respectively. Mean difference in SFS score between test and retest was 0.2 (SD 16.9, p = 0.0.943). Hence LOA were 0.2 ± 33 points Variances were not related to the magnitude of the score. A highly influential patient with a difference of 62 units between tests was detected. LoA calculated without that patient were −23.2 and 27.7 with a mean difference of 2.2 (Fig. 2).

Fig. 2
figure 2

Bland-Altman plot of the SFS scores. The middle line represents the mean difference between the two tests. Gray circle represent the upper and cross symbol represent lower limit of agreement, i.e. mean difference + 1.96 SD of the differences and mean difference − 1.96 SD of the differences, respectively. An outlier with a difference in SFS scores of 62 is not shown

Construct Validity

Construct Validation: Hypothesis Testing

Spearman rank correlations coefficient between the SFS and FCE tests were for lifting floor to waist: 0.68; for lifting waist to overhead: 0.61; for short two-handed horizontal carry: 0.70; for one-handed carry right: 0.64. Correlations between the SFS and disability was −0.62; with pain: −0.49; with anxiety: −0.49 and with depression: −0.52. All correlations were significant (p value <0.01). Seven of eight hypotheses were not rejected. Correlations between SFS and work-related lifting tests was moderate to high (0.61–0.70). Depression showed a slightly stronger correlation than hypothesized.

Predictive Validity for Work Status at 1, 3, 6 and 12 months Follow-Up

Sensitivity of the SFS scores transformed into DOT categories for N-RTW at 1, 3, 6 and 12 months ranged between and 0.37 and 0.98 when using the cut-off value of <50 % WC between 0.28 and 0.98, with the cut-off <100 % respectively (Table 2). Sensitivity was substantially higher in the DOT-transformed categories “light” to “very heavy” than in the “sedentary” to “minimal” categories (Table 2). The likelihood ratio for a positive test for N-RTW at 1, 3, 6 and 12 months decreases from 4.64 to 0.96 for the cut-off value <50 % WC, and from 4.32 to 0.79 for the cut-off value of <100 % WC. SFS score can be dichotomized into scores <100 and scores ≥100 points. Patients with scores <100 perceive themselves as having minimal working ability. With this dichotomized scores, Sensitivity for N-RTW with the cut-off of WC < 50 % ranged over time between 0.37 and 0.41, and specificity (=RTW) ranged between 0.80 and 0.92. For the cut-off of WC < 100 %: sensitivity for N-RTW ranged over time between 0.28 and 0.34 and specificity (=RTW) ranged between 0.81 and 0.94 (based on data in Table 2, separately available on request). All ROC curves are displayed in Fig. 3. The AUC reached the cut-off for “acceptable” (>0.70) only three out of eight times: at 1 month follow for both WC cut-offs and at 3 months for cut-off 50 % WC.

Table 2 Predictive validity of DOT-transformed SFS categories for non-return to work at 1, 3, 6, and 12 months of follow-up
Fig. 3
figure 3

(AUC, Part 1). ROC curve of SFS total score at baseline with cut off values of work capacity 50 or 100 % at 1 month (first row) and 3 months (second row) follow-up to predict non return to work. (AUC, Part 2). ROC curve of SFS total score at baseline with cut off values of workcapacity 50 or 100 % at 6 months (third row) and 12 months (fourth row) follow-up to predict non return to work. WC workcapacity, AUC area under the curve, CI confidence interval


The aim of the study was to extensively analyze measurement properties of the SFS in patients with WAD 6–12 weeks after injury. The majority (7 out of 8) of the a priori defined hypotheses for construct validity were not rejected. The SFS test structure was confirmed by a distinct factor loading. Test–retest reliability was good, however measure of error (LoA values) on an individual level were large relative to the scale range. Predictive validity of the SFS based on the AUC was acceptable in three out of 8 AUC: at 1 month for both cut-offs and at 3 months for cut-off 50 % WC. The SFS scores for the DOT-transformed categories “minimal” to “sedentary” workload were not able to identify those who will N-RTW (low sensitivity). The positive likelihood ratio for N-RTW was sufficient only for the categories “minimal” to “sedentary” for both cut-off WC < 50 % and WC < 100 %.

The SFS can, based on the measurement properties evaluated in this study, be recommended for clinical and research applications in patients in an occupational setting with sub-acute WAD and with different cultural backgrounds. Clinicians should be aware of the large measurement error of the SFS when making recommendations on individual level. The scores of the SFS may assist to predict N-RTW especially for medium, heavy and very heavy DOT categories. Application of the SFS may be a practical alternative or addition to other instruments with sufficient measurement properties. Practicality can be enhanced when half the items are removed. Further research should analyze if even more items can be removed (Cronbach’s α of half the SFS items is 0.97, indicating that item redundancy is still apparent).

The SFS scores in our sub-acute sample was substantially higher (mean 133 points, SD 42.7) than in two other validation studies with chronic low back pain patients in Europe (mean 105 points, SD 46.1), and in Australia (mean 116, SD 40.8) [5, 8]. A very high Cronbach’s α was found, which is in line with previous validation studies [5, 6, 8]. High internal consistency may be partly determined by a large number of items [30]. These high alpha values are indicative for item redundancy. In a sensitivity analysis we calculated Cronbach’s α and PCA values with half of the SFS items, with minimal changes in consistency and dimensionality. From a statistical point of view, half of the SFS items could be omitted, reducing the time requirement to fill out the questionnaire to 5 Min. (now, 10–15 min.). In agreement with previous studies, four items, with very heavy lifting tasks, could be removed without affecting the measurement properties of the SFS [5, 6]. Our results concerning reliability measured with ICC 0.80 are lower than two reliability studies 0.89 and 0.98 respectively [6, 8]. The LoA values found in a rehabilitation setting in the French-speaking area of Switzerland were ±11 while in the German-speaking area the values were ±27, whereas our results were ±33 [6]. In the studies of the German speaking sample the SFS was part of case-closure FCE setting to define fitness-for-work, whereas in the French-speaking sample this was not the case [5, 6]. One reason for the differences in reliability and agreement may be the difference in interval between test and retest; 2–3 days compared to 7 days in our study. Another reason may be that our patients were in a sub-acute stage of WAD which may change more on a daily basis compared to chronic patients. The ability to predict N-RTW in our study was substantially lower than in a sample of patients with CLBP [5] although follow-up times were similar. Albeit some similarities, the work rehabilitation setting and large proportion of blue collar workers with a Non-Swiss cultural background, several other reasons may explain these differences.

First, the proportion of patients who did N-RTW was substantially lower at 3 and 12 month follow-up in our study sample compared in patients with CLBP with rates between 34 and 16 %, and 62 and 54 % respectively. This may be due the fact that the CLBP patient had on average a significantly longer duration of 200 days off work, compared to 90 days in this study. Therefore, a smaller proportion of WAD patients is expected to N-RTW due to the benign natural course of the disorder despite perceived disability [31]. Further, we used WC data from the physician and the insurance. Moreover, legal regulations in Switzerland recently changed allowing to close claims of patients with WAD within the first 1 or 2 years which is not the case in CLBP [32]. These changes may have influenced N-RTW rates in patients with WAD which depend on the legal jurisdictions [33]. Hence, the validity of the SFS should be tested also in patients with WAD in other health cares systems. Secondly, in one study patients were classified as RTW if they had worked at least 1 day in the follow-up period [5]. These differences influence the proportion of patients classified as RTW or N-RTW, and therefore the results concerning the predictive properties of the SFS [34]. Third, the differences in symptoms of patients with WAD differ in part from those with CLBP. And forth, the depicted tasks of the SFS involving the spine may be perceived to the neck differently from the lower back.

Future studies should investigate whether a short version of the SFS would lead to similar measurement properties. Computer based measures could offer some advantages over a paper form. By using Item Response Theory (IRT) techniques only suitable items are assigned based on the response pattern of the evaluee. First results using a computer based measure similar to the SFS are promising, but need further evaluation in clinical samples [35, 36].


We used hypotheses and cut-off points based on the results of previous studies. These cut-offs may viewed as arbitrary. Moreover, we analysed WC in % which may lead to different results then compared to self-report of the employee, or other reporting measures [3739]. Moreover, the psychometric properties of WC in % are unknown. WC may rely on physicians interpretations and patients report [40]. Finally, replication studies are needed because the results differ in other populations, contexts and FCE procedures.


In patients with sub-acute WAD test–retest reliability, internal consistency, construct- and structural validity of the SFS were adequate. LoA was substantial. Sensitivity to accurately predict N-RTW was poor.

Based on the AUC the predictive validity of the SFS for N-RTW of patients with sub-acute WAD from an outpatient work rehabilitation setting was only sufficient for the short term.



Spinal Function Sort questionnaire


Whiplash-associated disorder


Functional capacity evaluation


Work capacity


(Non-)return to work


Dictionary of Occupational Titles


  1. Burrus C, Ballabeni P, Deriaz O, Gobelet C, Luthi F. Predictors of nonresponse in a questionnaire-based outcome study of vocational rehabilitation patients. Arch Phys Med Rehabil. 2009;90:1499–505.

    Article  PubMed  Google Scholar 

  2. Matheson LN, Matheson ML, Grant J. Development of a measure of perceived functional ability. J Occup Rehabil. 1993;3:15–30.

    CAS  Article  PubMed  Google Scholar 

  3. Matheson LN. History, design characteristics, and uses of the pictorial activity and task sorts. J Occup Rehabil. 2004;14:175–95.

    Article  PubMed  Google Scholar 

  4. Oliveri M. Functional capacity evaluation. In: Gobelet C, Franchignoni F, editors. Vocational rehabilitation. Paris: Springer; 2005.

    Google Scholar 

  5. Oesch PR, Hilfiker R, Kool JP, Bachmann S, Hagen KB. Perceived functional ability assessed with the spinal function sort: is it valid for European rehabilitation settings in patients with non-specific non-acute low back pain? Eur Spine J. 2010;19:1527–33.

    PubMed Central  CAS  Article  PubMed  Google Scholar 

  6. Borloz S, Trippolini MA, Ballabeni P, Luthi F, Deriaz O. Cross-cultural adaptation, reliability, internal consistency and validation of the Spinal Function Sort (SFS) for French- and German-speaking patients with back complaints. J Occup Rehabil. 2012;22:387–93.

    CAS  Article  PubMed  Google Scholar 

  7. Robinson RC, Kishino N, Matheson L, Woods S, Hoffman K, Unterberg J, et al. Improvement in postoperative and nonoperative spinal patients on a self-report measure of disability: the Spinal Function Sort (SFS). J Occup Rehabil. 2003;13:107–13.

    Article  PubMed  Google Scholar 

  8. Gibson L, Strong J. The reliability and validity of a measure of perceived functional capacity for work in chronic back pain. J Occup Rehabil. 1996;6:159–75.

    CAS  Article  PubMed  Google Scholar 

  9. Suva. Suva: an overview [Swiss Accident Insurance Fund] 2013. Accessed 17 Sep 2013.

  10. U.S. Department of Labor, Ma X. The revised handbook for analyzing jobs. 4th ed. Indianapolis: JIST Works, Inc.; 1991.

    Google Scholar 

  11. Stöckli H, Ettlin T, Gysi F, Knüsel O, Marelli R. Soltermann B [Diagnostics and therapeutic approach in the chronic phase of whiplash associated disorders]. Schweiz Med Forum. 2005;5:1182–7.

    Google Scholar 

  12. Fitforwork-swiss. WOCADO [Workcapacity estimation for doctors] [The Work Foundation], 2013. Accessed 03 Dec 2013.

  13. van Poppel MN, de Vet HC, Koes BW, Smid T, Bouter LM. Measuring sick leave: a comparison of self-reported data on sick leave and data from company records. Occup Med (Lond). 2002;52:485–90.

    Article  Google Scholar 

  14. Isernhagen SJ. Functional capacity evaluation: rational, procedure, utility of the kinesiophysical approach. J Occup Rehabil. 1992;2:157–68.

    CAS  Article  PubMed  Google Scholar 

  15. Trippolini MA, Reneman MF, Jansen B, Dijkstra PU, Geertzen JH. Reliability and safety of functional capacity evaluation in patients with whiplash associated disorders. J Occup Rehabil. 2013;23:381–90.

    PubMed Central  CAS  Article  PubMed  Google Scholar 

  16. Ferraz MB, Quaresma MR, Aquino LR, Atra E, Tugwell P, Goldsmith CH. Reliability of pain scales in the assessment of literate and illiterate patients with rheumatoid arthritis. J Rheumatol. 1990;17:1022–4.

    CAS  PubMed  Google Scholar 

  17. Pool JJ, Ostelo RW, Hoving JL, Bouter LM, de Vet HC. Minimal clinically important change of the Neck Disability Index and the Numerical Rating Scale for patients with neck pain. Spine (Phila Pa 1976). 2007;32:3047–51.

    Article  Google Scholar 

  18. MacDermid JC, Walton DM, Avery S, Blanchard A, Etruw E, McAlpine C, et al. Measurement properties of the Neck Disability Index: a systematic review. J Orthop Sports Phys Ther. 2009;39:400–17.

    Article  PubMed  Google Scholar 

  19. Vernon H. The Neck Disability Index: state-of-the-art, 1991–2008. J Manip Physiol Ther. 2008;31:491–502.

    Article  Google Scholar 

  20. Bjelland I, Dahl AA, Haug TT, Neckelmann D. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J Psychosom Res. 2002;52:69–77.

    Article  PubMed  Google Scholar 

  21. Herrmann C. International experiences with the Hospital Anxiety and Depression Scale—a review of validation data and clinical results. J Psychosom Res. 1997;42:17–41.

    CAS  Article  PubMed  Google Scholar 

  22. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293–307.

    CAS  Article  PubMed  Google Scholar 

  23. Portney LG, Watkins MP. Reliability. Foundations of clinical research. Applications to practice. 2nd ed. Upper Saddle River, NJ: Prentice-Hall Health; 2000. p. 61–77.

    Google Scholar 

  24. Polit D, Beck C. Developing and testing self-report scales. In: Polit D, Beck C, editors. Nursing research, generating and assessing ecidence for nursing practice. Philadelphia: Lippincott; 2008. p. 474–505.

    Google Scholar 

  25. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med. 1990;20:337–40.

    CAS  Article  PubMed  Google Scholar 

  26. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.

    CAS  Article  PubMed  Google Scholar 

  27. Hazard Munro B. Statistical methods for health care. Philadelphia: J. B. Lippincott; 1986.

    Google Scholar 

  28. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.

    Article  PubMed  Google Scholar 

  29. de Vet HC, Terwee CB, Mokkink LB, Knol D. Measurement in medicine: a practical guide. 1st ed. Cambridge: Cambridge University Press; 2011.

    Book  Google Scholar 

  30. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 4th ed. Oxford: Oxford University Press; 2008.

    Google Scholar 

  31. Carroll LJ, Hogg-Johnson S, Cote P, van der Velde G, Holm LW, Carragee EJ, et al. Course and prognostic factors for neck pain in workers: results of the Bone and Joint Decade 2000–2010 Task Force on Neck Pain and Its Associated Disorders. Spine (Phila Pa 1976). 2008;33:S93–100.

    Article  Google Scholar 

  32. Disability pensions for whiplash injury related disability [Federal Supreme Court of Switzerland], 2010. Accessed 07 Mar 2014.

  33. Schrader H, Obelieniene D, Bovim G, Surkiene D, Mickeviciene D, Miseviciene I, et al. Natural evolution of late whiplash syndrome outside the medicolegal context. Lancet. 1996;347:1207–11.

    CAS  Article  PubMed  Google Scholar 

  34. Portney LG, Watkins MP. Validity. Foundations of clinical research. Applications to practice. 2nd ed. Upper Saddle River, NJ: Prentice-Hall Health; 2000. p. 79–107.

    Google Scholar 

  35. Mayer J, Mooney V, Matheson L, Leggett S, Verna J, Balourdas G, et al. Reliability and validity of a new computer-administered pictorial activity and task sort. J Occup Rehabil. 2005;15:203–13.

    Article  PubMed  Google Scholar 

  36. Mooney V, Matheson LN, Verna J, Leggett S, Dreisinger TE, Mayer JM. Performance-integrated self-report measurement of physical ability. Spine J. 2010;10:433–40.

    Article  PubMed  Google Scholar 

  37. Gatchel RJ. Psychosocial factors that can influence the self-assessment of function. J Occup Rehabil. 2004;14:197–206.

    Article  PubMed  Google Scholar 

  38. Tompa E. Measuring the burden of work disability: a review of methods, measurement issues and evidence. In: Loisel P, Anema JR, editors. Handbook of work disability. Prevention and management. New York: Springer; 2013. p. 43–58.

    Chapter  Google Scholar 

  39. Gauthier N, Sullivan MJ, Adams H, Stanish WD, Thibault P. Investigating risk factors for chronicity: the importance of distinguishing between return-to-work status and self-report measures of disability. J Occup Environ Med. 2006;48:312–8.

    Article  PubMed  Google Scholar 

  40. Rainville J, Pransky G, Indahl A, Mayer EK. The physician as disability advisor for patients with musculoskeletal complaints. Spine (Phila Pa 1976). 2005;30:2579–84.

    Article  Google Scholar 

Download references


The authors thank the physiotherapists and physicians of the Department of Work Rehabilitation, Rehaklinik Bellikon for their help in performing the tests and collecting data. We also thank Claudia Diethelm, Axel Gehrke and Stephan Scholz-Odermatt for data preparation, and all subjects for their participation.

Conflict of interest

This study was supported by the Rehaklinik Bellikon and from the Swiss Accident Insurance Fund (Schweizerische Unfallversicherungsanstalt, suva), Grant No. 10042900. The study was performed in accordance with the ethical standards of the Declaration of Helsinki and ethical approval for this study was granted by the Medical Ethics Committee of the Canton Aargau (EK AG 2010/055). No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript. The authors M. A. T., P. U. D., J. H. B. G. and M. F. R. declare that they have no conflict of interest.

Author information

Authors and Affiliations


Corresponding author

Correspondence to M. A. Trippolini.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Trippolini, M.A., Dijkstra, P.U., Geertzen, J.H.B. et al. Measurement Properties of the Spinal Function Sort in Patients with Sub-acute Whiplash-Associated Disorders. J Occup Rehabil 25, 527–536 (2015).

Download citation

  • Published:

  • Issue Date:

  • DOI:


  • Whiplash injuries
  • Neck pain
  • Physical function
  • Questionnaires
  • Disability evaluation
  • Work