The high prevalence of low back pain (LBP) and its associated costs result in a substantial socioeconomic burden to society [1]. Therefore, it is important to evaluate the cost-effectiveness of interventions to help decision makers determine which interventions should be reimbursed.

For care providers, it is often not possible to reliably make a specific patho-anatomical diagnosis for patients with LBP. Consequently, 85–95% of patients with LBP are diagnosed by their general practitioner as having non-specific LBP [2]. Given this absence of diagnostic precision, and the limited evidence that basing treatment on a specific aetiology improves outcomes, most clinicians use pattern recognition and patient profiling in an attempt to optimize treatment outcomes [3]. A classification system for LBP that has received considerable attention in the literature is the one developed by Delitto et al. [4]. In Delitto’s classification system, classification is based on signs and symptoms of the patient and aims to identify subgroups of patients who are most likely to respond to different types of treatment, i.e. direction-specific exercises, spinal manipulation, stabilization exercises or traction. Two randomized controlled trials (RCTs) reported that treatment based on this system resulted in slightly more favourable outcomes compared with other usual physical therapy management strategies [5, 6]. One of these RCTs also conducted an economic evaluation and reported a trend towards reduced total medical costs ($774) compared with usual physical therapy care ($1,004) after 1 year [5]. However, a limitation of this study was that only direct medical costs were included, while in patients with chronic LBP most costs are attributed to indirect costs such as absenteeism from paid labour [1].

The aim of the present study was to determine whether a modified version of Delitto’s classification-based treatment approach was cost-effective in patients with sub-acute (6–12 weeks) and chronic (>12 weeks) LBP. This approach was compared with usual physical therapy care according to Dutch physical therapy guidelines [7, 8]. The economic evaluation was performed from a societal perspective. Details of the effectiveness of this intervention are published elsewhere [9].


This economic evaluation was conducted alongside a RCT performed in The Netherlands between 2008 and 2010 [9]. The Medical Ethics Committee of the VU University Medical Centre in Amsterdam approved this study. The methods used are briefly described here; a detailed description of the design has been published elsewhere [10].

Patients and setting

Patients were recruited by physical therapists working in primary care in the greater Amsterdam area in the Netherlands. Recruitment occurred during patients’ first consultation. Eligible patients were referred to one of four research physical therapists, who evaluated patient eligibility, obtained written informed consent, collected baseline questionnaires and conducted clinical examinations. Inclusion criteria were: LBP as the primary complaint (with or without associated leg pain), age between 18 and 65 years, current episode longer than 6 weeks, and ability to read and write Dutch. The main exclusion criteria were: known or suspected specific LBP (e.g. cauda equina compression, fractures), severe radiculopathy (widespread sensory loss and substantially diminished myotomal strength), serious co-morbidity and psychopathology.


The research physical therapists classified all patients into one of the following treatment subgroups using the criteria of the classification algorithm presented in Fig. 1: direction-specific exercises, spinal manipulation, or stabilization exercises. This algorithm is an updated version of Delitto’s algorithm [11] that was modified to fit the Dutch health care system and typical Dutch physical therapy practice. The adaptations were the exclusion of patients with acute LBP, because guidelines in the Netherlands discourage physical therapy in this phase [7, 12], and the exclusion of the traction subgroup, because Dutch private practices generally do not have mechanical traction equipment and traction is not recommended in Dutch guidelines [7]. Following the recommendation by Stanton et al. [13], patients who could be placed in a single treatment subgroup using only the first three boxes of the classification algorithm received the label ‘clear classification’. Patients were given the label ‘unclear classification’ if additional criteria in the second part of the classification algorithm were needed to place them in one of the three subgroups.

Fig. 1 The classification-based system for patients with LBP (>6 weeks) as used in this RCT


After the examination by the research physical therapist, an independent researcher randomized each patient to either the classification-based group or the usual physical therapy care group using a randomization list. This list was generated by computer before the start of the trial and patients were pre-stratified by duration of complaints (more or less than 12 weeks) and by disability (Oswestry disability index [ODI] more or less than 25%).
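
A pre-stratified, computer-generated allocation list of the kind described above could be produced along the following lines. This is a minimal sketch only: the use of permuted blocks, the block size, the seed, the stratum labels and the function name `make_randomization_list` are all assumptions made for illustration, since the text states only that the list was computer generated and stratified by duration of complaints and by ODI score.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

ARMS = ("classification-based", "usual physical therapy care")

def make_randomization_list(n_per_stratum: int,
                            block_size: int = 4,
                            seed: int = 2008) -> Dict[Tuple[str, str], List[str]]:
    """Build one allocation sequence per stratum using permuted blocks.

    Assumption: permuted blocks are used to keep the arms balanced within
    each stratum; the source does not state how the list was constructed.
    """
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    # Four strata: duration of complaints crossed with baseline disability.
    # How values exactly on a cut-off are handled is not stated in the text.
    strata = product(("duration <12 weeks", "duration >12 weeks"),
                     ("ODI <25%", "ODI >25%"))
    lists: Dict[Tuple[str, str], List[str]] = {}
    for stratum in strata:
        allocations: List[str] = []
        while len(allocations) < n_per_stratum:
            block = list(ARMS) * (block_size // len(ARMS))
            rng.shuffle(block)  # random order within each block
            allocations.extend(block)
        lists[stratum] = allocations[:n_per_stratum]
    return lists

randomization_list = make_randomization_list(n_per_stratum=40)
```

Each new patient would then receive the next unused allocation from the sequence belonging to his or her stratum.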

Classification-based group

Patients assigned to the classification-based group were referred to a physical therapist who was trained in the classification treatment protocol. Patients received treatment according to their classification category (direction-specific exercises, spinal manipulation, or stabilization exercises) for a minimum of 4 weeks. After this period, the physical therapist was allowed to change treatment strategy according to the current Dutch guidelines.

Direction-specific exercises

Patients in this subgroup received exercises in the direction that matched their directional preference: extension, flexion, lateral (side-gliding right or left) or flexion-rotation (right or left). For each direction, a treatment protocol was available which contained patient-generated forces (sustained positions and/or repeated movements) and physical therapist-generated forces (mobilizations in the direction of preference). The protocol could be adapted to the patient’s pain responses.

Spinal manipulation

Patients in this subgroup received high-velocity thrusts directed at the affected spinal levels from T12 to L5 and/or the sacroiliac region. Decisions about the choice of the technique and the location of the forces were left to the discretion of the manual therapist. The symptomatic side was manipulated first. During each session, the manual therapist could make a maximum of four attempts (two on each side) to achieve a cavitation (i.e. ‘a pop’) heard or felt by the manual therapist or the patient.

Stabilization exercises

Patients in this subgroup were first taught to activate their local stability system to control neutral joint position in a supine position (‘drawing in’ the stomach with normal breathing). Second, the patients were instructed to perform these abdominal bracing exercises during strengthening exercises performed in standing, quadruped, and side-support positions [14]. Third, in addition to these standardized exercises, activity of the local and global muscle systems was trained in positions and movements that aggravated the patient’s pain, to restore functional capacity and improve dynamic control.

Usual physical therapy care group

Patients assigned to usual physical therapy care were referred to a physical therapist who was trained in implementation of the current Dutch physical therapy LBP guidelines [7, 8]. These guidelines recommend limiting the number of treatment sessions for patients with a normal course of LBP, giving adequate information, using mainly active interventions (e.g. resistive strengthening, stretching and postural exercises) and setting functional treatment goals.

Clinical outcome measures

The clinical outcome measures were: (1) global perceived effect (GPE), measured by self-assessment on a 7-point Likert scale ranging from ‘completely recovered’ to ‘worse than ever’; (2) pain intensity over the previous week, measured using an 11-point numerical rating scale (NRS, 0 = ‘no pain’ to 10 = ‘worst imaginable pain’); and (3) functional status, measured with the 10-item ODI, version 2.1a (score range 0–100), with higher scores indicating lower functional status [11]. The GPE outcome measure was dichotomized a priori into success (completely recovered and much recovered) and non-success (slightly recovered, no change, slightly worse, much worse and worse than ever). Health-related quality of life was measured with the EuroQol (EQ-5D) [15]. The Dutch tariff developed by Lamers et al. [16] was used to calculate utilities. Quality-adjusted life-years (QALYs) were calculated by multiplying the utility of a health state by the time spent in this health state. Transitions between health states were linearly interpolated.
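
The QALY calculation with linear interpolation amounts to taking the area under the utility curve over the follow-up period. The sketch below illustrates this with the trapezoidal rule. The function name `qalys`, the measurement weeks and the utility values are hypothetical and chosen only for illustration; they are not data from this trial.

```python
from typing import Sequence

WEEKS_PER_YEAR = 52.0

def qalys(weeks: Sequence[float], utilities: Sequence[float]) -> float:
    """Area under the utility curve in years (trapezoidal rule).

    Linear interpolation between two measurements means that the average of
    the two utilities applies to the time interval between them.
    """
    if len(weeks) != len(utilities) or len(weeks) < 2:
        raise ValueError("need at least two matching time points and utilities")
    total = 0.0
    for i in range(1, len(weeks)):
        interval_years = (weeks[i] - weeks[i - 1]) / WEEKS_PER_YEAR
        if interval_years <= 0:
            raise ValueError("time points must be strictly increasing")
        total += interval_years * (utilities[i - 1] + utilities[i]) / 2.0
    return total

# Hypothetical patient with utilities at baseline and four follow-up moments.
example = qalys([0, 8, 26, 39, 52], [0.60, 0.70, 0.80, 0.80, 0.85])
print(round(example, 3))  # approximately 0.766 QALYs over 1 year
```

A patient in perfect health (utility 1.0) for the whole year would accrue exactly 1 QALY under this calculation.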

Cost measures

Cost data were collected from a societal perspective at 8, 26, 39 and 52 weeks after the start of treatment. Direct healthcare costs, direct non-healthcare costs and absenteeism from unpaid labour were measured using self-completed cost diaries [17]. Absenteeism from paid labour was measured using the Productivity and Disease Questionnaire (PRODISQ) [18]. Productivity loss costs were estimated using the friction cost method (FCM), which assumes that sick workers are replaced after a certain period of time (the friction period used was 154 days) [19]. Mean productivity costs per working hour according to age and sex were used to estimate the costs of absenteeism [19]. Healthcare utilization was valued according to the guidelines published in the updated handbook for economic evaluation in the Netherlands [19]. Medication was valued using prices of the Royal Dutch Society for Pharmacy, the G-standard (Z-index, The Hague, The Netherlands). Table 1 lists the cost categories and prices used in the present study. The index year for this study was 2009 [20].

Table 1 Prices used in the economic evaluation

Statistical analysis

The economic evaluation was performed according to the intention-to-treat principle. Multiple imputation (MI) was used to impute missing cost and effect data. In the MI procedure, five imputed data sets were generated, each of which was analysed separately. Using Rubin’s rules, effects and costs from the five complete data sets were pooled [21]. All analyses were based on imputed data.
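
Pooling with Rubin’s rules combines the within-imputation and between-imputation variance of an estimate. The following is a minimal sketch of the pooled point estimate and standard error only; it omits the degrees-of-freedom adjustment used for confidence intervals. The function name `pool_rubin` and all numbers are hypothetical and are not results from this study.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool estimates from m imputed data sets.

    estimates: the estimate from each imputed data set (e.g. a cost difference)
    variances: the squared standard error of each of those estimates
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 data sets")
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, math.sqrt(total)

# Hypothetical cost differences (euros) and squared standard errors
# from five imputed data sets.
pooled, pooled_se = pool_rubin(
    [260.0, 270.0, 250.0, 280.0, 270.0],
    [90000.0, 90000.0, 90000.0, 90000.0, 90000.0],
)
```

The factor (1 + 1/m) inflates the between-imputation variance to account for the finite number of imputations.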

Mean clinical outcome differences were analysed with t tests and uncertainty was estimated by 95% confidence intervals. Costs generally have a highly skewed distribution. Therefore, bootstrapping with 5,000 replications was used to estimate ‘approximate bootstrap confidence’ (ABC) intervals around cost differences [22, 23]. Incremental cost-effectiveness ratios (ICERs) were calculated by dividing the incremental costs by the incremental effects. Similarly, incremental cost-utility ratios (ICURs) were calculated by dividing the incremental costs by the difference in QALYs based on the EQ-5D. Uncertainty around the ICERs and ICURs was estimated using bootstrapping with 5,000 replications. The bootstrapped cost-effect pairs were graphically represented on cost-effectiveness planes (CE planes) [24]. Cost-effectiveness acceptability curves (CEA curves) were estimated; these indicate the probability that a treatment is cost-effective at a specific ceiling ratio, which is the amount of money society is willing to pay to gain one extra unit of effect [25].
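
The steps from bootstrapped cost-effect pairs to the CE plane and the CEA curve can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code of this study: it uses a simple non-parametric bootstrap and does not reproduce the ABC intervals, it uses fewer replications than the 5,000 reported above to keep the run short, the patient data are simulated, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (cost, effect) for one patient

def icer(d_cost: float, d_effect: float) -> float:
    """Incremental costs divided by incremental effects."""
    if d_effect == 0:
        raise ZeroDivisionError("ICER is undefined when the effect difference is zero")
    return d_cost / d_effect

def bootstrap_increments(treat: Sequence[Pair], control: Sequence[Pair],
                         n_boot: int = 5000, seed: int = 1) -> List[Pair]:
    """Resample patients within each group and return (d_cost, d_effect) pairs."""
    rng = random.Random(seed)
    pairs: List[Pair] = []
    for _ in range(n_boot):
        t = rng.choices(treat, k=len(treat))      # sampling with replacement
        c = rng.choices(control, k=len(control))
        d_cost = sum(p[0] for p in t) / len(t) - sum(p[0] for p in c) / len(c)
        d_eff = sum(p[1] for p in t) / len(t) - sum(p[1] for p in c) / len(c)
        pairs.append((d_cost, d_eff))
    return pairs

def quadrant_shares(pairs: Sequence[Pair]) -> Dict[str, float]:
    """Share of pairs per CE-plane quadrant (N/S: cost up/down, E/W: effect up/down)."""
    counts = {"NE": 0, "SE": 0, "SW": 0, "NW": 0}
    for d_cost, d_eff in pairs:
        counts[("N" if d_cost >= 0 else "S") + ("E" if d_eff >= 0 else "W")] += 1
    return {name: n / len(pairs) for name, n in counts.items()}

def prob_cost_effective(pairs: Sequence[Pair], ceiling: float) -> float:
    """One point of the CEA curve: share of pairs with ceiling * d_effect > d_cost."""
    return sum(1 for d_cost, d_eff in pairs if ceiling * d_eff - d_cost > 0) / len(pairs)

# Simulated (not real) patients: skewed costs, effect = recovered (1.0) or not (0.0).
sim = random.Random(0)
treat = [(sim.lognormvariate(7.0, 1.0), float(sim.random() < 0.60)) for _ in range(74)]
control = [(sim.lognormvariate(7.0, 1.0), float(sim.random() < 0.45)) for _ in range(82)]

pairs = bootstrap_increments(treat, control, n_boot=2000)
shares = quadrant_shares(pairs)
cea_curve = [(wtp, prob_cost_effective(pairs, wtp)) for wtp in (0, 5000, 10000, 20000)]
```

Plotting `pairs` with the effect difference on the horizontal axis and the cost difference on the vertical axis gives the CE plane; plotting `cea_curve` gives the CEA curve.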

To assess the robustness of the study results, we performed three sensitivity analyses. First, a per-protocol analysis was conducted. A deviation from the protocol was defined as not receiving treatment after allocation, withdrawal from therapy after three or fewer visits, or, for patients allocated to the classification-based group, not being treated according to the patient’s subgroup. Second, the main analyses were repeated using only complete cases (i.e. with complete clinical outcome data and complete cost data). Third, the impact of using the Human Capital Approach (HCA) instead of the FCM on the results was assessed. With the HCA, the total costs of productivity losses are estimated without considering the possibility of replacing a sick worker.
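
The difference between the FCM and the HCA can be illustrated with a small calculation. This is a simplified sketch: it assumes a single continuous episode of absence expressed in calendar days, and the working hours per week and the cost per hour are hypothetical values rather than the age- and sex-specific prices used in this study. Only the 154-day friction period is taken from the text.

```python
FRICTION_PERIOD_DAYS = 154  # friction period stated in the methods

def productivity_cost(days_absent: float, hours_per_week: float,
                      cost_per_hour: float, method: str = "FCM") -> float:
    """Cost of one continuous absence episode under the FCM or the HCA.

    FCM: absence is counted only up to the friction period, after which the
         worker is assumed to have been replaced.
    HCA: the full duration of absence is counted.
    """
    if method == "FCM":
        counted_days = min(days_absent, FRICTION_PERIOD_DAYS)
    elif method == "HCA":
        counted_days = days_absent
    else:
        raise ValueError("method must be 'FCM' or 'HCA'")
    weeks_counted = counted_days / 7.0
    return weeks_counted * hours_per_week * cost_per_hour

# Hypothetical worker absent for a full year, 36 h/week at 30 euros per hour.
fcm_cost = productivity_cost(365, 36, 30.0, method="FCM")  # 23,760 euros
hca_cost = productivity_cost(365, 36, 30.0, method="HCA")  # about 56,314 euros
```

For absences shorter than the friction period the two methods give the same cost, so differences between them are driven by patients with long-term absence.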

In addition, a post hoc per-protocol analysis was performed among patients with a ‘clear classification’ label. A previous analysis showed that for this subgroup the clinical outcomes slightly increased in favour of the classification-based group after 1 year [9]. All analyses were performed in SPSS-18 and R [26].


A total of 243 potentially eligible patients were recruited by 62 physical therapists. Of these, 158 patients met all inclusion criteria. Two patients dropped out immediately after randomization and before treatment started, resulting in a sample of 156 patients. The classification procedure placed 85 (54%) patients into the direction-specific exercise subgroup, 42 (27%) into the manipulation subgroup and 29 (19%) into the stabilization exercises subgroup. Baseline characteristics were comparable between the groups, and are described in Table 2. Eighty-two patients were randomized to the usual physical therapy care group and 74 to the classification-based group. Complete clinical outcome data were available for 67 (91%) patients in the classification-based group and 76 (93%) patients in the usual physical therapy care group, and complete QALY data for 60 (81%) and 59 (72%) patients, respectively. Complete cost data were available for 62 (84%) patients in the classification-based group and 65 (79%) patients in the usual physical therapy care group. Patients without complete cost data were younger, had more fearful beliefs about work measured with the Fear-Avoidance Beliefs Questionnaire [27], and were more frequently male and smokers.

Table 2 Baseline characteristics

Clinical outcomes

Table 3 shows that statistically significant differences were present between groups in terms of recovery status (as measured by GPE). Patients who received classification-based treatment recovered more often than patients in the usual physical therapy care group. No differences were found on the other clinical outcomes.

Table 3 Multiply imputed and pooled clinical outcomes after 1 year

Resource utilization and costs

Table 4 shows the mean health care utilization after 1 year in patients with complete cost data. Table 5 shows the differences in costs between classification-based treatment and usual physical therapy care after 1 year. There was a trend towards reduced primary care costs (e.g. physical therapy and general practitioner) for the classification-based group compared to the usual physical therapy care group, but this difference was not statistically significant. Mean total societal costs (i.e. direct health care, direct non-health care and indirect non-health care) were €2,287 for the classification-based group and €2,020 for the usual physical therapy care group, but this difference was not statistically significant (mean difference €266, 95% CI €-720 to €1,612). Indirect non-health care costs (i.e. absenteeism from paid and unpaid labour) were the main contributor to the difference in total costs.

Table 4 Mean (SD) health care utilization after 1 year in patients with complete cost data
Table 5 Multiply imputed and pooled costs after 1 year

Cost-effectiveness and cost-utility analysis

Table 6 shows, among other things, the main results of the cost-effectiveness analysis after 1 year. The bootstrapped cost-effect pairs for the ODI, the NRS and the GPE were primarily located in the northeast and southeast quadrants, indicating that the effects in the classification-based treatment were larger, but that the costs did not differ.

Table 6 Results of cost-effectiveness and cost-utility analyses after 1 year

The ICER for GPE shows that the additional cost per extra recovered patient in the classification-based group in comparison with the usual physical therapy care group was €1,299. Figure 2 shows the CE plane and CEA curve for GPE after 1 year. The CE plane confirms that there was a significant difference in GPE between the two groups, but not in total costs. The CEA curve shows that the probability of classification-based treatment being cost-effective is 95% if society is willing to pay around €10,000 for one extra recovered patient.

Fig. 2 Cost-effectiveness plane and cost-effectiveness acceptability curve for classification-based treatment in comparison with usual physical therapy care for global perceived effect after 12 months (multiply imputed data)

The cost-utility analysis showed that classification-based treatment was more costly and slightly more effective than usual physical therapy care (ICUR 10,543, Table 6 ‘main analysis’). The CEA curve in Fig. 3 shows that the maximum probability of classification-based treatment being cost-effective lies around 0.8. To reach this probability society should be willing to pay around €30,000 per QALY.

Fig. 3 Cost-utility plane and cost-utility acceptability curve for classification-based treatment in comparison with usual physical therapy care after 12 months (multiply imputed data)

Sensitivity analyses

The results of the sensitivity analyses are also presented in Table 6. The results of these analyses followed the same pattern as the main analyses. However, when using the HCA, the total costs of classification-based treatment were slightly less compared with usual physical therapy care (Table 6).

Post hoc analysis

A total of 116 patients met the criteria of a ‘clear classification’ and 104 of these patients followed the treatment protocols. A per-protocol analysis among patients with a ‘clear classification’ label showed that there was a trend for larger effects for classification-based treatment compared with the results of the other analyses (Table 6). However, between-group differences were again only statistically significant for GPE and the cost-effectiveness results did not differ fundamentally from the main analysis.


This is the first economic evaluation to assess the cost-effectiveness of this classification-based treatment approach in comparison with usual physical therapy care from a societal perspective, meaning that direct and indirect costs were included. The classification-based treatment approach as used in this study was not cost-effective compared with usual physical therapy care for patients with sub-acute and chronic LBP.

Fritz et al. [5] also evaluated the cost-effectiveness of classification-based treatment, but included only direct health care costs. Both Fritz’s study and our study found lower direct healthcare costs in the classification-based group compared with usual physical therapy care after 1 year, but the differences were not statistically significant. However, it must be noted that the patient selection criteria, the comparison treatment and the classification algorithm used in the study of Fritz et al. [5] differed from those in our study, which hampers comparison of the two studies.

The results of the three sensitivity analyses were in line with the main analysis, indicating that our findings were reasonably robust. The one exception was that indirect costs were higher in the classification-based group when using the FCM, but lower when using the HCA. This difference was due to two patients in the usual physical therapy care group who were absent from work during the whole year. For these two patients, only the indirect costs of the first 154 days of sick leave were included using the FCM, whereas the indirect costs were included for the whole year using the HCA, resulting in higher indirect and total costs for the usual physical therapy care group. However, the conclusion of the cost-effectiveness and cost-utility analyses did not differ between the two approaches.

Patients in the usual physical therapy care group received significantly more physical and manual therapy sessions than patients in the classification-based group at a follow-up period of 1 year (mean difference 4.6, 95% CI 0.7–8.5, p = 0.02). This might indicate that the classification-based approach leads to lower physical therapy consumption compared to usual physical therapy care in the long run. However, another explanation could be the differences in the training of the therapists. Almost all physical therapists in the classification-based group were also trained as manual therapists, whereas in the usual physical therapy care group only a minority of physical therapists were also manual therapists. Korthals-de Bos et al. [28] also found higher utilization of physical therapy in those receiving treatment from non-manual therapists compared to those receiving treatment from manual therapists in patients with neck pain. In the present study, the differences between the two groups in utilization of physical and manual therapy did not have major consequences for the total costs, because the physical and manual therapy costs represented only around 15% of the total costs.

An exploratory post hoc per-protocol analysis showed that the cost-effectiveness of classification-based treatment in patients with a ‘clear classification’ label was slightly better compared with the results of the other analyses. Ideally, a classification algorithm should be able to classify all patients into one, and only one, subgroup. In our RCT, using the first part of the algorithm, 24% of all patients did not meet the criteria for any subgroup and 16% met the criteria for more than one subgroup. Our findings are consistent with those of Stanton et al. [13]. Although our post hoc results must be interpreted very cautiously, they may indicate that the classification algorithm needs further refinement and possibly extension to be able to classify other relevant subgroups in order to improve clinical outcomes and reduce costs.


In this trial 16–21% of the cost data were missing and for some of the baseline characteristics the patients with complete cost data differed from the patients with incomplete cost data. However, the results of the cost-effectiveness and cost-utility analyses for patients with complete cost data did not differ from the results after applying multiple imputation techniques for missing cost and effect data. Therefore, this is unlikely to have any major influence on the results.


Societal costs of classification-based treatment in patients with LBP were comparable with usual physical therapy care. Cost-effectiveness could not be demonstrated with regard to any of the outcome measures. Therefore, based on a societal cost perspective, we do not recommend widespread implementation of this approach.