Introduction

The development of prediction models has grown in popularity in Low Back Pain (LBP) research [1,2,3]. Prediction models can help clinicians such as physical therapists to make a prognosis in daily practice by providing an estimate of the probability of persisting symptoms for individual patients [4]. This probability estimate may prompt the clinician to adjust the treatment goals to the patient's needs.

A recent literature review showed that most prediction models developed for physical therapists do not use performance measures that adequately evaluate the clinical usefulness of the models [5]. Performance measures such as the Receiver Operating Characteristic (ROC) curve, the Area Under this Curve (AUC), sensitivity and specificity, in combination with a low and high risk cutoff point, can be used to determine the clinical feasibility of a developed prediction model. However, these measures have been shown to be less sensitive in evaluating the added discriminative performance of a predictor, and they do not provide direct feedback on the number of chronic LBP patients that are correctly classified; they are therefore less clinically useful [6,7,8]. A novel performance measure to evaluate the discriminative ability of a prediction model is Decision Curve Analysis (DCA) with its Net Benefit (NB) [9]. This method is able to identify the number of patients that are better classified and incorporates the clinical consequences of using a model, which is useful for clinicians [7]. Furthermore, this method is recommended by recent guidelines for developing prediction models, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [10]. Remarkably, this method has not yet been used frequently by LBP researchers. To our knowledge, only two studies predicting LBP using DCA have been published so far [11, 12]. Heymans et al. published a model to predict chronic LBP in workers that included the variables ‘clinically relevant decrease in pain intensity and in disability status in the first 3 months’, ‘pain intensity at baseline’ and ‘kinesiophobia’ [13]. The importance of kinesiophobia as a prognostic factor for chronic LBP was suspect. The fear-avoidance model was originally developed to explain the transition from acute to chronic pain [14].
However, there was conflicting evidence in the literature about the clinical usefulness of kinesiophobia as a predictor for chronic LBP. Gheldof et al. stated that fear of movement, measured with the Tampa scale, was a risk factor only for failure to recover from short-term LBP [15]. The impact of kinesiophobia on the transition from subacute to chronic LBP was also reported in the studies of Heneweer et al. [16] and Swinkels-Meewisse et al. [17]. In addition, Dawson et al. stated that kinesiophobia increased the likelihood of sick leave due to LBP [18]. Furthermore, Lakke et al. showed in a synthesis of evidence from systematic reviews (SRs) that kinesiophobia was often included in studies as a possible predictive variable, although this was not always justified [19]. Therefore, the objective of the current study is to evaluate whether kinesiophobia is a clinically relevant predictor of chronic LBP in the light of the novel discriminative performance measure, Decision Curve Analysis (DCA) using the Net Benefit (NB), because this measure is more suitable for testing the predictive performance of separate predictor variables.

Methods

Study design

A prospective cohort (n = 170) was formed by merging data from workers on sick leave with LBP who received usual care (UC) in two randomised controlled trials (RCTs) [20, 21]. This study was reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [22]. These RCTs were conducted at the same department and within the same time frame, were similar in design, and used the same baseline and follow-up variables and the same inclusion and exclusion criteria: patients with non-specific LBP who had been on sick leave for 4–8 weeks and had visited their occupational physician. For detailed information about the data merging process, see Heymans et al. [13]. Both RCTs were approved by the medical ethical committee

Outcome measure

Pain intensity was assessed on a Numerical Rating Scale (NRS) at baseline and at 3 and 6 months [23]. The outcome measure chronic LBP (0 = no, 1 = yes) was defined as having a pain intensity score of ≥4 at baseline and ≥3 at 3 and 6 months of follow-up [13, 24].
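The outcome definition above can be written as a simple rule. The following is a minimal sketch only, assuming that the three NRS scores are available as plain numbers; the function name and the example scores are hypothetical and are not taken from the study data.

```python
def is_chronic_lbp(nrs_baseline, nrs_3_months, nrs_6_months):
    """Return 1 (chronic LBP) or 0 (not chronic) for one patient.

    Applies the definition in the text: NRS >= 4 at baseline
    and NRS >= 3 at both the 3- and 6-month follow-up.
    """
    chronic = nrs_baseline >= 4 and nrs_3_months >= 3 and nrs_6_months >= 3
    return int(chronic)


# Hypothetical patients, for illustration only.
print(is_chronic_lbp(6, 4, 3))  # pain persists at both follow-ups
print(is_chronic_lbp(6, 2, 5))  # pain below 3 at 3 months
print(is_chronic_lbp(3, 5, 5))  # baseline pain below 4
```

Under this rule, a patient with a pain-free episode at either follow-up is coded as not chronic.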

Prediction models used in this study

The models were derived from the study published in 2010 [13]. In the current study, the value of the variable ‘kinesiophobia’ was studied by comparing the following models.

  • Model 1 consisted of the variables ‘pain intensity at baseline’ and a ‘clinically relevant change in pain intensity [21] and in disability status [25] in the first 3 months’. A clinically relevant change in pain and disability was defined as a change of 3 and 4 points on the NRS and the Roland Disability Questionnaire (RDQ), respectively, within the first 3 months after the LBP episode [26, 27].

  • Model 2 was model 1, plus an extra variable under study: ‘kinesiophobia’ [28].

Statistical analysis

A logistic regression model was used to study the relationship between the outcome measure and the aforementioned predictors. In the original paper of Heymans et al. [13], data were missing for 3% of the values of ‘pain intensity at baseline’, 6.2% of ‘kinesiophobia’, 19.7% of ‘change in pain intensity in the first 3 months’ and 23.7% of ‘change in functional status in the first 3 months’. These missing values were replaced by multiple imputation (MI) using the Multiple Imputation by Chained Equations (MICE) package [29]. For the current study, the first of the 10 imputed datasets from the original study was used for the DCA of the prediction models with and without the kinesiophobia variable. This procedure was followed for practical reasons: had we used all imputed datasets from the original study, we would have had to pool the DCA and Net Benefit results obtained in each imputed dataset, and such pooling methods are not available. Moreover, our study goal was to compare the DCA and Net Benefit of two prediction models, and we think that we were still able to fulfill that goal accurately by using one of the imputed datasets because the regression coefficient estimates in this dataset were closely comparable to the pooled estimates from the original study. The AUC values (with 95% confidence intervals) of each model were also presented. All analyses were performed in R using Harrell’s rms package.

Decision curve analysis

Decision curve analysis (DCA) is a method to evaluate the net benefit (NB) of a prediction model across clinician and patient preferences for accepting the risk of under- or overtreatment [9, 30]. The decision to treat depends on the benefits (effectiveness) and harms (complications, costs) of the treatment. For this, the ‘probability threshold’ (pt) is important in DCA: the level of certainty of the outcome above which the patient would choose to be treated. This threshold reflects the value the patient places on receiving treatment when he/she would develop chronic LBP relative to the value of avoiding treatment when he/she would recover from LBP. If the treatment is effective with minimal costs and risk of complications, this threshold will be low. On the other hand, if the treatment is intensive, minimally effective and costly, the threshold will be high. The net benefit (NB) is calculated as the difference between the expected benefit and the expected harm associated with the treatment. The expected benefit incorporates the number of patients who will develop chronic LBP, are correctly identified by the prediction model and will be treated: the true positive patients (TP). The expected harm incorporates the number of patients who will recover from LBP but would be treated (the false positives, FP), multiplied by a weighting factor based on the patient’s threshold probability. In formula: NB = (TP - w × FP)/N, where N is the total number of patients and the weight w is given by pt/(1 - pt). For example, a physiotherapist uses a prediction model to determine the probability of developing chronic LBP and wonders whether a patient with a probability of 30% according to the model should be treated with an exercise program. In formula: w = 0.3/(1 - 0.3) = 0.43, which means that the number of FP patients in the NB formula gets less weight and that unnecessary treatment is considered less important than missed treatment (because FP is multiplied by w).
When the patient worries about the LBP and/or the treatment is effective, cheap and not intensive, a physical therapist could decide to treat the patient at this low risk of chronic LBP [21]. A physical therapist who uses a higher pt of 70% (w = 0.7/0.3 = 2.33) considers FP decisions more harmful. This may play a role in the case of intensive and costly exercise programs [21]. Because patients and clinicians may value harms and benefits differently, the NB can be calculated for different values of pt and compared to the NB of treating all patients (assuming everybody develops chronic LBP and needs treatment) or treating no patients (assuming nobody develops chronic LBP). This can be depicted graphically in a decision curve. A higher NB value means that the model will be more clinically useful, as indicated by the higher number of TP patients that are identified. Further, the NB of prediction model 1 can be compared to the NB of prediction model 2 at each level of pt, and it can be evaluated whether the variable ‘kinesiophobia’ is needed to improve the predictive performance of the prediction model. This will be further clarified and explored in the results section.
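The NB formula and its weighting factor can be sketched in a few lines of code. This is an illustrative sketch only, not the code used in the study (the analyses were done in R); the function names threshold_weight and net_benefit are hypothetical.

```python
def threshold_weight(pt):
    """Weight w = pt / (1 - pt) applied to false positives in the NB formula."""
    if not 0 < pt < 1:
        raise ValueError("pt must lie strictly between 0 and 1")
    return pt / (1 - pt)


def net_benefit(tp, fp, n, pt):
    """Net Benefit: NB = (TP - w * FP) / N, with w = pt / (1 - pt)."""
    return (tp - threshold_weight(pt) * fp) / n


# Weights for the two threshold probabilities discussed in the text.
for pt in (0.30, 0.70):
    print(f"pt = {pt:.2f} -> w = {threshold_weight(pt):.2f}")
# pt = 0.30 -> w = 0.43
# pt = 0.70 -> w = 2.33
```

The weight is below 1 for pt < 50% and above 1 for pt > 50%, which is why false positives count for less at a low threshold and for more at a high threshold.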

Results

Table 1 showed the patient characteristics of all occupational LBP patients (n = 170), of whom 91 developed chronic LBP (53.5%).

Table 1 Patient characteristics at baseline (n = 170) of the LBP patients

Table 2 showed the strength of the relationships of the variables in the two prediction models.

Table 2 Odds ratios (OR) of the 2 prediction models compared

Model 2 (with the added variable ‘kinesiophobia’) did not perform better than Model 1, given its AUC of 0.862 compared to the AUC of 0.858 for Model 1. Further, the strength of all variables in the models remained the same, and the variable ‘kinesiophobia’ in Model 2 showed a non-significant OR of 1.05 with a 95% CI of 0.99–1.11.

Decision curve analysis

Comparing both prediction models at one threshold probability

First, our comparison started with an example of the calculation of the NB for prediction model 1, the model without the variable ‘kinesiophobia’, at a pt of 30% by using Table 3. At a pt of 30% the number of TP patients according to the prediction model was 86 and the number of FPs was 38. With a total number of patients of 170, the NB = 86/170 - (38/170 × (0.3/0.7)) = 0.410. This NB meant that a net 41 TP patients per 100 patients were identified, compared to assuming that no patient developed chronic LBP, at the same number of FP patients. The NB for prediction model 2, which also included the variable ‘kinesiophobia’, was 84/170 - (33/170 × (0.3/0.7)) = 0.411. The interpretation of this NB was that a net 41 TP patients per 100 patients were identified, compared to assuming all patients were negative, at the same number of FP patients. At this level of pt the NBs of both models were similar and the prediction was not improved by including the variable ‘kinesiophobia’.

Table 3 Relationship between chronic LBP and the results of the prediction models at a predicted probability of chronic LBP of 30%

When it was assumed that all patients were positive and developed chronic LBP, the NB was calculated as NB = 91/170 - (79/170 × (0.3/0.7)) = 0.336. This value was lower than the NB of prediction models 1 and 2 above, which meant that both prediction models were of more clinical benefit at a pt of 30% than assuming that everybody would develop chronic LBP and treating them accordingly. The difference in NB between the prediction models and assuming that all patients were positive was (NBmodel - NBtreat all) × 100 = (0.411 - 0.336) × 100 = 7.5. This meant that a net 8 TP patients per 100 patients were identified by using the prediction models compared to treating all patients, without an increase in the number of FP patients. Comparing prediction models 1 and 2, where model 2 contained the extra predictor ‘kinesiophobia’, at a pt of 30%, there was no difference in NB between these models.
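The worked calculations at a pt of 30% can be reproduced with the counts reported in the text. The sketch below is illustrative only; the function and variable names are hypothetical, while the TP and FP counts, the number of patients and the number of chronic LBP cases are those given above.

```python
N = 170        # total number of patients
EVENTS = 91    # patients who developed chronic LBP
PT = 0.30      # threshold probability


def net_benefit(tp, fp, n, pt):
    """NB = TP/N - (FP/N) * (pt / (1 - pt))."""
    return tp / n - (fp / n) * (pt / (1 - pt))


nb_model_1 = net_benefit(86, 38, N, PT)                  # without kinesiophobia
nb_model_2 = net_benefit(84, 33, N, PT)                  # with kinesiophobia
nb_treat_all = net_benefit(EVENTS, N - EVENTS, N, PT)    # everybody treated
nb_treat_none = net_benefit(0, 0, N, PT)                 # nobody treated

# Net extra true positives per 100 patients versus treating everybody.
gain_per_100 = (nb_model_2 - nb_treat_all) * 100

print(f"NB model 1:    {nb_model_1:.3f}")    # 0.410
print(f"NB model 2:    {nb_model_2:.3f}")    # 0.411
print(f"NB treat all:  {nb_treat_all:.3f}")  # 0.336
print(f"NB treat none: {nb_treat_none:.3f}") # 0.000
print(f"Gain per 100:  {gain_per_100:.1f}")  # 7.5
```

Treating nobody yields no true positives and no false positives, so its NB is 0 at every threshold; this is the reference line against which the models and the treat-all strategy are compared.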

Comparing both prediction models at various threshold probabilities - the decision curve

On the decision curve the NB of the prediction models (y-axis) according to the various threshold probabilities pt (x-axis) was plotted. The NBs of the prediction models 1 and 2 were shown in Fig. 1.

Fig. 1

Decision curves of the prediction Models 1 and 2 to predict chronic LBP. Y-axis is Net Benefit and x-axis is threshold probability pt. Dotted black line belongs to the Net Benefit of prediction Model 1, dotted grey line to Model 2. The black line is the Net Benefit when all patients are assumed negative and the grey line is the Net Benefit when everybody is assumed positive and would be treated. * The Net Benefit of the model around 90% is sometimes negative due to random noise [9]

It could be seen in Fig. 1 that there were virtually no differences in NB between prediction models 1 and 2, i.e. adding ‘kinesiophobia’ did not increase the NB over the whole range of pt values. Table 4 showed the differences in NB and the improvement in the detection of TP patients of prediction model 2 compared to model 1 at different pt values. Both models had a higher NB than treating everybody over the range of pt values from 10% to just over 50%. This improvement resulted in the identification of more TP patients for both models compared to treating everybody. Further, prediction models 1 and 2 had nearly the same NBs. Over the whole range of pt values they alternated in the identification of marginally more TP patients. This meant that the variable ‘kinesiophobia’ was not necessary to improve the prediction of chronic LBP.
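How the points of a decision curve are obtained from predicted probabilities can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's R code: the outcomes and predicted probabilities are synthetic (randomly generated, not the study data), the function names are hypothetical, and a patient is counted as test-positive when the predicted probability is at or above pt.

```python
import random


def net_benefit(y_true, y_prob, pt):
    """NB of treating patients whose predicted probability is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return (tp - (pt / (1 - pt)) * fp) / n


def decision_curve(y_true, y_prob, thresholds):
    """Return one row per threshold: NB of the model, treat all, treat none."""
    prevalence = sum(y_true) / len(y_true)
    rows = []
    for pt in thresholds:
        rows.append({
            "pt": pt,
            "model": net_benefit(y_true, y_prob, pt),
            # Treat all: every event is a TP, every non-event is an FP.
            "treat_all": prevalence - (1 - prevalence) * pt / (1 - pt),
            "treat_none": 0.0,
        })
    return rows


# Synthetic example data (NOT the study data), for illustration only.
random.seed(1)
y_true = [1 if random.random() < 0.535 else 0 for _ in range(170)]
y_prob = [min(0.99, max(0.01, random.gauss(0.75 if y else 0.30, 0.15)))
          for y in y_true]

thresholds = [t / 100 for t in range(5, 65, 5)]  # 5% to 60%
for row in decision_curve(y_true, y_prob, thresholds):
    print(f"pt={row['pt']:.2f}  model={row['model']:.3f}  "
          f"treat_all={row['treat_all']:.3f}  treat_none={row['treat_none']:.3f}")
```

Plotting the model and treat_all columns against pt gives the decision curve; the curve of a second model can be computed in the same way from its own predicted probabilities and compared threshold by threshold.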

Table 4 Net Benefits of prediction Models 1 and 2 compared to the NB of treating nobody or everybody and the consequences for the number of TPs at probability thresholds ranging from 5 to 60%

Discussion

Main findings

In 2010 it was shown by Heymans et al. [13] that a prediction model including ‘pain intensity at baseline’, ‘kinesiophobia’ and a ‘clinically relevant decrease in pain intensity and in disability status in the first 3 months’ predicted chronic LBP well. The finding that changes in pain and functional status during the initial period of LBP were relevant for predicting chronic LBP later in time has been demonstrated in several studies [31,32,33,34]. In the current study, using the novel performance measures DCA and NB, it was shown that the variable ‘kinesiophobia’ was not required to predict chronic occupational LBP in workers.

Decision curve analysis

Often it was not entirely clear when a prediction model was of benefit for clinicians and/or patients. It therefore made sense to evaluate the clinical value of the model at different threshold probabilities by using the NB. For example, when patients worried about their LBP, the physical therapist might want to know whether the model was still of benefit at low risk probabilities. The patient might then be successfully referred to a low-intensity and cheap intervention program [21]. To know at which probability threshold the prediction model was clinically useful, we had to know which risk probabilities physical therapists used in practice and what kinds of harm and benefit were acceptable for physical therapists and patients. That was challenging for LBP because physical therapists might think differently about LBP and the consequences of treatment for their patients.

Strengths and limitations

Performance measures such as sensitivity and specificity could be used to determine the discriminative ability of a prediction model. However, these measures are less suitable for testing the predictive performance of separate predictor variables [6]. Sensitivity or specificity may decrease even when the ROC curve of one model uniformly dominates the ROC curve of the other model. The Net Reclassification Improvement (NRI) and decision-analytic measures will agree in sign in reasonable scenarios [8].

Therefore, the sensitivity, specificity and ROC curve were not considered in this study. A limitation could be that our model was only internally validated; however, internal validation was sufficient to allow the use of DCA [35]. Our definition of chronic LBP was not applicable to all types of chronic LBP patients in practice. For example, the prognosis of patients with pain-free episodes (not identified by our definition) might be determined by other variables. Furthermore, our definition was based on the level of pain intensity. It has been argued that chronic LBP should be defined not only by pain intensity but also by limitations in function [36]. However, recent studies showed that LBP pathways were linked to functional disability in such a way that if one knew the level of pain intensity, the level of functional limitations could also be determined, and vice versa [34]. Another limitation is that data were used from RCTs rather than observational studies. RCTs apply strict inclusion and exclusion criteria that can, for example, result in a more homogeneous patient population, which may affect the performance of the prediction model [4]. Consequently, this may hamper the generalization of the results to the patients seen in daily practice.

Conclusion

In our study ‘kinesiophobia’ (measured by the Tampa Scale, TS) did not seem to be required to improve the prediction of chronic occupational LBP, nor did it seem to be needed to adapt the treatment strategy. The performance measures NRI and DCA were not yet commonly used in LBP research and practice. Why these measures were not used is unclear. Perhaps this is because most articles on the NRI and NB methods were published in methodological or statistically oriented journals. However, the use of probability thresholds has been mentioned before for physical therapists within the context of clinical decision making [37, 38]. The DCA gave the best insight into the clinical usefulness of prediction models for physical therapists. It allowed clinical usefulness and benefits to be expressed in terms of the number of TP patients identified, which is attractive for healthcare professionals and their patients, especially in the light of making good treatment decisions.