Introduction

The number of elective lumbar spinal fusions (LSFs) has increased 2.4-fold in the past decade [1], although postoperative pain reduction often remains unsatisfactory [2]. Some patients have a considerably lower probability of achieving a reduction in pain postoperatively [3]. To improve clinical shared decision making, expectation management, and patient selection, it is important to predict expected outcomes after LSF and act upon this information.

Prediction tools allow patients and surgeons to estimate the probability of outcomes, such as pain reduction, after LSF for a specific patient. Factors that predict postoperative pain reduction have been reported previously [4,5,6]. Patient characteristics such as age, smoking, American Society of Anaesthesiologists (ASA) score and preoperative patient-reported outcome measures (PROMs) on pain, mental health and health-related quality of life (HRQOL) are associated with postoperative pain reduction [4,5,6,7]. To the best of our knowledge, only one study externally validated a prediction tool that predicts pain reduction after LSF, which has been translated into an easily implementable tool in the USA [7]. However, due to substantial differences in healthcare systems, this tool probably cannot be applied in European countries. Moreover, potentially important predictors such as symptom duration and mental health were not incorporated in that model. For use in clinical practice, an externally validated and easily applicable prediction tool developed in a representative population is imperative [8].

Thus, the aim of this multicentre cohort study is to develop and validate a prediction tool to predict the probability of clinically relevant reduction in pain 1 and 2 years after elective one- to three-level LSF.

Methods

From January 2011 until January 2015, baseline and 1- to 2-year postoperative questionnaires were collected from 202 patients undergoing elective LSF as part of routine care in the university hospital. In this cohort study, this derivation set was used to develop and internally validate the logistic regression model. The validation set was used for external validation of the model and contained baseline and 1- to 2-year postoperative data on 251 patients collected from July 2014 until November 2016 in the general hospital. This study was assessed by the local ethics committee, which considered the Medical Research Involving Human Subjects Act not applicable (number: 16-4-262.1/ivb).

Population

Adult patients (≥ 18 years) eligible for elective one- to three-level LSF were included. Diagnosis and surgical procedure were verified from their medical records. Patients were included in the study if they were diagnosed with degenerative disc disease, spondylosis, spondylolysis/-listhesis, spinal stenosis, adjacent level disease, post-herniotomy, post-laminectomy or (recurrent) disc herniation. Revisions of a spinal fusion within 1 year of the previous surgery were excluded.

Data collection

Patients preoperatively and postoperatively completed questionnaires on the following: back and leg pain using the Visual Analogue Scale (VAS) [9], physical functioning using the Oswestry Disability Index (ODI) [10], HRQOL using the RAND-36 [11], mental health using the Pain Catastrophizing Scale (PCS) and Hospital Anxiety and Depression Scale (HADS) [12, 13]. From the three VAS scores (back pain, right leg pain and left leg pain), the predominant (worst reported) pain score was used as a predictor. The RAND-36 resulted in a mental component score (RAND-36 MCS) and a physical component score (RAND-36 PCS). The HADS provided anxiety and depression subscores.

Furthermore, the following demographic data were collected: sex, age, Body Mass Index (BMI), smoking status (yes/no), duration of pain (< 2 years/ ≥ 2 years) and ASA score (I–II/III).

In the validation set, back and leg pain was measured using the 11-point Numeric Pain Rating Scale (NRS) instead of the VAS [9, 14]. The NRS score was transformed to a 0–100 scale by multiplying all scores by ten, to match with the VAS scale in the derivation set.

Dependent variable

Pain relief is the main goal for most patients undergoing LSF [15]. Therefore, the primary outcome of the prediction tool was defined as a clinically relevant reduction in predominant pain in the back or (one of the) legs (worst reported pain in back or legs) as measured with the VAS at 1 to 2 years after surgery. The secondary outcome was defined as a clinically relevant reduction in leg pain at 1 to 2 years after surgery. The VAS for pain ranges from 0 to 100, with 0 indicating no pain and 100 indicating the most severe pain imaginable [9]. To make interpretation of the prediction tool more practical, the dependent variable was made binary: clinically relevant pain reduction or not. Minimal clinically important change (MCIC) for pain ranged between 0.28 and 2.88 on an 11-point scale in the literature on spinal surgery, and a reduction of 2.88 or more (28.8 on a 0–100 point scale) was a priori defined as a clinically relevant pain reduction to prevent overestimation [16].
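The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example scores are hypothetical, and it assumes that the reduction is computed as the worst preoperative score minus the worst postoperative score, which the text does not spell out.

```python
MCIC_THRESHOLD = 28.8  # points on the 0-100 scale, the a priori cutoff stated in the text


def nrs_to_vas_scale(nrs_score):
    """Rescale an 11-point NRS score (0-10) to the 0-100 range by multiplying by ten."""
    if not 0 <= nrs_score <= 10:
        raise ValueError("NRS score must lie between 0 and 10")
    return nrs_score * 10


def predominant_pain(back, right_leg, left_leg):
    """Return the worst reported of the three pain scores (0-100 scale)."""
    return max(back, right_leg, left_leg)


def relevant_reduction(pre, post, threshold=MCIC_THRESHOLD):
    """Binary outcome: True if the pain score dropped by at least the threshold."""
    return (pre - post) >= threshold


# Hypothetical patient, scores on the 0-100 scale (assumed values, not study data).
pre = predominant_pain(back=70, right_leg=40, left_leg=20)
post = predominant_pain(back=30, right_leg=35, left_leg=10)
outcome = relevant_reduction(pre, post)  # reduction of 35 points meets the 28.8 cutoff
```

Choosing the upper end of the reported MCIC range as the cutoff makes the binary outcome conservative: a patient who improves by 25 points is counted as not having achieved a relevant reduction.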

Statistics

Analyses were performed using SPSS (version 24, SPSS Inc., Chicago, IL, USA) and R (version 3.3.2; https://www.r-project.org). In the case of incomplete variables within a case, multiple imputation of missing values was used [17].

The independent samples t test for normally distributed variables or the Mann–Whitney U test for nonnormally distributed variables was used to analyse differences in baseline and outcome variables between subgroups within and between cohorts.

Multivariable logistic regression was used to develop the prediction model. Stepwise backward elimination was used to eliminate nonsignificant predictor variables from the logistic regression model. To prevent premature deletion of predictor variables, a more liberal alpha for exclusion criterion of variables was used (alpha = 0.157) [18].
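The text does not state where the value 0.157 comes from. A commonly cited rationale, offered here as background and not as the authors' stated reasoning, is that it matches selection by Akaike's Information Criterion for a predictor with one degree of freedom: a predictor is retained only if its likelihood ratio chi-square statistic exceeds 2. The short check below computes the corresponding p-value with the standard library; the helper name is hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# AIC penalizes each extra parameter by 2, so the break-even chi-square is 2.
alpha_aic = chi2_sf_1df(2.0)
print(round(alpha_aic, 3))  # 0.157
```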

The discriminatory capacity of the prediction model was quantified by the area under the receiver operating characteristic curve (AUC). Discrimination is perfect when the AUC is 1.0; when the AUC is 0.5, the model discriminates no better than a coin flip.
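The AUC can be read as the probability that a randomly chosen patient who achieved the outcome received a higher predicted probability than a randomly chosen patient who did not. The sketch below computes it directly from that definition; the function name and the example predictions are hypothetical, and the study itself used SPSS and R, not this code.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count as 0.5)."""
    if not scores_pos or not scores_neg:
        raise ValueError("Both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities (assumed values, not study data).
with_reduction = [0.9, 0.8, 0.4]
without_reduction = [0.7, 0.3, 0.2]
example_auc = auc(with_reduction, without_reduction)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a rank-based formula gives the same value for larger samples.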

The logistic regression model was internally validated using standard bootstrapping techniques. As a result, a shrinkage factor was computed, which was used to penalize the regression coefficients of the logistic regression model. The internally validated model was applied to the validation set, for which a new AUC was calculated to evaluate its performance in the population of the second hospital. A nomogram was developed from the validated logistic regression model.

Power analysis

As a general rule, ten events per predictor variable are necessary to find associations in logistic regression models [19]. The percentage of patients undergoing LSF achieving MCIC in pain on average was 56% [3]. A prediction model with 11 predictors could be developed based on a sample size of 197 patients (202 patients were available in the derivation set). Eleven independent variables were selected based on clinical relevance by literature review [4,5,6,7] and by expert opinion of five experienced spine surgeons. Selected variables include the following: sex, BMI, pain duration, smoking status, educational level, employment status, ASA score, VAS, ODI, PCS and RAND-36 [4,5,6,7].
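The sample size arithmetic above can be reproduced as follows. This is a minimal sketch with a hypothetical function name; it follows the text in treating the 56% of patients achieving MCIC as the event rate.

```python
import math


def required_sample_size(n_predictors, event_rate, events_per_variable=10):
    """Smallest sample that yields the required number of events at the given event rate."""
    events_needed = n_predictors * events_per_variable
    return math.ceil(events_needed / event_rate)


# 11 predictors x 10 events = 110 events; 110 / 0.56 is about 196.4, rounded up.
print(required_sample_size(11, 0.56))  # 197
```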

Results

Population characteristics

The derivation set consisted of 202 patients who were found eligible for analysis (see Fig. 1). Baseline characteristics are shown in Table 1. The mean reduction in predominant pain was 33/100 points (SD = 31.3); for leg pain, it was 35/100 (SD = 35.5).

Fig. 1 Flowchart of number of patients included in the dataset used to develop the model. LSF lumbar spinal fusion

Table 1 Cohort characteristics and differences between derivation sample, complete case and external validation sample*

The validation set consisted of 251 patients (see Table 1). The validation set differed from the derivation set in terms of the mean preoperative predominant pain score (P = 0.001), RAND-36 MCS (P = 0.047) and reduction in predominant pain (P = 0.044). The mean reduction in predominant pain in the validation set was 27/100 points (SD = 29.4); for leg pain, this was 31/100 (SD = 34.6). No significant differences in terms of predominant pain reduction were found between categories of surgery type, primary diagnosis or number of levels fused (see Table 2).

Table 2 Surgery characteristics and subgroup distribution with regard to clinically relevant pain reduction 1 to 2 years after LSF

Development of the prediction model

In total, 9.1% of values were missing in the derivation set; these values were imputed using 20 imputations.

The clinical prediction model consisted of eight independent predictors after stepwise backward elimination: smoking, BMI, pain duration, educational level, ASA, predominant preoperative pain, physical functioning (ODI) and HRQOL related to mental health (RAND-36 MCS). Patients had a higher probability (odds ratio [95% confidence interval]) of achieving a clinically relevant pain reduction if they were nonsmoking patients (0.41 [0.19–0.87]) with lower BMI (0.93 [0.85–1.01]), short pain duration (0.49 [0.20–1.19]), low educational level (0.46 [0.19–1.12]), lower ASA score (4.82 [1.35–17.25]), higher VAS scores (1.05 [1.02–1.08]), lower ODI (0.96 [0.93–1.00]) and higher RAND-36 MCS (1.03 [0.10–1.06]) (see Table 3). The model had an AUC of 0.77 (95% CI = 0.70–0.83).

Table 3 Predictors of a clinically relevant reduction in predominant pain 1 to 2 years after LSF using logistic regression

The model for leg pain consisted of four independent predictors after stepwise backward elimination: smoking, pain duration, ASA and predominant preoperative pain. Patients had a higher probability of achieving a clinically relevant leg pain reduction if they were nonsmoking (0.55 [0.27–1.12]), had short pain duration (0.59 [0.30–1.15]), lower ASA score (3.18 [0.82–12.34]) and higher VAS scores (1.03 [1.01–1.05]). The model had an AUC of 0.71 (95% CI = 0.63–0.77).

Internal validation

The bootstrap validation yielded a shrinkage of 0.84 for predominant pain and 0.88 for leg pain, which was used to multiply the regression coefficients of the final model in order to correct for overfitting (see Table 4). The optimism-corrected AUC of the internally validated model was 0.74 for predominant pain and 0.69 for leg pain.
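Shrinkage acts on the regression coefficients, that is, on the logarithm of the odds ratios. The sketch below applies the reported shrinkage factor of 0.84 to two of the odds ratios reported for the predominant pain model. It is an illustration under assumptions: the function name is hypothetical, the results are not taken from Table 4, and in practice the intercept is usually re-estimated after shrinkage, which is not shown here.

```python
import math


def shrink_odds_ratio(odds_ratio, shrinkage):
    """Multiply the log odds ratio (regression coefficient) by the shrinkage factor."""
    coefficient = math.log(odds_ratio)
    return math.exp(shrinkage * coefficient)


SHRINKAGE_PREDOMINANT_PAIN = 0.84  # as reported for the predominant pain model

# Odds ratios as reported in the text for smoking and ASA score.
smoking_shrunk = shrink_odds_ratio(0.41, SHRINKAGE_PREDOMINANT_PAIN)  # roughly 0.47
asa_shrunk = shrink_odds_ratio(4.82, SHRINKAGE_PREDOMINANT_PAIN)      # roughly 3.75
```

Both shrunken odds ratios move toward 1, which is the intended effect: predictions for new patients become less extreme than the derivation data alone would suggest.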

Table 4 Internally validated logistic prediction model for clinically relevant pain reduction 1 to 2 years after LSF

External validation

After exclusion of patients who had not completed any preoperative PROMs, 0.18% of the values were missing and these were imputed. Educational level was missing in the validation cohort and was therefore omitted from the prediction model. In the validation set, the prediction model achieved an AUC of 0.68 (95% CI = 0.66–0.69); that is, for a randomly chosen pair of patients of whom only one achieved a relevant pain reduction, the model assigned the higher predicted probability to that patient in 68% of pairs. For leg pain, the AUC in the validation set was 0.52 (95% CI = 0.44–0.59).

Development of the prediction tool

From the validated model for clinically relevant reduction in predominant pain, a nomogram was plotted (see Fig. 2). Patients score points per predictor variable, as visualized on the rulers. Explanation on how to use the nomogram and a practical example can be found in “Appendix 1.”

Fig. 2 Nomogram to predict the probability of a clinically relevant pain reduction for the individual patient. BMI Body Mass Index; ASA American Society of Anaesthesiologists; ODI Oswestry Disability Index; MCS Mental Component Scale

Sensitivity analysis

Primary diagnosis, as categorized in Table 2, was added as a predictor to the clinical prediction model, to assess whether variability in diagnosis within our population influenced the final prediction model. Primary diagnosis was excluded from the final prediction model after stepwise backward elimination.

Discussion

We developed and validated a tool to preoperatively predict a clinically relevant reduction in pain 1 to 2 years after LSF in an adequately powered analysis. A nomogram was developed from the externally validated model (for the primary outcome) for application in clinical practice. With an AUC of 0.68 in an external population, this prediction tool possesses fair discriminatory ability to predict a clinically relevant reduction in predominant pain. We also developed and externally validated a model for clinically relevant reduction in leg pain, which had an AUC of 0.52 and thus possesses low discriminatory ability. The clinical prediction tool for predominant pain could be implemented in clinical practice to improve shared decision making when considering LSF.

In agreement with our findings, previous studies reported that preoperative nonsmoking status [5, 7], better physical functioning [4, 6] and better mental health [4, 5] predict pain reduction 1 to 2 years after LSF. This strengthens the likelihood that the prediction tool developed in this study is able to predict pain reduction in other populations as well.

Surprisingly, our results showed that higher educational level indicated a lower probability of a clinically relevant pain reduction, whereas from the literature high socioeconomic status is usually associated with a better health condition, especially in patients with chronic low back pain [20, 21]. Educational systems in various countries are different, and definitions of high educational level can differ; therefore, further research is needed to verify this finding.

The performance of our nonvalidated prediction model for reduction in predominant pain was similar to that of Abbott et al. (0.74 vs. 0.72, respectively); the externally validated model of Kohr et al. performed better than ours (0.79 vs. 0.68, respectively) [7, 22]. However, they externally validated their model in a random sample from the same population in which it was built, which may explain the high performance. The model performance for reduction in leg pain was low (AUC = 0.52). Therefore, this model was not translated into a prediction tool. A possible explanation for the low AUC is that we excluded possibly important predictors too soon in the model development phase, leading to overfitting (the model fits the derivation data "too well") [18]. The added value of our study lies in the fact that we externally validated a model predicting a clinically relevant reduction in predominant pain in a European setting and translated it into a concrete tool for use in clinical practice (see Fig. 2).

Strengths and limitations

A strength of the study is that our model is derived from an academic hospital population and externally validated on a population from a general hospital. Surgical populations from academic and general hospitals usually differ in the complexity of the surgery. From our external validation, it is apparent that the model can predict a clinically relevant reduction in predominant pain in both academic (AUC = 0.74) and nonacademic settings (AUC = 0.68). However, this was not the case for leg pain, as the model did not perform well in the nonacademic setting (AUC = 0.52). Further external validation is necessary before the prediction tool can be applied in countries with different surgical populations and healthcare systems.

A limitation of this study is the amount of missing data in the derivation set used to develop the model (9.1%). This was probably caused by the fact that the data were collected retrospectively from standard care records. Consequently, multiple imputation was used to increase the power of our analysis. Secondly, in the general hospital, the variable “educational level” was missing [23]. We chose elimination of this predictor from the model rather than imputation, because the value of this predictor is considered untrustworthy without external validation. Finally, we acknowledge that the cutoff point for clinical relevance in our model, although based on literature, is arbitrary. Nevertheless, the primary outcome was defined as a clinically relevant reduction in predominant pain, as indications for elective LSF are due to both back and leg pain in our hospitals.

Future implications of the results

The validated prediction tool for estimating clinically relevant reduction in predominant pain can be used by clinicians as an aid to preoperatively inform individual patients about their expected outcomes. An example and explanation of the clinical application and decision making with the help of the nomogram can be found in “Appendix 1.” Secondly, adding new variables able to predict clinically relevant pain reduction could improve the performance of the prediction models. A variable that is overlooked in all previously mentioned models is preoperative physical performance. In other types of surgery, it has been shown that physical performance can improve predictive power [24, 25], which may also hold true for patients undergoing LSF. Thirdly, for patients who are less likely to achieve a clinically relevant pain reduction, care should be tailored to their specific needs in order to improve this probability. Using the nomogram, a surgeon can identify which modifiable risk factors contribute least to the expected pain reduction for the individual patient and can advise the patient to improve these risk factors before surgery.

Conclusion

Using the validated prediction tool (nomogram), a patient's probability of a clinically relevant pain reduction can be estimated 1 to 2 years after undergoing LSF. This validated prediction tool can be implemented in clinical practice to aid patients and care professionals in the difficult process of clinical decision making when considering LSF.