Development and validation of a prediction tool for pain reduction in adult patients undergoing elective lumbar spinal fusion: a multicentre cohort study

On average, 56% of patients report a clinically relevant reduction in pain after lumbar spinal fusion (LSF). Preoperatively identifying which patients will benefit from LSF is paramount to improve clinical decision making, expectation management and treatment selection. Therefore, this multicentre study aimed to develop and validate a clinical prediction tool for a clinically relevant reduction in pain 1 to 2 years after elective LSF. The outcome was defined as a clinically relevant reduction in predominant pain (worst reported pain in back or legs) 1 to 2 years after LSF. Patient-reported outcome measures and patient characteristics from 202 patients were used to develop a prediction model by logistic regression. Data from 251 patients were used to validate the model. Nonsmokers (odds ratio = 0.41 [95% confidence interval = 0.19–0.87]), with lower Body Mass Index (0.93 [0.85–1.01]), shorter pain duration (0.49 [0.20–1.19]), lower American Society of Anaesthesiologists score (4.82 [1.35–17.25]), higher Visual Analogue Scale score for predominant pain (1.05 [1.02–1.08]), lower Oswestry Disability Index (0.96 [0.93–1.00]) and higher RAND-36 mental component score (1.03 [0.10–1.06]) preoperatively had a higher chance of a clinically relevant reduction in predominant pain. The area under the curve of the externally validated model was 0.68. A nomogram was developed to aid clinical decision making. Using the developed nomogram, surgeons can estimate the probability of achieving a clinically relevant pain reduction 1 to 2 years after LSF and consequently inform patients on expected outcomes when considering treatment.


Introduction
The number of elective lumbar spinal fusions (LSFs) has increased 2.4-fold in the past decade [1], although postoperative pain reduction often remains unsatisfactory [2]. Some patients have a considerably lower probability of achieving a reduction in pain postoperatively [3]. To improve clinical shared decision making, expectation management, and patient selection, it is important to predict expected outcomes after LSF and act upon this information.
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s00586-020-06473-w) contains supplementary material, which is available to authorized users.
Prediction tools can estimate the probability of outcomes after LSF. Patients and surgeons can consult such tools to estimate the probability of an outcome, such as pain reduction, after LSF for a specific patient. Factors that predict postoperative pain reduction have been reported previously [4][5][6]. Patient characteristics such as age, smoking, American Society of Anaesthesiologists (ASA) score and preoperative patient-reported outcome measures (PROMs) on pain, mental health and health-related quality of life (HRQOL) are associated with postoperative pain reduction [4][5][6][7]. To the best of our knowledge, only one study externally validated a prediction tool that predicts pain reduction after LSF, which has been translated into an easily implementable tool in the USA [7]. However, due to substantial differences in healthcare systems, this tool probably cannot be applied to European countries. Moreover, potentially important predictors such as symptom duration and mental health were not incorporated in that model. For use in clinical practice, an externally validated and easily applicable prediction tool developed in a representative population is imperative [8].
Thus, the aim of this multicentre cohort study is to develop and validate a prediction tool to predict the probability of a clinically relevant reduction in pain 1 to 2 years after elective one- to three-level LSF.

Methods
From January 2011 until January 2015, baseline and 1- to 2-year postoperative questionnaires were collected from 202 patients undergoing elective LSF as part of routine care in the university hospital. In this cohort study, this derivation set was used to develop and internally validate the logistic regression model. The validation set was used for external validation of the model and contained baseline and 1- to 2-year postoperative data on 251 patients collected from July 2014 until November 2016 in the general hospital. This study was assessed by the local ethics committee and was considered not applicable to the Medical Research Involving Human Subjects Act (number: 16-4-262.1/ivb).

Population
Adult patients (≥ 18 years) eligible for elective one- to three-level LSF were included. Diagnosis and surgical procedure were verified from their medical records. Patients were included in the study if they were diagnosed with degenerative disc disease, spondylosis, spondylolysis/-listhesis, spinal stenosis, adjacent level disease, post-herniotomy, post-laminectomy or (recurrent) disc herniation. Revisions of a spinal fusion within 1 year of the previous surgery were excluded.

Data collection
Patients preoperatively and postoperatively completed questionnaires on the following: back and leg pain using the Visual Analogue Scale (VAS) [9], physical functioning using the Oswestry Disability Index (ODI) [10], HRQOL using the RAND-36 [11], mental health using the Pain Catastrophizing Scale (PCS) and Hospital Anxiety and Depression Scale (HADS) [12,13]. From the three VAS scores (back pain, right leg pain and left leg pain), the predominant (worst reported) pain score was used as a predictor. The RAND-36 resulted in a mental component score (RAND-36 MCS) and a physical component score (RAND-36 PCS). The HADS provided anxiety and depression subscores.
In the validation set, back and leg pain was measured using the 11-point Numeric Pain Rating Scale (NRS) instead of the VAS [9,14]. The NRS score was transformed to a 0-100 scale by multiplying all scores by ten, to match with the VAS scale in the derivation set.

Dependent variable
Pain relief is the main goal for most patients undergoing LSF [15]. Therefore, the primary outcome of the prediction tool was defined as a clinically relevant reduction in predominant pain in the back or (one of the) legs (worst reported pain in back or legs) as measured with the VAS at 1 to 2 years after surgery. The secondary outcome was defined as a clinically relevant reduction in leg pain at 1 to 2 years after surgery. The VAS for pain ranges from 0 to 100, with 0 indicating no pain and 100 indicating the most severe pain imaginable [9]. To make interpretation of the prediction tool more practical, the dependent variable was made binary: clinically relevant pain reduction or not. Minimal clinically important change (MCIC) for pain ranged between 0.28 and 2.88 on an 11-point scale in the literature on spinal surgery, and a reduction of 2.88 or more (28.8 on a 0-100 point scale) was a priori defined as a clinically relevant pain reduction to prevent overestimation [16].
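The outcome definition above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the source does not specify how the postoperative predominant pain score was selected, so the sketch simply takes a preoperative and a postoperative score as inputs.

```python
# Illustrative sketch only; function names are assumptions, not from the study.

MCIC_THRESHOLD = 28.8  # a priori cut-off on the 0-100 scale (2.88 on an 11-point scale)


def nrs_to_vas(nrs_score: float) -> float:
    """Rescale an 11-point NRS score (0-10) to the 0-100 VAS range."""
    return nrs_score * 10


def predominant_pain(back: float, right_leg: float, left_leg: float) -> float:
    """Return the worst reported pain among back, right leg and left leg."""
    return max(back, right_leg, left_leg)


def relevant_reduction(pre: float, post: float,
                       threshold: float = MCIC_THRESHOLD) -> bool:
    """Binary outcome: True if pain dropped by at least the threshold."""
    return (pre - post) >= threshold


# Example: a patient scoring 40 (back), 65 (right leg), 20 (left leg)
# preoperatively has a predominant pain score of 65.
baseline = predominant_pain(40, 65, 20)
print(baseline, relevant_reduction(baseline, 30))
```

Dichotomizing at the upper end of the reported MCIC range is the conservative choice described in the text: a patient must improve by at least 28.8 points to count as a responder.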

Statistics
Analyses were performed using SPSS (version 24, SPSS Inc., Chicago, IL, USA) and R (version 3.3.2; https://www.r-project.org). In the case of incomplete variables within a case, multiple imputation of missing values was used [17].
The independent samples t test for normally distributed variables or the Mann-Whitney U test for nonnormally distributed variables was used to analyse differences in baseline and outcome variables between subgroups within and between cohorts.
Multivariable logistic regression was used to develop the prediction model. Stepwise backward elimination was used to eliminate nonsignificant predictor variables from the logistic regression model. To prevent premature deletion of predictor variables, a more liberal exclusion criterion was used (alpha = 0.157) [18].
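The text does not explain where alpha = 0.157 comes from. A common rationale, offered here as background rather than as the authors' stated reasoning, is that for a predictor with one degree of freedom this is the p-value at which the likelihood ratio statistic equals 2, the penalty per parameter in Akaike's information criterion. The short check below computes that tail probability; the helper name is an assumption.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# A likelihood ratio statistic of 2 corresponds to the AIC penalty
# for adding one parameter to the model.
alpha_aic = chi2_sf_1df(2.0)
print(round(alpha_aic, 3))  # 0.157
```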
Discriminatory capacity of the prediction model was quantified by the area under the receiver operating characteristic curve (AUC). The discriminative capacity is perfect when the AUC is 1.0; there is no discriminative capacity when the AUC is 0.5, which is equivalent to a coin flip. The logistic regression model was internally validated using standard bootstrapping techniques. As a result, a shrinkage factor was computed, which was used to penalize the regression coefficients of the logistic regression model. The internally validated model was applied to the validation set, for which a new AUC was calculated to evaluate its performance in the population of the second hospital. A nomogram was developed from the validated logistic regression model.
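The AUC has a direct interpretation: it is the proportion of all (responder, non-responder) pairs in which the responder received the higher predicted probability, with ties counted as half. The sketch below implements that pairwise definition in plain Python. It is an illustration of the metric, not the routine used in the study, and the function name is an assumption.

```python
def auc(probabilities, outcomes):
    """Concordance-based AUC for a binary outcome (1 = event, 0 = no event)."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("AUC needs at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0   # pair ranked correctly
            elif p_event == p_non_event:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Three of the four event/non-event pairs are ranked correctly.
print(auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; rank-based formulas give the same value more efficiently.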

Power analysis
As a general rule, ten events per predictor variable are necessary to find associations in logistic regression models [19]. The percentage of patients undergoing LSF achieving MCIC in pain on average was 56% [3]. A prediction model with 11 predictors could be developed based on a sample size of 197 patients (202 patients were available in the derivation set). Eleven independent variables were selected based on clinical relevance by literature review [4][5][6][7] and by expert opinion of five experienced spine surgeons. Selected variables include the following: sex, BMI, pain duration, smoking status, educational level, employment status, ASA score, VAS, ODI, PCS and RAND-36 [4][5][6][7].
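The sample size arithmetic above can be reproduced as follows. The sketch applies the events-per-variable rule exactly as the text describes it, using the 56% responder rate as the event rate; the function names are illustrative assumptions.

```python
import math


def required_sample_size(n_predictors: int, event_rate: float,
                         events_per_variable: int = 10) -> int:
    """Patients needed so that the expected number of events reaches
    events_per_variable for each candidate predictor."""
    events_needed = n_predictors * events_per_variable
    return math.ceil(events_needed / event_rate)


def max_predictors(n_patients: int, event_rate: float,
                   events_per_variable: int = 10) -> int:
    """Largest number of predictors supported by a given sample."""
    return math.floor(n_patients * event_rate / events_per_variable)


print(required_sample_size(11, 0.56))  # 197
print(max_predictors(202, 0.56))       # 11
```

With 11 predictors, 110 events are needed; at an event rate of 56% this requires 110 / 0.56 ≈ 196.4, rounded up to 197 patients, which the derivation set of 202 patients satisfies.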

Population characteristics
The derivation set consisted of 202 patients who were found eligible for analysis (see Fig. 1). Baseline characteristics are shown in Table 1. The mean reduction in predominant pain was 33/100 points (SD = 31.3); for leg pain, it was 35/100 (SD = 35.5).
The validation set consisted of 251 patients (see Table 1). The validation set differed from the derivation set in terms of the mean preoperative predominant pain score (P = 0.001), RAND-36 MCS (P = 0.047) and reduction in predominant pain (P = 0.044). The mean reduction in predominant pain in the validation set was 27/100 points (SD = 29.4); for leg pain, this was 31/100 (SD = 34.6). No significant differences in terms of predominant pain reduction were found between categories of surgery type, primary diagnosis or number of levels fused (see Table 2).

Development of the prediction model
In total, 9.1% of values were missing in the derivation set; these values were imputed using 20 imputations.
The model for leg pain consisted of four independent predictors after stepwise backward elimination: smoking,

Internal validation
The bootstrap validation yielded a shrinkage factor of 0.84 for predominant pain and 0.88 for leg pain, by which the regression coefficients of the final model were multiplied in order to correct for overfitting (see Table 4). The optimism-corrected AUC of the internally validated model was 0.74 for predominant pain and 0.69 for leg pain.
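How a shrinkage factor acts on a logistic model can be sketched as below. The coefficients and patient values are hypothetical placeholders chosen for illustration; they are not the values in Table 4. The intercept is passed in separately because it is usually re-estimated after shrinkage, a step this sketch does not perform.

```python
import math


def shrink_coefficients(coefficients: dict, shrinkage: float) -> dict:
    """Multiply every regression coefficient by the shrinkage factor."""
    return {name: beta * shrinkage for name, beta in coefficients.items()}


def predicted_probability(intercept: float, coefficients: dict,
                          patient: dict) -> float:
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    linear_predictor = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1 / (1 + math.exp(-linear_predictor))


# Hypothetical coefficients (log odds ratios), NOT taken from the study.
raw = {"smoker": -0.9, "vas_predominant": 0.05}
shrunk = shrink_coefficients(raw, 0.84)  # 0.84: factor reported for predominant pain

patient = {"smoker": 1, "vas_predominant": 70}
print(predicted_probability(0.0, raw, patient))
print(predicted_probability(0.0, shrunk, patient))
```

Shrinking pulls every coefficient toward zero, so predictions become less extreme than those of the unshrunk model, which is the intended correction for optimism in the derivation set.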

External validation
After exclusion of patients who had not completed any preoperative PROMs, 0.18% of the values were missing and these were imputed. Educational level was missing in the validation cohort and was therefore omitted from the prediction model. In the validation set, the prediction model was able to discriminate between achieving relevant pain reduction or not in 68% of the cases, meaning that

Development of the prediction tool
From the validated model for clinically relevant reduction in predominant pain, a nomogram was plotted (see Fig. 2). Patients score points per predictor variable, as visualized on the rulers. An explanation of how to use the nomogram and a practical example can be found in "Appendix 1."

Sensitivity analysis
Primary diagnosis, as categorized in Table 2, was added as a predictor to the clinical prediction model, to assess whether variability in diagnosis within our population influenced the final prediction model. Primary diagnosis was excluded from the final prediction model after stepwise backward elimination.

Discussion
We developed and validated a tool to preoperatively predict a clinically relevant reduction in pain 1 to 2 years after LSF in an adequately powered analysis. A nomogram was developed from the externally validated model (for the primary outcome) for application in clinical practice. With an AUC of 0.68 in an external population, this prediction tool possesses fair discriminatory ability to predict a clinically relevant reduction in predominant pain. We also developed and externally validated a model for clinically relevant reduction in leg pain, which had an AUC of 0.52 and thus possesses low discriminatory ability. The clinical prediction tool for predominant pain could be implemented in clinical practice to improve shared decision making when considering LSF.
In agreement with our findings, previous studies reported that preoperative nonsmoking status [5,7], better physical functioning [4,6] and better mental health [4,5] predict pain reduction 1 to 2 years after LSF. This strengthens the likelihood that the prediction tool developed in this study is able to predict pain reduction in other populations as well.
Surprisingly, our results showed that higher educational level indicated a lower probability of a clinically relevant pain reduction, whereas from the literature high socioeconomic status is usually associated with a better health condition, especially in patients with chronic low back pain [20,21]. Educational systems in various countries are different,  [7,22]. However, they externally validated their model in a random sample from the same population it was built in, explaining the high performance. The model performance for reduction in leg pain was low (AUC = 0.52). Therefore, this model was not translated into a prediction tool. A possible explanation for the low AUC is that we excluded possibly important predictors too soon in the model development phase, leading to overfitting (the model fits the data "too well") of the model to the derivation set [18]. The added value of our study lies in the fact that we externally validated a model predicting a clinically relevant reduction in predominant pain in a European setting and translated it into a concrete tool for use in clinical practice (see Fig. 2).

Strengths and limitations
A strength of the study is that our model is derived from an academic hospital population and externally validated on a population from a general hospital. Usually, surgical populations from an academic hospital and a general hospital differ in the complexity of the surgery. From our external validation, it is apparent that the model can predict a clinically relevant reduction in predominant pain in both academic (AUC = 0.74) and nonacademic settings (AUC = 0.68). However, for leg pain, this was not the case as it did not perform well in the nonacademic setting (AUC = 0.52). Further external validation of the prediction tool is necessary for applicability of the prediction tool to countries with different surgical populations and healthcare systems.
A limitation of this study is the amount of missing data in the derivation set used to develop the model (9.1%). This was probably caused by the fact that the data were collected retrospectively from standard care records. Consequently, multiple imputation was used to increase the power of our analysis. Secondly, in the general hospital, the variable "educational level" was missing [23]. We chose elimination of this predictor from the model rather than imputation, because the value of this predictor is considered untrustworthy without external validation. Finally, we acknowledge that the cutoff point for clinical relevance in our model, although based on literature, is arbitrary. Nevertheless, the primary outcome was defined as a clinically relevant reduction in predominant pain, as indications for elective LSF are due to both back and leg pain in our hospitals.

Future implications of the results
The validated prediction tool for estimating clinically relevant reduction in predominant pain can be used by clinicians as an aid to preoperatively inform individual patients about their expected outcomes. An example and explanation of the clinical application and decision making with the help of the nomogram can be found in "Appendix 1." Secondly, adding new variables able to predict clinically relevant pain reduction could improve the performance of the prediction models. A variable that is overlooked in all previously mentioned models is preoperative physical performance. In other types of surgery, it has been shown that physical performance can improve predictive power [24,25], which may also hold true for patients undergoing LSF. Thirdly, for patients who are less likely to achieve a clinically relevant pain reduction, care should be tailored to their specific needs in order to improve this probability. Using the nomogram, a surgeon can identify which modifiable risk factors contribute least to the expected pain reduction for the individual patient and can inform the patient to improve these risk factors before surgery.

Conclusion
Using the validated prediction tool (nomogram), a patient's probability of a clinically relevant pain reduction can be estimated 1 to 2 years after undergoing LSF. This validated prediction tool can be implemented in clinical practice to aid patients and care professionals in the difficult process of clinical decision making when considering LSF.
Funding No funding.

Compliance with ethical standards
Conflict of interest All authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.