To evaluate the predictive precision of the Dialogue Support, a tool for additional help in shared decision-making before surgery of the degenerative spine.
Data in Swespine (the Swedish national quality registry) on patients operated between 2007 and 2019 formed the basis for the development of prediction algorithms based on logistic regression analyses, in which socio-demographic and baseline variables were included. The algorithms were tested in four diagnostic groups: lumbar disc herniation, lumbar spinal stenosis, degenerative disc disease and cervical radiculopathy. By random selection, 80% of the study population was used to develop the prediction of outcome, which was then tested against the actual outcome of the remaining 20%. Outcome measures were global assessment of pain (GA) and satisfaction with outcome.
Calibration plots demonstrated a high degree of concordance on a group level. On an individual level, ROC curves showed moderate predictive capacity with AUC (area under the curve) values 0.67–0.68 for global assessment and 0.6–0.67 for satisfaction.
The Dialogue Support can serve as an aid to both patient and surgeon when discussing and deciding on surgical treatment of degenerative conditions in the lumbar and cervical spine.
Level of evidence
The most important question for the patient to have answered before surgery is: “how much better will I be?” This is often a difficult question for the surgeon to answer reliably, as patient-reported outcome after surgery for degenerative spinal conditions demonstrates major heterogeneity. At follow-up in Swespine, between 10 and 40% of the patients, depending on the preoperative diagnosis, may still suffer from some spine-related disability and pain.
With the development of national quality registers for spine surgery such as the Swedish “Swespine”, surgeons and patients have access to aggregated outcome data, serving as a rough indication of what an individual patient may achieve after surgery. This is, for example, demonstrated in three Nordic cooperation studies where data from Swespine, NorSpine (Norway) and Danespine (Denmark) are presented [2,3,4].
It is obvious, however, that translating these data, which are presented on a group level, to individualized assessment of surgical success, may be difficult. This is because several socio-demographic characteristics and other baseline variables, not always known, will modify the outcome.
In 2013, using Swespine data, an analytic project was initiated, together with Region Stockholm (https://www.sll.se/om-regionstockholm/Information-in-English1/). The initial aim was to present case-mix-adjusted outcome data publicly for a fair comparison and benchmarking of individual spine centres; https://vardenisiffror.se/jamfor/kallsystem (only in Swedish). This led to the development of a tool for prediction of individual outcome after surgery for lumbar and cervical degenerative conditions. In 2017, we could present the Dialogue Support for members of the Swedish Society of Spinal Surgeons, and it was later made publicly available; http://www.4s.nu/4s-f%C3%B6rening/dialogst%C3%B6d-44852774 (only in Swedish).
The support is an interactive web-based instrument to be used in shared decision-making with the patient when discussing surgery for different spinal disorders. After being translated into English and discussed with the Eurospine board, it was made publicly available on the Eurospine home page in October 2020; https://app.molnify.com/app/7wqw6owgrznr76bkaqc6l4bs7q.
In Fig. 1a and b, two examples of prediction of outcome using the Dialogue Support are demonstrated. To the left on the screen picture, the patient’s individual values of predictor variables are recorded. The right side demonstrates the predicted outcomes for that patient based on the patient’s characteristics combined with algorithms describing the relationship between patient characteristics and outcomes based on large amounts of historical data. The pie chart shows predicted probabilities for the five alternatives of pain change and in the banner at the top of the screen, the predicted pain change and satisfaction with outcome are dichotomously summarized into a percentage of success and satisfaction (for description of outcome variables see Methods). The tool can be used by the reader using the following link to the Eurospine Home page: http://www.eurospine.org
Patient-centred outcome prediction is a growing focus in spine research, producing several reports annually. The majority discuss prediction in terms of Patient-Reported Outcome Measures (PROMs) [5,6,7,8,9,10,11,12,13,14,15]; a few deal with adverse events [16, 17], length of stay, revision surgery or return to work. Among the PROM analyses, the outcome measure is usually dichotomized. The predictive modelling differs, but the most frequently used approach is based on multivariate logistic regression algorithms [5, 6, 9, 11, 13, 14]. In recent years, machine learning has gained interest as an alternative [8, 12, 21, 22]. Further details of available reports related to degenerative spine surgery are presented in Table 1.
The aim of the current study is to evaluate the predictive precision of the Dialogue Support.
The dialogue support, www.eurospine.org
The Dialogue Support predicts outcome 1 year after surgery for patients with selected spinal disorders. The underlying prediction models have been trained on a sizable body of data collected throughout Sweden during a 10-year period and are updated every year. The underlying data thus always include outcomes no more than 1 year old.
The prediction is presented as the proportion of a specific patient group achieving a certain outcome after surgery, as reported in the PROMs answered after 1 year, here global assessment (“Totally pain free / Much better / Somewhat better / Unchanged / Worse”) and satisfaction. Each prediction algorithm (one per diagnostic group) is based on approximately 2000–12,000 individuals, depending on the diagnosis and baseline profile. The included diagnosis groups are lumbar disc herniation (LDH), lumbar spinal stenosis (LSS), lumbar degenerative disc disease (DDD) and cervical radiculopathy (CR), which is caused by either disc herniation or foraminal stenosis.
Swespine in its current form was started in 1998 and to date includes approximately 155 000 operated patients (i.e. index procedures used for predictive evaluation) with degenerative conditions in the lumbar and cervico-thoracic spine. National coverage is 95%, completeness 85% and 1-year follow-up over 70%.
Participants in the current study
Patients with their index surgery between 2007-01-01 and 2019-04-01, and who had one-year follow-up, were included in the analyses, resulting in a total of 87 494 patients: 23 087 with LDH, 51 390 with LSS, 5 872 with DDD and 7 154 with CR. All patients have given consent prior to registration in Swespine, including information that their data will be used in clinical studies and that they can withdraw their individual data from the registry at any time. The evaluation procedure follows the TRIPOD recommendations (Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis).
All analyses were performed separately for the four different subgroups. Patients with missing data on the outcome variable or any of the explanatory variables were excluded. No imputations for missing data were performed.
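The complete-case rule described above can be illustrated with a minimal sketch. The field names and example records are hypothetical and chosen only for illustration; they are not the actual Swespine variables.

```python
# Hypothetical field names; the real Swespine variables are not listed here.
REQUIRED_FIELDS = ["ga", "age", "smoker", "pain_duration"]


def complete_cases(records: list[dict], fields: list[str]) -> list[dict]:
    """Keep only records where every required field is present (no imputation)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Made-up example records.
records = [
    {"ga": 2, "age": 54, "smoker": False, "pain_duration": "3-12 months"},
    {"ga": None, "age": 61, "smoker": True, "pain_duration": ">2 years"},  # outcome missing
    {"ga": 4, "age": 47, "smoker": None, "pain_duration": "<3 months"},    # predictor missing
]

analysed = complete_cases(records, REQUIRED_FIELDS)
```

In this sketch only the first record would enter the analysis; the other two are dropped rather than imputed.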
The two outcome measures used were:
A. Global Assessment (GA)
is an ordinal Likert-type scale with six response alternatives: “How is your back/leg pain today as compared to before the surgery?”, where 0 represents no back/leg pain before the surgery, 1 = completely pain free, 2 = much better, 3 = somewhat better, 4 = unchanged, 5 = worse. Patients responding with the 0-alternative are excluded from the analyses. For the cervical spine, the question relates to neck/arm pain. Leg pain is the outcome in the LSS and LDH groups, back pain in the DDD group and arm pain in the CR group. In the summarized presentation, GA was dichotomized into success (response alternatives 1 and 2) and failure (response alternatives 3–5).
B. Satisfaction (SAT)
with treatment outcome, an ordinal Likert scale with three response alternatives (satisfied, hesitant and dissatisfied). In the analysis this variable was dichotomized to satisfied and not satisfied (hesitant or dissatisfied).
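The dichotomization of the two outcome measures can be written out as a short sketch. The function names and the string labels for satisfaction are assumptions made for illustration; the coding of GA (0 to 5) follows the description above.

```python
from typing import Optional

GA_SUCCESS = {1, 2}     # completely pain free, much better
GA_FAILURE = {3, 4, 5}  # somewhat better, unchanged, worse


def dichotomize_ga(response: int) -> Optional[bool]:
    """True = success, False = failure, None = excluded (0: no pain before surgery)."""
    if response == 0:
        return None
    if response in GA_SUCCESS:
        return True
    if response in GA_FAILURE:
        return False
    raise ValueError(f"unexpected GA response: {response}")


def dichotomize_sat(response: str) -> bool:
    """Only 'satisfied' counts as satisfied; 'hesitant' and 'dissatisfied' do not."""
    if response not in {"satisfied", "hesitant", "dissatisfied"}:
        raise ValueError(f"unexpected satisfaction response: {response}")
    return response == "satisfied"
```

Note that “somewhat better” is counted as failure under this definition, which makes the success criterion comparatively strict.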
A large number of socio-demographic and clinical baseline factors deemed relevant for predicting outcome were evaluated for inclusion in the models. Eligible predictors were all demographic and baseline data in Swespine. Link (in Swedish); http://www.4s.nu/swespine-formul%C3%A4r-44871294.
Development of algorithms underlying the prediction model was carried out in three steps.
Model development and variable selection
For the analysis of GA, an ordered probit model was estimated to account for the five levels in the outcome variable. The analysis of satisfaction was based on a logistic regression model. Backward variable selection based on Akaike information criterion was employed to select variables for inclusion in the final model. Model selection was performed out of sample by randomly splitting the data into a training data set (80% of sample) and a test data set (20% of sample).
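How an ordered probit model turns a patient's linear predictor into probabilities for the five GA levels can be sketched as follows. This is the textbook formulation, not the authors' fitted model: the cutpoints and the linear predictor value are made-up numbers, and the direction of the coding (higher latent value meaning worse outcome) is an assumption.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(eta: float, cutpoints: list[float]) -> list[float]:
    """Category probabilities P(Y = k) for an ordered probit model.

    P(Y <= k) = Phi(c_k - eta) for increasing cutpoints c_1 < ... < c_{K-1};
    each category probability is the difference of adjacent cumulative values.
    """
    cumulative = [normal_cdf(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Made-up values for illustration only (not estimates from Swespine).
cutpoints = [-1.0, 0.2, 1.0, 1.8]  # four cutpoints separate five GA levels
eta = 0.1                          # linear predictor for one hypothetical patient

probs = ordered_probit_probs(eta, cutpoints)  # pain free, much better, ..., worse
success = probs[0] + probs[1]                 # dichotomized summary: levels 1 and 2
```

The five probabilities correspond to the segments of the pie chart, and their first two terms add up to the summarized probability of success.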
To evaluate the accuracy of the model, calibration plots and receiver operating characteristic (ROC) curves were used. In the calibration plots, patients were divided into 20 equally sized groups based on the predicted value for their outcome, and the average actual outcome within each group was then calculated. Diagnostic ability was evaluated using ROC curves, where the ability to classify patients into the correct group is assessed when varying the discrimination threshold. In the ROC plots, the dotted line indicates a model that predicts no better than chance, described by an area under the curve (AUC/c-statistic) of 0.5. A line extending to the upper left corner indicates perfect separation with an AUC/c-statistic of 1.0. In these analyses, GA was dichotomized into success (response alternatives 1, 2) and failure (response alternatives 3, 4, 5).
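The AUC can be understood as the probability that a randomly chosen successful case received a higher predicted value than a randomly chosen failure. A minimal sketch of that pairwise definition follows; it is intended to illustrate the measure, not to reproduce the software used in the study, and the example scores are made up.

```python
def auc(scores: list[float], labels: list[bool]) -> float:
    """Share of (success, failure) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [True, True, False, False]

perfect = auc([0.9, 0.8, 0.3, 0.2], labels)  # every success ranked above every failure
chance = auc([0.5, 0.5, 0.5, 0.5], labels)   # identical scores carry no information
partial = auc([0.9, 0.4, 0.6, 0.2], labels)  # three of four pairs ranked correctly
```

In this toy example the three calls give 1.0, 0.5 and 0.75, matching the two reference values described above and one intermediate case.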
Model estimation on entire sample
Once variable selection had been performed in step 1 and validated in step 2, the final models were re-estimated on the entire data set, i.e. both the training data and the test data, to get the largest possible number of observations for the final model parameter estimation.
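The random 80/20 split and the later pooling of both parts can be sketched as below. The seed, the function name and the integer stand-ins for patient records are assumptions for illustration; the source does not report how the random split was implemented.

```python
import random


def split_train_test(records: list, train_fraction: float = 0.8, seed: int = 0):
    """Randomly split records into a training part and a test part."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    shuffled = list(records)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


patients = list(range(1000))  # stand-in identifiers for complete-case patients
train, test = split_train_test(patients)

# Steps 1 and 2 use `train` for selection and `test` for validation;
# step 3 re-estimates the chosen model on both parts together.
full_sample = train + test
```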
A separate model is estimated for each of the 4 diagnoses of patients and for each of the 2 outcomes, resulting in 8 ROC and 8 calibration plots.
Data management and statistical analysis were conducted using R Statistical Software version 4.0.0.
Description of the study population
The eligible study population was reduced because of dropouts at follow-up and missing data on predictor variables, as shown in Table 2. In general, the differences in baseline data between the diagnostic groups were small or moderate. The DDD group had the longest duration of pain and the highest frequency of earlier spine surgery (Table 3).
Assessment of predictive ability
The calibration plots demonstrate how well predicted probabilities agree with actual outcome for subgroups with different case mix. Observations were ranked according to the predicted value and grouped in 20 categories in the lumbar diagnostic groups and in 5 categories in the cervical group. The proportion with actual successful outcome (y-axis) and the predicted value (x-axis) were calculated for each category and plotted against each other. The solid line represents perfect calibration and the dotted line represents the actual results. The concordance between prediction and actual outcome, measured with GA for success and with satisfaction, was high on a group level, with small differences between diagnostic groups. Satisfaction in the DDD group was least concordant. The findings are demonstrated in Fig. 2.
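The grouping behind a calibration plot can be sketched as follows: observations are ranked by predicted value, cut into equally sized groups, and the mean prediction is paired with the observed share of successes in each group. The function name and the toy data are assumptions for illustration.

```python
def calibration_groups(predicted: list[float], observed: list[bool], n_groups: int = 20):
    """Return (mean predicted value, observed share of success) per ranked group."""
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    if n_groups < 1 or n < n_groups:
        raise ValueError("need at least one observation per group")
    groups = []
    for g in range(n_groups):
        # Integer arithmetic gives group sizes that differ by at most one.
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        share_success = sum(1 for _, y in chunk if y) / len(chunk)
        groups.append((mean_pred, share_success))
    return groups


# Made-up data: eight patients, two groups (the study used 20, or 5 for CR).
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [False, False, False, True, False, True, True, True]

groups = calibration_groups(predicted, observed, n_groups=2)
```

Each tuple is one point in the plot, with the x-value first; points close to the diagonal indicate good calibration on a group level, as in this toy example.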
The ROC curve demonstrates the ability of the model to separate successful cases and failures. As shown in Fig. 3, the ability of the prediction models to discriminate between successes and failures on individual level is fair, with AUC ranging from 0.67 to 0.68. There were slight differences in model fit between the diagnostic groups. For satisfaction, the AUC value varied more between diagnostic groups, from 0.6 for DDD to 0.67 for CR.
Model estimation on entire sample
Table 4 presents the effect of each predictor on the two outcomes for the four different patient groups. Indicators of lower socio-economic status, such as smoking, disability pension and unemployment, were consistently associated with lower satisfaction and less pain improvement. Previous spine surgery was a negative predictor for all diagnostic groups. Short duration of back/neck pain was associated with more pain improvement and higher satisfaction in all diagnostic groups. Short duration of leg/arm pain was associated with more pain improvement and higher satisfaction in the LDH, LSS and CR groups. Age and gender were of minor importance, as was ODI, whereas a higher quality of life (EQ-5D) at baseline predicted higher satisfaction at follow-up for all but the CR group.
The Dialogue Support, based on the national Swedish quality register “Swespine”, presents the predicted outcome after surgery for degenerative spinal disorders. The outcome is presented as a percentage of outcome according to global assessment of pain (GA) and satisfaction with outcome based on the patient’s characteristics combined with algorithms describing the relationship between patient characteristics and outcomes based on large amounts of historical data.
The calibration plots of subgroups demonstrate a high degree of concordance, with minor differences between diagnostic groups. The message to the patient can be expressed as follows: in the group of earlier operated individuals with a similar baseline profile as you, a certain percentage reported after 1 year “complete relief of pain”, another percentage reported “much better”, a third percentage “somewhat better”, etc. This is visualized in the pie diagram.
On an individual level, as estimated with ROC curves, the precision of the predictive model was fair. For global assessment of pain, the AUC value ranged from 0.67 to 0.68. For satisfaction, it ranged from 0.60 (DDD) to 0.67 (CR). Other reports with PROMs as outcome measure and logistic regression as analytic method describe AUC values and c-indexes ranging from 0.64 to 0.79, mostly tested on single-centre cohorts [5, 6, 9, 11, 14].
In recent years, new computer-based analytic methods, often called “machine learning” and “deep learning”, have been proposed as possibly more powerful methods of data acquisition and analysis. The suggested advantage of these techniques has not yet been established. Reported AUC values or c-indexes with these analytic methods range from 0.59 to 0.90 [8, 12, 21, 22]. The more critical aspect of outcome prediction, and of the possibilities of improving precision, is probably the addition of more predictor variables. This appears to be more important than focusing on analytic techniques, although different analytic models may change the predictive potential of a particular data set, as has been demonstrated in a hip and knee arthroplasty cohort.
The limitation of the current model as such is related to the baseline and socio-demographic variables at hand. Evaluation and development of the Dialogue Support is a continuous process. Changes in surgical techniques, processes, training, support, equipment and medication may affect outcome. This is accounted for by yearly updating of the database. The yearly updating of the database and the introduction of new baseline variables are also expected to increase the precision of the model on the individual level over time. Dropouts may tend to have worse outcomes than responders, also in registers, so there is a possible risk of overestimating success in the model. However, this is still an unresolved question.
A possible selection bias could be caused by the proportion of patients not having their index surgery recorded. However, patients opt out of Swespine only in exceptional cases. The major cause of loss to registration is deficient routines in some of the participating centres, not dependent on individual patients. In our interpretation, this does not cause any bias in the national register. A second possible cause of selection bias would be loss to follow-up. This has been assessed in a recent publication.
When it comes to application of the Dialogue Support in other countries, the Swedish reference database can be a limitation to generalizability. Spine patients in different countries have different cultural and social conditions, which may affect the predictive ability of the model. Interpretation of predictions should be done with this in mind, until validation tests have been performed. Ideally the model should be tested on a national, or other large, database in the country in question.
The strength of the Dialogue Support is related to the large reference population in the national Swespine Registry, which has a coverage of 95%, a completeness of 85% and a one-year follow-up of more than 70%. Thus, the data on which the prediction model is based represent the “degenerative spine population” in Sweden over the last ten years well. Predictions can thus be generalized to the entire nation and applied to all spine centres. Dropouts and missing data may impose a limitation on the generalizability, which we hope to reduce with increasing web-based registration.
In the aggregated perspective, the Dialogue Support, acting as “one piece of the puzzle”, can support the clinician’s clinical experience and judgement. In popular terms, using the Dialogue Support, it is possible to describe outcome as a certain probability of a successful and satisfactory result after a proposed surgical intervention.
Thus, the Dialogue Support offers both patient and surgeon the opportunity to contemplate and discuss the probability of benefit and risk based on more substantial evidence than the experience and conjecture of an individual surgeon. It is also an interesting opportunity to start international research cooperation based on the Dialogue Support.
Our ambition is to validate these models further once new data become available (data are continuously collected in Swespine, approximately 10 000 patients per year), and we can also foresee validating the models in other countries where similar (albeit not identical) data are collected.
The Dialogue Support is a useful prediction tool with an accuracy which is high on a group level and moderate on an individual level. It can serve as an aid to both patient and surgeon when discussing a surgical treatment of degenerative conditions in the lumbar and cervical spine.
Individual data are not publicly available, but data on a group level are available through yearly reports by the steering committee of Swespine (www.4s.se).
Fritzell P, Hagg O, Gerdhem P, Abbott A, Parai C, Thoreson O, Stromqvist B, Mellgren L, Blom C (2018) Swespine 25 years. 2018 annual report follow up of spine surgery performed in sweden in 2017. ISBN: 978-91-983912-3-7
Lagerbäck T, Fritzell P, Hägg O, Nordvall D, Lønne G, Solberg TK, Andersen MØ, Eiskjær S, Gehrchen M, Jacobs WC, van Hooff ML, Gerdhem P (2019) Effectiveness of surgery for sciatica with disc herniation is not substantially affected by differences in surgical incidences among three countries: results from the Danish, Swedish and Norwegian spine registries. Eur Spine J 11:2562–2571
Lønne G, Fritzell P, Hägg O, Nordvall D, Gerdhem P, Lagerbäck T, Andersen M, Eiskjaer S, Gehrchen M, Jacobs W, van Hooff ML, Solberg TK (2019) Lumbar spinal stenosis: comparison of surgical practice variation and clinical outcome in three national spine registries. Spine J 1:41–49
Andersen MØ, Fritzell P, Eiskjaer S, Lagerbäck T, Hägg O, Nordvall D, Lönne G, Solberg T, Jacobs W, van Hooff M, Gerdhem P, Gehrchen M (2019) Surgical treatment of degenerative disk disease in three scandinavian countries: an international register study based on three merged national spine registers. Glob Spine J 9(8):850–858
Hegarty D, Shorten G (2012) Multivariate prognostic modeling of persistent pain following lumbar discectomy. Pain Physician 15(5):421–434
Janssen ERC, Punt IM, van Kuijk SMJ, Hoebink EA, van Meeteren NLU, Willems PC (2020) Development and validation of a prediction tool for pain reduction in adult patients undergoing elective lumbar spinal fusion: a multicentre cohort study. Eur Spine J 29(8):1909–1916
Karhade AV, Fogel HA, Cha TD, Hershman SH, Doorly TP, Kang JD, Bono CM, Harris MB, Schwab JH, Tobert DG (2021) Development of prediction models for clinically meaningful improvement in PROMIS scores after lumbar decompression. Spine J 21(3):397–404
Khan O, Badhiwala JH, Witiw CD, Wilson JR, Fehlings MG (2020) Machine learning algorithms for prediction of health-related quality-of-life after surgery for mild degenerative cervical myelopathy. Spine J 8: S1529–9430(20)30047–4
Khor S, Lavallee D, Cizik AM, Bellabarba C, Chapman JR, Howe CR, Lu D, Mohit AA, Oskouian RJ, Roh JR, Shonnard N, Dagal A, Flum DR (2018) Development and validation of a prediction model for pain and functional outcomes after lumbar spine Surgery. JAMA Surg 153(7):634–642
McGirt MJ, Sivaganesan A, Asher AL, Devin CJ (2015) Prediction model for outcome after low-back surgery: individualized likelihood of complication, hospital readmission, return to work, and 12-month improvement in functional disability. Neurosurg Focus 39(6):E13
McGirt MJ, Bydon M, Archer KR, Devin CJ, Chotai S, Parker SL, Nian H, Harrell FE Jr, Speroff T, Dittus RS, Philips SE, Shaffrey CI, Foley KT, Asher AL (2017) An analysis from the quality outcomes database, Part 1. Disability, quality of life, and pain outcomes following lumbar spine surgery: predicting likely individual patient outcomes for shared decision-making. J Neurosurg Spine 27(4):357–369
Merali ZG, Witiw CD, Badhiwala JH, Wilson JR, Fehlings MG (2019) Using a machine learning approach to predict outcome after surgery for degenerative cervical myelopathy. PLoS ONE 14(4):e0215133
Quddusi A, Eversdijk HAJ, Klukowska AM, de Wispelaere MP, Kernbach JM, Schröder ML, Staartjes VE (2020) External validation of a prediction model for pain and functional outcome after elective lumbar spinal fusion. Eur Spine J 29(2):374–383
Rundell SD, Pennings JS, Nian H, Harrell FE Jr, Khan I, Bydon M, Asher AL, Devin CL, Archer KR (2020) Adding 3-month patient data improves prognostic models of 12-month disability, pain, and satisfaction after specific lumbar spine surgical procedures: development and validation of a prediction model. Spine J 20(4):600–613
Staub LP, Aghayev E, Skrivankova V, Lord SJ, Haschtmann D, Mannion AF (2020) Development and temporal validation of a prognostic model for 1-year clinical outcome after decompression surgery for lumbar disc herniation. Eur Spine J 29(7):1742–1751
Bernstein DN, Keswani A, Chi D, Dowdell JE, Overley SC, Chaudhary SB, Mesfin A (2019) Development and validation of risk-adjustment models for elective, single-level posterior lumbar spinal fusions. J Spine Surg 5(1):46–57
Han SS, Azad TD, Suarez PA, Ratliff JK (2019) A machine learning approach for predictive models of adverse events following spine surgery. Spine J 19(11):1772–1781
Karnuta JM, Golubovsky JL, Haeberle HS, Rajan PV, Navarro SM, Kamath AF, Schaffer JL, Krebs VE, Pelle DW, Ramkumar PN (2020) Can a machine learning model accurately predict patient resource utilization following lumbar spinal fusion? Spine J 20(3):329–336
Lubelski D, Alentado V, Nowacki AS, Shriver M, Abdullah KG, Steinmetz MP, Benzel EC, Mroz TE (2018) Preoperative nomograms predict patient-specific cervical spine surgery clinical and quality of life outcomes. Neurosurgery 83(1):104–113
Asher AL, Devin CJ, Archer KR, Chotai S, Parker SL, Bydon M, Nian H, Harrell FE Jr, Speroff T, Dittus RS, Philips SE, Shaffrey CI, Foley KT, McGirt MJ (2017) An analysis from the quality outcomes database, Part 2. Predictive model for return to work after elective surgery for lumbar degenerative disease. J Neurosurg Spine 27(4):370–381
Siccoli A, de Wispelaere MP, Schröder ML, Staartjes VE (2019) Machine learning-based preoperative predictive analytics for lumbar spinal stenosis. Neurosurg Focus 46(5):E5
Staartjes VE, de Wispelaere MP, Vandertop WP, Schröder ML (2019) Deep learning-based preoperative predictive analytics for patient-reported outcomes following lumbar discectomy: feasibility of center-specific modeling. Spine J 19(5):853–861
Parai C, Hägg O, Lind B, Brisby H (2018) The value of patient global assessment in lumbar spine surgery: an evaluation based on more than 90,000 patients. Eur Spine J 27(3):554–563
Collins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med 13:1
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21(1):128–138
Huber M, Kurz C, Leidl R (2019) Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning. BMC Med Inform Decis Mak 19(1):3
Parai C, Hägg O, Willers C, Lind B, Brisby H (2020) Characteristics and predicted outcome of patients lost to follow-up after degenerative lumbar spine surgery. Eur Spine J 29(12):3063–3073
Solberg TK, Sørlie A, Øystein KS, Nygaard P, Ingebrigtsen T (2011) Would loss to follow-up bias the outcome evaluation of patients operated for degenerative disorders of the lumbar spine? A study of responding and non-responding cohort participants from a clinical spine surgery registry. Acta Orthop 82(1):56–63
We thank all members of the Steering committee of Swespine for collaboration and support necessary for the development of the Dialogue support and the current evaluation: Allan Abbott PT PhD, Carina Blom secretary, Paul Gerdhem MD PhD, Håkan Löfgren MD PhD, Lena Mellgren secr, Catharina Parai MD PhD, Björn Strömqvist MD PhD, Olof Thoresson MD PhD and finally, not the least, David Bergqvist from Ivbar/Logex for his statistical work.
No funding was obtained.
Conflict of interest
JM holds stock in LOGEX Healthcare Analytics, a company specialized in analysis of health care data. The other authors declare no conflict of interest.
Consent for publication
All patients in Swespine are given written information about the registry, including information that their data can be used for research and publication if they accept to participate in the register and that they can withdraw their data at any time.
Retrospective data are used in the current article. All data are anonymized. There are no individual data in the Dialogue support, but all data are on group level. Data are national and cannot be traced to specific location or to an individual, and are not included in the medical record. According to Swedish law, this implies that ethical approval is not needed.
Fritzell, P., Mesterton, J. & Hagg, O. Prediction of outcome after spinal surgery—using The Dialogue Support based on the Swedish national quality register. Eur Spine J 31, 889–900 (2022). https://doi.org/10.1007/s00586-021-07065-y
- National quality register
- Spine surgery
- Dialogue support