Introduction

The rates of instrumented spinal fusion surgery have increased markedly over the past decades, followed by growing evidence of treatment effects, especially in the short and mid term, for specific indications including lumbar spondylolisthesis associated with spinal stenosis [1,2,3,4,5,6]. Long-term clinical outcome data for spinal fusion are, however, still scarce and inconclusive [7,8,9,10,11]. Both deterioration and preservation of achieved clinical outcomes have been reported, which can be partly explained by the heterogeneity in study designs and populations.

Another area of controversy is the relationship between radiographic fusion and clinical outcomes [12,13,14]. Among patients with degenerative lumbar spondylolisthesis treated with decompression and uninstrumented posterolateral fusion, solid arthrodesis appeared beneficial only for long-term clinical outcomes [15, 16]. However, in the presence of rigid instrumentation, the necessity of a solid fusion within the first years can be debated. This emphasizes the need for long-term evaluations.

The current study investigated the long-term clinical outcomes of patients who were included in a randomized controlled trial (RCT) and received instrumented posterolateral spinal fusion with autologous bone graft or osteogenic protein-1 for lumbar spondylolisthesis with neurological manifestations [17]. The primary objective was to assess disability, as determined by the Oswestry Disability Index (ODI), at long-term follow-up compared to baseline and 1 year after surgery. In addition, the effects of diagnosis, graft type and fusion status at 1-year follow-up were investigated. Secondary outcomes included pain experience, quality of life, satisfaction with treatment and reoperation rate.

Materials and methods

Study design and population

We performed a cross-sectional long-term follow-up among the Dutch participants of the previously published international multicenter Osigraft RCT [17]. In this original study, 134 patients were randomized to osteogenic protein-1 (OP-1, also known as BMP-7) or autograft for posterolateral spinal fusion from 2004 to 2008; 113 patients were included in the primary analysis. All patients underwent single-level instrumented posterolateral fusion of the lumbar spine for degenerative or isthmic spondylolisthesis with symptoms of neurological compression. Patients in the OP-1 group received Osigraft (Stryker Biotech, Hopkinton, MA, USA) combined with local bone. Patients in the control group received autologous bone graft from the iliac crest combined with local bone (autograft group). The primary outcome was overall success at 1-year follow-up, based on a combination of clinical outcomes and evidence of posterolateral fusion on computed tomography (CT) scans.

For the current study, patients were recruited from the Dutch study population with complete 1-year follow-up, which consisted of 61 patients (Fig. 1). In January 2018, available patients were invited to participate by mailing an information letter, informed consent form, set of questionnaires and return envelope. They were asked to return the blank questionnaires if they declined to participate. Non-responders were sent a reminder after 4 weeks.

Fig. 1 Flowchart of patients included in the long-term follow-up study

Clinical outcomes

To assess long-term clinical outcomes, a set of disease-specific and generic questionnaires, supplemented with additional questions, was compiled. In line with the assessments done at baseline and 1 year after surgery in the original study, patients received the following validated questionnaires: ODI, EQ-5D-3L and visual analogue scale (VAS) for leg pain. Back pain was only assessed at long-term follow-up. The sum score of the disease-specific ODI, defined as the primary outcome, ranges from 0% (no disability) to 100% (maximum disability possible) [18]. Responses to the EQ-5D-3L were converted into a single health state index score ranging from −0.329 (worst health state) to 1.000 (best possible health) [19, 20]. The VAS for pain runs from 0 (no pain) to 100 (terrible pain), and a score of ≤ 30 was considered mild pain [21, 22].
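As an illustration of how the ODI percentage and the VAS cut-off can be derived from raw responses, the sketch below assumes the commonly used ODI scoring convention (ten sections scored 0 to 5, divided by the maximum attainable score for the answered sections). The function names and the handling of unanswered items are assumptions made for this example; the scoring procedure actually applied in the study is not described beyond the 0% to 100% range.

```python
from typing import Optional, Sequence


def odi_percentage(item_scores: Sequence[Optional[int]]) -> float:
    """Return the ODI as a percentage from ten section scores (0-5 each).

    None marks an unanswered section (assumed handling): the denominator
    is reduced to the maximum attainable score of the answered sections.
    """
    if len(item_scores) != 10:
        raise ValueError("the ODI has ten sections")
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no sections answered")
    if any(s < 0 or s > 5 for s in answered):
        raise ValueError("section scores must lie between 0 and 5")
    return 100.0 * sum(answered) / (5 * len(answered))


def is_mild_pain(vas: float) -> bool:
    """Apply the cut-off stated in the text: VAS <= 30 counts as mild pain."""
    return vas <= 30


# Hypothetical respondent, not study data: every section scored 2.
print(odi_percentage([2] * 10))  # 40.0
```

A respondent who leaves one section blank and scores 1 on the other nine would obtain 9 / 45 = 20% under this convention.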

Satisfaction with treatment at long-term follow-up was measured with a numeric rating scale (NRS) ranging from 0 (very dissatisfied) to 10 (very satisfied). In addition, patients were asked 1) how their complaints of back pain and leg pain had changed since the index surgery, 2) what the main effect of surgery on their pain complaints had been and 3) whether they would choose the same treatment if they had the same condition and complaints. Finally, patients were asked about any lumbar spine reoperations since the index surgery.

Statistics

Data were processed and analyzed in SPSS Statistics 24.0 (IBM Corp., Armonk, NY, USA). Patient characteristics and all patient reported outcome measures were evaluated using descriptive statistics. Differences in ODI over time (baseline, 1-year and long-term follow-up) and the effect of graft type (OP-1 vs. autograft) were analyzed using a mixed analysis of variance (ANOVA) model for repeated measures. In addition, a multiple regression (enter method) was run to predict the ODI score at long-term follow-up from graft type, diagnosis (degenerative vs. isthmic spondylolisthesis) and fusion status at 1-year follow-up (fusion vs. doubtful fusion/non-union). EQ-5D-3L index scores and VAS leg pain over time were analyzed with Friedman’s test. For all statistical tests the threshold for significance was set to p < 0.05.
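For readers unfamiliar with Friedman's test, the following sketch shows how the test statistic Q is obtained from repeated measurements by ranking each patient's scores across timepoints. It is a plain-Python illustration rather than the SPSS procedure used in the study. It omits the correction for ties, the function names are hypothetical, and the example data are invented. With three timepoints, Q is compared against a chi-square distribution with 3 − 1 = 2 degrees of freedom.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_q(rows: Sequence[Sequence[float]]) -> float:
    """Friedman statistic for n subjects (rows) by k timepoints (columns).

    No tie correction is applied in this sketch.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Invented scores for 4 patients at baseline, 1 year and long-term follow-up.
example = [[50, 10, 20], [60, 5, 15], [40, 0, 30], [70, 20, 25]]
print(friedman_q(example))  # 8.0
```

In this invented example every patient shows the same ordering of timepoints, so Q reaches its maximum of n(k − 1) = 8.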

Ethical considerations

The Medical Ethical Committee of the University Medical Center Utrecht, The Netherlands, confirmed that this follow-up study did not fall under the Medical Research Involving Human Subjects Act and ethical approval was not required. Each study participant provided written informed consent.

Results

Study population

Since the 1-year follow-up, 5 of the 61 Dutch patients had died from causes unrelated to the index surgery, leaving 56 patients available for long-term follow-up. A total of 41 (73%) patients were enrolled, with a mean follow-up of 11.8 (range 10.1–13.7) years. Twelve patients did not respond to the questionnaire, and 3 were not willing to participate. The distribution among treatment groups is shown in Fig. 1.

Demographics, surgical details and 1-year fusion status on group level and per treatment condition are outlined in Table 1. The mean age of the 17 males and 24 females assessed at long-term follow-up was 62 ± 11 (range 30–91) years. The majority of the patients underwent surgery for isthmic spondylolisthesis (71%) and the overall 1-year fusion rate was 66%.

Table 1 Demographics, surgical details and 1-year fusion status on group level and per treatment group

Clinical outcomes

ODI, EQ-5D-3L index scores and VAS pain scores at each timepoint on group level are listed in Table 2. Both means ± standard deviation and medians along with their interquartile range (IQR) are reported, as not all data are normally distributed.

Table 2 Patient reported outcome measures at baseline, 1-year follow-up and long-term follow-up. Both means ± standard deviation and medians along with their interquartile range (IQR) are presented, as not all variables are normally distributed. VAS leg represents the maximum score for the left and right leg. VAS back pain is only measured at long-term follow-up

The mean ODI improved from 43 ± 15 at baseline to 13 ± 16 at 1 year and slightly regressed to 20 ± 19 at final follow-up. The mixed ANOVA model for repeated measures showed no significant interaction between timing of follow-up and graft type on ODI (F(2, 76) = 1.028, p = 0.363). Tests of within-subjects effects and between-subjects effects of the mixed ANOVA indicated, respectively, a main effect of time (F(2, 76) = 51.393, p < 0.001), but no main effect of graft type (F(1, 38) = 0.021, p = 0.884). Post-hoc analysis with Bonferroni correction confirmed a significant difference between baseline ODI and both post-operative time-points (p < 0.001), but not between 1-year and long-term follow-up (p = 0.075).

Multiple regression showed that the ODI at long-term follow-up could not be predicted based on the independent variables: diagnosis, graft type or 1-year fusion status (F(3, 37) = 1.033, p = 0.389). The overall model fit was R² = 0.077. Based on these results and the sample size, all secondary outcomes are presented on group level.
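The reported F statistic and model fit are linked through the standard relationship between R² and the overall F test of a linear regression, F = (R² / k) / ((1 − R²) / df_residual). The short sketch below applies it to the values reported above as a consistency check; the function name is chosen for this example only.

```python
def f_from_r_squared(r_squared: float, n_predictors: int, residual_df: int) -> float:
    """Overall F statistic of a linear regression computed from its R-squared."""
    explained = r_squared / n_predictors          # explained variance per predictor
    unexplained = (1.0 - r_squared) / residual_df  # residual variance per df
    return explained / unexplained


# Reported values: R-squared = 0.077, 3 predictors, 37 residual degrees of freedom.
f_value = f_from_r_squared(0.077, 3, 37)
print(round(f_value, 2))  # 1.03
```

The result of approximately 1.03 agrees with the reported F(3, 37) = 1.033, with the small difference attributable to rounding of R² to three decimals.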

As illustrated by Table 2, both the EQ-5D-3L index score and VAS leg pain regressed slightly between 1-year and long-term follow-up. Friedman’s test confirmed that the EQ-5D-3L index and VAS leg pain scores differed between timepoints (Friedman’s Q(2) = 36, p < 0.001 and Friedman’s Q(2) = 28, p < 0.001, respectively). Post-hoc testing with Dunn-Bonferroni correction showed, however, that for both outcomes the regression between 1-year and long-term follow-up was not significant (EQ-5D-3L Z = 0.271, p = 0.769 and VAS leg pain Z = −0.485, p = 0.147).

Satisfaction

Overall satisfaction with treatment was excellent, with a mean score of 8.0 ± 1.8 (range 3–10). The majority of the patients (76%) scored ≥ 8; only 5 patients scored < 6. Moreover, 78% would choose the same treatment again. The remaining patients answered this question with 'I don’t know'.

Figure 2 shows that 78% of the patients reported improvement in back pain and 71% improvement in leg pain. Of the 6 patients who reported much worsening of back and/or leg pain, only 1 patient underwent revision surgery at the same level (case 3 in Table 3). Three of these 6 patients were scored as fused at 1-year follow-up, including the revised case.

Fig. 2 Effect of surgery on back pain (light grey) and leg pain (dark grey) at long-term follow-up

Table 3 Overview of additional lumbar spine surgeries since 1-year follow-up

To the question 'On which complaint(s) had the surgery most effect?' 32% of the patients answered back pain, whereas 11% reported leg pain (Fig. 3). More than half of the patients (53%) reported a combined effect. According to 2 patients, the surgery was not effective at all. These patients scored consistently low on all satisfaction questions (satisfaction 4 and much worsening of both back and leg pain). Moreover, they reported severe disability based on the ODI at both 1-year and long-term follow-up (ranging between 40 and 47), whereas a severe VAS leg pain score (> 80) and a very low EQ-5D-3L index score (0.174) were reported only at long-term follow-up. Their VAS back pain score at long-term follow-up was also > 80. One of these unsatisfied patients was scored as fused at 1-year follow-up.

Fig. 3 Main effect of surgery at long-term follow-up

Additional surgery

As outlined in Table 3, 4 patients had undergone additional lumbar spine surgery since the final follow-up of the initial study, but none of these surgeries was related to non-union.

Discussion

This study showed excellent long-term (> 10 years) clinical outcomes of instrumented posterolateral spinal fusion for degenerative and isthmic spondylolisthesis. Although the clinical success of spinal fusions is often debated, the quality of life and satisfaction outcomes of the current study are comparable to the most successful orthopaedic procedures, such as hip and knee arthroplasty [23,24,25,26]. Interestingly, patients reported clinical improvement not only for neurological symptoms, but also at least as much for back pain. Only 11% indicated a main effect of surgery on leg pain. VAS back and leg pain scores at long-term follow-up were very similar. Apparently, back pain is an important contributor to discomfort in spondylolisthesis cases, even in patients with neurological symptoms, which were a prerequisite for inclusion in the original study.

Although the clinical outcomes remained satisfactory for 10 years, a slight but non-significant deterioration in ODI, EQ-5D-3L index and VAS leg pain score compared to 1-year follow-up was observed. A similar decline in treatment effect was observed in a comparable study by Ekman et al. and may be caused by adjacent segment degeneration or general effects of ageing [7]. These effects cannot be further quantified as no radiographic or clinical assessment was performed and information on concomitant diseases was lacking. On the other hand, the clinical relevance of adjacent segment degeneration seems to be limited [11, 27, 28]. In the current study, only 3 patients (7%) underwent additional surgery at an adjacent level. Another explanation could be a waning placebo effect of surgery over time or the psychological phenomenon known as response shift [29, 30].

Recognizing the difficulty of comparing our results with previous long-term follow-up studies of spinal fusion for spondylolisthesis, due to differences in indication, type of surgery, follow-up period and/or outcome measures, our patients reported relatively low ODI and high EQ-5D-3L index scores at each timepoint [7, 8, 10, 31]. Satisfaction with treatment falls well within the range reported in the literature [7, 8, 10, 32, 33]. In contrast, the long-term VAS leg pain score was relatively high [8, 33]. Interestingly, none of these previous studies had neurological manifestations as a strict inclusion criterion. We and many others believe that these symptoms are an important indication for spinal surgery, as illustrated by the less favourable treatment effect achieved for patients with chronic low back pain without nerve root compression [5, 6, 34]. A recent meta-analysis on surgical treatment for degenerative spinal conditions indicated that lumbar radiculopathy was associated with the greatest mean change in health-related quality of life from baseline [35].

None of the participants underwent revision surgery for pseudoarthrosis, although a substantial proportion of patients (34%) were classified as ‘not fused’ on the CT-scan at 1-year follow-up. Also, based on the primary outcome measure ODI, no relationship was found between fusion status and long-term clinical outcome. Patients classified as ‘fused’ and those classified as ‘not fused’ both experienced a low level of disability at long-term follow-up (mean 21 ± 20 and 17 ± 16, respectively). Although a number of patients possibly developed further bony fusion in the course of the follow-up period, it is also possible that the combination of a fibrous union with pedicle screw instrumentation in situ offers sufficient stability in this patient population.

In line with the 1-year results of the original study, no difference in long-term ODI was seen between the patients who received OP-1 combined with local bone and solely autologous bone graft. This confirms the absence of a strong relationship between radiographic and clinical outcomes. Consecutive clinical trials failed to demonstrate non-inferiority of OP-1 versus autograft for spinal fusion and Osigraft was withdrawn from the market in 2015 [17, 36].

The findings of this study add to the scarce literature on long-term clinical outcomes of spinal fusion and endorse the importance of appropriate surgical patient selection. However, we do recognize some limitations. First, this long-term follow-up was confined to only the Dutch participants of the original international multicentre study. Despite the acceptable follow-up rate of 73%, this resulted in a relatively small sample size [37]. Participants were, however, equally distributed among the randomized treatment groups and their baseline and 1-year clinical outcomes were comparable with the outcomes of both the total study population and the entire Dutch sample, reducing the risk of selection bias. Second, the outcomes of this study were limited to patient reported outcome measures. Radiological fusion was only evaluated at 1-year follow-up. Finally, back pain was only assessed at long-term follow-up and, in relation to that, patients’ pre-operative main complaint was unknown.

In conclusion, this study showed favourable long-term clinical outcomes in patients who underwent instrumented posterolateral spinal fusion for spondylolisthesis with neurological symptoms. Diagnosis (degenerative vs. isthmic spondylolisthesis), graft type (OP-1 vs. autograft) and 1-year fusion status (fusion vs. doubtful fusion/non-union) were not predictive of the ODI more than 10 years after surgery. Comparison with available long-term follow-up studies stresses the necessity of established and strict indications for this procedure.