Background

Lumbar spinal fusion (LSF) with or without decompression surgery aims to stabilize the lumbar spine in various degenerative disorders such as spinal stenosis, spondylolisthesis, disc herniation, and discogenic low back pain [1,2,3,4]. LSF is frequently and increasingly performed [5,6,7,8] and presented in current guidelines of the North American Spine Society as ‘a necessary element of the surgeon’s armamentarium in the treatment of lumbar degenerative disorders’ [9]. Nevertheless, LSF in degenerative disorders of the lumbar spine remains a subject of controversy. For example, outcomes in patients with chronic low back pain did not favour LSF over rehabilitation at long-term follow-up [10, 11]. In addition, two systematic reviews showed no convincing evidence of superiority of LSF compared to nonsurgical management in discogenic low back pain [3] and degenerative lumbar spondylosis [12]. Furthermore, some studies analysed cost-effectiveness of LSF versus nonsurgical care in degenerative spondylolisthesis and found insufficiently convincing evidence favouring LSF [13, 14]. Moreover, LSF should be carefully applied because of known complications such as neurologic deficit, infection, pseudarthrosis, and revision surgery [15,16,17,18,19]. As a result of the ongoing debate, the recent National Institute for Health and Care Excellence (NICE) guidelines recommended not offering spinal fusion to people with low back pain unless as part of a randomized controlled trial [20].

An overview of outcomes after LSF is needed, considering that guidelines provide conflicting recommendations [9, 20] and understanding of long-term outcomes after LSF in degenerative disorders is particularly lacking [12]. Specifically, a systematic review of prospective cohort studies is crucial to understand long-term outcomes [21,22,23] in broad patient categories [24] and larger samples [8]. Furthermore, outcomes of prospective cohort studies are not biased by patients who might be disappointed after assignment to an undesired intervention such as ‘unstructured physiotherapy’ instead of LSF surgery [11]. Ultimately, the systematic review should meta-analyse the direction and tendency of outcomes after LSF surgery to improve the understanding of recovery after LSF and improve post-LSF management [25]. Therefore, the objective of this study was to systematically review and meta-analyse the course of pain (back and leg) and disability in patients with degenerative disorders of the lumbar spine, including spinal stenosis, spondylolisthesis, disc herniation, and discogenic low back pain (i.e. degenerative disc disease) after first-time LSF surgery.

Methods

Protocol and registration

The systematic review followed the methods of the pre-defined study protocol [26] and guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [27]. The study was registered with PROSPERO (CRD42015026922).

Eligibility criteria

Prospective cohort studies reporting pain and disability outcomes after first-time LSF were included. The patient population involved adults (aged over 16 years) with degenerative disorders of the lumbar spine including spinal stenosis [28], spondylolisthesis [28], disc herniation [29], and discogenic low back pain (i.e. degenerative disc disease) [30]. Prospective cohort studies with consecutive patient sampling were considered most appropriate to analyse outcomes after first-time LSF surgery as a result of a broad representation of the population [24, 31,32,33] and lengthy follow-up [21,22,23, 32, 33]. For these reasons, randomized controlled trials were excluded [26].

Search strategy, information sources, and study selection

The search strategy was created by the first author (NK) with the support of an experienced medical librarian (RDV) and critically reviewed by all researchers [26] (Appendix 1, ESM). After approval, published studies were searched in MEDLINE, EMBASE, CINAHL, and ZETOC databases up to 31 March 2017. In addition, a search was conducted for unpublished studies (British National Bibliography for Report Literature and OpenGrey), studies in press, and studies published ahead of print. Furthermore, reference lists of included studies were checked. The language of publication was not restricted. Titles and abstracts (stage 1) followed by full texts (stage 2) were independently screened by two researchers (NK and TH). In general, if there was any doubt about exclusion of the study, the study proceeded to the full-text screening stage to reduce the likelihood of excluding a relevant study. A third researcher (AR) mediated in situations where consensus could not be reached [34]. Corresponding authors were contacted by e-mail if a full text could not be retrieved. The process of study selection was summarized using a PRISMA flow diagram [27].

Data collection and outcomes

Data for each included study were extracted using a standardized form, which was optimized after piloting in five studies. Data were extracted independently and in duplicate by two out of three researchers (NK, AR, TH). In case of missing data, authors were contacted to provide additional information. Data were extracted on the following items: pain and disability outcomes, participants (setting and area), patient characteristics, duration of symptoms, surgical procedure(s), clinical care pathway, study design, sample size, eligibility criteria, and follow-up dates. Pain and disability outcome data were extracted at all available intervals and were measured with, for example, Visual Analogue Scale (VAS), Numeric Rating Scale (NRS), and Oswestry Disability Index (ODI) [35].

Risk of bias assessment

A modified version of the Quality in Prognostic Studies (QUIPS) tool [36, 37] was used to assess studies on the domains of representation of sample, definition of study sample, study attrition, outcome measurement, confounding, statistical analysis, provision of data, and blinding of outcomes [26] (Appendix Table 2, ESM). The overall risk of bias within a study was considered ‘low’ when all items were scored at ‘low risk of bias’ [34, 36]. Overall risk of bias was considered ‘high’ if one or more items within a study were scored at ‘high risk of bias’. In all other cases, the overall risk of bias of a study was considered ‘unclear’. Data on risk of bias were extracted independently and in duplicate by two researchers (NK, BS). Disagreements were solved by consensus in two consensus meetings. Ultimately, a third researcher (AR) mediated where consensus could not be reached. Weighted Cohen’s κ [38] was used to assess agreement between the researchers.
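The overall risk-of-bias rule and the agreement statistic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the review: the function names are hypothetical, the ordering low < unclear < high is assumed, and linear disagreement weights are assumed because the weighting scheme for Cohen’s κ is not specified in the text.

```python
from collections import Counter

# Assumed ordinal ordering of the item scores (not stated in the review).
LEVELS = ["low", "unclear", "high"]


def overall_risk(item_scores):
    """Apply the overall rule: any 'high' item gives 'high', all 'low' gives 'low', else 'unclear'."""
    if any(score == "high" for score in item_scores):
        return "high"
    if all(score == "low" for score in item_scores):
        return "low"
    return "unclear"


def weighted_kappa(rater_a, rater_b, levels=LEVELS):
    """Weighted Cohen's kappa with linear disagreement weights (an assumption)."""
    index = {level: i for i, level in enumerate(levels)}
    n = len(rater_a)
    k = len(levels)
    # Counts of paired scores and of each rater's marginal scores.
    pairs = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marginal_a = Counter(index[a] for a in rater_a)
    marginal_b = Counter(index[b] for b in rater_b)
    # Weighted observed disagreement.
    observed = sum(abs(i - j) / (k - 1) * count / n for (i, j), count in pairs.items())
    # Weighted disagreement expected by chance from the marginals.
    expected = sum(
        abs(i - j) / (k - 1) * marginal_a[i] * marginal_b[j] / n**2
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return 1 - observed / expected
```

For example, `overall_risk(["low", "unclear", "low"])` returns `"unclear"`, and two raters with identical scores across more than one category yield a κ of 1.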

In addition, location bias and outcome reporting bias were assessed [34]. The presence of location bias was assumed when studies published in low or non-impact factor journals reported greater improvements than those published in high-impact journals [39]. The Spearman rank-order correlation coefficient was determined to analyse the association between impact factor and change score of disability, as the impact factors were considered of ordinal level. The change score of disability was calculated as baseline minus 1-year follow-up ODI outcome.
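As an illustration of the location bias check, the change score and the Spearman rank-order correlation can be computed as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, tied values receive average ranks, and any input values would be placeholders rather than data from the included studies.

```python
def change_score(baseline_odi, one_year_odi):
    """Change score of disability: baseline minus 1-year follow-up ODI."""
    return baseline_odi - one_year_odi


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Under the definition of location bias given above, a negative coefficient would mean that studies in lower-impact journals tended to report larger improvements.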

Information regarding impact factors of journals at year of publication was retrieved from the Journal Citation Reports database (Thomson Reuters) [40]. Outcome reporting bias was assessed by comparing outcomes listed in the study protocol or methods section with the actually reported results [34]. A modified funnel plot was constructed to investigate publication bias [34]. In the presence of publication bias, the funnel plot should resemble an asymmetrical funnel [41].

Data synthesis and analysis

A meta-analysis was conducted on pain and disability outcomes using a random effects model with maximum likelihood estimation. Studies providing both outcome and variance data were included in the meta-analysis. As a result, an available cases analysis was performed as imputation of variance data was considered inappropriate due to a very small proportion of available variance data [34]. A sensitivity analysis with all studies was performed to assess the impact of including studies without variance data, by comparing n-weighted means with the initial meta-analysis outcomes. A visual presentation of outcomes (mean and upper bound of the 95% confidence interval) at all possible follow-up intervals provided optimal information to analyse the direction and tendency of outcomes. Both short- and long-term intervals were considered valuable to determine the course of pain and disability outcomes. Both pain and disability outcomes were converted to a 0–100 scale to facilitate comparison between studies (0 representing no pain or disability, 100 representing maximum pain or disability) [37]. Headrick’s formula [42] was used to combine means when separate means (e.g. one-, two-, and three-level LSF surgery) described results of one study group. Individual patient data were extracted to calculate the proportion of patients who gained from LSF treatment [43] according to the minimal important change values provided by Ostelo et al. [44] (VAS 15, NRS 2, ODI 10). Where possible, the statistical heterogeneity of outcomes was analysed using the I² statistic [45].
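Several of the computations named above can be illustrated with a short Python sketch. It is not the software used for the meta-analysis and does not implement the maximum likelihood random effects model; it only shows the conversion to a 0–100 scale, an n-weighted mean as used in the sensitivity analysis, the proportion of patients reaching a minimal important change, and the I² statistic derived from Cochran’s Q with inverse-variance weights. All function names are hypothetical, and treating the minimal important change threshold as inclusive is an assumption.

```python
def to_percent_scale(score, scale_max):
    """Convert a score on a 0..scale_max instrument to the 0-100 scale."""
    return score * 100.0 / scale_max


def n_weighted_mean(means, sample_sizes):
    """Mean of study means, weighted by each study's sample size."""
    total = sum(sample_sizes)
    return sum(m * n for m, n in zip(means, sample_sizes)) / total


def proportion_improved(baseline, follow_up, mic):
    """Share of patients whose improvement is at least the minimal important change.

    Improvement is baseline minus follow-up; the >= comparison is an assumption.
    """
    improved = sum(1 for b, f in zip(baseline, follow_up) if b - f >= mic)
    return improved / len(baseline)


def i_squared(means, standard_errors):
    """I-squared (%) from Cochran's Q using inverse-variance weights."""
    weights = [1.0 / se**2 for se in standard_errors]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    q = sum(w * (m - pooled) ** 2 for w, m in zip(weights, means))
    df = len(means) - 1
    if q <= df:
        # No heterogeneity beyond what chance would explain.
        return 0.0
    return (q - df) / q * 100.0
```

For instance, a VAS score of 7 on a 0–10 scale maps to 70 on the common scale, and `proportion_improved` with `mic=10` would apply the ODI threshold quoted above to individual patient data.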

Results

Selected studies

The search retrieved 7452 studies, of which 5532 remained after removal of duplicates (Fig. 1). Following initial screening of titles and abstracts, 158 studies were considered potentially eligible. The most common reasons for exclusion on title or abstract were: other population (i.e. cervical fusion, trauma, scoliosis) or other study type (i.e. randomized controlled trial, retrospective study, or systematic review). After assessment of full text, twenty-five studies [46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70] fulfilled the eligibility criteria and were included for qualitative analysis. Boissiere et al. [47], Kok et al. [54], and Pereira et al. [60] were excluded from the quantitative analysis because of an overlap in study population with Barrey et al. [46], Kok et al. [55], and Franke et al. [68], respectively.

Fig. 1 Flow chart of included studies. Reproduced with permission from Moher et al. [27]

Characteristics

Tables 1 and 2 show overall study and patient characteristics, respectively. The twenty-five studies included n = 1777 adults in total, with a range in sample size from 20 to 255. The age of participants ranged between 17 and 87 years. The recruitment period ranged from 1996 to 2014. Pain outcomes were commonly measured with the VAS (0–10 or 0–100) for overall pain, back pain, or leg pain. Disability outcomes were measured with the ODI in all included studies except for two [56, 70]. There was a wide variation in performed fusion techniques (Table 2). Specifically, there were different approaches (e.g. anterior or posterior), types of surgery (minimally invasive or open), types of graft (allograft and/or autograft), and cages used (yes or no). Information regarding the clinical care pathway after surgery was described in only six [53, 56,57,58,59, 65] out of twenty-five studies.

Table 1 Study characteristics
Table 2 Patient characteristics

Methodological quality assessment

In total, 163 out of 200 quality assessment items (81.5%) were scored similarly, resulting in an agreement between assessors of weighted κ 0.61. After resolution of disagreements, 104 items were scored as low risk of bias, 88 items as unclear, and 8 items as high risk of bias (Appendix Table 2, ESM). Study attrition was scored four times as ‘high risk of bias’: three studies provided no information regarding loss to follow-up [51, 58, 59], and one study had a high loss to follow-up without proper explanation [62]. Other items were scored with high risk of bias for the following reasons: insufficient presentation of eligibility criteria and suspected inclusion of full cases only [67]; unclear selection of participants [52]; out-of-boundary outcome of VAS back pain at the preoperative follow-up interval [54]; and patient-reported pain and disability outcomes in cooperation with physician [63]. Ultimately, 17 studies were considered at unclear risk of bias and eight at high risk of bias.

Location bias

Analysis of indexing of studies shows that all studies were indexed in MEDLINE (Appendix Table 3, ESM). Fourteen studies [46,47,48,49,50, 53, 59,60,61, 63, 64, 66, 69, 70] were indexed in EMBASE and two studies [51, 53] in CINAHL. The impact factors of the journals ranged from 0.430 to 5.163 [40] (Appendix Table 3, ESM). Correlation analysis between rank of impact factor and rank of change score resulted in a (negative) Spearman’s rho coefficient of −0.266. Analysis of indexing did not seem to show signs of location bias.

Outcome reporting bias

Only four studies [49, 60, 68, 69] reported use of a published study protocol. Ghogawala et al. [49], Franke et al. [68], and Kanter et al. [69] provided data at all follow-up intervals and thorough description of their primary and secondary outcome selection. Pereira et al. [60] described in their study protocol measurement of VAS (back pain and leg pain) and ODI at baseline, 4 weeks and 3, 6, and 12 months, but outcomes at 3-, 6-, and 12-month follow-up were not reported.

Publication bias

The modified funnel plot to assess possible publication bias did not indicate selective publication of disability outcomes in relation to the study sample size (Fig. 2).

Fig. 2 Funnel plot presenting sample size versus change score of disability outcomes to assess publication bias. n number

Pain outcomes

Four cohorts reported pain outcomes and variance data on a VAS scale for both back and leg pain. Nine studies described VAS back pain outcomes, and nine studies reported outcomes with the VAS leg pain. One study analysed pain outcomes with the back pain NRS and leg pain NRS [50]. Three studies described pain outcomes for subgroups. Cobo Soriano et al. [48] created a ‘disc herniation’ and an ‘other lumbar spine disorders’ group. Inage et al. [52] made three groups: one-level fusion, two-level fusion, and three-level fusion. Pavlov et al. [59] created ‘single-level’ and ‘double-level’ groups. When converted to a 100-point scale, mean preoperative outcomes of individual studies ranged from 70 to 93 for VAS back and leg pain, 44 to 79 for VAS back pain, and 57 to 82.4 for VAS leg pain. The course of pain following first-time LSF for degenerative disorders at different follow-up intervals is presented in Fig. 3a, b, c. Details of the meta-analysis of pain outcomes are provided in Appendix 4 (ESM).

Fig. 3 a VAS back pain and leg pain, b course of back pain, c course of leg pain (mean and upper bound 95% CI) following first-time LSF for degenerative disorders. CI confidence interval, LSF lumbar spinal fusion, n number of patients, VAS Visual Analogue Scale

Disability outcomes

Twelve cohorts included in the quantitative synthesis reported disability outcomes using ODI scores. Cobo Soriano et al. [48], Inage et al. [52], and Pavlov et al. [59] created subgroups similar to the reporting of pain outcomes. The disability outcomes at the preoperative assessment point ranged from 24.5 to 62.8. The course of disability following first-time LSF for degenerative disorders at different follow-up intervals is shown in Fig. 4. Details of the meta-analysis of disability outcomes are provided in Appendix 4 (ESM).

Fig. 4 Course of disability (mean and upper bound 95% CI) following first-time LSF for degenerative disorders. CI confidence interval, LSF lumbar spinal fusion, n number of patients, preop preoperative

Sensitivity analysis

The sensitivity analysis, including studies without variance data, included data of two additional cohorts on VAS back and leg pain outcomes, three cohorts on VAS back pain, three cohorts on VAS leg pain, and eight cohorts on disability. Pain and disability outcomes showed similar means and courses in studies with and without variance data. However, the sensitivity analysis added data on very long-term follow-up (≥ 42 months). An increase in back pain and disability outcomes was considered notable. Details of the sensitivity analysis of pain and disability outcomes are provided in Appendix 4 (ESM).

Subgroup analysis

A meta-analysis of outcomes for different diagnostic subgroups was not possible with the available data. Fifteen studies presented blended data of different diagnostic subgroups without presenting outcomes per diagnostic subgroup. The remaining ten studies reported outcomes for eleven diagnostic subgroups: 1 spinal stenosis, 6 spondylolisthesis, 1 disc herniation, 0 discogenic low back pain, and 3 degenerative disc disease. Unfortunately, there were no follow-up intervals presented by two studies within one diagnostic subgroup and authors could not be reached to provide additional data, making a subgroup analysis impossible.

Nevertheless, two studies [52, 59] performed a subgroup analysis based on different numbers of fused levels. Neither study found a statistically significant association between pain and disability outcomes and the number of affected levels. Leg pain before surgery was reported in thirteen studies [48, 50,51,52, 55,56,57, 60, 63, 66,67,68, 70]. However, these studies did not provide data regarding an association between leg pain before and after surgery. Three studies [48, 51, 53] included a total of 75 smokers and 148 non-smokers. Other studies did not provide information regarding smoking or did not report the number of non-smokers. Furthermore, three studies reported work status of their participants. Ghogawala et al. [49] reported that 24 out of 50 patients were working preoperatively, similar to 13 out of 18 and 79 out of 143 in the studies of Kleeman et al. [53] and Franke et al. [68], respectively. None of these studies performed a subgroup analysis or provided information regarding type of work and income. One study [50] identified 17 out of 58 patients with depression at time of surgery. Pain catastrophizing was not reported in a single study.

Complications

A variety of complications following LSF were reported (Table 3) [46, 47, 49, 53,54,55,56,57,58,59,60, 62,63,64,65, 68, 69], including: reoperation, cage migration, cage breakage, malposition of pedicle screw, wound infection, adjacent segment instability, pseudarthrosis, dural tear, spinal haematoma, vascular wound, aortic occlusion with non-fatal cardiac arrest, myocardial infarction, pulmonary embolism, acute allergic reaction, urosepsis, bowel injury, and postoperative confusion. There were no cases of surgery-related mortality reported.

Table 3 Overview of surgery-related complications

Discussion

This is the first rigorous systematic review and meta-analysis to determine the course of pain and disability after first-time LSF surgery in degenerative disorders. In summary, back and leg pain outcomes showed a decrease at every follow-up interval compared to preoperative levels. Back pain decreased to mean values between 20 and 30 across follow-up intervals (respective upper bounds of the 95% confidence interval, 24 and 42). However, sensitivity analysis showed an increase to an n-weighted mean VAS back pain of 45 at 42-month follow-up in one study at high risk of bias [63]. In contrast, leg pain improved substantially at both short- and long-term follow-up. The decrease in leg pain was more distinct owing to markedly lower leg pain at the 1- and 3-month intervals, which might be a result of successful nerve root decompression after LSF [71,72,73,74]. Furthermore, the mean leg pain outcomes seem lower at all follow-up intervals compared to the back pain outcomes. The severity of disability showed a relatively steady decrease over time, with the exception of the 6-week interval [57, 61, 69]. In addition, sensitivity analysis showed an increase to an n-weighted mean ODI of 30 and 24.6 at the 42-month [63] and 48-month follow-up [59], respectively, although both studies were at high risk of bias. In light of the a priori [26, 44] formulated minimal important change values, the course after first-time LSF surgery in degenerative disorders showed an overall clinically relevant decrease in pain and disability outcomes. A clinically significant decrease in back pain and disability at long-term follow-up might be questionable. Nevertheless, the long-term results should be interpreted with caution as a result of risk of bias and lacking variance data.

The findings were highly comparable with results presented in the systematic review and meta-analysis of randomized controlled trials and observational studies by Phan et al. [75], which compared back pain and disability outcomes between minimally invasive and open transforaminal lumbar interbody fusion in treatment of degenerative lumbar disease. Analysis comparing severity of back pain (VAS) and disability (ODI) at last follow-up showed an n-weighted mean of, respectively, 25.4 and 16.2, quite similar to the 20 and 16.4 at 24-month follow-up presented in the current study. Carreon et al. [76] reported similar disability (ODI) outcomes at a minimum of 1-year follow-up of (n-weighted mean) 28.3 for the overall surgical population in studies comparing LSF and nonsurgical interventions in various degenerative lumbar spine disorders.

Strengths and limitations

This is the first systematic review and meta-analysis analysing the long-term course of pain and disability following LSF. The knowledge gained can help guide treatment decision-making to improve patient selection (those with predominant leg pain) and decrease post-surgery back pain. The current course of back pain might indicate a need for improved preoperative selection, medical treatment, and physiotherapy management. The current quantitative analysis included twenty-two studies, representing a large sample of 1777 adults with degenerative disorders. Moreover, the availability of numerous cohorts is likely to improve generalizability of our results and support our decision to forgo the inclusion of randomized controlled trials, which might be biased by the utilization of (too) strict eligibility criteria [22, 24, 25]. Finally, the use of a thoroughly developed and published study protocol [26] has improved reproducibility and validity of the findings.

Nonetheless, a few deviations from the study protocol had to be made. For example, variance data were not available in seven studies [48,49,50, 59, 62, 63, 67] which decreased the power and generalizability of the meta-analysis outcomes. A sensitivity analysis with n-weighted means was performed to assess the impact of including studies without variance data and indicated no difference of outcomes. In addition, diagnostic heterogeneity of included patients is likely to influence the pain and disability outcomes, as patient-reported outcomes after LSF might be dependent on the clinical diagnosis [77, 78]. The patients included in the current study do not equally represent all diagnostic subgroups (spinal stenosis, spondylolisthesis, disc herniation, discogenic low back pain), and therefore it remains unclear whether it is valid to generalize the results to all subgroups of patients with degenerative disorders of the lumbar spine.

Recommendations for future research

Future research is needed to improve understanding of the course of pain and disability in patients of different diagnostic subgroups and different back pain trajectories [79, 80]. A large, rigorously conducted cohort with broad patient categories would make precise subgroup analyses possible. In addition, research should provide more information regarding the clinical care pathway, psychosocial and physical conditions of the patients, medical treatment, and physiotherapy in both the short and long term. Without this information, it remains impossible to improve LSF management. Furthermore, future research should focus on systematically collecting performance data to augment the patient-reported outcomes [81]. Finally, it seems necessary to concentrate research in patients after first-time lumbar fusion for degenerative lumbar disorders on very long-term follow-up intervals (≥ 42 months), as part of the post-surgery improvement in outcomes seems to decline.

Conclusion

Overall, both pain and disability outcomes improved after first-time lumbar spinal fusion for degenerative disorders. Results of the current study indicate that leg pain might be more reduced and for a longer period of time than axial back pain and disability. In patients with predominant back pain, more caution seems needed. In conclusion, a clinically meaningful result might be expected after first-time lumbar spinal fusion in patients with degenerative lumbar disorders and predominant leg pain.