Spinal surgery is a controversial option for managing low back pain, as highlighted in recent guidelines [1] and The Lancet low back pain series [2]. Among the surgical procedures available, however, the indication for lumbar discectomy/microdiscectomy is clear when patients present with leg pain and (1) conservative treatment has not improved pain and function and (2) radiographical findings support radiculopathy [1], i.e. as a second-line or adjunctive treatment option [2]. Success rates reported in randomised controlled trials (RCTs) for lumbar discectomy are similar to those for conservative interventions, varying between 46 and 75% at 6–8 weeks and 78–95% at 1–2 years post-surgery [3], but real-world observational evidence is less clear.

Ongoing problems are reported for a substantial number of patients. Evidence suggests that 30–70% of patients continue to experience pain [4]. In addition, significant rates of revision surgery are reported, although exact figures vary. For example, 3–12% of patients required further surgery in the Netherlands [5], and approximately 14% required revision surgery in the UK [6].

Variability in rehabilitation compounds the problem further, with variable advice and management from surgeons and physiotherapists throughout the postoperative period [7, 8]. In the UK, where patients live dictates whether they receive rehabilitation, and for those who do, the content and number of sessions are inconsistent [8]. A recent RCT found no benefit (clinical effectiveness or cost-effectiveness) of early referral for physiotherapy following surgery compared to no referral [9]. Postoperative rehabilitation could possibly be harmful for patients if outcomes from the clinical course are better than outcomes of rehabilitation interventions. Additionally, a clear trend in clinical recovery could indicate optimal timing for rehabilitation. Real-world evidence from observational studies may be valuable for understanding clinical course and for mitigating the limitations of RCT data, including strict eligibility criteria [10].

With 80% of people affected by low back pain at some point within their lifetime [11] and subsequent estimates of £10.7 billion annually for lost productivity and sickness/disability benefit in the UK alone [12], data to support effective management decisions are important. Radiculopathy represents a subset of this population, with an estimated lifetime incidence of 13–40% and an annual incidence of an episode of sciatica of 1–5% [1]. People experiencing low back pain with sciatica are reported to have more severe symptoms and poorer outcomes than those with low back pain alone [13]. Data support an increasing number of operations in the UK National Health Service [14] and internationally, with annual estimates of 12,000 operations in the Netherlands [15] and 287,122 in the USA [16].

No systematic review to date has rigorously investigated clinical course following lumbar discectomy from a pain and disability perspective. Parker et al. [17] have investigated the frequency of symptom recurrence and reoperation across published cohorts. They found that leg/back pain remained problematic for 3–34% of patients in the short term (6–24 months, 39 cohorts, n = 8156 patients), and for 5–36% of patients in the long term (> 24 months, 28 cohorts, n = 6255). Recurrence ranged from 0 to 23% (70 cohorts, n = 18,085) [17]. Machado et al. [18] did investigate clinical course using prospective cohort study data to April 2015. They found a reduction in pain and disability by 3 months following surgery but mild-to-moderate pain and disability persisting to 5 years following surgery. However, confidence in their findings is limited by inclusion of studies without a baseline pre-surgery, a range of surgical procedures, and an outdated assessment of risk of bias (only assessing sampling, completeness of follow-up, description of prognostic outcome).


The objective was to describe the clinical course of pain and disability [19] in patients aged > 16 years following first-time lumbar discectomy.


Protocol and registration

A systematic review and meta-analysis were conducted according to a registered (PROSPERO: CRD42015020806) and published protocol [20] informed by PRISMA-P [21], method guidelines by the Cochrane Back and Neck Group [22], and Cochrane Handbook [23]. The protocol was revised to develop a more sensitive search strategy (20 January 2016), and the planned search date was extended to encompass a high number of recently published studies (28 February 2018). There was one deviation from the protocol [20] to include all eligible studies irrespective of the follow-up in meta-analyses.

Eligibility criteria


Adult patients (aged > 16 years) following first-time lumbar discectomy/microdiscectomy/automated percutaneous discectomy for radiculopathy, with no general complications (e.g. anaesthetic, cardiopulmonary, thromboembolic) or surgical complications including cauda equina [24], were included. Studies including participants who had undergone revision surgery were excluded if data were not obtainable for first-time surgery participants only. Participants were included from across all clinical settings and healthcare providers. Reported treatments, e.g. type and nature of rehabilitation post-surgery, were recorded and evaluated as a component of the risk of bias assessment.

Outcome measures

Studies reporting ≥ 1 outcome measure of pain or disability [19], with a pre-surgery baseline, were included.


Inception prospective cohort studies with a clearly defined episode inception (point of surgery) were included. Prospective cohorts were the preferred design to enable control of unwarranted influences. Studies not published in English were excluded.

Information sources

A sensitive topic-based search strategy designed for individual databases from their inception to 28 February 2018 was conducted, with no restrictions on language or geography [20]:

  • CINAHL (via EBSCOhost 1981–);

  • EMBASE (via EBSCOhost 1974–);

  • PubMed;

  • MEDLINE (via OVIDSP 1946–);

  • ZETOC (1993–);

  • Scopus (1996–);

  • TRIP (non-Premium version);

  • Science Citation Index and Social Science Citation Index (journal search terms: spine, neurology, orthopaedics);

  • An additional search of the Cochrane Back and Neck Group website (https://back.cochrane.org/our-reviews), Cochrane Database of Systematic Reviews, and MEDLINE identified any relevant systematic reviews to enable reference list checking.

The search strategy also encompassed unpublished research:

  • British National Bibliography for Report Literature (search terms: spine, disc, discectomy, surgery, sciatica);

  • Ethos (search terms: spine, disc, discectomy, surgery, sciatica);

  • OpenGrey.


The search strategy included both the study population terms suggested by the Cochrane Back and Neck Group, and a specific strategy for searching prognosis studies on MEDLINE. Examples of MEDLINE OvidSP advanced search, OpenGrey and EBSCOhost Boolean search and SCOPUS search are detailed in the published protocol [20]. Reference list searches of all relevant publications were conducted. When conference abstracts and proceedings were found, authors of grey literature were contacted by email. Study population terms encompassed:

  • Population: Leg pain and/or low back pain (‘leg pain’ OR ‘back pain’ OR exp backache OR ‘low-back pain’ OR sciatica OR ‘sciatic neuropathy’ OR lumbago OR ‘back disorders’ OR dorsalgia).


  • Target condition: Prolapsed intervertebral disc [‘disc adj degeneration’ OR ‘disc adj prolapse’ OR ‘disc adj herniation’ OR ‘intervertebral disc$’ OR radiculopathies(mesh) OR ‘nerve root compression’(mesh)].


  • Intervention: lumbar discectomy [discectom* OR diskectom* OR microdisc* OR micro-disc OR microdisk* OR micro-disk* OR nucleotomy(mesh) OR nucleotomies(mesh)].


  • Methodology: prospective cohort studies (inception OR survival OR ‘life tables’ OR ‘log rank’ OR prospective OR cohort OR ‘follow-up’ OR ‘follow-up study’).
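The way such term groups are typically combined can be sketched as follows. This is only an illustration, assuming that synonyms within a concept are joined with OR and that the concepts are then joined with AND; the database-specific syntax actually used is given in the published protocol [20]. The function names and the shortened term lists are hypothetical.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts):
    """Join one OR-block per concept with AND (assumed combination rule)."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


# Shortened, illustrative term lists taken from the bullets above.
concepts = {
    "population": ["'leg pain'", "'back pain'", "sciatica"],
    "intervention": ["discectom*", "microdisc*"],
    "methodology": ["prospective", "cohort"],
}

query = build_query(concepts)
print(query)
```

Keeping each concept as a separate list makes it straightforward to adapt truncation symbols and field tags per database without touching the combination logic.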

Study selection

Study records were managed through EndNote. Two reviewers [AR/PG] independently searched information sources and assessed identified studies against the eligibility criteria, grading each as eligible, not eligible, or might be eligible [25]. Titles and abstracts were assessed first [26]. Full texts were obtained for potentially relevant studies, abstracts with insufficient information, or in situations of disagreement between reviewers. A study was included when both reviewers independently assessed it as satisfying the eligibility criteria. The third reviewer [NRH] mediated in situations of disagreement following an initial discussion [22]. The process of assessing eligibility was initially piloted on five studies.
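The two-stage decision rule described above can be sketched in code. This is a simplified reading rather than the reviewers' actual procedure: it assumes that an abstract is dropped only when both reviewers grade it not eligible, and that any full-text disagreement left after discussion is settled by the third reviewer's grade. All function and constant names are hypothetical.

```python
ELIGIBLE = "eligible"
NOT_ELIGIBLE = "not eligible"
MAYBE = "might be eligible"


def screen_abstract(grade_a, grade_b):
    """Title/abstract stage: exclude only when both reviewers say not eligible."""
    if grade_a == NOT_ELIGIBLE and grade_b == NOT_ELIGIBLE:
        return "exclude"
    # Potentially relevant, insufficient information, or disagreement.
    return "obtain full text"


def decide_full_text(grade_a, grade_b, mediator_grade=None):
    """Full-text stage: include only on independent agreement, else mediate."""
    if grade_a == ELIGIBLE and grade_b == ELIGIBLE:
        return "include"
    if grade_a == NOT_ELIGIBLE and grade_b == NOT_ELIGIBLE:
        return "exclude"
    if mediator_grade is None:
        return "refer to third reviewer"
    return "include" if mediator_grade == ELIGIBLE else "exclude"


print(screen_abstract(ELIGIBLE, MAYBE))
print(decide_full_text(ELIGIBLE, NOT_ELIGIBLE))
print(decide_full_text(ELIGIBLE, NOT_ELIGIBLE, mediator_grade=ELIGIBLE))
```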

Data collection process

Using a study-specific standardised form that was initially piloted on five studies, two independent reviewers [AR/PG] extracted data. A third reviewer [NRH] carried out checks of the data to ensure clarity and consistency, and any inconsistencies were resolved through discussion and amendment.

Data items

Key data were extracted for each included cohort and included: study information, surgical procedure, duration of symptoms, number of participants, setting, interventions during follow-up phase, pain outcome measures, disability outcome measures, baseline, follow-up assessment points, losses to follow-up, and results. Pre-defined outcomes of interest were measures to assess pain and disability [19].

Risk of bias in individual studies

Following initial piloting on five studies, risk of bias was independently assessed by the same two reviewers, with the third reviewer again mediating in situations of disagreement. Study risk of bias was assessed using a modified QUality In Prognostic Studies (QUIPS) tool [27], specifically designed to assess risk of bias in prognostic factor studies ideally using a prospective cohort. As the prognostic factor section was not relevant to this review, it was removed. This and further modifications were informed by an iterative process of review and refinement based on Pengel et al.’s review of low back pain prognosis [28] to produce an eight-component definitive tool. In accordance with QUIPS, a risk of bias of low, moderate, or high was determined for each component and tabulated [27].

Summary measures and synthesis of results

Outcomes with established measurement properties and continuous data were pooled statistically and presented for short-term (≤ 3 months), medium-term (> 3, ≤ 12 months), and long-term (> 12 months) follow-up. The short-term (early postoperative period) and long-term outcomes were the time points of greatest interest to guide rehabilitation decision-making following surgery. Risk of bias within and across studies is presented at each time point, to enable a critical evaluation of its impact on findings. Authors were contacted to request raw data or additional summary statistics in situations where data or variance data were missing. Data were extracted for all time points [28]. Continuous outcome data were converted to a 0–100 scale when necessary. Means and 95% CIs were plotted over time for leg pain, back pain, and disability. Meta-analyses evaluated mean outcomes at the different time points. The variance-weighted mean was used in random-effects meta-analyses.
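The rescaling and pooling steps can be illustrated with a minimal sketch. The review does not state which between-study variance estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is common, and the function names and example numbers are hypothetical rather than data from the included studies.

```python
import math


def rescale_to_100(mean, sd, scale_min, scale_max):
    """Linearly map a mean and SD from [scale_min, scale_max] onto 0-100."""
    factor = 100.0 / (scale_max - scale_min)
    return (mean - scale_min) * factor, sd * factor


def pool_random_effects(means, ses):
    """Inverse-variance random-effects pooled mean with 95% CI.

    tau^2 is estimated with DerSimonian-Laird (an assumed choice).
    """
    v = [se ** 2 for se in ses]
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(means) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical studies reporting (mean, SD, n) on a 0-10 pain scale.
studies = [(7.0, 2.0, 100), (6.5, 2.5, 50), (7.5, 1.5, 200)]
means, ses = [], []
for mean, sd, n in studies:
    m100, sd100 = rescale_to_100(mean, sd, 0, 10)
    means.append(m100)
    ses.append(sd100 / math.sqrt(n))  # standard error of the study mean

pooled, ci, tau2 = pool_random_effects(means, ses)
print(round(pooled, 2), tuple(round(x, 2) for x in ci), round(tau2, 2))
```

Repeating the pooling separately for each follow-up window gives the series of means and 95% CIs that can then be plotted over time.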

Risk of bias across studies

A modified version of the Grading of Recommendations Assessment, Development and Evaluation [GRADE] method was used to assess the strength of the overall body of evidence. Iorio et al. [29] and Huguet et al. [30] support GRADE’s five domains of down-rating quality (risk of bias, publication bias, imprecision, inconsistency, and indirectness) and two domains of up-rating quality [adaptation of two (large effect, dose–response gradient) of the three GRADE domains] for reviews focused on prognostic designs. Assessment of publication bias was informed by analysis of consistency between protocols, where available, and study findings and analysis of potential competing interests from research groups, with findings reported narratively. Care was ensured that the same data were not used multiple times from large studies using national databases or multiple articles reporting the same data sets. GRADE was used to rate the overall quality of evidence for an outcome (pain, disability) across studies. As distinct to GRADE used for assessing intervention studies, study design was not a key feature as longitudinal designs are the only option for prognostic research.

Additional analyses

Meta-regression analyses were conducted to study the moderating effect of key variables, including type of surgery, duration of symptoms prior to surgery [31], and level of preoperative pain and disability, on pain and disability outcomes. Data were insufficient to investigate the influence of age at time of surgery, level of education, work satisfaction, coexistence of psychological complaints, evidence of passive avoidance coping function, and duration of sick leave. Selection of variables was informed by a recent review of prognostic factors [32].
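As an illustration of the general technique, a meta-regression on a single moderator can be written as a weighted least-squares fit. The sketch below is not the model used in the review: it assumes fixed-effect inverse-variance weights, one moderator coded 0/1, and a normal approximation for the p value. The function name and the data are hypothetical.

```python
import math


def meta_regression(y, se, x):
    """Weighted least squares of study means y on one moderator x.

    Weights are 1/se^2 (fixed-effect assumption). Returns intercept,
    slope, SE of the slope, and a two-sided normal-approximation p value.
    Requires at least two distinct values of x.
    """
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    se_slope = math.sqrt(1.0 / sxx)
    z = slope / se_slope
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return intercept, slope, se_slope, p


# Hypothetical pooled disability means (0-100) for two categories of a moderator.
y = [20.0, 22.0, 30.0, 32.0]
se = [1.0, 1.0, 1.0, 1.0]
x = [0, 0, 1, 1]  # e.g. category A = 0, category B = 1

intercept, slope, se_slope, p = meta_regression(y, se, x)
print(intercept, slope, se_slope, p)
```

With a 0/1 moderator the slope is the difference in weighted mean outcome between the two categories, which corresponds to the mean difference and p value that a meta-regression table reports per predictor.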


Study selection

The PRISMA flow diagram (Fig. 1) documents study inclusion and exclusion [33]. A total of 3,993 potentially relevant studies were identified, with 230 studies reviewed at the full-text stage. Eighty-seven studies were included (references detailed in Online Resource 1), represented by the inclusion of 99 articles (n = 12 studies reported across > 1 article). A further n = 31 studies were eligible for inclusion, but no data were available from the study report or by contacting the authors (references detailed in Online Resource 2). Complete agreement was achieved between reviewers.

Fig. 1 PRISMA 2009 flow diagram

Study characteristics

Study characteristics are reported in Online Resource 3. The total number of participants included across the 87 studies was n = 31,034. Sample size ranged from 12 to 10,615. The studies were conducted across multiple countries, and published between 1989 and 2018. Follow-up period ranged from 1 day to 8 years post-surgery.

Risk of bias within studies

No studies were at low risk of bias, with 49 studies assessed as moderate risk of bias and 38 studies as high risk of bias. Online Resource 4 reports the details of the risk of bias assessment. Complete agreement across all domains was achieved between reviewers. All domains contributed to risk of bias.

Results of individual studies

Online Resource 3 also details data extraction for each included study, inclusive of results.

Synthesis of results

Leg pain

Leg pain was measured on a 0–10 NRS or VAS in 50 studies (n = 14,910 participants). Mean leg pain prior to surgery was 7.04 (Fig. 2). Leg pain improved substantially immediately following surgery, and the improvement was maintained.

Fig. 2 Mean and 95% CI leg pain over time

Back pain

Back pain was measured on a 0–10 NRS or VAS in 53 studies (n = 14,877 participants). Mean back pain prior to surgery was 4.72 (Fig. 3). Back pain improved following surgery but to a lesser degree than leg pain, and the improvement was maintained.

Fig. 3 Mean and 95% CI back pain over time


Disability

Disability was measured on the ODI 0–100 in 48 studies (n = 15,037 participants). Mean disability prior to surgery was 53.33 (Fig. 4). Disability improved substantially immediately following surgery, and the improvement was maintained.

Fig. 4 Mean and 95% CI disability over time

Assessment using the adapted GRADE method commenced with the quality of evidence rated as moderate as a starting point based on the gold standard prospective design of included studies (Table 1). The level of evidence was downgraded for study limitations (risk of bias) and publication bias (no published protocols), but upgraded for inconsistency, indirectness, and precision. Using GRADE, there is moderate-level evidence for the clinical course of patient outcome for pain and disability.

Table 1 Adapted Grading of Recommendations Assessment, Development and Evaluation [GRADE] table for systematic reviews with meta-analysis of prognostic designs across a range of measures

Additional analyses

Meta-regression analyses were possible as there were > 10 studies in each comparison for the a priori identified potential predictor variables of type of surgery, duration of symptoms prior to surgery, and level of preoperative pain and disability (Table 2). The mean difference for these variables is illustrated, and most were not significant, with wide 95% confidence intervals and p values > 0.05 (the p value indicates the significance of the difference in mean pooled value between the categories of the predictor variables). The only significant findings were that severity of back pain prior to surgery was predictive for future severity of leg pain (p = 0.0006), and that preoperative disability was predictive for future disability (p = 0.0244).

Table 2 Meta-regression model results


Statement of principal findings

This is the first rigorous meta-analysis to investigate the clinical course of pain and disability over long-term follow-up (> 7 years) following first-time lumbar discectomy. A clinically relevant decrease in leg pain and disability is seen immediately following surgery (> MCID) [34], and the improvement is maintained. The findings are consistent with the improvement in leg pain and disability documented by Machado et al. [18], but the immediate improvement is new information. Some improvement in back pain (not assessed by Machado et al.) is also seen and maintained. There is, however, evidence of a plateau in improvement with residual low severity pain and low levels of disability in the long term, indicative of persistent problems.

Mean leg pain prior to surgery was 7.04, illustrating high severity and reflecting its role as the main indicator for surgical intervention [1]. Leg pain improved immediately, and this improvement was maintained, supporting the role of discectomy [2]. The findings reflect a high reported success rate for lumbar discectomy compared to RCT data [3] if based on leg pain as the measured outcome, with success clear for patients immediately following surgery. This is interesting and contrasts with previous findings that features of neuropathic pain, accompanied by higher neuropathic pain scores, were associated with persistent leg pain [35]. Mean disability prior to surgery was 53.33, illustrating a moderate level of disability. Disability improved immediately, and this improvement was maintained, again supporting the role of surgery [2].

There was evidence of low severity residual leg pain at the long-term follow-up which was possible to evaluate to 7.3 years, and low severity residual back pain. There was also evidence of low residual disability at the long-term follow-up. These data support rehabilitation intervention for patients who do not improve immediately following surgery or who experience problems in the long term. Findings are consistent with the remaining symptoms documented by Parker et al. [17] for leg/back pain being problematic for 5–36% of patients in the long term (> 24 months), and the pain and disability persisting at 5 years following surgery documented by Machado et al. [18] although the low severity from this review is less than the mild-to-moderate severity previously reported [18].

The meta-regression analyses assessing the relationship between the outcome data and a priori potential covariates found few significant relationships (Table 2). Only severity of preoperative back pain was predictive for future severity of leg pain (p = 0.0006), and preoperative disability was predictive for future disability (p = 0.0244). This is largely consistent with previous research investigating predictors of outcome that found very-low-level evidence for duration of back pain and severity of back pain not being associated with outcome, and low-level evidence for duration of leg pain preoperatively not being associated with outcome [32]. However, there are some differences. The finding that preoperative back pain predicts future severity of leg pain contrasts with previous very-low-level evidence that back pain is not associated with outcome. Similarly, the finding that preoperative disability predicts future disability contrasts with previous very-low-level evidence that disability is not associated with outcome. The previous finding of low-level evidence that higher severity of preoperative leg pain predicted better outcome [32] was not supported by these data. The highly significant finding of severity of preoperative back pain as predictive for the outcome of severity of leg pain is therefore interesting and merits further investigation. Owing to poor reporting, data were insufficient to investigate the influence of age at time of surgery, level of education, work satisfaction, coexistence of psychological complaints, evidence of passive avoidance coping function, and duration of sick leave.

Strengths and weaknesses of study in relation to other studies

This is the first low risk of bias systematic review (self-assessed using the AMSTAR 2 checklist [36]) that has synthesised the evidence for clinical course of pain and disability following lumbar discectomy surgery. However, findings are limited by moderate/high risk of bias studies, potential publication bias, and lack of use of reporting guidelines. The exclusion of 15 studies not published in English and 31 studies where data were unavailable may be a limitation, as key findings may have been missed, although the large number of included studies and the precision of most confidence intervals mitigate this. Discussion of findings is limited by the scarce literature available.

Interpretation of findings needs to be in the context of the moderate/high risk of bias across studies, e.g. loss to follow-up affecting the internal validity of studies. This review’s robust methodology may however be criticised for being overly stringent on included studies that were not designed as prognostic studies with attention to key issues for internal validity. As the means and 95% CIs illustrate, we cannot have confidence in some longer-term data that are dependent on individual studies at risk of bias.


These data provide real-world evidence of clinical course and mitigate the limitations of RCT data with strict eligibility criteria [10] to inform our understanding. The immediate improvement following surgery perhaps explains why there is no clinical effectiveness or cost-effectiveness of early referral for rehabilitation over no referral [9].

Unanswered questions and future research

There is no need for further observational data regarding this population, and researchers should now focus on addressing the persistent low severity symptoms and disability in the longer term. Knowledge of clinical course is essential to inform clinical decision-making processes regarding the selection of patients for rehabilitation following surgery and timing of interventions.


There is moderate-level evidence for the clinically relevant immediate improvement in leg pain and disability following surgery with accompanying improvements in back pain, supporting the role of discectomy surgery [1, 2]. This review included only prospective cohort studies that are the gold standard design, and these real-world data can be used collectively with RCT data to support clinical effectiveness.