Lumbar spinal stenosis (LSS) is a clinical diagnosis characterized by back and leg pain, neurogenic claudication and corresponding MRI findings showing narrowing of the spinal canal. Several studies have shown that surgery is a beneficial treatment option [1, 2] and LSS is currently the most frequent indication for spinal surgery in the western world [3, 4]. Patient-reported outcome after surgery is good or excellent in 60–80 percent of patients [5,6,7,8]. Unfavorable outcomes have been attributed to inadequate patient selection and individual risk factors such as comorbidity, psychosocial factors, high BMI and smoking [9,10,11].

Radiological imaging is mandatory for establishing the LSS diagnosis and several radiological classification systems have been proposed, but their correlation to symptom severity is generally weak [12,13,14]. Previous studies evaluating the relationship between radiological findings and patient-reported outcomes have reported conflicting results [15,16,17]. The identification of prognostic factors could improve surgical decision-making and possibly clinical outcomes. Thus, the aim of this analysis was to investigate a broad spectrum of preoperative MRI findings in LSS patients and their potential associations with patient-reported outcome measures (PROMs) 2 years after surgery.


The NORwegian Degenerative spondylolisthesis and spinal STENosis (NORDSTEN) study is a large RCT evaluating clinical and radiological outcomes of different surgical treatments for LSS. The patients included in the present analysis are from the NORDSTEN Spinal Stenosis Trial (SST), which includes 437 LSS patients without spondylolisthesis [18].

Inclusion process and patient recruitment

All included patients had MRI findings and symptoms consistent with LSS. In total, 2227 patients were referred for evaluation at a spine surgery unit, and the 437 patients who fulfilled all eligibility criteria were included in the SST (Fig. 1). All patients were enrolled between February 2014 and October 2018. The patients were randomized to one of three commonly used surgical techniques for LSS. All three techniques resulted in similar success rates [19]. The included patients answered the questionnaires preoperatively and at the 2-year follow-up. Inclusion criteria are presented in Table 1.

Fig. 1
Flow chart of the NORDSTEN study and the SST according to the STROBE statement. DST = Degenerative Spondylolisthesis Trial. SST = Spinal Stenosis Trial

Table 1 Inclusion and exclusion criteria for the Spinal Stenosis Trial (SST) in the NORDSTEN-study

Magnetic resonance imaging

All participants underwent a 1.5 or 3 Tesla MRI of the lumbar spine within 6 months before surgery. The MRI protocol included sagittal T1-weighted and axial and sagittal T2-weighted images with repetition time (TR)/echo time (TE) 1500–6548/82–126 ms for T2-weighted images and 400–826/8–14 ms for T1-weighted images, slice thickness: 3–5 mm, FOV: 160–350 mm. All MRI examinations were anonymized. PACS IDS7 (SECTRA) integrated measurement tools were used for assessment of morphological changes.

Two experienced radiologists established a protocol for MRI evaluation in concordance with previously validated classification systems. The inter- and intra-observer agreement was evaluated in a previous study [20].

We defined the index level as the narrowest lumbar level measured with dural sac cross-sectional area (DSCA). At index level, we investigated the following parameters and dichotomized the radiological scores into moderate and severe changes:

  • Schizas qualitative grading system, grading the morphology of the dural sac ranging from A (no or minor narrowing) to D (extreme narrowing). Schizas grade C and D were classified as severe changes. The distinction between moderate and severe changes is determined by observation of cerebrospinal fluid surrounding the neural structures [12].

  • DSCA according to the method described by Schönström and Hansson [21]. DSCA less than 75 mm² was classified as severe changes.

  • Pfirrmann grading system to evaluate the intervertebral disc degeneration from 1 (normal) to 5 (worst) [22]. Pfirrmann grades 4 and 5 were classified as severe changes. Moderate changes were distinguishable by a white/grey disc and severe changes by a black/collapsed disc.

  • Facet joint angle measured according to the method described by Noren et al. [23] and facet joint tropism evaluated according to the method of Vanharanta [24]. Tropism of 15° or more was classified as severe changes.

  • Fat infiltration of the multifidus muscle according to the Goutallier classification from 0 (normal) to 4 (severe) [25]. Goutallier grades 2–4 were classified as severe changes. The worse side (right or left) at the index level was used in the analysis.
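The dichotomization rules listed above can be summarized in a short sketch. This is an illustration only, not the study's analysis code: the class name `IndexLevelMRI`, its field names and the function `dichotomize` are hypothetical, and only the cut-offs are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class IndexLevelMRI:
    """Hypothetical container for one patient's index-level MRI readings."""
    schizas: str        # Schizas grade, "A" to "D"
    dsca_mm2: float     # dural sac cross-sectional area in mm^2
    pfirrmann: int      # Pfirrmann grade, 1 to 5
    tropism_deg: float  # facet joint tropism in degrees
    goutallier: int     # Goutallier grade, 0 to 4 (worse side)


def dichotomize(mri: IndexLevelMRI) -> dict:
    """Return True for 'severe' and False for 'moderate' per parameter."""
    return {
        "schizas_severe": mri.schizas in ("C", "D"),  # grades C and D
        "dsca_severe": mri.dsca_mm2 < 75,             # less than 75 mm^2
        "pfirrmann_severe": mri.pfirrmann >= 4,       # grades 4 and 5
        "tropism_severe": mri.tropism_deg >= 15,      # 15 degrees or more
        "goutallier_severe": mri.goutallier >= 2,     # grades 2 to 4
    }
```

For example, a hypothetical patient with Schizas grade C, DSCA 60 mm², Pfirrmann grade 3, tropism of 15° and Goutallier grade 1 would be classified as severe on Schizas grade, DSCA and tropism, and as moderate on the remaining two parameters.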

Outcome measures

Before surgery and at the 2-year follow-up, the patients completed a self-administered questionnaire containing commonly used PROMs: the Norwegian version of the Oswestry Disability Index (ODI), the Zurich Claudication Questionnaire (ZCQ) and a numeric rating scale (NRS) for leg and back pain.

The primary outcome was surgical success, defined as a reduction of at least 30% in the ODI score from baseline to the 2-year follow-up [26,27,28,29].
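As a minimal sketch of this responder definition, the function below classifies a patient from baseline and follow-up ODI scores. The function name `odi_success` and the handling of a zero baseline score are assumptions made for illustration; the text only specifies the 30% threshold.

```python
def odi_success(baseline: float, follow_up: float, threshold: float = 0.30) -> bool:
    """Return True if the ODI score fell by at least `threshold` (as a
    fraction of the baseline score) between baseline and follow-up."""
    if baseline <= 0:
        # Assumption: a relative reduction is undefined without a positive
        # baseline score, so such input is rejected rather than guessed at.
        raise ValueError("baseline ODI must be greater than zero")
    relative_reduction = (baseline - follow_up) / baseline
    return relative_reduction >= threshold
```

With hypothetical scores, a patient going from 40 to 20 ODI points (a 50% reduction) counts as a success, whereas a patient going from 40 to 32 (a 20% reduction) does not.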

Secondary patient-reported outcome measures were summary scores reported at the 2-year follow-up for ODI, ZCQ and NRS for leg and back pain.

The ODI is a low back pain-specific questionnaire consisting of ten questions concerning pain related disability. The ODI score ranges from zero (no disability) to 100 (most severe disability) [30, 31].

The ZCQ is a disease-specific questionnaire for LSS measuring symptom severity and physical function [32]. The symptom severity scale ranges from 1.0 to 5.0. The physical function scale ranges from 1.0 to 4.0. For all scales, 1.0 is the minimum burden. The NRS for leg and back pain ranges from zero (no pain) to 10 (worst pain imaginable) [33].

Statistical analysis

The present study is a blinded analysis of data collected prospectively in an RCT, nested within the NORDSTEN Spinal Stenosis Trial. Standard descriptive statistics were used to present demographic data at baseline and outcome measures at baseline and follow-up. Paired-sample t-tests were used to compare differences in means between baseline and 2-year follow-up. To analyze the association between MRI findings and the primary and secondary outcomes, we applied multivariable regression models including all MRI parameters and controlling for the most relevant patient demographics: age (continuous), sex, current smoking status (yes/no) and BMI (continuous). For the primary dichotomous outcome, a logistic regression model was used, estimating odds ratios and corresponding 95% confidence intervals. For the continuous secondary outcomes, we used linear regressions and estimated unstandardized regression coefficients with corresponding 95% confidence intervals. All analyses were done using Stata version 16.1.
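To illustrate how an odds ratio and its 95% confidence interval arise, the sketch below computes the unadjusted odds ratio from a 2×2 table with a Wald interval on the log scale. This equals the estimate from a logistic regression with a single binary predictor; it is a simplification and not the multivariable model described above, which also includes the other MRI parameters and the demographic covariates. The function name, argument names and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_with_ci(success_severe: int, failure_severe: int,
                       success_moderate: int, failure_moderate: int,
                       z: float = 1.96) -> tuple:
    """Unadjusted odds ratio (severe vs. moderate) with a Wald 95% CI."""
    counts = (success_severe, failure_severe, success_moderate, failure_moderate)
    if min(counts) <= 0:
        # The log-scale standard error is undefined when a cell is empty.
        raise ValueError("all four cell counts must be positive")
    # Odds of success in the severe group divided by odds in the moderate group.
    odds_ratio = (success_severe * failure_moderate) / (failure_severe * success_moderate)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For instance, `odds_ratio_with_ci(150, 80, 120, 40)` uses invented counts (150 successes and 80 failures with severe changes, 120 successes and 40 failures with moderate changes) and gives an odds ratio of 0.625. An adjusted estimate, as reported in this study, would require fitting the full regression model in statistical software.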

Ethics and trial registration

The Committee for Medical and Health Research Ethics of Central Norway approved the study (study identifier: 2011/2034). The study was registered at ClinicalTrials.gov (22.11.2013) under the identifier NCT02007083. All patients provided written informed consent.


Baseline data

This analysis included 437 patients. Mean age was 66.8 (SD 8.4) years, 52.7% were male and 20.8% were smokers. Patient characteristics and PROMs preoperatively and 2 years after surgery are presented in Table 2. The proportions of patients categorized with severe radiological changes preoperatively were: Schizas grade 296 of 415 (71%), DSCA 360 of 415 (86%), Pfirrmann score 241 of 415 (58%), fatty infiltration of the multifidus muscle 308 of 368 (84%) and facet joint tropism 49 of 415 (12%). In total, 35 patients (8%) dropped out during follow-up.

Table 2 Cohort of LSS patients selected for surgical treatment

Clinical outcomes

Mean improvement in ODI from baseline to 2-year follow-up for the cohort was 19.1 (95% CI 17.5–20.8). The proportion of patients with at least 30% improvement in ODI score was 273/393 (69.5%). Mean improvement in ZCQ was 1.0 (95% CI 0.9–1.1) for Symptom Severity and 0.8 (95% CI 0.8–0.9) for Physical Function. The mean improvement was 3.5 (95% CI 3.2–3.8) for NRS leg pain and 2.7 (95% CI 2.4–3.0) for NRS back pain. There was a statistically significant improvement between baseline scores and scores at 2-year follow-up for all investigated PROMs, with p values < 0.001.

Risk factor analyses

Primary analysis

When controlling for gender, age, smoking status and BMI, the only MRI parameter associated with a lower chance of achieving the targeted goal of at least 30% improvement in ODI score was severe disc degeneration (Pfirrmann score 4–5) (OR 0.54, 95% CI 0.34, 0.88) (Table 3).

Table 3 Logistic regression model with odds ratio indicating the chance of successful surgery when comparing moderate/severe changes in given radiological classification systems

Secondary analyses

Compared to moderate disc degeneration (Pfirrmann score 1–3), severe disc degeneration (Pfirrmann score 4–5) was significantly associated with higher ZCQ symptom and function scores, with mean differences of 0.19 points (95% CI 0.02, 0.36) and 0.17 points (95% CI 0.04, 0.29), respectively.

The comparison between tropism yes/no indicated a statistically significant association between absence of facet joint tropism preoperatively and less improved PROMs measured with NRS leg pain, with a mean difference of −1.12 (95% CI −2.13, −0.12), and NRS lumbar pain, with a mean difference of −0.98 (95% CI −1.91, −0.01).

Compared to severe morphological changes (Schizas grade C-D), moderate morphological changes (Schizas grade A-B) were statistically significantly associated with a less improved ODI score, with a mean difference of −4.6 ODI points (95% CI −8.6, −0.6) (Table 4).

Table 4 Cohort of LSS patients selected for surgical treatment


Our main finding in this analysis is the negative association between severe disc degeneration (Pfirrmann score 4–5) and the odds of achieving a 30% improvement on the ODI score. Patients categorized with severe disc degeneration had almost 50% lower odds of experiencing a successful outcome 2 years after surgery. The finding in the primary analysis was supported by the secondary analyses, which used continuous PROM scores as dependent variables. The effect size was small and probably not clinically important [26], but because the association with severe disc degeneration was consistent across different outcomes, we still consider the negative prognostic impact clinically relevant.
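Because an odds ratio acts on odds rather than on probabilities, its effect on the probability of success depends on the success rate in the reference group. The sketch below converts a reference probability and an odds ratio into the implied probability. The function name `apply_odds_ratio` and the reference probability used in the usage note are hypothetical and are not results from this study; only the odds ratio of 0.54 is taken from the primary analysis.

```python
def apply_odds_ratio(p_reference: float, odds_ratio: float) -> float:
    """Return the success probability implied by applying `odds_ratio`
    to a reference group whose success probability is `p_reference`."""
    if not 0.0 < p_reference < 1.0:
        raise ValueError("p_reference must lie strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    odds_reference = p_reference / (1.0 - p_reference)  # probability to odds
    odds_new = odds_reference * odds_ratio              # apply the odds ratio
    return odds_new / (1.0 + odds_new)                  # odds back to probability
```

Under the illustrative assumption of a 75% success probability in the reference group, `apply_odds_ratio(0.75, 0.54)` gives approximately 0.62, so a near halving of the odds corresponds to a considerably smaller relative reduction in the probability of success.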

The associations between facet joint tropism and improvement in leg and back pain, and between PROMs and Schizas grade, reached statistical significance in the secondary analyses. However, the effect sizes were small and most probably not clinically relevant [26]. No such associations were found in the primary analysis. We therefore consider these findings incidental, probably due to multiple testing.

Mannion et al. conducted one of the major studies in this field. They found a significant and clinically relevant association between improvement in PROMs after surgery and both a higher preoperative Schizas grade and a higher reduction of DSCA [17]. We could not reproduce this observation. A possible explanation might be the use of different outcome measures, namely the ODI score in our analysis and the Core Outcome Measure Index (COMI) in the study by Mannion et al. Considering that patients with clinical and radiological lateral stenosis were also included in the NORDSTEN cohort, we cannot rule out the possibility that the influence of severe central stenosis was statistically weakened as a consequence.

Sigmundsson et al. suggested that decreased DSCA at baseline was associated with less back and leg pain at follow-up. The instrument used was a VAS, and the authors considered the clinical relevance minor. No association between baseline DSCA and ODI at follow-up was detected [15]. The relatively small Swedish study with 109 participants did not dichotomize DSCA. Consequently, this observation is not directly comparable with our findings. Weber et al. investigated preoperative Schizas grade and PROMs at 1-year follow-up in unselected patients from the Norwegian Spine Registry without finding any clinical association [16].

None of the studies referred to above investigated the possible association between PROMs and preoperative disc degeneration, fatty infiltration, or tropism. The investigated radiological parameters were chosen because earlier, though limited, published data suggested their potential relevance and because they are easy to apply. In addition to Schizas grade and DSCA, earlier studies have investigated the potential predictive value of other MRI findings. Kuittinen et al. investigated lateral spinal canal recess stenosis and foraminal stenosis preoperatively without detecting any association with outcome scores after surgery [34]. The present analysis does not include measurements of lateral or foraminal stenosis. Regarding the predictive value of severe disc degeneration, one cannot exclude the presence of possible confounders, such as overall more degeneration of the spine or more multilevel central stenosis than unilateral one-level stenosis. The NORDSTEN group has earlier published a paper investigating the association between symptom severity before surgery and preoperative MRI findings in patients with LSS. A significant association between Pfirrmann score and ODI score was detected, but with uncertain clinical relevance [14].

The dichotomization of the scores in the different radiological classification systems was chosen to differentiate between patients with moderate and severe MRI changes and in concordance with earlier studies [12, 14, 21, 24].

To adjust for potential confounders, we used gender, age, smoking status and BMI as covariates in the analysis. These variables have been identified as independent predictors of surgical outcomes in previous studies [11, 17, 35, 36]. Other suggested potential predictors, e.g., depression, grade of physical activity and observed scoliosis, could not be included in the analyses because such data were not available in the NORDSTEN cohort.

Limitations and strengths

The large number of participants gave us the opportunity to investigate a large number of radiological variables without compromising the strength of our statistical models. However, the chance of falsely significant associations increases with an increasing number of variables. Due to the low number of dropouts, we consider the risk of attrition bias to be low.

The MRIs investigated in this paper were collected from a large number of institutions. Factors such as slice orientation and magnet strength may vary. This could affect our measurements and consequently bias the result of the analysis. However, due to the strict MRI protocol distributed to all radiological institutions, we consider the risk of information bias to be low. All radiological measurements were performed by investigators blinded to clinical data, and both inter- and intra-observer reliability were high.

It is important to recognize that the results cannot be generalized to subgroups not included in the study cohort, for example those with a concomitant degenerative spondylolisthesis and those with unilateral recess stenosis. For the included patient groups, some risk of selection bias still exists. Due to a low number of included patients at some spine surgery units, it is likely that a considerable number of patients were not screened for eligibility. Hence, the included patients might not be representative of the defined study population.


In this study on patients operated for LSS, severe disc degeneration was the only preoperative MRI finding associated with reduced chance of achieving a 30% improvement in ODI score 2 years after surgery. Grade of spinal stenosis measured by Schizas and DSCA was not associated with outcome.