Introduction

Degenerative cervical myelopathy (DCM) is the leading cause of spinal cord impairment in adults [1]. Degenerative changes cause spinal canal narrowing leading to pain and progressive neurological deterioration with weakness, gait disturbance, spasticity, paresthesia, and sphincter dysregulation. Surgery for DCM halts disease progression and improves functional outcome [2]. In conservatively treated DCM patients, 20–62% will deteriorate within 3–6 years [3]. The optimal timing of surgical treatment is unclear, and more validated outcome predictors for DCM surgery are needed. Older age, higher comorbidity, longer symptom duration, more severe myelopathy, and myelopathy signs on magnetic resonance imaging (MRI) preoperatively have been described as independent predictors of worse postoperative outcomes [2, 4, 5], suggesting surgery at an earlier myelopathy stage to be beneficial. Mental distress and psychiatric disease are further described as risk factors of poor surgical outcome [6], as well as smoking, higher body mass index, diabetes mellitus, spasticity, hyperreflexia, disturbed gait, and muscle atrophy [2]. Nonetheless, the prognostic implications of these predictors are poorly understood, as is the value of adding a posterolateral instrumented fusion to a laminectomy. Although older studies present high rates of post-laminectomy kyphosis potentially advocating prophylactic fusion [7], modern studies indicate no clinical advantage of adding fusion to laminectomy [8,9,10]. Furthermore, fusion surgery itself may lead as frequently as laminectomy to kyphosis at the proximal or distal junction [11].

The aims of this study were to investigate improvement rates and adverse events following laminectomy alone (LAM) and laminectomy with instrumented fusion (LAM + F), and to explore the value of baseline clinical, radiological, and surgical parameters as predictors of patient-reported outcome measures (PROMs) and adverse events following DCM surgery.

Methods

The study was approved by the Swedish Ethical Review Authority (Dnr: 2017/450, amendment 2019-00913), and written informed consent was waived by the authority.

Participants

This study is a retrospective post hoc analysis of a previously published cohort of 717 patients surgically treated for DCM with LAM or LAM + F at the 18 major spine centers in Sweden between the start of registration in January 2006 and March 2019. The previous study focused on efficacy and cost-effectiveness [8].

All patients were included through the prospective Swedish Spine Register (Swespine), which is governed by the Swedish Society of Spinal Surgeons (www.4s.nu) and reports approximately 80% of all spine surgeries in the country [12]. Inclusion criteria were (1) age ≥ 18 years, (2) diagnosis of cervical spinal stenosis with at least one clinical sign of myelopathy, and (3) treatment with LAM or LAM + F. The surgical method was decided by the individual surgeon in joint agreement with the patient. Based on register data and preoperative MRI data, the exclusion criteria were previous cervical spine surgery, traumatic spinal injury, spinal infection, rheumatoid arthritis, ankylosing spondylitis, neoplastic disease, severe cardiac disease, severe neurological disease, or conditions other than DCM causing significant pain or gait impairment.

Data collection

Clinical data and adverse events

All clinical data were collected from Swespine. The parameters are listed in Table 1 and include the PROMs collected at baseline and at the 2- and 5-year follow-ups: European myelopathy score (EMS) [13], Neck Disability Index (NDI) [14], European Quality of Life-5 Dimension Questionnaire (EQ-5D), European Quality of Life-Visual Analog Scale (EQ-VAS) [12], Visual Analog Scale (VAS) neck, and VAS arm [15].

Table 1 Clinical data collected from the Swedish Spine Register, including patient-reported outcome measures (PROMs)

The EMS is a disability scale resembling the gold standard modified Japanese Orthopaedic Association (mJOA) score. Both scales measure severity of myelopathy, but the EMS is self-administered by the patient and includes a pain assessment.

Clinical improvement/deterioration was defined as (1) an increase/decrease in EMS of 1 point in mild myelopathy, 2 points in moderate myelopathy, and 3 points in severe myelopathy [16], and/or (2) a decrease/increase in NDI of ≥ 17 percentage points [17, 18]. EMS and NDI improvement/deterioration rates at the 2- and 5-year follow-ups were calculated for cases with available baseline and follow-up data.
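The definition above can be sketched as a small classification routine. This is only an illustration under stated assumptions: the function names are hypothetical, the baseline severity class is passed in as a label because its EMS cut-offs are not given here, the thresholds are read as "at least" the stated change, and a higher EMS is taken to mean less disability while a higher NDI is taken to mean more disability.

```python
# Minimum EMS change (points) that counts as a change, by baseline severity [16].
EMS_THRESHOLD = {"mild": 1, "moderate": 2, "severe": 3}
# Minimum NDI change (percentage points) that counts as a change [17, 18].
NDI_THRESHOLD = 17


def classify_ems(baseline, follow_up, severity):
    """Classify EMS change; a rise in EMS is treated as improvement (assumption)."""
    change = follow_up - baseline
    threshold = EMS_THRESHOLD[severity]
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "deteriorated"
    return "unchanged"


def classify_ndi(baseline, follow_up):
    """Classify NDI change; a fall in NDI is treated as improvement (assumption)."""
    change = follow_up - baseline
    if change <= -NDI_THRESHOLD:
        return "improved"
    if change >= NDI_THRESHOLD:
        return "deteriorated"
    return "unchanged"
```

Improvement and deterioration rates would then be the share of each label among cases with both a baseline and a follow-up value.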

Data on adverse events consisted of perioperative complications reported by the surgeons, complications reported by patients 1 year after surgery, reoperations, and death.

Radiological evaluation

In addition to register data, preoperative T2-weighted midsagittal MRI images of the cervical spine (C2-Th1) were evaluated by observers blinded to treatment for (1) number of compressed levels (defined as a level with complete absence of cerebrospinal fluid surrounding the spinal cord anteriorly and posteriorly), (2) spondylolisthesis (measured at the point of greatest slippage (mm) between two adjacent vertebral bodies), and (3) kyphosis (measured using the modified K-line interval [mK-line INT, mm]: the minimum interval between a line connecting the midpoints of the spinal cord at the level of the inferior endplates of C2 and C7 on the midsagittal image and the tip of the anterior compression factor as illustrated in Fig. 1).

Fig. 1 Kyphosis is measured using the modified K-line interval: the minimum interval between a line connecting the midpoints of the spinal cord at the level of the inferior endplates of C2 and C7 on the midsagittal image and the tip of the anterior compression factor. The image illustrates a modified K-line interval measured to 3.2 mm
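Geometrically, the modified K-line interval is the distance from a point to a line. A minimal sketch, assuming the three landmarks have already been marked on the midsagittal image and converted to millimetre coordinates (the function name and coordinate convention are hypothetical, and the result is an unsigned distance):

```python
import math


def mk_line_interval(c2_mid, c7_mid, tip):
    """Perpendicular distance from `tip` to the line through `c2_mid` and `c7_mid`.

    Each argument is an (x, y) pair in millimetres; the result is in millimetres.
    """
    (x1, y1), (x2, y2), (px, py) = c2_mid, c7_mid, tip
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("C2 and C7 midpoints must be distinct")
    # Twice the triangle area divided by the base length gives the height.
    return abs((x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)) / length
```

The sketch does not encode on which side of the line the compression factor lies; that would need a signed version of the same expression.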

Statistical analysis

Baseline data, improvement/deterioration rates for EMS and NDI at 2- and 5-year follow-ups, and adverse events are presented descriptively for available cases. Categorical variables are reported with frequency and percentage. Continuous variables are reported with mean and standard deviation (SD). Improvement/deterioration rates and frequency of adverse events were compared between the two treatment groups (LAM, LAM + F) with Fisher’s exact test. Note that cases with baseline values that could not improve or deteriorate according to the definition of improvement/deterioration for EMS (1 point in mild myelopathy, 2 points in moderate myelopathy, and 3 points in severe myelopathy) and NDI (≥ 17 percentage points) were excluded.
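For a 2 × 2 comparison between the treatment groups, Fisher's exact test can be sketched with the standard library alone. This is an illustrative implementation, not the authors' code (their analyses were run in R); it uses the common two-sided convention of summing all tables that are no more probable than the observed one, and the function name is hypothetical. The example counts are the surgeon-reported complications given in the Results (38 of 305 for LAM + F, 26 of 412 for LAM).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: LAM + F, LAM; columns: complication, no complication.
p_value = fisher_exact_two_sided(38, 305 - 38, 26, 412 - 26)
print(f"p = {p_value:.4f}")
```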

Predictor evaluation and imputation

Missing data were handled using multiple imputation with chained equations, as implemented in the R package ‘mice.’ All variables with missing values were imputed except operated levels, surgeon-reported complications, reoperations, and death dates. To support the imputation, we used baseline variables, variables recorded at the same visit, and the corresponding variable at adjacent visits. One hundred imputed datasets were generated, and the results were pooled using Rubin’s rules.
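The pooling step can be illustrated as follows. This is a minimal sketch of Rubin's rules for a single coefficient, assuming each imputed dataset has already yielded an estimate and its squared standard error; the function name is hypothetical, and the study itself relied on the pooling provided by the R package.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)        # average of the per-dataset estimates
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance of the estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

With 100 imputations the factor (1 + 1/m) is close to 1, so the total variance is roughly the within-imputation variance plus the between-imputation variance.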

For predictor evaluation, EMS and NDI improvement rates at 2 and 5 years were also calculated based on imputed data. The predictor endpoints were EMS and NDI improvement rates (yes/no) and EMS and NDI scores (continuous) at 2 and 5 years, surgeon-reported complications (count; 0–16), patient-reported complications at 1-year follow-up (count; 0–5), reoperation-free interval, i.e., time from surgery to first reoperation (days), and mortality, i.e., time from surgery to death (days). Patients were censored at death, migration, and end of follow-up on March 20, 2019.
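The construction of a time-to-event endpoint with censoring can be sketched for the reoperation-free interval. The function and argument names are hypothetical and the register's actual data layout is not described here; the sketch only assumes that a patient contributes follow-up until the first of reoperation, death, migration, or the end of follow-up on March 20, 2019.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2019, 3, 20)


def reoperation_free_interval(surgery, reoperation=None, death=None, migration=None):
    """Return (days, event): event is True if a reoperation was observed."""
    # Earliest censoring date among death, migration, and end of follow-up.
    censor = min(d for d in (death, migration, END_OF_FOLLOW_UP) if d is not None)
    if reoperation is not None and reoperation <= censor:
        return (reoperation - surgery).days, True
    return (censor - surgery).days, False
```

For the mortality endpoint the same pattern would apply with death as the event and only migration and end of follow-up as censoring dates.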

For each predictor, univariate and multivariable models were fitted to imputed data. Dichotomous endpoints were analyzed using logistic regression with the odds ratio (OR) as the effect measure; numerical endpoints using linear regression with the regression coefficient as the effect measure; count endpoints using Poisson regression with the ratio as the effect measure; and ‘time to event’-endpoints using Cox regression with the hazard ratio (HR) as the effect measure. For all effect measures, 95% confidence intervals (CIs) were calculated, with statistical significance set to a p value ≤ 0.05.
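Logistic, Poisson, and Cox models all estimate coefficients on a log scale, so the OR, ratio, and HR are obtained by exponentiating the coefficient and its confidence limits. A minimal sketch, assuming a normal approximation for the 95% CI (pooled analyses of imputed data often use a t-based interval instead) and a hypothetical function name:

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def ratio_with_ci(coef, se, z=Z_95):
    """Convert a log-scale coefficient and standard error to a ratio with CI.

    Returns (ratio, lower, upper) for an OR, rate ratio, or HR.
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```

A ratio whose interval excludes 1 corresponds to a coefficient whose interval excludes 0; linear regression coefficients are reported directly and need no such transformation.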

Statistical analyses were performed in R, version 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria).

Results

Baseline variables for the 412 patients operated with LAM and the 305 patients operated with LAM + F are detailed in Table 2. The mean age at operation was 68 years, and 37.4% were women. The LAM group had significantly more severe kyphosis, and the LAM + F group had significantly more unemployment, more operated levels, and more severe spondylolisthesis.

Table 2 Baseline data for included cases and numbers of patients with entries in the Swedish Spine Register (Swespine) per variable and number of patients with MRI

EMS and NDI improvement and deterioration rates at 2- and 5-year follow-ups in the groups are given in Table 3 and did not differ statistically between the groups.

Table 3 EMS and NDI improvement and deterioration calculated for available cases with baseline and 2-year and 5-year follow-up data, respectively

Adverse events are detailed in Table 4. The LAM + F group had more surgeon-reported complications [38 (12.5%) versus 26 (6.3%); p = 0.003] and more patient-reported complications [68 (22.3%) versus 45 (10.9%); p < 0.001] compared with LAM. During the 13 years of observation, more LAM + F patients underwent at least one reoperation (p = 0.027; log-rank test).

Table 4 Adverse events

Predictor analyses on imputed data

The proportion of missing values ranged from 1.7% for ‘number of operated levels’ to 41.6% for ‘unemployment status.’ The overall EMS improvement rate was 43.8% at 2 years and 36.3% at 5 years. The overall NDI improvement rate was 39.0% at 2 years and 39.5% at 5 years (Table 5).

Table 5 Summary of dichotomous endpoints after 2 and 5 years

The multivariable analyses showed that lower baseline EMS scores were associated with more EMS 1–3 point-improvement at 2 years (OR 0.80; 95% CI 0.72–0.89; p < 0.001) and higher NDI scores with more NDI ≥ 17 percentage point-improvement at 2 years (OR 1.06; 95% CI 1.03–1.09; p < 0.001) and 5 years (OR 1.06; 95% CI 1.02–1.09; p < 0.001). The multivariable analyses showed that lower baseline EMS scores were independent predictors of worse EMS outcome at 2 years (0.48; 95% CI 0.37–0.59; p < 0.001) and 5 years (0.64; 95% CI 0.52–0.76; p < 0.001). Older age and higher baseline NDI scores were independent predictors of worse NDI outcome at 2 years (− 0.30; 95% CI − 0.57 to − 0.02; p = 0.038 and 0.31; 95% CI 0.17–0.45; p < 0.001, respectively) and 5 years (− 0.42; 95% CI − 0.81 to − 0.04; p = 0.032 and 0.34; 95% CI 0.09–0.60; p = 0.011, respectively).

LAM + F was associated with more surgeon-reported complications (ratio 1.81; 95% CI 1.17–2.80; p = 0.008; Poisson regression). More operated levels were associated with more patient-reported complications (ratio 1.12 per level; 95% CI 1.02–1.22; p = 0.012; Poisson regression) and a shorter reoperation-free interval (HR 1.30 per level; 95% CI 1.08–1.58; p = 0.046; Cox regression). Male sex and older age were predictors of mortality (HR 1.62; 95% CI 1.08–2.43; p = 0.022; Cox regression and HR 1.08 per year; 95% CI 1.05–1.12; p < 0.001; Cox regression, respectively).

Discussion

LAM and LAM + F groups did not differ significantly regarding improvement or deterioration rates; however, more surgeon- and patient-reported complications and reoperations were observed in the LAM + F group.

EMS and NDI improvement rates were approximately 40% at the 2- and 5-year follow-ups. This is lower than in previous studies with similar follow-up periods and assessment scales, which reported improvement rates between 45 and 71% [17, 19,20,21]. This can partly be explained by the threshold values used to define improvement in our study. The EMS has a sensitivity to change, i.e., mean of postoperative minus preoperative score divided by the median of all scores, of 0.18 [22], which is comparable to that of the mJOA score but is considered low [23]. However, the EMS improvement rates are validated by the comparable NDI improvement rates.

The predictor analysis showed that lower EMS and higher NDI scores at baseline were significant predictors of worse outcome in EMS and NDI scores at 2 and 5 years, respectively. The improvement, however, was larger in patients with worse baseline EMS and NDI scores compared with those with better baseline EMS and NDI scores. Older age was also a significant negative predictor of NDI outcome at both 2 and 5 years, but not of EMS outcome. Furthermore, neither sex nor smoking demonstrated any predictive value for any outcome. This points toward baseline severity of myelopathy as a main predictor of postoperative clinical outcome. Although there were no associations between longer neck/arm pain duration and worse EMS/NDI outcome, it is still possible that surgery was performed too late in these patients and that an earlier intervention could have improved the prognosis, consistent with previous findings indicating that longer duration of disease is associated with poorer treatment response [2, 24]. On the other hand, patients with worse baseline EMS and NDI scores improved more, which might indicate that surgery should not be performed too early. On this note, patients with mild-to-moderate myelopathy might benefit from initially conservative (i.e., wait-and-watch) treatment, or even long-term conservative treatment. Arguably, patients with mild and even moderate myelopathy with non-progressive or slowly progressive DCM could be managed conservatively. Kadaňka et al. reported no differences in mJOA score, subjective evaluation by the patient, and functional outcome between conservative and surgical treatment for DCM after 10 years (number of patients = 47) [25], whereas Gulati et al. demonstrated significant improvements 1 year after surgical treatment in all measured PROMs for both mild and moderate-to-severe DCM [21].
To further elucidate the optimal timing of surgical treatment, prospective trials comparing (a) intervention for mild-to-moderate myelopathy and intervention for severe myelopathy and (b) direct intervention and wait-and-watch conservative treatment for mild-to-moderate myelopathy are needed.

The surgical method had a predictive value for a higher number of surgeon-reported complications in the LAM + F group, and complications reported by surgeons and patients were twice as high for LAM + F compared with LAM, which we attribute to a higher invasiveness of the LAM + F surgical method. This is also in line with the finding that more operated levels were associated with more patient-reported complications and a shorter reoperation-free interval.

The predictive value of male sex and older age for mortality is expected, but it also shows that this is a vulnerable patient group, with one in five patients dying, on average, less than 4.5 years after surgery, despite a mean age of 68 years in this cohort. As a reference, life expectancy in Sweden in 2020 was 84 years for women and 81 years for men. This frailty could be another argument for performing surgery at an earlier stage, but in combination with the higher rate of adverse events in the LAM + F group, it also suggests that less invasive procedures might be preferable.

Limitations

In this cohort, preoperative lateral X-rays were not routinely performed, and only preoperative MRIs were available. We were therefore not able to evaluate the impact of all degrees of cervical spine anatomy spanning from lordosis to kyphosis and we limited MRI evaluation to measurements of kyphosis with the mK-line INT, which is the only MRI-based kyphosis measure that has been presented to predict outcome after posterior surgery for DCM. Even so, as the findings by Taniyama et al. were in the context of residual anterior spinal cord compression after laminoplasty, we refrained from using their mK-line INT cut-off of 4.0 mm [26].

Kato et al. reported an association between cervical deformity and worse postoperative outcome [27]. Our study, in contrast, showed no associations between the radiological variables and worse outcomes. This might be explained by the high frequency of normal measurements in our material; future work might aim to validate dichotomized deformity variables with defined thresholds. In our study design, we were deliberately conservative with the use of thresholds as these can be viewed as arbitrary or controversial, especially in a retrospective study. If significant associations had been found, however, it could have been justifiable to perform new predictor analyses using already set thresholds from a previously published study [8]. Another reason for the absent clinical-radiological association could simply be that there is no correlation between the preoperative degree of deformity and patient-reported outcome, possibly suggesting that the development of kyphosis might be due to a partially voluntary cervical spine flexion to counteract clinical myelopathy [9]. Other factors, such as irreversible neural tissue injury due to the spinal cord compression, the technical success of the surgery, comorbidities, or patient attitude, may also play an important role in determining long-term clinical outcome. Unfortunately, Swespine does not collect data on, e.g., surgeon experience, operative time, or the progression of the operation except for surgeon-reported complications, data which could have strengthened the predictor analysis. Also, the register and thus this study did not include systematic radiological follow-ups, which would have enabled a comparative analysis between postoperative deformity and clinical outcome. A prospective controlled study of patients with DCM focusing on follow-up variables of deformity/instability would be valuable.

The significant differences in baseline characteristics between the groups represent an inevitable problem in non-randomized samples as constituted by this clinical cohort, which we addressed with propensity score matching in a previous study [8]. Even so, we did not see any associations between baseline predictors and outcome variables, or between the two surgical methods and outcome variables, which again might indicate that other factors are more important in determining the outcome of these patients.

Another limitation is the lack of stringent criteria to define an ‘early’ versus ‘late’ intervention. For instance, using a radiological threshold of three or more compressed levels to define a ‘late’ intervention would not necessarily correspond to a more severe clinical myelopathy, and radiological evidence of less than three compressed levels would not exclude a rapidly progressing myelopathy that requires timely surgery. Furthermore, a severe myelopathy could have a short symptom duration and vice versa. A temporal dichotomization of the surgical treatment might therefore be more practical in a prospective study setting. In this study, we therefore suggest that surgery at an earlier myelopathy stage might be beneficial, independent of symptom duration, while assuming that DCM is a progressive condition.

The relatively high frequency of serious surgeon-reported complications, such as spinal cord injury and vertebral artery injury, combined with the low frequency of less serious complications might represent a propensity to report the worst complications, resulting in a relative under-reporting of less severe complications. An example is the contrast between the 22 reported spinal cord injuries and only two reported nerve root injuries. As we did not have access to medical records in this register-based national cohort, we could not investigate adverse events more closely. For instance, C5 motor palsies, which can cause significant morbidity, did not have their own category. In an often complex postoperative clinical setting, we hypothesize that C5 motor palsies might have been grouped together with spinal cord injuries by some of the reporting surgeons. It was also evident that patients reported more complications at the 1-year follow-up than the surgeons reported perioperatively. This might be due to under-reporting by surgeons, who have a more limited timeframe for detection of complications, as well as differences in the sets of complications reported by surgeons and patients.

Missing data are another inevitable limitation of a register-based study. Using multiple imputation, we could avoid discarding patients with missing data, while also reducing the collection bias of the study. This was particularly important for increasing the possibility of detecting or ruling out predictors, especially with as many as 22 independent variables in the predictor evaluation. For the same reasons, we used imputed data for the main analysis of improvement rates and available cases for a sensitivity analysis. The imputed improvement rates were somewhat higher, more internally consistent, and more consistent with previous research [17, 19,20,21]. This is interpreted as a tendency among improved patients to be less prone to participate in follow-ups and to complete questionnaires. With multiple imputation, however, this group is still accounted for.

Conclusion

EMS and NDI improvement rates in our cohort were approximately 40% at 2- and 5-year follow-ups independent of surgical method. Worse baseline EMS scores were associated with worse EMS outcomes, and worse baseline NDI scores and older age were associated with worse NDI outcomes. LAM + F was associated with more surgeon-reported complications, and more operated levels were associated with more patient-reported complications and a shorter reoperation-free interval. Male sex and older age were independent predictors of mortality. These findings suggest that surgical intervention at an earlier stage might be beneficial and that less invasive procedures might be preferable in this frail patient population.