Introduction

Degenerative cervical myelopathy is a leading cause of spinal cord dysfunction in adults worldwide. Previous research has estimated that degenerative spine disease accounts for 59% of non-traumatic spinal cord injury in Japan, 54% in the United States, and 31% in Europe. The annual incidence in North America and Europe during 1966–2011 was 76 and 26 per million, respectively [1]. The spinal cord compression is typically caused by disc pathology, osteophyte formation, and hypertrophy of the ligamentum flavum [2]. The condition is characterized by motor deficits, paresthesia, gait disturbance, spasticity, and incontinence [3]. Decompressive surgery halts disease progression and improves functional outcome [4]. The decompression of the spinal cord can be achieved with both anterior and posterior approaches, and several surgical techniques have been proposed, including discectomy with fusion, corpectomy with fusion, laminectomy alone, laminectomy with fusion, and laminoplasty [3]. Anterior decompression has traditionally been recommended in patients with a kyphotic spine and fewer than three compressed levels [5], whereas posterior decompression is preferred in patients with cervical lordosis and three or more compressed levels [6]. In 2019, the World Federation of Neurosurgical Societies Spine Committee modified these recommendations, suggesting the use of posterior approaches for patients with 1- or 2-level posterior compression and patients with a flexible kyphosis [7]. However, when choosing the posterior approach, it remains controversial whether fusion as an adjunct to laminectomy is necessary. Following reports of post-laminectomy kyphosis in the 1970s and 1980s [8], conventional thinking has favored including a prophylactic posterior-lateral instrumented fusion when there are signs of ‘instability’ [9]; however, there is no widely agreed-upon definition of ‘instability’ in the degenerated cervical spine. Hence, the decision to fuse is mainly based on surgeon preference [10].

In the lumbar spine, fusion surgery has been associated with increased blood loss, operation times, resource use, risk of complications, and 30-day mortality, when compared with laminectomy alone [11, 12]. In a retrospective study comparing cervical laminectomy with and without instrumented fusion, there were no differences in clinical outcomes, whereas blood loss and operation times were increased in the fusion group. Instrumented fusions also entail the risks of screw misplacement, pseudoarthrosis, distal junction kyphosis, and adjacent segment pathology [7, 13, 14]. With muscle-preserving laminectomy techniques that can maintain sagittal balance by retaining the facet integrity and the extensor musculature [15], there is reason to explore the additional value of instrumented fusion in the cervical spine.

The aim of this study was to use the prospective Swedish Spine Register (Swespine) to compare clinical outcomes after 5 years for patients with degenerative cervical myelopathy treated with either laminectomy alone or laminectomy plus fusion.

Methods

This study was approved by the Swedish Ethics Review Board and is reported according to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) Statement.

The Swedish health care system and Swespine

Sweden has a socialized health care system and a stable population. There is close correlation between site of residence and delivery of treatment in that site’s regional spine unit, which reduces referral bias. The patients in this study were treated at the country’s 18 major spine units, and the two treatments were evenly distributed across geographical regions and over the 13 years of observation.

Swespine has registered cervical spine surgeries since 2006 and reports 80% of all spine surgeries in the country. The patients complete baseline questionnaires and validated patient-reported outcome measures (PROMs) at 1, 2, 5, and 10 years postoperatively, without any assistance from the surgeon [16,17,18]. The register is governed by the Swedish Society of Spinal Surgeons (www.4s.nu) with public financial support. All participants provide oral and written consent.

Study design

In this cohort study, participants were eligible if they were (1) 18 years or older, (2) diagnosed with cervical spinal stenosis with at least 1 clinical sign of myelopathy, and (3) treated for degenerative cervical myelopathy with either laminectomy alone or laminectomy with posterior-lateral instrumented fusion between the start of registration in January 2006 and the acquisition of data in March 2019. In order to restrict confounders, all patients with previous cervical surgery or stated comorbidities in the register, including traumatic spinal injury, spinal infection, rheumatoid arthritis, ankylosing spondylitis, neoplastic disease, severe cardiac disease, severe neurological disease, or unspecified conditions causing either pain or gait disturbance, were excluded. The treatment groups were compared after 5 years using propensity score matching to adjust for clinicodemographic and radiographic confounders, along with a cost–benefit analysis. Differences in complication, reoperation, and mortality rates were also assessed.

Data collection and outcome measures

Preoperative data included sex, age, body mass index (BMI), smoking, baseline PROMs, and comorbidities. The surgeon recorded diagnosis, neurological impairment, surgical details, and perioperative complications, including death, deep venous thrombosis, pulmonary embolism, urinary tract infection, urinary retention, postoperative hematoma, deep wound infection, nerve root injury, spinal cord injury, dural tear, vascular injury, esophageal injury, vocal cord paralysis, Horner syndrome, and implant malposition. The surgeon also recorded early reoperations, i.e., reoperations performed during the initial hospital stay, and late reoperations, including reoperations both on the index level and on adjacent levels. Follow-up PROMs were completed by patients after 1, 2, and 5 years postoperatively. At the 1-year follow-up, patients were able to independently report postoperative complications, including deep venous thrombosis and pulmonary embolism, hoarseness and dysphagia that lasted more than 1 month after surgery, and superficial wound infection.

The preoperative magnetic resonance imaging (MRI) was reviewed on T2-weighted midsagittal images for number of compressed levels, spondylolisthesis, and kyphosis in the subaxial cervical spine. The radiological reviewers were blinded to the treatment. Spondylolisthesis was defined by an MRI cut-off value of at least 2.0 mm slippage between adjacent vertebral bodies [19,20,21,22]. Cervical alignment was quantified using the modified K-line (the line connecting the midpoints of the spinal cord at the level of the inferior endplates of C2 and C7 on the midsagittal image) and the minimum interval (INT), measured between the tip of local kyphosis and the modified K-line [23]. A modified K-line INT less than 4.0 mm was defined as kyphosis [21].

The primary outcome measure was the European Myelopathy Score (EMS), a disability scale similar to the gold standard modified Japanese Orthopedic Association (mJOA) score. Both scales measure severity of myelopathy and are essentially equal, except that the EMS is self-administered by the patient and includes a pain assessment. The EMS is based on 5 items: gait, hand function, proprioception, paresthesia, and bladder function. The scale ranges from 5 to 18, with lower scores reflecting more severe myelopathy (normal: 17–18, mild: 13–16, moderate: 9–12, severe: 5–8) [24]. The EMS has a sensitivity to change, i.e., mean of (preoperative score – postoperative score) / median of all scores, of 0.18, which is equivalent to the mJOA score [25, 26].
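The grading and the sensitivity-to-change ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the severity cut-offs are those quoted in the text, and the ratio follows the wording above literally (preoperative minus postoperative), so its sign depends on that convention.

```python
from statistics import mean, median


def ems_severity(score: int) -> str:
    """Map an EMS total (5-18) to the severity grade quoted in the text."""
    if not 5 <= score <= 18:
        raise ValueError("EMS total must lie between 5 and 18")
    if score >= 17:
        return "normal"
    if score >= 13:
        return "mild"
    if score >= 9:
        return "moderate"
    return "severe"


def sensitivity_to_change(pre, post):
    """Mean of (pre - post) divided by the median of all scores.

    Follows the wording in the text; the sign convention is an assumption.
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(pre, post)]
    return mean(diffs) / median(list(pre) + list(post))
```

With this convention, a patient whose EMS rises after surgery contributes a negative difference, so a published positive value would correspond to the opposite subtraction order or to an absolute value.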

Secondary outcome measures were the Neck Disability Index (NDI) (range 0–100%, higher scores indicate more severe disability) [27], the European Quality of Life-5 Dimension Questionnaire (EQ-5D) (range −0.5 to +1, higher scores reflect better quality of life), the European Quality of Life-Visual Analogue Scale (EQ-VAS) (range 0–100, higher scores reflect better overall health status) [17], and the Visual Analogue Scale (VAS) for neck and arm pain (range 0–10, higher scores indicate more pain) [28].

The cause of death was obtained for all deceased patients from the Cause of Death Register of the National Board of Health and Welfare.

Statistical analysis

Missing data were handled using multiple imputation, as implemented in the R package ‘mice’. All variables with missing values were imputed, except operated levels, death date, complications, and reoperations. For each missing value, 100 imputations were generated, rendering 100 imputed datasets. The proportion of missing values ranged from 6.0% for ‘hospitalization time’ to 32.5% for ‘preoperative EQ-VAS’.

Predictive mean matching was used for numerical variables, logistic regression for dichotomous variables, and ordinal regression for ordinal variables.

Using propensity score matching, patients were matched for the following covariates: sex, age, BMI, smoking, baseline EMS, baseline NDI score, and preoperative number of compressed levels (1–6 levels), spondylolisthesis, and kyphosis. The propensity scores were estimated using logistic regression models with ‘laminectomy with fusion’ as the exposure and the covariates as explanatory variables. Using the R package ‘Matching’, patients were matched 1-to-1 on imputed data with a caliper of 0.01 and the requirement that the matched patients’ surgeries had taken place within 365 days of each other. Covariate balance was assessed using the standardized mean difference [29], with a difference greater than 0.1 being the threshold for declaring imbalance [30].
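The two mechanics in this paragraph, caliper matching and the balance check, can be illustrated with a minimal Python sketch. It is not the algorithm of the R package ‘Matching’: it assumes the propensity scores have already been estimated, uses a simple greedy nearest-neighbour rule without replacement, applies the caliper on the raw propensity-score scale, and omits the surgery-date requirement. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return abs(mean(treated) - mean(control)) / pooled_sd


def match_one_to_one(treated_ps, control_ps, caliper=0.01):
    """Greedy 1-to-1 nearest-neighbour matching without replacement.

    Returns (treated_index, control_index) pairs whose propensity scores
    differ by at most `caliper`; unmatched patients are dropped.
    """
    available = set(range(len(control_ps)))
    pairs = []
    for t_idx, t_score in enumerate(treated_ps):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        best = min(available, key=lambda c: (abs(control_ps[c] - t_score), c))
        if abs(control_ps[best] - t_score) <= caliper:
            pairs.append((t_idx, best))
            available.remove(best)
    return pairs


def is_balanced(treated, control, threshold=0.1):
    """Apply the 0.1 imbalance threshold quoted in the text to one covariate."""
    return standardized_mean_difference(treated, control) <= threshold
```

After matching, `is_balanced` would be evaluated for each covariate within the matched sample; a tight caliper such as 0.01 trades sample size for closer pairs, which is why the number of matched pairs falls well below the size of either treatment group.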

Primary and secondary outcomes after 1, 2, and 5 years were presented graphically after propensity score matching, with error bars corresponding to a 95% confidence interval (CI). The 5-year outcomes were compared between the propensity score-matched groups with an analysis of covariance, adjusted for baseline values and the covariates included in the propensity score matching. A positive mean difference corresponded to a higher outcome value for the fusion group compared with the laminectomy-alone group. All these analyses were carried out separately for each imputed dataset, pooling the results using Rubin’s rules [31]. This also means that the number of propensity score-matched pairs differed somewhat across imputations. Statistical significance was set to p-value of 0.05 or less.
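Pooling with Rubin's rules, as referred to above, combines the per-dataset estimates and their standard errors into one estimate with a variance that reflects both within- and between-imputation uncertainty. The following Python sketch shows the standard formulas; the function name and the inputs are illustrative, not taken from the study's R code.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)   # average sampling variance
    between = variance(estimates)                 # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

In the setting described here, `estimates` would hold the 100 adjusted mean differences, one per imputed and matched dataset, and `std_errors` their standard errors.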

Complications and reoperations were compared using adjusted linear regression analyses. Early reoperations and late reoperations were analyzed separately. Kaplan–Meier plots and log-rank tests were used to compare postoperative mortality between the groups.
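For readers unfamiliar with the Kaplan–Meier estimator, a minimal Python sketch of the product-limit calculation is given below. It is a generic illustration and not the study's R code; the function name and the input layout (follow-up times with a flag for observed death versus censoring) are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : True if death was observed at that time, False if censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Censored patients leave the risk set without lowering the curve, which is what allows patients who have not yet reached a follow-up point to contribute to the estimate.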

All statistical analyses were performed in R, version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) [32].

Results

Baseline characteristics

Among 967 eligible patients, 155 (27%) of 567 patients were excluded in the laminectomy-alone group, and 95 (24%) of 400 patients were excluded in the fusion group. The grounds for exclusion were evenly distributed between the groups, with previous cervical surgery accounting for 133 (53%) of the 250 excluded patients: 76 (30%) patients in the laminectomy-alone group, and 57 (23%) patients in the fusion group. Seven-hundred-and-seventeen (74%) patients were included (Fig. 1). Laminectomy alone was performed on 412 patients (mean age 68 years; 149 women [36%]), with an average of 3.4 operated levels. Laminectomy with fusion was performed on 305 patients (mean age 68 years; 119 women [39%]), with an average of 4.4 operated levels.

Fig. 1

Flow diagram of patient inclusion. All available patients are collected at each time point; therefore, several patients in the baseline box (N = 717) are not shown in the follow-up boxes, as they have not yet reached the follow-up point and are thus not counted as lost to follow-up. Using imputation, the propensity score-matched patients represent 212 patients on average in each group, with recorded European Myelopathy Score data at the 2-year follow-up, at the 5-year follow-up, or at both the 2- and the 5-year follow-up

After imputation, sex ratio, age, BMI, and the number of compressed levels were comparable between the groups, with slightly worse smoking status, baseline EMS, and baseline NDI score in the fusion group. In the laminectomy-alone group, 40% of the patients had spondylolisthesis, and 40% were kyphotic, whereas in the fusion group, 59% had spondylolisthesis, and 46% were kyphotic. After propensity score matching, 212 patients remained in each group on average, with well-balanced covariates (Table 1).

Table 1 Baseline characteristics and propensity score-matched balance table

Preoperative MRIs were assessed for 490 (68%) patients. Among the 412 laminectomy-alone patients, 267 (65%) patients were MRI-assessed, and 72 (27%) had three or more compressed levels. Among the 305 fusion patients, 223 (73%) patients were MRI-assessed, and 65 (29%) had three or more compressed levels.

Outcomes

After imputation and propensity score matching, there were on average 212 patients with a 5-year follow-up in each group (Fig. 1). The EMS values were similar in both groups: from 12.1 (95% CI 11.0–13.3) at baseline to 12.7 (95% CI 11.6–13.9) at 5 years in the laminectomy-alone group, and from 12.0 (95% CI 10.8–13.2) at baseline to 12.1 (95% CI 10.7–13.4) at 5 years in the fusion group. The mean difference after 5 years was −0.6 (95% CI −2.2 to 1.0; p = 0.47). There were no clinically important differences between the groups in NDI, EQ-5D, EQ-VAS, VAS-neck, or VAS-arm scores (Table 2, Fig. 2).

Table 2 ANCOVA table after propensity score matching
Fig. 2

Comparison of primary and secondary outcome measures between laminectomy alone versus laminectomy with instrumented fusion, presented graphically after propensity score matching, with error bars corresponding to a 95% confidence interval. EMS European myelopathy score, NDI neck disability index, EQ-5D European quality of life-5 dimensions, EQ-VAS European quality of life-visual analogue scale, VAS visual analogue scale

Similarly, a sensitivity analysis without propensity score matching demonstrated no clinically important differences in primary or secondary outcomes after 5 years (Table 3).

Table 3 Sensitivity analysis. ANCOVA table before propensity score matching

Complications, reoperations, and mortality

There were 11 surgeon-reported complications per 100 index operations in the laminectomy-alone group and 24 per 100 index operations in the fusion group. The adjusted mean difference was 0.12 (95% CI 0.03–0.21; p = 0.009) before propensity score matching and 0.10 (95% CI −0.03 to 0.23; p = 0.14) afterward. Two surgery-related deaths were reported in the fusion group, none in the laminectomy-alone group.

After 1 postoperative year, patients themselves also reported postoperative complications. There was no strong evidence of a difference in the occurrence of deep venous thrombosis, pulmonary embolism, hoarseness, and dysphagia: odds ratio (OR) 1.24 (95% CI 0.38–4.03; p = 0.72), OR 1.54 (95% CI 0.47–5.05; p = 0.47), OR 1.86 (95% CI 1.00–3.49; p = 0.051), and OR 1.62 (95% CI 0.79–3.32; p = 0.19), respectively. More superficial wound infections were reported in the fusion group: OR 2.09 (95% CI 1.14–3.81; p = 0.017).

There were 5 early reoperations in the laminectomy-alone group and 6 in the fusion group. The adjusted mean differences were 0.00 (95% CI −0.02 to 0.02; p = 0.73) and −0.01 (95% CI −0.03 to 0.02; p = 0.50) before and after propensity score matching, respectively.

In the laminectomy-alone group, 8 patients underwent one late reoperation, and 1 patient underwent two late reoperations, whereas in the fusion group, 14 patients underwent one late reoperation, and 2 patients underwent three late reoperations. The adjusted mean differences before and after matching were 0.04 (95% CI −0.00 to 0.07; p = 0.062) and 0.04 (95% CI −0.02 to 0.09; p = 0.19), respectively.

For early and late reoperations combined, there were 4 reoperations per 100 index operations in the laminectomy-alone group, and 9 reoperations per 100 index operations in the fusion group. The adjusted mean differences before and after matching were 0.04 (95% CI −0.00 to 0.08; p = 0.070) and 0.03 (95% CI −0.03 to 0.09; p = 0.37), respectively.

For the propensity score-matched patients, the log-rank test revealed no statistically significant difference in mortality rate between the groups. The Kaplan–Meier plot suggested a slightly higher mortality rate during the first 3 postoperative years in the fusion group, but an equal cumulative survival from 6 years onward (Fig. 3). A sensitivity analysis without matching demonstrated similar results (Fig. 4). The cause-of-death categories were evenly distributed between the groups, except for neoplastic diseases, which were twice as common in the fusion group.

Fig. 3

Comparison of postoperative mortality for laminectomy alone versus laminectomy with instrumented fusion after propensity score matching, including the p-value for the log-rank test

Fig. 4

Sensitivity analysis. Comparison of postoperative mortality between laminectomy alone versus laminectomy with instrumented fusion before propensity score matching, including the p-value for the log-rank test

Cost estimation

After propensity score matching, the mean hospitalization time was 5.5 days in the fusion group and 4.2 days in the laminectomy-alone group (mean difference 1.28; 95% CI 0.42–2.14; p = 0.004). Based on the prolonged hospitalization and the implant-related costs for the average instrumented fusion operation of 4.4 levels, the mean cost increase in US dollars per instrumented patient was estimated as (1.3 days × care day cost) + (4.4 levels × [cost of bilateral screws + bilateral screwcaps] + [cost of bilateral rods]) = (1.3 × $1490) + (4.4 × [$480 + $106] + $182) = $4697 US. For the 212 matched pairs, this rendered an additional cost of $995,849 US in the fusion group. The care day cost is based on a standardized model for inpatient care costs for this patient category within the Swedish health care system. The implant costs are based on a standardized price template provided by Swedish representatives of the implant manufacturer.
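The arithmetic behind these figures can be reproduced with a short Python sketch. The unit costs and quantities are those stated in the text; the function and parameter names are illustrative only.

```python
def extra_cost_per_fusion_patient(
    extra_days=1.3,        # additional hospital days (rounded mean difference)
    care_day_cost=1490,    # US dollars per inpatient day
    levels=4.4,            # mean number of instrumented levels
    screws=480,            # bilateral screws, per level
    screwcaps=106,         # bilateral screwcaps, per level
    rods=182,              # bilateral rods, per operation
):
    """Mean additional cost in US dollars for one instrumented fusion."""
    hospital = extra_days * care_day_cost
    implants = levels * (screws + screwcaps) + rods
    return hospital + implants


per_patient = extra_cost_per_fusion_patient()
total_for_matched_pairs = 212 * per_patient

print(round(per_patient))              # 4697
print(round(total_for_matched_pairs))  # 995849
```

The total for the 212 matched pairs is computed from the unrounded per-patient cost of $4697.40, which is why it is slightly higher than 212 × $4697.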

Discussion

To our knowledge, this is the first study comparing laminectomy alone versus laminectomy with fusion for degenerative cervical myelopathy with a 5-year follow-up. With a nationwide prospective register that reports 80% of all spine surgeries in the country and an average of more than 3 operated levels in both groups, this study should reflect a national setting. We found no clinically important differences in patient-reported outcomes after 5 years, even when adjusting for important confounders with a well-balanced propensity score-matched analysis.

We found no major differences in complication or reoperation rates. During the earliest postoperative years, there was weak evidence of a higher mortality rate in the fusion group. This tendency may be coupled with the higher death rate due to unrelated neoplastic diseases observed in the fusion group, but it could also represent an increased vulnerability to more invasive surgery. It should also be noted that patients undergoing fusion generally had worse preoperative EMS and NDI scores, which may have impacted the results toward worse clinical outcomes in the fusion group, even when matched. However, no statistically significant differences were observed.

We observed an inclination to choose fusion in patients with smoking, spondylolisthesis, and worse preoperative EMS and NDI scores, which indicates that fusion surgery is preferred in more degenerated spines. We did not see an equally strong treatment preference regarding kyphosis, which might play a larger role in guiding the selection toward an anterior approach to achieve lordosis and/or anterior decompression. Previous research has demonstrated similar postoperative outcomes and complication rates for anterior and posterior decompression for degenerative cervical myelopathy [21]. However, a comparison between laminectomy alone and anterior decompression is outside the scope of this article. Surprisingly, the number of MRI-assessed compressed levels did not appear to significantly affect the choice toward fusion in this study, despite the average number of operated levels being more than 3 in both groups. When choosing to fuse, we observed that on average one more level was operated in the fusion group compared with the laminectomy-alone group, indicating that the decision to fuse is associated with more levels being operated.

Controversy exists concerning the use of prophylactic instrumented fusion as an adjunct to laminectomy for degenerated cervical spines. The notion of ‘instability’ has been the central argument for fusion to avoid pathological mobility that may lead to deformity or threaten the integrity of the spinal cord. Nonetheless, the effect on long-term clinical outcome remains uncertain in the absence of controlled studies [7]. Only recently have these methods been compared as the focus of a study, which reported a better functional 2-year outcome for 186 patients undergoing laminectomy with fusion compared with 22 patients undergoing laminectomy alone [33]. Notably, one-fifth of these patients had ossification of the posterior longitudinal ligament. A recent review concluded that laminectomy with fusion is an effective and safe treatment, suggesting that laminectomy alone should only be used for highly selected subgroups of patients with degenerative cervical myelopathy [34]. Our results contradict these studies, and we conclude that the extent to which prophylactic instrumented fusion should be adopted remains poorly understood in terms of long-term clinical outcome and safety.

Cost–benefit analysis

Based on the extended hospitalization and the implant-related costs, we estimated a cost increase of approximately $4700 US for each matched patient treated with instrumented fusion, without any observed benefit regarding long-term clinical outcomes, complications, or reoperation rates. Although the absence of differences in clinical outcomes and adverse events could be attributed to an adequate selection of surgical method by the surgeon, one could also argue that when matching for radiographic confounders, the main arguments for instrumented fusion are removed. With equal reoperation and complication rates in both treatment groups, we therefore conclude that fusion is more expensive during the first 5 postoperative years. Whether instrumented fusions are more cost-effective in longer observation periods in preventing post-laminectomy kyphosis, or if laminectomy alone is more cost-effective in avoiding pseudoarthrosis, distal junction kyphosis, and adjacent segment pathology, requires further research.

Limitations

The preoperative imaging was ordered by the operating surgeon, who subsequently decided the surgical method, introducing selection bias. Dynamic lateral radiographs of the cervical spine would have improved the assessment of spondylolisthesis. Unfortunately, they were not routinely performed preoperatively and therefore not included in the propensity analysis. However, the surgeon still takes MRI-assessed spondylolisthesis into account when deciding whether to fuse, making this measure a relevant confounder in this study setting.

One-third of the patients were lost to the 5-year follow-up, which decreases the validity of the findings. Furthermore, one-third of the preoperative MRIs could not be located. Multiple imputation can partly reduce the impact of missing data; however, complications and reoperations cannot be imputed. Unfortunately, the extent to which complications and reoperations are captured in the register is uncertain, and the relatively few complications reported in this study likely reflect an important collection bias. Since reoperations are reported less systematically than the index operation, a per-protocol analysis would have been unreliable. Our register-based intent-to-treat analysis is also problematic, since failed laminectomies with subsequent successful reoperations with other methods will still be analyzed as a laminectomy. Even so, considering the mixed results in previous research, it is reasonable to assume that no established method would skew the data in either direction.

One-fourth of the eligible patients were excluded. This number was higher than expected during the study design. Considering the propensities for worse preoperative EMS and NDI scores, as well as more degenerated cervical spines in the fusion group, one could have expected a higher exclusion rate in the fusion group. Surprisingly, more patients were excluded in the laminectomy-alone group. A partial explanation was that more patients had undergone previous cervical surgery in the laminectomy-alone group. Whether this reflects a limitation in the register-based data collection or more frequent reoperations with laminectomy alone in this study setting is unclear. The 58 (23%) excluded patients with unspecified conditions causing pain or gait disturbance create further uncertainty, since medical records beyond the register- and MRI-based data were not available for review. However, with predetermined exclusion criteria and evenly distributed grounds for exclusion between the groups, we believe that the selected population reflects a balanced study population in terms of collection bias.

In the study design, we chose to exclude all patients with stated comorbidities in the register to restrict confounders that we could not adjust for in the propensity score-matched analysis, while maintaining the generalizability that the more selective controlled clinical trials lack. Even so, a propensity score-matched analysis cannot account for unmeasurable confounders. The limitations of this study suggest that randomized controlled trials with stratification and subgroup analyses based on more precisely defined instability criteria might be necessary.

Even though the current study population is based on a national register with a high degree of reporting and evenly distributed data, there are still important concerns regarding the generalizability. The results are based on a Swedish health care setting, and the exclusion criteria proved to be rather selective. Also, propensity score matching, although it reduces selection bias, adds further exclusion criteria, rendering the relatively small number of 212 matched pairs. One could therefore argue that the results are not generalizable. On the other hand, the sensitivity analysis, which confirmed the results of the propensity score-matched analysis, introduces more selection bias but also increases the generalizability of the results. In summary, these results combined can thus be regarded as generalizable to a limited degree, but more importantly, warrant the future use of randomized register-based studies with less exclusion.

Finally, it is important to note that the current study design without random allocation cannot answer the question of whether fusion in well-selected cases contributed to the equal results, or whether fusion simply did not offer any additional detectable value. The use of muscle-preserving laminectomy techniques in comparative studies with randomization and long-term follow-up will likely shed light on this question in future research.

Conclusions

For patients with degenerative cervical myelopathy, surgical treatment with instrumented fusion as an adjunct to laminectomy was not associated with superior long-term clinical outcomes compared with laminectomy alone, even when adjusting for important confounders with a well-balanced propensity score-matched analysis. The cost–benefit analysis favored treatment with laminectomy alone. These findings are based on a national cohort and can thus be regarded as generalizable.