Introduction

Cervical radiculopathy is caused by encroachment on a cervical nerve root, usually by bone or disc herniation. Typical symptoms are neck pain radiating into the arm, with possible loss of motor function and/or sensory loss [1]. The most common surgical treatment is discectomy and fusion [2], either as stand-alone implant surgery or with the addition of anterior plating [3]. Anterior cervical discectomy is one of the most frequently performed spinal procedures; in the US, almost 550,000 patients were operated on between 2005 and 2008 [4]. Concern that fusion may cause adjacent segment disease [5] has given rise to motion-preserving implants (arthroplasty), and in the US, cervical arthroplasty surgery increased by 708% between 2005 and 2008 [4].

Multiple trials [6–14] and three recent meta-analyses [15–17] have compared the results of arthroplasty versus fusion, and most authors concluded that clinical outcome favored arthroplasty [6–12, 15–17]. However, few trials blinded patients [15], and where patients were blinded, blinding lasted only until just after the surgical procedure was completed [7, 11]. Only one study blinded the surgical team [14]. So far, no studies have demonstrated clinical outcome in favor of fusion.

The aim of the Norwegian Cervical Arthroplasty Trial (NORCAT) was to assess 2-year clinical outcome in patients operated for single-level cervical radiculopathy with either arthroplasty or fusion.

Methods

Study design

Patients with single-level radiculopathy were included from November 2008 to January 2013 at five neurosurgical departments in Norway and randomized to either arthroplasty or fusion. Randomization was stratified by center and performed in blocks using the Unit of Applied Clinical Research website (http://www.ntnu.edu/dmf/akf/randomisering) to ensure balanced groups. The study was designed to include 146 patients. Follow-up visits were scheduled at 3 months, 1 year, and 2 years; at 6 months, patients completed the questionnaires by mail. Patients were blinded to the treatment until the last follow-up was completed.

The NORCAT received a grant from DePuy Synthes Spine (325 Paramount Drive, Raynham, MA 02767). However, the sponsor was not involved in the study design, the conduct of the trial, or the writing or reviewing of the manuscript. The grant was unrestricted, and the sponsor had no right of refusal for publication of the data. The sponsor read the manuscript before submission.

Participants

Inclusion criteria were: age between 25 and 60 years, clinical C6 or C7 radiculopathy with corresponding radiological findings, Neck Disability Index (NDI) [18] ≥30%, no response to non-operative treatment, and no clinical improvement during the six weeks prior to surgery. Exclusion criteria were: significant spondylosis involving more than one level, adjacent level ankylosis, intramedullary changes on magnetic resonance imaging (MRI), and myelopathy. The complete list of inclusion and exclusion criteria is available in Table S1 in the supplementary appendix.

Study interventions

Discectomy was performed via an anterolateral approach. The surgical team was blinded to the result of randomization until nerve root decompression was completed. Both the arthroplasty [DISCOVER® prosthesis (DePuy Spine Inc., 325 Paramount Dr, Raynham, MA 02767, USA)] and the fusion [CERVIOS® cage (Synthes GmbH, Eimattstrasse 3, 4436 Oberdorf, Switzerland)] implant systems were available in the operating theater.

Arthroplasty

The DISCOVER® prosthesis allows for unconstrained motion. It consists of two titanium plates fixed to the vertebral endplates, with a polyethylene inlay. Fluoroscopy was used to ensure that the prosthesis was placed in the midline and sufficiently close to the posterior edge of the vertebra. The appropriate implant size was determined with templates.

Fusion

The CERVIOS® cage was used to achieve anterior cervical interbody fusion. The cage was preloaded with chronOS and the procedure was performed as stand-alone surgery.

Outcome measures

Primary outcome

The NDI is a self-rated questionnaire developed for patients with neck disability. It comprises ten items: seven related to activities of daily living (personal care, lifting, reading, work/daily activities, driving, sleep, and recreation), two to pain (pain, headache), and one to concentration. Each item is rated from 0 to 5, giving a summary score from 0 to 50. We expressed the score as a percentage (0–100%), with lower scores indicating less severe symptoms. We used the validated Norwegian version [19].
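A minimal sketch of the scoring rule described above: the ten item scores are summed and the 0–50 total is rescaled to a percentage. The function name `ndi_percent` is hypothetical, and rejecting incomplete questionnaires is an assumption, since handling of missing items is not described here.

```python
def ndi_percent(items):
    """Convert ten NDI item scores (each 0-5) into a percentage score.

    Assumption: all ten items must be answered; missing-item handling
    is not specified, so incomplete input is rejected.
    """
    if len(items) != 10:
        raise ValueError("NDI requires exactly ten item scores")
    if any(not 0 <= score <= 5 for score in items):
        raise ValueError("each NDI item is scored from 0 to 5")
    raw = sum(items)          # summary score, 0-50
    return raw * 100 / 50     # percentage, 0-100; lower = less severe


# Hypothetical patient with moderate symptoms on most items
print(ndi_percent([3, 2, 2, 3, 2, 3, 2, 3, 1, 2]))  # 46.0
```

Multiplying the raw score by two gives the same percentage; the explicit `* 100 / 50` simply mirrors the definition.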

Secondary outcomes

Secondary outcome measures were the Numeric Rating Scale (NRS 11) [20], the Short Form 36 (SF-36) [21], and the EuroQol-5 Dimension-3 Level (EQ-5D-3L) [22]. In addition, we recorded data on the surgical procedure (duration of surgery, duration of anesthesia, and total blood loss), perioperative major complications (dural tear, or injury to the recurrent laryngeal nerve, the index-level nerve root, esophagus, trachea, or a large vessel), the Dysphagia Short Questionnaire [23], reoperations at the index level within 2 years, and work status.

NRS 11 is a one-dimensional pain scale from 0 (“no pain at all”) to 10 (“worst imaginable pain”), used to evaluate arm and neck pain.

SF-36 is a generic questionnaire measuring health-related quality of life along eight dimensions (physical function, role limitations due to physical problems, bodily pain, general health, vitality, social function, role limitations due to emotional problems, and mental health) with two summary scores (physical component summary [PCS], mental component summary [MCS]). The score ranges from 0 to 100, with higher scores relating to better health. We used the validated Norwegian (chronic) version 2.0 [24].

The EQ-5D-3L is a generic quality of life questionnaire with five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), yielding an index that ranges from −0.59 to 1; higher scores indicate better health status. We used the validated Norwegian version [25] and syntax files obtained from the EuroQol Group, applying the UK time trade-off (TTO) tariff to calculate the utility index [26].
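To illustrate how an index between −0.59 and 1 arises from the UK TTO tariff, the sketch below applies the additive decrements commonly published for the UK (MVH, Dolan 1997) value set to a five-digit EQ-5D-3L health profile. It is not the EuroQol syntax file used in the trial; the coefficients should be checked against the official value set before any real use, and the function name is hypothetical.

```python
# Decrements as commonly published for the UK (MVH) TTO value set
# (Dolan 1997). Assumption: verify against the official EuroQol value set.
CONSTANT = 0.081  # subtracted if any dimension is worse than level 1
N3 = 0.269        # subtracted if any dimension is at level 3
DECREMENTS = {
    # dimension: (decrement at level 2, decrement at level 3)
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}


def eq5d3l_uk_index(profile):
    """Utility index for a five-digit EQ-5D-3L profile such as '21232'."""
    levels = [int(ch) for ch in profile]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("profile must be five digits, each 1, 2 or 3")
    if all(level == 1 for level in levels):
        return 1.0  # state 11111 (no problems) is valued at 1
    utility = 1.0 - CONSTANT
    # Dimensions are matched to digits in the dict's insertion order
    for (lvl2, lvl3), level in zip(DECREMENTS.values(), levels):
        if level == 2:
            utility -= lvl2
        elif level == 3:
            utility -= lvl3
    if 3 in levels:
        utility -= N3
    return round(utility, 3)


print(eq5d3l_uk_index("21232"))  # 0.088
print(eq5d3l_uk_index("33333"))  # -0.594, the lower end of the range
```

The worst state (33333) reproduces the lower bound of −0.59 quoted above, which is a quick sanity check on the coefficients.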

The Dysphagia Short Questionnaire consists of five items (ability to swallow, incorrect swallowing, globus sensation, involuntary weight loss, and pneumonia), with scores ranging from 0 to 18. Lower scores represent milder symptoms.

Statistical analyses

The trial was planned to have 80% power to detect a difference of 10 points (on the 0–100 scale) in NDI score, considered to be the minimal clinically important change [27, 28]. With a significance level of 0.05 and a standard deviation of 18, 104 participants were required. Allowing for 40% loss to follow-up gave a total of 146 participants. A P value of <0.05 was considered statistically significant. PASW (Predictive Analytics SoftWare) Statistics 18 (IBM Corporation, Armonk, New York, USA) was used for all analyses.
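The sample size can be checked with the normal-approximation formula for comparing two means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group. The exact formula used for the trial is not stated; the sketch below assumes this formula plus a small-sample correction of z₁₋α/₂²/4 per group (Guenther's approximation to the t test), and reads "correcting for 40%" as inflating the total by 40% (104 × 1.4), since dividing by 0.6 would give 174 rather than 146. Under these assumptions it reproduces 52 per group, 104 in total, and 146 after the dropout allowance. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, small_sample_correction=True):
    """Patients per group for a two-sided comparison of two means.

    Normal-approximation formula; the optional z**2 / 4 term is a
    small-sample correction that approximates the t-test result.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    if small_sample_correction:
        n += z_alpha ** 2 / 4
    return math.ceil(n)


per_group = n_per_group(delta=10, sd=18)   # NDI difference 10, SD 18
total = 2 * per_group
# Assumption: "correcting for 40% lost to follow-up" read as total * 1.4
with_dropout = math.ceil(total * 1.4)
print(per_group, total, with_dropout)       # 52 104 146
```

Without the correction term the formula gives 51 per group (102 in total), so the reported 104 is consistent with a t-based calculation.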

Outcomes were analyzed according to the intention-to-treat principle. Continuous data are described as means and standard deviations (SD), or medians and interquartile ranges (IQR), as appropriate, and were compared between groups with the independent t test or the Mann–Whitney U test, depending on the distribution of the data. Ninety-five percent confidence intervals (CI) for the outcome measures are given in the figures and tables. Categorical data are described as numbers and percentages of patients and were compared with the χ² test or Fisher's exact test, as appropriate.

To assess change in outcome from baseline to each follow-up time-point, paired samples t tests were used for parametric data, and Wilcoxon signed rank tests for non-parametric data.

The repeated measurements after intervention were analyzed using linear mixed models with a random intercept, adjusted for baseline score. Follow-up time-point, treatment modality, and baseline score were included as fixed main effects, together with interaction terms between follow-up time-point and treatment modality. The mean differences between treatment modalities, with 95% CI, at each follow-up time-point were estimated using linear combinations of the estimated coefficients. The linear mixed model analysis was not described in the original study protocol but was applied because of a difference in NDI scores between the treatment modalities at baseline. A sensitivity analysis including seven patients who were randomized but excluded from the trial was performed according to the intention-to-treat principle, imputing extreme values (best possible score) for all outcome measures.
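Written out, one plausible form of the model described above is given below. The parameterization is an assumption for illustration: 3 months is taken as the reference time-point (t = 1, with t = 2, 3, 4 for 6 months, 1 year, and 2 years), T_{ijt} = 1 if observation j of patient i is at time-point t, A_i = 1 for arthroplasty and 0 for fusion, Y_{i0} is the baseline score, and u_i is the patient-specific random intercept. The between-group difference at time-point t is then the linear combination Δ_t; for the NDI, a positive Δ_t would favor fusion.

```latex
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 Y_{i0}
        + \sum_{t=2}^{4} \gamma_t T_{ijt}
        + \delta A_i
        + \sum_{t=2}^{4} \theta_t T_{ijt} A_i
        + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\;
       \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\Delta_1 &= \delta, \qquad \Delta_t = \delta + \theta_t \quad (t = 2, 3, 4)
\end{aligned}
```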

Possible differences between the five neurosurgical departments were also evaluated, but neither the statistical assessment nor the trial design indicated that a center effect needed to be taken into account in the final statistical analysis.

Ethical considerations

The trial was approved by the Regional Committee for Medical and Health Research Ethics in Central Norway and the Data Protection Official for Research. All enrolled patients gave written informed consent. Participating senior surgeons at each hospital performed all operations. All authors vouch for the adherence of the study to the protocol and unanimously agreed to submit the final manuscript for publication.

Trial registration: http://www.clinicaltrials.gov, NCT00735176.

Results

Patient characteristics

During the study period, 3922 patients attended specialist outpatient clinics for cervical radiculopathy at the study sites, of whom 143 were eligible for inclusion. Seven patients were excluded, leaving 136 patients for inclusion, 68 in each group (Fig. 1). Of these, 120 attended the 2-year follow-up and returned the questionnaires, giving a dropout rate of 11.8%. The groups were well matched with respect to demographic and clinical characteristics at baseline (Table 1).

Fig. 1

Eligibility, randomization, and follow-up of the patients. (a) One patient withdrew consent before surgery, and one patient's MRI was too old and a new preoperative MRI could not be obtained because of claustrophobia. One patient had a short neck, which made fluoroscopic visualization of the relevant level (C6/C7) impossible, and for another, the prostheses were not available in the operating theater at the time of surgery because of a misunderstanding. In the last patient, the surgeons had to convert to fusion with anterior plate fixation because of instability. (b) Short necks made fluoroscopic visualization of the relevant level (C6/C7) impossible. (c) One patient did not receive the allocated intervention because of problems with positioning of the arthroplasty device, resulting in conversion to fusion. (d) Four patients in each group did not attend the follow-up and did not return the questionnaires. (e) Eleven patients in the arthroplasty group and 14 patients in the fusion group did not return the questionnaires. (f) Five patients in the arthroplasty group and six patients in the fusion group did not attend the follow-up and did not return the questionnaires. (g) Two patients did not attend the follow-up and did not return the questionnaires, one patient attended the follow-up without returning the questionnaires, and one patient had undergone brain surgery that resulted in postoperative problems with handwriting. (h) One patient did not attend the follow-up and did not return the questionnaires.

Table 1 Baseline characteristics of the study participants operated for single-level cervical radiculopathy with either arthroplasty or fusion

Primary and secondary outcomes

Comparison of the change from baseline to each follow-up time-point revealed no significant difference in NDI score between the two treatment modalities after 2 years, P = 0.25 (Fig. 2; Table 2), and no differences in secondary outcome measures [Table 3; Fig. 3 (supplementary appendix)].

Fig. 2

Plot of the primary outcome measure, the Neck Disability Index (NDI), from baseline to 2-year follow-up. The NDI is composed of 10 items, each scored from 0 to 5, and is expressed as a percentage, with higher scores indicating more severe symptoms. The figure shows the observed improvement in NDI score from baseline to 3-month, 6-month, 1-year and 2-year follow-up for both treatment modalities (intention-to-treat population), without adjustment for baseline differences, dropouts, or missing data.

Table 2 Comparison of the Neck Disability Index (a) between the arthroplasty and fusion groups at each follow-up
Table 3 Comparison of secondary outcome measures between the arthroplasty and fusion groups at each follow-up

Both procedures demonstrated a statistically significant improvement in NDI score from baseline to 3 months, 6 months, 1 year, and 2 years, P < 0.001 (Fig. 2). For arthroplasty, the mean NDI score decreased from 45.7 (95% CI 42.9–48.6) at baseline to 25.0 (95% CI 20.1–29.9) after 2 years, P < 0.001, and for fusion from 51.2 (95% CI 48.0–54.4) at baseline to 21.2 (95% CI 16.7–25.6) after 2 years, P < 0.001. The improvement from baseline to each follow-up time-point was also statistically significant for all secondary outcomes, P < 0.001 (Fig. 3; supplementary appendix).

From 3 months onward, there was no significant change in NDI score for either arthroplasty or fusion (P = 0.20). The proportion of patients reaching the minimal clinically important change (an improvement of 10 or more in NDI score from baseline) was 70.0% (n = 42) for arthroplasty and 78.3% (n = 47) for fusion, P = 0.30.
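As a check on the reported P = 0.30, a Pearson χ² test without continuity correction can be applied to the 2 × 2 table of responders. The denominators (60 per group) are inferred from the reported counts and percentages (42/60 = 70.0%, 47/60 = 78.3%), the choice of test without continuity correction is an assumption, and the function name is hypothetical. For 1 degree of freedom, the upper-tail probability equals erfc(√(χ²/2)), so only the standard library is needed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                 reached MCIC   did not
        group 1       a            b
        group 2       c            d
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
    return stat, p


# Arthroplasty 42 of 60 vs fusion 47 of 60 (denominators inferred)
stat, p = chi2_2x2(42, 18, 47, 13)
print(f"chi2 = {stat:.2f}, P = {p:.2f}")  # chi2 = 1.09, P = 0.30
```

With Yates' continuity correction the same table would give a larger P (about 0.40), so the reported value matches the uncorrected test.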

Statistical analysis using linear mixed models for repeated measurements that correct for baseline differences, dropouts and missing data demonstrated a mean difference in NDI score of 5.9% in favor of fusion after 2 years, P = 0.049 (Table 2). Figure 4 in the supplementary appendix shows a plot for NDI where the two groups are compared based on estimated results from the statistical model. For the secondary outcomes, there was a mean difference in NRS arm pain of 1.0 in favor of fusion after 2 years, P = 0.03 (Table 3).

The surgical procedure was significantly longer with arthroplasty, P < 0.001. There were no major complications in either group, and no difference in dysphagia score in the 2 years following treatment (Table S2, supplementary appendix).

By 2 years after surgery, one patient in the fusion group and eight in the arthroplasty group had undergone index level reoperation, P = 0.03 (Table S2, supplementary appendix). One patient in the fusion group and five in the arthroplasty group were reoperated because of index level uncovertebral restenosis; decompression was performed with a posterior foraminotomy, leaving the implants in place. Three patients in the arthroplasty group were reoperated because of migration and anterior displacement of the implant, with suspected instability and secondary neck pain. Revision surgeries were performed with removal of the prostheses, cage implantation and/or anterior plating.
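Because the expected number of reoperations per group is small (4.5), Fisher's exact test is the appropriate choice under the rule given in the Methods. A minimal sketch using only the Python standard library, assuming 68 patients per group as randomized (the analysis population for this outcome is an assumption), gives a two-sided P of about 0.033, consistent with the reported P = 0.03. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):
        # Probability that x of the col1 events fall in row 1
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))


# Reoperated vs not: arthroplasty 8 of 68, fusion 1 of 68
p = fisher_exact_two_sided(8, 60, 1, 67)
print(f"P = {p:.3f}")  # P = 0.033
```

An uncorrected χ² test on the same table would give P of about 0.016, which illustrates why the exact test matters at these counts.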

The median duration of sick leave after surgery was 10 weeks (IQR 6–27) with arthroplasty, and 12 weeks with fusion (IQR 6–30), P = 0.17. After 2 years, 59.7% in the arthroplasty group, and 71.7% in the fusion group had resumed work, P = 0.16.

A sensitivity analysis including the seven patients who were randomized but excluded from the trial was performed according to the intention-to-treat principle, imputing extreme values (best possible scores) for all outcomes. Independent t tests revealed no differences between the groups up to 2 years.

Discussion

We found excellent clinical results for both treatment modalities at 3 months, which were sustained at 2 years. In the unadjusted analyses, there was no significant difference between arthroplasty and fusion at any of the follow-up time-points. However, linear mixed model analyses adjusting for baseline values, dropouts, and missing data showed differences in self-rated neck disability and in NRS arm pain in favor of fusion after 2 years.

These findings are not consistent with most randomized controlled trials [6–12], the recent registry-based study by Staub and colleagues [29], or three recent meta-analyses [15–17], which reported clinical outcome in favor of arthroplasty.

The between-group difference in NDI score of 5.9% shown in the present study is small, and its statistical significance is weak (P = 0.049); the results must therefore be interpreted with caution. One might argue that the difference should not be considered clinically important, but there is no clear consensus on how large a between-group difference should be to be clinically important [30, 31]. In the fusion group, 78.3% of patients reported an NDI improvement of 10 or more from baseline to 2-year follow-up, compared with 70.0% in the arthroplasty group. Even though this difference was not statistically significant, its direction did not favor arthroplasty. There may be several reasons for the discrepancy with previous studies, such as differences in implant design, study methods, fusion technique, and length of follow-up, and the impact of funding by arthroplasty manufacturers.

Different arthroplasty designs have shown different biomechanical performance in the treatment of single-level cervical disc disease [32]. Arthroplasty devices are considered constrained in a given plane if they restrict motion to less than the physiological range. Most designs are, however, either “semiconstrained”, allowing physiological movement, or “nonconstrained”, with no mechanical stop, so that extremes of motion are limited by the perispinal soft tissue and the inherent compression across the disc space [33]. In this respect, the nonconstrained device used in the present trial is comparable with the Bryan device (Medtronic Spine and Biologics) [7, 8, 10] and the Porous Coated Motion (PCM) device (NuVasive Inc., San Diego, CA, USA) [11]. The Prestige ST (Medtronic Sofamor Danek) [6, 9] differs from the present study implant in its semiconstrained design and in its implantation technique, in which the device is fixed with screws to the vertebrae cranial and caudal to the disc space. In addition to differing degrees of constraint, implants may also differ in the design of their articulating surfaces. The ball-and-socket design of the device used in the present trial has a different impact on range of motion (ROM) than the Bryan and PCM devices, and adjacent level intradiscal pressure has been shown to differ according to implant design [32].

The study methods of the present trial also differ from those of the previously mentioned studies, of which only two describe blinding of patients [7, 11]. Heller and colleagues [7] could not maintain patient blinding after completion of the surgical procedure because patients in the arthroplasty group were treated with non-steroidal anti-inflammatory drugs (NSAIDs) for two weeks after surgery. Phillips and colleagues [11] blinded patients only until the surgical procedure was completed. Blinding of the surgical team until after decompression of the nerve root has rarely been included in previous study designs, but was applied in the study by Skeppholm and colleagues [14], consistent with the methods of the present study. Strict study methods are probably important to avoid expectation bias in both patients and surgeons, and may have contributed to the discrepancy with previous trials.

Another aspect that may influence outcome is the fusion technique. The stand-alone polyetheretherketone (PEEK) cage used in the NORCAT differs from most other comparable trials, in which allograft and anterior plating were most commonly used [6–11]. The reported 2-year fusion rates of the two techniques are, however, similar: 97.5% [6], 94.3% [7], and 92.1% [11] for allograft with plating, and 92% [34] for stand-alone PEEK cages. Nemoto and colleagues [34] recently compared clinical outcome and postoperative dysphagia after stand-alone cage implantation versus cage with anterior plating in single-level cervical disc disease and found no difference between the two surgical methods.

The length of follow-up may also affect clinical outcome, and a longer observation period after surgery is often requested. Time is naturally highly relevant to the development of adjacent level disease [35]. However, the present results show little change in clinical outcome from 3 months up to 2 years after surgery. A longer follow-up would probably have little effect on the clinical outcome attributable to the completed surgery, as recently demonstrated by Gornet et al. [36].

Arthroplasty manufacturers often sponsor large randomized controlled trials, as was the case in the present study. Their potential influence on outcome should probably be included in the overall discussion of discrepant results between studies, and was recently examined by Alvin and colleagues [37]. They assessed whether trials funded by arthroplasty manufacturers were more likely to report results in favor of arthroplasty, and found lower complication rates when a conflict of interest was reported, but no impact on health-related quality of life outcomes.

It is difficult to identify the specific factors that explain the discrepancy in clinical outcome between the present study and most previous comparable trials. The explanation may be a combination of physiological and design differences between the implants and differences in study design, as discussed above.

The expected clinical outcome is important in surgical decision-making for individual patients, and differences between surgical techniques are also key factors to consider. In the present trial, the duration of surgery was significantly longer with arthroplasty, consistent with a recently published meta-analysis [15]. Even though experienced spinal surgeons operated on the patients, all surgeons were more familiar with the fusion procedure, as it was the standard treatment in the participating departments. Level of experience is therefore one possible explanation for the difference in surgery duration; another is that implantation of this specific arthroplasty device is technically more demanding and time-consuming. There were no severe complications in the present study, but in contrast to previous trials reporting more secondary surgeries after fusion [6, 8, 9], the index level reoperation rate was higher after arthroplasty. This difference could be explained by suboptimal implantation technique or incorrect size of the arthroplasty device. However, all reoperated patients had their primary surgery at a time when all surgeons had good experience with this arthroplasty device. In a recent study using the same implant [38], instability and accompanying neck pain after arthroplasty were found in 8% of patients, all of whom underwent revision surgery.

Consistent with previous reports [6, 7], patients in the arthroplasty group returned to work a median of two weeks earlier than patients in the fusion group, but there was no difference in employment status at 2-year follow-up. A previous study concluded that the duration of preoperative sick leave influenced postoperative return to work [39]. In the present trial, preoperative sick leave was 3 weeks shorter in the arthroplasty group, but the difference was not significant.

Ament and colleagues recently assessed the cost-effectiveness of 2-level arthroplasty versus fusion at 2- and 5-year follow-up. Arthroplasty was more expensive than fusion but yielded more quality-adjusted life years, suggesting that it is a highly cost-effective treatment option [40, 41]. Consistent with these results, Zou and colleagues recently presented a meta-analysis of clinical outcome after two-contiguous-level cervical disc surgery and concluded that arthroplasty was equivalent, and in some respects significantly superior, to fusion [42]. Considering the results of the present trial, the growing interest among physicians in arthroplasty as an alternative to fusion, and the high number of surgical procedures performed each year [43], future studies should address both clinical outcome and cost-effectiveness.

The role of adjacent level disease was not addressed in the present study since clinical outcome was the only focus of this report. The impact of adjacent level disease will be presented in a forthcoming paper including the NORCAT 5-year follow-up data. Regarding maintenance of mobility, which is the main goal of choosing arthroplasty over fusion, the authors of the present study have recently shown that high-grade heterotopic ossification around the Discover arthroplasty device was found in 62% after 2 years [44].

Limitations

Our study may be criticized for a short follow-up period. However, the present study shows little change in clinical outcome from 3 months up to 2 years. Similar results at even longer follow-up were recently presented by Staub and colleagues, who reported a stable postoperative course of patient-reported outcomes between 2 and 5 years after both arthroplasty and fusion based on registry data [29]. Their results also support the external validity of randomized controlled trials comparing cervical arthroplasty and fusion, in which a large proportion of patients often do not meet the inclusion criteria, as was the case in the present trial.

Although patients with severe spondylosis should not have been included in the NORCAT, the inclusion/exclusion criteria could have specified the degree of spondylosis more precisely using radiographic parameters. Therefore, we cannot exclude the possibility that some patients not meeting the criteria for arthroplasty were included, which in turn could have biased the study in favor of the fusion group.

Conclusion

There was a high level of success for both treatment modalities at 2 years. Arthroplasty was not superior to fusion regarding clinical outcome. The rate of index level reoperations was higher, and the duration of the surgical procedure longer, with arthroplasty. Further studies assessing clinical outcome and cost-effectiveness are needed.