Introduction

Anterior cervical discectomy and fusion (ACDF) is a commonly used procedure to decompress cervical spinal nerves or the cervical spinal cord. The “discectomy” refers to the removal of the intervertebral disc, including the herniated part, to decompress the nervous tissue. The “fusion” refers to the additional surgical procedure to stabilize the two adjacent vertebrae, whose stability will, theoretically, be compromised after removal of the intervertebral disc. To provide surgical fusion, it is usual care to place an intervertebral device, such as a bone graft or a cage, between the vertebrae to replace the disc tissue and to maintain foraminal height. This procedure can be accompanied by anterior plating, which is assumed to add to the stability of the spine. Subsequent “bony fusion” is deemed to follow upon consolidation of the bone between the adjacent vertebrae through and along the intervertebral device. The intervention is considered to lead to solid arthrodesis and to carry minimal surgical risks [1,2,3]. Somewhat confusingly, “surgical fusion” and “bony fusion” are in general both referred to as “fusion”.

Autologous iliac bone grafts as well as cages made from titanium, polyether ether ketone (PEEK) and various other materials are commonly used as intervertebral devices. Although cages can differ in shape and material, they are all intended to maintain height and to add to immobilization of the degenerated motion segment [4].

Firm immobilization is only achieved once bony fusion has been accomplished. Anterior discectomy temporarily challenges the stability of the cervical spine post-operatively, and this can theoretically lead to kyphotic malalignment [5], which can give rise to neck disability and pain and, ultimately, to neurological deficits. To avoid these complications, patients’ daily activities are restricted until bony fusion has been accomplished.

However, knowledge about the process of bony fusion is limited. Firstly, the timing of bony fusion after discectomy is debated. Secondly, the method of judging bony fusion is not unequivocal. Finally, the correlation between bony fusion and clinical outcome is unknown.

The primary objective of this systematic review is to study the process of bony fusion and to obtain an overview of methods to evaluate bony fusion. Secondary objectives are to compare results based on evaluation methodology, cage or graft material and addition of bone stimulating agents, and to assess whether there is a correlation between clinical outcome and bony fusion accomplishment.

Methods

Data searches and study selection

To obtain all relevant literature, the electronic databases PubMed and Embase were searched on 14 January 2016. The search strings presented in Table 1 were used.

Table 1 Search strings used for the data search in January 2016

According to PRISMA guidelines, two of the authors (IN and MTK) independently screened the articles against the following predefined inclusion criteria:

  • The article was published in English or Dutch;

  • The article was an original report presenting primary data;

  • The article was published on or after 1 January 2000;

  • The study had a minimum of 10 patients;

  • The study focused on the cervical spine (C2-Th1);

  • The study presented patients undergoing a 1- or 2-level anterior cervical discectomy and fusion with an intervertebral device (exempting prostheses) or a bone graft;

  • The included patients did not undergo revision surgery or surgery as treatment for trauma;

  • The method of assessing fusion was described;

  • The study assessed fusion with CT scan or X-ray;

  • The article was published in a peer-reviewed journal.

Only studies on which the evaluators reached consensus were included. If needed, a third reviewer (CVL) was consulted.

Quality assessment

The quality of the selected studies was assessed with an adjusted version of the Dutch Cochrane Centre checklist for cohort studies, presented in Table 2. The methodological requirements and objectives of these studies were closely evaluated. This was done independently by two reviewers (IN and CVL). Studies were assessed on selection bias, outcome bias and follow-up bias, each category accounting for a maximum of 3 points. In total, a study could be awarded a maximum of 9 points. Studies were then divided into a low (5–9 points) or high (4 or fewer points) risk of bias group using a method adapted from Furlan et al. [6].

Table 2 Quality assessment checklist
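
The scoring rule described above can be illustrated with a minimal sketch. The class and function names are hypothetical, and the individual checklist items of Table 2 are not modelled; only the three domain scores (0 to 3 points each) and the 5-point threshold stated in the text are used.

```python
from dataclasses import dataclass

MAX_POINTS_PER_DOMAIN = 3  # selection, outcome and follow-up bias each score 0-3


@dataclass(frozen=True)
class QualityScore:
    """Points awarded to one study per bias domain (hypothetical container)."""
    selection: int
    outcome: int
    follow_up: int

    def total(self) -> int:
        """Sum the three domain scores after checking each lies in 0..3."""
        for points in (self.selection, self.outcome, self.follow_up):
            if not 0 <= points <= MAX_POINTS_PER_DOMAIN:
                raise ValueError("each domain must score between 0 and 3")
        return self.selection + self.outcome + self.follow_up


def risk_of_bias(score: QualityScore) -> str:
    """Return 'low' for 5-9 points and 'high' for 4 or fewer points."""
    return "low" if score.total() >= 5 else "high"
```

Under these assumptions, a study scoring 3, 1 and 1 points would fall in the low risk of bias group, whereas a study scoring 2, 1 and 1 points would fall in the high risk group.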

Data extraction

All data from the included studies were analysed, and data regarding the following items were extracted:

  • Number of participating patients;

  • Mean time and range of follow-up;

  • Percentage of fusion at 3, 6, 12 and 24 months and at final follow-up;

  • Method of measuring bony fusion;

  • Use of bone growth stimulation;

  • Distribution of patients over different implant types;

  • Use of plate and/or screws;

  • Clinical outcome and correlation to bony fusion;

  • Contact area and height of the implant.

Statistical analysis

Descriptive analyses were performed using paired t-tests, and dichotomous data were analysed using Chi-square tests with Yates’ correction. P values of less than or equal to 0.05 were considered statistically significant.
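
For readers unfamiliar with the test, the following is a minimal sketch of a Chi-square test with Yates’ correction on a 2×2 table (two groups, fused versus not fused). It is not the software used in this review; the function name and the counts in the usage example are hypothetical, and the p value relies on the fact that, for one degree of freedom, the upper tail of the Chi-square distribution equals erfc(√(χ²/2)).

```python
import math


def chi_square_yates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-square test with Yates' continuity correction for a 2x2 table.

    Table layout (rows = groups, columns = fused / not fused):
        a  b
        c  d
    Returns (chi-square statistic, p value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Yates' correction: shrink |ad - bc| by n/2, but never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 fused in group A, 75 of 100 in group B.
    chi2, p = chi_square_yates(90, 10, 75, 25)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}, significant = {p <= 0.05}")
```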

Clinical relevance was assessed using the method described by Ostelo et al. [7], who defined absolute cut-off values for multiple clinical outcome measures and proposed, as a general rule, a minimal clinically important difference (MCID) of an improvement of 30% or more relative to the baseline value.
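
The general 30% rule can be written down as a short sketch. The function names are hypothetical, and the sketch assumes an outcome measure on which a lower score is better (such as a pain or disability score); for measures on which a higher score is better, the sign of the change would have to be reversed. The absolute cut-off values per outcome measure are not reproduced here.

```python
def relative_improvement(baseline: float, follow_up: float) -> float:
    """Fractional improvement from baseline, assuming lower scores are better."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    return (baseline - follow_up) / baseline


def meets_mcid(baseline: float, follow_up: float, threshold: float = 0.30) -> bool:
    """True if the improvement is at least `threshold` (30% by default)."""
    return relative_improvement(baseline, follow_up) >= threshold
```

Under these assumptions, a hypothetical score that falls from 6.0 to 4.0 (an improvement of about 33%) would meet the rule, whereas a fall from 6.0 to 5.0 (about 17%) would not.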

Results

Characteristics of included studies

Through our search, 1421 unique studies were identified. After matching these to our inclusion criteria, 146 studies were included. The most common grounds to exclude studies were as follows: patients did not undergo ACDF, bony fusion was not properly described, and patient numbers were too small, as shown in Fig. 1.

Fig. 1 Flow chart of excluding studies. ACDF anterior cervical discectomy and fusion

Combining all studies resulted in a cohort of 10,208 patients, of whom 3200 received a bone graft (including allogenic and autologous bone), 4671 received a polyether ether ketone (PEEK) cage, 348 received a poly(methyl methacrylate) (PMMA) cage, 239 received a carbon fibre cage, and 1750 received a titanium cage (Fig. 2).

Fig. 2 Distribution of implants over patients. PEEK polyether ether ketone. PMMA poly(methyl methacrylate)

Risk of bias

A total of 119 studies were assessed to have a low risk of bias, and 27 studies showed a high risk of bias. When comparing studies with a low and high risk of bias, the difference mainly seems to be due to outcome and follow-up bias, since studies with a high risk of bias generally did not divide follow-up into multiple moments in time and did not investigate the correlation between clinical outcome and bony fusion.

Bony fusion

Among many other definitions, bony fusion was most commonly defined as the presence of trabecular bridging on X-rays or CT scans and/or absence of motion on flexion/extension radiographs. Realization of bony fusion was generally reported at the final follow-up moment (FFU). The median time to FFU was 20.5 months, with a range of 3–408 months. At FFU, studies report accomplished bony fusion in a mean of 90.1% of patients, ranging from one study reporting 30% [8] to studies reporting 100% [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. Studies with a high risk of bias reported statistically significantly higher numbers of patients in which bony fusion was accomplished than studies with a low risk of bias (94.0% and 89.4%, respectively; p < 0.0001). The rate of bony fusion (accomplishment of fusion in a particular patient over time) was studied in approximately half of the included articles, in which accomplishment of bony fusion was measured at 3, 6, 12 and 24 months follow-up (Table 3, Fig. 3) [4, 8,9,10,11, 13, 15,16,17, 20, 21, 23,24,25, 27, 30,31,32, 34, 38, 40,41,42, 45, 47,48,49,50,51, 54, 55, 57, 59,60,61,62,63,64,65, 67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97]. Significantly higher bony fusion accomplishment rates are observed after longer periods of follow-up; however, the difference in accomplishment of bony fusion between 12 months and 24 months follow-up is not clinically relevant.

Table 3 Fusion rate over time
Fig. 3 Fusion rate over time

Methods of measuring bony fusion

Trabecular bridging as a sign of accomplishment of bony fusion was determined in 26 studies evaluating CT scans and in 63 studies evaluating plain antero-posterior and/or lateral X-rays. Absence of motion on lateral flexion/extension X-rays as a sign of accomplishment of bony fusion was determined in 55 studies. In 17 studies, the angulation changes at the target level were measured, and in 11 studies, the difference in interspinous distances upon flexion and deflexion was measured. In 27 studies, the method was not further defined. At FFU, bony fusion was accomplished in 90.1% of patients in studies using CT scanning, in 88.3% of patients in studies using plain X-rays and in 91.7% of patients in studies using flexion/extension X-rays (Table 4). The cut-off points for angulation changes and differences in interspinous distances on flexion/extension X-rays vary between articles (Table 5). This variation did not lead to different bony fusion percentages in the angulation studies, but it did in the interspinous distance studies. Remarkably, fusion percentages were higher in studies that accepted only a 0-mm difference as the upper limit for movement than in those that allowed up to 3 mm of movement.

Table 4 Overview of different radiological methods of measuring fusion
Table 5 Overview of measuring fusion using flexion/extension radiographs
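
To illustrate how the choice of cut-off point alone can change a reported fusion percentage, the following minimal sketch applies different cut-offs to one set of motion measurements. The function name and the measurement values are hypothetical and serve only as an illustration; they are not data from the included studies.

```python
from typing import Sequence


def fusion_rate(motions: Sequence[float], cutoff: float) -> float:
    """Fraction of patients whose measured motion does not exceed the cut-off.

    `motions` holds one flexion/extension measurement per patient, for example
    the change in interspinous distance (mm) or in angulation (degrees).
    """
    if not motions:
        raise ValueError("at least one measurement is required")
    fused = sum(1 for motion in motions if abs(motion) <= cutoff)
    return fused / len(motions)


if __name__ == "__main__":
    # Hypothetical interspinous distance differences in mm for ten patients.
    measurements = [0.0, 0.4, 0.8, 1.2, 1.9, 2.5, 3.4, 0.0, 0.6, 1.0]
    for cutoff_mm in (1.0, 2.0, 3.0):
        rate = fusion_rate(measurements, cutoff_mm)
        print(f"cut-off {cutoff_mm} mm: {rate:.0%} classified as fused")
```

Within a single cohort, a more lenient cut-off can only classify the same number of patients or more as fused. The opposite pattern observed across the interspinous distance studies therefore probably reflects differences between the studies rather than an effect of the cut-off itself.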

Measuring bony fusion by judging trabecular bridging on plain X-rays resulted in significantly lower bony fusion accomplishment than using flexion/extension X-rays (p < 0.0001). There was no statistically significant difference in bony fusion accomplishment between trabecular bridging on CT scans and flexion/extension X-rays (p = 0.06) or between trabecular bridging on CT scans and on plain X-rays (p = 0.077). A subgroup analysis was performed with the studies measuring fusion at 3-, 6-, 12- and 24-month follow-ups (Fig. 4). A gradual increase in the proportion of patients attaining bony fusion is observed over the first year after surgery. Again, significantly higher bony fusion accomplishment rates are observed after longer periods of follow-up, though the difference in accomplishment of fusion between 12-month and 24-month follow-ups is again not clinically relevant.

Fig. 4 Fusion rate over time, stratified per radiologic technique

In 38 of the 146 articles, it was mentioned whether the radiographs were analysed by a radiologist or a clinician. In 26 of these, analysis was performed by a radiologist [10, 13, 20, 24, 32, 49, 50, 54, 57, 58, 64, 74, 85, 94, 97,98,99,100,101,102,103,104,105,106,107,108]. They found fusion was achieved in 93.5% of patients after a median follow-up of 23 months. In the other 12 articles, the analysis was performed by a clinician, usually a neuro- or orthopaedic surgeon [12, 29, 35, 61, 67, 69, 87, 92, 96, 109,110,111]. They found fusion was achieved in 85.5% of patients after a median follow-up of 23 months. This difference in fusion accomplishment was statistically significant (p < 0.0001).

Inter-observer variability was rarely documented and could therefore not be analysed.

Correlation between fusion and type of implant

At FFU, bony fusion was achieved in 91.4% of patients with bone grafts, in 89.1% of patients with PEEK-cages, in 83.4% of patients with PMMA-cages, in 92.9% of patients with carbon fibre cages and in 91.3% of patients with titanium cages (Table 6). As the median time to FFU varied greatly, the different bony fusion percentages cannot be compared.

Table 6 Distribution of fusion and patients over different cage types

Correlation between bony fusion and use of plates and/or screws

There were 3971 patients who received a plate in addition to the implant. At FFU, bony fusion was reported in 91.4% of these patients. There were 499 patients who received a cage with screws attached (no plate). At FFU, bony fusion was accomplished in 96.6% of these patients. A total of 5738 patients received a stand-alone implant, without addition of a plate and/or screws. At FFU, the bony fusion rate in these patients was 88.6% (Table 7). In patients treated with a cage with screws attached (no plate), bony fusion accomplishment was significantly higher than in patients treated with stand-alone implants and implants with plates (p < 0.0001). In patients treated with stand-alone implants, bony fusion accomplishment was also significantly lower than in patients treated with implants with plates (p < 0.0001). These differences cannot, however, be considered clinically relevant.

Table 7 Distribution of fusion over screw and plate additions

Using bone growth stimulation

The bone growth stimulating agents used were autologous bone in 3985 patients, allogenic bone in 690 patients, freeze-dried cadaveric allogenic bone in 1188 patients, β-tricalcium phosphate in 474 patients, plasmapore coating in 424 patients, hydroxy-apatite in 311 patients and 17 other types of bone growth stimulating agents spread over 1412 patients; 1724 patients received no filling (Fig. 5). The distribution of accomplishment of bony fusion over the different types of agents is shown in Table 8. As the median time to FFU varied greatly, the different bony fusion results cannot be compared.

Fig. 5 Distribution of bone growth stimulating agents over patients

Table 8 Distribution of fusion over bone growth stimulating agents

Correlation of bony fusion and height and surface of implant

Dimensional aspects of the implants were described in 19 studies [19, 27, 42, 45, 50, 51, 53, 72, 74, 77, 81, 85, 94, 111,112,113,114,115,116]. Only the study by Yoo et al. [116] assessed these aspects in relation to accomplishment of bony fusion. This study had a low risk of bias and found no correlation between a cage height of more than 7 mm and the absence of bony fusion (odds ratio 3.852; p = 0.101).

Correlation between bony fusion and clinical outcome

Clinical outcome was assessed in relation to bony fusion in 18 studies (Table 9) [25, 37, 40, 63, 84, 95, 105, 107, 109, 112, 116,117,118,119,120,121,122,123]. Of these, 17 studies had a low risk of bias and 1 study had a high risk of bias. Out of these 18 studies, 3 found a statistically significant correlation between the occurrence of bony fusion and a good clinical outcome [109, 120, 122]. The other 15 studies did not find a correlation between bony fusion and clinical outcome. Accomplishment of bony fusion in studies that did find a correlation was significantly lower than in studies that did not find a correlation (69.3% versus 89.8%, p < 0.0001). None of these studies correlated clinical outcome with accomplishment of bony fusion at different time points.

Table 9 Clinical outcome was assessed in correlation to fusion in 18 studies

The study by Klingler et al. [109] retrospectively compared patients treated with PEEK and PMMA implants. Clinical outcome was evaluated using the visual analogue scale (VAS), the neck disability index (NDI), the short-form 36 health survey (SF-36) and the patient satisfaction index (PSI). In patients with a PMMA implant, the fusion accomplishment after a median FFU of 46 months was 47.1%. Fused patients showed a statistically significantly better physical component summary of the SF-36 than non-fused patients (p = 0.024). As the MCID for this measure is 15 [7] and the absolute difference between fused and non-fused patients was 9.2, this difference was not deemed clinically relevant. There was no correlation between bony fusion and other clinical outcome measures. In patients with a PEEK implant, fusion was accomplished in 62.2% of patients after a median FFU of 16 months. There was no correlation with any of the clinical outcome measures.

The study by Schroder et al. [120] prospectively studied patients treated with titanium cages and evaluated clinical outcome using Odom’s criteria. At FFU (median 20 months), fusion was accomplished in 74.0% of patients. The occurrence of fusion was correlated with excellent and good results, whereas the absence of fusion was correlated with satisfactory and poor results (p = 0.0364). When using Odom’s criteria, an MCID cannot be established; therefore, clinical relevance could not be assessed.

The study by Wright et al. [122] prospectively studied patients treated with autologous bone grafts and evaluated clinical outcome using VAS scores for neck pain and arm pain. At FFU (median 12 months), fusion was accomplished in 82.9% of patients. The absence of fusion was correlated with higher VAS scores for neck pain. Such correlation was not found for VAS arm pain. Absolute values were not provided in this article; therefore, clinical relevance could not be assessed.

Discussion

After ACDF surgery, bony fusion is achieved in approximately 90% of patients after a median follow-up time of 20.5 months. Bony fusion rate studies demonstrate approximately 50% of fusion after 3 months, 75% after 6 months and 90% from 12 months on. The differences between 12 and 24 months of follow-up are not clinically relevant in the overall group, or when stratified per radiologic technique. From this, it can be concluded that 12 months of follow-up is sufficient.

Methods to determine accomplishment of bony fusion seem to influence the judgement of bony fusion. Plain X-rays consistently show lower bony fusion results, even after a longer period of follow-up, and fusion results are likewise influenced by choosing cut-off levels for assessment of bony fusion. Comparable fusion results were found in comparing trabeculae on CT scans and movement on flexion/extension X-rays. As there is no generally accepted definition of bony fusion, the different techniques cannot be compared to a gold standard and it is not possible to determine which method is more accurate.

A significant correlation was found between fusion accomplishment and whether the imaging was analysed by a radiologist or a clinician. Since the articles used in this analysis did not provide additional information on this topic, and none of the articles compared radiologists and clinicians, no explanation for this difference can be given.

The lack of a generally accepted definition of fusion is due to the absence of studies that compare fusion in an intervention group with fusion in control groups. Observing bridging of bone trabeculae on X-ray or CT scans is a qualitative measure. Measuring movement on flexion/extension is quantitative and can serve as a method to develop a gold standard. Ouchida and colleagues [124] claim that flexion–deflexion is more accurately measured on dynamic CT scans in comparison with dynamic X-rays, though again a control group is lacking. A solution could be to consider “the definite fusion group” in a group of patients treated with an intervertebral device. The “definite fusion group” may be formed by patients that demonstrated overgrowth of bone along and through the device. If those patients serve as controls for the other patients, the variation around the 0 degree or 0-mm movement measure, attributable to the measuring method, could be established. This can help in establishing a critical value above which the absence of fusion could be defined.

Another method was introduced by Johnsson and colleagues [125], who introduced metallic markers in the adjacent bony structures to enable observing movement of the vertebrae. Nevertheless, the accuracy was limited to 0.5–0.7 mm and 0.5–2 degrees in this study, which was performed in the lumbar spine. Therefore, it seems inadequate to use a cut-off value of 2 degrees to decide on fusion in the cervical spine, like some of the articles included in this review have done.

A minority of studies (n = 18) examined the correlation between accomplishment of bony fusion and clinical outcome. Only 3 studies demonstrated a correlation between the absence of fusion and worse clinical outcome; 15 studies did not find a statistically significant correlation. Studies that did find a correlation had lower bony fusion rates than studies that did not find a correlation, which could mean that the studies that did not find a correlation did not have enough power to statistically assess a correlation between fusion and clinical outcome. Furthermore, none of these studies correlated clinical outcome with accomplishment of fusion at different time points. It would be interesting to examine improvement of clinical outcome correlated with accomplishment of bony fusion over time. A recent study did demonstrate a correlation between the absence of fusion and neck pain and considered two time points [124]. Patients with fusion at 6 months had less neck pain than those without fusion at 6 months, and patients with fusion at 12 months had less neck pain than those without fusion at 12 months. However, the number of patients studied was relatively low. Also, neither the difference in neck pain between months 6 and 12, nor the difference in fusion, nor the correlation between the two was studied.

In future studies, it is recommended to evaluate clinical condition in correlation with bony fusion in an earlier phase of the fusion process, when fusion is not yet accomplished in the majority of patients. Conclusions on the correlation of bony fusion and clinical condition cannot be drawn based on the available literature.

Articles with a high risk of bias reported higher percentages of bony fusion accomplishment than articles with a low risk of bias. In articles with a high risk of bias, the method of measuring bony fusion was often not described; the higher fusion rates may therefore be due to improper determination of bony fusion.

When comparing different types of implants, bone growth stimulating agents, plates or cages with screws and dimensional aspects of the implant, minor statistically significant differences are found in bony fusion accomplishment, which do not reach clinically relevant numbers with regard to the MCID. Small differences in bony fusion results are unlikely to be of importance if a correlation with clinical outcome cannot be established.

Conclusion

Fusion as a long-term result after ACDF is satisfactory, but lack of a generally accepted definition of bony fusion and differences in study design hamper conclusions on optimising the rate of bony fusion by choice of material and/or additives. Overall, it can be concluded that 12 months of follow-up after ACDF is sufficient.