The diagnostic accuracy of imaging modalities to detect pseudarthrosis after spinal fusion—a systematic review and meta-analysis of the literature

Objective The aim of the study was to determine the diagnostic accuracy of imaging modalities to detect pseudarthrosis after thoracolumbar spinal fusion, with surgical exploration as reference standard.
Materials and methods A systematic literature search was performed for original studies on the diagnostic accuracy of imaging as index test compared to surgical exploration as reference standard to diagnose pseudarthrosis after thoracolumbar spinal fusion. Diagnostic accuracy values were extracted and the methodologic quality of the studies was evaluated with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. Per modality, clinically comparable studies were included in subgroup meta-analysis and weighted odds ratios (ORs) were calculated using the random effects model.
Results Fifteen studies were included. Risk of bias was classified as high/unclear in 58% of the studies. Concerns regarding applicability were classified as high/unclear in 40% of the studies. Four scintigraphy studies including 93 patients in total were pooled to OR = 2.91 (95% confidence interval [CI]: 0.93–9.13). Five studies on plain radiography with 398 patients in total were pooled into OR = 7.07 (95% CI: 2.97–16.86). Two studies evaluating flexion-extension radiography of 75 patients in total were pooled into OR = 4.00 (95% CI: 0.15–105.96). Two studies of 68 patients in total were pooled for CT and yielded OR = 17.02 (95% CI: 6.42–45.10). A single study reporting on polytomography, OR = 10.15 (95% CI: 5.49–18.78), was also considered to be an accurate study.
Conclusions With a pooled OR of 17.02, CT can be considered the most accurate imaging modality for the detection of pseudarthrosis after thoracolumbar spinal fusion from this review.


Introduction
Low back pain is a global health and socio-economic problem [1], as it is the leading cause of disability and work absenteeism in the Western world [2]. When conservative measures fail, operative intervention can be considered. Spinal fusion is a surgical procedure in which rigid fixation of vertebral segments is achieved by means of osteosynthesis and bone grafting to create definite bony fusion of the vertebrae involved. Failed spinal fusion may occur in 30-40% of spinal fusion patients [3,4]. Pseudarthrosis is defined as the absence of solid bony fusion at a minimum follow-up of 6 months after surgery [5,6]. Pseudarthrosis can be associated with persistent or recurrent back and/or leg pain [7], but can also be asymptomatic [7-9]. Whether symptomatic or asymptomatic, pseudarthrosis increases the risk of material failure, late deformity, and neurological symptoms [10,11].
Revision surgery is the preferred treatment in spinal fusion patients suffering from symptoms due to pseudarthrosis. Revision surgery is invasive, expensive, and may have a worse outcome than primary surgery [12,13]; it should therefore only be performed when the pseudarthrosis diagnosis is irrefutable. Since symptoms of pseudarthrosis may be nonspecific and multiple individual sources of pain may contribute to the complex of symptoms [14], diagnostic tools are required to establish the diagnosis. The gold standard for the diagnosis of pseudarthrosis is surgical exploration [5,7,15,16], an invasive, costly, and nowadays rarely used test that is neither desirable nor ethical in patients without symptoms. The aim of the study was to determine the diagnostic accuracy of imaging modalities to detect pseudarthrosis after thoracolumbar spinal fusion, with surgical exploration as the reference standard.

Identification of studies
This review was performed according to the PRISMA statement guidelines [17,18]. A systematic literature search was conducted in the PubMed, EMBASE, and CINAHL databases from inception until February 2017 to identify relevant studies. A list of keywords and text words was formulated to describe the detection of pseudarthrosis by imaging as index test compared to surgical exploration as reference standard in patients after spinal fusion surgery. Terms for imaging: tomography, radiography, plain radiographs, MRI, CT, scintigraphy, SPECT, SPECT/CT, PET, PET/CT, DEXA. Terms for study design: diagnostic accuracy, precision, predictive value, sensitivity, specificity, false positive, false negative. Terms for patient population: spine, vertebrae, vertebral column, spinal fusion, spinal arthrodesis, spondylodesis, bone graft, pseudarthrosis, non-union, delayed union, clinical failure, surgical exploration, re-operation, second-look operation. The search was limited to the English language.
Once the search was completed, the resulting articles were checked for duplicates. Subsequently, two independent reviewers (PW, orthopedic surgeon with over 10 years of experience in spinal surgery, and MP, junior researcher specialized in imaging) screened the identified citations to determine whether they met predefined inclusion and exclusion criteria. If disagreements could not be resolved by consensus, a third reviewer (CB, clinical epidemiologist with over 15 years of experience in conducting systematic reviews) was consulted. Only original studies that provided data to construct contingency tables were included. Exclusion criteria were spinal fusion for the indications of bone fracture, tumor, or infection; time interval between surgery and index test less than 6 months; patient population smaller than ten; cervical fusion; animal studies; in vitro studies.

Data extraction
Standard reference data, population characteristics, details on spinal fusion, index test, reference test, and time intervals were extracted by the reviewers (PW, MP). Disagreements were resolved by consensus. Besides study characteristics, diagnostic accuracy data was extracted. Since the outcome was dichotomous (diagnosis was either pseudarthrosis or fusion), contingency tables were constructed. We also recorded whether the results originated from per-patient-, per-level-, or per-side-based analysis.

Methodological quality
The methodological quality of each selected study was assessed independently by the reviewers according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [19]. The QUADAS-2 tool consists of four key domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index test and reference standard). Each domain was scored in terms of risk of bias; the first three domains were also scored in terms of concerns regarding applicability to the research question. Disagreements were resolved by consensus.

Data synthesis and statistical analysis
Pseudarthrosis was defined as a positive test result and fusion as a negative test result. Diagnostic accuracy values were calculated from the extracted contingency tables. Continuity correction was applied to studies with zero-cell counts by adding 0.5 to all cells of the study [20]. Per index test, the studies describing that test were considered for inclusion into subgroup meta-analysis.
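The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the analysis code used in the review (which was run in Stata). The function name `diagnostic_accuracy`, the cell counts in the usage line, and the Wald-type 95% confidence interval on the log-OR scale are assumptions made for the example.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Diagnostic accuracy values from a 2x2 contingency table.

    Positive test = pseudarthrosis, negative test = fusion.
    tp, fp, fn, tn are cell counts against surgical exploration.
    """
    # Continuity correction: add 0.5 to every cell if any cell is zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    total = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)

    # Odds ratio with an assumed Wald-type 95% CI on the log scale.
    odds_ratio = (tp * tn) / (fp * fn)
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_or = math.log(odds_ratio)

    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "or": odds_ratio,
        "or_ci": (math.exp(log_or - 1.96 * se_log_or),
                  math.exp(log_or + 1.96 * se_log_or)),
        "se_log_or": se_log_or,
    }


# Hypothetical counts, for illustration only (not taken from any included study).
result = diagnostic_accuracy(tp=20, fp=5, fn=10, tn=40)
print(result["sensitivity"], result["specificity"], result["or"])
```

Applying the correction to all four cells, rather than only the empty one, follows the description in the text; the interval method is one common choice and may differ from what the statistical package reports.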

Inclusion in meta-analysis
Meta-analysis was only performed when studies evaluating the same modality were not significantly hampered by clinical heterogeneity. Studies were considered clinically heterogeneous when patient groups, outcome measures, and/or the execution of index tests were considerably different. The random effects model was employed during meta-analyses to account for unobserved sources of variation [21]. The odds ratio (OR) was used as the principal summary measure in meta-analysis. The higher the OR, the better the discriminatory performance. An OR of 1 indicates a test that does not discriminate between patients with pseudarthrosis and patients with fusion [22]. An OR below 1 suggests a negative association between index test and surgical exploration. Analyses were performed using the Stata statistical software package, version 14.1 (StataCorp, College Station, TX, USA).
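To illustrate how a random effects model pools study-level ORs, the sketch below uses the DerSimonian-Laird estimator of between-study variance. The text does not state which estimator was used, so this choice is an assumption, as are the function name `pool_random_effects` and the input values in the usage line.

```python
import math


def pool_random_effects(log_ors, variances):
    """Pool log odds ratios with a DerSimonian-Laird random effects model.

    log_ors:   per-study log(OR)
    variances: per-study variance of log(OR), i.e. squared standard error
    Returns (pooled OR, lower 95% limit, upper 95% limit, tau squared).
    """
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau squared to each study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se),
            tau2)


# Hypothetical study-level inputs, for illustration only.
pooled_or, lower, upper, tau2 = pool_random_effects(
    log_ors=[math.log(3.0), math.log(8.0), math.log(5.0)],
    variances=[0.40, 0.55, 0.30],
)
print(pooled_or, lower, upper, tau2)
```

When the studies agree closely, the estimated between-study variance is zero and the result coincides with a fixed-effect inverse-variance pool; larger heterogeneity widens the interval and evens out the study weights.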

Identification of studies
One hundred sixty-five potentially relevant references were identified through database search. After screening, 15 studies were included in this review, reporting on eight modalities: plain radiography, flexion-extension radiography (FE radiography), computed tomography (CT), single-photon emission computed tomography (SPECT), planar scintigraphy, polytomography, ultrasound/sonography (US), and 18F-fluoride positron emission tomography/computed tomography (PET/CT). The study selection flowchart is detailed in Fig. 1. The level of evidence of the included studies ranged from I to III.

Data extraction
Study characteristics of the 15 included studies are listed in Table 1. The number of levels fused in a single patient during initial surgery ranged from 1 to 13 levels. Eight articles monitored pseudarthrosis per patient, five monitored each level separately, and two made a distinction between the left and right side of each operated level. All articles reported that persistent low back pain and/or suspicion of pseudarthrosis was the reason for surgical exploration. The time interval between initial surgery and surgical exploration ranged from 6 to 120 months. Table 2 displays the quality assessment according to QUADAS-2. An overview of the distribution of QUADAS-2 scores is presented in Fig. 2. Risk of bias on 'flow and timing', 'patient selection', 'index test', and 'reference standard' was classified as high or unclear in 58% of cases. Common weaknesses were poor documentation of patient selection and poor description of the reference standard. Two studies were considered to have low risk of bias in all four domains. Concerns regarding applicability on 'patient selection', 'index test', and 'reference standard' were classified as high or unclear in 42% of cases. Three studies were considered to have low applicability concerns in all three domains.

Inclusion in meta-analysis
The studies discussing the index tests SPECT [24,28] and planar scintigraphy [14,23,30] were considered for inclusion into a subgroup meta-analysis, further referred to as scintigraphy. McMaster et al. was not included because the time interval between fusion surgery and surgical exploration deviated too much from that of the other studies. The remaining four studies were pooled.
Six studies were considered for inclusion in meta-analysis for plain radiography [14,15,26,27,31,32]. Fogel et al. was excluded since the low prevalence of pseudarthrosis made the study population incomparable to the other studies (see Table 3). The remaining five studies were considered comparable enough to be pooled. Two articles diagnosed pseudarthrosis per patient [14,26], two per level [27,31], and one per side [15]. We chose to pool these studies despite differences in analysis region since we were mainly interested in the correlation between findings on imaging and surgical exploration. Using the same rationale, no distinction was made between studies on posterolateral and interbody fusion.
Two articles were considered for FE radiography meta-analysis [14,15]. Apart from differences in analysis regions, the study characteristics were considered comparable and the studies were therefore pooled.
Six articles were considered for inclusion in CT meta-analysis [14-16,25,32,33]. The study of Brodsky et al. was excluded for lack of sagittal and coronal reconstructions, essential in the assessment of interbody bony fusion [14,16,33,35]. Laasonen et al. and Larsen et al. were excluded on slice thickness. Thicknesses of 5 and 6 mm were used respectively, while bony bridging should be assessed using thin-slice CT to be reliable [16,32,33,35]. Fogel et al. was excluded for low prevalence of pseudarthrosis compared to the other studies. The posterolateral fusion patient group of Carreon et al. [16] and the interbody fusion patient group of Carreon et al. [33] were pooled for CT.
Figure 3 shows a forest plot of the studies selected for subgroup meta-analysis, with their respective weights and resulting pooled ORs.
Index tests for which only one study was identified, i.e., US, polytomography, and 18F-fluoride PET/CT [15,29,34], could not undergo subgroup meta-analysis. These single studies were, however, evaluated on the same grounds and, if considered reliable, included in Table 4 to complement the meta-analysis results. This was only the case for the study on polytomography [15]. For the study on US [29], the authors considered that with the evaluation of ten patients only, US was not investigated thoroughly enough for pseudarthrosis detection. In the 18F-fluoride PET/CT study [34], the reference standard was either surgical exploration or clinical follow-up, based on the index test outcome. This introduced a bias in the patient population that underwent surgical exploration; only patients with a suspicion of pseudarthrosis on 18F-fluoride PET/CT were surgically explored and used to calculate diagnostic accuracy.

Discussion
This systematic review summarizes studies in literature that investigated the diagnostic accuracy of imaging modalities to detect pseudarthrosis after thoracolumbar spinal fusion with surgical exploration as the reference standard. Diagnostic accuracy values of individual studies were determined, and for studies of the same modality that were clinically comparable, a pooled OR was calculated.
Patients after spinal fusion can be monitored by several modalities. Plain radiographs attempt to reveal deficient morphology of the fusion mass as a sign of pseudarthrosis. However, plain radiographs are projections only [35,36] whereas pseudarthrosis is a three-dimensional problem. The pooled OR of radiography was 7.07. In FE radiography, radiographs are made during flexion and extension of the spinal column to detect motion in the operated segment as a sign of pseudarthrosis. Cases exist where no signs of pseudarthrosis were found on plain radiography, CT, and MRI, but FE radiography detected the pseudarthrosis by unveiling motion between the segments [37]. On the other hand, absence of motion does not necessarily correspond with solid fusion and the presence of motion is not directly related to pseudarthrosis [12,38-40]. Furthermore, no consensus exists
Going from a single slice in radiography to several planes in polytomography, the OR increased to 10.15. However, polytomography seems to be outdated by CT developments and is currently not frequently used. CT offers three-dimensional osseous detail [33,35]. After meta-analysis, CT was the modality with the highest OR in this review. Besides detection of bridging trabecular bone, CT is able to detect subsidence and lucency around fusion material as possible signs of pseudarthrosis [35]. On the downside, assessment can be complicated by artefacts when metallic cages and/or instrumentation are used [14,32,33,35]. Technological improvements such as iterative reconstruction and dual-energy scanning are likely to improve accuracy [43]. Whether CT alone is sufficient for clinical decision-making is under debate. Choudhri et al. stated that multiple modalities should be considered for the noninvasive evaluation of symptomatic patients with suspected failure of spinal fusion [38].
US can demonstrate callus formation and bone healing [44,45]. Although the first study assessing the role of US for pseudarthrosis detection in ten patients seemed promising in 1997 [29], it has been the only study since.
Pseudarthrosis diagnosis can also be based on abnormalities in bone metabolism. Studies on SPECT and planar scintigraphy were grouped together in meta-analysis since both modalities use 99mTc-labeled phosphonates as tracer. 99mTc-labeled phosphonates are adsorbed onto or into the crystalline structure of hydroxyapatite to mark bone remodeling. With a pooled OR of 2.91, scintigraphy amounted to the lowest OR value after subgroup meta-analyses. An analog to 99mTc-labeled phosphonates is 18F-fluoride. Both tracers have similar uptake mechanisms [46] but 18F-fluoride decays via positron emission and can therefore be imaged by PET. Compared to 99mTc SPECT, 18F-fluoride PET provides higher resolution, higher sensitivity, and better quantification capabilities [47]. PET combined with CT allows localization of abnormal uptake, which might enhance discriminative power [6]. Quon et al. evaluated PET/CT as index test for pseudarthrosis diagnosis [34]. The results seem promising but studies of higher methodological quality should be conducted to draw firmer conclusions on its value in pseudarthrosis diagnosis.
Included studies: McMaster et al. 1980 [23]; Slizofski et al. 1987 [24]; Laasonen et al. 1988 [25]; Brodsky et al. 1991 [15]; Blumenthal et al. 1993 [26]; Kant et al. 1995 [27]; Larsen et al. 1996 [14]; Albert et al. 1997 [28]; Jacobson et al. 1997 [29]; Bohnsack et al. 1999 [30]; Brantigan et al. 2000 [31]; Carreon et al. 2007 [16]; Fogel et al. 2008 [32]; Carreon et al. 2008 [33]; Quon et al. 2012 [34].
In the database search, one paper evaluating MRI [48] and one paper evaluating RSA as index test [49] were identified but not included. In MRI, bridging bone between endplates can be visualized [50] and changes in
Table 3 Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive and negative likelihood ratios (LR+, LR-), prevalence of pseudarthrosis, accuracy ((true positive + true negative) / (total)) and OR values with corresponding 95% confidence intervals for the seven index tests
A strength of the present review was that the patient populations of the included studies resemble patient populations that would undergo these tests in clinical practice to either confirm or exclude pseudarthrosis, since all suffered from persisting or recurrent pain after spinal fusion. The methodological choice to only include studies that compared an index modality to the gold standard of surgical exploration was a strength on the one hand, since it is the most valid way to assess the diagnostic accuracy of a modality [14]. On the other hand, it was a weakness, since it meant the exclusion of newer studies that evaluate state-of-the-art modalities. The study design of using surgical exploration as gold standard is no longer ethical or practical in clinical practice. As a result, the value of state-of-the-art modalities could not be discussed in this review and is still left to be evaluated. Another weakness of the study was that studies in meta-analysis, although relatively comparable, did show differences in spinal fusion technique, types of cages and instrumentation, imaging characteristics, pseudarthrosis definition, experience of the observers, and patient characteristics. The time interval between spinal fusion and index test in particular was highly variable between studies. Furthermore, the interpretation of index test results was incomplete in some studies. Imaging findings were reported but not always classified as either pseudarthrosis or fused. In these cases, the cut-off point was determined by the authors of this review, which is arbitrary, although not necessarily far from clinical practice. Studies also reported poorly on patient population inclusion criteria. Lack of information may have led to incorrect inclusion of studies in meta-analyses and weakens the findings of this review.
To conclude, with a pooled OR of 17.02, CT can be considered the most accurate non-invasive imaging modality for the detection of pseudarthrosis after spinal fusion from this review.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.