Introduction

Approximately 5–15% of patients with low back pain suffer from lumbar disc herniation (LDH) [1, 2]. LDH is the most common spine disorder requiring surgical intervention [3, 4]. Clinical guidelines recommend history taking and physical examination to rule out LDH diagnosis [4]. However, the diagnostic accuracy of both history taking and physical examination is still insufficient [5, 6]. Diagnostic imaging in patients with back pain and/or leg pain is often used to assess nerve root compression due to disc herniation or spinal stenosis and cauda equina syndrome [7,8,9,10]. Furthermore, diagnostic imaging can also be used to identify the affected disc level before surgery [11].

Diagnostic imaging can be done by Magnetic Resonance Imaging (MRI), Computed Tomography (CT), X-ray and myelography. Currently MRI is the imaging modality of choice, as it does not use ionising radiation and visualizes soft tissue particularly well [9, 12]. CT is widely available and often used for detection of morphologic changes, and it has a well-recognized role in the diagnosis of herniated discs [13, 14]. Compared to MRI, CT is cheaper, the total testing time is shorter, and CT scanners are more widely available in hospital settings, but CT has the drawback of exposure to ionising radiation. Myelography involves injection of contrast medium in the lumbar spine, followed by X-ray, CT or MRI projections [15]. For certain conditions (e.g. metal implants or malalignment of the spine) myelography might replace MRI as the imaging modality of choice [16]. Plain radiography (X-ray) is the most commonly used technique due to its relatively low cost and ready availability [9, 17,18,19].

However, the evidence for the diagnostic accuracy of diagnostic imaging for LDH is still unclear [20, 21]. In addition, discordance between patients’ clinical findings and MRI findings has also been reported [22, 23]. We have performed a large study evaluating the evidence on the diagnostic accuracy of MRI and CT for all kinds of lumbar pathologies compared to various reference standards [12, 24]. The aim of the current review is to summarize and compare more specifically the evidence on the diagnostic accuracy of diagnostic imaging (CT, X-ray, myelography and MRI) for identifying LDH in patients with low back pain and/or leg pain, with surgery as the reference standard.

Methods

Design

We conducted a systematic review and meta-analysis according to the guidelines of the Cochrane handbook of systematic reviews of diagnostic test accuracy studies [25]. The protocol was registered in PROSPERO (2015:CRD42015027687).

Search strategy

We conducted the search in MEDLINE, EMBASE, and CINAHL (until 1 June 2017) without language restriction (see Appendix 1). The search strategy was designed in collaboration with a medical information specialist. In addition, reference lists of relevant review articles as well as all retrieved relevant publications on diagnostic test accuracy studies were checked to identify any potentially missed articles.

Study selection

We applied the following selection criteria: a) both prospective and retrospective cohort and case-control studies; b) adults with low back and/or leg pain with lumbar disc herniation as the suspected underlying pathology; c) index tests were MRI, X-ray, myelography or CT; d) reference standard was surgery; e) data to generate a 2 × 2 table; f) published full reports, preferably in English, Dutch or German.

We defined LDH as herniated nucleus pulposus, including protruded, extruded or sequestrated disc, causing nerve root compression. Two of the review authors (RvR/RO/BK/JHK/MB) independently screened titles and abstracts and then assessed relevant full papers. We used consensus to resolve disagreements; in cases of persisting disagreement a third review author (AV) was consulted.

Risk of bias assessment

Pairs of review authors (MvT/BK/RvR/JHK) independently performed risk of bias assessment using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 tool [26]. In the flow and timing domain, we considered a time period between index test and reference standard of 1 week or less appropriate. Risk of bias and concerns about applicability of each domain were classified as low, high or unclear risk. Consensus was reached by discussion of discrepancies between the two reviewers. If discrepancies persisted, we consulted a third reviewer (AV).

Data extraction

Pairs of review authors (MvT/BK/JHK/RvR) independently performed data extraction using a standardised form. We extracted data on author, year of publication and journal; study design and setting; and study population (pathology considered, age, gender, numbers of subjects for inclusion in study and analysis, patient selection, and level of measurement (patient or disc)). We also obtained data on index and reference test characteristics, including type of test, year, methods of execution, cut-off values, positivity thresholds and outcome scales; and on diagnostic parameters (the diagnostic two-by-two table or parameters to reconstruct this table).

Statistical analysis

For each included study we calculated sensitivity and specificity (and 95% confidence intervals (CI)) preferably on patient level data using the data from two-by-two tables. We conducted a meta-analysis separately for each of the index tests using a bivariate analysis. We chose the bivariate random-effects approach, because it incorporates both within and between study variation of sensitivity and specificity together with any correlation that might exist between sensitivity and specificity [27]. We present summary point estimates of sensitivity and specificity (and 95% confidence region) and the results were plotted in receiver operating characteristic (ROC) space [28]. When possible we generated linked ROC plots in case of pairs of diagnostic imaging tests, when both tests had been evaluated in the same study. Meta-regression was used to evaluate whether there is a difference in test accuracy between different imaging techniques or between patient level data and disc level data [29]. Analysis was carried out using STATA 13.1 software.
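As an illustration of the per-study calculation described above, the following minimal sketch derives sensitivity and specificity with 95% confidence intervals from a two-by-two table. The interval method is not stated in the text; the Wilson score interval is used here purely as an assumption, and the function names and the counts in the usage note are hypothetical.

```python
import math


def wilson_ci(successes, total, z=1.959964):
    """Wilson score interval for a proportion (assumed method, not from the review)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each as (estimate, lower, upper).

    tp/fn: surgery confirmed LDH and imaging was positive/negative.
    tn/fp: surgery found no LDH and imaging was negative/positive.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, *wilson_ci(tp, tp + fn)),
        "specificity": (spec, *wilson_ci(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, (est, lo, hi) in accuracy_from_2x2(tp=40, fp=10, fn=10, tn=30).items():
        print(f"{name}: {est:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The bivariate random-effects pooling itself is not reproduced here; it requires a mixed model fitted across studies, which the review carried out in STATA.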

Two reviewers (JHK, AV) assessed the quality of the evidence for each index test using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group criteria [28, 30]. Disagreements were resolved by a third review author (MB/DvdW). The quality of evidence was categorized as high, moderate, low, or very low [31]. The quality of the evidence started at high and was reduced by one level for each of the following domains not met: limitations of the study design (> 25% of participants in studies with two or more domains with high risk of bias); indirectness (> 25% of participants in studies with serious applicability concerns); inconsistency (unexplained variation in sensitivities and specificities across the studies [32]); imprecision (wide confidence interval of the sensitivity and specificity in > 25% of the studies); and publication bias [33].
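The downgrading rule described above can be sketched as a small function. This is a simplified reading of the rule as stated in the text (start at high, drop one level per domain not met); the names `LEVELS`, `DOMAINS` and `grade_quality` are hypothetical and not part of any GRADE software.

```python
LEVELS = ["high", "moderate", "low", "very low"]

DOMAINS = (
    "study design",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication bias",
)


def grade_quality(domains_not_met):
    """Return the evidence level after downgrading one step per domain not met."""
    failed = set(domains_not_met)
    unknown = failed - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    # The scale has four levels, so the result cannot fall below "very low".
    return LEVELS[min(len(failed), len(LEVELS) - 1)]


if __name__ == "__main__":
    print(grade_quality(["study design"]))
    print(grade_quality(["study design", "inconsistency", "imprecision"]))
```

With a single domain not met the function returns moderate, and with three domains not met it returns very low.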

Results

Literature search

A total of 27,776 citations were obtained. Finally, 14 studies met our selection criteria (Fig. 1). No studies were excluded based on language. Of these, nine studies investigated CT [34,35,36,37,38,39,40,41,42], eight myelography [34, 37,38,39, 41, 43], six MRI [36, 39, 43,44,45,46], and none assessed X-ray. All studies were performed in secondary care settings, such as neurological clinics or pain clinics; three studies [41, 43, 47] were retrospective (Table 1). All but one study evaluated old imaging techniques, as they were published between 1982 and 1994; one study evaluating MRI was published in 2006 [46].

Fig. 1 Flow chart of selected articles

Table 1 Study characteristics

Population

A total of 940 patients receiving surgery were included. Overall 1289 patients were involved in these studies but the reference standard was not performed in 349 patients. The patients (14 to 82 years) all had clinical findings consistent with LDH. Seven studies (n = 288) [34, 37, 41, 42, 44, 46, 47] were analyzed on patient level; others analyzed disc levels (Table 1).

Risk of bias

Although we only selected studies using surgery as a reference standard, none of the studies were assessed as having low risk of bias (RoB) related to the reference standard, mainly because it was unclear whether results of the reference standard had been interpreted without knowledge of imaging results (Fig. 2). Seven studies were considered to have high RoB related to patient selection, as patients had not clearly been selected using consecutive or random sampling. Only two studies reported a time-interval between index test and reference standard, which were 3 months and 9 months, respectively [44, 47].

Fig. 2 Assessment of risk of bias for each included study

Diagnostic accuracy

Computed tomography

Nine studies were included: four with measurements on patient level (327 patients) [34, 37, 41, 42] and five with a total of 578 disc explorations [35, 36, 38,39,40]. The mean prior probability of LDH was 72.0% (range 49.2–92.3%). The sensitivity and specificity ranged from 59 to 93% and from 45 to 100%, respectively (Fig. 3). The summary estimates were 81.3% (95%CI: 72.3–87.7%) for sensitivity and 77.1% (95%CI: 61.9–87.5%) for specificity (Fig. 4). We found no inconsistency, as logit-transformed sensitivity and logit-transformed specificity were inversely correlated (estimate = −0.2649). There were no differences in summary estimates for sensitivity and/or for specificity between patient level data and disc level data (chi-square = 2.52, 2df, P = 0.28).

Fig. 3 Forest plot of the diagnostic accuracy of CT in the identification of lumbar disc herniation

Fig. 4 Summary ROC plots of sensitivity and specificity of all studies

We found moderate quality evidence (downgraded because of limitations in study design) for the accuracy of CT (Table 2).

Table 2 GRADE evidence for diagnostic accuracy of lumbar disc herniation

Myelography

Eight studies were included: five with measurements on patient level (422 patients) [34, 37, 41, 42, 47] and three with a total of 423 disc explorations [38, 39, 43]. The mean prior probability of LDH was 69.2% (range: 49.2–91.3%). The sensitivity and specificity ranged from 54 to 92% and from 50 to 89%, respectively (Fig. 5). We found a summary estimate of 75.7% (95%CI: 64.9–84.1%) for sensitivity and 76.5% (95%CI: 67.8–83.4%) for specificity (Fig. 4). We found no inconsistency (estimate = −0.7644). There was a difference in summary estimate for sensitivity between patient level data (83.9% (95%CI: 76.4–89.3%)) and disc level data (61.1% (95%CI: 50.2–71.0%)) (chi-square = 9.23, 1df, P = 0.002), but not for specificity (chi-square = 1.26, 1df, P = 0.26).

Fig. 5 Forest plot of the diagnostic accuracy of myelography

We conclude that there is moderate quality evidence for the accuracy of myelography (downgraded because of limitations in study design) (Table 2).

Magnetic resonance imaging

Six studies were included: two with measurements on patient level (66 patients) [44, 46] and four with a total of 299 disc explorations [36, 39, 43, 45]. In these studies the mean prior probability of LDH was 68.9% (range: 48.6–98.7%). The sensitivity and specificity ranged from 64 to 93% and from 55 to 100%, respectively, with wide confidence intervals (imprecision) (Fig. 6). The summary estimate was 80.9% (95%CI: 68.8–89.1%) for sensitivity and 81% (95%CI: 59.2–92.6%) for specificity (Fig. 4). Because logit-transformed sensitivity and logit-transformed specificity were positively correlated (estimate = 0.5516), we judged that there was inconsistency. It was not possible to examine a difference between patient level data and disc level data in sensitivity and specificity.

Fig. 6 Forest plot of the diagnostic accuracy of MRI

We conclude that there is very low quality evidence for the accuracy of MRI (downgraded by study design, inconsistency and imprecision) (Table 2).

Comparing imaging techniques

CT versus Myelography

Six studies evaluated CT and myelography (followed by plain radiography) in the same patient population and the linked results are plotted in ROC space (Fig. 7) [34, 37,38,39, 41, 42]. The summary estimate of sensitivity was 76.7% (95%CI: 66–84.8%) for CT and 74.4% (95%CI: 64.8–82.2%) for myelography. The summary estimate of specificity was 71.2% (95%CI: 55.2–83.2%) for CT and 72.4% (95%CI: 62.5–80.4%) for myelography. These summary estimates were slightly lower than the ones based on all CT and myelography studies. We concluded that the accuracy of CT and myelography is comparable (chi-square = 0.27, 2df, P = 0.87).

Fig. 7 Summary ROC plots of CT versus myelography

CT versus MRI

Two studies evaluated CT and MRI (Fig. 8) [36, 39]. The summary estimate of sensitivity was 70.6% (95%CI: 49.5–85.5%) for CT and 80.0% (95%CI: 50.6–93.9%) for MRI. The summary estimate of specificity was 82.5% (95%CI: 63.3–92.7%) for CT and 93.5% (95%CI: 57.0–99.4%) for MRI. The results showed a comparable accuracy for CT and MRI (chi-square = 0.51, 2df, P = 0.78).

Fig. 8 Summary ROC plots of CT versus MRI

Myelography versus MRI

Two studies evaluated myelography and MRI (Fig. 9) [39, 43]. The summary estimate of sensitivity was 55.3% (95%CI: 45.2–65.0%) for myelography and 67.4% (95%CI: 56.6–76.7%) for MRI. The summary estimate of specificity was 87.8% (95%CI: 79.7–92.9%) for myelography and 81.3% (95%CI: 69.4–89.3%) for MRI. These results indicate comparable accuracy for myelography and MRI (chi-square = 3.59, 2df, P = 0.17).
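The P values reported alongside the chi-square statistics in the Results can be checked from the statistic and its degrees of freedom. The sketch below uses the closed-form upper-tail probabilities of the chi-square distribution for 1 and 2 degrees of freedom, the only cases that occur in this review; the function name `chi2_upper_tail` is hypothetical, and small differences from the published values are expected because the statistics are rounded.

```python
import math


def chi2_upper_tail(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable with 1 or 2 df."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        # A chi-square variable with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


if __name__ == "__main__":
    reported = [
        ("CT, patient vs disc level", 2.52, 2),
        ("Myelography sensitivity, patient vs disc level", 9.23, 1),
        ("Myelography specificity, patient vs disc level", 1.26, 1),
        ("CT vs myelography", 0.27, 2),
        ("CT vs MRI", 0.51, 2),
        ("Myelography vs MRI", 3.59, 2),
    ]
    for label, stat, df in reported:
        print(f"{label}: chi-square = {stat}, {df}df, P = {chi2_upper_tail(stat, df):.3f}")
```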

Fig. 9 Summary ROC plots of myelography versus MRI

Discussion

We found 14 diagnostic accuracy studies including 940 patients and all evaluating rather old imaging techniques. Summary estimates of sensitivity and specificity of the different imaging techniques varied between 76 and 81%, with moderate to very low quality evidence. Furthermore, CT, myelography and MRI show comparable accuracy.

We found very low quality evidence for the diagnostic accuracy of MRI. Even though MRI is more expensive, clinicians generally prefer MRI to CT, as it does not carry the risks associated with ionising radiation and, unlike myelography, MRI is non-invasive [48]. MRI may also be more useful when surgical treatment is considered, as it can identify tissue properties as well as anatomical structures [48]. These are most likely the reasons why a recent guideline suggests MRI as the most appropriate test to confirm the presence of LDH, regardless of its disappointing diagnostic accuracy.

Strengths and weaknesses

Heterogeneity arose for several reasons. First, the imaging techniques used in the studies included old ones such as 0.5 Tesla [44] or 0.35 Tesla MRI [45]. In clinical practice the results of diagnostic imaging are interpreted with knowledge of history items and physical examination. Furthermore, clinicians frequently state that imaging does not play a crucial role in predicting prognosis or deciding on a management strategy among patients with LDH [4]. This might be one of the reasons why there are no recent studies on the diagnostic accuracy of imaging techniques for detecting LDH. However, older techniques will probably identify fewer underlying causes of back pain than newer imaging techniques. Evaluation of the diagnostic accuracy of advanced diagnostic equipment is therefore needed. Second, the included studies focussed on LDH, but classification of this pathology differed between studies [49]. For example, some studies defined LDH as protruded, extruded, and sequestrated disc [38, 39], but other studies defined LDH as the presence of neuronal compression [35, 36, 42, 46]. Some studies gave no definition of LDH [37, 40]. Third, we combined disc level data with patient level data. Results at disc level including more than one disc level in the same patient may lead to smaller confidence intervals and possibly to an overestimation of diagnostic accuracy. Unexpectedly, confidence intervals were often wider in disc level data compared to patient level data. Fourth, the diagnostic accuracy in this study was possibly overestimated by a high prior probability (48.6 to 98.5%) of LDH. It was reported that about 4% of patients who present with low back pain in a primary care setting have a disc herniation [8]. The high prior probability results in selection bias. Furthermore, patient selection was unclear in many studies. This is important since the interpretation of the test result (posterior probability) depends on its sensitivity and specificity as well as the probability of the disease [50]. Lastly, the use of surgery as a reference standard can easily bias the results due to partial verification [51]. Surgery is often regarded as the best available reference standard. However, not everyone undergoes surgery, only those patients with a very strong suspicion of LDH based on clinical symptoms combined with the results of diagnostic imaging, which leads to (partial) verification bias. In this review, among 669 patients with suspected LDH, 349 (52.2%) patients did not undergo surgical treatment in seven studies [34, 36, 37, 43, 45,46,47]. Verification bias can lead to an increased diagnostic accuracy of the index test; i.e. it will show an increased sensitivity.
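The dependence of the posterior probability on the prior probability can be made concrete with Bayes' theorem. The sketch below is an illustration only: it assumes the summary sensitivity and specificity for CT would apply unchanged in another setting, which the review does not establish, and the function name `posterior_probabilities` is hypothetical. It contrasts the mean prior probability in the CT studies (72.0%) with the roughly 4% reported for primary care.

```python
def posterior_probabilities(sensitivity, specificity, prior):
    """Probability of disease after a positive test, and of no disease after a negative test."""
    if not (0 < prior < 1):
        raise ValueError("prior must lie strictly between 0 and 1")
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    true_neg = specificity * (1 - prior)
    false_neg = (1 - sensitivity) * prior
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # CT summary estimates from this review; priors from the included
    # studies (72.0%) and from primary care (about 4%).
    for prior in (0.72, 0.04):
        ppv, npv = posterior_probabilities(0.813, 0.771, prior)
        print(f"prior {prior:.0%}: P(LDH | positive) = {ppv:.1%}, "
              f"P(no LDH | negative) = {npv:.1%}")
```

Under these assumptions, a positive CT corresponds to a posterior probability of LDH of about 90% at a prior of 72%, but only about 13% at a prior of 4%.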

As far as we know, this is the first meta-analysis comparing diagnostic accuracy between different techniques in low back and/or leg pain with LDH as the suspected underlying pathology.

Implications

Concerning practice, we conclude that the diagnostic accuracy of today’s imaging techniques is unknown. This severely hampers the choice of techniques as well as the interpretation of the outcomes, as no information is available on false positives or false negatives. Future research should focus on the diagnostic accuracy of frequently used imaging techniques (diagnostic test accuracy studies) and on the place of diagnostic imaging within the clinical pathway (diagnostic modelling).

Conclusion

In conclusion, we found no studies evaluating modern diagnostic imaging techniques. For the older techniques we found moderate quality evidence for moderate diagnostic accuracy of CT and myelography, and very low quality evidence for moderate diagnostic accuracy of MRI in patients with suspected lumbar disc herniation. The accuracy of CT, MRI and myelography is comparable.