Introduction

Low back pain (LBP) is responsible for more years lived with disability than any other health condition worldwide [1, 2]. Current literature describes a prevalence ranging between 1.4% and 20% depending on which definition of LBP is used [3]. In the Netherlands, approximately 44% of the population experiences at least one episode of LBP in their lifetime with one in five reporting persistent back pain lasting longer than three months, defined as chronic low back pain (CLBP) [4]. LBP often results in substantial limitations in functional activities and is responsible for high healthcare and socioeconomic costs [5, 6]. In the vast majority of patients with LBP (85–90%) [7] the etiology is unknown, and it is challenging for medical specialists to identify patients who would benefit from either surgical or non-surgical interventions.

Although magnetic resonance imaging (MRI) of the lumbar spine is frequently performed in patients with LBP, its appropriate use and interpretation remain controversial [8, 9]. There are many different image features visible on MRI that could relate to LBP. Conversely, multiple etiologies do relate to spinal degeneration but do not actually cause the perceived pain [10]. Potential causes of LBP and the corresponding image features can be divided into five categories: ‘discogenic’, ‘neuropathic’, ‘osseous’, ‘facetogenic’, and ‘paraspinal’ [11]. The currently available literature is mainly focused on only one of these image feature categories, and thus each study or review addresses only part of the possible etiologies. An overview with a broader scope in which all possible lumbar MRI features are related to LBP is lacking. Radiologists, spinal surgeons, and other clinicians could benefit from such an overview in their reporting and decision-making. The multitude of small-scale studies on single image features currently makes it difficult to judge which study results are reliable and which features are clinically relevant and supported by evidence.

The purpose of this review was to evaluate a wide range of image features and the available evidence for a relation with LBP across five feature categories (i.e., ‘discogenic’, ‘neuropathic’, ‘osseous’, ‘facetogenic’, and ‘paraspinal’) using separate literature searches. In light of the large number of image features and corresponding literature, the focus of this review was to include large-scale studies providing high-quality evidence. This review provides a comprehensive overview and discussion of relevant LBP image features.

Methods

This study was designed as a narrative review and follows the quality assessment guidelines of the Scale for the Assessment of Narrative Review Articles (SANRA), a brief critical appraisal tool for the assessment of non-systematic articles [12]. To provide this overview, the project team, consisting of orthopedic spine surgeons, musculoskeletal radiologists, and a methodologist, created an initial list of image features by reviewing relevant literature. An image feature was defined as a pathology or degenerative process visible on lumbar spine MRI images that possibly relates to LBP. The resulting 29 image features (Table 1) were categorized as ‘discogenic’, ‘neuropathic’, ‘osseous’, ‘facetogenic’, or ‘paraspinal’ [11].

Table 1 List of the 29 image features that were included in this review

Search strategy

A separate literature search was performed for each image feature in MEDLINE. The queries consisted of search terms related to magnetic resonance and LBP, which were used in all queries, and search terms related to a specific image feature (Appendix, Table 4). The feature-specific search terms were constructed from keywords and synonyms related to that specific feature.

Study selection

To ensure that only relevant publications were included, studies were screened for eligibility on predefined selection criteria. First, all search results were filtered using three inclusion criteria: (1) published in or after the year 2000 to ensure that subjects were scanned with modern MRI systems, (2) written in English, and (3) studies involving human subjects, which excluded all animal studies. Publications that met these criteria underwent title and abstract screening. Publications were included for full-text screening when the research aim was to directly assess the relation between one or multiple image features and any form of LBP. Although reviews were not included, publications that were referenced in these reviews were considered for inclusion. Full-text publications were excluded based on two criteria. First, studies with fewer than 100 subjects were excluded to limit our review to larger population-based studies that provide more robust evidence than smaller observational studies. Second, studies that reported on MRI scans acquired with a field strength of less than 1.5 T were excluded to ensure that there were no large differences in image quality and to match clinical practice where 1.5 T and 3 T are most common. Relevant findings from the remaining studies were entered into a data extraction table.
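The objectively checkable part of these selection criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in this review: the `Study` record and its field names are hypothetical, and the title and abstract screening for relevance is a reviewer judgment that is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record; field names are illustrative assumptions."""
    year: int
    language: str
    human_subjects: bool
    n_subjects: int
    field_strength_tesla: float


def passes_initial_filter(study: Study) -> bool:
    """Inclusion criteria applied to all search results."""
    return (
        study.year >= 2000               # published in or after 2000
        and study.language == "English"  # written in English
        and study.human_subjects         # animal studies are excluded
    )


def passes_full_text_criteria(study: Study) -> bool:
    """Exclusion criteria applied after full-text screening."""
    return (
        study.n_subjects >= 100                 # fewer than 100 subjects: excluded
        and study.field_strength_tesla >= 1.5   # below 1.5 T: excluded
    )


def is_eligible(study: Study) -> bool:
    return passes_initial_filter(study) and passes_full_text_criteria(study)
```

For example, a hypothetical 2012 English-language study of 250 human subjects scanned at 1.5 T would pass both steps, whereas the same study with 80 subjects would be excluded at the full-text stage.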

Data extraction

Data extracted included the year of publication, study design, type and duration of low back pain (current LBP (no period specified), CLBP, LBP for a specified period, disabling LBP, sciatica, radiculopathy, and all types of LBP), number of included subjects, main finding per separate image feature (e.g., the odds ratio and the related p-value), and a GRADE score (1–4). Data extraction was performed by one reviewer (JG) and checked by a second (MH).

Best evidence synthesis

The GRADE scoring system was used to assess the certainty of the evidence of each study [13]. GRADE consists of four categories to indicate the quality of evidence: high (GRADE 1), moderate (GRADE 2), low (GRADE 3), and very low (GRADE 4). The study design determines the initial quality of evidence. Randomized trials start in the high category (GRADE 1) and observational studies start in the low category (GRADE 3). The initial rating is downgraded when there is risk of bias, imprecision, inconsistency, indirectness, or publication bias, and upgraded when there is a large magnitude of effect and when all residual confounding would decrease the magnitude of effect.
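The scoring logic described above can be sketched as follows. This is a simplified illustration, not the official GRADE procedure: the function name `grade_score` is hypothetical, and it is an assumption of this sketch that each downgrading or upgrading factor shifts the rating by exactly one category and that the result is clamped to the range 1 to 4.

```python
def grade_score(randomized: bool, downgrades: int, upgrades: int) -> int:
    """Return a GRADE category from 1 (high) to 4 (very low).

    Assumption of this sketch: every factor moves the rating by one category.
    """
    if downgrades < 0 or upgrades < 0:
        raise ValueError("factor counts must be non-negative")
    # Randomized trials start at GRADE 1, observational studies at GRADE 3.
    start = 1 if randomized else 3
    # A higher number means lower quality, so downgrades add and upgrades subtract.
    return max(1, min(4, start + downgrades - upgrades))
```

Under these assumptions, an observational study with one upgrading factor (for instance a large magnitude of effect) and no downgrading factors ends in GRADE 2.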

We classified the evidence for an association of an image feature with LBP as insufficient when either no studies or only one GRADE 3 study met the inclusion criteria; such features were labeled as “insufficient amount of evidence”. When at least one GRADE 2 study or more than two GRADE 3 studies met the criteria, we considered the evidence for the association sufficient. Best evidence synthesis was performed by two reviewers (JG and MH).
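As a minimal sketch, the sufficiency rule can be written as a small function. The function name and return labels are hypothetical. Note that the rule as worded does not cover a feature with exactly two GRADE 3 studies and no GRADE 2 study; the sketch returns a separate label for that case rather than guessing how it was handled.

```python
def evidence_sufficiency(n_grade2: int, n_grade3: int) -> str:
    """Classify the amount of evidence for one image feature."""
    if n_grade2 < 0 or n_grade3 < 0:
        raise ValueError("study counts must be non-negative")
    # At least one GRADE 2 study, or more than two GRADE 3 studies.
    if n_grade2 >= 1 or n_grade3 > 2:
        return "sufficient"
    # No studies at all, or only a single GRADE 3 study.
    if n_grade3 <= 1:
        return "insufficient amount of evidence"
    # Exactly two GRADE 3 studies: not covered by the rule as worded.
    return "not specified by the rule"
```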

The agreement between different studies reporting on the same image feature was summarized by calculating the average evidence agreement (EA) per feature. The EA was defined as the percentage of studies on a feature that found a relation between that feature and LBP. The results of each study were categorized as either positive association, no association, or mixed results. The results of a study were labeled as mixed results when they were inconclusive, such as different findings for different types of LBP. The average EA per image feature was calculated by dividing the number of studies with a positive association by the total number of studies. Studies with mixed results were counted as half a positive study. Consequently, when all studies of a specific image feature showed mixed results the EA was 50%. Based on the EA, all image features were divided into three categories to give an overall summary of the collected evidence: positive association (EA ≥ 67%), mixed results (33% < EA < 67%), and no association (EA ≤ 33%). No association meant that the included studies were inconclusive, and therefore no association was found.
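The EA calculation can be illustrated with a short Python sketch. The function names are hypothetical. One assumption is made explicit: the EA is rounded to the nearest whole percentage before the thresholds are applied, which is inferred from the fact that two positive studies out of three are reported as an EA of 67% and classified as a positive association.

```python
import math
from fractions import Fraction


def evidence_agreement(positive: int, mixed: int, none: int) -> Fraction:
    """EA in percent; a study with mixed results counts as half a positive study."""
    total = positive + mixed + none
    if total == 0:
        raise ValueError("EA is undefined when no studies were included")
    return Fraction(2 * positive + mixed, 2 * total) * 100


def classify_ea(ea_percent: Fraction) -> str:
    """Map an EA to one of the three summary categories."""
    # Assumption: round half up to a whole percentage before thresholding.
    rounded = math.floor(ea_percent + Fraction(1, 2))
    if rounded >= 67:
        return "positive association"
    if rounded <= 33:
        return "no association"
    return "mixed results"


# Two positive studies and one study without an association: 200/3, about 67%.
print(classify_ea(evidence_agreement(positive=2, mixed=0, none=1)))
```

Exact fractions are used instead of floating-point numbers so that borderline values such as 2/3 and 1/3 are rounded predictably.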

Results

All searches combined generated a total of 4472 hits, including duplicates. After title and abstract screening, 251 were assessed in full text. A total of 31 studies met the selection criteria and were included in this review (Fig. 1). In Table 2 an overview of study characteristics is shown. Twelve studies were rated with a GRADE score of two while the remaining 19 studies had a GRADE score of three.

Fig. 1 Flowchart of the selection of studies

Table 2 Overview of the characteristics of included studies

An overview of the best evidence synthesis per image feature is shown in Table 3. Eleven out of the 29 image features showed a positive association (EA ≥ 67%; Modic changes in general [67%], Modic changes type I [100%], disc narrowing [67%], endplate defects [67%], Pfirrmann grade [67%], disc herniation [100%], disc extrusion [100%], ligamentum flavum hypertrophy [100%], central spinal canal stenosis [70%], nerve compression [100%], and muscle fat infiltration [67%]). Two image features appeared to have mixed results (33% < EA < 67%; spondylolisthesis [40%] and paraspinal muscle cross-sectional area [50%]). Seven out of the 29 image features showed no association (EA ≤ 33%; Modic changes type II [33%], Modic changes type III [0%], disc bulging [0%], high intensity zone [31%], disc protrusion [0%], foramen stenosis [33%], and facet fluid sign [25%]).

Table 3 Best evidence synthesis per image feature

For nine image features the evidence was inconclusive (Table 3). For these features, an insufficient amount of evidence was found, meaning that either no studies met our selection criteria or only one GRADE 3 study was included. These features were black discs, synovial cysts, epidural fat, facet tropism, facet arthrosis, facet hypertrophy, spondylolysis, osteomyelitis, and vertebral body infarction.

Discussion

The purpose of this review was to create an overview of the available evidence on the relation between certain MRI image features and low back pain (LBP). The use of MRI in clinical decision making for patients with LBP is debatable. Especially in the older population, an MRI scan of the lumbar spine is almost certain to show degenerative changes of the spine. There is an abundance of small cross-sectional studies that evaluate a small set of image features, while large population-based studies of high methodological quality are scarce. A complex multifactorial disorder such as LBP demands high-quality research which addresses potential causes of LBP and their relation to MRI image features. In this narrative review, the relevant features were carefully reviewed and compared to the currently available literature. Of the 29 considered features, categorized as discogenic, neurogenic, osseous, facetogenic, and paraspinal, 11 features showed a positive association, 2 showed mixed results, 7 showed no association with LBP, and 9 had inconclusive evidence. The relations between features and their associated pain mechanisms were evaluated to provide an overview of features that have the highest probability to be related to LBP. All features are discussed separately per feature category.

Discogenic features

One of the major sources of discogenic pain is disc degeneration, which is a complex process [11]. In a meta-analysis performed by Brinjikji et al. [45], the authors found a strong association between disc degeneration and LBP in the adult population 50 years of age or younger. Disc degeneration is commonly classified with the Pfirrmann grading system [46]. In this review three studies on Pfirrmann grading showed an EA of 67%, resulting in a positive association with LBP. The single paper in which no relation was found between Pfirrmann grading and LBP initially did show a significant relation [21]. However, when adjusted for Modic changes and disc protrusion/hernia, this relation ceased to be significant, illustrating the complex relationship between disc degeneration and LBP. Since the Pfirrmann grade is determined by evaluating the disc height and signal intensity of the nucleus pulposus, it can be expected that these features also have a high EA. For disc height as an individual image feature, we found an EA of 67%, and for low signal intensity of the nucleus pulposus (black disc), we found limited evidence for a relationship with LBP. This might raise the question of whether disc height alone would be sufficient in evaluating disc degeneration. However, the presented evidence in this study on black discs was insufficient, leaving this question unanswered.

Disc bulging is related to disc degeneration and reduced disc height, yet this feature showed an EA of 0%. Only two studies were included on disc bulging, both stating no relation between this image feature and LBP [18, 19]. However, the results of the meta-analysis of Brinjikji et al. contradict this finding [45]. The authors included three studies on disc bulging, using different inclusion and exclusion criteria, and concluded that a strong association exists between disc bulging and LBP [47,48,49]. However, in two of these studies, only adolescent patients were included [48, 49]. It is known that adolescents have a significantly lower prevalence of disc bulging compared to adult populations [50].

Annular fissures, often reported as high intensity zones (HIZ) in the annulus, are related to both disc degeneration and disc bulging as well [51, 52]. Eight studies on annular fissures and HIZ were included with an overall EA of 31%. The recently published systematic review by Teraguchi et al. (1541 included patients) and the narrative review by Cheung et al. both stated that there is no uniform consensus in the present literature, which is in line with the results we present [51, 53]. However, annular fissures can be difficult to identify in MR images, especially when the image quality is low. Berger-Roscher et al. showed that MR imaging systems with varying field strengths demonstrated differences in their ability to visualize annular fissures and concluded that clinically used systems often do not visualize annular fissures well [54]. This limitation of MR imaging systems currently used in clinical practice may be the cause of inconsistent results in different studies. Further research with high resolution MR images is necessary to determine whether a relationship between annular fissures and LBP exists.

Neurogenic features

Neurogenic features are mostly associated with different types of neuropathic pain such as radiculopathy or sciatica. Even though our search strategy specifically targeted research that included LBP, studies that reported neuropathic pain were also included. The EA was based on all included studies, regardless of the reported type of pain. Of all 31 included studies, one focused on neuropathic pain only [28], and eight studies focused on both neuropathic pain and LBP [15, 16, 18, 21, 22, 37, 42, 43].

In total six studies were included on disc herniation, disc extrusion, disc protrusion (EA 100%, 100%, and 0% respectively), or a combination of the three. The difference between disc extrusion and protrusion is explained by the mechanism of pain. Both often cause neuropathic pain due to nerve compression [11]. Disc extrusion is distinguished from protrusion by the damage or tear of the annulus. This in itself triggers an immune response and inflammation which can induce additional LBP [11]. The pain mechanisms of all other neurogenic image features are either mechanical pressure of tissue on nerve roots or nerve stimulation by proinflammatory cytokines and neurotransmitters [11]. Of these features only spinal canal stenosis (EA of 70%), nerve compression (EA of 100%), and ligamentum flavum hypertrophy (EA of 100%) had an EA higher than 67%.

Surprisingly, foraminal stenosis showed no association with LBP (33%) even though the same pain mechanisms can be at play as with spinal canal stenosis and nerve compression. Pain symptoms may have a different distribution depending on the type and location of neurovascular compression [55]. Since foraminal stenosis causes compression of specific nerves, symptoms are usually similar to radiculopathy [11]. In total three studies on foraminal stenosis were included in the current review. Two studies showed no association with LBP and evaluated LBP specifically without looking at any types of neuropathic pain [19, 25]. The third included study, written by Janardhana et al. (2010), evaluated LBP as well as neuropathic pain and showed a positive association with pain in general [18]. This illustrates the difference between LBP and neuropathic pain and might explain the different findings within neurogenic image features.

Osseous features

Osseous features can be divided into two different categories. The first comprises features related to pathologies within the bone that cause pain, such as Modic changes and endplate defects. The second comprises structural abnormalities causing instability or spinal compression, such as spondylolisthesis or spondylolysis.

Modic changes are often reported and used in clinical decision making, yet their etiology is not well understood [11]. Our findings show that both Modic changes in general (67%) and type I changes (100%) are related to LBP. However, type II (33%) and III (0%) changes showed no association with LBP. For type I changes, the general agreement is that pain is induced by an inflammatory reaction, but the cause of inflammation is still debated [11, 56, 57]. This relation with LBP is mostly acknowledged [58], which does not apply to type II and III Modic changes [11]. This is also supported by the findings of this review. Similarly, Brinjikji et al. reported a significant association between Modic type I and LBP while Modic changes as a whole (Modic 1–3) did not have an association with LBP [45]. In contrast, Herlin et al. stated that the associations between Modic changes and LBP-related outcomes are inconsistent in current literature [59].

Endplate defects include Schmorl nodes, fractures, avulsions/erosions, and calcifications, all of which have been considered to be clinically associated with LBP [57]. Endplate defects identified using discography have also been shown to be associated with LBP [60]. This corroborates the EA of 75% for the association between endplate defects and LBP found in our study. These findings are in line with a recently published systematic review, which provides moderate quality evidence of an association between structural endplate defects and LBP [61]. More specifically, consistent evidence was shown for an association of erosion, sclerosis, and Schmorl nodes with LBP [61]. Moreover, Schmorl nodes are known to be associated with Modic changes and other degenerative features [57].

Spondylolisthesis is widely used in clinical decision making. However, in our study we included five studies showing a combined EA of 40% for LBP. Two of these studies showed no association with LBP specifically [19, 20], whereas the remaining three studies showed either a positive association or mixed results while including both LBP and neuropathic pain [15, 17, 22]. Both Ishimoto et al. and de Schepper et al. found no association with LBP and a positive association with radiating pain [15, 17]. These results are supported by the review of Gagnet et al. in which radiculopathy was mentioned as the most important symptom of spondylolisthesis [62]. The mixed result found in our study is therefore likely due to our search strategy, which was predominantly focused on LBP. Spondylolysis, which is often associated with spondylolisthesis, showed insufficient evidence. While often asymptomatic, patients with symptoms often suffer from LBP [62, 63]. The diagnosis of spondylolysis is rarely performed using MRI images since cortical bone is hypointense in both T1 and T2 weighted images. Therefore, fracture of the vertebral arch is mainly assessed using X-ray or CT images, which might explain the inconclusive evidence found in this review.

Facetogenic features

In an estimated 15% of all LBP patients, the pain is facetogenic [64, 65]. Pain is often caused by inflammation of the zygapophysial joint, which can directly cause pain [64]. This inflammation also causes swelling which possibly induces neurovascular compression [64]. In this review a total of three studies were included [19, 25, 34], which resulted in insufficient evidence on all facetogenic features except for facet fluid sign.

One paper showed no association between LBP and facet fluid sign [34] and one showed mixed results [25]. This feature shows an accumulation of fluid within the facet joint and is often associated with vertebral instability and spondylolisthesis [66]. However, the study performed by Shinto et al. reports no significant relation between facet effusion and L4-spondylolisthesis, which contradicts its relation with vertebral instability [34]. The authors also found that the prevalence of facet effusion did not increase with age, questioning its relation with degenerative changes.

Similar to spondylolysis, facetogenic pain is often associated with neuropathic pain, which was not specifically targeted by our search strategy. Furthermore, cortical bone, and therefore the zygapophysial joint, is better visualized in CT images.

Paraspinal features

The paraspinal features reviewed in this study focused on the paraspinal muscles and consisted of muscle fat infiltration (FI) (EA of 67%) and muscle cross-sectional area (CSA) (EA 50%). FI is related to muscle activity whereas CSA relates to muscle strength. The paraspinal muscles stabilize the spine and therefore protect spinal structures from potentially damaging stresses [67]. It is still debated whether paraspinal muscle dysfunction could be a primary cause of LBP.

In total four studies were included in this review [28, 32, 35, 41], of which three had mixed results since they included different muscles in their research, i.e., multifidus, erector spinae, psoas, and quadratus lumborum. When separating the results per muscle group, the multifidus muscle showed a positive association (67%) between FI and LBP, and mixed results (50%) between CSA and LBP. Furthermore, the erector spinae only showed a positive association (100%) between FI and LBP while all other muscles showed no association with either feature. This is supported by the review by Ranger et al. that showed mixed results as well, with a strong relation only between multifidus CSA and LBP [67]. All other muscles showed either conflicting evidence or no association between the two image features and LBP.

Nevertheless, the discussion remains on whether pain is generated by a certain muscle morphology or vice versa. Suri et al. systematically reviewed the relation between paraspinal muscle characteristics and future LBP. The authors found an association of both multifidus CSA and erector spinae FI with future LBP, while all other muscle characteristics had limited evidence or no association with LBP [68]. On the other hand, Cooley et al. found that neurocompressive disorders seem to alter muscle morphology at or below the affected level [69]. They conclude that muscle characteristics change due to present radiculopathy instead of the other way around. These contradicting results illustrate the lack of evidence on a causal relation between these paraspinal image features and LBP.

Overall findings

The purpose of this review was to create an overview of the available evidence on the relation between certain MRI image features and LBP. Ideally, this results in a list of features that are strongly associated with LBP, which will benefit radiological reporting and clinical decision making. Due to the intricate and conflicting nature of this research field, caution is warranted when drawing conclusions from the literature. In this narrative review, the relevant features were carefully reviewed and compared to the currently available literature. The various relations between MRI features and their associated pain mechanisms were evaluated to provide a list of features that have the highest probability to be related to LBP. This list comprises type I Modic changes, disc degeneration, endplate defects, disc herniation, spinal canal stenosis, nerve compression, and muscle fat infiltration. Furthermore, this review reveals the gaps in the current literature and points out features for which robust evidence is lacking and for which large-scale studies are needed.

Limitations

The use of MRI in patients with LBP remains a controversial discussion and a narrative review with such a broad scope inherently has limitations. The first limitation is the distinction between different types of pain. There is a wide variety in types and duration of LBP as reported by the included studies as shown in Table 2. This study is focused on LBP, making it difficult to draw concise conclusions. Particularly, neurogenic features are mostly associated with neuropathic (radiating) pain, which was not specifically targeted in the search strategy. As many of the studies did not make a distinction between LBP and radiating pain, we chose to include all types of reported pain to create a clear overview of the present literature.

Second, the quality of evidence was assessed using the GRADE score, as part of the evidence synthesis [13]. All included studies were initially scored as GRADE 3 since no studies with an experimental design were included. This score was only upgraded to GRADE 2 when either a large effect was shown or a large patient group was included. The way this scoring system was implemented therefore does not accurately assess the quality of evidence since it is only affected by effect size. Consequently, no weighting of features was performed in this review. However, it was still used to determine whether a sufficient amount of evidence was included to draw any conclusions.

Third, an evidence agreement of 67% or higher was considered as sufficient evidence for a positive association between LBP and an image feature. Even though this percentage might seem low, it still depicts the trend in current literature. Also, only 31 studies were included of the combined 4472 hits found with the different searches. This large number of hits also includes all the duplicates, which was inevitable with our chosen method. Nevertheless, only a small number of studies met our inclusion criteria, which were deliberately defined such that studies with small sample sizes were excluded. This enabled this review to be more concise, keep the number of included papers at a reasonable level, and only focus on papers with a higher level of evidence. Our findings also display the limited number of studies on MRI and LBP with at least 100 subjects included, indicating the need for larger studies in this field.

Lastly, as a criterion for this narrative review we included only studies published in or after the year 2000. Consequently, relevant studies published before the year 2000 might have been excluded. The methods and technology used in medical imaging have improved significantly in the past 20 years, which could affect the accuracy and reliability of older studies, and with that the results in this review. However, excluding studies in which strong relationships between certain image features and LBP have been established before 2000 might have resulted in false negative results in our review. Comparison of the results with existing literature regardless of publication date mitigates this limitation by ensuring that relevant studies published before the year 2000 were discussed.

Conclusion

This study resulted in a comprehensive overview of which diagnostic lumbar MRI image features have the highest probability of a strong relationship with LBP. These features include type I Modic changes, disc degeneration, endplate defects, disc herniation, spinal canal stenosis, nerve compression, and muscle fat infiltration. Also, several features which are generally associated with LBP show no apparent relationship with LBP. This overview supports interpreting lumbar MRI scans of LBP patients and shows which features are related to the perceived pain. The findings of this review can be used to improve clinical decision-making for patients with LBP based on MRI images.