Measuring and reporting of vertebral endplate bone marrow lesions as seen on MRI (Modic changes): recommendations from the ISSLS Degenerative Spinal Phenotypes Group

Purpose: The positive association between low back pain and MRI evidence of vertebral endplate bone marrow lesions, often called Modic changes (MC), offers the exciting prospect of diagnosing a specific phenotype of chronic low back pain (LBP). However, imprecision in the reporting of MC has introduced substantial challenges, as variations in both imaging equipment and scanning parameters can impact conspicuity of MC. This review discusses key methodological factors that impact MC classification and recommends guidelines for more consistent MC reporting that will allow for better integration of research into this LBP phenotype.
Methods: Non-systematic literature review.
Results: The high diagnostic specificity of MC classification for a painful level contributes to the significant association observed between MC and LBP, whereas low and variable sensitivity underlies the between- and within-study variability in observed associations. Poor sensitivity may be due to the presence of other pain generators, to limited MRI resolution, and to the imperfect reliability of MC classification, which lowers diagnostic sensitivity and thus influences the association between MC and LBP. Importantly, magnetic field strength and pulse sequence parameters also impact detection of MC. Advances in pulse sequences may improve reliability and prove valuable for quantifying lesion severity.
Conclusions: Comparison of MC data between studies can be problematic. Various methodological factors impact detection and classification of MC, and the lack of reporting guidelines hinders interpretation and comparison of findings. Thus, it is critical to adopt imaging and reporting standards that codify acceptable methodological criteria.


Authors
Fields, Aaron J; Battié, Michele C; Herzog, Richard J; et al.

Keywords
low back pain; Modic changes; bone marrow lesion; endplate damage; magnetic resonance imaging

What are Modic changes and why are they important?
Modic et al. [1] and de Roos et al. [2] separately observed that the bone marrow near damaged endplates in LBP patients had a distinct appearance when viewed with MRI. Compared to the adjacent normal bone marrow, the endplate lesions with active inflammation and fibrovascular replacement of the hematopoietic marrow appeared hypointense on T1-weighted images and hyperintense on T2-weighted images (type 1 changes; MC1); the endplate lesions with fatty replacement of the marrow appeared hyperintense on T1-weighted images and isointense to slightly hyperintense on conventional T2-weighted images (type 2 changes; MC2). A third type of endplate lesion, characterized by sclerotic subchondral bone, appeared hypointense on both T1- and T2-weighted images (type 3 changes; MC3) [3]. The etiology, risk factors, and management of Modic changes are the focus of a large body of work that has been reviewed elsewhere [4][5][6][7][8]. Briefly, it appears Modic changes may be either transient or permanent stages of a chronic pathologic process [8][9][10][11]; they associate with age and disc degeneration [12,5]; and they predominate in the lower lumbar spine (L4-S1) [12]. Perhaps most importantly, systematic reviews suggest a positive association between LBP and Modic changes [5,6].
Yet, the clinical relevance of Modic changes varies widely between studies [4,5,13], and controversies remain. Such inconsistencies may be due in part to misclassification and to imprecision in reporting of Modic changes. While some investigators follow standardized imaging and grading procedures, many utilize equipment, imaging sequences or reporting methods that can inadvertently misrepresent Modic changes. Also, the frequent omission of basic methodological details makes study comparisons difficult. As a result, discrepancies between study findings may reflect differences in imaging procedures, particularly between older and newer imaging technologies. This issue is becoming more apparent and more important as clinical interest in Modic changes grows. Thus, we recommend that investigators reporting Modic change data follow simple reporting guidelines (Box 1). These guidelines are not meant to constrain investigators to one particular methodology; instead, they are meant to ensure that the methodological details that impact the appearance and detection of Modic changes are clearly reported.

Identifying Modic changes: the high cost of misclassification
Misclassification prevents a clear understanding of the clinical significance of Modic changes and is a potentially important source of discrepancy between studies. The conventional MRI assessment of Modic changes amounts to a binary classification test. Owing to the difficulty of measuring test performance against histopathologic findings as ground truth, prior studies compared Modic classification to clinical pain diagnoses from provocative discography. Although the usefulness and safety of provocative discography are controversial [14,15], the accuracy of provocative discography can be quite high (specificity of 0.94, i.e. a false-positive rate of 6%) if performed using a low-pressure technique [16]. A review of the performance of Modic classification shows that high specificity may underlie the significant associations between Modic changes and LBP. Specifically, among six studies that reported the diagnostic performance of Modic classification for identifying a painful disc concordant with a positive discogram, the specificity was over 95% [17][18][19][20][21] in all but one [22] (Table 1). In other words, among patients with chronic LBP receiving discography, observation of a Modic change has a low false-positive rate at a given spinal level. The specificity may be higher for MC1 than MC2 or MC3, although it is difficult to compare lesion types since some lesions show mixed elements of both MC1 and MC2 or MC2 and MC3 [23].
Conversely, the low and variable sensitivity of Modic classification for detecting discography-concordant pain may contribute to the weak associations with symptoms and to inter-study variability. For example, among the same six studies, the sensitivity of Modic classification for identifying a painful disc was less than 50%; moreover, it was highly variable between studies, ranging from 14% to 48% (Table 1). Thus, the absence of a Modic change is not sufficient for ruling out pain at a given spinal level.
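As a minimal sketch of how the sensitivity and specificity discussed above follow from a level-by-level comparison of Modic classification against discography, the example below computes both from a 2x2 table. The class name, function names, and the counts are hypothetical and chosen only for illustration; they are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Level-by-level counts of Modic classification against discography."""
    true_pos: int   # Modic change present, discogram concordant (painful)
    false_pos: int  # Modic change present, discogram negative
    false_neg: int  # no Modic change, discogram concordant (painful)
    true_neg: int   # no Modic change, discogram negative


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of painful levels that show a Modic change."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-painful levels that show no Modic change."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts for illustration only (not data from any cited study).
example = ConfusionCounts(true_pos=24, false_pos=3, false_neg=76, true_neg=97)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

With counts of this shape, a Modic change rarely appears at a non-painful level, yet most painful levels show no Modic change, which is the pattern of high specificity and low sensitivity described in the text.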
The low and variable sensitivity can potentially reflect the existence of other pain generators. It may also be that poor sensitivity of Modic changes is related to limited spatial resolution, signal-to-noise, or lack of fat-saturation in the MRI exam. For example, one study reported that only 11% of fibrovascular marrow lesions and 62% of fatty marrow lesions identified on histologic sections were visible with a conventional imaging protocol [24]. As we discuss in the next section, this finding underscores the importance of reporting technical imaging parameters when interpreting the reported associations between Modic changes and LBP (Box 1).
Finally, the imperfect accuracy, or reliability, of Modic classification directly impacts diagnostic sensitivity and thereby contributes to inter-study variability. Jones et al. [25] reported good inter-rater reliability (κ = 0.85) for 5 raters, with repeatability for the individual raters ranging from κ = 0.71-1.00. Other studies reported modest to good agreement between raters (κ = 0.64-0.85 [26,12,9,27,28]; Table 2). The impact of rater agreement and its variation between studies highlights the need to measure and report inter-rater reliability values (Box 1).
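For readers less familiar with the κ statistic, the following sketch computes Cohen's kappa for two raters assigning categorical Modic labels. It is illustrative only: the cited studies may have used other variants (for example, a multi-rater kappa for five raters), and the function name and the ratings below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight endplates by two raters.
rater_1 = ["none", "MC1", "MC2", "MC1", "none", "MC2", "none", "MC1"]
rater_2 = ["none", "MC1", "MC2", "MC2", "none", "MC2", "none", "none"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")
```

Because kappa corrects raw agreement for agreement expected by chance, it can be considerably lower than the simple percentage of matching labels.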
To emphasize the impact of imperfect reliability, consider the hypothetical effects of false negative classifications on the relationship between Modic changes and LBP (Figure 1). In three scenarios with different sample sizes, rater disagreements resulting in false negative classifications were used to calculate inter-rater reliabilities and odds ratios (ORs). For a given sample size, ORs were highly sensitive to inter-rater reliability, especially for inter-rater agreements below κ = 0.80. Moreover, for the average inter-rater agreement of the six studies reported in Table 2 (κ = 0.788), associations between Modic changes and LBP ranged from a mean OR = 3.83 (CI 0.43-34.85, not significant) to a mean OR = 9.99 (CI 2.14-52.16, p < 0.001). Together these data suggest that imperfect reliability of Modic classification could be a source of discrepancies between studies. Reliability is expected to improve with implementation of standardized imaging protocols and adoption of quantitative classification schemes that are based on a continuous rather than categorical measurement scale, thereby increasing sensitivity and bringing much-needed clarity to the complex relationship between endplate bone marrow lesions and LBP.
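The kind of simulation described here can be sketched as follows. This is not the authors' code: it assumes the prevalences given in the Figure 1 description (46% in LBP patients, 6% in controls), treats kappa as agreement between the true and the degraded ratings, and adds 0.5 to every cell of the 2x2 table when a cell is zero. All function names are hypothetical.

```python
import random
from statistics import mean

LBP_PREVALENCE = 0.46      # Modic prevalence in LBP patients (Figure 1 description)
CONTROL_PREVALENCE = 0.06  # Modic prevalence in asymptomatic controls


def kappa_binary(truth, rated):
    """Cohen's kappa between two lists of 0/1 labels."""
    n = len(truth)
    p_obs = sum(t == r for t, r in zip(truth, rated)) / n
    p_t, p_r = sum(truth) / n, sum(rated) / n
    p_exp = p_t * p_r + (1 - p_t) * (1 - p_r)
    return 1.0 if p_exp == 1.0 else (p_obs - p_exp) / (1 - p_exp)


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table; adds 0.5 to each cell if any is zero (assumption)."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def simulate(n_total, n_false_neg, n_repeats=100, seed=0):
    """Mean kappa and mean OR after randomly hiding n_false_neg true lesions."""
    rng = random.Random(seed)
    half = n_total // 2
    n_pos_lbp = round(LBP_PREVALENCE * half)
    n_pos_ctl = round(CONTROL_PREVALENCE * half)
    # Each subject is (group, truth); group 1 = LBP, group 0 = control.
    subjects = ([(1, 1)] * n_pos_lbp + [(1, 0)] * (half - n_pos_lbp)
                + [(0, 1)] * n_pos_ctl + [(0, 0)] * (half - n_pos_ctl))
    truth = [t for _, t in subjects]
    positives = [i for i, t in enumerate(truth) if t == 1]
    kappas, ors = [], []
    for _ in range(n_repeats):
        missed = set(rng.sample(positives, n_false_neg))
        rated = [0 if i in missed else t for i, t in enumerate(truth)]
        kappas.append(kappa_binary(truth, rated))
        a = sum(r for (g, _), r in zip(subjects, rated) if g == 1)
        c = sum(r for (g, _), r in zip(subjects, rated) if g == 0)
        ors.append(odds_ratio(a, half - a, c, half - c))
    return mean(kappas), mean(ors)


for k in (0, 5, 10):
    kappa, or_mean = simulate(n_total=100, n_false_neg=k)
    print(f"false negatives={k:2d}  kappa={kappa:.3f}  mean OR={or_mean:.2f}")
```

Confidence intervals and significance testing are omitted to keep the sketch short; they would be needed to reproduce the cutoff values of κ marked in the figure.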

The importance of MRI field strength, pulse sequence, and fat suppression
The ability of MRI to interrogate the endplate bone marrow is limited by instrument precision. MRI precision is influenced by field strength, acquisition matrix, and pulse sequence parameters because these factors directly affect image resolution, signal-to-noise, and contrast-to-noise ratios. In addition, field strength and fat saturation (for T2) alter the T1 and T2 relaxation times and tissue visualization. All of these effects impact the appearance of water and fat signals, which are used to distinguish MC1 from MC2. For example, when older magnets with field strengths less than 1.0T are used to image the vertebral bone marrow, they tend to greatly mitigate chemical shift, susceptibility artifacts, and flow artifacts compared to 1.5T magnets. This improves the clarity of MC1, but makes it more difficult to distinguish MC2. Conversely, the 1.5T magnets make it more difficult to identify MC1 because marrow inhomogeneities are more pronounced compared to magnets with field strengths less than 1.0T [29]. Newer magnets with field strengths of 3.0T or higher have better conventional fat saturation than 1.5T magnets [30]. This reduces the tendency to overlook the marrow edema that occurs with MC1, although it is unclear whether this results in a clinically important difference. Variable field strengths also cause discrepancies in the observed frequency of different types of Modic changes. For example, whereas MC1 were 3-4 times more prevalent on 0.3T scanners, MC2 were identified twice as often with 1.5T scanners [29]. Generally speaking, the higher spatial resolution and greater dynamic range and signal- and contrast-to-noise ratios of 1.5T and 3.0T magnets, as compared to weaker magnets, make it easier to identify Modic changes. Thus, magnetic field strength should be reported in studies of Modic changes (Box 1).
Pulse sequence parameters also play a key role because they can dramatically alter the marrow signal. On T1-weighted images, fat is bright and fluid is dark. This makes fat conspicuous and helps radiologists identify boundaries between bone marrow lesions and normal marrow, which is useful for studying the bone marrow. Also, the short echo time of T1-weighted images provides a high intrinsic signal-to-noise ratio, which enhances anatomic detail. On T2-weighted images, fluid appears bright and fat is variable, depending on whether the images are acquired as conventional spin echo (moderate-to-high fat signal) vs. fast spin echo (high fat signal). MC1 and MC2 both were originally defined as appearing hyperintense on T2-weighted MRI. However, many investigators now use fat-saturated T2-weighted sequences, which cause MC1 to appear hyperintense and MC2 to appear hypointense (Figure 2). Spectral fat suppression has the added benefit of increasing the dynamic range, and if combined with modest echo times of conventional T2-weighted sequences (60-80 ms), it can do so while preserving anatomic detail. The efficiency of fat suppression for Modic classification depends on the field homogeneity, which is better in the center of the magnet bore and improves with higher-order shimming. Spectral fat suppression can only be effectively used at field strengths greater than 1.0T. A number of fat suppression sequences are often used to supplement T1- and T2-weighted sequences, including chemical shift imaging sequences (such as Dixon) and short tau inversion recovery (STIR) sequences. STIR sequences can be used at low field strengths, and they provide more uniform fat saturation than fast spin echo sequences. Depending on the type of spectral fat suppression, STIR sequences can be less susceptible to magnetic field inhomogeneities. Authors should report the type of fat suppression used (Box 1).

Advances in the classification and detection of endplate bone marrow lesions
Recent advances in image analysis techniques may improve the reliability of Modic classification while at the same time providing the quantitative methodologies that are needed to evaluate treatments. These advanced techniques are exploratory, with small studies performed mostly at single imaging centers. One strategy involves contouring the shape of the Modic change, which allows classification to be based on continuous measurements of size rather than on "present or absent" categorizations (Figure 3). For example, Wang et al. [28] reported outstanding intra- and inter-rater reliability values for three quantitative indices measured from manual contours of mid-sagittal slices: affected/unaffected vertebral area ratio (intra-rater: 0.96; inter-rater: 0.81), cerebrospinal fluid-adjusted mean signal intensity of the Modic change (intra-rater: 0.99; inter-rater: 0.92), and total signal intensity of the Modic change (intra-rater: 0.96; inter-rater: 0.92). Semi-automated contouring approaches [31] improve measurement efficiency and may further enhance the reliability of indices based on Modic change size and intensity.
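To make the contour-based indices more concrete, the sketch below derives three similar quantities from a single slice and a set of binary masks. The exact definitions used by Wang et al. [28] are not given here, so the formulas are assumptions: the area ratio is taken as lesion pixels over unaffected vertebral pixels, the adjusted mean intensity as lesion intensity divided by mean cerebrospinal fluid intensity, and the total intensity as the sum of the adjusted lesion intensities. The function name and inputs are hypothetical.

```python
def modic_indices(image, vertebra_mask, lesion_mask, csf_mask):
    """Size and intensity indices for one slice.

    image: 2D list of signal intensities.
    *_mask: 2D lists of booleans with the same shape as image.
    """
    lesion_values, csf_values, unaffected_pixels = [], [], 0
    rows = zip(image, vertebra_mask, lesion_mask, csf_mask)
    for img_row, vert_row, les_row, csf_row in rows:
        for value, in_vert, in_lesion, in_csf in zip(img_row, vert_row, les_row, csf_row):
            if in_vert and in_lesion:
                lesion_values.append(value)
            elif in_vert:
                unaffected_pixels += 1
            if in_csf:
                csf_values.append(value)
    if not lesion_values or not csf_values or unaffected_pixels == 0:
        raise ValueError("lesion, CSF, and unaffected regions must be non-empty")
    csf_mean = sum(csf_values) / len(csf_values)
    # Assumed normalization: divide each lesion pixel by the mean CSF signal.
    adjusted = [v / csf_mean for v in lesion_values]
    return {
        "area_ratio": len(lesion_values) / unaffected_pixels,
        "mean_intensity": sum(adjusted) / len(adjusted),
        "total_intensity": sum(adjusted),
    }


# Tiny hypothetical 2x2 "slice" for illustration.
image = [[100.0, 200.0], [50.0, 400.0]]
vertebra = [[True, True], [True, False]]
lesion = [[False, True], [False, False]]
csf = [[False, False], [False, True]]
print(modic_indices(image, vertebra, lesion, csf))
```

Because each index is continuous, rater agreement on such measurements would typically be summarized with an intraclass correlation coefficient rather than kappa.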
A second strategy for quantitative and objective classification of endplate bone marrow lesions is based on the assessment of bone marrow lesion composition rather than on lesion size or structure. Measuring bone marrow composition is especially advantageous for evaluating lesion progression and for monitoring the response to treatments, where biochemical changes in the marrow compartment are likely to precede any visible structural changes. MR imaging based on chemical shift encoding-based water-fat imaging enables the spatially resolved assessment of bone marrow fat at trabecular sites with heterogeneous red marrow distribution [32]. For example, multi-echo gradient echo acquisitions with echo time steps that result in a water-fat phase difference different from 0 and 2π give robust water-fat separation, and accurate fat quantification is possible with techniques that incorporate a precalibrated multi-peak fat spectrum in the signal model [33]. These methods may be useful for quantifying the extent of marrow edema and fatty replacement that coincide with MC1 and MC2 (Figure 3). Compared to single voxel spectroscopy, water-fat MRI sequences also facilitate measuring the spatial heterogeneity in marrow content, which is useful for classifying lesions that exhibit both edematous and fatty changes. Currently, classification of these "mixed-type" Modic changes [34,35] with T1 and T2-weighted sequences alone is highly subjective.
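Two quantities underlie the water-fat approach described above: the phase that accrues between water and fat over one echo time step, and the fat fraction computed from the separated signals. The sketch below illustrates both under simplifying assumptions: a single fat peak about 3.4 ppm from water (the cited methods use a multi-peak spectrum) and no correction for relaxation or noise bias. The function names are hypothetical.

```python
import math

GAMMA_BAR_HZ_PER_T = 42.577e6  # proton gyromagnetic ratio divided by 2*pi
FAT_SHIFT_PPM = 3.4            # assumed single-peak water-fat chemical shift


def water_fat_phase_step(field_t, delta_te_s):
    """Water-fat phase difference (radians, wrapped to [0, 2*pi)) per echo step."""
    freq_hz = GAMMA_BAR_HZ_PER_T * field_t * FAT_SHIFT_PPM * 1e-6
    return (2 * math.pi * freq_hz * delta_te_s) % (2 * math.pi)


def fat_fraction(water, fat):
    """Signal fat fraction from separated water and fat magnitudes."""
    total = water + fat
    if total <= 0:
        raise ValueError("water + fat signal must be positive")
    return fat / total


# Hypothetical echo spacing of 1.0 ms at 3.0T.
phase = water_fat_phase_step(field_t=3.0, delta_te_s=1.0e-3)
print(f"phase step = {phase:.2f} rad")
print(f"fat fraction = {fat_fraction(water=60.0, fat=40.0):.2f}")
```

An echo spacing whose phase step is close to 0 or 2π makes water and fat indistinguishable between echoes, which is why the text specifies a phase difference away from those values.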
A third strategy involves diffusion-weighted imaging, which may help differentiate patients that have endplate bone marrow lesions with degenerative vs. infectious etiologies. Using diffusion-weighted MRI, Patel et al. [36] found that patients with well-marginated, linear regions of high signal intensity situated within adjacent vertebrae at the interface between normal and abnormal bone marrow were predominantly infection-free; conversely, absence of this "claw sign" was associated with discitis/osteomyelitis. Those authors hypothesized that a gradual, progressive degenerative process results in a well-defined border between the normal and affected bone marrow, although this remains unconfirmed. Apparent diffusion coefficient (ADC) maps remove the T2 shine through of diffusion-weighted imaging and thereby provide quantifiable signal that is directly proportional to the diffusivity of water inside the tissue. ADC values can accurately distinguish between infectious spondylitis and MC1 [37,38]. One limitation with this approach is the sensitivity of image quality and ADC values to chosen strength and timing of the gradients (b-values) [38].
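The dependence of ADC on the chosen b-values can be seen from the standard two-point, mono-exponential estimate, sketched below. The signal values and b-values are hypothetical, and clinical ADC maps are usually fitted voxel-wise by the scanner software, often from more than two b-values.

```python
import math


def adc_two_point(signal_low_b, signal_high_b, b_low, b_high):
    """ADC (mm^2/s) from two diffusion-weighted signals.

    Assumes mono-exponential decay S(b) = S0 * exp(-b * ADC),
    with b-values given in s/mm^2.
    """
    if signal_low_b <= 0 or signal_high_b <= 0:
        raise ValueError("signals must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(signal_low_b / signal_high_b) / (b_high - b_low)


# Hypothetical voxel: S(b=0) = 1000, S(b=800) generated with ADC = 1.5e-3 mm^2/s.
s0 = 1000.0
s800 = s0 * math.exp(-800 * 1.5e-3)
print(f"ADC = {adc_two_point(s0, s800, 0, 800):.2e} mm^2/s")
```

Dividing the two signals cancels the shared T2 weighting, which is how the ADC map removes the T2 shine-through mentioned above.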
A complementary strategy for identifying spinal levels with endplate bone marrow lesions involves using pulse sequences that enhance visualization of endplate damage. Damage to the cartilage endplate and subchondral bone is believed to be an important factor in the etiology of endplate bone marrow lesions because damage promotes cross-talk between inflammatory factors expressed in the disc and the quiescent bone marrow [8,39]. However, conventional T2-weighted sequences used in the spine (echo time = 60-80 ms) are unable to show the cartilage endplate, because the cartilage has short T2 values, and thus, its signal is not captured in sequences with long echo times. Newer sequences such as ultra-short time-to-echo (UTE) [40][41][42] can overcome this limitation. For example, Law et al. [42] first demonstrated the feasibility of assessing cartilage endplate integrity using UTE MRI, which simultaneously improves visualization of endplate morphology (Figure 3). In addition to assessing the morphology of the cartilage endplate, UTE also has the capability of assessing its biochemical composition [41], which could potentially be used to assess early degeneration. In the future, quantitative measurements of cartilage endplate degeneration and damage may provide a more comprehensive evaluation of endplate bone marrow lesions [43].
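The reason long echo times miss short-T2 tissue follows from simple exponential transverse decay, illustrated in the sketch below. The T2 value of 10 ms is a hypothetical placeholder rather than a measured value for the cartilage endplate, and the UTE echo time of 0.236 ms assumes that the "10/0.236" in the Figure 3 description denotes repetition time/echo time in milliseconds.

```python
import math


def remaining_signal_fraction(echo_time_ms, t2_ms):
    """Fraction of transverse signal left at the echo time, exp(-TE / T2)."""
    if t2_ms <= 0:
        raise ValueError("T2 must be positive")
    return math.exp(-echo_time_ms / t2_ms)


ASSUMED_SHORT_T2_MS = 10.0  # hypothetical short-T2 tissue, for illustration only

for label, te_ms in (("conventional T2-weighted", 70.0), ("UTE", 0.236)):
    fraction = remaining_signal_fraction(te_ms, ASSUMED_SHORT_T2_MS)
    print(f"{label:25s} TE = {te_ms:6.3f} ms -> {fraction:.4f} of signal remains")
```

Under these assumptions almost no signal survives to a 70 ms echo, whereas nearly all of it is still present at the UTE echo time.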

Summary and recommendations
Comparison of Modic change data between studies can be problematic. Depending on various technical factors such as imaging sequence and magnetic field strength, MC1 can be detected at greatly varying frequencies and MC2 can appear hyper- rather than hypointense. These variations may result in inconsistencies in the apparent relationship between Modic changes and LBP. Problems with comparability can be even greater in longitudinal studies where the comparison is between images acquired at baseline with older MR units that have lower signal-to-noise ratios and images acquired at follow-up with newer MR units with improved signal-to-noise ratios and fat-saturated sequences. Even when identical sequences are used with the same imager at all time points, the subjective and categorical nature of Modic classification limits inter-rater and intra-rater reliability. All of these factors affect the reported associations between Modic changes and LBP and may underlie the wide variability between studies.
Overall, as research and clinical interest in Modic changes increases, and as stronger magnets and newer sequences gradually replace older ones, it will be critical to appreciate how technological advancements can influence reported clinical associations and comparability of study results. It will also be necessary to ensure the methodological details that accompany Modic change data are sufficiently documented to understand which technologies and techniques were used for image acquisition and analysis. This technical information is essential for consistent interpretation of Modic change data. Therefore, it is critical to adopt imaging and reporting standards that codify acceptable methodological information that is necessary to accompany Modic change data (Box 1).
Finally, while qualitative Modic classification with T1 and T2 sequences is currently the norm, novel quantitative methods show potential for assessing the severity of changes in marrow composition and for characterizing endplate structure and function in a more objective and operator- and scanner-independent manner. These developments may form the basis for more accurate future classification systems.

MR unit:
Identify the MRI instrument and magnetic field strength, plus any surface coils or specialized tables used to collect and amplify signal. Include model number and manufacturer. For longitudinal or multi-center studies, use scanners with the same field strength.

Sequences:
Specify the T1- and T2-weighted sequence parameters, including the type of spin echo (i.e. fast vs. conventional), repetition time/echo time, field of view, matrix size, slice thickness/spacing, number of echoes, and the type of fat suppression that was applied (T2 only). If fat suppression was used, the Modic classification for type 2 changes should be defined as having hyperintense signal on T1-weighted images and hypointense signal on fat-saturated T2-weighted images. If additional sequences, e.g. STIR/Dixon, are used for classifying Modic changes, these sequences should be reported in addition to the T1- and T2-weighted sequences.

Image evaluation:
Describe which image slices were rated and which levels were evaluated.

Rater agreement:
Report the inter-rater and intra-rater kappa statistics for categorical Modic classification. Also, report the inter-rater and intra-rater intraclass correlation coefficients (ICC) if using quantitative measurements, e.g. lesion size, cerebrospinal fluid-normalized intensity, etc.

Figure 1. Hypothetical scenarios show how the reliability of Modic classification, or inter-rater agreement, impacts the relationship between Modic changes and LBP. In each scenario, the actual prevalence of Modic changes in LBP patients (46% [5]) and asymptomatic controls (6% [5]) was simulated, and increasing numbers of false negatives were randomly assigned to the groups (group size = n/2). After each false negative assignment, the inter-rater agreement (κ) and odds ratio were calculated. This simulation was repeated 100 times in order to consider cases where the false negative assignments randomly affected members of each group. The mean odds ratio was calculated for 100 repeat simulations. Plus signs (+) for each scenario indicate the cutoff value of κ that produced a significant odds ratio for [...]

Figure 3. [...] assessment of cartilage endplate integrity, UTE shows good marrow contrast. Images were acquired with a UTE sequence that combines a nonselective hard pulse and 3D radial acquisition (10/0.236, 19-degree flip angle, 15-cm field of view, 1.5-mm slice thickness).
(C) Matching sagittal fat fraction map from water-fat MRI showing reduced fat fraction in the region corresponding to the contour in (A). Compared to site-matched normal regions in the adjacent, non-affected vertebrae, regions with type 1 Modic changes had lower fat fraction, suggesting that water-fat MRI is sensitive to the fibrovascular replacement of the normal bone marrow that occurs with this type of endplate bone marrow lesion. Data are shown for three subjects presenting with chronic LBP, and fat fractions are shown as mean ± SD for three ROIs (3-mm diameter) per region. Images were acquired with a 3D spoiled gradient recalled sequence (10/1.3, 3-degree flip angle, 22-cm field of view, 4-mm slice thickness) with six echoes and iterative decomposition of water and fat with echo asymmetry and least squares estimation (IDEAL). All images were acquired at 3.0T.

Eur Spine J. Author manuscript; available in PMC 2020 May 08.