Introduction

Non-specific low back pain (LBP) is an endemic disorder and the leading cause of activity limitation and work absence [1]. LBP has been linked to degeneration of the intervertebral discs (IVDs) [2], which are considered one of the major pain sources; such pain is often referred to as discogenic pain [3]. Nerve ingrowth along granulation tissue in fissures extending into the outer layers of the annulus fibrosus has been suggested as one cause of discogenic pain [3]. Hence, there is a need for a classification of fissures that correlates plausibly with pain.

Computed tomography (CT) discography is regarded as the reference standard for morphological imaging of disc fissures, but the radiation burden, the subjectivity of the interpretation and the invasive nature of the discography procedure have called its legitimacy into question [4].

In clinical practice, MRI is currently used to characterize the lumbar discs. T2-weighted images reflect the water content as well as disruption of the extracellular matrix structure [5]. This, together with the knowledge that different phenotypes of annular fissures represent different macroscopic alterations of the IVD matrix [6], makes T2-weighted images suitable for characterizing annular fissures. However, clinical evaluation of the IVDs is limited by visual interpretation of the MR images, which makes small and subtle degenerative changes, such as annular fissures, difficult to depict. Furthermore, subjective assessment of annular fissures by identification of so-called high-intensity zones (HIZs) has not been shown to differentiate a painful degenerated disc from a degenerated but asymptomatic disc [7] or to correlate well with clinical LBP symptoms [8].

With the rapid technical evolution in computer science, new data-driven software tools have been developed that can robustly extract objective markers of impaired matrix integrity from conventional MR images. For example, regional IVD markers have been shown to differentiate between LBP patients and controls [9, 10] as well as between groups of IVDs with and without HIZ [9, 11]. Such markers might be able to characterize potentially painful fissures at the individual level. Moreover, this technical evolution has supplied machine-learning algorithms that can be trained on multiparametric data to classify specific outcomes. Such algorithms can merge several promising markers into a strong, aggregated model for optimized, individualized diagnostics.

This study aimed to investigate whether machine-learning modelling based on global and regional MRI markers can differentiate between individual IVDs with and without fissures extending into the outer layers of the annulus fibrosus. The study also aimed to investigate whether such a tool can differentiate between IVDs with positive and negative pain provocation at discography.

Methods

Hypothesis

The hypothesis of the study is that potentially painful disc fissures, extending into the outer layers of the annulus fibrosus, present an MRI pattern that can be detected with machine learning.

Study design

A retrospective analysis of prospectively collected data.

Subjects

The study was based on an existing dataset, previously collected to investigate the impact of HIZ on pain provoked by discography [12]. As such, the dataset included not only MR images but also CT-discograms, which depicted the fissures in detail and hence offered a reference standard for the fissure modelling. That is, based on their appearance in the CT-discograms, the IVDs could be classified as having or not having fissures extending into the outer layers of the annulus fibrosus. A machine-learning algorithm could then be trained on MRI markers to separate these classes. Similarly, the pain-provocation response made it possible to train an algorithm to differentiate between IVDs with positive and negative pain provocation at discography.

In the study, only LBP patients with CT-discograms of sufficient quality for fissure classification were included (n = 30, age = 26–64 years, 11 males). The inclusion criteria for the patients were: referred for preoperative discography, > 6 months duration of non-specific LBP and no response to conservative treatment. The exclusion criteria were: history of spinal surgery, known claustrophobia and possible pregnancy.

Examination protocol

On the same day, the lumbar spine was examined consecutively with 1) MRI, 2) pressure-controlled discography (PCD) and 3) CT.

MRI examination

All patients were examined on a 1.5 T MRI scanner (Siemens Magnetom Symphony Maestro Class, Erlangen, Germany) using a conventional scan protocol, including T2-weighted sequences in axial and sagittal views (slice thickness = 4 mm, TR = 4.0–6.9 s, TE = 114–124 ms). All scans were performed with a fixed receiver amplifier gain to normalize the signal intensity.

PCD procedure

The PCD procedure was performed without sedation and included 86 IVDs, which were labeled as pain-positive or pain-negative discograms. A pain-positive discogram was defined as an IVD in which injection reproduced concordant pain similar to the patient's daily pain, rated ≥ 5/10 on the NRS (numerical rating scale), at a pressure ≤ 50 psi above opening pressure, together with one negative control disc. A pain-negative discogram was defined as an IVD associated with no induced pain or with a pain response unfamiliar to the patient.
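
The discogram labelling rule can be written out as a small decision function. This is a minimal sketch, assuming each injected disc is summarized by the hypothetical fields of `DiscResponse` below; the study does not describe its records in this form. Responses that meet neither definition (e.g. concordant pain rated below 5/10) are returned as `None` here, which is our assumption rather than a rule stated in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscResponse:
    """Hypothetical summary of the response to one injected disc."""
    pain_induced: bool
    concordant: bool                   # familiar, similar to the patient's daily pain
    reproducible: bool
    nrs: float                         # numerical rating scale, 0-10
    pressure_above_opening_psi: float


def classify_discogram(resp: DiscResponse, has_negative_control: bool) -> Optional[str]:
    """Return 'positive', 'negative', or None when neither definition applies."""
    # Pain-negative: no induced pain, or pain unfamiliar to the patient.
    if not resp.pain_induced or not resp.concordant:
        return "negative"
    # Pain-positive: reproducible concordant pain >= 5/10 NRS at <= 50 psi
    # above opening pressure, with a negative control disc in the same patient.
    if (resp.reproducible and resp.nrs >= 5
            and resp.pressure_above_opening_psi <= 50
            and has_negative_control):
        return "positive"
    # Assumption: any other response is left unlabelled.
    return None


# Example with made-up values
print(classify_discogram(DiscResponse(True, True, True, 7, 30), has_negative_control=True))  # prints "positive"
```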

CT examination

CT examination of the lumbar spine was performed approximately 30–60 min after the PCD using a 16-slice CT scanner (Siemens Somatom Sensation, Erlangen, Germany) with the contrast medium Omnipaque 180 mg/ml (GE Healthcare, Oslo, Norway), slice thickness = 0.75 mm and b-kernel = 45. Multiplanar reconstructions of 1 mm thickness were generated in the sagittal, coronal and axial planes, enabling a detailed description of the contrast medium distribution in all planes.

Image analysis

CT-discogram analysis

A total of 83 CT-discograms (including injected control IVDs) were included in this study, after three discograms had been excluded because of annular injections. A senior resident in radiology labeled the extension of the annular fissures in the CT-discograms according to the Dallas Discogram Description (DDD) [13] and the Adams grading scheme [14]. The discograms were dichotomized into IVDs without (0) and with (1) outer annular fissures, i.e. DDD ≤ 1 versus DDD ≥ 2 and Adams ≤ C versus Adams ≥ D, respectively [15]. To obtain measures of intra- and inter-rater reliability, the CT images of 10 patients were evaluated blinded by both the resident (a second time, after 3 months) and a senior radiologist. The intra-rater agreement for labeling the fissure extension (grade 0–1) was 96% in the axial plane (p = 1.0) and 96% in the sagittal plane (p = 1.0) [15]. The inter-rater agreement for the current data set was 100% in both the axial and the sagittal plane (p N/A) [15].
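
The dichotomization and the percent agreement described above can be expressed compactly. A minimal sketch, assuming DDD grades are stored as non-negative integers and Adams grades as single letters whose alphabetical order matches the grade order; the function names and example readings are hypothetical.

```python
from typing import Sequence


def outer_fissure_from_ddd(grade: int) -> int:
    """DDD grade 0-1 -> 0 (no outer annular fissure); grade >= 2 -> 1."""
    if grade < 0:
        raise ValueError("DDD grade must be non-negative")
    return int(grade >= 2)


def outer_fissure_from_adams(grade: str) -> int:
    """Adams grade up to C -> 0; grade D or higher -> 1 (letters assumed ordered)."""
    letter = grade.strip().upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("Adams grade must be a single letter")
    return int(letter >= "D")


def percent_agreement(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Percentage of discs given the same label in two readings."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("readings must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)


# Hypothetical first and repeated DDD readings for five discs
first = [outer_fissure_from_ddd(g) for g in [0, 3, 2, 1, 4]]
repeat = [outer_fissure_from_ddd(g) for g in [1, 3, 1, 1, 5]]
print(first, repeat, percent_agreement(first, repeat))  # [0, 1, 1, 0, 1] [0, 1, 0, 0, 1] 80.0
```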

Conventional MRI analysis

HIZ in the IVDs was identified as a high-intensity signal located in the annulus fibrosus on the T2-weighted MR images [16]. As with the CT-discogram analysis, intra- and inter-rater evaluation of HIZ was performed on the MR images, yielding high kappa values (> 0.80) [12].
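
Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed and p_e the chance agreement. The sketch below is a standard textbook implementation for binary HIZ labels applied to made-up ratings; it is not necessarily the exact procedure used in [12].

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal shares.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if p_e == 1.0:
        return 1.0  # both raters used one and the same label for every item
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical HIZ ratings (1 = HIZ present) for ten discs
first = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
second = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(first, second), 3))  # 0.783
```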

Extraction of MRI markers

An in-house software tool [9] based on MATLAB (R2018b, MathWorks®, Natick, Massachusetts, U.S.A.) was used to extract the MRI markers. First, a senior resident in radiology segmented all IVDs in the five most central slices of the sagittal T2-weighted images. The nucleus pulposus and annulus fibrosus were included in the segmented IVD. The mean pixel value (signal intensity: SI) within the segmented IVD and the standard deviation (SD) of the SI were extracted. Then, each IVD was automatically divided into 5 equally large sub-regions in the ventral-dorsal direction (1: ventral annulus fibrosus; 2: transition zone between the annulus fibrosus and the ventral part of the nucleus pulposus; 3: nucleus pulposus; 4: transition zone between the dorsal part of the nucleus pulposus and the dorsal annulus fibrosus; 5: dorsal annulus fibrosus; Fig. 1). From each sub-region, the mean SI and the SD of the SI were extracted. In total, 12 different MRI markers were extracted from each IVD (Table 1).
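
The marker extraction can be sketched in a few lines of NumPy. The study used an in-house MATLAB tool [9]; the Python function below is only an illustration, assuming the five central sagittal slices are stacked as a (slices, rows, columns) array, that the column index runs from ventral to dorsal, that "equally large" means equal width along that axis, and that the SD is the sample SD (ddof=1). The function and marker names are hypothetical.

```python
import numpy as np


def extract_markers(image, mask, n_regions=5):
    """Whole-IVD and sub-region mean/SD of the SI (2 + 2 * n_regions markers).

    image, mask: arrays of shape (slices, rows, columns); mask is True inside the IVD.
    Assumes the column index runs from ventral (low) to dorsal (high) and that the
    sub-regions have equal width along that axis.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if image.shape != mask.shape or not mask.any():
        raise ValueError("image and mask must have the same shape and a non-empty mask")
    markers = {"ivd_mean": image[mask].mean(), "ivd_sd": image[mask].std(ddof=1)}
    # Ventral-dorsal extent of the segmented disc across all slices and rows
    cols = np.flatnonzero(mask.any(axis=(0, 1)))
    edges = np.linspace(cols.min(), cols.max() + 1, n_regions + 1)
    col_index = np.arange(mask.shape[2])
    for r in range(n_regions):
        in_band = (col_index >= edges[r]) & (col_index < edges[r + 1])
        values = image[mask & in_band]  # in_band broadcasts over slices and rows
        markers[f"region{r + 1}_mean"] = values.mean()
        markers[f"region{r + 1}_sd"] = values.std(ddof=1)
    return markers


# Toy example: intensity equals the column index; the disc spans columns 5-44
image = np.broadcast_to(np.arange(50, dtype=float), (5, 10, 50)).copy()
mask = np.zeros((5, 10, 50), dtype=bool)
mask[:, 2:8, 5:45] = True
markers = extract_markers(image, mask)
print(len(markers), markers["region1_mean"], markers["region5_mean"])  # 12 8.5 40.5
```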

Fig. 1 Overview of the machine-learning workflow for the fissure and pain modelling

Table 1 Description of the MRI markers that were extracted from the T2-weighted images

The intra- and inter-observer reliability for the IVD segmentation has previously been determined (0.79–0.99 depending on the sub-region) on a similar cohort with a high degree of consensus [17].

Fissure and pain modelling

The Anaconda platform (Anaconda Software Distribution. Computer software. Vers. 2–2.4.0. Anaconda, Nov. 2016) with the Python package scikit-learn (version 0.23.2) was used to classify fissures and pain from the MRI markers (Fig. 1). Specifically, the random forest machine-learning algorithm with 100 trees was used. To develop the models, the algorithm was trained on 75% of the data. To provide insight into the models and to determine the usefulness of each MRI marker for predicting the fissure categories and pain responses, importance scores reflecting the relative contribution of each marker to the model were computed and scaled so that the highest score was 1.0. The trained models were then tested on the remaining 25% of unseen data to validate their diagnostic performance in terms of accuracy (the proportion of true classifications out of all classifications), precision (the proportion of true positive classifications out of all positive classifications), recall (the proportion of true positive classifications out of all positive events) and F1-score (the harmonic mean of precision and recall). The procedure was repeated 10 times and the results were averaged. Finally, learning curves for the training and test data were plotted to evaluate the quality of the models and possible under- or overfitting.
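
The repeated hold-out procedure can be sketched with scikit-learn. This is an illustration on synthetic data, not the study's code: the stratified split, the fixed seeds, the averaging of importances over repetitions and the function name `evaluate_random_forest` are our assumptions; only the 100 trees, the 75/25 split, the ten repetitions and the scaling of importances to a maximum of 1.0 come from the text. The area under the ROC curve is computed as well.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate_random_forest(X, y, n_repeats=10, test_size=0.25, n_trees=100):
    """Average test metrics and relative importances over repeated 75/25 splits."""
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": [], "roc_auc": []}
    importances = []
    for seed in range(n_repeats):
        # Assumption: stratified split so both classes appear in every test set.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        y_prob = model.predict_proba(X_te)[:, 1]  # probability of class 1
        scores["accuracy"].append(accuracy_score(y_te, y_pred))
        scores["precision"].append(precision_score(y_te, y_pred, zero_division=0))
        scores["recall"].append(recall_score(y_te, y_pred, zero_division=0))
        scores["f1"].append(f1_score(y_te, y_pred, zero_division=0))
        scores["roc_auc"].append(roc_auc_score(y_te, y_prob))
        importances.append(model.feature_importances_)
    mean_importance = np.mean(importances, axis=0)
    relative_importance = mean_importance / mean_importance.max()  # highest = 1.0
    return {name: float(np.mean(vals)) for name, vals in scores.items()}, relative_importance


# Synthetic stand-in: 83 discs, 12 markers, 67 labelled 1; only marker 0 carries signal.
rng = np.random.default_rng(0)
y = rng.permutation(np.r_[np.ones(67, dtype=int), np.zeros(16, dtype=int)])
X = rng.normal(size=(83, 12))
X[:, 0] += 3.0 * y
metrics, relative_importance = evaluate_random_forest(X, y)
```

Averaging over ten random splits, rather than reporting a single split, reduces the variance that a 21-disc test set would otherwise introduce.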

Statistics

For comparisons between groups of IVDs with and without outer annular fissures, as well as between groups of IVDs with negative and positive pain responses, two-sided tests with a significance level of p < 0.05 were used. To adjust for within-individual correlation and Adams grade, a mixed model with compound symmetry per group was used. Associations between the MRI markers and the fissure and pain findings were determined using Pearson's correlation coefficient.
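
For a binary fissure or pain label, Pearson's correlation coefficient reduces to the point-biserial correlation. The sketch below uses SciPy's `pearsonr` on made-up values; it treats discs as independent and therefore does not reproduce the mixed-model adjustment for within-individual correlation described above, which would require a separate model (for example a random-intercept model per patient, e.g. with statsmodels). The function name is hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def marker_label_correlation(marker_values, labels):
    """Pearson (point-biserial) correlation between one MRI marker and a 0/1 label."""
    x = np.asarray(marker_values, dtype=float)
    y = np.asarray(labels, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need at least three paired observations")
    # Ignores clustering of discs within patients (no mixed-model adjustment).
    r, p = pearsonr(x, y)
    return float(r), float(p)


# Hypothetical sub-region 1 mean SI values and fissure labels (1 = outer annular fissure)
r, p = marker_label_correlation([10, 12, 11, 30, 32, 29, 31], [0, 0, 0, 1, 1, 1, 1])
```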

Results

General description of the data

Of all included IVDs (n = 83), 67 had fissures extending into the outer layers of the annulus fibrosus. Of these IVDs with outer annular fissures, only 49% presented a HIZ on the MR images.

For all MRI markers except those of sub-region 2, significant correlations with fissure findings were found (sub-region 1: R = 0.61, p < 0.0001; 2: R = −0.08, p = 0.46; 3: R = −0.37, p = 0.0006; 4: R = −0.49, p < 0.0001; 5: R = 0.33, p = 0.002; whole IVD: R = −0.59, p < 0.0001), even when adjusting for within-subject effects. In these regions, significant differences between IVDs with and without outer annular fissures were found for all MRI markers: IVDs with outer annular fissures presented a lower global mean SI (whole IVD: 44% lower, p < 0.0001), but higher means in the ventral and dorsal annulus (sub-region 1: 43% higher, p < 0.0001; sub-region 5: 20% higher, p = 0.003), and lower means in the nucleus pulposus (sub-region 3: 13% lower, p = 0.002) and in the transition zone between the dorsal nucleus pulposus and the annulus fibrosus (sub-region 4: 20% lower, p = 0.003).

A total of 50 IVDs were classified as pain-positive; of these, three had no fissures in the outer annulus fibrosus and 47 had outer annular fissures. The number of IVDs with a negative pain response at discography was 33; of these, 13 had no fissures in the outer annulus fibrosus and 20 had outer annular fissures.
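
The cross-tabulation of pain response and fissure status above can be checked with simple arithmetic. The snippet uses only the counts given in the text; the derived proportions are computed here for illustration and are not reported results of the study.

```python
# IVD counts from the text, keyed by (pain response, outer annular fissure present)
counts = {
    ("positive", True): 47,
    ("positive", False): 3,
    ("negative", True): 20,
    ("negative", False): 13,
}

total = sum(counts.values())
with_fissure = counts[("positive", True)] + counts[("negative", True)]
without_fissure = counts[("positive", False)] + counts[("negative", False)]

# Share of pain-positive discograms within each fissure group
share_positive_with = counts[("positive", True)] / with_fissure
share_positive_without = counts[("positive", False)] / without_fissure

print(total, with_fissure, without_fissure)                               # 83 67 16
print(round(share_positive_with, 2), round(share_positive_without, 2))    # 0.7 0.19
```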

Compared with IVDs with a negative pain response, IVDs with a positive pain response presented a lower global mean SI (whole IVD: 18% lower, p = 0.027) and a lower mean in the transition zone between the dorsal part of the nucleus pulposus and the dorsal annulus fibrosus (sub-region 4: 10% lower, p = 0.019), even when adjusting for Adams grade and within-subject effects. In these regions, significant correlations between the means and pain findings were also found (whole IVD: R = −0.49, p < 0.0001; sub-region 4: R = −0.50, p < 0.0001).

Fissure and pain model

Table 2 shows the feature importance scores for all extracted MRI markers. The mean SI in the whole IVD and in sub-regions 2–5 contributed most to the fissure model. All MRI markers, including global and regional markers as well as means and standard deviations, contributed approximately equally to the pain model.

Table 2 Relative contribution (importance score) of the MRI markers to the fissure and pain models

Table 3 shows the performance of the fissure model, based on the DDD classification scheme, and of the pain model. Similar results were found for the fissure model based on the Adams scheme (data not shown). The fissure model predicted IVDs with outer annular fissures with very high precision and accuracy (mean: 99% and 97%, respectively), whereas the pain model predicted pain-positive discograms with only moderate precision and accuracy (mean: 71% and 69%, respectively). The receiver operating characteristic (ROC) curve for the fissure model (Fig. 2) showed a large area under the curve, indicating strong predictive power. A smaller area, but still larger than that for random predictions, was found for the pain model (Fig. 2).
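
The performance measures defined in the Methods follow directly from the confusion-matrix counts of a test set. A minimal sketch with hypothetical counts for a 21-disc test set (roughly 25% of 83 IVDs); the numbers are illustrative and are not taken from Table 3.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total                     # true classifications / all
    precision = tp / (tp + fp) if tp + fp else 0.0   # true positives / predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # true positives / actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set of 21 discs: 15 TP, 1 FP, 2 FN, 3 TN
m = classification_metrics(tp=15, fp=1, fn=2, tn=3)
```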

Table 3 Performance measures for the fissure and pain models presented as mean values of ten repeated testings

Fig. 2 Receiver operating characteristic (ROC) curves, illustrating the trade-off between the true positive rate and the false positive rate for (a) the fissure and (b) the pain classification at different probability thresholds. ROC curves for the ten test repetitions are shown in different colors; some curves in (a) overlap. For reference, a ROC curve representing random prediction is also shown (black dotted line).

The learning curves showed no signs of underfitting or overfitting for either model, and both curves converged rapidly towards the final accuracy scores (Fig. 3).
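
Learning curves of this kind can be produced with scikit-learn's `learning_curve`. The sketch below is an illustration on synthetic data: the training-size grid, the four stratified folds and the seeds are our assumptions, not the study's settings, and `accuracy_learning_curve` is a hypothetical helper. It returns the mean and standard deviation of the accuracy at each training size, corresponding to the line and shaded band in Fig. 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve


def accuracy_learning_curve(X, y, seed=0):
    """Mean and SD of train/test accuracy for increasing training-set sizes."""
    sizes, train_scores, test_scores = learning_curve(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        X, y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=seed),
        scoring="accuracy",
        shuffle=True,          # shuffle before taking training-set prefixes
        random_state=seed,
    )
    return {
        "train_sizes": sizes,
        "train_mean": train_scores.mean(axis=1),
        "train_sd": train_scores.std(axis=1),
        "test_mean": test_scores.mean(axis=1),
        "test_sd": test_scores.std(axis=1),
    }


# Synthetic stand-in data (83 discs, 12 markers); not the study data.
rng = np.random.default_rng(1)
y = rng.permutation(np.r_[np.ones(67, dtype=int), np.zeros(16, dtype=int)])
X = rng.normal(size=(83, 12))
X[:, 0] += 3.0 * y
curve = accuracy_learning_curve(X, y)
```

A test curve that approaches the training curve as the training size grows, without a persistent gap, is the usual sign that a model is neither under- nor overfitting.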

Fig. 3 Learning curves illustrating the improvement in accuracy score with increasing amounts of training data for (a) the fissure and (b) the pain model. The curves show the mean accuracy score (black) and the variability (shaded grey area), representing one standard deviation above and below the mean.

Discussion

In agreement with previous studies [9, 18, 19], the findings showed that potentially painful fissures are associated with MRI changes not only at the site of the fissure but also in other regions, producing a specific heterogeneity pattern over the IVD that could be used for phenotyping. Detailed MRI analysis provided global and regional markers that, combined through machine-learning modelling, identified individual IVDs with such fissures with very high precision and accuracy; such modelling could therefore be an important tool in future research and in the diagnostic management of LBP patients. Further research is encouraged to evaluate the value of the model for long-term monitoring and for assessing treatment results.

One of the problems in the clinic today is to identify whether a patient is suffering from “discogenic pain” and, if so, which of the IVDs with degenerative changes actually are pain generators. Patients often have several IVDs with varying grades of degeneration, whereas LBP is experienced by the individual as a whole. Hence, it is difficult to link pain to MRI findings in separate IVDs. In an attempt to relate patient pain to MRI findings, the study used the patient’s experience of pain during provocative discography as a pain reference. The findings showed a significant correlation between the pain response and both the global mean SI and the mean SI in the sub-region covering the transition zone between the dorsal nucleus pulposus and the annulus fibrosus, even when adjusting for Adams grade and within-subject effects. Waldenberg et al. have also reported MRI changes within the transition zone when stratifying subjects by pain [9], indicating that such degenerative changes may be a biomarker of pain. However, the resulting pain model did not show high diagnostic performance, suggesting that the model may need additional data before it can be used clinically. Such data could, for example, include patient-specific features such as BMI and age, or radiological features that reflect other aspects of the IVD or characterize other tissues of the spine. Moreover, pain is a subjective experience, and there is no validated method that can reliably separate painful from non-painful discs in a patient; this may have contributed to the moderate performance of the final pain model [4]. To determine how well the model performs for detailed disc diagnostics and for identifying the pain source in the individual patient, it needs further evaluation against patient outcome data, e.g. after single-segment fusion surgery.

In a previous study based on the same MRI data, subjective interpretation of the MR images was used to identify HIZ, and the findings were correlated with pain provoked by discography. No correlations were found, possibly owing to the limitations of the visual evaluation of the IVDs [12]. As shown by the present findings, not all fissures extending to the outer annulus were visible as HIZ on the MR images. Also, not all fissures displayed as HIZ may be potentially painful fissure phenotypes, and fissures that cause pain might not always present a visible HIZ. This highlights the need for better LBP phenotypes. To reduce the human influence, computer-assisted postprocessing tools such as the one presented here are needed. With such tools, objective markers can be extracted from the MR images for improved IVD evaluation.

Limitations

A strength of the present study was the unique data set, including both MR images and CT-discograms. With CT-discograms as a reference, the machine-learning algorithm could merge the MRI markers into an aggregated fissure model with strong predictive power. However, the pain model may have been limited by the reference method. Provocative discography, which is no longer used in our clinic, remains controversial for diagnosing the pain source in LBP patients [4]. Also, the method has been reported to induce a pressure increase in adjacent discs that might cause a false-positive pain response for the injected IVD [20]. This may have contaminated the analysis and degraded the pain prediction. Moreover, only LBP patients with discography data were included in the study, which may have introduced a selection bias. Other limitations were the lack of an external validation set and the small training and test sets. Nevertheless, the learning curve for the fissure model showed rapid convergence towards a very high score, suggesting that only a small number of data points is needed for this model.

Conclusions

The present study showed that machine-learning modelling based on global and regional MRI markers can differentiate IVDs with potentially painful fissures from IVDs without such fissures and, hence, can enable individualized diagnostics. Such a model could be important for uncovering phenotypes of IVDs causing pain. However, the model demonstrated only moderate diagnostic accuracy regarding pain, which could be attributed either to an insufficient algorithm or to the pain reference used.