Key Summary Points

We developed a convolutional neural network (CNN) model to predict treatment outcomes of transforaminal epidural steroid injection (TFESI) for controlling cervical radicular pain due to cervical foraminal stenosis.

We retrospectively enrolled 293 patients who underwent cervical TFESI for cervical radicular pain caused by cervical foraminal stenosis and obtained a single oblique cervical radiograph from each patient.

We cropped each oblique cervical radiograph into a square image that included the foramen targeted for TFESI, the intervertebral disc and facet joint at the level of the targeted foramen, and the pedicles of the vertebral bodies just above and below the targeted foramen. This cropped image was used as input data.

The area under the curve of our developed model for predicting the treatment outcome of cervical TFESI in patients with cervical foraminal stenosis was 0.823.

A CNN model trained using oblique cervical radiographs can be helpful in predicting treatment outcomes after cervical TFESI in patients with cervical foraminal stenosis.

Introduction

Cervical foraminal stenosis is a condition caused by neural foraminal narrowing and is a common cause of upper extremity radicular pain. Disc bulging and degeneration, facet and ligament hypertrophy, and degenerative bony spurs are the main factors that narrow the neural foramen [1, 2]. In addition, cervical foraminal stenosis causes an inflammatory response involving various inflammatory cells and proinflammatory cytokines, resulting in cervical radicular pain [3]. Conservative treatments such as oral medication, physical therapy, and injection procedures are used to control cervical radicular pain caused by cervical foraminal stenosis [4,5,6]. Transforaminal epidural steroid injection (TFESI) is one of the most effective treatments for alleviating pain from cervical foraminal stenosis [3, 4]. The injected corticosteroids inhibit the synthesis of various proinflammatory mediators [7, 8]. Several previous clinical trials have reported a positive therapeutic effect of TFESI for controlling cervical radicular pain due to cervical foraminal stenosis [3, 4, 9, 10].

Predicting therapeutic outcomes after TFESI is important because it allows clinicians to establish a therapeutic plan for cervical radicular pain due to cervical foraminal stenosis. In 2017, Kim et al. evaluated the treatment outcome of TFESI based on the severity of cervical foraminal stenosis seen on cervical axial magnetic resonance imaging (MRI); however, they found no significant difference in therapeutic outcomes based on stenosis severity [3]. Other than the study by Kim et al., no studies have investigated imaging findings associated with the therapeutic outcome of TFESI for pain induced by cervical foraminal stenosis.

We propose that the degree of stenosis of the cervical foramen, the degeneration of the cervical disc and facet, and the presence of a bony spur around the cervical foramen can affect the outcome of TFESI. An oblique cervical radiograph can show these structural findings. It is also easy to obtain because almost all clinics and hospitals are equipped with a radiographic imaging machine, and its cost to patients is relatively low. However, there are still no measurement tools for the anatomical abnormalities related to cervical foraminal stenosis that are shown on radiographs, and criteria for classifying the degree of stenosis are lacking. Accordingly, the analysis of the effect of TFESI according to radiographic findings is limited.

Machine learning (ML) is a computer algorithm that can automatically learn from data without the need for explicit programming [11,12,13]. ML is known for its ability to overcome the limitations of existing image analysis techniques and enable breakthroughs in the field of image analysis [11,12,13]. Deep learning (DL) is an advanced ML approach that involves the use of a large number of hidden layers to build artificial neural networks with structures and functions similar to those of the human brain [14,15,16]. It can learn from unstructured and perceptual image data, and several studies have demonstrated that the DL technique can outperform traditional ML techniques [14,15,16]. A convolutional neural network (CNN) is a representative DL model specializing in image analysis [11,12,13]. We believe that the CNN model can recognize and analyze the findings related to foraminal stenosis on oblique cervical radiographs and predict the therapeutic outcome of TFESI [11,12,13].

In the current study, oblique cervical radiographs were used as input data to train a CNN model to predict therapeutic outcomes after cervical TFESI in patients with chronic cervical radicular pain caused by cervical foraminal stenosis.

Methods

Participants

This retrospective observational study involved 358 patients who visited the spine center of a university hospital and underwent cervical TFESI for cervical foraminal stenosis between January 2013 and December 2021. The inclusion criteria were as follows: (1) age 20–79 years; (2) single-level cervical TFESI for segmental pain radiating to the upper extremity due to cervical foraminal stenosis; (3) a ≥ 3-month history of cervical radicular pain with a score of > 3 on a numerical rating scale (NRS-11; 0 = no pain; 10 = the worst pain) prior to TFESI; (4) ≥ 50% temporary pain relief following a diagnostic nerve block with 1 mL of 2% lidocaine (the diagnostic block was conducted on a day before the cervical TFESI); and (5) MRI findings corresponding to the clinical presentation. To diagnose chronic cervical radicular pain due to cervical foraminal stenosis, physical examination findings, such as motor and sensory deficits, deep tendon reflexes, and the Spurling sign, were considered. Subsequently, the diagnosis was confirmed through the diagnostic block. We excluded patients with peripheral neuropathy, cervical myelopathy, or a spinal infection.

Of the 358 patients, 7 were excluded because they were 80 years or older, 33 because they underwent cervical TFESI at two levels, 23 because their pain had started less than 3 months before TFESI, and 2 because they had combined cervical myelopathy. Consequently, 293 patients were included in this study (mean age, 54.0 ± 11.2 years; men/women, 164:129; injection levels C5/C6/C7/C8, 10:138:135:10; right/left, 162:131) (Fig. 1). We did not exclude patients with cervical malalignment due to scoliosis; however, none of the 293 included patients had such malalignment. The study protocol was approved by the institutional review board of Yeungnam University Hospital, which waived the requirement for written informed consent owing to the retrospective nature of the study. This study adhered to the Declaration of Helsinki. We conducted this study using methods similar to those employed in our previous research [17].

Fig. 1

Flowchart depicting patient inclusion
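
The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch that only restates the counts reported in the Participants section; the variable names are illustrative and are not taken from the study's code.

```python
# Patient flow as reported in the Participants section.
screened = 358
exclusions = {
    "aged 80 years or older": 7,
    "TFESI at two levels": 33,
    "pain duration under 3 months": 23,
    "combined cervical myelopathy": 2,
}
included = screened - sum(exclusions.values())

# Each reported breakdown of the cohort should sum to the included total.
breakdowns = {
    "sex (men, women)": (164, 129),
    "injection level (C5, C6, C7, C8)": (10, 138, 135, 10),
    "side (right, left)": (162, 131),
}
consistent = {name: sum(counts) == included for name, counts in breakdowns.items()}

print(included)    # number of patients remaining after exclusions
print(consistent)  # whether each breakdown adds up to that number
```

The exclusions leave 293 patients, and each reported breakdown (sex, injection level, and side) sums to the same total.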

Transforaminal Epidural Steroid Injection

An aseptic technique was used for the cervical TFESI. The procedure was conducted following the method of Kim et al. [3]. The patients were placed in the supine position, and the procedure was performed under C-arm fluoroscopy (Siemens, Erlangen, Germany). To focus on the target, the C-arm was rotated toward the region, and the craniocaudal angle was adjusted to visualize the intervertebral foramen. A 26-gauge, 90-mm spinal needle with a bend at the tip was inserted into the skin and advanced to the anterior half of the superior articular process of the cervical spine. Next, the depth of the needle tip was checked using the anteroposterior and lateral views of the C-arm. A test dose of the contrast medium (0.2–0.3 mL) was injected to determine whether the needle tip was placed at the proper location. Further injection of contrast medium was performed under real-time fluoroscopic monitoring. Subsequently, 5 mg of dexamethasone was mixed with 1.5 mL of normal saline and injected. The cervical TFESI was performed once in each patient.

Images Used for the Deep Learning Algorithm (Input Data)

Oblique cervical radiographs were obtained at a 45° oblique angle relative to the anteroposterior orientation on both the left and right sides of each patient. The oblique radiograph ipsilateral to the side on which TFESI was performed was used as input data. We cropped each oblique cervical radiograph into a square image that included the foramen targeted for TFESI, the intervertebral disc and facet joint at the level of the targeted foramen, and the pedicles of the vertebral bodies just above and below the targeted foramen (Fig. 2). This cropped image, containing the targeted foramen and its surrounding structures, was used as the input data.
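
The square cropping step can be illustrated with a short sketch. The text does not state how the region of interest was selected or what its size was, so the function names, the idea of centering on a chosen point, and the clamping to the image border are all assumptions made for illustration.

```python
def square_crop_box(center_row, center_col, side, image_height, image_width):
    """Return (top, left, bottom, right) of a square ROI kept inside the image.

    Hypothetical helper: the ROI is centered on a chosen point (for example,
    the targeted foramen) and shifted inward if it would cross the border.
    """
    if side > min(image_height, image_width):
        raise ValueError("ROI side exceeds the image size")
    half = side // 2
    top = min(max(center_row - half, 0), image_height - side)
    left = min(max(center_col - half, 0), image_width - side)
    return top, left, top + side, left + side


def crop(image, box):
    """Crop a 2D image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 6 x 8 "radiograph" whose pixel value encodes its position.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
box = square_crop_box(center_row=1, center_col=7, side=4,
                      image_height=6, image_width=8)
roi = crop(image, box)
print(box, roi[0])
```

In this toy example the requested center lies near the image corner, so the box is shifted inward and the crop remains a full 4 × 4 square.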

Fig. 2

Diagram of the process for the development of the deep learning model for predicting the therapeutic outcome after transforaminal epidural steroid injection in patients with cervical foraminal stenosis. AP anteroposterior, ROI region of interest, AUC area under the curve

Measurement of Therapeutic Outcome (Output Data)

Pain severity was assessed before treatment and at the 2-month follow-up after cervical TFESI using the numeric rating scale (NRS) (0 = no pain; 10 = worst pain). The NRS data were collected via chart review. The percentage change in the NRS score was calculated from the difference between the pretreatment score and the score at 2 months post TFESI (change in NRS [%] = [pretreatment NRS score − 2 months post-TFESI NRS score]/pretreatment NRS score × 100). A favorable outcome was defined as a ≥ 50% reduction in the NRS score at 2 months post TFESI compared to the pretreatment NRS score, and a poor outcome was defined as a < 50% reduction.
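
The outcome definition above can be written as a small function. This is a minimal sketch of the stated formula and the 50% threshold; the function names are illustrative and do not come from the study's code.

```python
def nrs_change_percent(pre_nrs, post_nrs):
    """Percentage reduction in the NRS score relative to the pretreatment score."""
    if pre_nrs <= 0:
        # Included patients had a pretreatment score above 3, so this is a guard only.
        raise ValueError("pretreatment NRS score must be positive")
    return (pre_nrs - post_nrs) / pre_nrs * 100


def outcome_label(pre_nrs, post_nrs, threshold=50.0):
    """Label the outcome as 'favorable' (>= 50% reduction) or 'poor' (< 50%)."""
    if nrs_change_percent(pre_nrs, post_nrs) >= threshold:
        return "favorable"
    return "poor"


print(outcome_label(6, 3))  # a drop from 6 to 3 is exactly a 50% reduction
print(outcome_label(7, 4))  # a drop from 7 to 4 is about a 43% reduction
```

A reduction of exactly 50% falls in the favorable class, because the definition uses ≥ 50% for favorable and < 50% for poor.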

Deep Learning Algorithms

Python 3.8.10, scikit-learn 1.1.2, and TensorFlow 2.10.1 with Keras were used to develop the CNN model for predicting cervical TFESI outcomes. We separately trained four pre-trained, state-of-the-art CNN models (EfficientNetV2B0, B1, B2, and B3) and compared their performances. The EfficientNetV2B1 model was selected to develop a model to predict therapeutic outcomes after cervical TFESI in patients with cervical foraminal stenosis. Table 1 presents the details of the proposed model.

Table 1 Layer types and parameters in our developed model
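
For orientation, a binary classifier built on a pre-trained EfficientNetV2B1 backbone could be assembled in Keras roughly as follows. This is a hedged sketch and not the configuration in Table 1: the input size, the dropout layer, the single sigmoid output, the optimizer, and the learning rate are all assumptions chosen for illustration.

```python
IMAGE_SIZE = (240, 240)  # assumption: the default EfficientNetV2B1 input resolution


def build_outcome_classifier(image_size=IMAGE_SIZE, dropout_rate=0.3,
                             weights="imagenet"):
    """Build a favorable-vs-poor outcome classifier (illustrative only)."""
    # Imported lazily so that the module can be loaded without TensorFlow.
    import tensorflow as tf

    # Pre-trained backbone without its ImageNet classification head.
    # Grayscale radiographs would need to be replicated to 3 channels.
    backbone = tf.keras.applications.EfficientNetV2B1(
        include_top=False,
        weights=weights,
        input_shape=(*image_size, 3),
        pooling="avg",
    )

    inputs = tf.keras.Input(shape=(*image_size, 3))
    x = backbone(inputs)
    x = tf.keras.layers.Dropout(dropout_rate)(x)                 # assumed regularization
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # binary outcome score

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed value
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Calling `build_outcome_classifier()` requires TensorFlow and, with `weights="imagenet"`, downloads the pre-trained weights on first use. Passing `weights=None` builds the same architecture with random initialization.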

Statistical Analysis

Statistical analyses were performed using Python 3.8.10 and scikit-learn version 0.24.1. A receiver operating characteristic curve analysis was performed, and the area under the curve (AUC) was calculated. The 95% confidence interval (CI) for AUC was calculated as described by DeLong et al. [18]. Scikit-learn was used to calculate the receiver operating characteristic curve and AUC.
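
The AUC and its DeLong confidence interval can be illustrated without external libraries. The sketch below is not the study's code: it computes the empirical AUC and a normal-approximation interval from the DeLong variance estimate, using made-up scores, and it clips the interval to [0, 1] as an added convenience.

```python
from statistics import NormalDist, mean, variance


def _psi(pos_score, neg_score):
    """Pairwise kernel: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def auc_with_delong_ci(pos_scores, neg_scores, level=0.95):
    """Return (auc, lower, upper) using the DeLong variance estimate."""
    # Structural components: one value per positive case and one per negative case.
    v10 = [mean(_psi(p, n) for n in neg_scores) for p in pos_scores]
    v01 = [mean(_psi(p, n) for p in pos_scores) for n in neg_scores]
    auc = mean(v10)
    var_auc = variance(v10) / len(v10) + variance(v01) / len(v01)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * var_auc ** 0.5
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


# Hypothetical predicted scores for positive and negative cases.
auc, lower, upper = auc_with_delong_ci([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(auc, 3), round(lower, 3), round(upper, 3))
```

With only three cases per class, the interval in this toy example is very wide. The interval reported in this study is much narrower because it is based on 74 validation cases.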

Results

On the validation data, our DL model achieved an accuracy of 77.0% and an AUC of 0.823 (95% CI 0.727–0.919). On the training data, the model achieved an accuracy of 99.1% and an AUC of 0.999 (95% CI 0.996–1.000) (Fig. 3).

Fig. 3

Receiver operating characteristic curves for the training and validation datasets of our developed model. Acc accuracy, AUC area under the curve, CI confidence interval

Precision and recall were used to assess the ability of the model to identify patients with favorable and poor outcomes. The precision was 0.806 for poor outcomes and 0.744 for favorable outcomes, indicating that most predictions in each class were correct. The recall was 0.842 for favorable outcomes and 0.694 for poor outcomes, indicating that the model detected favorable outcomes more reliably than poor outcomes. The macro-averaged F1-score was 0.768, indicating reasonably balanced performance across both classes. Further details on the model performance are listed in Table 2.

Table 2 Details of the performance of our developed deep learning model

Two other DL models, ResNet50 and MobileNet, were trained using the same dataset, and their performance was compared with that of the proposed model [19, 20]. On the validation data, the proposed model showed higher accuracy and AUC than the two comparison models (Table 3).

Table 3 Two CNN models for comparison: model details

Figure 4 presents the confusion matrix as a visual representation of the model classification. The intensity of the color in each cell reflects the case count, with darker shades indicating higher counts. Of the 74 validation data points, the model predicted 57 correctly and misclassified 17. These misclassifications comprised 6 false positives and 11 false negatives, highlighting areas for potential model optimization.

Fig. 4

The correct classification and misclassification cases of our deep learning model
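
The reported validation metrics can be reproduced from a 2 × 2 confusion matrix. The four cell counts below are not stated in the text. They are an inference: they are the counts that match the reported totals (74 cases, 57 correct, 6 false positives, and 11 false negatives) and the reported precision and recall values when the poor outcome is treated as the positive class.

```python
# Keys are (true class, predicted class). The counts are reconstructed, not reported.
counts = {
    ("favorable", "favorable"): 32,
    ("favorable", "poor"): 6,    # false positives if "poor" is the positive class
    ("poor", "favorable"): 11,   # false negatives if "poor" is the positive class
    ("poor", "poor"): 25,
}
classes = ("favorable", "poor")


def per_class_metrics(counts, cls):
    """Return (precision, recall, F1) for one class."""
    tp = counts[(cls, cls)]
    fp = sum(v for (true, pred), v in counts.items() if pred == cls and true != cls)
    fn = sum(v for (true, pred), v in counts.items() if true == cls and pred != cls)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(counts.values())
accuracy = sum(counts[(c, c)] for c in classes) / total
metrics = {c: per_class_metrics(counts, c) for c in classes}
macro_f1 = sum(m[2] for m in metrics.values()) / len(classes)

print(total, round(accuracy, 3), round(macro_f1, 3))
for cls in classes:
    print(cls, [round(value, 3) for value in metrics[cls]])
```

Under this reconstruction, the computed accuracy, per-class precision and recall, and macro-averaged F1-score agree with the values reported in the Results to three decimal places.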

Discussion

In the current study, we developed a DL algorithm for predicting therapeutic outcomes after TFESI in patients with radicular pain due to cervical foraminal stenosis based on oblique cervical radiographs. The AUC of the developed DL algorithm, evaluated using validation data, was 0.823 for predicting the outcome of pain reduction at 2 months after TFESI in these patients. Because an AUC of 0.8–0.9 is generally considered excellent, we believe that our DL model trained using oblique cervical radiographs as input data would help pain physicians predict the treatment outcome of cervical TFESI for controlling cervical radicular pain induced by cervical foraminal stenosis [21].

A deep neural network (DNN), the type of network used in DL, is a feedforward neural network with multiple hidden layers between the input and output layers. Each layer includes a variable number of nodes [14, 22, 23]. The nodes in the DNN are connected by internal links. A DNN is trained using backpropagation, an algorithm that propagates prediction errors backward from the output nodes to the input nodes, and it provides greater capacity than traditional shallow neural networks [14, 22, 23].

A CNN is a class of DNN and is one of the most commonly used artificial intelligence models. A CNN automatically and adaptively learns the spatial hierarchy of features via backpropagation using multiple stacked layers, including convolution, pooling, and fully connected layers [12, 13, 22]. It uses multiple channels of two-dimensional image data as input and transforms them repeatedly using convolution and pooling operations, which enables the extraction of meaningful features from the input image data [12, 13, 22]. CNN models are predominantly used in computer vision tasks to process image data and recognize patterns in them [24]. However, as with other DNNs, a limitation of a CNN is that it is difficult to know which factors in the image data are weighted or considered important features. Likewise, we cannot know which features or patterns of the oblique cervical radiographs were weighted most heavily during training, but we believe that our DL model may have detected the narrowed foramen, degeneration of the disc and cervical facet joints, and formation of bony spurs.
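
The convolution and pooling operations mentioned above can be shown on a toy image. This is a minimal pure-Python sketch for illustration only. In a trained CNN the kernel values are learned, whereas here a fixed vertical-edge kernel is assumed, and the "convolution" is implemented as cross-correlation, as is common in DL frameworks.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in cols
        ]
        for i in rows
    ]


# Toy 5 x 5 image with a dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right increases in intensity

feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map)
print(feature_map[0], pooled)
```

The feature map responds only where the edge is located, and pooling keeps that response while halving the spatial resolution. Stacking many such learned filters is what allows a CNN to build a hierarchy of image features.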

To the best of our knowledge, only one study has attempted to find a factor or feature in image data that is associated with the therapeutic outcome of TFESI in patients with cervical radicular pain [3]. In 2017, Kim et al. included 53 patients with cervical radicular pain caused by cervical foraminal stenosis [3]. The patients were divided into two groups according to the degree of foraminal stenosis observed on cervical axial MRI (non-severe foraminal stenosis group, 22 patients; severe foraminal stenosis group, 31 patients). However, the therapeutic effect of TFESI, measured by changes in NRS scores, was not significantly different between the two groups. Kim et al. did not find any anatomical factors in cervical MRI that influenced the therapeutic outcome after TFESI [3]. We believe that the binary classification used by Kim et al. was not sensitive enough to capture the differences among patients in the degree of narrowing of the neural foramen and degeneration of the cervical spine. In contrast, our DL model seems to have recognized differences in the spinal structure shown on the oblique cervical radiograph of each patient.

To the best of our knowledge, two previous studies have reported the prediction of treatment outcomes for spinal radicular pain using a DL model [12, 17]. In 2022, Kim et al. collected whole T2-weighted sagittal lumbar spine MR images from 503 patients with chronic lumbosacral radicular pain [12]. In that study, favorable and poor outcomes were defined as ≥ 50% and < 50% pain reduction at 2 months after TFESI, respectively; the validation accuracy was 76.2%, and the AUC was 0.827. However, that DL model predicted TFESI outcomes in patients with lumbosacral radicular pain rather than cervical radicular pain. In 2023, Wang et al. recruited 288 patients with radicular pain due to cervical foraminal stenosis [17]. They collected a single axial T2-weighted spine MR image from each patient. They also defined ≥ 50% and < 50% pain reduction at 2 months after TFESI as favorable and poor outcomes, respectively. The AUC of their model for predicting the therapeutic outcome of TFESI was 0.801. However, Wang et al. used MR images as input data for developing their algorithm. Therefore, our study is the first to demonstrate the usefulness of a DL model trained using cervical radiographs in predicting the therapeutic outcomes of TFESI for cervical radicular pain.

The advantage of our model is that it can predict therapeutic outcomes using only a single oblique cervical radiograph from each patient, making it convenient for application in clinical settings. However, for clinical use, the predictive capacity of the DL model should be enhanced. Because cervical radiographs can be easily obtained in clinical practice, we believe that, once its predictive accuracy is improved, a DL model using cervical radiographs as input data could be easily and widely used to predict the therapeutic outcomes of TFESI in patients with cervical radicular pain in pain clinics or hospitals. To increase the predictive ability of the DL model, the combined use of image and clinical data as input may be helpful. Additionally, increasing the amount of input image data could enhance the performance of the DL model. In the future, incorporating image data from external hospitals may increase the DL model’s generalizability. Furthermore, expanding the dataset and dividing it into training, validation, and testing subsets would allow more comprehensive training and evaluation and could thereby improve the model’s ability to generalize.

In addition, the misclassifications in the validation data (6 false positives and 11 false negatives) suggest that the model has difficulty with certain cases. Further optimization, which may include modifying the model architecture, adjusting the hyperparameters, or incorporating additional data augmentation techniques, is necessary to address this limitation. Furthermore, in developing the DL model, our study considered only a single outcome: a ≥ 50% or < 50% reduction in the NRS score at 2 months post TFESI compared to the pretreatment NRS score. Using a more diverse set of outcomes as output data could provide more comprehensive clinical information. Finally, the DL model was constructed using retrospective data, which is an inherent limitation of the study design.

Conclusion

The CNN model developed in this study was trained using oblique cervical radiographs. The AUC of the model was 0.823, which can be interpreted as excellent performance in predicting the therapeutic outcome of TFESI in patients with cervical radicular pain due to cervical foraminal stenosis.