Introduction

Crohn’s disease (CD) is a chronic, recurrent inflammatory bowel disease (IBD) whose global incidence has been increasing, resulting in a substantial economic burden [1, 2]. Progressive inflammation may lead to serious complications such as strictures, perforation, or fistulas that require surgery [3]. Continuous monitoring of the treatment response is therefore important for adjusting medication and guiding clinical treatment decisions [4].

The Simple Endoscopic Score for Crohn’s Disease (SES-CD) is the most mature quantitative scoring system for CD activity; however, it is limited by its invasiveness, the risk of serious complications, and the difficulty of evaluating segments with severe strictures [5, 6]. CT, MR, and intestinal ultrasound have become established noninvasive methods for assessing CD because they can visualise the entire intestine [7,8,9]. Several scoring systems based on these cross-sectional imaging methods have been developed, such as the magnetic resonance activity index, the Clermont score, and the London score, which rely mainly on qualitative parameters [10,11,12]. However, their clinical practicality has been limited by poor interobserver agreement and complex calculation methods.

Dual-energy CT enterography (DECTE) is an imaging technique that acquires data at two different energy levels. In recent years it has been widely used in vascular imaging, tumour differentiation, and prognosis, and it is now gradually being applied to IBD [13,14,15]. Dane et al assessed inflammatory activity using the iodine concentration (IC) from DECTE and compared it with the results of pathological analysis, indicating that the IC can serve as a radiological marker of CD activity [16]. In addition, some studies have found that the IC, normalised iodine concentration (NIC), and slope of the energy spectrum curve (λHU) are related to CD activity [17,18,19]. DECTE images at 40 kiloelectron volts (keV) help to distinguish wall enhancement between normal and diseased intestinal segments and can improve image quality [20]. Currently, most studies use DECTE to evaluate activity by comparing it with the Crohn’s disease activity index (CDAI) or pathological analysis; only a few studies have compared it with the SES-CD score, finding that NIC and λHU help differentiate CD activity [21, 22]. This study aimed to establish a machine learning (ML) model based on DECTE to noninvasively evaluate CD activity, with the SES-CD score as the gold standard.

Materials and methods

Ethics

This retrospective study was approved by the Ethics Committee of Chongqing General Hospital and exempted from the requirement for written informed consent from patients.

Patients

Patients with suspected CD who underwent DECTE at Chongqing General Hospital from July 2021 to 2023 were considered for inclusion. The inclusion criteria were as follows: (1) age 18–65 years, (2) confirmed Crohn’s disease, and (3) a segmental endoscopic score obtained within ± 14 days of the DECTE scan. The exclusion criteria were as follows: (1) no DECTE images available at the postprocessing workstation, (2) poor image quality, such as an intestinal wall that was too thin or poorly distended, and (3) incomplete clinical data. Clinical data included patient age, sex, disease course, disease behaviour, drug treatment (traditional or biological therapy), C-reactive protein (CRP), albumin (ALB), and erythrocyte sedimentation rate (ESR). Finally, 46 CD patients were included, and a total of 202 segments were evaluated. Using stratified randomisation, the data were divided into a training set (N = 141) and a testing set (N = 61) at a 7:3 ratio. The flow chart of the study population is shown in Fig. 1.
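The 7:3 division into training and testing sets can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `stratified_split`, the random seed, the rounding rule, and the choice of segment activity as the stratification variable are all assumptions. Only the segment counts (202 in total, of which 110 were active and 92 inactive according to the Results) come from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test lists, keeping class proportions similar.

    `labels` holds one class label per segment. The seed and the rounding
    rule are illustrative assumptions, not details reported in the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical label vector matching the reported counts:
# 110 active (1) and 92 inactive (0) segments.
labels = [1] * 110 + [0] * 92
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class separately keeps the proportion of active segments nearly identical in both sets, which is the purpose of a stratified split.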

Fig. 1

Flow diagram of the study population

Endoscopic evaluation

One gastroenterologist who was blinded to the clinical data of each patient performed the endoscopy, divided the bowel into five segments (the ileum, right colon, transverse colon, left colon, and rectum), and assessed inflammation in each according to the SES-CD criteria [23]. A total score of 0–12 points was then calculated for each segment. SES-CD < 3 was considered to indicate inactive disease, and SES-CD ≥ 3 was considered to indicate active disease.
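The segment-level labelling rule can be written down in a few lines. The sketch below is illustrative only: the function name `label_segments` and the example scores are hypothetical, while the five segments, the 0–12 score range, and the cutoff of 3 are taken from the text.

```python
SEGMENTS = ("ileum", "right colon", "transverse colon", "left colon", "rectum")
ACTIVE_CUTOFF = 3  # SES-CD >= 3 is treated as active, as stated in the text


def label_segments(segment_scores):
    """Map each segment's SES-CD score (0-12) to 'active' or 'inactive'."""
    labels = {}
    for segment, score in segment_scores.items():
        if segment not in SEGMENTS:
            raise ValueError(f"unknown segment: {segment}")
        if not 0 <= score <= 12:
            raise ValueError(f"score out of range for {segment}: {score}")
        labels[segment] = "active" if score >= ACTIVE_CUTOFF else "inactive"
    return labels


# Hypothetical patient; the scores are made up for illustration.
example_scores = {"ileum": 7, "right colon": 2, "rectum": 0}
example_labels = label_segments(example_scores)
print(example_labels)
```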

CT scanning

All patients fasted for 8 h before the CTE examination and took 2000 mL of 2.5% mannitol solution orally 1 h before the examination. All patients were scanned with a dual-energy CT scanner (IQon Spectral CT, Philips Healthcare, China). The tube voltage was fixed at 120 kVp, with a tube current of 145 mAs, pitch of 1.2, rotation time of 0.5 s, and reconstructed slice thickness of 1.00 mm. The contrast agent (ioversol, 350 mg(I)/mL, Jiangsu Hengrui Medicine Co., Ltd., China) was administered via a peripheral vein at 1.5 mL/kg and a rate of 3.0 mL/s with a high-pressure injector. Scan timing was determined by bolus tracking: monitoring of the abdominal aorta began 10 s after contrast injection, and scanning was triggered automatically when attenuation reached the threshold of 150 HU. The arterial and venous phases of DECTE were acquired approximately 30 s and 80 s after injection of the contrast agent, respectively. Dual-energy images were reconstructed at a Philips postprocessing workstation, yielding conventional 120 kVp images, water and iodine material-decomposition images, and monochromatic images in the energy range of 40 to 120 keV.

Image processing and evaluation

Two radiologists, each with more than 5 years of experience in diagnostic abdominal imaging and blinded to the patient information, reviewed the conventional CT and DECTE images. The images were evaluated and analysed using the Philips postprocessing workstation. A circular region of interest (ROI) was placed to cover as much of the most strongly enhancing part of the lesion in the intestinal wall as possible, with a minimum area of 5 mm² (range: 5–10 mm²), and the IC in an artery on the same slice (abdominal aorta or iliac artery) was also measured. The NIC was then calculated as the IC in the affected intestinal segment divided by the IC in the artery on the same slice, and λHU was calculated as (HU at 40 keV − HU at 100 keV)/60 [24]. The thickness of the intestinal wall, segmental mural hyperenhancement, strictures, upstream dilation, comb sign, fibrofatty proliferation, inflammation, and regional lymph node size were assessed on conventional CT images. To ensure consistency, all measurements were taken three times at different locations on the same slice, and the average value was calculated. Quantitative parameters are reported as the arithmetic mean of the values obtained by the two radiologists. When qualitative assessments were inconsistent, disagreements were resolved by consensus [25]. The interobserver consistency analysis is shown in Supplementary Tables 1 and 2.
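The two derived parameters and the averaging scheme can be illustrated with a short sketch. The function names and all numerical readings below are hypothetical and are not data from the study; only the two formulas and the averaging over three ROIs and two readers follow the description above.

```python
from statistics import mean


def normalised_iodine_concentration(ic_bowel, ic_artery):
    """NIC = IC in the bowel wall / IC in an artery on the same slice."""
    if ic_artery <= 0:
        raise ValueError("arterial iodine concentration must be positive")
    return ic_bowel / ic_artery


def spectral_slope(hu_40kev, hu_100kev):
    """Slope of the spectral curve: (HU at 40 keV - HU at 100 keV) / 60."""
    return (hu_40kev - hu_100kev) / 60


def reader_value(measurements):
    """Average the three ROI measurements taken by one reader."""
    return mean(measurements)


# Hypothetical ROI readings (three per reader); not data from the study.
reader_a_ic = [2.0, 2.2, 2.4]  # bowel-wall IC, reader A
reader_b_ic = [2.4, 2.2, 2.6]  # bowel-wall IC, reader B
artery_ic = 11.5               # IC in the aorta on the same slice

# Final value: arithmetic mean of the two readers' averaged measurements.
bowel_ic = mean([reader_value(reader_a_ic), reader_value(reader_b_ic)])
nic = normalised_iodine_concentration(bowel_ic, artery_ic)
slope = spectral_slope(hu_40kev=220.0, hu_100kev=70.0)
print(round(nic, 3), slope)
```

The divisor 60 in the slope is simply the energy interval between the two monochromatic levels (100 keV − 40 keV), so λHU has units of HU per keV.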

Model construction

In the training set, clinical, conventional imaging, and quantitative DECTE parameters were compared between active and inactive intestinal segments. Features with p < 0.05 were then screened using least absolute shrinkage and selection operator (LASSO) regression combined with 10-fold cross-validation. Using a logistic regression algorithm, we developed three ML models based on conventional image features (model 1), DECTE parameters (model 2), and all significant parameters (model 3), and tuned their parameters using 5-fold cross-validation. Finally, model performance was evaluated in the testing set.
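The partitioning step behind k-fold cross-validation can be sketched as follows. This shows only how the training segments would be divided into folds; the LASSO and logistic regression fitting inside each fold is not shown. The helper name `k_fold_indices` and the seed are assumptions, and the training-set size of 141 is taken from the text.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation.

    Each sample lands in exactly one validation fold; the remaining k - 1
    folds form the corresponding training portion.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(validation)


# 141 training segments (from the text), split into 5 folds for tuning.
splits = list(k_fold_indices(n_samples=141, k=5))
fold_sizes = [len(validation) for _, validation in splits]
print(fold_sizes)
```

A candidate hyperparameter value would be scored on each validation fold in turn and the scores averaged, so that tuning never touches the held-out testing set.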

Statistical analysis

Python 3.8 and SPSS 23.0 were used to analyse the data. p < 0.05 was considered statistically significant. Continuous variables conforming to a normal distribution are expressed as the mean ± standard deviation (SD) and were compared between groups using Student’s t test. Continuous variables that did not conform to a normal distribution are presented as medians and interquartile ranges and were compared using the Mann-Whitney U test. Categorical data are presented as frequencies (percentages) and were compared using the chi-square test or Fisher’s exact test. Receiver operating characteristic (ROC) curves, calibration curves, and decision curves were used to evaluate model performance.

Results

Clinical findings

The demographic and clinical characteristics of the participants are shown in Table 1. This study included 46 CD patients (19 males and 27 females) with a median age of 27.50 years [interquartile range: 23.00, 33.00]. A total of 202 intestinal segments were included: 110 segments were active, and 92 segments were inactive. Table 2 shows the distribution of variables in the training and testing sets; there were no significant differences between the two sets (p > 0.05).

Table 1 Patient characteristics
Table 2 Comparison of baseline characteristics between the training set and the testing set

Diagnostic performance of DECTE parameters

In the total sample, all DECTE parameters were higher in the active intestinal segments than in the inactive intestinal segments (p < 0.001). As shown in Table 3, all DECTE parameters performed well in evaluating CD activity (area under the curve (AUC) > 0.75). λHU in the venous phase (λHU-V) performed best, with an AUC of 0.81. At a cutoff of λHU-V ≥ 1.975, its sensitivity and specificity for diagnosing active intestinal segments were 0.800 and 0.783, respectively. The corresponding ROC curves are shown in Fig. 2a. Typical images of inactive and active segments are shown in Figs. 3 and 4, respectively.
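How a single-parameter cutoff yields sensitivity, specificity, and an AUC can be sketched as below. The ten λHU-V values and their activity labels are hypothetical and do not reproduce the study's figures; only the cutoff of 1.975 comes from the text, and the function names are illustrative.

```python
def sensitivity_specificity(values, is_active, cutoff):
    """Sensitivity and specificity when `value >= cutoff` is called active."""
    tp = sum(1 for v, a in zip(values, is_active) if a and v >= cutoff)
    fn = sum(1 for v, a in zip(values, is_active) if a and v < cutoff)
    tn = sum(1 for v, a in zip(values, is_active) if not a and v < cutoff)
    fp = sum(1 for v, a in zip(values, is_active) if not a and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(values, is_active):
    """AUC as the probability that an active segment outranks an inactive one.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    active = [v for v, a in zip(values, is_active) if a]
    inactive = [v for v, a in zip(values, is_active) if not a]
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in active
        for y in inactive
    )
    return wins / (len(active) * len(inactive))


# Hypothetical venous-phase slope values for ten segments; not study data.
lambda_hu_v = [2.6, 2.3, 2.1, 1.8, 3.0, 1.2, 1.5, 2.0, 1.9, 1.0]
active = [True, True, True, True, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(lambda_hu_v, active, cutoff=1.975)
area = auc(lambda_hu_v, active)
print(sens, spec, area)
```

The pairwise definition makes clear that the AUC does not depend on any particular cutoff, whereas sensitivity and specificity describe performance at one chosen cutoff.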

Table 3 Evaluation performance of all spectral parameters
Fig. 2

Performance of the single DECTE parameters and the machine learning models. Receiver operating characteristic curves of the single parameters in the whole sample (a). Receiver operating characteristic curves of the machine learning models in the testing set (b). Calibration curves for the three models in the testing set (c). Decision curve analysis for the three models in the testing set (d)

Fig. 3

Dual-energy CT examination in a 40-year-old female patient with a rectal SES-CD score of 0. Endoscopic image of the rectum (a), iodine concentration in the arterial phase (b), iodine concentration in the venous phase (c), Z-effective in the arterial phase (d), Z-effective in the venous phase (e), slope of the energy spectrum curve in the arterial phase (f), slope of the energy spectrum curve in the venous phase (g)

Fig. 4

Dual-energy CT examination in a 20-year-old female patient with a left colon SES-CD score of 6. Endoscopic image of the left colon (a), iodine concentration in the arterial phase (b), iodine concentration in the venous phase (c), Z-effective in the arterial phase (d), Z-effective in the venous phase (e), slope of the energy spectrum curve in the arterial phase (f), slope of the energy spectrum curve in the venous phase (g)

Model variable selection

A comparison between the active and inactive segment groups in the training set is shown in Table 4; all variables differed significantly (p < 0.05) except for age, sex, upstream dilation, and engorged vasa recta. The LASSO algorithm combined with 10-fold cross-validation was used to further screen the features, with λ chosen at the minimum mean squared error (λ = 0.032). Finally, three DECTE parameters and four radiographic features were selected: λHU in the arterial phase (λHU-A), λHU-V, NIC in the arterial phase, wall thickness, stricture, segmental mural hyperenhancement, and regional lymph node size (Fig. 5).

Table 4 Differences between active and inactive segments in the training set
Fig. 5

LASSO feature screening. LASSO coefficient profiles of the candidate features (a). The coefficient profile plot was generated against the log λ sequence; at the log λ value selected by tenfold cross-validation, seven features were retained. Selection of the tuning parameter (λ) in the LASSO model using tenfold cross-validation with the minimum criterion (b). The optimal λ values are indicated by the vertical black lines, and a λ value of 0.032 was selected

Diagnostic performance of the machine learning model

The performance indicators of each model in the testing set are shown in Table 5; models 2 and 3 had a more balanced overall performance. All three ML models performed well in evaluating CD activity (AUC > 0.80), with the combined model (model 3) having the highest AUC of 0.87 (95% confidence interval (CI): 0.779–0.959) (Fig. 2b). However, the DeLong test showed no statistically significant difference in AUC among the three models in the testing set (p values for the comparisons ranged from approximately 0.071 to 0.766), as detailed in Supplementary Table 3. The calibration curves of the three models lay close to the diagonal, indicating good agreement between predicted and observed probabilities (Fig. 2c). Decision curve analysis showed that, within a threshold probability range of approximately 10% to 90%, the net benefits of the three models were higher than those of the treat-all and treat-none strategies, indicating that all three models provided a clinical net benefit within this range. The net benefits of models 2 and 3 were higher than that of model 1 in the threshold probability range of approximately 38% to 82% (Fig. 2d).
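The net benefit plotted in a decision curve can be computed as in the sketch below, which uses the standard definition: true positives per sample minus false positives per sample weighted by the odds of the threshold probability. The predicted probabilities and labels are hypothetical, the function names are illustrative, and this is not the authors' implementation.

```python
def net_benefit(probabilities, is_active, threshold):
    """Net benefit of calling a segment active when its predicted
    probability is at least `threshold`:
    TP/N - FP/N * threshold / (1 - threshold).
    """
    n = len(probabilities)
    pairs = list(zip(probabilities, is_active))
    tp = sum(1 for p, a in pairs if p >= threshold and a)
    fp = sum(1 for p, a in pairs if p >= threshold and not a)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(is_active, threshold):
    """Net benefit when every segment is called active."""
    prevalence = sum(is_active) / len(is_active)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted probabilities for eight segments; not study data.
predicted = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
truth = [True, True, True, True, False, False, False, False]

for pt in (0.25, 0.5, 0.75):
    model_nb = net_benefit(predicted, truth, pt)
    all_nb = net_benefit_treat_all(truth, pt)
    # The treat-none strategy has a net benefit of 0 at every threshold.
    print(pt, round(model_nb, 3), round(all_nb, 3), 0.0)
```

A model is considered useful at a given threshold probability when its net benefit exceeds both reference strategies, which is the comparison reported for the three models.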

Table 5 Performance of the three models in the testing set

Discussion

This study explored the value of DECTE quantitative parameters in evaluating CD activity and developed ML models for evaluating inflammation in CD patients, including a conventional CT model, a DECTE model, and a combined model. Although there was no significant difference in the AUC among the three models, the DECTE model and the combined model were more balanced in overall performance than the conventional CT model and exhibited better diagnostic performance than individual DECTE quantitative parameters alone.

Among the selected variables, wall thickness and segmental mural hyperenhancement have previously been identified as characteristic parameters in the conventional CT evaluation of CD activity [26, 27]. Strictures have also been reported to distinguish CD activity [28], and our findings confirm this. Previous studies have shown that lymph node enlargement (length ≥ 1 cm) can be considered a sign of active CD, although it is more prominent in severely active intestinal segments [27, 29]. We found that regional lymph node size differed significantly between active and inactive intestinal segments, which may reflect reactive hyperplasia of mesenteric lymph nodes caused by CD activity. This result contradicts the conclusion of Amir [30], who found no significant difference in the size of regional lymph nodes (diameter > 3 mm) between the active and inactive groups. We believe there are two possible reasons for this discrepancy. First, the reference standards for defining activity differed: clinical activity scores are nonspecific and can neither represent a particular inflamed segment nor clarify the contribution of the affected segment. Second, the lymph nodes included in that study were small, which may have led to the lack of statistical significance. Although ulceration is an important parameter of CD activity, this study did not evaluate it because CT has lower soft-tissue resolution than MR enterography (MRE).

Zhu et al [21] and Dane et al [16] suggested that NIC is a radiological marker for differentiating active and inactive bowel segments defined by the SES-CD, which is consistent with the results of our study. We speculate that this finding reflects inflammatory congestion, inflammatory cell infiltration, and noncaseating granulomas in CD patients. λHU represents the attenuation changes within the lesion during the passage of the contrast agent. We found that λHU-V and λHU in the arterial phase (λHU-A) were significantly correlated with CD activity, consistent with previous studies [22, 31], suggesting that contrast uptake increases as vascularity increases during the active phase of CD. In our results, λHU-V had better diagnostic efficacy than λHU-A (AUC 0.81 vs 0.79), which is consistent with previous research [31, 32]. A possible explanation is that, although the vasa recta dilate and proliferate during active inflammation, arterial-phase imaging is too early for the contrast agent to have fully entered the lesion, whereas in the venous phase the lesion is fully opacified. In addition, when the contrast agent leaks into the extravascular space, the interstitial fibrous tissue can slow its washout.

Machine learning is a subset of artificial intelligence. Using feature selection to reduce the dimensionality of the data and tuning the hyperparameters can produce a more powerful and generalisable ML model. In recent years, ML in IBD has mainly been used for phenotype diagnosis, gene classification of gut microbiota, and prediction of postoperative recurrence [33,34,35]. Few studies have constructed ML models to evaluate CD activity and severity. Recently, all of the ML models constructed by Cai et al [36] performed well in predicting CD activity in their test sets. Their study used the CDAI score as the grouping criterion, whereas we used the SES-CD, which reflects the activity of the affected intestinal segment more directly than the CDAI. Guez et al [37] established a multimodal ML model to evaluate endoscopic CD activity by integrating MR information and biochemical indicators; the length of the diseased intestinal segments and the biochemical indicators were the most informative parameters. In summary, previous research indicates the potential of ML to assess intestinal activity accurately and noninvasively, and our research supports this. Establishing an ML model with DECTE provides a new method for the noninvasive quantitative evaluation of CD activity that does not require complex calculations and uses parameters that are intuitive and easily acquired. In addition, DECTE scans can reduce scan duration and radiation exposure because of their unique hardware design [38]. The model in this study follows the way gastroenterologists evaluate the disease in specific intestinal segments, reveals the contribution of different features to the activity of diseased intestinal segments, and provides a practical tool to support clinical diagnosis and treatment decision-making. In addition, compared with traditional statistical methods, machine learning models can usually be rigorously validated.

This study has several limitations. First, we used manually drawn ROIs to measure DECTE parameters; in the future, semiautomatic or fully automated methods should be developed to ensure measurement accuracy. Second, our study did not evaluate the correlation between biochemical biomarkers and CD activity, as the relative contribution of each inflamed segment to the overall biochemical biomarkers (such as CRP and ESR) is unknown. Third, false positive results are a concern in the model; increasing the sample size and using multiple algorithms may help to reduce the false positive rate and improve diagnostic accuracy in the future. Fourth, deep learning, a branch of machine learning and a major trend in artificial intelligence, was not explored here because of sample size and time constraints; we will explore its application value in Crohn’s disease in the future. Finally, this was a single-centre study whose conclusions require additional validation with multi-centre data before clinical application.

Conclusion

Our machine learning model based on DECTE is a feasible means of evaluating intestinal segment activity in CD patients, and the DECTE parameters provide a quantitative basis for evaluating the activity of specific intestinal segments.