Introduction

Crohn’s disease (CD) is a subtype of inflammatory bowel disease (IBD) that results in progressive intestinal damage and disability [1]. Infliximab (IFX), a monoclonal antibody that selectively targets tumor necrosis factor-α (TNF-α), is a mainstay of treatment for CD patients with moderate to severe inflammation and improves mucosal healing and clinical remission [2,3,4]. However, approximately 13–40% of CD patients do not respond to initial IFX therapy (primary nonresponse, PNR) [5, 6], which exposes them to high costs and even the risk of severe side effects and underscores the need for “precision medicine” [7, 8]. Thus, it is necessary to identify in advance the patients who are likely to respond to IFX therapy and to develop prognostic tools for treatment outcomes.

Although several risk factors have been reported to be associated with therapeutic nonresponse [2, 5, 7], conflicting results and a lack of validation still hinder strategies for improving therapeutic outcomes, because the mechanisms of PNR remain unclear. Recently, visceral adipose tissue (VAT), which is involved in the pathogenesis of CD and associated with a more complex disease phenotype [9], has shown a strong association with recurrence and postoperative complications [10, 11]. In addition, evidence indicates that VAT, as a source of proinflammatory substances, is associated with chronic intestinal inflammation [12, 13]. Therefore, VAT might be a useful predictor of IFX response.

Computed tomography (CT) enables noninvasive measurement of VAT [14]. Previous studies have quantitatively analyzed VAT volume metrics to clarify the relationship between adipose tissue and therapeutic response [15]. Furthermore, radiomics can efficiently extract numerous imaging features imperceptible to the naked eye [16] and allows more accurate characterization of bowel lesions in CD patients [17]. Therefore, using CT-based radiomics to extract informative features from both intestinal lesions and VAT may enhance the prediction of pharmacotherapy response.

Given the previously demonstrated value of radiomics and the non-negligible role of VAT in disease progression, we aimed to develop a comprehensive radiomics model (VAT-bowel model) based on pretreatment CT features of VAT and bowel, to compare it with the bowel model alone, and to explore whether VAT features can further improve predictive performance beyond that of the bowel model.

Methods

Patients and study design

In this retrospective study, 231 patients with CD who underwent computed tomography enterography (CTE) before standardized IFX treatment for clinically and/or endoscopically active disease were consecutively recruited between January 2013 and December 2020 at two tertiary IBD centers, under institutional ethics review at both centers: the Sixth Affiliated Hospital of Sun Yat-Sen University (center 1) and the First Affiliated Hospital of Sun Yat-Sen University (center 2).

The inclusion criteria were as follows: (a) CTE performed within 1 month prior to IFX therapy; (b) regular standardized IFX induction therapy (5 mg/kg at weeks 0–2 to 2–6 for induction, with evaluation at week 14); (c) no previous anti-TNF therapy; and (d) a simple endoscopic score for Crohn’s disease (SES-CD) > 3 on standard endoscopy performed within 0.5 months prior to IFX therapy. The exclusion criteria were as follows: (a) poor CTE image quality that hindered analysis; (b) history of enterotomy, which may influence the nature of the radiomic features; (c) lack of posttreatment endoscopic comparison; and (d) poorly defined intestinal wall and VAT due to severe effusion around the lesion.

According to the inclusion and exclusion criteria, 231 patients with CD were included (Supplementary Figure 1). Patients treated at the First Affiliated Hospital of Sun Yat-Sen University between January 2013 and December 2020 were semirandomly allocated to the training cohort and test cohort 1 at a ratio of 7:3 (112 patients:48 patients). Another 71 CD patients who underwent IFX treatment at the Sixth Affiliated Hospital of Sun Yat-Sen University between January 2018 and December 2020 formed test cohort 2.
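The text does not define how the semirandom allocation was done. As a minimal sketch, assuming it was stratified by treatment outcome, a 7:3 split performed within each outcome class reproduces the cohort sizes reported here. The class counts (47 nonresponders and 113 responders among the 160 patients) are taken from the cohort figures given elsewhere in the paper; the function name and seed are illustrative.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, class by class (assumed procedure)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)

# 1 = primary nonresponse (PNR), 0 = primary response (PR)
labels = [1] * 47 + [0] * 113
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 112 48
```

Under this assumption the training cohort contains 33 nonresponders and test cohort 1 contains 14, matching the reported counts; other allocation schemes could give the same totals.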

Definition of primary response and nonresponse to infliximab therapy

Patients with CD in the two centers received standard IFX induction at weeks 0–2 to 2–6 at a dosage of 5 mg/kg. At week 14, clinical symptoms, endoscopy findings, laboratory results, and anti-IFX drug levels were collected from each patient to assess the efficacy of IFX therapy. Additionally, information was collected on whether IFX was used as monotherapy or in combination with an immunomodulator. Patients who did not show satisfactory improvement on global physician assessment and required treatment changes such as dose escalation, corticosteroid addition, agent switch, or surgery were defined as having PNR. All other patients were classified as having a primary response to IFX (PR). In addition, patients categorized as PR were required to show a 50% decrease in SES-CD relative to baseline, as recommended in expert consensus and previously reported clinical trials [18, 19]. Two gastroenterologists (C. Z. and H. Q.) retrospectively evaluated the SES-CD from the report descriptions and endoscopic pictures.

Development and validation of VAT radiomics model

The study flowchart and the radiomics analysis workflow are shown in Fig. 1, which illustrates the development of the VAT model, bowel model, and VAT-bowel model.

Fig. 1 The study flow chart (upper) and the radiomics analysis workflow (lower) (VAT model, radiomics model based on features extracted from visceral adipose tissue; bowel model, radiomics model based on features extracted from the inflamed bowel; VAT-bowel model, a combination of the VAT model and bowel model)

Radiomics features extraction and selection

Given the extensive and intricate distribution of VAT, it was automatically segmented on CT images using a deep learning-based framework called nnU-Net [20], which has been widely used in medical image segmentation tasks. Two radiologists (Z.R. and Y.W.) then reviewed and modified the segmentation results with the open-source software ITK-SNAP (version 3.4.0; https://www.itksnap.org) to determine the final volume of interest (VOI) in this study. The process of VOI segmentation in two cases is shown in Fig. 2, and the details are described in Supplementary materials A1.

Fig. 2 The process of VOI segmentation in two cases, including automatic segmentation and the radiologists’ modification. The Dice similarity coefficients for cases 1 and 2 are 0.971 and 0.957, respectively. The red regions are the automatic segmentation results generated by nnU-Net, while the green regions indicate the areas modified by the radiologists (VOI, volume of interest)
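For reference, the Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch on toy data, not the segmentation pipeline used in the study; masks are represented as flat 0/1 sequences, and the function name is illustrative.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity of two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * overlap / (size_a + size_b)

# Toy example: 8 voxels, automatic mask vs. manually corrected mask
automatic = [0, 1, 1, 1, 1, 0, 0, 1]
corrected = [0, 1, 1, 1, 0, 0, 1, 1]
print(dice_coefficient(automatic, corrected))  # 0.8
```

A value of 1 indicates identical masks, so coefficients such as 0.971 and 0.957 mean that the radiologists changed only a small fraction of the automatically segmented voxels.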

From each VOI on CT images, a total of 1130 radiomics features were extracted in the following categories: shape features, first-order features, texture features, and the corresponding features computed after applying different image filters. Feature extraction was performed in the Python environment (version 3.6; https://www.python.org/) using the PyRadiomics toolkit (version 3.0.1; https://pyradiomics.readthedocs.io/en/latest/index.html). More details are provided in Supplementary materials A2.

The reproducibility of the extracted radiomics features was assessed through inter- and intra-observer analysis with calculation of intraclass correlation coefficients (ICCs). Subsequently, a refinement process was conducted to purify the radiomics features (Supplementary materials A3).

Radiomics model development and validation

Based on the selected features, a binary classification model was built to distinguish between PR and PNR using a support vector machine (SVM) classifier; details are provided in Supplementary materials A4.

The VAT radiomics model was validated on the internal and external validation cohorts (test cohorts 1 and 2, respectively) using the optimal subset of features and model parameters obtained from the training cohort. The final prediction probability for each sample in the validation cohorts was derived by averaging the outputs of all models generated by the leave-one-out cross-validation (LOOCV) strategy during the training phase. The development and validation of the radiomics model were performed in the Python environment (version 3.6; https://www.python.org/) using the scikit-learn toolkit (version 0.19.0; https://scikit-learn.org/stable).
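The averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the study code: a simple nearest-centroid scorer stands in for the SVM so that the example runs without third-party packages, and all function names and toy data are hypothetical. One model is fitted per left-out training sample, and a validation sample receives the mean of all model outputs.

```python
import math

def _distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fit_centroid_scorer(X, y):
    """Stand-in for the SVM: scores a sample by its distances to class centroids."""
    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pr, c_pnr = centroid(0), centroid(1)

    def score(x):
        # Closer to the PNR centroid than to the PR centroid -> score above 0.5
        return 1.0 / (1.0 + math.exp(_distance(x, c_pnr) - _distance(x, c_pr)))
    return score

def fit_loocv_ensemble(X, y):
    """Fit one model per left-out training sample (n samples -> n models)."""
    models = []
    for held_out in range(len(X)):
        X_fold = [x for i, x in enumerate(X) if i != held_out]
        y_fold = [label for i, label in enumerate(y) if i != held_out]
        models.append(fit_centroid_scorer(X_fold, y_fold))
    return models

def ensemble_score(models, x):
    """Final output for one validation sample: mean over all LOOCV models."""
    return sum(model(x) for model in models) / len(models)

# Toy training data: two features per patient, 0 = PR, 1 = PNR
X_train = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.75]]
y_train = [0, 0, 0, 1, 1, 1]
models = fit_loocv_ensemble(X_train, y_train)
p_high = ensemble_score(models, [0.9, 0.9])
p_low = ensemble_score(models, [0.1, 0.1])
```

The stand-in scorer returns an uncalibrated value in (0, 1); in the study the averaged quantity is the SVM output, and the averaged value is then compared with the model's cut-off.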

Development and validation of bowel and VAT-bowel radiomics models

In our previous study [17], a radiomics model based on the entire inflamed bowel in CTE images (bowel model) was established for identification of PNR to IFX therapy. To further improve its performance, radiomics features of VAT were added to the bowel model to construct a comprehensive (VAT-bowel) radiomics model. The construction process of both the bowel and VAT-bowel radiomics models is shown in Fig. 1.

For the development of the bowel model, the extraction and selection of bowel features were consistent with our previous study. For the development of the VAT-bowel model, we combined the selected VAT features and bowel features and further selected among them using the least absolute shrinkage and selection operator (LASSO) to obtain the optimal subset of VAT-bowel features. Both models were built using the SVM classifier and the LOOCV strategy, following the same development and validation process as the VAT model.

Statistical analysis

Evaluation of sample size

The predictive performance of the VAT model was assessed by using receiver operating characteristic (ROC) analysis. The area under the ROC curve (AUC) and the corresponding 95% confidence interval (CI) were calculated.
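As a reminder of what the AUC measures, it equals the probability that a randomly chosen PNR patient receives a higher predicted probability than a randomly chosen PR patient, with ties counted as one half. The sketch below computes it directly from that definition on toy values; it does not reproduce the confidence interval calculation, whose method is not stated here.

```python
def auc(scores, labels):
    """AUC from pairwise comparisons of positive (1) and negative (0) samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = PNR, 0 = PR
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```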

A sample size of 48 patients (34 PR and 14 PNR) was estimated to be required for the LOOCV modeling of IFX therapy prediction. The estimate was calculated with MedCalc Statistical Software (version 15.8; http://www.medcalc.org/), and the underlying conditions are detailed in Supplementary material A5.

Predictive performance comparison of bowel and VAT-bowel radiomics models

To explore whether and to what extent model performance could be improved by incorporating VAT features into our previously developed bowel model, the performance of the bowel model was compared with that of the VAT-bowel radiomics model using the following methods. DeLong’s test was employed to compare the AUCs of the models, while McNemar’s test was used to compare their accuracy. The integrated discrimination improvement (IDI) index was calculated to quantify the incremental predictive utility of the VAT-bowel model over the bowel model. Additionally, the Hosmer-Lemeshow goodness-of-fit test was used to evaluate the calibration of the models. The clinical utility of each model was assessed by decision curve analysis, in which the net benefit was calculated at different threshold probabilities.
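Two of these paired comparisons can be sketched briefly. The study analyses were run in R; the Python below is only an illustration on toy data. It assumes the exact (binomial) form of McNemar's test, since the variant used is not stated, and the usual definition of the IDI as the change in discrimination slope (mean predicted probability in events minus that in non-events). Function names are illustrative.

```python
from math import comb

def mcnemar_exact(correct_first, correct_second):
    """Exact two-sided McNemar test on paired correct/incorrect classifications."""
    b = sum(1 for f, s in zip(correct_first, correct_second) if f and not s)
    c = sum(1 for f, s in zip(correct_first, correct_second) if s and not f)
    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)

def idi(p_old, p_new, labels):
    """Integrated discrimination improvement of the new model over the old one."""
    def slope(probs):
        events = [p for p, y in zip(probs, labels) if y == 1]
        nonevents = [p for p, y in zip(probs, labels) if y == 0]
        return sum(events) / len(events) - sum(nonevents) / len(nonevents)
    return slope(p_new) - slope(p_old)

# Toy paired outcomes: 1 = patient classified correctly by that model
first = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
second = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
b, c, p_value = mcnemar_exact(first, second)
print(b, c, p_value)  # 1 6 0.125

# Toy predicted probabilities: 1 = PNR (event), 0 = PR
toy_idi = idi([0.6, 0.5, 0.4, 0.3], [0.7, 0.6, 0.3, 0.2], [1, 1, 0, 0])
```

In the toy IDI example the discrimination slope rises from 0.2 to 0.4, giving an IDI of about 0.2; a positive IDI means the new model separates events from non-events more widely on average.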

A two-tailed p value less than 0.05 was considered statistically significant. All statistical analyses were performed with R statistical software (version 4.0.4; http://www.r-project.org/).

Results

Patient characteristics

During the follow-up period, 33 patients (33/112, 29.5%) in the training cohort and 36 patients (36/119, 30.3%; 14/48 in test cohort 1 and 22/71 in test cohort 2) in the total test cohort experienced PNR to IFX treatment. Univariate analysis identified four clinical factors, namely body mass index (BMI), C-reactive protein (CRP), hemoglobin (Hb), and albumin (ALB), that differed significantly between the PNR and PR groups in the training cohort (Table 1, p < 0.05).

Table 1 Characteristics of patients in the training and test cohorts

Development and validation of VAT radiomics model

Radiomics features selection and predictive performance validation for the radiomics model

Based on the LASSO algorithm, 12 VAT features were selected and examined in the training and validation cohorts (Supplementary materials A6, Fig. 3). The corresponding coefficients are shown in Table 2.

Fig. 3 Heatmaps generated by unsupervised hierarchical clustering of the 12 selected features of the VAT model in the (a) training cohort and (b) total validation cohort. The feature values are standardized to the range [0, 1] for clarity. Each row of the heatmap is one selected radiomics feature, and each column is a sample (PNR, red; PR, blue). The dendrogram at the top groups samples with similar information as determined by clustering (PNR, primary nonresponse; PR, primary response to infliximab therapy; VAT model, radiomics model based on features extracted from visceral adipose tissue)

Table 2 The selected radiomics features of the VAT radiomics model and the corresponding coefficients

The VAT model alone could distinguish the PNR group from the PR group at a cut-off value of 0.280 (Fig. 4a and Table 3). The AUCs were 0.761 (95% CI, 0.672–0.839) in the training cohort, 0.737 (95% CI, 0.590–0.854) in the internal validation cohort, and 0.714 (95% CI, 0.595–0.815) in the external validation cohort (all p < 0.005; Table 3; Fig. 5a–c). There were no significant differences in the AUCs among the three data cohorts according to DeLong’s test (all p > 0.500). With the Hosmer-Lemeshow test, the χ² values were 10.075 (p = 0.260), 13.76 (p = 0.088), and 2.056 (p = 0.979) in the training, internal validation, and external validation cohorts, respectively (Fig. 5d–f). The VAT model showed a relatively good net benefit across the three data cohorts compared with the all-positive and all-negative curves (Fig. 5g–i).
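The Hosmer-Lemeshow p values can be checked from the χ² statistics. The degrees of freedom are not stated in the text; the sketch below assumes the conventional 8 (10 risk groups minus 2), for which the upper tail of the χ² distribution has a simple closed form. Under that assumption the three reported p values are reproduced.

```python
from math import exp, factorial

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

# Reported statistics for the training, internal, and external cohorts
for statistic in (10.075, 13.76, 2.056):
    print(f"{statistic}: p = {chi2_upper_tail_even_df(statistic, 8):.3f}")
# 10.075: p = 0.260
# 13.76: p = 0.088
# 2.056: p = 0.979
```

A p value above 0.05 indicates no detectable departure of the predicted probabilities from the observed event rates, which is why larger p values are read as adequate calibration.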

Fig. 4 Scatter plots of the predicted probabilities of the (a) VAT model, (b) bowel model, and (c) VAT-bowel model for distinguishing PR from PNR in all data cohorts. The horizontal solid line in each plot indicates the optimal cut-off value of 0.280, 0.190, and 0.268, respectively. The points above the solid line are classified as PNR (primary nonresponse) by the model, while those below the line are classified as PR (primary response) to infliximab therapy. The blue points represent the PR group confirmed by expert assessment, and the red points represent the PNR group (VAT model, radiomics model based on features extracted from visceral adipose tissue; bowel model, radiomics model based on features extracted from the whole inflamed bowel; VAT-bowel model, a combination of the VAT model and bowel model)

Table 3 Predictive performance of radiomics models based on different features in differentiating PNR from PR in the training and validation cohorts

Fig. 5 Predictive performance of the VAT model, bowel model, and VAT-bowel model in the training cohort and the internal and external validation cohorts. Plots in the first row are the ROC curves of the three models and show their performance in distinguishing PNR from PR. Plots in the second row are the calibration curves of the three models in the three data cohorts, while plots in the third row are the corresponding decision curve analyses (VAT model, radiomics model based on features extracted from visceral adipose tissue; bowel model, radiomics model based on features extracted from the whole inflamed bowel; VAT-bowel model, a combination of the VAT model and bowel model; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve; PNR, primary nonresponse; PR, primary response to infliximab therapy)

Development and validation of bowel and VAT-bowel radiomics models

Fourteen radiomics features were finally included in the bowel model (Supplementary materials A7), with a cut-off value of 0.190 (Fig. 4b). The bowel model achieved AUCs of 0.832 (95% CI, 0.750–0.896) in the training cohort, 0.784 (95% CI, 0.641–0.889) in the internal validation cohort, and 0.799 (95% CI, 0.687–0.885) in the external validation cohort (all p < 0.001; Table 3; Fig. 5a–c).

The 12 selected VAT radiomics features and 14 selected bowel radiomics features were combined for the development of the VAT-bowel model, and 22 features with nonzero coefficients were retained from the total of 26 features according to LASSO with an optimal λ value of 0.006 (ln λ = −5.116; Supplementary Figure 2C; Supplementary Table 2 and Fig. 6a and b). The visualized radiomics feature maps (overlaid on CT images) of two important texture features extracted from VAT and bowel from two patients (1 PNR and 1 PR) are shown in Fig. 6c. The VAT-bowel model showed the best predictive power (Fig. 4c), with AUCs of 0.873 (95% CI, 0.797–0.928) in the training cohort, 0.840 (95% CI, 0.706–0.930) in the internal validation cohort, and 0.833 (95% CI, 0.726–0.911) in the external validation cohort (Table 3; Fig. 5a–c). However, no significant differences were found in the AUCs among the three data cohorts according to DeLong’s test (all p > 0.500).

Fig. 6 Heatmaps generated by unsupervised hierarchical clustering of the 22 selected features of the VAT-bowel model in (a) the training cohort and (b) the total validation cohort, and (c) examples of VAT and bowel feature maps overlaid on the CT images of four CD patients. The values of the heatmaps and feature maps are all standardized to the range [0, 1] for clarity. In the heatmaps (a and b), each row is one selected radiomics feature, and each column is a sample (PNR, red; PR, blue); the dendrogram at the top groups samples with similar information as determined by clustering; the white arrows point to a VAT feature named “wavelet-LHH_glszm_LargeAreaLowGrayLevelEmphasis” or a bowel feature named “wavelet-LHL_glszm_LargeAreaEmphasis.” These two representative radiomics features are overlaid on CT images of four patients (cases a and b, with response to IFX therapy, predicted probabilities = 0.199 and 0.174; cases c and d, without response to IFX therapy, predicted probabilities = 0.622 and 0.523; VAT-bowel model’s cut-off value = 0.268) as shown in image (c). Both features differ between patients in the PR and PNR groups, with higher values in the PNR patients (cases c and d), suggesting more complex and coarser texture of VAT and bowel. PNR, primary nonresponse; PR, primary response; IFX, infliximab; VAT, visceral adipose tissue; VAT-bowel model, radiomics model based on features extracted from VAT and the whole inflamed bowel

Predictive performance comparison between bowel and VAT-bowel radiomics models

The VAT-bowel model demonstrated better performance than the bowel model for distinguishing PNR from PR, with higher AUC and accuracy in all data cohorts (Table 4 and Fig. 5). No significant differences were observed between the AUCs of the two models according to DeLong’s test (all p > 0.090). The accuracy of the VAT-bowel model was higher in the training cohort (accuracy = 0.821 vs. 0.732, p = 0.076), internal validation cohort (accuracy = 0.813 vs. 0.708, p = 0.070), and external validation cohort (accuracy = 0.817 vs. 0.690, p = 0.035), although by McNemar’s test the difference reached statistical significance (p < 0.05) only in the external validation cohort. These results suggest good discrimination by the VAT-bowel model.

Table 4 Predictive performance comparison of bowel and VAT-bowel radiomics models in differentiating PNR from PR in the training and validation cohorts

For calibration, both models exhibited good fit in all data cohorts, with p > 0.100 by the Hosmer-Lemeshow test (Table 4). The calibration curves of the VAT-bowel model were closer to the ideal calibration curves than those of the bowel model, as shown in Fig. 5d–f, indicating relatively good calibration of the VAT-bowel model.

In addition, the IDI indices also indicated that the VAT-bowel model improved prediction efficacy compared to the bowel model in the training cohort (IDI = 0.031, p = 0.016), internal validation cohort (IDI = 0.042, p = 0.024), and external validation cohort (IDI = 0.037, p = 0.032).

With regard to the clinical utility, the decision curves (Fig. 5g–i) showed that the VAT-bowel model possessed a slightly better net benefit overall than the bowel model in predicting the outcome of IFX therapy.
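For readers unfamiliar with decision curves, the net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t), where patients with predicted probability of at least p_t are treated as predicted PNR. A minimal sketch on toy data follows; the function names are illustrative and the values are not study data.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of acting on predictions at or above the threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_all_positive(labels, threshold):
    """Reference curve: every patient is treated as predicted PNR."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy example: 1 = PNR, 0 = PR; the all-negative reference is always 0
probs = [0.9, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 0]
nb_model = net_benefit(probs, labels, 0.5)
nb_all = net_benefit_all_positive(labels, 0.5)
```

In this toy case the model's net benefit (about 0.4) exceeds both the all-positive reference (about 0.2) and the all-negative reference (0); a decision curve plots the same comparison across a range of thresholds.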

Discussion

In this study, we demonstrated that a radiomics model (RM) is able to capture the pathophysiological changes occurring in VAT. The VAT radiomics features were associated with the response to IFX and could potentially provide additional information for predicting therapeutic response. Furthermore, we developed a comprehensive radiomics model combining VAT and bowel features. Compared with the bowel RM alone, this integrated model yielded a significant improvement in predictive ability, with IDIs of 0.031 in the training cohort and 0.042 and 0.037 in the two independent validation cohorts, respectively (all p < 0.05).

Current knowledge on the use of VAT in CD remains limited. A number of studies have indicated a positive correlation between VAT and CD activity, as well as the ability of VAT to predict complications, recurrence, and suboptimal response to biologic therapies, although the mechanisms remain elusive [12]. Accumulating evidence suggests that the metabolically active visceral adipose compartment may serve as a source of proinflammatory substances [12, 21]. BMI has been used to assess VAT; however, it is not suitable for CD patients because of malnutrition-induced weight loss [22, 23]. Quantitative VAT analysis with manually outlined contours at several vertebral levels has been used as an indicator for predicting treatment efficacy, but it can fail to capture microstructural information that is important in disease progression [24, 25]. Moreover, the time required to compute the area hinders its clinical applicability.

The mechanism underlying the correlation between adipose tissue and inflammation in CD remains elusive. Studies have shown that adipose tissue plays a crucial role in the production of proinflammatory cytokines such as tumor necrosis factor-α (TNF-α), interleukin-6 (IL-6), and IL-8 (CXCL8) [26]. VAT is believed to establish a responsive immunological region surrounding the inflamed intestine. The balance between the host’s immune system and the commensal microbiota relies heavily on the integrity of the gastrointestinal epithelial barrier. The translocation of bacteria into the mesenteric tissue leads to the development of mesenteric adipose tissue and chronic inflammation, resulting in subsequent mucosal damage and inflammation [26]. High visceral adiposity predicts complicated CD and disease exacerbation [24]. Our previous study also supports the utility of VAT as an indicator for assessing disease severity [9]. Therefore, we hypothesize that VAT in CD, which expresses higher amounts of TNF-α, can also affect the response to infliximab and that this effect may be reflected in imaging features.

The advent of medical artificial intelligence enables the comprehensive extraction of multidimensional information from lesions. Radiomics quantifies image features using voxel values and their interrelationships. In our study, morphological features such as flatness and sphericity provided a quantitative description of the physical appearance of the lesion. It is worth noting that 83.3% (10/12) of the VAT features were wavelet features. The wavelet transform decomposes noise and useful signal into different scales, so that operating on the wavelet coefficients facilitates the distinction between useful signal and noise [27]. On CT imaging, CD patients show VAT texture characteristics that differ from those of healthy individuals. In medical images, quantitative or qualitative changes in texture features often reflect pathological changes in the body. In addition, wavelet features have been reported to be strongly associated with survival in patients with hepatocellular carcinoma and with the biological characteristics of intrahepatic cholangiocarcinoma, and they can also quantify intratumoral heterogeneity [28, 29]. The wavelet transform makes it possible to decompose specific patterns that are not visible to the naked eye and enables the quantification of VAT heterogeneity caused by pathological variation or inflammatory cytokine infiltration [30, 31]. Our study substantiated that the VAT RM can noninvasively capture these pathological changes in VAT in CD and convert them into radiomic features, thereby strengthening the evidence that VAT is relevant to therapeutic efficacy and to the selection of treatment strategy. However, it is challenging to infer the connection between these features and biological differences solely from the data presented in this study, and additional research is needed on the factors underlying variations in radiomics features within adipose tissue.
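To make the wavelet idea concrete, the sketch below applies one level of a one-dimensional Haar transform, which splits a signal into a low-pass (local average) band and a high-pass (local difference) band. The Haar filter is chosen here only for brevity and is an assumption; the wavelet family used for feature extraction is not stated in this section. In PyRadiomics feature names such as "wavelet-LHH", each letter indicates whether the low-pass (L) or high-pass (H) filter was applied along one of the three image axes.

```python
import math

def haar_step(signal):
    """One level of the 1D Haar transform: returns (low-pass, high-pass) bands."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return low, high

# Toy intensity profile: a flat pair followed by a pair with a sharp change
low, high = haar_step([4.0, 4.0, 2.0, 6.0])
```

The flat pair produces a zero high-pass coefficient while the abrupt change produces a large one, which is how high-pass sub-bands emphasize fine texture; the transform also preserves the total signal energy across the two bands.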

Consistent with our previous study [17], the bowel RM served as a promising tool for tailoring treatment strategy in CD patients, with satisfactory predictive performance and robustness, although the peri-lesional microenvironment, such as VAT, was not considered [17]. In our study, the combined model consisting of VAT and bowel radiomics features outperformed the bowel RM alone in identifying CD patients at high risk of PNR to IFX in both the training and test cohorts. From a statistical point of view, although the lack of a significant improvement in AUCs suggests that the overall performances of the two models are roughly equivalent, it is worth noting that the accuracies of the VAT-bowel RM tend to be higher at the chosen threshold, and the IDIs of the integrated model are statistically significant. Moreover, the VAT-bowel RM exhibits better goodness of fit and overall has a slightly better net benefit than the bowel RM alone. Our study showed that the comprehensive model was superior to the bowel RM alone and could provide more information for estimating the probability of PNR before treatment in patients who are scheduled to receive IFX.

This study had several limitations. First, although magnetic resonance enterography (MRE) is a preferred examination for CD patients because it is radiation-free and can provide more biological information [32], we used CTE rather than MRE to develop the radiomic signature. In future prospective studies, however, a CT-based radiomics framework may facilitate the development of artificial intelligence models in the field of MR through transfer learning. Second, the radiomics signatures were extracted from single-phase CT images, which underutilizes the information in the CT images. We will integrate the radiomic information from plain, arterial, and venous CT images in further research. Lastly, the sample size in this study is still limited. Multicenter validation with a larger sample of patients is essential to obtain higher-level evidence for future clinical applications.

In conclusion, VAT is informative for predicting IFX treatment response and improves the identification of CD patients at high risk of primary nonresponse to IFX therapy. We developed a CT-based radiomics model (RM) that combines VAT and bowel features to differentiate nonresponders from responders to IFX treatment. Our results suggest that the comprehensive RM captures the pathological changes occurring in VAT and bowel lesions, which could help to identify, at the beginning of therapy, CD patients who will not respond to IFX.