Magnetic resonance imaging (MRI) has gained a strong role in the evaluation of luminal Crohn’s disease and is the preferred modality for evaluation of small bowel disease [1]. The terminal ileum is the most common location of small bowel Crohn’s disease and can be visualized by both MRI and ileocolonoscopy. A typical MRI protocol for Crohn’s disease evaluation includes a non-enhanced T2-weighted sequence with fat suppression and T1-weighted sequences before and after intravenous administration of a gadolinium chelate contrast agent [2, 3]. MRI features such as the degree and pattern of bowel wall enhancement after intravenous contrast have been shown to be linked to inflammation as assessed by endoscopic and histopathologic reference standards [4,5,6].

Diffusion-weighted (DW)-MRI has been used for large organs, such as the brain and liver, where quantitative measurements can be made with relative ease. However, the sensitivity of DW-MRI to motion artefacts has limited its applications for small bowel diseases [7]. Despite these difficulties, technical improvements and recent positive results of DW-MRI have encouraged new investigations into small bowel applications [8].

A recent study by Kim et al. found that the addition of DW sequences to contrast-enhanced (CE-)MRI did not provide a substantial benefit in terms of diagnostic accuracy. Although sensitivity was increased (62% to 83%), the added detection concerned mainly mild disease with doubtful clinical relevance, while a decrease of specificity was seen (94% to 60%) [9]. However, a different study by Qi et al., using capsule endoscopy as the reference standard, did find an improvement of diagnostic accuracy (79% to 92%) when DWI was added to CE-MRI [10]. A study by Seo et al. included 44 patients of the same cohort used by Kim et al. and focused on substitution of CE sequences with DW sequences in a conventional MRI protocol [11]. DW-MRI and CE-MRI showed 91.8% agreement for dichotomous classification of segments (inflammation or no inflammation) and comparable correlation to the Crohn’s Disease Endoscopic Index of Severity (CDEIS) (r = 0.61 and 0.71, respectively; P = 0.11). A study by Schmid-Tannwald et al. included 14 patients with internal fistulas and sinus tracts of different etiologies. They found no significant difference in the detection rate of fistulas and sinus tracts between CE-MRI (96%) and DW-MRI (76%) [12].

The effort to replace the use of intravenous contrast medium is motivated by the occurrence of side-effects, mainly nephrogenic systemic fibrosis, and the need to avoid contrast medium in certain groups of patients, such as children and pregnant women [13]. It should be noted that almost all reported cases of nephrogenic systemic fibrosis have occurred with linear gadolinium agents in patients with end-stage kidney disease [14]. Recent research has also raised concerns over gadolinium deposition in intracranial neuronal tissue, which was found to be dose-dependent but unrelated to renal function [15]. However, the clinical implications of these findings are as yet unclear. Although there are medical and financial motivations to reduce the use of gadolinium contrast media, the benefits of replacement should be thoroughly investigated to justify the omission of the well-established contrast-enhanced sequences. For a comprehensive comparison, different aspects of image assessment should be considered, such as image quality and evaluability, diagnosis of active disease, severity grading, and interobserver agreement. Furthermore, different combinations of scan sequences could lead to differences in performance and should be evaluated.

The purpose of this study was to determine the diagnostic and grading performance of CE-MRI, DW-MRI and combined protocols, for disease activity of the terminal ileum in Crohn’s disease patients.

Methods and materials


Between October 2011 and September 2014, patients with known or suspected Crohn’s disease were prospectively recruited as part of the VIGOR++ project (FP7/2007-2013, 270379). The full cohort of 106 patients has been previously published, in a study investigating the use of semi-automatic MRI measurements in Crohn’s disease patients [16]. For the current study, MRI examinations from a single centre (Amsterdam UMC, location AMC, Amsterdam, the Netherlands) were re-examined. Each patient underwent MRI and ileocolonoscopy within 2 weeks as part of their clinical follow-up. Patients with no endoscopic intubation of the terminal ileum or with missing essential MRI sequences were excluded from the analysis. Ethical permission was obtained from the hospital’s medical ethics committee and all included patients gave written informed consent.

MRI protocol

Patients fasted for 4 h prior to the examination. Oral contrast medium consisted of 2400 mL 2.5% mannitol solution (Baxter, Utrecht, the Netherlands) split in two doses: 800 mL (3 h before examination) and 1600 mL (1 h before examination). Patients were scanned on a 3.0-Tesla MRI unit (Intera/Ingenia; Philips Healthcare, Best, the Netherlands) in the supine position using the protocol outlined in supplementary Table 1. A coronal balanced gradient-echo sequence was performed, followed by coronal and axial T2-weighted single-shot fast spin echo (SSFSE) sequences without fat suppression and an axial T2-weighted SSFSE sequence with fat suppression. An axial free-breathing DWI sequence (with b-values 0, 300, and 600 s/mm²) was used for apparent diffusion coefficient (ADC) mapping. A coronal 3D T1-weighted spoiled gradient echo (SPGE) sequence with fat suppression was performed before injection of intravenous gadolinium contrast. A dynamic contrast-enhanced (DCE) coronal 3D T1-weighted SPGE sequence was then performed. Sixty seconds after the start of DCE-MRI, 0.1 mL/kg body weight of gadobutrol (Gadovist 1.0 mmol/mL, Bayer Schering Pharma, Berlin, Germany) was administered intravenously by bolus injection (5 mL/s). Subsequently, coronal and axial 3D T1-weighted SPGE sequences with fat suppression were performed in the delayed phase (approximately 7 min after injection). To achieve spasmolysis, 10 mg of butylscopolamine bromide (Buscopan, Boehringer Ingelheim, Germany) was administered intravenously three times at even intervals during the examination (supplementary Table 1).
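The ADC mapping mentioned above can be illustrated with a minimal sketch. It assumes a mono-exponential signal decay, S(b) = S0 · exp(−b · ADC), fitted by log-linear least squares over the three b-values; this is a common convention and not necessarily the scanner's or the authors' implementation. The function name `fit_adc` and the synthetic voxel signals are hypothetical.

```python
import math

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    b_values are in s/mm^2, signals in arbitrary units (must be > 0).
    Returns (adc in mm^2/s, s0).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching b-values and signals")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    slope = sxy / sxx                      # slope of ln(S) versus b equals -ADC
    intercept = mean_log - slope * mean_b  # intercept equals ln(S0)
    return -slope, math.exp(intercept)

# Hypothetical voxel: synthetic signals generated with ADC = 1.5e-3 mm^2/s
b_values = [0, 300, 600]
true_adc = 1.5e-3
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
adc, s0 = fit_adc(b_values, signals)
```

On noise-free synthetic data the fit recovers the generating ADC; with real data, noise and bowel motion would make the estimate less stable.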


Ileocolonoscopy was performed using standard bowel preparation and equipment (model CF-160L, Olympus) within 2 weeks of the MRI examination by either a gastroenterologist or senior resident under direct supervision of a gastroenterologist. The endoscopist was blinded to results from MRI, with the exception of cases where balloon-dilatation was indicated. For those cases, the stricture length on MRI was used to determine the feasibility of balloon-dilatation. The segmental Crohn’s Disease Endoscopic Index of Severity (CDEIS) was calculated for all endoscopically intubated terminal ileum segments [17].

Image analysis

Three abdominal radiologists (C.Y.N., K.H., B.M.) with respectively 18, 8, and 11 years of experience in IBD imaging evaluated each case at two different time points.

Cases were initially evaluated using either CE-MRI or DW-MRI. Balanced GE and T2-weighted SSFSE (with and without fat suppression) were included in both cases. Directly after evaluation with the initial protocol, the omitted sequences were added to form a combined protocol (CE/DW-MRI or DW/CE-MRI, depending on the initial protocol).

Cases were randomly assigned an initial evaluation protocol at the first time point and the assignment was reversed at the second time point. To reduce memory bias, an interval of 6 weeks was used between the first and second time point, and case numbering and order were again randomized.

Imaging sequences were separately graded for quality (0: non-diagnostic; 1: diagnostic, numerous artefacts; 2: diagnostic, few artefacts; 3: diagnostic, no artefacts). Disease activity was graded using the MRI features based on T1-, T2-, and diffusion-weighted sequences and the grading criteria presented in Table 1 (see Figs. 1, 2 for example cases). Active disease was defined as a grade > 0 on one or more disease features. Using 11-point Likert scales, readers graded overall severity (0: no disease; 10: very severe disease) and their confidence for grading (0: not confident; 10: fully confident). After initial evaluation, the omitted sequences were added and the following features were once more evaluated: T1 enhancement and pattern or DWI mural signal (depending on the added sequences), stenosis, comb sign, fistulas, abscess and enlarged lymph nodes. Optionally, overall severity grading and level of confidence could be changed. Comb sign, enlarged lymph nodes, fistula, and abscess were deemed present when found by at least two of three readers.
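The two decision rules described above (active disease as any feature grade above 0, and the two-of-three reader rule for extramural findings) can be written as a short sketch. The function names, feature keys, and example grades are hypothetical and serve only as illustration.

```python
def is_active(feature_grades):
    """Active disease: a grade > 0 on one or more disease features."""
    return any(grade > 0 for grade in feature_grades.values())

def consensus_present(reader_findings, min_readers=2):
    """A binary finding is deemed present when at least min_readers report it."""
    return sum(bool(found) for found in reader_findings) >= min_readers

# Hypothetical single-reader grades for one terminal ileum segment
grades = {"wall_thickness": 0, "mural_t2_signal": 1, "stenosis": 0}
active = is_active(grades)

# Hypothetical fistula calls by the three readers for one patient
fistula_two_readers = consensus_present([True, False, True])
fistula_one_reader = consensus_present([False, False, True])
```

Under the consensus rule, a finding reported by a single reader is not counted as present.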

Table 1 MRI features and grading criteria
Fig. 1 21-year-old male patient with Crohn’s disease. The axial post-contrast T1-weighted sequence shows mural thickening and marked layered contrast enhancement of the neo-terminal ileum (A). Correspondingly, a high mural signal intensity is seen on DWI (b = 600 s/mm²) (B). Endoscopic images show numerous ulcerations starting at the anastomosis (C)

Fig. 2 67-year-old female patient with Crohn’s disease. The axial post-contrast T1-weighted sequence shows mural thickening, marked layered contrast enhancement and luminal narrowing at the neo-terminal ileum (A). However, DWI only shows a minor increase in mural signal (b = 600 s/mm²) (B). On endoscopy, a narrowed anastomosis was found with multiple ulcerations leading into the neo-terminal ileum (C)

Statistical analysis

Active endoscopic disease was defined as a CDEIS ≥ 3 [18]. Parameters for the diagnostic value of MRI for endoscopic active disease were calculated for all imaging protocols, and sensitivity, specificity, and accuracy were compared using the McNemar test. Individual features and severity grading scores were correlated to CDEIS using the Spearman rank correlation. For comparison of correlation coefficients between datasets, Steiger’s Z-test for dependent, overlapping correlations was used [19]. Confidence scores were compared using the Wilcoxon signed rank test. Interobserver agreement was calculated using Fleiss’ kappa coefficients for binomial data and intraclass correlation coefficients (ICC) for continuous and ordinal data [20, 21]. Kappa and ICC values were interpreted using the following criteria: ≤ 0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, good; 0.81–1.00, very good [22]. Interpretation of Spearman’s correlation coefficient was as follows: < 0.20, very weak; 0.20 to < 0.40, weak; 0.40 to < 0.60, moderate; 0.60 to < 0.80, strong; 0.80–1.00, very strong. A P value of < 0.05 was considered significant. All analyses were performed in SPSS 22 for Mac (SPSS, Chicago, Ill) and R Statistical language (v3.1.2, Vienna, Austria).
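As an illustration of three of the methods named above, the following sketch implements an exact McNemar test, Fleiss’ kappa, and the kappa/ICC interpretation bands. The analyses in this study were run in SPSS and R; this Python version is only a sketch, and the choice of the exact (binomial) McNemar variant is an assumption, since the text does not specify which variant was used. All function names and example counts are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of readers placing case i in category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    # Mean observed pairwise agreement across cases
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Chance agreement from the overall category proportions
    p_e = sum(
        (sum(row[j] for row in counts) / (n_cases * n_raters)) ** 2
        for j in range(len(counts[0]))
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)

def interpret_agreement(value):
    """Map a kappa or ICC value to the interpretation bands listed in the text."""
    if value <= 0.20:
        return "poor"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "good"
    return "very good"

# Hypothetical discordant pairs: 1 case correct only with protocol A, 7 only with B
p_value = mcnemar_exact(1, 7)

# Hypothetical binary calls by three readers on four cases: [absent, present] counts
kappa = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
```

Only discordant pairs enter the McNemar test, which is why it suits paired comparisons of protocols read on the same patients.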

Results

From a total of 89 eligible patients, 28 were excluded as detailed in Fig. 3. Eventually, 61 patients were evaluated. Six of these patients had a final diagnosis other than Crohn’s disease: ulcerative colitis (n = 1), irritable bowel syndrome (n = 4), and unclear diagnosis (n = 1). These patients were included in the analysis, as these diagnoses were unknown before MRI and ileocolonoscopy, and prior indication for MRI was the same as for other patients. Patient clinical characteristics are presented in Table 2.

Fig. 3 Flow diagram detailing patient inclusions and exclusions

Table 2 Patient characteristics

Image quality and distension

For each evaluation protocol, a total of 183 evaluations were performed by the three readers. CE and DW sequences showed good image quality (score 2 or 3) in 98% and 93% of cases, respectively, with mean (SD) image quality scores of 2.9 (0.4) and 2.6 (0.7). For observers 1 and 2, no significant differences were seen, whereas for observer 3, CE sequences showed a significantly higher image quality. CE sequences were all rated as diagnostic, while six DW sequences were rated as non-diagnostic due to severe artefacts (3%). The terminal ileum could be evaluated on CE and DW sequences in 98% and 92% of cases, respectively. Adequate terminal ileum distension (score 2 or 3) was seen in 88% of evaluations.

Evaluation of diagnostic accuracy

Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy for each reader for detection of disease activity on MRI as compared to the endoscopic reference standard are found in Table 3. No significant differences were seen between imaging protocols (P > 0.1).
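For reference, the five measures reported per reader follow directly from a 2 × 2 table of MRI result against the endoscopic reference standard. The sketch below shows the arithmetic; the function name and the counts are hypothetical and are not taken from Table 3.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures from a 2 x 2 table.

    tp/fn: reference-positive cases called positive/negative on MRI.
    fp/tn: reference-negative cases called positive/negative on MRI.
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a margin of the table is empty
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Hypothetical counts for one reader and one protocol (61 patients in total)
metrics = diagnostic_metrics(tp=25, fp=6, fn=5, tn=25)
```

Because positive and negative predictive values depend on the prevalence of active disease in the cohort, they do not transfer directly to populations with a different case mix.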

Table 3 Diagnostic accuracy values for all imaging protocols per reader

Assessment of disease severity and confidence

Correlations between severity grading and CDEIS, interobserver agreement for severity grading, and levels of confidence are presented in Table 4. No significant differences were found between the correlation coefficients of different imaging protocols (P > 0.05), although CE-MRI showed numerically higher coefficients than DW-MRI for all readers. Interobserver agreement for severity grading was very good for CE-MRI, CE/DW-MRI, and DW/CE-MRI (ICC: 0.84, 0.82, and 0.85, respectively) and good for DW-MRI (ICC: 0.79). Confidence levels were significantly higher for CE-MRI than for DW-MRI (P < 0.02). Combined imaging protocols showed significantly improved confidence levels over DW-MRI for all observers (P < 0.001) and over CE-MRI for observers 1 and 2 (P < 0.001), but not for observer 3 (P = 0.06).

Table 4 Severity grading correlation to CDEIS and confidence scores for each observer and imaging protocol. Severity grading confidence for each imaging protocol

Individual MRI features

Strong correlation to CDEIS was seen for the following features: wall thickness (r = 0.64–0.72), mural T2 signal (r = 0.64–0.77), T1 enhancement (r = 0.62–0.75), and enhancement pattern (r = 0.63–0.77). Mural DWI signal showed moderate-to-strong correlation to CDEIS (r = 0.58–0.71), while perimural T2 signal and stenosis showed weak-to-moderate correlation to CDEIS (r = 0.21–0.48 and r = 0.33–0.55, respectively). ICC values showed good interobserver agreement for wall thickness (0.75–0.78), mural T2 signal (0.77–0.82), T1 enhancement (0.75–0.83), enhancement pattern (0.75–0.77), and mural DWI signal (0.67–0.70), while lower agreement was seen for perimural T2 signal (0.50–0.62) and stenosis (0.52–0.63).

Five fistulas were identified using CE-MRI. Three of these were also identified using DW-MRI, while the other two were identified by only one reader using DW-MRI. No additional fistulas were described using DW-MRI or the combined imaging protocols. No abscesses were identified. CE-MRI showed good kappa values, against fair-to-moderate kappa values for DW-MRI: comb sign (0.66 vs. 0.45), fistula (0.71 vs. 0.53), and enlarged lymph nodes (0.61 vs. 0.22).
Discussion

The results from our study indicate that CE-MRI and DW-MRI have comparable accuracy in diagnosis and grading of disease activity, although readers had significantly higher levels of grading confidence using CE-MRI. Despite the low prevalence of fistulas, a discrepancy in detection rate was seen in favor of CE-MRI. Good agreement was seen for detection of extramural disease features (i.e., comb sign, fistula, and enlarged lymph nodes) on CE-MRI, against fair-to-moderate agreement using DW-MRI. Furthermore, a minor proportion of DWI sequences were considered non-evaluable (8%). Combined imaging protocols showed no increase in diagnostic or grading performance, with the exception of increased confidence levels over CE-MRI for two out of three readers. This discrepancy did not seem related to the readers’ experience in IBD imaging.

A number of previous studies have compared separate aspects of the use of CE-MRI and DW-MRI for Crohn’s disease. Two pediatric studies compared the accuracy for detection of small bowel lesions between DW-MRI and CE-MRI, and found that DW-MRI provided similar or even better performance [23, 24]. A recent longitudinal study by Huh et al. showed that DW-MRI could be used to diagnose complete remission and improved inflammation after medical therapy with 76% and 84% accuracy, respectively [25]. Our results are similar to those of a previous study by Seo et al., which found no significant difference in terms of correlation to CDEIS between CE-MRI (r = 0.71) and DW-MRI (r = 0.61), although a similar discrepancy for detection of penetrating complications was reported in favor of CE-MRI [11]. In two studies by Schmid-Tannwald et al., CE-MRI showed no significant differences from DW-MRI in sensitivity for the diagnosis of active inflammation (0.80 vs. 0.67) and detection of fistulas and sinus tracts (0.96 vs. 0.76) [12, 26]. Although these studies, and our own, found no significant differences between CE-MRI and DW-MRI, the consistency of numerical differences in favor of CE-MRI raises the concern that these differences are not due to random variation and might indicate a clinically relevant difference in diagnosis and grading. Preferably, a systematic review and, if possible, a meta-analysis of the mentioned studies should be performed to determine conclusively whether CE-MRI has superior diagnostic and grading accuracy over DW-MRI. Furthermore, none of the previous studies evaluated levels of confidence, which should also be taken into account.

Our study had several limitations that should be addressed. Due to the limited field-of-view of the DWI sequence, which was positioned on the terminal ileum, only these segments were analyzed in the current study. A similar study reported a higher rate of false positives in colonic segments, compared to terminal ileum segments [9]. Nevertheless, a DWI field-of-view covering the complete abdomen would still be advisable for visualization of the small bowel. Delayed contrast-enhanced sequences, as used in our study, and portal-venous phase sequences have both been shown to be capable of detecting and grading mural lesions [4, 27], although a recent study found that an enhancement ratio between early and delayed sequences could be used for further characterization of inflammation and fibrosis [28]. In our study, DW-MRI was performed using three b-values (0, 300, 600 s/mm²). A recent study and a review advised the use of a slightly higher b-value of 800 s/mm² to provide the best diagnostic accuracy and signal-to-noise ratio [8, 10].

The use of quantitative DWI measurements, namely the apparent diffusion coefficient (ADC), has been investigated in several research studies, and has shown promise as a biomarker for bowel inflammation, although concerns over its reproducibility should be addressed in future studies [29]. As such, ADC measurement was not included in the current study. Quantification of contrast enhancement can be obtained using the relative contrast enhancement (RCE) feature, which is incorporated in the Magnetic Resonance Index of Activity (MaRIA) [6]. However, manual region of interest placement and corrections of signal intensity values on certain scanner types complicate the use of these measurements [30].

A recent meta-analysis that investigated DW-MRI for the diagnosis of bowel inflammation in Crohn’s disease revealed a sensitivity of 79% and specificity of 61% [31]. The authors reported high heterogeneity in the collected data and concluded that accuracy was likely overestimated in some studies, due to issues such as lack of blinding and use of contrast-enhanced sequences as a reference standard. The reasons for the high number of false positives in DW-MRI have yet to be investigated, but inadequate bowel distension and preparation have been suggested as causes [8].

Results from our study and other studies show no significant differences between CE-MRI and DW-MRI in terms of diagnosis and grading of bowel inflammation. However, we found higher levels of grading confidence for CE-MRI, a higher rate of evaluable scans and a discrepancy in diagnosis of penetrating complications in favor of CE-MRI. Based on our findings, we would recommend the use of CE-MRI for routine examinations, while DW-MRI can be a good alternative in patients with contraindications for intravenous contrast medium. Although combined imaging protocols showed increased confidence scores, they did not perform better in terms of diagnostic or grading capabilities and revealed no additional findings.