Before medical or surgical therapy is initiated for symptomatic Crohn’s disease (CD), it is crucial to assess whether inflammatory activity is present, because even when CD is in remission, symptoms of coexisting irritable bowel syndrome (IBS) may mimic active disease. It is also important to distinguish bowel obstruction due to inflammation from obstruction due to residual fibrotic stenosis, as these warrant medical and surgical therapy, respectively. Furthermore, if inflammatory activity is present, it is important to distinguish among mild, moderate and severe disease, as medical management differs among the disease stages [1, 2].

The reference standard for diagnosing active CD and staging disease activity is endoscopy [3]. However, standard endoscopic techniques visualize only part of the bowel, and low patient acceptance is a further drawback.

Many studies have advocated the use of computed tomography (CT) for abdominal evaluation in patients with CD, as it is an accurate and patient-friendly technique [4–8]. However, abdominal CT exposes patients to considerable radiation doses (mean cumulative effective dose 36.1 mSv, although more than 75 mSv can be reached) [9].

Because disease activity often has to be assessed repeatedly, the use of abdominal CT for CD evaluation increases the excess lifetime cancer mortality risk attributable to radiation exposure. It has been estimated that about 1.5 to 2.0% of all cancers in the US may be attributable to the radiation from CT studies [10]. In contrast, magnetic resonance imaging (MRI) does not require ionizing radiation. As it is also non-invasive, MRI is increasingly used for abdominal evaluation in patients with CD [11–13]. However, while MRI has been shown to be accurate in diagnosing active CD [14, 15], its accuracy in staging disease activity is less clear. As MRI is inferior to colonoscopy in the detection of subtle mucosal detail, it might provide false-negative results in patients with mild, superficial CD. This hypothesis is supported by several studies in which false-negative MRI results were seen in patients with active, mostly mild CD [16–19]. In other studies, however, disease activity was overestimated on MRI [20–22].

Thus, the purpose of our study was to systematically review the accuracy of MRI in staging disease activity in CD by performing a meta-analysis.

Materials and methods

Search strategy and study eligibility

A computer-assisted search of the MEDLINE, EMBASE, CINAHL and Cochrane databases was performed to identify papers reporting the accuracy of MRI in staging CD activity. In MEDLINE and EMBASE, we used “Crohn disease (MeSH)” and “Magnetic resonance imaging (MeSH)” as search terms. For searching the CINAHL and Cochrane databases, we used “Crohn disease” and “Magnetic resonance imaging” as free text words. The search period was restricted to 1990 through April 2007. No age limits or language restrictions were applied.

Titles and/or abstracts of all retrieved papers were checked by one observer (KH) to determine eligibility for inclusion. Reference lists of review articles and eligible studies were checked manually to identify other relevant papers. Hand searching of major journals was not performed. Only data that were presented as full-text articles were eligible for inclusion. As the field strength of most MRI systems currently used in clinical practice is ≥1.0 T, we decided to exclude papers in which MRI field strength was ≤0.5 T. All eligible articles were retrieved as full-text articles.

Study selection

Two reviewers (KH and SB) independently assessed all retrieved articles to determine whether they satisfied the following criteria: (1) they provided data on disease activity of CD; (2) MRI was used to evaluate CD; (3) histopathology, colonoscopy and/or intra-operative findings were used as the reference standard; (4) positive criteria were defined for MRI (i.e., criteria described to stage disease activity); (5) data were available to fill out cross-tabs (for calculation of agreement in staging disease).

If all criteria were met, the article was included in the study. Disagreement between the two reviewers regarding inclusion was resolved by consensus. The authors of the primary research were approached for additional information, if necessary.

Study characteristics

Both reviewers independently assessed study characteristics of the included studies and extracted relevant data, described in detail below, by using a standardized form. No blinding of authors’ information, authors’ affiliation or journal title was performed. Inconsistencies in assessment of the included studies were resolved by consensus.

Patient characteristics

The following patient characteristics were recorded: (1) number of patients; (2) sex distribution; (3) mean age (range); (4) part of the gastrointestinal tract examined.

Study quality assessment

To assess study quality characteristics, the QUADAS tool was used as a guideline. The QUADAS tool has been developed for reviewers to evaluate the quality of studies and especially studies of diagnostic accuracy [23, 24]. The following characteristics were assessed:

  1. Whether the spectrum of patients was representative of the patients who will receive MRI in practice;
  2. If selection criteria were clearly described;
  3. Whether the time period between the MRI and the reference standard was short enough to be reasonably sure that the condition did not change between the two tests;
  4. Whether all patients received verification using a reference standard;
  5. Whether the execution of the MRI was described in sufficient detail to permit its replication (we considered the MRI description as sufficient if information was provided about the following imaging features: magnetic field strength; type of coil used, bowel preparation used, and sequences used for evaluation; the use of intravenous and/or luminal contrast medium);
  6. Whether the execution of the reference standard was described in sufficient detail to permit its replication (we considered the reference standard described as sufficient if the criteria used for diagnosing the different disease stages were defined);
  7. Whether the MRI results and the reference test results were evaluated independently;
  8. Whether interpretation of the MRI results was independent of clinical information.

Imaging features

The following imaging features were recorded for MRI, if available: (1) magnetic field strength; (2) coil used (body or surface); (3) bowel preparation and type of bowel preparation (bowel cleansing, fasting and/or diet, use of spasmolytic medication); (4) amount and type of intravenous and/or luminal contrast medium (enteroclysis, oral and/or rectal contrast medium) if administered; (5) sequences used for disease evaluation.

Imaging criteria used for staging disease activity

For each study the imaging criteria that were used to stage CD on MRI (e.g., pathological bowel wall thickening, pathological bowel wall enhancement and stenosis) were noted.

Reference standard

The verification method used (surgery, histopathology and/or colonoscopy) was recorded for each study.

Data extraction

For each study, 3 × 3 (remission, mild, frank) or 4 × 4 (remission, mild, moderate, severe) contingency tables were extracted from the articles, depending on the way of reporting.

Data analysis

An overall analysis was performed for the 3 × 3 data. For this approach, 4 × 4 tables were reconstructed to 3 × 3 tables by grouping moderate and severe disease together as frank disease. The 3 × 3 data were analyzed with a multivariate random-effects approach [25], fitted with a Bayesian algorithm [26] in the WinBUGS program, and summary estimates were calculated. If studies reported data for multiple independent observers, we used the data leading to the lowest Akaike information criterion (AIC) value to calculate summary estimates; a lower AIC value indicates a better fit of the data [27].
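The regrouping of a 4 × 4 table into a 3 × 3 table can be sketched as follows. This is only an illustration in plain Python: the function name and the example counts are hypothetical and are not taken from any included study, and the table orientation (rows for the reference standard, columns for MRI) is an assumption.

```python
def collapse_4x4_to_3x3(table):
    """Merge the 'moderate' and 'severe' categories into 'frank' disease.

    table[i][j] is the number of patients with reference-standard stage i
    and MRI stage j, both ordered remission, mild, moderate, severe
    (orientation assumed for this sketch).
    """
    if len(table) != 4 or any(len(row) != 4 for row in table):
        raise ValueError("expected a 4 x 4 table")
    # Index groups: remission, mild, frank (= moderate + severe).
    groups = [[0], [1], [2, 3]]
    return [
        [sum(table[i][j] for i in rows for j in cols) for cols in groups]
        for rows in groups
    ]


# Hypothetical counts, for illustration only.
example = [
    [3, 1, 0, 0],
    [1, 4, 1, 0],
    [0, 1, 5, 2],
    [0, 0, 1, 6],
]
collapsed = collapse_4x4_to_3x3(example)
print(collapsed)
```

Note that disagreement between moderate and severe disease (the off-diagonal cells inside the lower-right 2 × 2 block) ends up on the diagonal of the collapsed table, so it counts as correct staging of frank disease.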

Analysis on 4 × 4 tables could not be performed due to the limited amount of data per stage. The results of the individual studies are described.


Results

Search strategy and study selection

The search strategy resulted in 253 articles; 36 were found to be eligible after reading the abstract and were retrieved as full text for further analysis. Finally, seven papers [17, 18, 28–32] fulfilled all inclusion criteria and were used for data extraction and data analysis (Appendix 1). There was no disagreement regarding inclusion between the two reviewers.

Study characteristics

Inconsistencies between the reviewers in the assessment of the included studies were resolved by consensus. Inconsistencies existed for 6 of the 56 items scored (8 per study).

Patient characteristics

In three of the seven included studies, only patients with CD were evaluated [17, 31, 32]. In the other four studies, both patients with CD and with ulcerative colitis were included [18, 28–30]. For our analyses we only used the data on CD. In two studies [29, 30] children were included; in the other studies, only adult patients were included (Table 1).

Table 1 Patient characteristics of the included studies

Study design characteristics

Selection criteria were described in six of the seven studies. In four of the studies patients were eligible for inclusion if they were scheduled for a colonoscopy. Hardly any clinical and laboratory data were provided in detail. Verification of results was complete in all studies, but in some of the patients the entire bowel could not be examined. In the studies evaluating disease activity per bowel segment, only the segments that were inspected at colonoscopy were used for comparison with the MRI findings. The criteria used to determine the presence of CD on the reference standard were not uniformly described. Information on whether the reference test was evaluated independently from the index test was not reported (Table 2).

Table 2 Study design characteristics

Imaging features and imaging criteria used for diagnosis

In six studies the magnetic field strength was 1.5 T; in one study the field strength was 3.0 T. The bowel preparation, the use of luminal contrast medium and the type of coil used were not reported adequately in all studies. The type, concentration and amount of the intravenous contrast medium were reported in all studies (Table 3).

Table 3 Imaging features and criteria

The one criterion considered indicative of disease in all studies was pathological bowel wall enhancement, while bowel wall thickening was used as a parameter in six studies. However, different approaches were used to determine pathological bowel wall enhancement; in the older studies percentages of contrast enhancement were used (post- versus precontrast MRI), with higher ratios indicating more severe disease [28, 29]. In the other studies subjective enhancement was used to stage disease [17, 18, 30–32]. With regard to bowel wall thickening, cutoff values to indicate the different stages of disease were provided in only one study [30]. All other imaging criteria (e.g., presence of stenosis, lymphadenopathy) were inconsistently used.

Data extraction

Data were reported on a per-patient basis in five studies [28–32] and on a per-segment basis in two studies [17, 18]. For the studies reporting segmental data, we grouped the available segmental data per patient to enable data analysis on a per-patient basis; only bowel segments that were inspected endoscopically or surgically were included, and the most severe segmental score was used for analysis. In four studies distinction was made among remission, mild, moderate and severe disease [28, 29, 31, 32]; in two studies only remission, mild and frank disease were distinguished [17, 18]. In one study numerical scores from 0 to 4 were given with stages 1 and 2 representing mild disease and stages 3 and 4 representing frank disease [30]. For each study, 3 × 3 (remission, mild, frank) contingency tables were constructed.
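The per-patient grouping described above can be sketched in a few lines of Python. Several points are assumptions made for this illustration rather than details stated in the text: that a score of 0 corresponds to remission, that the most severe segment is taken separately for MRI and for the reference standard, and that rows hold the reference standard and columns hold MRI. All names and the example data are hypothetical.

```python
STAGES = ["remission", "mild", "frank"]  # ordered from least to most severe


def stage_from_score(score):
    """Map a 0-4 numerical score to a stage (0 = remission is assumed)."""
    if score == 0:
        return "remission"
    if score in (1, 2):
        return "mild"
    if score in (3, 4):
        return "frank"
    raise ValueError(f"score out of range: {score}")


def patient_stage(segment_stages):
    """Return the most severe stage among a patient's evaluated segments."""
    if not segment_stages:
        raise ValueError("no evaluated segments")
    return max(segment_stages, key=STAGES.index)


def per_patient_table(patients):
    """Build a 3 x 3 table (rows: reference standard, columns: MRI).

    `patients` is a list of segment lists; each segment is a
    (reference_stage, mri_stage) pair. Only segments inspected by the
    reference standard should be passed in.
    """
    table = [[0] * len(STAGES) for _ in STAGES]
    for segments in patients:
        ref = patient_stage([r for r, _ in segments])
        mri = patient_stage([m for _, m in segments])
        table[STAGES.index(ref)][STAGES.index(mri)] += 1
    return table


# Hypothetical segmental data for three patients, for illustration only.
example_patients = [
    [("remission", "remission"), ("mild", "frank")],
    [("frank", "frank")],
    [("remission", "mild"), ("remission", "remission")],
]
table = per_patient_table(example_patients)
print(table)
```

Taking the maximum per patient means a single overstaged segment is enough to overstage the whole patient, which is worth keeping in mind when per-patient and per-segment results are compared.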

Summary estimates

The 3 × 3 data

For calculating the summary estimates on the 3 × 3 data, we grouped moderate and severe disease together as frank disease from studies reporting 4 × 4 tables [28, 29, 31, 32].

In addition, in two studies [31, 32] results were provided for two observers (see Table 4). In both studies the data obtained by the second observer led to the lowest AIC value. Therefore, we used the results of observer 2 in both studies for calculating the summary estimates (Table 4). Data from a total of 140 patients were used for the 3 × 3 data analysis: 16 patients in remission, 29 with mild disease and 95 with frank disease.
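For reference, the Akaike information criterion is conventionally defined as below, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood of the model. This is the standard textbook form; how the value was obtained within the Bayesian analysis is not detailed in the text, so the expression should not be read as a description of the exact computation used.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

When the same model is fitted to each observer's data, the number of parameters is identical, so the observer with the lower AIC is simply the one whose data are fitted with the higher likelihood.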

Table 4 Summary estimates for 3 × 3 data on per-patient basis

MRI correctly staged frank disease in a large proportion of patients (91%). Correct staging of mild disease by MRI occurred in 62% (95% CI: 38–84%) of the patients, and this estimate has a broad confidence interval, indicating the heterogeneity within the results. For remission, correct staging by MRI occurred in 62% (95% CI: 44–79%) of patients, also with heterogeneous results.

MRI overstaged disease activity in 37% of patients in remission, mostly as mild disease (31%). Overstaging of mild disease as frank disease was observed in 21% and understaging in 17%.
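The way such percentages relate to a 3 × 3 table can be illustrated with a short Python sketch. It computes crude row proportions only, not the random-effects summary estimates reported here, and the function name, table orientation (rows for the reference standard, columns for MRI) and counts are hypothetical.

```python
def staging_summary(table):
    """Per reference stage, return fractions correctly staged, overstaged and understaged.

    table[i][j] is the number of patients with reference stage i and MRI
    stage j, with stages ordered remission, mild, frank.
    """
    summary = []
    for i, row in enumerate(table):
        total = sum(row)
        if total == 0:
            summary.append(None)  # no patients in this reference stage
            continue
        summary.append({
            "correct": row[i] / total,
            "overstaged": sum(row[i + 1:]) / total,   # MRI stage above reference
            "understaged": sum(row[:i]) / total,      # MRI stage below reference
        })
    return summary


# Hypothetical counts, for illustration only (not the study data).
example_table = [
    [5, 3, 2],
    [2, 6, 2],
    [1, 1, 18],
]
result = staging_summary(example_table)
for stage, values in zip(["remission", "mild", "frank"], result):
    print(stage, values)
```

By construction, remission cannot be understaged and frank disease cannot be overstaged, so the corresponding fractions are always zero.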

The 4 × 4 data

In four studies distinction was made among remission, mild, moderate and severe disease [28, 29, 31, 32]. In two of these studies [31, 32], results were provided for two observers. However, due to the low number of data, analysis could not be performed to present summary estimates. In total 72 patients were evaluated in these four studies: 15 patients in remission, 20 with mild disease, 21 with moderate disease and 16 with severe disease. The results of the individual studies are reported in Table 5.

Table 5 Individual study results for 4 × 4 data on a per-patient basis


Discussion

MRI was highly accurate for diagnosing patients with frank disease. MRI more often overstaged than understaged disease activity in CD, but in most of these patients radiological staging and disease staging by the reference standard differed by only one grade.

One explanation for the inaccuracy of MRI, compared with the reference standard, in staging patients with mild disease and patients in remission is the relative inexperience with the evaluation of abdominal MRI for CD. Although bowel wall enhancement and bowel wall thickening are recognized as important parameters that indicate CD, no strict cutoff points have been defined yet to differentiate between the different stages of disease. This is reflected by the variation in definitions used in the different studies. In all included studies the subjective evaluation of the observers was very important for staging. Even in the studies in which cutoff points were clearly described to differentiate among the different stages of disease, the radiologist had to subjectively define which bowel loop to use for assessment of enhancement and thickening.

Also, more patients were included with frank disease than with mild disease, while patients in remission were least often included. Frank disease is often easier to diagnose than mild disease or remission, as in this disease stage the parameters indicative of disease are most pronounced.

Another explanation for the inaccuracy of MRI in staging is the fact that MRI and the reference standard are essentially different methods. With ileocolonoscopy only the lumen and the inner surface of the bowel wall can be assessed, while tissue sampling for histopathological examination only provides mucosal specimens. Meanwhile, on MRI the entire bowel wall with all its layers and the extraintestinal abdomen (e.g., the mesenteric vessels, mesenteric lymph nodes, mesenteric fat) are evaluated. As CD is a transmural disease, the extent of inflammatory or fibrotic changes might be better assessed on MRI than by inspection of the mucosal surface. A good next step would therefore be to compare MRI results with surgical pathology, as in this manner all bowel wall layers can be examined.

We only determined the ability of MRI to grade disease activity for the colon and terminal ileum, while CD can also be localized in the small bowel. We decided to limit our meta-analysis to findings in the colon and terminal ileum, as no reference standard was available for grading disease activity of the small bowel. The investigation that has often been used for evaluation of small bowel CD in the past (i.e., small bowel barium examination) is increasingly considered to be an imperfect reference standard. Comparative studies of MRI with established superior reference tests for the small bowel, such as double-balloon endoscopy (DBE) or video capsule endoscopy (VCE), are very scarce [33], as these endoscopic techniques became commercially available only very recently and their availability is still limited. Also, the assessment of the severity of small bowel CD with VCE or DBE is not standardized yet.

A limitation of our analysis is the fact that we grouped moderate and severe disease together as frank disease. Information about the ability of MRI to differentiate between moderate and severe disease is discarded in this manner. However, we decided to put these data together in order to provide a more robust statement regarding the accuracy of MRI for disease activity, as only a limited amount of data was available. We nevertheless provided the 4 × 4 data of the individual studies to show the limited number of studies and the extreme heterogeneity in results between studies.

Another limitation is that although we accepted only colonoscopic, histopathological and/or surgical results as reference standard, the criteria for determination of disease activity on the reference standard were not identical between studies. Therefore, activity assessment on the reference standard might not have been consistent between studies. This might have influenced pooled accuracy estimates of MRI for staging disease activity. However, all three reference methods are reliable and are often used for assessment.

We decided not to perform subgroup analysis on the differences in technique, MR imaging criteria used or reference methods used, as conclusions from subgroup analysis would not be very reliable due to the limited amount of data available. Therefore, we cannot draw conclusions on the influence of the aforementioned differences on staging disease.

Before MRI can be implemented in routine clinical practice for the evaluation of CD, more research should be done on the reproducibility of MRI of the small bowel and colon. In our meta-analysis only two studies looked at interobserver agreement, and both reported moderate kappa values [31, 32]. As an imaging technique should be both accurate and reproducible, more studies are required to determine the role of MRI in clinical practice.

Also, before MRI can be used as a valid alternative for colonoscopy in the assessment of CD activity, it should become clear which imaging criteria are consistent with the different stages of CD. If standardized criteria were available internationally, larger trials would be possible, while comparison among studies would also be simplified. For that purpose, a more standardized technical imaging approach would be advisable as well. Future research should therefore focus on standardization of preparation, imaging technique and more uniform imaging criteria used for diagnosis of disease, in addition to including larger numbers of patients.

It would be interesting to see how other imaging techniques commonly used for evaluation of CD (i.e., computed tomography, ultrasonography) would perform in staging disease activity. Data on staging disease activity in CD are lacking for these techniques; by using the same inclusion criteria as we described above, only one article on power Doppler sonography [34] would be eligible for analysis (data not shown).

In conclusion, MRI can be used for staging disease activity in CD as with MRI most patients with frank disease are correctly diagnosed. However, in patients with disease in remission and mild disease, correct staging is limited.