Magnetic resonance enterography, small bowel ultrasound and colonoscopy to diagnose and stage Crohn’s disease: patient acceptability and perceived burden

Objectives To compare patient acceptability and burden of magnetic resonance enterography (MRE) and ultrasound (US) to each other, and to other enteric investigations, particularly colonoscopy.
Methods 159 patients (mean age 38, 94 female) with newly diagnosed or relapsing Crohn’s disease, prospectively recruited to a multicentre diagnostic accuracy study comparing MRE and US, completed an experience questionnaire on the burden and acceptability of small bowel investigations between December 2013 and September 2016. Acceptability, recovery time, scan burden and willingness to repeat the test were analysed using the Wilcoxon signed rank and McNemar tests, and group differences in scan burden with Mann–Whitney U and Kruskal–Wallis tests.
Results Overall, 128 (88%) patients rated MRE as very or fairly acceptable, lower than US (144, 99%; p < 0.001), but greater than colonoscopy (60, 60%; p < 0.001). MRE recovery time was longer than US (p < 0.001), but shorter than colonoscopy (p < 0.001). Patients were less willing to undergo MRE again than US (127 vs. 133, 91% vs. 99%; p = 0.012), but more willing than for colonoscopy (68, 75%; p = 0.017). MRE generated greater burden than US (p < 0.001), although burden scores were low. Younger age and emotional distress were associated with greater MRE and US burden. Higher MRE discomfort was associated with patient preference for US (p = 0.053). Patients rated test accuracy as more important than scan discomfort.
Conclusions MRE and US are well tolerated. Although MRE generates greater burden and longer recovery, and is less preferred than US, it is more acceptable than colonoscopy. Patients, however, place greater emphasis on diagnostic accuracy than burden.
Key Points
• MRE and US are rated as acceptable by most patients and superior to colonoscopy.
• MRE generates significantly greater burden and longer recovery times than US, particularly in younger patients and those with high levels of emotional distress.
• Most patients prefer the experience of undergoing US to MRE; however, patients rate test accuracy as more important than scan burden.
Electronic supplementary material The online version of this article (10.1007/s00330-018-5661-2) contains supplementary material, which is available to authorized users.


Introduction
Cross-sectional imaging plays a crucial role in the diagnosis and follow-up of Crohn's disease (CD), and is fundamental to determine disease extent, activity and complications [1]. Although many techniques are available, emphasis is placed on MR enterography (MRE) and small bowel ultrasound (US) given the potential detrimental effects of repeated ionising radiation exposure associated with CT [1,2]. Meta-analyses suggest that MRE and US are largely equivalent in terms of accuracy, although most comparative studies to date have been relatively small and single site [3-5], and implementation has been governed largely by availability, local expertise and clinician preference [6,7]. However, the results of the METRIC trial, a large prospective multicentre diagnostic accuracy study, have shown that although both MRE and US have high accuracy for the extent and activity of small bowel Crohn's disease, MRE is superior to US when tested in a national health service setting [8,9]. Patient experience and acceptability will influence test utility. While MRE and US avoid radiation, they have their own specific attributes which may impact on tolerance. For example, patients must ingest large volumes of oral contrast for MRE, and US requires abdominal compression. Patients' perceptions of test “burden” (levels of physical and psychological discomfort) can impact on compliance, even if the test is diagnostically superior to alternatives, as exemplified by low uptake of colorectal cancer screening colonoscopy [10]. This diminishes test utility. Indeed, patients may delay seeking medical attention, fearing the discomfort associated with procedures such as colonoscopy [11]. To date, few data report imaging test preferences amongst patients with CD, and the available data largely concern now-obsolete investigations such as barium enema and enteroclysis [1].
The purpose of our study was to compare the perceived burden and acceptability of MRE to US, and to other enteric investigations in patients recruited to the METRIC trial [8], to identify predictors of scan preference and to determine the perceived importance of different scan attributes.

Participants
The METRIC trial protocol was published previously [8], and the main trial was recently reported [9]. In summary, patients with newly diagnosed CD, or with known CD and suspected luminal relapse, were prospectively recruited from eight hospitals and underwent MRE and US, in addition to other small bowel investigations performed as part of usual clinical care. Patients with newly diagnosed CD had already undergone colonoscopy or had this pending; patients with suspected relapse only underwent colonoscopy if clinically indicated. Overall, 335 patients were recruited and all were given the option to complete a patient experience questionnaire investigating their experience. In total 324 (97%) consented to take part in the experience substudy, of whom 159 completed the questionnaire (48% of total recruitment) (see Fig. 1).

Questionnaire distribution
Patients were provided with paper copies of the questionnaire at the time of consent by a member of their local trial team, or these were posted if this was not possible. A stamped addressed envelope for return was provided and patients were asked to complete the questionnaire only after all their investigations were completed for that particular diagnostic episode. Patients were encouraged to contact their clinical team if they were unsure whether they had completed their current round of investigations. Participants were asked to record the date of questionnaire completion.

Questionnaire content
Demographics: Patients were asked their age, gender, educational level and ethnicity. Missing demographic data on age and gender were supplied via the central trial database.
Physical and emotional well-being: Emotional distress was assessed using the General Health Questionnaire (GHQ-12) [12]. An example item is, “In the last three months have you… been feeling unhappy and depressed”. Using the GHQ-12 binary coding method (0,0,1,1), a sum score was created ranging from 0 to 12. A score of 4 or higher is considered indicative of significant distress levels [13].
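The binary GHQ-12 coding described above can be illustrated with a minimal sketch. It assumes each of the 12 responses is stored as a category index from 0 to 3, ordered so that the two higher categories indicate more distress; the function names and the data layout are hypothetical and are not taken from the study's analysis files.

```python
from typing import Sequence

GHQ_ITEMS = 12
DISTRESS_CUTOFF = 4  # "a score of 4 or higher" indicates significant distress


def ghq12_binary_score(responses: Sequence[int]) -> int:
    """Sum 12 GHQ items under the binary (0,0,1,1) coding.

    Assumption: each response is a category index 0-3, and the two
    higher categories (2 and 3) are the ones scored as 1.
    """
    if len(responses) != GHQ_ITEMS:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("responses must be category indices 0-3")
    # Categories 0 and 1 score 0; categories 2 and 3 score 1.
    return sum(1 for r in responses if r >= 2)


def is_distressed(score: int) -> bool:
    """Apply the cut-off of 4 or higher to a summed score (0-12)."""
    return score >= DISTRESS_CUTOFF
```

Under this coding a respondent who selects one of the two higher categories on four or more items would be classed as significantly distressed.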
Co-morbidity was assessed by asking patients about their current and recent physical health and mental well-being.
Patients were asked to report (“yes” or “no”) whether they had any of the following diseases: heart or vascular disease, diabetes, epilepsy, stroke, arthritis, asthma, mental or emotional disorder. There was also an option to provide details of other illness. A response of “yes” to any illness was coded and summed to form a dichotomous “co-morbidity” variable (“present” or “absent”), but mental or emotional disorder was omitted since this was captured by the GHQ-12.
Scan recovery, overall acceptability and willingness to have again: The questionnaire was divided into sections pertaining to MRE, US, hydro-US (ultrasound performed following oral contrast administration), barium follow-through (BaFT), CT enterography (CTE) and colonoscopy. Patients were asked to indicate whether they had undergone the test and, if so, complete the relevant sections, or otherwise to leave that section blank.
For each investigation, patients graded their recovery time on a 9-point scale ranging from “immediate” to “a week”. Data were collapsed into six categories for analysis (see “Results”).
Patients rated how acceptable they found investigations on a 4-point scale: “not at all acceptable” to “very acceptable” (see “Results”). Patients were also asked to select the least acceptable (or worst) part of the investigation from a range of attributes provided, specific to the particular investigation. For example, exposure to ionising radiation was listed as an option for CTE and BaFT, and laxative requirement listed for colonoscopy (see

Scan burden for MRE and US
Scan burden for MRE and US was quantified using a questionnaire adapted from that used to assess colonoscopy and whole-body MRI [14,15] (Supplementary Data 1 and 2). Five additional items of direct relevance to small bowel investigations were added: abdominal bloating, diarrhoea, nausea, vomiting and sleep difficulties. The questionnaire combined a series of individual items into three main domains: satisfaction, worry and discomfort. The MRE questionnaire included 31 items (7, 6 and 18 in satisfaction, worry and discomfort domains, respectively) and the US questionnaire included 28 items (7, 6 and 15 in satisfaction, worry and discomfort domains, respectively), excluding items pertaining to noise, claustrophobia, injections and undesirable side effects, but additionally including an item relating to the abdominal pressure of the US probe.
Patients rated their experiences using a 1-7 Likert scale, where 1 and 7 were anchored to bipolar statements related to the scan, e.g. 1 = “the noise of the scanner was unbearable” to 7 = “the noise of the scanner was fine”. Scores for each item were reverse scored, totalled and averaged so that higher scores equated to higher burden. Internal reliability of subscales was assessed using Cronbach's alpha.
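As an illustration of the scoring described above, the sketch below reverse scores 1-7 Likert ratings, averages them into a burden score, and computes Cronbach's alpha for a subscale using the standard formula. The function names and data layout are hypothetical, and the study itself used SPSS rather than this code.

```python
from statistics import mean, variance
from typing import Sequence


def reverse_score(rating: int, low: int = 1, high: int = 7) -> int:
    """Flip a Likert rating so that 1 becomes 7 and 7 becomes 1."""
    if not low <= rating <= high:
        raise ValueError("rating outside the Likert range")
    return low + high - rating


def burden_score(ratings: Sequence[int]) -> float:
    """Average of reverse-scored items; higher values mean higher burden."""
    return mean(reverse_score(r) for r in ratings)


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one subscale.

    rows holds one list of k item scores per respondent.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Here a respondent who answers 7 ("fine") on every item receives the minimum burden score of 1, which matches the direction of scoring described in the text.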

Scan preference
Patients were then asked to indicate whether they would prefer MRE or US if they had to undergo just one test.

Overall perceived importance of investigation attributes
Patients were asked to rate how important 25 possible investigation attributes were to them on a 5-point scale: “not at all important” to “extremely important” (Supplementary Data 3). Higher scores indicated higher levels of perceived importance. Attributes included diagnostic accuracy and test efficiency to reach a final diagnosis as well as items specific to certain scans, for example requirement to drink a large volume of oral contrast.

Statistical analysis
The study was powered to enable comparison of scan burden between MRE and US using the Wilcoxon signed rank test, with a medium effect size (d = 0.5), alpha of 0.05 and 95% power. A minimum number of 57 patients was required. Analysis was performed using IBM SPSS version 24 (IBM Corp.). Independent t tests and chi-square tests were used to assess differences between (i) questionnaire completers and non-completers and (ii) newly diagnosed and relapse cohorts, for continuous or categorical data respectively. Related-samples Wilcoxon signed rank tests were used to assess differences between scan recovery time, scan acceptability and scan burden. McNemar tests were used to assess willingness to have the different investigations again. Bonferroni corrections were applied to the latter, meaning a p < 0.01 threshold for statistical significance. Differences in perceived MRE and US scan burden between different subgroups were assessed using Mann-Whitney U tests or Kruskal-Wallis tests as appropriate. Post hoc comparisons using a Bonferroni correction were used to assess the effect of age on scan burden, adopting a p < 0.01 threshold for statistical significance. The time between the questionnaire completion and the date of MRE and US was dichotomised into less than 5 weeks or 5 weeks or longer [16], and any association with scan preference or perceived importance of different test attributes was assessed using chi-square tests and Spearman's rho correlation coefficients respectively, adopting a p < 0.002 threshold following a Bonferroni correction (0.05/25). We also explored whether a time interval of less than or more than 1 week influenced scan preference.
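Two of the calculations named above can be sketched briefly: the Bonferroni-adjusted threshold (0.05/25 = 0.002) and a McNemar test on paired yes/no answers. The sketch uses the exact binomial form of the McNemar test, which is an assumption, since the text does not say which variant SPSS applied; the function names and the example counts are hypothetical and are not study data.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts, e.g. patients willing to
    repeat test A but not test B, and vice versa. Under the null
    hypothesis the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Threshold used for the 25 attribute-importance comparisons.
attribute_threshold = bonferroni_threshold(0.05, 25)

# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(1, 9)
is_significant = p_value < attribute_threshold
```

Only the discordant pairs enter the McNemar calculation, so patients who gave the same answer for both tests do not affect the p-value.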

Ethical considerations
Ethical approval for the METRIC trial (including the current study) was obtained from the National Health Service Research Ethics Committee (NHS REC) in September 2013 (ref: 13/SC/0394).

Results
Participant characteristics are shown in Table 1. Just under half of participants who consented to complete a questionnaire actually did so (159/324, 49%). Participants completing the questionnaire were significantly older than non-responders (mean age, 38.2 vs. 33.8 years; t = 2.603, p = 0.010), but there were no gender differences between groups (chi-square = 1.606, p = 0.205) or whether the patient was newly diagnosed or relapsing (chi-square = 1.763, p = 0.184).
There were no significant differences in demographics, educational level, ethnicity, presence of comorbidities or prevalence of significant psychological distress between those with newly diagnosed CD or suspected relapse. Overall, rates of psychological distress were high, with 48% reporting clinically significant levels (Table 1). The median number of days between patients undergoing the MRE scan and completing the questionnaire was 7 (range 0-326; 46.6 weeks). The median number of days between patients undergoing the US scan and completing the questionnaire was 6 (range 0-326). The proportion who completed the questionnaire less than 5 weeks after their MRE and US scan was 61% (n = 70) and 61% (n = 71) respectively.
The number and percentage of patients who completed the questions about scan experience across the different imaging modalities are shown in Supplementary Table 1.
Attributes selected as the least acceptable part of MRE, US and colonoscopy are shown in Figs. 2 to 4 (see Supplementary Figs. 1-3 for hydro-US, BaFT and CTE). Drinking contrast (37%) and repeated breath-holding (14%) were most commonly cited for MRE, followed by “other”, which comprised mainly … Overall, 49% reported US as being “fine”, with no least acceptable part, although 30% reported abdominal compression as the least acceptable aspect. Conversely, for colonoscopy 55% of patients rated the laxative as the least acceptable part of the investigation, followed by discomfort (23%).
Burden scores for MRE and US are shown in Table 3. Patients reported higher burden during MRE versus US, although scores were relatively low overall. There were significant differences between MRE and US on all three subscales (discomfort: z = 9.558, p < 0.001; satisfaction: z = 7.043, p < 0.001; and worry: z = 8.017, p < 0.001).
Differences in MRE scan burden according to patient demographics and scan preference are shown in Table 4.
Perceived MRE scan burden was significantly higher among younger people (with significant differences only between the youngest and oldest age group, z = 2.969, p = 0.003 following Bonferroni corrections) and people with high levels of emotional distress. There was a nonsignificant trend towards higher MRE scan burden among patients reporting a preference for US. Younger age and high levels of emotional distress were also associated with higher perceived burden of US (see Table 4).

Scan preference
When asked which scan patients would prefer, the majority who expressed a preference (100/125 [80%]) selected US over MRE. Scan preference was not related to the time between questionnaire completion and undergoing US (either less than 5 weeks vs. 5 weeks or longer, or less than 1 vs. 1 week or longer), χ² = 2.733, p = 0.098 and χ² = 0.901, p = 0.343 respectively. There was also no association between the time of questionnaire completion and undergoing MRE (less than 5 weeks vs. 5 weeks or longer, χ² = 2.421, p = 0.120, or less than 1 vs. 1 week or longer, χ² = 2.182, p = 0.140).

Overall perceived importance of investigation attributes
Ratings of test attribute importance (graded from 1 = not at all important to 5 = extremely important) split according to patient cohort are shown in Fig. 6. For both cohorts, accuracy was rated as the most important attribute, followed by waiting time to diagnosis/treatment and number of tests needed prior to final diagnosis. In general, negative physical test attributes such as requirement to drink fluid, test discomfort and fasting were rated as less important and generally between “a little bit important” and “moderately important”. There was some evidence that, compared to those completing questionnaires within 5 weeks of MRE or US, patients completing questionnaires more than 5 weeks afterwards perceived several physical attributes as less important (see Supplementary Table 5). However, none of these associations were significant following Bonferroni correction (threshold p < 0.002).

Discussion
Using questionnaire data from a large number of patients prospectively recruited to a diagnostic accuracy study, we found that MRE and US are both acceptable and reasonably well tolerated; most indicated that they would repeat the tests and scan burden scores were relatively low for both overall. However, MRE was judged significantly less favourably than US in terms of recovery time, acceptability, burden (across satisfaction, worry and discomfort domains) and “willingness to have again”, the last of these approaching significance at the p < 0.01 threshold. Putting this into context, MRE was still rated significantly more favourably than colonoscopy, which was the least acceptable of all tests. As could be anticipated, the “worst part” of the MRE scan was drinking enteric contrast beforehand and the associated side effects such as diarrhoea and/or abdominal pain/bloating, while conversely almost half indicated that US was “all fine”, with a minority listing abdominal compression as the worst part.
While our primary focus was MRE and US, we collected data on other small bowel investigations performed as part of usual clinical care in recruited patients. Recovery time for MRE was significantly longer than for CTE. However, we found no significant difference in recovery time compared to BaFT. This finding may be secondary to lack of statistical power contingent on small numbers undergoing BaFT, but it is possible that the constipating effects of barium contributed to slower recovery. In general, MRI is a challenging test for patients. Relatively long scan times, claustrophobic scanner bore and associated noise all influence patient experience negatively [17]. Using a very similar questionnaire to the present study in a sample of 115 patients, patient burden during whole-body MRI (WB-MRI) cancer staging was reported as actually a little better than we found for MRE (2.21 vs. 2.72) [15]. Interestingly, both studies reported that high levels of emotional distress predicted increased MRI burden, although the current study found that younger age was also associated. Indeed, a very large proportion of participants in the current study reported significant psychological distress, comparable with the high rates of anxiety recently reported among patients with active CD [18], and reaching levels more typically reported by patients being investigated for suspected cancer [19].
It is perhaps unsurprising that most patients stated that they would choose US over MRE, and indeed MRI scans have been judged to be more challenging than other scans such as PET-CT and contrast-enhanced spectral mammography [20,21]. One very important finding from the current study is that patients rated several scan attributes as more important than the challenges and discomfort of undergoing scans. Notably, diagnostic accuracy was the most important attribute. This is comparable to data from studies of CT colonography [22]. Patients, at least to some extent, seem tolerant of discomfort if they believe that the test is more accurate than a less arduous alternative. We did not provide differential accuracy data for the tests under investigation, and it is likely that patients assumed they were similar when selecting preferences. The METRIC study, recently reported [9], shows that when compared prospectively, MRE has significantly higher sensitivity for extent (presence and location) of small bowel Crohn's disease than US (80% vs. 70%). When viewed in this context, and given their emphasis on diagnostic accuracy, it seems that MRE is an acceptable first-line test for patients; although patients' experience during MRE was inferior to US, absolute levels of scan burden were relatively low and acceptability ratings reasonable. The performance of US in the METRIC trial was, however, still good, particularly for small bowel disease presence, and the technique still undoubtedly has a major role in managing Crohn's disease patients. It is clearly very well tolerated by patients and completely safe, an important attribute given the potential deleterious effect of gadolinium deposition with repeated MRE [23-25].
Fig. 6 Perceived importance of different scan attributes (mean scores on a scale of 1-5)
Perhaps surprisingly, patients did not rate radiation exposure as particularly important, although again this may be influenced by their knowledge of this issue. A very important consideration in questionnaire studies is the timing of the survey post intervention [26]. In a similar study comparing patient preferences for CT colonography versus colonoscopy, van Gelder et al. reported that patient preference for CT colonography fell from 71% immediately after the tests to 61% 5 weeks later, and that drivers for preference switched from physical discomfort to relative diagnostic accuracy [16]. In the current study, we instructed patients to complete questionnaires after all their diagnostic tests had been completed rather than using a fixed time point [27]. The median return time in the current study was 1 week, although this ranged widely from 0 to 47 weeks. There was some evidence that after 5 weeks the perceived importance of a few attributes related to scan discomfort declined (although no significance was found after statistical correction). The rating of diagnostic accuracy as the priority for patients was not influenced by the time between the tests and questionnaire response, nor was overall patient scan preference, suggesting that our findings are robust.
Overall, it is clear therefore that the choice of imaging investigation should be based on a discussion between the referring clinician, radiologist and patient, considering scan attributes including diagnostic accuracy, patient experiences and priorities, and the exact underlying clinical question.
This study has limitations. Although the largest prospective study of patients' experiences of cross-sectional imaging in Crohn's disease to date, questionnaire response rates were under 50% despite the large majority stating that they would participate initially. However, this is consistent with questionnaire studies of similar design (e.g. [15,28]). Non-responders were significantly younger than those who completed questionnaires, which may restrict generalisability. Postcode data were unavailable so we were unable to examine the influence of deprivation on questionnaire completion rates or scan burden/preference.
Since patients had already consented to the METRIC trial, the cohort sampled were apparently willing to undergo these tests in the first place. It would have been interesting to question those declining participation as to whether prior experience of either test had influenced their decision. We did not specifically record the experience of patients who did not complete or interrupted their imaging examination which would have been informative. In addition, some patients did not complete the questionnaire until weeks after their scan, and their recall of scan experiences may be imperfect. However, as noted above, the effect of such delay on reported experiences did not impact patient preferences. Finally, we used a variety of questionnaires, which, although comprehensive, may not fully capture the subtleties of patient experience.
In summary, both MRE and US are well tolerated generally by patients with CD, and better than colonoscopy. However, patient burden and recovery are significantly inferior for MRE compared to US. Whilst a majority of patients would opt to undergo US rather than MRE, patients rate other scan attributes, notably diagnostic accuracy, as more important than discomfort.
Funding This work was supported by the National Institute for Health Research Health Technology Assessment (NIHR HTA) programme (project number 11/23/01) and will be published in full in Health Technology Assessment. The project is supported by researchers at the National Institute for Health Research University College London Hospitals Biomedical Research Centre. ST and SH are NIHR senior investigators.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Professor Stuart Taylor.

Conflict of interest Stuart Taylor is a research consultant to Robarts.
Statistics and biometry One of the authors has significant statistical expertise.
Informed consent Written informed consent was obtained from all subjects (patients) in this study.
Ethical approval Institutional review board approval was obtained.
Study subjects or cohorts overlap Some study subjects or cohorts have been previously reported in the main METRIC trial results paper.

Methodology
• prospective
• cross-sectional study
• multicentre study
Department of Health disclaimer This report presents independent research commissioned by the National Institute for Health Research (NIHR). The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the NHS, the NIHR, NETSCC or the HTA programme or the Department of Health.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.