Introduction

Several colorectal investigations are well suited to the detection of pre-malignant polyps and colorectal cancer (CRC), including computed tomographic colonography (CTC), flexible sigmoidoscopy (FS) and optical colonoscopy (OC). Consequently, each of these modalities may be effective for population-based screening, where asymptomatic individuals in an age group at risk (e.g. over 50 years) are tested for premalignant polyps that can then be removed, preventing CRC entirely [1]. Screening can also detect pre-clinical CRC, allowing for earlier intervention and improved prognosis. Uptake of CRC screening is low compared with breast and cervical cancer screening [2–4], and it is intuitive that invitees would be most likely to undergo tests that align with their preferences.

CTC has high sensitivity (the probability of detecting disease when it is present) for large (≥10 mm) adenomas [5] and demonstrates the whole colon, unlike FS. CTC also appears attractive to invitees: A recent randomised trial found that invitees anticipate the CTC procedure as being less painful and embarrassing than OC [6]. A potential advantage of CTC over OC and FS is that laxative components of the bowel preparation can be reduced or even eliminated, decreasing test burden [7]. Full-laxative bowel preparation is the most inconvenient aspect of CTC [8, 9]. However, reducing or eliminating the laxative component may diminish polyp sensitivity and specificity (the probability of ruling out disease when it is absent, avoiding false-positive results and unnecessary follow-up investigations) [10] and qualitative research has suggested that screening invitees prefer a more sensitive full-laxative bowel preparation, despite greater burden [11]. In contrast, a recent quantitative study [12] found that the values placed on sensitivity versus non-purgation were similar, with no overall preference. It is therefore unclear whether non-laxative CTC would be perceived as having sufficient acceptability to translate into higher uptake than full-laxative CTC. Furthermore, CTC requires that patients undergo the inconvenience of confirmatory follow-up testing if polyps are suspected, whereas endoscopic tests can remove these during the same examination. This means it is also unclear whether either form of CTC would be perceived as having significant advantages and higher uptake than endoscopic tests (particularly FS).

Most previous research [e.g. 13–16] has asked participants to compare information on several tests and state their overall preference. However, this differs substantially from real screening invitations [e.g. 17], which typically contain information about a single test. Another limitation is that study participants are rarely provided with detailed information about the risks, costs and benefits used in screening programmes to encourage invitees to make an informed choice [18]. We aimed to create a more realistic decision-making context by randomly allocating individuals approaching screening age to be sent detailed written information on a single CRC screening test (non-laxative CTC, full-laxative CTC, FS with enema bowel preparation or full-laxative OC). We used a questionnaire survey to quantify how positively or negatively individuals rated key aspects of each screening test (e.g. tolerability of preparation and health benefits) and compare ratings between tests. We also measured and compared participants’ expectations of how likely they were to participate in screening with each test.

Materials and methods

Design

The research team sent out postal invitations to 13 primary care centres (General Practices) that were identified by regional Primary Care Research Networks in England as being potentially interested in participating in research. Of the seven practices responding to the invitation, three were selected in order to achieve a diverse mix of geographic and demographic characteristics: two were located in Cumbria, a rural area with an affluent, predominantly white British population; one was in an urban area (London) with a heterogeneous socioeconomic and ethnic population.

Following ethical approval, invitees were randomly assigned to be mailed one of four study packs (in June 2013) consisting principally of an information booklet and leaflet (explaining screening and characteristics of one of the four tests), a standardised questionnaire and a Freepost envelope. Invitees were requested to read the booklet and leaflet, complete the questionnaire and return it in the envelope. Completion of the questionnaire constituted implied consent to participate. Invitees who did not wish to participate were requested to return the questionnaire blank and individuals were sent reminder letters if they had not responded within 2 weeks of being contacted. Written materials are reproduced in Appendices 1.1 to 1.4.

Participants

Eligible participants were all adults aged 45-54 years. Exclusion criteria were (1) previous diagnosis of bowel cancer; (2) diagnosis of any cancer in the previous 6 months; (3) learning disability; (4) receiving regular colonoscopies; (5) experiencing significant cognitive decline; (6) clinically judged to be unsuitable to participate in the study (e.g. due to a terminal illness, mental health issue or inability to read English). Electronic patient records were screened using database queries and manual review by primary care practitioners. After exclusions, a random sample of eligible participants was selected at each practice in order to obtain an approximately 2:1 ratio of Cumbria and London invitees for a more nationally representative proportion of white British invitees.

Test conditions

Participants were sent information regarding one of four tests: non-laxative CTC, full-laxative CTC, FS and OC. This information was designed using detailed literature from the English Bowel Cancer Screening Programme (BCSP) as a template. The BCSP provides invitees with written information on the current standard screening method in England (guaiac faecal occult blood testing, followed by OC if the test is abnormal) [19, 20] and these booklets provide a useful indication of the material that would be required if any of the four tests in this study were offered as part of a national screening programme. Hence, study information materials were designed to be identical in style and wording to BCSP materials and were consistent across test conditions wherever possible (see Fig. 1 for an example). Key similarities and differences in the descriptions for each test are described in Appendix 2. As an example of a characteristic that differed between test conditions, FS was described as reducing the risk of getting bowel cancer by 23 % and dying from the disease by 31 % [from 21]. Since tests that demonstrate the whole colon confer greater theoretical benefit than a test of the distal colon only, CTC and OC were described as being estimated to reduce the respective risks by at least 23 % and 31 %. Statistics were derived from existing literature [20–34] and expert radiologist opinion. Drafts of study materials were piloted using a “think-aloud” technique [35] in which lay members of the public read through each leaflet out loud and articulated what they were thinking at various stages. This allowed text that was difficult to understand to be identified and amended.

Fig. 1 Example pages from an information leaflet (non-laxative CTC)

Outcomes

Participants were asked: “Imagine you have just received an invitation to have the [test name] test in the post. Realistically speaking, would you decide to have the [test name] test?” Response options were “definitely not”, “probably not”, “yes, probably” and “yes, definitely”. The primary outcome was whether participants stated that they would “definitely” have the test (participants responding with this option were categorised as intenders; participants responding with any other option were categorised as non-intenders). This question was adapted from a multicentre randomised trial on efficacy of FS, where there is evidence that responses are a strong predictor of subsequent screening behaviour [22, 36, 37].
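The dichotomisation of the primary outcome can be illustrated with a minimal sketch. The function name `is_intender` and the `include_probably` flag are hypothetical and are not taken from the study materials; the flag corresponds to the broader definition used in the sensitivity analyses, and treating a missing answer as non-intention follows the assumption stated for the expanded analysis.

```python
# Response options as worded in the questionnaire.
RESPONSES = ("definitely not", "probably not", "yes, probably", "yes, definitely")


def is_intender(response, include_probably=False):
    """Classify one answer to the intention question.

    response: one of RESPONSES, or None when the question was not answered.
    include_probably: hypothetical flag; when True, "yes, probably" also
    counts as intending (the broader sensitivity-analysis definition).
    """
    if response is None:
        # Assumption: no answer is treated as not intending to be screened.
        return False
    answer = response.strip().lower()
    if answer not in RESPONSES:
        raise ValueError(f"unexpected response: {response!r}")
    if answer == "yes, definitely":
        return True
    return include_probably and answer == "yes, probably"
```

Under the primary definition only "yes, definitely" counts, so `is_intender("yes, probably")` is false unless `include_probably=True` is passed.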

Invitees were asked to rate their agreement with six statements regarding their perceptions of the test itself (e.g. “having the test would be embarrassing”). Response options consisted of a 5-point Likert scale ranging from “strongly disagree” to “strongly agree” and were scored 1-5 with higher scores representing more positive perceptions. Cronbach’s α (a measure of whether a set of items reliably measure the same factor) was 0.72 for items relating to perceptions of the test, indicating that the average correlation between scores on these items was acceptably high. Consequently, scores for these six items were summed for each participant to create an overall test perception score. Invitees were also asked to rate how strongly they agreed or disagreed with three items relating to perceptions of health benefits (e.g. “having the test would reduce my risk of getting bowel cancer”) and four items relating to perceptions of the preparation (e.g. “preparing before the test would be uncomfortable”). These items were analysed separately (instead of being summed to create two overall scores for each participant) due to low Cronbach’s α that could not be improved by omitting variables (α = 0.62 and 0.68, respectively). Finally, a 5-point response scale ranging from “very unlikely” to “very likely” allowed participants to rate their views on two items relating to sensitivity (“the test finding bowel cancer if I have it” and “the test finding polyps if I have any”; α = 0.91 and so responses were summed to create an overall sensitivity perception score for each participant) and one item that assessed views of specificity (“having a follow-up test that shows there was no problem”). The choice of items was influenced by previously used measures of screening test benefits and barriers [38, 39]. The secondary outcomes consisted of overall scores for test tolerability and sensitivity along with scores for individual items on preparation, specificity and health benefits.
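For readers unfamiliar with Cronbach’s α, the following is a minimal sketch of the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of summed scores). The function name and the example data are hypothetical; the analyses in this study were run in SPSS, not with this code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants' item scores.

    rows: one list per participant, each holding k item scores
    (e.g. 1-5 Likert values already coded in the same direction).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each participant's summed score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for five participants on three items (not study data).
hypothetical_scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(hypothetical_scores)
```

Items that move together across participants inflate the variance of the summed score relative to the individual item variances, which pushes α towards 1.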

Participants were also asked to report whether they had read the information materials (question adapted from [40]) and asked six multiple-choice questions to assess comprehension (e.g. “The test would be done: At a hospital/At home/At a GP practice”) with one point given for each correct answer.

The questionnaire ended with questions on demographics (e.g. ethnicity and markers of socioeconomic status), participants’ perceived risk status compared to others (from [41]) and self-rated health. An estimate of socioeconomic status was derived using a previously used method [42] that combined responses to questions on education, and vehicle and home ownership (one point was given for each of the following: having no formal qualifications, not owning a vehicle and living in rented accommodation). Scores ranged from 0 to 3; higher scores represented greater deprivation.
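The scoring rule for the socioeconomic estimate can be written as a short sketch. The function and parameter names are hypothetical; only the one-point-per-marker rule comes from the description above.

```python
def deprivation_score(has_formal_qualifications, owns_vehicle, lives_in_rented_accommodation):
    """Return a 0-3 score; higher values represent greater deprivation.

    One point each for: no formal qualifications, no vehicle ownership,
    and living in rented accommodation.
    """
    return (
        int(not has_formal_qualifications)
        + int(not owns_vehicle)
        + int(lives_in_rented_accommodation)
    )
```

For example, a respondent with qualifications and a vehicle who rents their home would score 1.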

Sample size

A meta-analysis that pooled heterogeneous studies of hospital-based screening tests and a trial of non-laxative CTC found actual uptake to vary around 30 % for the four tests [43, 44]. Assuming that intentions to be screened would be 20 percentage points higher to account for the gap between intentions and behaviour [37], approximately 50 % would definitely intend to take the test. Four hundred and four participants were estimated to be required per group in order to detect a difference of at least 10 % between two tests (considered clinically meaningful; α = 0.05, β = 0.2) in an analysis of valid responders. A total of 3,100 individuals were invited, anticipating a response rate of approximately 50 %.
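As an illustration of this kind of calculation, the sketch below applies one common normal-approximation formula for comparing two independent proportions. It assumes a two-sided test and an absolute difference of 10 percentage points (50 % versus 60 %); the text does not state which formula or corrections were used, so the output need not equal the 404 reported. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, beta=0.2):
    """Approximate sample size per group to compare two proportions.

    Uses n = (z_{1-alpha/2} + z_{1-beta})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2,
    a simple normal approximation without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Assumed scenario: 50 % intenders for one test versus 60 % for another.
n = n_per_group(0.5, 0.6)
```

Under these assumptions the formula returns roughly 385 per group; a continuity correction or an allowance for household clustering would increase the figure.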

Randomisation sequence generation

A member of the research team carried out randomisation at each individual practice in a single step, i.e. the allocation sequence was not available to practice staff or the researchers until all invitees had been allocated. All eligible individuals living at a given address were allocated to the same condition. A random value was assigned to each eligible household using the rand() function in Excel (Excel 2010 for Windows, Microsoft, Redmond, WA, USA). Values for all invitees were then categorised into quartiles with each quartile representing one test condition (conditions applied to each quartile were determined through coin tosses prior to data collection).
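The allocation procedure can be sketched as follows. This is an illustration only: it uses Python's `random` module rather than Excel's rand(), ranks households rather than individual invitees when forming quartiles, and maps quartiles to conditions in a fixed order instead of by coin toss. All names are hypothetical.

```python
import random

CONDITIONS = ("non-laxative CTC", "full-laxative CTC", "FS", "OC")


def allocate_households(household_ids, seed=None):
    """Assign each household to one of four conditions by quartile of a random draw.

    household_ids: iterable of household identifiers (duplicates allowed,
    e.g. one entry per invitee). Returns {household_id: condition}.
    """
    rng = random.Random(seed)
    unique = sorted(set(household_ids))          # stable order before drawing
    draws = {h: rng.random() for h in unique}    # one random value per household
    ranked = sorted(unique, key=draws.get)       # order households by their draw
    n = len(ranked)
    # Rank i falls in quartile (i * 4) // n, giving indices 0 to 3.
    return {h: CONDITIONS[(i * 4) // n] for i, h in enumerate(ranked)}


# Hypothetical invitees as (person, household) pairs, not study data.
invitees = [("p1", "h1"), ("p2", "h1"), ("p3", "h2"), ("p4", "h3"), ("p5", "h4"),
            ("p6", "h5"), ("p7", "h6"), ("p8", "h7"), ("p9", "h8")]
allocation = allocate_households([h for _, h in invitees], seed=2013)
assigned = {person: allocation[h] for person, h in invitees}
```

Because the draw is made per household, everyone at the same address (here `p1` and `p2`) necessarily receives the same study pack.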

Statistical analysis

Frequencies and percentages were used to illustrate demographic and other background data. All inferential analyses used non-parametric methods (i.e. Kruskal-Wallis or Mann-Whitney U tests to compare ordinal data and Pearson χ2 tests to compare proportions) since Kolmogorov-Smirnov tests indicated that relevant data were non-normal (p < 0.05).

Pearson χ2 tests were used to compare the proportion of intenders between tests. The first analysis compared valid responders who had completed the study per the protocol (i.e. they did not report being outside the eligible age range for the study, had received the correct study pack, stated that they had read the information in full and had answered the question on intention to be screened). Sensitivity analyses compared intentions between tests after re-categorising intenders as those who responded with either “yes, definitely” or “yes, probably”. Both analyses were then repeated in an expanded analysis of all invitees (i.e. irrespective of whether they had responded or not, whether study packs were undelivered or whether they met other criteria for the main analysis). It was assumed that non-responders to the question on intention would not have the test (or could not intend if the study pack was returned undelivered). Wilson score 95 % confidence intervals were calculated for each test individually and across all tests.
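The Wilson score interval mentioned above can be computed directly from a count and a denominator. The sketch below is a generic implementation of the standard formula, not the authors' SPSS procedure, and the function name is hypothetical; the worked figures use the 288 of 603 valid responders reported in the Results.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denominator = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


# 288 intenders among 603 valid responders: roughly 0.44 to 0.52.
lower, upper = wilson_interval(288, 603)
```

Unlike the simple Wald interval, the Wilson interval stays within 0 to 1 and behaves sensibly for proportions near the boundaries, which matters for the expanded analysis where the proportion of intenders among all invitees is small.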

Scores on items relating to perceptions were compared using Kruskal-Wallis tests, followed by planned post-hoc Mann-Whitney U tests to compare scores for which there was strong evidence against the null hypothesis (p < 0.05). Missing data for items relating to perceptions were imputed using multiple imputation [45]. Results did not differ meaningfully between imputed and non-imputed analyses so only non-imputed statistics are shown for post-hoc comparisons. All analyses were carried out using SPSS 21 for Windows (IBM, Armonk, NY).

Results

Flow of participants through the study is shown in Appendix 3. Of 3,100 invitees, 1,468 were female (47.4 %) and the median age was 49 years [interquartile range (IQR): 47 to 52]. Among 3,094 invitees with Index of Multiple Deprivation (IMD) data, the median score was 13.4, and out of 2,030 participants with ethnicity data (either self-reported or from practice records), the majority were white British (n = 1,420; 70.0 %).

The response rate for valid responders was 19.5 % (26.0 % in Cumbria and 7.5 % in London). Demographic characteristics of valid responders are shown in Table 1. The median age of valid responders was 50 years (IQR: 47 to 52). The majority of participants correctly answered at least five out of the six comprehension questions (n = 498; 81.4 %). Scores varied by test; OC responders had higher scores (median: 6; IQR: 5 to 6) than FS responders (5; 4 to 6), non-laxative CTC (5; 5 to 6) and full-laxative CTC responders [5; 5 to 6; all p-values < 0.0005; effect sizes (r): 0.26 to 0.37]. Non-laxative CTC responders also had higher scores than FS responders (p = 0.014; effect size: 0.14).

Table 1 Demographic statistics of all valid responders

Survey responders were older than non-responders (median age: 50 vs. 49 years) and resided in less deprived areas (median IMD score: 11.5 vs. 14.4). Females were more likely to be responders (25.8 %) than males (18.8 %) and white British invitees were more likely to respond than other ethnicities (40.4 % vs. 13.4 %; all p-values < 0.0005).

Intentions

Out of 603 valid responders (those who received the correct study pack, read all information and responded to the intention question), 288 indicated they would definitely have the test (47.8 %). There was no strong evidence to suggest that the proportion of intenders differed between tests (Table 2; χ2[3, 603] = 4.49; p = 0.213). The results of the three sensitivity analyses were not meaningfully different (p-values ranged from 0.408 to 0.917).
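The reported p-value can be checked from the test statistic alone. With four tests and two outcome categories the χ2 statistic has 3 degrees of freedom, for which the upper-tail probability has a closed form: erfc(√(x/2)) + √(2x/π) · exp(−x/2). The sketch below uses only that identity; the function name is hypothetical and the code is not part of the original analysis.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Reported statistic for the comparison of intenders across the four tests.
p_value = chi2_sf_3df(4.49)  # close to the reported p = 0.213
```

The closed form only applies to 3 degrees of freedom; other table sizes would need a general chi-square distribution function.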

Table 2 Number and percentage of intenders and non-intenders, overall and for each test

Perceptions of tests for valid responders

Kruskal-Wallis tests comparing perceptions between tests showed strong evidence against the null hypothesis for all four items relating to preparation, indicating that there were differences between at least two test conditions (all p-values ≤ 0.001). There were also differences in terms of perceptions of test tolerability (p < 0.0005). There were no apparent differences for perceptions of sensitivity (p = 0.310 to 0.535) or specificity (p = 0.250 to 0.576). Finally, there was no strong evidence of differences for items relating to benefits, i.e. whether the test would reduce the risk of developing bowel cancer (p = 0.290 to 0.476), dying from bowel cancer (p = 0.805 to 0.901) or dying in the next 10 years (p = 0.460 to 0.621).

Planned post-hoc Mann-Whitney U tests between pairs of test conditions on items relating to preparation showed strong evidence of differences for most comparisons: Non-laxative preparation for CTC was rated more positively than all other methods in terms of avoiding discomfort and embarrassment. It was also superior to full-laxative preparation for CTC and OC in terms of ease of completion. In addition, preparation for FS was rated more positively than full-laxative preparation for CTC in terms of time manageability and avoiding discomfort. Finally, preparations for all tests were rated more positively compared to full-laxative OC in terms of time manageability, avoiding discomfort and ease of completion. These results created a ranking in which non-laxative CTC preparation was rated more positively overall compared to enema preparation for FS, which was rated more positively than full-laxative CTC preparation, followed by full-laxative OC preparation. In terms of the tests themselves, all were perceived more positively than OC. Table 3 shows results of all post-hoc comparisons and median overall scores for test tolerability and individual items relating to preparation. Appendix 4 shows median scores and IQRs of items for which there was weak evidence against the null hypothesis.

Table 3 Mann-Whitney U test results comparing preparation and test tolerability items between tests (left)

Discussion

In this study, people were given standardised information regarding different screening tests to a level of detail consistent with material provided by the English Bowel Cancer Screening Programme. The study was strengthened by using the pre-existing information leaflet as a template since it had been designed with the aim of facilitating an evaluation of the risks, costs and benefits of screening participation. A further strength was the randomised design, which meant that groups were comparable and that intentions were derived in a setting more closely matching a genuine screening invitation.

We found a clear order of perceived tolerability for preparations, consistent with previous studies showing that reduced- and non-laxative preparations are perceived as superior to full-laxative methods regarding anticipated burden [6, 11, 12]. Our study adds to these findings by showing that non-laxative preparations are also rated more positively than the enema that precedes FS. We also found that all tests were rated more positively than OC in terms of the anticipated experience of the procedures themselves. Previous research has shown that CTC is considered more tolerable than OC for screening, in terms of both patients’ expectations (e.g. [6]) and their eventual experience [46]. Our study supports these findings and also shows that participants did not rate CTC more positively than FS, which is surprising given that CTC does not require invasive intubation to the same extent.

Despite differences in perceptions, we found no convincing evidence that future screening intentions differed between either form of CTC, or between CTC and endoscopic alternatives. This was the case when the analysis focused only on valid responders and when the analysis included non-responders, even though the latter increased statistical power and narrowed confidence intervals. In some respects, this agrees with previous research: a meta-analysis [43] pooled screening uptake for several of the tests under investigation and found only a small difference between full-laxative CTC and OC in favour of the latter (22 % vs. 28 %, respectively; FS had uptake of 35 %). However, data on non- and reduced-laxative CTC were not included. A previous trial of screening behaviour found that uptake of reduced-laxative CTC was markedly higher compared with full-laxative OC (34 % vs. 22 %, respectively) [44]. As a result, it might be assumed that reducing the laxative component of bowel preparation has a large effect on uptake but this is uncertain without a direct comparison of CTC uptake following different preparations. Our findings suggest that there would be little difference between methods.

Although correlation between intention and behaviour is well established [37, 47], it is possible that differences in uptake would only become apparent following a genuine screening invitation instead of a hypothetical one. Perceived differences in preparation and test tolerability may become more influential following a real invitation: The prospect of discomfort and inconvenience may be more immediate, whereas potential health gains remain in the relatively distant future [48]. Hence, a randomised trial would be the most valid method to compare actual uptake of different forms of CTC and other tests. It could also be argued that improving perceived tolerability is desirable on the grounds of improving patient satisfaction, even if uptake does not improve. In this respect, our results favour non-laxative CTC over all other modalities in terms of participants’ perceptions of the test (compared to OC) and the preparation (compared to all alternatives).

It was surprising that there were no apparent differences between tests in terms of perceived sensitivity and specificity, despite large differences in the statistics provided. For example, OC did not refer to false positives (since confirmatory testing and polypectomy are generally not required) and FS was described as having sensitivity of 65 % compared with 85-90 % for CTC and OC. Although we adapted a pre-existing and rigorously designed leaflet as a template, it is possible that participants were not able to use it to extract this specific information and therefore may have had unrealistically positive views about FS (e.g. if they did not appreciate the implications of FS only examining the distal third of the large bowel). This may also reflect general weaknesses in participants’ health literacy or numeracy [49]. Future research may assess whether current information can be improved in this respect.

Our study has limitations. The response rate was lower than anticipated and responders differed from non-responders regarding several demographic characteristics. It is therefore possible that intentions and perceptions cannot be generalised to all people approaching screening age. There is no evidence that non-response bias differed by condition, but it is possible that non-responders were more sensitive to the test on offer compared to responders and would therefore have been more likely to endorse some tests over others. It was also necessary to create original items to measure perceptions since no validated alternatives existed. Although items were based on existing questionnaires, Cronbach’s α was too low to generate reliable overall perception scales for preparation, health benefits or accuracy. This resulted in numerous hypotheses being tested but most p-values were very low or very high, reducing the likelihood of incorrect inferences.

In conclusion, we found that perceptions of non-laxative preparation for CTC are more positive compared to all alternatives and that the anticipated test experience of full- and non-laxative CTC (and FS) is perceived as superior to OC. However, we found no strong evidence that participants are more likely to undergo either form of CTC compared to alternatives. This suggests that uptake is unlikely to be increased significantly by offering one test over another.