Background

Colorectal cancer (CRC) is the third most common type of cancer worldwide [1]. Most cases of CRC are incidental, and even though there are some risk factors for CRC, individual-based interventions on these risk factors are difficult to implement [2, 3]. Therefore, many countries have implemented national screening services for CRC with different modalities such as immunochemical faecal occult blood test (iFOBT), sigmoidoscopy or colonoscopy [4]. In 2014, CRC screening with iFOBT was implemented in Denmark, targeting all individuals aged 50–74 years [5, 6]. All participants with a positive test are urged to undergo a follow-up procedure, which includes bowel preparation and an investigative colonoscopy under local anaesthesia. Besides the intended benefits of early detection, screening has potential unintended harms, which are less frequently reported in the literature [7]. These harms include negative psychosocial consequences, particularly from false-positive results and (over)diagnosis [8,9,10].

Previous cancer screening research in breast, lung and cervical cancer has revealed different degrees of psychosocial consequences from participating in cancer screening and particularly from receiving a false-positive result [11,12,13,14,15]. However, the outcome measures and study designs of cancer screening studies on psychosocial consequences are in general inadequate [16, 17]. Therefore, research using questionnaires with high content validity, sound psychometric properties and study designs including baseline measurements as well as timely assessments is highly needed [18].

Brodersen et al. have previously developed condition-specific questionnaires with high content validity and sound psychometric properties to measure psychosocial consequences of screening for specific cancers and other life-threatening diseases [19,20,21,22]. Furthermore, Brodersen et al. found that a common core questionnaire, consequences of screening (COS), was relevant in all these screening settings.

We have not identified studies investigating psychosocial consequences of screening for CRC using a condition-specific questionnaire with high content validity and adequate measurement properties. Furthermore, it has not been investigated whether COS is relevant for use in a CRC screening setting. Therefore, the aims of this study were:

  1. To investigate content relevance and content coverage of COS in a CRC screening setting.

  2. To generate items and themes relevant in a CRC screening setting, in case of gaps in content coverage in the present COS.

  3. To test the possibly extended version of COS for dimensionality and differential item functioning (DIF) using Item Response Theory Rasch Models.

Methods

The COS questionnaire

The COS questionnaire was originally developed in a breast cancer screening setting [19, 23]. Subsequently, a core-set of nine dimensions and one single item from this first COS questionnaire has been confirmed relevant and has been statistically validated in various other screening settings [20,21,22].

The core-COS consists of two parts: part I, encompassing four dimensions and one single item, which is relevant before, at, and after screening and for control persons not invited to screening, and part II, encompassing five dimensions, which is only relevant when a screened participant has received a final diagnosis (Table 1). In part I, all items are phrased as in the example in Fig. 1, with a common stem, as shown in Fig. 1, placed as a heading above every five to six items. In part II, the items are phrased as in Fig. 2, with a common stem, as shown in Fig. 2, placed as a heading on each page of the questionnaire.

Table 1 Content of the core-questionnaire COS (consequences of screening)

Fig. 1 Response categories, COS part I

Fig. 2 Response categories, COS part II

Furthermore, in the construction of the COS questionnaires for the various other screening settings, additional condition-specific dimensions have been developed and validated for use in these specific screening settings [20,21,22]. Five of these condition-specific dimensions (‘Introvert’, ‘Change in body perception’, ‘Fear and powerlessness’, ‘Change in perception of own age’, and ‘Emotional reactions’) were assumed relevant before, at, and after CRC screening as well. Hence, they were added as domains to part I of the COS questionnaire for CRC screening that was to be developed.

In core-COS part I, the response options are arranged in four categories from ‘Not at all’ to ‘A lot’ (Fig. 1). The response scores range from 0 to 3, where 0 corresponds to ‘Not at all’ and 3 to ‘A lot’.

In core-COS part II, the response options are arranged in five categories with ‘No change’ placed in the middle and two response categories on each side indicating change in opposing directions (less/more change) (Fig. 2). The response category scores range from 0 to 2 in both directions, where 2 indicates most change.
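The scoring rules described above can be illustrated with a minimal sketch. It assumes that responses are coded by their position among the ordered response categories; the function names and the convention that lower positions in part II mean 'less' are assumptions made for illustration and are not taken from the COS documentation.

```python
def score_part_one(position: int) -> int:
    """Score a part I response by its position among four ordered categories.

    Position 0 is 'Not at all' and position 3 is 'A lot', so the score
    equals the position.
    """
    if position not in range(4):
        raise ValueError("part I has four response categories (positions 0-3)")
    return position


def score_part_two(position: int) -> tuple[int, int]:
    """Score a part II response by its position among five ordered categories.

    Position 2 is 'No change'. Returns (magnitude, direction), where the
    magnitude runs from 0 to 2 and the direction is -1, 0 or +1.
    Assumption: positions below 2 mean 'less' and positions above 2 mean 'more'.
    """
    if position not in range(5):
        raise ValueError("part II has five response categories (positions 0-4)")
    offset = position - 2
    magnitude = abs(offset)
    direction = (offset > 0) - (offset < 0)
    return magnitude, direction
```

Keeping magnitude and direction separate reflects the description that part II scores range from 0 to 2 on either side of ‘No change’.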

Design and setting

This study consisted of two phases: (1) a qualitative phase where content relevance and content coverage of COS in a CRC screening setting were investigated and new items were generated in case of gaps in content coverage, and (2) a quantitative phase where the possibly extended version of COS was tested for unidimensionality and DIF.

Phase 1, Qualitative phase

The qualitative phase of this study was conducted as an independent, but connected, part of an explorative qualitative study using focus groups to investigate experiences of receiving a false-positive CRC screening result [24]. The rationale for only including participants with false-positive results and low-risk polyps in the focus groups was based on research in mammography screening, where participants with false-positive results experience the most psychosocial consequences [25, 26]. This group, together with the low-risk polyp group, was thus the most relevant for uncovering the psychosocial consequences of CRC screening. The group with normal results has previously been shown to be least affected psychosocially, so this group was not expected to contribute new information not already revealed by informants with polyps or a false-positive result [26].

The explorative qualitative study including details on recruitment, sampling, and participant characteristics has been published elsewhere [24]. Here we describe how the data was used in this validation study and the additional data collection and analysis.

Four focus groups were performed in Region Zealand, Denmark in 2015 with 16 participants in total. The first interview included five women diagnosed with low-risk polyps, the second included three men with a false-positive result, the third included four women with a false-positive result and the final interview included four men diagnosed with low-risk polyps.

The focus groups were divided into two parts: (a) an explorative part, and (b) a structured part focused on the development and content validation of the questionnaire. The two parts were held in continuation of each other with a break between the explorative part and the structured part. In the first part we explored the experiences of receiving a false-positive CRC screening result by open-ended discussions and in the second part we specifically investigated content relevance and content coverage of the core-COS together with the condition-specific dimensions. Moreover, all items were tested for understandability and ease of completion. We also introduced different phrasing of the items to the participants, to find the most appropriate alternatives.

Each focus group was audio-recorded and lasted 55–90 min.

During the focus groups, JB together with the co-authors of the explorative study acted as moderators and JM as an observer.

Development and test of new items

In cases where topics not covered in the existing COS were discussed in the explorative part of the focus groups, new items covering these topics were developed. This was partly performed during the focus groups together with the participants and partly by JB in-between focus groups based on the analysed transcripts. When new items had been developed, these were integrated with the existing items to a new draft questionnaire, which was tested in the following focus group. Hence, the test of COS and the development of new items was an iterative process that was performed continuously throughout the data collection process.

The content validity of the new items was investigated alongside the content validity of COS.

Data analysis

The focus groups were audio-recorded and transcribed verbatim and a systematic text condensation approach was used to analyse data from the explorative part of the focus groups [27]. Data were analysed in between each focus group and findings used within the next focus group.

Single interviews

The final draft version of the questionnaire was further tested for understandability and functionality by a “think-aloud” test in five single interviews after the four group interviews. The sample was found using convenience sampling of individuals in the target population of the CRC screening programme. In the single interviews, JM acted as moderator. Any problems with understandability or comments on functionality were discussed between JM and JB, and decisions on possible changes were made by these two authors.

Recall period

The recall period is the period back in time that the respondents are supposed to refer to when responding to the questionnaire. The choice of length of the recall period depends on the outcome to be measured and the design and setting of the study.

The recall period of this questionnaire was discussed and decided within the author group. Hence, neither focus group nor single interview participants were involved in this decision.

Phase 2, Quantitative phase

Data collection for statistical psychometric properties analyses

The data used for the statistical assessment of the psychometric properties was a subset of data collected for a longitudinal questionnaire study, not published yet, aiming to quantify the psychosocial consequences of CRC screening.

In the questionnaire study, the questionnaire was sent to all positive screenees and to age-, sex-, and municipality-matched negative screenees, non-attendees and control persons in a 2:1 design. In total we sent the questionnaire to 4178 individuals eight weeks after their matched positive screenees had received their final diagnosis. The final diagnoses were: CRC, medium- and high-risk polyps, low-risk polyps and clean colon. Hence, positive screenees could be classified in these four categories.

Sample size

There is no consensus on an appropriate sample size in Item Response Theory using Rasch Models. Nevertheless, the COSMIN Risk of bias checklist refers to a sample size of ≥ 200 subjects as ‘very good’ [28]. Moreover, previous experiences with Rasch analyses have shown that with samples of 1000 subjects all results tend to be rejected (type I error) i.e. no scales would seem to have adequate fit to the model due to too large power of the study [29]. Therefore, we assumed that a sample of approximately 400 subjects or approximately 60 subjects in each of the seven subgroups was an appropriate sample size. The CRC group was too small; hence, all 50 respondents were included (Table 2).
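The subsampling rule described above (roughly 60 respondents per subgroup, with every respondent kept when a subgroup is smaller than that) can be sketched as follows. This is only an illustration: it assumes simple random sampling within each subgroup, which the text does not state, and all pool sizes except the 50 CRC respondents are hypothetical.

```python
import random


def draw_subsample(groups, per_group=60, seed=0):
    """Draw up to `per_group` respondent IDs from each subgroup.

    Subgroups with `per_group` respondents or fewer are included in full.
    Assumption: simple random sampling within each subgroup.
    """
    rng = random.Random(seed)
    subsample = {}
    for name, respondent_ids in groups.items():
        if len(respondent_ids) <= per_group:
            subsample[name] = list(respondent_ids)
        else:
            subsample[name] = rng.sample(respondent_ids, per_group)
    return subsample


# The seven subgroups named in the text. Only the CRC size (50) is taken
# from the text; the size of 300 for the other pools is hypothetical.
group_names = [
    "CRC",
    "medium- and high-risk polyps",
    "low-risk polyps",
    "clean colon",
    "negative screenees",
    "non-attendees",
    "control persons",
]
example_groups = {}
next_id = 0
for name in group_names:
    size = 50 if name == "CRC" else 300
    example_groups[name] = list(range(next_id, next_id + size))
    next_id += size

example_sample = draw_subsample(example_groups)
total = sum(len(ids) for ids in example_sample.values())
```

With these hypothetical pools the total comes to 6 × 60 + 50 = 410 respondents, in line with the target of approximately 400.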

Table 2 Quantitative phase, participant characteristics

Statistical analyses of dimensionality

The analytical approach was to see whether the data fitted a Rasch model so the investigated scale possessed all the advantageous psychometric properties inherent to the Rasch model [30]. When the items fit the Rasch model the patient-reported outcome measure possesses criterion-related construct validity and is proved to be objective, sufficient, and reliable [23].

Firstly, we investigated unidimensionality, then we investigated absence of differential item functioning (DIF) and lastly, we investigated local independence [30, 31]. When these criteria were fulfilled, the scales and items fit the Rasch model.

Unidimensionality is the ability of a scale to measure only one aspect of a latent trait.

Differential item functioning (DIF) is when an item is excessively correlated to an exogenous variable and therefore functions differently in different groups of respondents. DIF can be further divided into uniform DIF, when the DIF is constant across the latent trait, and non-uniform DIF, when the DIF varies across the latent trait. Uniform DIF can be adjusted for, while non-uniform DIF cannot [32]. Local independence is when responses to an item are conditionally independent, meaning that two items in a scale only correlate because they both measure the same latent trait.

For each domain, we analysed unidimensionality and uniform DIF with Andersen’s conditional likelihood ratio test (CLR-χ2) [33]. Then individual item fit to the partial credit Rasch model for polytomous items was assessed by conditional infits and outfits and by comparing observed and expected item responses for individuals as well as for study groups [34]. Finally, we analysed uniform DIF for subgroups and local dependence (LD), i.e. violation of local independence, for particular items by partial gamma coefficients using graphical loglinear Rasch Models [35].
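For orientation, the model requirements named above can be written out in standard textbook notation. The symbols are assumptions introduced here and do not appear in the original text: $X_{vi}$ is the response of person $v$ to item $i$, $\theta_v$ the person's latent trait value, $\delta_{ik}$ the $k$-th threshold of item $i$, $m_i$ the highest score on item $i$, and $Z_v$ an exogenous variable such as age, sex or screening result.

```latex
% Partial credit Rasch model for a polytomous item scored 0, 1, ..., m_i
% (the empty sum for x = 0 is taken to be zero)
P(X_{vi} = x \mid \theta_v)
  = \frac{\exp\left( x\,\theta_v - \sum_{k=1}^{x} \delta_{ik} \right)}
         {\sum_{h=0}^{m_i} \exp\left( h\,\theta_v - \sum_{k=1}^{h} \delta_{ik} \right)},
  \qquad x = 0, 1, \dots, m_i

% Absence of DIF: given the latent trait, the response does not depend on Z_v
P(X_{vi} = x \mid \theta_v, Z_v) = P(X_{vi} = x \mid \theta_v)

% Local independence: given the latent trait, two items i and j are independent
P(X_{vi} = x, X_{vj} = y \mid \theta_v)
  = P(X_{vi} = x \mid \theta_v)\, P(X_{vj} = y \mid \theta_v)
```

Under this notation, DIF corresponds to a violation of the second identity for some item and covariate, and LD to a violation of the third for some pair of items.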

The items were assessed by covariates for DIF. The covariates were age, sex and screening result, which have previously been proven relevant in screening settings [22].

When an item or set of items did not fit the model, we analysed data to locate the source of misfit. Furthermore, we re-read the phrasing of all items in that domain to locate any linguistically poorly defined items or any distinction in the meaning of the items indicating that the item belonged to another domain than we had initially hypothesised.

When an item possessed DIF, the scale was analysed both without that item and with the item split by the covariate for which it possessed DIF. If the DIF was uniform and thereby corrected by the split, the overall fit to the model would increase.

We decided to keep any item in the model as long as the item did not have non-uniform DIF or low content validity.

The Benjamini–Hochberg procedure was used to correct for multiple testing and Cronbach’s alpha was used to assess reliability [36, 37].
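The two procedures named above can be sketched in a few lines of Python. This is a generic illustration of the standard Benjamini–Hochberg step-up rule and the usual formula for Cronbach's alpha; it is not the DIGRAM implementation used in the study, and the function names and the significance level of 0.05 are assumptions.

```python
from statistics import variance


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(item_scores[0])
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(k)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` rejects only the first hypothesis, because 0.01 is the only ordered p-value that falls below its rank-specific threshold.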

We used DIGRAM to perform all the statistical analyses [38].

Results

Phase 1, Qualitative phase

The items in core-COS as well as the items in the condition-specific domains were all found relevant by the participants. Moreover, participants found all items understandable and easy to complete.

The first part of the interviews generated new information on experiences of the CRC screening. Discomfort, pain, the perceived burden of drinking the laxative and being bound to one's home during the bowel preparation were CRC screening-specific topics discussed during the explorative part of the focus groups that were not covered in the previous version of COS. Embarrassment, pain, and vulnerability related to the colonoscopy as well as uncertainty about the screening result and opinions on participation were other new topics not covered in the previous COS.

These topics were covered in a total of 18 newly developed items that were divided into three new a priori domains: ‘Perceived burden of bowel preparation’, ‘Negative colonoscopy experiences’ and ‘Knowledge of having colorectal polyps’. These new items were all found relevant by participants in the subsequent interviews. The wording of the items was also found understandable and there were no difficulties in completing the items.

These extra a priori domains formed a new part of the questionnaire, specifically for use in CRC screening: ‘part Ix’. The new domains were only relevant after screening and only to participants who had undergone a colonoscopy following a positive screening result. Hence, these domains naturally fall outside COS part I and II. We assumed that the items and response categories in these a priori domains would have the same structure as the items in part I of the COS questionnaire.

Finally, an item originally developed for lung cancer screening, now modified to fit a CRC screening setting, was found relevant among the interviewees. We assumed that this item ‘Fear of CRC has, more than usual, been in the back of my mind’ would fit in the COS scale ‘Introvert’.

The 18 new items were all developed during the first two focus groups. After the second focus group no new information was discovered. Hence, no further items were generated from data collected in these interviews. The final draft questionnaire was therefore tested in its full version in the two final focus groups; one with women (n = 4) and one with men (n = 4) [24].

Two items on worries about CRC and belief in not having CRC, originally developed for breast cancer and belonging to part II of the original COS, were found relevant by the participants. These items were not included in the survey questionnaire due to personnel error, and the validation of the corresponding two-item scale could therefore not be performed in this study.

Single interviews, think-aloud test

Four women and one man were interviewed. The man and one of the women were interviewed on the street outside a shopping mall while three women were University administrative employees, interviewed at work.

One woman was uncertain about the meaning of item 53 ‘Worried about drinking other fluids during emptying of bowel’ in the new a priori domain ‘Perceived burden of bowel preparation’. Since she had not attended the screening programme yet, we assumed that her uncertainty was related to the fact that the item was read out of context. No other comments on the phrasing or content of the items were revealed during the interviews.

Recall period

The recall period for the questionnaire was set to four days. This was a pragmatic decision made by JB and JM. The time window from receiving a positive iFOBT result to undergoing the follow-up colonoscopy can be as narrow as five days or as broad as ten days. A four-day recall period was chosen to capture the possible psychosocial consequences of being in limbo after receiving a positive iFOBT result and waiting for the diagnostic colonoscopy, without the respondent being in the middle of emptying the bowel.

Phase 2, Quantitative phase

Part I

Firstly, we evaluated unidimensionality, then we evaluated absence of DIF and lastly, we evaluated local independence, for each domain.

The four core-COS part I scales ‘Dejection’, ‘Anxiety’, ‘Behaviour’ and ‘Sleep’ exhibited adequate fit of the Rasch model and no items in these scales possessed DIF (Table 3).

Table 3 Fit statistics and Cronbach’s alpha of the dimensions of the COS-CRC

We found LD in three pairs of items in the ‘Dejection’-scale: item 1 and 8, item 8 and 10, and item 10 and 18 (Table 4).

Table 4 Results from the psychometric analyses of part I of the COS-CRC

Two pairs of items in the ‘Anxiety’-scale had LD: item 3 and 13, and item 11 and 12.

In the scale ‘Behaviour’ LD appeared in six pairs of items: items 4 and 5, items 4 and 7, items 4 and 16, items 5 and 16, items 5 and 19, and items 19 and 21.

In the ‘Sleep’-scale we found LD in three pairs of items: item 6 and 15, item 15 and 20, and item 20 and 23.

The scale ‘Introvert’ had overall misfit to the model (Table 3). Furthermore, item 26 ‘Fear of CRC has, more than usual, been in the back of my mind’ had DIF related to the exogenous variable ‘Diagnosis’. Since this item had not fitted the model in a lung cancer screening setting either, it was removed from the model. Thereafter, the overall fit increased and no items in the scale possessed DIF. There was LD in six pairs of items: items 24 and 27, items 24 and 30, items 24 and 34, items 27 and 34, items 30 and 32, and items 32 and 34.

The three scales ‘Change in body perception’, ‘Fear and powerlessness’ and ‘Change in perception of own age’ all had an overall good fit to the model. None of the items in these three scales possessed DIF or had LD to each other.

In general, the scale ‘Emotional reactions’ fitted the model adequately. However, item 41 ‘Frightened’ had DIF related to the exogenous variable ‘Diagnosis’. Therefore, we tested the model without this item and with split of the item for the variable ‘Diagnosis’. Splitting item 41 for the variable ‘Diagnosis’ revealed uniform DIF.

After we removed item 41, the scale fitted the model adequately and there was no DIF or LD.

The scale ‘Sex’ had overall misfit to the Rasch model. Furthermore, item 45 ‘Less interest in sex’ had DIF and the item pair 45 and 46 had LD. Therefore, item 45 was removed and item 46 was kept as a single item.

The ‘Lifestyle changes’-scale had good overall fit to the model. However, item 44 ‘Change in exercise habits’ had DIF and the two items forming the scale had LD. Therefore, item 44 was removed from the model, and item 43 was kept as a single item (Table 4).

Part Ix

The CRC-specific scale ‘Perceived burden of bowel preparation’ had overall good fit to the model (Table 5). The scale had LD for the pairs of items: 47 and 50, 47 and 52, 47 and 53, 48 and 49, 48 and 51, 48 and 53, 49 and 51, 49 and 53, 50 and 53, and 51 and 53. Item 53 ‘Worries about drinking other beverages during the bowel preparation’ had DIF related to the exogenous variable ‘Diagnosis’. Therefore, we tested the model without item 53. After removing item 53, the scale still fitted the model, no items possessed DIF and the pairs of items 48 and 49, 48 and 51, and 49 and 51 had LD.

Table 5 Results from the psychometric analyses of part Ix of the COS-CRC

The scale ‘Knowledge about colorectal polyps’ fitted the Rasch model, and the items possessed no DIF or LD.

The hypothesised scale ‘Negative colonoscopy experiences’, formed by items 56–64, had overall fit to the model and no items possessed DIF. However, LD was revealed in 25 pairs of items and several items had misfit to the model. These results could indicate two- or multi-dimensionality. Therefore, we re-read all the items in this scale to reconsider whether more than one dimension was hidden in it. We decided to split the scale into a physical part (items 56, 59 and 61) and a psychological part (items 57, 58, 60, 62 and 63). After re-reading item 64 about post-participation opinion, we agreed to keep it as a single item: it had been declared relevant by the participants and had not possessed DIF in the initial analyses, but linguistically it did not fit into any of the existing scales. The new scale ‘Negative physical colonoscopy experiences’ had overall fit to the model and the items possessed no DIF. One pair of items, 56 and 59, had LD.

The scale ‘Negative emotional colonoscopy experiences’ also fitted the model and no items possessed DIF. The pairs of items 57 and 58, 57 and 60, 57 and 62, 58 and 60, and 62 and 63 had LD.

Part II

The four COS part II scales ‘Social relations’, ‘Relaxed/calm’, ‘Impulsivity’ and ‘Empathy’ had overall good fit to the Rasch model (Table 6). Furthermore, none of the items possessed DIF. The ‘Impulsivity’-scale had LD for the pairs of items 16 and 19, 16 and 20, 19 and 20, and 20 and 21. The ‘Empathy’-scale had LD for the pairs of items 4 and 5, and 5 and 15.

Table 6 Results from the psychometric analyses of part II of the COS-CRC

The scale ‘Existential values’ fitted the model (p = 0.846). The pairs of items 10 and 11, 10 and 13, and 12 and 13 had LD. Moreover, item 11 ‘Well-being’ possessed DIF related to the exogenous variable ‘Age’. Therefore, we tested the model without this item and with split for the variable ‘Age’. After we removed item 11, no DIF was revealed but overall fit to the model decreased (p = 0.032) as well as item fit of the remaining items in the scale. The pairs of items 2 and 10, 10 and 12, 10 and 13, and 12 and 13 had LD. We tested the model with split for the variable ‘Age’, and the item revealed non-uniform DIF i.e. overall fit decreased compared with the initial analyses (p = 0.750).

Both the scales and the items fitted the Rasch model in the initial analyses. Since items 10 and 11 possessed LD, we performed another analysis where we merged items 10 and 11 into a super item, to examine whether this would remove the DIF [20]. Merging items 10 and 11 into a super item increased the overall fit (p = 0.914) but did not remove the DIF, and the fit of the super item was lower than that of item 11 in the previous analyses. Moreover, items 10 and 13, and 12 and 13 also had LD, so we merged them into super items one pair at a time. None of these super items resulted in an increased item fit or removal of DIF. However, we did not delete item 11 from the model, due to its high content validity.

Discussion

This study has developed and validated an extended version of COS specifically for use in CRC screening. The extended version is called consequences of screening in colorectal cancer, COS-CRC (Additional file 1). The extended version consists of three parts: part I (nine scales, two single items), part Ix (four scales and one single item) and part II (five scales).

The stringent design, combining qualitative and quantitative methods, is a strength of the study.

Moreover, all the items possessed high content validity and most of them also had adequate psychometric properties, which is a strength of this study.

Furthermore, COS has now proved content relevance and adequate measurement properties in five different screening settings, including CRC screening [19,20,21,22,23].

Only 16 of the 80 invited men and women consented to participate in the focus groups, which could be considered a limitation [24]. However, since no new information emerged during the last two group interviews, we were confident that data saturation was reached. Another limitation was that several scales possessed LD. LD can decrease the item information collected and thereby the power of a study. However, the presence of LD is not of importance as long as the scale fits the model and is used in a survey that has a sufficient number of respondents.

Moreover, the short recall period of four days could be considered another limitation. However, a longer recall period (e.g. a week) could induce inevitable bias, since it would not be possible to distinguish between consequences, in any direction, in relation to waiting for the iFOBT result, not having taken the iFOBT yet or even having undergone the colonoscopy.

The content relevance of the COS (part I and part II) as well as of the previously developed condition-specific items was established in a setting of CRC screening.

Furthermore, COS showed adequate measurement properties to measure psychosocial consequences in this context except in the scales ‘Introvert’, ‘Emotional reactions’, ‘Lifestyle changes’ and ‘Sexuality’ where one item in each scale possessed DIF. This may limit the applicability of these items to randomised studies, where DIF can be expected to be equally distributed among the study groups.

Item 41 in ‘Emotional reactions’ possessed uniform DIF related to the exogenous variable ‘Diagnosis’ but could be used in settings only investigating subgroups of CRC screening participants.

However, the study revealed gaps in content coverage of COS in relation to CRC screening-specific topics. New CRC screening-specific information was discovered in the focus groups and covered by 18 new items, which emphasises the importance of involving the experts when developing questionnaires. In this research area, the experts are the participants of the screening programme. COS-CRC is to our knowledge the first questionnaire on psychosocial consequences of CRC screening tested for content validity before use in CRC screening participants. The high content validity ensures that the questionnaire does not include items that are redundant or irrelevant to the respondents. Generic questionnaires are developed in other subpopulations than screening participants and have not been tested for content validity in a CRC screening setting [16, 17]. Therefore, there is a large risk that screening participants find these items irrelevant or redundant [39]. The high content validity of the COS-CRC questionnaire also confirms that all items in the questionnaire are relevant and are needed to cover all aspects of the multidimensional trait ‘Psychosocial consequences of CRC screening’ [39, 40].

Unexpectedly, the item ‘Well-being’ in the scale ‘Existential values’ possessed non-uniform DIF. This item has not possessed DIF in COS part II in screening for other non-communicable diseases [20,21,22,23]. As this DIF could be artificial, we tried to locate its source by adding LD for three pairs of items to the model, thereby constructing super items [41]. This did not remove the DIF or increase the fit to the model. However, since this item has not possessed DIF in any other screening settings, the DIF could be spurious. Hence, this DIF should be tested in another sample before this item is deleted permanently for use in a non-randomized CRC screening setting.

Conclusion

An extended version of the questionnaire COS has been developed to measure psychosocial consequences of CRC screening. The measure is called consequences of screening in colorectal cancer (COS-CRC) and consists of three parts; Part I: ‘Anxiety’, ‘Behaviour’, ‘Dejection’, ‘Sleep’, ‘Introvert’, ‘Fear and powerlessness’, ‘Change in body perception’, ‘Change in perception of own age’, ‘Emotional reactions’, and the two single items ‘Lifestyle changes’, and ‘Sexuality’; Part Ix: ‘Burden of bowel preparation’, ‘Knowledge about colorectal polyps’, ‘Negative physical experiences of the colonoscopy’, ‘Negative emotional experiences of the colonoscopy’ and the single item on ‘Regret participation’; and Part II: ‘Relaxed/Calm’, ‘Social network’, ‘Existential values’, ‘Impulsivity’, and ‘Empathy’. We showed, using Rasch models, that COS-CRC possessed adequate measurement properties.

Implications for research

We have not been able to identify any studies investigating the measurement properties of the questionnaires used to measure psychosocial consequences in a CRC setting, but in general, generic questionnaires are used for these purposes. However, condition-specific measures have been proved superior to generic measures in covering all the specific aspects of being part of a screening service [18]. Therefore, in future CRC screening trials measuring psychosocial consequences, condition-specific questionnaires with adequate measurement properties such as COS-CRC should be used to measure these consequences adequately. Moreover, further research could include the two items on worries about CRC and belief in not having CRC in the COS-CRC, to analyse whether these two items would fit a Rasch model.