Background

Colorectal cancer (CRC) is the third most common malignancy in men and the second most common in women, and the fourth leading cause of cancer-related deaths worldwide [1]. Despite recent progress in the diagnosis and treatment of this disease, 5-year survival has not improved significantly and remains around 50%. This is because over 80% of new cases occur in symptomatic patients, in whom the disease is advanced at the time of diagnosis. A significant percentage of CRC patients are diagnosed on the basis of clinical symptoms associated with this malignancy [2,3,4,5]. It is therefore important to identify patients with suspicious symptoms and/or signs so that an early colonoscopy can be indicated. The 2-week wait (2WW) referral system of the NHS (UK) has recently been re-evaluated [6]. This system was introduced in 2000 to ensure that specialists in reference hospitals assess all patients with suspected CRC within 14 days of urgent referral by a primary care physician, and it is based on the National Institute for Health and Care Excellence (NICE) criteria for suspected cancer [7]. However, several studies have shown that the NICE criteria for 2WW referral have low sensitivity and specificity [8,9,10,11]. Although 77% of patients with CRC are referred by primary care physicians via urgent pathways, this system has not improved 5-year CRC survival [12]. This may be because symptoms, when present, very often indicate advanced disease, and once a cancer has become symptomatic, earlier treatment does not improve survival. The 2WW rule has also been criticized for its low overall cancer detection rate, which results from the poor specificity of clinical symptoms (which are common in benign bowel diseases) and leads to overwhelming referral rates.
The new NICE guidelines (July 2017) suggest using faecal occult blood testing in patients whose symptoms are associated with a low probability of CRC (positive predictive value [PPV] < 3%).

In this context, a number of CRC prediction models have been designed and validated in different settings [13]. These models are based on equations that combine mainly symptoms [14,15,16] and, more recently, a 1-sample faecal immunochemical test (FIT) [11, 17]. The patients included in those studies were symptomatic, but did not always fulfil the criteria for a fast-track program. The diagnostic accuracy of these risk-scoring models is generally considered satisfactory and better than that of the existing referral criteria, but at present they are not widely recommended, perhaps because of the complex mathematical equations required for their calculation.

Although a meta-analysis showed that in average-risk asymptomatic individuals increasing the number of FIT samples did not affect the pooled performance characteristics of FITs for CRC [18], recent studies in asymptomatic [19,20,21] and symptomatic [22, 23] populations suggested that using either 2 or 3 samples provided the best discrimination for CRC.

Thus, the aim of the study was to derive and validate a predictive risk score to determine the pre-test probability of advanced colonic neoplasia (ACN) in symptomatic patients with an indication for fast-track colonoscopy, and to assess the accuracy of a 3-sample FIT compared with a 1-sample FIT for the diagnosis of ACN.

Methods

The study protocol is available as a supplementary file (Additional file 1).

Study population

All patients in the fast-track colonoscopy programs for CRC at the three participating hospitals were included from March 2014 to September 2016. The healthcare areas of these hospitals have a population of 917,905 inhabitants. Patients with high-risk symptoms for CRC were referred for a full colonoscopy under the government's program to expedite the diagnosis of CRC. All colonoscopies in the health areas involved are performed at the endoscopy units of the three centres, each of which performs more than 3,000 colonoscopies a year. Fast-track colonoscopy was requested mainly by primary care physicians or by gastroenterology specialists working in primary care, but also from hospital outpatient clinics (various medical and surgical specialties). Inclusion criteria were as follows: 1. Age of 18 years or more; 2. Fulfilment of the criteria for a fast-track colonoscopy based on NICE guidelines [7]; 3. Signed informed consent. Exclusion criteria were pregnancy, asymptomatic individuals undergoing colonoscopy for CRC screening, patients with a previous history of colonic disease scheduled for surveillance colonoscopy, patients requiring hospital admission, and patients whose symptoms had ceased within the three months before evaluation.

The clinical research ethics committees of the three centres approved the study and patients provided written informed consent.

Interventions

Trained case managers (specialized nurses) conducted personal interviews with the patients to ensure standardized and proper collection of faecal samples, fulfilment of the inclusion and exclusion criteria, and signing of informed consent. The following interventions were performed in all patients: 1. FIT on three different days in the week before colonoscopy; and 2. A patient consultation questionnaire including a detailed assessment of colorectal symptoms, personal and family history of polyps/CRC, smoking, and use of medications that increase the risk of gastrointestinal haemorrhage (NSAIDs, aspirin, anticoagulants). Patients with rectal bleeding were instructed, if possible, to collect faecal samples on days without rectal bleeding. Faecal haemoglobin was analysed using the iFOB kit (Linear Chemicals SL, Barcelona, Spain) (see detailed description of the test in Additional file 2), which has a detection limit of 4 μg Hb/g faeces. Quantitative values, ranging from 4 to > 160 μg Hb/g faeces, were recorded for each of the three samples.

A full colonoscopy under intravenous sedation, with colonic biopsies and/or polypectomy if necessary, was performed in all evaluated patients. The endoscopists involved in the study had more than 2 years of experience in routine clinical colonoscopy, performed at least 200 procedures per year, and had caecal intubation rates above 97%. Bowel cleansing quality (Boston scale), colonoscopy findings, and pathology results were recorded. Colonoscopies with insufficient preparation (Boston scale score of 1 or less in any colonic segment) were repeated. In the case of multiple polyps, the number of polyps, the size of the largest, and the most advanced pathology were recorded. If CRC was found, its stage (TNM classification) and location (right, transverse, or left colon) were recorded.

Main outcome

The main outcome was ACN detection, defined as the presence of CRC or advanced adenoma (AA) (> 1 cm, high-grade dysplasia, or villous component). Intramucosal carcinoma (Tis) was classified as AA.

Derivation and validation cohort

The rule of thumb of 10 events per variable was used to size the derivation cohort, assuming that the logistic regression model might include between 10 and 15 dummy predictor variables [24]. Accordingly, assuming an ACN prevalence of 20% (8% CRC, 12% AA) [25], a minimum cohort of 600 individuals was required (120 expected events). During the derivation period of the study (March 2014 to May 2015), we were able to recruit a larger cohort of N = 761 individuals to accommodate the number of dummy variables. FIT results were unavailable for 30 of these individuals, leaving a final cohort of N = 731.
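The sample-size reasoning above is simple arithmetic: required events = events per variable × number of candidate predictors, and cohort size = required events / expected prevalence. The Python sketch below reproduces it; the function name is hypothetical, and reading "12 predictors" from the 120 expected events is our assumption within the 10 to 15 range stated in the text. At the upper end of that range (15 predictors) the same rule would call for 750 participants, which is in line with the larger cohort that was recruited.

```python
def min_derivation_size(n_predictors: int, prevalence_pct: int, epv: int = 10) -> tuple[int, int]:
    """Events-per-variable (EPV) rule: return (required events, required cohort size).

    Prevalence is passed as an integer percentage so the arithmetic stays exact.
    """
    events = epv * n_predictors
    # Integer ceiling division: smallest cohort whose expected events reach the target.
    cohort = -(-events * 100 // prevalence_pct)
    return events, cohort


if __name__ == "__main__":
    # Hypothetical range of candidate dummy variables (10 to 15) at 20% ACN prevalence.
    for k in (10, 12, 15):
        events, cohort = min_derivation_size(k, prevalence_pct=20)
        print(f"{k} predictors -> {events} events, {cohort} participants")
```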

We determined that a minimum sample size of 500 was required for the validation phase (see Additional file 3), and we finally recruited a cohort of 527 individuals. In addition, symptomatic patients with no indication for a fast-track colonoscopy and a negative FIT were included, to ensure enough patients with negative FIT in the low-risk group (136 in phase 1 and 101 in phase 2). This gives a total of 867 individuals in phase 1 (731 + 136) with 171 events (104 AA and 67 CRC), and 628 individuals in phase 2 (527 + 101) with 148 events (99 AA and 49 CRC) (Fig. 1).

Fig. 1

Flow diagram of participants through the study in the derivation and validation cohorts, including the number of participants with the main outcome

Derivation of the FIT variables included in the model

Three FIT samples were obtained from each of the N = 867 individuals, and we used the maximum faecal haemoglobin (f-Hb) value of the three samples in each individual (MAXFIT). Because f-Hb did not follow a normal distribution even after logarithmic transformation, as reported elsewhere [17], and the risk of CRC was not linearly related to f-Hb, MAXFIT was entered as a categorical variable. We selected a cut-off of 11 μg Hb/g faeces (see Additional file 3), giving three MAXFIT categories: ≤ 4, > 4 to 11, and > 11 μg Hb/g faeces. In addition, since three faecal samples were collected, we also used a variable counting the number of samples with FIT > 4 μg Hb/g faeces (NSAMPLES> 4). This variable ranged from 0 to 3 and was entered in the model as an interaction term with MAXFIT.
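The two FIT variables can be derived per patient from the three quantitative results. The sketch below is illustrative only; the function and constant names are hypothetical, and it assumes results reported as "> 160 μg Hb/g faeces" are stored as any number above 160 (which leaves both variables unchanged).

```python
from collections.abc import Sequence

DETECTION_LIMIT = 4.0  # μg Hb/g faeces; results at or below this count as negative
MAXFIT_CUTOFF = 11.0   # μg Hb/g faeces; cut-off reported in the text


def fit_features(samples: Sequence[float]) -> tuple[str, int]:
    """Return (MAXFIT category, NSAMPLES>4) for one patient's three FIT results."""
    if len(samples) != 3:
        raise ValueError("expected exactly three FIT results")
    maxfit = max(samples)
    if maxfit <= DETECTION_LIMIT:
        category = "<=4"
    elif maxfit <= MAXFIT_CUTOFF:
        category = ">4-11"
    else:
        category = ">11"
    # Number of samples above the detection limit (0 to 3).
    n_positive = sum(1 for s in samples if s > DETECTION_LIMIT)
    return category, n_positive


print(fit_features([4, 4, 4]))      # ('<=4', 0)
print(fit_features([6.5, 4, 4]))    # ('>4-11', 1)
print(fit_features([200, 30, 12]))  # ('>11', 3)
```

In the model the two outputs enter together, with NSAMPLES> 4 as an interaction term within the MAXFIT categories.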

Development of the risk score

Univariate analysis was performed on the derivation set using Pearson's chi-square test to examine the association between clinical risk factors and advanced neoplasia. In our study, age and MAXFIT were the variables with the highest ORs in the univariate analyses (see Tables 1 and 2), in accordance with previous studies [17]. Unlike other studies, we also considered the number of positive samples, since this variable improved the predictive performance of the model. Thus, the prognostic variables included in the seminal multivariate logistic model were age, MAXFIT, and NSAMPLES> 4. Variables significant in the univariate analysis (p < 0.05) were then added one at a time to the seminal multivariate model, and the resulting C-Statistic and Brier score were recorded. Variables that significantly improved the performance indexes (higher C-Statistic and lower Brier score) relative to the seminal values were included in the final prognostic model (see Table 4). Each risk factor was assigned a weight in the risk score based on its odds ratio (OR) from the logistic regression, with the largest log-OR receiving 10 points. An individual's risk score was the sum of the points for their risk factors. Risk groups were defined with the 'AddFor' algorithm, which categorizes continuous variables in logistic regression prediction models so as to obtain the best discriminative ability (highest C-Statistic) [26]. This method allows more than one cut point to be selected and therefore a better risk classification of patients.

Table 1 Crude and adjusted ORs of age and FIT variables for association with ACN in the derivation (Phase I) cohort (Seminal model: C-Statistic = 0.846; Brier Score = 0.115; Hosmer-Lemeshow p-value = 0.796)
Table 2 Univariate and age- and FIT-adjusted predictors of ACN in the derivation (Phase I) cohort: Crude and adjusted ORs, and C-Statistic and Brier score of the adjusted model
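A minimal sketch of the point-assignment rule described above (largest log-OR mapped to 10 points, score = sum of points). The scaling of the other factors proportionally to their log-ORs and rounding to the nearest integer are our assumptions, and the odds ratios below are hypothetical placeholders, not the estimates in Table 3. Factors with OR < 1 receive negative points, which is how a total score can fall below zero.

```python
import math


def points_from_odds_ratios(odds_ratios: dict[str, float], max_points: int = 10) -> dict[str, int]:
    """Scale log-ORs so the largest log-OR maps to max_points; round to integers (assumed rule)."""
    log_ors = {name: math.log(odds) for name, odds in odds_ratios.items()}
    top = max(log_ors.values())
    if top <= 0:
        raise ValueError("at least one OR must exceed 1")
    return {name: round(max_points * lo / top) for name, lo in log_ors.items()}


def risk_score(present: set[str], points: dict[str, int]) -> int:
    """Sum the points of the risk factors present in one patient."""
    return sum(points[name] for name in present)


# Hypothetical odds ratios for illustration only; NOT the study's estimates.
example_ors = {"maxfit_gt_11": 20.0, "older_age": 3.0, "smoker": 1.5, "colonoscopy_last_5y": 0.5}
pts = points_from_odds_ratios(example_ors)
print(pts)  # {'maxfit_gt_11': 10, 'older_age': 4, 'smoker': 1, 'colonoscopy_last_5y': -2}
print(risk_score({"maxfit_gt_11", "smoker"}, pts))  # 11
```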

Statistical analysis

This development and external validation study of a multivariable prediction model was designed according to the TRIPOD statement [27]. A checklist indicating the pages where each item is reported is available as a supplementary file (Additional file 4).

Statistical analysis was performed with R software [28]. The variables retained after univariate selection are listed in Tables 1 and 2. Other variables analysed, such as family or personal history of CRC or colonic polyps, were not informative. Missing data were coded as 'Unknown' and analysed as an additional category of each variable; this mainly affected laboratory variables related to iron deficiency, since many patients had received oral iron supplements at inclusion.

A Bayesian logistic regression model accounting for age and NSAMPLES> 4 within the levels of MAXFIT was fitted for each of the variables considered, assuming a non-informative Cauchy prior for the model parameters [29]. From the posterior distribution of the model parameters, we derived the median OR and its 95% credible interval. The C-Statistic and the Brier score were used as overall performance measures [30], and the model with the maximum C-Statistic and minimum Brier score was selected. Finally, the Hosmer-Lemeshow test was used to assess the calibration of the model in the validation cohort [24, 30].
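The three performance measures named above can be computed from observed outcomes and predicted probabilities. The sketch below is a plain NumPy/SciPy version, not the R code used in the study; the function names are hypothetical, and the Hosmer-Lemeshow grouping into 10 groups of predicted risk with g − 2 degrees of freedom is the textbook form, assumed here because the text does not state how groups were formed.

```python
import numpy as np
from scipy.stats import chi2


def c_statistic(y, p):
    """Probability that a random case has a higher predicted risk than a random non-case (ties = 0.5)."""
    y = np.asarray(y, dtype=bool)
    p = np.asarray(p, dtype=float)
    diff = p[y][:, None] - p[~y][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def brier_score(y, p):
    """Mean squared difference between predicted probability and observed outcome (0/1)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))


def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square statistic, p-value); assumes 0 < expected < n in every group."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p, kind="mergesort")  # sort patients by predicted risk
    stat = 0.0
    for idx in np.array_split(order, groups):
        n, obs, exp = idx.size, y[idx].sum(), p[idx].sum()
        # Events and non-events contribute separately to the statistic.
        stat += (obs - exp) ** 2 / exp + (obs - exp) ** 2 / (n - exp)
    return stat, float(chi2.sf(stat, groups - 2))


y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(y_demo, p_demo))  # 0.75
print(brier_score(y_demo, p_demo))  # ≈ 0.158
```

A non-significant Hosmer-Lemeshow p-value only means no evidence of miscalibration was found; with small groups the test has limited power.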

To assess the effect of using a 3-sample FIT compared with a 1-sample FIT, we fitted models using a FIT value randomly chosen from each patient's three values. We also compared the final 3-sample FIT model with a previously published 1-sample FIT score (FAST score) [17]. For these comparisons we used the method proposed by DeLong et al. [31], as implemented by Robin et al. in the pROC package [32].
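For readers without R, the DeLong comparison of two correlated C-Statistics (two scores evaluated on the same patients) can be sketched in NumPy/SciPy as below. This is a direct implementation of the structural-components form of DeLong's method, written for illustration; it is not pROC's code (pROC may use a faster algorithm internally), and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def _placements(pos, neg):
    """Structural components: per-case (V10) and per-control (V01) win rates, ties count 0.5."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y, p1, p2):
    """Return (AUC1, AUC2, z, two-sided p) for two scores measured on the same subjects."""
    y = np.asarray(y, dtype=bool)
    v10, v01, aucs = [], [], []
    for p in (np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)):
        a, b = _placements(p[y], p[~y])
        v10.append(a)
        v01.append(b)
        aucs.append(a.mean())
    m, n = y.sum(), (~y).sum()
    # 2x2 covariance matrices of the components (rows = scores, ddof = 1).
    s = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return float(aucs[0]), float(aucs[1]), float(z), float(2 * norm.sf(abs(z)))


y = [0, 0, 0, 1, 1, 1]
score_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
score_b = [0.1, 0.6, 0.3, 0.2, 0.8, 0.9]
print(delong_test(y, score_a, score_b))  # AUCs 1.0 and ≈0.778, z ≈ 0.89, p ≈ 0.37
```

If both scores give identical rankings, the variance of the difference is zero and the test is undefined.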

Results

Description of the derivation and validation cohorts

Between April 2014 and February 2017, 1538 patients were included, 897 in the derivation (phase 1) and 641 in the validation (phase 2) cohorts (Fig. 1). Another 1595 patients either declined to participate or could not be contacted in time before the colonoscopy; their demographic characteristics and indications for colonoscopy did not differ from those of the included patients (data not shown). Forty-three patients (2.8%) were excluded because of improper FIT collection. The indications for colonoscopy and the baseline characteristics of the patients included in each phase are shown in Additional file 6 and Additional file 7. One thousand two hundred fifty-eight patients (731 in phase 1 and 527 in phase 2) met the NICE criteria for a fast-track colonoscopy and 237 did not (136 in phase 1 and 101 in phase 2).

We detected CRC in 67/867 (7.7%) patients in phase 1 and 49/628 (7.8%) in phase 2. None of the patients with a negative FIT who did not meet the NICE criteria for a fast-track colonoscopy had CRC. There were no significant differences in tumour location and stage between the two phases (Additional file 5). Additionally, we found advanced adenomas in 203 patients: 104 (11.9%) in phase 1 (5 without NICE criteria and with negative FIT) and 99 (15.7%) in phase 2 (6 without NICE criteria and with negative FIT). Six adenomas in phase 1 had intramucosal carcinoma (Tis).

There were no significant differences in demographic and clinical variables between the patients included in phase 1 and phase 2 (Additional file 7). However, the FIT variables (MAXFIT and NSAMPLES> 4) differed significantly. In fact, all CRC patients in phase 1 had a positive FIT with values above 11 μg Hb/g faeces, whereas in phase 2 three CRC patients had a negative FIT and two had values > 4 and < 11 μg Hb/g faeces. Overall, 6/116 (5%) CRC patients had only 1 of 3 faecal samples positive.

Derivation of the predictive risk score

Age and FIT variables were independently associated with the risk of ACN, with ORs much higher than those of other variables (Table 1). Univariate and age- and FIT-adjusted predictors of ACN in the derivation (Phase I) cohort are presented in Table 2. Only 'Colonoscopy up to 5 years before FIT', 'smoking history', and 'smoking years' retained overall statistical significance and improved the C-Statistic and Brier score of the multivariate analyses. In both the derivation and validation cohorts, the C-Statistic with 'smoking history' was slightly better, so we selected this simpler variable for the derived score. The derived predictive risk score is described in Table 3, which shows the multivariate predictors of ACN from the model fitted to the derivation (Phase I) cohort. The predictive score ranged from −4 to 24 points. The predictive performance of the final risk score for ACN was excellent, with a C-Statistic of 0.865 (95% CI, 0.83–0.89). For CRC alone, the C-Statistic was 0.93 (95% CI, 0.91–0.95).

Table 3 Multivariate predictors of ACN using the model fitted to the derivation (Phase I) cohort. The predictive score derived ranged from −4 to 24 points (Final model: C-Statistic = 0.865; Brier Score = 0.10; Hosmer-Lemeshow p-value = 0.86)

Effect of the number of positive samples on the predictive performance of the model

We compared the proposed final model for ACN, which includes MAXFIT and NSAMPLES, with the model obtained by taking 1 sample at random out of the 3 (Table 4). This was repeated 5 times. The C-Statistics of all five random-sample models were lower than that of the proposed final model, showing that the final model including the interaction between MAXFIT and NSAMPLES correctly classified 2.2–3% more patients.

Table 4 Comparison of the model resulting from taking 1 random sample out of the 3 FIT samples with the proposed final model for ACN
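The resampling step behind Table 4 (one of the three FIT values drawn at random for each patient, repeated five times) can be sketched as follows. This is an illustrative NumPy version under our own assumptions: the function name, the toy data, and the use of seeds 0 to 4 for the five replicates are hypothetical, and refitting the model on each replicate is not shown.

```python
from typing import Optional

import numpy as np


def random_single_sample(fit_values, seed: Optional[int] = None) -> np.ndarray:
    """Pick one FIT value at random per patient from an (n_patients, 3) array."""
    fit_values = np.asarray(fit_values, dtype=float)
    rng = np.random.default_rng(seed)
    # One random column index (0, 1 or 2) per patient.
    idx = rng.integers(0, fit_values.shape[1], size=fit_values.shape[0])
    return fit_values[np.arange(fit_values.shape[0]), idx]


three_samples = np.array([[4, 12, 30], [5, 5, 5], [160, 4, 4]])
# Five replicates with different seeds, mirroring the five repetitions in the text.
replicates = [random_single_sample(three_samples, seed=s) for s in range(5)]
max_based = three_samples.max(axis=1)  # MAXFIT input used by the final model
```

Each replicate would then replace MAXFIT/NSAMPLES in the model, and its C-Statistic would be compared with the final model's, for example with the DeLong test.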

In addition, we compared our 3-sample FIT final model (COLONOFIT score) with the 1-sample FIT, age and sex test (FAST) score developed by Cubiella et al. [17]. In study phase 1, the C-Statistic for both CRC and ACN was significantly higher with COLONOFIT than with the FAST score (CRC, 0.93 ± 0.009 vs. 0.90 ± 0.01, p = 0.04; ACN, 0.86 ± 0.02 vs. 0.82 ± 0.02, p = 0.0007) (Fig. 2). In study phase 2, the C-Statistic was also higher with COLONOFIT than with the FAST score, although the difference was statistically significant only for ACN (CRC, 0.86 ± 0.025 vs. 0.83 ± 0.03, p = 0.18; ACN, 0.79 ± 0.02 vs. 0.75 ± 0.02, p = 0.0034). The differences between the two scores persisted after excluding the patients who did not meet NICE criteria and had a negative FIT (data not shown).

Fig. 2

Comparison of the C-Statistic of the present 3-sample FIT model (COLONOFIT score) with the 1-sample FIT, age and sex test (FAST) score [17] for ACN diagnosis in the derivation and validation phases

Reliability of the model in the validation cohort

As mentioned above, the risk score was more accurate for ACN and CRC detection in the derivation cohort than in the validation cohort, although its performance remained good. The Hosmer-Lemeshow p-value was 0.86, indicating no evidence of poor calibration.

Patients in the validation cohort were categorized into risk subgroups according to the presence of ACN and the value of the final risk score using the 'AddFor' algorithm (see Methods) (Table 5). The probability (prevalence) of ACN was 66% for a risk score > 20 points and 10% for a risk score ≤ 10 points. For CRC alone, these probabilities were 32% and 1%, respectively.

Table 5 Probability of an individual classified based on its risk score (after applying the score derived from Phase 1 to the validation cohort, Phase 2). The cut-off points were selected using the ‘AddFor’ algorithm (see methods)
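A short sketch of how observed prevalence per risk band can be tabulated from (score, outcome) pairs. Only the ≤ 10 and > 20 thresholds come from the text; collapsing everything in between into a single "> 10 to 20" band is a simplifying assumption (Table 5 may use more cut points), and the records below are made up for illustration.

```python
from collections import defaultdict


def risk_band(score: float) -> str:
    """Illustrative bands; only the 10- and 20-point thresholds come from the text."""
    if score <= 10:
        return "<=10"
    if score <= 20:
        return ">10-20"
    return ">20"


def observed_prevalence(records):
    """records: iterable of (score, has_outcome). Returns band -> (n, events, proportion)."""
    counts = defaultdict(lambda: [0, 0])
    for score, has_outcome in records:
        band = counts[risk_band(score)]
        band[0] += 1
        band[1] += int(has_outcome)
    return {b: (n, e, e / n) for b, (n, e) in counts.items()}


# Made-up records for illustration only.
toy = [(5, False), (8, True), (15, False), (22, True), (24, True), (21, False)]
print(observed_prevalence(toy))
```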

A risk score > 10 points allowed us to diagnose 96% of CRCs and 72% of AAs while prioritizing only 50% of colonoscopies (Table 6). No patient with a risk score < 4 points had CRC, but with this cut-off 95% of colonoscopies would need to be prioritized. Higher cut-offs would prioritize fewer colonoscopies at the cost of a higher percentage of missed CRCs.

Table 6 Accuracy of a positive risk score (> 10 points) to diagnose both ACN and CRC in the validation cohort (Phase II) and in the combined derivation and validation cohorts (Phase I + Phase II). Frequency of colonoscopy prioritization with the associated number of missed CRC or AA cases for a score > 10

Discussion

We performed a multivariable prediction modelling study to assess the pre-test probability of ACN in symptomatic patients with an indication for fast-track colonoscopy. This approach has been considered the most efficient method of capturing the effects of clinical judgement [33]. The derived and validated risk score (COLONOFIT score) may be useful to prioritize colonoscopies in patients fulfilling the criteria for a fast-track examination. We considered the percentage of missed CRC cases and the percentage of patients to be prioritized more relevant than the sensitivity and specificity for ACN. In fact, a risk score > 10 points, which would mean prioritizing 50% of eligible patients, allowed the overall diagnosis of 98% of CRCs (2% missed) and 77% of AAs. It has been shown that lowering the FIT positivity threshold does not increase the detection rate of ACN, and thus it is assumed that not all CRCs can be detected by FIT [34, 35].

A 3-sample FIT regimen was used to increase the CRC detection rate, and the cut-off for considering a sample positive was selected on the basis of the diagnostic odds ratio (DOR). This cut-off was lower than that used in the screening program for average-risk asymptomatic individuals in our geographical area. The logistic regression analysis showed that, in addition to age, a maximum f-Hb level > 11 μg Hb/g faeces and the number of positive samples (> 4 μg Hb/g faeces) were the variables with the highest ORs for predicting ACN. Notably, having three positive samples was associated with ACN with the highest OR and C-Statistic. In contrast, no clinical symptom or presentation was independently associated with the risk of ACN after adjusting for age and FIT variables; the low predictive value of clinical symptoms has been reported previously [8, 10, 33]. In addition to age and FIT variables, smoking increased the risk of ACN, whereas a previous colonoscopy (in the last 5 years) decreased it.
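The DOR-based cut-off selection mentioned above can be illustrated as follows: for each candidate threshold, classify a result as positive when it exceeds the threshold, build the 2 × 2 table against ACN, and keep the threshold with the largest DOR = (TP × TN) / (FP × FN). This is a sketch under stated assumptions: the function names and toy data are hypothetical, and the 0.5 continuity correction for empty cells is our choice, since the text does not say how zero cells were handled.

```python
from collections.abc import Sequence


def diagnostic_odds_ratio(tp: float, fp: float, fn: float, tn: float, correction: float = 0.5) -> float:
    """DOR = (TP/FN) / (FP/TN); adds `correction` to every cell if any cell is zero (assumption)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


def best_cutoff(values: Sequence[float], has_acn: Sequence[bool],
                candidates: Sequence[float]) -> tuple[float, float]:
    """Return (cut-off, DOR) maximising the DOR, with 'positive' meaning value > cut-off."""
    if not candidates:
        raise ValueError("no candidate cut-offs")
    best = None
    for c in candidates:
        pairs = list(zip(values, has_acn))
        tp = sum(1 for v, y in pairs if v > c and y)
        fp = sum(1 for v, y in pairs if v > c and not y)
        fn = sum(1 for v, y in pairs if v <= c and y)
        tn = sum(1 for v, y in pairs if v <= c and not y)
        dor = diagnostic_odds_ratio(tp, fp, fn, tn)
        if best is None or dor > best[1]:
            best = (c, dor)
    return best


# Toy data for illustration only.
print(best_cutoff([1, 2, 3, 10, 20, 30], [False, False, False, True, True, True], [0, 5, 25]))  # (5, 49.0)
```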

Some previous studies have proposed FIT combined with other clinical variables, or even a 1-sample FIT plus age and sex (FAST score), as a predictive tool for CRC in symptomatic patients, showing greater accuracy than symptom-based referral criteria [11, 17]. NICE referral criteria for suspected CRC have a sensitivity of 68.2% and a specificity of 50.2% for CRC detection and, in fact, cannot rule out CRC [11]. In the present study, a COLONOFIT score > 10 points had a sensitivity of 98% and a specificity of 53% for CRC, with a negative predictive value (NPV) of 99.7% and a negative likelihood ratio (LR−) of 0.03. Although the specificity was not very high, and low specificity is associated with unnecessary referrals, it should be emphasized that a score > 10 allowed 50% of patients fulfilling NICE referral criteria to be prioritized, detecting 98% of CRCs and avoiding 50% of fast-track colonoscopies. Furthermore, 70% of patients with CRC had a score > 20 points, a threshold associated with a specificity of 90%. The present comparison shows that the COLONOFIT score correctly classified 3–4% more patients than the FAST score in both the derivation and validation cohorts. Further studies directly comparing both scores are needed to assess whether this 3–4% gain in classification could be offset by lower adherence (submitting 3 vs. 1 FIT samples).
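The accuracy measures quoted above all derive from a single 2 × 2 table of score result against colonoscopy outcome. The sketch below shows the standard formulas; the function name is hypothetical and the counts in the example are made up, not taken from Table 6.

```python
def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 table (positive = score above cut-off)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,  # LR-: how much a negative result lowers the odds
    }


# Hypothetical counts for illustration only.
m = accuracy_from_2x2(tp=90, fp=200, fn=10, tn=700)
print({k: round(v, 3) for k, v in m.items()})
```

Because the NPV also depends on prevalence, the same sensitivity and specificity give a higher NPV in a population where the target condition is rarer.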

Selecting patients attending primary care may prevent selection bias towards patients with a higher prior probability of CRC [36]. The present study included mainly patients referred by primary care for a fast-track colonoscopy; in any case, referral from primary or secondary care was included in the analysis and showed no significant association with ACN. Thus, the derived risk score may guide primary care physicians in their referral decisions for fast-track colonoscopy. The COLONOFIT score allows patients to be risk-stratified in clinical practice to inform decision-making. Patients with a score > 20 (15% of the validation cohort) had a 66% risk of ACN (32% CRC, 34% AA) and should be referred for a 2WW colonoscopy. In contrast, those with a score ≤ 10 (50% of the validation cohort) had a 10% risk of ACN (1% CRC, 9% AA) and should be referred either to gastroenterology or another appropriate clinic in secondary care, or for a colonoscopy via the conventional, slower referral route.

After developing the prediction model in the derivation cohort, we evaluated its performance in a different cohort (validation cohort), collected with the same protocol, outcome definitions, and measurements, but sampled from a later period. Such external validation means that, for each individual in the new data set, outcome predictions were made using the derived predictive score [37, 38]. The performance of a predictive model is typically worse in samples independent of the one used to develop it. Indeed, in phase 1 of the study all CRC patients and 82% of AA had a score > 10; nevertheless, the performance characteristics in the validation cohort remained good.

The prevalence of CRC in symptomatic patients in the present study was nearly 8% and that of ACN around 24%; these figures are much higher than those observed in CRC screening populations (where ACN prevalence is typically < 10%) and similar to those of previous studies [11, 17]. This reflects the need for different diagnostic approaches in the two settings, and lower FIT cut-offs have accordingly been suggested for symptomatic patients [7]. The results of the present study suggest that using a 3-sample FIT (on three different days) is an additional diagnostic strategy that increases the prior probability of ACN. The 3-sample strategy for faecal occult blood testing (FOBT) was traditionally used with guaiac-based tests and was abandoned with the introduction of the more sensitive FITs; moreover, a meta-analysis showed that in average-risk asymptomatic individuals, increasing the number of FIT samples did not affect the pooled performance characteristics of FITs for CRC [18]. It was concluded that a 1-sample FIT regimen for CRC detection might ultimately be desirable, given the importance of optimizing overall adherence over repeated rounds of biennial testing in programmatic screening. However, several studies that directly evaluated the effect of FIT sample number on diagnostic accuracy, in average-risk asymptomatic participants [19,20,21] and in symptomatic patients [22, 23], suggested that using either 2 or 3 samples provided the best discrimination for CRC. Taken together, these data and the results of the present study suggest that a 3-sample FIT may increase the detection rate of ACN in patients with risk symptoms while keeping the number needed to scope acceptable.

Not all studies of statistical risk models to predict CRC in people with symptoms have included AA in addition to CRC. However, it is widely accepted that CRC arises through the adenoma-carcinoma sequence, so identifying patients with high-risk adenomas has the potential to reduce the future incidence of invasive CRC and prevent mortality. A systematic review performed in 2016 included only two models assessing CRC plus AA [13], both of which reported only limited performance data and neither of which has been externally validated [39, 40]. We think the model should prioritize identifying both prevalent CRC and patients at high risk of developing CRC in the future; in this sense, the COLONOFIT score prioritizes symptomatic patients for the detection of both CRC and AA.

This study has specific strengths that deserve to be highlighted. All included patients underwent colonoscopy (even those with a negative FIT); the best FIT cut-off was selected from the data using the DOR value that maximizes the probability of ACN; a 3-sample FIT regimen was evaluated to assess whether the number of positive samples influenced the probability of ACN; a structured, detailed face-to-face survey was prospectively performed by a trained person; external validation of the score was performed in a new cohort recruited later in time; and the derived risk score is easy to use, with all its components readily available to general practitioners. The study also has some limitations. First, not all included patients were primary care referrals; however, as noted earlier, referral origin showed no significant association with ACN, and all physicians used the same criteria to refer patients for fast-track colonoscopy, which were reviewed by the case manager before inclusion. Second, the included patients fulfilled the criteria for fast-track colonoscopy, so the predictive score cannot be generalized to patients without these criteria without further studies. Finally, data on iron status were often missing (because patients were receiving iron supplements), with no significant difference between the two study phases, which may have precluded finding a significant association.

Conclusion

A risk-scoring system to prioritize fast-track colonoscopies was derived and validated and was shown to be efficient, simple, and robust. Further external validation studies are needed to support the widespread recommendation of the model in clinical practice.