The majority of colorectal cancer cases are still diagnosed in a clinical setting, even in countries offering screening [1]. Early detection of colorectal cancer reduces both the mortality [2-7] and the incidence [6, 8] of the disease. The reduction in incidence is thought to be due to removal at colonoscopy of adenomas, which are recognised as non-obligate precursor lesions to cancer [9]. Adenomas are commonly found in adults over 50 years undergoing colonoscopy, and the majority of these will not develop into cancer. The malignant potential of an adenoma depends on its size and histology, with larger adenomas and those with more than 25% villous architecture or dysplasia carrying a higher risk. Church has estimated that 16% of adenomas measuring 6-9 mm and 4% of those measuring 1-5 mm fit into this higher risk category based on their histology [10].

Colonoscopy is an invasive and costly procedure: serious complications, including bleeding and bowel perforation, may occur in 0.1-0.6% of procedures [11-13]. Further, colonoscopy is a scarce resource. The Centers for Disease Control and Prevention (CDC) in the United States has shown that even if half the colonoscopy capacity were dedicated to screening, the capacity to undertake such screening would be limited [14]. It would therefore be useful to be able to select for colonoscopy those people who are at higher risk of either cancer or premalignant adenomas.

We have previously shown that symptoms are not good predictors of the presence of colorectal cancer, but that prediction improved using a model which also included demographic details and medical history such as previous colonoscopy, bowel disease, smoking history, and use of aspirin or non-steroidal anti-inflammatory medications [15]. Advanced adenomas were excluded from that analysis. The aim of this paper is to report the results of a more comprehensive model that discriminates between colorectal cancer, advanced adenoma, large adenomas and small adenomas (versus none of these abnormalities).


The study design and inclusion criteria have been described previously [15]. This is a cross-sectional study in which participating patients (> 18 years, recruited from participating gastroenterologists and colorectal surgeons following scheduling for colonoscopy for any indication) completed a questionnaire, previously shown to be reliable and reproducible [16], eliciting demographic details, family history, medical history (previous colonoscopy and bowel disease, polyps, aspirin and NSAID use within the previous 2 years, smoking history), and bowel symptoms. The symptoms included rectal bleeding, change in bowel habit, passage of rectal mucus, abdominal or anal pain, sensation of abdominal or anal lump, incomplete evacuation, urgency, history of anaemia, weight loss, and fatigue. We also elicited information about characteristics of each symptom, such as duration, frequency and severity, and whether or not the patient had sought consultation with a doctor for that symptom.

Findings at colonoscopy were obtained from endoscopic records. All lesions found were confirmed by histological examination. We classified findings into five categories: cancer; advanced adenoma (adenoma with significant (> 25%) villous features, or high grade dysplasia, including carcinoma-in-situ, or size 10 mm or larger) [17, 18]; adenomas 6 to 9 mm in size; adenomas ≤ 5 mm; and no adenoma found. Where patients had more than one lesion, we classified them by their most significant lesion, according to the hierarchy listed above. If the adenoma size was not recorded by the endoscopist, we used the size recorded at histological examination; if neither of these was available, we used the size description noted by the endoscopist (based on analysis of adenomas for which we had both the description and size reported, we categorised descriptions of "diminutive", "tiny", "very small", or "minor" as ≤ 5 mm; "moderate" or "small" as 6-9 mm; and "large", "very large", or "huge" as ≥ 10 mm). We classified adenomas for which no size was recorded as ≤ 5 mm (n = 32).
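The classification rules above can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the argument names and the handling of non-integer sizes (anything above 5 mm and below 10 mm is placed in the 6-9 mm band) are assumptions made for illustration only.

```python
from typing import List, Optional

# Outcome categories in decreasing order of significance (hierarchy from the text).
HIERARCHY = ["cancer", "advanced adenoma", "adenoma 6-9 mm",
             "adenoma <=5 mm", "no adenoma"]

# Endoscopist size descriptions mapped to size bands, as listed in the text.
DESCRIPTION_TO_BAND = {
    "diminutive": "<=5", "tiny": "<=5", "very small": "<=5", "minor": "<=5",
    "moderate": "6-9", "small": "6-9",
    "large": ">=10", "very large": ">=10", "huge": ">=10",
}


def size_band(endoscopy_mm: Optional[float],
              histology_mm: Optional[float],
              description: Optional[str]) -> str:
    """Return '<=5', '6-9' or '>=10' using the fallback order in the text:
    endoscopist size, then histology size, then the size description."""
    size = endoscopy_mm if endoscopy_mm is not None else histology_mm
    if size is not None:
        if size >= 10:
            return ">=10"
        # Assumption: sizes strictly between 5 and 6 mm go to the 6-9 mm band.
        return "<=5" if size <= 5 else "6-9"
    if description is not None:
        band = DESCRIPTION_TO_BAND.get(description.strip().lower())
        if band is not None:
            return band
    return "<=5"  # no size recorded: classified as <= 5 mm, as in the text


def classify_adenoma(band: str, villous_fraction: float,
                     high_grade_dysplasia: bool) -> str:
    """Advanced if > 25% villous, high grade dysplasia, or 10 mm or larger."""
    if villous_fraction > 0.25 or high_grade_dysplasia or band == ">=10":
        return "advanced adenoma"
    return "adenoma 6-9 mm" if band == "6-9" else "adenoma <=5 mm"


def most_significant(findings: List[str]) -> str:
    """Classify a patient by the most significant lesion found."""
    if not findings:
        return "no adenoma"
    return min(findings, key=HIERARCHY.index)
```

For example, a patient with one ≤ 5 mm adenoma and one advanced adenoma would be assigned to the advanced adenoma group.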

Patients were recruited between April 2004 and December 2006. We included only patients who completed the questionnaire 6 months or less before their colonoscopy was done, and whose colon examination was complete (visualisation of the caecal pole at colonoscopy or if not visualised, by follow up bowel investigations).

Ethics committee approval

The study received approval from the Ethics Committees of the University of Sydney, Central Sydney (CRGH and Central Zones), Northern Sydney and Central Coast, and Western Sydney Area Health Services and the Sydney Adventist Hospital. All patients provided written consent.

Statistical Analysis

Patients were grouped according to the most significant grade of abnormality found, giving five different outcome groups: cancer, advanced adenomas, adenomas of size 6 to 9 mm, adenomas of size ≤ 5 mm, and no cancer or any adenomas. Descriptive analyses were undertaken to assess the prevalence of cancer, advanced adenomas and smaller adenomas separately by demographic, medical history and symptom variables using the sum of no abnormality and the outcome of interest only as the denominator. Odds ratios were calculated comparing the odds of having cancer, advanced adenoma or adenomas 6-9 mm or ≤ 5 mm (separately for each of these outcomes) with no abnormality univariately for each of the symptom, demographic and other health information subgroups. Multinomial logistic regression was then used to simultaneously assess which of these risk factors were associated with the outcomes of cancer, advanced adenomas and adenomas sized 6 - 9 mm, and ≤ 5 mm. This method fits simultaneous logistic regression models, each with its own intercept and coefficients, to compare each of the four outcomes listed above to the referent category (no cancer, advanced adenoma or adenoma of any size). Backwards elimination of risk factors was used to simplify the model using likelihood ratio tests with p < 0.05 as the criterion for statistical significance. Interactions were considered for elimination first. As numerous comparisons were made, results for interactions were not included if their significance was close to 0.05 and there was no biologically plausible basis for the interaction. Because the estimated coefficients for the explanatory variables vary by outcome, odds ratios for the final model were calculated for each risk factor (compared to not having the risk factor) for each outcome of cancer, advanced adenoma and adenoma of any size using the absence of these as the reference group.
This analysis for the cancer outcome differs slightly from that reported previously [15] as the comparison group in that paper included adenomas less than 10 mm in the referent group, whereas this analysis uses no abnormality as the referent group.
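To make the structure of the multinomial model concrete, the sketch below shows how predicted probabilities for the referent category and the four outcomes follow from outcome-specific intercepts and coefficients, and how a coefficient converts to an odds ratio. It is an illustration of the standard baseline-category logit form, not the SAS code used in the study; the function names and any coefficient values supplied to them are hypothetical.

```python
import math
from typing import Dict

# The four non-referent outcomes; the referent is "no abnormality".
OUTCOMES = ["adenoma <=5 mm", "adenoma 6-9 mm", "advanced adenoma", "cancer"]


def predicted_probabilities(x: Dict[str, float],
                            intercepts: Dict[str, float],
                            coefficients: Dict[str, Dict[str, float]]
                            ) -> Dict[str, float]:
    """Baseline-category logit model for one patient.

    x            : risk factor values, e.g. {"male": 1}
    intercepts   : one intercept per outcome (hypothetical values)
    coefficients : one set of coefficients per outcome (hypothetical values)
    """
    # Linear predictor for each outcome versus the referent category.
    linear = {
        k: intercepts[k] + sum(coefficients[k].get(name, 0.0) * value
                               for name, value in x.items())
        for k in intercepts
    }
    denominator = 1.0 + sum(math.exp(eta) for eta in linear.values())
    probabilities = {"no abnormality": 1.0 / denominator}
    for k, eta in linear.items():
        probabilities[k] = math.exp(eta) / denominator
    return probabilities


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a risk factor, for one outcome versus the referent."""
    return math.exp(coefficient)
```

Because each outcome has its own coefficients, the same risk factor yields a separate odds ratio for each outcome, which is why odds ratios are reported per outcome.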

A sequence of additional multinomial logistic regression models was fitted to assess the incremental value of variables found to be statistically significant in the final model. The sequence was: (1) age only; (2) model 1 + other demographic variables; (3) model 2 + medical history variables; (4, the final model) model 3 + symptoms. For each of the four outcomes, estimates of sensitivity and specificity across all values of predicted probability, with no abnormality (i.e. no cancer, advanced adenoma or adenomas) as the referent group common to all, were used to obtain a ROC curve. The area under each curve was used to assess the ability of the model to discriminate between patients with no abnormality and patients with (i) adenomas ≤ 5 mm; (ii) adenomas 6 - 9 mm; (iii) advanced adenomas and (iv) cancer.
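The construction of a ROC curve for one outcome against the common referent group can be sketched as follows. This is an illustrative implementation under simple assumptions (empirical ROC curve, trapezoidal area, ties between a case and a referent counted as one half); the function names are hypothetical and the study itself used SAS.

```python
from typing import List, Tuple


def roc_points(case_scores: List[float],
               referent_scores: List[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) at every observed predicted probability."""
    thresholds = sorted(set(case_scores) | set(referent_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(s >= t for s in case_scores) / len(case_scores)
        false_positive = sum(s >= t for s in referent_scores) / len(referent_scores)
        points.append((false_positive, sensitivity))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the empirical ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_by_pairs(case_scores: List[float],
                 referent_scores: List[float]) -> float:
    """Equivalent pairwise form: the proportion of (case, referent) pairs in
    which the case has the higher predicted probability (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for r in referent_scores:
            if c > r:
                wins += 1.0
            elif c == r:
                wins += 0.5
    return wins / (len(case_scores) * len(referent_scores))
```

Here `case_scores` would hold the predicted probabilities of, say, cancer among patients with cancer, and `referent_scores` the predicted probabilities of cancer among patients with no abnormality; patients in the other outcome groups are left out of that curve.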

The percent of abnormalities that would have been detected was calculated for different possible screening criteria based on age and previous colonoscopy. To allow comparison with the model, the predicted probabilities of cancer were sorted from highest to lowest and a cut-point was applied to include the same number of patients (above the cut-point) who would have been screened based on the age and previous colonoscopy criteria. Detection rates for cancer and adenomas were then compared. Detection rates for the 40% of patients with the highest predicted probability of cancer from the model were also computed. All analyses were done in SAS version 9.2.
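The comparison between a simple screening criterion and the model, matched on the number of patients selected, can be sketched as below. The function and argument names are hypothetical, and the tie-breaking rule (patients with equal predicted probability are taken in their original order) is an assumption; the text does not say how ties were handled.

```python
from typing import List, Tuple


def detection_rate(selected: List[bool], outcomes: List[str], target: str) -> float:
    """Percent of patients with the target outcome who are in the selected set."""
    total = sum(1 for o in outcomes if o == target)
    if total == 0:
        raise ValueError("no patients with the target outcome")
    found = sum(1 for s, o in zip(selected, outcomes) if s and o == target)
    return 100.0 * found / total


def select_top_n(predicted: List[float], n: int) -> List[bool]:
    """Flag the n patients with the highest predicted probability of cancer."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    chosen = set(order[:n])
    return [i in chosen for i in range(len(predicted))]


def compare_rule_with_model(rule_selected: List[bool],
                            predicted_cancer: List[float],
                            outcomes: List[str],
                            target: str = "cancer") -> Tuple[float, float]:
    """Detection rates for a simple rule and for the model, with the model
    cut-point chosen so that both select the same number of patients."""
    n_selected = sum(rule_selected)
    model_selected = select_top_n(predicted_cancer, n_selected)
    return (detection_rate(rule_selected, outcomes, target),
            detection_rate(model_selected, outcomes, target))
```

Calling `compare_rule_with_model` once per outcome (with `target` set to each adenoma category in turn, but always ranking on the predicted probability of cancer) mirrors the comparison described above.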


Data were available from a total of 8,204 patients. 47% were male. The age range was 18 to 95 years (median age 58 years), with 27% aged less than 50 years, 26% 50-59 years, 25% 60-69 years and 22% over 70 years of age. All patients underwent colonoscopy, for which there was a 98% caecal intubation rate. The overall cancer prevalence was 1.9% (159 patients). Risk of cancer and adenomas was dependent on age (Figure 1). The prevalence of cancer and of each type of adenoma was less than 3% in people under 50 years of age and rose with age, reaching over 10% for advanced adenoma in people 70 years or older. The odds ratios, which measure the increase in prevalence with age relative to people under 50, were largest for cancer and similar for all types of adenomas.

Figure 1
Colorectal cancer and adenomas: prevalence and risk of different age groups. The prevalence (with 95% confidence interval) of cancer, advanced adenomas and adenomas (≤5 mm and 6 - 9 mm) for each age group (less than 50 years, 50 - 59 years, 60 - 69 years and more than 70 years) and the odds ratio (with 95% confidence interval) of having cancer, advanced adenomas and adenomas for the age groups 50 - 59 years, 60 - 69 years and more than 70 years compared to those less than 50 years. 6784 patients had no cancer, advanced adenomas or adenomas, 507 had adenomas ≤ 5 mm, 286 had adenomas 6 - 9 mm, 468 had advanced adenomas and 159 had cancer.

59% had undergone colonoscopy in the previous 10 years. Cancer rates were approximately 5 times greater in patients who had not had a previous colonoscopy in each age group, whereas advanced adenoma rates were about twice as high in people who had not had a colonoscopy in each age group (Table 1). For smaller adenomas (less than 10 mm), there was no clear pattern related to prior colonoscopy.

Table 1 Rates (per 1,000) of cancers and adenomas by age and colonoscopy in previous 10 years

The relative prevalence of abnormalities increased with age in a similar way for those who had or had not had a previous colonoscopy. For instance, in people who had had a previous colonoscopy, the cancer rate in people aged 70 or more was 17 times higher than in people aged less than 50 (from Table 1, 17/1000 divided by 1/1000). The corresponding estimate in people who had not had a colonoscopy in the previous 10 years was 12.3 (from Table 1, 74/1000 divided by 6/1000). The corresponding estimates for advanced adenoma were 4.1 vs 3.8; for adenomas 6-9 mm, 2.9 vs 2.9; and for adenomas ≤ 5 mm, 2.5 vs 4.7, respectively. There was no statistical evidence (p = 0.51) that the effect of age was modified by previous colonoscopy, so the observed variability is consistent with chance. Furthermore, the predicted rates per 1000 based on a multinomial model (see additional file 1) that included age and previous colonoscopy as predictor variables are very similar to the observed rates in Table 1, indicating that even with only these two variables the model fits well.
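The two cancer rate ratios quoted from Table 1 can be reproduced with a one-line calculation; the helper name below is hypothetical and the sketch uses only the rates stated in the text.

```python
def rate_ratio(rate_oldest_per_1000: float, rate_youngest_per_1000: float) -> float:
    """Ratio of the rate in people aged 70 or more to the rate in people under 50."""
    if rate_youngest_per_1000 <= 0:
        raise ValueError("the comparison rate must be positive")
    return rate_oldest_per_1000 / rate_youngest_per_1000


# Cancer, previous colonoscopy: 17 per 1000 versus 1 per 1000.
with_previous = rate_ratio(17, 1)
# Cancer, no colonoscopy in the previous 10 years: 74 per 1000 versus 6 per 1000.
without_previous = rate_ratio(74, 6)

print(round(with_previous, 1), round(without_previous, 1))  # 17.0 12.3
```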

The fact that the model with the 2 major predictors, age and previous colonoscopy, fits the data well suggests that the appropriate way to assess the value of potential risk factors is to assess whether adding them to this multivariable model results in any improvement to the fit. Further model checks showed that the odds ratios for all predictors were very similar in the model that included all additional significant variables, when run separately for people with or without previous colonoscopy (additional file 1). Similarly, statistical tests for interaction indicated that neither previous colonoscopy nor age were effect modifiers for the other variables subsequently included in the model.

Multivariable risk identification

The results of the multivariable model to distinguish between each outcome and the absence of cancer or any adenoma (the referent category) are shown in Table 2. The effect of age followed a similar pattern to that in Figure 1. For cancer and all types of adenomas, male gender increased the risk by about a third. Colonoscopy in the last 10 years remained highly protective for cancer, but the effect became far less marked the smaller the adenoma. With previous colonoscopy in the model, a history of adenomas was not predictive of cancer, but was predictive of finding adenomas again. There was an exposure response relationship between amount of tobacco smoked and cancer. For adenomas, the gradient of the exposure-response was less steep, and became flatter the smaller the size of the adenoma. A self reported history of irritable bowel syndrome, and of NSAID or aspirin use was associated with reduced cancer risk but the effect on adenomas was weaker or absent. For symptoms, a history of passing mucus per rectum and rectal bleeding were associated with a higher risk of cancer, particularly if the symptom was of recent onset and occurred frequently. This association was not evident for people with adenomas, with the possible exception of bleeding with advanced adenomas.

Table 2 Multinomial model odds ratios for the included demographic/medical history and symptom variables for each of the four outcomes compared to patients with no adenomas, advanced adenomas or cancers

The areas under the ROC curves based on the predictive models for cancer and adenomas are shown in Figure 2 and Table 3. With only age in the model, the areas were 0.66 for cancer and between 0.60 and 0.62 for adenomas indicating moderate discrimination. As other sociodemographic variables, medical history and symptoms were added, the area under the curve for cancer achieved good discrimination (0.83), but the improvement for advanced adenomas and smaller adenomas was less marked (0.70 and 0.67 respectively).

Figure 2
ROC curves for the multinomial model showing the discrimination of the model for cancer, advanced adenomas and adenomas sized 6 - 9 mm and ≤ 5 mm; for each outcome the reference group is no cancer, advanced adenoma, or adenomas.

Table 3 Areas under the curve for multivariable prediction of cancer and adenomas

Using the multivariable model (which includes age, other sociodemographic variables, medical history and symptoms), the predicted probability of each outcome can be calculated for an individual. The value of the model can be demonstrated by comparing the model results with simple methods for predicting risk (Table 4). For example, if we had restricted colonoscopies to people 40 years and older, we would have avoided 10.5% of colonoscopies but still detected all of the cancers and over 97% of the adenomas. If we had used the model to avoid colonoscopy in the 10.5% at lowest cancer risk, we would have missed 1.3% of the cancers. So for low-risk patients, a simple age-based method does well. High-risk patients can be defined, for example, as people 60 years and older who have not had a colonoscopy in the past 10 years: 16% of the population are in this group, but it contains almost half of the cancers (49.1%) and 28% of the advanced adenomas. If we examine instead the 16% at highest risk from the model, the cancer detection rate increases to 64.8%, without any loss in adenoma detection. In fact, 85.5% of the cancers and 57.7% of the adenomas could be detected by performing colonoscopy only on the 40% identified by the model as being at highest risk.

Table 4 Outcome (percentage) that would have been detected using the model if colonoscopy restricted


Our findings demonstrate that a predictive model based on sociodemographic variables (age, gender and education level), pertinent medical history (previous colonoscopy, smoking, use of NSAID or aspirin, previous polyps, and IBS) and symptoms (rectal bleeding, rectal mucus, anaemia and fatigue), does well at predicting colorectal cancer and reasonably well at predicting advanced adenomas.

It is of interest to identify which variables are most strongly predictive of cancer and adenoma prevalence. Age is the dominant risk factor for cancer and for adenomas of all sizes. Having had a colonoscopy within the previous 10 years confers protection for cancers and advanced adenomas. Adding medical history and symptoms (rectal bleeding, mucus, anaemia and fatigue) to the model adds further modest improvement to cancer prediction, but negligible improvement to adenoma prediction.

Our finding that family history is not associated with an increase in prevalence of colorectal cancer may seem surprising. It is likely that this reflects the clinical setting of our cohort, with patients with a family history of colorectal cancer already having been screened and included in those having undergone colonoscopy previously. Other studies have also noted that in people with symptoms a positive family history does not increase the cancer prevalence [19, 20], and indeed, guidelines for referral of patients in place in Britain which aim to identify patients with higher risk symptoms, do not include assessment of family history [21].

The quality of our study relates to several factors including the size of our study with over 8,000 patients, the prospective nature of the data collection, the completeness of information on all patients, the requirements of complete examination of the entire colon, and pathological examination of all lesions encountered. Information about symptoms was also consistently collected using a validated questionnaire [16]. A further strength of our study is that it represents a heterogeneous population which reflects what occurs in clinical practice in the real world and allows exploration of what factors that make up that heterogeneity predict the probability of cancer or adenomas. A potential limitation of our study was that there was no standard reporting for colonoscopy. However, the reports from which data were extracted were those used in clinical practice; based on a caecal intubation rate of 98% we believe the procedures were of high quality.

Our model does well at predicting cancer prevalence, achieving an area under the ROC curve of 0.83 which is similar to that found in other studies, for example Selvachandran (0.86) [22]. Our models help to identify individuals who have a high probability of cancer amongst people referred to gastroenterologists and colorectal surgeons, thus helping to indicate the urgency for colonoscopy. At the low-risk end of the spectrum, prediction can be simplified to age: the probability of cancer or adenoma is very low in people under 40 and reduced still further if they have had a colonoscopy in the previous 10 years. For them, potential risks of colonoscopy may outweigh potential benefits. Consideration can be given to discussing benefits and harms of the procedure with patients to reach the best benefit-harm trade-off for each person, as has been done in other areas of health care [23].

In addition, risk information from the model can be useful at a policy level. Decision making about resource utilisation at a population level should take risk assessment into account to ensure that colonoscopy is prioritised to groups at higher risk of disease. At a general practice level, resources may, for example, be directed to ensure that those in higher risk groups are referred for colonoscopy, while at a specialist level resources should be targeted at those who have never had a colonoscopy rather than at inappropriate, frequent colonoscopy. At a population level, symptoms as warnings for cancer or adenomas should be de-emphasised. Our model is not strictly applicable to patients presenting to a general practice. However, it is not feasible to do a study in patients presenting to a general practitioner and obtain colonoscopies on all patients. Indeed, the major symptom prediction studies in patients have been done in referred populations [22, 24, 25]. Our cancer prevalence (1.9%) is considerably lower than in other similar studies, which report cancer prevalences of between 4% and 12% [22, 24-26], suggesting that our population is less strongly filtered and therefore more representative of general practice.

In addition, given that in general practice the probability of cancer may be even lower than that predicted in the referred population, it seems reasonable to use the information from the model to inform decisions in general practice, in particular to identify who has a very low probability of cancer or advanced adenoma. The model will be the most reliable basis for predicting cancer or advanced adenoma for most patient characteristics. This can be supplemented with selected information, for example the effect of family history, from sources in which it has been reliably estimated.

Another approach to identifying patients at higher risk of cancer or adenomas at colonoscopy is to use faecal occult blood tests (FOBTs) [25], as suggested by Rozen [27]. A recent review of FOBTs provided odds ratios for FOBT detection of cancer and advanced adenoma, which can be converted to areas under the ROC curve (AUC) and compared with our model [28]. The AUC values were 0.93 for cancer, 0.88 for advanced adenomas and 0.69 for all adenomas. Other AUC values obtained for adenomas in a clinically presenting population were 0.72 for advanced adenomas and 0.64 for all adenomas. Overall, these are similar to or slightly higher than those found in our study. These data might suggest that FOBT would be as effective as, or more effective than, our model as a triage tool for prioritising colonoscopy. However, FOBT requires additional cost and effort, whereas our model requires only easily and immediately obtainable sociodemographic and medical history information. Models that incorporate both this information and FOBT results should be developed and evaluated, as this may boost prediction still further.


Colorectal cancer is common and preventable. Our models may assist in identifying population subgroups at higher risk of disease, ensuring that colonoscopy is prioritised for those for whom it would be of most benefit. Age is the dominant risk factor in this model. Younger age and prior colonoscopy in the preceding 10 years predict a low probability of cancer or adenomas, and this should be appreciated by referrers, proceduralists, providers and health planners when aiming to target colonoscopy resources most effectively.

Authors' information

Barbara-Ann Adelstein, senior research fellow, Prince of Wales Clinical School, Faculty of Medicine, University of NSW, Sydney, Australia.

Petra Macaskill, associate professor, STEP, The Sydney School of Public Health, University of Sydney, Australia.

Robin M Turner, research fellow biostatistics, STEP, The Sydney School of Public Health, University of Sydney, Australia.

Peter H Katelaris, gastroenterologist, Department of Gastroenterology, Concord Hospital, University of Sydney, Australia.

Les Irwig, professor of epidemiology, STEP, The Sydney School of Public Health, University of Sydney, Australia.