Colorectal cancer (CRC) is the second leading cause of cancer deaths in the USA1 and in New Hampshire (NH), despite being a preventable disease. Treatment is significantly more effective when diagnosed in the early stages of disease, and survival is far worse for patients diagnosed with late-stage CRC. Furthermore, the costs of treating CRC are greater for late-stage disease, a discrepancy that has widened substantially in recent years.2 The cost of treating CRC is estimated to be $14 billion per year in the USA.3 Therefore, the ability to identify and target those patients and populations most likely to present with late-stage CRC offers an important opportunity to impact population health and cancer survival rates and to reduce overall healthcare costs.

Colonoscopy is by far the most commonly used CRC screening test in the USA4 and reduces incidence and mortality outcomes of CRC.5 Raising colonoscopy-screening rates within groups recognized as being at increased risk of late-stage CRC will help to realize improved outcomes for those high-risk individuals. Some prior studies support an association between late-stage CRC at diagnosis and patient characteristics, such as age, gender, family history of CRC, or lifestyle factors, including education, insurance, and socioeconomic status.6,7,8,9,10 Factors that increase the risk of a late-stage diagnosis among CRC patients differ from those associated with increased CRC incidence overall. For example, CRC incidence is equal among males and females; however, female patients are more frequently diagnosed at a late vs. early stage.6 This project aimed to identify characteristics and lifestyle factors associated with increased risk of late-stage vs. early-stage CRC diagnosis.


Since 2004, the New Hampshire Colonoscopy Registry (NHCR, initially funded by NCI R01CA131141) has collected data from colonoscopies in over 130,000 consenting individuals undergoing procedures at participating New Hampshire facilities. The NHCR includes patients receiving colonoscopy in NH, regardless of state of residence.11, 12 Consent rates are uniformly high (80%). Study (registry) protocols are approved by the Dartmouth Committee for Protection of Human Subjects and other relevant human subjects review boards. A Patient Information Form completed by the patient before colonoscopy collects patient characteristics and lifestyle factors. Procedure forms completed by endoscopists or nurses collect colonoscopy characteristics, including exam indication and detailed data on all findings.

Patients report race as White, American Indian/Alaskan Native, Asian, Black, Pacific Islander, or other race, and ethnicity as being of Hispanic/Spanish/Latino origin or not. Insurance coverage is recorded in the Colonoscopy Registry as HMO, Medicaid, Medicare, private, other, or none. Detailed history of CRC among first-degree relatives includes the specific relationship to the patient (e.g., mother) and age at diagnosis of CRC. Personal or family history of hereditary syndromes includes hereditary nonpolyposis colon cancer (HNPCC or Lynch syndrome) or familial adenomatous polyposis. Personal history of polyps and inflammatory bowel disease was assessed using data from both Patient Information and Procedure forms.

Patients reported the maximum education level attained as some high school, high school graduate/GED, some college, or college/post-college graduate.

The New Hampshire State Cancer Registry was established in 1986 and contains high-quality, population-based data on all cancers diagnosed among NH residents, including demographic data, date and mode of diagnosis, and stage. Cancer reports on NH residents diagnosed in other states are obtained through exchange agreements with neighboring state registries, including Maine, Vermont, Massachusetts, and more distant states such as Florida. The registry regularly achieves the highest standard (gold) certification for data quality from the North American Association of Central Cancer Registries. We also assessed the underlying distribution of all CRC patients in the state with regard to age, gender, and stage using the entire NH State Cancer Registry dataset for 2000–2014, which we accessed through the Dept. of Health and Human Services.

Registry Linkage

We used Fine-Grained Record Linkage (FRIL v.2.1.5) software to link the detailed questionnaire data collected by the NH Colonoscopy Registry with the CRC cases in the population-based NH State Cancer Registry. The probabilistic linkage process used Social Security Number, last name, date of birth, gender, and first name. We manually reviewed cases where the confidence level was below 90. After removing duplicates, the result was a dataset of 14,737 cases, which included all cancers. We selected the CRC cases from this pool, as described below.
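The idea of scoring candidate record pairs and routing low-confidence pairs to manual review can be illustrated with a minimal sketch. This is not FRIL's algorithm or configuration, which the text does not report: the field weights, the use of `difflib` string similarity for names, and the 0–100 confidence scale are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (summing to 1.0); the study's actual FRIL
# settings are not reported, so these values are illustrative only.
WEIGHTS = {
    "ssn": 0.35,
    "last_name": 0.20,
    "dob": 0.20,
    "first_name": 0.15,
    "gender": 0.10,
}
FUZZY_FIELDS = {"last_name", "first_name"}  # compared approximately
REVIEW_THRESHOLD = 90.0  # assumed to be on a 0-100 scale


def field_similarity(field, a, b):
    """Return a similarity in [0, 1] for one field of two records."""
    if not a or not b:
        return 0.0  # a missing value contributes no evidence
    a, b = a.strip().lower(), b.strip().lower()
    if field in FUZZY_FIELDS:
        # Tolerates small spelling differences in names.
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def match_confidence(rec_a, rec_b):
    """Weighted agreement across the linkage fields, scaled to 0-100."""
    score = sum(
        weight * field_similarity(field, rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )
    return 100.0 * score


def needs_manual_review(rec_a, rec_b):
    """Flag pairs whose confidence falls below the review threshold."""
    return match_confidence(rec_a, rec_b) < REVIEW_THRESHOLD
```

Under these assumed weights, a pair that agrees on every field except date of birth scores 80 and would be flagged for manual review.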

The linkage comprehensively identified any CRC diagnoses among NH residents after the NH Colonoscopy Registry began in 2004, including those both prior and subsequent to a patient’s colonoscopy visits. If more than one Patient Information form was completed, we prioritized the colonoscopy questionnaire completed within 3 months of the CRC diagnosis for analysis; if no questionnaire was completed in that period, one completed prior to the CRC diagnosis was used in the analysis. Analysis of patient characteristics that could vary through time was restricted to questionnaires completed prior to CRC diagnosis. CRCs were divided according to the American Joint Committee on Cancer (AJCC) TNM system 6th edition criteria into early-stage (≤ stage IIA) vs. late-stage (≥ stage IIB) at diagnosis as the primary outcome. This breakpoint was chosen because it represents a substantial shift to reduced survival (stage I 96%, IIA 91%, IIB 80% at 30 months; stage I 93%, stage IIA 85%, stage IIB 72% at 60 months); thus, diagnosis before stage IIB could improve patient outcomes.13
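The early- vs. late-stage dichotomy described above can be written as a small lookup. The sketch below assumes stage is stored as an AJCC 6th edition stage-group string; treating unrecognized values as unknown stage (returned as `None`) is an assumption that mirrors the exclusion of cases without known stage, and the function and constant names are hypothetical.

```python
# AJCC 6th edition colorectal stage groups, ordered by increasing severity.
AJCC6_STAGES = ["I", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IV"]
LATE_CUTOFF = AJCC6_STAGES.index("IIB")  # stage IIB and above count as late


def classify_stage(stage):
    """Return 'early' (<= IIA), 'late' (>= IIB), or None if stage is unknown."""
    if stage is None:
        return None
    key = stage.strip().upper()
    if key not in AJCC6_STAGES:
        return None  # unrecognized or missing stage: treated as unknown
    return "late" if AJCC6_STAGES.index(key) >= LATE_CUTOFF else "early"
```

For example, `classify_stage("IIA")` returns `"early"` and `classify_stage("IIB")` returns `"late"`.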

We performed statistical analysis by logistic regression using late-stage vs. early-stage CRC at diagnosis as the outcome. We assessed hypothesized late-stage CRC risk factors as univariate predictors using chi-square tests. We then constructed a multivariable model incorporating factors meeting a p value cutoff of < 0.1. For example, we modeled age at CRC diagnosis adjusted for family history of CRC, using late-stage diagnosis as the outcome. p values < 0.05 were considered statistically significant.
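The univariate screening step can be sketched as follows, assuming each candidate factor is summarized as a 2 × 2 table of exposure by stage. The sketch uses the Pearson chi-square statistic without continuity correction; whether the original analysis applied a correction is not stated, and the function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    Table layout: [[a, b], [c, d]], e.g. rows = exposed / unexposed and
    columns = late-stage / early-stage. Returns (statistic, p_value).
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def screen_factors(tables, cutoff=0.1):
    """Return the names of factors whose univariate p value is below cutoff."""
    return [
        name for name, cells in tables.items()
        if chi_square_2x2(*cells)[1] < cutoff
    ]
```

Factors returned by `screen_factors` would then enter the multivariable logistic regression model; fitting that model requires a statistics package and is not shown here.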


The NH State Cancer Registry contains 14,043 CRC cases diagnosed among NH residents from 1995 to 2015. We identified 1953 NH Colonoscopy Registry participants who were diagnosed with CRC at any time. We restricted our analysis to the 1196 cases with known stage of disease among the 1242 participants who developed CRC after the Colonoscopy Registry started (diagnosis date range 2004–2015). Overall, 64% of CRCs received an early-stage diagnosis (n = 768), and 36% received a late-stage diagnosis (n = 428).

Figure 1 compares the characteristics of the NH Colonoscopy Registry participants with CRC to the CRC cases reported to the NH State Cancer Registry. The age distribution of the two cohorts is similar, although a smaller proportion of very elderly patients participate in colonoscopy (Fig. 1a), which is expected as regular screening for patients with prior negative findings ends at age 75.14 The gender distributions also matched (p = 0.23; Fig. 1b). As expected, a higher proportion of the Colonoscopy Registry participants are diagnosed with early-stage CRC compared to the NH State Cancer Registry (Fig. 1c).

Figure 1

Characteristics of NH Colonoscopy Registry participants compared to the underlying NH State Cancer Registry CRC population. Light gray represents Colonoscopy Registry participants; dark gray shows CRC cases in the NH State Cancer Registry. (a) The age distribution of the cohorts is shown in 5-year intervals. (b) The gender distribution is not different between the two cohorts (chi-square p value 0.23). (c) The stage distribution shows a higher proportion of early-stage CRC diagnoses in the Colonoscopy Registry cohort (64%) compared to the NH State Cancer Registry (39%) (chi-square p value 2.2 × 10^−16).

CRCs diagnosed at a young age (< 50) were more likely to be late stage (47%), compared to CRCs diagnosed after age 50 (34% late stage) (OR 1.81, 95% CI 1.27–2.58, p = 0.00098) (Table 1). When the analysis was restricted to patients with a diagnostic colonoscopy indication, rather than screening or surveillance, the rates became more similar, though the late-stage CRC rate remained 6% higher in diagnostic patients age < 50 (p = 0.45) (Table 1). Gender and race were not significantly associated with stage at diagnosis. Family history of a first-degree relative with CRC was associated with modestly reduced risk of a late-stage CRC diagnosis (30% with a first-degree relative with CRC, compared to 37% without, OR 0.73, 95% CI 0.52–1.00, p = 0.055). Reported family history of a hereditary genetic syndrome was not associated with diagnosis of CRC at a late stage (p = 0.85). Age < 50 remained a significant risk factor for late-stage CRC diagnosis (OR 1.92, p = 0.00069) in a multivariable model containing the two factors with univariate p values < 0.1: age at CRC diagnosis and family history of a first-degree relative (Table 1).
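To see how the reported proportions relate to an odds ratio, the following sketch converts two late-stage proportions into an unadjusted OR. It is only an approximation: the published ORs come from logistic regression on individual records, so values computed from rounded percentages can differ slightly. The function names are hypothetical.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) to odds."""
    if not 0 < p < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio of late-stage diagnosis, exposed vs. unexposed."""
    return odds(p_exposed) / odds(p_unexposed)


# Family history of CRC: 30% late stage with an affected first-degree
# relative vs. 37% without, as reported in the text.
family_history_or = odds_ratio(0.30, 0.37)
```

Rounded to two decimals, `family_history_or` is 0.73, which agrees with the reported OR for family history.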

Table 1 Patient Characteristics and Risk of Late- vs. Early-Stage CRC Among All Colonoscopy Participants

Overall, being uninsured was not associated with the stage of CRC diagnosis of our colonoscopy patient population (p = 0.59). Medicaid recipients had a slightly higher proportion of late-stage cancer (44%) compared to non-recipients (35%) (p = 0.15) (Table 1). Restricting to patients reporting insurance status prior to diagnosis, the proportion of late-stage diagnoses remained higher for those on Medicaid (52%). Among the subset of participants < 65 years of age (not yet eligible for Medicare), being a Medicaid recipient was associated with a twofold higher proportion of late-stage, rather than early-stage diagnoses (OR 2.32, 95% CI 1.05–5.26, calculated by logistic regression) (Table 1).

Based upon questionnaire data collected at or before the CRC diagnosis visit on n = 555 patients, a personal history of polyps was associated with lower risk of a late-stage diagnosis (n = 61, 33% in patients with prior polyps vs. n = 162, 44% without) (OR 0.63, 95% CI 0.44–0.91, p = 0.014). Of the patients with a questionnaire prior to their CRC diagnosis and a prior polyp, 39 had specific prior polyp histology information abstracted from pathology reports available for analysis. Among these patients, late-stage CRC was diagnosed in 37% of those with history of low-grade tubular adenoma, hyperplastic polyp, or sessile serrated adenomas (n = 10 of 27), but only in 8% of patients with a prior villous/tubulovillous adenoma, or high-grade tubular adenoma (n = 1 of 12) (Fisher p = 0.12) (data not shown). Perhaps not surprisingly, among n = 477 with indication data, late-stage CRC was significantly less likely among patients with colonoscopies performed with an indication of screening (n = 48, 33%, p = 0.0057) or surveillance (n = 26, 29%, p = 0.0026), compared to patients with a diagnostic colonoscopy (n = 115, 48%) (data not shown).
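The Fisher exact comparison of prior polyp histology can be reproduced from the counts given above (10 of 27 late stage vs. 1 of 12 late stage). The sketch below is a plain two-sided Fisher exact test that sums the probabilities of all tables no more likely than the observed one; this definition of "two-sided" is an assumption about the original analysis, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weight for each feasible value of cell a,
    # holding the row and column totals fixed. Integer arithmetic keeps the
    # comparison with the observed table exact.
    weights = {
        k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)


# Rows: lower-risk prior polyp histology, higher-risk prior polyp histology.
# Columns: late-stage CRC, early-stage CRC.
p_value = fisher_exact_two_sided(10, 17, 1, 11)
```

With these counts the p value is approximately 0.12, in line with the value reported in the text.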

We performed a secondary analysis of young-onset CRC to identify factors associated with early- vs. late-stage CRC diagnosis before age 50 (Table 2). Within the age < 50 and age ≥ 50 subgroups, we did not observe statistically significant associations with CRC stage for age at diagnosis, gender, or family history. Among participants diagnosed with CRC prior to age 50, higher education (college and beyond) as the maximal level of education was associated with a greater proportion of late-stage diagnoses of CRC. While just 27% of those reporting a high school education as their maximum lifetime level presented with late-stage CRC, 61% of those with a college education (p = 0.0015) and 48% with post-college education (p = 0.038) presented with a late-stage CRC. The young-onset education association was unaffected by inclusion of age at diagnosis, gender, or family history in the model (college OR 4.43, beyond college OR 2.32). In contrast, after age 50, risk of late-stage CRC diagnosis was lower among the college attendees (OR 0.68, p = 0.027). We observed a statistically significant multiplicative interaction between college attendance and age at diagnosis modeled as a continuous variable (p = 0.021) (Table 2).

Table 2 Stratified Analysis by Age of Onset: Risk of Late- vs. Early-Stage CRC Diagnosis

The indication listed for the majority of the colonoscopy exams performed prior to or at diagnosis on these young-onset individuals was diagnostic (n = 47, 81%), rather than screening (n = 6, 10%), or surveillance (n = 2, 3%). Diagnostic indication was not significantly related to CRC stage (p = 0.86, data not shown). The indication for colonoscopy exams performed prior to or at diagnosis was fivefold more likely to be “diagnostic” for the patients under age 50, compared to older patients (p = 4.7e−8, data not shown).


Our findings demonstrate that the risk of late-stage diagnosis was higher among those diagnosed at age < 50 and those with Medicaid insurance, while family history of CRC and personal history of polyps were associated with early-stage diagnosis.

The higher risk of late-stage diagnosis with young-onset CRC (age < 50 vs. age 65+, p = 0.013) has been reported previously.7 Similarly, patients age < 50 at the Stanford Cancer Institute 2008–2014 were more likely to have advanced-stage CRC (72%), compared with older patients (63%) (p = 0.03), though this was not a colonoscopy registry and, thus, not directly comparable.8 Restricting the analysis to both young and older patients who had a diagnostic indication for their exam reduced the difference in rates of late-stage disease, supporting the benefits of screening and surveillance after age 50. In this colonoscopy cohort, younger patients (age < 50) were more likely to undergo diagnostic rather than screening colonoscopy, likely motivated by small amounts of rectal bleeding or constipation, minor symptoms that are unrelated to their risk of late-stage CRC. Nonetheless, the higher frequency of late-stage diagnosis in patients younger than 50 highlights the importance of awareness of these findings among clinicians, to ensure investigation of alarm symptoms or signs such as hematochezia, or unexplained anemia or weight loss, or the presence of important risk factors such as family history, that might lead to an earlier diagnostic exam.

Overall, the proportion of the young-onset patients in our study with a known family history of a hereditary syndrome such as Lynch syndrome was 4%. Among the 145 young-onset patients in our study, 27 had a known family history of CRC in a first-degree relative. Of these 27, only 7 (25%) of the relatives had young-onset CRC. This leaves the majority of the young-onset patients (20, 75%) without an established risk factor that might be used to predict early-onset disease, and highlights the need for investigations such as this to explore associated factors that could help identify young patients at risk. The higher risk of late-stage diagnosis with young-onset CRC remained statistically significant in a multivariable model adjusted for family history (p = 0.00069).

In a Northern California study, family history of CRC was associated with a lower risk of distant disease (p = 0.04), which was attributed to higher screening rates.9 Our data support this association, with a lower risk of late-stage cancer among colonoscopy questionnaire participants reporting a first-degree relative with CRC (OR 0.73, 95% CI 0.52–1.00).

Data collected by the CDC National Program for Cancer Registries (NPCR) for diagnosis year 2005–2009 among non-Hispanic residents of the Eastern region of the USA showed a higher proportion of late (regional or distant)- vs. early (localized)-stage CRC diagnoses among women (rate ratio RR 1.26, 95% CI 1.22–1.30 for women; 1.19, 95% CI 1.16–1.22 for men, adjusted for age).6 Our data did not show a statistically higher risk of late-stage CRC among the female NH Colonoscopy Registry participants of all age groups (p = 0.77).

Among NHCR participants diagnosed with CRC before age 65, those who were insured by Medicaid were at higher risk of late-stage CRC diagnosis (58%), compared to those who were uninsured or had private insurance (37% late stage) (p = 0.033). Likewise, in North Carolina, Medicaid insurance was associated with more advanced colorectal cancers among patients diagnosed between 1998 and 2002.10 These findings suggest an association between late-stage diagnosis and lower socioeconomic status. It is well-known that underserved patients have a lower rate of CRC screening, and the increased risk of late-stage CRC in this group emphasizes the critical need to reach these populations.15

Among the 555 patients with NHCR questionnaire data completed at or before the CRC diagnosis, 33% had a personal history of polyps. This history was associated with a significantly lower risk of a late-stage CRC diagnosis, which could be attributed to increased compliance with recommended surveillance intervals among patients with prior findings. This explanation is supported by our counter-intuitive observation that prior history of high-risk adenoma actually lowered the risk of late-stage CRC, probably because short surveillance intervals are recommended for these patients. We recommend future investigation in larger cohorts of the hypothesis that patients with prior history of high-risk adenoma are more compliant with their surveillance recommendations, while some patients with low-grade polyps are not being followed up adequately. The trend towards a lower risk of late-stage CRC among those with a first-degree family history of CRC is also consistent with those patients having better compliance with colonoscopy.

Some surprising results may also warrant additional investigation. Higher levels of education are thought to reflect greater access to having insurance and receipt of medical care, likely contributing to the lower risk of late-stage CRC diagnoses among college-educated patients after age 50. The fourfold increase in risk of late-stage CRC for the young-onset patients who attended college is more difficult to explain. One might hypothesize that such individuals believe CRC is rare prior to the recommended screening age of 50; thus, they are less likely to pay attention to symptoms when they occur. In both age groups, those with no insurance had most commonly ≤ high school education (62% age < 50, 50% age ≥ 50). Education level is probably serving as a surrogate for an unmeasured causal factor.

Limitations of this study include small sample sizes for subgroups, such as young-onset CRC patients. We collected questionnaire data prior to the CRC diagnosis on less than half of the participants, and specific prior polyp histology information was available only on a subset. Our analysis compared late-stage to early-stage CRC, which may mask general CRC risk factors that are common to both groups, but the objective was to find factors associated with the late-stage group, rather than with CRC itself.

Our findings that late stage at presentation is associated with young-onset CRC suggest a need to discover better predictors for which patients will have young-onset CRC (age < 50). They also emphasize the need to screen patients younger than age 50 who have symptoms that may warrant an early diagnostic exam. Not surprisingly, the indication for colonoscopy was diagnostic for a majority of the young-onset patients in the NHCR.

Many important public health concerns are highlighted by this investigation. In addition to the need to identify patients for diagnostic colonoscopy or earlier screening where appropriate, further investigation of the effectiveness of non-invasive early screening techniques that can be applied to this population is warranted to address the finding of late-stage CRC in young individuals. Our findings support the US Multi-Society Task Force on Colorectal Cancer guidelines, which acknowledge an increasing CRC incidence in patients under age 50, and recommend risk-based screening at younger ages.14 Risk of late-stage CRC diagnosis associated with Medicaid insurance suggests a persistent socioeconomic disparity that must be addressed. Our ability to curb high health care costs for treating CRC is demonstrated by the fact that in the USA, a projected $14.7 billion in productivity savings alone has been attributed to improved screening rates from 2005 to 2020.3 Treatment of early-stage tumors is highly feasible because they have not yet grown into other organs, spread to lymph nodes, or metastasized to distant sites. Understanding the combination of patient characteristics, lifestyle factors, and demographic factors that increase the risk of late-stage colorectal cancer is useful to focus prevention, public health outreach, and screening efforts and thus improve public health.