Numerous epidemiological studies have identified associations between diabetes and several types of cancer or cancer mortality in various populations [1, 2]. The relationship between diabetes and cancer was highlighted by the Joint Consensus Report from the American Diabetes Association (ADA) and American Cancer Society (ACS) [3]. Although the literature indicates a strong and consistent increased risk of cancer in people with type 2 diabetes, the strength of the association depends on the specific cancer site. The strongest relationships have been demonstrated for liver and pancreatic cancers, although these may reflect some degree of ‘reverse causality’, with the cancer itself leading to the onset of diabetes [1, 2]. The risk of endometrial cancer appears to be doubled in women with diabetes [1, 2]. Risks of breast, colorectal and bladder cancer are about 20–40% higher in those with type 2 diabetes [1, 2]. Interestingly, prostate cancer is about 10–20% less likely to occur in men with type 2 diabetes, which is thought to be due, in part, to the reduced levels of circulating testosterone in these individuals [3, 4].

Despite the vast body of epidemiological literature on the risk of cancer in people with diabetes, few studies have, to our knowledge, examined the pattern of cancer risk during different time windows following diabetes onset. In fact, the ADA/ACS Joint Consensus Report suggests that future epidemiological studies should address questions of the temporal relation between diabetes and cancer risk, and should consider the risk of various cancers separately [3]. Therefore, the objective of this study was to examine the risks of site-specific cancers in people with type 2 diabetes during different time windows following diabetes onset. We anticipated increased risks of site-specific cancers (and a decreased risk of prostate cancer), as suggested by previous epidemiology studies [1, 2]. Given the potential for reverse causality for liver and pancreatic cancers, and the possibility of detection bias playing a role in the increased risk of all types of cancer in people with type 2 diabetes, we examined the interaction between diabetes status and frequency of physician visits in the 2 years prior to the index date for the risk of site-specific cancers during different time windows.


Study population and data

We used the British Columbia Linked Health Databases (BCLHD) in this population-based retrospective cohort study [5]. Over 99% of the approximately 4.5 million people in British Columbia, Canada, are eligible for publicly funded health coverage, and their health utilisation data are captured in the BCLHD. The BCLHD includes information on physician services, hospital admissions, coverage period and vital statistics linked via a unique patient identifier. Physician visits and hospital admissions were classified according to the International Classification of Diseases, Ninth Revision (ICD-9). Family-level socioeconomic status (SES) was determined according to neighbourhood income per person from the Canadian census and reported in quintiles.

In our data set, all traceable personal identifiers were removed to protect patient confidentiality. Ethics approval was obtained from the University of British Columbia Behavioral Research Ethics Board and the University of Alberta Health Research Ethics Board.

Definition of cases of diabetes

Individuals likely to have a clinical diagnosis of diabetes were identified based on the established case definition for the Canadian National Diabetes Surveillance System (NDSS) [6]. The NDSS algorithm has been validated for use in Canadian administrative databases and has been found to have a sensitivity and specificity of 86% and 97%, respectively [7]. The index date for diabetes was defined as the first of: (1) a hospital admission for diabetes (ICD-9 code 250); or (2) the second of two medical fee-for-service claims coded ICD-9 250 within a 2-year period. Women with claims for gestational diabetes (ICD-9 648.8) were excluded. Individuals aged 30 years or older at the diabetes index date were considered to have type 2 diabetes.
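As an illustration only, the index-date rule described above could be sketched as follows. This is not the NDSS implementation: the function name, the input format (lists of dates for admissions and claims already filtered to ICD-9 250) and the approximation of 2 years as 730 days are all assumptions, and the gestational diabetes exclusion and the age threshold of 30 years are omitted.

```python
from datetime import date, timedelta

# Assumption: "within a 2-year period" is approximated as 730 days.
TWO_YEARS = timedelta(days=730)

def diabetes_index_date(admissions, claims):
    """Return the earliest qualifying date, or None if the definition is never met.

    admissions: dates of hospital admissions coded ICD-9 250
    claims:     dates of physician fee-for-service claims coded ICD-9 250
    """
    candidates = []
    if admissions:
        # A single hospital admission qualifies on its own.
        candidates.append(min(admissions))
    ordered = sorted(claims)
    # The second of two claims within 2 years qualifies. Checking adjacent
    # pairs is enough: the nearest earlier claim is always the adjacent one.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier <= TWO_YEARS:
            candidates.append(later)
            break
    return min(candidates) if candidates else None
```

For example, a person with claims on 1 January 2000, 1 January 2003 and 1 June 2003 and no admissions would receive an index date of 1 June 2003 under this sketch, because the first two claims are more than 2 years apart.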

Cohort definitions

Our sample was drawn from two initial cohorts of subjects with and without diabetes identified from the BCLHD during the period 1 April 1994 to 31 March 2006. Subjects with diabetes were identified using the definition outlined in the previous section (n = 328,994); those who did not meet the criteria for diabetes during the study period, and were registered for health coverage in 1996 (n = 327,572), were eligible to be included in the non-diabetes cohort.

We identified individuals with incident diabetes as those meeting the definition of type 2 diabetes after two consecutive years of not meeting the criteria (n = 185,100). Non-diabetic participants registered for health coverage on the diabetes index date were then matched, with replacement, to each incident diabetic subject by sex and birth year. Both the incident diabetes and matched non-diabetes cohorts had a minimum 2-year cancer-free period prior to the index date. By having this minimum 2-year cancer-free period prior to the index date in both cohorts, we were more confident that we were capturing incident cancers subsequent to diabetes onset.
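A minimal sketch of matching with replacement on sex and birth year is given below. The record layout (dictionaries with hypothetical `id`, `sex` and `birth_year` keys), the function name and the use of a seeded random draw are assumptions for illustration; the additional requirements described above (coverage on the index date and a 2-year cancer-free period) are not implemented here.

```python
import random
from collections import defaultdict

def match_controls(cases, pool, seed=0):
    """Pair each case with one control of the same sex and birth year.

    Sampling is with replacement, so the same control may be matched
    to more than one case. Returns {case_id: control_id or None}.
    """
    rng = random.Random(seed)
    # Group candidate controls into strata keyed by (sex, birth year).
    strata = defaultdict(list)
    for person in pool:
        strata[(person["sex"], person["birth_year"])].append(person["id"])
    matches = {}
    for case in cases:
        candidates = strata.get((case["sex"], case["birth_year"]), [])
        # A chosen control is not removed from its stratum.
        matches[case["id"]] = rng.choice(candidates) if candidates else None
    return matches
```

Because controls are never removed from their stratum, every case with at least one eligible control receives a match, at the cost of some controls appearing more than once.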

Cancer outcomes

Cancer outcomes were provided via linkage to the British Columbia Cancer Agency (BCCA) database, which is estimated to cover at least 95% of all cancer cases in British Columbia [8]. The registry includes diagnosis date, cancer site (the International Classification of Diseases for Oncology–Third Revision [ICD-O-3]), and method of diagnosis of malignancy. Cancer sites (ICD-O-3 topography code) examined in this analysis include colorectal (C18–C21), pancreatic (C25), lung (C34), liver (C22), breast (C50), cervical (C53), endometrial (C54), ovarian (C56), prostate (C61) and thyroid (C73).

The first cancer at each site was identified prospectively from the diabetes index date in both cohorts. Censoring occurred at 31 March 2006, at death or on departure from British Columbia, whichever was earliest. To avoid problems with competing risk, the development of a cancer in one site did not exclude subjects from analyses of a first cancer at another site.
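The censoring rule above can be illustrated with a short sketch that derives, for one subject and one cancer site, the time at risk and an event indicator. The function and argument names are hypothetical, and time is measured in days as an assumption.

```python
from datetime import date

STUDY_END = date(2006, 3, 31)  # administrative end of follow-up

def follow_up(index_date, cancer_date=None, death_date=None, departure_date=None):
    """Return (days_at_risk, event) for a single cancer site.

    Follow-up is censored at the earliest of study end, death and
    departure from the province; a cancer counts as an event only if
    it is diagnosed on or before that censoring date.
    """
    censor_date = min(
        d for d in (STUDY_END, death_date, departure_date) if d is not None
    )
    if cancer_date is not None and index_date <= cancer_date <= censor_date:
        return (cancer_date - index_date).days, True
    return (censor_date - index_date).days, False
```

Calling the function once per site, with that site's first diagnosis date, mirrors the approach in which a cancer at one site does not remove a subject from the analysis of another site.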

Statistical analysis

Incidence rates and 95% CI were first determined for each cancer site using Poisson regression analysis. Cox regression was then used to estimate the HR and 95% CI for the association between incident diabetes and cancer. Analyses were adjusted for characteristics at the diabetes index date including age, sex, SES, frequency of physician visits in the 2 years prior to the index date (categorised into quintiles) and index year. The proportional hazards (PH) assumption was assessed by visual inspection of the log-minus-log plot. If the PH assumption was violated for diabetes, the follow-up time was split and the HR estimated in each interval using a multivariable piecewise PH model [9].
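To make the incidence rate calculation concrete, the following sketch computes a crude rate per 1,000 person-years with a Wald-type 95% CI on the log scale. This corresponds to an intercept-only Poisson model with a log person-time offset and is a simplification of the regression analysis described above; the function name and the example counts are hypothetical and are not taken from the study data.

```python
import math

def incidence_rate_per_1000(events, person_years, z=1.96):
    """Crude incidence rate per 1,000 person-years with a log-scale Wald CI.

    For a Poisson count, the standard error of the log rate is
    1 / sqrt(events), so the CI is rate / factor to rate * factor
    with factor = exp(z / sqrt(events)). Requires events > 0.
    """
    rate = events / person_years * 1000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor

# Hypothetical counts, for illustration only.
rate, lower, upper = incidence_rate_per_1000(events=160, person_years=10_000)
print(f"{rate:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```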

To account for time-varying HR, we split the follow-up time into early and later intervals based on observed patterns of risk. Splitting the follow-up time into two intervals at 3 months was deemed appropriate for all cancer sites, except for pancreatic and liver cancer. A further four intervals (3 months–1 year, 1–2 years, 2–3 years and >3 years) were deemed appropriate for pancreatic cancer because of a slower decline in risk over time, and a constant HR over time was valid for liver cancer. To further assess the potential for detection bias, we tested an interaction term for diabetes status and frequency of physician visits in the 2 years prior to the index date, within the early and later time windows. Cancer incidence HR and 95% CI were plotted against mean cancer/censoring time within the intervals for each cancer site. SAS version 9.2 (Cary, NC, USA) was used for analyses and R version 2.8 (Vienna, Austria) for plotting.
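The splitting of follow-up time can be sketched as below. This is an illustrative episode-splitting routine, not the SAS code used for the analyses: the function name is hypothetical, and expressing the cut points in days (3 months taken as 91 days, 1 year as 365 days) is an assumption.

```python
def split_follow_up(days, event, cuts=(91,)):
    """Split one subject's follow-up into (start, stop, event) episodes.

    cuts: interval boundaries in days since the index date. The event
    indicator is carried only by the final episode, i.e. the interval
    in which follow-up actually ended.
    """
    episodes = []
    start = 0
    for cut in sorted(cuts):
        if days <= cut:
            break
        # The subject survived this whole interval without the event.
        episodes.append((start, cut, False))
        start = cut
    episodes.append((start, days, event))
    return episodes

# Two intervals split at 3 months (default), as for most cancer sites.
print(split_follow_up(400, True))
# Five intervals, as described for pancreatic cancer.
print(split_follow_up(800, False, cuts=(91, 365, 730, 1095)))
```

Each episode can then contribute to the estimate for its own interval, which is how a piecewise model allows the HR for diabetes to differ between the early and later windows.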


There were 185,100 participants in each of the incident type 2 diabetes and matched non-diabetes cohorts (Table 1). The mean ± SD age of the two cohorts was 60.7 ± 13.5 years, and 54% of the participants were men. Nearly 45% of the diabetes cohort fell into the lowest two SES quintiles, compared with 37% of the participants without diabetes. The diabetes cohort also had more physician visits than the non-diabetes cohort in the 2 years prior to the index date (Table 1). The mean cancer-free period before the index date was 6.9 years and 7.1 years for cohorts with and without diabetes, respectively. The mean ± SD follow-up was 4.3 ± 2.8 and 4.4 ± 2.8 years for the diabetes and non-diabetes cohorts, respectively.

Table 1 Baseline demographics for incident type 2 diabetes and matched non-diabetes control cohorts

Throughout the entire follow-up, the overall incidence rate of cancer (per 1,000 person-years) was 16.04 (95% CI 15.76, 16.32) in the diabetes cohort, compared with 14.25 (95% CI 13.95, 14.56) in the control cohort (Table 2). The unadjusted incidence rates for colorectal, pancreatic, lung, liver and endometrial cancers were all higher in the diabetes cohort, compared to those without diabetes. The incidence rates for breast, cervical, ovarian and thyroid cancer were similar in the two cohorts, and, as expected, the incidence rate of prostate cancer was lower in the diabetes cohort.

Table 2 Incidence rates for cancers in the type 2 diabetes and control cohorts, in British Columbia, for the whole follow-up period 1996–2006

In the first 3 months following the diabetes index date, the risks of colorectal, lung, cervical, endometrial, ovarian and prostate cancer in participants with type 2 diabetes were significantly elevated compared with those without diabetes (Table 3). There was a trend towards an increased risk of thyroid and breast cancer in the diabetes cohort during the first 3 months after the diabetes index date, but these findings were not statistically significant. Figure 1 illustrates the clear peaks in risk for each cancer subtype in the early time period subsequent to the diabetes index date.

Table 3 HRs for first cancer at each site in the incident diabetes cohort compared with the matched cohort without diabetes for different time periods following the diabetes index date
Fig. 1 Adjusted HRs for site-specific cancers in incident diabetes cohorts over time since onset of diabetes: (a) breast cancer; (b) prostate cancer; (c) colorectal cancer; (d) lung cancer; (e) pancreatic cancer; (f) liver cancer; (g) cervical cancer; (h) endometrial cancer; (i) thyroid cancer; (j) ovarian cancer

In the later time period of 3 months to 10 years after the diabetes index date, the diabetes cohort remained at a significantly elevated risk of colorectal and endometrial cancer compared with those without diabetes (Table 3). A non-significant increased risk of thyroid cancer remained in those with diabetes. The risk of prostate cancer during this later time period was significantly lower in the diabetes cohort compared with those without diabetes. The risks of lung, cervical, ovarian and breast cancers in the diabetes cohort compared with those without diabetes were close to the null in this later time period. Importantly, risk estimates for several cancers (colorectal, lung, cervical, endometrial, ovarian and prostate) for the full follow-up period appeared to be overestimated compared with those risks excluding the first 3 months after diabetes onset (Table 3).

The risk of pancreatic cancer in the first 3 months after diabetes onset was significantly higher in the diabetes cohort (adjusted HR 13.84, 95% CI 7.49, 25.58; p < 0.0001). The adjusted HRs for incident pancreatic cancer in the diabetes cohort, for the time periods 3 months–1 year, 1–2 years, 2–3 years and 3–10 years after the diabetes index date were 3.71 (95% CI 2.55, 5.39; p < 0.0001), 2.94 (95% CI 2.00, 4.33; p < 0.0001), 1.78 (95% CI 1.14, 2.77; p = 0.01) and 1.65 (95% CI 1.28, 2.13; p = 0.0001), respectively. The risk of liver cancer for those with diabetes remained constantly elevated throughout the entire follow-up period (HR 2.53, 95% CI 1.93, 3.31; p < 0.0001).

Interactions between diabetes status and frequency of physician visits in the 2 years prior to the index date were statistically significant for lung, breast, cervical and prostate cancer in the first 3 months (Table 4). All cancer sites, except liver and ovarian, followed a similar pattern of substantially elevated risk for those subjects with diabetes who had fewer physician visits compared with their counterparts without diabetes, although this did not reach statistical significance (data not shown).

Table 4 HRs and 95% CI for risk of cancer by site among the incident diabetes cohort compared with the matched cohort without diabetes (the reference group) during earlier and later time windows


In a large population-based cohort of people from British Columbia, Canada, we examined the pattern of cancer incidence during different time windows following diabetes onset. The most striking observation was an initial spike in risk for most cancers immediately following the onset of diabetes, and a subsequent levelling off of risk throughout the remainder of follow-up. Interestingly, the peak in cancer risk within 0–3 months of diabetes onset was evident for all cancers, with the exception of liver cancer. For some cancers (colorectal, liver and endometrial cancer), the risk levelled off but remained elevated in those with diabetes, while for other cancers (lung, breast, cervical and ovarian cancer), the subsequent risk was the same as that observed in participants without diabetes, and for prostate cancer, the risk was subsequently lower in men with diabetes.

While the mechanisms underlying the epidemiological associations between diabetes and cancer are not entirely clear, common risk factors, such as unhealthy diets, physical inactivity, inflammatory markers and hyperinsulinaemia, are important considerations. Of particular concern, however, are methodological issues related to reverse causality and detection bias [3]. Our data support the concern that the elevated risk of some cancers in people with diabetes may be partly due to increased detection around the time of diabetes diagnosis. However, reverse causality does not appear to account for the elevated risk of cancer in people with type 2 diabetes.

The increased risks of pancreatic and liver cancers have been of particular concern with regard to reverse causality, as dysfunction of these metabolically active organs would result in impairments of glycaemic control. We were able to address this concern by having, on average, a 7-year cancer washout period (with a minimum of 2 years), for both the diabetes and control cohorts prior to the diabetes index date. In addition, regardless of any reverse causality that might have been present, we continued to observe a strong and statistically significant increased risk for both liver (HR 2.53) and pancreatic cancers (HR 1.65) in the later time windows. As pancreatic cancer is rapidly progressive and generally fatal [10], it is highly unlikely that such a diagnosis preceded the diabetes index date in later time periods.

On the other hand, there does appear to be evidence of a substantial detection bias in the assessment of cancer risk in the diabetes population. Detection, or ascertainment, bias arises due to the increase in the number of clinical investigations associated with a new diagnosis of diabetes, which may then lead to the discovery of a cancer [3]. The general pattern of an initial elevated peak in cancer risk at the time of diabetes onset, which is substantially higher than subsequent risk, suggests that many cancers are being diagnosed at or around the time of recognition of diabetes, or vice versa. Moreover, the pattern of risks associated with the interaction between diabetes status and frequency of physician visits in the 2 years prior to diabetes onset further suggests a substantial degree of detection bias. The likelihood of having a cancer diagnosis within 0–3 months of diabetes onset was much greater in those who had fewer physician visits in the previous years. Given that we excluded individuals with cancer diagnoses prior to their index date, this suggests that the likelihood of diagnosing cancer was increased by the fact that diabetes had recently been identified.

This raises one of the potential limitations of our methods, in particular the index date assigned to diabetes onset. The case definition for diabetes is based on one hospitalisation or two physician visits over a 2-year period, with the index date being defined as the later of those dates. Thus, it is possible for the clinical diagnosis of diabetes to have preceded the case date in the BCLHD; however, previous validation of this diabetes case definition suggests that the median duration between qualifying physician visits is 39 days (interquartile range 12–150 days) (J. Johnson, unpublished data). We also ensured that participants were free of a cancer diagnosis in the 2 years prior to the index date, coinciding with the same window as the case definition for diabetes. We recognise, however, that when diabetes and cancer are diagnosed within 3 months of each other, it is difficult to accurately determine which disease appeared first. In other words, it is possible to have a detection bias in both directions, which may lead to ambiguity with respect to the association between the two conditions, especially in the situation where there is only a short follow-up period after the first diagnosis. This would warrant further investigation to assess the risk of cancer diagnosis prior to the diabetes index date.

With this in mind, there nonetheless appears to be further evidence for detection bias with two cancers in particular: lung and prostate cancer. Lung cancer has generally not been associated with an increased risk in people with diabetes [3], yet in the early time window of our analysis, we observed a 170% increased risk of lung cancer, which subsequently disappeared after that initial 3-month period. Even more suggestive is the 98% increased risk of prostate cancer in the early time window but a subsequently decreased risk among men with diabetes in the longer term, which is what would have been expected based on physiological plausibility and previous epidemiological evidence [3].

Of particular interest are the observed risk estimates of breast cancer in our cohorts. A previous meta-analysis of cohort and case–control studies found that women with diabetes had a moderate, but significant, 20% increased risk of breast cancer [11]. This risk is similar to that observed during the early period in our analyses, but this was followed by a lack of association in the subsequent time period. This suggests that previous epidemiology studies of breast cancer risk in women with diabetes may not have fully accounted for an initial detection bias.

To our knowledge, none of the previous epidemiology studies of diabetes and cancer risk has considered cancer incidence during different time windows following the diabetes index date. Therefore, while we continued to see significant associations during the later time period following the diabetes index date for certain subtypes of cancer (colorectal, liver, endometrial, pancreatic and prostate), our risk estimates are likely to be more reliable and accurate, given that we were able to take into account the initial time window of heightened screening. It is possible that previous studies may have overestimated the risk of various types of cancer in patients with diabetes if they did not account for this initial detection bias following a diagnosis of diabetes.

A key strength of this study relates to our use of population-based administrative health data. We were able to assemble very large cohorts, and our diabetes cohort captured the entire population of incident diabetes in the province of British Columbia. As such, this study does not suffer from the potential biases that exist in selective, non-population-based studies. Furthermore, the British Columbia Linked Health Databases and British Columbia Cancer Registry have been used in numerous epidemiology studies [12–15]. In our analyses, we matched our cohorts by age, sex and index year, thus balancing the cohorts equally on these potential confounders. Finally, we were able to look at the incidence of several different types of cancer, although the numbers were small for some subtypes, such as thyroid and cervical cancer.

On the other hand, there are some limitations to our study. As we used administrative data, we lacked information on potentially important clinical covariates, such as treatment modality (diet, oral agents and insulin), smoking status, weight or BMI, glycaemic control and physical activity. These are all potential confounders in the relationship between diabetes and the incidence of cancer. However, other studies that have included information on these clinical covariates when looking at the risk of cancer in people with diabetes have observed results similar to ours [1, 2].

In summary, this study provides new evidence on the relationship between type 2 diabetes and cancer incidence by examining the risk of cancer during different time windows following the diabetes index date. It appears, from the patterns of cancer incidence, that reverse causality may play a role, but it does not account for all of the elevated risk of cancer in type 2 diabetes. On the other hand, there does appear to be a substantial detection bias for most cancers in people with new-onset type 2 diabetes. Although an elevated risk of some cancers is sustained over time, detection bias following the onset of type 2 diabetes probably leads to an overestimation of cancer risk.