Introduction

There is considerable evidence linking type 2 diabetes and the incidence of cancer [1]. Recent developments and contributions to the literature have raised the profile of the previously under-recognised relationship between these comorbid conditions. A number of important questions remain, however, as to how best to characterise this relationship. This goal is far from simple, since both conditions are complex and heterogeneous on their own. It has been observed that the strength and direction of this association depends on the cancer site [1]. The mechanisms for the observed associations may be direct or indirect, through shared risk factors. Furthermore, glucose-lowering therapies have been implicated in modulating the risk of cancer incidence in people with type 2 diabetes, leading to considerable controversy among the clinical community [2].

The increased attention to diabetes and cancer has led to a rapid proliferation of reported observational studies, using various data sources, including administrative databases, ongoing cohort studies and secondary analyses of randomised controlled trials (RCTs). Although some information can be drawn from RCTs, much research, particularly on the topic of glucose-lowering medications, will inevitably take the form of observational studies, owing to the rarity of cancer. The incidence rate of all cancers is about 20/1,000 per year in ages around 70, so a 5 year study of 1,000 individuals will produce about 100 cancers. Therefore long-term follow-up in large populations is needed to produce even moderately precise estimates. Furthermore, while RCTs are typically designed for questions of efficacy, hypotheses of potential harm do not easily lend themselves to RCT designs [3, 4]. Therefore, RCTs are not always practical for pure sample size reasons, nor feasible from an ethical perspective. In such cases, large, well-designed, observational studies can provide important evidence of potential harms [4–6].
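The sample size argument above can be made concrete with a short calculation. The sketch below reproduces the expected count of about 100 cancers and then, as an added illustration, shows how imprecise a rate ratio based on that many events would be. The even split of events between two comparison groups and the function names are assumptions made for this example, and the precision formula is the usual large-sample approximation for a log rate ratio.

```python
import math

def expected_events(rate_per_1000_py: float, n_people: int, years: float) -> float:
    """Expected number of events, ignoring attrition from death or dropout."""
    return rate_per_1000_py / 1000 * n_people * years

def rate_ratio_error_factor(events_a: float, events_b: float, z: float = 1.96) -> float:
    """Multiplicative 95% error factor for a rate ratio.

    Uses the approximation SE(log RR) = sqrt(1/a + 1/b), where a and b are
    the event counts in the two groups being compared.
    """
    return math.exp(z * math.sqrt(1 / events_a + 1 / events_b))

# Figures from the text: 20 per 1,000 per year, 1,000 people, 5 years.
total = expected_events(20, 1000, 5)

# Hypothetical even split of those events between two exposure groups.
factor = rate_ratio_error_factor(total / 2, total / 2)

print(f"expected cancers: {total:.0f}")
print(f"95% CI spans RR / {factor:.2f} to RR x {factor:.2f}")
```

Under these assumptions the confidence interval for a rate ratio spans roughly a 1.5-fold range in each direction, and the interval would be wider still for any single cancer site, which contributes only a fraction of the total events.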

There are many methodological challenges that must be addressed in observational studies of cancer incidence in people with type 2 diabetes [7], including the usual suspects of potential biases or confounding factors that threaten validity, and also the potential interactions between the two conditions themselves. This is the first of two papers developing frameworks for the evaluation of methodological aspects linking diabetes, diabetes treatments and cancer. This paper addresses challenges in the study of cancer incidence (Table 1); the second paper addresses the study of mortality in patients with diabetes and cancer [8].

Table 1 Considerations in the evaluation of the impact of diabetes on cancer incidence

Epidemiology of diabetes and cancer incidence

Numerous epidemiological studies have identified associations between diabetes and several types of cancer in various populations [9–19] (Table 2). While the literature indicates a strong and consistent increased risk of cancer in people with type 2 diabetes, the strength of association depends on the specific cancer site. The strongest relationships have been demonstrated for liver [18] and pancreatic [17] cancers, although these may reflect some degree of ‘reverse causality’, with the cancer itself leading to the onset of diabetes [1]. Risk of endometrial cancer appears to be doubled in women with diabetes [13]. Risks of breast [11], colorectal [12], bladder [15], non-Hodgkin lymphoma (NHL) [16] and kidney [14] cancers are about 20–40% higher in people with type 2 diabetes. Interestingly, prostate cancer is about 10–20% less likely in men with type 2 diabetes, which is thought to be due, in part, to the reduced levels of circulating testosterone in men with diabetes [1, 19]. For other malignancies, the numbers of studies are generally small, but there appears to be no consistent association with lung [20] and ovarian [21] cancers. For gastric cancer, there may be an increased risk in people with diabetes compared with non-diabetic populations in Japan, where the incidence of this malignancy is high [22], but it is unclear whether this association is also found in western populations. In general, the observed increases in cancer risk have been reported in Asian cohorts as well as western populations [23].

Table 2 Associations between diabetes (mainly type 2) and incident cancer risk: from meta-analyses

Analyses of type 1 diabetes cohorts, compared with the general population, suggest increased risks in some cancers (for example, ovarian cancer in a UK series [24]), but these are not consistent across all studies [24–27]. Notably, there do not appear to be associations between type 1 diabetes and the cancers linked with type 2 diabetes, i.e. breast, colorectal, endometrial, liver, pancreas, kidney and bladder cancers.

One of the first considerations is that studies exploring the association between type 2 diabetes and cancer incidence should avoid overall cancer incidence as the single endpoint, and instead focus on specific cancer sites. Overall cancer incidence (a composite endpoint) is likely to mask variations in specific patterns of site-specific cancer incidence depending on biological, clinical or socioeconomic determinants. Unfortunately, for many cancer sites, individual cohort studies lack the power to reliably assess the degree of risk associated with diabetes let alone with specific therapies. In such instances, combining data through meta-analytical techniques could be an alternative approach.

Potential biological mechanisms

There are several hypothesised mechanisms for the association between diabetes and cancer, including the effects of hyperglycaemia or insulin resistance and hyperinsulinaemia. Type 2 diabetes is characterised in the early stages by insulin resistance and consequent hyperinsulinaemia [28]. The latter promotes tumour cell growth directly via insulin receptors [29], but effects may also be mediated indirectly via the IGF-1 receptor. In turn, many cancer cell lines express insulin and IGF-1 receptors [30, 31]. Tumour favouring attributes include increased cell growth, anti-apoptosis, increased cell motility and invasion [29].

Therefore high endogenous insulin levels and/or administration of exogenous insulin could theoretically have a promoting effect on neoplastic disease. This hypothesis is supported by a recent meta-analysis, which demonstrated that elevated serum insulin or c-peptide levels are associated with a significantly increased risk of certain cancers [32]. In addition, increased endogenous insulin levels have been associated with a worse prognosis for breast cancer patients [33]. Insulin resistance may also promote cancer risk via other mechanisms, such as decreased sex-hormone binding globulins leading to excess oestrogen and stimulation of oestrogen-dependent tumours or inflammation. Insulin resistance is also associated with a higher production of NEFA, interleukin-6, plasminogen activator inhibitor-1, leptin, and tumour necrosis factor α [34].

An alternative hypothesis is that the increased risk of cancer in type 2 diabetes is due to elevated blood glucose levels. This hypothesis suggests that hyperglycaemia is a confounder in the observed increased risk of cancer outcomes associated with increasing use of exogenous insulin therapy [35]. The ‘hyperglycaemia hypothesis’ is supported by large inception cohort studies that demonstrate a strong relationship between elevated blood glucose and cancer incidence or mortality [32, 36–39]. The hyperglycaemia risk relationship appears to be consistent across all levels of blood glucose, even within the non-diabetic range [36, 38]. It is true that transformed cells have a high glucose requirement, in keeping with their high rates of glycolysis relative to normal cells, as first recognised by Otto Warburg. Even though most cancer cells have a constitutively high level of glucose uptake, and are able to fully satisfy their glucose requirements under normoglycaemic conditions [40], it is still conceivable that hyperglycaemic conditions would give cancer cells a relative growth advantage. Experimental studies exploring dose response relationships between glucose concentration and tumour growth [41] generally show that increasing glucose concentration does increase proliferation, but with a plateau occurring around 5 mmol/l. This suggests that hyperglycaemia confers no growth advantage and normalisation of glucose levels by insulin therapy would not be expected to constrain cancer growth. Indeed, evidence from the large RCTs of intensified glycaemic control for type 2 diabetes does not support the causal hypothesis that lowering blood glucose will reduce the risk of cancer [42]. Cancers are heterogeneous and insulin responsiveness is not universal, and for some tumours, proliferation may be enhanced by hyperglycaemia or inflammatory responses. However, the accumulation of experimental and epidemiological evidence is more consistent with the hyperinsulinaemia hypothesis, and less so with the hyperglycaemia hypothesis [43].

figure a

Common risk factors

Several important modifiable and non-modifiable risk factors should be considered as confounding factors in assessing the risk of cancer incidence in people with type 2 diabetes. For example, like diabetes, the incidence of most cancers increases with age.

Lifestyle behaviours are also important considerations, as there are many modifiable risk factors shared between cancer and type 2 diabetes. Overweight and obesity have been linked with an increased incidence of many cancers, in both men and women [44]. The cancers most consistently associated with overweight and obesity are breast (in postmenopausal women), colon/rectum, endometrium, pancreas, oesophageal adenocarcinoma, kidney, gallbladder, and liver cancers. Obesity is clearly a risk factor for type 2 diabetes [45]. Obesity can lead to insulin resistance, which may partly explain this association with cancer. Indeed, there is evidence to suggest that visceral adiposity, a marker of insulin resistance, is associated with risk of both type 2 diabetes [46] and certain cancers (e.g. colon), independent of BMI [47]. However, generalised obesity may also promote cancer through mechanisms independent of insulin resistance. For instance, there is excess oestrogen production in the peripheral adipose tissue of obese individuals, which may increase the risk of oestrogen-dependent tumours such as those of the breast and endometrium.

Furthermore, poor dietary habits and physical inactivity are potentially important confounding factors to be considered, as they are thought to mediate cancer risk via insulin resistance and obesity [48, 49]. Given the potentially detrimental role of hyperinsulinaemia in both diabetes and cancer, it is relevant that physical activity is known to improve insulin sensitivity, particularly when cardio-respiratory and resistance training are combined [50]. Likewise, tobacco smoking, which is more common in people with type 2 diabetes, is associated with an increased risk of a number of cancers [51]. Moreover, the primary modifiable behavioural risk factors for diabetes and cancer are also closely associated with socioeconomic status [52].

figure b

Tumorigenesis, latency periods and natural history of type 2 diabetes

For many common adult epithelial malignancies, tumorigenesis is a multistage process from an initiated mutated cell through clonal expansion and progression through a precursor lesion to an invasive carcinoma, and then metastasis [53]. This process is best illustrated for colorectal cancer—the total period from first initiated cell to clinical cancer presentation is termed the sojourn time and is approximately 50 years [54], whereas the period from the first clinically identifiable adenoma (only a few millimetres in size) to incident cancer is approximately 20 years, and the period from a clinically relevant adenoma (i.e. larger high-risk lesion) to cancer presentation is approximately 7 years.

The latency period is the time from first exposure to a (causal) risk factor to incident cancer. In an experimental setting, this can be readily determined—for example, trials of aspirin and colorectal cancer prevention require a minimum 10 year follow-up to see a treatment effect, and, therefore, we deduce that the latency period is greater than 10 years. For other risk factors, this derivation may not be as straightforward and has to be estimated indirectly. In the setting of diabetes, it is unlikely that the effect is at cancer initiation (as it is for smoking), but conceptually, the abnormal metabolic and hormonal milieu may influence the rate of neoplastic progression anywhere along the sojourn timeframe. For obesity, there are two lines of evidence that the latency period is of the order of 10 years: first, the median follow-up durations in prospective cohort studies of BMI–cancer associations are typically 10 years [44]; and second, cancer incidence reduction only occurs 10 years and later after bariatric surgery in grossly obese individuals [55]. To the extent that the overlapping pathophysiology of obesity and type 2 diabetes is linked to cancer risk, it seems reasonable that a similar timeline operates for diabetes.

Type 2 diabetes is an insidious condition, with its onset typically recognised in older adults, and characterised by progressive hyperglycaemia. Both insulin resistance and beta cell dysfunction (insulin secretory defect) play a role in the transition from normal glucose tolerance (NGT) to impaired glucose tolerance (IGT) and then to type 2 diabetes [56]. The metabolic milieu associated with type 2 diabetes may thus be seen as promoting or accelerating nascent tumour growth, rather than stimulating the development of new cancers, particularly in settings of short-term exposure. Insulin resistance and hyperinsulinaemia can predate the clinical diagnosis of type 2 diabetes by up to 10 years [28], thus the influence of this condition on cancer risk may begin well before diabetes diagnosis. The risk from long-term exposure to high levels of insulin is relatively underexplored and is directly relevant for the risk of cancer in relation to the duration of diabetes and use of exogenous insulin. Thus, observational studies should recognise this and allow for sufficiently long exposure time, preferably several years.

The diagnosis of type 2 diabetes is arbitrarily set at some cut-off point of blood glucose level (or HbA1c). While this serves as the only viable starting point for clinical management, it clearly complicates observational studies of the biological relationships between type 2 diabetes and cancer.

While the biological effects of exceeding the diagnostic threshold are clearly limited, the clinical implications are large, primarily through treatment inception and increased contact with healthcare systems. This highlights the need to consider multiple time scales in assessing the temporal relationship, particularly current age, duration of diabetes, current calendar time (to capture drift of treatment modalities) and date of diagnosis (to capture changing diagnostic criteria).

Since the magnitude of this accumulating exposure is unknown at diabetes diagnosis (and in particular among persons not yet diagnosed) it cannot be accounted for except through the time since diagnosis (duration of diabetes). Therefore duration of diabetes is an essential variable to include as a predictor of cancer incidence. Even with a long preclinical period, it is likely that some pancreatic reserve exists at time of diagnosis, with continued risk for hyperinsulinaemia. With a longer duration of diabetes, it is expected that any remaining endogenous insulin secretion declines, with progressively worsening hyperglycaemia [28, 56]. This is further complicated by the initiation and progressive use of glucose-lowering medications, each of which has been implicated in modulating the risk of cancer in type 2 diabetes [1, 2].

The clinical conditions that lead to diagnosis of diabetes and/or cancer may to some extent be overlapping (accumulating ‘ill-health’), so that patients with diagnoses of both diseases in close succession cannot be classified meaningfully as having one disease before the other. Thus, it is important in epidemiological studies to be able to classify follow-up in the diabetic population by duration (time since diagnosis) [57]. Ideally, this would be accomplished by identifying incident cases of diabetes, implying that prevalent cases, with unknown date of diagnosis and hence missing data on duration in follow-up, should be excluded from analysis.

A final issue in follow-up for cancer occurrence is the termination of follow-up (censoring) in observational studies, which must be independent of the disease process (i.e. cancer incidence). Thus, censoring owing to change of glucose-lowering therapy could potentially introduce substantial bias, since cancer occurrence is likely to be preceded by symptoms that may lead to changes in treatment modalities. The appropriate approach is therefore to account for all available follow-up time, with attribution of all risk time to a relevant exposure in a time-varying manner, such as ‘ever on drug X’ or ‘time since start of drug X’.
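The attribution of risk time described above can be sketched in a few lines. In this minimal example each person's follow-up is split at the start of a hypothetical drug X, so that time before first use counts as 'never on X' and time after counts as 'ever on X'. The `Person` record, its field names and the four-person cohort are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    entry: float                 # start of follow-up (years since diabetes diagnosis)
    exit: float                  # end of follow-up (cancer, death or end of study)
    drug_start: Optional[float]  # first use of drug X, or None if never used
    cancer: bool                 # True if follow-up ended with incident cancer

def split_person_time(p: Person):
    """Return (exposure label, person-years, events) rows for one person."""
    if p.drug_start is None or p.drug_start >= p.exit:
        return [("never on X", p.exit - p.entry, int(p.cancer))]
    start = max(p.drug_start, p.entry)
    rows = []
    if start > p.entry:
        # Time before the first prescription is unexposed, not exposed.
        rows.append(("never on X", start - p.entry, 0))
    rows.append(("ever on X", p.exit - start, int(p.cancer)))
    return rows

def totals_by_exposure(cohort):
    """Sum person-years and events within each exposure label."""
    totals = {}
    for p in cohort:
        for label, years, events in split_person_time(p):
            py, ev = totals.get(label, (0.0, 0))
            totals[label] = (py + years, ev + events)
    return totals

# Hypothetical cohort, for illustration only.
cohort = [
    Person(0.0, 10.0, None, False),
    Person(0.0, 8.0, 3.0, True),
    Person(0.0, 6.0, None, True),
    Person(0.0, 9.0, 4.0, False),
]
totals = totals_by_exposure(cohort)
for label, (py, ev) in totals.items():
    print(f"{label}: {ev} events / {py:.0f} person-years")
```

In this toy cohort, classifying the two users as exposed from entry would give 1 event in 17 person-years for drug X instead of 1 in 10, because 7 years during which they were not yet on the drug would be counted as exposed time.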

figure c

Reverse causality

In assessing the temporal relationship between diabetes and cancer, the concept of reverse causality should be considered. In the case of diabetes and cancer, this issue has been raised as a consideration for pancreatic cancer, whereby resultant dysfunction of insulin secretion may be sufficient to induce hyperglycaemia, particularly in individuals with underlying peripheral insulin resistance. Similarly cancer in the metabolically active liver may also result in derangements of glycaemic control.

While reverse causality should be kept in mind as a potential alternative hypothesis for observed associations between diabetes and cancer incidence, evidence suggests that it is unlikely to be entirely responsible for all observed associations. For example, because pancreatic cancer is generally regarded as rapidly progressive and generally fatal, only an increase in incidence in the very short (e.g. <6 months) time window following diabetes onset may be due to reverse causality. However, when observations of an elevated risk of pancreatic cancer persist during longer periods of follow-up (i.e. up to 10 years after diabetes onset), it is unlikely to be due to reverse causality [17, 57].

figure d

Competing risks

Another consideration is the potential impact of competing risks, and the related concepts of immortal time bias and informative censoring. Immortal time is a period of follow-up in cohort studies during which, because of exposure definition, the outcome under study could not occur. For time-based, event-based and exposure-based cohort definitions, immortal time bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time [58]. Informative censoring occurs when follow-up is curtailed due to change in status related to the event of interest; in RCTs, this might occur as a result of a patient experiencing an adverse effect; in observational studies, it may arise owing to changes in drug therapy which may be related to subsequent outcome (e.g. early symptoms of cancer).

To study the incidence of cancer in patients with type 2 diabetes, it is necessary to ensure that follow-up is only among cancer-free individuals. This is more easily accomplished in the populations of Scandinavia, the UK and Canada, with long-standing cancer registration systems. In other settings where diabetes and cancer diagnosis come from the same systems, it may be necessary to include a ‘washout’ or cancer-free period prior to the follow-up for cancer occurrence among both diabetic patients and non-diabetic individuals.

Among diabetic patients, an increased risk of death due to other causes, most notably cardiovascular disease, would compete for the incidence of cancers, which may have a longer latency period. For example, improvements in cardiovascular risk reduction have led to reduced cardiovascular mortality in the general population as well as the diabetic population [59]. It might be expected that, as life expectancy is extended, we will see an increase in the number of cancer cases in the diabetic population, but we would only see an increase in cancer rates if the additional survivors are more susceptible to cancer, which is difficult to imagine. However, if one competing risk (cardiovascular disease, for example) decreases without change in the cancer rates, we will observe an increasing cumulative risk of cancer, and hence an apparent increase in the cancer burden.
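The distinction between cancer rates and cumulative cancer risk can be illustrated numerically. The sketch below assumes constant (exponential) hazards for cancer and for death from other causes, which is a simplification, and uses hypothetical rates. Under that assumption the cumulative incidence of cancer by time t is λc/(λc + λd) × (1 − exp(−(λc + λd)t)), where λc is the cancer rate and λd the competing mortality rate.

```python
import math

def cumulative_cancer_risk(cancer_rate: float, death_rate: float, years: float) -> float:
    """Cumulative incidence of cancer with death from other causes as a competing risk.

    Assumes both hazards are constant over time (rates per person-year).
    """
    total = cancer_rate + death_rate
    return cancer_rate / total * (1 - math.exp(-total * years))

cancer_rate = 0.02   # hypothetical: 20 per 1,000 per year, held fixed
high_mortality = cumulative_cancer_risk(cancer_rate, death_rate=0.04, years=20)
low_mortality = cumulative_cancer_risk(cancer_rate, death_rate=0.02, years=20)

print(f"20-year cancer risk, higher competing mortality: {high_mortality:.3f}")
print(f"20-year cancer risk, lower competing mortality:  {low_mortality:.3f}")
```

With these illustrative numbers the 20-year cumulative risk of cancer rises from about 23% to about 28% when competing mortality is halved, even though the cancer rate itself is unchanged.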

Another competing risk consideration is the development of multiple tumours in the same individual, including second primary cancers in the same organ/tissue and metastases of primary cancers, as well as secondary cancers in different tissues. It would also be important to consider the situation of paired organs or tissues; in the case of breast cancer, for example, should the development of a second cancer in the contralateral breast be considered a secondary cancer or a second primary cancer? Because having one cancer is associated with an increased risk of a subsequent cancer (primary or secondary), it would be important to consider censoring due to other cancers. For example, if an observational study is focused on breast cancer, would occurrence of a cervical cancer censor a woman from subsequent follow-up for breast cancer? We would recommend that it should, since diagnosis of one cancer influences not only the occurrence of other cancers (partly through treatment), but among the non-diabetic part of the population, diagnosis of a cancer may influence the diagnosis of diabetes. If an analysis were based on data from a registry for a specific cancer site, it would be important to know if eligibility was based on prior cancer history or not. It might therefore be prudent to consider the impact of censoring due to occurrence of any other cancer than the one in question, and assess this impact through sensitivity analysis.

figure e

Cancer screening

Cancer screening programmes are common in the developed world, based on evidence that such screening activities increase the early detection of cancers. The objective is to identify early and ‘silent’ cancers that can be managed early, which would improve prognosis. There are two important considerations with regard to cancer screening programmes: (1) the timing of implementation of new cancer screening programmes relative to the follow-up periods of observational studies; and (2) the degree of potential biases for or against cancer screening in diabetic populations, and whether any biases extend to users of specific glucose-lowering medications within the diabetic population. If there is evidence of bias against cancer screening in the diabetic population, we would expect to see fewer ‘silent cancers’ being detected, and thus, a lower cancer incidence, but when a cancer is finally detected, it is likely to be at a more advanced stage, and thus have a poorer prognosis.

figure f

Ascertainment bias

Ascertainment bias in relation to cancer diagnosis might exist as an increased detection of cancer in the diabetic population, particularly in the early stages following diabetes onset, since there is a heightened investigation of the newly diagnosed diabetic patient. Increased surveillance would also be expected to extend to longer-term follow-up, as diabetic patients are more likely to interact with the healthcare system (e.g. more frequent physician visits or hospitalisations). These effects would bias towards an elevated risk for cancer incidence in people with type 2 diabetes, and in particular in the period shortly after diabetes onset.

Recent observational studies that address the time-varying risk of cancer incidence following diabetes onset suggest that there is a substantial degree of detection bias in the diabetic population [57]. The general pattern of an initial elevated peak in cancer risk at the time of diabetes onset, which is substantially higher than subsequent risk, suggests that many cancers are being diagnosed at or around the time of recognition of diabetes. Interestingly, this pattern is generally observed for almost all cancers, with a subsequent levelling off of risk throughout the remainder of follow-up. For some cancers, the initially elevated risk decreased, but remained elevated in those with diabetes (e.g. for colorectal, liver and endometrial cancers), while for other cancers, the subsequent risk was the same as that observed in the non-diabetic subjects (i.e. lung, breast, cervical and ovarian cancers), and for prostate cancer, the risk was subsequently lower in men with diabetes [57].

There is limited evidence of the potential for surveillance bias later in the course of diabetes, and it is unclear what the trajectory of cancer incidence would be in this situation. Presumably such a bias would result in a greater number of cancers identified, with a resultant elevated incidence rate. It is not clear, however, if this would persist, or would eventually decline, as the pool of cancers in the diabetic population becomes ‘exhausted’, with the incidence rate subsequently falling to below that of the non-diabetic population.

figure g

Glucose-lowering treatments

The recent surge in attention surrounding the topic of type 2 diabetes and cancer is, in large part, due to reports of the possibility that glucose-lowering therapies may be involved in this relationship [2]. In brief, observational studies suggest a protective effect on cancer outcomes for metformin [60–63] but on the other hand, a potential increased cancer risk associated with exogenous insulin [61, 63–65], insulin analogues [65, 66] and sulfonylurea therapies [61, 63, 64]. Varying risks have been linked with glitazones, with some studies suggesting a reduced risk [67], but more recently an increased risk of bladder cancer has been linked with pioglitazone [68]. These general patterns are supportive of the hyperinsulinaemia hypothesis, where therapies that increase circulating insulin levels are associated with increased risk of cancer, while treatments that ameliorate insulin resistance and reduce circulating insulin levels are associated with decreased risk.

The observation that metformin-induced growth inhibition of experimental cancers in vivo is associated with a decline in both insulin levels and activation of insulin receptors of neoplastic tissue [69] is also consistent with an influence of insulin on cancer growth. Thus, the observed associations for those glucose-lowering drugs may be due to their direct or indirect effects on insulin resistance and levels of circulating insulin, although other mechanisms may also be involved, including, for example, effects on AMP-activated protein kinase (AMPK) signalling pathways [70].

Selection bias and confounding by indication are additional considerations [5, 71]. Confounding by indication arises when indications for therapy are differentially associated with risk factors for cancer. For instance, in patients with type 2 diabetes, oral glucose-lowering agents are used early on in the course of the disease, and insulin is reserved for patients who have not responded to oral agents and continue to have undesirable glycaemic control. Thus, for glucose-lowering therapies in type 2 diabetes, there is a selection of patients not responding to oral agents into the insulin-treated group. To the extent that non-response is associated with (risk factors for) cancer, confounding by indication arises.

Therefore, it is important to include the entire diabetes drug exposure history of individuals in analyses of cancer incidence. This includes cumulative drug exposures during defined time windows over the follow-up period, to assess the cumulative time on a drug as well as current drug exposure. Moreover, it should be considered whether the current drug exposure is the relevant covariate, or whether lagged exposure (e.g. exposure 1 year ago) is preferable. Accounting for the full cumulative drug exposure minimises concerns with immortal time bias [58]. In addition, events associated with escalation or switching of glucose-lowering therapy need to be carefully considered when assessing the effect of a newly started drug on cancer risk. The change of therapy may be a marker of enhanced health contact due to early cancer symptoms or metabolic effects of indolent disease, which can lead to a spurious association between drug use and cancer incidence.
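As a minimal sketch of how current, cumulative and lagged exposure might be derived from a drug history, the functions below take a list of treatment intervals for one person and evaluate exposure at a given follow-up time. The interval representation, the function names and the example history are assumptions for illustration, and the intervals are assumed not to overlap.

```python
def cumulative_exposure(intervals, at_time: float, lag: float = 0.0) -> float:
    """Years on drug accumulated up to (at_time - lag).

    intervals: list of (start, stop) times on the drug, in years of follow-up.
    """
    cutoff = at_time - lag
    return sum(max(0.0, min(stop, cutoff) - start) for start, stop in intervals)

def on_drug(intervals, at_time: float, lag: float = 0.0) -> bool:
    """True if the person was on the drug at (at_time - lag)."""
    cutoff = at_time - lag
    return any(start <= cutoff < stop for start, stop in intervals)

# Hypothetical history: on drug X during years 1 to 3 and years 4 to 6.5.
history = [(1.0, 3.0), (4.0, 6.5)]

print(cumulative_exposure(history, at_time=4.5))           # unlagged cumulative years
print(cumulative_exposure(history, at_time=4.5, lag=1.0))  # cumulative years, 1 year lag
print(on_drug(history, at_time=4.5))                       # current exposure
print(on_drug(history, at_time=4.5, lag=1.0))              # exposure 1 year ago
```

With a 1 year lag, the half year of treatment immediately preceding the evaluation time is ignored, so a drug started in response to early cancer symptoms contributes less to the exposure attributed to a cancer diagnosed shortly afterwards.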

Addressing these concerns requires knowledge of the duration and dosage of drug use, and hence, precludes the use of follow-up where duration is not known. The exclusion of individuals with missing data on duration of drug use is known as a ‘new users’ design [72]. The fraction of people who are not available for analysis because of missing information on drug duration will be smaller the longer the coverage period of the drug use databases, and so this is likely to diminish by calendar time. This will potentially influence results through the changing criteria for prescription of particular drugs of interest (i.e. the character of the confounding by indication will change by calendar time).

Drug exposure effects

To date, the majority of the evidence of the risks and benefits of glucose-lowering therapy and cancer outcomes is based on basic biomedical or epidemiological studies. More recently, a number of studies have evaluated cancer risks with different types of insulin [61, 67, 73, 74], fuelling speculation of an increased risk of cancer associated with the insulin analogue glargine (A21Gly,B31Arg,B32Arg human insulin), owing to its structural similarities to IGF-1. The newest glucose-lowering agents on the market, the so-called glucagon-like peptide-1 (GLP-1) based therapies, have also been associated with increased risks of thyroid and pancreatic cancers [75].

This controversial topic has been the subject of a number of editorials and commentaries [76–79] drawing increased attention to the relationship between diabetes and cancer, and a need to better understand the role played by different glucose-lowering therapies in this relationship. In the majority of the available observational studies, however, evidence of an increased or decreased risk is generated through relative comparisons of one class of glucose-lowering therapy against the others. For example, metformin is currently the first line therapy for all patients with type 2 diabetes, but historical data, prior to 1998–1999 and the publication of the landmark UK Prospective Diabetes Study (UKPDS) results, would have a mix of metformin and sulfonylurea as the first line oral therapies. Studies of glucose-lowering therapies need to explicitly clarify the reference population or exposures for which the drugs of interest are being compared.

Furthermore, given the non-random allocation of drug therapies in clinical practice, potential confounding by indication must be considered. Metformin acts to improve insulin sensitivity and has been associated with a reduction in fatal and non-fatal myocardial infarctions in overweight individuals [80] and is preferentially prescribed to overweight and obese individuals, who in turn are at increased risk of many cancers. If this is the case, however, we might expect to see an increased risk of cancer incidence in patients starting on metformin therapy. On the contrary, use of metformin has consistently been associated with a reduced risk of cancer [60, 61, 65, 81]. These findings are consistent regardless of whether weight or BMI was controlled for in statistical analyses. An alternative hypothesis is that any selective prescription of metformin that might bias its use towards those at increased risk of cancer is overwhelmed by the beneficial effect of metformin on tumorigenesis. Thus, as in any epidemiological study, it is important to consider the direction and magnitude of potential confounding factors when interpreting their role as alternative hypotheses giving rise to observed associations.

Glycaemic control

A particularly challenging issue related to confounding by indication is the absolute level of glycaemic control during the follow-up period, particularly given that hyperglycaemia itself may contribute to an increased risk of cancer [36, 38]. Progressive hyperglycaemia typically leads to complex patterns of multiple drug use and switching over time [82]. For example, because treatment type is correlated with disease severity and duration, it is difficult to design a convincing observational analysis that compares oral agents with insulin for outcomes that are themselves related to disease duration or severity. Therefore, when available, measures of glycaemic control should be included as covariates in analyses of drug use and cancer incidence. It would be particularly helpful to assess the degree of change in blood glucose over time associated with different drug exposures [7].

Confounding by indication is not conceptually different from confounding by other factors, and the approaches to detect and control for confounding (matching, stratification, restriction and multivariate adjustment) are the same [71]. Even after adjustment for known risk factors, residual confounding may occur because of measurement error or unmeasured or unknown risk factors. Although residual confounding is difficult to exclude in observational studies, there are limits to what this ‘unknown’ confounding can explain. The degree of confounding depends on the prevalence of the putative confounding factor, the strength of its association with the disease, and the strength of its association with the exposure. For example, a confounding factor with a prevalence of 20% would have to increase the relative odds of both outcome and exposure by factors of 4 to 5 before a relative risk of 1.5 would be reduced to 1.00. While the association between treatment failure with oral hypoglycaemic drugs and insulin exposure is presumably much stronger than this, it is unknown to what extent treatment failure per se is a strong predictor of cancer risk. However, prescribing physicians do take into account more factors than those available in the databases, and their thorough knowledge of patients may well produce confounding of this magnitude, although this remains purely speculative.
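The arithmetic behind the 20% example can be explored with the classical external-adjustment formula for a single unmeasured binary confounder. The sketch below is illustrative only and rests on assumptions that are not stated in the text: the 20% prevalence is taken to apply to the comparison (unexposed) group, the confounder multiplies outcome risk by the same factor in both groups, and the outcome is rare enough for odds ratios and risk ratios to be similar. All function and parameter names are hypothetical.

```python
def prevalence_in_exposed(p_unexposed: float, or_exposure: float) -> float:
    """Confounder prevalence among the exposed, given its prevalence among
    the unexposed and the confounder-exposure odds ratio."""
    odds = or_exposure * p_unexposed / (1.0 - p_unexposed)
    return odds / (1.0 + odds)


def confounding_bias_factor(p_unexposed: float,
                            or_exposure: float,
                            rr_outcome: float) -> float:
    """Ratio of the crude to the confounder-adjusted relative risk for one
    binary confounder (assumption: no effect modification)."""
    p_exposed = prevalence_in_exposed(p_unexposed, or_exposure)
    numerator = p_exposed * (rr_outcome - 1.0) + 1.0
    denominator = p_unexposed * (rr_outcome - 1.0) + 1.0
    return numerator / denominator


def adjusted_relative_risk(observed_rr: float,
                           p_unexposed: float,
                           or_exposure: float,
                           rr_outcome: float) -> float:
    """Observed relative risk divided by the bias factor."""
    return observed_rr / confounding_bias_factor(
        p_unexposed, or_exposure, rr_outcome)


if __name__ == "__main__":
    # Hypothetical scenario: observed RR of 1.5, confounder prevalence 20%,
    # and equal strength of association with exposure and outcome.
    for strength in (2, 3, 4, 5):
        rr = adjusted_relative_risk(1.5, 0.20, strength, strength)
        print(f"association strength {strength}: adjusted RR = {rr:.2f}")
```

Under these particular assumptions the adjusted estimate crosses the null when both associations are a little under 4; the exact threshold shifts with how the prevalence is defined and with the prevalence of exposure, so the output should be read as an order-of-magnitude check rather than a reproduction of the published figures.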

Differences in patterns of drug use will also arise between healthcare systems that provide different levels of coverage for different agents. Comparisons of different forms of insulin may be limited, for example, if human insulins are considered first line choices and more expensive long-acting insulin analogues are restricted to those individuals who experience undesirable hypoglycaemia. Thus, drug exposure recorded in a database is an approximate measure not only of drug use itself, but also of the processes of care surrounding that use.

RCT data

As is often the case with controversial reports from observational studies, calls have been made to look to a higher level of evidence, such as data from RCTs [76, 78]. To that end, a number of recent publications have contributed data from RCTs on this topic, including two meta-analyses of trials of long-acting insulin analogues [83, 84]. Both reports concluded that there was no evidence of an increased risk of cancer. However, the included trials involved small numbers of patients with type 1 and type 2 diabetes and were generally of very short duration, highlighting the major limitation of RCTs in answering questions regarding potential modification of cancer risk: lack of power. Other contributions have come from post-hoc analyses of large RCTs of glucose-lowering therapies [85].
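The scale of the power problem can be illustrated with Schoenfeld's approximation for the number of outcome events needed in a two-arm time-to-event comparison. This is a rough sketch, not a calculation taken from the cited trials: the hazard ratios, two-sided significance level, power and equal allocation are assumed values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio: float,
                    alpha: float = 0.05,
                    power: float = 0.80,
                    allocation: float = 0.5) -> int:
    """Approximate number of events needed to detect `hazard_ratio` with a
    two-sided test (Schoenfeld's formula). `allocation` is the proportion of
    participants randomised to one arm."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1.0 - allocation) * math.log(hazard_ratio) ** 2)
    return math.ceil(events)


if __name__ == "__main__":
    # Assumed effect sizes, chosen only for illustration.
    for hr in (2.0, 1.5, 1.2):
        print(f"HR {hr}: about {required_events(hr)} events needed")
```

With these assumed inputs, a hazard ratio of 1.5 requires roughly 190 cancer events across both arms, and smaller effects require far more. Short trials with few participants accrue only a handful of events and cannot exclude a clinically relevant change in risk.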

If we are to look to RCT data to inform the debate on specific glucose-lowering therapies and cancer risk, however, it is important to recognise that the recent large trials of glucose-lowering drugs were generally aimed at specific glycaemic targets, and therefore included protocol-driven escalation of therapy [86]. While randomisation would have ensured balance of the patients between original treatment arms, differences in the use of add-on rescue therapy between the arms may arise, presumably attributable to the degree to which the randomised therapies controlled blood glucose. In this way, the originally randomised drug that appears to be associated with increased (or decreased) cancer risk may have no direct effect on cancer risk, but, instead, the drug may not achieve adequate glucose control and thus induce the use of the add-on therapies that might independently increase (or decrease) cancer risk [86]. Some may also see this as a classic case of ‘confounding by indication’, which is likely to occur in observational studies where the reasons for additional therapies are not measured or recorded. Fortunately, this is not the case in the protocol-driven data collection of an RCT; the protocol dictates the use of add-on therapy, which is fully measured and recorded. In this view, accounting for the addition of rescue therapies known to be independently associated with cancer risk, perhaps with exposure definitions that recognise the cumulative and time-varying nature of these exposures, may provide a better estimate of the risk associated with the randomised therapies.
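One simple way to represent the cumulative and time-varying nature of rescue therapy is to recompute each participant's accrued exposure at every point during follow-up at which risk is assessed. The sketch below is a minimal illustration under stated assumptions: exposure periods are recorded as non-overlapping (start day, end day) intervals measured from randomisation, with the end day excluded, and all names are hypothetical rather than drawn from any trial dataset.

```python
from typing import Iterable, Tuple

# (start_day, end_day) relative to randomisation; end_day is exclusive.
Interval = Tuple[int, int]


def cumulative_exposure_days(intervals: Iterable[Interval], day: int) -> int:
    """Total days of add-on therapy accrued before follow-up day `day`.
    Assumes the intervals do not overlap one another."""
    total = 0
    for start, end in intervals:
        if end <= start:
            raise ValueError("each interval must have end_day > start_day")
        overlap = min(end, day) - start
        if overlap > 0:
            total += overlap
    return total


def currently_exposed(intervals: Iterable[Interval], day: int) -> bool:
    """Whether the participant is on add-on therapy on follow-up day `day`."""
    return any(start <= day < end for start, end in intervals)


if __name__ == "__main__":
    # Hypothetical participant with two courses of rescue therapy.
    rescue = [(100, 190), (300, 400)]
    for day in (50, 150, 250, 350):
        print(day,
              cumulative_exposure_days(rescue, day),
              currently_exposed(rescue, day))
```

Values computed in this way could be entered as time-updated covariates alongside the randomised treatment, so that person-time is classified by the exposure accrued up to each point rather than by the exposure eventually received.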

Summary and recommendations

To further advance our understanding of the association between diabetes and cancer incidence, and the potential role of glucose-lowering treatments, we encourage investigators to consider the various confounding and modifying factors and potential solutions to deal with these challenges (Table 1). Importantly, this includes considering common risk factors, achieving as complete a follow-up as possible, recognising potential selection bias, accounting for the cumulative exposure to all glucose-lowering therapies, and understanding exposure in the reference categories or populations. Because observational studies rest on assumptions about many of these factors, results will be best informed by testing these assumptions through sensitivity analyses and by demonstrating dose–response relationships.

Understanding the relationship between diabetes and cancer is perhaps one of the next major challenges for the clinical community [2]. Clearly, this also requires a better understanding of the role of glucose-lowering therapies. The ability to conduct rigorous observational studies of comparative effectiveness that produce valid results will depend heavily on the design of the study and the quality of the data. In general, there is a trade-off between the quality and the quantity of data available for observational studies. The value of large studies that use lower-quality data may be limited by their tendency to produce precise but biased estimates. RCTs, on the other hand, are not our best sources of evidence for rare adverse events. For this reason, we must continue to look to observational studies to inform this debate, seeking and sharing those with the highest quality data and analytical approaches that attempt to disentangle these complex exposure and outcome relationships.