Background

The increasing global prevalence of diabetes and cancer has significant public health implications. Epidemiological evidence suggests that people with diabetes are at a significantly higher risk of various cancers, including hepatic, pancreatic, endometrial, colorectal, bladder, and breast cancers, whereas men with diabetes have a lower prevalence of prostate cancer than those without diabetes [1]. Clinical evidence has also indicated a positive association between cancer and concomitant abnormalities in glucose metabolism. However, the potential biological links between these diseases are not completely understood.

Diabetic retinopathy (DR) is the most common microvascular complication in patients with diabetes and the leading global cause of vision loss in middle-aged working adults [2]. The pathological processes of DR include hyperglycemia-induced polyol pathway activation, advanced glycation end-product formation, protein kinase C activation, hexosamine pathway flux, and poly(ADP-ribose) polymerase activation, which share features with cancer initiation and progression [3,4,5,6,7]. Furthermore, oxidative stress, inflammation, vascular abnormalities, and angiogenesis are closely associated with the progression of DR and are also involved in cancer development [8,9,10]. These findings suggest that DR and cancer may share pathogenic features and that improving diabetes control may further reduce the risk of cancer development.

Given the similarities in pathogenesis, as well as the global burden and mortality of both diseases, additional large-scale longitudinal studies that stratify diabetes into DR and non-DR subtypes and examine the relationship between cancer and diabetes may help clarify the potential biological links between the two diseases. Therefore, this retrospective nationwide cohort study used data from the Taiwan National Health Insurance Research Database (NHIRD) to investigate the relationship of cancer with diabetes and with DR. Cox proportional hazards regression analyses were performed on two cohorts propensity-matched by age, sex, and comorbidities to reduce the confounding inherent in observational data.

Methods

Data sources

This nationwide, 1:1 matched, retrospective cohort study covered the period from January 2007 to December 2018. The study database comprised registry files and original claims data from the NHIRD, the Taiwan Cancer Registry (TCR), and the National Death Registry (NDR) of Taiwan. Taiwan launched a single-payer National Health Insurance program on March 1, 1995, and as of 2014, 99.9% of Taiwan's population was enrolled. The program's database contains registration files and original claims data for reimbursement. Starting in 2002, Taiwan's National Health Research Institutes established and have continued to maintain the NHIRD for public research purposes. The NHIRD collects data from almost all medical facilities in Taiwan and is a large, powerful data source for approved medical research [11]. Approximately 27.22 million individuals were included in this registry. All data in the database were encrypted to protect individual privacy. The database provides detailed outpatient and inpatient claims data, including patient identification number, birth date, sex, treatment information, dates of admission and discharge, date of death, and diagnostic codes according to the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) before 2016 and the 10th Revision (ICD-10-CM) from 2016 onward. Each patient has a unique encrypted identifier that can be linked to the TCR and NDR, and all datasets were interlinked using these identifiers.

Study cohort and patient selection

This study was approved by the Institutional Review Board of Tzu Chi Hospital, Hualien (TCHIRB109-108-C). For this retrospective study, informed consent was waived in accordance with institutional guidelines. Of the 27,228,099 patients in the NHIRD between January 1, 2007, and December 31, 2018, those with unknown sex (n = 1,867,827) or unknown age (n = 39,302) were excluded, because sex and age are two major matching variables in this study. Overall, 3,111,975 patients with primary diabetes (ICD-9-CM 250 or ICD-10-CM E10, E11) and 22,208,395 patients without diabetes were initially enrolled in the diabetes and non-diabetes groups, respectively, as the main study cohorts. Patients with secondary diabetes caused by factors that may also be independent risk factors for cancer (e.g., hepatitis B or C virus infection) were excluded. Patients in the diabetes group were further excluded if they had cancer before the diagnosis of diabetes (n = 170,398) or were aged < 20 years (n = 16,651). To reduce confounding by patient characteristics and comorbidities, the remaining patients with and without diabetes were matched 1:1 by age, sex, and Charlson Comorbidity Index (CCI), yielding 2,068,075 patients in each group. The index date was defined as the date of diabetes onset. After excluding those whose end date preceded the index date (n = 138,339) and those with less than 1 year of follow-up (n = 178,279), 1,751,457 matched pairs remained. In addition to the main study cohort, stratified populations of patients with diabetes with DR (ICD-9-CM 362.0X or ICD-10-CM E10.3X, E11.3X; n = 380,822) and without DR (n = 380,822) were obtained.
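The matching step described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the text describes propensity matching, whereas this sketch swaps in simple exact 1:1 matching without replacement on (age, sex, CCI) strata. The `Person` record, its fields, and `match_one_to_one` are hypothetical names. The last lines reproduce the post-matching exclusion arithmetic reported above.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str  # hypothetical encrypted identifier
    age: int  # age in years
    sex: str  # "M" or "F"
    cci: int  # Charlson Comorbidity Index score


def match_one_to_one(cases, controls, seed=0):
    """Pair each case with one unused control sharing the same (age, sex, CCI).

    Exact matching without replacement; cases with no available control are dropped.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in controls:
        pool[(person.age, person.sex, person.cci)].append(person)
    for bucket in pool.values():
        rng.shuffle(bucket)  # pick randomly among eligible controls
    pairs = []
    for case in cases:
        bucket = pool.get((case.age, case.sex, case.cci))
        if bucket:  # at least one unused control left in this stratum
            pairs.append((case, bucket.pop()))
    return pairs


cases = [Person("d1", 60, "M", 2), Person("d2", 45, "F", 0), Person("d3", 80, "F", 5)]
controls = [Person("c1", 60, "M", 2), Person("c2", 45, "F", 0), Person("c3", 45, "F", 0)]
pairs = match_one_to_one(cases, controls)
print(len(pairs))  # 2: "d3" has no control in its stratum

# Post-matching exclusions reported in the text (counts per group)
remaining = 2_068_075 - 138_339 - 178_279
print(remaining)  # 1751457
```

Exact stratum matching is used here only because it needs no fitted model; a propensity-score approach would first estimate each person's probability of having diabetes from the covariates and then match on that score.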
Patients with DR were further stratified into proliferative DR (PDR, n = 141,150) and non-proliferative DR (NPDR, n = 141,150) groups according to the presence (ICD-9-CM 362.02 or ICD-10-CM E10.35X, E11.35X) or absence (ICD-9-CM 362.01 or ICD-10-CM E10.32X, E10.33X, E10.34X, E11.32X, E11.33X, E11.34X) of retinal neovascularization. All stratifications used exclusion criteria and matching procedures similar to those of the main cohorts. The detailed data flow of the study is shown in Fig. 1.
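A minimal sketch of how DR stage could be assigned from the diagnosis codes listed above, assuming codes arrive as strings with or without the dot (e.g., "E11.3551"). The function names and the "DR, stage unspecified" label for DR codes outside the PDR and NPDR lists are assumptions of this sketch. Taking the most severe code across all of a patient's records approximates the rule, stated under outcome measures, that a participant is graded by the more severely involved eye.

```python
# Code prefixes taken from the lists in this section (dots removed before comparison).
PDR_PREFIXES = ("36202", "E1035", "E1135")
NPDR_PREFIXES = ("36201", "E1032", "E1033", "E1034", "E1132", "E1133", "E1134")
DR_PREFIXES = ("3620", "E103", "E113")  # any DR code (ICD-9-CM 362.0X, ICD-10-CM E10.3X/E11.3X)

# Higher number = more severe; used to pick the worse eye / worse record.
SEVERITY = {"no DR": 0, "DR, stage unspecified": 1, "NPDR": 2, "PDR": 3}


def normalize(code: str) -> str:
    """Upper-case and drop the dot, e.g. 'e11.3551' -> 'E113551'."""
    return code.strip().upper().replace(".", "")


def stage_of_code(code: str) -> str:
    """Map one diagnosis code to a DR stage (PDR checked first, then NPDR)."""
    c = normalize(code)
    if c.startswith(PDR_PREFIXES):
        return "PDR"
    if c.startswith(NPDR_PREFIXES):
        return "NPDR"
    if c.startswith(DR_PREFIXES):
        return "DR, stage unspecified"
    return "no DR"


def patient_stage(codes) -> str:
    """Most severe stage among all of a patient's codes."""
    return max((stage_of_code(c) for c in codes), key=SEVERITY.__getitem__, default="no DR")


print(patient_stage(["E11.3551", "E11.3291"]))  # PDR
print(patient_stage(["362.01", "250.00"]))      # NPDR
```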

Fig. 1

Study protocol and profile. A The selection flow chart and selected populations for the diabetes group and the control cohort. B The stratification flow chart and the stratified populations for diabetes with DR vs. diabetes without DR. C The stratification flow chart and the stratified populations for PDR vs. non-PDR. DM, diabetes mellitus; DR, diabetic retinopathy; PDR, proliferative diabetic retinopathy; y/o, years old

Outcome measures

The study endpoint was the first incidence of cancer at any site during follow-up, as identified in the TCR. Only the first occurrence of cancer was counted when calculating cancer incidence. Cancer sites were defined by the corresponding ICD-10-CM codes as follows: lip, oral cavity, and pharynx (C00–C14); digestive organs (C15–C26), including the esophagus (C15), stomach (C16), colon (C18), liver (C22), gallbladder (C23), and pancreas (C25); respiratory and intrathoracic organs (C30–C39); bone and articular cartilage (C40–C41); skin (C43–C44); mesothelial and soft tissue (C45–C49); breast (C50); female genital organs (C51–C58); male genital organs (C60–C63); urinary tract (C64–C68), including the kidney (C64) and other urinary organs (C65–C68); eye (C69); brain and other parts of the central nervous system (C70–C72); and lymphoid, hematopoietic, and related tissue (C81–C96). For DR staging, if a participant had lesions of different severity in the two eyes, the grade of the more severely involved eye was assigned. All outcomes were assessed during the follow-up period between the index date and December 31, 2018. Baseline comorbidities were identified using ICD-9-CM codes and included CCI, hypertension (401.X–405.X, 437.2, 362.11), and hyperlipidemia (272.X).
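The site definitions above amount to a lookup on the two-digit number of an ICD-10 C-code. The sketch below assumes codes arrive as strings such as "C18.7"; `classify_site`, `first_cancer`, and the "Unclassified" and "Other sites" labels are hypothetical, and codes outside the listed ranges (e.g., C76) fall into "Other sites". In line with the first-occurrence rule, `first_cancer` keeps only the earliest diagnosis record.

```python
import re
from datetime import date

# (first, last, site group) for the three-character ICD-10 C-code ranges listed above.
SITE_GROUPS = [
    (0, 14, "Lip, oral cavity, and pharynx"),
    (15, 26, "Digestive organs"),
    (30, 39, "Respiratory and intrathoracic organs"),
    (40, 41, "Bone and articular cartilage"),
    (43, 44, "Skin"),
    (45, 49, "Mesothelial and soft tissue"),
    (50, 50, "Breast"),
    (51, 58, "Female genital organs"),
    (60, 63, "Male genital organs"),
    (64, 68, "Urinary tract"),
    (69, 69, "Eye"),
    (70, 72, "Brain and other parts of the CNS"),
    (81, 96, "Lymphoid, hematopoietic, and related tissue"),
]
SUBSITES = {15: "Esophagus", 16: "Stomach", 18: "Colon", 22: "Liver",
            23: "Gallbladder", 25: "Pancreas", 64: "Kidney"}
SUBSITES.update({n: "Other urinary organs" for n in range(65, 69)})


def classify_site(icd10: str):
    """Return (site group, subsite or None) for a code such as 'C18.7'."""
    m = re.match(r"C(\d{2})", icd10.strip().upper())
    if m is None:
        return ("Unclassified", None)
    n = int(m.group(1))
    for lo, hi, group in SITE_GROUPS:
        if lo <= n <= hi:
            return (group, SUBSITES.get(n))
    return ("Other sites", None)


def first_cancer(records):
    """Earliest (diagnosis date, code) tuple; later cancers are ignored."""
    return min(records, default=None)


print(classify_site("C18.7"))  # ('Digestive organs', 'Colon')
print(first_cancer([(date(2015, 3, 1), "C50.9"), (date(2012, 7, 9), "C18.7")]))
```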

Statistical analysis

Baseline characteristics, including age, sex, hypertension, hyperlipidemia, and CCI score, were compared between the two study groups using the standardized mean difference (SMD). The incidence rate of cancer was calculated per 100,000 person-years, and the incidence ratio between the two study groups was calculated. Cox proportional hazards models were used to estimate adjusted hazard ratios (HRs) with 99% confidence intervals (CIs). Increased cancer risk was classified as follows: borderline, HR between 1.10 and 1.19; moderate, HR between 1.20 and 1.49; and high, HR ≥ 1.50. All models were adjusted for the characteristics listed in Table 1. Data analyses were performed using SAS version 9.4 for Windows (SAS Institute, Inc., Cary, NC, USA). All statistical tests were two-sided; a p-value < 0.01 was considered statistically significant, and an SMD > 0.1 was considered to indicate a meaningful difference between groups.
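The balance and incidence calculations described above can be sketched as follows. The SMD uses the pooled standard deviation sqrt((s1² + s2²)/2), a common convention that this section does not spell out, so treat it as an assumption; the rates are crude (unadjusted), and all input numbers in the example are hypothetical rather than study data.

```python
import math
from statistics import mean, variance


def smd_continuous(x, y):
    """SMD for a continuous variable (e.g. age), pooled SD = sqrt((s1^2 + s2^2) / 2)."""
    pooled_sd = math.sqrt((variance(x) + variance(y)) / 2)
    return (mean(x) - mean(y)) / pooled_sd


def smd_binary(p1, p2):
    """SMD for a proportion (e.g. share of men, or of patients with hypertension)."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


def is_imbalanced(smd, threshold=0.1):
    """Flag a covariate whose absolute SMD exceeds the 0.1 threshold."""
    return abs(smd) > threshold


def incidence_per_100k(events, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return events * 100_000 / person_years


def incidence_ratio(events_a, py_a, events_b, py_b):
    """Ratio of two crude incidence rates (group A relative to group B)."""
    return incidence_per_100k(events_a, py_a) / incidence_per_100k(events_b, py_b)


# Hypothetical numbers, not data from this study:
print(round(smd_continuous([2, 4, 6], [1, 3, 5]), 3))  # 0.5
print(incidence_per_100k(500, 40_000))                  # 1250.0
print(incidence_ratio(500, 40_000, 400, 40_000))        # 1.25
```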

Table 1 Characteristics of the study population

Results

The demographic characteristics and comorbidities of all cohorts are shown in Table 1.

Diabetes versus non-diabetes

During the 12-year follow-up period, the overall mean annual incidence of total cancer was higher in patients with diabetes than in those without diabetes (1309.74 vs. 1130.13 per 100,000 person-years; incidence ratio, 1.17) (Table 2).

Table 2 Incidence of events (100,000 person-years)

In the multivariate survival analysis, diabetes (HR, 1.20; 99% CI: 1.19–1.21; p < 0.001) and CCI (HR, 1.23; 99% CI: 1.22–1.24; p < 0.001) were associated with a moderately increased risk of subsequent total cancer, and male sex (HR, 1.19; 99% CI: 1.18–1.20; p < 0.001) and hypertension (HR, 1.10; 99% CI: 1.09–1.11; p < 0.001) with a borderline increased risk. In contrast, hyperlipidemia was independently associated with a decreased risk of subsequent total cancer (HR, 0.86; 99% CI: 0.85–0.87; p < 0.001).
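The risk tiers defined in the Statistical analysis section can be applied mechanically to the estimates above. In this sketch, `risk_tier` is a hypothetical helper; treating an estimate as significant only when its 99% CI excludes 1, and the labels for decreased risk and for significant HRs below 1.10, are assumptions rather than rules stated in the paper.

```python
def risk_tier(hr, ci_low, ci_high):
    """Label an adjusted HR with the borderline/moderate/high tiers.

    Assumption: an estimate counts as significant only if its 99% CI excludes 1.
    """
    if ci_high < 1.0:
        return "decreased"
    if ci_low <= 1.0:
        return "not significant"
    if hr >= 1.50:
        return "high"
    if hr >= 1.20:
        return "moderate"
    if hr >= 1.10:
        return "borderline"
    return "increased, below the borderline range"


# Estimates (HR, lower and upper 99% CI bounds) reported in the paragraph above
estimates = [("diabetes", 1.20, 1.19, 1.21), ("CCI", 1.23, 1.22, 1.24),
             ("male sex", 1.19, 1.18, 1.20), ("hypertension", 1.10, 1.09, 1.11),
             ("hyperlipidemia", 0.86, 0.85, 0.87)]
for name, hr, lo, hi in estimates:
    print(name, risk_tier(hr, lo, hi))
# diabetes moderate, CCI moderate, male sex borderline,
# hypertension borderline, hyperlipidemia decreased
```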

Patients with diabetes had a markedly increased risk of subsequent liver (HR, 1.69; 99% CI: 1.63–1.74; p < 0.001) and pancreatic (HR, 1.87; 99% CI: 1.73–2.02; p < 0.001) cancers. We also observed a moderately increased risk of cancers of the oral cavity and pharynx (HR, 1.30; 99% CI: 1.24–1.36; p < 0.001), colon (HR, 1.25; 99% CI: 1.21–1.29; p < 0.001), gallbladder (HR, 1.34; 99% CI: 1.20–1.50; p < 0.001), female genital organs (HR, 1.30; 99% CI: 1.22–1.37; p < 0.001), kidney (HR, 1.44; 99% CI: 1.34–1.53; p < 0.001), and brain and other parts of the central nervous system (HR, 1.31; 99% CI: 1.17–1.48; p < 0.001). Furthermore, there were borderline increases in the risk of stomach (HR, 1.19; 99% CI: 1.13–1.26; p < 0.001), skin (HR, 1.17; 99% CI: 1.09–1.25; p < 0.001), mesothelial and soft tissue (HR, 1.18; 99% CI: 1.02–1.37; p = 0.003), female breast (HR, 1.17; 99% CI: 1.11–1.22; p < 0.001), and urinary tract (except kidney) cancers (HR, 1.17; 99% CI: 1.10–1.25; p < 0.001) and of lymphatic and hematopoietic malignancies (HR, 1.19; 99% CI: 1.13–1.26; p < 0.001). Conversely, patients with diabetes had a lower risk of subsequent esophageal cancer than those without diabetes (HR, 0.83; 99% CI: 0.76–0.92; p < 0.001) (Table 3).

Table 3 Predictors of total cancer and cancer in specific sites by multivariate analysis

Diabetes with DR versus diabetes without DR

During the 12-year follow-up period, the overall mean annual incidence of total cancer was significantly higher in patients with diabetes and DR than in those with diabetes but without DR (1494.33 vs. 1151.51 per 100,000 person-years; incidence ratio, 1.32) (Table 2).

In the multivariate survival analysis, DR was independently associated with an increased risk of subsequent total cancer among patients with diabetes (HR, 1.31; 99% CI: 1.28–1.34; p < 0.001). Male sex was also associated with a moderately higher incidence of subsequent total cancer (HR, 1.25; 99% CI: 1.23–1.28; p < 0.001), whereas hypertension and hyperlipidemia were not. Regarding cancer sites, patients with DR showed a significantly increased risk of subsequent liver, mesothelial and soft tissue, and urinary tract cancers, and a moderately increased risk of cancers of the lip, oral cavity and pharynx, stomach, colon, pancreas, respiratory and intrathoracic organs, skin, female breast, and lymphatic and hematopoietic tissue. Patients with DR also showed a trend toward an increased risk of subsequent esophageal, gallbladder, bone and articular cartilage, male genital, and eye cancers, but these increases did not reach statistical significance in the multivariate analysis (Table 3).

Development of cancer in different stages of DR

The overall mean annual incidence of total cancer was higher in PDR patients than in NPDR patients (1464.64 vs. 1329.56 per 100,000 person-years; incidence ratio, 1.13). In the multivariate analysis, PDR patients had a higher risk of subsequent total cancer than NPDR patients (HR, 1.13; 99% CI: 1.10–1.17; p < 0.001).

Regarding cancer sites, PDR patients showed a moderately increased risk of stomach, liver, female genital, and urinary tract cancers and a borderline increased risk of colon cancer compared with NPDR patients. Patients with PDR also showed a trend toward an increased risk of subsequent cancers of the lip, oral cavity and pharynx, gallbladder, respiratory and intrathoracic organs, bone and articular cartilage, skin, and lymphatic and hematopoietic tissue, but these increases did not reach statistical significance in the multivariate analysis. In contrast, PDR patients showed a decreased risk of male genital cancer compared with NPDR patients (Table 3).

Discussion

In this study, diabetes was associated with a 20% higher risk of subsequent cancer at any site compared with no diabetes. Notably, to our knowledge, this study is the first to show that patients with DR had a 32% higher cancer incidence than those without DR. Furthermore, patients with PDR had a 13% higher risk of cancer than patients with NPDR. Based on a nationwide cohort, our findings add to the body of evidence on the relationship between diabetes and cancer risk and confirm prior research on this association [12,13,14,15,16,17,18]. DR is the most common microvascular complication in patients with diabetes, and the risk of developing it increases with patient age, diabetes duration, and poor glycemic control [2]. A meta-analysis of nineteen studies comparing persons with high versus low serum glucose levels (cut-off > 6.1 mmol/L) showed a positive association between serum glucose and cancer risk, with a pooled relative risk (RR) of 1.32 (95% CI: 1.20–1.45) [19]. Furthermore, sudden fluctuations in blood glucose may play an important role in DR; therefore, glycemic variability (GV) may be useful in predicting diabetic complications such as DR [20]. Our findings are consistent with those of a recent prospective cohort study of 15,286 participants, which showed that high GV was associated with an increased risk of all-site, breast, and liver cancer and of cancer-specific death in patients with diabetes [21].

The exact link between diabetes and cancer development remains unclear. Although claims data are not designed to support biological conclusions, these results raise the hypothesis that DR and cancer share some pathogenic features. Patients with DR have significantly higher serum levels of vascular endothelial growth factor (VEGF) and angiopoietin-2 (Ang-2) than individuals without DR [22, 23]. Interestingly, both tumorigenesis and DR involve VEGF- and Ang-2-mediated pathways, and pharmaceutical agents targeting these factors have been effective in treating both diseases [24,25,26]. Moreover, VEGF and Ang-2 promote endothelial expression of intercellular adhesion molecule 1, leading to leukocyte activation and cytokine release, which further increase VEGF expression and amplify the inflammatory response [22, 27].

In addition to VEGF and Ang-2, several pathophysiological features are shared by DR and cancer. First, pericyte loss is the earliest histopathological hallmark of DR; possible mechanisms of pericyte apoptosis in DR include increased oxidative stress and nuclear factor-κB (NF-κB) activation [28]. Pericytes are also implicated as mediators of several processes in cancer pathophysiology, including tumor angiogenesis and metastasis [29]. Additionally, NF-κB is a central molecule linking chronic inflammation to cancer; it is activated in cancer cells and tumor microenvironments in most solid cancers and hematopoietic malignancies [30]. Furthermore, platelet-derived growth factors (PDGFs) regulate cell growth and division. Increased PDGF levels are a main pathological characteristic of DR, in which the inflammatory and angiogenic effects of PDGFs impair endothelial migration and proliferation [31]. Intriguingly, overactive PDGF signaling is associated with the development of numerous types of malignancies [32].

Systemic inflammation is an intrinsic response to diabetes and can promote or increase the risk of many different cancers, including liver, pancreatic, colon, breast, and other malignancies [33]. Several inflammatory mediators play roles in both DR and cancer. Our results revealed that patients with diabetes tended to have a greater cancer risk than their matched controls and that this trend intensified when DR developed. DR development involves breakdown of the blood-retinal barrier. Increasing evidence supports the role of proinflammatory cytokines, chemokines, and other inflammatory mediators in the pathogenesis of DR [34,35,36,37], which leads to persistent low-grade inflammation. The inflammatory mediators released in DR may further trigger cancer pathogenesis, thereby increasing the likelihood of cancer.

The natural history of DR has been divided into two stages based on the proliferative status of the retinal vasculature: early NPDR and advanced PDR. Our findings showed that patients with PDR had a higher overall mean annual cancer incidence than those with NPDR. PDR patients have significantly higher serum levels of interleukin (IL)-1β, tumor necrosis factor α, IL-6, VEGF, and matrix metalloproteinases (MMPs) than NPDR patients [38, 39]. Several studies have demonstrated a positive correlation between MMP expression and the invasive and metastatic potential of malignant tumors [40]. Furthermore, transforming growth factor β1 (TGFβ1) is a pro-inflammatory cytokine implicated in the pathogenesis of DR, particularly in the late phase of the disease. TGFβ released and activated within the tumor microenvironment promotes cancer progression; enhanced TGFβ signaling promotes cancer cell invasion and dissemination and reduces sensitivity to anticancer drugs [41]. The more severe pathology and inflammation in PDR than in NPDR are consistent with our findings. In addition, cancer-associated retinopathy is a rare paraneoplastic disorder in which loss of visual acuity is caused by circulating antibodies against retinal proteins in the presence of systemic cancer [42]. Because our patients had DR before developing cancer, their retinopathy has a different etiological basis from cancer-associated retinopathy.

We found a significantly lower incidence of all-site cancer in patients with hyperlipidemia than in those without hyperlipidemia, with the exception of female breast cancer. Epidemiological studies have reported inconsistent results regarding the association between hyperlipidemia and cancer incidence [43,44,45,46,47,48,49,50,51,52,53]. Cancer has protean physiological effects, which might include metabolic depression of blood cholesterol [54]; competing risks may also contribute, because patients with high total serum cholesterol (TSC) levels are more likely to be censored owing to cardiovascular mortality before they are diagnosed with cancer [55]. However, some studies found inverse associations even with a time lag of ≥ 4 years between baseline cholesterol level and cancer diagnosis [48, 55, 56]; thus, a direct effect of cholesterol on cancer cannot be completely ruled out. In our study, hyperlipidemia was positively associated with breast cancer in women. Previous animal studies have suggested that increased plasma cholesterol levels might accelerate breast cancer development and exacerbate its aggressiveness [57]. Our results are also consistent with those of a large prospective cohort study of Korean adults [50], which showed that high TSC levels were positively associated with breast cancer risk in women.

This study has several strengths. First, the NHIRD contains all claims data recorded electronically, which supports data accuracy and avoids recall bias. Second, NHIRD data provide population-based, representative claims information for insured people in Taiwan, reducing the likelihood of selection bias. Third, the large dataset and longitudinal design provided considerable statistical power to detect differences between the study cohorts. A longitudinal cohort design also has advantages over cross-sectional or case–control designs because it allows researchers to examine the natural course of cancer development over an extended period. However, shortcomings of this design, such as death as a competing risk for the outcome, should be considered.

Study limitations

First, the NHIRD is an administrative database that lacks laboratory results, such as HbA1c, and cannot differentiate between subtypes of hyperlipidemia (hypercholesterolemia, hypertriglyceridemia, or both in combined hyperlipidemia). Some important risk factors for cancer, such as education level, drinking and smoking habits, body mass index, physical activity, and family history of cancer, are not available in the NHIRD and might have confounded our results. Second, although a retrospective cohort study is more efficient than a prospective study, some potential risk factors could not be obtained owing to its retrospective nature. Third, early cancer might have been asymptomatic and undiagnosed, which could have led to misclassification bias. Such non-differential misclassification could bias our results toward the null and dilute the true difference in cancer incidence between the two cohorts. Fourth, early-stage diabetes might also have been underdiagnosed, which could have resulted in group misclassification. Fifth, this study lacked information on patients' use of cholesterol-lowering drugs; however, previous studies have not suggested a strong link between the use of these drugs and cancer incidence [58, 59]. Sixth, most of our study participants were ethnically Chinese from Taiwan, which might limit the generalizability of our results to other ethnic groups. Finally, this epidemiological study only supports an association between diabetes and tumorigenesis. Direct evidence of a causal relationship between these two diseases is not feasible to obtain in human studies, given the long follow-up needed to observe the outcome (cancer). Findings from relevant animal models may provide further evidence to support this relationship.
Analyzing serum biomarkers related to cancer development may provide indirect evidence to support this concept. However, the NHIRD does not contain laboratory data, and clinical tests on patient samples were beyond the scope of this study.

Conclusions

This nationwide population-based study provides evidence reinforcing an association between diabetes and overall cancer incidence. To our knowledge, this is the first study to explore the association between DR and cancer. Our results provide novel insight by showing that patients with DR were at a significantly greater risk of subsequent cancer development at specific sites than their matched controls. These results raise the possibility that diabetes and DR share common pathogenic features with cancer and that strict blood glucose control to prevent DR in patients with diabetes may further reduce cancer development. Further studies are required to better understand the underlying mechanisms.