Overview
We used the Scottish Care Information-Diabetes Collaboration (SCI-DC), a clinical diabetes database that covers the majority of the Scottish population with diagnosed diabetes, to define a cohort of patients with diabetes who were exposed to any insulin therapy from 1 January 2002 (the year of introduction of insulin glargine) up until 31 December 2005. Data for all patients receiving an insulin prescription during this period were extracted from the SCI-DC database and linked to cancer registry data that were available up to the end of 2005. The incidence of all cancers and cancers at specific sites (breast, colon, prostate, pancreas, lung) was compared between those who did and did not receive insulin glargine.
Data used
SCI-DC
Across Scotland, almost all adult patients with a diagnosis of type 1 or type 2 diabetes are registered on the SCI-DC database, which has been available Scotland-wide since 2000. The estimated coverage of the total adult diabetic population is approximately 99%. The database exists at Health Board level for all Health Boards in Scotland, and each patient record contains a unique identifier, the Community Health Index number. The database captures key diabetes-related data items from hospital clinics, most of which use SCI-DC as their main clinical record system for diabetes. Some hospital clinics use other systems, but these update key items in the SCI-DC database. The database also receives nightly updates of certain data fields from primary healthcare systems, including prescriptions that have been issued. The prescribing data available for this analysis were restricted to the name of the drug prescribed and the date of prescription. Data on dose and directions for use are not yet available. In addition to data recorded from the date a patient was first entered onto the database, it contains extensive retrospective data uploaded from other electronic healthcare records at that initial entry. The fields extracted from SCI-DC for patients in this analysis were all prescribed diabetes-related drugs (British National Formulary sections 6.1.1 and 6.1.2, i.e. all insulin, biguanides, sulfonylureas and other oral glucose-lowering drugs), age, sex, BMI, age at diagnosis, type of diabetes as designated by clinician, smoking history and Scottish Index of Multiple Deprivation (SIMD) score derived from postcode. The SIMD score is a geographic indicator of deprivation that is based on 37 indicators across the domains of current income, employment, health, education, skills and training, housing, geographic access and crime [10].
In the present study, quintiles of the SIMD score were used, with the lowest quintile representing the most deprived of our cohort. As of 2009, 219,965 live patients with diabetes in Scotland were registered on this database. For the purpose of these analyses, type of diabetes was categorised as definite type 1 if age at diagnosis or age at first insulin therapy was below 30 years, as definite type 2 if age at diagnosis and age at first drug treatment were both 35 years or above, and as indeterminate if age at diagnosis was 30–35 years.
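The age-based categorisation above can be illustrated with a minimal sketch. The function and argument names are hypothetical, and the order in which the rules are checked (type 1 first, then type 2, otherwise indeterminate) is an assumption made here to resolve the boundary at exactly 35 years, which the text leaves open.

```python
from typing import Optional


def classify_diabetes_type(
    age_at_diagnosis: float,
    age_first_insulin: Optional[float],
    age_first_drug: Optional[float],
) -> str:
    """Assign a diabetes type from ages in years (illustrative sketch only).

    Assumption: rules are evaluated in the order type 1, type 2, indeterminate,
    so a patient diagnosed at exactly 35 is treated as type 2 if first drug
    treatment was also at 35 or above.
    """
    # Definite type 1: diagnosis OR first insulin use before age 30.
    if age_at_diagnosis < 30 or (
        age_first_insulin is not None and age_first_insulin < 30
    ):
        return "type 1"
    # Definite type 2: diagnosis AND first drug treatment at age 35 or above.
    if (
        age_at_diagnosis >= 35
        and age_first_drug is not None
        and age_first_drug >= 35
    ):
        return "type 2"
    # Everything else, including diagnosis between 30 and 35 years.
    return "indeterminate"
```

For example, under these assumptions a patient diagnosed at 32 who started insulin at 33 would be classed as indeterminate.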
Cancer Register (Scottish Morbidity Record)
The Scottish Cancer Registry was set up in 1958 and has been managed by the Information Services Division of NHS National Services Scotland since 1997. The registry receives notification of cancer from hospital systems, including discharges, radiotherapy, oncology, haematology and pathology records, prospective audit datasets, deaths from the General Register Office for Scotland, and paper records from private hospitals. Other staff verify the notification, validate the information already held and abstract additional information from hospital medical records and local hospital systems before the data are finalised. Data quality is monitored using routine indicators, computer validation, ad hoc studies of data accuracy and completeness of ascertainment, and through data exchange with specialist registries. A recent study estimated that breast cancer ascertainment exceeds 98% [11]. Of the data items reported, we used the following in this analysis: date of diagnosis, site of tumour and mortality. The International Classification of Diseases, 9th and 10th revisions (ICD-9 and -10), were used to code the site of previous and incident cancers. For this analysis, all tumours other than non-melanoma skin cancer were captured, i.e. all ICD-10 C codes except C44 (non-melanoma skin cancer). Incident prostate cancer was defined as ICD-10 code C61 and its subcodes; colorectal cancer was defined as ICD-10 codes C18, C19 and C20 and their subcodes; pancreatic cancer was defined as ICD-10 code C25 and its subcodes. Breast cancer was defined as ICD-10 code C50, and lung cancer was defined as ICD-10 codes C33 and C34 and their subcodes. There is typically a lag time of about 2–3 years in the availability of validated data for research purposes, so data were only available up to the end of 2005. The completeness and accuracy of the Cancer Register data have been extensively validated: recent estimates for accuracy and sensitivity for breast cancer are 95.7% and 97.8%, respectively [12].
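As an illustration of the ICD-10 site definitions above, the following sketch maps a code to one of the five specific sites and flags whether it counts towards the all-cancer outcome. The function names and the assumption that codes arrive as strings such as "C50.9" or "C509" are hypothetical; only the code-to-site mapping comes from the text.

```python
from typing import Optional

# Three-character ICD-10 categories named in the text, mapped to site labels.
SITE_BY_CATEGORY = {
    "C61": "prostate",
    "C18": "colorectal",
    "C19": "colorectal",
    "C20": "colorectal",
    "C25": "pancreas",
    "C50": "breast",
    "C33": "lung",
    "C34": "lung",
}


def icd10_category(code: str) -> str:
    """Reduce a code such as 'C50.9' or 'c509' to its category, 'C50'."""
    return code.strip().upper().replace(".", "")[:3]


def is_included_cancer(code: str) -> bool:
    """True for any ICD-10 C code except C44 (non-melanoma skin cancer)."""
    category = icd10_category(code)
    return category.startswith("C") and category != "C44"


def cancer_site(code: str) -> Optional[str]:
    """Return the specific site label, or None for other or non-cancer codes."""
    return SITE_BY_CATEGORY.get(icd10_category(code))
```

Matching on the three-character category means that subcodes are included automatically.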
Deaths (General Register Office for Scotland deaths)
All deaths that occur in Scotland are recorded in the death register of the General Register Office for Scotland (GROS) on completion of a death certificate. For each death, the GROS assigns a single code for the underlying cause of death and, depending on what was written on the death certificate, may assign several other codes for other factors that contributed to death. The ICD-10 coding system is used for data from 2000 onwards. For this analysis, we extracted all death records in which there was any mention of cancer in the fields concerning underlying cause of death and contributory cause of death.
Governance and ethics
As part of the core programme of work of the Scottish Diabetes Research Network (SDRN) Epidemiology Group, approval was obtained for anonymised linkage of SCI-DC data to specified, centrally held health data sets. Approval was obtained from the Scotland A Research Ethics Committee, the Caldicott Guardians for all 14 Health Boards and the Privacy Advisory Committee of the Information Services Division (ISD), and covered linkage to cancer registration data held at ISD.
Linkage methods
Linkage is carried out by ISD, and researchers only have access to anonymised data. Two approaches to linkage were used: exact linkage and probabilistic linkage. Exact linkage was performed using the Community Health Index number, which is noted on all SCI-DC records and most cancer registry records but not on GROS death records. When a record did not have a Community Health Index number, probabilistic linkage was performed using a selection of identifiers common across datasets. In Scotland, both the false-positive rate (the proportion of pairs that are incorrectly linked) and the false-negative rate (the proportion of pairs that the system fails to link) for this approach are less than 3% [13].
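To make the two linkage approaches concrete, the sketch below uses exact matching on the Community Health Index number when both records carry one and otherwise falls back to a generic agreement-weight score of the kind used in probabilistic record linkage. It is not the procedure used by ISD: the identifier fields, the agreement probabilities and the threshold are all hypothetical values chosen for illustration.

```python
import math
from typing import Optional

# Hypothetical (m, u) probabilities per identifier:
# m = P(fields agree | same person), u = P(fields agree | different people).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.001),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights over the shared identifiers."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing identifier contributes no evidence
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(a: dict, b: dict, threshold: float = 10.0) -> Optional[str]:
    """Return 'exact', 'probabilistic' or None (hypothetical threshold)."""
    if a.get("chi") and b.get("chi"):
        return "exact" if a["chi"] == b["chi"] else None
    return "probabilistic" if match_weight(a, b) >= threshold else None
```

Raising the threshold trades false-positive links for false-negative ones, which is the balance the two error rates quoted above describe.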
Statistical methods
There are several possible ways to test the hypothesis of an association between insulin glargine prescription and cancer incidence in these observational data, each of which can be subject to different biases. Thus, we analysed the data in three ways, each of which potentially yields different information. Our approach did not assume any induction time and assumed that the effects, if any, of exposure would continue beyond the exposure period. Associations were declared statistically significant if the p value was <0.05. All analyses were performed using STATA/MP version 10.0 for Unix (StataCorp, College Station, TX, USA), using the Cox, stset and stsplit procedures for survival analysis.
Fixed cohort analysis
In this analysis we chose a 4 month period between 1 July 2003 and 31 October 2003 during which insulin glargine prescription was widespread and reasonable follow-up time remained. All patients receiving any type of insulin at any time over these 4 months were entered into the analysis. Patients were defined as being exposed to insulin glargine or not during this 4 month period and were then followed up without regard to any subsequent change in exposure status (akin to an intention-to-treat analysis). By ignoring transitions to other exposures, this approach can minimise possible reverse causation bias in some circumstances but can bias estimates towards the null. We used Cox proportional hazards models to examine our primary hypothesis of whether the incidence of any cancer (and cancer at specific sites) varied by exposure to insulin glargine. As the analysis by Hemkens et al. [8] was restricted to insulin glargine only users, we also examined effects in insulin glargine only users and non-glargine plus glargine insulin users separately. We used attained age as the timescale of the model, and the entry time was 31 October 2003. The follow-up for each person was continued to the first of the following: date of first cancer registration or cancer death, date of death from any cause, or 31 December 2005. For analysis of specific types of cancer, the right censoring event date was the date of the first cancer of that type. For all analyses, continued observation of the patient within Scotland was confirmed by the availability of other data items in the database throughout the follow-up period (HbA1c, BMI, BP and prescription records). We included type of diabetes, calendar year and prior cancer as covariates (we also confirmed that omitting those with prior cancer gave similar results).
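The exposure window and follow-up rules of the fixed cohort can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Stata code: prescriptions are assumed to be pre-filtered to insulins and supplied as (drug name, date) pairs, insulin glargine is assumed to be identifiable by the substring "glargine", and the first cancer date passed in is assumed to fall after the entry date. All function names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

WINDOW_START = date(2003, 7, 1)
WINDOW_END = date(2003, 10, 31)  # also the entry time for follow-up
STUDY_END = date(2005, 12, 31)


def baseline_exposure(insulin_scripts: List[Tuple[str, date]]) -> Optional[str]:
    """Classify exposure from insulin prescriptions issued in the window."""
    names = [
        name.lower()
        for name, issued in insulin_scripts
        if WINDOW_START <= issued <= WINDOW_END
    ]
    if not names:
        return None  # no insulin in the window: not in the fixed cohort
    glargine = any("glargine" in n for n in names)
    other = any("glargine" not in n for n in names)
    if glargine and other:
        return "glargine plus non-glargine insulin"
    return "glargine only" if glargine else "non-glargine insulin only"


def follow_up_exit(
    first_cancer: Optional[date], death: Optional[date]
) -> Tuple[date, bool]:
    """Return (exit date, cancer event flag) for the all-cancer analysis."""
    exit_date = min(d for d in (first_cancer, death, STUDY_END) if d is not None)
    had_event = first_cancer is not None and first_cancer == exit_date
    return exit_date, had_event
```

Prescriptions issued after the window play no part in the classification, which is what makes the analysis akin to intention-to-treat.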
We then extended these models by including covariates that differed substantially between the exposure categories at baseline, including BMI, systolic and diastolic BP, smoking, glycaemic control, other concurrent diabetes medications and socioeconomic status. We used models that adjusted for type of diabetes and checked the effects in models for each type of diabetes separately. For all models, we confirmed that the assumption of proportional hazards was not violated by testing for a non-zero slope in a regression of scaled Schoenfeld residuals against time; a non-zero slope is an indication of a violation of the proportional hazards assumption. Individual tests by covariate were performed, as well as a global test. As there was some departure from proportionality of hazards by sex, the models were stratified by sex.
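For reference, a Cox model stratified by sex with attained age as the timescale can be written as below. The notation is an assumption introduced here for illustration: $a$ is attained age, $h_{0s}$ is the baseline hazard for sex stratum $s$, $\mathbf{x}_i$ is the covariate vector for patient $i$ (including the exposure category) and $\boldsymbol{\beta}$ is the vector of log hazard ratios.

```latex
\[
h_{is}(a) = h_{0s}(a)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_{i}\right),
\qquad s \in \{\text{male}, \text{female}\}
\]
```

The proportional hazards assumption corresponds to $\boldsymbol{\beta}$ not varying with $a$, which is what the Schoenfeld residual test examines. Stratification allows $h_{0s}$ to differ freely between the sexes, so proportionality is not required for sex itself.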
Incident insulin cohort
It could be argued that a caveat of the fixed cohort approach above, especially in type 2 diabetic patients, is that observed differences between exposure categories in the fixed cohort could reflect differences between groups in the stage of progression of their diabetes and prior treatments, for which we have incomplete information. Therefore, we also undertook an analysis among type 2 diabetes patients that was restricted to those who started insulin therapy for the first time during follow-up from 1 January 2002 (the year insulin glargine was first prescribed) to the end of 2005. In this analysis, exposure was classified based on insulin treatment in the first 4 months of use. The entry time to the cancer incidence models was the end of the 4 month period (to ensure that the period during which exposure is defined is separate from the observation period). The follow-up for each person was continued to the first of the following: date of first cancer or cancer death, date of death from any cause, or 31 December 2005. The timescale of the model was attained age. The same covariate adjustments were made as for the fixed cohort analysis.
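A minimal sketch of the incident cohort rules follows. It assumes that the supplied prescriptions cover the patient's whole insulin history, that 4 months means 4 calendar months from the first insulin prescription, and that patients whose entry date would fall after 31 December 2005 contribute no follow-up. These are assumptions made for illustration, and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import List, Optional, Tuple

COHORT_START = date(2002, 1, 1)
STUDY_END = date(2005, 12, 31)


def add_months(d: date, months: int) -> date:
    """Shift a date by calendar months, clipping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def incident_entry(
    insulin_scripts: List[Tuple[str, date]]
) -> Optional[Tuple[date, str]]:
    """Return (entry date, exposure) for a new insulin user, else None."""
    if not insulin_scripts:
        return None
    first = min(issued for _, issued in insulin_scripts)
    if not (COHORT_START <= first <= STUDY_END):
        return None  # insulin started outside the study period
    entry = add_months(first, 4)
    if entry > STUDY_END:
        return None  # assumption: no observation time would remain
    # Exposure is defined only from prescriptions before the entry date.
    names = [n.lower() for n, issued in insulin_scripts if first <= issued < entry]
    glargine = any("glargine" in n for n in names)
    other = any("glargine" not in n for n in names)
    if glargine and other:
        return entry, "glargine plus non-glargine insulin"
    return entry, ("glargine only" if glargine else "non-glargine insulin only")
```

Keeping the exposure window strictly before the entry date mirrors the separation of exposure definition and observation period described above.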
Analysis with exposure classification across the follow-up period
The fixed cohort analysis described above is an intention-to-treat analysis and ignores the reality that patients transition between exposure categories. Therefore, we also categorised patients on the basis of their exposure across their entire follow-up period. In this analysis, those on insulin glargine only never received insulin glargine concomitantly with any other type of insulin, those on non-glargine insulin only never had any insulin glargine at any time during follow-up, and those on non-glargine plus glargine insulin were using insulin glargine concomitantly with another type of insulin for at least some of the time. This analysis uses the data available more completely and defines actual exposure more accurately but at the cost of being more prone to reverse causation bias. As before, Cox proportional hazards models were used, with entry time being date of first insulin use.