Data sources
The data sources used in this study have been described elsewhere [7]. Briefly, we obtained data on a cohort of 800,551 new statin users from RAMQ. For this study, we used data from both the RAMQ databases (i.e., the demographic database, the medical services and claims database, and the pharmaceutical database) and the MED-ECHO databases (i.e., the hospitalization description database, the hospitalization diagnoses database, and the hospitalization intervention database). Patient records were linked across all databases using a unique identification number, which was encrypted to protect patient confidentiality. Access to the data was granted by the Commission d’accès à l’information, and the protocol was approved by the ethics committee of the Centre hospitalier de l’Université de Montréal.
Full cohort
The Full Cohort used in this study has been described elsewhere [7]. Briefly, it comprised 404,129 patients newly initiated on a statin (simvastatin, lovastatin, pravastatin, fluvastatin, atorvastatin, or rosuvastatin) between January 1, 1998 and December 31, 2010. Patients were defined as newly initiated on a statin if they had received no statin dispensation in the year before the date of their first statin dispensation (hereafter, the cohort entry date).
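The new-user definition above can be illustrated with a minimal sketch. It is not the authors' code: the function name is hypothetical, "the year prior" is assumed to mean 365 days, and the "first statin dispensation" is assumed to be the first one falling within the study window.

```python
from datetime import date, timedelta

STUDY_START = date(1998, 1, 1)
STUDY_END = date(2010, 12, 31)
WASHOUT = timedelta(days=365)  # assumption: "the year prior" = 365 days


def cohort_entry_date(dispensation_dates):
    """Return the cohort entry date for one patient, or None if not a new user.

    dispensation_dates: iterable of datetime.date, one per statin dispensation.
    """
    dates = list(dispensation_dates)
    in_window = sorted(d for d in dates if STUDY_START <= d <= STUDY_END)
    if not in_window:
        return None
    entry = in_window[0]  # first statin dispensation in the study window
    washout_start = entry - WASHOUT
    # Any statin dispensed during the washout year disqualifies the patient.
    if any(washout_start <= d < entry for d in dates):
        return None
    return entry
```

A patient whose only earlier dispensation lies more than a year before the first in-window dispensation would still qualify under this reading.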
Identification of exposure group
All patients were categorized into two groups based on the strength of the daily statin dose of their first statin dispensation [12]. Patients initiated on a daily dose of ≥10 mg of rosuvastatin, ≥20 mg of atorvastatin or ≥40 mg of simvastatin formed the high dose group and the remaining patients formed the low dose group.
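As an illustration only, the dose thresholds stated above can be encoded as a lookup table. The function and constant names are hypothetical, and drug names are assumed to be stored as plain generic names.

```python
# Minimum daily dose (mg) that places a patient in the high dose group,
# as stated in the text. Statins not listed here are always low dose.
HIGH_DOSE_THRESHOLD_MG = {
    "rosuvastatin": 10,
    "atorvastatin": 20,
    "simvastatin": 40,
}


def dose_group(statin, daily_dose_mg):
    """Classify the first statin dispensation as 'high' or 'low' dose."""
    threshold = HIGH_DOSE_THRESHOLD_MG.get(statin.strip().lower())
    if threshold is not None and daily_dose_mg >= threshold:
        return "high"
    return "low"
```

Under this rule, a patient initiated on pravastatin, lovastatin, or fluvastatin falls in the low dose group regardless of dose.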
Identification of the study outcome
The study outcome was onset of diabetes within 2 years of follow-up. Patients were defined as cases if they received either a dispensation of a drug used in the treatment of diabetes (WHO ATC A10) or a diagnosis of diabetes (ICD-9 code: 250.x; ICD-10 codes: E10.x–E14.x) within the 2 years following the cohort entry date; all other patients were considered to be diabetes-free.
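The case definition can be sketched as follows. This is a hedged illustration, not the study's code: the event tuple layout and function names are hypothetical, codes are assumed to be stored with a decimal point (e.g., "250.0"), the follow-up is assumed to be 730 days, and events on the cohort entry date itself are assumed not to count.

```python
import re
from datetime import date, timedelta

FOLLOW_UP = timedelta(days=730)  # assumption: 2 years = 730 days
ICD10_DIABETES = re.compile(r"E1[0-4](\.|$)")


def is_diabetes_event(kind, code):
    """kind is 'atc', 'icd9', or 'icd10' (hypothetical labels)."""
    code = code.strip().upper()
    if kind == "atc":
        return code.startswith("A10")
    if kind == "icd9":
        return code == "250" or code.startswith("250.")
    if kind == "icd10":
        return ICD10_DIABETES.match(code) is not None
    return False


def is_case(entry_date, events):
    """events: iterable of (event_date, kind, code) tuples for one patient."""
    end = entry_date + FOLLOW_UP
    return any(
        entry_date < d <= end and is_diabetes_event(kind, code)
        for d, kind, code in events
    )
```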
High-dimensional propensity score method
Two distinct hdPS models were created, and the resulting hdPS were calculated for all patients included in the Full Cohort. A detailed description of the hdPS method can be found elsewhere [5]. Both models were created using the default settings of the SAS hdPS macro v.1 [22].
Six potential data dimensions were defined using the data collected from the year prior to the cohort entry date: (1) drugs dispensed in an outpatient setting, (2) physician claims for procedure codes, (3) physician claims for diagnostic codes, (4) specialty of the physician providing care, (5) hospitalization discharge data for inpatient procedure codes, and (6) hospitalization discharge data for inpatient diagnostic codes.
Full information model
The first hdPS model (hereafter, the hdPS full info model) was created by selecting the top 500 covariates, as ranked by the hdPS algorithm, from all 6 data dimensions. In addition to these 500 covariates, the following known confounders were forced into the hdPS full info model [12]: patients’ sex, age, and poverty level status (yes versus no) at the cohort entry date; year of entry into the cohort (as a categorical variable); and ≥1 hospitalization, ≥5 outpatient visits, and ≥5 distinct drugs dispensed to the patient, all within the year prior to the cohort entry date. The resulting hdPS full info model was used to estimate each patient’s hdPS-1.
Hidden information model
The second hdPS model (hereafter, the hdPS hidden info model) was created by selecting the top 500 covariates, as ranked by the hdPS algorithm, from only the 2 data dimensions provided by the MED-ECHO databases; the 4 data dimensions provided by RAMQ were hidden from the algorithm. The MED-ECHO dimensions were chosen because we believed a priori that they would contain fewer potential covariates, thereby increasing the risk of unmeasured confounding. In addition to these 500 covariates, the following covariates were forced into the hdPS hidden info model: patients’ sex, age, and poverty level status (yes versus no) at the cohort entry date; year of entry into the cohort (as a categorical variable); and ≥1 hospitalization in the year prior to the cohort entry date. Within this model, hospitalization status (≥1 hospitalization, yes versus no) was assessed solely from data available within the MED-ECHO databases. The outpatient medical resource utilization and outpatient drug dispensation covariates forced into the previous model were excluded because they were based on information available only within the RAMQ databases. The resulting hdPS hidden info model was used to estimate each patient’s hdPS-2.
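The difference between the two model specifications can be summarized in a short configuration sketch. It does not reproduce the SAS hdPS macro or its parameters; all dimension and covariate names are hypothetical labels for the items listed in the text, and the covariate ranking step itself is not shown.

```python
# Source database of each of the six data dimensions described in the text.
DATA_DIMENSIONS = {
    "outpatient_drugs": "RAMQ",
    "physician_procedure_codes": "RAMQ",
    "physician_diagnostic_codes": "RAMQ",
    "physician_specialty": "RAMQ",
    "inpatient_procedure_codes": "MED-ECHO",
    "inpatient_diagnostic_codes": "MED-ECHO",
}

# Covariates forced into both models.
BASE_FORCED = [
    "sex", "age", "poverty_level_status",
    "cohort_entry_year", "hospitalized_prior_year",
]
# Forced covariates that rely on RAMQ-only information.
RAMQ_ONLY_FORCED = [
    "ge5_outpatient_visits_prior_year",
    "ge5_distinct_drugs_prior_year",
]


def model_spec(visible_sources, top_k=500):
    """Describe an hdPS model given the databases visible to the algorithm."""
    dims = [d for d, src in DATA_DIMENSIONS.items() if src in visible_sources]
    forced = list(BASE_FORCED)
    if "RAMQ" in visible_sources:
        forced += RAMQ_ONLY_FORCED
    return {"dimensions": dims, "forced_covariates": forced, "top_k": top_k}


full_info = model_spec({"RAMQ", "MED-ECHO"})   # used to estimate hdPS-1
hidden_info = model_spec({"MED-ECHO"})         # used to estimate hdPS-2
```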
Creation of the matched sub-cohorts
Trimming was performed first: patients located within non-overlapping regions of the hdPS-1 distribution were excluded [23–25], and all other patients were eligible for inclusion in the Matched hdPS Full Info Sub-Cohort. Low dose controls were then found for patients initiated on a high dose using a greedy, nearest neighbor 1:1 matching algorithm. A match was made if the difference in the logit of hdPS-1 between nearest neighbors was within a caliper width equal to 0.2 times the SD of the logit of hdPS-1 [26]. Patients selected by the matching algorithm were included in the Matched hdPS Full Info Sub-Cohort. These two steps were repeated using hdPS-2 to create the Matched hdPS Hidden Info Sub-Cohort.
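The two steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the algorithm actually run in SAS: the overlap region is assumed to be bounded by the larger group minimum and the smaller group maximum, the SD is assumed to be the sample SD across all eligible patients, high dose patients are processed in input order, matching is without replacement, and all function names are hypothetical.

```python
import math
from statistics import stdev


def logit(p):
    # Requires 0 < p < 1.
    return math.log(p / (1 - p))


def trim(high, low):
    """Keep patients inside the region where both score distributions overlap.

    high, low: dicts mapping patient id -> propensity score.
    """
    lower = max(min(high.values()), min(low.values()))
    upper = min(max(high.values()), max(low.values()))

    def keep(group):
        return {k: v for k, v in group.items() if lower <= v <= upper}

    return keep(high), keep(low)


def greedy_match(high, low, caliper_factor=0.2):
    """Greedy 1:1 nearest neighbor matching on the logit of the score."""
    high_logit = {k: logit(v) for k, v in high.items()}
    available = {k: logit(v) for k, v in low.items()}
    all_logits = list(high_logit.values()) + list(available.values())
    caliper = caliper_factor * stdev(all_logits)
    pairs = []
    for hid, hl in high_logit.items():
        if not available:
            break
        # Nearest still-unmatched low dose control.
        cid = min(available, key=lambda k: abs(available[k] - hl))
        if abs(available[cid] - hl) <= caliper:
            pairs.append((hid, cid))
            del available[cid]  # each control is used at most once
    return pairs
```

Because the algorithm is greedy, the resulting pairs depend on the order in which high dose patients are processed; the text does not state which order was used.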
Statistical analyses
Patients’ baseline characteristics within both sub-cohorts were assessed using the information available in the full database. Absolute standardized differences (ASDD) were used to compare baseline characteristics between the high dose group and the low dose group within each matched sub-cohort [19, 21]. An ASDD <0.1 is generally assumed to indicate good balance between groups [21, 27].
Discrete data are presented as absolute values and percentages, and continuous data are presented as mean (± SD). All statistical analyses were performed using SAS version 9.3 (Cary, North Carolina).
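For reference, a commonly used definition of the absolute standardized difference divides the absolute difference in means (or proportions) by the square root of the average of the two group variances. The sketch below assumes that definition; the text does not spell out the exact formula used, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def asd_continuous(x_high, x_low):
    """Absolute standardized difference for a continuous covariate."""
    pooled_sd = math.sqrt((variance(x_high) + variance(x_low)) / 2)
    return abs(mean(x_high) - mean(x_low)) / pooled_sd


def asd_binary(p_high, p_low):
    """Absolute standardized difference for a binary covariate.

    p_high, p_low: proportion with the characteristic in each group.
    """
    pooled_sd = math.sqrt((p_high * (1 - p_high) + p_low * (1 - p_low)) / 2)
    return abs(p_high - p_low) / pooled_sd
```

For example, under this definition a characteristic present in 50% of one group and 45% of the other gives a value of about 0.10, right at the usual balance threshold.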