Background

Observational studies provide real world information on drug use and their potential effect on health outcomes but are prone to confounding by indication [14]. The traditional propensity score (PS) method is often used to control for confounding by indication. It represents “the conditional probability of assignment to a particular treatment given a vector of observed covariates” and is generally obtained thanks to a logistic regression model [5].

The high-dimensional propensity score (hdPS) method has recently been proposed and has been rapidly and widely adopted to address confounding by indication [6, 7]. Unlike the PS method which is limited to investigator-specified covariates; the hdPS method also uses a computerized algorithm to select a large number of potential confounders contained within the examined database [5, 7].

It is of interest to compare the performance of these two methods in controlling for confounding by indication to inform the design of future observational studies. Performance of both methods may be compared using two distinct approaches, 1) by examining the balance achieved on key potential confounders between sub-cohorts matched on these two scores [4, 811], and 2) by comparing the measures of associations obtained from the matched sub-cohorts to a gold standard comparator [7, 1214].

Recently, results of a meta-analysis of randomized controlled trials (RCT) have found that exposure to higher statin doses might be associated with higher risks of diabetes (Odds ratio [OR] =1.12 [95 % confidence intervals (CI) 1.04–1.22]) [15]. Although results obtained from observational studies have been conflicting [16], four out of five published studies found a small increased but statistically significant dose-dependent relationship [1720]. However, it is possible that in those observational studies, patients at higher risk of diabetes were more likely to be initiated on higher statins doses: a classic example of confounding by indication offering an excellent opportunity to compare the relative performance of these two scores. In this study, we aim to compare the performance of the PS and hdPS methods in adjusting for confounding by indication using the two approaches defined above.

Methods

Data sources

This study was performed using medico-administrative databases from the province of Quebec, Canada. Quebec is the second most populated province in Canada, with more than 8 million inhabitants [21]. A unique identification number is assigned to every individual, and all diagnoses and all health services provided are systematically recorded within the Régie de l’assurance maladie du Québec (RAMQ) databases. Pharmaceutical claims are also recorded but only for residents covered by the RAMQ public drug insurance plan. Information was obtained from the Quebec physician’s service and claims databases (i.e. RAMQ databases) and the Quebec hospitalisation databases (i.e., Maintenance et Exploitation des Données pour l’Étude de la Clientèle Hospitalière [MED-ECHO] databases), which have previously been validated [2225]. For this study we used three RAMQ databases (i.e., the Demographic, Medical Services and Claims and Pharmaceutical databases) and three MED-ECHO databases (i.e. the Hospitalisation Descriptions, Diagnoses and Intervention databases). Patient records were linked across all databases by use of the unique identification number. The identification numbers were encrypted to protect patient confidentiality. Access to data was granted by the Commission d’accès à l’information and the protocol was approved by the Centre hospitalier de l’Université de Montréal ethics’ committee.

Full Cohort

RAMQ provided us with a cohort of 800,551 incident statin users; the date of the first statin dispensation was defined as the cohort entry date. Patients were considered to be incident statin users if they did not have a claim for a statin dispensation in the year prior to the cohort entry date. Eligible patients had: 1) to have been newly initiated on either simvastatin, lovastatin, pravastatin, fluvastatin, atorvastatin or rosuvastatin between January 1st 1998 and December 31st 2010, 2) to be covered by the RAMQ public drug insurance plan for at least a year prior to the cohort entry date and 3) to be at least 40 years of age at the cohort entry date. We excluded every patient who, in the year prior or on the cohort entry date: 1) received any other cholesterol lowering drug dispensation (including niacin, cerivastatin or a combination statin drug); 2) received a dispensation for drugs used in the treatment of diabetes (WHO ATC A10) [26]; 3) received a diagnosis of diabetes (ICD-9 code: 250.x; ICD-10 codes: E10.x – E14.x); 4) were admitted in a long-term care facility or 5) received >1 statin dispensation on the cohort entry date. Patients who met both inclusion and exclusion criteria were entered within the Full Cohort.

Exposure status

Patients were categorized into two categories based on the dose and relative potency of the first statin dispensation they received [18, 27]. Patients whose first statin dispensation was for a daily dose of ≥10 mg of rosuvastatin, ≥20 mg of atorvastatin or ≥40 mg of simvastatin formed the high potency group and remaining patients formed the lower potency group.

Outcome status

Every patient who received either a dispensation of a drug used in the treatment of diabetes (WHO ATC A10) or a diagnosis of diabetes (ICD-9 code: 250.x; ICD-10 codes: E10.x – E14.x) within the 2 years following the cohort entry date was defined as a case, all other patients were considered to be diabetes-free.

Propensity score method

We used the same list of variables that were used by Dormuth and colleagues to create the PS model [18]. This list included the following covariates: patients’ sex, age and poverty level status (yes versus no) at the cohort entry date, year of entry within the cohort (as a categorical variable), medical resource utilization variables (≥1 hospitalisation, ≥5 outpatient visits, ≥5 distinct drugs dispensed to the patient, all within the year prior to the cohort entry date), drug dispensation variables (dispensation of loop diuretics, dispensation of acetaminophen, dispensation of calcium blockers, dispensation of beta-blockers, dispensation of angiotensin receptor blockers and dispensation of angiotensin converting enzyme inhibitors, all in the year prior to the cohort entry date) and comorbidity variables (hypertension, hypercholesterolemia, myocardial infarction (MI), stroke, peripheral vascular disease (PVD), congestive heart failure, coronary artery bypass graft, and percutaneous coronary intervention (PCI), all in the year prior to the cohort entry date).

Following the selection of the PS model, patients’ PS were assessed for all patients included within the Full Cohort. Trimming was performed and patients located within non-overlapping regions of the PS distribution were excluded from the analysis, all other patients were eligible for inclusion within the Matched PS Sub-Cohort [28]. Lower potency controls were found for patients in the high potency group using a greedy, nearest neighbor 1:1 matching algorithm. Matching occurred if the difference in the logit of PS between nearest neighbors was within a caliper width equal to 0.2 times the standard deviation (SD) of the logit of the PS [29]. Patients selected by the matching algorithm were included within the Matched PS Sub-Cohort.

High-dimensional propensity score method

hdPS were estimated for all patients included in the Full Cohort [7]. Detailed description of the hdPS method can be found elsewhere [7]. Briefly, the construction of the hdPS model involves two processes, 1) investigators select covariates to be forced within the hdPS model (similar to what is done within an investigator-specified PS model) and 2) the hdPS algorithm selects an additional list of covariates measured within the selected data dimensions based on their multiplicative bias assessment which is then also included within the final hdPS model. Within this study, estimation of patients’ hdPS were conducted using the default setting of the SAS hdPS macro v.1 (i.e., top 200 most prevalent variables per data dimensions, top 500 binary empirical covariates based on multiplicative bias assessment).

We structured the data collected from the year prior to the cohort entry date from the following 6 data dimensions: 1) drugs dispensed in an outpatient setting, 2) physician claims codes for inpatient and outpatient procedures, 3) physician claims for inpatient and outpatient diagnostic codes, 4) specialty of the physician providing care, 5) hospitalisation discharge data for inpatient procedure codes and 6) hospitalisation discharge data for inpatient diagnostic code.

In addition to the 500 variables selected by the default option of the hdPS algorithm [7], we forced the following covariates within the hdPS model: patients’ sex, age and poverty level status (yes versus no) at the cohort entry date, year of entry within the cohort (as a categorical variable), medical resource utilization variables (≥1 hospitalisation, ≥5 outpatient visits, ≥5 distinct drugs dispensed to the patient, all within the year prior to the cohort entry date). These variables were forced in the model since they could not be selected by the SAS hdPS algorithm v.1 we were using. Trimming was performed and patients located within non-overlapping regions of the hdPS distribution were excluded from the analysis [28], all other patients were eligible for inclusion within the Matched hdPS Sub-Cohort. Lower potency controls were found for patients in the high potency group using a greedy, nearest neighbor 1:1 matching algorithm. Matching occurred if the difference in the logit of hdPS between nearest neighbors was within a caliper width equal to 0.2 times the SD of the logit of the hdPS [29]. Patients selected by the matching algorithm were included within the Matched hdPS Sub-Cohort.

Statistical analyses

Absolute standardized differences (ASDD), defined as the absolute between group difference over the pooled SD of the two groups, were used to compare patient characteristics between patients exposed to a high potency versus lower potency statin within the Full Cohort and both sub-cohorts [4, 811]. ASDD < 0.1 are generally assumed to indicate good balance between groups [2, 10]. Discrete data are presented in absolute and relative values (n [%]) and continuous data are presented as mean (SD). OR (95%CI) of diabetes occurrence in the high over lower potency statin groups were estimated within the Full Cohort and within both matched sub-cohorts; no adjustment beyond matching was performed.

All statistical analyses were conducted with SAS version 9.3 (SAS Institute, Cary, North Carolina).

Results

Characteristics of the patients included within the Full Cohort

Figure 1 shows the flow chart of patients included within the Full Cohort, the Matched PS Sub-Cohort and the Matched hdPS Sub-Cohort.

Fig. 1
figure 1

Patient flow-chart within the study. hdPS, High-dimensional propensity score; PS, Propensity score

Baseline characteristics of the Full Cohort are shown in Table 1. Among the 404,129 patients included within the Full Cohort, 264,947 patients (65.6 %) were dispensed a lower potency statin and 139,182 patients (34.4 %) were dispensed a high potency statin on the cohort entry date. About half of patients (192,964 [47.8 %]) were males and the average age was 65.2 years (SD 11.0). Among the 18 examined patient characteristics, eight (44.4 %) were shown to have an ASDD > 0.1 indicating the presence of unbalance. History of a PCI (ASDD = 0.30) and history of a MI (ASDD = 0.27), both in the year prior to the cohort entry date, showed the greatest degree of imbalance (Table 1). In addition, onset of diabetes within 2-years follow-up was identified in 12,978 patients (3.2 %) of the 404,129 patients included within the Full Cohort.

Table 1 Demographic characteristics and comorbidity status of the Full Cohort at baseline

Characteristics of patients included within the Matched PS Sub-Cohort

PS were calculated for all 404,129 patients included within the Full Cohort (kernel density PS curves for all patients included within the Full Cohort are provided in Additional file 1). Fifty-five (0.0 %) patients, 33 (0.0 %) lower potency and 22 (0.0 %) high potency, had PS located within non-overlapping regions and were excluded from the analysis. Among the remaining 404,074 patients, we matched 119,857 patients (29.7 %) initiated on a high potency statin to 119,857 patients (29.7 %) initiated on a lower potency statin based on their individual PS; selected patients formed the Matched PS Sub-Cohort (Fig. 1). This sub-cohort was comprised of 119,931 male patients (50.0 %) and the average age was 64.7 years (SD 11.2) (Table 2). Balance was obtained for all 18 examined patient characteristics (ASDD ranged from 0.002 to 0.031 with an average of 0.015).

Table 2 Demographic characteristics and comorbidity status of the Matched PS Sub-Cohort at baseline

Characteristics of patients included within the Matched hdPS Sub-Cohort

Three hundred and one (0.1 %) patients, 54 (0.0 %) lower potency and 247 (0.1 %) high potency, had hdPS located within non-overlapping regions and were excluded from the analysis (kernel density hdPS curves for all patients included within the Full Cohort are provided in Additional file 2). Among the remaining 403,828 patients, we matched 116,014 patients (28.7 %) initiated on a high potency statin to 116,014 patients (28.7 %) initiated on a lower potency statin based on their individual hdPS; selected patients formed the Matched hdPS Sub-Cohort (Fig. 1).

Patients included within the Matched hdPS Sub-Cohort were on average 64.6 years old (SD 11.2) and 116,688 of them were males (50.3 %) (Table 3). Balance was obtained in all 18 examined patient characteristics, whether or not they were forced within the hdPS model (ASDD ranged from 0.001 to 0.023 with an average of 0.008).

Table 3 Demographic characteristics and comorbidity status of the Matched hdPS Sub-Cohort at baseline

Performance of the PS and hdPS in adjusting for confounding by indication

As mentioned previously, performance of both methods in adjusting for confounding by indication was tested by two distinct approaches, 1) by comparing the ASDD obtained within both sub-cohorts and 2) by comparing the adjusted OR of diabetes occurrence in the high over lower potency statin groups estimated by the logistic regression model used within the Full Cohort and both matched sub-cohorts. Figure 2 shows the direct comparison of the ASDD for the examined patient characteristics within the Full Cohort, the Matched PS Sub-Cohort and the Matched hdPS Sub-Cohort. Results indicate that both matched sub-cohorts were more balanced than the unmatched Full Cohort. Although the Matched PS Sub-Cohort provided greater balance on three of the 18 examined patient characteristics (MI, hypercholesterolemia, and PVD), overall, the Matched hdPS Sub-Cohort achieved the most balanced sub-cohort (average ASDD = 0.008 and average ASDD = 0.015 in the Matched hdPS Sub-Cohort and Matched PS Sub-Cohort, respectively).

Fig. 2
figure 2

Comparison of the level of balance achieved using the absolute standardized differences obtained within the Full Cohort, the Matched PS Sub-Cohort and the Matched hdPS Sub-Cohort the examined patient characteristics. ACEI, Angiotensin converting enzyme inhibitors; ARB, Angiotensin receptor blockers; BB, Beta-blockers; CABG, Coronary artery bypass graft; Calc blockers, Calcium blockers; CHF, Congestive heart failure; hdPS, High-dimensional propensity score; PCI, Percutaneous coronary intervention; PS, Propensity score; PVD, Peripheral vascular disease

Measures of associations obtained within the Full Cohort and the two matched sub-cohorts indicated that patients in the high potency group had higher odds of developing diabetes within 2-years follow-up. Results obtained within both sub-cohorts overlapped (OR = 1.10 [95 % CI 1.05–1.15] within the Matched PS Sub-Cohort and OR = 1.13 [95 % CI 1.08–1.19] within the Matched hdPS Sub-Cohort) but both were lower than those obtained within the Full Cohort (OR = 1.22 [95%CI 1.18–1.27]).

Discussion

As expected, overall patient profiles within the Full Cohort showed imbalance on many key baseline characteristics suggesting the presence of confounding by indication. Such results would tend to indicate the presence of bias within measures of associations estimated within the Full Cohort if appropriate adjustment were not used in the analyses.

In their original paper, Schneeweiss et al. [7] assessed the performance of the hdPS method by comparing measures of associations adjusted for patients’ hdPS to the results of a RCT. By showing that the adjusted measures of association were closer to the results of the RCT than the crude measure of association, they showed that hdPS method had improved the adjustment for confounding by indication within their study. Performance of the hdPS method has been assessed by others using the same approach and their results also supported its use [1214]. Measures of association obtained within both matched sub-cohorts were closer to the null (OR = 1.10 [95%CI 1.05–1.15] within the Matched PS Sub-Cohort and OR = 1.13 [95%CI 1.08–1.19] within the Matched hdPS Sub-Cohort) than within the Full Cohort (OR = 1.22 [95%CI 1.18–1.27]). However, since the CIs obtained from both methods overlap with each other and are as precise, their performance cannot be differentiated based solely on this criterion.

However, performance based on the level of balance achieved within matched sub-cohorts does not require an additional comparator. Based on this second performance criterion, we showed that using both methods created balanced matched sub-cohorts (i.e. ASDD were < 0.1 for all patient characteristics in both matched sub-cohort). When directly comparing both sub-cohorts, use of the hdPS method was favored since 14 out of the 18 examined patient characteristics were more balanced within the Matched hdPS Sub-Cohort than within the Matched PS Sub-Cohort which should tend to lead to less biased measures of association within the Matched hdPS Sub-Cohort. Seeing as the hdPS model adjusted for more variables than the PS model, such a result was to be expected but it needed to be verified in a situation where we have prior knowledge on which confounders to adjust for. The results we show in this study support the idea that the hdPS method may be used to adjust for confounding by indication, but the possibility that residual confounding remaining after this adjustment cannot be ruled out.

Our study has several strengths. First, we compared the PS and hdPS method in a large cohort of incident statin users showing substantial imbalance suggesting the potential for confounding by indication. As such, this provided an excellent situation in which to compare the performance of both methods.

Second, our conclusions favored the hdPS method when our study design should have favored the PS method. Although all of the examined covariates were forced within the PS model, only five investigator-selected covariates were forced within the hdPS model (only demographic, socio-economic and medical resource utilization variables were forced within the hdPS model, all remaining covariates were selected by the hdPS algorithm [n = 500]) [7]. Therefore, the hdPS method performance was mainly achieved through the use of the automated hdPS algorithm and not by investigator choice.

Our study has also several limitations. First, we compared patients on a relatively small number of baseline patient characteristics. It is possible that the performance observed within the 18 prespecified patient characteristics may not be representative of the overall performance regarding all potential patient characteristics. However, these variables were selected because we believed, like others have [18], that they could lead to confounding by indication and our results show that the hdPS method achieved substantial balance within all of these even though most were not forced within the hdPS model.

Second, we defined unbalance as ASDD > 0.1. Although this cut-off is frequently used [2, 10, 16], other values could have been used as well. Regardless of the cut-off value chosen, our results indicate that the hdPS method outperformed the PS method in achieving the most balanced sub-cohort. Although this added level of balance may not eliminate all biases within the observational study (i.e. information bias, unmeasured confounders, time-varying confounding), it will at least tend to reduce the level of bias caused by these baseline characteristics.

Third, no mechanism of action by which statins could cause diabetes has been identified. Although we compared both methods using a frequently used exposure definition, we cannot claim that this exposure definition reflects the true mechanism of action by which statins could cause diabetes. It is possible that the results obtained, had we used an exposure definition reflecting the true mechanism of action, could have differed from those obtained within this study. This also reflects the fact that we do not know what the true measure of association between the exposure to statins and diabetes is. As mentioned, traditionally the hdPS method has been validated by comparing the crude and hdPS-adjusted measures of association to a gold standard measure but in this case, such a true gold standard is not available. We recognize that this would not have been the case had we conducted this comparison within an ordinary simulation study in which the truth would be defined by the investigators. However, as others have highlighted, [7, 30] the hdPS method cannot be evaluated through the use of ordinary simulation studies since its performance depends on the complexity and quantity of data available to the hdPS algorithm, levels of which cannot be reproduced within a fully artificial setting. In order to circumvent this issue, Franklin et al. [30] recently proposed that performance of the hdPS method compared to the performance of the PS method be compared using plasmode simulation studies. Using this framework, Franklin et al. showed that an investigator-independent hdPS method performed nearly as well as a fully specified PS model further supporting the use of the hdPS method in situations where little prior knowledge regarding potential confounding variables is available [30]. Such an approach may be validated in future work aimed at further evaluating the performance of the hdPS method.

Fourth, our results show that hdPS trimming removed slightly more patients than PS trimming (301 vs 55). Although this could impact our conclusion regarding the value of both methods, its impact should be marginal since the total number of patients that were trimmed in both settings remains trivial in comparison to the total sample size of the Full Cohort.

Finally, we only examined the relative performance of the PS and hdPS methods within a single context; the results obtained within this study may not be generalizable to other studies focusing on other exposure-disease associations. Furthermore, we only compared the traditional PS method estimated using a logistic regression model to hdPS method, while other methods are also available (e.g., classification and regression trees and boosting methods) [31, 32]. Future work will be needed to compare the relative performance of all these different methods.

Conclusions

In conclusion, we recommend comparing the PS and hdPS methods by means of their relative ability to select balanced sub-cohorts over their adjustment potential within ethiological studies. Although both methods adequately adjusted for confounding by indication, we cannot rule out the possibility that the hdPS method will be dominant in other contexts since it has the potential to identify confounders which are unknown to the investigators.

Ethics approval and consent to participate

Access to data was granted by the Commission d’accès à l’information and the protocol was approved by the Centre hospitalier de l’Université de Montréal ethics’ committee.

Consent for publication

Not applicable