1 Introduction

Healthcare utilization databases represent a major data source for studies seeking to infer causal relationships between treatments and outcomes. The principal challenge for such studies is unmeasured confounding (Fig. 1), an often-overlooked threat to the validity of observational research (Greenland 1996, 1999, 2005, 2009; Imbens and Rubin 2015; Rosenbaum 2010; VanderWeele and Ding 2017). With observational data, unmeasured confounding bias is a central limitation: uncontrolled and unmeasured covariates may confound the relationship between treatment and outcome (Rosenbaum 1991; Cornfield et al. 1959).

Fig. 1 Causal relationship between treatment, outcome, and confounders

A comprehensive array of sensitivity analysis techniques has been designed to evaluate evidence of causation in the presence of unmeasured confounding (Greenland 1996, 1999; VanderWeele and Ding 2017; Rosenbaum and Small 2017; Greenland and Mansournia 2015; Chiba 2012; Vanderweele and Arah 2011; Schneeweiss 2006; Brumback et al. 2004; Rosenbaum and Rubin 1983). Sensitivity analyses typically assess the magnitude of possible biases and report the level of confidence in the study results.

As a supplement to existing sensitivity analyses, the E-value was introduced to assess the strength of unmeasured confounding that could explain away an effect, thereby casting reasonable doubt on the accuracy of the estimate (VanderWeele and Ding 2017). The E-value was defined as “the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment–outcome association, conditional on the measured covariates” (VanderWeele and Ding 2017). In general, a small E-value should increase concern that unmeasured confounding bias might be substantial relative to the estimated effect. Since its introduction, the E-value has helped alleviate over-reliance on the p-value and offset inadequate assessments of robustness to bias. Nevertheless, the E-value derivation relies on assumed relationships between binary variables, which do not necessarily generalize to mixed binary and continuous variables. The discussion around E-values inspired us to develop a new tool covering areas of sensitivity analysis for unmeasured confounders not addressed by existing methods. The method we propose aims to help researchers find a range of likely true values rather than bounding the true value or focusing on statistical significance. Presenting a range rather than a single bounding end point gives a more balanced sense of likely values.
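
For reference, when the observed association is a risk ratio \(RR>1\), the published formula is E-value = \(RR+\sqrt{RR\times (RR-1)}\); estimates below 1 are inverted before applying the formula, and applying the same calculation to the confidence limit closer to the null yields the E-value for the confidence interval (VanderWeele and Ding 2017).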

We use an empirical example to illustrate how this new tool works. The choice of the optimal coronary revascularization method has been vigorously debated: coronary artery bypass grafting (CABG) is more invasive than percutaneous coronary intervention (PCI), has a longer recovery time, and remains more costly, but its incremental cost-effectiveness ratio is favorable (Lim 2014). The SYNTAX score is a scoring system calculated from clinical elements indicating disease stage. While it is a useful proxy for disease severity, the severity of coronary artery disease is often evaluated by physicians’ visual inspection, and the SYNTAX score is rarely calculated and thus remains unmeasured in electronic health records (EHR). Several randomized clinical trials (RCTs) found that patients with low and intermediate SYNTAX scores can be treated with PCI or CABG with equal results, whereas those with a high score do better with CABG (Mohr et al. 2013; Serruys et al. 2009; Kappetein et al. 2011; Thuijs et al. 2019). However, to our knowledge no observational study has attained sufficient sample size, and hence power, to detect the difference.

RCTs are often considered the “gold standard” of study types. Nevertheless, we often observe divergent findings between RCTs and empirical settings (Rothman 2014; Hernan et al. 2013). Rather than simply summarizing the RCT literature as an evidence report, we consider how the answers might differ in our situation and whether the conclusions drawn from RCTs can be generalized to our circumstances. In this paper we show how we use information from both settings to reach a conclusion. We make no claim that the estimate of treatment effect we generate is better than an RCT estimate for the general population. We simply suggest that the proper estimate of the treatment effect for our population might differ from what one would see in a clinical trial. Consequently, we seek to incorporate the internal validity of a clinical trial that may not directly generalize to our unique system and population.

In this study we compare the safety and effectiveness of CABG vs. PCI for patients with stable ischemic heart disease using observational data, and introduce the “L-table,” a simulation-based, prior-knowledge-guided approach that enables investigators to use external information about unmeasured confounders (e.g., the SYNTAX score) to identify a plausible range of estimated true effects in observational studies. We call it the L-table because it is used by uncovering the “Location” of the graphical frame in the numeric table that contains the estimated true effects corresponding to the assumed correlations. Our framework is flexible enough to adapt to different types of models. The statistical code for the L-table is open source with customizable parameters.

This approach uses well-known relationships from the literature. Although much of the literature is focused on testing statistical significance, our focus falls on understanding plausible effect sizes. This is an often-underappreciated element in applying confounder analysis to practical decision-making. Some consideration of the practical significance of the results must follow statistical significance. This is especially true in comparative effectiveness problems where an incumbent treatment already exists. A focus on plausible effect sizes better supports some real-world applications like cost effectiveness analysis and the allocation of scarce resources to competing interventions.

2 Methods

2.1 Model framework

We assume that the effect of a treatment on the outcome can be estimated by a generalized linear model. In this study, we use a logistic model to illustrate how we have used the proposed method to evaluate the effectiveness of cardiovascular procedures. The dependent variable is a binary outcome of interest; the independent variables include the treatment and the observed and unobserved confounders.

It is important to be precise in the way we talk about correlations when applying this method. The main issue is the difference between an estimator and an estimand. An estimator uses a computation on observed data to calculate an estimate. Here we will use estimand for parameters, typically unknown, that we wish to estimate. An easy source of confusion in our setting is that an estimator may be estimating different things depending on the type of data it is applied to. For example, the Pearson correlation (estimator) estimates the product-moment correlation (estimand) when applied to bivariate normal data. The same estimator calculates the Spearman correlation when applied to binary data.

Our simulation framework uses a trivariate normal distribution to generate the data that we summarize to estimate the bias from unmeasured confounders. The correlations are used to construct a simulation data set with a known true value; correlation-based simulation is the simplest way to characterize bias in a normal theory model. Specifically, the correlations between the unmeasured confounder and the other components in the model are assigned values, and the simulated dataset reproduces the unmeasured confounding bias. This is one standard approach to characterizing omitted variable bias (Maddala 1983), and it is more flexible and general than manipulating odds or risk ratios from contingency tables. We will refer to these correlations as “product-moment” correlations (or “latent” correlations). This is similar to the way many econometrics books motivate limited dependent variables (Maddala 1983). The challenge is that correlations available in the literature, or those that inform clinical intuition, may not be straightforward estimates of the product-moment correlations. It may be necessary to transform the available correlations into these product-moment correlations to run the simulations, and to transform the results back to the original scale for reporting in the original context after the simulations have been run.

  1. The tetrachoric correlation case

    Tetrachoric correlation characterizes the data in a 2 by 2 table by hypothesizing a latent bivariate normal variable that is dichotomized in both dimensions to generate the 2 by 2 table. The simple Spearman correlation calculated from the 2 by 2 table is not an estimate of the latent correlation. What is needed is a tetrachoric correlation estimate.

    This will be the common case for the relationship between treatment and outcome. Fortunately, this is data we will have in observational studies. Getting the needed latent correlation estimate is simply a matter of calculating the tetrachoric correlation with appropriate software. If the original Spearman metric is desired for reporting results the dichotomization that defines the tetrachoric correlation can be mimicked in the simulation.

    The generalization of tetrachoric correlation to ordered categorical variables in one or both dimensions is called polychoric correlation. If ordered categorical omitted confounders are available in the literature the same process is used with polychoric correlations instead of tetrachoric correlations.

  2. The point-biserial case

    Point-biserial correlation is the application of the Pearson correlation calculation to a binary and a continuous variable. Conceptually, our framework is similar to the tetrachoric correlation situation, but in this case only one of the two latent variables has been dichotomized. We want to use the latent correlation in our simulation, but the point-biserial correlation does not estimate the latent correlation. However, the biserial correlation (Cox 1974) provides an estimate of the latent correlation we need.

    This is the case we commonly need to address when we estimate the correlation between the treatment and a continuous unmeasured confounder. This estimate usually comes from the literature, where we are unlikely to have access to the subject-level data. Fortunately, there is a formula to transform the point-biserial correlation into a biserial correlation using only summary quantities that are often available in the literature. As in the tetrachoric correlation case, dichotomization in the simulation can transform the results to the original point-biserial scale if needed as an aid to interpretation. A brief sketch of both conversions follows this list.
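
To make the two conversions concrete, the following R sketch (illustrative only, not the released L-table code) simulates a latent bivariate normal pair, shows that the plain correlation of the dichotomized data understates the latent correlation, and applies the standard point-biserial-to-biserial conversion; the latent correlation of 0.6 and the cutpoints are arbitrary choices for demonstration.

```r
# Illustrative R sketch of the two conversions above (not the released L-table
# code). The latent correlation of 0.6 and the cutpoints are arbitrary choices.
library(MASS)                                      # mvrnorm()
set.seed(1)

rho_latent <- 0.6
XY <- mvrnorm(n = 1e5, mu = c(0, 0),
              Sigma = matrix(c(1, rho_latent, rho_latent, 1), 2, 2))

# Tetrachoric case: dichotomize both dimensions; the plain correlation of the
# two binary variables understates the latent correlation.
x_bin <- as.numeric(XY[, 1] > qnorm(0.7))          # about 30% "exposed"
y_bin <- as.numeric(XY[, 2] > qnorm(0.8))          # about 20% with the outcome
cor(x_bin, y_bin)                                  # noticeably below 0.6

# Point-biserial case: dichotomize only one dimension, then convert the
# point-biserial correlation back to the latent (biserial) scale using the
# standard summary-quantity formula.
r_pb  <- cor(x_bin, XY[, 2])                       # point-biserial correlation
p     <- mean(x_bin)                               # proportion in the upper group
r_bis <- r_pb * sqrt(p * (1 - p)) / dnorm(qnorm(p))
r_bis                                              # close to the latent 0.6
```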

Our method simulates, from these underlying correlations, the drivers of the causal relationship in order to assess the estimated true effect when an important confounder is omitted. The model framework can be illustrated as a pipeline composed of four major steps (Fig. 2).

Fig. 2 Pipeline of model framework to generate L-table

  • Step 1. Model empirical data.

    1. (Optional but Optimal) Balance the empirical data. We used matching weights, a propensity score weighting method (other propensity score methods can be substituted).

    2. Estimate the effect based on the balanced empirical data: \({\beta }_{est\_emp}\)

    3. Calculate the proportions of treatment and outcome: treatment (%) and outcome (%)

  • Step 2. Ascertain correlation coefficients.

    1. Calculate the correlation between outcome and treatment (\({\rho }_{YT\_emp}\)) from the empirical data. In this study, we applied the tetrachoric correlation for the binary outcome and binary treatment. In a log-normal cost model a biserial correlation could be used.

    2. Deduce the correlations between the unmeasured confounder and the outcome (\({\rho }_{YU\_ext}\)) and between the unmeasured confounder and the treatment (\({\rho }_{TU\_ext}\)) from external information. We calculated both the polychoric correlation (for the simulation input) and the biserial correlation (to match the L-table’s labels), because the SYNTAX score was reported in ordinal form in the RCTs although its original form is continuous.

    3. Make a sequence of values around each latent correlation coefficient estimate with small intervals. We created a sequence using \(\rho \) ± 0.3 with 0.05 intervals.

  • Step 3. Simulation.

    1. Construct a collection of correlation matrices (tables that contain the correlation coefficients between variables). These matrices are restricted to the positive definite cases and are based on all combinations of the latent correlation coefficient estimates (from Step 2–3).

    2. Simulate multivariate normally distributed datasets based on the collection of correlation matrices; each dataset is generated from one correlation matrix.

      (a) Simulate 100 (an adjustable parameter) datasets for each correlation matrix. Each dataset contains 1000 (an adjustable parameter) datapoints.

      (b) Dichotomize the outcomes and treatments based on the proportions of treatment and outcome in the empirical data (from Step 1–3).

      (c) Reassess the correlation coefficients in the simulated datasets based on the newly dichotomized variables, yielding the biserial \({\rho }_{YU\_sim}\) and biserial \({\rho }_{TU\_sim}\) (estimates of the latent correlations).

      (d) Perform suitable modeling for the simulated datasets. We used logistic regression in this example.

      (e) Output the means of the estimates (\({\beta }_{est\_sim}\)) from the model of each set of 100 datasets without inclusion of the unmeasured confounder, and the estimated true estimates (\({\beta }_{true\_sim}\)) with inclusion of the unmeasured confounder in the model (from Step 3-2d), along with the corresponding estimates of the latent correlations in the simulated datasets (from Step 3-2c).

  • Step 4. Obtain estimated true effects.

    1. Select a small range around your empirical estimate \({\beta }_{est\_emp}\) (from Step 1–2). We used \(\mathrm{ln}{\beta }_{est\_emp}\) ± 0.1.

    2. Subset the simulation outputs (from Step 3-2e) whose \({\beta }_{est\_sim}\) falls within this range around \({\beta }_{est\_emp}\) (from Step 4–1). The selected data include \({\beta }_{est\_sim}\), \({\beta }_{true\_sim}\), \({\rho }_{YU\_sim}\), and \({\rho }_{TU\_sim}\).

    3. Construct the L-table.

      (a) (Optional) Output the contour plot.

      (b) Use the subgroup data (from Step 4–2) to tabulate \({\beta }_{true\_sim}\) into the L-table, with \({\rho }_{TU\_sim}\) as the column label and \({\rho }_{YU\_sim}\) as the row label.

    4. Assess the estimated true effects.

Based on the L-table (from Step 4-3b), we can locate the estimated true effect \({\beta }_{true}\) by using \({\rho }_{YU\_ext}\) and \({\rho }_{TU\_ext}\) in the original context (from Step 2–2) to match the L-table’s labels. We used biserial correlations in this study.

Confidence intervals of the estimated true effect can be obtained by repeating step 4 with confidence limits from the empirical analysis replacing the effect estimate.
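
As an illustration of Step 4, a minimal R sketch is given below. The data frame `sim_out` and its column names (`beta_est_sim`, `beta_true_sim`, `rho_TU_sim`, `rho_YU_sim`) are hypothetical placeholders for the Step 3 output, not the interface of the released code; the numeric values are those of the 5-year MACCE example reported in the Results.

```r
# Minimal sketch of Step 4, assuming a hypothetical data frame `sim_out` with
# one row per simulated correlation matrix and placeholder columns
#   beta_est_sim, beta_true_sim, rho_TU_sim, rho_YU_sim.
beta_est_emp <- log(0.759)     # empirical log-odds estimate (5-year MACCE example)
tol          <- 0.1            # the ln(beta_est_emp) +/- 0.1 window used in the paper

near_emp <- subset(sim_out, abs(beta_est_sim - beta_est_emp) <= tol)

# Tabulate the estimated true effects: rows labeled by rho_YU, columns by rho_TU,
# averaging simulations that fall in the same (rounded) correlation cell.
L_table <- tapply(exp(near_emp$beta_true_sim),
                  list(round(near_emp$rho_YU_sim, 2),
                       round(near_emp$rho_TU_sim, 2)),
                  mean)

# Locate the cells closest to the externally derived correlations and average them.
rho_YU_ext <- 0.055
rho_TU_ext <- 0.586
cells <- L_table[abs(as.numeric(rownames(L_table)) - rho_YU_ext) <= 0.05,
                 abs(as.numeric(colnames(L_table)) - rho_TU_ext) <= 0.05,
                 drop = FALSE]
mean(cells, na.rm = TRUE)      # estimated true odds ratio
```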

2.2 Simulation calculation

The simulation uses a generalized linear model relationship between outcome and treatment that is adjusted for an unmeasured confounder. The latent outcome, treatment, and unmeasured variables are generated from a multivariate normal distribution. The covariance matrix is created from the correlation matrix, which is composed of the correlation coefficients guided by prior knowledge (Supplementary Figure 1). Because the latent correlation changes to an observed correlation after some variables are dichotomized in the simulation, we recommend users assign an adequate range surrounding the latent correlation coefficients to span the needed range of observed correlations after dichotomization. Within this range, we specify the correlation coefficients as equally spaced values.

The simulations use only correlation matrices that satisfy positive definiteness. By working with the correlation matrix, we are implicitly using standardized regression coefficients; in applications with continuous outcomes, transformation to the original coefficient scale will be required. The propensity score adjustment with matching weights assures covariate balance between the treatment and comparison groups and allows us to work with our simple trivariate normal setup in the simulation, so that we do not need to simulate other confounders. We also assume that the unmeasured confounder used in the simulation represents the composite effect of all unmeasured confounders.

Once the initial dataset is simulated, we dichotomize the outcome and treatment variables based on their corresponding proportions in the empirical data. We then calculate the latent correlation coefficients between the outcome and the unmeasured variable and between the treatment and the unmeasured variable, in order to use them in the results graph. These updated correlation coefficients are used to label the coordinates of the contour plot, or the columns and rows of the L-table. We simulated each dataset with 1000 observations based on a corresponding correlation matrix and ran 100 iterations of this data-generation procedure. The mean and median of the estimated effects with and without adjustment for the unmeasured confounder were obtained. We used the multivariate normal approach to generate the data, which can be easily generalized to other model types upon modification of the R code (Duan 2021).
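
A minimal sketch of this simulation step for a single correlation matrix is shown below; the released code (Duan 2021) is the reference implementation. The numeric inputs are placeholders standing in for the quantities derived in Steps 1–2 (here, the 5-year MACCE example values from the Results).

```r
# Minimal sketch of the simulation step for one correlation matrix (see
# Duan 2021 for the released code). Inputs are placeholders for Step 1-2 values.
library(MASS)                        # mvrnorm()
set.seed(2021)

p_trt  <- 0.457                      # empirical treatment proportion
p_out  <- 0.251                      # empirical outcome proportion
rho_YT <- -0.047                     # latent Y-T correlation (tetrachoric, from data)
rho_YU <-  0.062                     # latent Y-U correlation (external)
rho_TU <-  0.625                     # latent T-U correlation (external)

R <- matrix(c(1,      rho_YT, rho_YU,
              rho_YT, 1,      rho_TU,
              rho_YU, rho_TU, 1), 3, 3,
            dimnames = list(c("Y", "T", "U"), c("Y", "T", "U")))
stopifnot(min(eigen(R)$values) > 0)  # keep only positive definite matrices

one_run <- function(n = 1000) {
  Z   <- mvrnorm(n, mu = rep(0, 3), Sigma = R)
  out <- as.numeric(Z[, "Y"] > qnorm(1 - p_out))  # dichotomize at empirical proportions
  trt <- as.numeric(Z[, "T"] > qnorm(1 - p_trt))
  u   <- Z[, "U"]                                 # unmeasured confounder kept continuous
  c(est  = coef(glm(out ~ trt,     family = binomial))[["trt"]],  # confounder omitted
    true = coef(glm(out ~ trt + u, family = binomial))[["trt"]])  # confounder included
}

res <- t(replicate(100, one_run()))  # 100 datasets of 1000 observations each
colMeans(res)                        # mean log-odds ratios with and without U
```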

2.3 Data source

Our analysis is based on a retrospective cohort study using EHR data from the Kaiser Permanente Southern California (KPSC) Health System. KPSC provides care to a racially, ethnically and socio-economically diverse population that is broadly representative of the racial-ethnic groups of Southern California (Derose et al. 2013). The study protocol was approved by the KPSC Institutional Review Board (IRB). A waiver of informed consent was obtained due to the observational nature of the study.

Using procedure codes and International Classification of Diseases (ICD) 9/10 codes (Supplementary Table 1), we searched EHR data for adult patients (age ≥ 18 years) who underwent a revascularization procedure (CABG or PCI) between January 1st, 2006 and March 1st, 2015. We chose March 1st, 2015 as the end of the inclusion period to ensure that this cohort had at least 5 years of follow-up before March 2020, when healthcare utilization in California was affected by the state’s stay-at-home order, thereby mitigating any impact of the Covid-19 pandemic on hospitalization, mortality, and access to revascularization procedures. The index date was defined as the date of the first revascularization procedure the patient received. We excluded patients who were not KPSC members, who did not hold continuous one-year membership prior to the index date (allowing for a 45-day gap), or who underwent revascularization procedures prior to the index date. To identify patients with ischemic heart disease, two conditions were used: first, a principal diagnosis of coronary artery disease or angina; second, no prior history of acute myocardial infarction. For inclusion in this cohort, both conditions had to be met at least twice in outpatient visits or at least once in an inpatient admission within a year before the index date (Fig. 3).

Fig. 3 Study cohort flowchart

Using this cohort, we created sub-cohorts for the 1-year, 3-year, 5-year, and 10-year endpoints to study the treatment effects in the short and long term. To ensure sufficient follow-up time for each patient, patients were excluded from a sub-cohort analysis if their KPSC membership lapsed prior to the end of that study period: patients with insufficient follow-up time who experienced the endpoints would not be captured by our data, hence their exclusion from the particular sub-cohort analysis. Patients who experienced an outcome event in an earlier sub-cohort were considered as having an outcome event in the later sub-cohorts regardless of any change in their membership status. We brought the end of the inclusion period for the 10-year cohort forward to March 1st, 2010 to allow sufficient follow-up time. Patients’ disease history and outcomes were identified using ICD 9/10 codes (Supplementary Table 2).

We identified covariates in the following categories: baseline demographics, medical comorbidities, cardiac risk factors, and cardiac medication usage. Medical comorbidities and cardiac risk factors were collected using ICD codes for the year prior to the index date. Baseline concomitant medications were identified using outpatient pharmacy records. The study endpoints included all-cause mortality, hospitalization for either myocardial infarction or stroke, repeat revascularization, and the composite of major adverse cardiac and cerebrovascular events (MACCE) (Mohr et al. 2013), defined as any of the above endpoints by the end of 1 year, 3 years, 5 years, or 10 years from the index date.

Mortality data were pulled from a mortality data mart derived from multiple sources: the State of California’s death master files, Social Security Administration death master files, hospital death records, and insurance enrollment records. The endpoints for myocardial infarction and stroke were identified from a principal diagnosis in the inpatient setting. Repeat revascularization was identified as any revascularization procedure occurring after discharge from the first procedure.

2.4 External information

To locate the estimated true effects using the L-table, we need to identify potential unmeasured confounders and extract external information from available sources. The essential information is the correlation between the unmeasured confounder and the outcome and the correlation between the unmeasured confounder and the treatment; we need one L-table for each outcome. These correlation coefficients can be obtained from a pilot study or other relevant publications. Users may need to perform transformations or simulations to convert the source information into a suitable form. For instance, based on the available information, a user can transform a point-biserial correlation into a biserial correlation, or dichotomize continuous values in the simulation to recover the point-biserial scale.

We derived the correlation coefficients between the SYNTAX score and the outcomes from published results of the SYNTAX trials (Mohr et al. 2013; Serruys et al. 2009; Kappetein et al. 2011; Thuijs et al. 2019). In this empirical example, we considered the SYNTAX score to represent a linear combination of multiple omitted confounders. We chose to gather this information from RCTs because most sources of measured and unmeasured confounding are mitigated by the study design. In applications where a high-quality RCT is not available, a plausible range of correlations may be estimated using clinical judgement. We derived the correlation between the SYNTAX score and treatment from an observational study (Valle et al. 2019), which accounts for the behavioral effects of physicians’ and patients’ treatment choices. Subjective choices of this correlation may be used for sensitivity analysis or when suitable observational analyses are not available.
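
Where the external source reports only a cross-tabulation, the latent correlation can be estimated directly from the published counts. The sketch below assumes the polycor package; the counts shown are hypothetical placeholders, not the SYNTAX trial or Valle et al. (2019) data.

```r
# Hedged sketch: estimating a polychoric correlation from a published
# treatment-by-SYNTAX-category cross-tabulation. The counts are hypothetical
# placeholders; substitute the published table.
library(polycor)                     # polychor()

tab <- matrix(c(220, 150,  60,       # rows: treatment (PCI, CABG)
                 90, 160, 210),      # columns: SYNTAX category (low, intermediate, high)
              nrow = 2, byrow = TRUE,
              dimnames = list(c("PCI", "CABG"),
                              c("low", "intermediate", "high")))

polychor(tab)                        # latent correlation used as simulation input
```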

2.5 Statistical analysis

Descriptive statistics on patients’ characteristics, including demographics, comorbidities, and medication history, were reported by treatment group using frequencies and percentages. We calculated standardized mean differences before and after applying the matching weights. A difference of 0.10 or less was considered adequate balance between the two groups.

To resemble the enrollment patterns in RCTs and adjust for selection bias, we prepared the data with matching weights (Li and Greene 2013), a propensity score weighting method. Matching weights are a variant of the inverse probability weight, but the matching weight estimator assigns greater weight to individuals whose propensity score is close to 0.5 (i.e., the circumstance generated in a two-arm RCT with clinical equipoise). The underlying propensity score model used a logistic regression of treatment (dependent variable) on 57 baseline covariates (independent variables), which included age, sex, race, comorbidities, and baseline medication use.
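
A minimal sketch of the matching-weight calculation (Li and Greene 2013) is shown below. The data frame `dat`, the treatment indicator `cabg`, the outcome `macce_5yr`, and the abbreviated propensity formula are illustrative placeholders, not the study’s actual variable names or full covariate list.

```r
# Sketch of the matching-weight calculation (Li and Greene 2013). `dat`, `cabg`,
# `macce_5yr`, and `ps_formula` are hypothetical placeholders.
ps_formula <- cabg ~ age + sex + race                      # stand-in for the 57-covariate model
ps_fit <- glm(ps_formula, family = binomial, data = dat)   # propensity score model
e      <- fitted(ps_fit)                                   # estimated propensity scores
z      <- dat$cabg                                         # 1 = CABG, 0 = PCI

# Matching weight: min(e, 1 - e) divided by the probability of the treatment
# actually received; weights are largest near e = 0.5 (clinical equipoise).
dat$mw <- pmin(e, 1 - e) / (z * e + (1 - z) * (1 - e))

# Matching-weighted outcome model (e.g., 5-year MACCE):
glm(macce_5yr ~ cabg, family = binomial, data = dat, weights = mw)
```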

Logistic regression was performed to assess the treatment effects. We reported crude (unadjusted) and matching-weighted odds ratios, along with their 95% confidence intervals, p values, and E-values. A small E-value suggests that unmeasured confounding should be a concern (VanderWeele and Ding 2017). A p value of < 0.05 was the nominal level of significance.

Using the simulated datasets, we assessed the estimated effects with and without the unmeasured confounder in the model. Using bivariate interpolation for irregularly distributed data points (Akima 1978, 1996), we plotted the estimated true effects against the biserial correlation coefficients between outcome and unmeasured confounder and between treatment and unmeasured confounder; the pattern of the estimated true effect in relation to these correlations can be visualized in the contour plot. We tabulated these estimated true effects to form the L-table, similar to the assembly of a Chi-squared table, where the columns are labeled with the correlation coefficients between treatment and unmeasured confounder, and the rows are labeled with the correlation coefficients between outcome and unmeasured confounder. The user can adopt the L-table as a reference to locate a plausible range of estimated true effects given prior information about the correlations (\({\rho }_{YU\_ext}\) and \({\rho }_{TU\_ext}\)). The values bounded by the perimeter within the L-table represent the estimated true values of the causal effect on the scale of the applied analytic model. This effect is aligned with the correlations between unmeasured confounding and outcome on the one hand, and between unmeasured confounding and treatment—conditional on the measured covariates—on the other.
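
One way to produce such a contour plot in R, assuming the akima package (which implements Akima’s interpolation) and the hypothetical `near_emp` data frame from the earlier Step 4 sketch:

```r
# Contour plot of estimated true odds ratios over the two correlations,
# interpolated from irregularly spaced simulation output (hypothetical
# `near_emp` data frame from the Step 4 sketch).
library(akima)                                  # interp() for irregular grids

surf <- interp(x = near_emp$rho_TU_sim,         # biserial T-U correlation
               y = near_emp$rho_YU_sim,         # biserial Y-U correlation
               z = exp(near_emp$beta_true_sim), # estimated true odds ratio
               duplicate = "mean")              # average duplicated (x, y) points

filled.contour(surf$x, surf$y, surf$z,
               plot.title = title(
                 main = "Estimated true effect (odds ratio)",
                 xlab = "Correlation between treatment and unmeasured confounder",
                 ylab = "Correlation between outcome and unmeasured confounder"))
```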

Published studies may report varied findings. The polychoric correlation between treatment selection and the SYNTAX score derived from the external publication we used was 0.625. To explore how sensitive the L-table is to variation in its inputs, we performed a sensitivity analysis by assigning two different correlation coefficients (0.5 and 0.7) for the relationship between treatment and the unmeasured confounder in the simulation, and ran the models with the same procedures. All statistical analyses were conducted in SAS 9.4 (SAS Institute Inc., Cary, NC) and R version 4.1.0 (R Core Team 2021).

3 Results

We identified 12,216 adult KPSC members with stable ischemic heart disease who underwent PCI or CABG between January 1, 2006 and March 1, 2015. Among these patients, 5513 received CABG and 6703 received PCI. In this group, 11,298 patients maintained their memberships through the 5th year: 5158 received CABG and 6140 received PCI (Table 1). CABG patients were on average 2.5 years older than their PCI counterparts (p < 0.001). The proportion of men was higher in the CABG group than in the PCI group (77.5% vs. 73.5%, p < 0.001). Asians were more likely to undergo CABG (11.6%) than PCI (9.0%). In general, CABG patients had more comorbidities and used cardiac medications more frequently. The baseline characteristics were balanced between the CABG and PCI groups (SMD < 0.1) after propensity score adjustment with matching weights. We used the weighted cohorts for our analysis.

Table 1 Characteristics of the study cohort included at the 5-year endpoint

After propensity score adjustment, patients treated with CABG were less likely to experience MACCE, mortality, hospitalization for MI, or repeat revascularization (Table 2). CABG patients were more likely to be hospitalized due to stroke after the revascularization procedure, but this effect was not statistically significant. All E-values were small, suggesting that unmeasured confounding should be considered.

Table 2 Crude and adjusted treatment effects of CABG and PCI at the end of 1, 3, 5, and 10 years

We assessed the proportion of outcome and the proportion of treatment from the adjusted empirical data, and the correlation coefficients between each of the endpoints and the SYNTAX score (low, intermediate, high) based on the SYNTAX trials (Mohr et al. 2013; Serruys et al. 2009; Kappetein et al. 2011; Thuijs et al. 2019) (Supplementary Tables 3, 4). Using 5-year MACCE as an example, we first calculated the tetrachoric correlation coefficient between treatment and outcome from our data (\({\uprho }_{\mathrm{YT}}\) = − 0.047), and derived the polychoric correlation coefficients from an RCT (Mohr et al. 2013) (\({\uprho }_{\mathrm{YU}}\) = 0.062) and a population-based study (Valle et al. 2019) (\({\uprho }_{\mathrm{TU}}\) = 0.625). We then assigned these parameters to generate the simulated data, in which the treatment (45.7%) and outcome (25.1%) variables were dichotomized based on the empirical proportions.

The contour plot (Fig. 4) shows the estimated true effects when the estimated effects from simulation were within a close range of the estimated effect for MACCE at 5 years (OR = 0.759). A point or range of true effects can be identified from the correlation coefficients on the horizontal and vertical labels. This set of estimated true effects was tabulated to form the L-table (Supplementary Figure 2). We present an example of a partial L-table (Table 3) to illustrate how we identify the estimated true effects based on the corresponding latent (biserial) correlations: \({\uprho }_{\mathrm{TU}}\) = 0.586 and \({\uprho }_{\mathrm{YU}}\) = 0.055. We took the mean of the values bordered by the frame and reached the desired odds ratio: 0.511.

Fig. 4 Contour plot generated by the L-table method illustrating estimated true effects corresponding to specified correlations

Table 3 Sample of a partial L-table where the true effect is identified (MACCE at 5 years, \({OR}_{est}\)=0.759, \({\rho }_{TU}\)=0.586, \({\rho }_{YU}\)=0.055)

After the adjustments made with the L-table, we found that the estimated true effects all shifted in the direction that favors CABG as a much more effective procedure leading to better health outcomes than PCI, including hospitalization due to stroke at 5 years (OR [95% CI] 0.824 [0.649, 1.035]) (Table 2). These results suggest that the unmeasured confounder, disease severity, may have obscured the true contribution of CABG, because sicker people are more likely to receive CABG.

Different correlation inputs resulted in different L-table-adjusted true effects. For 5-year MACCE, our original estimated true odds ratio was 0.511 (95% CI [0.451, 0.538]). When we assigned \({\rho }_{TU}\hspace{0.17em}\)= 0.5 or 0.7 in the simulation, the L-table-adjusted true odds ratio became 0.45 (95% CI [0.397, 0.516]) or 0.517 (95% CI [0.459, 0.607]), respectively; the direction of the adjusted effect was the same. However, we found that hospitalization due to stroke at 5 years was no longer statistically significant (OR [95% CI] 0.824 [0.649, 1.035]) when we assigned \({\rho }_{TU}\hspace{0.17em}\)= 0.7 in the simulation (Table 4).

Table 4 Sensitivity analysis for variation of correlation inputs

4 Discussion

We introduce the L-table and illustrate how it can be used to make potential causal inferences using observational data. We have illustrated the application of the L-table in a real-world example, studying the safety and effectiveness of CABG vs. PCI for patients with ischemic heart disease at 1, 3, 5, and 10 years. We incorporated external information into the simulation with customizable parameters to locate the plausible range of estimated true effects. The foundation of the L-table is the correlation matrix for an ordinary linear estimate in simple or multivariate linear regression, but it can be applied to many generalized linear models.

We found CABG is associated with better safety and effectiveness than PCI for patients with stable ischemic heart disease. After L-table adjustment, we found that the Odds Ratio shifted to smaller values, suggesting that unmeasured confounding decreased CABG’s estimated advantage in standard analyses. We would have underestimated CABG’s greater protective effect if disease severity were not accounted for.

In RCTs, the correlation between treatment and unmeasured confounders is approximately zero by design. In observational studies, we found that the correlations between treatment selection and the SYNTAX score are similar in a US-based study (Valle et al. 2019) (which we used in this study) and in a South Korean study (Kim et al. 2010), suggesting that clinicians from different countries share treatment preferences. The L-table-adjusted effects vary depending on the simulation inputs; in this setting, sensitivity analysis may be more important than a literal interpretation of standard errors. Therefore, users are advised to address the generalizability of the external data they refer to.

Choosing between CABG and PCI is more complex in real life than in an RCT setting. Our results confirmed the findings from the RCTs and validated the generalizability of the inference from the SYNTAX trials to our study. RCTs have great internal validity but may have poor external validity. The strategy we deployed in this study, extracting information (i.e., correlations) from external sources (i.e., RCTs) and integrating that information into our observational data, enables us to ask the following question: do we see the same results in our system and population (in the real world) as found in RCTs?

Existing sensitivity tests for unmeasured confounders primarily work within a two-by-two-by-two table setting. Although this dichotomous-variable setting allows the application of convenient mathematical relationships, it does not always generalize to broader settings. The L-table approach allows continuous measures in each component and, by dichotomization or transformation, accommodates a broad range of measure types. We have set up an infrastructure to move into a richer class of problems and to help investigators broaden the application areas. In addition, conventional sensitivity analysis often only tells users how much unmeasured confounding is needed to explain away the effects or the statistical significance, while the L-table approach considers the plausible true effects; this can support cost-effectiveness analysis and other practical decision making. Furthermore, the L-table approach exploits existing knowledge about unmeasured confounders to adjust the estimated effects. Future work includes modifying the L-table framework for other analytical methods (e.g., time-to-event models, non-model-based randomization inference) or more complicated scenarios (e.g., reverse causality).

The L-table offers a flexible framework. The multivariate normal data-generating process provides a foundation for users to assign different functional forms. In addition, we assumed that the unmeasured confounding in the simulation is a composite influence of all unmeasured confounders that play a role in the causal relationships; consequently, the user needs only to deduce a single proxy based on her knowledge of the unmeasured confounding. Finally, with a generalized linear model functional form, users can apply different analytical approaches and modify the simulation method accordingly.

Our simulation produces datasets with properties comparable to those of the balanced (PS-adjusted) empirical data, but with the addition of a simulated unmeasured confounder (not available in the empirical data). Users are encouraged to perform a propensity score adjustment on their observational data. This step balances the treatment and control groups on measured confounders and allows the simple trivariate normal setup in the simulation. In this study, we used matching-weight-adjusted real-world data to emphasize causal inference for patients whose characteristics are such that each treatment has an equivalent chance of being chosen (i.e., we have equipoise).

A few limitations should be mentioned. First, our proposed method is based on multiple assumptions, although these assumptions can be modified by the user when adapting the L-table framework. Second, the L-table is simulation based, so it comes without a closed-form solution; further work is warranted to develop a mathematically supported theory. Third, the correlations between SYNTAX scores and outcomes are small. Although many studies have supported the utility of SYNTAX scores in the selection of revascularization strategies, contradictory findings have undermined the validity of such scores (He et al. 2020). Finally, the L-table-adjusted true effects are sensitive to the input parameters. We advise users to discuss the assumptions and limitations of their selected external sources and proxies.

5 Conclusion

We found that CABG is associated with better clinical outcomes than PCI for patients with stable ischemic heart disease. Our results confirmed the findings from the SYNTAX trials and validated the generalizability of the inference from the RCTs to our observational data. The L-table, built with customizable parameters, adaptable models, and modifiable precision, provides investigators with a more plausible value of the estimated true effect under the influence of unmeasured confounders than is currently possible with other sensitivity techniques. Our method lays out richer information and clearer directions than existing sensitivity tools, and can thus better support clinical decision-making. Finally, the L-table provides investigators with a deeper understanding of the likely treatment effects in the real world, and hence engenders greater confidence in study results based on both RCTs and observational data. We recommend using the L-table as a supplement to available sensitivity analyses.