Abstract
Purpose of Review
Propensity score methods have become commonplace in pharmacoepidemiology over the past decade. Their adoption has confronted formidable obstacles that arise from pharmacoepidemiology’s reliance on large healthcare databases of considerable heterogeneity and complexity. These include identifying clinically meaningful samples, defining treatment comparisons, and measuring covariates in ways that respect sound epidemiologic study design. Additional complexities involve correctly modeling treatment decisions in the face of variation in healthcare practice and dealing with missing information and unmeasured confounding. In this review, we examine the application of propensity score methods in pharmacoepidemiology with particular attention to these and other issues, with an eye towards standards of practice, recent methodological advances, and opportunities for future progress.
Recent Findings
Propensity score methods have matured in ways that can advance comparative effectiveness and safety research in pharmacoepidemiology. These include natural extensions for categorical treatments, matching algorithms that can optimize sample size given design constraints, weighting estimators that asymptotically target matched and overlap samples, and the incorporation of machine learning to aid in covariate selection and model building.
Summary
These recent and encouraging advances should be further evaluated through simulation and empirical studies, but nonetheless represent a bright path ahead for the observational study of treatment benefits and harms.
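As a rough illustration of the weighting estimators mentioned under Recent Findings, the sketch below computes weights that target the overall population (inverse probability of treatment weights), a matched-like sample (matching weights), and the overlap population (overlap weights) from an already estimated propensity score. It is a minimal sketch, not the authors' implementation: the function names `balancing_weights` and `weighted_mean_difference` are hypothetical, treatment is assumed binary, and the formulas follow the common definitions of these weights in the weighting literature.

```python
def balancing_weights(treated, ps, target="overlap"):
    """Return one weight per subject for the chosen target population.

    treated: iterable of 0/1 treatment indicators
    ps:      iterable of estimated propensity scores, P(treated = 1 | covariates)
    target:  "ate" (IPTW), "matching" (matching weights), or "overlap"
    """
    weights = []
    for t, e in zip(treated, ps):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Probability of the treatment the subject actually received.
        p_received = e if t == 1 else 1.0 - e
        if target == "ate":
            tilt = 1.0                # whole study population
        elif target == "matching":
            tilt = min(e, 1.0 - e)    # weighting analogue of 1:1 pair matching
        elif target == "overlap":
            tilt = e * (1.0 - e)      # emphasizes subjects with scores near 0.5
        else:
            raise ValueError("unknown target: " + repr(target))
        weights.append(tilt / p_received)
    return weights


def weighted_mean_difference(outcome, treated, weights):
    """Difference in weighted outcome means, treated minus comparison."""
    def wmean(group):
        num = sum(w * y for y, t, w in zip(outcome, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den
    return wmean(1) - wmean(0)
```

Under these definitions the overlap weight reduces to 1 − e for treated subjects and e for comparison subjects, so no weight can exceed 1. Inverse probability of treatment weights, by contrast, grow without bound as the score approaches 0 or 1.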
Funding
Ian Schmid reports grants from the US Department of Education Institute of Education Sciences during the conduct of the study. Elizabeth A. Stuart reports grants from the National Institute of Mental Health during the conduct of the study, grants from the Patient Centered Outcomes Research Institute, and grants from the US Department of Education, Institute of Education Sciences, outside the submitted work.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article contains no studies with human or animal subjects performed by any of the authors.
Additional information
This article is part of the Topical Collection on Pharmacoepidemiology
Cite this article
Jackson, J.W., Schmid, I. & Stuart, E.A. Propensity Scores in Pharmacoepidemiology: Beyond the Horizon. Curr Epidemiol Rep 4, 271–280 (2017). https://doi.org/10.1007/s40471-017-0131-y
DOI: https://doi.org/10.1007/s40471-017-0131-y