Using entropy balancing to strengthen an observational cohort study design: lessons learned from an evaluation of a complex multi-state federal demonstration

Health Services and Outcomes Research Methodology

Abstract

We evaluated a patient-centered medical home demonstration sponsored by the Centers for Medicare & Medicaid Services using a quasi-experimental pre-post design with a comparison group. Traditional propensity score weighting failed to achieve balance (exchangeability) between the two groups on several critical characteristics. In response, we incorporated a relatively new alternative known as entropy balancing. Our objective is to share lessons learned from using entropy balancing in a quasi-experimental study design. We document the advantages and challenges of using entropy balancing. We also describe a set of best practices, and we present a series of illustrative analyses that empirically demonstrate the performance of entropy balancing relative to traditional propensity score weighting. We compare alternative approaches based on: (i) covariate balance (e.g., standardized differences); (ii) overlap in conditional treatment probabilities; and (iii) the distribution of weights. Our comparison of overlap is based on a novel approach we developed that uses entropy balancing weights to calculate a pseudo-propensity score. In many situations, entropy balancing provides markedly better covariate balance than traditional propensity score weighting methods. Entropy balancing is also preferred because it does not require extensive iterative manual searching for an optimal propensity score specification. However, we demonstrate that there are some situations where entropy balancing “fails”. Specifically, there are instances where entropy balancing achieves adequate covariate balance only by using a distribution of weights that dramatically up-weights a small set of observations, giving them a disproportionately large and undesirable influence.
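To make the ideas in the abstract concrete, the following is a minimal Python sketch of entropy balancing on first moments (covariate means), followed by two of the diagnostics mentioned above: standardized differences and the distribution of weights. It is an illustration under stated assumptions, not the authors' implementation (the study cites the Stata package ebalance). The toy data, the function names, the gradient-descent solver, and the choice to hold the unweighted variances fixed in the standardized difference are all assumptions made for this example.

```python
import math

def _softmax_weights(lam, D):
    """Weights proportional to exp(lam . d_i), plus the dual loss log(sum(exp(lam . d_i)))."""
    scores = [sum(l * d for l, d in zip(lam, row)) for row in D]
    top = max(scores)  # subtract the max for numerical stability
    raw = [math.exp(s - top) for s in scores]
    total = sum(raw)
    return [r / total for r in raw], top + math.log(total)

def entropy_balance(X_comp, target_means, tol=1e-7, max_iter=5000):
    """Weights (summing to 1) on comparison rows whose weighted means match target_means.

    Solves the convex dual by gradient descent with backtracking; this solver
    is an illustrative stand-in, not the algorithm used by any package.
    """
    n, k = len(X_comp), len(target_means)
    sds = []
    for j in range(k):  # scale each covariate so one step size suits all of them
        col = [row[j] for row in X_comp]
        mu = sum(col) / n
        sds.append(math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0)
    # Center on the target: the balance constraint becomes "weighted mean of D is 0".
    D = [[(row[j] - target_means[j]) / sds[j] for j in range(k)] for row in X_comp]
    lam = [0.0] * k
    w, loss = _softmax_weights(lam, D)
    for _ in range(max_iter):
        grad = [sum(wi * row[j] for wi, row in zip(w, D)) for j in range(k)]
        gnorm2 = sum(g * g for g in grad)
        if math.sqrt(gnorm2) < tol:
            break
        step = 1.0
        while True:  # halve the step until the dual loss decreases enough
            trial = [l - step * g for l, g in zip(lam, grad)]
            w_new, loss_new = _softmax_weights(trial, D)
            if loss_new <= loss - 1e-4 * step * gnorm2 or step < 1e-10:
                break
            step *= 0.5
        lam, w, loss = trial, w_new, loss_new
    return w

def weighted_mean(values, w):
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

def variance(values):
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / (len(values) - 1)

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized difference in means with a pooled-variance denominator."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def effective_sample_size(w):
    """Kish effective sample size; small values signal a few dominant weights."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

# Hypothetical toy data: (age, number of chronic conditions) per beneficiary.
treated = [(72, 3), (78, 4), (74, 2), (80, 3)]
comp = [(65, 1), (70, 2), (75, 3), (80, 4), (68, 0), (72, 5), (85, 2), (66, 3)]
names = ["age", "chronic_conditions"]

target = [sum(r[j] for r in treated) / len(treated) for j in range(len(names))]
weights = entropy_balance(comp, target)
uniform = [1.0 / len(comp)] * len(comp)

for j, name in enumerate(names):
    t_col = [r[j] for r in treated]
    c_col = [r[j] for r in comp]
    # Assumption: the denominator uses unweighted variances before and after weighting.
    before = std_diff(target[j], weighted_mean(c_col, uniform), variance(t_col), variance(c_col))
    after = std_diff(target[j], weighted_mean(c_col, weights), variance(t_col), variance(c_col))
    print(f"{name}: standardized difference before={before:.3f}, after={after:.3f}")

ess = effective_sample_size(weights)
print(f"effective sample size: {ess:.2f} of {len(comp)}; largest weight share: {max(weights):.2f}")
```

Reading the last line alongside the balance statistics mirrors the caution in the abstract: near-zero standardized differences are reassuring only if the effective sample size has not collapsed and no single comparison observation carries a large share of the total weight.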

Notes

  1. Exchangeability refers to a technical condition where treatment and comparison groups are equivalent in terms of all observed and unobserved dimensions that are related to the receipt of treatment and the outcome of interest. Covariate balance refers to an assessment of the similarity of all observed dimensions across treatment and comparison observations. Thus, assessments of covariate balance indicate whether exchangeability is a reasonable assumption to make.

  2. For an overview of alternative propensity score methods, we refer the reader to Austin (2011).

  3. Brookhart et al. (2006) recommend including anything correlated with the outcome, and excluding anything that is only correlated with treatment assignment. However, beyond this there is little guidance.

  4. In the MAPCP Demonstration group we could perfectly identify practices. However, in the comparison groups, we were only able to identify groups of providers within a tax identification number. Thus, it was difficult to compare practice size calculations across the comparison group and MAPCP Demonstration group.

  5. R has packages that can be used to do this (twang and optmatch).

References

  • Austin, P.C.: The relative ability of different propensity score methods to balance measured covariates between treated and untreated subjects in observational studies. Med. Decis. Mak. 29, 661–667 (2009a)

  • Austin, P.C.: Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Commun. Stat.-Simul. Comput. 38, 1228–1234 (2009b)

  • Austin, P.C.: Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat. Med. 28, 3083–3107 (2009c)

  • Austin, P.C.: The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute reductions) in observational studies. Stat. Med. 29, 2137–2148 (2010)

  • Austin, P.C.: An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46, 399–424 (2011)

  • Austin, P.C., Mamdani, M.M., Stukel, T.A., Anderson, G.M., Tu, J.V.: The use of the propensity score for estimating treatment effects: administrative versus clinical data. Stat. Med. 24, 1563–1578 (2005)

  • Austin, P.C., Mamdani, M.M.: A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat. Med. 25, 2084–2106 (2006)

  • Austin, P.C., Stuart, E.A.: Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat. Med. 34, 3661–3679 (2015)

  • Brookhart, M.A., Schneeweiss, S., Rothman, K.J., Glynn, R.J., Avorn, J., Sturmer, T.: Variable selection for propensity score models. Am. J. Epidemiol. 163, 1149–1156 (2006)

  • Diamond, A., Sekhon, J.S.: Genetic matching for estimating causal effects: a general multivariate matching method for achieving balance in observational studies. Rev. Econ. Stat. 95, 932–945 (2013)

  • Flury, B.K., Riedwyl, H.: Standard distance in univariate and multivariate analysis. Am. Stat. 40, 249–251 (1986)

  • Hainmueller, J.: Entropy balancing for causal effects: a multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20, 25–46 (2012)

  • Hainmueller, J., Xu, Y.: Ebalance: a Stata package for entropy balancing. J. Stat. Softw. 54, 1–18 (2013)

  • Hill, J.L.: Bayesian nonparametric modeling for causal inference. J. Comput. Gr. Stat. 20, 217–240 (2011)

  • Ho, D.E., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236 (2007)

  • Imai, K., Ratkovic, M.: Covariate balancing propensity score. J. R. Stat. Soc. Ser. B 76, 243–263 (2014)

  • Imbens, G.W.: Nonparametric estimation of average treatment effects under exogeneity: a review. Rev. Econ. Stat. 86, 4–29 (2004)

  • McCaffrey, D.F., Ridgeway, G., Morral, A.R.: Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol. Methods 9, 403–425 (2004)

  • Robins, J., Ritov, Y.: Towards a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat. Med. 16, 285–319 (1997)

  • Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983)

  • Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79, 516–524 (1984)

  • Rubin, D.B.: Estimating the causal effects of treatments in randomized and non-randomized studies. J. Educ. Psychol. 66, 688–701 (1974)

  • Rubin, D.B.: Assignment to a treatment group on the basis of a covariate. J. Educ. Stat. 2, 1–26 (1977)

  • Rubin, D.B.: Using propensity scores to help design observational studies: application to the tobacco litigation. Health Serv. Outcomes Res. Method. 2, 169–188 (2001)

  • Sturmer, T., Rothman, K., Avorn, J., Glynn, R.: Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution–A simulation study. Am. J. Epidemiol. 172, 843–854 (2010)

  • StataCorp.: Stata Statistical Software: Release 14. College Station, TX: StataCorp LP (2015)

  • Zhao, Q., Percival, D.: Entropy balancing is doubly robust. J. Causal Inference 5, 1–23 (2016)

Funding

This research was supported by the Centers for Medicare & Medicaid Services under contract number HHSM-500-2010-00021I. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the U.S. Department of Health and Human Services or any of its agencies.

Author information

Corresponding author

Correspondence to William J. Parish.

Ethics declarations

Conflicts of interest

None.

Appendix

See Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18.

Fig. 5

Propensity score weighting versus entropy balancing in State A using comparison group 1

Fig. 6

Propensity score weighting versus entropy balancing in State A using comparison group 2

Fig. 7

Propensity score weighting versus entropy balancing in State B using comparison group 1

Fig. 8

Propensity score weighting versus entropy balancing in State B using comparison group 2

Fig. 9

Propensity score weighting versus entropy balancing in State C using comparison group 1

Fig. 10

Propensity score weighting versus entropy balancing in State C using comparison group 2

Fig. 11

Propensity score weighting versus entropy balancing in State D using comparison group 1

Fig. 12

Propensity score weighting versus entropy balancing in State D using comparison group 2

Fig. 13

Propensity score weighting versus entropy balancing in State E using comparison group 1

Fig. 14

Propensity score weighting versus entropy balancing in State F using comparison group 1

Fig. 15

Propensity score weighting versus entropy balancing in State G using comparison group 1

Fig. 16

Propensity score weighting versus entropy balancing in State G using comparison group 2

Fig. 17

Propensity score weighting versus entropy balancing in State H using comparison group 1

Fig. 18

Propensity score weighting versus entropy balancing in State H using comparison group 2

About this article

Cite this article

Parish, W.J., Keyes, V., Beadles, C. et al. Using entropy balancing to strengthen an observational cohort study design: lessons learned from an evaluation of a complex multi-state federal demonstration. Health Serv Outcomes Res Method 18, 17–46 (2018). https://doi.org/10.1007/s10742-017-0174-z

  • DOI: https://doi.org/10.1007/s10742-017-0174-z
