Propensity score weighting for a continuous exposure with multilevel data


Propensity score methods (e.g., matching, weighting, subclassification) provide a statistical approach for balancing dissimilar exposure groups on baseline covariates. These methods were developed in the context of data with no hierarchical structure or clustering. Yet in many applications the data have a clustered structure that is of substantive importance, such as when individuals are nested within healthcare providers or within schools. Recent work has extended propensity score methods to a multilevel setting, primarily focusing on binary exposures. In this paper, we focus on propensity score weighting for a continuous, rather than binary, exposure in a multilevel setting. Using simulations, we compare several specifications of the propensity score: a random effects model, a fixed effects model, and a single-level model. Additionally, our simulations compare the performance of marginal versus cluster-mean stabilized propensity score weights. In our results, regression specifications that accounted for the multilevel structure reduced bias, particularly when cluster-level confounders were omitted. Furthermore, cluster-mean weights outperformed marginal weights.
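As a rough illustration of the weighting idea described above, the following Python sketch computes stabilized weights for a continuous exposure as a ratio of normal densities: a numerator that ignores the covariates and a denominator from a linear exposure model. Everything here is an assumption made for illustration rather than the procedure used in the paper: the function names (`normal_pdf`, `stabilized_weights`), the normal-density form, the use of ordinary least squares with cluster indicator columns as the fixed effects specification, the reading of "cluster-mean" weights as centering the numerator density at each cluster's mean exposure, and the simulated data. A random effects specification would require a mixed-model fit and is not shown.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def stabilized_weights(exposure, covariates, cluster,
                       numerator="marginal", fixed_effects=True):
    """Illustrative stabilized weights: numerator density / denominator density.

    Assumed (hypothetical) definitions:
      denominator: normal density of the exposure around OLS fitted values,
                   with cluster indicator columns if fixed_effects=True,
                   otherwise a single-level model with one intercept;
      numerator:   normal density centered at the overall mean ("marginal")
                   or at each unit's cluster mean ("cluster_mean").
    """
    exposure = np.asarray(exposure, dtype=float)
    n = exposure.shape[0]
    labels, cluster_idx = np.unique(np.asarray(cluster).ravel(),
                                    return_inverse=True)
    cluster_idx = cluster_idx.ravel()

    # Denominator: exposure model conditional on covariates.
    if fixed_effects:
        intercepts = np.eye(labels.size)[cluster_idx]  # one column per cluster
    else:
        intercepts = np.ones((n, 1))                   # single-level model
    design = np.column_stack([intercepts, covariates])
    coef, *_ = np.linalg.lstsq(design, exposure, rcond=None)
    fitted = design @ coef
    resid_df = n - np.linalg.matrix_rank(design)
    resid_sd = np.sqrt(np.sum((exposure - fitted) ** 2) / resid_df)
    denom = normal_pdf(exposure, fitted, resid_sd)

    # Numerator: exposure density without covariates.
    if numerator == "marginal":
        center = np.full(n, exposure.mean())
    elif numerator == "cluster_mean":
        sums = np.bincount(cluster_idx, weights=exposure)
        counts = np.bincount(cluster_idx)
        center = (sums / counts)[cluster_idx]
    else:
        raise ValueError("numerator must be 'marginal' or 'cluster_mean'")
    num = normal_pdf(exposure, center, np.std(exposure - center))

    return num / denom


# Simulated two-level data (hypothetical): 20 clusters of 30 units each.
rng = np.random.default_rng(0)
n_clusters, n_per = 20, 30
cluster = np.repeat(np.arange(n_clusters), n_per)
u = rng.normal(size=n_clusters)[cluster]            # cluster-level effect
x = rng.normal(size=(n_clusters * n_per, 2))        # unit-level covariates
exposure = 0.5 * x[:, 0] - 0.3 * x[:, 1] + u + rng.normal(size=n_clusters * n_per)

w_marginal = stabilized_weights(exposure, x, cluster, numerator="marginal")
w_cluster = stabilized_weights(exposure, x, cluster, numerator="cluster_mean")
```

Setting `fixed_effects=False` gives the single-level specification under the same assumptions, so the two denominators can be compared on identical data.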





This work was conducted while Megan Schuler was a post-doctoral fellow and Wanghuan Chu was a doctoral student at the Pennsylvania State University. This work was funded by awards P50 DA010075, P50 DA039838, and T32 DA017629 from the National Institute on Drug Abuse, by award K01 ES025437 from the National Institutes of Health Big Data to Knowledge initiative, and by IGERT award DGE-1144860 from the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Author information



Corresponding author

Correspondence to Megan S. Schuler.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 415 kb)


About this article


Cite this article

Schuler, M.S., Chu, W. & Coffman, D. Propensity score weighting for a continuous exposure with multilevel data. Health Serv Outcomes Res Method 16, 271–292 (2016).



  • Propensity score
  • Continuous exposure
  • Multilevel data
  • Observational study