Health Services and Outcomes Research Methodology, Volume 16, Issue 4, pp 271–292

Propensity score weighting for a continuous exposure with multilevel data

  • Megan S. Schuler
  • Wanghuan Chu
  • Donna Coffman


Propensity score methods (e.g., matching, weighting, subclassification) provide a statistical approach for balancing dissimilar exposure groups on baseline covariates. These methods were originally developed for data with no hierarchical structure or clustering. Yet in many applications the data have a clustered structure that is of substantive importance, such as when individuals are nested within healthcare providers or within schools. Recent work has extended propensity score methods to multilevel settings, focusing primarily on binary exposures. In this paper, we focus on propensity score weighting for a continuous, rather than binary, exposure in a multilevel setting. Using simulations, we compare several specifications of the propensity score model: a random-effects model, a fixed-effects model, and a single-level model. Our simulations also compare the performance of marginal versus cluster-mean stabilized propensity score weights. In our results, specifications that accounted for the multilevel structure reduced bias, particularly when cluster-level confounders were omitted. Furthermore, cluster-mean weights outperformed marginal weights.
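The abstract names the weighting schemes without giving formulas. As a minimal sketch, and not the authors' implementation, assume a normal linear model for the continuous exposure A given covariates X (a generalized propensity score). A stabilized weight is then w_i = f_num(A_i) / f(A_i | X_i). The code contrasts a single-level denominator with a fixed-effects denominator (one indicator per cluster), and a marginal numerator with a cluster-mean numerator. The function names, the simulated data, and the exact form of the cluster-mean numerator (a normal density centred at each cluster's mean exposure, with a pooled within-cluster SD) are hypothetical assumptions made for illustration. A random-effects denominator would replace the cluster indicators with a random intercept fitted by a mixed-model package such as lme4 in R. That variant is not shown here.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Normal density N(mean, sd^2), evaluated elementwise."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))


def cluster_index(cluster):
    """Map arbitrary cluster labels to 0..J-1; return (J, index array)."""
    labels, idx = np.unique(cluster, return_inverse=True)
    return len(labels), idx


def exposure_density(a, x, cluster, fixed_effects=True):
    """Denominator f(A | X) from a normal linear model for the exposure.

    fixed_effects=True: covariates plus one indicator per cluster
    (cluster-specific intercepts). fixed_effects=False: single-level model
    with a common intercept that ignores clustering.
    """
    n = len(a)
    if fixed_effects:
        n_clusters, idx = cluster_index(cluster)
        design = np.column_stack([x, np.eye(n_clusters)[idx]])
    else:
        design = np.column_stack([np.ones(n), x])
    beta, _, rank, _ = np.linalg.lstsq(design, a, rcond=None)
    fitted = design @ beta
    resid = a - fitted
    # residual SD, using the effective rank of the design for degrees of freedom
    sd = np.sqrt(resid @ resid / (n - rank))
    return normal_pdf(a, fitted, sd)


def stabilized_weights(a, x, cluster, numerator="cluster_mean", fixed_effects=True):
    """Stabilized weights w_i = f_num(A_i) / f(A_i | X_i) (hypothetical helper)."""
    denom = exposure_density(a, x, cluster, fixed_effects)
    if numerator == "marginal":
        # marginal numerator: one normal density fitted to the whole sample
        num = normal_pdf(a, a.mean(), a.std(ddof=1))
    elif numerator == "cluster_mean":
        # ASSUMED cluster-mean numerator: normal density centred at each
        # cluster's mean exposure, with a pooled within-cluster SD
        n_clusters, idx = cluster_index(cluster)
        means = np.bincount(idx, weights=a) / np.bincount(idx)
        dev = a - means[idx]
        sd = np.sqrt(dev @ dev / (len(a) - n_clusters))
        num = normal_pdf(a, means[idx], sd)
    else:
        raise ValueError("numerator must be 'marginal' or 'cluster_mean'")
    return num / denom


# Hypothetical simulated data: 30 clusters of 20 units, with a cluster-level
# confounder u that is NOT passed to the exposure model.
rng = np.random.default_rng(2016)
cluster = np.repeat(np.arange(30), 20)
u = rng.normal(size=30)[cluster]
x = rng.normal(size=cluster.size)
a = 0.5 * x + u + rng.normal(size=cluster.size)

# Fixed-effects denominator with cluster-mean numerator vs single-level
# denominator with marginal numerator.
w_fe = stabilized_weights(a, x, cluster, "cluster_mean", fixed_effects=True)
w_sl = stabilized_weights(a, x, cluster, "marginal", fixed_effects=False)
print(w_fe.mean(), w_sl.mean())
```

In this toy setup, u is constant within each cluster. The cluster indicators in the fixed-effects denominator therefore absorb it, while the single-level model omits it. This loosely mirrors the abstract's point about omitted cluster-level confounders, but it is an illustration and does not replicate the paper's simulations.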


Keywords

Propensity score; Continuous exposure; Multilevel data; Observational study



This work was conducted while Megan Schuler was a postdoctoral fellow and Wanghuan Chu was a doctoral student at the Pennsylvania State University. This work was funded by awards P50 DA010075, P50 DA039838, and T32 DA017629 from the National Institute on Drug Abuse; award K01 ES025437 from the National Institutes of Health Big Data to Knowledge initiative; and IGERT award DGE-1144860 from the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1 (DOCX 415 kb)



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Health Care Policy, Harvard Medical School, Boston, USA
  2. Google, Inc., Mountain View, USA
  3. Department of Epidemiology and Biostatistics, Temple University, Philadelphia, USA
