Propensity score methods (e.g., matching, weighting, subclassification) provide a statistical approach for balancing dissimilar exposure groups on baseline covariates. These methods were developed in the context of data with no hierarchical structure or clustering. Yet in many applications the data have a clustered structure that is of substantive importance, such as when individuals are nested within healthcare providers or within schools. Recent work has extended propensity score methods to a multilevel setting, primarily focusing on binary exposures. In this paper, we focus on propensity score weighting for a continuous, rather than binary, exposure in a multilevel setting. Using simulations, we compare several specifications of the propensity score: a random effects model, a fixed effects model, and a single-level model. Additionally, our simulations compare the performance of marginal versus cluster-mean stabilized propensity score weights. In our simulations, regression specifications that accounted for the multilevel structure reduced bias, particularly when cluster-level confounders were omitted. Furthermore, cluster-mean stabilized weights outperformed marginal stabilized weights.
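To make the weighting idea concrete, the following is a minimal sketch of stabilized weights for a continuous exposure with clustered data. It is not the authors' implementation. It assumes a normal exposure model fitted by ordinary least squares, and it covers only the fixed effects (cluster indicators) and single-level specifications; a random effects specification would need a mixed-model library and is omitted. The "cluster-mean" numerator here, which centers the numerator density on each cluster's mean exposure, is one plausible reading of the term and is an assumption. All function names, parameters, and the simulated data are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))


def ols_fitted(design, y):
    """Least-squares fitted values and residual standard deviation."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted
    dof = max(len(y) - np.linalg.matrix_rank(design), 1)
    return fitted, np.sqrt(resid @ resid / dof)


def stabilized_weights(a, x, cluster, fixed_effects=True, numerator="marginal"):
    """Stabilized weights: numerator density / conditional density of exposure.

    a        : continuous exposure, shape (n,)
    x        : covariates, shape (n,) or (n, p)
    cluster  : cluster labels, shape (n,)
    Assumption: the exposure is normal given the covariates.
    """
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(a), -1)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    idx = idx.ravel()

    if fixed_effects:
        # One indicator column per cluster stands in for the intercept.
        design = np.hstack([np.eye(len(labels))[idx], x])
    else:
        # Single-level model: common intercept, clustering ignored.
        design = np.hstack([np.ones((len(a), 1)), x])

    fitted, sd = ols_fitted(design, a)
    denom = normal_pdf(a, fitted, sd)

    if numerator == "marginal":
        num_mean = np.full(len(a), a.mean())
    elif numerator == "cluster_mean":
        # Assumed definition: center the numerator on each cluster's mean.
        cluster_means = np.bincount(idx, weights=a) / np.bincount(idx)
        num_mean = cluster_means[idx]
    else:
        raise ValueError("numerator must be 'marginal' or 'cluster_mean'")

    num_sd = np.std(a - num_mean, ddof=1)
    return normal_pdf(a, num_mean, num_sd) / denom


def simulate_clustered_exposure(n_clusters=30, n_per_cluster=40, seed=0):
    """Hypothetical data: exposure depends on x and a cluster-level effect."""
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
    u = rng.normal(size=n_clusters)[cluster]  # unobserved cluster effect
    x = rng.normal(size=cluster.size)
    a = 0.5 * x + u + rng.normal(size=cluster.size)
    return a, x, cluster


if __name__ == "__main__":
    a, x, cluster = simulate_clustered_exposure()
    for kind in ("marginal", "cluster_mean"):
        w = stabilized_weights(a, x, cluster, numerator=kind)
        print(kind, "mean weight:", w.mean(), "max weight:", w.max())
```

In this sketch the weights would then be used in a weighted outcome regression on the exposure. Comparing the weight distributions under `fixed_effects=True` and `fixed_effects=False` gives a rough sense of how much the cluster-level structure changes the estimated exposure model.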
This work was conducted while Megan Schuler was a postdoctoral fellow and Wanghuan Chu was a doctoral student at the Pennsylvania State University. This work was funded by awards P50 DA010075, P50 DA039838, and T32 DA017629 from the National Institute on Drug Abuse, K01 ES025437 from the National Institutes of Health Big Data to Knowledge initiative, and IGERT award DGE-1144860 from the National Science Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Schuler, M.S., Chu, W. & Coffman, D. Propensity score weighting for a continuous exposure with multilevel data. Health Serv Outcomes Res Method 16, 271–292 (2016). https://doi.org/10.1007/s10742-016-0157-5
Keywords
- Propensity score
- Continuous exposure
- Multilevel data
- Observational study