Specifying Multilevel Mixture Selection Models in Propensity Score Analysis

  • Jee-Seon Kim
  • Youmi Suk
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


Causal inference with observational data is challenging because treatment assignment is often not random, and people may have different reasons for receiving or being assigned to the treatment. Moreover, the analyst may not have access to all of the important variables and may therefore face omitted variable bias as well as selection bias in nonexperimental studies. Fixed effects models are known to be robust against unobserved cluster-level variables, whereas random effects models yield biased parameter estimates in the presence of omitted variables. This study further investigates the properties of fixed effects models as an alternative to the common random effects models for identifying and classifying subpopulations, or “latent classes,” when selection or outcome processes are heterogeneous. A recent study by Suk and Kim (2018) found that linear probability models outperform standard logistic selection models in extracting the correct number of latent classes, and the authors continue to search for optimal specifications of mixture selection models across different conditions, such as strong and weak selection and various numbers of clusters and cluster sizes. The results show that fixed effects models outperform random effects models in classifying units and estimating treatment effects when cluster size is small.
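As a rough illustration of why a fixed effects specification can guard against an unobserved cluster-level variable in a selection model, the following sketch fits a linear probability model for treatment assignment in two ways: with cluster fixed effects (via within-cluster demeaning) and as a pooled regression that ignores clusters. It is not the authors' simulation design and involves no mixture or latent class component; the pooled fit is only a simple stand-in comparison, not a random effects model. The data-generating values (500 clusters of size 10, a true slope of 0.05, the strength of the omitted cluster variable) and all function names are assumptions chosen for illustration.

```python
import numpy as np


def within_demean(values, cluster_ids):
    """Subtract each cluster's mean from its members (the 'within' transformation)."""
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return values - (sums / counts)[inverse]


def fixed_effects_lpm_slope(x, z, cluster_ids):
    """Slope of a linear probability model of z on x with cluster fixed effects."""
    xd = within_demean(x, cluster_ids)
    zd = within_demean(z, cluster_ids)
    return float(xd @ zd / (xd @ xd))


def pooled_lpm_slope(x, z):
    """Slope of a linear probability model of z on x that ignores clustering."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    zc = np.asarray(z, dtype=float) - np.mean(z)
    return float(xc @ zc / (xc @ xc))


# Hypothetical simulated data (all numeric settings are illustrative assumptions).
rng = np.random.default_rng(0)
n_clusters, cluster_size = 500, 10
cluster_ids = np.repeat(np.arange(n_clusters), cluster_size)

u = rng.normal(size=n_clusters)                    # unobserved cluster-level variable
x = 0.8 * u[cluster_ids] + rng.normal(size=cluster_ids.size)  # observed covariate, correlated with u

true_slope = 0.05
# Selection probability is linear in x and u; clipping only guards against
# rare values outside (0, 1).
p = np.clip(0.5 + true_slope * x + 0.10 * u[cluster_ids], 0.02, 0.98)
z = rng.binomial(1, p)                             # treatment indicator

fe_slope = fixed_effects_lpm_slope(x, z, cluster_ids)
pooled_slope = pooled_lpm_slope(x, z)

print(f"true slope:          {true_slope:.3f}")
print(f"fixed effects slope: {fe_slope:.3f}")
print(f"pooled slope:        {pooled_slope:.3f}")
```

Under these assumed settings, the within-cluster demeaning removes anything constant within a cluster, including the omitted variable, so the fixed effects slope should land near the true value, while the pooled slope absorbs part of the omitted variable's effect.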


Causal inference; Finite mixture modeling; Latent class analysis; Selection bias; Balancing scores; Heterogeneous selection and treatment effects; Fixed-effects and Random-effects models; Hierarchical linear modeling


  1. Arpino, B., & Mealli, F. (2011). The specification of the propensity score in multilevel observational studies. Computational Statistics & Data Analysis, 55, 1770–1780.
  2. Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
  3. Clogg, C. C. (1995). Latent class models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–359). Boston, MA: Springer.
  4. Gui, R., Meierer, M., & Algesheimer, R. (2017). REndo: Fitting linear models with endogenous regressors using latent instrumental variables. R package version 1.3.
  5. Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
  6. Hausman, J. A., & Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica, 49, 1377–1398.
  7. Hong, G., & Hong, Y. (2009). Reading instruction time and homogeneous grouping in kindergarten: An application of marginal mean weighting through stratification. Educational Evaluation and Policy Analysis, 31, 54–81.
  8. Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 101, 901–910.
  9. Kim, J. S., & Frees, E. W. (2007). Multilevel modeling with correlated effects. Psychometrika, 72, 505–533.
  10. Kim, Y., Lubanski, S. A., & Steiner, P. M. (2018). Matching strategies for causal inference with observational data in education. In C. Lochmiller (Ed.), Complementary research methods for educational leadership and policy studies (pp. 173–191). Cham: Palgrave Macmillan.
  11. Kim, J. S., & Steiner, P. M. (2015). Multilevel propensity score methods for estimating causal effects: A latent class modeling strategy. In L. van der Ark, D. Bolt, W. C. Wang, J. Douglas, & S. M. Chow (Eds.), Quantitative psychology research (pp. 293–306). Cham: Springer.
  12. Kim, J.-S., Steiner, P. M., & Lim, W.-C. (2016). Mixture modeling strategies for causal inference with multilevel data. In J. R. Harring, L. M. Stapleton, & S. Natasha Beretvas (Eds.), Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications (pp. 335–359). Charlotte, NC: IAP—Information Age Publishing, Inc.
  13. Leite, W. L., Jimenez, F., Kaya, Y., Stapleton, L. M., MacInnes, J. W., & Sandbach, R. (2015). An evaluation of weighting methods based on propensity scores to reduce selection bias in multilevel observational studies. Multivariate Behavioral Research, 50, 265–284.
  14. McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
  15. Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
  16. Nerlove, M. (2005). Essays in panel data econometrics. Cambridge: Cambridge University Press.
  17. R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL
  18. Rosenbaum, P. R., & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
  19. Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
  20. Steiner, P. M., & Cook, D. (2013). Matching and propensity scores. In T. Little (Ed.), The Oxford handbook of quantitative methods (pp. 236–258). Oxford, England: Oxford University Press.
  21. Suk, Y., & Kim, J.-S. (2018, April). Linear probability models as alternatives to logistic regression models for multilevel propensity score analysis. Paper presented at the annual meeting of American Educational Research Association, New York City, NY.
  22. Thoemmes, F. J., & West, S. G. (2011). The use of propensity scores for nonrandomized designs with clustered data. Multivariate Behavioral Research, 46, 514–543.
  23. Tueller, S. J., Drotar, S., & Lubke, G. H. (2011). Addressing the problem of switched class labels in latent variable mixture model simulation studies. Structural Equation Modeling, 18, 110–131.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Educational Psychology, Educational Sciences Building, University of Wisconsin-Madison, Madison, USA
