Omitted Variables in Multilevel Models

Psychometrika

Abstract

Statistical methodology for handling omitted variables is presented in a multilevel modeling framework. In many nonexperimental studies, the analyst may not have access to all requisite variables, and this omission may lead to biased estimates of model parameters. By exploiting the hierarchical nature of multilevel data, a battery of statistical tools is developed to test various forms of model misspecification and to obtain estimators that are robust to the presence of omitted variables. The methodology allows for tests of omitted effects at single and multiple levels. The paper also introduces intermediate-level tests; these are tests for omitted effects at a single level, regardless of the presence of omitted effects at a higher level. A simulation study shows, not surprisingly, that the omission of variables yields bias in both regression coefficients and variance components; it also suggests that omitted effects at lower levels may cause more severe bias than omitted effects at higher levels. The important factors resulting in bias were found to be the level of the omitted variable, its effect size, and the sample size. A real data study illustrates that an omitted variable at one level may yield biased estimators at any level and that, in this study, one cannot obtain reliable estimates for school-level variables when omitted child effects exist. However, robust estimators may provide unbiased estimates for effects of interest even when the efficient estimators fail, and the one-degree-of-freedom test helps one to understand where the problem is located. It is argued that multilevel data typically contain rich information to deal with omitted variables, offering yet another appealing reason for the use of multilevel models in the social sciences.
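
The contrast between an estimator that is biased by an omitted variable and one that is robust to it can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a hypothetical two-level design in which an unobserved group-level variable `u` is correlated with the predictor `x`, uses pooled least squares as a simple stand-in for an estimator that ignores the omitted effect, and uses the textbook within-group (group-demeaned) estimator as a stand-in for a robust one. All names, sample sizes, and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_two_level(n_groups=200, group_size=20, beta=1.0, seed=0):
    """Simulate y = beta * x + u + noise, where u is an omitted group-level variable."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), group_size)
    u = rng.normal(size=n_groups)                    # omitted group-level variable
    x = u[group] + rng.normal(size=group.size)       # predictor correlated with u
    y = beta * x + u[group] + rng.normal(size=group.size)
    return group, x, y


def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def group_demean(v, group):
    """Subtract each group's mean from its members."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]


group, x, y = simulate_two_level()

# Pooled estimator: ignores the omitted group effect, so it absorbs part of u.
pooled = slope(x, y)

# Within estimator: demeaning removes anything constant within a group, including u.
within = slope(group_demean(x, group), group_demean(y, group))

print(f"true beta = 1.0, pooled = {pooled:.3f}, within = {within:.3f}")
```

Under these assumed variances the pooled slope converges to about 1.5 rather than 1.0, while the within-group slope stays close to the true value. A large gap between two such estimates is the kind of discrepancy that a Hausman-type specification test formalizes.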



Author information

Corresponding author

Correspondence to Jee-Seon Kim.

Additional information

This research was supported by the National Academy of Education/Spencer Foundation and the National Science Foundation, Grant Number SES-0436274.

About this article

Cite this article

Kim, JS., Frees, E.W. Omitted Variables in Multilevel Models. Psychometrika 71, 659–690 (2006). https://doi.org/10.1007/s11336-005-1283-0
