Abstract
Statistical methodology for handling omitted variables is presented in a multilevel modeling framework. In many nonexperimental studies, the analyst may not have access to all requisite variables, and this omission may lead to biased estimates of model parameters. By exploiting the hierarchical nature of multilevel data, a battery of statistical tools is developed to test various forms of model misspecification as well as to obtain estimators that are robust to the presence of omitted variables. The methodology allows for tests of omitted effects at single and multiple levels. The paper also introduces intermediate-level tests; these are tests for omitted effects at a single level, regardless of the presence of omitted effects at a higher level. A simulation study shows, not surprisingly, that the omission of variables yields bias in both regression coefficients and variance components; it also suggests that omitted effects at lower levels may cause more severe bias than at higher levels. Important factors resulting in bias were found to be the level of an omitted variable, its effect size, and sample size. A real data study illustrates that an omitted variable at one level may yield biased estimators at any level and, in this study, one cannot obtain reliable estimates for school-level variables when omitted child effects exist. However, robust estimators may provide unbiased estimates for effects of interest even when the efficient estimators fail, and the one-degree-of-freedom test helps one to understand where the problem is located. It is argued that multilevel data typically contain rich information to deal with omitted variables, offering yet another appealing reason for the use of multilevel models in the social sciences.
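The kind of bias the abstract describes can be sketched with a small simulation. The example below is only an illustration under stated assumptions, not the estimators or tests developed in the paper: it assumes a two-level linear model with a single predictor, an omitted group-level effect that is correlated with that predictor, and it compares pooled least squares with the textbook within-group (group-mean-centered) slope. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_two_level(n_groups=500, group_size=10, beta=1.0, rho=0.8, seed=0):
    """Simulate y_ij = beta * x_ij + u_j + e_ij (assumed illustrative model).

    u_j is an unobserved group-level effect; x_ij depends on u_j through rho,
    so leaving u_j out of the model confounds the slope on x.
    """
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    group = np.repeat(np.arange(n_groups), group_size)  # group index per row
    u = rng.normal(size=n_groups)                       # omitted group effect
    x = rho * u[group] + rng.normal(size=n)             # predictor tied to u
    y = beta * x + u[group] + rng.normal(size=n)        # outcome
    return group, x, y

def pooled_slope(x, y):
    """Least-squares slope that ignores the grouping entirely."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def within_slope(group, x, y):
    """Slope after subtracting group means, which removes any effect
    that is constant within a group (including the omitted u_j)."""
    counts = np.bincount(group)
    x_dev = x - (np.bincount(group, weights=x) / counts)[group]
    y_dev = y - (np.bincount(group, weights=y) / counts)[group]
    return float(x_dev @ y_dev / (x_dev @ x_dev))

group, x, y = simulate_two_level()
b_pooled = pooled_slope(x, y)
b_within = within_slope(group, x, y)
print(f"true slope 1.0 | pooled {b_pooled:.3f} | within {b_within:.3f}")
```

In this setup the pooled slope absorbs part of the omitted group effect, while the within-group slope does not, because centering removes everything that is constant within a group. That robustness comes at the cost of discarding between-group information, which is the usual trade-off between robust and efficient estimators.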
Additional information
This research was supported by the National Academy of Education/Spencer Foundation and the National Science Foundation, Grant Number SES-0436274.
About this article
Cite this article
Kim, JS., Frees, E.W. Omitted Variables in Multilevel Models. Psychometrika 71, 659–690 (2006). https://doi.org/10.1007/s11336-005-1283-0
DOI: https://doi.org/10.1007/s11336-005-1283-0