Abstract
Many national and international educational data collection programs offer researchers opportunities to investigate contextual effects related to student performance. In those programs, schools are often sampled at the first stage and students are then randomly drawn from the selected schools. Classrooms are not part of the sampling design, yet the sampled students are incidentally nested within them. This incidental dependence may violate the assumptions of statistical models, but it also gives educational researchers the opportunity to evaluate contextual effects. In this manuscript, we use the Early Childhood Longitudinal Study-Kindergarten dataset to demonstrate the impact of incidental dependence using a two-level model and a three-level model. We then illustrate, through a simulation, that both models can yield unbiased parameter estimates. However, two-level models tend to underestimate the standard errors of fixed effects at the incidental level, and when the incidental level is ignored, the variance component of its random effect is divided between the flanking levels. In addition, we compare another method of modeling nested data, generalized estimating equations, with the model-based methods.
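The division of the incidental-level variance between the flanking levels can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes a balanced design, uses method-of-moments (ANOVA) variance estimators instead of likelihood-based multilevel models, and every design size and variance value below is hypothetical. It does not address the standard errors of fixed effects.

```python
import random
from statistics import fmean


def simulate(n_schools, n_classes, n_students,
             var_school, var_class, var_student, seed=0):
    """Balanced three-level data: data[school][classroom][student]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_schools):
        u = rng.gauss(0, var_school ** 0.5)        # school random effect
        school = []
        for _ in range(n_classes):
            v = rng.gauss(0, var_class ** 0.5)     # classroom random effect
            school.append([u + v + rng.gauss(0, var_student ** 0.5)
                           for _ in range(n_students)])
        data.append(school)
    return data


def three_level_components(data):
    """ANOVA estimators that keep the classroom level."""
    S, C, n = len(data), len(data[0]), len(data[0][0])
    class_means = [[fmean(c) for c in school] for school in data]
    school_means = [fmean(cm) for cm in class_means]   # valid when balanced
    grand = fmean(school_means)
    ms_school = C * n * sum((m - grand) ** 2 for m in school_means) / (S - 1)
    ms_class = n * sum((cm - sm) ** 2
                       for cms, sm in zip(class_means, school_means)
                       for cm in cms) / (S * (C - 1))
    ms_within = sum((y - cm) ** 2
                    for school, cms in zip(data, class_means)
                    for c, cm in zip(school, cms)
                    for y in c) / (S * C * (n - 1))
    return {"school": (ms_school - ms_class) / (C * n),
            "classroom": (ms_class - ms_within) / n,
            "student": ms_within}


def two_level_components(data):
    """ANOVA estimators that ignore classrooms (students within schools)."""
    pooled = [[y for c in school for y in c] for school in data]
    S, m = len(pooled), len(pooled[0])
    school_means = [fmean(p) for p in pooled]
    grand = fmean(school_means)
    ms_school = m * sum((sm - grand) ** 2 for sm in school_means) / (S - 1)
    ms_within = sum((y - sm) ** 2
                    for p, sm in zip(pooled, school_means)
                    for y in p) / (S * (m - 1))
    return {"school": (ms_school - ms_within) / m, "student": ms_within}


# Hypothetical design: 400 schools, 4 classrooms each, 10 students each.
data = simulate(400, 4, 10, var_school=0.2, var_class=0.3, var_student=1.0)
three = three_level_components(data)
two = two_level_components(data)
print("three-level:", three)
print("two-level:  ", two)
```

Under these assumptions, the two-level school and student estimates both exceed their three-level counterparts, while the total variance is unchanged. For a balanced design with C classrooms per school and n students per classroom, the expected share of the classroom variance absorbed by the school level is (n − 1)/(Cn − 1), and the remainder goes to the student level.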
Notes
The five existing publicly released datasets are Early Childhood Longitudinal Study: Kindergarten Class of 1998–1999 (ECLS-K), Education Longitudinal Study of 2002 (ELS), National Education Longitudinal Study: 1988 (NELS), Schools and Staffing Survey of 1999–2000 with Teacher Follow-up Study of 2000–2001 (SASS-TFS).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, W., Liao, M. & Stapleton, L. Incidental Second-Level Dependence in Educational Survey Data with a Nested Data Structure. Educ Psychol Rev 31, 571–596 (2019). https://doi.org/10.1007/s10648-019-09480-6
DOI: https://doi.org/10.1007/s10648-019-09480-6