Abstract
Emergent technologies present platforms for educational researchers to conduct randomized controlled trials (RCTs) and collect rich data to study students’ performance, behavior, learning processes, and outcomes in authentic learning environments. As educational research increasingly uses methods and data collection from such platforms, it is necessary to consider the most appropriate ways to analyze these data to draw causal inferences from RCTs. Here, we examine whether and how analysis results are impacted by accounting for multilevel variance in samples from RCTs with student-level randomization within one platform. We propose and demonstrate a method that leverages auxiliary non-experimental “remnant” data collected within a learning platform to inform analysis decisions. Specifically, we compare five commonly applied analysis methods to estimate treatment effects while accounting for, or ignoring, class-level factors and observed measures of confidence and accuracy to identify best practices under real-world conditions. We find that methods that account for groups as either fixed effects or random effects consistently outperform those that ignore group-level factors, even though randomization was applied at the student level. However, we find no meaningful differences between the use of fixed or random effects as a means to account for groups. We conclude that analyses of online experiments should account for the naturally nested structure of students within classes, despite the notion that student-level randomization may alleviate group-level differences. Further, we demonstrate how to use remnant data to identify appropriate methods for analyzing experiments. These findings provide practical guidelines for researchers conducting RCTs in similar educational technologies to make more informed decisions when approaching analyses.
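The comparison at the heart of the abstract, estimating a student-level treatment effect while ignoring or accounting for class membership, can be illustrated with a minimal simulation. The sketch below is not the authors’ analysis code: it uses synthetic data, and it represents the fixed-effects approach with ordinary least squares and one dummy variable per class. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate students nested in classes (all values are illustrative, not from the study)
n_classes, n_per_class = 20, 15
n = n_classes * n_per_class
class_id = np.repeat(np.arange(n_classes), n_per_class)
class_effect = rng.normal(0.0, 1.0, n_classes)[class_id]  # between-class variance
treat = rng.integers(0, 2, n)                             # student-level randomization
y = 0.3 * treat + class_effect + rng.normal(0.0, 1.0, n)  # true treatment effect = 0.3

def ols(X, y):
    """Least-squares coefficients, i.e., beta = (X'X)^(-1) X'y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Model 1: ignore classes entirely
X1 = np.column_stack([np.ones(n), treat])
b1 = ols(X1, y)

# Model 2: class fixed effects (one dummy per class, absorbing the intercept)
dummies = (class_id[:, None] == np.arange(n_classes)).astype(float)
X2 = np.column_stack([treat, dummies])
b2 = ols(X2, y)

print(f"treatment estimate, ignoring classes:    {b1[1]:.3f}")
print(f"treatment estimate, class fixed effects: {b2[0]:.3f}")
```

Because randomization is at the student level, both estimators recover the true effect on average; the fixed-effects model absorbs between-class variance, which is why such models tend to yield more precise estimates, consistent with the article's conclusion.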
Data availability
This study was pre-registered on the Open Science Framework at https://doi.org/10.17605/OSF.IO/MNYZH. The data and materials for this study are also publicly available on the project page at https://osf.io/c8rj3/.
Notes
If X is an N × P matrix of predictors whose first column consists of 1s (corresponding to the intercept), and Y is a vector of length N of outcome measurements, then the vector of coefficients is β = (XᵀX)⁻¹XᵀY.
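As a quick numerical check of the closed-form solution in this note (an illustration added here, not from the article), the toy data below are generated from an exact line, so the formula should recover the intercept and slope:

```python
import numpy as np

# Toy data: y = 2 + 3x exactly, so OLS should recover beta = [2, 3]
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2.0 + 3.0 * x
X = np.column_stack([np.ones_like(x), x])  # first column of 1s = intercept

# beta = (X'X)^(-1) X'Y
beta = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(beta)  # ≈ [2. 3.]
```

In practice, `np.linalg.lstsq` (or a QR decomposition) is preferred over explicitly inverting XᵀX for numerical stability, but the explicit form mirrors the note.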
Acknowledgements
We thank Dr. Neil Heffernan, Cristina Heffernan, and the ASSISTments team for their support of this work and their dedication to open science practices.
Funding
This work was funded by the National Science Foundation (Graduate Research Fellowship #1645629 & Grant #2331379) and Schmidt Futures. None of the opinions expressed here are those of the funders. The research reported here was also supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305N230034 to Purdue University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Author information
Contributions
All authors contributed to the research presented in this manuscript. AHC: conceptualization, project leader, writing, methodology. AS: analysis and interpretation of results, writing. AFB: conceptualization, methodology, analysis and interpretation of results, writing, supervision.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Closser, A. H., Sales, A., & Botelho, A. F. Should we account for classrooms? Analyzing online experimental data with student-level randomization. Education Tech Research Dev (2024). https://doi.org/10.1007/s11423-023-10325-x