
Should We account for classrooms? Analyzing online experimental data with student-level randomization

  • Research Article
  • Published:
Educational Technology Research and Development

Abstract

Emergent technologies present platforms for educational researchers to conduct randomized controlled trials (RCTs) and collect rich data to study students’ performance, behavior, learning processes, and outcomes in authentic learning environments. As educational research increasingly uses methods and data collection from such platforms, it is necessary to consider the most appropriate ways to analyze these data to draw causal inferences from RCTs. Here, we examine whether and how analysis results are impacted by accounting for multilevel variance in samples from RCTs with student-level randomization within one platform. We propose and demonstrate a method that leverages auxiliary non-experimental “remnant” data collected within a learning platform to inform analysis decisions. Specifically, we compare five commonly-applied analysis methods to estimate treatment effects while accounting for, or ignoring, class-level factors and observed measures of confidence and accuracy to identify best practices under real-world conditions. We find that methods that account for groups as either fixed effects or random effects consistently outperform those that ignore group-level factors, even though randomization was applied at the student level. However, we find no meaningful differences between the use of fixed or random effects as a means to account for groups. We conclude that analyses of online experiments should account for the naturally-nested structure of students within classes, despite the notion that student-level randomization may alleviate group-level differences. Further, we demonstrate how to use remnant data to identify appropriate methods for analyzing experiments. These findings provide practical guidelines for researchers conducting RCTs in similar educational technologies to make more informed decisions when approaching analyses.
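To make the comparison described in the abstract concrete, the following is a minimal sketch (not the authors' code) of two of the contrasted strategies on synthetic data: treating classes as fixed effects (dummy-coded covariates) versus as random intercepts. Column names such as `score`, `treated`, and `class_id` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate students nested in classes, with treatment randomized at the
# student level but outcome variance partly at the class level.
rng = np.random.default_rng(0)
n_classes, n_per_class = 20, 15
class_id = np.repeat(np.arange(n_classes), n_per_class)
class_effect = rng.normal(0, 1, n_classes)[class_id]          # class-level variance
treated = rng.integers(0, 2, n_classes * n_per_class)         # student-level randomization
score = 0.3 * treated + class_effect + rng.normal(0, 1, len(treated))
df = pd.DataFrame({"score": score, "treated": treated, "class_id": class_id})

# Strategy 1 -- fixed effects: classes enter as dummy-coded covariates.
fe = smf.ols("score ~ treated + C(class_id)", data=df).fit()

# Strategy 2 -- random effects: classes enter as random intercepts.
re = smf.mixedlm("score ~ treated", data=df, groups=df["class_id"]).fit()

print(fe.params["treated"], re.params["treated"])
```

Consistent with the paper's finding, on data like this the two treatment-effect estimates are typically very close to each other; the practical gain comes from accounting for classes at all rather than from the choice between fixed and random effects.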


[Figures 1–7 appear in the full article.]


Data availability

This study was pre-registered on the Open Science Framework at https://doi.org/10.17605/OSF.IO/MNYZH. The data and materials for this study are also publicly available on the project page at https://osf.io/c8rj3/.

Notes

  1. If X is an N × P matrix of predictors whose first column is composed of 1s (corresponding to the intercept), and Y is a vector of length N of outcome measurements, then the vector of coefficients is β = (XᵀX)⁻¹XᵀY.
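The closed-form expression in the footnote can be checked numerically; the sketch below (illustrative, with arbitrary simulated data) computes β via the normal equations and confirms it agrees with NumPy's least-squares solver, which is the numerically preferred route in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 3
# Design matrix whose first column is all 1s (the intercept).
X = np.column_stack([np.ones(N), rng.normal(size=(N, P - 1))])
Y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(scale=0.1, size=N)

# Closed-form normal equations: beta = (X^T X)^{-1} X^T Y.
beta = np.linalg.inv(X.T @ X) @ X.T @ Y

# Same estimate via the numerically stabler least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```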


Acknowledgements

We thank Dr. Neil Heffernan, Cristina Heffernan, and the ASSISTments team for their support of this work and their dedication to open science practices. 

Funding

This work was funded by the National Science Foundation (Graduate Research Fellowship #1645629 & Grant #2331379) and Schmidt Futures. None of the opinions expressed here are those of the funders. The research reported here was also supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305N230034 to Purdue University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Author information


Contributions

All authors contributed to the research presented in this manuscript. AHC: conceptualization, project leader, writing, methodology. AS: analysis and interpretation of results, writing. AFB: conceptualization, methodology, analysis and interpretation of results, writing, supervision.

Corresponding author

Correspondence to Anthony F. Botelho.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Closser, A.H., Sales, A. & Botelho, A.F. Should We account for classrooms? Analyzing online experimental data with student-level randomization. Education Tech Research Dev (2024). https://doi.org/10.1007/s11423-023-10325-x

