Abstract
Emergent technologies present platforms for educational researchers to conduct randomized controlled trials (RCTs) and collect rich data to study students’ performance, behavior, learning processes, and outcomes in authentic learning environments. As educational research increasingly uses methods and data collection from such platforms, it is necessary to consider the most appropriate ways to analyze these data to draw causal inferences from RCTs. Here, we examine whether and how analysis results are impacted by accounting for multilevel variance in samples from RCTs with student-level randomization within one platform. We propose and demonstrate a method that leverages auxiliary non-experimental “remnant” data collected within a learning platform to inform analysis decisions. Specifically, we compare five commonly applied analysis methods to estimate treatment effects while accounting for, or ignoring, class-level factors and observed measures of confidence and accuracy to identify best practices under real-world conditions. We find that methods that account for groups as either fixed effects or random effects consistently outperform those that ignore group-level factors, even though randomization was applied at the student level. However, we find no meaningful differences between the use of fixed or random effects as a means to account for groups. We conclude that analyses of online experiments should account for the naturally nested structure of students within classes, despite the notion that student-level randomization may alleviate group-level differences. Further, we demonstrate how to use remnant data to identify appropriate methods for analyzing experiments. These findings provide practical guidelines for researchers conducting RCTs in similar educational technologies to make more informed decisions when approaching analyses.
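The comparison at the heart of the abstract, estimating a student-level treatment effect while ignoring or accounting for class membership, can be illustrated with a minimal simulation. The sketch below is not the authors’ analysis code: it uses synthetic data, and it represents the fixed-effects approach with ordinary least squares and one dummy variable per class. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate students nested in classes (all values are illustrative, not from the study)
n_classes, n_per_class = 20, 15
n = n_classes * n_per_class
class_id = np.repeat(np.arange(n_classes), n_per_class)
class_effect = rng.normal(0.0, 1.0, n_classes)[class_id]  # between-class variance
treat = rng.integers(0, 2, n)                             # student-level randomization
y = 0.3 * treat + class_effect + rng.normal(0.0, 1.0, n)  # true treatment effect = 0.3

def ols(X, y):
    """Least-squares coefficients, i.e., beta = (X'X)^(-1) X'y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Model 1: ignore classes entirely
X1 = np.column_stack([np.ones(n), treat])
b1 = ols(X1, y)

# Model 2: class fixed effects (one dummy per class, absorbing the intercept)
dummies = (class_id[:, None] == np.arange(n_classes)).astype(float)
X2 = np.column_stack([treat, dummies])
b2 = ols(X2, y)

print(f"treatment estimate, ignoring classes:    {b1[1]:.3f}")
print(f"treatment estimate, class fixed effects: {b2[0]:.3f}")
```

Because randomization is at the student level, both estimators recover the true effect on average; the fixed-effects model absorbs between-class variance, which is why such models tend to yield more precise estimates, consistent with the article's conclusion.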
Data availability
This study was pre-registered on the Open Science Framework at https://doi.org/10.17605/OSF.IO/MNYZH. The data and materials for this study are also publicly available on the project page at https://osf.io/c8rj3/.
Notes
If X is an N × P matrix of predictors whose first column consists of 1s (corresponding to the intercept), and Y is a vector of length N of outcome measurements, then the vector of coefficients is β = (XᵀX)⁻¹XᵀY.
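As a quick numerical check of the closed-form solution in this note (an illustration added here, not from the article), the toy data below are generated from an exact line, so the formula should recover the intercept and slope:

```python
import numpy as np

# Toy data: y = 2 + 3x exactly, so OLS should recover beta = [2, 3]
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2.0 + 3.0 * x
X = np.column_stack([np.ones_like(x), x])  # first column of 1s = intercept

# beta = (X'X)^(-1) X'Y
beta = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(beta)  # ≈ [2. 3.]
```

In practice, `np.linalg.lstsq` (or a QR decomposition) is preferred over explicitly inverting XᵀX for numerical stability, but the explicit form mirrors the note.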
Acknowledgements
We thank Dr. Neil Heffernan, Cristina Heffernan, and the ASSISTments team for their support of this work and their dedication to open science practices.
Funding
This work was funded by the National Science Foundation (Graduate Research Fellowship #1645629 & Grant #2331379) and Schmidt Futures. None of the opinions expressed here are those of the funders. The research reported here was also supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305N230034 to Purdue University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Author information
Contributions
All authors contributed to the research presented in this manuscript. AHC: conceptualization, project leader, writing, methodology. AS: analysis and interpretation of results, writing. AFB: conceptualization, methodology, analysis and interpretation of results, writing, supervision.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Closser, A. H., Sales, A., & Botelho, A. F. Should we account for classrooms? Analyzing online experimental data with student-level randomization. Education Tech Research Dev (2024). https://doi.org/10.1007/s11423-023-10325-x