A design study was conducted to test a machine learning (ML)-enabled automated feedback system developed to support students’ revision of scientific arguments using data from published sources and simulations. This paper focuses on three simulation-based scientific argumentation tasks called Trap, Aquifer, and Supply. These tasks were part of an online science curriculum module on groundwater systems for secondary school students. ML was used to develop automated scoring models for students’ argumentation texts and to explore emerging patterns between students’ simulation interactions and argumentation scores. The study took place while we were developing the first version of simulation feedback to augment the existing argument feedback. We studied two cohorts of students: one received argument-only (AO) feedback (n = 164) and the other received argument-and-simulation (AS) feedback (n = 179). We investigated how AO and AS students interacted with the simulations and how they wrote and revised their scientific arguments before and after receiving their respective feedback. Overall, the same percentage of students in each condition (49%) revised their arguments after feedback, and the revised arguments received significantly higher scores in both feedback conditions, p < 0.001. A significantly greater proportion of AS students (36% across the three tasks) reran the simulations after feedback than AO students (5%), p < 0.001. Among AS students who reran the simulations, simulation scores increased for the Trap task, p < 0.001, and for the Aquifer task, p < 0.01. AO students, who received no simulation feedback, increased their simulation scores only for the Trap task when they reran the simulations, p < 0.05. For the Trap and Aquifer tasks, students who increased their simulation scores were more likely to increase their argument scores in their revisions than students who did not increase their simulation scores or did not revisit the simulations at all after simulation feedback was provided. This pattern was not found for the Supply task. Based on these findings, we discuss strengths and weaknesses of the current automated feedback design, in particular the use of ML.
Funding: National Science Foundation (1220756), Ms. Amy Pallant; National Science Foundation (1418019), Dr. Hee-Sun Lee.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lee, HS., Gweon, GH., Lord, T. et al. Machine Learning-Enabled Automated Feedback: Supporting Students’ Revision of Scientific Arguments Based on Data Drawn from Simulation. J Sci Educ Technol (2021). https://doi.org/10.1007/s10956-020-09889-7
- Automated feedback
- Automated scoring
- Scientific argumentation
- Machine learning