Machine Learning-Enabled Automated Feedback: Supporting Students’ Revision of Scientific Arguments Based on Data Drawn from Simulation

Journal of Science Education and Technology

Abstract

A design study was conducted to test a machine learning (ML)-enabled automated feedback system developed to support students’ revision of scientific arguments using data from published sources and simulations. This paper focuses on three simulation-based scientific argumentation tasks called Trap, Aquifer, and Supply. These tasks were part of an online science curriculum module addressing groundwater systems for secondary school students. ML was used to develop automated scoring models for students’ argumentation texts as well as to explore emerging patterns between students’ simulation interactions and argumentation scores. The study occurred as we were developing the first version of simulation feedback to augment the existing argument feedback. We studied two cohorts of students who used argument only (AO) feedback (n = 164) versus argument and simulation (AS) feedback (n = 179). We investigated how AO and AS students interacted with simulations and wrote and revised their scientific arguments before and after receiving their respective feedback. Overall, the same percentage of students (49% in each condition) revised their arguments after feedback, and the revised arguments received significantly higher scores in both feedback conditions, p < 0.001. Significantly more AS students (36% across the three tasks) reran the simulations after feedback than AO students (5%), p < 0.001. AS students who reran the simulation increased their simulation scores for the Trap task, p < 0.001, and for the Aquifer task, p < 0.01. AO students, who did not receive simulation feedback but reran the simulations, increased their simulation scores only for the Trap task, p < 0.05. For the Trap and Aquifer tasks, students who increased their simulation scores were more likely to increase their argument scores in revision than those who did not increase their simulation scores or did not revisit the simulations at all after simulation feedback was provided. This pattern was not found for the Supply task. Based on these findings, we discuss strengths and weaknesses of the current automated feedback design, in particular its use of ML.
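The abstract does not describe the scoring engine itself, only that supervised ML models were trained to score students’ argumentation texts and that machine scores were used to drive feedback. As a rough, hypothetical illustration of that general approach (text features, a supervised classifier, and a machine-human agreement check), the Python sketch below trains an off-the-shelf scikit-learn classifier on a tiny invented set of responses; the example data, rubric levels, and model choices are assumptions for illustration only and do not reflect the study’s actual scoring system.

# Minimal sketch (not the authors' system): scoring short argumentation texts
# with a bag-of-words model and checking agreement with human rubric scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Hypothetical responses with invented human rubric scores
# (0 = no claim/evidence, 1 = claim only, 2 = claim supported by simulation data).
texts = [
    "the water level goes down",
    "the well runs dry because the aquifer is pumped faster than it recharges",
    "i think it changes",
    "pumping lowered the water table in the simulation so the supply decreased",
    "no change happened",
    "the simulation showed the contaminant reached the well through the permeable gravel layer",
]
human_scores = [1, 2, 0, 2, 0, 2]

# Word and bigram TF-IDF features feeding a multinomial logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, human_scores)

# Machine-human agreement is commonly summarized with quadratic weighted kappa
# (computed here on the training texts purely for illustration).
machine_scores = model.predict(texts)
print(cohen_kappa_score(human_scores, machine_scores, weights="quadratic"))

A real scoring model would, of course, be trained on many responses per task and validated on held-out, human-scored data before its scores were used to trigger feedback to students.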



Funding

This work was supported by the National Science Foundation under grant 1220756 (Amy Pallant) and grant 1418019 (Hee-Sun Lee).

Author information

Corresponding author

Correspondence to Hee-Sun Lee.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, HS., Gweon, GH., Lord, T. et al. Machine Learning-Enabled Automated Feedback: Supporting Students’ Revision of Scientific Arguments Based on Data Drawn from Simulation. J Sci Educ Technol 30, 168–192 (2021). https://doi.org/10.1007/s10956-020-09889-7

