Machine Learning-Enabled Automated Feedback: Supporting Students’ Revision of Scientific Arguments Based on Data Drawn from Simulation

Abstract

A design study was conducted to test a machine learning (ML)-enabled automated feedback system developed to support students’ revision of scientific arguments using data from published sources and simulations. This paper focuses on three simulation-based scientific argumentation tasks called Trap, Aquifer, and Supply, which were part of an online science curriculum module on groundwater systems for secondary school students. ML was used to develop automated scoring models for students’ argumentation texts and to explore emerging patterns between students’ simulation interactions and argumentation scores. The study occurred as we were developing the first version of simulation feedback to augment the existing argument feedback. We studied two cohorts of students: one received argument-only (AO) feedback (n = 164) and the other received argument-and-simulation (AS) feedback (n = 179). We investigated how AO and AS students interacted with simulations and wrote and revised their scientific arguments before and after receiving their respective feedback. Overall, the same percentage of students in each condition (49%) revised their arguments after feedback, and revised arguments received significantly higher scores in both feedback conditions, p < 0.001. A significantly greater proportion of AS students (36% across the three tasks) reran the simulations after feedback than AO students (5%), p < 0.001. AS students who reran the simulations increased their simulation scores on the Trap task, p < 0.001, and the Aquifer task, p < 0.01. AO students who reran the simulations, despite receiving no simulation feedback, increased their simulation scores only on the Trap task, p < 0.05. For the Trap and Aquifer tasks, students who increased their simulation scores were more likely to increase their argument scores in revision than students who did not increase their simulation scores or did not revisit the simulations at all after simulation feedback was provided. This pattern was not found for the Supply task. Based on these findings, we discuss strengths and weaknesses of the current automated feedback design, in particular its use of ML.
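To give a concrete sense of the kind of ML-based automated scoring the abstract refers to, the sketch below shows a generic supervised text-scoring workflow: train a classifier on human-scored argument responses, then check human–machine agreement on held-out data. This is a minimal illustration, not the scoring engine used in the study; the file name, column names (`response_text`, `human_score`), and model choice (TF-IDF features with logistic regression, evaluated with quadratic weighted kappa) are illustrative assumptions only.

```python
# Minimal, illustrative sketch of ML-based automated scoring of argument text.
# NOT the scoring system used in the study; file path and column names are
# hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Human-scored argument responses (hypothetical CSV with text and rubric score).
data = pd.read_csv("argument_responses.csv")
texts, scores = data["response_text"], data["human_score"]

# Hold out a test split to estimate human-machine agreement.
X_train, X_test, y_train, y_test = train_test_split(
    texts, scores, test_size=0.2, random_state=42, stratify=scores
)

# Bag-of-words (TF-IDF) features feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Quadratic weighted kappa is a common agreement statistic for automated scoring.
predicted = model.predict(X_test)
print("QWK:", cohen_kappa_score(y_test, predicted, weights="quadratic"))
```

In a deployed feedback system of the kind described above, the predicted score would be mapped to a feedback message shown to the student, who could then revise the argument or rerun the simulation.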


Funding

National Science Foundation grant 1220756 (Amy Pallant); National Science Foundation grant 1418019 (Hee-Sun Lee).

Author information

Corresponding author

Correspondence to Hee-Sun Lee.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Lee, HS., Gweon, GH., Lord, T. et al. Machine Learning-Enabled Automated Feedback: Supporting Students’ Revision of Scientific Arguments Based on Data Drawn from Simulation. J Sci Educ Technol (2021). https://doi.org/10.1007/s10956-020-09889-7

Keywords

  • Automated feedback
  • Automated scoring
  • Simulation
  • Scientific argumentation
  • Machine learning