Abstract
Here, we describe the development and validation of an automatic assessment system that examines students’ hand-drawn visual representations in free-response items. Data were collected from 1,028 students in grades 2 through 11 in South Korea using two items from the Test About Particles in a Gas questionnaire (Novick & Nussbaum, 1981). Students’ free responses, which included hand drawings and writing, were coded on two dimensions: structural (particulate/continuous/other) and distributional (expanded/concentrated/other). Machine learning (ML) models were trained to assess the responses on the particulate nature of matter. To classify the hand drawings, a pre-trained Inception-v3 model followed by a support vector machine was trained and evaluated. The assessment model yielded high machine-human agreement (MHA) (kappa = 0.732–0.926, accuracy = 0.820–0.942, precision = 0.817–0.941, recall = 0.820–0.942, F1 = 0.818–0.941, and area under the curve [AUC] = 0.906–0.990). Students’ written responses were tokenized, and a dictionary of scientific semantic scores was prepared. The final model for the overall assessment of both drawing and writing also yielded high MHA (kappa = 0.800–0.881, accuracy = 0.859–0.956, precision = 0.865–0.957, recall = 0.859–0.956, F1 = 0.859–0.956, and AUC = 0.944–0.995), with values varying by the final classifier used in each model. The performance of the assessment model also varied somewhat by school level. This study suggests that artificial intelligence can be used to automate assessments of students’ representations of scientific concepts in free-response items, particularly those drawn in a pencil-and-paper format.
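To make the machine-human agreement figures above concrete, the following minimal Python sketch computes accuracy, Cohen's kappa, and precision, recall, and F1 from paired human and machine codes. It is an illustration only, not the authors' code: the function name `agreement_metrics`, the toy labels, and the choice of support-weighted averaging are assumptions. The abstract does not state which averaging was used; support-weighted recall always equals accuracy, which is consistent with the matching recall and accuracy ranges reported above, but that reading is an inference.

```python
from collections import Counter


def agreement_metrics(human, machine):
    """Accuracy, Cohen's kappa, and support-weighted precision/recall/F1.

    `human` and `machine` are equal-length lists of category labels.
    Hypothetical helper for illustration; not taken from the article.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(human)
    labels = sorted(set(human) | set(machine))
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Per-category count of cases where machine and human codes agree.
    hits = Counter(h for h, m in zip(human, machine) if h == m)

    accuracy = sum(hits.values()) / n
    # Chance agreement: both raters pick the same label independently.
    expected = sum(human_counts[c] * machine_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 1.0

    precision = recall = f1 = 0.0
    for c in labels:
        p = hits[c] / machine_counts[c] if machine_counts[c] else 0.0
        r = hits[c] / human_counts[c] if human_counts[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = human_counts[c] / n  # support: share of human-coded cases
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "kappa": kappa,
            "precision": precision, "recall": recall, "f1": f1}


# Toy example using the structural categories (labels are made up).
human = ["particulate"] * 4 + ["continuous"] * 4 + ["other"] * 2
machine = (["particulate"] * 3 + ["continuous"]
           + ["continuous"] * 3 + ["particulate"]
           + ["other"] * 2)
result = agreement_metrics(human, machine)
```

In this toy case 8 of 10 codes agree, so accuracy is 0.8, while kappa is lower because it discounts the agreement expected by chance.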
Notes
http://www.image-net.org/index (Accessed: April 1, 2021).
References
Adadan, E., Irving, K. E., & Trundle, K. C. (2009). Impacts of multi-representational instruction on high school students’ conceptual understandings of the particulate nature of matter. International Journal of Science Education, 31(13), 1743–1775.
Adadan, E. (2013). Using multiple representations to promote grade 11 students’ scientific understanding of the particle theory of matter. Research in Science Education, 43(3), 1079–1105.
Ayas, A., Özmen, H., & Çalik, M. (2010). Students’ conceptions of the particulate nature of matter at secondary and tertiary level. International Journal of Science and Mathematics Education, 8(1), 165–184.
Benson, D. L., Wittrock, M. C., & Baur, M. E. (1993). Students’ preconceptions of the nature of gases. Journal of Research in Science Teaching, 30(6), 587–597.
Braun, H. I., Bennett, R. E., Frye, D., & Soloway, E. (1990). Scoring constructed responses using expert systems. Journal of Educational Measurement, 27(2), 93–108.
Chang, H. Y., & Tzeng, S. F. (2018). Investigating Taiwanese students’ visualization competence of matter at the particulate level. International Journal of Science and Mathematics Education, 16(7), 1207–1226.
Delgado, R., & Tibau, X. A. (2019). Why Cohen’s Kappa should be avoided as performance measure in classification. PLoS ONE, 14(9), e0222916.
Gabel, D. L., Samuel, K. V., & Hunn, D. (1987). Understanding the particulate nature of matter. Journal of Chemical Education, 64(8), 695.
Gerard, L. F., Ryoo, K., McElhaney, K. W., Liu, O. L., Rafferty, A. N., & Linn, M. C. (2016). Automated guidance for student inquiry. Journal of Educational Psychology, 108(1), 60–81.
Ghali, R., Ouellet, S., & Frasson, C. (2016). LewiSpace: An exploratory study with a machine learning model in an educational game. Journal of Education and Training Studies, 4(1), 192–201.
Gillespie, R. J. (1997). The great ideas of chemistry. Journal of Chemical Education, 74(7), 862–863.
Harrison, A. G., & Treagust, D. F. (2002). The particulate nature of matter: Challenges in understanding the submicroscopic world. In J. K. Gilbert, O. De Jong, R. Justi, D. F. Treagust, & J. H. Van Driel (Eds.), Chemical education: Towards research-based practice (pp. 189–212). Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Haudek, K. C., Prevost, L. B., Moscarella, R. A., Merrill, J., & Urban-Lurain, M. (2012). What are they thinking? Automated analysis of student writing about acid–base chemistry in introductory biology. CBE-Life Sciences Education, 11(3), 283–293.
Hogan, T. P., & Murphy, G. (2007). Recommendations for preparing and scoring constructed-response items: What the experts say. Applied Measurement in Education, 20(4), 427–441.
Hosmer, D. W. Jr., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Wiley.
Hurd, P. D. (1998). Scientific literacy: New minds for a changing world. Science Education, 82(3), 407–416.
Jescovitch, L. N., Scott, E. E., Cerchiara, J. A., Merrill, J., Urban-Lurain, M., Doherty, J. H., & Haudek, K. C. (2021). Comparison of machine learning performance using analytic and holistic coding approaches across constructed response assessments aligned to a science learning progression. Journal of Science Education and Technology, 30(2), 150–167.
Jin, X., Chi, J., Peng, S., Tian, Y., Ye, C., & Li, X. (2016). Deep image aesthetics classification using inception modules and fine-tuning connected layer. In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) (pp. 1–6). IEEE.
Karacop, A., & Doymus, K. (2013). Effects of jigsaw cooperative learning and animation techniques on students’ understanding of chemical bonding and their conceptions of the particulate nature of matter. Journal of Science Education and Technology, 22(2), 186–203.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, G.-G., & Ha, M. (2020). The present and future of AI-based automated evaluation: A literature review on descriptive assessment and other side. Journal of Educational Technology, 36(2), 353–382. (written in Korean)
Liu, O. L., Rios, J. A., Heilman, M., Gerard, L., & Linn, M. C. (2016). Validation of automated scoring of science assessments. Journal of Research in Science Teaching, 53(2), 215–233.
Liu, X., & Lesniak, K. M. (2005). Students’ progression of understanding the matter concept from elementary to high school. Science Education, 89(3), 433–450.
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence unleashed: An argument of AI in education. Pearson Education.
Maestrales, S., Zhai, X., Touitou, I., Baker, Q., Schneider, B., & Krajcik, J. (2021). Using machine learning to score multi-dimensional assessments of chemistry and physics. Journal of Science Education and Technology, 30(2), 239–254.
National Research Council [NRC]. (2012). A framework for K-12 science education: Practices, cross-cutting concepts, and core ideas. National Academies Press.
Nehm, R. H., & Ha, M. (2011). Item feature effects in evolution assessment. Journal of Research in Science Teaching, 48(3), 237–256.
NGSS Lead States. (2013). Next generation science standards: For states, by states. National Academies Press.
Novick, S., & Nussbaum, J. (1981). Pupils’ understanding of the particulate nature of matter: A cross-age study. Science Education, 65(2), 187–196.
Nyachwaya, J. M., Mohamed, A.-R., Roehrig, G. H., Wood, N. B., Kern, A. L., & Schneider, J. L. (2011). The development of an open-ended drawing tool: An alternative diagnostic tool for assessing students’ understanding of the particulate nature of matter. Chemistry Education Research and Practice, 12(2), 121–132.
Opfer, J. E., Nehm, R. H., & Ha, M. (2012). Cognitive foundations for science assessment design: Knowing what students know about evolution. Journal of Research in Science Teaching, 49(6), 744–777.
Özmen, H. (2011). Effect of animation enhanced conceptual change texts on 6th grade students’ understanding of the particulate nature of matter and transformation during phase changes. Computers & Education, 57(1), 1114–1126.
Park, E. L., & Cho, S. (2014). KoNLPy: Korean natural language processing in Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, Chuncheon, Korea. (written in Korean)
Pei, B., Xing, W., & Lee, H. S. (2019). Using automatic image processing to analyze visual artifacts created by students in scientific argumentation. British Journal of Educational Technology, 50(6), 3391–3404.
Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson Education.
Ryan, S. A., & Stieff, M. (2019). Drawing for assessing learning outcomes in chemistry. Journal of Chemical Education, 96(9), 1813–1820.
Shin, D., & Shim, J. (2021). A systematic review on data mining for mathematics and science education. International Journal of Science and Mathematics Education, 19, 639–659.
Smith, A., Leeman-Munk, S., Shelton, A., Mott, B., Wiebe, E., & Lester, J. (2019). A multi-modal assessment framework for integrating student writing and drawing in elementary science learning. IEEE Transactions on Learning Technologies, 12(1), 3–15.
Smith, C. L., Wiser, M., Anderson, C. W., & Krajcik, J. (2006). Implications of research on children’s learning for standards and assessment: A proposed learning progression for matter and the atomic-molecular theory. Measurement: Interdisciplinary Research & Perspective, 4(1–2), 1–98.
Sripathi, K. N., Moscarella, R. A., Yoho, R., You, H. S., Urban-Lurain, M., Merrill, J., & Haudek, K. (2019). Mixed student ideas about mechanisms of human weight loss. CBE-Life Sciences Education, 18(ar37), 1–17.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).
Taber, K. S., & García-Franco, A. (2010). Learning processes in chemistry: Drawing upon cognitive resources to learn about the particulate structure of matter. The Journal of the Learning Sciences, 19(1), 99–142.
Treagust, D. F., Chandrasegaran, A. L., Crowley, J., Yung, B. H., Cheong, I. P. A., & Othman, J. (2010). Evaluating students’ understanding of kinetic particle theory concepts relating to the states of matter, changes of state and diffusion: A cross-national study. International Journal of Science and Mathematics Education, 8(1), 141–164.
Treagust, D. F., Chandrasegaran, A. L., Zain, A. N., Ong, E. T., Karpudewan, M., & Halim, L. (2011). Evaluation of an intervention instructional program to facilitate understanding of basic particle concepts among students enrolled in several levels of study. Chemistry Education Research and Practice, 12(2), 251–261.
Yarroch, W. L. (1985). Student understanding of chemical equation balancing. Journal of Research in Science Teaching, 22(5), 449–459.
Yilmaz, A., & Alp, E. (2006). Students’ understanding of matter: The effect of reasoning ability and grade level. Chemistry Education Research and Practice, 7(1), 22–31.
Zhai, X., He, P., & Krajcik, J. (2022). Applying machine learning to automatically assess scientific models. Journal of Research in Science Teaching. https://doi.org/10.1002/tea.21773
Zhai, X., Krajcik, J., & Pellegrino, J. W. (2021). On the validity of machine learning-based next generation science assessments: A validity inferential network. Journal of Science Education and Technology. https://doi.org/10.1007/s10956-020-09879-9
Zhai, X., Shi, L., & Nehm, R. H. (2020a). A meta-analysis of machine learning-based science assessments: Factors impacting machine-human score agreements. Journal of Science Education and Technology. https://doi.org/10.1007/s10956-020-09875-z
Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020b). Applying machine learning in science assessment: A systematic review. Studies in Science Education, 56(1), 111–151.
Zhu, M., Lee, H. S., Wang, T., Liu, O. L., Belur, V., & Pallant, A. (2017). Investigating the impact of automated feedback on students’ scientific argumentation. International Journal of Science Education, 39(12), 1648–1668.
Zhu, M., Liu, O. L., & Lee, H. S. (2020). The effect of automated feedback on revision behavior and learning gains in formative assessment of scientific argument writing. Computers & Education, 143, 103668.
Acknowledgements
This research was supported by the 2020 Student-Directed Education Regular Program from Seoul National University.
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Conflict of Interest
The authors declare no competing interests.
Appendix
About this article
Cite this article
Lee, J., Lee, GG. & Hong, HG. Automated Assessment of Student Hand Drawings in Free-Response Items on the Particulate Nature of Matter. J Sci Educ Technol 32, 549–566 (2023). https://doi.org/10.1007/s10956-023-10042-3
DOI: https://doi.org/10.1007/s10956-023-10042-3