Automated Assessment of Student Hand Drawings in Free-Response Items on the Particulate Nature of Matter

Lee, Jaeyong; Lee, Gyeong-Geon; Hong, Hun-Gi

doi:10.1007/s10956-023-10042-3

Published: 26 April 2023

Journal of Science Education and Technology, Volume 32, pages 549–566 (2023)

Abstract

Here, we describe the development and validation of an automatic assessment system that examines students’ hand-drawn visual representations in free-response items. The data were collected from 1,028 students in grades 2 through 11 in South Korea using two items from the Test About Particles in a Gas questionnaire (Novick & Nussbaum, 1981). Students’ free responses, which included hand drawings and writing, were coded on two dimensions: structural (particulate/continuous/other) and distributional (expanded/concentrated/other). Machine learning (ML) models were trained to assess the responses on the particulate nature of matter. To classify hand drawings, a pre-trained Inception-v3 model followed by a support vector machine was trained and evaluated. The assessment model yielded high machine-human agreement (MHA) (kappa = 0.732–0.926, accuracy = 0.820–0.942, precision = 0.817–0.941, recall = 0.820–0.942, F1 = 0.818–0.941, and area under the curve [AUC] = 0.906–0.990). Students’ written responses were tokenized, and a dictionary of scientific semantic scores was prepared. The final model for the overall assessment of both drawing and writing yielded high MHA (kappa = 0.800–0.881, accuracy = 0.859–0.956, precision = 0.865–0.957, recall = 0.859–0.956, F1 = 0.859–0.956, and AUC = 0.944–0.995), with values varying by the final classifier used. Model performance also varied somewhat by school level. This study suggests that artificial intelligence can be used to automate assessments of students’ representations of scientific concepts in free-response items, particularly those drawn in a pencil-and-paper format.
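
As a rough illustration of the dictionary-based step mentioned above, the following sketch scores a tokenized written response against a small lookup table. It is not the authors' implementation: the abstract does not say how tokens were produced or how scores were aggregated, so the regular-expression tokenizer, the entries and weights in `SEMANTIC_SCORES`, and the summing rule are all assumptions made for this example.

```python
from __future__ import annotations

import re

# Hypothetical dictionary: words and weights are illustrative assumptions,
# not values taken from the study.
SEMANTIC_SCORES: dict[str, float] = {
    "particles": 1.0,
    "molecules": 1.0,
    "spread": 0.5,
    "evenly": 0.5,
    "continuous": -1.0,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters.

    This is a simple stand-in for a real tokenizer or morphological analyzer.
    """
    return re.findall(r"[a-z]+", text.lower())


def semantic_score(text: str, scores: dict[str, float] = SEMANTIC_SCORES) -> float:
    """Sum the dictionary scores of all tokens; unknown tokens contribute 0."""
    return sum(scores.get(token, 0.0) for token in tokenize(text))


if __name__ == "__main__":
    print(semantic_score("The particles spread evenly."))
```

A score of this kind could then be passed to a final classifier alongside the drawing-based prediction, although how the two were actually combined is not described in the abstract.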

Notes

http://www.image-net.org/index (Accessed: April 1, 2021).

References

Adadan, E., Irving, K. E., & Trundle, K. C. (2009). Impacts of multi-representational instruction on high school students’ conceptual understandings of the particulate nature of matter. International Journal of Science Education, 31(13), 1743–1775.
Adadan, E. (2013). Using multiple representations to promote grade 11 students’ scientific understanding of the particle theory of matter. Research in Science Education, 43(3), 1079–1105.
Ayas, A., Özmen, H., & Çalik, M. (2010). Students’ conceptions of the particulate nature of matter at secondary and tertiary level. International Journal of Science and Mathematics Education, 8(1), 165–184.
Benson, D. L., Wittrock, M. C., & Baur, M. E. (1993). Students’ preconceptions of the nature of gases. Journal of Research in Science Teaching, 30(6), 587–597.
Braun, H. I., Bennett, R. E., Frye, D., & Soloway, E. (1990). Scoring constructed responses using expert systems. Journal of Educational Measurement, 27(2), 93–108.
Chang, H. Y., & Tzeng, S. F. (2018). Investigating Taiwanese students’ visualization competence of matter at the particulate level. International Journal of Science and Mathematics Education, 16(7), 1207–1226.
Delgado, R., & Tibau, X. A. (2019). Why Cohen’s Kappa should be avoided as performance measure in classification. PLoS ONE, 14(9), e0222916.
Gabel, D. L., Samuel, K. V., & Hunn, D. (1987). Understanding the particulate nature of matter. Journal of Chemical Education, 64(8), 695.
Gerard, L. F., Ryoo, K., McElhaney, K. W., Liu, O. L., Rafferty, A. N., & Linn, M. C. (2016). Automated guidance for student inquiry. Journal of Educational Psychology, 108(1), 60–81.
Ghali, R., Ouellet, S., & Frasson, C. (2016). LewiSpace: An exploratory study with a machine learning model in an educational game. Journal of Education and Training Studies, 4(1), 192–201.
Gillespie, R. J. (1997). The great ideas of chemistry. Journal of Chemical Education, 74(7), 862–863.
Harrison, A. G., & Treagust, D. F. (2002). The particulate nature of matter: Challenges in understanding the submicroscopic world. In J. K. Gilbert, O. De Jong, R. Justi, D. F. Treagust, & J. H. Van Driel (Eds.), Chemical education: Towards research-based practice (pp. 189–212). Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Haudek, K. C., Prevost, L. B., Moscarella, R. A., Merrill, J., & Urban-Lurain, M. (2012). What are they thinking? Automated analysis of student writing about acid–base chemistry in introductory biology. CBE-Life Sciences Education, 11(3), 283–293.
Hogan, T. P., & Murphy, G. (2007). Recommendations for preparing and scoring constructed-response items: What the experts say. Applied Measurement in Education, 20(4), 427–441.
Hosmer, D. W. Jr., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Wiley.
Hurd, P. D. (1998). Scientific literacy: New minds for a changing world. Science Education, 82(3), 407–416.
Jescovitch, L. N., Scott, E. E., Cerchiara, J. A., Merrill, J., Urban-Lurain, M., Doherty, J. H., & Haudek, K. C. (2021). Comparison of machine learning performance using analytic and holistic coding approaches across constructed response assessments aligned to a science learning progression. Journal of Science Education and Technology, 30(2), 150–167.
Jin, X., Chi, J., Peng, S., Tian, Y., Ye, C., & Li, X. (2016). Deep image aesthetics classification using inception modules and fine-tuning connected layer. In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) (pp. 1–6). IEEE.
Karacop, A., & Doymus, K. (2013). Effects of jigsaw cooperative learning and animation techniques on students’ understanding of chemical bonding and their conceptions of the particulate nature of matter. Journal of Science Education and Technology, 22(2), 186–203.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, G.-G., & Ha, M. (2020). The present and future of AI-based automated evaluation: A literature review on descriptive assessment and other side. Journal of Educational Technology, 36(2), 353–382. (written in Korean)
Liu, O. L., Rios, J. A., Heilman, M., Gerard, L., & Linn, M. C. (2016). Validation of automated scoring of science assessments. Journal of Research in Science Teaching, 53(2), 215–233.
Liu, X., & Lesniak, K. M. (2005). Students’ progression of understanding the matter concept from elementary to high school. Science Education, 89(3), 433–450.
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence unleashed: An argument for AI in education. Pearson Education.
Maestrales, S., Zhai, X., Touitou, I., Baker, Q., Schneider, B., & Krajcik, J. (2021). Using machine learning to score multi-dimensional assessments of chemistry and physics. Journal of Science Education and Technology, 30(2), 239–254.
National Research Council [NRC]. (2012). A framework for K-12 science education: Practices, cross-cutting concepts, and core ideas. National Academies Press.
Nehm, R. H., & Ha, M. (2011). Item feature effects in evolution assessment. Journal of Research in Science Teaching, 48(3), 237–256.
NGSS Lead States. (2013). Next generation science standards: For states, by states. National Academies Press.
Novick, S., & Nussbaum, J. (1981). Pupils’ understanding of the particulate nature of matter: A cross-age study. Science Education, 65(2), 187–196.
Nyachwaya, J. M., Mohamed, A.-R., Roehrig, G. H., Wood, N. B., Kern, A. L., & Schneider, J. L. (2011). The development of an open-ended drawing tool: An alternative diagnostic tool for assessing students’ understanding of the particulate nature of matter. Chemistry Education Research and Practice, 12(2), 121–132.
Opfer, J. E., Nehm, R. H., & Ha, M. (2012). Cognitive foundations for science assessment design: Knowing what students know about evolution. Journal of Research in Science Teaching, 49(6), 744–777.
Özmen, H. (2011). Effect of animation enhanced conceptual change texts on 6th grade students’ understanding of the particulate nature of matter and transformation during phase changes. Computers & Education, 57(1), 1114–1126.
Park, E. L., & Cho, S. (2014). KoNLPy: Korean natural language processing in Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, Chuncheon, Korea. (written in Korean)
Pei, B., Xing, W., & Lee, H. S. (2019). Using automatic image processing to analyze visual artifacts created by students in scientific argumentation. British Journal of Educational Technology, 50(6), 3391–3404.
Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson Education.
Ryan, S. A., & Stieff, M. (2019). Drawing for assessing learning outcomes in chemistry. Journal of Chemical Education, 96(9), 1813–1820.
Shin, D., & Shim, J. (2021). A systematic review on data mining for mathematics and science education. International Journal of Science and Mathematics Education, 19, 639–659.
Smith, A., Leeman-Munk, S., Shelton, A., Mott, B., Wiebe, E., & Lester, J. (2019). A multi-modal assessment framework for integrating student writing and drawing in elementary science learning. IEEE Transactions on Learning Technologies, 12(1), 3–15.
Smith, C. L., Wiser, M., Anderson, C. W., & Krajcik, J. (2006). Implications of research on children’s learning for standards and assessment: A proposed learning progression for matter and the atomic-molecular theory. Measurement: Interdisciplinary Research & Perspective, 4(1–2), 1–98.
Sripathi, K. N., Moscarella, R. A., Yoho, R., You, H. S., Urban-Lurain, M., Merrill, J., & Haudek, K. (2019). Mixed student ideas about mechanisms of human weight loss. CBE-Life Sciences Education, 18(ar37), 1–17.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).
Taber, K. S., & García-Franco, A. (2010). Learning processes in chemistry: Drawing upon cognitive resources to learn about the particulate structure of matter. The Journal of the Learning Sciences, 19(1), 99–142.
Treagust, D. F., Chandrasegaran, A. L., Crowley, J., Yung, B. H., Cheong, I. P. A., & Othman, J. (2010). Evaluating students’ understanding of kinetic particle theory concepts relating to the states of matter, changes of state and diffusion: A cross-national study. International Journal of Science and Mathematics Education, 8(1), 141–164.
Treagust, D. F., Chandrasegaran, A. L., Zain, A. N., Ong, E. T., Karpudewan, M., & Halim, L. (2011). Evaluation of an intervention instructional program to facilitate understanding of basic particle concepts among students enrolled in several levels of study. Chemistry Education Research and Practice, 12(2), 251–261.
Yarroch, W. L. (1985). Student understanding of chemical equation balancing. Journal of Research in Science Teaching, 22(5), 449–459.
Yilmaz, A., & Alp, E. (2006). Students’ understanding of matter: The effect of reasoning ability and grade level. Chemistry Education Research and Practice, 7(1), 22–31.
Zhai, X., He, P., & Krajcik, J. (2022). Applying machine learning to automatically assess scientific models. Journal of Research in Science Teaching. https://doi.org/10.1002/tea.21773
Zhai, X., Krajcik, J., & Pellegrino, J. W. (2021). On the validity of machine learning-based next generation science assessments: A validity inferential network. Journal of Science Education and Technology. https://doi.org/10.1007/s10956-020-09879-9
Zhai, X., Shi, L., & Nehm, R. H. (2020a). A meta-analysis of machine learning-based science assessments: Factors impacting machine-human score agreements. Journal of Science Education and Technology. https://doi.org/10.1007/s10956-020-09875-z
Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020b). Applying machine learning in science assessment: A systematic review. Studies in Science Education, 56(1), 111–151.
Zhu, M., Lee, H. S., Wang, T., Liu, O. L., Belur, V., & Pallant, A. (2017). Investigating the impact of automated feedback on students’ scientific argumentation. International Journal of Science Education, 39(12), 1648–1668.
Zhu, M., Liu, O. L., & Lee, H. S. (2020). The effect of automated feedback on revision behavior and learning gains in formative assessment of scientific argument writing. Computers & Education, 143, 103668.

Acknowledgements

This research was supported by the 2020 Student-Directed Education Regular Program from Seoul National University.

Author information

Jaeyong Lee and Gyeong-Geon Lee contributed equally to this work.

Authors and Affiliations

Department of Education, College of Education, Seoul National University, Seoul, Republic of Korea
Jaeyong Lee
Department of Chemistry Education, College of Education, Seoul National University, Seoul, Republic of Korea
Gyeong-Geon Lee & Hun-Gi Hong

Corresponding author

Correspondence to Hun-Gi Hong.

Ethics declarations

Ethics Approval

Not applicable.

Consent to Participate

Not applicable.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 6 Number (%) of student representations of matter for item 1 (highest ratios in each school level in bold)


Table 7 Number (%) of student representations of matter for item 2 (highest ratios in each school level in bold)


Table 8 Confusion matrix of the best classifying layer (SVM) for hand-drawn responses in the test dataset (row: actual, column: predicted; n = 206)


Table 9 Confusion matrix of the best classifying layers for integrated hand drawings and writing in an item in the test dataset (row: actual; column: predicted) (n = 206)

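
The agreement measures reported for this study (accuracy, Cohen's kappa, precision, recall, F1) can all be derived from a confusion matrix laid out like the ones in Tables 8 and 9 (row: actual, column: predicted). The sketch below shows one way to do that. The counts in `toy` are invented for illustration and are not the values from those tables, and the support-weighted averaging of per-class precision, recall, and F1 is an assumption. That assumption is at least consistent with the abstract, where the recall range equals the accuracy range, because support-weighted recall always equals accuracy.

```python
from __future__ import annotations


def agreement_metrics(cm: list[list[int]]) -> dict[str, float]:
    """Compute agreement metrics from a square confusion matrix.

    cm[i][j] is the number of responses with human (actual) code i
    and machine (predicted) code j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                              # actual counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    # Observed agreement (accuracy) and agreement expected by chance.
    p_o = sum(cm[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    precision = recall = f1 = 0.0
    for i in range(k):
        p = cm[i][i] / col_tot[i] if col_tot[i] else 0.0
        r = cm[i][i] / row_tot[i] if row_tot[i] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = row_tot[i] / n  # support weighting (assumed averaging scheme)
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": p_o,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented 3-class counts (e.g. particulate / continuous / other).
toy = [[50, 5, 5], [4, 30, 6], [6, 4, 40]]

if __name__ == "__main__":
    for name, value in agreement_metrics(toy).items():
        print(f"{name}: {value:.3f}")
```

AUC is left out because it requires the classifier's scores or probabilities for each response, which a confusion matrix does not contain.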

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, J., Lee, GG. & Hong, HG. Automated Assessment of Student Hand Drawings in Free-Response Items on the Particulate Nature of Matter. J Sci Educ Technol 32, 549–566 (2023). https://doi.org/10.1007/s10956-023-10042-3

Accepted: 14 March 2023
Published: 26 April 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10956-023-10042-3
