Abstract
Formative assessments provide valuable data for teachers to make instructional decisions and help students actively manage their progress and learning. Multiple-choice questions (MCQs) and free-text open-ended questions are typically employed as formative assessments. While MCQs have the benefit of easy grading and straightforward visualization of student answers, they are limited in revealing diverse student ideas and reasoning beyond the given options. Open-ended tasks and free-text submissions, on the other hand, can elicit students’ perspectives more comprehensively, but analyzing such responses requires laborious work from instructors. In this work, we explore the use of mixed-methods formative assessments in a college-level CS class, in which we assign MCQs and ask students to explain their answers. We propose a clustering pipeline that categorizes students’ free-text explanations by leveraging the metadata the original MCQs provide. We find that using students’ MCQ choices to resolve co-reference in their explanations and adding those choices as features significantly improve clustering performance. Moreover, our work demonstrates that providing structure in the data collection process improves the clustering of free-text responses without changes to the clustering algorithm itself.
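The full pipeline is described in the paper itself; as a minimal sketch of the idea summarized above, one might embed each explanation with Sentence-BERT, prepend the text of the chosen option as a crude stand-in for neural co-reference resolution, append the choice as a one-hot feature, and then cluster. The libraries (sentence-transformers, scikit-learn), the model name, and the example data below are assumptions for illustration, not the authors' implementation.

    # Illustrative sketch only; data, option texts, and model choice are hypothetical.
    import numpy as np
    from sentence_transformers import SentenceTransformer  # SBERT embeddings
    from sklearn.cluster import KMeans

    # Each record: the MCQ option a student picked and their free-text explanation.
    responses = [
        {"choice": "B", "explanation": "It fails because the loop never terminates."},
        {"choice": "B", "explanation": "The condition is never false, so it runs forever."},
        {"choice": "C", "explanation": "It prints the wrong value on the last iteration."},
    ]
    # Text of each option, taken from the original MCQ (assumed here).
    option_text = {"B": "the while loop", "C": "the print statement"}

    def resolve_with_choice(rec):
        # Crude co-reference step: prepend the chosen option's text so pronouns
        # such as "it" in the explanation have an explicit referent.
        return f"{option_text[rec['choice']]}: {rec['explanation']}"

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode([resolve_with_choice(r) for r in responses])

    # Add the student's MCQ choice as a one-hot feature next to the text embedding.
    choices = sorted(option_text)
    one_hot = np.array([[1.0 if r["choice"] == c else 0.0 for c in choices]
                        for r in responses])
    features = np.hstack([embeddings, one_hot])

    # Cluster the combined representation; the number of clusters is illustrative.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print(labels)

Running this groups explanations that share both a chosen option and similar wording, which is the intuition behind using the MCQ metadata to structure the free-text responses.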
Cite this paper
Chen, X., Wang, X. (2022). Scaling Mixed-Methods Formative Assessments (mixFA) in Classrooms: A Clustering Pipeline to Identify Student Knowledge. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_35