Scaling Mixed-Methods Formative Assessments (mixFA) in Classrooms: A Clustering Pipeline to Identify Student Knowledge

  • Conference paper
Artificial Intelligence in Education (AIED 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13355)

Abstract

Formative assessments provide valuable data for teachers to make instructional decisions and help students actively manage their progress and learning. Multiple-choice questions (MCQs) and free-text open-ended questions are typically employed as formative assessments. While MCQs are easy to grade and their answers are easy to visualize, they cannot reveal diverse student ideas and reasoning beyond the listed options. Open-ended tasks and free-text submissions, on the other hand, may elicit students’ perspectives more comprehensively, but analyzing such responses is laborious for instructors. In this work, we explore the use of mixed-methods formative assessments in a college-level CS class, in which we assign MCQs and ask students to explain their answers. We propose a clustering pipeline that categorizes students’ free-text explanations by leveraging the metadata that the original MCQs provide. We find that using students’ MCQ choices to resolve co-reference in their explanations and adding those choices as features significantly improve clustering performance. Moreover, our work demonstrates that providing structure in the data collection process improves the clustering of free-text responses without any change to the algorithm.
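
The two ideas named in the abstract, resolving references in an explanation with the student's chosen option and appending the choice as a feature, can be illustrated with a minimal sketch. This is not the authors' pipeline: the question, the option texts, the regular expression standing in for a co-reference resolver, and the bag-of-words vectors standing in for learned sentence embeddings are all assumptions made here for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical MCQ options (label -> option text); not taken from the paper's dataset.
OPTIONS = {"A": "a list", "B": "a tuple", "C": "a dictionary"}

# Illustrative stand-in for co-reference resolution: phrases assumed to refer
# to the option the student picked.
REFERENCE_PATTERN = re.compile(r"\b(my answer|this option|it)\b", re.IGNORECASE)


def resolve_references(explanation, choice, options=OPTIONS):
    """Replace vague references with the text of the chosen option."""
    return REFERENCE_PATTERN.sub(options[choice], explanation)


def featurize(explanation, choice, vocabulary, options=OPTIONS):
    """Bag-of-words counts over the resolved text, followed by a one-hot choice vector."""
    resolved = resolve_references(explanation, choice, options)
    counts = Counter(re.findall(r"[a-z']+", resolved.lower()))
    text_part = [float(counts[word]) for word in vocabulary]
    choice_part = [1.0 if label == choice else 0.0 for label in sorted(options)]
    return text_part + choice_part


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy vocabulary and responses, invented for this example.
vocabulary = ["list", "tuple", "changed", "mutable"]
v1 = featurize("It can be changed after creation", "A", vocabulary)
v2 = featurize("It cannot be changed after creation", "B", vocabulary)
v3 = featurize("My answer is mutable", "A", vocabulary)

print("same choice:     ", cosine(v1, v3))
print("different choice:", cosine(v1, v2))
```

In this toy setup, the first two explanations share almost all of their surface wording, yet after resolution and the added choice feature the first explanation sits closer to the third, which argues for the same option. Any clustering algorithm could then be run on such vectors unchanged; only the input representation differs.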

Notes

  1. https://github.com/UM-Lifelong-Learning-Lab/AIED2022-MixFA-dataset

Author information

Corresponding author

Correspondence to Xinyue Chen.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Wang, X. (2022). Scaling Mixed-Methods Formative Assessments (mixFA) in Classrooms: A Clustering Pipeline to Identify Student Knowledge. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_35

  • DOI: https://doi.org/10.1007/978-3-031-11644-5_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11643-8

  • Online ISBN: 978-3-031-11644-5

  • eBook Packages: Computer Science (R0)
