Assessing students’ participation in science practices presents several challenges, especially when aiming to differentiate meaningful (vs. rote) forms of participation. In this study, we sought to use machine learning (ML) for a novel purpose in science assessment: developing a construct map for students’ consideration of generality, a key epistemic understanding that undergirds meaningful participation in knowledge-building practices. We report on our efforts to assess the nature of 845 students’ ideas about the generality of their model-based explanations through the combination of an embedded written assessment and a novel data analytic approach that combines unsupervised and supervised machine learning methods with human-driven, interpretive coding. We demonstrate how unsupervised machine learning methods, when coupled with qualitative, interpretive coding, were used to revise our construct map for generality in a way that allowed for a more nuanced evaluation that was closely tied to empirical patterns in the data. We also explored the application of the construct map as a framework for coding used as a part of supervised machine learning methods, finding that it demonstrates some viability for use in future analyses. We discuss implications for the assessment of students’ meaningful participation in science practices in terms of their considerations of generality, the role of unsupervised methods in science assessment, and combining machine learning and human-driven approaches for understanding students’ complex involvement in science practices.
The approach we used has been shown to lend greater stability to the k-means clustering solution, which can be influenced by the starting points for the algorithm. This approach uses the results from hierarchical clustering as the starting points for k-means (Bergman and El-Khouri 1999). In our technique, what is being clustered is the vector space representation of each document: in other words, the raw data for each document is a row in a table, with values ranging from zero to the maximum number of times any term appears across all documents. The default distance metric for the hierarchical clustering is cosine similarity.
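The two-step idea in this note can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used R packages), and several details are assumptions made for illustration only: average linkage for the hierarchical step, squared Euclidean distance inside k-means, and the toy term-count rows. All function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def centroid(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def hierarchical_clusters(rows, k):
    """Agglomerative clustering (average linkage, an assumption) down to k groups."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise cosine distance between the two groups.
                d = sum(cosine_distance(rows[p], rows[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

def kmeans(rows, centers, max_iter=100):
    """Lloyd's algorithm (squared Euclidean, an assumption) started from `centers`."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels, centers

def seeded_kmeans(rows, k):
    """Use hierarchical cluster centroids as the k-means starting points."""
    groups = hierarchical_clusters(rows, k)
    seeds = [centroid([rows[i] for i in g]) for g in groups]
    return kmeans(rows, seeds)

# Toy document-term rows (hypothetical counts): two documents per topic.
rows = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]
labels, centers = seeded_kmeans(rows, k=2)
print(labels)
```

Because the starting centers come from a deterministic hierarchical solution rather than a random draw, repeated runs on the same data give the same k-means result, which is the stability property the note describes.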
Leave-one-out cross-validation (LOOCV) is equivalent to k-fold cross-validation when k is equal to the number of observations in the dataset.
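The equivalence stated in this note can be checked directly with a small Python sketch. The two helper functions are hypothetical and assume contiguous, unshuffled folds; they are meant only to show that setting k to the number of observations leaves exactly one observation in each test fold.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n observations."""
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional observation.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def loocv_splits(n):
    """Yield (train_indices, test_indices), holding out one observation at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

n = 5
same = list(k_fold_splits(n, k=n)) == list(loocv_splits(n))
print(same)
```

With any smaller k, each test fold holds more than one observation, so fewer models are fit but each is trained on less of the data.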
Allaire, J. J., & Chollet, F. (2019). keras: R interface to ‘Keras’. R package version 220.127.116.11. https://CRAN.R-project.org/package=keras
Anderson, D. J., Rowley, B., Stegenga, S., Irvin, P. S., & Rosenberg, J. M. (2020). Evaluating content-related validity evidence using a text-based, machine learning procedure. Educational Measurement: Issues and Practice. Advance online publication. https://doi.org/10.1111/emip.12314.
Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: how closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160–182.
Benoit, K., Chester, P., & Müller, S. (2019a). quanteda.classifiers: models for supervised text classification. R package version 0.1. http://github.com/quanteda/quanteda.svm
Benoit, K., Muhr, D., & Watanabe, K. (2019b). stopwords: multilingual stopword lists. R package version 1.0. https://CRAN.R-project.org/package=stopwords
Bergman, L. R., & El-Khouri, B. M. (1999). Studying individual patterns of development using I-states as objects analysis (ISOA). Biometrical Journal: Journal of Mathematical Methods in Biosciences, 41(6), 753–770.
Berland, L., & Crucet, K. (2016). Epistemological trade-offs: accounting for context when evaluating epistemological sophistication of student engagement in scientific practices. Science Education, 100(1), 5–29.
Berland, L. K., Schwarz, C. V., Krist, C., Kenyon, L., Lo, A. S., & Reiser, B. J. (2016). Epistemologies in practice: making scientific practices meaningful for students. Journal of Research in Science Teaching, 53(7), 1082–1112.
Bouchet-Valat, M. (2014). SnowballC: snowball stemmers based on the C libstemmer UTF-8 library. R package version 0.5.1.
Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and trends of automatic short answer grading. International Journal of Artificial Intelligence in Education, 25(1), 60–117.
Chinn, C. A., & Malhotra, B. A. (2002). Epistemologically authentic inquiry in schools: a theoretical framework for evaluating inquiry tasks. Science Education, 86(2), 175–218.
Cohen, J. (1968). Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
R Core Team. (2019). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org.
DeBarger, A. H., Penuel, W. R., & Harris, C. J. (2013). Designing NGSS assessments to evaluate the efficacy of curriculum interventions. Invitational Research Symposium on Science Assessment. Washington, DC: K-12 Center at ETS. Retrieved from http://www.k12center.org/rsc/pdf/debarger-penuel-harris.pdf.
Dickes, A. C., Sengupta, P., Farris, A. V., & Basu, S. (2016). Development of mechanistic reasoning and multilevel explanations of ecology in third grade using agent-based models. Science Education, 100(4), 734–776.
Duncan, R. G., & Tseng, K. A. (2011). Designing project-based instruction to foster generative and mechanistic understandings in genetics. Science Education, 95(1), 21–56.
Ford, M. J. (2015). Educational implications of choosing “practice” to describe science in the next generation science standards. Science Education, 99(6), 1041–1048.
Ford, M. J., & Forman, E. A. (2006). Chapter 1: redefining disciplinary learning in classroom contexts. Review of Research in Education, 30(1), 1–32.
Fram, S. M. (2013). The constant comparative analysis method outside of grounded theory. The Qualitative Report, 18, 1.
Gerard, L. F., & Linn, M. C. (2016). Using automated scores of student essays to support teacher guidance in classroom inquiry. Journal of Science Teacher Education, 27(1), 111–129.
Giere, R. N. (1988). Explaining science: a cognitive approach. Chicago: University of Chicago Press.
Gobert, J. D., Sao Pedro, M., Raziuddin, J., & Baker, R. S. (2013). From log files to assessment metrics: measuring students’ science inquiry skills using educational data mining. The Journal of the Learning Sciences, 22(4), 521–563.
Gobert, J. D., Baker, R. S., & Wixon, M. B. (2015). Operationalizing and detecting disengagement within online science microworlds. Educational Psychologist, 50(1), 43–57.
Gotwals, A. W., & Songer, N. B. (2013). Validity evidence for learning progression-based assessment items that fuse core disciplinary ideas and science practices. Journal of Research in Science Teaching, 50(5), 597–626.
Greene, D., Hoffmann, A. L., & Stark, L. (2019). Better, nicer, clearer, fairer: a critical assessment of the movement for ethical artificial intelligence and machine learning. In Hawaii International Conference on System Sciences (HICSS). Maui, HI.
Harris, C. J., Krajcik, J. S., Pellegrino, J. W., & DeBarger, A. H. (2019). Designing knowledge-in-use assessments to promote deeper learning. Educational Measurement: Issues and Practice, 38(2), 53–67.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.
Haudek, K. C., Osborne, J., & Wilson, C. D. (2019). Using automated analysis to assess middle school students’ competence with scientific argumentation. In National Conference on Measurement in Education. Toronto: NCME.
Helleputte, T. (2017). LiblineaR: linear predictive models based on the Liblinear C/C++ library. R package version, 2, 10–18.
Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261–266.
Inkinen, J., Klager, C., Juuti, K., Schneider, B., Salmela-Aro, K., Krajcik, J., & Lavonen, J. (2020). High school students’ situational engagement associated with scientific practices in designed science learning situations. Science Education, 104(4), 667–692.
Jiménez-Aleixandre, M. P., Bugallo Rodríguez, A., & Duschl, R. A. (2000). “Doing the lesson” or “doing science”: argument in high school genetics. Science Education, 84(6), 757–792.
Kelly, G. J. (2008). Inquiry, activity and epistemic practice. In R. A. Duschl & R. E. Grandy (Eds.), Teaching Scientific Inquiry (pp. 99–117). https://doi.org/10.1163/9789460911453_009.
Kolodner, J. L. (Ed.). (1993). Case-based learning. Dordrecht: Kluwer Academic Publishers.
Krajcik, J., McNeill, K. L., & Reiser, B. J. (2008). Learning-goals-driven design model: developing curriculum materials that align with national standards and incorporate project-based pedagogy. Science Education, 92(1), 1–32.
Krajcik, J., Reiser, B., Sutherland, L., & Fortus, D. (2011). IQWST: investigating and questioning our world through science and technology (middle school science curriculum materials). Greenwich: Sangari Active Science.
Krist, C. (2020). Examining how classroom communities developed practice-based epistemologies for science through analysis of longitudinal video data. Journal of Educational Psychology, 112(3), 420–443. https://doi.org/10.1037/edu0000417.
Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9(5), 178–181.
Landis, J. R., & Koch, G. G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2), 363–374.
Laverty, J. T., Underwood, S. M., Matz, R. L., Posey, L. A., Carmel, J. H., Caballero, M. D., Fata-Hartley, C. L., Ebert-May, D., Jardeleza, S. E., & Cooper, M. M. (2016). Characterizing college science assessments: the three-dimensional learning assessment protocol. PLoS One, 11(9), e0162333.
NGSS Lead States. (2013). Next generation science standards: for states, by states. Washington, DC: National Academies Press.
Lehrer, R., & Schauble, L. (2006). Cultivating model-based reasoning in science education. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 371–387). Cambridge University Press.
Lehrer, R., & Schauble, L. (2015). Developing scientific thinking. In L. S. Liben & U. Müller (Eds.), Cognitive processes. Handbook of child psychology and developmental science (Vol. 2, 7th ed., pp. 671–714). Hoboken, NJ: Wiley.
Lehrer, R., Schauble, L., & Petrosino, A. J. (2001). Reconsidering the role of experiment in science education. In Designing for science: implications from everyday, classroom, and professional settings (pp. 251–278).
Manz, E. (2012). Understanding the codevelopment of modeling practice and ecological knowledge. Science Education, 96(6), 1071–1105.
Manz, E. (2015). Representing student argumentation as functionally emergent from scientific activity. Review of Educational Research, 85(4), 553–590.
McNeill, K., Lizotte, D. J., Krajcik, J., & Marx, R. W. (2006). Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. The Journal of the Learning Sciences, 15(2), 153–191.
Morell, L., Collier, T., Black, P., & Wilson, M. (2017). A construct-modeling approach to develop a learning progression of how students understand the structure of matter. Journal of Research in Science Teaching, 54(8), 1024–1048.
National Research Council. (2012). A framework for K-12 science education: practices, crosscutting concepts, and core ideas. Washington, DC: National Academies Press.
National Research Council. (2014). Developing assessments for the Next Generation Science Standards. Washington, DC: The National Academies Press. https://doi.org/10.17226/18409.
Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183–196.
Nelson, L. K. (2020). Computational grounded theory: a methodological framework. Sociological Methods & Research, 49(1), 3–42. https://doi.org/10.1177/0049124117729703.
Passmore, C., Schwarz, C. V., & Mankowski, J. (2017). Developing and using models. In C. V. Schwarz, C. Passmore, & B. J. Reiser (Eds.), Helping students make sense of the world using next generation science and engineering practices (pp. 109–135). Arlington, VA: NSTA Press.
Pei, B., Xing, W., & Lee, H. S. (2019). Using automatic image processing to analyze visual artifacts created by students in scientific argumentation. British Journal of Educational Technology, 50(6), 3391–3404.
Pellegrino, J. W. (2013). Proficiency in science: assessment challenges and opportunities. Science, 340(6130), 320–323.
Penuel, W. R., Turner, M. L., Jacobs, J. K., Van Horne, K., & Sumner, T. (2019). Developing tasks to assess phenomenon-based science learning: challenges and lessons learned from building proximal transfer tasks. Science Education, 103(6), 1367–1395.
Popper, K. R. (1959). The propensity interpretation of probability. The British Journal for the Philosophy of Science, 10(37), 25–42.
Reiser, B. J., Kim, J., Toyama, Y., & Draney, K. (2016). Multi-year growth in mechanistic reasoning across units in biology, chemistry, and physics. Paper presented at NARST, April, 14, 2016.
Rosenberg, J. M., & Lishinski, A. (2018). clustRcompaR: easy interface for clustering a set of documents and exploring group-based patterns [R package]. https://github.com/alishinski/clustRcompaR
Ryu, S., & Sandoval, W. A. (2012). Improvements to elementary children’s epistemic understanding from sustained argumentation. Science Education, 96(3), 488–526.
Saldaña, J. (2016). The coding manual for qualitative researchers. Sage.
Sandoval, W. A. (2005). Understanding students’ practical epistemologies and their influence on learning through inquiry. Science Education, 89(4), 634–656.
Sandoval, W. A., & Millwood, K. A. (2005). The quality of students’ use of evidence in written scientific explanations. Cognition and Instruction, 23(1), 23–55.
Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L. O., Archer, A., Fortus, D., & Krajcik, J. (2009). Developing a learning progression for scientific modeling: making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632–654.
Schwarz, C. V., Passmore, C., & Reiser, B. J. (2017). Helping students make sense of the world using next generation science and engineering practices. Arlington, VA: NSTA Press.
Shaffer, D. W. (2017). Quantitative ethnography. Madison: Cathcart Press.
Sherin, B. (2013). A computational study of commonsense science: an exploration in the automated analysis of clinical interview data. The Journal of the Learning Sciences, 22(4), 600–638.
Shin, N., Stevens, S. Y., & Krajcik, J. (2010). Tracking student learning over time using construct-centred design. In Using Analytical Frameworks for Classroom Research (pp. 56–76). Routledge.
Tabak, I., & Reiser, B. J. (1999). Steering the course of dialogue in inquiry-based science. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Canada.
Thagard, P. R. (1978). The best explanation: criteria for theory choice. Journal of Philosophy, 75(2), 76–92.
Wiley, J., Hastings, P., Blaum, D., Jaeger, A. J., Hughes, S., Wallace, P., Griffin, T. D., & Britt, M. A. (2017). Different approaches to assessing the quality of explanations following a multiple-document inquiry activity in science. International Journal of Artificial Intelligence in Education, 27(4), 758–790.
Wilson, M. (2004). Constructing measures: An item response modeling approach. London: Routledge.
Zangori, L., Forbes, C. T., & Schwarz, C. V. (2015). Exploring the effect of embedded scaffolding within curricular tasks on third-grade students’ model-based explanations about hydrologic cycling. Science & Education, 24(7–8), 957–981.
Zehner, F., Sälzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement, 76(2), 280–303.
Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020). Applying machine learning in science assessment: a systematic review. Studies in Science Education, 56(1), 111–151.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (Northwestern University #STU00034615 and Wright State University #FWA00002427) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Conflict of Interest
The authors declare that they have no conflicts of interest.
Informed consent was obtained from all individual participants included in the study.
Cite this article
Rosenberg, J.M., Krist, C. Combining Machine Learning and Qualitative Methods to Elaborate Students’ Ideas About the Generality of their Model-Based Explanations. J Sci Educ Technol 30, 255–267 (2021). https://doi.org/10.1007/s10956-020-09862-4
- Scientific practices
- Machine learning
- Middle school
- Grounded theory