Combining Machine Learning and Qualitative Methods to Elaborate Students’ Ideas About the Generality of their Model-Based Explanations

A Correction to this article was published on 20 November 2020

Abstract

Assessing students’ participation in science practices presents several challenges, especially when aiming to differentiate meaningful (vs. rote) forms of participation. In this study, we sought to use machine learning (ML) for a novel purpose in science assessment: developing a construct map for students’ consideration of generality, a key epistemic understanding that undergirds meaningful participation in knowledge-building practices. We report on our efforts to assess the nature of 845 students’ ideas about the generality of their model-based explanations through an embedded written assessment and a novel data analytic approach that combines unsupervised and supervised machine learning methods with human-driven, interpretive coding. We demonstrate how unsupervised machine learning methods, when coupled with qualitative, interpretive coding, were used to revise our construct map for generality in a way that allowed for a more nuanced evaluation closely tied to empirical patterns in the data. We also explored the application of the construct map as a coding framework for supervised machine learning methods, finding that it shows some viability for use in future analyses. We discuss implications for assessing students’ meaningful participation in science practices in terms of their considerations of generality, the role of unsupervised methods in science assessment, and the combination of machine learning and human-driven approaches for understanding students’ complex involvement in science practices.

Notes

  1.

    The approach we used has been shown to lend greater stability to the k-means clustering solution, which can be influenced by the starting points for the algorithm. This approach uses the results from hierarchical clustering as the starting points for k-means (Bergman and El-Khouri 1999). In our technique, what is being clustered is the vector space representation of each document: in other words, the raw data for the clustering procedure is a row in a table, with values ranging from zero to the maximum number of times any term appears across all documents. The default distance metric for the hierarchical clustering is cosine similarity. A minimal sketch of this two-step procedure (in R) follows these notes.

  2.

    The R package we created and used (Rosenberg and Lishinski 2018) is available via GitHub for anyone seeking to carry out a similar two-step cluster analysis in R (R Core Team 2019); Sherin (2020) provides a very similar package in Python.

  3.

    LOOCV (leave-one-out cross-validation) is equivalent to k-fold cross-validation when k is equal to the number of observations in the dataset; a brief illustration of this equivalence also follows these notes.
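The following is a minimal sketch, in base R, of the two-step clustering procedure described in note 1. It is illustrative only: it assumes a document-term matrix `dtm` (one row per response, one column per term, entries are raw term counts) and a chosen number of clusters `k`; the object names and the Ward linkage are our assumptions, not details taken from clustRcompaR.

```r
# Illustrative sketch of the two-step clustering described in note 1.
# Assumes `dtm` is a plain numeric document-term matrix.
set.seed(1)
k <- 4  # number of clusters, chosen here only for illustration

# Cosine distance between documents: 1 minus the cosine similarity of the row vectors.
row_norms   <- sqrt(rowSums(dtm^2))
cosine_sim  <- (dtm %*% t(dtm)) / (row_norms %o% row_norms)
cosine_dist <- as.dist(1 - cosine_sim)

# Step 1: hierarchical clustering on the cosine distances
# (ward.D2 is one common linkage choice; the note does not specify one).
hc <- hclust(cosine_dist, method = "ward.D2")
initial <- cutree(hc, k = k)

# Centroids of the hierarchical solution serve as the starting points for k-means,
# which lends greater stability to the final solution (Bergman and El-Khouri 1999).
centers <- do.call(rbind, lapply(split(as.data.frame(dtm), initial), colMeans))

# Step 2: k-means seeded with those centroids.
km <- kmeans(dtm, centers = centers)
table(initial, km$cluster)  # compare the hierarchical and k-means assignments
```

The clustRcompaR package referenced in note 2 carries out a similar two-step analysis; the sketch is meant only to make the underlying logic concrete.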
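Similarly, a brief illustration (with made-up values) of the equivalence stated in note 3: partitioning n observations into k folds with k = n leaves exactly one observation out per fold, which is the LOOCV scheme.

```r
# Illustrative only: k-fold cross-validation with k = n is leave-one-out CV.
n <- 150                  # a hypothetical number of observations
k <- n                    # one fold per observation
folds <- split(sample(seq_len(n)), rep_len(seq_len(k), n))
all(lengths(folds) == 1)  # TRUE: each fold holds out exactly one observation
```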

References

  1. Allaire, J. J., & Chollet, F. (2019). keras: R interface to ‘Keras’. R package version 2.2.5.0. https://CRAN.R-project.org/package=keras

  2. Anderson, D. J., Rowley, B., Stegenga, S., Irvin, P. S., & Rosenberg, J. M. (2020). Evaluating content-related validity evidence using a text-based, machine learning procedure (advance online publication). Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12314.

  3. Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: how closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160–182.

  4. Benoit, K., Chester, P., & Müller, S. (2019a). quanteda.classifiers: models for supervised text classification. R package version 0.1. http://github.com/quanteda/quanteda.svm

  5. Benoit, K., Muhr, D., & Watanabe, K. (2019b). stopwords: multilingual stopword lists. R package version 1.0. https://CRAN.R-project.org/package=stopwords

  6. Bergman, L. R., & El-Khouri, B. M. (1999). Studying individual patterns of development using I-states as objects analysis (ISOA). Biometrical Journal: Journal of Mathematical Methods in Biosciences, 41(6), 753–770.

  7. Berland, L., & Crucet, K. (2016). Epistemological trade-offs: accounting for context when evaluating epistemological sophistication of student engagement in scientific practices. Science Education, 100(1), 5–29.

  8. Berland, L. K., Schwarz, C. V., Krist, C., Kenyon, L., Lo, A. S., & Reiser, B. J. (2016). Epistemologies in practice: making scientific practices meaningful for students. Journal of Research in Science Teaching, 53(7), 1082–1112.

  9. Bouchet-Valat, M. (2014). SnowballC: snowball stemmers based on the C libstemmer UTF-8 library. R package version 0.5.1.

  10. Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and trends of automatic short answer grading. International Journal of Artificial Intelligence in Education, 25(1), 60–117.

  11. Chinn, C. A., & Malhotra, B. A. (2002). Epistemologically authentic inquiry in schools: a theoretical framework for evaluating inquiry tasks. Science Education, 86(2), 175–218.

  12. Cohen, J. (1968). Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.

  13. R Core Team. (2019). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org.

  14. DeBarger, A. H., Penuel, W. R., & Harris, C. J. (2013). Designing NGSS assessments to evaluate the efficacy of curriculum interventions. Invitational Research Symposium on Science Assessment. Washington, DC: K-12 Center at ETS. Retrieved from http://www.k12center.org/rsc/pdf/debarger-penuel-harris.pdf.

  15. Dickes, A. C., Sengupta, P., Farris, A. V., & Basu, S. (2016). Development of mechanistic reasoning and multilevel explanations of ecology in third grade using agent-based models. Science Education, 100(4), 734–776.

  16. Duncan, R. G., & Tseng, K. A. (2011). Designing project-based instruction to foster generative and mechanistic understandings in genetics. Science Education, 95(1), 21–56.

  17. Ford, M. J. (2015). Educational implications of choosing “practice” to describe science in the next generation science standards. Science Education, 99(6), 1041–1048.

  18. Ford, M. J., & Forman, E. A. (2006). Chapter 1: redefining disciplinary learning in classroom contexts. Review of Research in Education, 30(1), 1–32.

  19. Fram, S. M. (2013). The constant comparative analysis method outside of grounded theory. The Qualitative Report, 18, 1.

  20. Gerard, L. F., & Linn, M. C. (2016). Using automated scores of student essays to support teacher guidance in classroom inquiry. Journal of Science Teacher Education, 27(1), 111–129.

  21. Giere, R. N. (1988). Explaining science: a cognitive approach. Chicago: University of Chicago Press.

  22. Gobert, J. D., Sao Pedro, M., Raziuddin, J., & Baker, R. S. (2013). From log files to assessment metrics: measuring students’ science inquiry skills using educational data mining. The Journal of the Learning Sciences, 22(4), 521–563.

  23. Gobert, J. D., Baker, R. S., & Wixon, M. B. (2015). Operationalizing and detecting disengagement within online science microworlds. Educational Psychologist, 50(1), 43–57.

  24. Gotwals, A. W., & Songer, N. B. (2013). Validity evidence for learning progression-based assessment items that fuse core disciplinary ideas and science practices. Journal of Research in Science Teaching, 50(5), 597–626.

  25. Greene, D., Hoffmann, A. L., & Stark, L. (2019). Better, nicer, clearer, fairer: a critical assessment of the movement for ethical artificial intelligence and machine learning. Hawaii International Conference on System Sciences (HICSS), Maui, HI.

  26. Harris, C. J., Krajcik, J. S., Pellegrino, J. W., & DeBarger, A. H. (2019). Designing knowledge-in-use assessments to promote deeper learning. Educational Measurement: Issues and Practice, 38(2), 53–67.

  27. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.

  28. Haudek, K. C., Osborne, J., & Wilson, C. D. (2019). Using automated analysis to assess middle school students’ competence with scientific argumentation. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Toronto.

  29. Helleputte, T. (2017). LiblineaR: linear predictive models based on the Liblinear C/C++ library. R package version 2.10-18.

  30. Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261–266.

  31. Inkinen, J., Klager, C., Juuti, K., Schneider, B., Salmela-Aro, K., Krajcik, J., & Lavonen, J. (2020). High school students’ situational engagement associated with scientific practices in designed science learning situations. Science Education, 104(4), 667–692.

  32. Jiménez-Aleixandre, M. P., Bugallo Rodríguez, A., & Duschl, R. A. (2000). “Doing the lesson” or “doing science”: argument in high school genetics. Science Education, 84(6), 757–792.

  33. Kelly, G. J. (2008). Inquiry, activity and epistemic practice. In R. A. Duschl & R. E. Grandy (Eds.), Teaching Scientific Inquiry (pp. 99–117). https://doi.org/10.1163/9789460911453_009.

  34. Kolodner, J. L. (Ed.). (1993). Case-based learning. Dordrecht: Kluwer Academic Publishers.

  35. Krajcik, J., McNeill, K. L., & Reiser, B. J. (2008). Learning-goals-driven design model: developing curriculum materials that align with national standards and incorporate project-based pedagogy. Science Education, 92(1), 1–32.

  36. Krajcik, J., Reiser, B., Sutherland, L., & Fortus, D. (2011). IQWST: investigating and questioning our world through science and technology (middle school science curriculum materials). Greenwich: Sangari Active Science.

  37. Krist, C. (2020). Examining how classroom communities developed practice-based epistemologies for science through analysis of longitudinal video data. Journal of Educational Psychology, 112(3), 420–443. https://doi.org/10.1037/edu0000417.

  38. Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9(5), 178–181.

  39. Landis, J. R., & Koch, G. G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2), 363–374.

  40. Laverty, J. T., Underwood, S. M., Matz, R. L., Posey, L. A., Carmel, J. H., Caballero, M. D., Fata-Hartley, C. L., Ebert-May, D., Jardeleza, S. E., & Cooper, M. M. (2016). Characterizing college science assessments: the three-dimensional learning assessment protocol. PLoS One, 11(9), e0162333.

  41. NGSS Lead States. (2013). Next Generation Science Standards: for states, by states. Washington, DC: National Academies Press.

  42. Lehrer, R., & Schauble, L. (2006). Cultivating model-based reasoning in science education. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (p. 371–387). Cambridge University Press.

  43. Lehrer, R., & Schauble, L. (2015). Developing scientific thinking. In L. S. Liben & U. Müller (Eds.), Cognitive processes. Handbook of child psychology and developmental science (Vol. 2, 7th ed., pp. 671–714). Hoboken, NJ: Wiley.

  44. Lehrer, R., Schauble, L., & Petrosino, A. J. (2001). Reconsidering the role of experiment in science education. In Designing for science: implications from everyday, classroom, and professional settings (pp. 251–278).

  45. Manz, E. (2012). Understanding the codevelopment of modeling practice and ecological knowledge. Science Education, 96(6), 1071–1105.

  46. Manz, E. (2015). Representing student argumentation as functionally emergent from scientific activity. Review of Educational Research, 85(4), 553–590.

  47. McNeill, K., Lizotte, D. J., Krajcik, J., & Marx, R. W. (2006). Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. The Journal of the Learning Sciences, 15(2), 153–191.

  48. Morell, L., Collier, T., Black, P., & Wilson, M. (2017). A construct-modeling approach to develop a learning progression of how students understand the structure of matter. Journal of Research in Science Teaching, 54(8), 1024–1048.

  49. National Research Council. (2012). A framework for K-12 science education: practices, crosscutting concepts, and core ideas. Washington, DC: National Academies Press.

  50. National Research Council (2014). Developing assessments for the Next Generation Science Standards. Washington, DC: The National Academies Press. https://doi.org/10.17226/18409.

  51. Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183–196.

  52. Nelson, L. K. (2020). Computational grounded theory: a methodological framework. Sociological Methods & Research, 49(1), 3–42. https://doi.org/10.1177/0049124117729703.

  53. Passmore, C., Schwarz, C. V., & Mankowski, J. (2017). Developing and using models. In C. V. Schwarz, C. Passmore, & B. J. Reiser (Eds.), Helping students make sense of the world using next generation science and engineering practices (pp. 109–135). Arlington, VA: NSTA Press.

  54. Pei, B., Xing, W., & Lee, H. S. (2019). Using automatic image processing to analyze visual artifacts created by students in scientific argumentation. British Journal of Educational Technology, 50(6), 3391–3404.

  55. Pellegrino, J. W. (2013). Proficiency in science: assessment challenges and opportunities. Science, 340(6130), 320–323.

  56. Penuel, W. R., Turner, M. L., Jacobs, J. K., Van Horne, K., & Sumner, T. (2019). Developing tasks to assess phenomenon-based science learning: challenges and lessons learned from building proximal transfer tasks. Science Education, 103(6), 1367–1395.

  57. Popper, K. R. (1959). The propensity interpretation of probability. The British Journal for the Philosophy of Science, 10(37), 25–42.

  58. Reiser, B. J., Kim, J., Toyama, Y., & Draney, K. (2016). Multi-year growth in mechanistic reasoning across units in biology, chemistry, and physics. Paper presented at NARST, April 14, 2016.

  59. Rosenberg, J. M., & Lishinski, A. (2018). clustRcompaR: easy interface for clustering a set of documents and exploring group-based patterns [R package]. https://github.com/alishinski/clustRcompaR

  60. Ryu, S., & Sandoval, W. A. (2012). Improvements to elementary children’s epistemic understanding from sustained argumentation. Science Education, 96(3), 488–526.

  61. Saldaña, J. (2016). The coding manual for qualitative researchers. Sage.

  62. Sandoval, W. A. (2005). Understanding students’ practical epistemologies and their influence on learning through inquiry. Science Education, 89(4), 634–656.

  63. Sandoval, W. A., & Millwood, K. A. (2005). The quality of students’ use of evidence in written scientific explanations. Cognition and Instruction, 23(1), 23–55.

  64. Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L. O., Archer, A., Fortus, D., & Krajcik, J. (2009). Developing a learning progression for scientific modeling: making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632–654.

  65. Schwarz, C. V., Passmore, C., & Reiser, B. J. (2017). Helping students make sense of the world using next generation science and engineering practices. Arlington, VA: NSTA Press.

  66. Shaffer, D. W. (2017). Quantitative ethnography. Madison: Cathcart Press.

  67. Sherin, B. (2013). A computational study of commonsense science: an exploration in the automated analysis of clinical interview data. The Journal of the Learning Sciences, 22(4), 600–638.

  68. Shin, N., Stevens, S. Y., & Krajcik, J. (2010). Tracking student learning over time using construct-centred design. In Using Analytical Frameworks for Classroom Research (pp. 56–76). Routledge.

  69. Tabak, I., & Reiser, B.J. (1999). Steering the course of dialogue in inquiry-based science. Paper presented at the Annual Meeting of the American Educational Research Association Montreal, Canada.

  70. Thagard, P. R. (1978). The best explanation: criteria for theory choice. Journal of Philosophy, 75(2), 76–92.

  71. Wiley, J., Hastings, P., Blaum, D., Jaeger, A. J., Hughes, S., Wallace, P., Griffin, T. D., & Britt, M. A. (2017). Different approaches to assessing the quality of explanations following a multiple-document inquiry activity in science. International Journal of Artificial Intelligence in Education, 27(4), 758–790.

  72. Wilson, M. (2004). Constructing measures: An item response modeling approach. London: Routledge.

  73. Zangori, L., Forbes, C. T., & Schwarz, C. V. (2015). Exploring the effect of embedded scaffolding within curricular tasks on third-grade students’ model-based explanations about hydrologic cycling. Science & Education, 24(7–8), 957–981.

  74. Zehner, F., Sälzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement, 76(2), 280–303.

  75. Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020). Applying machine learning in science assessment: a systematic review. Studies in Science Education, 56(1), 111–151.

Author information

Corresponding author

Correspondence to Joshua M. Rosenberg.

Ethics declarations

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (Northwestern University #STU00034615 and Wright State University #FWA00002427) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflict of Interest

The authors declare that they have no conflicts of interest.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

ESM 1

(DOCX 17 kb)

About this article

Cite this article

Rosenberg, J.M., Krist, C. Combining Machine Learning and Qualitative Methods to Elaborate Students’ Ideas About the Generality of their Model-Based Explanations. J Sci Educ Technol 30, 255–267 (2021). https://doi.org/10.1007/s10956-020-09862-4

Keywords

  • Assessment
  • Scientific practices
  • Machine learning
  • Epistemology
  • Middle school
  • Quantitative
  • Grounded theory
  • Generality