Abstract
The landscape of science education is being transformed by the new Framework for Science Education (National Research Council 2012), which emphasizes the centrality of scientific practices (such as explanation, argumentation, and communication) in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63).
Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students’ normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
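As a rough illustration of the fit statistics reported above, the following is a minimal sketch of how mean square outfit can be computed once person and item measures are available on a logit scale. It assumes the dichotomous Rasch model and already-estimated measures; the function names and the toy data are hypothetical, and this is not the authors' analysis pipeline.

```python
import math


def rasch_probability(theta, delta):
    """P(correct) for person measure theta and item measure delta (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def outfit_mean_squares(responses, thetas, deltas):
    """Return (person_outfit, item_outfit) for a persons x items matrix of 0/1 scores.

    Outfit is the unweighted mean of squared standardized residuals,
    z^2 = (x - p)^2 / (p * (1 - p)), taken across items (for a person)
    or across persons (for an item).
    """
    n_persons, n_items = len(thetas), len(deltas)
    z2 = []
    for n in range(n_persons):
        row = []
        for i in range(n_items):
            p = rasch_probability(thetas[n], deltas[i])
            row.append((responses[n][i] - p) ** 2 / (p * (1.0 - p)))
        z2.append(row)
    person_outfit = [sum(row) / n_items for row in z2]
    item_outfit = [
        sum(z2[n][i] for n in range(n_persons)) / n_persons
        for i in range(n_items)
    ]
    return person_outfit, item_outfit


# Hypothetical toy data: 3 persons x 3 items, measures assumed already estimated.
responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
thetas = [0.5, -0.5, 1.5]
deltas = [-1.0, 0.0, 1.0]
person_outfit, item_outfit = outfit_mean_squares(responses, thetas, deltas)
```

Under the model, outfit has an expected value of 1.0; values well above it, such as the >1.3 reported for the multiple-choice test, indicate responses that are noisier than the model predicts.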
References
Abu-Mostafa YS (2012) Machines that think for themselves. Sci Am 307(1):78–81
American Association for the Advancement of Science (2011) Vision and change in undergraduate biology education. AAAS, Washington
Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory of natural selection. J Res Sci Teach 39(10):952–978
Battisti BT, Hanegan N, Sudweeks R, Cates R (2010) Using item response theory to conduct a distracter analysis on conceptual inventory of natural selection. Int J Sci Math Educ 8:845–868
Beggrow EP, Nehm RH (2012) Students’ mental models of evolutionary causation: natural selection and genetic drift. Evol Educ Outreach 5(3):429–444
Berland LK, McNeill KL (2012) For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Sci Educ 96(5):808–813
Bishop BA, Anderson CW (1990) Student conceptions of natural selection and its role in evolution. J Res Sci Teach 27(5):415–427
Black TR (1999) Doing quantitative research in the social sciences. Sage Publications, London
Bond TG, Fox CM (2001) Applying the Rasch model: fundamental measurement in the human sciences. Lawrence Erlbaum Associates, Mahwah
Boone WJ, Scantlebury K (2006) The role of Rasch analysis when conducting science education research utilizing multiple-choice tests. Sci Educ 90(2):253–269
Braaten M, Windschitl M (2011) Working toward a stronger conceptualization of scientific explanation for science education. Sci Educ 95(4):639–669
Briggs DC, Alonzo AC, Schwab C, Wilson M (2006) Diagnostic assessment with ordered multiple-choice items. Educ Assess 11(1):33–63
Chi MTH, Bassok M, Lewis MW, Reimann P, Glaser R (1989) Self-explanations: how students study and use examples in learning to solve problems. Cogn Sci 13:145–182
Deadman JA, Kelly PJ (1978) What do secondary school boys understand about evolution and heredity before they are taught the topics? J Biol Educ 12(1):7–15
Ginsburg H (1981) The clinical interview in psychological research on mathematical thinking: aims, rationales, techniques. Learn Math 1(3):4–11
Gobert JD, Sao Pedro MA, Baker RSJD, Toto E, Montalvo O (2012) Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. J Educ Data Min 4(1):153–185
Graesser AC, McNamara DS (2012) Automated analysis of essays and open-ended verbal responses. In: Cooper H, Panter AT (eds) APA handbook of research methods in psychology. American Psychological Association, Washington
Ha M, Nehm RH (2012) Using machine-learning methods to detect key concepts and misconceptions of evolution in students’ written explanations. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, Indiana
Ha M, Nehm RH, Urban-Lurain M, Merrill JE (2011) Applying computerized scoring models of written biological explanations across courses and colleges: prospects and limitations. CBE Life Sci Educ 10:379–393
Haudek KC, Kaplan JJ, Knight J, Long T, Merrill J, Munn A, Nehm RH, Smith M, Urban-Lurain M (2011) Harnessing technology to improve formative assessment of student conceptions in STEM: forging a national network. CBE Life Sci Educ 10(2):149–155
Joughin G (1998) Dimensions of oral assessment. Assess Eval High Educ 23(4):367–378
Leacock C, Chodorow M (2003) C-rater: automated scoring of short-answer questions. Comput Humanit 37(4):389–405
Linacre JM (2006) WINSTEPS Rasch measurement software [Computer program]. WINSTEPS, Chicago
Lombrozo T (2006) The structure and function of explanations. Trends Cogn Sci 10:464–470
Lombrozo T (2012) Explanation and abductive inference. In: Holyoak KJ, Morrison RG (eds) Oxford handbook of thinking and reasoning. Oxford University Press, Oxford, pp 260–276
Magliano JP, Graesser AC (2012) Computer-based assessment of student-constructed responses. Behav Res Methods 44:608–621
Mayfield E, Rosé C (2012) LightSIDE: text mining and machine learning user’s manual. Carnegie Mellon University, Pittsburgh
Mayfield E, Rosé C (2013) LightSIDE: open source machine learning for text. In: Shermis MD, Burstein J (eds) Handbook of automated essay evaluation. Routledge, New York
McNeill KL, Lizotte DJ, Krajcik J, Marx RW (2006) Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. J Learn Sci 15(2):153–191
Moscarella RA, Urban-Lurain M, Merritt B, Long T, Richmond G, Merrill J, Parker J, Patterson R, Wilson C (2008) Understanding undergraduate students’ conceptions in science: using lexical analysis software to analyze students’ constructed responses in biology. Proceedings of the National Association for Research in Science Teaching (NARST) annual conference, Baltimore, MD
National Research Council (1996) National science education standards. The National Academies Press, Washington, DC
National Research Council (2001a) Investigating the influence of standards: a framework for research in mathematics, science, and technology education. The National Academies Press, Washington, DC
National Research Council (2001b) Knowing what students know. The National Academies Press, Washington, DC
National Research Council (2007) Taking science to school. The National Academies Press, Washington, DC
National Research Council (2012) A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC
Nehm RH, Ha M (2011) Item feature effects in evolution assessment. J Res Sci Teach 48(3):237–256
Nehm RH, Haertig H (2012) Human vs. computer diagnosis of students’ natural selection knowledge: testing the efficacy of text analytic software. J Sci Educ Technol 21(1):56–73
Nehm RH, Reilly L (2007) Biology majors’ knowledge and misconceptions of natural selection. Bioscience 57(3):263–272
Nehm RH, Ridgway J (2011) What do experts and novices “see” in evolutionary problems? Evol Educ Outreach 4(4):666–679
Nehm RH, Schonfeld IS (2008) Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45(10):1131–1160
Nehm RH, Kim SY, Sheppard K (2009) Academic preparation in biology and advocacy for teaching evolution: biology versus non-biology teachers. Sci Educ 93(6):1122–1146
Nehm RH, Ha M, Rector M, Opfer JE, Perrin L, Ridgway J, Mollohan K (2010) Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (ACORNS). Technical Report of National Science Foundation REESE Project 0909999
Nehm RH, Beggrow EP, Opfer JE, Ha M (2012a) Reasoning about natural selection: diagnosing contextual competency using the ACORNS instrument. Am Biol Teach 74(2):92–98
Nehm RH, Ha M, Mayfield E (2012b) Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. J Sci Educ Technol 21(1):183–196
Neumann I, Neumann K, Nehm R (2011) Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. Int J Sci Educ 33(10):1373–1405
Opfer JE, Nehm RH, Ha M (2012) Cognitive foundations for science assessment design: knowing what students know about evolution. J Res Sci Teach 49(6):744–777
Osborne JF, Patterson A (2011) Scientific argument and explanation: a necessary distinction? Sci Educ 95(4):627–638
Page EB (1966) The imminence of grading essays by computers. Phi Delta Kappan 47:238–243
Platt J (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208
Rector MA, Nehm RH, Pearl D (2012) Item sequencing effects on the measurement of students’ biological knowledge. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, IN, 25–28 March 2012
Rector MA, Nehm RH, Pearl D (2013) Learning the language of evolution: lexical ambiguity and word meaning in student explanations. Res Sci Educ 43(3):1107–1133
Roediger HL III, Marsh EJ (2005) The positive and negative consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn 31(5):1155
Russ RS, Scherr RE, Hammer D, Mikeska J (2008) Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science. Sci Educ 92(3):499–525
Russ RS, Lee VR, Sherin BL (2012) Framing in cognitive clinical interviews about intuitive science knowledge: dynamic student understandings of the discourse interaction. Sci Educ 96(4):573–599
Sandoval WA, Millwood KA (2005) The quality of students’ use of evidence in written scientific explanations. Cogn Instr 23(1):23–55
Seddon GM, Pedrosa MA (1988) A comparison of students’ explanations derived from spoken and written methods of questioning and answering. Int J Sci Educ 10(3):337–342
Shermis MD, Burstein J (2003) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Inc., Mahwah
Songer NB, Gotwals AW (2012) Guiding explanation construction by children at the entry points of learning progressions. J Res Sci Teach 49(2):141–165
Songer NB, Kelcey B, Gotwals AW (2009) How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. J Res Sci Teach 46(6):610–631
Vosniadou S, Vamvakoussi X, Skopeliti I (2008) The framework theory approach to the problem of conceptual change. In: Vosniadou S (ed) International handbook of research on conceptual change. Routledge, New York, pp 3–34
Woloshyn V, Gallagher T (2009, December 23) Self-explanation. Retrieved from http://www.education.com/reference/article/self-explanation/
Yang Y, Buckendahl CW, Juszkiewicz PJ, Bhola DS (2002) A review of strategies for validating computer automated scoring. Appl Meas Educ 15(4):391–412
Acknowledgments
We thank John Harder, Ian Hamilton, the Ohio State University’s Center for Life Science Education and Department of Anthropology’s Graduate Teaching Associate program for assistance with data collection, the National Science Foundation REESE program (DRL 0909999) for funding portions of this study, and Meghan Rector for helpful reviews of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the National Science Foundation.
About this article
Cite this article
Beggrow, E.P., Ha, M., Nehm, R.H. et al. Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance? J Sci Educ Technol 23, 160–182 (2014). https://doi.org/10.1007/s10956-013-9461-9
DOI: https://doi.org/10.1007/s10956-013-9461-9