Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Journal of Science Education and Technology

Abstract

The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students’ normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
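
As background for the fit statistics cited above, the following is the standard dichotomous Rasch model and the outfit mean square definition. This is included only for orientation; the abstract does not specify the exact parameterization used in the study (for example, whether a polytomous variant was applied to the explanation scores).

% Dichotomous Rasch model: probability that person n, with ability B_n (in logits),
% succeeds on item i, with difficulty D_i (in logits).
P(X_{ni} = 1 \mid B_n, D_i) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}}

% Outfit mean square for item i: the unweighted mean of squared standardized
% residuals over N persons. Values near 1.0 indicate acceptable fit; values above
% roughly 1.3 (as reported for the multiple-choice test) flag noisy, unexpected
% response patterns.
z_{ni} = \frac{x_{ni} - E[X_{ni}]}{\sqrt{\mathrm{Var}(X_{ni})}}, \qquad \mathrm{Outfit}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2}

Likewise, as a rough illustration of how an open-source tool can score written explanations, the sketch below trains a generic bag-of-words text classifier with scikit-learn. It is a minimal stand-in for the kind of supervised model that tools such as LightSIDE build from human-coded responses; the example texts, labels, and model settings are hypothetical and are not the study's data or configuration.

# Illustrative sketch only: generic supervised text scoring (bag-of-words + linear SVM).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical expert-coded training explanations:
# 1 = explanation invokes a key natural-selection concept, 0 = it does not.
train_texts = [
    "Individuals with the trait survived and reproduced more often.",
    "The animals needed longer legs, so they grew them over time.",
]
train_labels = [1, 0]

# Unigram and bigram counts feed a linear support vector classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_labels)

# Score a new, unscored written explanation.
print(model.predict(["Variation existed and better-suited individuals reproduced more."]))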

References

  • Abu-Mostafa YS (2012) Machines that think for themselves. Sci Am 307(1):78–81

  • American Association for the Advancement of Science (2011) Vision and change in undergraduate biology education. AAAS, Washington

  • Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory of natural selection. J Res Sci Teach 39(10):952–978

  • Battisti BT, Hanegan N, Sudweeks R, Cates R (2010) Using item response theory to conduct a distracter analysis on conceptual inventory of natural selection. Int J Sci Math Educ 8:845–868

  • Beggrow EP, Nehm RH (2012) Students’ mental models of evolutionary causation: natural selection and genetic drift. Evolut Educ Outreach 5(3):429–444

  • Berland LK, McNeill KL (2012) For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Sci Educ 96(5):808–813

  • Bishop BA, Anderson CW (1990) Student conceptions of natural selection and its role in evolution. J Res Sci Teach 27(5):415–427

  • Black TR (1999) Doing quantitative research in the social sciences. Sage Publications, London

  • Bond TG, Fox CM (2001) Applying the Rasch model: fundamental measurement in the human sciences. Lawrence Erlbaum Associates, Mahwah

  • Boone WJ, Scantlebury K (2006) The role of Rasch analysis when conducting science education research utilizing multiple-choice tests. Sci Educ 90(2):253–269

  • Braaten M, Windschitl M (2011) Working toward a stronger conceptualization of scientific explanation for science education. Sci Educ 95(4):639–669

  • Briggs DC, Alonzo AC, Schwab C, Wilson M (2006) Diagnostic assessment with ordered multiple-choice items. Educ Assess 11(1):33–63

  • Chi MTH, Bassok M, Lewis MW, Reimann P, Glaser R (1989) Self-explanations: how students study and use examples in learning to solve problems. Cogn Sci 13:145–182

  • Deadman JA, Kelly PJ (1978) What do secondary school boys understand about evolution and heredity before they are taught the topics? J Biol Educ 12(1):7–15

  • Ginsburg H (1981) The clinical interview in psychological research on mathematical thinking: aims, rationales, techniques. Learn Math 1(3):4–11

  • Gobert JD, Sao Pedro MA, Baker RSJD, Toto E, Montalvo O (2012) Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. J Educ Data Min 4(1):153–185

  • Graesser AC, McNamara DS (2012) Automated analysis of essays and open-ended verbal responses. In: Cooper H, Panter AT (eds) APA handbook of research methods in psychology. American Psychological Association, Washington

  • Ha M, Nehm RH (2012) Using machine-learning methods to detect key concepts and misconceptions of evolution in students’ written explanations. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, Indiana

  • Ha M, Nehm RH, Urban-Lurain M, Merrill JE (2011) Applying computerized scoring models of written biological explanations across courses and colleges: prospects and limitations. CBE Life Sci Educ 10:379–393

  • Haudek KC, Kaplan JJ, Knight J, Long T, Merrill J, Munn A, Nehm RH, Smith M, Urban-Lurain M (2011) Harnessing technology to improve formative assessment of student conceptions in STEM: forging a national network. CBE Life Sci Educ 10(2):149–155

  • Joughin G (1998) Dimensions of oral assessment. Assess Eval High Educ 23(4):367–378

  • Leacock C, Chodorow M (2003) C-rater: automated scoring of short-answer questions. Comput Humanit 37(4):389–405

  • Linacre JM (2006) WINSTEPS Rasch measurement software [Computer program]. WINSTEPS, Chicago

  • Lombrozo T (2006) The structure and function of explanations. Trends Cogn Sci 10:464–470

  • Lombrozo T (2012) Explanation and abductive inference. In: Holyoak KJ, Morrison RG (eds) Oxford handbook of thinking and reasoning. Oxford University Press, Oxford, pp 260–276

  • Magliano JP, Graesser AC (2012) Computer-based assessment of student-constructed responses. Behav Res Methods 44:608–621

  • Mayfield E, Rosé C (2012) LightSIDE: text mining and machine learning user’s manual. Carnegie Mellon University, Pittsburgh

  • Mayfield E, Rosé C (2013) LightSIDE: open source machine learning for text. In: Shermis MD, Burstein J (eds) Handbook of automated essay evaluation. Routledge, New York

  • McNeill KL, Lizotte DJ, Krajcik J, Marx RW (2006) Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. J Learn Sci 15(2):153–191

  • Moscarella RA, Urban-Lurain M, Merritt B, Long T, Richmond G, Merrill J, Parker J, Patterson R, Wilson C (2008) Understanding undergraduate students’ conceptions in science: using lexical analysis software to analyze students’ constructed responses in biology. Proceedings of the National Association for Research in Science Teaching (NARST) annual conference, Baltimore, MD

  • National Research Council (1996) National science education standards. The National Academies Press, Washington, DC

  • National Research Council (2001a) Investigating the influence of standards: a framework for research in mathematics, science, and technology education. The National Academies Press, Washington, DC

  • National Research Council (2001b) Knowing what students know. The National Academies Press, Washington, DC

  • National Research Council (2007) Taking science to school. The National Academies Press, Washington, DC

  • National Research Council (2012) A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC

  • Nehm RH, Ha M (2011) Item feature effects in evolution assessment. J Res Sci Teach 48(3):237–256

  • Nehm RH, Haertig H (2012) Human vs. computer diagnosis of students’ natural selection knowledge: testing the efficacy of text analytic software. J Sci Educ Technol 21(1):56–73

  • Nehm RH, Reilly L (2007) Biology majors’ knowledge and misconceptions of natural selection. Bioscience 57(3):263–272

  • Nehm RH, Ridgway J (2011) What do experts and novices “see” in evolutionary problems? Evol Educ Outreach 4(4):666–679

  • Nehm RH, Schonfeld IS (2008) Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45(10):1131–1160

  • Nehm RH, Kim SY, Sheppard K (2009) Academic preparation in biology and advocacy for teaching evolution: biology versus non-biology teachers. Sci Educ 93(6):1122–1146

  • Nehm RH, Ha M, Rector M, Opfer JE, Perrin L, Ridgway J, Mollohan K (2010) Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (ACORNS). Technical Report of National Science Foundation REESE Project 0909999

  • Nehm RH, Beggrow EP, Opfer JE, Ha M (2012a) Reasoning about natural selection: diagnosing contextual competency using the ACORNS instrument. Am Biol Teach 74(2):92–98

  • Nehm RH, Ha M, Mayfield E (2012b) Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. J Sci Educ Technol 21(1):183–196

  • Neumann I, Neumann K, Nehm R (2011) Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. Int J Sci Educ 33(10):1373–1405

  • Opfer JE, Nehm RH, Ha M (2012) Cognitive foundations for science assessment design: knowing what students know about evolution. J Res Sci Teach 49(6):744–777

  • Osborne JF, Patterson A (2011) Scientific argument and explanation: a necessary distinction? Sci Educ 95(4):627–638

  • Page EB (1966) The imminence of grading essays by computers. Phi Delta Kappan 47:238–243

  • Platt J (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208

  • Rector MA, Nehm RH, Pearl D (2012) Item sequencing effects on the measurement of students’ biological knowledge. Paper in the proceedings of the National Association for Research in Science Teaching, Indianapolis, IN, 25–28 March 2012

  • Rector MA, Nehm RH, Pearl D (2013) Learning the language of evolution: lexical ambiguity and word meaning in student explanations. Res Sci Educ 43(3):1107–1133

  • Roediger HL III, Marsh EJ (2005) The positive and negative consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn 31(5):1155

  • Russ RS, Scherr RE, Hammer D, Mikeska J (2008) Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science. Sci Educ 92(3):499–525

  • Russ RS, Lee VR, Sherin BL (2012) Framing in cognitive clinical interviews about intuitive science knowledge: dynamic student understandings of the discourse interaction. Sci Educ 96(4):573–599

  • Sandoval WA, Millwood KA (2005) The quality of students’ use of evidence in written scientific explanations. Cogn Instr 23(1):23–55

  • Seddon GM, Pedrosa MA (1988) A comparison of students’ explanations derived from spoken and written methods of questioning and answering. Int J Sci Educ 10(3):337–342

  • Shermis MD, Burstein J (2003) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Inc., Mahwah

  • Songer NB, Gotwals AW (2012) Guiding explanation construction by children at the entry points of learning progressions. J Res Sci Teach 49(2):141–165

  • Songer NB, Kelcey B, Gotwals AW (2009) How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. J Res Sci Teach 46(6):610–631

  • Vosniadou S, Vamvakoussi X, Skopeliti I (2008) The framework theory approach to the problem of conceptual change. In: Vosniadou S (ed) International handbook of research on conceptual change. Routledge, New York, pp 3–34

  • Woloshyn V, Gallagher T (2009, December 23) Self-explanation. Retrieved from http://www.education.com/reference/article/self-explanation/

  • Yang Y, Buckendahl CW, Juszkiewicz PJ, Bhola DS (2002) A review of strategies for validating computer automated scoring. Appl Meas Educ 15(4):391–412

Acknowledgments

We thank John Harder, Ian Hamilton, the Ohio State University’s Center for Life Science Education and Department of Anthropology’s Graduate Teaching Associate program for assistance with data collection, the National Science Foundation REESE program (DRL 0909999) for funding portions of this study, and Meghan Rector for helpful reviews of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the National Science Foundation.

Author information

Corresponding author

Correspondence to Elizabeth P. Beggrow.

About this article

Cite this article

Beggrow, E.P., Ha, M., Nehm, R.H. et al. Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance? J Sci Educ Technol 23, 160–182 (2014). https://doi.org/10.1007/s10956-013-9461-9
