Journal of Science Education and Technology

Volume 23, Issue 1, pp 160–182

Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

  • Elizabeth P. Beggrow
  • Minsu Ha
  • Ross H. Nehm
  • Dennis Pearl
  • William J. Boone


The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices, such as explanation, argumentation, and communication, in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools capable of evaluating these practices validly and efficiently. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change, relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean-square outfit > 1.3), while both the oral interview measures and the computer-generated written-response measures exhibited acceptable fit (average mean-square outfit for the interview: person 0.97, item 0.97; for the computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63).
Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) capture students’ normative scientific and naive ideas as accurately as human-scored explanations; and (3) detect understanding more validly than the multiple-choice assessment. These findings demonstrate the considerable potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
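The Rasch fit statistics mentioned in the abstract can be made concrete with a short sketch. It assumes the simplest, dichotomous form of the Rasch model, in which a person with measure θ answers an item with measure b correctly with probability P = exp(θ − b) / (1 + exp(θ − b)), and it assumes the person and item measures have already been estimated (for example with dedicated Rasch software such as WINSTEPS, listed in the references). Outfit mean square is then the unweighted mean of the squared standardized residuals (x − P)² / (P(1 − P)); values near 1 match model expectations, and values above 1.3 are treated as poor fit, as in the abstract. The function names and data below are hypothetical illustrations, not the authors' analysis, and the study's instruments may have used polytomous rather than 0/1 scoring.

```python
import math
from statistics import mean

# Hypothetical helper names; a sketch of the dichotomous Rasch model only.


def rasch_probability(theta: float, b: float) -> float:
    """Model probability of a correct (1) response, given person measure
    theta and item measure b, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def squared_residual(x: int, theta: float, b: float) -> float:
    """Squared standardized residual z^2 = (x - P)^2 / (P * (1 - P))."""
    p = rasch_probability(theta, b)
    return (x - p) ** 2 / (p * (1.0 - p))


def item_outfit(responses, thetas, b):
    """Outfit mean square for one item: unweighted mean of z^2 over persons."""
    return mean(squared_residual(x, t, b) for x, t in zip(responses, thetas))


def person_outfit(responses, theta, bs):
    """Outfit mean square for one person: unweighted mean of z^2 over items."""
    return mean(squared_residual(x, theta, b) for x, b in zip(responses, bs))


def misfits(msq: float, threshold: float = 1.3) -> bool:
    """Flag noisy (underfitting) response strings with the > 1.3 cutoff."""
    return msq > threshold


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between two sets of person measures."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented data: four persons answering one item with measure b = 0.5.
    thetas = [-1.0, 0.0, 1.0, 2.0]
    expected_pattern = [0, 0, 1, 1]    # low-measure persons miss, high-measure persons succeed
    surprising_pattern = [1, 1, 0, 0]  # the reverse, which the model does not expect
    for label, pattern in [("expected", expected_pattern), ("surprising", surprising_pattern)]:
        msq = item_outfit(pattern, thetas, 0.5)
        print(label, round(msq, 2), "misfit" if misfits(msq) else "below cutoff")
```

Run as a script, this prints an outfit of 0.41 for the expected pattern and 3.07 for the reversed one, so only the reversed pattern exceeds the 1.3 cutoff. Outfit values well below 1 signal responses that are more predictable than the model expects (overfit), which a one-sided > 1.3 rule does not flag.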


Keywords: Applications in subject areas; Evaluation methodologies; Improving classroom teaching; Pedagogical issues; Teaching/learning strategies



We thank John Harder, Ian Hamilton, the Ohio State University’s Center for Life Science Education and Department of Anthropology’s Graduate Teaching Associate program for assistance with data collection, the National Science Foundation REESE program (DRL 0909999) for funding portions of this study, and Meghan Rector for helpful reviews of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the National Science Foundation.


  1. Abu-Mostafa YS (2012) Machines that think for themselves. Sci Am 307(1):78–81
  2. American Association for the Advancement of Science (2011) Vision and change in undergraduate biology education. AAAS, Washington, DC
  3. Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory of natural selection. J Res Sci Teach 39(10):952–978
  4. Battisti BT, Hanegan N, Sudweeks R, Cates R (2010) Using item response theory to conduct a distracter analysis on conceptual inventory of natural selection. Int J Sci Math Educ 8:845–868
  5. Beggrow EP, Nehm RH (2012) Students’ mental models of evolutionary causation: natural selection and genetic drift. Evol Educ Outreach 5(3):429–444
  6. Berland LK, McNeill KL (2012) For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Sci Educ 96(5):808–813
  7. Bishop BA, Anderson CW (1990) Student conceptions of natural selection and its role in evolution. J Res Sci Teach 27(5):415–427
  8. Black TR (1999) Doing quantitative research in the social sciences. Sage Publications, London
  9. Bond TG, Fox CM (2001) Applying the Rasch model: fundamental measurement in the human sciences. Lawrence Erlbaum Associates, Mahwah
  10. Boone WJ, Scantlebury K (2006) The role of Rasch analysis when conducting science education research utilizing multiple-choice tests. Sci Educ 90(2):253–269
  11. Braaten M, Windschitl M (2011) Working toward a stronger conceptualization of scientific explanation for science education. Sci Educ 95(4):639–669
  12. Briggs DC, Alonzo AC, Schwab C, Wilson M (2006) Diagnostic assessment with ordered multiple-choice items. Educ Assess 11(1):33–63
  13. Chi MTH, Bassok M, Lewis MW, Reimann P, Glaser R (1989) Self-explanations: how students study and use examples in learning to solve problems. Cogn Sci 13:145–182
  14. Deadman JA, Kelly PJ (1978) What do secondary school boys understand about evolution and heredity before they are taught the topics? J Biol Educ 12(1):7–15
  15. Ginsburg H (1981) The clinical interview in psychological research on mathematical thinking: aims, rationales, techniques. Learn Math 1(3):4–11
  16. Gobert JD, Sao Pedro MA, Baker RSJD, Toto E, Montalvo O (2012) Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. J Educ Data Min 4(1):153–185
  17. Graesser AC, McNamara DS (2012) Automated analysis of essays and open-ended verbal responses. In: Cooper H, Panter AT (eds) APA handbook of research methods in psychology. American Psychological Association, Washington, DC
  18. Ha M, Nehm RH (2012) Using machine-learning methods to detect key concepts and misconceptions of evolution in students’ written explanations. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, IN
  19. Ha M, Nehm RH, Urban-Lurain M, Merrill JE (2011) Applying computerized scoring models of written biological explanations across courses and colleges: prospects and limitations. CBE Life Sci Educ 10:379–393
  20. Haudek KC, Kaplan JJ, Knight J, Long T, Merrill J, Munn A, Nehm RH, Smith M, Urban-Lurain M (2011) Harnessing technology to improve formative assessment of student conceptions in STEM: forging a national network. CBE Life Sci Educ 10(2):149–155
  21. Joughin G (1998) Dimensions of oral assessment. Assess Eval High Educ 23(4):367–378
  22. Leacock C, Chodorow M (2003) C-rater: automated scoring of short-answer questions. Comput Humanit 37(4):389–405
  23. Linacre JM (2006) WINSTEPS Rasch measurement software [Computer program]. WINSTEPS, Chicago
  24. Lombrozo T (2006) The structure and function of explanations. Trends Cogn Sci 10:464–470
  25. Lombrozo T (2012) Explanation and abductive inference. In: Holyoak KJ, Morrison RG (eds) Oxford handbook of thinking and reasoning. Oxford University Press, Oxford, pp 260–276
  26. Magliano JP, Graesser AC (2012) Computer-based assessment of student-constructed responses. Behav Res Methods 44:608–621
  27. Mayfield E, Rosé C (2012) LightSIDE: text mining and machine learning user’s manual. Carnegie Mellon University, Pittsburgh
  28. Mayfield E, Rosé C (2013) LightSIDE: open source machine learning for text. In: Shermis MD, Burstein J (eds) Handbook of automated essay evaluation. Routledge, New York
  29. McNeill KL, Lizotte DJ, Krajcik J, Marx RW (2006) Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. J Learn Sci 15(2):153–191
  30. Moscarella RA, Urban-Lurain M, Merritt B, Long T, Richmond G, Merrill J, Parker J, Patterson R, Wilson C (2008) Understanding undergraduate students’ conceptions in science: using lexical analysis software to analyze students’ constructed responses in biology. Proceedings of the National Association for Research in Science Teaching (NARST) annual conference, Baltimore, MD
  31. National Research Council (1996) National science education standards. The National Academies Press, Washington, DC
  32. National Research Council (2001a) Investigating the influence of standards: a framework for research in mathematics, science, and technology education. The National Academies Press, Washington, DC
  33. National Research Council (2001b) Knowing what students know. The National Academies Press, Washington, DC
  34. National Research Council (2007) Taking science to school. The National Academies Press, Washington, DC
  35. National Research Council (2012) A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC
  36. Nehm RH, Ha M (2011) Item feature effects in evolution assessment. J Res Sci Teach 48(3):237–256
  37. Nehm RH, Haertig H (2012) Human vs. computer diagnosis of students’ natural selection knowledge: testing the efficacy of text analytic software. J Sci Educ Technol 21(1):56–73
  38. Nehm RH, Reilly L (2007) Biology majors’ knowledge and misconceptions of natural selection. Bioscience 57(3):263–272
  39. Nehm RH, Ridgway J (2011) What do experts and novices “see” in evolutionary problems? Evol Educ Outreach 4(4):666–679
  40. Nehm RH, Schonfeld IS (2008) Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45(10):1131–1160
  41. Nehm RH, Kim SY, Sheppard K (2009) Academic preparation in biology and advocacy for teaching evolution: biology versus non-biology teachers. Sci Educ 93(6):1122–1146
  42. Nehm RH, Ha M, Rector M, Opfer JE, Perrin L, Ridgway J, Mollohan K (2010) Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (ACORNS). Technical Report of National Science Foundation REESE Project 0909999
  43. Nehm RH, Beggrow EP, Opfer JE, Ha M (2012a) Reasoning about natural selection: diagnosing contextual competency using the ACORNS instrument. Am Biol Teach 74(2):92–98
  44. Nehm RH, Ha M, Mayfield E (2012b) Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. J Sci Educ Technol 21(1):183–196
  45. Neumann I, Neumann K, Nehm R (2011) Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. Int J Sci Educ 33(10):1373–1405
  46. Opfer JE, Nehm RH, Ha M (2012) Cognitive foundations for science assessment design: knowing what students know about evolution. J Res Sci Teach 49(6):744–777
  47. Osborne JF, Patterson A (2011) Scientific argument and explanation: a necessary distinction? Sci Educ 95(4):627–638
  48. Page EB (1966) The imminence of grading essays by computers. Phi Delta Kappan 47:238–243
  49. Platt J (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208
  50. Rector MA, Nehm RH, Pearl D (2012) Item sequencing effects on the measurement of students’ biological knowledge. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, IN, 25–28 March 2012
  51. Rector MA, Nehm RH, Pearl D (2013) Learning the language of evolution: lexical ambiguity and word meaning in student explanations. Res Sci Educ 43(3):1107–1133
  52. Roediger HL III, Marsh EJ (2005) The positive and negative consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn 31(5):1155
  53. Russ RS, Scherr RE, Hammer D, Mikeska J (2008) Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science. Sci Educ 92(3):499–525
  54. Russ RS, Lee VR, Sherin BL (2012) Framing in cognitive clinical interviews about intuitive science knowledge: dynamic student understandings of the discourse interaction. Sci Educ 96(4):573–599
  55. Sandoval WA, Millwood KA (2005) The quality of students’ use of evidence in written scientific explanations. Cogn Instr 23(1):23–55
  56. Seddon GM, Pedrosa MA (1988) A comparison of students’ explanations derived from spoken and written methods of questioning and answering. Int J Sci Educ 10(3):337–342
  57. Shermis MD, Burstein J (2003) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Mahwah
  58. Songer NB, Gotwals AW (2012) Guiding explanation construction by children at the entry points of learning progressions. J Res Sci Teach 49(2):141–165
  59. Songer NB, Kelcey B, Gotwals AW (2009) How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. J Res Sci Teach 46(6):610–631
  60. Vosniadou S, Vamvakoussi X, Skopeliti I (2008) The framework theory approach to the problem of conceptual change. In: Vosniadou S (ed) International handbook of research on conceptual change. Routledge, New York, pp 3–34
  61. Woloshyn V, Gallagher T (2009, December 23) Self-explanation. Retrieved from
  62. Yang Y, Buckendahl CW, Juszkiewicz PJ, Bhola DS (2002) A review of strategies for validating computer automated scoring. Appl Meas Educ 15(4):391–412

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Elizabeth P. Beggrow (1)
  • Minsu Ha (1)
  • Ross H. Nehm (4)
  • Dennis Pearl (2)
  • William J. Boone (3)
  1. Department of Teaching and Learning, The Ohio State University, Columbus, USA
  2. Department of Statistics, The Ohio State University, Columbus, USA
  3. Department of Educational Psychology, Miami University, Oxford, USA
  4. Center for Science and Mathematics Education, Department of Ecology and Evolution, Stony Brook University, Stony Brook, USA
