Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Journal of Science Education and Technology

Abstract

The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students’ normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
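
As background for the fit statistics cited above, the following is the standard dichotomous Rasch model and the outfit mean square definition. This is included only for orientation; the abstract does not specify the exact parameterization used in the study (for example, whether a polytomous variant was applied to the explanation scores).

% Dichotomous Rasch model: probability that person n, with ability B_n (in logits),
% succeeds on item i, with difficulty D_i (in logits).
P(X_{ni} = 1 \mid B_n, D_i) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}}

% Outfit mean square for item i: the unweighted mean of squared standardized
% residuals over N persons. Values near 1.0 indicate acceptable fit; values above
% roughly 1.3 (as reported for the multiple-choice test) flag noisy, unexpected
% response patterns.
z_{ni} = \frac{x_{ni} - E[X_{ni}]}{\sqrt{\mathrm{Var}(X_{ni})}}, \qquad \mathrm{Outfit}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^{2}

Likewise, as a rough illustration of how an open-source tool can score written explanations, the sketch below trains a generic bag-of-words text classifier with scikit-learn. It is a minimal stand-in for the kind of supervised model that tools such as LightSIDE build from human-coded responses; the example texts, labels, and model settings are hypothetical and are not the study's data or configuration.

# Illustrative sketch only: generic supervised text scoring (bag-of-words + linear SVM).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical expert-coded training explanations:
# 1 = explanation invokes a key natural-selection concept, 0 = it does not.
train_texts = [
    "Individuals with the trait survived and reproduced more often.",
    "The animals needed longer legs, so they grew them over time.",
]
train_labels = [1, 0]

# Unigram and bigram counts feed a linear support vector classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_labels)

# Score a new, unscored written explanation.
print(model.predict(["Variation existed and better-suited individuals reproduced more."]))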

References

  • Abu-Mostafa YS (2012) Machines that think for themselves. Sci Am 307(1):78–81

  • American Association for the Advancement of Science (2011) Vision and change in undergraduate biology education. AAAS, Washington

  • Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory of natural selection. J Res Sci Teach 39(10):952–978

  • Battisti BT, Hanegan N, Sudweeks R, Cates R (2010) Using item response theory to conduct a distracter analysis on conceptual inventory of natural selection. Int J Sci Math Educ 8:845–868

  • Beggrow EP, Nehm RH (2012) Students’ mental models of evolutionary causation: natural selection and genetic drift. Evolut Educ Outreach 5(3):429–444

  • Berland LK, McNeill KL (2012) For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Sci Educ 96(5):808–813

  • Bishop BA, Anderson CW (1990) Student conceptions of natural selection and its role in evolution. J Res Sci Teach 27(5):415–427

  • Black TR (1999) Doing quantitative research in the social sciences. Sage Publications, London

  • Bond TG, Fox CM (2001) Applying the Rasch model: fundamental measurement in the human sciences. Lawrence Erlbaum Associates, Mahwah

  • Boone WJ, Scantlebury K (2006) The role of Rasch analysis when conducting science education research utilizing multiple-choice tests. Sci Educ 90(2):253–269

  • Braaten M, Windschitl M (2011) Working toward a stronger conceptualization of scientific explanation for science education. Sci Educ 95(4):639–669

  • Briggs DC, Alonzo AC, Schwab C, Wilson M (2006) Diagnostic assessment with ordered multiple-choice items. Educ Assess 11(1):33–63

  • Chi MTH, Bassok M, Lewis MW, Reimann P, Glaser R (1989) Self-explanations: how students study and use examples in learning to solve problems. Cogn Sci 13:145–182

  • Deadman JA, Kelly PJ (1978) What do secondary school boys understand about evolution and heredity before they are taught the topics? J Biol Educ 12(1):7–15

  • Ginsburg H (1981) The clinical interview in psychological research on mathematical thinking: aims, rationales, techniques. Learn Math 1(3):4–11

  • Gobert JD, Sao Pedro MA, Baker RSJD, Toto E, Montalvo O (2012) Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. J Educ Data Min 4(1):153–185

  • Graesser AC, McNamara DS (2012) Automated analysis of essays and open-ended verbal responses. In: Cooper H, Panter AT (eds) APA handbook of research methods in psychology. American Psychological Association, Washington

  • Ha M, Nehm RH (2012) Using machine-learning methods to detect key concepts and misconceptions of evolution in students’ written explanations. Paper in proceedings of the National Association for Research in Science Teaching, Indianapolis, Indiana

  • Ha M, Nehm RH, Urban-Lurain M, Merrill JE (2011) Applying computerized scoring models of written biological explanations across courses and colleges: prospects and limitations. CBE Life Sci Educ 10:379–393

  • Haudek KC, Kaplan JJ, Knight J, Long T, Merrill J, Munn A, Nehm RH, Smith M, Urban-Lurain M (2011) Harnessing technology to improve formative assessment of student conceptions in STEM: forging a national network. CBE Life Sci Educ 10(2):149–155

  • Joughin G (1998) Dimensions of oral assessment. Assess Eval High Educ 23(4):367–378

  • Leacock C, Chodorow M (2003) C-rater: automated scoring of short-answer questions. Comput Humanit 37(4):389–405

  • Linacre JM (2006) WINSTEPS Rasch measurement software [Computer program]. WINSTEPS, Chicago

  • Lombrozo T (2006) The structure and function of explanations. Trends Cogn Sci 10:464–470

  • Lombrozo T (2012) Explanation and abductive inference. In: Holyoak KJ, Morrison RG (eds) Oxford handbook of thinking and reasoning. Oxford University Press, Oxford, pp 260–276

  • Magliano JP, Graesser AC (2012) Computer-based assessment of student-constructed responses. Behav Res Methods 44:608–621

  • Mayfield E, Rosé C (2012) LightSIDE: text mining and machine learning user’s manual. Carnegie Mellon University, Pittsburgh

  • Mayfield E, Rosé C (2013) LightSIDE: open source machine learning for text. In: Shermis MD, Burstein J (eds) Handbook of automated essay evaluation. Routledge, New York

  • McNeill KL, Lizotte DJ, Krajcik J, Marx RW (2006) Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. J Learn Sci 15(2):153–191

  • Moscarella RA, Urban-Lurain M, Merritt B, Long T, Richmond G, Merrill J, Parker J, Patterson R, Wilson C (2008) Understanding undergraduate students’ conceptions in science: using lexical analysis software to analyze students’ constructed responses in biology. Proceedings of the National Association for Research in Science Teaching (NARST) annual conference, Baltimore, MD

  • National Research Council (1996) National science education standards. The National Academies Press, Washington, DC

  • National Research Council (2001a) Investigating the influence of standards: a framework for research in mathematics, science, and technology education. The National Academies Press, Washington, DC

  • National Research Council (2001b) Knowing what students know. The National Academies Press, Washington, DC

  • National Research Council (2007) Taking science to school. The National Academies Press, Washington, DC

  • National Research Council (2012) A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC

  • Nehm RH, Ha M (2011) Item feature effects in evolution assessment. J Res Sci Teach 48(3):237–256

  • Nehm RH, Haertig H (2012) Human vs. computer diagnosis of students’ natural selection knowledge: testing the efficacy of text analytic software. J Sci Educ Technol 21(1):56–73

  • Nehm RH, Reilly L (2007) Biology majors’ knowledge and misconceptions of natural selection. Bioscience 57(3):263–272

  • Nehm RH, Ridgway J (2011) What do experts and novices “see” in evolutionary problems? Evol Educ Outreach 4(4):666–679

  • Nehm RH, Schonfeld IS (2008) Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45(10):1131–1160

  • Nehm RH, Kim SY, Sheppard K (2009) Academic preparation in biology and advocacy for teaching evolution: biology versus non-biology teachers. Sci Educ 93(6):1122–1146

  • Nehm RH, Ha M, Rector M, Opfer JE, Perrin L, Ridgway J, Mollohan K (2010) Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (ACORNS). Technical Report of National Science Foundation REESE Project 0909999

  • Nehm RH, Beggrow EP, Opfer JE, Ha M (2012a) Reasoning about natural selection: diagnosing contextual competency using the ACORNS instrument. Am Biol Teach 74(2):92–98

  • Nehm RH, Ha M, Mayfield E (2012b) Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. J Sci Educ Technol 21(1):183–196

  • Neumann I, Neumann K, Nehm R (2011) Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. Int J Sci Educ 33(10):1373–1405

  • Opfer JE, Nehm RH, Ha M (2012) Cognitive foundations for science assessment design: knowing what students know about evolution. J Res Sci Teach 49(6):744–777

  • Osborne JF, Patterson A (2011) Scientific argument and explanation: a necessary distinction? Sci Educ 95(4):627–638

  • Page EB (1966) The imminence of grading essays by computers. Phi Delta Kappan 47:238–243

  • Platt J (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208

  • Rector MA, Nehm RH, Pearl D (2012) Item sequencing effects on the measurement of students’ biological knowledge. Paper in the proceedings of the National Association for Research in Science Teaching, Indianapolis, IN, 25–28 March 2012

  • Rector MA, Nehm RH, Pearl D (2013) Learning the language of evolution: lexical ambiguity and word meaning in student explanations. Res Sci Educ 43(3):1107–1133

  • Roediger HL III, Marsh EJ (2005) The positive and negative consequences of multiple-choice testing. J Exp Psychol Learn Mem Cogn 31(5):1155

  • Russ RS, Scherr RE, Hammer D, Mikeska J (2008) Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science. Sci Educ 92(3):499–525

  • Russ RS, Lee VR, Sherin BL (2012) Framing in cognitive clinical interviews about intuitive science knowledge: dynamic student understandings of the discourse interaction. Sci Educ 96(4):573–599

  • Sandoval WA, Millwood KA (2005) The quality of students’ use of evidence in written scientific explanations. Cogn Instr 23(1):23–55

  • Seddon GM, Pedrosa MA (1988) A comparison of students’ explanations derived from spoken and written methods of questioning and answering. Int J Sci Educ 10(3):337–342

  • Shermis MD, Burstein J (2003) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Inc., Mahwah

  • Songer NB, Gotwals AW (2012) Guiding explanation construction by children at the entry points of learning progressions. J Res Sci Teach 49(2):141–165

  • Songer NB, Kelcey B, Gotwals AW (2009) How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. J Res Sci Teach 46(6):610–631

  • Vosniadou S, Vamvakoussi X, Skopeliti I (2008) The framework theory approach to the problem of conceptual change. In: Vosniadou S (ed) International handbook of research on conceptual change. Routledge, New York, pp 3–34

  • Woloshyn V, Gallagher T (2009, December 23) Self-explanation. Retrieved from http://www.education.com/reference/article/self-explanation/

  • Yang Y, Buckendahl CW, Juszkiewicz PJ, Bhola DS (2002) A review of strategies for validating computer automated scoring. Appl Meas Educ 15(4):391–412

Acknowledgments

We thank John Harder, Ian Hamilton, the Ohio State University’s Center for Life Science Education and Department of Anthropology’s Graduate Teaching Associate program for assistance with data collection, the National Science Foundation REESE program (DRL 0909999) for funding portions of this study, and Meghan Rector for helpful reviews of the manuscript. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the National Science Foundation.

Author information

Corresponding author

Correspondence to Elizabeth P. Beggrow.

About this article

Cite this article

Beggrow, E.P., Ha, M., Nehm, R.H. et al. Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance? J Sci Educ Technol 23, 160–182 (2014). https://doi.org/10.1007/s10956-013-9461-9
