Abstract
This study explored the use of machine learning to automatically evaluate the accuracy of students’ written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
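The Kappa benchmark used above follows the conventional interpretation of Landis and Koch (1977), under which values above 0.80 indicate almost-perfect agreement. As a point of reference only (this sketch is not code from the study), Cohen's kappa for two raters can be computed as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters'
    categorical labels for the same set of responses."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of responses on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Perfect human-computer correspondence yields kappa = 1.0, while agreement no better than chance yields 0.0; the 0.80 benchmark sits near the top of this range.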
References
Alberts B (2010) Reframing science standards. Science 329(5991):491
Arora S, Nyberg E (2009) Interactive annotation learning with indirect feature voting. In: Paper in proceedings of the student research symposium at NAACL-HLT 2009, Boulder, Colorado, USA. Accessed online at: http://www.cs.cmu.edu/~shilpaa/NAACL_SRW_IAL.pdf
Bejar II (1991) A methodology for scoring open-ended architectural design problems. J Appl Psychol 76(4):522–532
Bishop B, Anderson C (1990) Student conceptions of natural selection and its role in evolution. J Res Sci Teach 27:415–427
Burstein J (2003) The e-rater scoring engine: automated essay scoring with natural language processing. In: Shermis MD, Burstein J (eds) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Inc, Mahwah, pp 113–122
Chung GKWK, Baker EL (2003) Issues in the reliability and validity of automated scoring of constructed responses. In: Shermis MD, Burstein J (eds) Automated essay scoring: a cross-disciplinary perspective. Erlbaum, Mahwah, pp 23–40
Clough EE, Driver R (1986) A study of consistency in the use of students’ conceptual frameworks across different task contexts. Sci Educ 70:473–496
Demastes SS, Good RG, Peebles P (1995) Students’ conceptual ecologies and the process of conceptual change in evolution. Sci Educ 79(6):637–666
Donmez P, Rosé C, Stegmann K, Weinberger A, Fischer F (2005) Supporting CSCL with automatic corpus analysis technology. In: Paper in proceedings of the international conference on computer support for collaborative learning (CSCL), Taipei, Taiwan
Endler JA (1992) Natural selection: current usages. In: Keller EF, Lloyd EA (eds) Keywords in evolutionary biology. Harvard, Cambridge, pp 220–224
Galt K (2008) SPSS text analysis for surveys 2.1 and qualitative and mixed methods analysis. J Mixed Meth Res 2(3):284–286
Gitomer DH, Duschl RA (2007) Establishing multilevel coherence in assessment. In: Moss PA (ed) Evidence and decision making. The 106th yearbook of the National Society for the Study of Education, Part I. National Society for the Study of Education, Chicago, pp 288–320
Krippendorff K (1980) Content analysis: an introduction to its methodology, 1st edn. Sage Publications, Thousand Oaks
Krippendorff K (2004) Content analysis: an introduction to its methodology, 2nd edn. Sage Publications, Thousand Oaks, London
Kumar R, Rosé C, Wang YC, Joshi M, Robinson A (2007) Tutorial dialogue as adaptive collaborative learning support. In: Paper in proceedings of the international conference on artificial intelligence in education, Los Angeles, USA
Landauer TK, Laham D, Foltz PW (2001) The intelligent essay assessor: putting knowledge to the test. In: Paper presented at the Association of Test Publishers Computer-Based Testing: Emerging Technologies and Opportunities for Diverse Applications conference, Tucson, AZ
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
Lewontin R (1978) Adaptation. Sci Am 239:212–228
Liu OL, Lee HS, Hofstetter C, Linn MC (2008) Assessing knowledge integration in science: construct, measures, and evidence. Educ Assess 13(1):33–55
Markoff J (2011) Computer wins on ‘jeopardy!’: trivial, it’s not. New York Times, 16 Feb
Mayfield E, Rosé C (2010) An interactive tool for supporting error analysis for text mining. In: Paper in proceedings of the demonstration session at the international conference of the North American Association for Computational Linguistics (NAACL), Los Angeles, USA
McLaren B, Scheuer O, de Laat M, Hever R, de Groot R, Rosé C (2007) Using machine learning techniques to analyze and support mediation of student e-discussions. In: Paper in proceedings of the international conference on artificial intelligence in education, Los Angeles, USA
National Research Council (2001) Knowing what students know: the science and design of educational assessment. National Academy Press, Washington, D.C.
National Research Council (2007) Taking science to school: learning and teaching science in grades K-8. National Academy Press, Washington, D.C.
National Research Council (2008) Rising above the gathering storm: energizing and employing America for a brighter economic future. National Academy Press, Washington, D.C.
Nehm RH (2010) Understanding undergraduates’ problem solving processes. J Biol Microbiol Educ 11(2):119–122
Nehm RH, Ha M (2011) Item feature effects in evolution assessment. J Res Sci Teach 48(3):237–256
Nehm RH, Haertig H (2011) Human vs. computer diagnosis of students’ natural selection knowledge: testing the efficacy of text analytic software. J Sci Educ Technol. doi:10.1007/s10956-011-9282-7
Nehm RH, Reilly L (2007) Biology majors’ knowledge and misconceptions of natural selection. Bioscience 57(3):263–272
Nehm RH, Schonfeld IS (2008) Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and an oral interview. J Res Sci Teach 45(10):1131–1160
Nehm RH, Schonfeld IS (2010) The future of natural selection knowledge measurement: a reply to Anderson et al. J Res Sci Teach 47(3):358–362
Nehm RH, Ha M, Rector M, Opfer J, Perrin L, Ridgway J, Mollohan K (2010) Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (EGALT). Technical Report of National Science Foundation REESE Project 0909999. Accessed online 10 Jan 2011 at: http://evolutionassessment.org
Page EB (1966) The imminence of grading essays by computers. Phi Delta Kappan 47:238–243
Patterson C (1978) Evolution. Cornell University Press, Ithaca
Pigliucci M, Kaplan J (2006) Making sense of evolution: the conceptual foundations of evolutionary biology. University of Chicago Press, Chicago
Rosé C, Donmez P, Gweon G, Knight A, Junker B, Cohen W, Koedinger K, Heffernan N (2005) Automatic and semi-automatic skill coding with a view towards supporting on-line assessment. In: Paper in proceedings of the international conference on artificial intelligence in education, Amsterdam, The Netherlands
Rosé CP, Wang YC, Cui Y, Arguello J, Stegmann K, Weinberger A, Fischer F (2008) Analyzing collaborative learning processes automatically: exploiting the advances of computational linguistics in computer-supported collaborative learning. Int J Comput Support Collab Learn 3(3):237–271
Shermis MD, Burstein J (2003) Automated essay scoring: a cross-disciplinary perspective. Lawrence Erlbaum Associates, Inc, Mahwah
Sukkarieh J, Bolge E (2008) Leveraging c-rater’s automated scoring capability for providing instructional feedback for short constructed responses. In: Woolf BP, Aimeur E, Nkambou R, Lajoie S (eds) Lecture notes in computer science: vol. 5091. Proceedings of the 9th international conference on intelligent tutoring systems, ITS 2008, Montreal, Canada, June 23–27, 2008. Springer, New York, pp 779–783
The Conference Board, Corporate Voices for Working Families, the Partnership for 21st Century Skills, and the Society for Human Resource Management (2007) Are they really ready to work? Employers’ perspectives on the basic knowledge and applied skills of new entrants to the 21st century workforce. Accessed online 22 Mar 2011 at: http://www.p21.org/index.php?option=com_content&task=view&id=250&Itemid=64
Wagner T (2008) The global achievement gap. Basic Books, New York
Witten IH, Frank E (2005) Data mining, 2nd edn. Elsevier, Amsterdam
Yang Y, Buckendahl CW, Juszkiewicz PJ, Bhola DS (2002) A review of strategies for validating computer automated scoring. Appl Meas Educ 15(4):391–412
Acknowledgments
We thank the faculty and participants of the 2010 PSLC (NSF Pittsburgh Science of Learning Center) summer school for financial and intellectual support, Prof. Carolyn Penstein Rosé for introducing us to the SIDE program, and NSF REESE grant 0909999 for financial support.
Appendix
The SIDE program and user’s guide may be downloaded at: http://www.cs.cmu.edu/~cprose/SIDE.html. The SIDE settings used for the analyses in this study were as follows. The machine-learning algorithm was “weka-classifiers-functions-SMO”. Feature options included: (1) “unigrams”; (2) “treat above features as binary”; (3) “line length”; (4) “remove stopwords”; and (5) “stemming” (details of these features may be found in Mayfield and Rosé 2010, p. 6). The feature extractor plugin was the default option, “plugin.sample.fce.TagHelperExtractor”, which creates a feature table from the NLP extractions listed above. The “Remove rare features” option was selected and set to a value of 5. Cross-validation was selected and set to a value of 10. For the default segmenter option, we selected “plugin.sample.segmenter.DocumentSegmenter” (Mayfield and Rosé 2010).
Nehm, R.H., Ha, M. & Mayfield, E. Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations. J Sci Educ Technol 21, 183–196 (2012). https://doi.org/10.1007/s10956-011-9300-9