Abstract
Automated computerized scoring systems (ACSSs) are being increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that despite improvements, spell-check and auto-replace programs continue to be characterized by significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce equivalent magnitudes and types of spelling errors as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback. We discuss the implications of these findings for the development of next-generation science assessments.
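The study's fourth question asks whether machine-human disagreements skew toward false-positive or false-negative feedback. As an illustrative sketch only (not the authors' scoring pipeline; the labels and function are hypothetical), tallying the two error types against human ratings for one concept might look like:

```python
def feedback_errors(human, machine):
    """Count disagreements between human and machine binary scores.

    False positive: the machine credits a concept the human rater did not.
    False negative: the machine misses a concept the human rater credited.
    """
    fp = sum(1 for h, m in zip(human, machine) if m == 1 and h == 0)
    fn = sum(1 for h, m in zip(human, machine) if m == 0 and h == 1)
    return fp, fn

# Hypothetical ratings for six student responses on a single concept
human   = [1, 0, 1, 1, 0, 1]   # human rater: concept present?
machine = [1, 1, 0, 1, 0, 1]   # machine score for the same responses
print(feedback_errors(human, machine))  # -> (1, 1)
```

In this framing, a misspelled key term that evades a scoring rule would surface as a false negative, while a misspelling that accidentally matches a rule would surface as a false positive.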
Acknowledgments
We thank the reviewers for helpful and thought-provoking comments on an earlier version of the manuscript. Financial support was provided by a National Science Foundation TUES grant (1322872). Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.
Cite this article
Ha, M., Nehm, R.H. The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations. J Sci Educ Technol 25, 358–374 (2016). https://doi.org/10.1007/s10956-015-9598-9