Abstract
The impact of OpenAI’s ChatGPT on education has led to a reexamination of traditional pedagogical methods and assessments. However, ChatGPT’s performance on a wide range of assessments remains to be determined. This study aims to classify ChatGPT-generated and student-constructed responses to a college-level environmental science question and to explore the linguistic- and content-level features that capture differences in how the two groups use language. The Coh-Metrix text analysis tool was used to identify and extract linguistic and textual features. We then employed a random forest feature selection method to identify the most representative, nonredundant text-based features. We also used term frequency-inverse document frequency (TF-IDF) metrics to represent the content of the written responses. The performance of the classification models was evaluated and compared in three scenarios: (a) using content-level features alone, (b) using linguistic-level features alone, and (c) using the combination of both. The results demonstrated that accuracy, specificity, sensitivity, and F1-score all increased when the two levels of features were combined. The results of this study hold promise for providing instructors with valuable insights for distinguishing student responses from ChatGPT-generated ones and for integrating ChatGPT into course development. This study also highlights the significance of linguistic- and content-level features in AI education research.
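To make the content-level representation and the evaluation metrics named in the abstract concrete, the following is a minimal, standard-library sketch; it is not the authors' pipeline. It assumes lowercase whitespace tokenization, term frequency normalized by response length, and the unsmoothed idf = log(N / df); libraries such as scikit-learn use smoothed variants, so their values will differ. Scenario (c) is illustrated by simply concatenating each response's linguistic feature vector (for example, selected Coh-Metrix indices) with its TF-IDF vector. The label convention (1 = ChatGPT-generated, treated as the positive class) and all function names are hypothetical, and the random forest feature selection and the classifiers themselves are not reproduced here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return a sorted vocabulary and one TF-IDF vector per document.

    Assumption: lowercase whitespace tokens, tf = count / document length,
    idf = log(N / df) with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of responses that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty responses
        vectors.append([
            (counts[term] / length) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


def combine(linguistic, content):
    """Scenario (c): concatenate linguistic and content feature vectors."""
    return [list(ling) + list(cont) for ling, cont in zip(linguistic, content)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}
```

With toy responses such as `["water energy food nexus", "food security and water", "water and energy policy"]`, the term "water" appears in every response and therefore receives a TF-IDF weight of zero, which illustrates why TF-IDF emphasizes the content words that separate responses rather than the ones they share. Sensitivity and specificity are reported separately because a detector can score well on one class while missing the other.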
Acknowledgements
We would like to thank Steven Anderson, Shirley Vincent, Ennea Fairchild and other members of the Next Generation Concept Inventory project for their assistance. This material is based upon work supported by the National Science Foundation under Grant No. 2013359. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, H. et al. (2023). Is ChatGPT a Threat to Formative Assessment in College-Level Science? An Analysis of Linguistic and Content-Level Features to Classify Response Types. In: Schlippe, T., Cheng, E.C.K., Wang, T. (eds) Artificial Intelligence in Education Technologies: New Development and Innovative Practices. AIET 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 190. Springer, Singapore. https://doi.org/10.1007/978-981-99-7947-9_13
DOI: https://doi.org/10.1007/978-981-99-7947-9_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7946-2
Online ISBN: 978-981-99-7947-9
eBook Packages: Education, Education (R0)