Abstract
Artificial intelligence (AI) technologies have influenced the development of education for many years, and their impact has grown markedly since the launch of ChatGPT-3.5 at the end of November 2022. In physics education, recent research on ChatGPT-3.5’s performance in solving physics problems found that its problem-solving ability was only at the level of novice students, which was not enough to cause serious concern in the field. However, ChatGPT-4 brought substantial improvements in reasoning and conciseness. How do these improvements translate into performance on physics problems, and what impact might they have on education? This study comprehensively assesses ChatGPT-4’s performance in solving physics problems from the perspectives of physics conceptual understanding and reasoning, and compares its performance with that of students. We conclude that ChatGPT-4 solves physics problems significantly better than ChatGPT-3.5 and notably outperforms the majority of middle school and high school students. This finding presents both a challenge and an opportunity for physics education and the broader educational field, and calls for prompt consideration of how to address this challenge in future teaching and assessment environments.
References
Bagno, E., & Eylon, B. S. (1997). From problem solving to a knowledge structure: An example from the domain of electromagnetism. American Journal of Physics, 65(8), 726–736. https://doi.org/10.1119/1.18642
Bonham, S. (2007). Graphical response exercises for teaching physics. The Physics Teacher, 45(8), 482–486. https://doi.org/10.1119/1.2798359
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., …, Amodei, D. (2020). Language models are few-shot learners. In Advances in neural information processing systems (Vol. 33, pp. 1877–1901). https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
Caballero, M. D., Greco, E. F., Murray, E. R., Bujak, K. R., Jackson Marr, M., Catrambone, R., Kohlmyer, M. A., & Schatz, M. F. (2012). Comparing large lecture mechanics curricula using the Force Concept Inventory: A five thousand student study. American Journal of Physics, 80(7), 638–644. https://doi.org/10.1119/1.3703517
Cai, S., Chiang, F. K., & Wang, X. (2013). Using the augmented reality 3D technique for a convex imaging experiment in a physics course. International Journal of Engineering Education, 29(4), 856–865.
Chang, R. C., Chung, L. Y., & Huang, Y. M. (2016). Developing an interactive augmented reality system as a complement to plant education and comparing its effectiveness with video learning. Interactive Learning Environments, 24(6), 1245–1264. https://doi.org/10.1080/10494820.2014.982131
Chassignol, M., Khoroshavin, A., Klimova, A., & Bilyatdinova, A. (2018). Artificial Intelligence trends in education: A narrative overview. Procedia Computer Science, 136, 16–24. https://doi.org/10.1016/j.procs.2018.08.233
Cooper, G. (2023). Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. Journal of Science Education and Technology, 32(3), 444–452. https://doi.org/10.1007/s10956-023-10039-y
Eaton, P., Johnson, K., Frank, B., & Willoughby, S. (2019). Classical test theory and item response theory comparison of the brief electricity and magnetism assessment and the conceptual survey of electricity and magnetism. Physical Review Physics Education Research, 15(1), 010102. https://doi.org/10.1103/PhysRevPhysEducRes.15.010102
Gregorcic, B., & Pendrill, A. M. (2023). ChatGPT and the frustrated Socrates. Physics Education, 58(3), 035021. https://doi.org/10.1088/1361-6552/acc299
Hestenes, D., & Halloun, I. (1995). Interpreting the force concept inventory: A response to March 1995 critique by Huffman and Heller. The Physics Teacher, 33(8), 502. https://doi.org/10.1119/1.2344278
Hu, X., Sun, S., Yang, W., & Ding, G. (2022). Rengong zhineng funeng jiaoyu gaozhiliang fazhan: Xuqi, yuanjing yu lujing [Artificial intelligence empowering the high-quality development of education: Demands, vision and paths]. Xiandai Jiaoyu Jishu, 01, 5–15. https://doi.org/10.3969/j.issn.1009-8097.2022.01.001
Kortemeyer, G. (2023). Could an artificial-intelligence agent pass an introductory physics course? Physical Review Physics Education Research, 19(1), 010132. https://doi.org/10.1103/PhysRevPhysEducRes.19.010132
Kreps, S., & Kriner, D. L. (2023). The potential impact of emerging technologies on democratic representation: Evidence from a field experiment. New Media & Society. https://doi.org/10.1177/1461444823116052
Krusberg, Z. A. (2007). Emerging technologies in physics education. Journal of Science Education and Technology, 16, 401–411. https://doi.org/10.1007/s10956-007-9068-0
Kučak, D., Juričić, V., & Đambić, G. (2018). Machine learning in education—A survey of current research trends. Annals of DAAAM & Proceedings, 29. https://www.daaam.info/Downloads/Pdfs/proceedings/proceedings_2018/059.pdf
Larkin, J., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208(4450), 1335–1342. https://doi.org/10.1126/science.208.4450.1335
Laverty, J., & Kortemeyer, G. (2012). Function plot response: A scalable system for teaching kinematics graphs. American Journal of Physics, 80(8), 724–733. https://doi.org/10.1119/1.4719112
Li, J., Xing, H., & Li, C. (2009). Yunyong yuanshi wenti cujin wuli suzhi jiaoyu yanjiu [Promoting research on physical quality education using primitive physics problems]. Wuli Jiaoshi, 08, 1–2+8.
Lin, H. C. K., Chen, M. C., & Chang, C. K. (2015). Assessing the effectiveness of learning solid geometry by using an augmented reality-assisted learning system. Interactive Learning Environments, 23(6), 799–810. https://doi.org/10.1080/10494820.2013.817435
Maloney, D. P., O’Kuma, T. L., Hieggelke, C. J., & Van Heuvelen, A. (2001). Surveying students’ conceptual knowledge of electricity and magnetism. American Journal of Physics, 69(S1), S12–S23. https://doi.org/10.1119/1.1371296
Mogali, S. R. (2023). Initial impressions of ChatGPT for anatomy education. Anatomical Sciences Education.
Newell, A. (1993). Heuristic programming: Ill-structured problems (pp. 3–54). MIT Press.
Pham, S. T., & Sampson, P. M. (2022). The development of artificial intelligence in education: A review in context. Journal of Computer Assisted Learning, 38(5), 1408–1421. https://doi.org/10.1111/jcal.12687
Tao, Y., Wang, Y., & Xing, H. (2022). Zhongxue wuli kecheng shuzi ziyuan jianjie [Introduction to digital resources for secondary school physics curriculum]. Wuli Jiaoshi, 11, 75–78+81.
Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313. https://doi.org/10.1126/science.adg7879
West, C. G. (2023). AI and the FCI: Can ChatGPT project an understanding of introductory physics? https://doi.org/10.48550/arXiv.2303.01067
Xing, H. (2010). Cong shuju qudong dao gainian qudong: wuli wenti jiejue fangshi de zhongyao zhuanbian [From data-driven to concept-driven: An important change in the ways of physics problem solution]. Kecheng Jiaocai Jiaofa. https://doi.org/10.19877/j.cnki.kcjcjf.2010.03.012
Xing, H. (2023). Wuli nengli de “shuangfeng” fenbu jiqi qishi [“Bimodal” distribution of physical capabilities and its enlightenment]. Wuli Jiaoshi, 03, 31–34.
Xing, H., Cai, X., & Hu, Y. (2017). Chuzhongsheng kexue tuili nengli yu yuanshi wenti jiejue nengli de bijiao yanjiu [A comparative study of junior high school students’ scientific reasoning ability and primitive physics problems solving ability]. Wuli Jiaoshi, 07, 35–40.
Xing, H., Zhai, Y., Han, S., Zhao, Y., Gong, W., Wang, Y., Han, J., & Liu, Q. (2022). The measuring instrument of primitive physics problem for upper-secondary school students: Compilation and exploration. Journal of Baltic Science Education, 21(2), 305–324. https://doi.org/10.33225/jbse/22.21.305
Zhai, X. (2022). ChatGPT user experience: Implications for education. SSRN 4312418. https://doi.org/10.2139/ssrn.4312418
Funding
This research was supported by the National Office for Education Sciences Planning (China) under Grant DIA220370.
Ethics declarations
Conflict of interest
No potential conflict of interest is reported by the authors.
Ethical approval
As the subject of this study is ChatGPT-4, our investigation of this AI tool did not require ethics committee approval. For the comparison of ChatGPT-4’s performance with that of students, the student data were adapted from three previous studies (Li et al., 2009; Xing, 2023; Xing et al., 2017), and we obtained permission from the copyright owners. Therefore, no part of our submission required ethics committee approval.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tong, D., Tao, Y., Zhang, K. et al. Investigating ChatGPT-4’s performance in solving physics problems and its potential implications for education. Asia Pacific Educ. Rev. (2023). https://doi.org/10.1007/s12564-023-09913-6
DOI: https://doi.org/10.1007/s12564-023-09913-6