
Assessing knowledge about medical physics in language-generative AI with large language model: using the medical physicist exam

  • Technical Note
Radiological Physics and Technology

Abstract

This study aimed to evaluate how well language-generative AI based on large language models answers the Japanese medical physicist examination, and to provide a benchmark of its knowledge of medical physics. We used questions from Japan’s 2018, 2019, 2020, 2021, and 2022 medical physicist board examinations, which covered various question types, including multiple-choice questions, and mainly focused on general medicine and medical physics. ChatGPT-3.5 and ChatGPT-4 (OpenAI) were used. We compared the AI-generated answers with the correct ones. The average accuracy rates were 42.2 ± 2.5% (ChatGPT-3.5) and 72.7 ± 2.6% (ChatGPT-4), showing that ChatGPT-4 was more accurate than ChatGPT-3.5 (p < 0.05 in all categories except radiation-related laws and recommendations/medical ethics). Even for ChatGPT-4, the more accurate model, the accuracy rates were below 60% in two categories: radiation metrology (55.6%) and radiation-related laws and recommendations/medical ethics (40.0%). These data provide a benchmark of ChatGPT’s knowledge of medical physics and can serve as basic data for developing medical physics tools built on ChatGPT (e.g., radiation therapy support tools with Japanese input).
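The scoring described in the abstract (comparing model answers with the answer key, then averaging accuracy over exam years) can be illustrated with a minimal sketch. This is not the authors' code: the function names `accuracy` and `summarize` are hypothetical, the toy answers are invented for illustration and are not the study's questions or results, and treating the "±" value as a sample standard deviation across years is an assumption, since the abstract does not define it.

```python
from statistics import mean, stdev


def accuracy(model_answers, correct_answers):
    """Fraction of questions where the model's answer matches the key."""
    if len(model_answers) != len(correct_answers):
        raise ValueError("answer lists must have the same length")
    hits = sum(m == c for m, c in zip(model_answers, correct_answers))
    return hits / len(correct_answers)


def summarize(per_year_accuracy):
    """Mean and sample standard deviation (in percent) across exam years.

    Assumption: the abstract's "±" is read here as a sample standard
    deviation over years; the source does not state this.
    """
    values = [100 * a for a in per_year_accuracy.values()]
    return mean(values), stdev(values)


# Hypothetical toy data: NOT the study's questions, answers, or results.
answer_key = {2018: ["a", "c", "b", "d"], 2019: ["b", "b", "a", "e"]}
model_out = {2018: ["a", "c", "d", "d"], 2019: ["b", "a", "a", "c"]}

per_year = {year: accuracy(model_out[year], answer_key[year]) for year in answer_key}
avg, sd = summarize(per_year)
print(f"{avg:.1f} ± {sd:.1f}%")
```

Running the same two functions on each model's answers, one exam year at a time, would yield per-model summaries in the form reported in the abstract. The significance test behind the reported p values is not specified in this excerpt, so it is left out of the sketch.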


Data availability

The data used in this study are not openly available due to reasons of sensitivity and are available from the corresponding author upon reasonable request.

Acknowledgements

The authors are grateful to the Japanese Board for Medical Physicist Qualification for permission to use the exam questions.

Author information

Corresponding author

Correspondence to Noriyuki Kadoya.

Ethics declarations

Conflicts of interest

Inoue is an employee of Elith, Inc.

Ethical approval

There are no human subjects in this article and informed consent is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kadoya, N., Arai, K., Tanaka, S. et al. Assessing knowledge about medical physics in language-generative AI with large language model: using the medical physicist exam. Radiol Phys Technol (2024). https://doi.org/10.1007/s12194-024-00838-2

  • DOI: https://doi.org/10.1007/s12194-024-00838-2
