Dear Editor,

I am writing in response to the article by Kollitsch et al. [1] titled “How does artificial intelligence master urological board examinations?”. The authors performed a comparative analysis of several large language models (LLMs), assessing their accuracy and reliability in answering urological knowledge-based questions, and revealed a negative correlation between question complexity and LLM performance, underscoring the importance of further research in specific subdomains of urology.

Beyond the evaluation presented in the paper, we would like to offer further insights into its implications and future directions. First, the findings indicate that ChatGPT-4 and Bing AI consistently outperformed ChatGPT-3.5 in RoCA scores, yet the reliability of responses varied across repeated rounds of questioning. This suggests that the consistency of LLM-generated answers must be assessed carefully before such tools are relied upon for medical education and knowledge acquisition. Second, all three LLMs showed a consistent decline in accuracy as question complexity increased, underscoring the importance of training LLMs on a broad range of medical literature and resources to improve their performance on complex questions. Third, the results suggest that the adaptive learning capacity of LLMs may be limited; continuous updates, ongoing training, and active maintenance are therefore needed to keep these models reliable and effective for acquiring medical knowledge. Finally, the study raises substantive concerns about the quality and consistency of LLM-generated responses, highlighting the need for further research that comprehensively evaluates both their reliability and their reasoning quality, particularly in the context of medical education and knowledge assessment.

In conclusion, the authors’ research advances our understanding of the potential of LLMs in medical education and clinical practice and highlights the need for further studies assessing their performance in specific subdomains of urology and other medical disciplines.