The rapid growth of artificial intelligence (AI) applications in radiology spans many areas, from medical imaging interpretation to workflow management and decision-making. These applications may improve diagnostic accuracy, streamline radiology workflows, and enhance patient risk stratification. Although the market may soon consolidate, commercial interest is still rising, and the list of healthcare AI companies continues to expand.
As radiology AI evolves toward full integration into the daily routine, it is paramount to ensure its proper function. Hence, there is a growing need for a more critical appraisal of AI applied to patients.
Trials involving control and intervention groups date back to the beginning of historical records. James Lind's 1753 publication of a controlled trial demonstrating the efficacy of citrus fruit in scurvy was a cornerstone for the acceptance of comparative trial methodology. Randomized controlled trials (RCTs) remain one of the most powerful tools in research. No other study design can balance unknown factors that may influence the clinical course as effectively.
A review comparing the diagnostic accuracy of deep learning algorithms with that of healthcare professionals in classifying diseases from medical imaging found many methodological shortcomings. Most studies lacked external validation and did not compare performance on the same samples, limiting the reliability of their diagnostic accuracy estimates.
As AI becomes an integral part of the radiology workflow, critical evaluation of AI through methodologically strong tools is crucial. We reviewed RCTs assessing AI systems in radiology. Our review adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Our search terms covered AI, radiology, and RCTs. We used the paper by Stolberg et al on RCTs as guidance for inclusion criteria. The PubMed search was conducted in September 2021.
Our search yielded 195 entries. Forty-nine were unrelated to clinical practice, and 17 did not use AI. Eight were from gastroenterology, 13 from ophthalmology, and 10 from cardiology. Twenty-three concerned clinical prediction models rather than imaging. Only 64 entries were related to radiology, and only one, “Artificial Intelligence Algorithm Improves Radiologist Performance in Skeletal Age Assessment: A Prospective Multicenter Randomized Controlled Trial,” published in Radiology, was a randomized controlled trial. A PRISMA flow diagram of the literature review is presented in Fig. 1.
Hence, to date, there is a single RCT in the field of AI in radiology. In their trial, Eng et al compared the effect of an AI diagnostic aid on the assessment of skeletal age on hand radiographs. They showed that their AI-based algorithm improved both accuracy and interpretation times.
Thus, despite the dramatic increase in deep learning publications in the radiology field, there is only one relevant RCT publication. The understanding that AI interventions require prospective evaluation to prove their impact on health outcomes led to the development of specialized guidelines such as CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) and SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence). A recent systematic review of RCTs of machine learning interventions in healthcare identified only 41 trials, and not a single one adhered to all CONSORT-AI guidelines.
Recognizing the hazardous potential of AI systems is of paramount importance. In contrast to other health interventions, AI can produce unpredictable and undetectable errors that are not explainable by human logic. For instance, minor changes in medical images, invisible to the human eye, may completely alter diagnostic results. Another well-known example is a decision support system that provided incorrect, and sometimes even dangerous, treatment recommendations. RCTs can therefore play an important role in monitoring the safety of these systems.
In a recent review published in European Radiology, Kelly et al discussed several critical methodological issues. The lack of explainability in 28% of deep learning clinical radiology papers is worrisome, as is an average performance decrease of 6% at external validation, with a drop of more than 10% in 78% of the studies. They also found problematic study designs, such as insufficient sample sizes and unspecified ground truth. A lack of performance comparison was found in 17% of the reviewed studies, and some studies used medically naïve people as comparators. These findings are concerning and call for improved research quality. Using international data for external validation may be a first step, but RCTs are a stronger tool.
There are several explanations for the lack of RCTs in this field. First, some may believe that the conventional tools for evaluating an AI system are sufficient: external validation, comparison of a model's performance against a previous one, and mathematical metrics such as the area under the curve are often used to assess performance. Second, RCTs for continuous learning algorithms are complicated to design and interpret. Third, controlling multi-center variability is difficult. Fourth, designing a study for bundled AI tools, such as a post-processing algorithm with incorporated segmentation, is not simple because of the interaction between the algorithms. Lastly, setting up an RCT is often challenging and time-consuming compared with alternative study designs.
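To make the first point concrete, a minimal sketch (with synthetic labels and scores, not data from any study cited here) of the area-under-the-curve metric used in such conventional retrospective evaluations: AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one.

```python
def auc(y_true, y_score):
    """Pairwise AUC: the fraction of (positive, negative) case pairs
    the model ranks correctly, counting ties as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic example: 4 diseased (1) and 4 healthy (0) cases with model scores
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]
print(auc(y_true, y_score))  # 0.9375
```

Note what such a retrospective metric cannot capture: it measures ranking quality on a fixed test set, not the effect of deploying the model on patient outcomes, which is precisely what an RCT is designed to measure.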
Nevertheless, RCTs remain the most powerful type of experimental study. In light of the AI revolution in radiology, we believe the time has come for RCTs, and we encourage further research in this important field.
Pianykh OS, Langs G, Dewey M et al (2020) Continuous learning AI in radiology: implementation principles and early applications. Radiology. 297(1):6–14
Elhalawani H, Mak R (2021) Are artificial intelligence challenges becoming radiology’s new “bee’s knees”? Radiol Artif Intell 3(3):e210056
Bothwell LE, Podolsky SH (2016) The emergence of the randomized, controlled trial. N Engl J Med 375(6):501–504
Stolberg HO, Norman G, Trop I (2004) Randomized controlled trials. AJR Am J Roentgenol 183(6):1539–1544
Liu X, Faes L, Kale AU et al (2019) A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 1(6):e271–e297
Eng DK, Khandwala NB, Long J et al (2021) Artificial intelligence algorithm improves radiologist performance in skeletal age assessment: a prospective multicenter randomized controlled trial. Radiology. https://doi.org/10.1148/radiol.2021204021
Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group (2020) Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 26(9):1364–1374
Plana D, Shung DL, Grimshaw AA et al (2022) Randomized clinical trials of machine learning interventions in health care: a systematic review. JAMA Netw Open 5(9):e2233946. https://doi.org/10.1001/jamanetworkopen.2022.33946
Bennett C (2018) Watson recommends incorrect cancer treatments, system training questioned. Clinical OMICs 5(5):29
Kelly BS, Judge C, Bollard SM et al (2022) Radiology artificial intelligence: a systematic review and evaluation of methods (RAISE). Eur Radiol. https://doi.org/10.1007/s00330-022-08784-6
The authors state that this work has not received any funding.
The scientific guarantor of this publication is Larisa Gorenstein.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
One of the authors has significant statistical expertise. No complex statistical methods were necessary for this paper.
Written informed consent was not required for this study because this is a review article without patient information.
Institutional Review Board approval was not required because this is a review article using only PubMed.
Gorenstein, L., Soffer, S., Apter, S. et al. AI in radiology: is it the time for randomized controlled trials?. Eur Radiol 33, 4223–4225 (2023). https://doi.org/10.1007/s00330-022-09381-3