COVID-19, AI enthusiasts, and toy datasets: radiology without radiologists

Tizhoosh, H. R.; Fratesi, Jennifer

doi:10.1007/s00330-020-07453-w

Letter to the Editor
Open access
Published: 11 November 2020

Volume 31, pages 3553–3554, (2021)

European Radiology


This article has been updated

In computer science, textbooks describe the “garbage in, garbage out” (GIGO) concept: low-quality input data generates unreliable output, or “garbage.” GIGO becomes an even more pressing issue when we are dealing with highly complex data modalities, such as radiographs and computed tomography scans.

The performance of any deep network depends directly on the quality of the dataset that it learns from. Reputable repositories such as the Cancer Imaging Archive [1], backed by a large body of work by experts [2], are examples of reliable datasets. Adhering to DICOM standards and ensuring that images are properly linked to supporting metadata are obligatory for constructing a well-curated dataset.
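As a rough illustration of what "properly linked to supporting metadata" could mean in practice, the following minimal sketch flags images whose metadata record is absent or incomplete. The function name, the file names, and the choice of required fields are assumptions made for this example; the field names are standard DICOM attribute keywords, but which of them a given dataset must carry is a decision for clinical experts.

```python
# Illustrative subset of DICOM attribute keywords; which fields are truly
# required is a clinical decision, not something this sketch can settle.
REQUIRED_FIELDS = ("PatientID", "Modality", "StudyDate", "PatientAge")


def find_curation_gaps(image_files, metadata_by_file):
    """Map each problematic file name to the list of fields it is missing."""
    gaps = {}
    for name in image_files:
        meta = metadata_by_file.get(name)
        if meta is None:
            # Image with no metadata record at all: every field is missing.
            gaps[name] = list(REQUIRED_FIELDS)
            continue
        missing = [field for field in REQUIRED_FIELDS if not meta.get(field)]
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical records: one complete study and one scraped figure with
# partial metadata; a third file has no metadata record at all.
example_metadata = {
    "case_001.dcm": {"PatientID": "P1", "Modality": "CT",
                     "StudyDate": "20200315", "PatientAge": "054Y"},
    "figure_3b.png": {"Modality": "CR"},
}
example_gaps = find_curation_gaps(
    ["case_001.dcm", "figure_3b.png", "scraped_07.jpg"], example_metadata
)
print(example_gaps)
```

A check like this only verifies that fields are present; it cannot verify that their values are clinically correct, which is where expert review remains necessary.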

In recent weeks, we have observed a trend of hastily using ill-curated data to train deep networks for COVID-19. It seems that AI enthusiasts impatiently create their own datasets of medical images without seeking clinical collaborators to guide them. These collections are rather “toy sets” through the manual gathering of publicly accessible images (e.g., online journals, and preprints on non-peer-reviewed archives). Most of the time, AI researchers with no clinical or medical competency create their own experimental “toy” datasets to run initial investigations and establish a framework for algorithmic challenges.

To be clear, a “toy dataset” from the medical imaging perspective is a toy not just because it is very small and does not comply with DICOM standards but, more importantly, because it has been created by engineers and computer scientists rather than by physicians and medical/clinical experts. Such datasets of COVID-19 images have been emerging on the Internet and are used by AI enthusiasts to write blogs and non-peer-reviewed reports [3,4,5,6,7]. The so-called COVID Nets are trained on these toy datasets with no radiologist participation and with no common validation such as “leave-one-out” testing. In an attempt to overcome the small data size, AI enthusiasts mix the few adult COVID-19 images scraped from the Internet with many pediatric (bacterial) pneumonia images [5, 6]. Are these COVID Nets learning anything meaningful?
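The concern about mixing adult COVID-19 images with pediatric pneumonia images can be made concrete with a small sketch. It runs leave-one-out testing on a deliberately trivial "model" that never looks at an image and predicts the label from a single metadata attribute. The record layout, the attribute name age_group, and the case counts are hypothetical and chosen only for illustration; this is not the evaluation protocol of any cited work.

```python
from collections import Counter


def majority_by_attribute(train, attribute):
    """Fit a 'model' that ignores pixels: majority label per attribute value."""
    votes = {}
    for rec in train:
        votes.setdefault(rec[attribute], Counter())[rec["label"]] += 1
    # Fallback for attribute values never seen during training.
    overall = Counter(rec["label"] for rec in train).most_common(1)[0][0]
    table = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    return lambda rec: table.get(rec[attribute], overall)


def leave_one_out_accuracy(records, attribute):
    """Hold out each record once, fit on the rest, score the held-out record."""
    hits = 0
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        predict = majority_by_attribute(train, attribute)
        hits += predict(held_out) == held_out["label"]
    return hits / len(records)


# Hypothetical toy set: every COVID-19 case is an adult and every
# comparison case is a child, mirroring the mixing described above.
toy = (
    [{"age_group": "adult", "label": "covid"}] * 10
    + [{"age_group": "pediatric", "label": "bacterial_pneumonia"}] * 40
)
print(leave_one_out_accuracy(toy, "age_group"))  # 1.0
```

If a predictor that sees only the age group scores perfectly, a deep network trained on the same images may be separating adult from pediatric chests rather than detecting COVID-19, so a high reported accuracy on such data says little about the disease.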

No one can curate a COVID-19 dataset in disregard of professional recommendations. The American College of Radiology (ACR) and the Canadian Association of Radiologists (CAR) currently do not recommend the use of x-ray or CT imaging to screen for or diagnose COVID-19 infections [8] because of the risk of spreading the infection, resource constraints, and added logistics. However, CT in particular may be useful to expedite care in symptomatic patients with a negative or pending swab and in those developing complications such as acute respiratory distress syndrome. Findings suspicious for COVID-19 are also commonly seen incidentally in high-risk patients. Findings on CT are non-specific and can overlap with those of other viral infections (such as influenza) and of non-infectious diseases (for example, organizing pneumonia and drug reaction), but there are some characteristic features [9], and standardized reporting has recently been introduced by the RSNA [10]. A well-curated dataset should consider multiple phases:

Early phase (2–4 days): bilateral, ground-glass opacities, rounded or nodular appearance (50%), peripheral and basal in distribution
Intermediate phase (4–7 days): consolidation, reverse halo, crazy paving
Late phase: consolidation, diffuse bilateral ground-glass opacities, organized pneumonia appearance

Faulty results from hastily creating amateur datasets and training sketchy AI solutions for online publication may not make it to mainstream radiology because of the barriers of peer review. They may, however, create false hope among patients and patient advocacy groups, distort the perception of government funding agencies and healthcare policy organizations, and misguide young scientists and resident radiologists. It is the duty of both serious AI researchers and expert radiologists to set the record straight: any dataset of radiological images must be assembled with the participation of expert radiologists; there is no radiology without radiologists. Serious scientists have indeed recognized this and are delivering peer-reviewed papers using carefully curated image data [11, 12].

Change history

11 December 2020
The sentence "Thesecollections are rather “toy sets” through the manual gathering of publicly accessible images (e.g., online journals, and preprints on preprints non-peer-reviewed archives)." was corrected to "These collections are rather “toy sets” through the manual gathering of publicly accessible images (e.g., online journals, and preprints on non-peer-reviewed archives)."

References

1. Cancer Imaging Archive. https://www.cancerimagingarchive.net/. Accessed 23 March 2020
2. Prior F, Almeida J, Kathiravelu P et al (2019) Open access image repositories: high-quality data to enable machine learning research. Clin Radiol. https://doi.org/10.1016/j.crad.2019.04.002
3. Cohen JP, Morrison P, Dao L (2020) COVID-19 image data collection. arXiv:2003.11597v1 [eess.IV] 25 Mar 2020
4. COVID-19 Image Data Collection. https://github.com/ieee8023/covid-chestxray-dataset. Accessed 23 March 2020
5. Wang L, Wong A (2020) COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images. arXiv:2003.09871v1 [eess.IV] 22 Mar 2020 (original version)
6. COVID-Net and COVIDx Dataset. https://github.com/lindawangg/COVID-Net. Accessed 23 March 2020
7. Zhao J, Zhang Y, He X, Xie P (2020) COVID-CT-dataset: a CT scan dataset about COVID-19. arXiv:2003.13865v1 [cs.LG] 30 Mar 2020
8. American College of Radiology. ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection. Retrieved Mar 17, 2020 available from https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection. Accessed 25 April 2020
9. Bernheim A, Mei X, Huang M et al (2020) Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection. Radiology 20:200463
10. Byrne D, O’Neill SB, Müller NL et al (2020) RSNA expert consensus statement on reporting chest CT findings related to COVID-19: interobserver agreement between chest radiologists. Can Assoc Radiol J 2:0846537120938328
11. Francone M, Iafrate F, Masci GM et al (2020) Chest CT score in COVID-19 patients: correlation with disease severity and short-term prognosis. Eur Radiol. https://doi.org/10.1007/s00330-020-06865-y
12. Revel MP, Parkar AP, Prosch H et al (2020) COVID-19 patients and the radiology department – advice from the European Society of Radiology (ESR) and the European Society of Thoracic Imaging (ESTI). Eur Radiol. https://doi.org/10.1007/s00330-020-06865-y

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

Kimia Lab, University of Waterloo, Waterloo, Canada
H. R. Tizhoosh
Vector Institute, MaRS Centre, Toronto, Canada
H. R. Tizhoosh
Department of Medical Imaging, University Health Network, Toronto, Canada
Jennifer Fratesi

Corresponding author

Correspondence to H. R. Tizhoosh.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Hamid Tizhoosh, University of Waterloo, Canada.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

No complex statistical methods were necessary for this paper.

Informed consent

Written informed consent was not required for this study because this is a Letter to the Editor.

Ethical approval

Institutional Review Board approval was not required because this is a Letter to the Editor.

Methodology

• Letter to the Editor

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tizhoosh, H.R., Fratesi, J. COVID-19, AI enthusiasts, and toy datasets: radiology without radiologists. Eur Radiol 31, 3553–3554 (2021). https://doi.org/10.1007/s00330-020-07453-w

Received: 04 September 2020
Revised: 23 September 2020
Accepted: 02 November 2020
Published: 11 November 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s00330-020-07453-w
