Introduction

The novel coronavirus (nCoV), originally known as 2019-nCoV and later named SARS-CoV-2, has become one of the most threatening viruses to human lives in the last hundred years [1]. Due to the exponential rise in the number of cases, the World Health Organization (WHO) declared Covid-19 a pandemic in March 2020 [2]. The primary symptoms of Covid-19 are headaches, muscle pain, cough, common cold, occasional fevers, and, in several vulnerable cases, breathing problems [3, 4]. The disease can also be asymptomatic. Therefore, detecting its presence by clinical prognosis becomes cumbersome. It is currently confirmed with a Reverse Transcription Polymerase Chain Reaction (RT-PCR) test, which is considered the gold standard [5]. However, the test is expensive and time consuming, as it requires adequate testing centers and clinical experts. Medical experts and clinicians have tirelessly contributed towards the early results of screening trials of this virus. Obtaining test results quickly offers two main advantages: i) the subject can be moved to a diagnosis care center sooner, preventing further spread; and ii) the chances of recovery improve with a faster diagnosis.

Artificial Intelligence (AI) has enabled countless contributions in the field of medical imaging. Healthcare tools have advanced the quality of screening procedures in the Covid-19 era [6,7,8]. Machine Learning (ML) and Deep Learning (DL) based tools for Covid-19 prognosis and diagnosis have utilized statistical approaches to extract normal/abnormal patterns in chest Computed Tomography (CT) and/or X-rays [9]. The aim is to predict whether a lung region is likely affected by Covid-19, which reduces the prognosis time and helps determine the need for an RT-PCR test. Computer-Aided Diagnosis (CADx) tools built on DL using CT and X-ray images, including custom Neural Networks (NNs) and models with and without transfer learning, have been proposed [10,11,12,13,14].

Training and validating Covid-19 screening-based CADx tools typically involve acquisition of image data (positive and negative classes) and feature-based pattern analysis using imaging tools [15]. Up-to-date ML and/or DL models are deployed to reduce possible risks to human lives [16, 17]. We consider both types of chest image data, CT and X-ray images, and elaborate on the performance of imaging tools in relation to dataset size. We are aware of thousands of research articles published in the year 2020 [18]. We, however, consider only medical imaging tools that employ chest CT and X-ray image data, excluding pre-prints from sources such as arXiv, medRxiv, and TechRxiv.

The remainder of the paper is organized as follows. In “Medical imaging tools: Chest CT scans and X-rays”, we review Covid-19 screening models using chest CT images (ref. Chest CT imaging) and X-ray images (ref. Chest X-ray imaging). We then discuss how big big data needs to be in “How big data is big?”, taking both image modalities into account. “Conclusion” concludes the paper.

Medical imaging tools: Chest CT scans and X-rays

Chest CT imaging

As mentioned earlier, for Covid-19, we elaborate on the performance of chest CT imaging methods by taking dataset size into account. In what follows, we consider 16 different research articles that contributed to detecting Covid-19 positive cases in 2020 (see Table 1).

Table 1 Chest CT imaging tools, their datasets, and performance measured in Accuracy (ACC), Area Under the Curve (AUC), Specificity (SPEC), and Sensitivity (SEN)

Farid et al. [19] devised a Convolutional Neural Network (CNN) based approach to classify Covid-19 and SARS images (51 each class). Using 10-fold cross-validation, they reported an accuracy of 94.11%. Singh et al. [20] developed a CNN using a multi-objective differential evolution (MODE) technique. Using 150 CT images (75 each class) and hold-out validation (90 : 10), an accuracy of 93.25% was reported. Hasan et al. [21] used handcrafted features from Q-deformed entropy to distinguish between lung scans, pneumonia, and Covid-19 CT slices. A long short-term memory (LSTM) architecture enabled them to achieve 99.68% accuracy on 321 subjects. A notable study was conducted by Mukherjee et al. [22], where they engineered a CNN-tailored Deep Neural Network (DNN) that can be trained/tested collectively on both CT scans and chest X-rays (CXRs). In their experiments, they achieved an overall accuracy of 95.83% (AUC = 0.9731) for CT scans. Xu et al. [23] learned features to distinguish Influenza-A viral pneumonia from Covid-19 (source: Medical Centers, China). With variants of CNN and a pre-trained DNN, namely ResNet18, they achieved an accuracy of 86.7% and F1-score of 81.1% on 618 CT images in total. Loey et al. [24] used 5 different DNN architectures, namely AlexNet, VGG16, VGG19, GoogleNet, and ResNet50. Using data augmentation (dataset of size 742 images) with Conditional Generative Adversarial Networks (CGAN), they achieved an accuracy of 82.91%, sensitivity of 77.66%, and specificity of 87.62% with the ResNet50 classifier. Wu et al. [25] analyzed 495 CT subjects that were collected from three different hospitals in China. They used a DL-based multi-view fusion model and classified Covid-19 and pneumonia with an accuracy of 0.76 and AUC of 0.819 on the testing set, comprising 50 subjects. Pathak et al. [26] conducted an experiment with Covid-19 CT images using a deep transfer learning method based on a pre-trained ResNet50 baseline architecture. Using a 10-fold cross-validation approach on a balanced dataset of size 826, they achieved an accuracy of 93.01%. Amyar et al. [27] optimized segmentation and classification performances by training/validating on 1,369 images, including 449 Covid-19 CT images. They achieved a Dice coefficient of 0.88 and an AUC of 97%. Li et al. [28] used CT data collected across 6 different hospitals. Using a ResNet50 architecture on a dataset of 3,322 subjects, they achieved an AUC score of 0.96. Ardakani et al. [29] utilized 1,020 Covid-19 affected CT images. They studied 10 different DNN architectures, and achieved the best accuracy of 99.51% (with AUC = 0.994 and sensitivity = 100%) from the ResNet101 model. Ko et al. [30] used four DNNs, namely VGG16, ResNet50, InceptionV3, and Xception. With access to 3,993 CT images (Covid-19 (1,194), other pneumonia (1,357), and non-pneumonia (1,442)) across two hospitals and a public database, ResNet50 achieved the best accuracy of 99.87%. Alshazly et al. [31] experimented on two different CT datasets and used seven different DNNs. They used k(= 5) fold cross-validation, and achieved accuracies of 99.4% and 92.9% on the two separate datasets, respectively. Ni et al. [32] implemented a deep learning model to train and validate with CT data acquired from 14,435 subjects. The method detects, segments, and locates lesions, with a sensitivity of 100% and an F1-score of 97% on a per-patient basis. Zhou et al. [33] ensembled (majority voting) AlexNet, GoogleNet, and ResNet18 architectures. With a transfer learning approach and a k(= 5) fold cross-validation training procedure involving 7,500 CT images, equally distributed between lung tumor, Covid-19 positive, and normal classes, they achieved an accuracy of 99.05%. Chen et al. [34] developed a Covid-19 CT screening tool validated on 46,096 images from Renmin Hospital of Wuhan University. Using a model pre-trained on the ImageNet dataset, they achieved 95.24% and 96% accuracies on internal and external test datasets, respectively.
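
Several of the works above report k-fold cross-validation (k = 5 or k = 10). As a minimal sketch of how such a protocol partitions a dataset, the following assumes a stratified variant, where each fold keeps the class proportions of the whole set. The function name `stratified_kfold` and the toy label list are hypothetical and are not taken from any of the reviewed articles.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal class share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical balanced toy set: 75 labels per class (150 in total).
labels = ["covid"] * 75 + ["non-covid"] * 75
splits = list(stratified_kfold(labels, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every image appears in exactly one test fold, so each reported score is obtained on data that the corresponding model did not see during training. This holds only if images from the same subject do not end up in different folds, which the sketch does not check.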

Chest X-ray imaging

As with CT imaging tools/techniques, we review 24 different works, as shown in Table 2.

Table 2 Chest X-ray imaging tools, their datasets, and performance measured in Accuracy (ACC), Area Under the Curve (AUC), Specificity (SPEC), and Sensitivity (SEN)

Alqudah et al. [35] used a CNN to extract features from 79 images in total, and reported an accuracy of 95.2%. Ucar and Korkmaz [36] employed a Bayesian optimization procedure with a SqueezeNet network. On a dataset of 6,000 images, they achieved an overall accuracy of 98.26%. Loey et al. [37] used three different deep transfer models, namely ResNet18, GoogleNet, and AlexNet, to classify between four classes: bacterial pneumonia, viral pneumonia, normal, and Covid-19 positive cases. On 307 images, they reported an accuracy of 100% with GoogleNet when classifying Covid-19 vs normal. Ozturk et al. [38] conducted both binary classification (no findings vs Covid-19) and multi-class classification (no findings vs Covid-19 vs pneumonia) using a DarkNet model. They achieved 98.08% and 87.02% accuracies for binary and multi-class classification, respectively, trained and validated on a dataset of size 1,127 images. A notable study was conducted by Mukherjee et al. [39], where 260 X-ray images were used. Using their shallow CNN, they reported an AUC of 0.9869 and accuracy of 96.92%, where k(= 5) fold validation was employed. Ozcan [40] developed a grid search approach with ResNet18, ResNet50, and GoogleNet. On a dataset of size 721 images, the best accuracy and F1-score of 97.69% and 96.60%, respectively, were obtained with the ResNet50 architecture. Civit et al. [41] implemented a VGG-16 based CADx tool to identify Covid-19 and pneumonia with an AUC of 0.9 and sensitivity of 100%, when trained and validated on 396 images. Rahimzadeh and Attar [42] employed Xception and ResNet50V2 models to classify unbalanced classes, comprising 180 Covid-19, 6,054 pneumonia, and 8,851 normal images. Using a k(= 5) fold validation approach, they obtained accuracies of 99.5% (overall) and 91.4% (between folds). Ismael and Şengür [43] used a ResNet50 and SVM classifier on 380 images (Covid-19: 180, normal: 200) and achieved an accuracy of 94.74%. Vaid et al. [44] used a VGG19 model and achieved an accuracy of 96.3% on a dataset of size 545 images (Covid-19: 181 images). Panwar et al. [45] used 192 Covid-19 images (337 images, in total) to train a CNN model with a VGG16 base, and achieved an accuracy of 97.62%. Nour et al. [46] used a dataset of size 2,033 images, where viral pneumonia, normal, and Covid-19 positive cases were taken. Using a CNN to extract features and k-nearest neighbor, decision tree, and SVM to classify, they achieved the best results from SVM (F1-score: 96.72%). Apostolopoulos and Mpesiana [47] used two different datasets that include Covid-19, normal, bacterial pneumonia, and bacterial and viral pneumonia. Using separate train and test sets, they reported the best results from VGG19 (accuracy: 93.48%) on the first dataset, and from MobileNet V2 (accuracy: 94.72%) on the second dataset. Toraman et al. [48] used a CNN CapsNet to classify, and achieved an accuracy of 84.22% in detecting Covid-19 positive cases. Brunese et al. [49] used 6,523 images, and their CADx tool (transfer learning using a VGG16 network) classified Covid-19 positive cases with an accuracy of 97%. Jain et al. [50] used data augmentation and built a dataset of size 1,832 from 1,215 images. Using ResNet50, their overall precision was 96.39% and F1-score was 98.15%. Khan et al. [51] used XceptionNet to classify Covid-19 positive cases. Using 1,251 images (Covid-19 cases = 284), they achieved an accuracy of 89.6% and recall rate of 98.2%. Sitaula and Hossain [52] used an attention based VGG-16 network on three different datasets. They achieved the best classification accuracy of 87.49% on a dataset of size 2,138 images. Sitaula and Aryal [53] used Bag of Deep Visual Words (BoDVW) to extract deep features and SVM to classify images. They performed separate training and validation on four different datasets. The best performance was achieved on the dataset with 2,138 images (accuracy: 87.92%). Wang et al. [54] made their dataset (titled Covid-x, collected from five datasets: 13,975 images) publicly available. Using their custom CNN, they reported an accuracy of 92.4% and a sensitivity of 80%. Ismael and Şengür [55] used an ELM classifier, ResNet50, and SVM to extract and classify deep features. With 561 images (Covid-19: 361, normal: 200), they achieved an accuracy of 99.29%. Marques et al. [56] employed a DNN algorithm known as EfficientNet to detect Covid-19 positive cases. In their test on 1,508 images (Covid-19 cases = 504), they achieved an accuracy of 96.70% (multi-class). Das et al. [57] used chest X-rays of different categories (TB, Covid-19 positive, pneumonia, and control) and divided them into six different datasets. They trained a truncated Inception-V4 architecture and tested it on these six datasets separately using a cross-validation approach. This allowed them to achieve an average accuracy of 98.77% with a standard deviation of ± 0.702.
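
The measures used in Tables 1 and 2 (ACC, SEN, SPEC) and the F1-score can all be derived from the four counts of a binary confusion matrix. The sketch below is illustrative only: it takes the class counts reported for [42] (180 Covid-19, 6,054 pneumonia, 8,851 normal), collapses them into Covid-19 vs non-Covid-19 (our simplifying assumption), and evaluates a hypothetical classifier that never predicts Covid-19. It does not reproduce any result from [42].

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"ACC": accuracy, "SEN": sensitivity, "SPEC": specificity, "F1": f1}


# Class counts as reported for [42]; the binary grouping is an assumption.
covid, non_covid = 180, 6054 + 8851

# Hypothetical degenerate classifier that labels every image as non-Covid-19.
trivial = binary_metrics(tp=0, fp=0, tn=non_covid, fn=covid)
print({name: round(value, 4) for name, value in trivial.items()})
```

Under these assumptions, accuracy is close to 98.8% although sensitivity is zero. On unbalanced data, accuracy is therefore best read together with sensitivity and specificity.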

How big data is big?

Needless to say, the aforementioned research articles (see Tables 1 and 2) have used different feature extractors, decision-making processes, and experimental setups. More importantly, for Covid-19, their dataset sizes have varied over time, and so have their sources. For a fair analysis, we do not discuss their methodologies and/or techniques; we rather focus on dataset size. We then elaborate on the strength of machine learning and deep learning algorithms by taking the following factors into account: fitting, transfer learning in the era of deep learning, and data augmentation.

  1.

    Dataset: For easy understanding, we organize research articles, in both Tables 1 and 2, in accordance with the dataset size. In machine learning, it is commonly stated that the bigger the data, the better the performance. This does not hold true here, as what matters is collecting all possible Covid-19 manifestations rather than just increasing the number of images. We have not observed better results from bigger datasets.

    We are aware that collecting data for Covid-19 at the beginning of the year 2020 was not trivial. Authors, however, worked on a fairly large dataset of 46,096 images (chest CT) in late 2020, as compared to datasets of 100 images or so in early 2020. This, again, does not guarantee that imaging tools are ready for mass-screening. If so, then how big is big data? Machine learning tools need to learn all possible manifestations related to a particular disease (Covid-19, in our case); dataset size alone is not enough. A larger dataset, however, only opens the possibility of including new cases (i.e. manifestations); it does not always do so.

  2.

    Model fit (over-fitting and under-fitting cases): Apart from model fitting issues, multiple works suggest using deep CNNs. However, comparing them with shallow CNNs, we find only marginal differences in performance. The advantages of computer vision tools in this modern era have allowed researchers to leverage datasets of any size and focus on methods that guarantee better performance in validation and testing, both internal and external. Traditionally, in machine learning, under-fitting and over-fitting situations are explicitly discussed/analyzed. They, however, have not been analyzed well for Covid-19 screening tools (see Tables 1 and 2). More often, authors were engaged in producing better performance scores by tuning (hyper)parameters. In such cases, better results may be due to the test set containing images similar to those in the training set. In particular, a hold-out validation approach is one of the issues. Also, performance can be biased when imbalanced datasets are used.

  3.

    Transfer learning: In the deep learning era, the idea of transfer learning plays a crucial role in the computer vision field. It focuses on gaining knowledge while solving one problem and applying it to different but related problems. The primary idea is to initially train models on a larger dataset to learn basic details (e.g. visual cues, such as edges, nodes, and shapes). The trained models can then be used on the target dataset, so that these basic features are already available when learning on the target data. For Covid-19 imaging tools, we observe that a handful of authors used transfer learning. They, however, reported better scores rather than explainable features/models. This raises an open question: do their performance scores show that their imaging tools (with transfer learning) are robust enough to generalize?

  4.

    Data augmentation: Availability of data is a serious challenge in deep learning, especially in healthcare. Even when sufficient data have been collected in one domain, the trained model may not necessarily generalize to another application (even in the exact same domain but a different application). This requires domain adaptation, a sub-field of transfer learning that helps alleviate the domain shift in such cases. Covid-19 is no exception to this.

    Data augmentation is often used in data analysis to increase the available raw data by adding slightly modified copies of the source or, in some cases, synthetic images generated from existing data. In general, it includes horizontal or vertical flips, rotation, noise injection, cropping, color modification, and random erasing. Although data augmentation has largely contributed to general object detection and recognition, it faces challenges when clinical experts look for clinical implications. Even though the process seems trivial in the computer vision domain, augmented data may not carry clinical significance (e.g., for Covid-19 positive, lung cancer, pneumonia, or normal classes).
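
Three of the augmentation operations listed under data augmentation (horizontal flip, rotation, and noise injection) can be sketched in a few lines. This is a minimal illustration on a hypothetical grayscale patch stored as nested lists; the function names, the noise level `sigma`, and the patch values are our assumptions and do not correspond to any reviewed tool.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_noise(image, sigma=5.0, seed=0):
    """Add Gaussian noise and clip the result to the 8-bit range [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


# Hypothetical 2x3 grayscale patch standing in for a chest image.
patch = [[10, 20, 30],
         [40, 50, 60]]

augmented = [horizontal_flip(patch), rotate_90(patch), add_noise(patch)]
for variant in augmented:
    print(variant)
```

None of these operations checks whether the result is clinically plausible. A horizontally flipped chest image, for example, swaps the left and right lungs, which is the kind of clinical concern raised above.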

Conclusion

In this paper, for Covid-19 screening, we have analyzed 40 research articles (16 CT + 24 X-ray), excluding pre-prints and conference proceedings. Our analysis is limited to whether the performance scores of medical imaging tools depend on dataset size. In both image modalities, CT and X-ray, we have observed that performance did not improve in accordance with dataset size. In addition, we have noticed the possibility of over-fitting in early 2020. On the other hand, we have not observed that a large dataset improved results, since it did not guarantee that all possible Covid-19 manifestations were included. Besides, we have observed that data augmentation worked well in improving results. We, however, did not find whether the augmentation process can create new Covid-19 manifestations. As reported in the computer vision domain, transfer learning could possibly make a Covid-19 deep learning model ready with fewer data. This did not hold true for Covid-19 cases, as most of the models are limited to education and training.

Therefore, for an outbreak such as Covid-19, we need to deploy AI-driven Covid-19 screening tools that consider active learning with the aim of developing cross-population train/test models [15]. Active learning helps models learn from data over time, so we are not required to wait for weeks, months, or years to build AI-driven tools.
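
As a minimal sketch of one common active learning strategy, the following selects the unlabeled images on which a model is least certain, so that experts label those first. We assume pool-based uncertainty sampling for a binary task; this is one possible reading of active learning and not necessarily the scheme proposed in [15]. The function name and the probability values are hypothetical.

```python
def select_for_labeling(probabilities, budget):
    """Return indices of the `budget` most uncertain predictions.

    Uncertainty is measured by how close the predicted Covid-19
    probability is to 0.5 (binary uncertainty sampling).
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical model outputs for six unlabeled chest images.
pool_probs = [0.97, 0.52, 0.08, 0.45, 0.71, 0.99]
to_label = select_for_labeling(pool_probs, budget=2)
print(to_label)
```

In each round, the selected images would be labeled by experts and added to the training set before the model is updated, so the tool can improve as new cases arrive.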