1 Introduction

In December 2019, an outbreak of pneumonia of unknown cause in Wuhan was traced to a novel coronavirus infection, and the resulting disease was named COVID-19 [1]. Coronaviruses range from common cold-causing viruses to agents of far more serious respiratory conditions such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), with mortality rates of roughly 10% for SARS and 37% for MERS. The newest coronavirus, named SARS-CoV-2 by the International Committee on Taxonomy of Viruses, causes a severe respiratory syndrome [2]. The World Health Organization (WHO) named the resulting disease COVID-19 and confirmed that it can be transmitted through human contact. By the end of January 2020, the WHO declared the COVID-19 outbreak a global public health emergency. The current situation calls for a quick and robust identification system that can be deployed in hospitals and clinics to help diagnose and manage COVID-19.

Computed tomography (CT) imaging is likely to be increasingly used for the management of COVID-19 because scanning is fast, noninvasive, and painless for the patient, and because of its high ability to identify very small nodules in the human lung, as was proven for the diagnosis of cancer at a curable stage [3]. Previous chest CT protocols were implemented on three systems, one from United Imaging (China), the Optima 660 from GE (USA), and the Somatom Definition AS+ from Siemens (Germany), with patients in the supine position. The reconstructed scans were intensively analyzed by radiologists, who classified chest CT images as positive or negative for COVID-19 based on the following main features [4].

Fig. 1 Main features for COVID-19 infection detection at the early stage

  • Ground-glass opacity (GGO), the most relevant feature in COVID-19 screening at the initial phase, typically presenting as a unifocal lesion most often identified in the inferior lobe of the right lung (Fig. 1a);

  • Crazy paving (CP), which commonly accompanies GGO at a later stage of the disease (Fig. 1b);

  • Vascular dilatation (VD), a typical feature identified as a widening of the vessels in the lungs (Fig. 1c);

  • Traction bronchiectasis (TB), a common finding within the ground-glass areas (Fig. 1d).

Note that CT severity is estimated by a visual assessment of lung involvement, a late-stage diagnosis made by experts [5]. In this study, we focus on diagnosis at the early stage by providing aided screening for the features shown in Fig. 1.

Deep learning has gained growing interest from researchers in the field of medical image analysis. It offers advantages both in the relevance of its feature representations and in the precision of its final layers when producing a prediction [6]. The authors in [2] studied the relevance of the features presented in Fig. 1 by examining 51 patients with a confirmed COVID-19 diagnosis and reviewing the clinical features of the obtained CT images. The study showed that the COVID-19 virus can affect all lobes and that the GGO pattern is the main sign of COVID-19 lesions on CT images: irregular areas of GGO appeared in 49 of the 51 patients (96.1%), the crazy-paving pattern in 36 (70.6%), vascular dilatation in 42 (82.4%), and traction bronchiectasis in 35 (68.6%).

The authors in [7] proposed an automated early detection system for COVID-19 using a deep learning technique that combines a convolutional neural network (CNN) with long short-term memory (LSTM) to detect lung infection from X-ray images. In that work, the sequence of a CNN followed by an LSTM learns features efficiently from both the X-ray image and its ground truth. The experimental results show that the proposed system reached an accuracy of 99.4%, a specificity of 99.2%, and an F1-score of 98.9%. Addressing the current diagnostic challenge posed by COVID-19, the authors in [5] presented thoracic imaging for a more specific diagnosis of the disease. They evaluated the performance of the thoracic imaging-based system on people with suspected COVID-19 and affirmed that chest CT and X-ray are sensitive and moderately specific for the diagnosis of COVID-19, which justifies the use of this imaging modality in our work. Their system achieved the desired results on the dataset available to them.

Fig. 2 The proposed method

In [8], the authors proposed a hybrid architecture that adds BiLSTM layers after feature extraction from CT images to learn the probability of COVID-19 infection. This pipeline of a CNN followed by a recurrent neural network proved efficient for diagnosing SARS-CoV-2 infection, and the results show that their model achieves outstanding success in infection detection. The authors in [9] proposed optimal CNN hyperparameters and a data augmentation process, which significantly increased the accuracy of CNNs in COVID-19 detection; their architecture, CovidXrayNet, achieved an accuracy of 95.82% on their collected COVIDx dataset.

As previously demonstrated for computer-aided cancer diagnosis systems, which have shown impressive results [10], successful classification of CT images for diagnosing COVID-19 lung infection depends essentially on the relevance of the representation of the nodules in the images [11, 12]. CNNs achieve very impressive accuracy in pattern recognition and computer vision, and a CNN architecture may be able to extract more discriminant features to represent nodules in lung CT images. Many researchers have applied machine learning techniques to develop models that generalize well and make good predictions on unseen samples.

The contributions of the paper are presented as follows:

  • Designing and building a deep convolutional neural network model that correctly detects patients with COVID-19 to assist in early diagnosis;

  • Investigating the impact of the YCbCr color space and of the morphological transformation of CT images on enhancing COVID-19 detection;

  • Carrying out an empirical analysis of our deep model in order to classify COVID-19 disease;

  • Evaluating the performance of our model compared to other existing well-known models;

  • Deploying the convolutional neural network model in a desktop application;

  • Helping researchers build artificial intelligence-based solutions to fight COVID-19.

The rest of the paper is organized as follows. Section 2 details the materials and methods used to develop our morphological convolutional neural network. Section 3 presents the experimental analysis of the proposed model, including how we evaluated its performance and the analysis of the obtained results. Section 4 concludes this study.

2 Methodology

The main contribution of this study is an evaluation of networks of increasing depth using a neural architecture with very small \((3\times 3)\) convolution filters, which demonstrates that a significant enhancement over existing configurations can be reached by reducing the depth to three layers. We have made our best-performing model available to facilitate further research on the use of deep neural representations in fighting epidemics with computer vision tools. The pre-processing applied before the proposed CNN architecture plays a major role in improving classification accuracy: an erosion followed by a dilation removes white noise, with the erosion shrinking the region of infection in the CT images. Once the noise is deleted by the erosion, the dilation enlarges the region of interest again, as shown in Fig. 2. In this work, we also convert all CT images to the YCbCr color space [13] to cope with the problem of scan reflection in the images by separating the luminance component from the other channels.

As a result, we obtain a significantly accurate CNN architecture that not only gives better results on the collected data but also achieves this performance with a very simple pipeline and low computational cost, which matters given the scarcity of data for training such models.

2.1 Morphological Initialization

A morphological operation is a shape-based transformation of an input CT image: the value of each pixel is transformed based on a comparison with its neighbors. We use such operations to initialize the input of the neural network, highlighting the differences between regions of the segmented lungs [14] so that the areas of interest that can inform experts about the infection are detected and well represented.

Fig. 3 The morphological transformation of all CT scans

The most relevant morphological operators used in object detection are erosion and dilation. The pipeline of the morphological transformation of all CT scans is illustrated in Fig. 3.

Erosion (\(\ominus \)) reduces the size of patterns in the CT images to delete anomalies in the object to be represented by the convolutional neural network. Let f(x) denote an input image and b(x) the gray-scale structuring element. With B the domain on which b(x) is defined, the erosion of f by the structuring element b is given by Eq. (1), as presented in [14]. Dilation (\(\oplus \)) fills in holes and connects separated areas by adding pixels, as given in Eq. (2) [14]; this operator is used to increase the size of the infected region in the lung images.

$$\begin{aligned} (f\ominus b)(x)= \inf _{y\in B}[f(x+y)-b(y)] \end{aligned}$$
(1)
$$\begin{aligned} (f\oplus b)(x)= \sup _{y\in B}[f(y)+b(x-y)] \end{aligned}$$
(2)
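As an illustration, this erosion-then-dilation pair corresponds to a morphological opening and maps directly onto MATLAB's Image Processing Toolbox. The sketch below is minimal and hedged: the disk-shaped structuring element and its radius are our assumptions, not values stated above.

```matlab
I  = imread('ct_scan.png');   % hypothetical input CT slice
se = strel('disk', 3);        % structuring element b; shape and radius are assumptions
Ie = imerode(I, se);          % erosion, Eq. (1): removes small white noise
Io = imdilate(Ie, se);        % dilation, Eq. (2): re-expands the region of interest
% The two steps together form a morphological opening: Io = imopen(I, se);
```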

2.2 Color Space Transformation

In this study, we observed that CT scans of the lungs reflect significant illumination from the scanner. The default color space used in CNN architectures is RGB.

Fig. 4 From RGB to YCbCr transformation of a sample from the CT scan dataset

To reduce this intense lighting effect of the scanner on the lung images, we choose the YCbCr color space, which separates the luminance component Y from the Cb and Cr chrominance components. The conversion of all morphologically transformed CT scans into the YCbCr color space is shown in Fig. 4. This transformation proved its relevance for highlighting the nodules in the CT scans by reducing the white regions caused by the illumination reflected by the scanner.
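In MATLAB, this conversion is a single built-in call; a minimal sketch, continuing from the morphologically opened image Io of the previous snippet:

```matlab
ycbcr = rgb2ycbcr(Io);   % requires a three-channel RGB input
Y  = ycbcr(:,:,1);       % luminance channel, which carries the scanner reflection
Cb = ycbcr(:,:,2);       % blue-difference chrominance
Cr = ycbcr(:,:,3);       % red-difference chrominance
```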

2.3 Architecture

In this study, we investigate the development of new CNN architectures with different depths of convolution layers. Given the small nodules found in lung CT scans for COVID-19 detection, the motivation was to propose a CNN architecture with a local focus on all details of the CT images. The lack of data in our case still prevents training convolutional neural network architectures from scratch. Many works in the state of the art use a transfer learning-based approach for image classification, taking a pre-trained CNN as a starting model to learn a new classification task. This method has gained interest from researchers because fine-tuning a CNN with transfer learning is much faster than training from scratch: the top of the convolutional neural network is removed and the fully connected layers are replaced by new dense layers that learn the new classification task. This transfer learning-based approach achieves impressive results when the images of the new classification task are similar to those seen by the pre-trained models.

In this study, we cope with the dissimilarity between CT scan images and ImageNet images, which represent objects among 1000 classes [15], by adopting a freezing-based technique for transfer learning on CT scans for COVID-19 detection. Freezing some layers before the fully connected stage controls the training of the weights: once a layer is frozen, its weights are no longer modified. The motivation for a freezing-based learning approach is twofold: existing CNN architectures are suited to recognizing everyday objects such as people, tables, cars, and cats, and data from COVID-19-infected patients are scarce. Because lung scanner images differ from ImageNet images, transfer learning by replacing only the last layers was tested and did not give good results: the features at the last fully connected layers are relevant for differentiating ImageNet objects but not for classifying scanner images, whose textures, shapes, and colors differ considerably.

In this work, we studied several CNN architectures: AlexNet [16], VGG16 [17], VGG19 [18], GoogleNet [19], and ResNet [20]. The residual convolutional neural network (ResNet) proposed in [20] was trained on over a million images from the ImageNet database [21]; once trained, the ResNet architecture provides relevant feature representations of images. In our freezing-based transfer learning approach, we adopt the residual block implemented in ResNet. The network requires input images of size 224-by-224-by-3. We use an augmented image datastore to automatically resize the training images, then apply supplementary augmentation: the training images are randomly flipped along the vertical axis and randomly translated up to 5 pixels horizontally and vertically, avoiding any drastic alteration of the CT images. This data augmentation step helps prevent the network from overfitting.
After the frozen layers, we apply a dropout value of 0.5 during training so that only reliably activated neurons contribute to predicting an input CT scan, and we demonstrate how to efficiently train a CNN without training it from scratch. Moreover, the proposed freezing-based learning of ResNet-101 accelerates training, since the number of backward passes through trainable weights decreases and the computation time drops accordingly.
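A minimal MATLAB sketch of this freezing-based setup follows. It assumes the pre-trained resnet101 from the Deep Learning Toolbox; the head layer names ('fc1000' and the classification layer) can vary across releases and should be checked against lgraph.Layers, and freezing exactly the first 240 entries of the layer array is our reading of the setup described above.

```matlab
net    = resnet101;                 % pre-trained on ImageNet
lgraph = layerGraph(net);

% Freeze the first 240 layers: zero learning-rate factors stop weight updates.
for i = 1:240
    L = lgraph.Layers(i);
    if isprop(L, 'WeightLearnRateFactor')
        L.WeightLearnRateFactor = 0;
        L.BiasLearnRateFactor   = 0;
        lgraph = replaceLayer(lgraph, L.Name, L);
    end
end

% Replace the 1000-class ImageNet head with dropout plus a 2-class head.
newHead = [
    dropoutLayer(0.5, 'Name', 'drop')
    fullyConnectedLayer(2, 'Name', 'fc2')];
lgraph = replaceLayer(lgraph, 'fc1000', newHead);
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'covid_output'));
```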

In this paper, the residual deep neural network is used with parameters pre-trained on the ImageNet database and extracts sparse, pertinent numerical representations of the spatial features of CT images.

The architecture is composed of several residual blocks whose convolutional layers use small kernels of size 7 \(\times \) 7, 1 \(\times \) 1, and 3 \(\times \) 3; the prediction is obtained by average pooling over the final feature map of the network, followed by a fully connected layer. The 1 \(\times \) 1 \(\times \) 2048 output of the last pooling layer is treated as a 2048-dimensional residual feature vector generated from the reused pre-trained model in a feed-forward pass. We then compute the product of the weights and the small regions of interest, and each additional layer must be able to realize the identity function: if a new additional layer is trained into an identity mapping, \(f(x) = x\), the resulting model is at least as effective as the initial one. The new model may therefore deliver a better solution when tuned on the training dataset and extract deeper, sparse, and pertinent residual representations of the spatial features of the CT images, since the additional layers can help reduce the training error.
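For readers who want to inspect these 2048-dimensional residual features directly, MATLAB's activations function can read them off the global average pooling layer. A short sketch, under the assumption that this layer is named 'pool5' in the pre-trained resnet101 (check net.Layers if it differs) and that imdsTest is a hypothetical image datastore:

```matlab
augTest = augmentedImageDatastore([224 224 3], imdsTest);   % resize to the network input
feat = activations(trainedNet, augTest, 'pool5', 'OutputAs', 'rows');
size(feat)   % N-by-2048: one residual feature vector per CT image
```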

2.3.1 Training & Testing

The training process of the proposed architecture follows the work done in [22] and uses back-propagation based on the gradient descent algorithm proposed by LeCun et al. [23]. The learning rate is initialized at \(10^{-3}\) and decreased three times by a factor of 10 whenever the validation performance stops improving. We set the batch size to 32 images and the number of epochs to 17. Moreover, the network's weights have to be well initialized to overcome the instability of the gradient descent algorithm: we initialized the weights randomly, sampling them from a zero-mean normal distribution, and all biases start at zero.
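These hyperparameters translate directly into MATLAB's trainingOptions. In the hedged sketch below, the drop period and validation patience are assumptions standing in for "decrease when validation performance stops improving", and augTrain/augValid are hypothetical augmented image datastores:

```matlab
opts = trainingOptions('adam', ...
    'InitialLearnRate',    1e-3, ...
    'LearnRateSchedule',   'piecewise', ...
    'LearnRateDropFactor', 0.1, ...     % each drop divides the rate by 10
    'LearnRateDropPeriod', 5, ...       % assumption: drop every 5 epochs
    'MiniBatchSize',       32, ...
    'MaxEpochs',           17, ...
    'Shuffle',             'every-epoch', ...
    'ValidationData',      augValid, ...
    'ValidationPatience',  5, ...       % assumption: early-stopping patience
    'ExecutionEnvironment','gpu');
trainedNet = trainNetwork(augTrain, lgraph, opts);
```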

In the testing phase, after obtaining the trained model, we pass a set of test images through it to predict whether the patient is infected by COVID-19. Note that each test image is pre-processed by an erosion followed by a dilation, exactly as during training. The output is a class score map whose number of channels equals the number of classes (COVID-19 or healthy).
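End to end, test-time prediction chains the same pre-processing with classify; a minimal sketch with hypothetical file and variable names:

```matlab
I = imread('test_ct.png');                  % hypothetical test CT slice
I = imopen(I, strel('disk', 3));            % same erosion-then-dilation as in training
I = rgb2ycbcr(I);                           % same color space conversion
I = imresize(I, [224 224]);                 % match the network input size
[label, scores] = classify(trainedNet, I);  % COVID-19 vs. healthy, with class scores
```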

2.4 Implementation Details

The implementation of the neural architecture for CT image classification is derived from the publicly available MathWorks toolbox.

An insufficient amount of training data may lead to overfitting. To increase the size of the training data and obtain more reliable predictions from deep neural architectures, which require many training cases, the existing database is augmented.

The solution consists of generating synthetic data to include in the training image set. We randomly split the original data into two sets: training (70%) and testing (30%). Afterward, we create synthetic images using several data augmentation techniques. The first are obtained by image rotation, with angles from \(-10^\circ \) to \(10^\circ \) in steps of \(2^\circ \) to avoid major transformations of the lung CT images. Next, zero-mean Gaussian noise with a variance of 0.01 is added. We then apply an affine transform that randomly generates five different parameter sets for each original image. Finally, a translation of \(-1\) to 1 pixel is applied to each image of the training set. To overcome the lack of training data when learning the proposed model, which is complex with millions of parameters, we thus add data augmentation before training and inject Gaussian noise within the stacked convolutional layers. Early stopping of the training of the proposed CNN is included to find the appropriate number of epochs and avoid overfitting the model.
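A sketch of these four augmentations using standard Image Processing Toolbox calls is given below; the affine parameter ranges are illustrative assumptions, since they are not listed above:

```matlab
% Rotations from -10 to 10 degrees in steps of 2 degrees.
for angle = -10:2:10
    Irot = imrotate(I, angle, 'bilinear', 'crop');
end

% Zero-mean Gaussian noise with variance 0.01.
In = imnoise(I, 'gaussian', 0, 0.01);

% Random affine transform (parameter ranges are assumptions).
tform = randomAffine2d('Rotation', [-5 5], 'Scale', [0.95 1.05]);
Ia = imwarp(I, tform, 'OutputView', imref2d([size(I,1) size(I,2)]));

% Random translation of -1 to 1 pixel in each direction.
It = imtranslate(I, [randi([-1 1]), randi([-1 1])]);
```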

Fig. 5 Two samples from the training part of COVID-19 and normal cases

To enable a comparison with the work of the authors in [22], we partition the database into three sets: training, validation, and testing, as detailed in Fig. 5. We kept the same data partition used in the baseline method [22], as illustrated in Fig. 6.

Fig. 6 Two samples from the training part of COVID-19 and normal cases

3 Experimental Study

3.1 Materials

This study includes 143 patients with confirmed COVID-19 infection. The dataset was collected from 760 preprints on the COVID-19 virus posted between January 19 and March 25, 2020; the CT scan figures were extracted from these publications using the PyMuPDF document tool.

Since COVID-19 is so recent, we did not find a repository containing a labeled COVID-19 database, which motivated us to rely on different sources of images of normal, pneumonia, and COVID-19 cases. In fact, we base our research on the open-source COVID-CT dataset, which contains 349 COVID-19 CT images from 216 patients and 463 non-COVID-19 CTs. This database was proposed by the authors in [22], who report a baseline method based on a transfer learning approach using the DenseNet architecture [22] that achieved an accuracy of 84.70%.

The authors in [22] manually selected all suitable CT scans and annotated them based on the text describing the figures. The final collected dataset contains 257 CT scans of varying sizes annotated as positive COVID-19 cases. Figures 5 and 6 show samples of the CT scans used for the training, validation, and testing sets. We set the global learning rate to a starting value of 0.001; it is then decreased to 0.0001 to train further iterations with a batch size of 10 images, owing to the limited number of available CT images. The training process runs for 17 epochs.

The proposed model is written in MATLAB and was trained by shifting the computation-intensive tasks of the proposed CNN to the GPU to speed up the training process. To implement our convolutional neural network architecture, we used the latest version of MATLAB and its deep learning framework. Our hardware setup is as follows: training and evaluation were performed on a 7-core laptop with 20 GB of RAM and a 1 TB hard disk. For optimization, we used the Adam optimizer with the learning rate set to 0.001 and a weight decay of 0.0005. We follow the hyperparameter optimization study presented in [9].

3.2 Evaluation Setup

In this section, we present the CT image classification results achieved by the described convolutional neural network architecture on the collected database, which includes images of two classes and is split into three sets: training, validation, and testing. The classification performance is evaluated using two measures, the accuracy and the loss of the training and validation processes, given respectively by Eq. (3) and Eq. (4) as computed in [24]. The accuracy metric, generally used to measure how well a model has learned, is computed as in Eq. (3).

$$\begin{aligned} Accuracy=\frac{Number\,of\,correct\,predictions}{Total\,number\, of\,predictions} \end{aligned}$$
(3)

The loss function measures the difference between the predicted class and the true value under the learned model. In this work, we use the most common loss function, cross-entropy, as defined in Eq. (4).

$$\begin{aligned} Cross\text {-}entropy=-\sum _{i=1 }^{n}\sum _{j=1 }^{m}y_{i,j}\times \log (p_{i,j}) \end{aligned}$$
(4)
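Both measures are one-liners in MATLAB; a small sketch with hypothetical variables (predLabels/trueLabels are categorical label vectors, Y is an n-by-m one-hot matrix, and P holds the predicted probabilities):

```matlab
acc = mean(predLabels == trueLabels);   % accuracy, Eq. (3)
ce  = -sum(Y .* log(P + eps), 'all');   % cross-entropy, Eq. (4); eps guards log(0)
```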

Note that \(y_{i,j}\) (binary) is the true value, and \(p_{i,j}\) denotes the predicted probability that sample i belongs to class j. In this work, we evaluated the proposed CNN architecture for CT image classification and verified that representation depth matters for the accuracy of classifying COVID-19 patients from lung CT images. In this experimental study, we also use the precision and recall of the classes predicted by the proposed CNN architecture as evaluation metrics. For each CT image of the testing set, the model outputs its highest-scoring class, which is then compared with the true label. Precision denotes the fraction of predicted images that are well classified; recall analyzes the model's ability to recognize each class (COVID-19 and non-COVID-19) and is computed by dividing the number of correct classifications by the number of true labels in each class. Once the model is trained, the classification of an input image falls into one of the following categories: true positive (TP), true negative (TN), false positive (FP), or false negative (FN). The evaluation metrics recall, Eq. (5), and specificity, Eq. (6), are computed following the experimental study in [25].

The F1-score, Eq. (7), is the harmonic mean of the precision and recall values. To measure the balance between the classification performances on both classes, we compute the g-mean metric given in Eq. (8) [26].

$$\begin{aligned} Recall= & {} \frac{TP}{FN+TP} \end{aligned}$$
(5)
$$\begin{aligned} Specificity= & {} \frac{TN}{FP+TN} \end{aligned}$$
(6)
$$\begin{aligned} F1-score= & {} 2\times \frac{precision \times recall}{precision + recall} \end{aligned}$$
(7)
$$\begin{aligned} g-mean= & {} \sqrt{Recall \times Specificity} \end{aligned}$$
(8)
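All four metrics follow from the confusion matrix, which confusionmat builds directly; a sketch assuming the positive (COVID-19) class is listed first:

```matlab
C  = confusionmat(trueLabels, predLabels);   % rows = true class, columns = predicted
TP = C(1,1);  FN = C(1,2);  FP = C(2,1);  TN = C(2,2);

recall      = TP / (FN + TP);                                  % Eq. (5)
specificity = TN / (FP + TN);                                  % Eq. (6)
precision   = TP / (TP + FP);
f1          = 2 * precision * recall / (precision + recall);   % Eq. (7)
gmean       = sqrt(recall * specificity);                      % Eq. (8)
```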

All these metrics are computed from the confusion matrix given in Fig. 8, which illustrates the correctness and accuracy of the model.

In this experimental study, we set the learning rate to a small value (0.001) to slow down the learning of the newly replaced layers after freezing ResNet-101. We set the number of epochs to 17 and the mini-batch size to 20 to ensure that all training CT images are used during each epoch.

3.3 Results and Discussion

The experiments in this study analyze the performance of the freezing-based learning approach adopted to train the proposed architecture on the CT scan images. We use ResNet-101 as the backbone in this work, freezing the first 240 layers of the selected architecture. Several classification tests were carried out to evaluate the impact of freezing on the ResNet-101 architecture, as illustrated in Fig. 7.

Fig. 7 Impact of the learning method and the frozen layers in the proposed architecture training

Table 1 Impact of the morphological and color space transformations on CT image classification using the proposed architecture
Table 2 Comparison of the proposed architecture with well-known CNN architectures and the baseline [22]

Note that the ADAM algorithm [27], used for back-propagation training, quickly finds the best weights for the unfrozen CNN layers and the fully connected layers at the top of the proposed model: we achieve the best validation loss in fewer iterations than with the logistic function (sigmoid with \(\sigma =1\)). In radiological tasks, two challenging problems must be addressed: overfitting during neural training and the lack of sufficient data to support learning so many parameters. Overcoming these challenges in radiological diagnosis is needed to increase the performance of patient care. The experimental results on the available CT scan database demonstrate that the deep morphological CNN architecture achieves better performance than recent works on CT image classification for COVID-19 detection.

The robustness of the approach proposed in this work comes from the pre-processing stage: the conversion to the YCbCr color space reduces the effect of scan reflection, and the morphological operations slightly enlarge the nodules in the lung CT images so that they can be well represented.

Table 1 presents the improvement in accuracy obtained with and without the pre-processing step.

Fig. 8 Confusion matrix of the proposed CNN model on the testing partition

Fig. 9 Two COVID-19 samples misclassified by the proposed model

Fig. 10 Two non-COVID-19 samples misclassified by the proposed model

Using the proposed architecture as a base for finding the best representation of the nodules in lung CT images markedly reduces misclassification of patients infected with COVID-19. Table 2 gives an exhaustive comparison of the most popular CNN architectures on CT images for COVID-19 detection.

Fig. 11 AI4COVID-19 Detection from CT Scan Tool-V1.0

The proposed model still produces false classifications in some cases, as shown in Fig. 8. We achieve a classification rate of 93.80% for the COVID-19 CT scans and 81.70% for healthy cases.

Note that the proposed system is strongly affected by intense illumination in some cases due to scan reflection in the CT images. To examine the limitations of the proposed architecture, we analyze some relevant misclassified CT scans of both classes: COVID-19 and non-COVID-19. Figure 9 shows two misclassified samples of the COVID-19 class, for which we obtained classification scores of only 0.38 and 0.47, respectively. This can be explained by the even distribution of the lung pixels surrounding the COVID-19 nodules.

Table 3 Comparison of the computational times of the proposed approach and of transfer learning with well-known CNN architectures, for the training and testing phases

Regarding the failed classifications of the non-COVID-19 CT scans, we note that these images contain almost no visible nodules in the lungs on which to base the assessment of coronavirus infection. This false classification is illustrated in Fig. 10.

We also compare our model with the state of the art in terms of training and testing times. Table 3 reports the computational times in seconds for the different models tested in this work. For AlexNet, the elapsed times for the training and testing phases were 74,569.55 s and 145 s; VGG16 required 55,896.36 s for training and 241 s for testing. The CNN-LSTM and CNN-BiLSTM models take even more time, as shown in Table 3. Our proposed model has a higher training time than the other architectures because of the initial morphological transformation and the color space conversion, but its testing time is attractive, requiring only 105 s.

After validating the trained model on the collected data, this study delivers a new tool for detecting the COVID-19 virus from lung CT scans. The first version of this tool is made available to researchers and is illustrated in Fig. 11.

4 Conclusion

This paper proposed a CBIR system using a new configuration of a convolutional neural network for the detection of COVID-19, in which a morphological transformation initializes the CNN input. We observed that combining morphological pre-processing with CNN-based transfer learning increases detection performance compared with a traditional CNN alone, and we argue that it is a promising technique for image classification. In addition, we proposed optimal CNN hyperparameters and a data augmentation process that increase the diagnostic accuracy.

Moreover, we proposed a morphological deep convolutional neural network architecture that builds on recent work on the classification and detection of the COVID-19 virus in lung CT images. The proposed architecture combines the advantages of pre-processing the CT scans with morphological transformations and recent findings in CNN-based deep learning for representing and classifying medical images. The experimental study in this paper shows that the proposed architecture ranks among the state-of-the-art methods. The remaining misclassifications of the proposed model stem from the challenge of detecting small patterns and the limited relevance of the visual features; we will investigate this in future work.