Introduction

Since December 2019, an outbreak of a novel disease with significant inflammatory effects on the respiratory system, particularly the lungs, has been reported. The disease was first reported in Wuhan, China. The official report was presented in January 2020, and the disease was named COVID-19; the causative virus is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. Like other coronavirus diseases that have emerged in recent years, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), this disease is associated with acute respiratory distress syndrome (ARDS) [2]. Given the large number of patients referred to medical centers and the need for diagnosis and treatment, the real-time polymerase chain reaction (RT-PCR) test is currently considered the gold standard for identifying COVID-19. However, because of the possibility of false-negative results, low sensitivity [3, 4], and the long time required to obtain results, other diagnostic methods are being considered. In addition, the severity of the disease cannot be measured by PCR. Computed tomography (CT) can therefore be considered a potential tool for the diagnosis of COVID-19 [5, 6]. Lung CT images also contain important information about the stage of the disease. Having a radiologist examine the CT images of individuals suspected of having the disease allows quicker decisions and earlier isolation of patients during the treatment process.

According to the guidelines, COVID-19 can be diagnosed on lung CT images by ground-glass opacity (GGO), a crazy-paving pattern, and subsequent consolidation in the lungs [7, 8]. The time course of COVID-19 has four stages: the early stage (0–4 days after the onset of the disease), the progressive stage (5–8 days), the peak stage (9–13 days), and the absorption stage, which begins two weeks after the onset of the first symptoms [9]. The tissue patterns observed on CT images of the patients' lungs differ between these stages. CT images in the early stage show GGO; in the progressive stage, an increase in the crazy-paving pattern; in the peak stage, consolidation; and in the absorption stage, gradual resolution of consolidation without a crazy-paving pattern. The absorption stage indicates that the disease is under control and the consolidation is slowly being absorbed [9]. Classification based on these stages helps to decide how long patients need to be hospitalized, to determine the type of treatment required for patients in each category, and to reduce the cost of care and treatment for hospitalized individuals. The temporal changes in lung CT images of COVID-19 patients have been the subject of several studies [10,11,12,13]. Interpreting these images and their textural changes requires specialists who are not available everywhere, and the task is tedious and subjective. Therefore, radiomics and machine learning methods are increasingly needed.

In recent years, radiomics and machine learning methods have been used to diagnose and even predict a variety of diseases, such as brain tumors [14], breast cancer [15], cardiovascular disease [16, 17], and leukemia [18]. During the spread of COVID-19, several studies have used radiomics and machine learning algorithms to segment and categorize CT images and, consequently, to detect COVID-19 infection and predict the patient's condition [19, 20]. Ouyang et al. [21] introduced a sampling network that differentiated COVID-19 from community-acquired pneumonia (CAP) by focusing on the infected areas of the lungs. Lin et al. [22] developed and validated a radiomics model for predicting COVID-19 pneumonia to help clinicians quickly and accurately rule out influenza virus pneumonia. They showed that the radiomics model performs well in distinguishing COVID-19 pneumonia from influenza virus pneumonia. Yang et al. [23] proposed a classification method based on traditional machine learning and radiomics features of chest CT images to identify patients with COVID-19 and non-COVID-19 pneumonia. Their results showed that radiomics features can separate COVID-19 patients from patients with other types of pneumonia. Kadry et al. [24] developed a machine learning system for the classification of COVID-19 CT images. The proposed algorithm combines several methods, including multi-thresholding, image separation using a threshold filter, feature extraction, feature selection, feature fusion, and classification. Fu et al. [25] used radiomics and a machine learning-based tool to evaluate the prognosis of COVID-19 patients. The data were divided into a stable group and a progressive group, and imaging features were extracted from whole-lung images. They concluded that a machine learning-based radiomics signature of the whole lung could reveal changes in lung microstructure and help to indicate the progress of the disease. Tan et al. [26] used an automatic machine learning method to predict the clinical types of COVID-19 pneumonia by analyzing the non-focus region of the patients' lung CT images. Wang et al. [27] compared feature engineering methods to characterize image features in CT images of COVID-19 pneumonia and to identify the features that are significant for this pneumonia. Specific deep learning and radiomics features were then determined that could be used with these two methods to support human diagnosis of COVID-19 pneumonia. Guiot et al. [28] developed an automated artificial intelligence framework to extract radiomics features from chest CT images. Their results showed that the framework could accurately differentiate COVID-19 cases from other clinical cases. Wang et al. [29] evaluated the efficacy of radiomics for diagnosing patients with COVID-19 and other types of viral pneumonia based on CT images. The radiomic features significantly associated with the classification of COVID-19 were determined using multiple classification algorithms. Finally, Liu et al. [30] investigated chest CT radiomics for diagnosing COVID-19 pneumonia and developed an open-source diagnostic tool based on the constructed radiomics model. They concluded that the combined radiomics model could improve the performance of the clinical model and COVID-19 reporting for the diagnosis of COVID-19 pneumonia.

In the studies mentioned above, classification was limited to the presence or absence of COVID-19. Few studies have used radiomics to determine the stage or prognosis of this disease. In two recent studies, lung CT image segmentation was used to automatically quantify infection regions and then classify COVID-19 patients into three severity classes [31, 32]. However, these methods rely on segmenting the lesions of lung involvement, which is a difficult task. The aim of this study is to classify COVID-19 patients based on lung CT images into normal, early, progressive, peak, and absorption stages by using automatic radiomics based on a combination of statistical texture features and a Random Forest (RF) classifier, without any need for image segmentation. The proposed method can play an important role in predicting the course and progression of the disease and the condition of the infected patient, and can in turn guide decisions on the length of hospital stay, the choice of treatment, and the management of care costs.

Materials and methods

The present study uses a radiomics approach to classify the CT images of 683 individuals with suspected COVID-19 into five stages (normal, early, progressive, peak, and absorption), depending on the involvement of the lung parenchyma. The classification was based on the extent of the changes observed in the various slices of each patient's lung images. The image features of each individual were extracted and used as the basis for assigning that individual to a particular stage of COVID-19. In the preprocessing step, the CT slices containing the lungs were selected from all thoracic sections. Different algorithms were then used to extract the features: first-order statistical texture features from the image histogram, namely variance, skewness, and kurtosis [33], and second-order texture features from the gray-level co-occurrence matrix (GLCM [33, 34], 22 features), the gray-level run-length matrix (GLRLM [33,34,35], 13 features), the gray-level size-zone matrix (GLSZM [33, 34], 13 features), and the neighborhood gray-tone difference matrix (NGTDM [34, 36], 5 features). Consequently, a total of 56 statistical features (3 first-order and 53 second-order features) were extracted from each CT image. For each participant, these features were extracted from all CT images, and their sum over the images was taken as the feature set of the participant. Finally, based on the extracted features, individuals were classified into the five stages of COVID-19: normal, early, progressive, peak, and absorption. The results of the RF classifier were compared, for the same number of features, with three other commonly used classifiers: linear discriminant analysis (LDA), artificial neural network (ANN), and support vector machine (SVM). All calculations were done using MATLAB R2019b.

Database

Lung CT images were collected from 683 subjects (408 men and 275 women) with suspected COVID-19. Chest CT was performed using a Hispeed CT Dual Slice Scanner (GE Healthcare, USA) at Qaboos Teb Golestan Medical Imaging Center (Gonbad-e-Kavous, Iran). The images were acquired in spiral mode with a slice thickness of 10 mm, an interval of 10 mm, and a velocity of 15 mm/rotation. On average, 24 axial CT images were acquired per patient and stored in the standard Digital Imaging and Communications in Medicine (DICOM) format with a size of 512 × 512 pixels. The chest CT was used to assess the stage of lung involvement in COVID-19. The lung findings on the chest CT scans were GGOs that grew in size and developed into a crazy-paving pattern and subsequent consolidation. A quantitative system was used to assign the stage of lung involvement: normal, early, progressive, peak, or absorption. The CT images of each patient were evaluated and labeled with the disease stage by two independent radiologists, each with at least 10 years of experience in thoracic radiology and thousands of patients examined. The final label was determined by consensus. Previous studies have likewise relied on radiologists to evaluate CT images in the diagnosis of COVID-19 [31, 32]. The numbers of subjects in the normal, early, progressive, peak, and absorption stages were 320, 82, 108, 73, and 100, respectively. Sample CT images of lung slices for the five classes are presented in Fig. 1. CT images in the early stage showed ground-glass opacities; in the progressive stage, an increase in the crazy-paving pattern; in the peak stage, consolidation; and in the absorption stage, gradual resolution of consolidation without a crazy-paving pattern.
This research was performed in accordance with the ethical guidelines of Sabzevar University of Medical Sciences (IR.MEDSAB.REC.1399.152).

Fig. 1 Lung CT images illustrating the tissue pattern of each stage: (a) early stage (0–4 days) with ground-glass opacities; (b) progressive stage (5–8 days) with an increase in the crazy-paving pattern (interlobular septal thickening); (c) peak stage (9–13 days) with consolidation; (d) absorption stage (≥ 14 days) with gradual resolution of consolidation without a crazy-paving pattern (reticular). The images are related to the sagittal plane

Preprocessing

A set of about 20–30 lung CT slices was acquired for each participant. The uppermost and lowermost slices were excluded so that only the images that clearly contained the lung volume were retained. The margins on the vertical and horizontal sides of each retained image were then cropped.

Feature extraction

First- and second-order texture features were used to classify patients into the five stages. These statistical features describe how the intensity at each point varies relative to other points and thus summarize the gray-level information of the CT image. The following methods were used to extract texture features from each image: variance, skewness, and kurtosis [33] as first-order features, and GLCM [33, 34], GLRLM [33,34,35], GLSZM [33, 34], and NGTDM [34, 36] as second-order features.
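The three first-order features can be sketched in a few lines. This is an illustrative Python version, not the authors' MATLAB code; it assumes population moments (division by n) and the non-excess definition of kurtosis, neither of which is specified in the text, and the function name is hypothetical.

```python
def first_order_features(pixels):
    """Variance, skewness and kurtosis of a flat list of gray levels.

    Assumption: population (biased) central moments are used.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n  # second central moment
    m3 = sum((p - mean) ** 3 for p in pixels) / n  # third central moment
    m4 = sum((p - mean) ** 4 for p in pixels) / n  # fourth central moment
    if m2 == 0:
        # A constant image has no spread; skewness and kurtosis are undefined.
        return {"variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Toy 3 x 3 "image" flattened to a list of gray levels.
image = [[0, 0, 1], [1, 2, 2], [2, 3, 3]]
features = first_order_features([p for row in image for p in row])
```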

The GLCM method was selected because it captures the gray-level changes between neighboring pixels [33]. It is one of the most common techniques for extracting second-order statistical features from images. The GLCM is a matrix whose number of rows and columns equals the number of gray levels in the image. The matrix element P(i, j | ∆x, ∆y) is the number of times a pixel with gray level i and a pixel with gray level j occur at an offset of (∆x, ∆y) from each other. Various neighborhoods can be defined; the most common use steps of one, two, or three pixels in the 0°, 45°, 90°, and 135° directions. For these neighborhoods, the following features were evaluated: energy, entropy, dissimilarity, variance, contrast, inverse difference, correlation, homogeneity, autocorrelation, cluster shade, cluster prominence, maximum probability, sum of squares, sum average, sum variance, sum entropy, difference variance, difference entropy, information measures of correlation, maximal correlation coefficient, inverse difference normalized, and inverse difference moment normalized. In total, 22 GLCM features were calculated separately for each image. Each feature was computed in the four main directions (0°, 45°, 90°, and 135°) for the three step sizes and then averaged over the four directions and three steps.
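The co-occurrence counting and the averaging over directions and steps can be illustrated as follows. This is a minimal pure-Python sketch rather than the authors' MATLAB implementation; it assumes a non-symmetric GLCM, gray levels already quantized to integers 0..levels-1, and row indices that increase downward, and it shows only three of the 22 features. All function names are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Count pixel pairs (i, j), where j lies at offset (dx, dy) from i."""
    rows, cols = len(image), len(image[0])
    P = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[image[r][c]][image[r2][c2]] += 1
    return P


def normalize(P):
    """Turn counts into joint probabilities."""
    total = sum(sum(row) for row in P)
    return [[v / total for v in row] for row in P]


def contrast(p):
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    return sum(v / (1 + (i - j) ** 2) for i, row in enumerate(p) for j, v in enumerate(row))


def averaged_feature(image, levels, feature, steps=(1, 2, 3)):
    """Average one feature over 0, 45, 90, 135 degrees and the given steps."""
    unit_offsets = [(1, 0), (1, -1), (0, -1), (-1, -1)]  # (dx, dy) per direction
    values = []
    for d in steps:
        for ux, uy in unit_offsets:
            P = glcm(image, ux * d, uy * d, levels)
            if sum(sum(row) for row in P) > 0:  # skip offsets with no pixel pairs
                values.append(feature(normalize(P)))
    if not values:
        raise ValueError("image too small for the requested steps")
    return sum(values) / len(values)


# Toy 4 x 4 image with four gray levels.
toy = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
mean_contrast = averaged_feature(toy, 4, contrast)
```

For production use, a tested library implementation is preferable to this hand-rolled loop, which is written for readability only.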

GLRLM is another method commonly used to extract second-order statistical texture features from medical images [34]. In this technique, runs of consecutive pixels with the same gray level are counted for a given region of interest, and each run is indexed by its gray level and its length. The method is applied in the four main directions (0°, 45°, 90°, and 135°), and the GLRLM features are then averaged over these directions. The following features are routinely extracted with the GLRLM method: short run emphasis (SRE), long run emphasis (LRE), gray-level nonuniformity (GLN), run-length nonuniformity (RLN), run percentage (RP), low gray-level run emphasis (LGRE), high gray-level run emphasis (HGRE), short run low gray-level emphasis (SRLGE), short run high gray-level emphasis (SRHGE), long run low gray-level emphasis (LRLGE), long run high gray-level emphasis (LRHGE), gray-level variance (GLV), and run-length variance (RLV). In total, 13 GLRLM features were calculated separately for each image.
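A run-length matrix for a single direction can be sketched as below. The example covers only the 0° (horizontal) direction and three of the 13 features; it is an illustrative Python version with hypothetical function names, not the authors' code, and it assumes integer gray levels in the range 0..levels-1.

```python
def glrlm_horizontal(image, levels):
    """R[g][l-1] = number of horizontal runs of gray level g with length l."""
    max_run = len(image[0])
    R = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                R[run_value][run_length - 1] += 1  # close the finished run
                run_value, run_length = value, 1
        R[run_value][run_length - 1] += 1  # close the last run of the row
    return R


def short_run_emphasis(R):
    n_runs = sum(sum(row) for row in R)
    return sum(v / (j + 1) ** 2 for row in R for j, v in enumerate(row)) / n_runs


def long_run_emphasis(R):
    n_runs = sum(sum(row) for row in R)
    return sum(v * (j + 1) ** 2 for row in R for j, v in enumerate(row)) / n_runs


def run_percentage(R, n_pixels):
    return sum(sum(row) for row in R) / n_pixels


# Toy 4 x 4 image with four gray levels.
toy = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
R = glrlm_horizontal(toy, 4)
lre = long_run_emphasis(R)
rp = run_percentage(R, 16)
```

The other three directions follow the same pattern by walking the image along columns or diagonals instead of rows.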

GLSZM is an extension of the GLRLM method that counts connected zones of equal intensity, regardless of direction, rather than runs of pixels along a particular direction [34]. The following features are routinely extracted with the GLSZM method: small zone emphasis (SZE), large zone emphasis (LZE), gray-level nonuniformity (GLN), zone-size nonuniformity (ZSN), zone percentage (ZP), low gray-level zone emphasis (LGZE), high gray-level zone emphasis (HGZE), small zone low gray-level emphasis (SZLGE), small zone high gray-level emphasis (SZHGE), large zone low gray-level emphasis (LZLGE), large zone high gray-level emphasis (LZHGE), gray-level variance (GLV), and zone-size variance (ZSV). In total, 13 GLSZM features were calculated separately for each image. Finally, NGTDM is another method commonly used to extract second-order statistical texture features from medical images [36]. The following five features are routinely extracted with the NGTDM method: coarseness, contrast, busyness, complexity, and strength.
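The size-zone matrix can be sketched with a flood fill over connected pixels of equal gray level. The connectivity rule is not stated in the text, so 8-connectivity is assumed here; the function names are hypothetical and only three of the 13 GLSZM features are shown. NGTDM is not covered by this sketch.

```python
from collections import deque


def zone_sizes(image):
    """Return (gray_level, size) for every 8-connected zone of equal gray level."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    neighbors = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    zones = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level, size = image[r][c], 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one zone
                cr, cc = queue.popleft()
                size += 1
                for dr, dc in neighbors:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] == level):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            zones.append((level, size))
    return zones


def glszm(image, levels):
    """Z[g][s-1] = number of zones with gray level g and size s."""
    zones = zone_sizes(image)
    max_size = max(size for _, size in zones)
    Z = [[0] * max_size for _ in range(levels)]
    for level, size in zones:
        Z[level][size - 1] += 1
    return Z


def small_zone_emphasis(Z):
    n_zones = sum(sum(row) for row in Z)
    return sum(v / (s + 1) ** 2 for row in Z for s, v in enumerate(row)) / n_zones


def large_zone_emphasis(Z):
    n_zones = sum(sum(row) for row in Z)
    return sum(v * (s + 1) ** 2 for row in Z for s, v in enumerate(row)) / n_zones


def zone_percentage(Z, n_pixels):
    return sum(sum(row) for row in Z) / n_pixels


# Toy 4 x 4 image with four gray levels.
toy = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
Z = glszm(toy, 4)
lze = large_zone_emphasis(Z)
zp = zone_percentage(Z, 16)
```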

A total of 56 features (3 global, 22 GLCM, 13 GLRLM, 13 GLSZM, and 5 NGTDM) were extracted from each CT image. Then, for each participant, the sum of these features over all of that participant's CT images was taken as the participant's feature set.
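The aggregation from slice level to participant level amounts to an element-wise sum, which can be sketched as follows. The function name is hypothetical and the toy vectors have 3 entries instead of 56 for brevity.

```python
def patient_feature_vector(slice_features):
    """Element-wise sum of the per-slice feature vectors of one participant."""
    n_features = len(slice_features[0])
    if any(len(vector) != n_features for vector in slice_features):
        raise ValueError("all slices must have the same number of features")
    return [sum(vector[k] for vector in slice_features) for k in range(n_features)]


# Three slices of one participant, each described by a short toy vector.
slices = [[1.0, 0.5, 2.0], [3.0, 0.5, 1.0], [5.0, 1.0, 0.0]]
patient = patient_feature_vector(slices)
```

Because the sum grows with the number of slices, participants with more retained slices will tend to have larger feature values under this scheme.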

Classification methods

Four supervised machine learning classifiers were applied to categorize CT images into the five stages of COVID-19: normal, early, progressive, peak, and absorption. The first method is LDA, a well-known linear classification model [37]. In practice, it projects data from a D-dimensional feature space to a D′-dimensional space (D > D′) that maximizes the separation between classes while reducing the variance within them. The second method is a feed-forward artificial neural network (ANN), which is designed to solve problems that are not linearly separable [38]. It consists of input, hidden, and output layers. The input layer receives the input features and the output layer performs the required classification. An arbitrary number of hidden layers placed between the input and output layers carry out the main computation of the ANN. The neurons are trained with the back-propagation learning algorithm. The third method is SVM, an accurate and widely used classification algorithm [39]. SVM seeks the optimal hyperplane that separates the classes with the maximum margin. Among the infinite number of possible linear decision boundaries, it chooses one according to the principle of structural risk minimization. The last method is RF. The RF algorithm is easy to use and is one of the most successful classifiers for multi-class problems. It often achieves excellent results compared with other commonly used classification methods [40, 41]. The decision of the RF was determined by a majority vote among the results of the individual trees, each of which responded independently.
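The final voting step of the RF can be sketched in isolation. The snippet below assumes the individual trees have already produced their stage predictions for one participant; tree training is not shown, and ties are resolved in favor of the label that was encountered first, which is an arbitrary choice made for this illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of five trees for one participant.
votes = ["early", "progressive", "early", "normal", "early"]
stage = majority_vote(votes)
```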

Evaluation metrics

Because of the limited number of cases in the dataset, 10-fold cross-validation was used to partition the data. In this method, the dataset is divided into 10 folds, of which nine serve as training data and the left-out fold serves as test data. This is repeated until every fold has been used once as test data. Consequently, 10 different models were obtained for each classifier, and the mean and standard deviation of their results are reported. In each experiment, the dataset was randomly split into training, validation, and test sets containing 80%, 10%, and 10% of the cases, respectively. The training data were used to develop a model, which was then adjusted using the validation data. In other words, the validation set was used to tune model parameters such as sigma in the SVM and the number of trees in the RF classifier. The performance of the developed model was calculated on the test data using the accuracy metric. Therefore, in each fold, 90% of the cases (80% training and 10% validation) were used to determine the best parameters of the model, and 10% of the cases were used to test its performance. The division between sets was made at the participant level. For example, of the 82 people in the early stage, 80% (66 people), 10% (8 people), and 10% (8 people) were assigned to the training, validation, and test sets, respectively. For each classifier (RF, LDA, ANN, and SVM), the average (± standard deviation) performance over the 10 models is reported for the five stages. Finally, the confusion matrix was calculated to show the performance of the proposed algorithm for each class relative to the reference label.
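The partitioning can be sketched as follows. This is one possible reading of the protocol, not the authors' code: participants are assigned to 10 folds class by class, one fold is held out for testing in each round, and the next fold is used for validation. The per-class assignment and the choice of the validation fold are assumptions; only the class sizes are taken from the Database section.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign participant indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def split_for_round(folds, test_fold):
    """One test fold (10%), one validation fold (10%), the rest for training (80%)."""
    k = len(folds)
    val_fold = (test_fold + 1) % k  # assumption: the next fold is used for validation
    train = [i for f, fold in enumerate(folds)
             if f not in (test_fold, val_fold) for i in fold]
    return train, folds[val_fold], folds[test_fold]


# One label per participant, using the class sizes reported in the Database section.
labels = (["normal"] * 320 + ["early"] * 82 + ["progressive"] * 108
          + ["peak"] * 73 + ["absorption"] * 100)
folds = stratified_folds(labels)
train, val, test = split_for_round(folds, test_fold=0)
```

Because each index refers to a participant rather than a slice, all images of one person always land in the same set.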

Results

The accuracies of the RF, LDA, ANN, and SVM algorithms on the prepared data are presented in Table 1. The parameters of these classifiers were optimized by trial and error. The results of selecting the best number of trees in the RF algorithm are presented in Fig. 2; the optimum number of trees is 65. The RF classifier outperformed the other algorithms and achieved the highest accuracy, 93.55%, in detecting the five stages of COVID-19 from CT images. Table 1 shows that the RF classifier achieved accuracies of 96.25% in the normal stage, 74.39% in the early stage, 100% in the progressive stage, 82.19% in the peak stage, and 96% in the absorption stage. Finally, the confusion matrix of the RF algorithm is presented in Table 2. The average processing time of the proposed algorithm in MATLAB for each test case on a laptop computer (CPU: i7, 3.5 GHz; RAM: 16 GB) was 300 s, which is acceptable for clinical applications.
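The per-class accuracies can be related to case counts with a short calculation. The sketch below assumes that each percentage is the number of correctly classified cases divided by the class size reported in the Database section; the correct counts are back-calculated under that assumption and are not taken from Table 2.

```python
class_sizes = {"normal": 320, "early": 82, "progressive": 108,
               "peak": 73, "absorption": 100}
reported_accuracy = {"normal": 96.25, "early": 74.39, "progressive": 100.0,
                     "peak": 82.19, "absorption": 96.0}


def implied_correct(size, accuracy_percent):
    """Number of correct cases implied by a per-class accuracy (assumed correct/size)."""
    return round(size * accuracy_percent / 100)


def per_class_accuracy(correct, size):
    return 100 * correct / size


implied = {stage: implied_correct(class_sizes[stage], reported_accuracy[stage])
           for stage in class_sizes}
recomputed = {stage: round(per_class_accuracy(implied[stage], class_sizes[stage]), 2)
              for stage in class_sizes}
```

Under this assumption, the recomputed percentages reproduce the reported ones to two decimal places.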

Table 1 Classification accuracy of the RF, SVM, LDA, and ANN algorithms for the combination of 56 extracted features (global, GLCM, GLRLM, GLSZM, and NGTDM)
Table 2 Comparison of the labels estimated by the RF algorithm, using the combination of all features, with those assigned by the reference (radiologist)
Fig. 2 The results of the selection of the best number of trees in the RF algorithm with all combinations of features. The optimum number of trees is 65

Discussion

In the present study, in addition to determining whether or not participants have COVID-19, the stage of the disease was classified using a radiomics and machine learning approach. The classification was based on CT images of the participants' lungs. Each participant can be assigned to one of five stages: normal, early, progressive, peak, or absorption. The classification process presented here showed that the different stages of COVID-19 can be classified with good accuracy by radiomics and machine learning methods.

In clinical application, this approach is useful for early determination of the stage of the disease. Classification based on these stages helps to determine the type of treatment required for patients in each category and to reduce the cost of care and treatment for hospitalized people, which matters given the rapid spread of the disease worldwide. Moreover, this approach helps in deciding how long a patient needs to be hospitalized. These assessments can also be useful tools for medical staff caring for patients with this epidemic disease.

The collected CT images were classified by the RF, LDA, ANN, and SVM algorithms to determine the stage of COVID-19, and the results were compared for the same number of features (Table 1). These four algorithms have produced good outcomes in many studies because of their interpretability and simplicity. However, the comparison showed that the RF algorithm yields the most accurate results. The RF algorithm often performs strongly in classifying different kinds of data; in other words, its rate of false diagnosis tends to be lower. This may be due to the random subsets used to build the individual trees. Each tree applies a sequence of "if-then" comparisons on feature values at successive nodes, and the end of each branch leads to one of the classes. This process is carried out for random sets of features in different trees, and the final result is determined by majority voting. The SVM algorithm, in contrast, evaluates the degree of separation between classes. It is inherently a two-class algorithm that considers each class relative to the data of the other classes and calculates their separation. Therefore, in some cases it cannot accurately distinguish between several classes. This explains the lower accuracy of the SVM compared with the RF algorithm, as can be seen in Table 1.

The number of trees in the RF algorithm needs to be tuned. Therefore, the classification error was calculated for different numbers of trees. As the number of trees increases, more decision units participate in the vote and the performance of the classifier improves. The optimum number is the point beyond which adding more trees does not significantly improve the result. Based on the results presented in Fig. 2, the optimum number of trees in terms of accuracy is 65.
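The stopping rule described above can be written as a small selection function. The tolerance that defines a "significant" improvement is not given in the text, so the value below is an assumption, and the accuracy curve is made-up toy data for illustration only, not the values behind Fig. 2.

```python
def smallest_sufficient_trees(tree_counts, accuracies, tolerance=0.1):
    """Smallest number of trees whose accuracy is within `tolerance` of the best."""
    best = max(accuracies)
    for n_trees, accuracy in sorted(zip(tree_counts, accuracies)):
        if best - accuracy <= tolerance:
            return n_trees
    return None  # unreachable when the inputs are non-empty


# Toy validation accuracies (percent) for four candidate forest sizes.
candidate_trees = [10, 20, 30, 40]
toy_accuracy = [80.0, 88.0, 90.0, 90.05]
chosen = smallest_sufficient_trees(candidate_trees, toy_accuracy)
```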

Based on the data in Table 1 and the confusion matrix of the RF algorithm in Table 2, the RF classifier achieved accuracies of 96.25% in the normal stage, 74.39% in the early stage, 100% in the progressive stage, 82.19% in the peak stage, and 96% in the absorption stage of COVID-19. As is evident from these values, people in the progressive stage are classified with the highest accuracy and people in the early stage with the lowest. The higher accuracy for the progressive stage may be explained by the fact that the patterns in lung tissue in the progressive stage differ markedly from those in the other stages. However, because the physical characteristics of the different stages are correlated, as mentioned earlier, the accuracy obtained in some stages (especially the early stage) is somewhat lower. In other words, the early stage includes blurry ground-glass opacities and some squamous and consolidation tissues. In the progressive stage, the squamous tissue is observed to a greater extent, and in the peak stage the consolidation pattern is dominant. Therefore, a small share of each of these features can lead to discrepancies in the identification of the stages. Additional information about the patients, such as clinical information, history of other diseases (such as diabetes and obesity), or the results of other tests such as serological tests, is needed to determine the stage of the disease more accurately. For example, a healthy person who has been exposed to respiratory pollution for a long time may show signs of ground-glass opacity and be assigned to the wrong class, because the general features of this person can resemble those of the other stages and cause inaccuracies in the diagnosis process.

The results of the present study show that these algorithms can achieve remarkably good accuracies. Table 3 compares the results of this manuscript with those of other studies [31, 32, 42,43,44]. Compared with other studies, this work has some advantages, such as the combination of statistical texture features from all CT slices of a patient and the use of the RF classifier. Increasing the number of subjects and conducting a multi-site investigation to validate and improve the results are important directions for future work. It is therefore suggested that future studies evaluate a larger number of cases for radiomics-based determination of the stage of COVID-19. Additionally, deep learning methods and further algorithms may be useful for this purpose in future studies.

Table 3 Comparison of the results of other studies on the classification of the severity of COVID-19 patients from CT images with the presented approach

Conclusions

In the present study, a classification method based on lung CT images was presented, and it was shown that the different stages of COVID-19 can be classified with good accuracy by radiomics and machine learning. The presented classifier, which uses the RF algorithm and a combination of first- and second-order statistical texture features, had an accuracy of 96.25% in the normal stage, 74.39% in the early stage, 100% in the progressive stage, 82.19% in the peak stage, and 96% in the absorption stage of COVID-19. Therefore, in clinical application, the results of this investigation can help in triaging patients and can inform clinicians' decisions on referring patients as soon as possible.